Archive for October, 2008

ChemMantis is now in alpha release and under test. ChemMantis is our Chemistry Markup And Nomenclature Transformation Integrated System. The movie below can likely tell a better story than I can write. So, let’s start with this movie…and more will follow. The premise is simple: upload a document, find chemical names, convert names/identifiers to chemical structures and find related information. In this case we are demonstrating how structures are linked to information on ChemSpider and from there out to other information on the web. There are more such displays to come….

Buy me a Coffee

I’ve posted over at the ChemConnector Blog about the potential need for a neutral review of the performance of Optical Structure Recognition algorithms. I’m interested in the technology because we are now using it on ChemSpider for our document markup and structure recognition. I’d welcome your thoughts and comments…visit the blog post.

There’s no shortage of possibilities regarding where we could go next with ChemSpider and we’re always thinking ahead. At present we are focused on chemistry document markup and the development of ChemMantis. Moving forward we are considering how chemists might want to use ChemSpider. Based on comments from organic chemists over the past few months, a lot of chemists are using ChemSpider to source chemicals for purchase for screening and, specifically, to find starting materials for further reactions.

Recently we added the ChemSynthesis structure collection. That database offers links out to over 45,000 articles regarding reaction synthesis. We are now being encouraged to manage reactions directly on ChemSpider. While we of course have the skills to do so, it’s not in our near future. But what if we did? Then retrosynthetic analysis might be possible. At the ACS meeting in Philadelphia in August I gave a presentation on ARChem Route Designer, a software product marketed by SimBioSys. It was my privilege to give this presentation on behalf of one of the most respected chemists, Peter Johnson, someone who has been at the forefront of tools for synthesis design and structure based drug design. Take a look at the presentation about ARChem…for chemists interested in software tools for Retrosynthetic Analysis it may be of interest…and I wonder whether a platform like this might be of interest to integrate with ChemSpider…what do YOU think?

Jean-Claude Bradley, our collaborator at Drexel University, recently posted on “There are no facts…in science - only measurement embedded within assumptions.” He refers to information on ChemSpider a number of times to make his arguments and I point you to his original post to read.

Some specific sections are quoted: “There are properties that have been determined so many times by different researchers and different techniques that we can treat a narrow range of values by consensus as if they were absolute facts. An example would be considering the boiling point of methanol at 1 atm to be 65C within one degree of accuracy. For most purposes that will suffice, as long as we understand the source of our confidence.”

When we deposit property information onto ChemSpider we make attributions with the outlinks. So, if you look at this record for ethyl acetate you will see a lot of property information listed as shown below. Unfortunately the “units” are not always directly available when we gather the data and we need to add the ability to add/edit units soon. However, there IS generally information in the record for at least one of the entries defining the units and the outlinks (shown by the blue arrows) will take the user to the original data source anyway.

  • experimental physchem properties
    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84 C

    • Boiling Point: 76-77

    • Boiling Point: 77

    • Boiling Point: 77

    • Boiling Point: 77

    • Boiling Point: 77

    • Boiling Point: 171F

    • Boiling Point: 77º

    • Boiling Point: 77 C

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: 24F

    • Flash Point: -4 C

    • Freezing Point: -117F

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.90

    • Specific Gravity: 0.894 - 0.898

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.371 - 1.376

    • Ionization Potential: 10.01 eV

    • Vapor Pressure: 73 mmHg

  • miscellaneous
    • Appearance: Colorless liquid with an ether-like, fruity odor.

    • Appearance: colourless liquid with fruit-like odour

    • Appearance: Colourless liquid, volatile at low temperatures with a fragrant, acetic, ethereal odour

    • Applications: Pesticide residue, environmental, and GC analysis

    • Stability: Stable. Incompatible with various plastics, strong oxidizing agents. Highly flammable. Vapour/air mixtures explosive. May be moisture sensitive.

    • Toxicity: ORL-RAT LD50 5620 mg kg-1, SKN-RBT LD50 > 20 ml kg-1, SCU-GPG LD50 3000 mg kg-1, IPR-MUS LD50 709 mg kg-1

    • Safety: FLAMMABLE / IRRITANT

    • Safety: DANGER: FLAMMABLE, irritates skin, eyes, lungs

    • Safety: DANGER: FLAMMABLE, causes CNS injury, lung & eye irritation

    • Safety: DANGER: FLAMMABLE, causes CNS injury, lung & eye irritation

    • Safety: DANGER: FLAMMABLE, causes CNS injury, lung & eye irritation

    • Safety: Safety glasses, adequate ventilation.

    • First Aid: Eye: Irrigate immediately Skin: Water flush promptly Breathing: Respiratory support Swallow: Medical attention immediately

    • Exposure Routes: inhalation, ingestion, skin and/or eye contact

    • Symptoms: Irritation eyes, skin, nose, throat; narcosis; dermatitis

    • Target Organs: Eyes, skin, respiratory system

    • Incompatibilities and Reactivities: Nitrates; strong oxidizers, alkalis & acids

    • Personal protection and Sanitation: Skin: Prevent skin contact Eyes: Prevent eye contact Wash skin: When
      contaminated Remove: When wet (flammable) Change: No recommendation
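
The mixed units in the listing above (77, 77 C, 77º, 171F) suggest a small normalization step before values from different sources are compared. The following is only a minimal sketch, not how ChemSpider handles units: the function name `to_celsius` is hypothetical, entries without a unit are assumed to be in Celsius, and for ranges such as "76-77" or pairs such as "-3(26F)" only the leading value is read.

```python
import re

# A signed number, optionally followed by a degree sign and a C or F marker.
_TEMP = re.compile(r"(-?\d+(?:\.\d+)?)\s*º?\s*([CF])?", re.IGNORECASE)


def to_celsius(raw, default_unit="C"):
    """Convert a listing value such as '171F', '77 C' or '77' to degrees Celsius.

    Assumption: a value with no unit is in `default_unit` (Celsius here).
    Only the first number in the string is used.
    """
    match = _TEMP.match(raw.strip())
    if match is None:
        raise ValueError(f"cannot parse temperature: {raw!r}")
    value = float(match.group(1))
    unit = (match.group(2) or default_unit).upper()
    if unit == "F":
        # Standard Fahrenheit-to-Celsius conversion, rounded to 0.1 degree.
        return round((value - 32.0) * 5.0 / 9.0, 1)
    return value


boiling_points = ["76-77", "77", "171F", "77º", "77 C"]
print([to_celsius(v) for v in boiling_points])
# prints [76.0, 77.0, 77.2, 77.0, 77.0]
```

Once the values are on one scale, the boiling point entries agree to within about a degree, which is the kind of consensus range Jean-Claude describes.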

Jean-Claude goes on to discuss his project regarding the measurement of non-aqueous solubility and the differences between experimental and predicted properties. His discussions highlight the advantages of Open Notebook Science in terms of access to information regarding how measurements are performed…information that is missing otherwise. We advocate access to this type of information and will be linking to JC’s non-aqueous solubility measurements on his wiki shortly. FYI, his entire presentation is online here.

I have posted a number of blogs previously about chemistry document markup and our efforts in this area (1,2,3), and last week announced ChemMantis, our Chemistry Document Markup alpha-release. In the original presentation I gave on our document markup system at the ACS in Philly (online here) I talked about the possibility of integrating optical structure recognition tools. These tools are software packages/components that convert structure drawings to connection tables (4,5). I have discussed these previously on this blog in terms of my work with CLiDE (6,7) and with OSRA (8).

OSRA is an open source package for Optical Structure Recognition developed by Igor Filippov at the National Cancer Institute. My early experience with OSRA wasn’t all positive (8) but since it is Open Source we have integrated the latest software into ChemMantis and have been testing it out. There are instances where the software works perfectly and the structure generated from the image is perfect, and there are examples where the conversion fails. Examples of both are shown below. The top image shows an incorrectly converted image and the bottom one a correctly converted image. At present it is clear that such conversions should be inspected by the user and edited if necessary. OSRA certainly offers an opportunity to shortcut the drawing of chemical structures.

THIS IS A REPOST BECAUSE OF ISSUES WITH PEOPLE SEEING THE LINK

Recently I asked how people used ChemSpider. I received feedback from Jan Hummel from the Max Planck Institute of Molecular Plant Physiology and have posted it below for the blog readers.

Several years ago our institute was a pioneer in establishing GC MS-based approaches for metabolomic analysis in plants and also in other organisms. The GC MS-based approaches are mostly targeted since only compounds that have been previously measured as standard/reference substances can be reliably analyzed/identified in biological samples. Accordingly, we decided to expand our analysis strategies to more untargeted metabolite analysis approaches. For this purpose we considered what the best way would be to achieve this goal, and we decided that high resolution MS (eg. FT-ICR MS), might be the way to go. With these MS machines we can resolve thousands of masses extremely accurately with resolutions up to 1ppm. Combining this information with fragmentation data of individually measured masses, isotope labelling and retention times from the chromatographic separation means a plethora of data that has to be integrated into meaningful information is produced. Obviously this data is difficult to handle if there is no useful initial annotation. This is where ChemSpider comes into play. We use the immense repository of chemical data and knowledge provided by this well curated data collection as the entry point for the conversion of experimentally measured masses to possible chemical compounds. In an initial step we perform simple database matching of the measured masses to all the masses derived from the compounds present in ChemSpider. This allows us to associate a large number of measured masses to one or more possible chemical formulas. In a subsequent step we then make use of the structural information provided by the ChemSpider database to evaluate which of the initial considered compounds matches not only the measured mass, but also can explain the measured fragmentation pattern provided by the MS/MS data. For this purpose access to a large number of structural isomers is an invaluable tool.

Additionally, by using the structural data we can also make use of the collection of predicted properties of the compounds collected in ChemSpider by simply comparing them to the properties (mostly retention time in the LC run) of the measured compounds. This often helps us to sort out incorrectly annotated structures.

Even though many of these analyses are still manual and tedious, the huge data collected and provided by ChemSpider allows us a straight forward spectrum annotation, which hopefully in the future will be performed in a more automated manner. A paper entitled “High-Resolution Direct Infusion-Based Mass Spectrometry in Combination with Whole 13C Metabolome Isotope Labeling Allowing Unambiguous Assignment of Chemical Sum Formulas” (Giavalisco P et. al.) describing our approach was recently accepted in Analytical Chemistry. In this paper we used PubChem as the reference database.

In comparison to our studies performed using a PubChem based formula repository from May this year, a kindly provided data export from ChemSpider increased the amount of unique sum formulas in our system by more than 180,000 formulae. It appears that ChemSpider is growing at a very good rate!
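
The initial database-matching step described above can be sketched roughly as follows. This is an illustration only, not the institute's actual pipeline: the function name `match_masses`, the tiny reference list and the measured values are made up for the example, the masses are treated as neutral monoisotopic masses (real measurements would first need correcting for the ion or adduct observed), and the tolerance is expressed in ppm.

```python
from bisect import bisect_left, bisect_right


def match_masses(measured, reference, tolerance_ppm=1.0):
    """Map each measured mass to the formulas whose mass lies within the ppm window.

    `reference` is a list of (monoisotopic_mass, formula) pairs, standing in
    for a formula list exported from a compound database.
    """
    ordered = sorted(reference, key=lambda entry: entry[0])
    masses = [mass for mass, _ in ordered]
    hits = {}
    for m in measured:
        delta = m * tolerance_ppm / 1e6  # absolute window for this mass
        lo = bisect_left(masses, m - delta)
        hi = bisect_right(masses, m + delta)
        hits[m] = [formula for _, formula in ordered[lo:hi]]
    return hits


# Illustrative reference entries (monoisotopic masses computed from C, H and O).
reference = [
    (32.026215, "CH4O"),
    (88.052429, "C4H8O2"),
    (180.063388, "C6H12O6"),
]

for mass, formulas in match_masses([88.05245, 180.0640], reference).items():
    print(mass, formulas)
```

A formula such as C4H8O2 still corresponds to several structural isomers, which is why the second step, comparing candidate structures against the MS/MS fragmentation pattern, is needed to narrow the list.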

When ChemSpider was rolled out to the world as a part of ChemZoo we always knew we would be introducing more “critters”. We are happy to announce our progress with our new development, ChemMantis. Why Mantis? Well…it’s the Markup And Nomenclature Transformation Integrated System. Fits perfectly into our zoo!

We have been working on the markup of chemistry documents for a number of months and I unveiled the first aspects of our work at the ACS meeting in Philadelphia. The presentation is available online on my Slideshare account. What we are trying to do is to use our ChemSpider platform as the foundation of a document markup system whereby chemical names are automatically identified and can either be converted to chemical structures (possibly using algorithms for name to structure conversion) or be retrieved from our ChemSpider database. We have invested a lot of effort to curate and validate the ChemSpider database of over 21.5 million unique chemical entities over the past year and are now sitting on a foundation of information allowing us to connect between chemical identifiers and chemical structures, out to rich sources such as Wikipedia and PubChem, and to provide information such as chemical vendors and other online systems. ChemMantis is well and truly woven into the web of ChemSpider now.
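
To make the dictionary side of this concrete, here is a minimal sketch of looking up known chemical names in a piece of text. It is not the ChemMantis implementation: the three-entry dictionary, the name `find_chemical_names` and the use of SMILES strings as the structure representation are all assumptions made for the example, and a real system would also need name-to-structure conversion for names that are not in the dictionary.

```python
import re

# Toy dictionary of validated name -> structure pairs (structures as SMILES).
NAME_TO_SMILES = {
    "ethyl acetate": "CCOC(C)=O",
    "methanol": "CO",
    "acetone": "CC(C)=O",
}


def find_chemical_names(text, dictionary=NAME_TO_SMILES):
    """Return (start, end, matched_text, smiles) for each dictionary name in text."""
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(1), dictionary[m.group(1).lower()])
        for m in pattern.finditer(text)
    ]


sample = "Dissolve the residue in Ethyl acetate and wash with methanol."
for start, end, name, smiles in find_chemical_names(sample):
    print(f"{name!r} at {start}-{end}: {smiles}")
```

The character offsets are what a markup layer would need in order to wrap each name with a link or a structure balloon.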

We are now in alpha release and are adding some finishing tweaks to the markup system, the visualization elements and the workflow. You can see the immediate effects of our recent work on improving the quality of structure images in the balloon below.

We would like to test the system on YOUR documents if you are willing to participate. What we are looking for are WORD documents for already published papers. They can be Open or Closed access papers. We are not expecting copyright transfer - we want to mark up the documents and return them to you for feedback. In the process we will be testing the quality of our Dictionary, our conversions, our visualizations and our process. We welcome your support. Feel free to connect with us at infoATchemspiderDOTcom. Over the next few weeks you will hear more about ChemMantis and our contributions to text mining and markup of chemistry documents.

Recently a new website connecting chemicals to synthesis references went online. The site is ChemSynthesis and, in addition to synthesis references, the database contains physical properties for many of the listed substances. There are currently more than 40 000 compounds and more than 45 000 synthesis references in the database and there is an intention to keep the database growing with contributions from the community. Presently ChemSynthesis is indexing information from the quite extensive list of journals given below.

The Journal of the American Chemical Society, Canadian Journal of Chemistry, Chemical and Pharmaceutical Bulletin, Chemistry Letters, Journal of Heterocyclic Chemistry, Journal of Medicinal Chemistry, The Journal of Organic Chemistry, Organic Syntheses, Synthesis, Synthetic Communications, Tetrahedron Letters, Tetrahedron

An example record can be found here and a list of hits from a text search is shown below.

Linking from ChemSpider to ChemSynthesis seemed like a natural way to help our users source potential synthesis details. So, that’s done. Also we have exchanged the appropriate information with ChemSynthesis so that we have completed the loop. Users searching ChemSynthesis can navigate directly to the ChemSpider record with one click.

To review the entire ChemSynthesis dataset on ChemSpider simply follow this link. It is >40,000 molecules so might take a while to load. Another contribution to the community of connected chemists….

Something good is happening in regards to ChemSpider’s reputation, it seems. I’ve chosen to interpret it as an indication that we are doing a stellar job at running our website and contributing to the community of chemistry. I’ll take it as kudos for the quality of what we do. Maybe it’s just an indication that the world is in economic turmoil and people are looking for jobs? In any case, the weekly requests now coming in to join ChemSpider (“here’s my resume”, etc.) are very interesting. In the past 48 hours I have had to respond to 4 people that while we ARE a professional organization (i.e. we’re all professionals and good at our jobs) we are not hiring at present and actually don’t have any of us gainfully “employed” by ChemSpider per se.

Despite this situation at present we DO have plans as to how to start to recoup some of the investments we have made over the past year and a half. We are discussing the ChemSpider Appliance with some organizations at present. The ChemSpider Appliance is what exactly? It will be a stripped down and read only version of ChemSpider installed INSIDE a company’s firewall with daily synchronization between the public server and the company’s system. We will not be able to provide the entire database to a company, especially for predicted properties etc because that might damage the business of some of our collaborators. All of this is presently under discussion. Watch this space…

We’ve been working on structure depictions on ChemSpider and overall we are very happy with where we have got to. These structure depictions are going to be showing up in various parts of our system now.

However, we should qualify the difference between structure images and structure layout. The depictions and the layout are governed by different algorithms. While a structure image can be attractive, the layout may not be perfect. It is possible to improve the layout of the molecule deposited on ChemSpider. Notice for the structure on the left that there is overlap with the methyl group.

For details on how to CLEAN structures on ChemSpider please read the Technical Note here: Interactive Cleaning of Molecules During Curation and Deposition.

The result of performing cleaning is shown below. This layout may also not be the perfect layout but there is no overlap. The user can continue to manually optimize the structure for the preferred layout.

A lot of people have been helping to improve the quality of ChemSpider content by depositing new data and “Cleaning up” errors in the data over the past few months. It’s been a long climb. Our thanks to all of you who have contributed. I’ll be the first one to put my hand up and acknowledge that in some ways I have not made the act of contributing to the curation process very easy, since I’ve been feeding the data out via the blog in chunks, as it has developed. Following a recent “long flight” I am happy to announce that the Curators Handbook/Bible is now available online here in its first form. This document gives some pretty detailed guidance regarding how to curate the ChemSpider database. As always we welcome feedback. If something is not clear let us know and we will expand/enhance as appropriate.

What I also want to do is to thank those people who have commented on how truly impressed they are with the rate at which we are cleaning the data. In general most curation requests identified on the site are addressed within 24 hours. There are some issues hanging out there that we don’t have solutions for at present, specifically in regards to organometallic data handling, but we are still thinking about a path forward.

It is finally time to roll out more attractive structure depictions. We have needed them for a while but they have become an absolute must-have as we roll out the following new capabilities:

1) The ability to make YOUR chemical blog structure searchable (watch this space…). We suggested one path previously…this is BETTER…

2) Structure balloons for use with our document markup tools, both browser-based and Microsoft Word based

We all judge quality of visual aesthetics quickly. We know a good structure when we see one. This is an announcement that we will be rolling out new structures across the site in the next few days. You will see better looking structures showing up across the site - during deposition, during service-based predictions, during searches and, well, everywhere. While not perfect as yet a little more tweaking and the entire database will be supported by the new structure depiction algorithms. As it is you should see some examples now on the database…one shown below. We welcome your feedback!

That is not a misspelling in the title…I do mean Word Docmination and not world domination. It’s a rather tongue in cheek comment based on some recent discussions where a friend working in the domain of cheminformatics laughingly joked that continuing to develop ChemSpider at our present rate and linking up document markup capabilities could lead to “world domination”. Hardly. Anyhow, we’re more for collaboration and integration rather than domination. However, taking the comment in the spirit intended, I did find it funny considering what we are working on at present…“Word docmination for Chemistry”

Over the past couple of weeks you will have noticed that the blog has gone quite quiet. There have been a number of reasons for this…all positive. Collectively our team is involved in a number of projects and these have timelines and deliverables. My own personal time that can be dedicated to blogging is much diminished, for right now.

That said there are some exciting things to report. We have progressed a long way with our document markup system since the presentation at ACS-Philadelphia. What we showed at the conference was very much proof of concept. Since then we have been improving our workflows for markup, have been validating the identification of chemical names versus “other text”, have been working on easy ways to build “dictionaries” of good and bad names into the interface, have been improving our structure layout and depiction and have started to compare with other document markup approaches. Our intention with our document markup approach has been to take advantage of much of the work we have done over the past 18 months. Specifically, we have created a large foundation of chemical entities with associated properties (identifiers, experimental and predicted properties, links to publications, services and other related information) and now we can leverage this database as we perform document markup. Not only can the validated structure-name pairs be used to great effect for this work but when chemical names are converted to structures they can immediately be used for lookup on the ChemSpider database. When chemical structures are identified in documents (either via chemical name conversion or by extraction of the chemical structure from the OLE container in a Word Document) then they can be deposited into the ChemSpider database together with a link to the original document or appropriate meta data.

As we have moved through our project we have been focusing in on the next phase, which should be integration with what is probably the most common desktop word processor, Microsoft Word. At present we are working on utilizing the markup capabilities we have been developing in Internet Explorer and building integration with Word. These capabilities will initially be called via our ChemSpider web services but in the future might involve some desktop components for markup remote from ChemSpider. Time will tell. Watch this space for more news as we unveil the new capabilities. We are certainly interested in talking to any publishers who might be interested in looking at our markup capabilities as we add them.
