Archive for October 28th, 2008

Jean-Claude Bradley, our collaborator at Drexel University, recently posted on “There are no facts…in science – only measurement embedded within assumptions.” He refers to information on ChemSpider a number of times to make his arguments, and I encourage you to read his original post.

One section is worth quoting here: “There are properties that have been determined so many times by different researchers and different techniques that we can treat a narrow range of values by consensus as if they were absolute facts. An example would be considering the boiling point of methanol at 1 atm to be 65C within one degree of accuracy. For most purposes that will suffice, as long as we understand the source of our confidence.”

When we deposit property information onto ChemSpider we make attributions with the outlinks. So, if you look at this record for ethyl acetate you will see a lot of property information listed, as shown below. Unfortunately the “units” are not always directly available when we gather the data, and we need to add the ability to add and edit units soon. However, there IS generally information in the record for at least one of the entries defining the units, and the outlinks (shown by the blue arrows) will take the user to the original data source anyway.

  • experimental physchem properties
    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84

    • Melting Point: -84 C

    • Boiling Point: 76-77

    • Boiling Point: 77

    • Boiling Point: 77

    • Boiling Point: 77

    • Boiling Point: 77

    • Boiling Point: 171F

    • Boiling Point: 77º

    • Boiling Point: 77 C

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: -3(26F)

    • Flash Point: 24F

    • Flash Point: -4 C

    • Freezing Point: -117F

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.902

    • Specific Gravity: 0.90

    • Specific Gravity: 0.894 – 0.898

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.3720

    • Refraction Index: 1.371 – 1.376

    • Ionization Potential: 10.01 eV

    • Vapor Pressure: 73 mmHg

  • miscellaneous
    • Appearance: Colorless liquid with an ether-like, fruity odor.

    • Appearance: colourless liquid with fruit-like odour

    • Appearance: Colourless liquid, volatile at low temperatures with a fragrant, acetic, ethereal odour

    • Applications: Pesticide residue, environmental, and GC analysis

    • Stability: Stable. Incompatible with various plastics, strong oxidizing agents. Highly flammable. Vapour/air mixtures explosive. May be moisture sensitive.

    • Toxicity: ORL-RAT LD50 5620 mg kg-1, SKN-RBT LD50 > 20 ml kg-1, SCU-GPG LD50 3000 mg kg-1, IPR-MUS LD50 709 mg kg-1

    • Safety: FLAMMABLE / IRRITANT

    • Safety: DANGER: FLAMMABLE, irritates skin, eyes, lungs

    • Safety: DANGER: FLAMMABLE, causes CNS injury, lung & eye irritation

    • Safety: DANGER: FLAMMABLE, causes CNS injury, lung & eye irritation

    • Safety: DANGER: FLAMMABLE, causes CNS injury, lung & eye irritation

    • Safety: Safety glasses, adequate ventilation.

    • First Aid: Eye: Irrigate immediately Skin: Water flush promptly Breathing: Respiratory support Swallow: Medical attention immediately

    • Exposure Routes: inhalation, ingestion, skin and/or eye contact

    • Symptoms: Irritation eyes, skin, nose, throat; narcosis; dermatitis

    • Target Organs: Eyes, skin, respiratory system

    • Incompatibilities and Reactivities: Nitrates; strong oxidizers, alkalis & acids

    • Personal protection and Sanitation: Skin: Prevent skin contact Eyes: Prevent eye contact Wash skin: When contaminated Remove: When wet (flammable) Change: No recommendation
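The mixed units in the listing above (for example "171F" next to "77 C" and a bare "77") can be reconciled with a small amount of parsing. The following is only a minimal sketch and not how ChemSpider handles units: the function names are hypothetical, and treating unitless values as Celsius is an assumption that would need to be checked against each original data source.

```python
import re

# A single number with an optional degree sign and an optional C/F unit,
# e.g. "171F", "77 C", "77º", "-84".
_TEMPERATURE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(?:º|°)?\s*([CF])?\s*$")


def fahrenheit_to_celsius(degrees_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (degrees_f - 32.0) * 5.0 / 9.0


def to_celsius(value, default_unit="C"):
    """Parse one temperature string and return degrees Celsius, or None.

    ASSUMPTION: a value without a unit is taken to be in `default_unit`.
    Ranges ("76-77") and mixed entries ("-3(26F)") are not parsed and
    return None so that they can be reviewed by hand.
    """
    match = _TEMPERATURE.match(value)
    if match is None:
        return None
    number = float(match.group(1))
    unit = match.group(2) or default_unit
    return fahrenheit_to_celsius(number) if unit == "F" else number


if __name__ == "__main__":
    for raw in ["77", "171F", "77º", "77 C", "76-77", "24F", "-4 C"]:
        print(repr(raw), "->", to_celsius(raw))
```

Run over the boiling point entries, the Fahrenheit value converts to roughly the same number as the Celsius entries, which is the kind of cross-check the outlinks otherwise leave to the reader.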

Jean-Claude goes on to discuss his project regarding the measurement of non-aqueous solubility and the differences between experimental and predicted properties. His discussions highlight the advantages of Open Notebook Science in terms of access to information regarding how measurements are performed…information that is missing otherwise. We advocate access to this type of information and will be linking to JC’s non-aqueous solubility measurements on his wiki shortly. FYI, his entire presentation is online here.


I have posted a number of blogs previously about chemistry document markup and our efforts in this area (1,2,3), and last week I announced ChemMantis, our Chemistry Document Markup alpha release. In the original presentation I gave on our document markup system at the ACS in Philly (online here) I talked about the possibility of integrating optical structure recognition tools. These tools are software packages/components that convert structure drawings to connection tables (4,5). I have discussed these previously on this blog in terms of my work with CLiDE (6,7) and with OSRA (8).

OSRA is an open source package for Optical Structure Recognition developed by Igor Filippov at the National Cancer Institute. My early experience with OSRA wasn’t all positive (8), but since it is Open Source we have integrated the latest version into ChemMantis and have been testing it out. There are instances where the software works perfectly and the structure generated from the image is correct, and there are examples where the conversion fails. Examples of both are shown below. The top image shows an incorrectly converted image and the bottom one a correctly converted image. At present it is clear that such conversions should be inspected by the user and edited if necessary. Even so, OSRA certainly offers an opportunity to shortcut the drawing of chemical structures.


THIS IS A REPOST BECAUSE OF ISSUES WITH PEOPLE SEEING THE LINK

Recently I asked how people used ChemSpider. I received feedback from Jan Hummel from the Max Planck Institute of Molecular Plant Physiology and have posted it below for the blog readers.

Several years ago our institute was a pioneer in establishing GC-MS-based approaches for metabolomic analysis in plants and also in other organisms. The GC-MS-based approaches are mostly targeted, since only compounds that have been previously measured as standard/reference substances can be reliably analyzed/identified in biological samples. Accordingly, we decided to expand our analysis strategies to more untargeted metabolite analysis approaches. We considered what the best way would be to achieve this goal and decided that high resolution MS (e.g. FT-ICR MS) might be the way to go. With these MS machines we can resolve thousands of masses extremely accurately, with resolutions up to 1 ppm. Combining this information with fragmentation data of individually measured masses, isotope labelling and retention times from the chromatographic separation produces a plethora of data that has to be integrated into meaningful information. Obviously this data is difficult to handle if there is no useful initial annotation. This is where ChemSpider comes into play. We use the immense repository of chemical data and knowledge provided by this well curated data collection as the entry point for the conversion of experimentally measured masses to possible chemical compounds. In an initial step we perform simple database matching of the measured masses to all the masses derived from the compounds present in ChemSpider. This allows us to associate a large number of measured masses with one or more possible chemical formulas. In a subsequent step we then make use of the structural information provided by the ChemSpider database to evaluate which of the initially considered compounds not only matches the measured mass but can also explain the measured fragmentation pattern provided by the MS/MS data. For this purpose access to a large number of structural isomers is an invaluable tool.

Additionally, by using the structural data we can also make use of the collection of predicted properties of the compounds collected in ChemSpider by simply comparing them to the properties (mostly retention time in the LC run) of the measured compounds. This often helps us to sort out incorrectly annotated structures.

Even though many of these analyses are still manual and tedious, the huge amount of data collected and provided by ChemSpider allows us a straightforward spectrum annotation, which hopefully in the future will be performed in a more automated manner. A paper entitled “High-Resolution Direct Infusion-Based Mass Spectrometry in Combination with Whole 13C Metabolome Isotope Labeling Allowing Unambiguous Assignment of Chemical Sum Formulas” (Giavalisco P et al.) describing our approach was recently accepted in Analytical Chemistry. In this paper we used PubChem as the reference database.

In comparison to our studies performed using a PubChem-based formula repository from May this year, a kindly provided data export from ChemSpider increased the number of unique sum formulas in our system by more than 180,000. It appears that ChemSpider is growing at a very good rate!
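The initial database-matching step described in the feedback above, assigning measured masses to candidate sum formulas within a ppm tolerance, might look roughly like the sketch below. It is an illustration only and not the Max Planck group's actual pipeline: the function name, the tiny reference table and its rounded monoisotopic masses are assumptions chosen for the example, and ion adducts and charge are ignored for simplicity.

```python
from bisect import bisect_left, bisect_right

# Illustrative reference table of (monoisotopic mass, sum formula).
# ASSUMPTION: a real table would come from a database export; these three
# rounded neutral masses are here only to make the example runnable.
REFERENCE = [
    (88.01604, "C3H4O3"),
    (88.05243, "C4H8O2"),
    (180.06339, "C6H12O6"),
]


def match_masses(measured, reference, tolerance_ppm=1.0):
    """Map each measured mass to the formulas lying within the tolerance.

    `reference` is a list of (mass, formula) tuples. The window around
    each measured mass is mass * tolerance_ppm / 1e6 on either side.
    """
    ref_sorted = sorted(reference)
    ref_masses = [mass for mass, _ in ref_sorted]
    hits = {}
    for mass in measured:
        delta = mass * tolerance_ppm / 1e6
        # Binary search for the slice of reference masses inside the window.
        low = bisect_left(ref_masses, mass - delta)
        high = bisect_right(ref_masses, mass + delta)
        hits[mass] = [formula for _, formula in ref_sorted[low:high]]
    return hits


if __name__ == "__main__":
    for mass, formulas in match_masses([88.05245, 100.0], REFERENCE).items():
        print(mass, formulas)
```

Sorting the reference masses once and using binary search keeps each lookup cheap even against a large export. Because a single mass window can still return several formulas, and each formula many structural isomers, the fragmentation-based second step described in the feedback is needed to narrow the candidates.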
