Recently I asked how people used ChemSpider. I received feedback from Jan Hummel from the Max Planck Institute of Molecular Plant Physiology and have posted it below for the blog readers.

Several years ago our institute pioneered GC-MS-based approaches for metabolomic analysis in plants and other organisms. These approaches are mostly targeted, since only compounds that have previously been measured as standard or reference substances can be reliably identified in biological samples. Accordingly, we decided to expand our analysis strategies towards more untargeted metabolite analysis, and concluded that high-resolution MS (e.g. FT-ICR MS) might be the way to go. With these instruments we can resolve thousands of masses with a mass accuracy of up to 1 ppm. Combining this information with fragmentation data for individually measured masses, isotope labelling and retention times from the chromatographic separation produces a plethora of data that has to be integrated into meaningful information. Obviously this data is difficult to handle without a useful initial annotation.

This is where ChemSpider comes into play. We use the immense repository of chemical data and knowledge provided by this well-curated data collection as the entry point for converting experimentally measured masses into possible chemical compounds. In an initial step we perform simple database matching of the measured masses against all the masses derived from the compounds present in ChemSpider. This allows us to associate a large number of measured masses with one or more possible chemical formulas. In a subsequent step we use the structural information provided by the ChemSpider database to evaluate which of the initially considered compounds not only match the measured mass, but can also explain the fragmentation pattern observed in the MS/MS data. For this purpose, access to a large number of structural isomers is invaluable.
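To make the initial database-matching step more concrete, here is a minimal Python sketch of looking up measured masses in a reference list within a ppm tolerance. It is an illustration only, not the institute's actual code: the function names, the tiny `REFERENCE` list and the default tolerance are assumptions, and the measured masses are assumed to be neutral monoisotopic masses that have already been corrected for adducts.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for a formula/mass export from a compound database.
# Each entry is (monoisotopic mass in Da, sum formula).
REFERENCE = [
    (180.06339, "C6H12O6"),    # e.g. glucose and its isomers
    (180.04226, "C9H8O4"),     # e.g. acetylsalicylic acid
    (194.08038, "C8H10N4O2"),  # e.g. caffeine
]


def ppm_window(mass, ppm):
    """Return the (low, high) mass window for a given ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def match_masses(measured, reference, ppm=1.0):
    """Map each measured mass to the formulas whose mass lies within ppm."""
    ref_sorted = sorted(reference)           # sort by mass for binary search
    ref_masses = [m for m, _ in ref_sorted]
    hits = {}
    for mass in measured:
        low, high = ppm_window(mass, ppm)
        start = bisect_left(ref_masses, low)
        stop = bisect_right(ref_masses, high)
        hits[mass] = [formula for _, formula in ref_sorted[start:stop]]
    return hits


if __name__ == "__main__":
    for mass, formulas in match_masses([180.0634, 194.0900], REFERENCE).items():
        print(mass, formulas)
```

Sorting the reference masses once and using binary search keeps each lookup cheap even when the reference list holds hundreds of thousands of formulas. A single measured mass can still return several formulas, which is why the structural and fragmentation checks described above are needed afterwards.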

Additionally, the structural data lets us make use of the predicted properties collected in ChemSpider, simply by comparing them to the measured properties of the compounds (mostly retention time in the LC run). This often helps us to weed out incorrectly annotated structures.
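One way such a comparison could work is sketched below. The text does not say which predicted property is used or how the comparison is made, so everything here is an assumption: a predicted logP value per candidate, a simple linear relationship between logP and retention time calibrated on known standards, and a fixed tolerance. All names and numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def plausible_candidates(candidates, observed_rt, slope, intercept, tol):
    """Keep candidates whose predicted retention time is within tol minutes.

    candidates: list of (name, predicted_logp) tuples.
    """
    keep = []
    for name, logp in candidates:
        predicted_rt = slope * logp + intercept
        if abs(predicted_rt - observed_rt) <= tol:
            keep.append(name)
    return keep


if __name__ == "__main__":
    # Illustrative calibration standards: predicted logP vs. measured RT (min).
    slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
    # Two hypothetical isomers sharing one sum formula.
    candidates = [("isomer A", 1.0), ("isomer B", 3.0)]
    print(plausible_candidates(candidates, 4.3, slope, intercept, tol=1.0))
```

In practice the relationship between a predicted property and retention time depends on the column and gradient, so a filter like this would only be a first pass before manual inspection.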

Even though many of these analyses are still manual and tedious, the huge amount of data collected and provided by ChemSpider allows us to annotate spectra in a straightforward way, and we hope that in the future this will be done in a more automated manner. A paper entitled “High-Resolution Direct Infusion-Based Mass Spectrometry in Combination with Whole 13C Metabolome Isotope Labeling Allowing Unambiguous Assignment of Chemical Sum Formulas” (Giavalisco P et al.) describing our approach was recently accepted in Analytical Chemistry. In this paper we used PubChem as the reference database.

In comparison to our studies performed using a PubChem-based formula repository from May this year, a data export kindly provided by ChemSpider increased the number of unique sum formulas in our system by more than 180,000. It appears that ChemSpider is growing at a very good rate!
