I have been at the German Conference for Cheminformatics in Goslar for the past three days. I twittered the conference using #goslarcheminf, and it appears that there was little interest in twittering here; it seems like it’s an “American” thing to do. I gave a presentation entitled “ChemSpider – Building a Foundation for the Semantic Web by Hosting a Crowd Sourced Databasing Platform for Chemistry” and have put it on SlideShare here. The abstract for the talk is below, as well as the embedded SlideShare widget for the talk. This talk was a lot less rushed than usual (not just 20 minutes), and I personally enjoyed giving it. Commonly I feel that the talks I give are very rushed and I only get to scratch the surface of what we are up to with ChemSpider. It’s amazing how an additional 15 minutes allowed me to expand on the issues and the work. The presentation drew a lot of questions and attention after the session, and I’m hoping that many of the discussions regarding collaboration and depositions of new data come to fruition.
There is an increasing availability of free and open access resources for chemists to use on the internet. Coupled with the increasing availability of Open Source software tools, we are in the middle of a revolution in data availability and tools to manipulate these data. ChemSpider is a free access website for chemists built with the intention of providing a structure-centric community for chemists. It was developed to aggregate and index available sources of chemical structures and their associated information into a single searchable repository and to make it available to everybody, at no charge.
There are tens if not hundreds of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and more, and no single way to search across them. Although a large number of databases containing chemical compounds and data were available online, their quality, accuracy and completeness were lacking in many regards. The intention with ChemSpider was to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties, as well as links to other valuable resources. It has grown into a resource containing over 21 million unique chemical structures from over 200 data sources.
ChemSpider has enabled real-time curation of the data, association of analytical data with chemical structures, real-time deposition of single or batch chemical structures (including with activity data) and transaction-based predictions of physicochemical data. The social community aspects of the system demonstrate the potential of this approach. Curation of the data continues daily, and thousands of edits and depositions by members of the community have dramatically improved the quality of the data relative to other public resources for chemistry.
This presentation will provide an overview of the history of ChemSpider, the present capabilities of the platform and how it can become one of the primary foundations of the semantic web for chemistry. It will also discuss some of the present projects underway since the acquisition of ChemSpider by the Royal Society of Chemistry.