Archive for April 3rd, 2008

Over the past year ChemSpider has been working hard to build a functional and stable platform for the hosting, deposition and curation of structure-based data. This is to form the foundation of our mission to build a Structure-Based Community for Chemists. Our deposition system is in place and well-tested. Our indexing of articles is proven, and continues. We have indexed multiple Open Access articles. We support the deposition of analytical data (spectra and CIF files) into ChemSpider.

It is now time to take this to the next level, and I would like to extend an invitation to Open Access publishers to work with us to design an interface (preferably a web service) to facilitate direct deposition of data into ChemSpider. We’d like to design an interface where you can feed your articles in with Title, Authors, Journal reference, DOI and Abstract. We would associate the article with chemical structures in one of two ways: 1) extract the chemical names from the title and/or abstract and convert them on the fly to structures, which are then deposited and/or associated with structures already on ChemSpider, or 2) allow the publisher to pass us a series of SMILES strings, InChI strings, molfiles or chemical names to deposit on ChemSpider. Based on what we have already done, it is clear this process is feasible, though it will require some manual intervention until we optimize our processes. If we do this, we can design an interface and input format that can be made public, be reused by other groups for the deposition of information into their own systems and, potentially, remove the need to extract information from PDF files (and other formats). The outcome of this work would be a freely accessible, structure- and substructure-searchable index of Open Access articles, with links back to each article. We are already indexing articles, so with permission from non-Open Access publishers we could use similar processes to index their abstracts and make their articles structure/substructure searchable based on titles and abstracts.
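To make the proposal a little more concrete, here is a minimal sketch in Python of what a deposition record might look like. The article fields and the four structure formats come from the description above; everything else (the class names, the short format keys, the JSON layout) is a hypothetical illustration, not an agreed format or an existing ChemSpider API. The title and DOI in the example are placeholders.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

# The four structure inputs named above; the short keys are hypothetical labels.
ALLOWED_FORMATS = {"smiles", "inchi", "molfile", "name"}


@dataclass
class StructureRef:
    """One structure supplied by the publisher (route 2 in the post)."""
    format: str
    value: str

    def __post_init__(self):
        if self.format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported structure format: {self.format!r}")
        if not self.value.strip():
            raise ValueError("structure value must not be empty")


@dataclass
class ArticleDeposit:
    """Article metadata plus optional structures (hypothetical layout)."""
    title: str
    authors: List[str]
    journal_reference: str
    doi: str
    abstract: str
    structures: List[StructureRef] = field(default_factory=list)

    def needs_name_extraction(self) -> bool:
        # Route 1: no structures were supplied, so chemical names would
        # have to be extracted from the title and/or abstract instead.
        return not self.structures

    def to_json(self) -> str:
        # asdict() also converts the nested StructureRef entries to dicts.
        return json.dumps(asdict(self), indent=2)


# Placeholder article; only the SMILES (benzoyl peroxide) is real chemistry.
deposit = ArticleDeposit(
    title="Example article title",
    authors=["A. Author", "B. Author"],
    journal_reference="Example Journal, volume, pages (placeholder)",
    doi="10.0000/placeholder",
    abstract="Example abstract text.",
    structures=[StructureRef("smiles", "O=C(OOC(=O)c1ccccc1)c1ccccc1")],
)
payload = deposit.to_json()
```

A record with an empty `structures` list would fall back to the name-extraction route, which is where the manual intervention mentioned above is most likely to be needed.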

So, my question. Are there any Open Access/Free Access publishers willing to discuss the possibilities I have outlined? If any of you will be at the ACS meeting and would like to discuss, please post a response here or contact me at the usual email address (antonyDOTwilliamsATchemspiderDOTcom) and let’s talk about building a disruptive and enabling technology for chemists around the world.


During a recent discussion about ChemSpider, someone asked whether ChemSpider would be supporting toxicity and safety data. It’s been on our list for a while, but questions result in action… so check out the following links. Scroll to the bottom of each record to see the supplementary information.

Sodium Acetate Trihydrate

sodium-acetate.png

Benzoyl Peroxide

benzoyl-peroxide.png

There are about 3000 more records with such information on the website now. Click on any of the question marks and up pops a dialog box explaining what the property is… admittedly, the example below is rather obvious!

help-boxes.png

Also, notice the wiki link (wikilink.png), which takes you out to the originating site for the data.

As an example of the process used to map the fields, see below how we take the original fields and map them to a common set of fields to “homogenize” the data. Notice the meta info layer too, specifically the associated units.

mapping-fields.png

Mapping choices are made according to the pulldown menu shown below.

mapping-fields-2.png
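For readers who prefer code to screenshots, the mapping step can be sketched roughly as below. This is only an illustration of the idea of mapping source field names onto a homogenized set of fields with associated units. The field names, units and function names are made up for the example and are not the actual ChemSpider mapping table or code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TargetField:
    """A homogenized field plus its meta info (here, just the units)."""
    name: str
    units: Optional[str]


# Hypothetical mapping table: source field name (lowercased) -> target field.
FIELD_MAP: Dict[str, TargetField] = {
    "melting point": TargetField("melting_point", "°C"),
    "mp": TargetField("melting_point", "°C"),
    "flash point": TargetField("flash_point", "°C"),
    "ld50": TargetField("ld50", "mg/kg"),
}


def homogenize(record: Dict[str, str]) -> Tuple[dict, dict]:
    """Split a scraped record into mapped fields and fields left unmapped.

    Unmapped fields are returned rather than dropped so that a person can
    decide how to map them (the role played by the pulldown menu).
    """
    mapped: dict = {}
    unmapped: dict = {}
    for source_name, value in record.items():
        target = FIELD_MAP.get(source_name.strip().lower())
        if target is None:
            unmapped[source_name] = value
        else:
            mapped[target.name] = {"value": value, "units": target.units}
    return mapped, unmapped


# Made-up scraped record, for illustration only.
mapped, unmapped = homogenize({"MP": "58", "Flash point": "40", "Colour": "white"})
```

Values are kept as strings here on purpose: converting units or parsing ranges is a separate step, and the sketch makes no assumption about how that is done.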

The process to gather, map and publish this data has now been tested on two different datasets. It is not yet perfect, but we improve with every iteration and, I believe, will shortly have an optimized process for scraping and publishing. I believe that the processes we are developing here will provide a smooth and highly functional system for gathering and depositing data, integrating it efficiently into our database, and making it easy to expand the data and knowledge associated with the structures on ChemSpider.


Molecule of the Day (MOTD) is one of those fun blogs that the public will likely enjoy… and there’s enough there for even us chemists to remember just how much fun chemistry is. It seemed like a good idea to try out the ability to link URLs to ChemSpider on a few of the Molecule of the Day articles. With about an hour’s manual work, the entire MOTD blog archive could be made structure searchable. As it is, I’ve linked up a few tonight… about 60 seconds each to click on the MOTD blog post, search for the structure in ChemSpider and paste in the article title and URL. Voila… searchable Molecule of the Day. Some examples are linked up below; scroll to the bottom of each record:

Sodium Acetate Trihydrate

Benzoyl Peroxide 

1,3,5-Trioxane
