Archive for the ChemSpider Content Category

When ChemSpider went live in March of this year, one of our collaborators, ACD/Labs, contributed prediction algorithms that allow us to generate a number of PhysChem properties, including logP. There are of course MANY logP prediction algorithms available and many discussions regarding which algorithm delivers the best overall performance. We will not engage in those discussions, since algorithm validation is a long-running debate that has been reported on many times in other venues. What we have done, however, is choose to list the logP values from a number of other algorithms. You will notice that the logP column now contains values from three different algorithms: the original ACD/LogP (version 10) algorithm, the XlogP algorithm and the AlogP algorithm (thanks to Igor Tetko). Values have not yet been predicted for all structures, but about half of the structures have values from all three algorithms. We hope to add even more properties to ChemSpider in the near future. Watch this space.

LogP values

The ChemSpider database is populated with millions of structures and their associated identifiers: systematic names, trade names, synonyms and associated structure numbers. Unfortunately, many of these identifiers are incorrect. We have performed some robotic cleansing of the database and continue to improve the algorithms used to cleanse the data. An update containing these cleansed data will be released shortly.

We are conscious of the fact that, despite our best efforts, a level of manual curation is necessary. With this in mind, we have opened manual curation to registered users. Once you are logged in, every record with one or more names or identifiers will be shown with a red cross, a green tick or a “revert” sign, as shown below.

Curating the synonym data

Any name or identifier that is deemed suspicious or incorrect can be removed from the list by clicking on the red cross. This puts a strikeout through the name but does NOT remove it from the database. Only a master curator, a role granted to a very small subset of ChemSpider users, can approve the final removal of the data from the database. If a synonym is “approved” then it is underlined and moved to the top of the list. After some level of curation, the list looks as shown below.

Curation level 1

The green double-headed arrow allows the curation to be reversed. Every curation activity is logged and, with time, we will be able to identify new master curators. In the near future we will be locking in a list of names, synonyms and identifiers based on the input of our master curators and our own processes for identifying the best identifiers.
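For readers who think in code, the curation rules described above can be summarized in a small Python sketch. This is only an illustration of the workflow and not how ChemSpider is implemented: the `Record` class, the `Status` values and every method name here are hypothetical, and the "revert" action is assumed to return a name to an unreviewed state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNREVIEWED = "unreviewed"
    FLAGGED = "flagged"    # red cross: struck through, but still stored
    APPROVED = "approved"  # green tick: underlined and listed first


@dataclass
class Record:
    """Hypothetical model of one structure record and its names."""
    synonyms: dict = field(default_factory=dict)  # name -> Status
    log: list = field(default_factory=list)       # (user, action, name)

    def add(self, name):
        self.synonyms[name] = Status.UNREVIEWED

    def _set(self, user, action, name, status):
        if name not in self.synonyms:
            raise KeyError(name)
        self.synonyms[name] = status
        self.log.append((user, action, name))  # every activity is logged

    def flag(self, user, name):
        self._set(user, "flag", name, Status.FLAGGED)

    def approve(self, user, name):
        self._set(user, "approve", name, Status.APPROVED)

    def revert(self, user, name):
        # Assumption: reverting returns the name to the unreviewed state.
        self._set(user, "revert", name, Status.UNREVIEWED)

    def remove(self, user, name, is_master_curator):
        # Only a master curator can approve the final removal.
        if not is_master_curator:
            raise PermissionError("only a master curator can remove a name")
        if self.synonyms.get(name) is not Status.FLAGGED:
            raise ValueError("only flagged names can be removed")
        del self.synonyms[name]
        self.log.append((user, "remove", name))

    def display_order(self):
        # Approved names first; the stable sort keeps the original
        # order within each group.
        return sorted(
            self.synonyms,
            key=lambda n: self.synonyms[n] is not Status.APPROVED,
        )
```

In this sketch, flagging a name changes only its status, so the name stays in `synonyms` until a master curator calls `remove`, which mirrors the strikeout-but-not-deleted behavior described above.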

We encourage everyone to help us clean up the identifiers. If you believe there are other issues with a structural record, then use the other mode of curating by selecting Help Curate Data at the top right-hand side of each record. Details about general curation are given elsewhere.

I’ve blogged previously on our introduction of data curation for the ChemSpider database. The process continues: multiple pages of suggested changes have already been made, and we will be rolling these suggested edits into the database shortly. We will also be rolling out additional curation capabilities shortly. Watch this space.

For now we are continuing to acknowledge those users of ChemSpider who contribute to the cleansing of the data in ChemSpider. This month we want to acknowledge Barrie Walker, our newest member of the Advisory Group, for the detailed work he has performed on analyzing the data over the past few weeks. We are about to add some more of Barrie’s content to the ChemSpider database from his Chirals Database. I have known Barrie for a number of years and have worked with him very closely on the analysis of systematic nomenclature…in this domain Barrie has an incredible eye for detail…and ChemSpider is fortunate to benefit! Thanks Barrie.

A series of new databases has been deposited into ChemSpider. The index has been expanded with additions from the following depositors:

1) Over 200,000 chemical structures from the Journal of Heterocyclic Chemistry have been added to the database and linked back to the publication website.
2) UsefulChem molecules have been indexed and linked to the UsefulChem website.
3) The MDPI data have been indexed and linked back to an information page. There are no molecules online, so this is simply a connection to the data source.
4) Many hundreds of thousands of chemical structures supplied by Enamine have been added and linked to the structures on their website.
5) The Nanogens collection is now in ChemSpider and linked back to information about the data collection.

These structures are presently available for searching by text only, as they still need to be indexed for structure and substructure searching.