The ChemSpider database is populated with millions of structures and their associated identifiers: systematic names, trade names, synonyms and associated structure numbers. Unfortunately, many of these identifiers are incorrect. We have performed some robotic cleansing of the database and continue to improve the cleansing algorithms. An update of the cleansed data will be released shortly.

We are aware that, despite our best efforts, some manual curation is necessary. With this in mind we have opened manual curation to registered users. Once you are logged in, every record with one or more names or identifiers is shown with a red cross, a green tick or a “revert” sign, as shown below.

Curating the synonym data

Any name or identifier that is deemed suspicious or incorrect can be removed from the list by clicking on the red cross. This puts a strikethrough through the name but does NOT remove it from the database. Only a master curator, a role granted to a very small subset of ChemSpider users, can approve the final removal of the data from the database. If a synonym is “approved”, it is underlined and moved to the top of the list. After some curation, the list looks as shown below.

Curation level 1

The green double-headed arrow allows the curation to be reversed. Every curation activity is logged, and with time this will allow us to identify new master curators. In the near future we will be locking in a list of names, synonyms and identifiers based on the input of our master curators and our own processes for identifying the best identifiers.
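To make the workflow concrete, here is a minimal Python sketch of the curation states described above. It is not ChemSpider's actual implementation: the names `Status`, `Synonym`, `Record` and all method names are hypothetical, the green tick is assumed to be the "approve" action, and the rule that only struck-out names can be finally removed is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNREVIEWED = "unreviewed"
    STRUCK = "struck"        # red cross: shown struck out, still stored
    APPROVED = "approved"    # assumed green tick: underlined, listed first


@dataclass
class Synonym:
    name: str
    status: Status = Status.UNREVIEWED


@dataclass
class Record:
    synonyms: list = field(default_factory=list)
    log: list = field(default_factory=list)  # every curation action is recorded

    def _find(self, name):
        return next(s for s in self.synonyms if s.name == name)

    def _set(self, user, name, status, action):
        self._find(name).status = status
        self.log.append((user, action, name))

    def strike(self, user, name):
        self._set(user, name, Status.STRUCK, "strike")

    def approve(self, user, name):
        self._set(user, name, Status.APPROVED, "approve")

    def revert(self, user, name):
        # Double-headed arrow: undo an earlier strike or approval.
        self._set(user, name, Status.UNREVIEWED, "revert")

    def remove_permanently(self, user, name, is_master_curator):
        if not is_master_curator:
            raise PermissionError("only a master curator can approve final removal")
        syn = self._find(name)
        if syn.status is not Status.STRUCK:
            # Assumption: final removal applies only to struck-out names.
            raise ValueError("only struck-out names can be removed")
        self.synonyms.remove(syn)
        self.log.append((user, "remove", name))

    def display_order(self):
        # Approved names first; sorted() is stable, so the original
        # relative order is kept within each group.
        return [s.name for s in
                sorted(self.synonyms, key=lambda s: s.status is not Status.APPROVED)]
```

In this sketch, striking a name only changes its status, so the entry stays in `synonyms` until a master curator calls `remove_permanently`, and the `log` list keeps a trail of who did what.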

We encourage everyone to help us clean up the identifiers. If you believe there are other issues with a structural record, use the other mode of curation by selecting Help Curate Data at the top right-hand side of each record. Details about general curation are given elsewhere.
