Guide to Database Curation and Annotation
ChemSpider is a free chemistry search engine. It has been built to aggregate and index chemical structures and their associated information into a single searchable repository. In order to curate data, upload structures, add associated information, download search results and use our embedding tools, you need to be a registered user. The process for registering is described in the help page Registering with ChemSpider.
What is Curation?
Curation of the ChemSpider database refers to the manual annotation and correction of data, such as structural information, the nomenclature of chemical entities and the links to publications.
There are two main ways to help curate data on ChemSpider:
- Registered users with curation rights can review and remove erroneous data or mark it for master curation.
- Any user can post comments on a record. A Master Curator will then review these comments and try to resolve any issues raised.
By highlighting and removing incorrect data from the database you help to improve the quality of data that is available to you and the rest of the scientific community.
Posting Comments on ChemSpider
Any user can post comments regarding erroneous data. This could be an incorrect name or a structure that is incorrectly drawn. If you believe that you have found an error in a record, you can submit feedback by clicking on the Leave Feedback button in the top-right corner of the page.
A feedback form is then displayed on top of the record.
You can also select a Status (Low, Normal, High, Extreme) for the feedback.
Finally, you need to complete the CAPTCHA request and select Submit.
Curating Identifiers on ChemSpider
To be able to curate a ChemSpider record you need to have Curator privileges and be logged into your account.
Tip: If you need to request Curator privileges, or are unsure whether you have them, you can check or update your profile by clicking on the My Profile label (displayed at the top right of every ChemSpider page when you are logged in).
To change any information in the Names and Identifiers section you need to open the editing dialog.
Overview of the Identifiers editing dialog
Open the Names and Identifiers editing dialog by clicking on the Identifier label at the top right of the record.
Alternatively, if the record already has some identifiers you can scroll down to the Names and Identifiers section and click on the Edit button.
The editing dialog displays a list of the identifiers in the record together with a set of check boxes that allow you to select the identifiers that you want to change. Curators cannot make changes to validated names (shown in bold face). At the top of the dialog you can see:
Below these there is a key which reminds you of the different styles of text formatting that are used to indicate the current status of an identifier.
In the main body of the dialog the identifiers are grouped: Validated identifiers appear at the top of the list, followed by Normal identifiers, with Rejected identifiers at the bottom. Within these groupings the individual identifiers are listed in alphabetical order.
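The ordering described above can be illustrated with a minimal sketch. This is not ChemSpider code: the status labels, the function name order_identifiers and the placeholder names are hypothetical, and the case-insensitive alphabetical comparison is an assumption.

```python
# Hypothetical status labels for the three groups described in the guide,
# mapped to the position of each group in the list.
STATUS_ORDER = {"validated": 0, "normal": 1, "rejected": 2}


def order_identifiers(identifiers):
    """Sort (name, status) pairs by status group, then alphabetically by name.

    The alphabetical comparison ignores case; that is an assumption, since
    the guide does not say how case is handled.
    """
    return sorted(
        identifiers,
        key=lambda item: (STATUS_ORDER[item[1]], item[0].lower()),
    )


# Placeholder names only; these are not real chemical identifiers.
example = [
    ("Example name B", "normal"),
    ("Example name D", "rejected"),
    ("Example name C", "validated"),
    ("example name A", "normal"),
]

ordered = order_identifiers(example)
for name, status in ordered:
    print(f"{status:<10} {name}")
```

Sorting on a (group, name) tuple keeps the grouping and the alphabetical order in a single pass, which mirrors how the list reads from top to bottom in the dialog.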
At the very bottom of the dialog there are buttons to Save your changes or Cancel them.
Adding an Identifier
To add a new identifier to a record open the Identifiers editing dialog (described in the previous section), and click on the Add button.
The identifier entry box will appear. Type (or paste) the identifier into the Synonym field and select any check boxes that further define the nature of the identifier (see Guide to the options for adding synonyms below). You can also mark the synonym as approved by selecting the check box at the bottom of the dialog.
Note: Certain characters will not be displayed correctly and should not be used (for instance, Greek characters generated by using the Symbol font face).
When you have finished, click on the Add button and this will return you to the identifiers editing dialog. You should now be able to see the identifier that you added.
To save your changes, click on the Save button and the identifiers editing dialog will close.
A comment box then appears, in which you can add comments that will help a Master Curator review the suggested identifiers.
Click Ok when complete. This returns you to the main record, and an e-mail is sent to the Master Curators for review and approval.
Guide to the options for adding synonyms
If the name entered is in a foreign language, use the drop-down menu to select the language.
Rejecting and Approving Identifiers
The process of rejecting or approving identifiers involves selecting one or more identifiers and then specifying what the new state of these identifiers should be. A common choice is to reject an identifier because it is incorrectly associated with the structure.
There are four states. To change the state of one or more identifiers:
- Open the Identifiers editing dialog.
- Select the identifier(s) that you wish to change.
- Click on the Update button and this brings up a dialog box which allows you to select the state that you want to apply to the selected identifiers.
- Click Ok. This returns you to the Identifiers editing dialog – the altered identifiers will have changed their position in the list of identifiers (approved go to the top of the list, rejected to the bottom) and will be formatted to display their new state.
- Click Save. You will be prompted to supply comments that explain your changes to the Master Curator who reviews your curation.
State changes can be applied to groups of identifiers at one time. However, approvals and rejections must be carried out as separate operations. It is not necessary to save the state changes between these operations.
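As an illustration of this workflow, the sketch below models a record as a mapping from identifier to state. It is not the ChemSpider implementation: the function name apply_state, the state labels and the placeholder identifiers are all hypothetical. It shows one operation per target state, each applied to a group, followed by a single save.

```python
def apply_state(working_copy, selected, new_state):
    """Return a copy of the record with new_state set on every selected identifier.

    Each call applies exactly one state, so approving and rejecting
    require two separate calls.
    """
    unknown = [name for name in selected if name not in working_copy]
    if unknown:
        raise KeyError(f"identifiers not in record: {unknown}")
    updated = dict(working_copy)
    for name in selected:
        updated[name] = new_state
    return updated


# Placeholder identifiers and hypothetical state labels.
record = {"name A": "normal", "name B": "normal", "name C": "normal"}

# First operation: approve a group of identifiers.
pending = apply_state(record, ["name A", "name B"], "approved")
# Second operation: reject another identifier; no save was needed in between.
pending = apply_state(pending, ["name C"], "rejected")

# A single save at the end commits both operations.
saved = pending
print(saved)
```

Working on a copy until the final save reflects the Cancel button in the dialog: nothing changes in the record until the edits are saved.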
Guidelines for Removal and Approval of Identifiers
The purpose of approving or rejecting identifiers is to make state changes that help the Master Curators speed up the cleansing of the database. Master Curators are responsible for moving curated identifiers to a final state of Confirmed or Deleted on the basis of further research, for reversing the changes, or for leaving the identifiers in their present state.
The intention is to remove associations between structures and identifiers that cause confusion or mislead chemists in their understanding of the chemical structure, and so to provide clarification.
The following guidelines address the most common sources of confusion:
- All systematic names should match the structure as drawn. All stereochemistry in the name must be represented in the structure shown.
- Any systematic name should be sufficient to convert unambiguously to the matching structure.
- Any Registry numbers must be for the compound as shown. For example, if the compound shown is the neutral base compound, then the registry numbers should not be for the sodium salt or the chloride salt.
- Identifiers are not meant to be descriptors per se. For example, the description “One of a series of hexamethylcyclohexanes” is not a good identifier. Duplicates can be subtly different but do need to be curated.