I am very proud at the response from our user base to my request for assistance with curating ChemSpider in regards to carbohydrates. Carbohydrates are complex in nature. They can be represented in linear form and cyclic form, they exist in ChemSpider with a common name but no defined stereochemistry, there are pentoses, hexoses and many stereoisomers per skeleton. There are MANY common carbohydrates with trivial names - RiboseArabinoseXyloseLyxoseAlloseAltroseMannoseGuloseIdoseGalactoseTalose

Carbohydrates have been very challenging for us at ChemSpider…many depositors have not been careful with the  association between the chemical structure and the associated identifiers. With a chemical structure as the primary key on a record we find confusing associations with structures. For example, a search on Maltotriose as an identifier turns up 5 structures on ChemSpider. Maltotriose is defined on Wikipedia as “trisaccharide (three-part sugar) consisting of three glucose molecules linked with 1,4 glycosidic bonds.” This should mean that it is not appropriate for the identifier maltotriose to be associated with this structure. The registry number associated with this structure should be deleted also based on Wikipedia as a resource. How many of the other identifiers should be deleted? Maybe all???

Looking at this record we see identifiers such as: alpha-D-G​lc-(1->4)​-alpha-D-​Glc-(1->4​)-D-Glc; alpha-D-G​lc, O-alp​ha-D-glc; GLC-(4-1)​GLC-(4-1)​GLC-(4-4)​GTE and O-alpha-D​D-Glucopy​ranosyl-(​1->4)-O-a​lpha-D-gl​ucopyrano​syl-(1->4​)-D-gluco​se . Are these appropriate for this compound?

The challenge for maltotriose is therefore to identify the CORRECT structure associated with that name. “Maybe” it is the structure on Wikipedia but don’t forget that we have an effort underway to validate the structures on Wikipedia and make sure they are correctly associated with the monograph title. Is Maltotriose an identifier for a unique stereoconfiguration or is there alpha- and beta-maltotriose?  I am not sure. What needs to be determined is the correct association between structures and identifiers. Incorrect associations should be removed so that they do not turn up the incorrect structures in ChemSpider when searched.

This is the start of the validation process for carbohydrates…its iterative, complex and hard work. Its going to begin with giving the group of interested parties curator power over on ChemSpider and asking them to work on this challenge. We welcome their assistance. The efforts of contributors like this will be essential. 

Stumble it!

Leave a Reply