Archive for September 2nd, 2008

I am very proud of the response from our user base to my request for assistance with curating carbohydrates on ChemSpider. Carbohydrates are complex in nature. They can be represented in linear or cyclic form, many exist in ChemSpider with a common name but no defined stereochemistry, and there are pentoses, hexoses and many stereoisomers per skeleton. There are MANY common carbohydrates with trivial names: Ribose, Arabinose, Xylose, Lyxose, Allose, Altrose, Mannose, Gulose, Idose, Galactose, Talose.

Carbohydrates have been very challenging for us at ChemSpider…many depositors have not been careful with the association between the chemical structure and the associated identifiers. With a chemical structure as the primary key on a record we find confusing associations between identifiers and structures. For example, a search on Maltotriose as an identifier turns up 5 structures on ChemSpider. Maltotriose is defined on Wikipedia as a “trisaccharide (three-part sugar) consisting of three glucose molecules linked with 1,4 glycosidic bonds.” This should mean that it is not appropriate for the identifier maltotriose to be associated with this structure. Taking Wikipedia as the reference, the registry number associated with this structure should also be deleted. How many of the other identifiers should be deleted? Maybe all???

Looking at this record we see identifiers such as: alpha-D-Glc-(1->4)-alpha-D-Glc-(1->4)-D-Glc; alpha-D-Glc, O-alpha-D-glc; GLC-(4-1)GLC-(4-1)GLC-(4-4)GTE and O-alpha-DD-Glucopyranosyl-(1->4)-O-alpha-D-glucopyranosyl-(1->4)-D-glucose. Are these appropriate for this compound?

The challenge for maltotriose is therefore to identify the CORRECT structure associated with that name. “Maybe” it is the structure on Wikipedia, but don’t forget that we have an effort underway to validate the structures on Wikipedia and make sure they are correctly associated with the monograph title. Is Maltotriose an identifier for a unique stereoconfiguration, or are there both alpha- and beta-maltotriose? I am not sure. What needs to be determined is the correct association between structures and identifiers. Incorrect associations should be removed so that a search on an identifier no longer turns up the wrong structures in ChemSpider.
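To make the curation task a little more concrete, here is a minimal Python sketch of the problem. It is not how ChemSpider is implemented: the record keys (`structure-A` and so on), the sample data and both function names are made up for illustration. It simply assumes each structure carries a list of deposited identifiers, finds names that point at more than one structure, and lets a curator remove a single bad association.

```python
from collections import defaultdict

# Hypothetical records: structure key -> identifiers deposited against it.
# The keys are placeholders, not real ChemSpider IDs or structures.
records = {
    "structure-A": ["Maltotriose", "alpha-D-Glc-(1->4)-alpha-D-Glc-(1->4)-D-Glc"],
    "structure-B": ["Maltotriose"],
    "structure-C": ["Ribose"],
}


def ambiguous_identifiers(records):
    """Return identifiers (lower-cased) attached to more than one structure."""
    by_identifier = defaultdict(set)
    for structure_key, identifiers in records.items():
        for name in identifiers:
            by_identifier[name.strip().lower()].add(structure_key)
    return {
        name: sorted(keys)
        for name, keys in by_identifier.items()
        if len(keys) > 1
    }


def remove_association(records, structure_key, identifier):
    """Drop one identifier from one structure, leaving other records intact."""
    wanted = identifier.strip().lower()
    records[structure_key] = [
        name for name in records[structure_key]
        if name.strip().lower() != wanted
    ]
```

Calling `ambiguous_identifiers(records)` on the sample data flags "maltotriose" as attached to two structures. Deciding which association is the wrong one is still the curator's job; the code only surfaces the candidates and applies the decision via `remove_association`.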

This is the start of the validation process for carbohydrates…it’s iterative, complex and hard work. It’s going to begin with giving the group of interested parties curator powers on ChemSpider and asking them to work on this challenge. We welcome their assistance. The efforts of contributors like this will be essential.


Slow down I want to get off…I am writing this at 1:20AM. My life is getting too busy. I am entranced by the things we can do at ChemSpider and am swept up with email, blogs, slideshare (uploaded 8 old talks tonight at http://www.slideshare.net/AntonyWilliams), Google Reader and on, and on. I find myself fascinated with the pace at which everything is moving. Just when I have finished tweaking Firefox with my latest add-ons, specifically Ubiquity, along comes the announcement (via a comic) that Google will release their own browser, Google Chrome. Ugh….while I’m excited enough already….(okay…can’t wait to play!!!)

A fresh take on the browser

9/01/2008 02:10:00 PM

“At Google, we have a saying: “launch early and iterate.” While this approach is usually limited to our engineers, it apparently applies to our mailroom as well! As you may have read in the blogosphere, we hit “send” a bit early on a comic book introducing our new open source browser, Google Chrome. As we believe in access to information for everyone, we’ve now made the comic publicly available — you can find it here. We will be launching the beta version of Google Chrome tomorrow in more than 100 countries.”
