I love Wikipedia. I use it at least half a dozen times a week…probably more of late. That said, I have previously questioned the level of curation of the data on Wikipedia. (2,3) I DO believe that contributors to Wikipedia are making valiant efforts to ensure the quality of the data, but I also believe that tools or processes must be developed soon to ensure that quality. Here’s why…

This is the chemical structure of Mupirocin on Wikipedia. Now, if you bothered to redraw that chemical structure in a drawing package that shows the molecular mass (like I did), then you would see that the mass is NOT what is listed in the DrugBox.

The structure, molecular formula and molecular mass are shown below, taken directly from Free ChemSketch, but of course all the drawing packages can do this!
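For readers without a drawing package to hand, the same sanity check can be roughed out in a few lines of Python. This is only a minimal sketch: the function names `molecular_mass` and `mass_matches`, the small table of atomic weights and the 0.05 tolerance are my own assumptions for illustration, it only handles flat formulas without parentheses, and it is not how ChemSketch or any other package works internally.

```python
import re

# Rounded standard atomic weights in g/mol. Only a few common elements are
# listed here (an assumption for this sketch); extend the table as needed.
ATOMIC_WEIGHTS = {
    "C": 12.011,
    "H": 1.008,
    "N": 14.007,
    "O": 15.999,
    "S": 32.06,
    "Cl": 35.45,
}


def molecular_mass(formula: str) -> float:
    """Return the average molecular mass of a flat formula such as 'C2H6O'."""
    # Reject anything that is not a plain run of element symbols and counts.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for element: {symbol}")
        # A missing count means a single atom of that element.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


def mass_matches(formula: str, listed_mass: float, tolerance: float = 0.05) -> bool:
    """True if the listed mass agrees with the mass computed from the formula."""
    return abs(molecular_mass(formula) - listed_mass) <= tolerance


# Ethanol is used as a neutral example; substitute the formula your drawing
# package reports and the mass listed on the page you are checking.
print(round(molecular_mass("C2H6O"), 3))
print(mass_matches("C2H6O", 46.07))
```

If `mass_matches` returns False for the formula of the redrawn structure and the mass in the infobox, one of the two is wrong and the entry deserves a closer look.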

Looking on ChemSpider I found three structures (two are identical but not yet deduplicated; the deduplication is presently going on in the background). Two are shown below…

Structure 16739332, the top structure, is the correct one while the bottom one is in error. The erroneous structure comes from one data source only: DrugBank. Previously, for Taxol, DrugBank contained the correct version of the structure. The problem is that ALL of our systems, including ChemSpider, have issues like this…we all have errors and they need curation. Wikipedia is great…I made the changes tonight…see here. I added an IUPAC name, removed the link to DrugBank and updated the molecular mass.

I am committed to helping curate Wikipedia…many of us are. However, I think there must be a better way than fixing one article at a time. I will continue my discussions with the Wikipedia Chemistry Team to get access to all of the chemical compounds on Wikipedia, if possible, and validate the data in a batch using ChemSpider and associated tools.
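To give a feel for what a batch check might look like, here is a rough Python sketch. Everything in it is hypothetical: the `CompoundRecord` fields, the `flag_mismatches` name, the tolerance and the placeholder records are assumptions for illustration only, and it says nothing about how ChemSpider or Wikipedia actually store or expose their data.

```python
from typing import List, NamedTuple


class CompoundRecord(NamedTuple):
    """One compound entry (hypothetical layout for this sketch)."""
    name: str
    listed_mass: float    # mass shown on the page being checked
    computed_mass: float  # mass recomputed from the structure by any tool


def flag_mismatches(records: List[CompoundRecord], tolerance: float = 0.05) -> List[str]:
    """Return the names of records whose listed and recomputed masses disagree."""
    return [
        record.name
        for record in records
        if abs(record.listed_mass - record.computed_mass) > tolerance
    ]


# Placeholder records with made-up values, not real compound data.
records = [
    CompoundRecord("compound A", 180.16, 180.16),
    CompoundRecord("compound B", 300.00, 314.46),
]
print(flag_mismatches(records))  # ['compound B']
```

The point of a pass like this is only to produce a short list for human curators to review; a mismatch says that something is wrong, not which value is the wrong one.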

2 Responses to “A Need to Improve Chemical Structure Handling on Wikipedia”

  1. Antony Williams says:

    I saw this in my Akismet Spam Box just as I was deleting…oops. I hit Back in the browser and pasted it in here…

    Craig Knox | drugbank.ca

    Turns out this was an error in PubChem (http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6476007). I have been bitten a few times by assuming that PubChem is the gold standard. I have corrected the structure in the new release of DrugBank, which will be out in a month.

  2. Arvin says:

    Hello Tony,

    The structure you got from ChemSketch is wrong. The epoxide is incorrectly shown with a forward and backward wedge bond. I checked ChemSketch v.10 and v.11 and both have it with the correct stereochemistry.

