Wikipedia is great. I use it regularly. I’ve been working, with a team of experts, on curating and validating the “structure-based data” in the ChemBoxes and DrugBoxes for almost a year and a half. It’s been a long path and on the journey I have met some great people and made some true friends. I also HAVE NOT met most of the people I share the IRC chats with. We are a highly opinionated bunch of people but with a common focus of making Wikipedia better and making the data and content as accurate as possible.

We now have the Wikipedia article lead in thousands of records on ChemSpider, and these are updated regularly as Wikipedia itself expands. Since the inception of the work, one of our areas of focus has been getting correct structures in place with the associated data: the molecular formula, molecular weight, SMILES, InChI string, InChIKey, systematic name and so on. To help the process of expanding Wikipedia with new records, and to provide much of these data automatically, we have set about providing a Wikipedia Service so that Wikipedians can use ChemSpider as the source of the chemical structures of interest and generate the DrugBox and ChemBox content from ChemSpider. It’s a rather simple process…

Assume that you want to create a ChemBox for Domoic Acid. You would search for Domoic Acid on ChemSpider, then validate whether the structure on ChemSpider named domoic acid is correct and, if so, generate the Wikibox by clicking on the link to the right of the Quick Links.

wikibox1

Following this simple button click, the user is shown a new window displaying the “Design Wikibox” functionality. There are various flavors of ChemBoxes and DrugBoxes that can be generated, and the image below shows the “Simple ChemBox”.

wikibox2

At present we fill the box with the data we have easy access to from ChemSpider, based on the chemical structure, and list all other fields empty for Wiki depositors to populate. For Domoic Acid, the Simple ChemBox looks like this:

{{Chembox
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| CASNo =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O }}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| BoilingPt =
| Solubility = }}
| Section3 = {{Chembox Hazards
| MainHazards =
| FlashPt =
| Autoignition = }}
}}
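
The fill-what-we-know, leave-the-rest-empty behaviour can be illustrated with a minimal Python sketch. This is an illustration only, not the code behind the ChemSpider service: the names `SIMPLE_CHEMBOX` and `render_chembox` are hypothetical, and the field list is simply copied from the Simple ChemBox above.

```python
# Hypothetical sketch: the layout mirrors the Simple ChemBox shown above.
# Each entry is (sub-template name or None for top-level fields, field names).
SIMPLE_CHEMBOX = [
    (None, ["ImageFile", "ImageSize", "IUPACName", "OtherNames"]),
    ("Chembox Identifiers", ["CASNo", "PubChem", "ChemSpiderID", "SMILES"]),
    ("Chembox Properties", ["Formula", "MolarMass", "Appearance", "Density",
                            "MeltingPt", "BoilingPt", "Solubility"]),
    ("Chembox Hazards", ["MainHazards", "FlashPt", "Autoignition"]),
]


def render_chembox(known):
    """Render wikitext; fields missing from `known` are emitted empty."""
    lines = ["{{Chembox"]
    section_no = 0
    for template, fields in SIMPLE_CHEMBOX:
        rows = [f"| {name} = {known.get(name, '')}".rstrip() for name in fields]
        if template is not None:
            section_no += 1
            # Open the nested sub-template, e.g. "| Section1 = {{Chembox Identifiers"
            lines.append(f"| Section{section_no} = {{{{{template}")
            # Close it on the last field of the section
            rows[-1] += " }}"
        lines.extend(rows)
    lines.append("}}")
    return "\n".join(lines)


# Values taken from the Domoic Acid example in this post
domoic = {
    "PubChem": "5282253",
    "ChemSpiderID": "4445428",
    "Formula": "C15H21NO6",
    "MolarMass": "311.3303",
}
box = render_chembox(domoic)
print(box)
```

Fields such as CASNo or MeltingPt stay blank in the output, which is the point: the depositor fills them in by hand on Wikipedia.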

We insert the PubChem ID associated with the particular structure if there is a related PubChem record. We also insert the ChemSpider ID in case the user wants to link back to ChemSpider. A Full ChemBox is much longer:

{{Chembox
| Name =
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| SystematicName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| Abbreviations =
| CASNo =
| EINECS =
| EINECSCASNO =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O
| InChI = InChI=1S/C15H21NO6/c1-8(4-3-5-9(2)14(19)20)11-7-16-13(15(21)22)10(11)6-12(17)18/h3-5,9-11,13,16H,6-7H2,1-2H3,(H,17,18)(H,19,20)(H,21,22)/b5-3+,8-4-/t9-,10+,11-,13+/m1/s1
| RTECS =
| MeSHName = domoic acid
| ChEBI =
| KEGG = C13732
| ATCCode_prefix =
| ATCCode_suffix =
| ATC_Supplemental =}}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| Melting_notes =
| BoilingPt =
| Boiling_notes =
| Solubility =
| SolubleOther =
| Solvent =
| LogP =
| VaporPressure =
| HenryConstant =
| AtmosphericOHRateConstant =
| pKa =
| pKb = }}
| Section3 = {{Chembox Structure
| CrystalStruct =
| Coordination =
| MolShape = }}
| Section4 = {{Chembox Thermochemistry
| DeltaHf =
| DeltaHc =
| Entropy =
| HeatCapacity = }}
| Section5 = {{Chembox Pharmacology
| AdminRoutes =
| Bioavail =
| Metabolism =
| HalfLife =
| ProteinBound =
| Excretion =
| Legal_status =
| Legal_US =
| Legal_UK =
| Legal_AU =
| Legal_CA =
| PregCat =
| PregCat_AU =
| PregCat_US = }}
| Section6 = {{Chembox Explosive
| ShockSens =
| FrictionSens =
| ExplosiveV =
| REFactor = }}
| Section7 = {{Chembox Hazards
| ExternalMSDS =
| EUClass =
| EUIndex =
| MainHazards =
| NFPA-H =
| NFPA-F =
| NFPA-R =
| NFPA-O =
| RPhrases =
| SPhrases =
| RSPhrases =
| FlashPt =
| Autoignition =
| ExploLimits =
| LD50 =
| PEL = }}
| Section8 = {{Chembox Related
| OtherAnions =
| OtherCations =
| OtherFunctn =
| Function =
| OtherCpds = }}
}}
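
Values such as Formula and MolarMass follow directly from the structure, so they are easy to sanity-check before pasting a box into an article. The sketch below recomputes the molar mass from the formula. The atomic weights are an assumption: they are commonly tabulated older values, chosen here because they reproduce the figure in the box, and the parser only handles flat formulas (no parentheses, hydrates or charges). The function name `molar_mass` is hypothetical.

```python
import re

# Assumed atomic weights in g/mol (older tabulated values); newer tables
# differ slightly in the later decimal places.
ATOMIC_WEIGHT = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994}


def molar_mass(formula):
    """Sum atomic weights for a flat formula such as 'C15H21NO6'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


mass = molar_mass("C15H21NO6")
print(f"{mass:.4f}")
```

With these weights the result agrees with the MolarMass entry of 311.3303 shown in the boxes above.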

The user can also use the ChemSpider image: it can be resized, and clicking on the image downloads it as a PNG file. We believe that our images are attractive and appropriate for web display. Wikipedia at present favors the ACS format, so based on feedback we can change the config file behind the image generator to produce a different format for display.

We are considering extending the system to support direct uploads of Molfiles and/or other structure formats rather than depending on a compound being on ChemSpider. However, it is VERY likely that chemical compounds of value to the Wikipedia encyclopedic content already exist on ChemSpider. The trick is to find them since they may not have the Wikipedia article chemical name associated with the record. An InChI-based, SMILES-based or alternative name search might help locate the record. Alternatively a full structure search via the applet will find the record OR the user can DEPOSIT the structure to ChemSpider and work from there. The system is flexible enough.

This is our first release of the Wikipedia Services so we welcome any and all feedback. It’s one more way we are giving back to the Wikipedia community for their service. The outcome for us will also be crowdsourced curation of ChemSpider…as Wikipedia articles are written we will clean up related structures on ChemSpider. Everyone wins.

By the way…compare OUR structure for Domoic Acid with the one on ChemSpider. Does anyone know which is correct?

5 Responses to “Providing Some Structured Support with ChemSpider’s Wikipedia Services”

  1. Martin Walker says:

    Just tested it on the sodium periodate page, and everything worked fine. I think this will be an excellent tool for us to use on Wikipedia. If the structures can be adjusted to ACS format, it’ll be perfect. MANY thanks!

  2. bill says:

    You mean, check the CS structure vs the one shown in Wikipedia right now? The PubChem link from the Wikipedia entry goes to a structure drawn the same way as the one in CS, but the Wikipedia one is different in two ways: it’s flipped vertically, and the two ring-associated carboxylic acid groups are shown in different relation to one another. Since I see no reason they can’t rotate freely, that seems to be just a drawing choice, and the two structures look the same to me.

    What am I missing? :-)

    Something else I found… interesting, was to click the CAS number link on the Wikipedia entry. Good thing there’s a free alternative, no?

  3. Antony Williams says:

    Bill – if you compare the PubChem structure with the Wikipedia structure, they are the same. But check both against the ChemSpider structure and they are different. Welcome to the world of validating structures. Compare the systematic name on ChemSpider with that on PubChem and you will see a difference in the name, which will help you find the difference in the structure.

    PubChem name: (2S,3S,4S)-3-(carboxymethyl)-4-[(2Z,4E)….

    ChemSpider name: (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)….

    Also, checking the InChI on PubChem you’ll see in the stereo a “question mark”
    InChI=1S/C15H21NO6/c1-8(4-3-5-9(2)14(19)20)11-7-16-13(15(21)22)10(11)6-12(17)18/h3-5,9-11,13,16H,6-7H2,1-2H3,(H,17,18)(H,19,20)(H,21,22)/b5-3+,8-4-/t9?,10-,11+,13-/m0/s1

    Now the question is whether undefined stereo for domoic acid is correct or not?

  4. Antony Williams says:

    Bill, regarding your comment about the CAS number linking out to CommonChemistry.org and NOT finding Domoic Acid: the reason is that not all Wikipedia articles are represented in CommonChemistry.org. There is a certain overlap, but it is far from complete.

  5. ChemSpider Blog » Blog Archive » German and Spanish Wikibox Support Added says:

    [...] recently released ChemSpider’s WikiBox service. Then we made a call for support so we could release multilingual support. Our friends on the [...]
