Archive for the Wikipedia Services Category

As an active member of the Wikipedia Chemistry team I continue to be impressed with the dedication and commitment that the members have to improving the quality AND quantity of information available on Wikipedia for chemists. The number of lost hours of sleep freely given to the benefit of Wikipedia, and in this specific case to the chemistry community, is immense. The number of “Compound Pages” on Wikipedia dedicated to drugs/chemicals has continued to grow and, despite a sincere effort on our part to keep everything linked up from ChemSpider to Wikipedia, it’s a little like chasing the Road Runner… we’re always behind!

We have been working with the WikiChem team of late to embed links from Wikipedia back to ChemSpider. I am humbled to know that our hard work to establish ChemSpider as a source of quality information has reached a level of trust such that Wikipedia now links from the ChemBoxes out to ChemSpider. Hundreds of new links have already been established and more are being generated on an ongoing basis. Wikipedia User:Beetstra has written a bot that is inserting ChemSpiderIDs across the database (see below) and we ARE doing rigorous checking of all of the links. The bot works from a file that we generated on our side showing links to Wikipedia from ChemSpider.

beetstra

We will then be able to generate a list of all ChemBoxes/DrugBoxes without links from Wikipedia to ChemSpider. We will make the links on our side, manually curating the structures, and then hand back a file to finish all linking. At this point we will have the backfile under control and we can perform ongoing updates as new compound pages are created on ChemSpider and, if we curate and find errors on Wikipedia or ChemSpider, making a few manual edits is easy.
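The "list of all ChemBoxes/DrugBoxes without links" step can be sketched in a few lines of Python. This is only an illustration, not the bot or the file format actually used: the function names, the `pages` dictionary and the two example articles are all hypothetical, and the only thing taken from the templates on this page is the `| ChemSpiderID =` field name.

```python
import re

# Matches a "| ChemSpiderID = value" line inside a Chembox/Drugbox template.
# [ \t]* after "=" keeps an empty field from swallowing the next line.
CSID_FIELD = re.compile(r"^\s*\|\s*ChemSpiderID\s*=[ \t]*(.*)$", re.MULTILINE)


def chemspider_id(wikitext):
    """Return the ChemSpiderID in the wikitext, or None if absent or empty."""
    match = CSID_FIELD.search(wikitext)
    if match is None:
        return None
    value = match.group(1).strip()
    # Trim a trailing "}}" that closes a nested template on the same line.
    if value.endswith("}}"):
        value = value[:-2].strip()
    return value or None


def pages_without_link(pages):
    """pages maps article title -> wikitext; return titles lacking an ID."""
    return sorted(t for t, text in pages.items() if chemspider_id(text) is None)


# Hypothetical input: one linked article and two that still need a link.
pages = {
    "Domoic acid": "{{Chembox\n| PubChem = 5282253\n| ChemSpiderID = 4445428\n}}",
    "Example A": "{{Chembox\n| ChemSpiderID =\n| SMILES = CCO }}\n}}",
    "Example B": "{{Chembox\n| Name = something\n}}",
}
print(pages_without_link(pages))
```

A regular expression is enough for a first pass over the backfile; anything it flags would still go through the manual curation described above.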

There are very dedicated teams on Wikipedia and ChemSpider carefully poring over data with their robots and eyeballs to create a linked data set of quality chemistry. It’s long, tedious AND important work. When it’s done we will have an expanded set of data to semantically link from RSC articles when we do markup.

Last week I had the pleasure of being on an agenda with a number of people whose work I applaud and who I genuinely enjoy spending time with and sharing thoughts about “what if?” Martin Walker, one of the people I collaborate with on Wikipedia, invited me to speak in his session “Publishing and Promoting Chemistry in the Internet Age”. Martin gave an introduction to the session and spoke about Chemistry on the Internet. Beth Brown gave an overview of the Chemist’s Toolkit for Publishing and Promoting your work on the Internet. I followed with an overview about what’s going on with ChemSpider and the issues of connectedness and quality of chemistry on the internet. JC Bradley spoke about transparency and Open Notebook Science. My hat’s off to Martin for arranging the speakers in that order. Considering we didn’t coordinate our talks it was an excellent trajectory throughout the session and very much an integrated overview of activities regarding chemistry on the internet.

My talk is posted on SlideShare here and is available below. Any comments and questions are welcomed.

Beth Brown has her talk online here and JC Bradley will post his online here.

JC Bradley and I had a good talk about ways we can collaborate more closely on Open Notebook Science. We have a path forward so that ChemSpider can provide additional support, and we will be discussing the details offline.

I blogged yesterday about our release of Wikipedia Services on ChemSpider and how we are working to support authors on Wikipedia articles. Of course there are MANY languages of Wikipedia (as shown below) and we are willing to produce multilingual support. All we need is someone from the specific language version of Wikipedia to contact us and map the ChemBoxes and Drugboxes into their relevant languages. Let us know if you are interested.

languages


Wikipedia is great. I use it regularly. I’ve been working, with a team of experts, on curating and validating the “structure-based data” in the ChemBoxes and DrugBoxes for almost a year and a half. It’s been a long path and on the journey I have met some great people and made some true friends. I also HAVE NOT met most of the people I share the IRC chats with. We are a highly opinionated bunch of people but with a common focus of making Wikipedia better and making the data and content as accurate as possible.

We have the Wikipedia article lead in thousands of records on ChemSpider now. They are updated regularly as Wikipedia itself expands. One of the areas we have been focused on since the inception of the work was getting correct structures in place with the associated data. This includes the molecular formula, molecular weight, SMILES, InChI String, InChIKey, systematic name and so on. In order to help the process of expanding Wikipedia with new records and to provide a lot of these data automatically we have set about providing a Wikipedia Service so that Wikipedians can use ChemSpider as the source of the chemical structures of interest and generate the DrugBox and ChemBox content from ChemSpider. It’s a rather simple process…

Assume that you want to create a ChemBox for Domoic Acid. You would search for Domoic Acid on ChemSpider, validate whether the structure on ChemSpider named domoic acid is correct and, if so, generate the Wikibox by clicking on the link to the right of the Quick Links.

wikibox1

Following this simple button click the user is shown a new window displaying the “Design Wikibox” functionality. There are various flavors of ChemBoxes and Drugboxes which can be generated and the image below shows the “Simple ChemBox”

wikibox2

At present we fill the box with those data we have easy access to from ChemSpider, based on the chemical structure. We list all other fields for Wiki depositors to populate. For Domoic Acid, the Simple ChemBox looks like this:

{{Chembox
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| CASNo =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O }}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| BoilingPt =
| Solubility = }}
| Section3 = {{Chembox Hazards
| MainHazards =
| FlashPt =
| Autoignition = }}
}}
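The Formula and MolarMass fields in the box above can be cross-checked against each other. The sketch below is not how ChemSpider computes the value; it is a minimal Python illustration that assumes a flat formula (no parentheses or charges) and a small hand-entered table of atomic weights. The weights are an assumption on my part, since the post does not say which values ChemSpider uses.

```python
import re

# Atomic weights for the elements in this example only (assumed values;
# a real tool would carry a full, versioned periodic table).
ATOMIC_WEIGHTS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994}

# One element symbol followed by an optional count, e.g. "C15" or "N".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molar_mass(formula):
    """Sum atomic weights for a flat formula such as 'C15H21NO6'."""
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        # A missing count means one atom; an unknown symbol raises KeyError.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molar_mass("C15H21NO6"), 4))
```

With the weights assumed here, the result for C15H21NO6 agrees with the MolarMass field above to the four decimals shown. A different weight table could shift the last digits, so a tolerance is safer than an exact comparison.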

We insert the PubChemID associated with the particular structure if there is a related PubChem record. We also insert the ChemSpider ID in case the user wants to link back to ChemSpider.  A Full ChemBox is much longer:

{{Chembox
| Name =
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| SystematicName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| Abbreviations =
| CASNo =
| EINECS =
| EINECSCASNO =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O
| InChI = InChI=1S/C15H21NO6/c1-8(4-3-5-9(2)14(19)20)11-7-16-13(15(21)22)10(11)6-12(17)18/h3-5,9-11,13,16H,6-7H2,1-2H3,(H,17,18)(H,19,20)(H,21,22)/b5-3+,8-4-/t9-,10+,11-,13+/m1/s1
| RTECS =
| MeSHName = domoic acid
| ChEBI =
| KEGG = C13732
| ATCCode_prefix =
| ATCCode_suffix =
| ATC_Supplemental =}}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| Melting_notes =
| BoilingPt =
| Boiling_notes =
| Solubility =
| SolubleOther =
| Solvent =
| LogP =
| VaporPressure =
| HenryConstant =
| AtmosphericOHRateConstant =
| pKa =
| pKb = }}
| Section3 = {{Chembox Structure
| CrystalStruct =
| Coordination =
| MolShape = }}
| Section4 = {{Chembox Thermochemistry
| DeltaHf =
| DeltaHc =
| Entropy =
| HeatCapacity = }}
| Section5 = {{Chembox Pharmacology
| AdminRoutes =
| Bioavail =
| Metabolism =
| HalfLife =
| ProteinBound =
| Excretion =
| Legal_status =
| Legal_US =
| Legal_UK =
| Legal_AU =
| Legal_CA =
| PregCat =
| PregCat_AU =
| PregCat_US = }}
| Section6 = {{Chembox Explosive
| ShockSens =
| FrictionSens =
| ExplosiveV =
| REFactor = }}
| Section7 = {{Chembox Hazards
| ExternalMSDS =
| EUClass =
| EUIndex =
| MainHazards =
| NFPA-H =
| NFPA-F =
| NFPA-R =
| NFPA-O =
| RPhrases =
| SPhrases =
| RSPhrases =
| FlashPt =
| Autoignition =
| ExploLimits =
| LD50 =
| PEL = }}
| Section8 = {{Chembox Related
| OtherAnions =
| OtherCations =
| OtherFunctn =
| Function =
| OtherCpds = }}
}}
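The filling step behind the Simple ChemBox can be sketched as a small template renderer. This is a hypothetical illustration in Python, not the code behind the "Design Wikibox" page: the `SIMPLE_LAYOUT` table and the `render_simple_chembox` function are names invented here, and the field order is simply copied from the Simple ChemBox shown earlier. Fields missing from the record are left blank for the Wikipedia editor to populate.

```python
# Section name (None for top-level fields) and the fields it contains,
# in the order used by the Simple ChemBox shown in this post.
SIMPLE_LAYOUT = [
    (None, ["ImageFile", "ImageSize", "IUPACName", "OtherNames"]),
    ("Chembox Identifiers", ["CASNo", "PubChem", "ChemSpiderID", "SMILES"]),
    ("Chembox Properties", ["Formula", "MolarMass", "Appearance", "Density",
                            "MeltingPt", "BoilingPt", "Solubility"]),
    ("Chembox Hazards", ["MainHazards", "FlashPt", "Autoignition"]),
]


def render_simple_chembox(record):
    """Build Simple ChemBox wikitext from a dict of known field values."""
    lines = ["{{Chembox"]
    section_number = 0
    for section, fields in SIMPLE_LAYOUT:
        # Unknown fields become "| Name =" with nothing after the equals sign.
        rows = [f"| {name} = {record.get(name, '')}".rstrip() for name in fields]
        if section is not None:
            section_number += 1
            lines.append(f"| Section{section_number} = {{{{{section}")
            rows[-1] += " }}"  # close the nested section template
        lines.extend(rows)
    lines.append("}}")
    return "\n".join(lines)


# Values taken from the Domoic Acid boxes above.
record = {
    "PubChem": "5282253",
    "ChemSpiderID": "4445428",
    "Formula": "C15H21NO6",
    "MolarMass": "311.3303",
}
print(render_simple_chembox(record))
```

Keeping the layout in a data table rather than in the function body means a Full ChemBox or a DrugBox would only need a different layout list, which is one plausible way to support the "various flavors" mentioned above.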

The user can also use the ChemSpider image, resize it, and click on the image to download it as a PNG file. We believe that our images are attractive and appropriate for web display. Wikipedia at present favors the ACS format, so based on feedback we can change the config file behind the image generator to produce a different format for display.

We are considering extending the system to support direct uploads of Molfiles and/or other structure formats rather than depending on a compound being on ChemSpider. However, it is VERY likely that chemical compounds of value to the Wikipedia encyclopedic content already exist on ChemSpider. The trick is to find them since they may not have the Wikipedia article chemical name associated with the record. An InChI-based, SMILES-based or alternative name search might help locate the record. Alternatively a full structure search via the applet will find the record OR the user can DEPOSIT the structure to ChemSpider and work from there. The system is flexible enough.

This is our first release of the Wikipedia Services so we welcome any and all feedback. It’s one more way we are giving back to the Wikipedia community for their service. The outcome for us will also be crowdsourced curation of ChemSpider…as Wikipedia articles are written we will clean up related structures on ChemSpider. Everyone wins.

By the way… compare OUR structure for Domoic Acid with the one on ChemSpider. Does anyone know which is correct?
