Archive for the ChemSpider Services Category

In the first of many integration projects presently underway inside the RSC to bring together the benefits of ChemSpider with existing systems, we’re happy to announce that the Prospected compound pages are now using structure images from ChemSpider, as shown below. We spent a lot of time creating aesthetically pleasing structure images for ChemSpider, especially for display on webpages and blogs, so we’re happy to see them show up in other venues too.

We unveiled the ability to embed chemical structure images, as well as spectra, last year. Now there are multiple blogs using the embed functionality, structures are starting to show up on Wikipedia and our web services are being used for structure image retrieval. We encourage you to make use of the resources we are delivering and welcome any feedback.

prospect


social widget

Following on from other posts in this series from this week I’m going to continue to list new functionality over the holiday season. I’ll continue with the “Social Widget”. What IS the Social Widget? Well…it’s this thing to the left….it is an AddThis Button that is available for every compound page on ChemSpider now. If there is a particular chemical of interest on ChemSpider that you want to include into your social networking then you can do so by choosing the social networking site of interest and “adding” the link in there. For some it posts the link and for others it posts a thumbnail of the structure there that is linked back directly into ChemSpider.

So, if I posted to Friendfeed it will send the link directly into Friendfeed. I just did it..worked perfectly. For Facebook it actually carries the thumbnail as shown below on my Facebook page. SO, deposit some of your molecules onto ChemSpider and let the world know! Add some data, tell a story, post a reaction…and use AddThis to tell your network!

photochromism


While some say “Silence is Golden” some of us find it deafening! One of my common statements regarding press releases and political commentaries is that there is as much said in the “unsaid”. Why this lead-in to this blog post? Well…the truth is we haven’t been very productive in the past few weeks with the delivery of new functionality onto ChemSpider and people have been asking me why we haven’t been so prolific with our updates. Well…in this case Silence is Golden based on the new functionality and data rolling out soon!

Historically we were introducing new functionality every few days and rolling it out with a “continuous beta” approach to delivery. We were also working on only three computers and were challenged with issues of uptime and handling. At the RSC we have access to development, test and live environments, and we have a stable compute environment supporting the system that provides power support where previously we would have been at risk of outages. We have a support team who have “got our backs” and we are no longer dealing with all of the issues of keeping the environment healthy for the ChemSpider platform. With our new hosted environment and the drive to move away from our previous constant updates to a more controlled rollout process, specifically including internal testing prior to going live, we have been working on procedures to ensure the best delivery. In parallel we have been working on a series of internal projects that are very exciting and you should see the results soon!

With our new processes in place and our new systems now established, we have been working on new functionality development and are happy to announce that we will now be moving towards regular updates every few weeks. We’re starting this week with the rollout of a set of new capabilities for you to try out. I’ll highlight these in a series of blog posts over the coming days. Let’s start with this one…

We are happy to announce an improved integration to the patent web service provided to us via our collaboration with SureChem. We announced our initial integration to this service at the ACS meeting last fall in Washington and received a lot of positive feedback regarding the implementation. That rollout only provided integration to a subset of the entire collection, the USPTO. SureChem host data from a number of patent agencies and the collection includes USPTO Granted, USPTO Applications, European Granted, European Applications, WO/PCT and Japanese Abstracts. Thanks to their web service we now have the ability to retrieve information regarding those sources also. The image below shows the patents retrieved for Xanax. Check it out…give us your feedback and extend holiday cheer to SureChem also for their contribution to the community.

patents


roadrunner

As an active member of the Wikipedia Chemistry team I continue to be impressed with the dedication and commitment that the members have to improving the quality AND quantity of information available on Wikipedia for chemists. The number of lost hours of sleep freely given to the benefit of Wikipedia, and in this specific case to the chemistry community, is immense. The number of “Compound Pages” on Wikipedia dedicated to drugs/chemicals has continued to grow and, despite a sincere effort on our part to keep everything linked up from ChemSpider to Wikipedia, it’s a little like chasing the Road Runner…we’re always behind!

We have been working with the WikiChem team of late to embed links from Wikipedia back to ChemSpider. I am humbled to know that our hard work to establish ChemSpider as a source of quality information has reached a level of trust such that Wikipedia now links from the ChemBoxes out to ChemSpider. The links are being updated on an ongoing basis at present, with hundreds of new links already established and more being generated all the time. Wikipedia User:Beetstra has written a ‘bot that is inserting ChemSpiderIDs across the database (see below) and we ARE doing rigorous checking of all of the links. The ‘bot worked from a file that we generated on our side showing links to Wikipedia from ChemSpider.

beetstra

We will then be able to generate a list of all ChemBoxes/DrugBoxes without links from Wikipedia to ChemSpider. We will make the links on our side, manually curating the structures, and then hand back a file to finish all linking. At that point we will have the backfile under control and we can perform ongoing updates as new compound pages are created on ChemSpider. If we curate and find errors on Wikipedia or ChemSpider, making a few manual edits is easy.

There are very dedicated teams on Wikipedia and ChemSpider carefully poring over data with their robots and eyeballs to create a linked data set of quality chemistry. It’s long, tedious AND important work. When it’s done we will have an expanded set of data to semantically link from RSC articles when we do markup.


I’ve been in discussions with JC Bradley and Andy Lang about the Open Notebook Science Solubility Data project. Specifically we’ve been comparing logP predictions from the CDK versus those listed on ChemSpider. We actually have six values of logP listed for some records. For example, for toluene we have 4 predicted values, 1 experimental value from a database and 1 experimental value from a publication. These are shown below:

toluene4 logp

There are three predicted logP values from three different algorithms (ACD/LogP, XlogP and AlogPs) as shown at the top of the figure. There is a predicted value and a database value from the EPISuite from the EPA (middle of the figure) and there is a LogP value from a publication with the link out indicated by the arrow (this datum was deposited by Egon Willighagen when he deposited the data from his publication). If you examine the list of data, both experimental and predicted, you will see a general value of around 2.65 +/- error. This should be compared with the CDK value listed in the ONS spreadsheet that gives a predicted value of 0.64. This was the primary reason that we were discussing the comparison…the values of predicted logP from CDK were different from the predicted values listed on ChemSpider for a number of examples in the spreadsheet.

Egon and I exchanged a couple of emails discussing the fact that logP predictions could be generated by a number of parties if there was a good Open Data training set available. A recent publication entitled “Calculation of Molecular Lipophilicity: State of the Art and Comparison of Log P Methods on More Than 96000 Compounds” performed a thorough analysis of different logP methods on a very large dataset. The publication is available online here. They compared “the predictive power of representative methods for one public (N = 266) and two in house datasets from Nycomed (N = 882) and Pfizer (N = 95 809). A total of 30 and 18 methods were tested for public and industrial datasets, respectively.” During the work they derived a simple equation based on the number of carbon atoms, NC, and the number of hetero atoms, NHET: log P = 1.46(±0.02) + 0.11(±0.001) NC – 0.11(±0.001) NHET. This equation was shown to outperform a large number of programs benchmarked in this study. This would certainly be easy to implement on ChemSpider and, just out of interest, applying this equation to toluene (NC = 7, NHET = 0) gives us a value of 2.23. Compare this with the values listed above.
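As a quick check of the arithmetic, the equation quoted above can be written as a few lines of Python. This is only a sketch: the function name `simple_logp` and its argument names are made up for illustration, and it uses the central coefficient values only, ignoring the quoted uncertainties.

```python
def simple_logp(n_carbon: int, n_hetero: int) -> float:
    """Estimate logP from atom counts with the equation quoted above.

    log P = 1.46 + 0.11 * NC - 0.11 * NHET
    Only the central coefficient values are used; the quoted +/- errors
    are ignored in this sketch.
    """
    return 1.46 + 0.11 * n_carbon - 0.11 * n_hetero


# Toluene is C7H8: seven carbon atoms and no hetero atoms.
toluene_logp = simple_logp(n_carbon=7, n_hetero=0)
print(f"{toluene_logp:.2f}")  # prints 2.23
```

The result sits below the value of around 2.65 seen across the ChemSpider entries, but is far closer to it than the CDK value of 0.64.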

Unfortunately there don’t appear to be many Open logP datasets available for people to use as training sets. Also, with the thorough work reported in the publication above, is it necessary to build yet another logP prediction algorithm? ACD/Labs have made their logP prediction software free for download (http://www.acdlabs.com/download/logp.html), the VCCLab software is available for free (http://www.vcclab.org/lab/alogps/), the EPISuite software is available for free (http://www.epa.gov/oppt/exposure/pubs/episuite.htm) and if you just want to predict a value for a compound not on ChemSpider then you can use the services here: http://www.chemspider.com/Services.aspx.

However, even though there are a lot of predictors available, it still makes sense to gather data and provide it as an experimental dataset, made available as Open Data, so that the developers of such algorithms can take advantage of structural diversity and fresh data to potentially improve their models. If you have any logP data available please point me to the data to download or contact me offline to discuss. We are presently working on enhancing our data model to provide improved access to experimental data on ChemSpider as well as access to the predicted data via web services. More to follow…


Last week I had the pleasure of being on an agenda with a number of people whose work I applaud and who I genuinely enjoy spending time with and sharing thoughts about “what if?” Martin Walker, one of the people I collaborate with on Wikipedia, invited me to speak in his session “Publishing and Promoting Chemistry in the Internet Age“. Martin gave an introduction to the session and spoke about Chemistry on the Internet. Beth Brown gave an overview of the Chemist’s Toolkit for Publishing and Promoting your work on the Internet. I followed with an overview about what’s going on with ChemSpider and the issues of connectedness and quality of chemistry on the internet. JC Bradley spoke about transparency and Open Notebook Science. My hat’s off to Martin for arranging the speakers in that order. Considering we didn’t coordinate our talks it was an excellent trajectory throughout the session and very much an integrated overview of activities regarding chemistry on the internet.

My talk is posted on SlideShare here and is available below. Any comments and questions are welcomed.

Beth Brown has her talk online here and JC Bradley will post his online here.

JC Bradley and I had a good talk about ways we can collaborate more closely on Open Notebook Science. We have a path forward so that ChemSpider can provide additional support and will be discussing the details offline.


Google are riding the surf associated with their release of Wave, even to a very small group of testers. Just do a search of Google Wave and you’ll see what I mean. There is a certain amount of “wave envy” in our domain right now as people want to get accounts to test. Test accounts are, however, being freed up quite quickly and there will be a number of cheminformaticians eager to insert their code into Wave as robots and enable specific integrations. When I was at SciFoo a few weeks ago we were granted Wave accounts to play around with. I was impressed with the possibilities but found the system to be a little underwhelming in terms of stability and a little unfriendly in terms of usability. But these are issues acknowledged by the team and, like many things Google, we are sure to see Wave get picked up by the masses when it’s released. And it WILL release, with great fanfare.

Cameron Neylon has been the most vocal advocate of Google Wave ever since the first announcements were made about the platform. He has been pivotal in getting a voice for science with the Google Wave team and coordinated a meeting for us with members of the dev team at SciFoo. It was clear in that meeting that the meshing of ChemSpider web services into Google Wave would enable Waves to be enhanced with (semi-)semantic markups so that, at a minimum, chemical names could be used to look up chemicals on ChemSpider and retrieve a structure image, so that hovering over the name in the document would show the structure image. Unfortunately we’ve been swamped with migrating ChemSpider to RSC servers and preparing for and attending the IUPAC Congress and ACS Fall Meeting in Washington. So, we got a grand sum of nothing done integrating Wave and ChemSpider.

Fortunately, we did well when the web services were built and Cameron has moved ahead with coding up ChemSpidey on his own. He announced that ChemSpidey is alive and kicking, with all eight legs, in his blog post here. Stealing shamelessly from Cameron’s post:

“If ChemSpidey is added to a wave it watches for text of the form “chem[ChemicalName{;weight {m}g}]” where the curly bracketed parts are optional. When a blip is submitted by hitting the “done” button ChemSpidey searches through the blip looking for this text and if it finds it, strips out the name and sends it to the ChemSpider SimpleSearch service. ChemSpider returns a list of database ids and the robot currently just pulls the top one off the list and adds the text ChemicalName (csid:####) to the wave, where the id is linked back to ChemSpider. If there is a weight present it asks the ChemSpider MassSpec API for the nominal molecular weight calculates the number of moles and inserts that. You can see video of it working here (look along the timeline for the ChemSpidey tag).”
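To make the tag syntax concrete, here is a minimal Python sketch of how text of the form chem[ChemicalName{;weight {m}g}] could be picked out of a blip and turned into a number of moles. It is not Cameron’s robot code and it makes no calls to ChemSpider: the regular expression, the names `find_chem_tags` and `moles`, and the molecular weight passed in at the end are all assumptions for illustration. In the real robot the weight comes back from the MassSpec service.

```python
import re
from typing import List, NamedTuple, Optional

# Matches chem[Name], chem[Name;2g] or chem[Name;500mg].
# The weight part is optional, as is the "m" for milligrams.
CHEM_TAG = re.compile(
    r"chem\[(?P<name>[^\];]+)"
    r"(?:;\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<milli>m?)g)?\]"
)


class ChemTag(NamedTuple):
    name: str
    grams: Optional[float]  # None when no weight was given in the tag


def find_chem_tags(text: str) -> List[ChemTag]:
    """Return every chem[...] tag in the text, with weights in grams."""
    tags = []
    for match in CHEM_TAG.finditer(text):
        grams = None
        if match.group("amount") is not None:
            grams = float(match.group("amount"))
            if match.group("milli"):  # "mg" rather than "g"
                grams /= 1000.0
        tags.append(ChemTag(match.group("name").strip(), grams))
    return tags


def moles(grams: float, molecular_weight: float) -> float:
    """Number of moles for a mass in grams and a weight in g/mol."""
    return grams / molecular_weight


blip = "Dissolve chem[toluene;500mg] in chem[ethanol]."
tags = find_chem_tags(blip)

# 92.14 g/mol is an illustrative value for toluene supplied by hand here.
toluene_moles = moles(tags[0].grams, 92.14)
```

A tag with a malformed weight, such as chem[toluene;lots], is simply not matched in this sketch; a real robot would probably want to flag it to the user instead.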

Go and watch the movie. You’ll likely have to watch it while zoomed in to see what is going on. Cameron went further than I’d originally considered by pulling back Mw from our MassSpec Web service in order to do calculations on the fly etc. The display of the structure by hovering over the CSID embedded in the Wave is not yet implemented and we need to cover this for sure.

This is a good start to build on, and there are some things that we have to work on…

1) If a call is made to retrieve a chemical based on a chemical name and there are MULTIPLE compounds with that name then figure out how to allow the user to select the one they want

2) Display the structure image with direct link back to ChemSpider – and if appropriate extend to include links to PubChem, Wikipedia, RSC journal articles etc, presence of analytical data etc. (all the things we were going to do with ChemMantis!)

3) Change data model to mark “Fully Curated”  structures so that when a structure image and associated meta data are passed to ChemSpidey the robot knows that this isn’t just a name-structure relationship but that humans have curated the data and say “it’s correct”. Then of course…humans can be wrong too!

4) Provide access to other services – from a structure in a Google Wave document allow generation of InChI, InChIKey, SMILES, search PubMed, search Patents, “world is my oyster.com”

We are now working in multiweek development sprints and will look to include some time for ChemSpidey enhancement/development in a future sprint. I have a lot of faith in what Google Wave will bring to us all and, despite the early teething troubles, as with all things Google (as far as I can tell) it will improve in terms of stability and usability but may be in perpetual beta for a few years!


For those of you who have been using ChemSpider for the past few months you will be aware that historically we had an integration in place to SureChem’s Patent Portal. A few months ago that integration was unfortunately broken as SureChem improved their service. Also, we were un-synchronized with their growing set of chemical structures as they updated their patents. The previous integration was very limited in nature anyway as it simply showed the presence of patents associated with the ChemSpider structure in the SureChem database. Certainly a more ideal solution is the one that we introduced just in time for the ACS meeting in Washington.

The new solution not only lists the number of patents containing the chemical compound shown in the ChemSpider record but also shows the first 10 patents, by title, and provides direct link-throughs to the patents on SureChem. This is a much improved integration and we hope you enjoy it. The next stage is to deposit the latest SureChem structure collection, which has grown significantly since our last deposition. Thanks to our collaborators at SureChem for offering you, our users, access to their service.

xanaxpatent


It was a busy week at the ACS meeting in Washington. I gave three presentations and the titles, abstracts and links to Slideshare are given below:

Oops and Downs of Resolving InChIs For the Chemistry Community (Link to Slideshare)

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.

ChemSpider: Building a knowledge-based community for chemists using social and data networking technologies (Link to Slideshare)

In less than 2 years ChemSpider has become one of the primary online resources for chemists providing access to an unsurpassed aggregate of free-access knowledge and data. ChemSpider was developed with the intention of providing a structure centric community for chemists that would be enhanced by data depositions, curations and annotations by the community. The system presently hosts over 21.5 million chemical compounds from over 200 data sources. Working with a network of advisors, collaborators and data providers ChemSpider has created a unique resource of integrated information for chemists. These efforts have enabled us to support the curation of the Wikipedia chemistry pages, the production of a community supported Open Access chemistry journal and provision of web services integrated to spectrometer systems distributed around the world. This talk will provide an overview of how ChemSpider utilized social and data networking to create a community for chemistry.

Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources (Link to Slideshare)

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry known as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked-up and exposed to the public within a few minutes in many cases, making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.


ChemSpider will go offline today for the next 24 hours. We will switch the servers off at around 11am today (give or take some latitude). We will do a differential backup and restore to the RSC servers all changes to the database and switch over to their systems overnight. Testing performed over the weekend has proceeded rather well and we are hoping for a seamless transition, acknowledging that we will have this one day of downtime.

We apologize in advance for any disruptions, especially as we know that a lot of people are now using ChemSpider services to feed their own systems. We expect improved service for all when this transition is complete.

We’ll see you on the other side of this transition in just over 24 hours. Wish us luck…


I blogged yesterday about our release of Wikipedia Services on ChemSpider and how we are working to support authors on Wikipedia articles. Of course there are MANY languages of Wikipedia (as shown below) and we are willing to produce multilingual support. All we need is someone from the specific language version of Wikipedia to contact us and map the ChemBoxes and Drugboxes into their relevant languages. Let us know if you are interested.

languages


Wikipedia is great. I use it regularly. I’ve been working, with a team of experts, on curating and validating the “structure-based data” in the ChemBoxes and DrugBoxes for almost a year and a half. It’s been a long path and on the journey I have met some great people and made some true friends. I also HAVE NOT met most of the people I share the IRC chats with. We are a highly opinionated bunch of people but with a common focus of making Wikipedia better and making the data and content as accurate as possible.

We have the Wikipedia article lead in thousands of records on ChemSpider now. They are updated regularly as Wikipedia itself expands. One of the areas we have been focused on since the inception of the work was getting correct structures in place with the associated data. This includes the molecular formula, molecular weight, SMILES, InChI String, InChIKey, systematic name and so on. In order to help the process of expanding Wikipedia with new records and to provide a lot of these data automatically we have set about providing a Wikipedia Service so that Wikipedians can use ChemSpider as the source of the chemical structures of interest and generate the DrugBox and ChemBox content from ChemSpider. It’s a rather simple process…

Assume that you want to create a ChemBox for Domoic Acid. You would search for Domoic Acid on ChemSpider, validate whether the structure on ChemSpider named domoic acid is correct and, if so, generate the Wikibox by clicking on the link to the right of the Quick Links.

wikibox1

Following this simple button click the user is shown a new window displaying the “Design Wikibox” functionality. There are various flavors of ChemBoxes and Drugboxes which can be generated and the image below shows the “Simple ChemBox”.

wikibox2

At present we fill the box with those data we have easy access to from ChemSpider, based on the chemical structure. We list all other fields for Wiki depositors to populate. For the Simple ChemBox the output for Domoic Acid looks like this:

{{Chembox
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| CASNo =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O }}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| BoilingPt =
| Solubility = }}
| Section3 = {{Chembox Hazards
| MainHazards =
| FlashPt =
| Autoignition = }}
}}

We insert the PubChemID associated with the particular structure if there is a related PubChem record. We also insert the ChemSpider ID in case the user wants to link back to ChemSpider.  A Full ChemBox is much longer:

{{Chembox
| Name =
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| SystematicName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| Abbreviations =
| CASNo =
| EINECS =
| EINECSCASNO =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O
| InChI = InChI=1S/C15H21NO6/c1-8(4-3-5-9(2)14(19)20)11-7-16-13(15(21)22)10(11)6-12(17)18/h3-5,9-11,13,16H,6-7H2,1-2H3,(H,17,18)(H,19,20)(H,21,22)/b5-3+,8-4-/t9-,10+,11-,13+/m1/s1
| RTECS =
| MeSHName = domoic acid
| ChEBI =
| KEGG = C13732
| ATCCode_prefix =
| ATCCode_suffix =
| ATC_Supplemental =}}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| Melting_notes =
| BoilingPt =
| Boiling_notes =
| Solubility =
| SolubleOther =
| Solvent =
| LogP =
| VaporPressure =
| HenryConstant =
| AtmosphericOHRateConstant =
| pKa =
| pKb = }}
| Section3 = {{Chembox Structure
| CrystalStruct =
| Coordination =
| MolShape = }}
| Section4 = {{Chembox Thermochemistry
| DeltaHf =
| DeltaHc =
| Entropy =
| HeatCapacity = }}
| Section5 = {{Chembox Pharmacology
| AdminRoutes =
| Bioavail =
| Metabolism =
| HalfLife =
| ProteinBound =
| Excretion =
| Legal_status =
| Legal_US =
| Legal_UK =
| Legal_AU =
| Legal_CA =
| PregCat =
| PregCat_AU =
| PregCat_US = }}
| Section6 = {{Chembox Explosive
| ShockSens =
| FrictionSens =
| ExplosiveV =
| REFactor = }}
| Section7 = {{Chembox Hazards
| ExternalMSDS =
| EUClass =
| EUIndex =
| MainHazards =
| NFPA-H =
| NFPA-F =
| NFPA-R =
| NFPA-O =
| RPhrases =
| SPhrases =
| RSPhrases =
| FlashPt =
| Autoignition =
| ExploLimits =
| LD50 =
| PEL = }}
| Section8 = {{Chembox Related
| OtherAnions =
| OtherCations =
| OtherFunctn =
| Function =
| OtherCpds = }}
}}
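The way the boxes above are filled, known values dropped into place and every other field left blank for editors, can be sketched in a few lines of Python. This is not the code behind the “Design Wikibox” page. The function name `render_simple_chembox`, the `SECTIONS` table and the record dictionary are invented for illustration, with the field names and example values taken from the Simple ChemBox shown earlier.

```python
# Section layout of the Simple ChemBox, as in the example above.
TOP_FIELDS = ["ImageFile", "ImageSize", "IUPACName", "OtherNames"]
SECTIONS = [
    ("Chembox Identifiers", ["CASNo", "PubChem", "ChemSpiderID", "SMILES"]),
    ("Chembox Properties", ["Formula", "MolarMass", "Appearance", "Density",
                            "MeltingPt", "BoilingPt", "Solubility"]),
    ("Chembox Hazards", ["MainHazards", "FlashPt", "Autoignition"]),
]


def _fields(names, record):
    # Known values are filled in; unknown fields stay blank for editors.
    return [f"| {name} = {record.get(name, '')}".rstrip() for name in names]


def render_simple_chembox(record: dict) -> str:
    """Render a Simple ChemBox template from a dictionary of known values."""
    lines = ["{{Chembox"]
    lines += _fields(TOP_FIELDS, record)
    for number, (title, names) in enumerate(SECTIONS, start=1):
        lines.append(f"| Section{number} = {{{{{title}")
        lines += _fields(names, record)
        lines[-1] += " }}"  # close the section on its last field line
    lines.append("}}")
    return "\n".join(lines)


# Values copied from the Domoic Acid example above.
domoic_acid = {
    "PubChem": "5282253",
    "ChemSpiderID": "4445428",
    "Formula": "C15H21NO6",
    "MolarMass": "311.3303",
}
box = render_simple_chembox(domoic_acid)
print(box)
```

Keeping the layout in a table like `SECTIONS` means a Full ChemBox would only need a longer table, not different code.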

The user can also use the ChemSpider image and can resize it and click on the image to download it as a PNG file. We believe that our images are attractive and appropriate for web display. Wikipedia presently favors the ACS format so, based on feedback, we can change the config file behind the image generator to produce a different format for display.

We are considering extending the system to support direct uploads of Molfiles and/or other structure formats rather than depending on a compound being on ChemSpider. However, it is VERY likely that chemical compounds of value to the Wikipedia encyclopedic content already exist on ChemSpider. The trick is to find them since they may not have the Wikipedia article chemical name associated with the record. An InChI-based, SMILES-based or alternative name search might help locate the record. Alternatively a full structure search via the applet will find the record OR the user can DEPOSIT the structure to ChemSpider and work from there. The system is flexible enough.

This is our first release of the Wikipedia Services so we welcome any and all feedback. It’s one more way we are giving back to the Wikipedia community for their service. The outcome for us will also be crowdsourced curation of ChemSpider…as Wikipedia articles are written we will clean up related structures on ChemSpider. Everyone wins.

By the way…compare OUR structure for Domoic Acid on ChemSpider with the one on Wikipedia. Does anyone know which is correct?


I’ve blogged previously about ChemSpider in your hand. I use ChemSpider in my hand daily via Safari on an iPhone but a mobile app is under development by James Jack (Symyx consultant). James has been burning the candle at both ends progressing the iPhone application…and not without a lot of hurdles. From our Skype discussions yesterday, he has progressed well and expects to be finished shortly. The first screenshots through the iPhone emulator look good and one is shown below.

iphone


ChemSpider has been working on polishing both single structure and SDF file deposition. We are now using these tried and tested approaches to deposit large blocks of data, commonly many thousands of records. For depositions of hundreds of thousands of records we break the depositions into smaller chunks of 5-10 thousand each.
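Breaking a large SDF file into smaller depositions is easy to sketch, because each record in an SDF file ends with a line containing only `$$$$`. The Python below is an illustration of the idea rather than our deposition code: the function names and the default chunk size of 5000 (the low end of the range mentioned above) are assumptions, and it reads the whole file into memory, which a real pipeline for very large files might avoid.

```python
from typing import Iterator, List


def split_sdf_records(sdf_text: str) -> List[str]:
    """Split SDF text into records; each record ends with a '$$$$' line.

    Any trailing lines that are not closed by '$$$$' are dropped.
    """
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def chunk_records(records: List[str],
                  chunk_size: int = 5000) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size records."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


# Twelve dummy "records" stand in for real molfile blocks in this demo.
demo_sdf = "".join(f"mol{i}\n$$$$\n" for i in range(12))
demo_records = split_sdf_records(demo_sdf)
demo_chunks = list(chunk_records(demo_records, chunk_size=5))
print([len(chunk) for chunk in demo_chunks])  # prints [5, 5, 2]
```

Each chunk can then be joined back into a smaller SDF file with `"".join(chunk)` and deposited on its own.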

An opportunity to deposit a couple of large SDF files came when the following publication was released in JCIM.

Global Bayesian Models for the Prioritization of Antitubercular Agents
by Philip Prathipati, Ngai Ling Ma* and Thomas H. Keller
J. Chem. Inf. Model., 2008, 48 (12), pp 2362–2370
DOI: 10.1021/ci800143n

This paper offers us a few thousand SMILES strings in CSV files that we could deposit into ChemSpider and associate with the article. Visit an example here and you will see the article connected via DOI in the supplementary information.

article

It is easy for us to deposit such datasets so if you have publications with such datasets that you would like to see on ChemSpider send us the SDF file and the DOI and they will be deposited.


Egon Willighagen has been growing the Linked Open Chemistry Data with his work on rdf.openmolecules.net. He has now integrated to the InChI Resolver to enhance the integration as shown below. We’re looking forward to hearing from users benefiting from this!

OpenMolecules RDF

About http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
Identifier info:inchi/InChI=1/CH4/h1H4
InChI InChI=1/CH4/h1H4
Source Chemical blogspace
Source ChEBI
ChEBI ID CHEBI:16183
owl:sameAs http://bio2rdf.org/chebi:16183
Source Connotea
Tag NewTag
Tag alkanes
Tag Gas
Tag InChI
Source DBPedia
owl:sameAs http://dbpedia.org/resource/Methane
Source NMRShiftDB
owl:sameAs http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId=20029286
NMRShiftDB mol ID 20029286
Source ChemSpider
ChemSpider ID 291
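Going by the methane listing above, the address of an RDF page and its identifier are both just a fixed prefix followed by the InChI string. A small Python sketch of that pattern follows. The function names are invented, and the sketch assumes no URL escaping is needed because the example shows none; InChIs containing characters such as `+` or `?` may well need percent-encoding in practice.

```python
RDF_BASE = "http://rdf.openmolecules.net/?"
INFO_PREFIX = "info:inchi/"


def _check_inchi(inchi: str) -> str:
    # Every InChI string begins with the literal prefix "InChI=".
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI string: {inchi!r}")
    return inchi


def rdf_url(inchi: str) -> str:
    """Build the rdf.openmolecules.net address as it appears in the listing."""
    return RDF_BASE + _check_inchi(inchi)


def info_uri(inchi: str) -> str:
    """Build the info:inchi identifier as it appears in the listing."""
    return INFO_PREFIX + _check_inchi(inchi)


methane = "InChI=1/CH4/h1H4"
print(rdf_url(methane))   # prints http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
print(info_uri(methane))  # prints info:inchi/InChI=1/CH4/h1H4
```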


A few years ago I was involved in the development of chemistry databases on both Palm and Pocket PC platforms. I wrote an article about it here: the products were named ChemPalm and ChemPocket …we even had a proof of concept 2D barcode scanner where the structures were encoded into 2D barcodes. The power of the devices we can hold in our hand now has jumped in leaps and bounds. Internet access from the phone is expected and web services integrated to phone applications can open up new avenues, especially when it comes to accessing Chemistry data.

A few months ago I heard so many people lauding the iPhone. How could it be that big a game changer? Then, on a long drive, I plugged my iPod Nano in to charge on my car adapter and fried the Nano on the spot. The heat coming off was enough to bronze the stainless steel and you could smell it in the air. Turns out it was a problem for a very small % of Gen I Nanos. I looked at the price of a new iPod and was in the market for a GPS so thought I’d take a look at the iPhone which would give me both…and would be a phone replacement too. I haven’t regretted it one bit. The iPhone is one of the most useful devices I have ever owned…period. Free apps are installed in abundance and even though I don’t think of myself as a gamer I even have a couple of games to while away the hours when sitting on the runway…the long flight to Salt Lake City from North Carolina gave me a chance to get wet with this one today…watch it on YouTube here. I am the father of two 6.5 year old boys and if I let them see this their addiction to video games will begin (we don’t let them play video games at all yet).

We have been discussing putting together an application to allow browsing ChemSpider from the phone. Presently I use Safari on the iPhone to access ChemSpider as a website, but with our web services putting together something a little skinnier is of course possible. Fortunately we collaborate with some very creative people and I was pinged this week by James Jack, a Symyx Consultant. I’ve been working with James to support his integration from Symyx Draw to ChemSpider and to the InChI Resolver. Actually, support is an overstatement…James is highly productive with minimal assistance and I think that gives credence to the quality of our web services too.

The screenshot below is a proof of concept only, at present running on the Visual Studio emulator, and runs on Windows Mobile 4 through 6. iPhone is next. The app is called ChemMobi and allows viewing of the structure, “suppliers” and properties. …ChemSpider will soon be in the hands of chemists…we hope it’s in their hearts too!

mobilephone


My colleague Will originally developed the ChemRefer service. When ChemSpider started up Will brought the ChemRefer technology and joined us to help expand the capabilities of our services. We integrated ChemRefer and released the text searching capabilities. Will indexed more and more journals and grew the index by 100s of thousands of articles. Unfortunately the downside was that the speed of the search decreased dramatically. Also, we kept hearing the comparison with the Google service and that their advantage was in their citations. So, Will has taken a few months off from indexing and has focused his efforts on developing his technologies to dramatically improve the speed of searching as well as implementing a system for recognizing citations. The system has been made available online for beta-testing just in time for the ACS meeting here in Salt Lake City BUT it is not yet integrated into ChemSpider.

I have performed some basic tests focused on searching chemical names initially. The literature search on ChemSpider has a lot more journals indexed but in order to perform the comparison I searched ONLY the RSC and Journal of Biological Chemistry articles since that is all we have indexed so far on the new system. The search results were as follows. The numbers compare number of hits for the old versus new literature search. The new search has indexed the latest RSC and JBC articles also so in theory should provide more hits.

Searching on Taxol: 626 hits found in 22 seconds (OLD) vs 717 hits in 1 second (NEW)

Searching on phenolphthalein: 47 hits found in 5 seconds (OLD) vs 1514 hits in 1 second (NEW)

Searching on benzene: 846 hits found in 75 seconds (OLD) vs 15260 hits in 4 seconds (NEW)

Clearly the searches are MUCH faster with the new system but it is also returning many more results. These are very early results and we will explain more about the system, the results and our future development shortly…
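To put the timings above on a common footing, here is a small sketch that works out the speed-up and the change in hit count for each query. The numbers are copied from the results listed above; the function and variable names are my own, and treating the first benzene figure as the OLD search is an assumption from context.

```python
# (query, old_hits, old_seconds, new_hits, new_seconds), as reported above
RESULTS = [
    ("Taxol", 626, 22, 717, 1),
    ("phenolphthalein", 47, 5, 1514, 1),
    ("benzene", 846, 75, 15260, 4),
]


def compare(old_hits, old_seconds, new_hits, new_seconds):
    """Return (speed-up factor, hit-count ratio) for the new versus old search."""
    return old_seconds / new_seconds, new_hits / old_hits


for query, *numbers in RESULTS:
    speedup, hit_ratio = compare(*numbers)
    print(f"{query}: {speedup:.1f}x faster, {hit_ratio:.1f}x as many hits")
```

Three queries are far too few to call this a benchmark, but the ratios make it easier to see that the jump in hit counts varies a lot more from query to query than the speed-up does.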

Try out the new system here for now and send us feedback at info@chemspider.com. Thanks


In what seems like an eon since I first blogged about the need for an InChI Resolver, ChemSpider has continued its efforts to provide valuable resources for chemists while benefiting from the advantages of InChI and working through many associated challenges. I will give a presentation tomorrow at the ACS Meeting here in Salt Lake City (and a gorgeous place it is!) in a session dedicated specifically to the InChI identifier and its increasing penetration into the world of Cheminformatics, publishing and internet Chemistry. The talk will be posted to SlideShare here as usual.

 

Following the declaration of the need for an InChI Resolver I discussed the project with a number of groups (five in total) and wrote up project descriptions and hypothetical timelines to deliver a resolver. We finally announced a joint project with the Royal Society of Chemistry on December 1st 2008 and started work on producing a beta release version of the resolver by ACS Spring 2009…that would be TODAY. The alpha release went live about 4 weeks ago and logins were provided to a number of interested parties. From all of the people who tested the system we received a couple of bug reports and small requests for enhancement, and all of those changes have been implemented just in time to release the Resolver for general public consumption here at the ACS.

 

We already have a list of things we want to deliver to enhance the system but will be waiting for feedback from the community regarding the value and workflows associated with this system as it functions presently in Beta release. An overview of the system is available here in PowerPoint and shown below. Go try it out at inchis.chemspider.com. It is in BETA release so please send any feedback to info@chemspider.com. Thanks!


There are some interesting articles showing up on ChemSpider from across the blogosphere. We have just added a new item to our list of high priorities: generating an RSS feed of structures, short descriptions and ChemSpider IDs so that anyone can access them. When we add new descriptions we will add snippets to the RSS feed.

New Articles include:

Teen Chemist and Splenda

A Discussion about the Synthesis of Spirangien A from the TotallySynthetic Blog by Paul Docherty

A Discussion about the Synthesis of Omaezakianol from the TotallySynthetic Blog by Paul Docherty
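As a rough idea of what the RSS feed mentioned above could carry, here is a minimal sketch that builds an RSS 2.0 document with one item per structure. It is not the feed itself: the record URL pattern, the channel text, the `chemspider:` guid scheme and the placeholder snippet are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed URL pattern for a compound record page; adjust to the real one.
RECORD_URL = "http://www.chemspider.com/Chemical-Structure.{csid}.html"


def build_feed(entries):
    """Build an RSS 2.0 document from (ChemSpider ID, name, snippet) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "New structure descriptions"
    ET.SubElement(channel, "link").text = "http://www.chemspider.com/"
    ET.SubElement(channel, "description").text = "Structures with short descriptions"
    for csid, name, snippet in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = name
        ET.SubElement(item, "link").text = RECORD_URL.format(csid=csid)
        # The ID is not a URL, so mark the guid as a non-permalink.
        ET.SubElement(item, "guid", isPermaLink="false").text = f"chemspider:{csid}"
        ET.SubElement(item, "description").text = snippet
    return ET.tostring(rss, encoding="unicode")


# Hypothetical entry: the snippet text is a placeholder, not real content.
feed = build_feed([(291, "Methane", "Placeholder snippet for illustration.")])
print(feed)
```

Putting the ChemSpider ID in the guid means a feed reader can tell when a description for the same structure has been updated rather than treating it as a new item.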


We’ve been working with Jean-Claude Bradley and his Open Notebook Solubility Challenge group to assist where we can. This has included enhancing some of our services (though there is more work to be done…), populating data into ChemSpider and, now, linking us up to the Data Tables built by Andy Lang (of The Spectral Game fame…we’re quite a team).

The Open Notebook Solubility Challenge is described here. The present list of compounds for which we have created the integration described below is here. When you open that link you’ll see the first bunch…notice the little icons showing patent links, Wikipedia links and the presence of spectra on those records.

What we have done now is deposit the links into the Data Source tables for these compounds, providing a direct link to the ONS tables. They can be viewed WITHOUT leaving the site simply by hovering over the link…OR you can click on the link to view the data directly. An example of the link view is shown below. To find these tables simply look up the Open Notebook Solubility Challenge data source in the table.

 

ons2


InChIs are a powerful way to communicate chemical structures. They are going to enable internet chemistry, and when we roll out the InChI Resolver shortly the community will have access to a resource to resolve InChIKeys and ultimately navigate chemistry on the web. We commonly receive chemical structures in the form of InChIs and in order to deposit the structures we have to convert the InChIs back to chemical structures, commonly into SDF format for batch deposition. For simple organics this is not a difficult process…the tools we have at our disposal can deal with the layout of simple organics. However, for some of the chemical structures we receive, optimizing the 2D layout is very challenging. Many of the issues come with fullerenes (see examples below), but not only with them: carbohydrates, complex cycles, etc. are big challenges too.

clean

In building the InChI Resolver we hope to provide attractive visual depictions of the associated structures. Without AuxInfo data carrying the coordinates, or without the deposition of SDF files containing the layout coordinates, we have a major challenge ahead of us. AuxInfo data are shown below for erythromycin. These data are rarely generated when people generate InChIKeys and the issue of structure layout will dominate the interpretation of complex structures.

auxinfo

Since beauty is in the eye of the beholder, my judgement is that automatic layout algorithms should only assist in producing an appropriate layout and eyeballs will need to make the final decision. That is why it is better to deposit SDF files, or InChIs with AuxInfo carrying the coordinates, than it is to deposit InChIs only and leave the structure layout to an algorithm. It will fail.

I am interested in seeing what people can do with their structure cleaning algorithms on InChIs like this:

InChI=1/C66H103N17O16S/c1-9-35(6)52(69)66-72-32-48(100-66)63(97)80-43(26-34(4)5)59(93)75-42(22-23-50(85)86)58(92)83-53(36(7)10-2)64(98)76-40-20-15-16-25-71-55(89)46(29-49(68)84)78-62(96)47(30-51(87)88)79-61(95)45(28-39-31-70-33-73-39)77-60(94)44(27-38-18-13-12-14-19-38)81-65(99)54(37(8)11-3)82-57(91)41(21-17-24-67)74-56(40)90/h12-14,18-19,31,33-37,40-48,52-54H,9-11,15-17,20-30,32,67,69H2,1-8H3,(H2,68,84)(H,70,73)(H,71,89)(H,74,90)(H,75,93)(H,76,98)(H,77,94)(H,78,96)(H,79,95)(H,80,97)(H,81,99)(H,82,91)(H,83,92)(H,85,86)(H,87,88)/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1

The images below show the iterative application of DIFFERENT structure layout algorithms. One caution…your layout algorithm should produce the SAME InChI at the end and NOT flip stereocenters. Interesting challenge. Who says cheminformatics isn’t challenging? And who thought building an InChI Resolver would be easy?
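The “SAME InChI” caution above can be turned into a quick check: generate the InChI before and after layout with whatever toolkit you use, compare the two strings, and if they differ report which layers changed. The sketch below covers only the comparison step and is a simplification rather than a full InChI parser. The alanine strings are my own illustration, picked for brevity, and the function names are hypothetical.

```python
def split_layers(inchi):
    """Split an InChI string into a {prefix: content} dict.

    Simplification: if a prefix repeats (as it can after a fixed-H /f
    layer), only the first occurrence is kept.
    """
    body = inchi.split("=", 1)[-1]  # tolerate a missing "InChI=" prefix
    parts = body.split("/")
    layers = {"version": parts[0], "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            layers.setdefault(part[0], part[1:])
    return layers


def differing_layers(before, after):
    """Return the sorted layer prefixes whose content differs."""
    a, b = split_layers(before), split_layers(after)
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


# Illustrative pair: L-alanine and its mirror image, D-alanine.
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(differing_layers(L_ALANINE, L_ALANINE))  # identical strings, nothing differs
print(differing_layers(L_ALANINE, D_ALANINE))  # the mirror image differs in /m
```

Note that inverting every stereocenter shows up in the /m layer rather than in /t, so checking the /t layer alone would miss a layout that produced the mirror image.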

layout1layout2layout3layout4


There has been encouragement that we look at Freebase as an additional online resource to integrate with. In terms of chemical entities some of the Wikipedia structure collection has made its way onto Freebase and has been enhanced to include InChIs and SMILES. It’s not clear to me whether the InChIs on Freebase are all obtained FROM Wikipedia or were layered onto Freebase later. So, I approached the Freebase group and asked if they could provide me with a dump of the InChI strings and the SMILES strings together with the associated Freebase IDs and the chemical names. In this way we would be able to generate SDF files for deposition and end up with the structures (converted from InChIs and SMILES) as well as the associated chemical names and Freebase IDs. Simple idea, right?

So, we converted InChIs and SMILES and generated the depositions. Freebase links now show up in the Data Sources section and, if you put your cursor over the GUID, you see an image of the page and can click through to the record on Freebase. See the image above. The Freebase GUID for Benzene is here: #9202a8c04000641f800000000000ac66

All seems well. I have a question though…I look at a structure like Dapagliflozin on Wikipedia here and see full stereochemistry explicitly defined in the name and in the image. However, on Freebase I note that the stereochemistry is NOT explicitly defined in the InChI. The InChI is:  1/C21H25ClO6/c1-2-27-15-6-3-12(4-7-15)9-14-10-13(5-8-16(14)22)21-20(26)19(25)18(24)17(11-23)28-21/h3-8,10,17-21,23-26H,2,9,11H2,1H3/t17?,18?,19?,20?,21-/m0/s1
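One way to catch records like this before deposition is to look at the /t layer of each incoming InChI and list the centers marked as unknown (`?`) or undefined (`u`). The following is a minimal sketch using only string handling, not a full InChI parser; the function names are hypothetical and the InChI is the Freebase one quoted above.

```python
import re


def stereo_parities(inchi):
    """Return {atom number: parity mark} from the /t layer ('+', '-', '?' or 'u')."""
    body = inchi.split("=", 1)[-1]  # tolerate a missing "InChI=" prefix
    for layer in body.split("/")[2:]:  # skip the version and formula segments
        if layer.startswith("t"):
            pairs = re.findall(r"(\d+)([-+?u])", layer[1:])
            return {int(number): mark for number, mark in pairs}
    return {}


def undefined_centers(inchi):
    """Atom numbers whose stereo parity is unknown or undefined."""
    return sorted(n for n, mark in stereo_parities(inchi).items() if mark in "?u")


FREEBASE_INCHI = (
    "1/C21H25ClO6/c1-2-27-15-6-3-12(4-7-15)9-14-10-13(5-8-16(14)22)"
    "21-20(26)19(25)18(24)17(11-23)28-21/h3-8,10,17-21,23-26H,2,9,11H2,1H3"
    "/t17?,18?,19?,20?,21-/m0/s1"
)

print(undefined_centers(FREEBASE_INCHI))
```

A deposition script could use a check like this to hold back the chemical name whenever any center is undefined, so that a partially specified structure does not get attached to a curated name.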

So, when we take the InChIs and the chemical names, convert the InChIs and deposit the chemical structures we end up with a “destruction” of the curation work we have done on ChemSpider. We end up with TWO structures for Dapagliflozin, not one (See below)

freebase2

And now we need to start the curation efforts AGAIN to clean out misassociations of names and structures. So, what we are going to do is delete the deposition of Freebase structures and redeposit without the chemical names. In this case the outlinks to Freebase will be in place but the structures will not be found by a name search UNLESS the Freebase GUID is associated with an already curated name-structure pair that is coincident with the Freebase name.

I can say that the Freebase team were a pleasure to work with and, in theory, once the Wikipedia curation project is finished the SMILES and InChIs on Freebase will be correct and such linkages back to Freebase will be easier, and correct. In the meantime I am interested in where the Freebase SMILES and InChIs are coming from (I think a lot of them are from Wikipedia but am not sure) and we are going to make certain on our side that we remove the chemical names so as not to decrease the quality of our curation efforts.


I’ve posted previously about embedding structure images and spectra into blogs and webpages. One of the side effects of this is that for structure images specifically the ChemSpider record is linked back to the webpage that the structure is embedded into. Structures are embedded in various places now into wikis and blogs. An example of 11 embedded structures is shown here.

When Arvin Moser wrote in his blog about Letrozole and embedded the structure image into his blog post a link BACK to his blog post was created in the Data Source table. See the image below.

letrozole

With this capability, as more people embed structures from ChemSpider into their online pages/blogs more of the internet will become structure searchable and ultimately linked. It does not require adding InChIs to webpages (though that is encouraged for indexing by search engines).

(Caveat: The system is not yet optimal and we are working on filtering out comments on blogs that presently get added as additional links. All “doubles” will be filtered out later)


I’ve previously posted about the work going on regarding the NMR Game…now morphed to the spectral game and described in detail by JC Bradley. We’ve been working hard to increase the number of spectra available as part of the game (now in the 100s of spectra!) and Andy has been working hard to improve the flow of data. The original structure images have been replaced with ChemSpider structure images and we have delivered a web service to allow Andy to continue to update the spectral collection as more data are added to the database.

When users see issues with the spectra they get to leave comments regarding their observations. This can be very valuable for us to curate the spectral data. This will allow us to perform game-based crowdsourcing of the spectral data and the feedback is already of value.

We have about another 30 spectra to add to the present collection of spectral Open Data and then we’ll take a break and I’ll be approaching the spectrometer vendors and a few other friends to see whether they have any data to contribute to the game. We are already considering adding the ability to add a “Company Logo” to be associated with a spectrum so that the vendors/contributors get fair recognition for their contribution to the game. If you are interested in providing data we will upload it for you. Contact us at infoATchemspiderDOTcom.

JC Bradley has now uploaded a short tutorial to YouTube on how to play the game and I have embedded it below. JC has also announced a prize for the best player. Go test your skills…


Jean-Claude Bradley has recently posted about an NMR Game running on Second Life. Read his blog for details but I excerpt some of the comments here:

“Andy and I brainstormed some new chemistry games that we could introduce to Second Life to leverage our recent tools. One of the applications is the NMR game. By combining the orac molecule rezzer, the SL spectral viewing tool and ChemSpider Open Data spectra I think we have a pretty good game.

The idea is simple: click on the molecule that is represented by the spectrum. If it is correct you get 2 points and get another spectrum. You lose a point by clicking on an incorrect molecule. After going through all the spectra your score gets posted on the web to a top10 list. For equal scores the best time takes it.”
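The scoring rules in the excerpt are simple enough to sketch in a few lines. This is only an illustration of the rules as quoted (2 points for a correct click, minus 1 for an incorrect one, ties broken by the best time, top 10 listed), not the code running in Second Life; the class, the function and the player names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    player: str
    correct: int     # clicks on the right molecule
    incorrect: int   # clicks on a wrong molecule
    seconds: float   # total time for the run

    @property
    def score(self):
        # 2 points per correct answer, minus 1 per incorrect click
        return 2 * self.correct - self.incorrect


def leaderboard(runs, size=10):
    """Highest score first; equal scores are ordered by the shorter time."""
    return sorted(runs, key=lambda run: (-run.score, run.seconds))[:size]


runs = [
    Run("alice", correct=10, incorrect=2, seconds=95.0),
    Run("bob", correct=9, incorrect=0, seconds=120.0),
    Run("carol", correct=9, incorrect=0, seconds=80.0),
]
for run in leaderboard(runs):
    print(run.player, run.score, run.seconds)
```

All three example players end up on the same score, so the ordering comes down entirely to the time taken.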

So, here at ChemSpider we are delivering spectra as Open Data to help with the game. And we’re happy to do so. It’s always been our intention to have ChemSpider provide value like this. ANY registered user can upload spectra to the ChemSpider website. The details are outlined here (I just noticed the interface has changed since I wrote that but you should still be able to follow the process). We need the spectra to be in JCAMP format and if you want them to be available for the game, and for people to download, they MUST be declared as Open Data.

Right now we have 100s of spectra. You can find them here. But we need more. Much more! We’d like you to contribute them. If you don’t want to upload them yourself then contact us directly and we will process and upload them for you. We need the data and the name/structure of the associated molecule.

And how will the game be used on these spectra? The game will be used to “curate and validate” the spectra. As the game is played, a tally will be kept of how many people say each spectrum is correct and, of course, how many say it is wrong. Based on these scores our curators will be directed to “problematic spectra” for their attention. This is true crowdsourcing and a great way to do spectral validation.
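A sketch of how such a tally might direct curators to “problematic spectra” is shown below. The input format, the minimum number of answers and the cut-off rate are all assumptions made for illustration; the real thresholds would need tuning once enough game data is in.

```python
from collections import defaultdict


def problematic_spectra(votes, min_votes=5, max_correct_rate=0.6):
    """votes: iterable of (spectrum_id, was_correct) pairs from game play.

    Returns the IDs that have at least min_votes answers and a share of
    correct answers at or below max_correct_rate, worst first. Both
    thresholds are illustrative assumptions.
    """
    tally = defaultdict(lambda: [0, 0])  # id -> [correct, total]
    for spectrum_id, was_correct in votes:
        tally[spectrum_id][0] += int(was_correct)
        tally[spectrum_id][1] += 1
    rates = {sid: c / n for sid, (c, n) in tally.items() if n >= min_votes}
    flagged = [sid for sid, rate in rates.items() if rate <= max_correct_rate]
    return sorted(flagged, key=lambda sid: rates[sid])


# Hypothetical answers for three spectra.
votes = (
    [("spec-A", True)] * 9 + [("spec-A", False)]
    + [("spec-B", True)] * 2 + [("spec-B", False)] * 6
    + [("spec-C", False)] * 3
)
print(problematic_spectra(votes))
```

The minimum-answer rule is there so that a spectrum is not sent to a curator on the strength of two or three unlucky clicks. A low rate can also just mean a hard spectrum, which is why the final call stays with the curators.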

We would like the spectral collection to grow and welcome contributions from anyone. They do NOT have to be just NMR. They can be IR, MS, Raman etc too. Ultimately a Spectral game will be unveiled. Please consider ChemSpider as a repository for your data as it will benefit the community of chemists and, in particular, the process of teaching students and allowing them to “game their way” through the process. Watch where this goes…it’s VERY interesting to consider how it can improve…there is an NMR game website in development so you won’t have to go just to Second Life.
