Archive for the ChemSpider Services Category

ChemSpider will go offline today for the next 24 hours. We will switch the servers off at around 11am today (give or take some latitude). We will take a differential backup, restore all changes to the database onto the RSC servers, and switch over to their systems overnight. Testing performed over the weekend has proceeded rather well and we are hoping for a seamless transition, acknowledging that we will have this one day of downtime.

We apologize in advance for any disruptions. We know that a lot of people are now using ChemSpider services to feed their own systems, so our apologies to them in particular. We expect improved service for all when this transition is complete.

We’ll see you on the other side of this transition in just over 24 hours. Wish us luck…


I blogged yesterday about our release of Wikipedia Services on ChemSpider and how we are working to support authors on Wikipedia articles. Of course there are MANY languages of Wikipedia (as shown below) and we are willing to produce multilingual support. All we need is someone from the specific language version of Wikipedia to contact us and map the ChemBoxes and Drugboxes into their relevant languages. Let us know if you are interested.

languages


Wikipedia is great. I use it regularly. I’ve been working, with a team of experts, on curating and validating the “structure-based data” in the ChemBoxes and DrugBoxes for almost a year and a half. It’s been a long path and on the journey I have met some great people and made some true friends. I also HAVE NOT met most of the people I share the IRC chats with. We are a highly opinionated bunch of people but with a common focus of making Wikipedia better and making the data and content as accurate as possible.

We have the Wikipedia article lead in thousands of records on ChemSpider now. They are updated regularly as Wikipedia itself expands. One of the areas we have been focused on since the inception of the work was getting correct structures in place with the associated data. This includes the molecular formula, molecular weight, SMILES, InChI String, InChIKey, systematic name and so on. In order to help the process of expanding Wikipedia with new records and to provide a lot of these data automatically we have set about providing a Wikipedia Service so that Wikipedians can use ChemSpider as the source of the chemical structures of interest and generate the DrugBox and ChemBox content from ChemSpider. It’s a rather simple process…

Assume that you wanted to create a ChemBox for Domoic Acid. You would search for Domoic Acid on ChemSpider. You would then validate whether the structure on ChemSpider named domoic acid is correct and, if so, you would generate the Wikibox by clicking on the link to the right of the Quick Links.

wikibox1

Following this simple button click the user is shown a new window displaying the “Design Wikibox” functionality. There are various flavors of ChemBoxes and Drugboxes which can be generated and the image below shows the “Simple ChemBox”

wikibox2

At present we fill the box with those data we have easy access to from ChemSpider, based on the chemical structure. We list all other fields for Wiki depositors to populate. For Domoic Acid the Simple ChemBox looks like this:

{{Chembox
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| CASNo =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O }}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| BoilingPt =
| Solubility = }}
| Section3 = {{Chembox Hazards
| MainHazards =
| FlashPt =
| Autoignition = }}
}}
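For readers who want to script something similar, here is a minimal sketch of how a Simple ChemBox like the one above could be assembled from a handful of structure-derived values. The function name `render_simple_chembox` and its parameters are hypothetical; this is not the code running behind the ChemSpider button, only an illustration of the template layout.

```python
def render_simple_chembox(iupac_name, smiles, formula, molar_mass,
                          pubchem_id="", chemspider_id=""):
    """Return Simple ChemBox wikitext; fields we cannot derive are left blank."""
    # Section and field names are copied from the Simple ChemBox listing above.
    sections = [
        ("Chembox Identifiers", [("CASNo", ""), ("PubChem", pubchem_id),
                                 ("ChemSpiderID", chemspider_id), ("SMILES", smiles)]),
        ("Chembox Properties", [("Formula", formula), ("MolarMass", molar_mass),
                                ("Appearance", ""), ("Density", ""), ("MeltingPt", ""),
                                ("BoilingPt", ""), ("Solubility", "")]),
        ("Chembox Hazards", [("MainHazards", ""), ("FlashPt", ""), ("Autoignition", "")]),
    ]

    def field(name, value):
        # Blank fields are written as "| Name =" so depositors can fill them in later.
        value = str(value)
        return "| %s =" % name + (" " + value if value else "")

    lines = ["{{Chembox", field("ImageFile", ""), field("ImageSize", ""),
             field("IUPACName", iupac_name), field("OtherNames", "")]
    for number, (template, fields) in enumerate(sections, start=1):
        lines.append("| Section%d = {{%s" % (number, template))
        rendered = [field(name, value) for name, value in fields]
        rendered[-1] += " }}"  # close the nested section template on its last field
        lines.extend(rendered)
    lines.append("}}")
    return "\n".join(lines)


# Values taken from the Domoic Acid example in this post.
domoic_box = render_simple_chembox(
    iupac_name="(2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-"
               "6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid",
    smiles=r"O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O",
    formula="C15H21NO6",
    molar_mass="311.3303",
    pubchem_id="5282253",
    chemspider_id="4445428",
)
print(domoic_box)
```

The SMILES is passed as a raw string because it contains backslashes, which Python would otherwise try to read as escape sequences.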

We insert the PubChemID associated with the particular structure if there is a related PubChem record. We also insert the ChemSpider ID in case the user wants to link back to ChemSpider.  A Full ChemBox is much longer:

{{Chembox
| Name =
| ImageFile =
| ImageSize =
| IUPACName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| SystematicName = (2S,3S,4S)-3-(carboxymethyl)-4-[(1Z,3E,5R)-6-hydroxy-1,5-dimethyl-6-oxo-hexa-1,3-dienyl]pyrrolidine-2-carboxylic acid
| OtherNames =
| Section1 = {{Chembox Identifiers
| Abbreviations =
| CASNo =
| EINECS =
| EINECSCASNO =
| PubChem = 5282253
| ChemSpiderID = 4445428
| SMILES = O=C(O)[C@H]1NC[C@H](/C(=C\C=C\[C@H](C(=O)O)C)C)[C@@H]1CC(=O)O
| InChI = InChI=1S/C15H21NO6/c1-8(4-3-5-9(2)14(19)20)11-7-16-13(15(21)22)10(11)6-12(17)18/h3-5,9-11,13,16H,6-7H2,1-2H3,(H,17,18)(H,19,20)(H,21,22)/b5-3+,8-4-/t9-,10+,11-,13+/m1/s1
| RTECS =
| MeSHName = domoic acid
| ChEBI =
| KEGG = C13732
| ATCCode_prefix =
| ATCCode_suffix =
| ATC_Supplemental =}}
| Section2 = {{Chembox Properties
| Formula = C15H21NO6
| MolarMass = 311.3303
| Appearance =
| Density =
| MeltingPt =
| Melting_notes =
| BoilingPt =
| Boiling_notes =
| Solubility =
| SolubleOther =
| Solvent =
| LogP =
| VaporPressure =
| HenryConstant =
| AtmosphericOHRateConstant =
| pKa =
| pKb = }}
| Section3 = {{Chembox Structure
| CrystalStruct =
| Coordination =
| MolShape = }}
| Section4 = {{Chembox Thermochemistry
| DeltaHf =
| DeltaHc =
| Entropy =
| HeatCapacity = }}
| Section5 = {{Chembox Pharmacology
| AdminRoutes =
| Bioavail =
| Metabolism =
| HalfLife =
| ProteinBound =
| Excretion =
| Legal_status =
| Legal_US =
| Legal_UK =
| Legal_AU =
| Legal_CA =
| PregCat =
| PregCat_AU =
| PregCat_US = }}
| Section6 = {{Chembox Explosive
| ShockSens =
| FrictionSens =
| ExplosiveV =
| REFactor = }}
| Section7 = {{Chembox Hazards
| ExternalMSDS =
| EUClass =
| EUIndex =
| MainHazards =
| NFPA-H =
| NFPA-F =
| NFPA-R =
| NFPA-O =
| RPhrases =
| SPhrases =
| RSPhrases =
| FlashPt =
| Autoignition =
| ExploLimits =
| LD50 =
| PEL = }}
| Section8 = {{Chembox Related
| OtherAnions =
| OtherCations =
| OtherFunctn =
| Function =
| OtherCpds = }}
}}

The user can also use the ChemSpider image and can resize it and click on the image to download it as a PNG file. We believe that our images are attractive and appropriate for web display. Wikipedia at present favors the ACS format, so based on feedback we can change the config file behind the image generator to produce a different format for display.

We are considering extending the system to support direct uploads of Molfiles and/or other structure formats rather than depending on a compound being on ChemSpider. However, it is VERY likely that chemical compounds of value to the Wikipedia encyclopedic content already exist on ChemSpider. The trick is to find them since they may not have the Wikipedia article chemical name associated with the record. An InChI-based, SMILES-based or alternative name search might help locate the record. Alternatively a full structure search via the applet will find the record OR the user can DEPOSIT the structure to ChemSpider and work from there. The system is flexible enough.

This is our first release of the Wikipedia Services so we welcome any and all feedback. It’s one more way we are giving back to the Wikipedia community for their service. The outcome for us will also be crowdsourced curation of ChemSpider…as Wikipedia articles are written we will clean up related structures on ChemSpider. Everyone wins.

By the way…check OUR structure for Domoic Acid with that one on ChemSpider. Does anyone know which is correct?


I’ve blogged previously about ChemSpider in your hand. I use ChemSpider in my hand daily via Safari on an iPhone but a mobile app is under development by James Jack (Symyx consultant). James has been burning the candle at both ends progressing the iPhone application…and not without a lot of hurdles. In Skype discussions with him yesterday he has progressed well and will be finished shortly. The first screenshots through the iPhone emulator look good and one is shown below.

iphone


ChemSpider has been working on polishing both single structure and SDF file deposition. We are now using these tried and tested approaches to deposit large blocks of data, commonly many thousands of records. For depositions of 100s of thousands we do break the depositions into smaller chunks of 5-10 thousand each.
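To illustrate the chunking idea, here is a minimal sketch of splitting a large SDF file into blocks of a few thousand records. It relies only on the fact that each SDF record ends with a `$$$$` line. The function names and the default chunk size of 5000 are our own choices for illustration, not a description of the actual deposition pipeline.

```python
def split_sdf_records(sdf_text):
    """Split SDF text into records, each ending with its '$$$$' delimiter line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    # Anything after the last '$$$$' is an incomplete record and is ignored here.
    return records


def chunk_records(records, chunk_size=5000):
    """Yield successive lists of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def write_chunks(sdf_text, prefix, chunk_size=5000):
    """Write each chunk to '<prefix>_<n>.sdf' and return the file names written."""
    names = []
    for number, chunk in enumerate(chunk_records(split_sdf_records(sdf_text), chunk_size), 1):
        name = "%s_%03d.sdf" % (prefix, number)
        with open(name, "w") as handle:
            handle.write("".join(chunk))
        names.append(name)
    return names
```

Each output file is itself a valid SDF file, so the chunks can be deposited one at a time.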

An example of depositing a couple of large SDF files was given to us when the following publication was released at JCIM.

Global Bayesian Models for the Prioritization of Antitubercular Agents
by Philip Prathipati, Ngai Ling Ma* and Thomas H. Keller
J. Chem. Inf. Model., 2008, 48 (12), pp 2362–2370
DOI: 10.1021/ci800143n

This paper offers us a few thousand SMILES strings in CSV files that we could deposit into ChemSpider and associate with the article. Visit an example here and you will see the article connected via DOI in the supplementary information.

article

It is easy for us to deposit such datasets so if you have publications with such datasets that you would like to see on ChemSpider send us the SDF file and the DOI and they will be deposited.


Egon Willighagen has been growing the Linked Open Chemistry Data with his work on rdf.openmolecules.net. He has now integrated to the InChI Resolver to enhance the integration as shown below. We’re looking forward to hearing from users benefiting from this!

OpenMolecules RDF

About http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
Identifier info:inchi/InChI=1/CH4/h1H4
InChI InChI=1/CH4/h1H4
Source Chemical blogspace
Source ChEBI
ChEBI ID CHEBI:16183
owl:sameAs http://bio2rdf.org/chebi:16183
Source Connotea
Tag NewTag
Tag alkanes
Tag Gas
Tag InChI
Source DBPedia
owl:sameAs http://dbpedia.org/resource/Methane
Source NMRShiftDB
owl:sameAs http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId=20029286
NMRShiftDB mol ID 20029286
Source ChemSpider
ChemSpider ID 291
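The address pattern in the listing above is simple enough to build in code. The sketch below just reproduces the two forms shown for methane: the page address and the `info:inchi/` identifier. It assumes the service accepts the raw InChI appended after the `?`, exactly as in the example; whether more complex InChIs need extra encoding is not something we have checked, and the function names are hypothetical.

```python
BASE_URL = "http://rdf.openmolecules.net/"


def _check_inchi(inchi):
    # Only a shallow sanity check: a full InChI string starts with "InChI=".
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    return inchi


def openmolecules_url(inchi):
    """Build the page address in the form shown above for methane."""
    return BASE_URL + "?" + _check_inchi(inchi)


def info_inchi_uri(inchi):
    """Build the info:inchi identifier in the form shown above for methane."""
    return "info:inchi/" + _check_inchi(inchi)


methane = "InChI=1/CH4/h1H4"
print(openmolecules_url(methane))
print(info_inchi_uri(methane))
```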



A few years ago I was involved in the development of chemistry databases on both Palm and Pocket PC platforms. I wrote an article about it here: the products were named ChemPalm and ChemPocket …we even had a proof of concept 2D barcode scanner where the structures were encoded into 2D barcodes. The power of the devices we can hold in our hand now has jumped in leaps and bounds. Internet access from the phone is expected and web services integrated to phone applications can open up new avenues, especially when it comes to accessing Chemistry data.

A few months ago I heard so many people lauding the iPhone. How could it be that big a game changer? Then, on a long drive I plugged my iPod Nano in to charge on my car adapter and fried the Nano on the spot. The heat coming off was enough to bronze the stainless steel and you could smell it in the air. Turns out it was a problem for a very small % of Gen I Nanos. I looked at the price of a new iPod and was in the market for a GPS so thought I’d take a look at the iPhone which would give me both…and would be a phone replacement too. I haven’t regretted it one bit. The iPhone is one of the most useful devices I have ever owned…period. Free apps are installed in abundance and even though I don’t think of myself as a gamer I even have a couple of games to while away the hours when sitting on the runway…the long flight to Salt Lake City from North Carolina gave me a chance to get wet with this one today…watch it on YouTube here. I am the father of two 6.5 year old boys and if I let them see this their addiction to video games will begin (we don’t let them play video games at all yet).

We have been discussing putting together an application to allow browsing ChemSpider from the phone. Presently I use Safari on the iPhone to access ChemSpider as a website but with our web services putting together something a little skinnier is of course possible. Fortunately we collaborate with some very creative people and I was pinged this week by James Jack, a Symyx Consultant. I’ve been working with James to support his integration from Symyx Draw to ChemSpider and to the InChI Resolver. Actually, support is an overstatement…James is highly productive with minimum assistance and I think that gives credence to the quality of our web services too.

The screenshot below is a proof of concept only. At present it runs on the Visual Studio emulator and on Windows Mobile 4 through 6. iPhone is next. The app is called ChemMobi and allows viewing of the structure, “suppliers” and properties. …ChemSpider will soon be in the hands of chemists…we hope it’s in their hearts too!

mobilephone


My colleague Will originally developed the ChemRefer service. When ChemSpider started up Will brought the ChemRefer technology and joined us to help expand the capabilities of our services. We integrated ChemRefer and released the text searching capabilities. Will indexed more and more journals and grew the index by 100s of thousands of articles. Unfortunately the downside was that the speed of the search decreased dramatically. Also, we kept hearing the comparison with the Google service and that their advantage was in their citations. So, Will has taken a few months off from indexing and has focused his efforts on developing his technologies to dramatically improve the speed of searching as well as implementing a system for recognizing citations. The system has been made available online for beta-testing just in time for the ACS meeting here in Salt Lake City BUT it is not yet integrated into ChemSpider.

I have performed some basic tests focused on searching chemical names initially. The literature search on ChemSpider has a lot more journals indexed but in order to perform the comparison I searched ONLY the RSC and Journal of Biological Chemistry articles since that is all we have indexed so far on the new system. The search results were as follows. The numbers compare number of hits for the old versus new literature search. The new search has indexed the latest RSC and JBC articles also so in theory should provide more hits.

Searching on Taxol: 626 hits found in 22 seconds (OLD) vs 717 hits in 1 second (NEW)

Searching on phenolphthalein: 47 hits found in 5 seconds (OLD) vs 1514 hits in 1 second (NEW)

Searching on benzene: 846 hits found in 75 seconds (OLD) vs 15260 hits in 4 seconds (NEW)

Clearly the searches are MUCH faster with the new system but it is also returning many more results. These are very early results and we will explain more about the system, the results and our future development shortly…
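To put rough numbers on that, the short sketch below works out the speed-up and the growth in hit count for the three searches, taking the first figure in each comparison as the old system and the second as the new one. Keep in mind that the timings were read off to the nearest second, so the ratios are indicative only.

```python
# (hits, seconds) for the old and new literature search, from the three tests above.
RESULTS = {
    "Taxol": {"old": (626, 22), "new": (717, 1)},
    "phenolphthalein": {"old": (47, 5), "new": (1514, 1)},
    "benzene": {"old": (846, 75), "new": (15260, 4)},
}


def compare(old, new):
    """Return (speed-up factor, hit-count ratio) of the new search over the old."""
    old_hits, old_seconds = old
    new_hits, new_seconds = new
    return old_seconds / new_seconds, new_hits / old_hits


for term, runs in RESULTS.items():
    speedup, hit_ratio = compare(runs["old"], runs["new"])
    print("%-16s %6.2fx faster, %6.2fx as many hits" % (term, speedup, hit_ratio))
```

On these figures the new search is between 5 and 22 times faster, and for phenolphthalein it returns roughly 32 times as many hits.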

Try out the new system here for now and send us feedback at info@chemspider.com. Thanks


In what seems like an eon since I first blogged about the need for an InChI Resolver, ChemSpider has continued its efforts to provide valuable resources for chemists while benefiting from the advantages of InChI and working through many associated challenges. I will give a presentation tomorrow at the ACS Meeting here in Salt Lake City (and a gorgeous place it is!) in a session dedicated specifically to the InChI identifier and its increasing penetration into the world of Cheminformatics, publishing and internet Chemistry. The talk will be posted to SlideShare here as usual.

Following the declaration of the need for an InChI Resolver I discussed the project with a number of groups (five in total) and wrote up project descriptions and hypothetical timelines to deliver a resolver. We finally announced a joint project with the Royal Society of Chemistry on December 1st 2008 and started work on producing a beta release version of the resolver by ACS Spring 2009…that would be TODAY. The alpha release went live about 4 weeks ago and logins were provided to a number of interested parties. From all of the people who tested the system we received a couple of bug reports and small requests for enhancement and all of those changes have been implemented just in time to release the Resolver for general public consumption here at the ACS.

We already have a list of things we want to deliver to enhance the system but will be waiting for feedback from the community regarding the value and workflows associated with this system as it functions presently in Beta release. An overview of the system is available here in PowerPoint and shown below. Go try it out at inchis.chemspider.com. It is in BETA release so please send us any feedback at info@chemspider.com. Thanks!


There are some interesting articles showing up on ChemSpider from across the blogosphere. We have just added to our list of high priorities to generate an RSS feed of structures, short descriptions and ChemSpider IDs so that anyone can access them. When we add new descriptions we will add snippets to the RSS feed.

New Articles include:

Teen Chemist and Splenda

A Discussion about the Synthesis of Spirangien A from the TotallySynthetic Blog by Paul Docherty

A Discussion about the Synthesis of Omaezakianol from the TotallySynthetic Blog by Paul Docherty


We’ve been working with Jean-Claude Bradley and his Open Notebook Solubility Challenge group to assist where we can. This has included enhancing some of our services (though there is more work to be done…), populating data into ChemSpider and, now, linking us up to the Data Tables built by Andy Lang (of The Spectral Game fame…we’re quite a team).

The Open Notebook Solubility Challenge is described here. The present list of compounds for which we have created the integration to be described below is here. When you open that link you’ll see the first bunch…notice the little icons showing patent links, Wikipedia links and the presence of spectra on those records.

What we have done now is deposit the links into the Data Source tables for these compounds and provide the direct link to the ONS tables. They can be viewed WITHOUT leaving the site simply by hovering over the link…OR you can click on the link to view the data directly. An example of the link view is shown below. To find these tables simply look up the Open Notebook Solubility Challenge data source in the table.

ons2


InChIs are a powerful way to communicate chemical structures. They are going to enable internet chemistry and when we roll out the InChI Resolver shortly then the community will have access to a resource to resolve InChIKeys and ultimately navigate chemistry on the web. We commonly receive chemical structures in the form of InChIs and in order to deposit the structures we have to convert the InChIs back to chemical structures, commonly into SDF format for batch deposition. For simple organics this is not a difficult process…the tools we have at our disposal can deal with the layout of simple organics. However, for some of the chemical structures we receive, optimizing 2D layout is very challenging. Many of the issues come with fullerenes (see examples below), but not only those: carbohydrates, complex cycles etc. are big challenges too.

clean

In building the InChI resolver we hope to provide attractive visual depictions of the associated structures. Without AuxInfo data carrying the coordinates, or without the deposition of SDF files containing the layout coordinates, we have a major challenge ahead of us. AuxInfo data are shown below for erythromycin. These data are rarely generated when people generate InChIKeys and the issue of structure layout will dominate the interpretation of complex structures.

auxinfo

Since beauty is in the eye of the beholder my judgement is that automatic layout algorithms should only assist in the appropriate layout and eyeballs will need to make the final decision. That is why it is better to deposit SDF files or InChIs with AuxInfo carrying the coordinates than it is to deposit InChIs only and leave the structure layout to an algorithm. It will fail.

I am interested in seeing what people can do with their structure cleaning algorithms on InChIs like this:

InChI=1/C66H103N17O16S/c1-9-35(6)52(69)66-72-32-48(100-66)63(97)80-43(26-34(4)5)59(93)75-42(22-23-50(85)86)58(92)83-53(36(7)10-2)64(98)76-40-20-15-16-25-71-55(89)46(29-49(68)84)78-62(96)47(30-51(87)88)79-61(95)45(28-39-31-70-33-73-39)77-60(94)44(27-38-18-13-12-14-19-38)81-65(99)54(37(8)11-3)82-57(91)41(21-17-24-67)74-56(40)90/h12-14,18-19,31,33-37,40-48,52-54H,9-11,15-17,20-30,32,67,69H2,1-8H3,(H2,68,84)(H,70,73)(H,71,89)(H,74,90)(H,75,93)(H,76,98)(H,77,94)(H,78,96)(H,79,95)(H,80,97)(H,81,99)(H,82,91)(H,83,92)(H,85,86)(H,87,88)/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1

The images below show the iterative application of DIFFERENT structure layout algorithms. One caution…your layout algorithm should produce the SAME InChI at the end and NOT flip stereocenters. Interesting challenge. Who says cheminformatics isn’t challenging? And who thought building an InChI Resolver would be easy?
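The “same InChI at the end” caution can be checked mechanically. Below is a minimal sketch that splits two InChI strings into their layers and reports which layers differ, so a flipped stereocenter shows up as a change in the `t` (or `b`, `m`, `s`) layer. Generating the second InChI from the cleaned-up structure needs a cheminformatics toolkit and is outside this sketch; here the “after” string is a hand-edited, hypothetical copy of the Domoic Acid InChI used purely to exercise the comparison. The parsing is simplified and assumes a single set of layers.

```python
STEREO_LAYERS = {"b", "t", "m", "s"}

DOMOIC_INCHI = ("InChI=1S/C15H21NO6/c1-8(4-3-5-9(2)14(19)20)11-7-16-13(15(21)22)"
                "10(11)6-12(17)18/h3-5,9-11,13,16H,6-7H2,1-2H3,(H,17,18)(H,19,20)"
                "(H,21,22)/b5-3+,8-4-/t9-,10+,11-,13+/m1/s1")


def inchi_layers(inchi):
    """Split an InChI into a dict keyed by 'version', 'formula' and layer prefix letters."""
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # Simplification: if a prefix repeats, only the first occurrence is kept.
            layers.setdefault(part[0], part[1:])
    return layers


def changed_layers(before, after):
    """Return the sorted list of layer keys whose content differs between two InChIs."""
    a, b = inchi_layers(before), inchi_layers(after)
    return sorted(key for key in set(a) | set(b) if a.get(key) != b.get(key))


def stereo_preserved(before, after):
    """True if none of the stereo layers changed during layout."""
    return not STEREO_LAYERS.intersection(changed_layers(before, after))


# Hypothetical "after layout" InChI with the parity of centre 9 flipped by hand.
flipped = DOMOIC_INCHI.replace("/t9-", "/t9+")
print(changed_layers(DOMOIC_INCHI, flipped))
print(stereo_preserved(DOMOIC_INCHI, DOMOIC_INCHI))
```

In practice a plain string comparison of the two InChIs answers the pass/fail question. The layer breakdown is only there to show where a layout algorithm went wrong.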

layout1layout2layout3layout4


There has been encouragement that we look at Freebase as an additional online resource to integrate to. In terms of chemical entities some of the Wikipedia structure collection has made its way onto Freebase and has been enhanced to include InChIs and SMILES. It’s not clear to me whether the InChIs on Freebase are all obtained FROM Wikipedia or were layered on later onto Freebase. So, I approached the Freebase group and asked if they could provide me a dump of the InChI strings and the SMILES strings together with the associated Freebase IDs and the chemical names. In this way we would be able to generate SDF files for depositions and end up with the structures (converted from InChIs and SMILES) as well as the associated chemical names and Freebase IDs. Simple idea, right?

So, we converted InChIs and SMILES and generated the depositions. Freebase links now show up in the Data Sources section and, if you put your cursor over the GUID, you see an image of the page and can click through to the record on Freebase. See the image above. The Freebase GUID for Benzene is here: #9202a8c04000641f800000000000ac66

All seems well. I have a question though…I look at a structure like Dapagliflozin on Wikipedia here and see full stereochemistry explicitly defined in the name and in the image. However, on Freebase I note that the stereochemistry is NOT explicitly defined in the InChI. The InChI is:  1/C21H25ClO6/c1-2-27-15-6-3-12(4-7-15)9-14-10-13(5-8-16(14)22)21-20(26)19(25)18(24)17(11-23)28-21/h3-8,10,17-21,23-26H,2,9,11H2,1H3/t17?,18?,19?,20?,21-/m0/s1
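A quick way to spot records like this before deposition is to look at the `/t` layer of each InChI and count the centres marked `?` (or `u`), which indicate stereocenters that are present but not defined. The sketch below does this with a regular expression. It is a screening aid under our own naming, not a full InChI parser, and it works whether or not the string carries the `InChI=` prefix.

```python
import re

DAPAGLIFLOZIN_FREEBASE = ("1/C21H25ClO6/c1-2-27-15-6-3-12(4-7-15)9-14-10-13(5-8-16(14)22)"
                          "21-20(26)19(25)18(24)17(11-23)28-21/h3-8,10,17-21,23-26H,2,9,"
                          "11H2,1H3/t17?,18?,19?,20?,21-/m0/s1")


def stereocenter_parities(inchi):
    """Return {atom_number: parity} from the /t layer; parity is '+', '-', '?' or 'u'."""
    # The first two slash-separated parts are the version and the formula, so skip them.
    for layer in inchi.split("/")[2:]:
        if layer.startswith("t"):
            pairs = re.findall(r"(\d+)([+\-?u])", layer[1:])
            return {int(number): parity for number, parity in pairs}
    return {}


def undefined_stereocenters(inchi):
    """Return the sorted atom numbers whose parity is undefined ('?') or unknown ('u')."""
    parities = stereocenter_parities(inchi)
    return sorted(atom for atom, parity in parities.items() if parity in "?u")


print(undefined_stereocenters(DAPAGLIFLOZIN_FREEBASE))
```

For the Freebase InChI above this flags centres 17, 18, 19 and 20, leaving only centre 21 defined, which is why the converted structure does not match the fully defined one already curated on ChemSpider.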

So, when we take the InChIs and the chemical names, convert the InChIs and deposit the chemical structures we end up with a “destruction” of the curation work we have done on ChemSpider. We end up with TWO structures for Dapagliflozin, not one (See below)

freebase2

And now we need to start the curation efforts AGAIN to clean out misassociations of names and structures. So, what we are going to do is delete the deposition of Freebase structures and redeposit without the chemical names. In this case the outlinks to Freebase will be in place but the structures will not be found by a name search UNLESS the Freebase GUID is associated with an already curated name-structure pair that is coincident with the Freebase name.

I can say that the Freebase team were a pleasure to work with and, in theory, once the Wikipedia curation project is finished the SMILES and InChIs on Freebase will be correct and such linkages back to Freebase will be easier, and correct. In the meantime I am interested in where the Freebase SMILES and InChIs are coming from (I think a lot of them are from Wikipedia but am not sure) and we are going to make certain on our side that we remove the chemical names so as to not decrease the quality of our curation efforts.


I’ve posted previously about embedding structure images and spectra into blogs and webpages. One of the side effects of this is that for structure images specifically the ChemSpider record is linked back to the webpage that the structure is embedded into. Structures are embedded in various places now into wikis and blogs. An example of 11 embedded structures is shown here.

When Arvin Moser wrote in his blog about Letrozole and embedded the structure image into his blog post a link BACK to his blog post was created in the Data Source table. See the image below.

letrozole

With this capability, as more people embed structures from ChemSpider into their online pages/blogs more of the internet will become structure searchable and ultimately linked. It does not require adding InChIs to webpages (though that is encouraged for indexing by search engines).

(Caveat: The system is not yet optimal and we are working on filtering out comments on blogs that presently get added as additional links. All “doubles” will be filtered out later)


I’ve previously posted about the work going on regarding the NMR Game…now morphed to the spectral game and described in detail by JC Bradley. We’ve been working hard to increase the number of spectra available as part of the game (now in the 100s of spectra!) and Andy has been working hard to improve the flow of data. The original structure images have been replaced with ChemSpider structure images and we have delivered a web service to allow Andy to continue to update the spectral collection as more data are added to the database.

When users see issues with the spectra they get to leave comments regarding their observations. This can be very valuable for us to curate the spectral data. This will allow us to perform game-based crowdsourcing of the spectral data and the feedback is already of value.

We have about another 30 spectra to add to the present collection of spectral Open Data and then we’ll take a break and I’ll be approaching the spectrometer vendors and a few other friends to see whether they have any data to contribute to the game. We are already considering adding the ability to add a “Company Logo” to be associated with a spectrum so that the vendors/contributors get fair recognition for their contribution to the game. If you are interested in providing data we will upload it for you. Contact us at infoATchemspiderDOTcom.

JC Bradley has now uploaded a short tutorial to YouTube regarding how to play the game and I have embedded it below. JC’s also announced a prize for the best player. Go test your skills…


Jean-Claude Bradley has recently posted about an NMR Game running on Second Life. Read his blog for details but I excerpt some of the comments here:

“Andy and I brainstormed some new chemistry games that we could introduce to Second Life to leverage our recent tools. One of the applications is the NMR game. By combining the orac molecule rezzer, the SL spectral viewing tool and ChemSpider Open Data spectra I think we have a pretty good game.

The idea is simple: click on the molecule that is represented by the spectrum. If it is correct you get 2 points and get another spectrum. You lose a point by clicking on an incorrect molecule. After going through all the spectra your score gets posted on the web to a top10 list. For equal scores the best time takes it.”
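The scoring rules in the excerpt are easy to express in code. Here is a minimal sketch: 2 points for a correct click, minus 1 for an incorrect one, and a top 10 list ordered by score with the faster time winning ties. The class and function names are ours and are not taken from the Second Life implementation.

```python
from dataclasses import dataclass, field

CORRECT_POINTS = 2      # awarded for clicking the right molecule
INCORRECT_PENALTY = 1   # deducted for clicking a wrong molecule


@dataclass
class PlayerResult:
    name: str
    seconds: float                               # total time taken for the run
    clicks: list = field(default_factory=list)   # True for a correct click, False otherwise

    @property
    def score(self):
        return sum(CORRECT_POINTS if ok else -INCORRECT_PENALTY for ok in self.clicks)


def top_ten(results):
    """Order by highest score first; for equal scores the best (lowest) time takes it."""
    return sorted(results, key=lambda r: (-r.score, r.seconds))[:10]


players = [
    PlayerResult("alice", 95.0, [True, True, False, True]),   # 2 + 2 - 1 + 2 = 5
    PlayerResult("bob", 80.0, [True, False, True, True]),     # also 5, but faster
    PlayerResult("carol", 60.0, [True, False, False, True]),  # 2 - 1 - 1 + 2 = 2
]
for rank, player in enumerate(top_ten(players), start=1):
    print(rank, player.name, player.score, player.seconds)
```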

So, here at ChemSpider we are delivering spectra as Open Data to help with the game. And we’re happy to do so. It’s always been our intention to have ChemSpider provide value like this. ANY registered user can upload spectra to the ChemSpider website. The details are outlined here (I just noticed the interface has changed since I wrote that but you should still be able to follow the process). We need the spectra to be in JCAMP format and if you want them to be available for the game, and for people to download, they MUST be declared as Open Data.

Right now we have 100s of spectra. You can find them here. But we need more. Much more! We’d like you to contribute them. If you don’t want to upload them yourself then contact us directly and we will process and upload them for you. We need the data and the name/structure of the associated molecule.

And how will the game be used on these spectra? The game will be used to “curate and validate” the spectra. As the game is being played a score of how many people say it is correct will be kept. And of course what is wrong. Based on these scores our curators will be directed to “problematic spectra” for their attention. This is true crowdsourcing and a great way to do spectral validation.
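As a rough sketch of how such scores could direct curators, the code below tallies correct and incorrect answers per spectrum and lists the spectra that players get wrong most often. The minimum number of answers and the cut-off fraction are hypothetical values picked for illustration; the real thresholds would need tuning against actual game data.

```python
from collections import Counter


def problematic_spectra(answers, min_answers=5, max_correct_fraction=0.5):
    """Return (spectrum_id, correct_fraction) pairs that deserve a curator's attention.

    answers: iterable of (spectrum_id, was_correct) pairs collected from the game.
    A spectrum is flagged once it has at least min_answers answers and no more than
    max_correct_fraction of them were correct. Worst offenders come first.
    """
    total, correct = Counter(), Counter()
    for spectrum_id, was_correct in answers:
        total[spectrum_id] += 1
        if was_correct:
            correct[spectrum_id] += 1
    flagged = []
    for spectrum_id, count in total.items():
        fraction = correct[spectrum_id] / count
        if count >= min_answers and fraction <= max_correct_fraction:
            flagged.append((spectrum_id, fraction))
    return sorted(flagged, key=lambda item: item[1])
```

A low score does not prove a spectrum is wrong, since some molecules are simply hard to tell apart, so the list is a queue for human review rather than an automatic verdict.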

We would like the spectral collection to grow and welcome contributions from anyone. They do NOT have to be just NMR. They can be IR, MS, Raman etc too. Ultimately a Spectral game will be unveiled. Please consider ChemSpider as a repository for your data as it will benefit the community of chemists and, in particular, the process of teaching students and allowing them to “game their way” through the process. Watch where this goes…it’s VERY interesting to consider how it can improve…there is an NMR game website in development so you won’t have to go just to Second Life.


Following on from my recent post about “Why are structures like YouTube Videos?” I am now asking the same question about spectra.

The answer is simple. When people have deposited data as OPEN DATA on ChemSpider we are now providing the ability to embed the spectral data and display at other sites. This is different in that we are not just showing images but real live spectra in the JSpecView Java Applet so Java must be installed. Thanks to Cameron Neylon for asking the question about whether we could provide the service. Glad to help…

If all is well you should see an IR spectrum associated with the ChemSpider record here. In order to EMBED spectra simply log in to ChemSpider, find an Open Data spectrum of interest (you could browse http://www.chemspider.com/spectra.aspx) and then click on EMBED (left hand corner below the spectral image). Do a left click to see additional features of JSpecView. We DO have some minor work to do with spectral plot reversal and improving the zoom display but we’re getting there. Enjoy.


Many of us using ChemSpider are looking for compounds of interest to us. In some cases those chemical entities are not of fleeting interest but something that we are working on in our research, have a hobbyist interest in, or have some other reason to track activity on.

With this in mind we have now allowed any user to “monitor an article”. What this means is that when new information is associated with an article (new outlinks, new forms of data, new publications, associated spectra etc) an email will be sent to you making you aware of the new information. In order to monitor an article simply log in as a registered user and click on the “Monitor This Article” button. If you want to discontinue in the future simply return to the article and click on “Cancel Article Monitor”. We’d like a few people to help test this process for us and provide us with feedback. Keep your eye on those molecules of interest to you with Article Monitoring.



I think the press release here, and copied below, speaks for itself…When I posted the blog about the need for an InChIKey Resolver it resulted in a great discussion and series of comments. Since that time I’ve had many discussions with interested parties about the need. The RSC and ChemSpider share a mutual view regarding the need for the InChI resolver and we are honored to be entrusted to develop a resolver for the community. Will it be “the” resolver..only time will tell. There are various ways to deliver a system to do this so we’ll start here and garner feedback. There are many ways to “hunt a Welshman” (I can say that since I’m Welsh!) so there may be other efforts to deliver a resolver coming too.

“RSC and ChemSpider develop InChI Resolver

01 December 2008

An InChI Resolver, a unique free service for scientists to share chemical structures and data, will be developed by a collaboration between ChemZoo Inc., host of ChemSpider, and the Royal Society of Chemistry. 

Using the InChI - an IUPAC standard identifier for compounds - scientists can share and contribute their own molecular data and search millions of others from many web sources. The RSC/ChemSpider InChI Resolver will give researchers the tools to create standard InChI data for their own compounds, create and use search engine-friendly InChIKeys to search for compounds, and deposit their data for others to use in the future. 

The future of publishing

‘The wider adoption and unambiguous use of the InChI standard will be an important development in the way chemistry is published in the future, and the further development of the semantic web,’ comments Robert Parker, Managing Director of RSC Publishing. 

The InChI Resolver will be based on ChemSpider’s existing database of over 21 million chemical compounds and will provide the first stable environment to promote the use and sharing of compound data. ‘ChemSpider hosts the largest and most diverse online database of chemical structures sourced from over 150 different data sources’ adds Antony Williams of ChemSpider, ‘We have embraced the InChI identifier as a key component of our platform and the basis of our structure searches and integration path to a number of other resources. We have delivered a number of InChI-based web services and, with the introduction of the InChI Resolver, we hope to continue to expand the utility and value of both InChI and the ChemSpider service.’ 

Society support

‘As a learned society publisher it is important that RSC provide support for the standard and contribute to the development of the resolver, which promises to be a valuable service for the chemical science community.’ continues Parker, ‘our collaboration with ChemSpider on this project will enable this to be delivered quickly and sustainably.’ 

The imminent adoption of the InChI generation protocol will be a welcome and necessary step to the wider adoption of the InChI standard.”


Frequent users of ChemSpider might have noticed a change in layout of the record view pages of late. As we layer more information onto a record view page (EPI Suite predictions, SimBioSys LASSO scores, spectral data, MORE predictions to come) the record view pages become increasingly heavy. As a result we have had to balance increasingly heavy pages against the user experience. Since we have recently added the ability to perform structure searching on Pubmed and are now in the process of adding a new update for Patent searching we have chosen to hide the Data Source outlinks until you choose to see them.

So, if you are looking for original data sources and a list of potential commercial vendors please click on the button indicated below to fold out the list. Commercial vendors are indicated as discussed previously here.


I’ve been in a number of conversations of late about how Mass Spectrometrists might use ChemSpider and get value from our efforts. I recently gave a short PowerPoint presentation to a group about what ChemSpider is and the types of queries that ChemSpider users can conduct today. I’ve posted the presentation to SlideShare as usual so people can access it there if they are interested.

I’ve started wrapping my head around how we could provide more value to some of our users with regard to MS, HPLC and NMR. One of the things we could do is to use our known text mining skills to look for NMR or MS (LCMS) articles based on the use of the terms in the title or abstract and then use those terms as tags against chemical structures in the abstract/title. So, from titles such as “High-Performance Liquid Chromatographic Method for Determination of Phenytoin in Rabbits Receiving Sildenafil” from our collaborator Libertas Academica we would extract HPLC and Phenytoin and connect the article to the structure as we have done here. In this way the article would be searchable by structure and associated analytical technique and we could even look at extracting the detailed experimental approach from Open Access articles. More work but feasible. Any comments???
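To make the tagging idea concrete, here is a minimal sketch of dictionary-based extraction from a title. It is only an illustration: the lookup tables, their contents and the function name are hypothetical, and a real system would need much larger dictionaries and proper chemical name recognition rather than simple string matching.

```python
import re

# Hypothetical lookup tables, kept tiny for illustration.
TECHNIQUE_TERMS = {
    "HPLC": ["high-performance liquid chromatograph", "hplc"],
    "NMR": ["nuclear magnetic resonance", "nmr"],
    "MS": ["mass spectrometr", "lcms", "lc-ms"],
}
COMPOUND_NAMES = ["phenytoin", "sildenafil"]


def extract_tags(title):
    """Return (technique tags, compound names) found in an article title."""
    lowered = title.lower()
    techniques = sorted(
        tag
        for tag, patterns in TECHNIQUE_TERMS.items()
        if any(pattern in lowered for pattern in patterns)
    )
    compounds = [
        name
        for name in COMPOUND_NAMES
        if re.search(rf"\b{re.escape(name)}\b", lowered)
    ]
    return techniques, compounds


title = ("High-Performance Liquid Chromatographic Method for Determination "
         "of Phenytoin in Rabbits Receiving Sildenafil")
print(extract_tags(title))
```

With these toy dictionaries the title is tagged with HPLC and matched to both phenytoin and sildenafil, so the article could be attached to either structure along with the analytical technique.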


Readers of this blog will know we have a focus on enabling chemists to source information via both Open AND Closed access publishers with the aim, ultimately, of providing a way to perform structure and substructure searching of these articles. This work is well underway.

If you visit our Literature Search Page you will see that we have recently added the ACS AuthorChoice Free Access articles to the index and we will continue to index on an ongoing basis. There are very few ACS AuthorChoice articles to search, but the usual validation search for “Taxol” does turn up one hit.

Herding Nanotransporters: Localized Activation via Release and Sequestration of Control Molecules (Nano Lett. 2007 Volume 8 Issue 1 Page 221) - American Chemical Society

R. Tucker, P. Katira, H. Hess

… 1 mM MgCl, 1 mM EGTA, pH 6.9) containing 10 micromolar taxol for stabilization and kept at room temperature (20 C). Caged-ATP and …


Users of ChemSpider might have noticed some performance issues in the past 2-3 weeks with our web services, service availability and speed of searches. I put my hand in the air and say “Yup, acknowledged”. Hopefully they have not been too disruptive BUT it is ultimately for the overall benefit of the service. We have been streaming in 8 MILLION links to Pubmed in order to make Pubmed structure and substructure searchable. We are NOT rolling this out with full fanfare yet but I do want to explain the performance issues you might be experiencing. We work on Microsoft technology and while we are advocates for the platforms of .NET, IIS and SQL Server we definitely are putting them under pressure as we keep expanding the database and adding more value. We have thoughts about how to resolve this but want to finish populating the tables first.

The upside… the majority of links are already in place. For an example visit a structure and look for PubMed as a data source and click on one of the links. For example, for Valium here you will see in the datasource table a series of Pubmed IDs next to the PubMed datasource…

  16971504, 17673, 874970, 406430, 17881, 327854, 879884, 577681, 560225, 195649, …

These will link you out to PubMed directly. Try it out…

Now, do we have implementation issues? YES. The lists of external IDs can be long so right now we show only the first 10. We will deal with display of the others shortly. We need to provide a way to curate out “junk” entries. For example, “methyl” is on ChemSpider as a fragment and has links to PubMed IDs… you’ll see why if you click them: it was done with text mining. These issues will be resolved but for now we announce that PubMed is structure and substructure searchable via ChemSpider. We will explain how we did it shortly but for now we will acknowledge the massive contribution of our colleagues at SureChem. More to come…
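As a rough illustration of the "first 10" display described above, here is a small sketch that turns a list of PubMed IDs into outlinks and reports how many are left undisplayed. The function name and the limit constant are hypothetical, and the URL pattern is the current public PubMed address form, not necessarily the link format ChemSpider generates.

```python
# Assumption: current public PubMed URL pattern, not ChemSpider's own link format.
PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
MAX_SHOWN = 10  # the post says only the first 10 external IDs are displayed


def pubmed_links(pmids, limit=MAX_SHOWN):
    """Return (links for the first `limit` IDs, number of IDs not shown)."""
    shown = pmids[:limit]
    links = [PUBMED_URL.format(pmid=pmid) for pmid in shown]
    hidden = max(len(pmids) - limit, 0)
    return links, hidden


# The ten IDs quoted above for Valium; the remainder of the list is not given.
valium_pmids = [16971504, 17673, 874970, 406430, 17881,
                327854, 879884, 577681, 560225, 195649]
links, hidden = pubmed_links(valium_pmids)
for link in links:
    print(link)
print(f"{hidden} more not shown")
```

Returning the count of hidden IDs alongside the links is one simple way to let a page say "and N more" until a fuller display is available.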


I’ve had a number of questions about the presentation I gave at ACS Philly last week about document markup. The phrase I keep hearing is “very disruptive” followed by the question “will authors do more work and what’s in it for them?”.

The presentation here outlines the general concept that I talked about…

The basic concept I presented is as follows, with a focus on Chemistry Articles.

A lot of effort is being expended in “text-mining” publications, post-publication, to index these articles and make them searchable not only by text but by the specific language of chemistry, chemical structures. We are specifically asking the question “why extract chemical structures from articles using chemical name conversion approaches and chemical image conversion tools when the structures in the article were ORIGINALLY machine readable?”

We are considering a system whereby authors are asked to contribute to the availability of a free online service for performing structure and substructure-based searches of chemistry articles. While the submission of journal articles is already a lot of work (I know from experience of authoring/co-authoring about 10 a year) we hope that authors will support a service whereby they can upload their own articles to a “validation and mark-up service”. The upload capabilities will support upload of the primary document, chemical structures in standard formats and supplementary information of various types (to be defined).

This system will perform the following services:

1) semi-automated markup of a document - title, author(s), abstract and additional dictionary-based terms plus the ability to use the NLM-DTD markup
2) identification of chemical names and conversion to structures in an automated fashion
3) conversion of structure IMAGES to connection tables using optical structure recognition software (either commercial or open source)
4) ask authors to confirm whether the converted structures are appropriate
5) provide a structure validation service for submitted molecules checking for “accurate representation”
6) Deposit all structures associated with an article onto ChemSpider but under embargo. Associate the article Title, authors and “abstract snippet” with all structures.
7) Issue a set of ChemSpider IDs for the author to submit to the publisher with the article
8) When a publication has passed through review the author can release the structures from embargo using a DOI or an article URL (more common for Open Access articles)
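Steps 6 to 8 above (deposit under embargo, issue IDs, release on publication) can be sketched in a few lines. This is only a toy model under stated assumptions: the class and method names are hypothetical, the sequential integers stand in for real ChemSpider IDs, and the DOI shown is a placeholder.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Deposit:
    title: str
    authors: list
    structure_ids: list
    embargoed: bool = True   # step 6: deposited structures start under embargo
    article_ref: str = ""    # DOI or article URL, recorded on release


class EmbargoRegistry:
    """Hypothetical registry; sequential integers stand in for ChemSpider IDs."""

    def __init__(self):
        self._next_id = count(1)
        self._by_id = {}

    def deposit(self, title, authors, structures):
        """Steps 6 and 7: store structures under embargo and issue their IDs."""
        ids = [next(self._next_id) for _ in structures]
        record = Deposit(title, list(authors), ids)
        for structure_id in ids:
            self._by_id[structure_id] = record
        return ids

    def release(self, structure_id, article_ref):
        """Step 8: lift the embargo for the whole article using a DOI or URL."""
        if not article_ref:
            raise ValueError("a DOI or article URL is required to release")
        record = self._by_id[structure_id]
        record.embargoed = False
        record.article_ref = article_ref

    def is_public(self, structure_id):
        return not self._by_id[structure_id].embargoed


registry = EmbargoRegistry()
ids = registry.deposit("Example title", ["A. Author"], ["C", "CC"])
registry.release(ids[0], "10.1000/example")   # placeholder DOI
print(ids, registry.is_public(ids[1]))
```

In this sketch the embargo is held per article rather than per structure, so releasing with one issued ID makes every structure from that article public at once.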

The result of this project will be a way for publishers to link their articles directly to a free access chemistry database and use a series of web services to enable other capabilities (to be defined). It will also allow articles in Open Access and non-Open Access publications to be searchable by the “language of chemistry”.

This is only a slice of the overall project but I think it may be of interest relative to the comments you have made below.

Parts of this were shown last week at Drexel University and a particular snippet is available online here:

We are also going to provide a Microsoft Word add-on which will allow users to prepare articles for publishing using similar technologies.

We think this IS disruptive… what say you?


A link to the presentation I gave at ACS-Philly yesterday in Rajarshi Guha’s session is provided below. A lot changes between writing an abstract and writing a talk so I had the chance to highlight an increasing number of papers ALREADY using ChemSpider as one of their platforms of choice for sourcing information.

Can a Free Access Structure-Centric Community for Chemists Benefit Drug Discovery?

ChemSpider is an online database of over 20 million chemical structures assembled from well over a hundred data sources including chemical and screening library vendors, publicly accessible databases and resources, commercial databases and Open Access literature articles. Such a public resource provides a rich source of ligands for the purpose of virtual screening experiments. These can take many forms. This work will present results from two specific types of studies: 1) Quantitative Structure Activity Relationship (QSAR) based analyses and 2) In-silico docking into protein receptor sites. We will review results from the application of both approaches to a number of specific examples. QSAR analyses utilizing the ChemModLab environment for assessing quantitative structure-activity relationships will and screening using a molecular surface descriptor model.

Link to presentation
