Archive for the ChemSpider Chemistry Category

Part 4 in the exposure of new ChemSpider functionality from the recent update. We have been using the ACD/Labs Structure Drawing Applet on ChemSpider for the past three years. It’s been a great piece of technology and was one of the first structure drawing applets ever released, possibly the very first. However, it’s old technology and our users have encouraged us to adopt a more modern applet. We are very fortunate to have been granted the right to use the Symyx JDraw applet and have had the pleasure of working with Keith Taylor and James Jack. For the time being we have left both applets online for users to try out and provide feedback on. You can choose the ACD/Labs applet or JDraw via the interface, as shown below. Feedback is welcome.

[Image: Symyx JDraw]

Buy me a Coffee

JC has given a great overview of how students might want to use ChemSpider for the purpose of chemical information retrieval on the internet. JC’s course lecture thoroughly exercises ChemSpider, in real time, to do searches across the internet. He posted his seminar to Scivee here and I have embedded the lecture below. It’s a good talk for students and I encourage you to share it and review how ChemSpider can be used in your classwork and in your laboratories.


What’s your favorite flavor of mercury acetate: on Wikipedia here, on CAS Common Chemistry here, or on ChemSpider here?

How would you represent this structure if you were to draw it as a 2D diagram?

[Image: mercury acetate]


[Image: roadrunner]

As an active member of the Wikipedia Chemistry team I continue to be impressed with the dedication and commitment that the members have to improving the quality AND quantity of information available on Wikipedia for chemists. The number of lost hours of sleep freely given to the benefit of Wikipedia, and in this specific case to the chemistry community, is immense. The number of “Compound Pages” on Wikipedia dedicated to drugs/chemicals has continued to grow and, despite a sincere effort on our part to keep everything linked up from ChemSpider to Wikipedia, it’s a little like chasing the Road Runner…we’re always behind!

We have been working with the WikiChem team of late to embed links from Wikipedia back to ChemSpider. I am humbled to know that our hard work to establish ChemSpider as a source of quality information has reached a level of trust such that Wikipedia now links from the ChemBoxes out to ChemSpider. The links are being updated on an ongoing basis, with hundreds of new links already established and more on the way. Wikipedia User: Beetstra has written a bot that is inserting ChemSpiderIDs across the database (see below), and we ARE rigorously checking all of the links. The bot used a file that we generated on our side showing links to Wikipedia from ChemSpider.

[Image: beetstra]

We will then be able to generate a list of all ChemBoxes/DrugBoxes without links from Wikipedia to ChemSpider. We will make the links on our side, manually curating the structures, and then hand back a file to finish all linking. At this point we will have the backfile under control and can perform ongoing updates as new compound pages are created on ChemSpider. If we curate and find errors on Wikipedia or ChemSpider, making a few manual edits is easy.

There are very dedicated teams on Wikipedia and ChemSpider carefully poring over data with their robots and eyeballs to create a linked data set of quality chemistry. It’s long, tedious AND important work. When it’s done we will have an expanded set of data to semantically link from RSC articles when we do markup.


Last week I had the pleasure of being on an agenda with a number of people whose work I applaud and with whom I genuinely enjoy spending time and sharing thoughts about “what if?” Martin Walker, one of the people I collaborate with on Wikipedia, invited me to speak in his session “Publishing and Promoting Chemistry in the Internet Age”. Martin gave an introduction to the session and spoke about Chemistry on the Internet. Beth Brown gave an overview of the Chemist’s Toolkit for Publishing and Promoting your work on the Internet. I followed with an overview of what’s going on with ChemSpider and the issues of connectedness and quality of chemistry on the internet. JC Bradley spoke about transparency and Open Notebook Science. My hat’s off to Martin for arranging the speakers in that order. Considering we didn’t coordinate our talks, it was an excellent trajectory throughout the session and very much an integrated overview of activities regarding chemistry on the internet.

My talk is posted on SlideShare here and is available below. Any comments and questions are welcomed.

Beth Brown has her talk online here and JC Bradley will post his online here.

JC Bradley and I had a good talk about ways we can collaborate more closely on Open Notebook Science. We have a path forward so that ChemSpider can provide additional support, and we will discuss the details offline.


In the history of developing ChemSpider we have undertaken some fairly demanding curation activities, for example on Vancomycin and Ginkgolide B. Now we are in the middle of trying to resolve the structure of Digitonin. There are 25 (!) skeletons for digitonin on ChemSpider from various sources, and there were eleven compounds on ChemSpider called Digitonin. We have been able to clean most of these by removing partial stereochemistry. We are now left with three structures…simply search Digitonin on ChemSpider and you will see three structures with full, but different, stereochemistry.

What is a “correct structure” is a matter of assertion. Who says what is correct? What publications, what techniques, what database, who says it’s correct? Structures have timelines…they can change with time as new analytical techniques are applied.

This is a call to the community to help resolve the existing confusion around Digitonin on ChemSpider…but the same confusion exists in all the other databases too, and there are discrepancies between Wikipedia, DSSTox, ChEBI, PubChem and so on. So, my call to the community…what is the correct structure of Digitonin, and based on what assertions?

With this information in place, and assuming communal agreement on the conclusion, we can go help clean up the other databases. Help!


For those of you who have been using ChemSpider for the past few months you will be aware that historically we had an integration in place to SureChem’s Patent Portal. A few months ago that integration was unfortunately broken as SureChem improved their service. Also, we were un-synchronized with their growing set of chemical structures as they updated their patents. The previous integration was very limited in nature anyway as it simply showed the presence of patents associated with the ChemSpider structure in the SureChem database. Certainly a more ideal solution is the one that we introduced just in time for the ACS meeting in Washington.

The new solution not only lists the number of patents containing the chemical compound shown in the ChemSpider record but also shows the first 10 patents, by title, and provides direct link-throughs to the patents on SureChem. This is a much improved integration and we hope you enjoy it. The next stage is to deposit the latest SureChem structure collection, which has grown significantly since our last deposition. Thanks to our collaborators at SureChem for offering you, our users, access to their service.

[Image: xanaxpatent]

Reblog this post [with Zemanta]


It was a busy week at the ACS meeting in Washington. I gave three presentations; the titles, abstracts and links to Slideshare are given below:

Oops and Downs of Resolving InChIs For the Chemistry Community (Link to Slideshare)

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.

ChemSpider: Building a knowledge-based community for chemists using social and data networking technologies (Link to Slideshare)

In less than 2 years ChemSpider has become one of the primary online resources for chemists providing access to an unsurpassed aggregate of free-access knowledge and data. ChemSpider was developed with the intention of providing a structure centric community for chemists that would be enhanced by data depositions, curations and annotations by the community. The system presently hosts over 21.5 million chemical compounds from over 200 data sources. Working with a network of advisors, collaborators and data providers ChemSpider has created a unique resource of integrated information for chemists. These efforts have enabled us to support the curation of the Wikipedia chemistry pages, the production of a community supported Open Access chemistry journal and provision of web services integrated to spectrometer systems distributed around the world. This talk will provide an overview of how ChemSpider utilized social and data networking to create a community for chemistry.

Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources (Link to Slideshare)

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry known as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked-up and exposed to the public within a few minutes in many cases, making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.


My colleague Will originally developed the ChemRefer service. When ChemSpider started up Will brought the ChemRefer technology and joined us to help expand the capabilities of our services. We integrated ChemRefer and released the text searching capabilities. Will indexed more and more journals and grew the index by 100s of thousands of articles. Unfortunately the downside was that the speed of the search decreased dramatically. Also, we kept hearing the comparison with the Google service and that their advantage was in their citations. So, Will has taken a few months off from indexing and has focused his efforts on developing his technologies to dramatically improve the speed of searching as well as implementing a system for recognizing citations. The system has been made available online for beta-testing just in time for the ACS meeting here in Salt Lake City BUT it is not yet integrated into ChemSpider.

I have performed some basic tests focused on searching chemical names initially. The literature search on ChemSpider has a lot more journals indexed but in order to perform the comparison I searched ONLY the RSC and Journal of Biological Chemistry articles since that is all we have indexed so far on the new system. The search results were as follows. The numbers compare number of hits for the old versus new literature search. The new search has indexed the latest RSC and JBC articles also so in theory should provide more hits.

Searching on Taxol: 626 hits found in 22 seconds (OLD) vs 717 hits in 1 second (NEW)

Searching on phenolphthalein: 47 hits found in 5 seconds (OLD) vs 1514 hits in 1 second (NEW)

Searching on benzene: 846 hits found in 75 seconds (OLD) vs 15260 hits in 4 seconds (NEW)

Clearly the searches are MUCH faster with the new system, but it is also returning many more results. These are very early results and we will explain more about the system, the results and our future development shortly…
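To put rough numbers on the comparison, the figures quoted above can be turned into speed-up and hit-count ratios. This is a minimal sketch: the hit counts and timings are copied from the three searches listed in this post, while the names `RESULTS` and `summarize` are made up for illustration.

```python
# (hits, seconds) for the old and new literature search, as quoted in this post.
RESULTS = {
    "Taxol": {"old": (626, 22), "new": (717, 1)},
    "phenolphthalein": {"old": (47, 5), "new": (1514, 1)},
    "benzene": {"old": (846, 75), "new": (15260, 4)},
}


def summarize(results):
    """Return {term: (speed_up, hit_ratio)} comparing new against old."""
    summary = {}
    for term, runs in results.items():
        old_hits, old_seconds = runs["old"]
        new_hits, new_seconds = runs["new"]
        speed_up = old_seconds / new_seconds   # how many times faster
        hit_ratio = new_hits / old_hits        # how many times more hits
        summary[term] = (speed_up, hit_ratio)
    return summary


if __name__ == "__main__":
    for term, (speed_up, hit_ratio) in summarize(RESULTS).items():
        print(f"{term}: {speed_up:.1f}x faster, {hit_ratio:.1f}x the hits")
```

On these three queries the speed-up ranges from 5x to 22x. Three data points are far too few to generalise from, and the timings are whole seconds, so the ratios are only indicative.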

Try out the new system here for now and send us feedback at info@chemspider.com. Thanks


We continue to expand the ChemSpider Database with new depositions sourced from various collaborators. We are especially privileged to have received the RSC’s structure collection associated with their Project Prospect articles and have spent a couple of weeks working with the data prior to depositing onto ChemSpider. During the deposition process we have formed the link between the chemical structures and their articles via a DOI link. We have been able to deposit the title, an associated author and the DOI. In this way we have been able to link thousands of chemical structures to articles on the RSC website. On each record associated with an RSC article you will see both a link from the data source table and a link via DOI from the reference, as shown here and in the figure below.

[Image: rsc_link]

With the RSC depositions came many beautiful structures – highly symmetric, complex and just plain “pretty” to a chemist. But a high level of complexity also arrived with the collection. While many InChIs could be converted to their associated connection tables, the act of converting the InChIs could add additional stereochemistry and structure cleaning could change stereochemistry, so this was a long, tedious and mostly manual process I’m afraid. Nevertheless, it is a wonderful addition to the ChemSpider database and our sincere thanks, on behalf of the community too, go to the Royal Society of Chemistry for sharing their data with us. The InChIs will be deposited into the InChI Resolver shortly.
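The DOI linking described in this post boils down to storing a title, an author and a DOI, and turning the DOI into a resolvable link. The sketch below is only an illustration of that idea, not the ChemSpider deposition code: `doi_url` and `reference_entry` are hypothetical helpers, the DOI and article details are placeholders, and the resolver address is the public `https://doi.org/` service.

```python
from urllib.parse import quote


def doi_url(doi):
    """Turn a bare DOI (optionally prefixed with 'doi:') into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep '/' readable; percent-encode anything unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


def reference_entry(title, author, doi):
    """Bundle the three deposited fields with the derived link."""
    return {"title": title, "author": author, "doi": doi, "url": doi_url(doi)}


if __name__ == "__main__":
    # Placeholder values, not a real article.
    entry = reference_entry("Example article title", "A. Author",
                            "doi:10.1234/example.5678")
    print(entry["url"])
```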


There are some interesting articles showing up on ChemSpider from across the blogosphere. We have just added to our list of high priorities to generate an RSS feed of structures, short descriptions and ChemSpider IDs so that anyone can access them. When we add new descriptions we will add snippets to the RSS feed.

New Articles include:

Teen Chemist and Splenda

A Discussion about the Synthesis of Spirangien A from the TotallySynthetic Blog by Paul Docherty

A Discussion about the Synthesis of Omaezakianol from the TotallySynthetic Blog by Paul Docherty


[Image: ons1]

We’ve been working with Jean-Claude Bradley and his Open Notebook Solubility Challenge group to assist where we can. This has included enhancing some of our services (though there is more work to be done…), populating data into ChemSpider and, now, linking us up to the Data Tables built by Andy Lang (of The Spectral Game fame…we’re quite a team).

The Open Notebook Solubility Challenge is described here. The present list of compounds for which we have created the integration described below is here. When you open that link you’ll see the first bunch…notice the little icons showing patent links, Wikipedia links and the presence of spectra on those records.

What we have done now is deposit the links into the Data Source tables for these compounds and provide the direct link to the ONS tables. They can be viewed WITHOUT leaving the site simply by hovering over the link…OR you can click on the link to view the data directly. An example of the link view is shown below. To find these tables simply look up the Open Notebook Solubility Challenge data source in the table.

 

[Image: ons2]


Late nights and ailing computers aren’t conducive to the best of work. So, when I posted about the clean chemical structure I obtained using ChemDraw I was genuinely excited about the quality of clean-up that was produced. However, I slept on it and reminded myself to check that the output InChI was equivalent to the input InChI, as my experience with structure cleaning is that it can swap stereocenters.

So, I returned to that particular problem and looked specifically at the InChI string fed to ChemDraw to convert, and then converted the resulting structure to an InChI in ChemDraw. So, to clarify, this was all done inside the package:

Here’s the stereo layer of the input structure:

/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1

and the stereo layer of the output InChI

/t35-,36-,37-,40+,41-,42-,43+,44-,45+,46+,47-,48-,52+,53+,54+/m1/s1/

This is the name of the structure generated by converting the original InChI to a structure and generating the name using nomenclature software: (4R,5E)-4-{[(1E,2S)-2-{[(E)-{2-[(1S)-1-amino-2-methylbutyl]-4,5-dihydro-1,3-thiazol-5-yl}(hydroxy)methylidene]amino}-1-hydroxy-4-methylpentylidene]amino}-5-{[(1E,2S)-1-{[(1E,3S,4E,6R,7E,9S,10E,12R,13E,15S,16E,18R,19E,21S)-18-(3-aminopropyl)-12-benzyl-15-(butan-2-yl)-6-(carboxymethyl)-2,5,8,11,14,17,20-heptahydroxy-3-(2-hydroxy-2-iminoethyl)-9-(1H-imidazol-5-ylmethyl)-1,4,7,10,13,16,19-heptaazacyclopentacosa-1,4,7,10,13,16,19-heptaen-21-yl]imino}-1-hydroxy-3-methylpentan-2-yl]imino}-5-hydroxypentanoic acid

This is the name of the structure generated by naming the structure produced by ChemDraw resulting from reversing the original InChI:

(4R,5Z)-4-{[(1Z,2S)-2-{[(Z)-{(5R)-2-[(1S,2R)-1-amino-2-methylbutyl]-4,5-dihydro-1,3-thiazol-5-yl}(hydroxy)methylidene]amino}-1-hydroxy-4-methylpentylidene]amino}-5-{[(1Z,2S,3R)-1-{[(1Z,3S,4Z,6R,7Z,9S,10E,12R,13Z,15S,16Z,18R,19Z,21S)-18-(3-aminopropyl)-12-benzyl-15-[(2R)-butan-2-yl]-6-(carboxymethyl)-2,5,8,11,14,17,20-heptahydroxy-3-(2-hydroxy-2-iminoethyl)-9-(1H-imidazol-5-ylmethyl)-1,4,7,10,13,16,19-heptaazacyclopentacosa-1,4,7,10,13,16,19-heptaen-21-yl]imino}-1-hydroxy-3-methylpentan-2-yl]imino}-5-hydroxypentanoic acid

Check out and compare the names…look at the difference in stereocenters. Maybe there is something I am not doing correctly that is causing this effect. I am presently communicating with CambridgeSoft on this point to see if there is some setting I am missing that retains stereochemistry. This is exactly the issue I see with InChI reversals and CLEANING in other applications unfortunately. I will report back when I determine what the optimal settings are to stop such issues, if indeed they can be prevented.
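Comparing the two stereo layers by eye is error-prone, so here is a minimal sketch that does it center by center. It rests on one assumption that should be checked against the InChI documentation: that `/m1` means the `/t` parities are listed for the mirror image, so each `+`/`-` has to be inverted before comparing with an `/m0` layer. The function names are invented for this illustration; the two layer strings are the ones quoted in this post.

```python
import re

INPUT_LAYER = ("/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,"
               "52-,53-,54-/m0/s1")
OUTPUT_LAYER = ("/t35-,36-,37-,40+,41-,42-,43+,44-,45+,46+,47-,48-,"
                "52+,53+,54+/m1/s1/")

FLIP = {"+": "-", "-": "+"}


def parse_stereo(layer_text):
    """Return ({atom_number: parity}, inverted) from '/t.../mN' text."""
    t_match = re.search(r"/t([^/]*)", layer_text)
    m_match = re.search(r"/m([01])", layer_text)
    parities = {}
    if t_match:
        for item in t_match.group(1).split(","):
            hit = re.fullmatch(r"(\d+)([+\-u?])", item)
            if hit:
                parities[int(hit.group(1))] = hit.group(2)
    inverted = m_match is not None and m_match.group(1) == "1"
    return parities, inverted


def effective_parities(layer_text):
    """Apply the ASSUMED /m1 inversion so two layers become comparable."""
    parities, inverted = parse_stereo(layer_text)
    if not inverted:
        return parities
    return {atom: FLIP.get(p, p) for atom, p in parities.items()}


def compare_stereo(before, after):
    """Return (newly_defined_centers, flipped_centers) as sorted lists."""
    b, a = effective_parities(before), effective_parities(after)
    defined = ("+", "-")
    newly_defined = sorted(k for k in a
                           if a[k] in defined and b.get(k) not in defined)
    flipped = sorted(k for k in a
                     if a[k] in defined and b.get(k) in defined
                     and a[k] != b[k])
    return newly_defined, flipped


if __name__ == "__main__":
    print(compare_stereo(INPUT_LAYER, OUTPUT_LAYER))
```

Under that assumption the sketch reports centers 35, 36, 37 and 48 as newly defined and no previously defined center as flipped, which would suggest the round trip assigned the undefined centers rather than swapping the defined ones. Without the `/m` adjustment every defined center appears flipped, so the reading of `/m` matters. Either way the output InChI is not the input InChI.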

 


I’ve been fighting with technology today. I opened my computer at 7am and the nightmares started…..40 minutes to boot, 20 minutes to open my Outlook PST file and that’s where we stay. The CPU pegged at 95% while Outlook is open. I have scanned the pst file to fix it and spent hours defrag’ing and blah, blah, blah. Looks like a reformatting job is coming…fortunately for me blogging and chemspider are all web-based so some catch ups tonight…

Some fast comments …

We’ve been adding new blog posts into some of our records…we can do this with your material if you want a larger audience and preservation moving forward. Some totallysynthetic blogs are here (1,2) and a fun posting from J on Bromination

We have agreement from NIST to use a “small slice” of the NIST Webbook data and are adding IR, MS and UV-vis data onto ChemSpider at present. See the spectra for Cholesterol here


InChIs are a powerful way to communicate chemical structures. They are going to enable internet chemistry, and when we roll out the InChI Resolver shortly the community will have access to a resource to resolve InChIKeys and ultimately navigate chemistry on the web. We commonly receive chemical structures in the form of InChIs, and in order to deposit the structures we have to convert the InChIs back to chemical structures, commonly into SDF format for batch deposition. For simple organics this is not a difficult process…the tools we have at our disposal can deal with the layout of simple organics. However, for some of the chemical structures we receive, optimizing 2D layout is very challenging. Many of the issues come with fullerenes (see examples below), but not only those: carbohydrates, complex cycles and so on are also big challenges.

[Image: clean]

In building the InChI resolver we hope to provide attractive visual depictions of the associated structures. Without AuxInfo data carrying the coordinates, or without the deposition of SDF files containing the layout coordinates, we have a major challenge ahead of us. AuxInfo data are shown below for erythromycin. These data are rarely generated when people generate InChIKeys, and the issue of structure layout will dominate the interpretation of complex structures.

[Image: auxinfo]

Since beauty is in the eye of the beholder, my judgement is that automatic layout algorithms should only assist in the appropriate layout and eyeballs will need to make the final decision. That is why it is better to deposit SDF files, or InChIs with AuxInfo carrying the coordinates, than it is to deposit InChIs only and leave the structure layout to an algorithm. It will fail.
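When SDF files are deposited, one quick sanity check is whether a record actually carries layout coordinates or just a connection table with every atom at the origin. The following is a rough sketch, assuming V2000 molfile blocks with the usual fixed-width columns (atom count in the first three characters of the fourth line, x and y in the first two ten-character fields of each atom line). `has_layout` is a hypothetical helper, not part of any ChemSpider tool.

```python
def has_layout(molblock):
    """Return True if any atom in a V2000 molblock has a non-zero x or y.

    A record whose atoms all sit at (0, 0) carries no usable 2D layout.
    Note that a lone atom placed at the origin is also reported as False.
    """
    lines = molblock.splitlines()
    if len(lines) < 4:
        return False
    try:
        atom_count = int(lines[3][0:3])   # counts line: 'aaabbb...'
    except ValueError:
        return False
    for line in lines[4:4 + atom_count]:
        try:
            x = float(line[0:10])         # fixed-width coordinate fields
            y = float(line[10:20])
        except ValueError:
            return False                  # malformed atom line
        if x != 0.0 or y != 0.0:
            return True
    return False
```

Records that fail the check are the ones where the layout would be left entirely to an algorithm.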

I am interested in seeing what people can do with their structure cleaning algorithms on InChIs like this:

InChI=1/C66H103N17O16S/c1-9-35(6)52(69)66-72-32-48(100-66)63(97)80-43(26-34(4)5)59(93)75-42(22-23-50(85)86)58(92)83-53(36(7)10-2)64(98)76-40-20-15-16-25-71-55(89)46(29-49(68)84)78-62(96)47(30-51(87)88)79-61(95)45(28-39-31-70-33-73-39)77-60(94)44(27-38-18-13-12-14-19-38)81-65(99)54(37(8)11-3)82-57(91)41(21-17-24-67)74-56(40)90/h12-14,18-19,31,33-37,40-48,52-54H,9-11,15-17,20-30,32,67,69H2,1-8H3,(H2,68,84)(H,70,73)(H,71,89)(H,74,90)(H,75,93)(H,76,98)(H,77,94)(H,78,96)(H,79,95)(H,80,97)(H,81,99)(H,82,91)(H,83,92)(H,85,86)(H,87,88)/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1

The images below show the iterative application of DIFFERENT structure layout algorithms. One caution…your layout algorithm should produce the SAME InChI at the end and NOT flip stereocenters. Interesting challenge. Who says cheminformatics isn’t challenging? And who thought building an InChI Resolver would be easy?

[Images: layout1, layout2, layout3, layout4]
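The caution above, that a layout algorithm should hand back the SAME InChI, can be checked mechanically by splitting the before and after strings into layers and listing the layers that differ. This is a simplified sketch: it assumes the string has a formula layer, keeps only the first occurrence of each layer prefix (so repeated prefixes in fixed-H or isotopic layers are ignored), and uses made-up function names. The two test strings are illustrative InChIs for the alanine enantiomers.

```python
def split_layers(inchi):
    """Split an InChI into {'version', 'formula', <prefix letter>: body}."""
    body = inchi.split("=", 1)[-1]
    parts = [p for p in body.split("/") if p]
    if not parts:
        return {}
    layers = {"version": parts[0],
              "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        # First occurrence wins; later duplicates are ignored here.
        layers.setdefault(part[0], part[1:])
    return layers


def diff_layers(before, after):
    """Return the sorted names of layers that differ between two InChIs."""
    a, b = split_layers(before), split_layers(after)
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


if __name__ == "__main__":
    l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
    print(diff_layers(l_ala, d_ala))
```

An empty list means the round trip preserved the identifier; anything in the `t`, `m` or `b` layers points at stereochemistry that changed during layout or clean-up.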


I gave my talk yesterday at CShals 2009, the conference on Semantics in Healthcare and Life Sciences. It was a great meeting for me (hindered by dismal access to wireless internet as a result of Marriott’s wish to make more money from the conference organizers. They should be ashamed of themselves in this day and age!) as it was not about Chemistry, not about spectroscopy, not even about Open Data, Open Access and Open Source. It was about Semantics. I learned a lot and got to hear Tim Berners-Lee talk about where the semantic web is, where it can go, and how it can be disruptive in a good way while NOT being too disruptive to layer onto what already exists. The best part of the meeting for me was the clear passion for the InChI, as well as a lot of acknowledgement that it is not perfect and cannot presently compete with molfiles, commercial systems, CAS Numbers and so on. But people are optimistic, waiting and supportive. Overnight I inserted a lot more information about InChIs and how they can be useful, where some of the limitations are presently, and how the StdInChI has now added a new level of complexity on one hand and simplification on the other. There have already been a number of requests for a copy of the talk so it is up on Slideshare for now (and linked below). I’ll do a voice-over in the next few days and upload to Scivee. I unveiled the first version of the InChI Resolver at the conference and showed it to a couple of people. The general consensus is we are heading in the right direction. The timing on this conference was good because the intention is to layer on RDF before we release at the ACS, time allowing.


Where in the world is Carmen Sandiego and who and where is Katie Crow? We’re still looking for her ever since she put her photo on ChemSpider and took advantage of the new capability we have for depositing images.

Well, a more appropriate use of the function is to actually deposit images of appropriate data. JSpecView does not support 2D NMR data at present but such data can still be of value. Ryan Sasaki from ACD/Labs was kind enough to give me an example 2D COSY spectrum for strychnine so I could use it as a proof of concept. It is available under the spectra tab at this record (see the bottom of the page). This 2D spectrum could also show a structure with correlations etc.


Beauty is in the eye of the beholder. Something I see as stunningly beautiful can just as easily be unattractive to my peers. Such is the nature of Chemistry too. Some might find a particular reaction particularly elegant while others would argue it is mundane. I judge that when it comes to the depiction of chemical structures we would all have fairly consistent views of what are attractive and appropriate chemical structure depictions or “layouts”.

Structure layout is hard to do well and there is still a need for THE optimal layout algorithm. We still find some nightmare organic structure layouts on ChemSpider. When we push them through the layout algorithm we use now they are easily resolved so we’re not sure why some escape the layout algorithm first time but such it is. We have provided the ability to clean these individual records as we find them and it takes just a couple of seconds. The technical note explaining how is here.

Such an operation was applied here. The structure on the left is the “ugly” structure (does anyone think it’s pretty?) and the one on the right is the cleaned version using the online process.

Unfortunately it is NOT so easy to obtain such improved layouts for the MAJORITY of organometallic compounds. This can be seen on PubChem (here) and, similarly, on ChemSpider here. The example is shown below. Are we working on this problem? Not really…the layout for such complex systems has been a challenge for many years, and the appropriate way to deal with such situations is to use the CIF file, if it’s available, and display it in Jmol as we have enabled here. We are however still working on cleaning up the structures of organic molecules as we see them and still searching for the ultimate layout tool…


For those of you performing curation activities on ChemSpider you will likely have noticed the ability to mark a new type of identifier, a shorthand formula. We have enabled this because it has become clear that this could be a useful part of document markup as part of our ChemMantis system. For example, looking at an article let’s consider the excerpt shown below.

In the excerpt you can see a number of highlighted terms, all being shorthand formulae that do not depend on name to structure conversion algorithms but rather on a lookup dictionary. Each of these names is linked to ChemSpider for direct look up of information associated with the chemicals. The list of shorthand formulae extracted from a couple of hundred articles is actually only a couple of hundred formulae at present. It includes the most obvious compounds that we can all interpret: CH3OH, MeOH, CH3CN, MeCN, CH3COOH, NaCl, NaF, NaCN, KBr, KCl and so on. All of these are immediately interpretable by chemists. There are likely a few more to be found over the coming months, but in the past week of reviewing articles from various sources we have actually only added a couple of new formulae. We have also seen value in linking up ions and elements as appropriate. We are likely to add filters for display/not display of elements and ions since we’re of the opinion that displaying every incidence of an element in an article is of little value…just imagine how many times you might see the word carbon or hydrogen in an article… carbon-carbon bonds, hydrogen bonding etc. So, we’re switching them off by default. We’ll keep reporting on how we are improving ChemMantis…based on the review of a stack of articles the system has improved dramatically. We are asking for your articles now…combining shorthand formulae and chemical name markup will highlight a document as shown below.
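The dictionary lookup described above, with element words switched off by default, could be sketched roughly as follows. This is not the ChemMantis implementation: the dictionary holds only the formulae named in this post mapped to their common names, `ELEMENT_WORDS` contains just the two examples mentioned, and `find_terms` with its `include_elements` flag is a hypothetical interface.

```python
import re

# Shorthand formulae named in the post, mapped to common names.
SHORTHAND = {
    "CH3OH": "methanol", "MeOH": "methanol",
    "CH3CN": "acetonitrile", "MeCN": "acetonitrile",
    "CH3COOH": "acetic acid",
    "NaCl": "sodium chloride", "NaF": "sodium fluoride",
    "NaCN": "sodium cyanide",
    "KBr": "potassium bromide", "KCl": "potassium chloride",
}

# Element words are looked up only when explicitly requested.
ELEMENT_WORDS = {"carbon": "carbon", "hydrogen": "hydrogen"}


def find_terms(text, include_elements=False):
    """Return [(matched_text, name), ...] in order of appearance."""
    terms = dict(SHORTHAND)
    if include_elements:
        terms.update(ELEMENT_WORDS)
    # Longest keys first so CH3COOH is tried before shorter formulae.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True))
    # Require a non-alphanumeric boundary on both sides of a match.
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(" + alternatives + r")(?![A-Za-z0-9])")
    return [(m.group(1), terms[m.group(1)]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Dissolve NaCl in MeOH; carbon-carbon bonds form."
    print(find_terms(sample))
    print(find_terms(sample, include_elements=True))
```

With the default setting the sample sentence yields only the two formulae; turning elements on adds both occurrences of “carbon”, which illustrates why marking up every element mention quickly becomes noise.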


When ChemSpider was rolled out to the world as a part of ChemZoo we always knew we would be introducing more “critters”. We are happy to announce our progress with our new development, ChemMantis. Why Mantis? Well…it’s the Markup And Nomenclature Transformation Integrated System. Fits perfectly into our zoo!

We have been working on the markup of chemistry documents for a number of months and I unveiled the first aspects of our work at the ACS meeting in Philadelphia. The presentation is available online on my Slideshare account. What we are trying to do is to use our ChemSpider platform as the foundation of a document markup system whereby chemical names are automatically identified and can either be converted to chemical structures (possible using algorithms for name to structure conversion) or retrieved from our ChemSpider database. We have invested a lot of effort over the past year to curate and validate the ChemSpider database of over 21.5 million unique chemical entities and are now sitting on a foundation of information allowing us to connect between chemical identifiers and chemical structures, out to rich sources such as Wikipedia and PubChem, and to provide information such as chemical vendors and other online systems. ChemMantis is well and truly woven into the web of ChemSpider now.

We are now in alpha release and are adding some finishing tweaks to the markup system, the visualization elements and the workflow. You can see the immediate effects of our recent work on improving the quality of structure images in the balloon below.

We would like to test the system on YOUR documents if you are willing to participate. What we are looking for are WORD documents for already published papers. They can be Open or Closed access papers. We are not expecting copyright transfer – we want to mark up the documents and return them to you for feedback. In the process we will be testing the quality of our Dictionary, our conversions, our visualizations and our process. We welcome your support. Feel free to connect with us at infoATchemspiderDOTcom. Over the next few weeks you will hear more about ChemMantis and our contributions to text mining and markup of chemistry documents.


Recently a new website connecting chemicals to synthesis references went online. The site is ChemSynthesis and as well as synthesis references the database also contains physical properties for many of the listed substances. There are currently more than 40 000 compounds and more than 45 000 synthesis references in the database and there is an intention to keep the database growing with contributions from the community. Presently ChemSynthesis is indexing information from quite an extensive list of journals given below.

The Journal of the American Chemical Society, Canadian Journal of Chemistry, Chemical and Pharmaceutical Bulletin, Chemistry Letters, Journal of Heterocyclic Chemistry, Journal of Medicinal Chemistry, The Journal of Organic Chemistry, Organic Syntheses, Synthesis, Synthetic Communications, Tetrahedron Letters, Tetrahedron

An example record can be found here and a list of hits from a text search is shown below.

Linking from ChemSpider to ChemSynthesis seemed like a natural way to help our users source potential synthesis details. So, that’s done. We have also exchanged the appropriate information with ChemSynthesis so that we have completed the loop. Users searching ChemSynthesis can navigate directly to the ChemSpider record with one click.

To review the entire ChemSynthesis dataset on ChemSpider simply follow this link. It contains more than 40,000 molecules so it might take a while to load. Another contribution to the community of connected chemists….


We’ve been working on structure depictions on ChemSpider and overall we are very happy with where we have got to. These structure depictions are going to be showing up in various parts of our system now.

However, we should distinguish between structure images and structure layout. The depictions and the layout are governed by different algorithms. While a structure image can be attractive, the layout may not be perfect. It is possible to improve the layout of a molecule deposited on ChemSpider. Notice that the structure on the left has an overlap with the methyl group.

For details on how to CLEAN structures on ChemSpider please read the Technical Note here: Interactive Cleaning of Molecules During Curation and Deposition.

The result of performing cleaning is shown below. This layout may also not be the perfect layout but there is no overlap. The user can continue to manually optimize the structure for the preferred layout.


It is finally time to roll out more attractive structure depictions. We have needed them for a while, but they have become an absolute must-have as we roll out the following new capabilities:

1) The ability to make YOUR chemical blog structure searchable (watch this space…). We suggested one path previously…this is BETTER…

2) Structure balloons for use with our document markup tools, both browser-based and Microsoft Word-based

We all judge the quality of visual aesthetics quickly. We know a good structure when we see one. This is an announcement that we will be rolling out new structures across the site in the next few days. You will see better looking structures showing up across the site: during deposition, during service-based predictions, during searches and, well, everywhere. They are not perfect yet, but with a little more tweaking the entire database will be supported by the new structure depiction algorithms. As it is you should see some examples now on the database…one shown below. We welcome your feedback!


I recently started a discussion with the users of ChemSpider about how they use our system. There have already been two responses and I am hoping for more. Having sat in on an IUPAC InChI meeting in Washington last week I can honestly say that it was one of the most functional and on-task meetings I have sat in on in a long time. Decisions were made about how to move forward with the next release of the InChIKey and “standard versions” of both the InChIString and InChIKey.

The meeting has prompted the question: how do you use InChI? For what purpose do you use InChI, and do you use only the string? Do you use it for communication purposes and structure exchange? Do you use it in your internal databases? Is it a primary path to deduplication? What settings do you use for the InChIString?
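For readers wondering what deduplication by InChIKey might look like in practice, here is a minimal sketch in Python. It is an illustration only, not a description of how ChemSpider works internally: the function names and the small record list are hypothetical, and it assumes every record already carries a standard InChIKey generated with the same settings.

```python
from collections import defaultdict


def group_by_inchikey(records):
    """Group record names by InChIKey.

    `records` is an iterable of (name, inchikey) pairs. Keys are trimmed and
    upper-cased so that trivial formatting differences do not hide a match.
    """
    groups = defaultdict(list)
    for name, inchikey in records:
        groups[inchikey.strip().upper()].append(name)
    return dict(groups)


def find_duplicates(records):
    """Return only the InChIKeys shared by more than one record."""
    return {
        key: names
        for key, names in group_by_inchikey(records).items()
        if len(names) > 1
    }


# Hypothetical records: two names for ethanol and one for water.
records = [
    ("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    ("ethyl alcohol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    ("water", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
]

duplicates = find_duplicates(records)
print(duplicates)
```

Matching on the full key only works when all keys were generated with the same options, which is one reason the question about InChIString settings matters.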

I’m interested in how you are using InChI and how important it has become for you. Comments welcomed..


ChemSpider has grown into an important part of the online community, giving chemists access to information and data to assist them in their work, and there are many subjective criteria by which it can be measured. Early on we set some objectives for how we would measure our own success in the first couple of years. These included:

1) A result of >500,000 in a Google search (we have been at this number for over a month I believe)

2) Acknowledgment by our “peers”, another subjective criterion, by comments made in the blogosphere, recognized by invitations to speak, participate in panel discussions etc. No shortage here.

3) Reach 5000 unique users per day in our first year (already achieved)

4) Be reviewed in a mainstream publication (the Nature article written about ChemSpider does that)

5) Have over 150 data sources feed ChemSpider. We are close…145 data sources at present and more in the pipeline to feed in shortly

6) Be indexed by Chemical Abstracts Service.

CAS has been indexing a number of web resources for a considerable time. Until today I didn’t know that we were one of these sources. It actually makes a lot of sense that we should be indexed. We have unique chemistry on our site since we host Open Notebook Science from groups such as that of Jean-Claude Bradley at Drexel University. But, we also have spectra and assignments from research compounds being deposited onto the database and are establishing relationships with Open Access publishers to index their chemical compounds connected directly to their articles. So, being indexed makes sense.

There has been a murmuring in the community that what ChemSpider is doing will collide with CAS. I have reiterated many times that I believe CAS offers the crown jewels in terms of quality and curated data. With what likely amounts to 1000s of person-years of investment in building the registry we are unlikely to surpass CAS’ breadth of knowledge. Rather we are focused on providing a service to the community so that the community can participate in developing and growing the database. I believe CAS and ChemSpider are synergistic and have much to offer by being connected in this way.

Inserted above is a screen grab of part of a record showing the ChemSpider database as the source of the structure. CAS have rigorous expectations regarding how they select what chemical entities should be inserted into their database. While I don’t know their selection criteria, this structure clearly meets them. The structure above is on ChemSpider here. We’re very happy that we are now being indexed in the CAS registry and will continue to enhance our “unique structure collection”, working with chemical vendors, publishers and scientists to grow our database.

 
