Archive for the ChemSpider Chemistry Category

We will soon be depositing data from the SORD databases (Selected Organic Reactions Database) onto ChemSpider. This will be done as two separate but related datasets under the SORD data source: Reactants and Products. If you don’t know what SORD is then who better to explain than Dick Wife, the “host” of the SORD database? Dick wrote the article below to provide an overview of what SORD is… ENJOY!

The Selected Organic Reactions (SOR) Database: capturing “Lost Chemistry”

Dick Wife, SORD B.V. The Netherlands (www.sord.nl; dick.wife@sord.nl)

A new database is capturing the 80% of Lost Chemistry from theses and dissertations which doesn’t make it into publications. Chemists who contribute their data get access to the entire database for free.

SORD, an independent Dutch company, is carefully selecting the synthetic chemistry focused on Life Science research and making this chemistry available in their Selected Organic Reactions (SOR) Database. For the theses/dissertations which they select, SORD excerpts all of the reactions in the Experimental section. This means there will still be a small overlap of data with full publications. There will also be a larger overlap with publications such as Notes, Letters or Communications but these do not contain the experimental details. The SOR Database brings all this chemistry to the desktop, every last detail written by the author.

Some time back, SORD looked at around 300k interesting drug-like compounds in the literature, which countries they had come from, and the native language of those countries. The English-speaking countries accounted for only 37% of the total. German/Swiss dissertations are often written in English but this is new. The theses and dissertations in the other languages represent more than half of the total. SORD routinely translates German and French experimental texts into English. They are about to start on Chinese and Japanese translations and, if anyone can give them access to Russian theses, they will translate these as well!

A thesis or dissertation is the result of several years of hard work by a research student under the constant supervision of the research leader whose reputation is at stake if the work described is wrong or inaccurate. It is also examined by a committee who decide on awarding the degree, or not. They scrutinize closely the Results & Discussion as well as the Experimental sections. The chemistry is reliable.

Advanced Chemistry Development, Inc (ACD/Labs) is partnering SORD in developing this Database. The SOR Database is available for in-house use with ChemFolder Enterprise or on the Internet with ACD/Web Librarian™. This is a screen-shot of a typical SOR Database record in Web Librarian.

The Reaction Scheme shows every atom (there are no abbreviations). The Experimental text is edited to ASCII format and the key parameters (Reagent(s), Solvent(s), yield(s), MP(s) and Optical Rotation(s)) are displayed in separate Fields, as are the full bibliographic data, making data-mining possible. There is also a link which enables the user to bring up the PDF of each reaction containing all of the spectral and other physical data which SORD does not excerpt. The PDF-EX link is a powerful and unique feature of the SOR Database.

Now some explanation about SORD’s excerption rules. What they call the Reaction Scheme (A + B → C, etc.) contains only the reacting and product compound structures. A Reagent is an essential reaction component of which no part ends up in the product – if it does, it becomes a Reactant! When several reactions are performed before the product is isolated (and characterized) the Reagents and Solvents are listed in Steps. Failed reactions are not excerpted but reactions with poor yields are.
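The Reagent/Reactant rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not SORD's actual excerption software: the function name `classify_component` and the toy atom labels are assumptions made up for the example.

```python
def classify_component(component_atoms, product_atoms):
    """Reactant if any atom of the component ends up in the product, else Reagent."""
    return "Reactant" if set(component_atoms) & set(product_atoms) else "Reagent"


# Toy atom labels for a hypothetical ester formation (not taken from the SOR Database).
components = {
    "alcohol": {"C1", "O1", "H1"},
    "acid chloride": {"C2", "O2", "Cl1"},
    "triethylamine": {"N1", "C3", "C4", "C5"},
}
product = {"C1", "O1", "C2", "O2"}

# Apply the rule to every component of the reaction.
roles = {name: classify_component(atoms, product) for name, atoms in components.items()}
print(roles)
```

In this toy case the base contributes no atoms to the product and so stays a Reagent, while the alcohol and the acid chloride both count as Reactants.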

The SOR Database currently contains 170k reactions; the target is one million at the end of 2013. Even this number is a lot smaller than what you find today in the major commercial reaction databases. Back in the nineties, SORD researchers looked at one such large commercial database which then contained 9 million compounds. Sifting through the content for drug-like compounds resulted in just 450k or 5% of the records[1]. Size is one database metric; quality is much more important! In the SOR Database, you will only find characterized products – and no polymers, or compounds with no molecular structure.

Users of the SOR Database also have access to the separate databases which contain the Reagents (ca. 3,000) and Solvents (ca. 450) which have been encountered so far. Often a Reagent is a catalyst (organic/organometallic) but they can also be simple entities like bases, acids, ammonium salts, etc. or complex chiral ligands. Authors give Reagents many different names and so each Reagent (and Solvent) in the SOR Database has been assigned a unique name. This enables rapid searches using the assigned names, again a novel feature of the database. Such searches can bring you to really nice chemistry.

As an example, the second-generation Grubbs olefin metathesis catalyst has been given the name Grubbs 2 catalyst. In the current SOR Database, there are more than 500 reactions where it has been used. Some of these are straightforward; some are not and generate novel ring systems like this one from the Martin group at North Carolina at Chapel Hill:

Searching in the Reaction Scheme, or using Reagent/Solvent names and hit refinement, brings you to new chemistry which until now was only found on a dusty shelf in a library. The “Lost Chemistry” is now getting smaller as SORD carefully selects and excerpts the reactions which deserve a new life. The SOR Database is essential for novelty searches and it is a powerful supplement for the other commercial reaction databases.

Finally some more good news for academic research chemists; your data will be readily accessible to the whole chemical world who will cite your work in their publications. The chemistry which you never published may be just what others are looking for. Routinely SORD excerpts the complete collection of theses and dissertations from research supervisors; they will be more than happy to see your work appear in the next SOR Database!


[1] de Laet, A.; Hehenkamp, J. J.; Wife, R. L. Finding Drug Candidates in Lost/Emerging Chemistry. J. Heterocycl. Chem. 2000, 37, 669–674.

The free ChemSpider mobile app developed in collaboration with Alex Clark (innovator of the Mobile Molecular DataSheet, Reaction101 and Yield101) is now available for download from the iTunes store! The full details of the app, and some associated screenshots, are outlined on the SciMobileApps wiki here. A brief overview is given below…

“ChemSpider Mobile is a free iOS app (iPhone, iPod, iPad) for searching the ChemSpider online chemical database. It provides the ability to search by drawing a chemical structure, or entering a compound name. The app is very straightforward and easy to learn. Search results are shown in a list showing structure and names. Any search result can be examined in more detail by launching the mobile browser and viewing the structure on the ChemSpider web page. Although the ChemSpider web page is designed to work well on mobile browsers, the mobile app is more convenient to use, and is currently the best way to search by structure from a mobile device. The structure drawing capabilities are provided by the embedded version of the Mobile Molecular DataSheet. The app was built by Molecular Materials Informatics, on behalf of the Royal Society of Chemistry.”

We will look at developing an Android app for ChemSpider, taking into account what we learn from the early use of the iOS Mobile app.

A screencast of the functionality of ChemSpider Mobile is available below.

Only two days until the start of this year’s Fall ACS meeting in Denver. The ChemSpider team is busy preparing for the meeting, packing bags, polishing talks and honing workshop skills.

Please drop by and say “Hi!”

We’d like to repeat our invitation to everyone at the conference to drop by the RSC booth (Booth 1100) where, of course, you can chat with the ChemSpider team, get a quick demo (and find out more about our latest features), pick up our hot-off-the-press User Guide or scoop some exclusive ChemSpider goodies!

To celebrate the release of the new iPhone/iPad app* we have a limited number of covers for 3G and 4G iPhones as well as iPads.

*The app itself is free to download from the AppStore.

You can also find out about lots of other things that the RSC does: from publishing books and journals to the promotion of chemistry worldwide. We’ll also have lots of information on our new e-membership option, which is making its debut at this meeting. Also keep an eye out for members of our Editorial staff from journals including OBC, MedChemComm, PCCP, Soft Matter and RSC Advances, who will be scouring the conference in search of lots of new and exciting research.

Natural Product & Synthetic Chemists

I’d like to make an extra special invitation to any Synthetic chemists and Natural products chemists – from PhD students to Professors (please pass this on to all your friends and colleagues who will be at the meeting). The ChemSpider team really wants to hear about your research. Tell us about your latest publication or the work that you are most proud of, and we can make sure that your key compounds from these publications are in ChemSpider, on a platform freely accessible to chemists everywhere. If you are more interested in methodology you shouldn’t feel left out – ask us about ChemSpider Synthetic Pages.

 

ChemSpider related talks and workshops

Antony Williams (most definitely the hardest working man I know) is giving a number of talks and workshops (details below) which are sure to be entertaining as well as thought-provoking and will be well worth squeezing into your schedule.

We look forward to meeting you.

 

“Aligning scientific expertise and passion through a career path in the chemical sciences”

Colorado Convention Center, Room: 110, Sunday 28th August 2011, 1.40PM – 2PM

 

“Chemistry in the hand: The delivery of structure databases and spectroscopy gaming on mobile devices”

Colorado Convention Center, Room: 110, Monday 29th August 2011, 9.05AM – 9.35AM

 

“ChemSpider: Does community engagement work to build a quality online resource for chemists?”

Colorado Convention Center, Room: 110, Tuesday 30th August 2011, 10.10AM – 10.50AM

 

“An Introduction to ChemSpider – A Combination Platform of Free Chemistry Database, Free Prediction Engines and Wiki Environment”

Colorado Convention Center, Room: 503, Wednesday 31st August 2011, 08.30AM – 11AM

 

“Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs”

Colorado Convention Center, Room: 110, Wednesday 31st August 2011, 10.45AM – 11.05AM

Previously there was ChemMobi, then there was our implementation of ChemSpider for a mobile browser and then ChemSpider SyntheticPages for a mobile browser. At next week’s ACS meeting in Denver we hope that the ChemSpider mobile app developed in collaboration with Alex Clark (innovator of the Mobile Molecular DataSheet, Reaction101 and Yield101) will be available for download from the iTunes store! The full details of the app, and some associated screenshots, are outlined on the SciMobileApps wiki here. A brief overview is given below…

“ChemSpider Mobile is a free iOS app (iPhone, iPod, iPad) for searching the ChemSpider online chemical database. It provides the ability to search by drawing a chemical structure, or entering a compound name. The app is very straightforward and easy to learn. Search results are shown in a list showing structure and names. Any search result can be examined in more detail by launching the mobile browser and viewing the structure on the ChemSpider web page.

Although the ChemSpider web page is designed to work well on mobile browsers, the mobile app is more convenient to use, and is currently the best way to search by structure from a mobile device. The structure drawing capabilities are provided by the embedded version of the Mobile Molecular DataSheet. The app was built by Molecular Materials Informatics, on behalf of the Royal Society of Chemistry.”

An early view screencast of the functionality of ChemSpider Mobile is now available.  New movies showing the details of the app will follow in the near future but this is an early view for interested parties.

Dotmatics Limited is pleased to announce that it will provide its web-based structure drawing tool, Elemental, to the leading chemistry community website ChemSpider. Elemental provides a zero install drawing tool that lets users draw simple chemical structures or complex structure queries directly within a webpage.

Antony Williams, Vice President of Strategic Development for ChemSpider comments “Elemental offers ease of deployment and flexibility in structure drawing to our community of users and we are happy to embrace this web-based structure drawing platform as an entry point to the rich resources of ChemSpider.”

Dr Mike Hartshorn, Director and CSO of Dotmatics, said “We are delighted to be working with such a well-known chemistry resource as ChemSpider. The new tools will allow simple access to the wide range of structures and related information that is maintained by ChemSpider and the RSC”.

About Dotmatics
Dotmatics Limited (www.dotmatics.com) is a leading provider of web-based database integration and visualisation tools for use within the life sciences industry.

About the Royal Society of Chemistry
The Royal Society of Chemistry is the UK Professional Body for chemical scientists and an international Learned Society for the chemical sciences with more than 47,500 members worldwide. It is a major international publisher of chemical information, supports the teaching of chemical sciences at all levels and is a leader in bringing science to the public. www.rsc.org

About ChemSpider
ChemSpider offers a structure-centric community for chemists to resource data.  Offering access to over 25 million unique chemical entities from over 400 data sources and by providing a platform for crowd-sourced deposition, annotation, and curation, it is the richest source of free integrated chemistry information available online.  ChemSpider delivers data and services to enable the semantic web for chemistry.  www.chemspider.com

Contacts:
Mike Hartshorn
Dotmatics Limited, The Old Monastery
Windhill, Bishops Stortford
CM23 2ND, UK
Tel: +44 1279 654123
Email: info@dotmatics.com
www.dotmatics.com

Antony Williams
ChemSpider, Royal Society of Chemistry
904 Tamaras Circle
Wake Forest, NC 27587
Tel: 919-201-1516
Email: info@chemspider.com
www.chemspider.com

 

Earlier this month I reported on the integration of Infotherm to ChemSpider but at that time it would have been necessary for non-RSC members to pay for the data on Infotherm despite the fact that a search would have provided the links and you could have clicked through to the Infotherm data pages. Some good news from Fiz-Chemie though…they are waiving the fee for data on pure compounds accessed from ChemSpider and as a result giving access to over 200,000 tables of data. This is a great contribution to the community of ChemSpider users. Thanks Fiz-Chemie!

 


Last night I gave a presentation at the BAGIM meeting in Boston. The abstract is below together with the embedded presentation from Slideshare

ChemSpider – Is This The Future of Linked Chemistry on the Internet?
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. There are now hundreds of chemical structure databases such as literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data etc. and no single way to search across them. Despite the diversity of databases available online their inherent quality, accuracy and completeness is lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of almost 25 million chemical substances, grows daily, and is integrated with over 400 sources, many of these directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for a linked web for chemistry and to provide access to a set of online tools and services to support access to these data.

We deposit a lot of data onto ChemSpider in a month and the database is growing daily. As an example of the ongoing depositions take a look at what has been deposited in a one-month timeframe from July to August. This is simply what has been published by me… not all depositions. It’s a pretty good indicator of ongoing efforts to enhance the quantity of content on the site.


Have you ever had a niggling feeling that you’ve been missing some corner of ChemSpider which might have a tool that will make your life much easier?

http://www.chemspider.com/Sitemap.aspx is the new sitemap for ChemSpider which lists all of the different pages in it and will help you to get an overview of all the different things that you can see and do on ChemSpider.

There are also brief descriptions about each page which will, where necessary, suggest input examples if you just want to try something out but aren’t quite sure what to type into the boxes. If you are a ChemSpider depositor or curator and view the sitemap when logged in you will see additional pages relevant to your assigned roles.

If you are an iPhone user (as I am), have an iPad hanging around to check email 20/7 (I have to sleep sometime…), or use a phone with a browser, I suggest you point it to the new ChemSpider Mobile at  http://cs.m.chemspider.com. There you’ll see a simple interface, shown below, that allows you to search across our database of almost 25 million chemical entities based on chemical name (systematic, trivial or trade, registry number etc) and retrieve a list of intrinsic properties, a list of predicted properties, a list of associated identifiers, with links to Wikipedia if available, and a Google based search for the chemical based, for now, on the associated InChIKey. Check it out, give us feedback.
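For the curious, an InChIKey-based Google search like the one mentioned above could be put together along these lines. This is only a sketch: how ChemSpider Mobile actually builds its search link is not described here, so the function name `google_search_url` and the choice to wrap the key in quotes are assumptions. The InChIKey used is the standard one for benzene.

```python
from urllib.parse import urlencode


def google_search_url(inchikey: str) -> str:
    """Build a Google search URL whose query is the quoted InChIKey."""
    # Quoting the key asks the search engine for an exact match on the full string.
    return "https://www.google.com/search?" + urlencode({"q": f'"{inchikey}"'})


# Standard InChIKey for benzene.
benzene_key = "UHOVQNZJYSORNB-UHFFFAOYSA-N"
url = google_search_url(benzene_key)
print(url)
```

Because an InChIKey is a fixed-length string of letters and hyphens, it works well as a plain text search term, which is presumably why it is used here instead of a name.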

We are also working on providing access to ChemSpider SyntheticPages in the same way and the first screen shot is shown at the bottom. Things are always changing and, I believe, for the better.


Part 4 in the exposure of new ChemSpider functionality from the recent update. We have been using the ACD/Labs Structure Drawing Applet on ChemSpider for the past three years. It’s been a great piece of technology and was one of the first applets, possibly the first structure drawing applet ever released. However, it’s old technology and we have been encouraged by our users to use a more modern applet. We are very fortunate to have been granted the right to use the Symyx JDraw applet and have had the pleasure of working with Keith Taylor and James Jack. For the time being we have left two applets online for the users to try out and provide feedback on. You can choose the ACD/applet or JDraw by selecting via the interface as shown below. Feedback welcomed.


JC has given a great overview of how students might want to use ChemSpider for the purpose of chemical information retrieval on the internet. JC’s course lecture thoroughly exercises ChemSpider, in real time, to do searches across the internet. He posted his seminar to Scivee here and I have embedded the lecture below. It’s a good talk for students and I encourage you to share it and review how ChemSpider can be used in your classwork and in your laboratories.

What’s your favorite flavor of mercury acetate… on Wikipedia here, on CAS Common Chemistry here or on ChemSpider here?

How would you represent this structure if you were to draw it as a 2D diagram?


As an active member of the Wikipedia Chemistry team I continue to be impressed with the dedication and commitment that the members have to improving the quality AND quantity of information available on Wikipedia for chemists. The number of lost hours of sleep freely given to the benefit of Wikipedia, and in this specific case to the chemistry community, is immense. The number of “Compound Pages” on Wikipedia dedicated to drugs/chemicals has continued to grow and, despite a sincere effort on our part to keep everything linked up from ChemSpider to Wikipedia, it’s a little like chasing the Road Runner… we’re always behind!

We have been working with the WikiChem team of late to embed links from Wikipedia back to ChemSpider. I am humbled to know that our hard work to establish ChemSpider as a source of quality information has reached a level of trust such that Wikipedia now links from the ChemBoxes out to ChemSpider. The links are currently being updated, with hundreds of new links already established and more being generated on an ongoing basis. Wikipedia User: Beetstra has written a ‘bot that is inserting ChemSpiderIDs across the database (see below) and we ARE doing rigorous checking of all of the links. This was done using a file that we generated on our side showing links to Wikipedia from ChemSpider.


We will then be able to generate a list of all ChemBoxes/DrugBoxes without links from Wikipedia to ChemSpider. We will then make the links on our side, manually curating the structures, and hand back a file to finish all linking. At this point we will have the backfile under control and we can perform ongoing updates as new compound pages are created on ChemSpider. If we curate and find errors on Wikipedia or ChemSpider, making a few manual edits is easy.

There are very dedicated teams on Wikipedia and ChemSpider carefully poring over data with their robots and eyeballs to create a linked data set of quality chemistry. It’s long, tedious AND important work. When it’s done we will have an expanded set of data to semantically link from RSC articles when we do markup.

Last week I had the pleasure of being on an agenda with a number of people whose work I applaud and who I genuinely enjoy spending time with and sharing thoughts about “what if?” Martin Walker, one of the people I collaborate with on Wikipedia, invited me to speak in his session “Publishing and Promoting Chemistry in the Internet Age”. Martin gave an introduction to the session and spoke about Chemistry on the Internet. Beth Brown gave an overview of the Chemist’s Toolkit for Publishing and Promoting your work on the Internet. I followed with an overview about what’s going on with ChemSpider and the issues of connectedness and quality of chemistry on the internet. JC Bradley spoke about transparency and Open Notebook Science. My hat’s off to Martin for arranging the speakers in that order. Considering we didn’t coordinate our talks it was an excellent trajectory throughout the session and very much an integrated overview of activities regarding chemistry on the internet.

My talk is posted on SlideShare here and is available below. Any comments and questions are welcomed.

Beth Brown has her talk online here and JC Bradley will post his online here.

JC Bradley and I had a good talk about ways we can collaborate more closely on Open Notebook Science. We have a path forward so that ChemSpider can provide additional support and will be discussing the details offline.

In the history of developing ChemSpider we have undertaken some fairly demanding curation activities. For example, Vancomycin and Ginkgolide B. Now we are in the middle of trying to resolve the structure of Digitonin. There are 25 (!) skeletons for digitonin on ChemSpider from various sources. There were eleven compounds on ChemSpider called Digitonin. We have been able to clean most of these by removing partial stereochemistry. We are now left with three structures…simply search Digitonin on ChemSpider and you will see three structures with full, but different stereochemistry.

What is a “correct structure” is a matter of assertion. Who says what is correct? What publications, what techniques, what database, who says it’s correct? Structures have timelines… they can change with time as new analytical techniques are applied.

This is a call to the community to help resolve the existing confusions around Digitonin on ChemSpider… but they are out there in all the other databases also and there are discrepancies between Wikipedia, DSSTox, ChEBI, PubChem and so on. So, my call to the community… what is the correct structure of Digitonin and based on what assertions?

With this information in place, and assuming communal agreement on the conclusion, we can go help clean up the other databases. Help!

For those of you who have been using ChemSpider for the past few months you will be aware that historically we had an integration in place to SureChem’s Patent Portal. A few months ago that integration was unfortunately broken as SureChem improved their service. Also, we were un-synchronized with their growing set of chemical structures as they updated their patents. The previous integration was very limited in nature anyway as it simply showed the presence of patents associated with the ChemSpider structure in the SureChem database. Certainly a more ideal solution is the one that we introduced just in time for the ACS meeting in Washington.

The new solution not only lists the number of patents containing the chemical compound shown in the ChemSpider record but also shows the first 10 patents, by title, and provides direct link-throughs to the patents on SureChem. This is a much improved integration and we hope you enjoy it. The next stage is to deposit the latest SureChem structure collection, which has grown significantly since our last deposition. Thanks to our collaborators at SureChem for offering you, our users, access to their service.


It was a busy week at the ACS meeting in Washington. I gave three presentations and the titles, abstracts and links to Slideshare are given below:

Oops and Downs of Resolving InChIs For the Chemistry Community (Link to Slideshare)

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.

ChemSpider: Building a knowledge-based community for chemists using social and data networking technologies (Link to Slideshare)

In less than 2 years ChemSpider has become one of the primary online resources for chemists providing access to an unsurpassed aggregate of free-access knowledge and data. ChemSpider was developed with the intention of providing a structure centric community for chemists that would be enhanced by data depositions, curations and annotations by the community. The system presently hosts over 21.5 million chemical compounds from over 200 data sources. Working with a network of advisors, collaborators and data providers ChemSpider has created a unique resource of integrated information for chemists. These efforts have enabled us to support the curation of the Wikipedia chemistry pages, the production of a community supported Open Access chemistry journal and provision of web services integrated to spectrometer systems distributed around the world. This talk will provide an overview of how ChemSpider utilized social and data networking to create a community for chemistry.

Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources (Link to Slideshare)

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry known as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked-up and exposed to the public within a few minutes in many cases, making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.


My colleague Will originally developed the ChemRefer service. When ChemSpider started up Will brought the ChemRefer technology and joined us to help expand the capabilities of our services. We integrated ChemRefer and released the text searching capabilities. Will indexed more and more journals and grew the index by 100s of thousands of articles. Unfortunately the downside was that the speed of the search decreased dramatically. Also, we kept hearing the comparison with the Google service and that their advantage was in their citations. So, Will has taken a few months off from indexing and has focused his efforts on developing his technologies to dramatically improve the speed of searching as well as implementing a system for recognizing citations. The system has been made available online for beta-testing just in time for the ACS meeting here in Salt Lake City BUT it is not yet integrated into ChemSpider.

I have performed some basic tests focused on searching chemical names initially. The literature search on ChemSpider has a lot more journals indexed but in order to perform the comparison I searched ONLY the RSC and Journal of Biological Chemistry articles since that is all we have indexed so far on the new system. The search results were as follows. The numbers compare number of hits for the old versus new literature search. The new search has indexed the latest RSC and JBC articles also so in theory should provide more hits.

Searching on Taxol: 626 hits found in 22 seconds (OLD) vs 717 hits in 1 second (NEW)

Searching on phenolphthalein: 47 hits found in 5 seconds (OLD) vs 1514 hits in 1 second (NEW)

Searching on benzene: 846 hits found in 75 seconds (OLD) vs 15260 hits in 4 seconds (NEW)

Clearly the searches are MUCH faster with the new system but it is also returning many more results. These are very early results and we will explain more about the system, the results and our future development shortly…
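To put rough numbers on “MUCH faster”, here is a small Python sketch that works out the speed-up and the increase in hits for the three test searches. The hit counts and timings are copied from the results listed in this post; the names `results` and `compare` are just illustrative, and three queries timed to the nearest second are far too few to be a proper benchmark.

```python
# Hit counts and timings (in seconds) copied from the three test searches in this post.
results = {
    "Taxol": {"old": (626, 22), "new": (717, 1)},
    "phenolphthalein": {"old": (47, 5), "new": (1514, 1)},
    "benzene": {"old": (846, 75), "new": (15260, 4)},
}


def compare(old, new):
    """Return (speed-up factor, ratio of new hits to old hits) for one query."""
    old_hits, old_seconds = old
    new_hits, new_seconds = new
    return old_seconds / new_seconds, new_hits / old_hits


summary = {name: compare(r["old"], r["new"]) for name, r in results.items()}
for name, (speedup, hit_ratio) in summary.items():
    print(f"{name}: {speedup:.1f}x faster, {hit_ratio:.1f}x as many hits")
```

On these figures the new search is between 5 and 22 times faster, and for phenolphthalein and benzene it returns well over ten times as many hits.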

Try out the new system here for now and send us feedback at info@chemspider.com. Thanks


We continue to expand the ChemSpider Database with new depositions sourced from various collaborators. We are especially privileged to have received the RSC’s structure collection associated with their Project Prospect articles and have spent a couple of weeks working with the data prior to depositing onto ChemSpider. During the deposition process we have formed the link between the chemical structures and their articles via a DOI link. We have been able to deposit the title, an associated author and the DOI. In this way we have been able to link thousands of chemical structures to articles on the RSC website. On each record associated with an RSC article you will see both a link from the data source table and a link via DOI from the reference as shown here and in the figure below.

[Image: rsc_link]

With the RSC depositions came many beautiful structures: highly symmetric, complex and just plain “pretty” to a chemist. But a high level of complexity also arrived with the collection. While many InChIs could be converted to their associated connection tables, the act of converting the InChIs could add additional stereochemistry, and structure cleaning could change stereochemistry, so this was a long, tedious and mostly manual process, I’m afraid. Nevertheless, this is a wonderful addition to the ChemSpider database and our sincere thanks, on behalf of the community too, go to the Royal Society of Chemistry for sharing their data with us. The InChIs will be deposited into the InChI Resolver shortly.


There are some interesting articles showing up on ChemSpider from across the blogosphere. We have just added a new item to our list of high priorities: generating an RSS feed of structures, short descriptions and ChemSpider IDs so that anyone can access them. When we add new descriptions we will add snippets to the RSS feed.

New Articles include:

Teen Chemist and Splenda

A Discussion about the Synthesis of Spirangien A from the TotallySynthetic Blog by Paul Docherty

A Discussion about the Synthesis of Omaezakianol from the TotallySynthetic Blog by Paul Docherty


[Image: ons1]

We’ve been working with Jean-Claude Bradley and his Open Notebook Solubility Challenge group to assist where we can. This has included enhancing some of our services (though there is more work to be done…), populating data into ChemSpider and, now, linking us up to the Data Tables built by Andy Lang (of The Spectral Game fame…we’re quite a team).

The Open Notebook Solubility Challenge is described here. The present list of compounds for which we have created the integration described below is here. When you open that link you’ll see the first bunch…notice the little icons showing patent links, Wikipedia links and the presence of spectra on those records.

What we have done now is deposit the links into the Data Source tables for these compounds, providing a direct link to the ONS tables. They can be viewed WITHOUT leaving the site simply by hovering over the link…OR you can click on the link to view the data directly. An example of the link view is shown below. To find these tables simply look up the Open Notebook Solubility Challenge data source in the table.

 

[Image: ons2]

Late nights and ailing computers aren’t conducive to the best of work. So, when I posted about the clean chemical structure I obtained using ChemDraw I was genuinely excited about the quality of clean-up that was produced. However, I slept on it and reminded myself to check that the output InChI was equivalent to the input InChI, as my experience with structure cleaning is that it can swap stereocenters.

So, I returned to that particular problem and looked specifically at the InChI string fed to ChemDraw to convert, and then converted the resulting structure back to an InChI in ChemDraw. So, to clarify, this was all done inside the package:

Here’s the stereo layer of the input structure:

/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1

and the stereo layer of the output InChI

/t35-,36-,37-,40+,41-,42-,43+,44-,45+,46+,47-,48-,52+,53+,54+/m1/s1/
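Comparing two stereo layers by eye is error-prone, so here is a minimal Python sketch that does the comparison centre by centre. It uses only the two layers quoted above; the function names (parse_stereo, newly_defined, sign_changed) are made up for illustration, and it reports raw differences in the text of the layers without trying to interpret them chemically.

```python
import re

# The two stereo layers quoted above (input to, and output from, the conversion)
INPUT_LAYER = "/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1"
OUTPUT_LAYER = "/t35-,36-,37-,40+,41-,42-,43+,44-,45+,46+,47-,48-,52+,53+,54+/m1/s1/"


def parse_stereo(layer):
    """Split a fragment like '/t35u,36-/m0/s1' into ({atom: parity}, m_flag)."""
    parities = {}
    m_flag = None
    for part in layer.strip("/").split("/"):
        if part.startswith("t"):
            for item in part[1:].split(","):
                match = re.fullmatch(r"(\d+)([-+u?])", item)
                if match:
                    parities[int(match.group(1))] = match.group(2)
        elif part.startswith("m"):
            m_flag = part[1:]
    return parities, m_flag


def newly_defined(before, after):
    """Centres marked 'u' (undefined) before that carry '+' or '-' afterwards."""
    return sorted(a for a, p in before.items()
                  if p == "u" and after.get(a) in ("+", "-"))


def sign_changed(before, after):
    """Centres defined in both layers whose raw sign differs."""
    return sorted(a for a, p in before.items()
                  if p in ("+", "-") and after.get(a) in ("+", "-")
                  and after[a] != p)


before, m_before = parse_stereo(INPUT_LAYER)
after, m_after = parse_stereo(OUTPUT_LAYER)
print("newly defined centres:", newly_defined(before, after))
print("raw sign differences: ", sign_changed(before, after))
print("/m flag:", m_before, "->", m_after)
```

On the two layers above this reports centres 35, 36, 37 and 48 as newly defined and a raw sign difference on each of the other eleven, with the /m flag going from 0 to 1. Since the /m flag is part of how InChI records absolute stereochemistry, a raw sign difference on its own should not be read as proof that a centre was flipped; the names generated from the two structures are the better guide.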

This is the name of the structure generated by converting the original InChI to a structure and generating the name using nomenclature software: (4R,5E)-4-{[(1E,2S)-2-{[(E)-{2-[(1S)-1-amino-2-methylbutyl]-4,5-dihydro-1,3-thiazol-5-yl}(hydroxy)methylidene]amino}-1-hydroxy-4-methylpentylidene]amino}-5-{[(1E,2S)-1-{[(1E,3S,4E,6R,7E,9S,10E,12R,13E,15S,16E,18R,19E,21S)-18-(3-aminopropyl)-12-benzyl-15-(butan-2-yl)-6-(carboxymethyl)-2,5,8,11,14,17,20-heptahydroxy-3-(2-hydroxy-2-iminoethyl)-9-(1H-imidazol-5-ylmethyl)-1,4,7,10,13,16,19-heptaazacyclopentacosa-1,4,7,10,13,16,19-heptaen-21-yl]imino}-1-hydroxy-3-methylpentan-2-yl]imino}-5-hydroxypentanoic acid

This is the name generated for the structure that ChemDraw produced when converting the original InChI:

(4R,5Z)-4-{[(1Z,2S)-2-{[(Z)-{(5R)-2-[(1S,2R)-1-amino-2-methylbutyl]-4,5-dihydro-1,3-thiazol-5-yl}(hydroxy)methylidene]amino}-1-hydroxy-4-methylpentylidene]amino}-5-{[(1Z,2S,3R)-1-{[(1Z,3S,4Z,6R,7Z,9S,10E,12R,13Z,15S,16Z,18R,19Z,21S)-18-(3-aminopropyl)-12-benzyl-15-[(2R)-butan-2-yl]-6-(carboxymethyl)-2,5,8,11,14,17,20-heptahydroxy-3-(2-hydroxy-2-iminoethyl)-9-(1H-imidazol-5-ylmethyl)-1,4,7,10,13,16,19-heptaazacyclopentacosa-1,4,7,10,13,16,19-heptaen-21-yl]imino}-1-hydroxy-3-methylpentan-2-yl]imino}-5-hydroxypentanoic acid

Check out and compare the names…look at the difference in stereocenters. Maybe there is something I am not doing correctly that is causing this effect. I am presently communicating with CambridgeSoft on this point to see if there is some setting I am missing that retains stereochemistry. This is exactly the issue I see with InChI reversals and CLEANING in other applications, unfortunately. I will report back when I determine what the optimal settings are to stop such issues, if indeed they can be prevented.

 

I’ve been fighting with technology today. I opened my computer at 7am and the nightmares started…40 minutes to boot, 20 minutes to open my Outlook PST file, and that’s where we stay. The CPU is pegged at 95% while Outlook is open. I have scanned the PST file to fix it and spent hours defragging and blah, blah, blah. Looks like a reformatting job is coming…fortunately for me, blogging and ChemSpider are both web-based, so there will be some catching up tonight…

Some fast comments …

We’ve been adding new blog posts into some of our records…we can do this with your material if you want a larger audience and preservation moving forward. Some TotallySynthetic blog posts are here (1,2), along with a fun posting from J on Bromination.

We have agreement from NIST to use a “small slice” of the NIST Webbook data and are adding IR, MS and UV-vis data onto ChemSpider at present. See the spectra for Cholesterol here

InChIs are a powerful way to communicate chemical structures. They are going to enable internet chemistry, and when we roll out the InChI Resolver shortly the community will have access to a resource to resolve InChIKeys and ultimately navigate chemistry on the web. We commonly receive chemical structures in the form of InChIs, and in order to deposit the structures we have to convert the InChIs back to chemical structures, commonly into SDF format for batch deposition. For simple organics this is not a difficult process…the tools we have at our disposal can deal with the layout of simple organics. However, for some of the chemical structures we receive, optimizing the 2D layout is very challenging. Many of the issues come with fullerenes (see examples below), but not exclusively. Carbohydrates, complex cycles and the like are also big challenges.

[Image: clean]

In building the InChI Resolver we hope to provide attractive visual depictions of the associated structures. Without AuxInfo data carrying the coordinates, or without the deposition of SDF files containing the layout coordinates, we have a major challenge ahead of us. AuxInfo data are shown below for erythromycin. These data are rarely generated when people generate InChIKeys, and the issue of structure layout will dominate the interpretation of complex structures.

[Image: auxinfo]

Since beauty is in the eye of the beholder, my judgement is that automatic layout algorithms should only assist in producing an appropriate layout, and eyeballs will need to make the final decision. That is why it is better to deposit SDF files, or InChIs with AuxInfo carrying the coordinates, than it is to deposit InChIs only and leave the structure layout to an algorithm. Left to itself, the algorithm will fail.

I am interested in seeing what people can do with their structure cleaning algorithms on InChIs like this:

InChI=1/C66H103N17O16S/c1-9-35(6)52(69)66-72-32-48(100-66)63(97)80-43(26-34(4)5)59(93)75-42(22-23-50(85)86)58(92)83-53(36(7)10-2)64(98)76-40-20-15-16-25-71-55(89)46(29-49(68)84)78-62(96)47(30-51(87)88)79-61(95)45(28-39-31-70-33-73-39)77-60(94)44(27-38-18-13-12-14-19-38)81-65(99)54(37(8)11-3)82-57(91)41(21-17-24-67)74-56(40)90/h12-14,18-19,31,33-37,40-48,52-54H,9-11,15-17,20-30,32,67,69H2,1-8H3,(H2,68,84)(H,70,73)(H,71,89)(H,74,90)(H,75,93)(H,76,98)(H,77,94)(H,78,96)(H,79,95)(H,80,97)(H,81,99)(H,82,91)(H,83,92)(H,85,86)(H,87,88)/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1

The images below show the iterative application of DIFFERENT structure layout algorithms. One caution…your layout algorithm should produce the SAME InChI at the end and NOT flip stereocenters. Interesting challenge. Who says cheminformatics isn’t challenging? And who thought building an InChI Resolver would be easy?
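For anyone taking up the challenge, the "SAME InChI at the end" check can be automated. Below is a minimal Python sketch of such a round-trip test. It deliberately does not assume any particular toolkit: to_structure, clean_layout and to_inchi are hypothetical placeholders for whatever InChI-to-structure, layout and structure-to-InChI functions your own software provides, and they are passed in as parameters.

```python
def differing_layers(original, regenerated):
    """Return the InChI layers present in one string but not in the other."""
    a = set(original.strip().rstrip("/").split("/"))
    b = set(regenerated.strip().rstrip("/").split("/"))
    return sorted(a ^ b)


def layout_preserves_inchi(inchi, to_structure, clean_layout, to_inchi):
    """Round-trip an InChI through a layout algorithm and list any changed layers.

    to_structure, clean_layout and to_inchi are supplied by the caller
    (hypothetical placeholders here, not functions of any real library).
    An empty list means the regenerated InChI matches the original.
    """
    structure = clean_layout(to_structure(inchi))
    return differing_layers(inchi, to_inchi(structure))


if __name__ == "__main__":
    # Stand-in functions just to show the call shape: this "toolkit" passes
    # the string through untouched, so the check reports no differences.
    identity = lambda x: x
    example = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    print(layout_preserves_inchi(example, identity, identity, identity))
```

An empty result means the layout step left the identifier alone; anything in the /t or /m layers showing up in the list is the cue to look at the structure by eye before depositing it.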

[Images: layout1, layout2, layout3, layout4]
