Archive for the ChemSpider Syntheses Category

We will soon be depositing data from the SORD databases (Selected Organic Reactions Database) onto ChemSpider. This will be done as two separate but related datasets under the SORD data source: Reactants and Products. If you don’t know what SORD is, then who better to explain than Dick Wife, the “host” of the SORD database? Dick wrote the article below to provide an overview of what SORD is… ENJOY!

The Selected Organic Reactions (SOR) Database: capturing “Lost Chemistry”

Dick Wife, SORD B.V. The Netherlands (www.sord.nl; dick.wife@sord.nl)

A new database is capturing the 80% of Lost Chemistry from theses and dissertations which doesn’t make it into publications, and chemists who contribute their data get access to the entire database for free.

SORD, an independent Dutch company, is carefully selecting the synthetic chemistry focused on Life Science research and making this chemistry available in their Selected Organic Reactions (SOR) Database. For the theses/dissertations which they select, all of the reactions in the Experimental section are excerpted. This means there will still be a small overlap of data with full publications. There will also be a larger overlap with publications such as Notes, Letters or Communications, but these do not contain the experimental details. The SOR Database brings all this chemistry to the desktop, every last detail written by the author.

Some time back, SORD looked at around 300k interesting drug-like compounds in the literature, the countries they had come from, and the native language of each country. The English-speaking countries accounted for only 37% of the total. German/Swiss dissertations are often written in English but this is new. The theses and dissertations in the other languages represent more than half of the total. SORD routinely translates German and French experimental texts into English. They are about to start on Chinese and Japanese translations and, if anyone can give them access to Russian theses, they will translate these as well!

A thesis or dissertation is the result of several years of hard work by a research student under the constant supervision of the research leader whose reputation is at stake if the work described is wrong or inaccurate. It is also examined by a committee who decide on awarding the degree, or not. They scrutinize closely the Results & Discussion as well as the Experimental sections. The chemistry is reliable.

Advanced Chemistry Development, Inc (ACD/Labs) is partnering with SORD in developing this Database. The SOR Database is available for in-house use with ChemFolder Enterprise or on the Internet with ACD/Web Librarian™. This is a screenshot of a typical SOR Database record in Web Librarian.

The Reaction Scheme shows every atom (there are no abbreviations). The Experimental text is edited to ASCII format and the key parameters (Reagent(s), Solvent(s), yield(s), MP(s) and Optical Rotation(s)) are displayed in separate Fields, as are the full bibliographic data, making data-mining possible. There is also a link which enables the user to bring up the PDF of each reaction containing all of the spectral and other physical data which SORD does not excerpt. The PDF-EX link is a powerful and unique feature of the SOR Database.

Now some explanation of SORD’s excerption rules. What they call the Reaction Scheme (A + B → C, etc.) contains only the reacting and product compound structures. A Reagent is an essential reaction component of which no part ends up in the product; if any part does, it becomes a Reactant! When several reactions are performed before the product is isolated (and characterized), the Reagents and Solvents are listed in Steps. Failed reactions are not excerpted but reactions with poor yields are.
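To make the Reagent/Reactant rule concrete, here is a minimal Python sketch. It is only an illustration, not SORD's actual tooling: the `Component` class, its `contributes_atoms_to_product` flag (which a human excerptor would have to set) and the reading of "failed" as "no product isolated" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical record for one reaction component."""
    name: str
    # Assumption: an excerptor has decided whether any part of this
    # component ends up in the product structure.
    contributes_atoms_to_product: bool


def classify(component: Component) -> str:
    """A component with any part in the product is a Reactant; otherwise a Reagent."""
    return "Reactant" if component.contributes_atoms_to_product else "Reagent"


def should_excerpt(product_isolated: bool) -> bool:
    """Assumed reading of the rule: failed reactions (no product isolated)
    are skipped, while low-yield reactions are still excerpted."""
    return product_isolated


catalyst = Component("Grubbs 2 catalyst", contributes_atoms_to_product=False)
diene = Component("diene substrate", contributes_atoms_to_product=True)
labels = [classify(catalyst), classify(diene)]
```

Here `labels` holds one label per component, so the catalyst is filed as a Reagent and the substrate as a Reactant.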

The SOR Database currently contains 170k reactions; the target is one million at the end of 2013. Even this number is a lot smaller than what you find today in the major commercial reaction databases. Back in the nineties, SORD researchers looked at one such large commercial database which then contained 9 million compounds. Sifting through the content for drug-like compounds resulted in just 450k or 5% of the records[1]. Size is one database metric; quality is much more important! In the SOR Database, you will only find characterized products – and no polymers, or compounds with no molecular structure.

Users of the SOR Database also have access to the separate databases which contain the Reagents (ca. 3,000) and Solvents (ca. 450) which have been encountered so far. Often a Reagent is a catalyst (organic/organometallic), but Reagents can also be simple entities like bases, acids, ammonium salts, etc. or complex chiral ligands. Authors give Reagents many different names and so each Reagent (and Solvent) in the SOR Database has been assigned a unique name. This enables rapid searches using the assigned names, again a novel feature of the database. Such searches can bring you to really nice chemistry.
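The idea of one assigned name per Reagent can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the synonym table, the `normalize` rule and the record layout are invented for the example and do not describe how the SOR Database is actually built.

```python
from typing import Optional

ASSIGNED_NAME = "Grubbs 2 catalyst"


def normalize(name: str) -> str:
    """Lower-case, treat hyphens as spaces and collapse repeated whitespace."""
    return " ".join(name.replace("-", " ").lower().split())


# Hypothetical synonym table: names an author might use for the same Reagent.
SYNONYMS = {
    normalize(s): ASSIGNED_NAME
    for s in (
        "Grubbs 2 catalyst",
        "Grubbs II",
        "Grubbs second-generation catalyst",
        "2nd generation Grubbs catalyst",
    )
}


def assigned_name(author_name: str) -> Optional[str]:
    """Return the single assigned name, or None if the name is unknown."""
    return SYNONYMS.get(normalize(author_name))


def find_reactions(reactions, reagent: str):
    """Select reactions whose reagent set contains the assigned name."""
    target = assigned_name(reagent) or reagent
    return [r for r in reactions if target in r["reagents"]]


reactions = [
    {"id": 1, "reagents": {"Grubbs 2 catalyst"}},
    {"id": 2, "reagents": {"Pd/C"}},
]
hits = find_reactions(reactions, "Grubbs II")
```

Because every record stores only the assigned name, a search on any known synonym lands on the same set of reactions.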

As an example, the second-generation Grubbs olefin metathesis catalyst has been given the name Grubbs 2 catalyst. In the current SOR Database, there are more than 500 reactions where it has been used. Some of these are straightforward; some are not and generate novel ring systems like this one from the Martin group at North Carolina at Chapel Hill:

Searches in the Reaction Scheme, or using Reagent/Solvent names and hit refinement, bring you to new chemistry which until now was only found on a dusty shelf in a library. The “Lost Chemistry” is now getting smaller as SORD carefully selects and excerpts the reactions which deserve a new life. The SOR Database is essential for novelty searches and it is a powerful supplement to the other commercial reaction databases.

Finally, some more good news for academic research chemists: your data will be readily accessible to the whole chemical world, who will cite your work in their publications. The chemistry which you never published may be just what others are looking for. SORD routinely excerpts the complete collection of theses and dissertations from research supervisors; they will be more than happy to see your work appear in the next SOR Database!


[1] de Laet, A.; Hehenkamp, J. J.; Wife, R. L. Finding Drug Candidates in Lost/Emerging Chemistry. J. Heterocycl. Chem. 2000, 37, 669–674.

If you are an iPhone user (as I am), have an iPad hanging around to check email 20/7 (I have to sleep sometime…), or use a phone with a browser, I suggest you point it to the new ChemSpider Mobile at http://cs.m.chemspider.com. There you’ll see a simple interface, shown below, that allows you to search across our database of almost 25 million chemical entities based on chemical name (systematic, trivial or trade, registry number, etc.) and retrieve a list of intrinsic properties, a list of predicted properties, a list of associated identifiers, with links to Wikipedia if available, and a Google-based search for the chemical based, for now, on the associated InChIKey. Check it out and give us feedback.
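For the curious, an InChIKey-based Google search amounts to little more than building a query URL. The Python sketch below is an assumption about how such a link could be put together, not a description of the ChemSpider Mobile code; the function name, the quoting of the key and the format check are choices made for this example. The key used is the standard InChIKey for benzene.

```python
import re
from urllib.parse import urlencode

# A standard InChIKey has three upper-case blocks of 14, 10 and 1 letters.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def google_search_url(inchikey: str) -> str:
    """Build a Google search URL for an exact-phrase InChIKey query."""
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    # Quoting the key asks for an exact match (an assumption for this sketch).
    return "https://www.google.com/search?" + urlencode({"q": f'"{inchikey}"'})


url = google_search_url("UHOVQNZJYSORNB-UHFFFAOYSA-N")
```

Searching on the InChIKey rather than a name sidesteps the many synonyms a compound can have, since the key is derived from the structure itself.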

We are also working on providing access to ChemSpider SyntheticPages in the same way and the first screen shot is shown at the bottom. Things are always changing and, I believe, for the better.

[Screenshots: ChemSpider Mobile search interface and first view of ChemSpider SyntheticPages]

The functionality discussed below will be released at the ACS Spring Meeting during the week of March 21st 2010.

The Royal Society of Chemistry has a whole series of databases. None of them have been structure searchable…until now. As with our PubMed integration and our Google Patents integration rolling out shortly, just because a database hasn’t had the chemical structures extracted and indexed doesn’t mean that those resources cannot be made “structure searchable”. It’s not a subtle distinction however, as discussed in the Google Patents blog post. These types of integrations depend on the correct association between chemical names and structures, access to an API allowing facile and flexible searching and, something that is purely serendipitous in nature, the absence of overlaps between chemical names and common language.

We have used the recently announced RSC Publishing beta platform and the API made available to us to enable the searching. As my colleague Graham McCann announced recently “(the) platform gives access to over 500,000 journal articles, book chapters and database records through one simple search interface. The new platform delivers faster browsing, intelligent searching and more intuitive navigation and is open for beta testing now.”

Our approach has been to search the title and the abstract for each of the databases for all of the validated identifiers. It works. It is FAST and it provides “structure-related” access to all six RSC databases. An example screen shot is below where a search on chlorobenzene retrieves data from each of the following databases: Mass Spectrometry Bulletin, Laboratory Hazards Bulletin, Methods in Organic Synthesis, Catalysts and Catalysed Reactions, Natural Product Updates and Analytical Abstracts. The screen shot below shows the Analytical Abstracts records linked by the term chlorobenzene in the title or abstract itself: 284 hits in a fraction of a second. The abstract is linked out to the original article via DOI, where possible.
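The matching step described above can be sketched roughly as follows. This is a simplified Python illustration, not the code behind the RSC platform or its API; the function name and the whole-word matching rule are assumptions made for the example.

```python
import re


def identifiers_in_record(identifiers, title: str, abstract: str):
    """Return the validated identifiers that occur in a title or abstract.

    Matching is case-insensitive and rejects hits embedded in a longer
    word, so "chlorobenzene" does not match inside "dichlorobenzene".
    """
    text = f"{title} {abstract}"
    found = []
    for ident in identifiers:
        pattern = r"(?<!\w)" + re.escape(ident) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(ident)
    return found


names = ["chlorobenzene", "phenyl chloride"]
hit = identifiers_in_record(
    names, "Determination of Chlorobenzene in water", "A GC-MS method."
)
miss = identifiers_in_record(names, "Dichlorobenzene isomers", "")
```

Even with whole-word matching, a name that is also an everyday word would still produce false hits, which is why the absence of overlap between chemical names and common language matters so much for this kind of integration.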

[Screenshot: Analytical Abstracts records retrieved by a search on chlorobenzene]

My personal favorites in the set of databases are the Natural Product Updates (NPU) and the Methods in Organic Synthesis (MOS) databases. The NPU database contains tens of thousands of natural product chemical structures, together with chemical names, references and some physical properties. Rich resources for ChemSpider. MOS includes reaction schemes, titles and bibliographic details. Rich resources to connect to ChemSpider SyntheticPages in the future.

We have only just started to tap into the riches contained within the RSC archive. It’s like stumbling across a roomful of rubies to pick up diamonds. There is content all around us waiting for us to connect. We will connect this up to ChemSpider and make it available. Access to the databases will be shown at the ACS Meeting in San Francisco.

For those of you who have watched the historical development of ChemSpider, you are likely aware of our development of the ChemMantis platform and our use of the system to deliver the Open Access journal “The ChemSpider Journal of Chemistry” (CJOC). Following the acquisition of ChemSpider by the RSC we have been extremely busy migrating ChemSpider onto the RSC infrastructure and working on a whole series of public-facing and internal projects. Simply for lack of time, we haven’t been able to populate CJOC with new articles. That said, we have also been looking to bring more of a focus to both CJOC and ChemMantis.

The majority of interest we were getting for the platform, and the greatest benefits in terms of the semantic markup, were shown for discussions about organic chemistry, specifically the application of organic synthesis procedures. Many of the articles that we posted to CJOC as examples were sourced from the Molbank collection, an excellent Open Access journal focused on the synthesis of chemical compounds. ChemSpider is a database of chemical compounds. When we were developing the data model for ChemSpider we always knew that a time would come when we would need to support chemical reactions. CJOC became the container for those reactions in the initial phase of our work, housing only the textual description of the synthesis, semantically linked out to chemical compounds on ChemSpider, reaction articles on Wikipedia and other related information.

We have decided on a path forward for CJOC from here. That is a re-dedication of the platform to the support of synthesis procedures only. ChemMantis, or a variant of the initial platform, will be the basis of the new ChemSpider Syntheses Database (this is just an interim title for the project for now). We will host a growing collection of synthesis procedures from the community (providing a deposition platform for the community to use). We will source procedures from the RSC electronic supplementary information (ESI) provided for many of the RSC publications. We will work with collaborators, publishers and other reaction database providers to source synthesis procedures from their collections. The full details regarding this project are presently being fleshed out but the extension of ChemSpider to host chemical reactions is underway. We welcome your questions, thoughts and comments.