Archive for the ChemMantis Category

It was a busy week at the ACS meeting in Washington. I gave three presentations and the titles, abstracts and links to Slideshare are given below:

Oops and Downs of Resolving InChIs For the Chemistry Community (Link to Slideshare)

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.

ChemSpider: Building a knowledge-based community for chemists using social and data networking technologies (Link to Slideshare)

In less than 2 years ChemSpider has become one of the primary online resources for chemists providing access to an unsurpassed aggregate of free-access knowledge and data. ChemSpider was developed with the intention of providing a structure centric community for chemists that would be enhanced by data depositions, curations and annotations by the community. The system presently hosts over 21.5 million chemical compounds from over 200 data sources. Working with a network of advisors, collaborators and data providers ChemSpider has created a unique resource of integrated information for chemists. These efforts have enabled us to support the curation of the Wikipedia chemistry pages, the production of a community supported Open Access chemistry journal and provision of web services integrated to spectrometer systems distributed around the world. This talk will provide an overview of how ChemSpider utilized social and data networking to create a community for chemistry.

Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources (Link to Slideshare)

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry known as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked-up and exposed to the public within a few minutes in many cases, making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.


Nature have released their Nature Chemistry journal and in their press release they commented on some of the resources they are linking out to.

“…Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Users can choose to view the article with all of the compounds highlighted, and find out more about those compounds by linking out to other information resources including PubChem and ChemSpider.”

It’s great to be seen as a database of value to be linked to! Neil Withers highlighted it in his blogpost too.

Egon has already given a good overview of the markup and semantic nature of the articles so I won’t repeat that. Egon did comment in his post: “Like many other chemistry journals, Nature Chemistry does not consider properties of the molecule interesting, and NMR spectra are hidden in the Supplementary Information. This paper in particular, disregards a lot of machine readable facts by putting all experimental section bits in a PDF document. So, the next challenge for Nature Chemistry will be to get the authors of papers contribute the original spectra (JCAMP-DX, CMLSpect, etc) in the supplementary information section. Better, have the raw data or even the NMR peak-atom annotations deposited in public repositories such ”

We have done this on the ChemSpider Journal of Chemistry already for this example.


The Journal of Cheminformatics is now officially online, just in time for the ACS meeting, with the first three articles. One of them is an article about Computer Assisted Structure Elucidation written in collaboration with my friends (and ex-colleagues) at ACD/Labs.

In the Editorial Comment David Wild writes about “Grand challenges for cheminformatics” and comments:

“We have seen huge leaps forward in the provision of freely accessible chemical databases such as PubChem [3] and ChemSpider [4]. A wealth of information is buried in these databases as well as many other related sources.”

To be listed alongside PubChem is very humbling. PubChem is an incredible contribution to the community and we wish to deliver similar, though additional, contributions.

One of the papers is a commentary

Chemistry publication – making the revolution
Steven M Bachrach
Journal of Cheminformatics 2009, 1:2

Steven touches my heart with his two references to our works

ChemSpider “Publishers should welcome the additional supporting materials and make then available under the Open Data principle. Furthermore, publishers should encourage the mining of these supporting materials, like the CrystalEye [42] and ChemSpider [43] projects.”

ChemMantis “There are two fledgling experiments that attempt to put into place some of these enhanced-publication/datument ideas: Project Prospect [51] from the Royal Chemical Society and ChemMantis from the ChemSpider [43] group.”

And he states… “We need to encourage these projects and the development of more tools. We need to encourage our colleagues to adopt a new model for publication. We can revolutionize how we perform our science. This is the real hope of the internet for scientists.”

Amen Steve…Amen.

We continue to expand the ChemSpider Database with new depositions sourced from various collaborators. We are especially privileged to have received the RSC’s structure collection associated with their Project Prospect articles and have spent a couple of weeks working with the data prior to depositing onto ChemSpider. During the deposition process we have formed the link between the chemical structures and their articles via a DOI link. We have been able to deposit the title, an associated author and the DOI. In this way we have been able to link thousands of chemical structures to articles on the RSC website. On each record associated with an RSC article you will see both a link from the data source table and a link via DOI from the reference as shown here and in the figure below.
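As a rough illustration of the kind of record this deposition produces, the sketch below pairs a structure identifier with an article title, author and DOI, and builds the resolver link from the DOI. The class name, field names and sample values are all hypothetical and are not ChemSpider’s actual schema; the only outside assumption is that a DOI resolves through the https://doi.org/ prefix.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class ArticleLink:
    """Hypothetical record linking one deposited structure to one article."""
    structure_id: str  # placeholder identifier, not a real ChemSpider ID
    title: str
    author: str
    doi: str

    def doi_url(self) -> str:
        # DOIs resolve through the doi.org proxy; keep "/" unescaped.
        return "https://doi.org/" + quote(self.doi, safe="/")


# Placeholder values for illustration only.
link = ArticleLink(
    structure_id="example-structure-1",
    title="Example article title",
    author="A. N. Author",
    doi="10.1234/example.5678",
)
print(link.doi_url())
```

Keeping the DOI itself (rather than a full URL) in the record means the link can be rebuilt if the resolver address ever changes.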

With the RSC depositions came many beautiful structures – highly symmetric, complex and just plain “pretty” to a chemist. But a high level of complexity also arrived with the collection. While many InChIs could be converted to their associated connection tables, the act of converting the InChIs could add additional stereochemistry, and structure cleaning could change stereochemistry, so this was a long, tedious and mostly manual process I’m afraid. Nevertheless, it is a wonderful addition to the ChemSpider database and our sincere thanks, on behalf of the community too, go to the Royal Society of Chemistry for sharing their data with us. The InChIs will be deposited into the InChI Resolver shortly.


There is a new simple visual aid to help print articles directly from the ChemSpider Journal of Chemistry. I had blogged previously about how to do this using the browser based menu but now you can simply click the little Print Button on the floating applet and get the same results…just more obvious.


I gave my talk yesterday at CShals 2009, the conference on Semantics in Healthcare and Life Sciences. It was a great meeting for me (hindered by dismal access to wireless internet as a result of Marriott’s desire to make more money from the conference organizers. They should be ashamed of themselves in this day and age!) as it was not about Chemistry, not about spectroscopy, not even about Open Data, Open Access and Open Source. It was about Semantics. I learned a lot and got to hear Tim Berners-Lee talk about where the semantic web is, where it can go and how it can be disruptive in a good way while NOT being too disruptive to layer onto what already exists. The best part of the meeting for me was the clear passion for the InChI, as well as a lot of acknowledgement that it is not perfect and cannot presently compete with molfiles, commercial systems, CAS Numbers and so on. But people are optimistic, waiting and supportive. Overnight I inserted a lot more information about InChIs and how they can be useful, where some of the limitations are presently, and how the StdInChI has now added a new level of complexity on one hand and simplification on the other. There have already been a number of requests for a copy of the talk so it is up on Slideshare for now (and linked below). I’ll do a voice over in the next few days and upload to Scivee. I unveiled the first version of the InChI Resolver at the conference and showed it to a couple of people. The general consensus is we are heading in the right direction. The timing on this conference was good because the intention is to layer on RDF before we release at the ACS, time allowing.


We have continued to extend the capabilities of document markup on ChemMantis. For the floating Article Markup widget, where it is possible to switch various entities on and off in the display, there is a new tab entitled “entities”. In this tab we gather together all extracted entities under the specific entity family and display them as a list for fast review.
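A minimal sketch of what gathering entities under their family might look like is shown here. It is not the ChemMantis implementation: the (family, text) pair format, the family names and the function name are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical (family, text) pairs as an extractor might emit them.
extracted = [
    ("chemical name", "methanol"),
    ("element", "carbon"),
    ("chemical name", "acetonitrile"),
    ("element", "carbon"),
    ("species", "Zymomonas mobilis"),
    ("chemical name", "methanol"),
]


def group_by_family(entities):
    """Return {family: sorted list of unique entity strings}."""
    groups = defaultdict(set)
    for family, text in entities:
        groups[family].add(text)
    return {family: sorted(texts, key=str.lower) for family, texts in groups.items()}


entity_tab = group_by_family(extracted)
for family, names in entity_tab.items():
    print(f"{family}: {', '.join(names)}")
```

Collapsing repeats into a set keeps the list short enough for fast review, which is the point of the tab.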

Where could we go from here? Some potential ideas….

1) provide the ability to switch on/off each individual entity in the paper.

2) select a name and highlight it across the paper

3) export the list(s) to text files

4) link each of the names out to ChemSpider/Wikipedia/others

What else…we welcome your suggestions.

The ChemSpider Journal of Chemistry is formally released this evening with eight articles. These articles are all marked up using ChemZoo’s ChemMantis technology. The articles are sourced, with permission, from two Open Access journals – MDPI’s Molbank and Chemistry Central. We also have original articles sourced from a number of contributors.

The eight articles are provided at www.chemmantis.com or at the alternative redirect from www.chemspider.com/journal.

We welcome your feedback. The authors of the articles welcome your feedback. At the bottom of each article is a “post new comment” button. Please use it to provide us with your thoughts.

This is a short post. It’s been a long few days getting the ChemSpider Journal of Chemistry online. There have been family emergencies, technical challenges and the general distractions of ChemSpider’s growing attention. But, the journal has been rolled out with its first collection of marked up articles.

Markup has been performed using our ChemMantis markup platform. ChemMantis stands for Chemistry Markup And Nomenclature Transformation Integrated System. Markup is fully automated and then manual curation of the markup is performed. In general markup is taking less than 60 seconds and curation of course depends on the complexity of the content.

Over the next few days I will provide more detail about the platform and we will roll out a stack of articles. For now there is a teaser article here. This article is taken from Chemistry Central, with their permission, and my thanks to Jan Kuras for his support. This article is of the “Christmas Tree Light” type of markup where we have chemical names, species, elements, chemical reactions, chemical groups, hardware vendors and software vendors all highlighted! Whoa…TOO colorful you say? Simply switch off the “offending” entity classes using the checkboxes as shown in the image below (please click on the thumbnail to see the full effect)!

We are continuing to welcome submissions for the ChemSpider Journal of Chemistry. Please contact us at infoATchemspiderDOTcom if you want to submit an article to the journal.


We are working hard to prepare the ChemSpider Journal of Chemistry for prime time and as a result the ChemMantis service will be disrupted from time to time and will go offline. We are choosing to do this over the holiday season while the majority of you are enjoying the festivities. We are hoping that our work on ChemMantis doesn’t disturb those of you who have been reviewing our marked up articles. Please bear with us through the holiday season as we upgrade the system to support the journal.

This is a short announcement to inform users that anyone looking at ChemMantis articles at present will see issues with structures linked to chemical names. We are reworking structure image generation and are debugging some of the structure display issues. We are doing this live on the production system for a number of reasons including gathering feedback from certain collaborators.

We have also been expanding our dictionaries for mark-up on ChemMantis. The present list is shown below – we have been working on hardware vendors, software vendors and chemical vendors recently and have introduced the promised separation of genes, proteins and enzymes from the species list.

I have been giving a lot of presentations of late regarding ChemSpider, ChemMantis, chemistry document markup and the challenges to publishers. These have been both closed door presentations where people are seeking input regarding the business challenges for chemistry publishers as well as in more open forums. One of the more common questions that is coming up now is around ChemSpider and ChemMantis. How are they related and how are they different? I’d like to declare that here…

ChemSpider is a website providing access to a database of structure-based content. It is also a “linkbase” providing a way to navigate from structure-based records out to a multitude of resources with information about the chemical entities on ChemSpider. It is also a platform for the deposition of new content, the annotation and curation of existing content and access to a series of services for the prediction of properties and integration to other resources. The value of ChemSpider is, in many ways, dramatically reduced without the content.

ChemMantis is a platform for document markup, specifically focused on identifying chemistry related terms in various documents. At present we have algorithms and dictionaries for extraction of chemical names (trivial, trade and systematic), chemical groups, reactions and chemical families. We are also working on dictionaries for something we are loosely terming “species” – at present this includes bacteria, fungi, etc. These will be segregated appropriately in the very near future.

Following the extraction of these various entities we are connecting them out to allow searching of resources such as Wikipedia, ChemSpider, NCBI’s Entrez and Google. ChemMantis does NOT depend on ChemSpider but can make use of what is available on ChemSpider to the benefit of the user. ChemMantis will be a “product” in the future. It is something that can be installed inside an organization and used for document markup and indexing of chemistry related documents. It will also serve as the basis of our ChemSpider Journal. More detail to follow on that….


We are adding additional dictionaries to ChemMantis to support linking to external information. Wikipedia is a rich source of information for chemists and we have chosen to connect out to Wikipedia for details about named reactions. Now, when a person marks up a document and highlights a particular named reaction then the link to Wikipedia is used to populate the information balloon on the article. An example is shown on this article on ChemMantis and the balloon is shown below for the Knoevenagel condensation.

As we work on ChemMantis it is clear that we want to expand the integration out to external sources of information as much as possible rather than limit the connectivities to the ChemSpider platform. We have started to build the necessary dictionaries to support bacteria, fungi, viruses etc so it makes sense to connect these up to external resources. As a proof of concept we are using Wikipedia sources to directly feed the “Species Balloons” and have enabled searching of Wikipedia, Google and Entrez directly from the balloon. As an example of the integration we see below the species balloon filled with the lead of the article from Wikipedia for Zymomonas mobilis (click on the thumbnail).

 

From the balloon it is possible to search across Entrez, Google and directly into Wikipedia for more information. For this particular bacterium Entrez gives a list of results as shown below (click on the thumbnail). We are using a similar approach with elements now. Rather than show a “bare element” in a structure balloon (who needs to see Li for Lithium?) we will display the lead text from Wikipedia for that element. The near future will likely see us link to Uniprot and PDB for proteins and out to similar rich sources for other species.

Everything I have reported about ChemMantis to date has emphasized that it only works in Internet Explorer. However, thanks to a comment from Soaring Bear, a member of our Advisory Group, I’m now looking at documents marked up with ChemMantis using the “IE Tab Add-on”. Details can be found here. This is an interim solution until we have direct support in Firefox.

We are progressing quite well with our development of ChemMantis, the document markup system for Chemistry-related documents. One of the problems with marking up various types of chemical entity is how “colorful” documents can become as you mark them up. Some examples of markup we are considering are shown in the image. Not all of the supporting dictionaries are in place yet but they are under development.

If you consider a standard chemistry document with the number of chemicals, reactions, elements and groups that can show up in just a paragraph you can end up with the “Christmas Tree effect”. An example of the effect is shown below. There are actually only three colors/effects shown – mark up of chemical names, mark up of elements and mark up of chemical names with no associated structure…in this case Bis-pi-allylnickel.

 

So, all elements are marked separately and, using the check box capability in the floating window, can be switched on and off. Believe me, you need this. The words carbon, oxygen, nitrogen and hydrogen, for example, will clearly show up a lot in chemistry articles (hydrogen bond, carbon-carbon bond etc). So, there are many cases where you would just want to switch the element view off. One check box and it’s done. The same is true for chemical names, species and, shortly, chemical groups and reaction types. We are also working on splitting out bacteria, fungi, viruses etc.
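To make the check box idea concrete, here is a small sketch of switching entity classes on and off at display time. It is only an illustration under stated assumptions: the span format (start, end, class), the bracket notation standing in for highlighting and the function name are all hypothetical, and the spans are assumed not to overlap.

```python
# Hypothetical annotations: (start, end, entity_class) spans over the text.
text = "The carbon-carbon bond in methanol was cleaved."
annotations = [
    (4, 10, "element"),         # "carbon"
    (11, 17, "element"),        # "carbon"
    (26, 34, "chemical name"),  # "methanol"
]


def render(text, annotations, enabled):
    """Wrap spans whose class is enabled in [class: ...] markers.

    Assumes the spans do not overlap.
    """
    out, cursor = [], 0
    for start, end, cls in sorted(annotations):
        out.append(text[cursor:start])
        span = text[start:end]
        out.append(f"[{cls}: {span}]" if cls in enabled else span)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


all_on = render(text, annotations, {"element", "chemical name"})
elements_off = render(text, annotations, {"chemical name"})
print(all_on)
print(elements_off)
```

The annotations themselves are never discarded; unticking a box only changes what gets rendered, so a class can be switched back on without re-running the markup.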

If you want to see where we are at present I encourage you to look at this IUPAC Pure and Applied Chemistry article entitled: Carbon-Carbon Bond Forming Reactions Using Alkyl Fluorides. The markup does not yet work in any browser other than Internet Explorer so try it there. Hover over the chemical names and you should see the chemical balloon show up as shown below. In the next few days we hope to roll out the connections out to related data from this structure balloon. Watch this space.

I’ve posted a new presentation regarding ChemSpider/ChemMantis on Slideshare. The first part is the usual ChemSpider intro stuff. For ChemMantis start at slide 39…

Note that we’ve now started expanding the handling of “Species” by adding specific dictionaries. We’ll be adding support for fungi, bacteria, viruses etc. See slide 75 for a screenshot.

I previously posted a YouTube video of ChemMantis, our chemistry document markup system, in action. While it is indicative of how the system works, the detail is lost in the resolution of the video and there have been a number of requests for a higher resolution version. I’ve created a copy of the movie in Quicktime format and it can be downloaded from Mediafire here.

All feedback welcomed…

For those of you performing curation activities on ChemSpider you will likely have noticed the ability to mark a new type of identifier, a shorthand formula. We have enabled this because it has become clear that this could be a useful part of document markup as part of our ChemMantis system. For example, looking at an article let’s consider the excerpt shown below.

Regarding the excerpt you can see a number of highlighted terms, all being shorthand formulae and not depending on name to structure conversion algorithms but rather depending on a lookup dictionary. Each of these names is linked to ChemSpider for direct look up of information associated with the chemicals. The list of shorthand formulae extracted from a couple of hundred articles is actually only a couple of hundred formulae at present. It includes the most obvious compounds that we can all interpret: CH3OH, MeOH, CH3CN, MeCN, CH3COOH, NaCl, NaF, NaCN, KBr, KCl and so on. All of these are immediately interpretable by chemists. There are likely a few more to be found over the coming months but in the past week of reviewing articles from various sources we have actually only added a couple of new formulae. We have also seen value in linking up ions and elements as appropriate. We are likely to add filters for display/not display of elements and ions since we’re of the opinion that displaying every instance of an element in an article is of little value…just imagine how many times you might see the word carbon or hydrogen in an article… carbon-carbon bonds, hydrogen bonding etc. So, we’re switching them off by default. We’ll keep reporting on how we are improving ChemMantis…based on the review of a stack of articles the system has improved dramatically. We are asking for your articles now…combining shorthand formulae and chemical name markup will highlight a document as shown below.
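The dictionary lookup described above can be sketched in a few lines. This is a toy illustration rather than the ChemMantis code: the dictionary holds only a handful of the formulae listed in the post, and the tokenizer, the element word list and the function name are assumptions. Elements are off by default, matching the choice described above.

```python
import re

# Tiny illustrative lookup dictionary; the post mentions a couple of hundred entries.
SHORTHAND = {
    "CH3OH": "methanol",
    "MeOH": "methanol",
    "CH3CN": "acetonitrile",
    "MeCN": "acetonitrile",
    "CH3COOH": "acetic acid",
    "NaCl": "sodium chloride",
    "KBr": "potassium bromide",
}
# Assumed short list of element words, for illustration only.
ELEMENT_WORDS = {"carbon", "hydrogen", "oxygen", "nitrogen"}

# Simple assumed tokenizer: a letter followed by letters or digits.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def mark_up(text, show_elements=False):
    """Return (matched text, kind, name) tuples found by dictionary lookup."""
    hits = []
    for match in TOKEN.finditer(text):
        token = match.group()
        if token in SHORTHAND:
            hits.append((token, "shorthand formula", SHORTHAND[token]))
        elif show_elements and token.lower() in ELEMENT_WORDS:
            hits.append((token, "element", token.lower()))
    return hits


sample = "The salt was washed with MeOH and CH3CN; hydrogen bonding to NaCl was observed."
default_hits = mark_up(sample)
with_elements = mark_up(sample, show_elements=True)
print(default_hits)
print(with_elements)
```

Because the lookup is an exact, case-sensitive match, it never has to guess at a structure the way a name to structure conversion would; anything not in the dictionary is simply left unmarked.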

ChemMantis is now in alpha release and under test. ChemMantis is our Chemistry Markup And Nomenclature Transformation Integrated System. The movie below can likely tell a better story than I can write. So, let’s start with this movie…and more will follow. The premise is: upload a document, find chemical names, convert names/identifiers to chemical structures and find related information. In this case we are demonstrating how structures are linked to information on ChemSpider and from there out to other information on the web. There are more such displays to come….