The RSC’s free chemical database ChemSpider has added RDF functionality to its interface, in collaboration with the University of Southampton’s School of Chemistry. Publishing records as RDF allows them to be found and understood by semantic web tools, and is another step in ChemSpider’s mission to create a public chemical information infrastructure.

Richard Kidd, Informatics Manager at the RSC, says: “We are delighted to work with top academic teams pushing forward what’s possible with semantic chemistry, and we hope others will use the RDF representation of ChemSpider to support their own developments.”

ChemSpider as a Linked Data source for oreChem

The machine-processable representation was developed specifically to leverage the core competencies of the ChemSpider database: resolvable identifiers, high-quality curated metadata, and rich linking to the extensive RSC corpus. In addition, as part of the Microsoft Research-funded oreChem project, OAI-ORE technology is being used to facilitate the discovery and re-use of the chemical information in its correct context.

Prof Jeremy Frey and Dr Simon Coles commented: “It is a pleasure for Southampton to work with the RSC’s ChemSpider as a culmination of our contribution to the Microsoft-funded oreChem project. This work forms the core of the PhD thesis of Mark Borkum, a graduate student in the Southampton Chemistry eResearch team.”

“Enabling open, semantic chemistry in this way is a monumental step forward for the domain,” notes Lee Dirks, director of Education & Scholarly Communication for Microsoft Research. “We’re thrilled to have played a role in facilitating the creation of this resource and extremely pleased to see Southampton and the RSC innovating and leading the field.”

Another oreChem participant, Carl Lagoze, Associate Professor of Information Science at Cornell University and Co-Director of the Open Archives Initiative, added: “It’s wonderful to see the results of our work on OAI-ORE in this exciting application. It fulfils our goal of making the results of research easier to disseminate and reuse.”

ChemSpider is a free chemical structure database providing fast access to over 25 million structures, properties and associated information. By integrating and linking compounds from more than 400 data sources, ChemSpider gives researchers the most comprehensive view of freely available chemical data from a single online search. For more information, please visit www.chemspider.com.

The Southampton work builds on the RCUK- and EPSRC-funded e-Science CombeChem and Platform projects (GR/67729, EP/C008863, EP/G026238, EP/F05811X) and on JISC Data Management projects.

About RSC Publishing
The RSC is the largest organisation in Europe for advancing the chemical sciences. Supported by a worldwide network of members and an international publishing business, our activities span education, conferences, science policy and the promotion of chemistry to the public. www.rsc.org

About the University of Southampton
The University of Southampton is a leading UK teaching and research institution with a global reputation for leading-edge research and scholarship across a wide range of subjects in engineering, science, social sciences, health and humanities.

UoS Chemistry
The University of Southampton is one of the best places in the UK for teaching and research programmes in Chemistry. The research covers a broad range, from the synthesis of novel molecules and materials to in-depth studies of chemical reactions and processes, and the modelling of chemical systems. There is a strong emphasis on interdisciplinary collaboration, and we support other institutions across the UK by providing unique services such as the national X-ray crystallography service. www.soton.ac.uk/chemistry.

The oreChem Project
The oreChem project (http://research.microsoft.com/en-us/projects/orechem/) integrates chemistry scholarship with the Semantic Web. It involves groups from the Universities of Southampton, Cornell, Cambridge, Penn State and Indiana, with funding from Microsoft Research (MSR).

About Microsoft Research
Founded in 1991, Microsoft Research is dedicated to conducting both basic and applied research in computer science and software engineering. Researchers focus on more than 55 areas of computing and collaborate with leading academic, government and industry researchers to advance the state of the art. Microsoft Research has expanded over the years to eight locations worldwide and a number of collaborative projects that bring together the best minds in computer science to advance a research agenda based on their unique talents and interests. Microsoft Research collaborates openly with colleges and universities worldwide to enhance the teaching and learning experience, inspire technological innovation, and broadly advance the field of computer science. More information can be found at http://www.research.microsoft.com

About Microsoft
Founded in 1975, Microsoft (Nasdaq “MSFT”) is the worldwide leader in software, services and solutions that help people and businesses realize their full potential.


7 Responses to “RSC and Southampton drive the chemical semantic web”

  1. Egon Willighagen says:

    Hi all, congrats on this really interesting collaboration! I am intrigued by the ‘drive the chemical semantic web’ claim, and am most interested in learning how the RSC and Southampton ‘drive’ this community. Can you elaborate on how this effort increases the momentum of chemistry in RDF? Particular problems the community faces are non-open data, unclear licensing, lack of downloads, etc. As such, I am wondering how you integrated 400 resources, and how you have overcome these problems. I am also most interested in learning how this collaboration will work with the rest of the community, such as Bio2RDF, Chem2Bio2RDF, LODD, and others. I know press releases are not the place for details, nuances, etc., but perhaps add those as a reply to this comment? (Did I say yet that I hate press releases in a scientific context?)

  2. Richard says:

    Hello Egon! These are pretty much rhetorical questions, right? We’re trying to build practical solutions a step at a time – now we have something to build on. We wouldn’t have made it available if we didn’t want it to be freely used, and as a basis for further expansion and collaboration.

  3. Richard says:

    Missed out examples earlier…

    Using the RDF Permalink via ChemSpiderID (CSID)

    For example:

    http://www.chemspider.com/Chemical-Structure.7787.rdf
    Or:
    http://www.chemspider.com/rdf.ashx?q=7787
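The two permalink forms above differ only in where the CSID sits in the URL. A minimal Python sketch, assuming nothing beyond the URL patterns quoted in this comment (the function name `rdf_permalinks` is hypothetical, and these URLs date from this post, so they may no longer resolve), builds both from a CSID:

```python
CHEMSPIDER_BASE = "http://www.chemspider.com"


def rdf_permalinks(csid: int) -> list[str]:
    """Return both RDF permalink forms for a ChemSpider ID (CSID).

    Hypothetical helper: the URL patterns are copied from the examples above.
    """
    if csid <= 0:
        raise ValueError("a CSID is a positive integer")
    return [
        f"{CHEMSPIDER_BASE}/Chemical-Structure.{csid}.rdf",  # file-style permalink
        f"{CHEMSPIDER_BASE}/rdf.ashx?q={csid}",              # query-style permalink
    ]


print(rdf_permalinks(7787))
```

Either URL is expected to return RDF for a single record, because a CSID identifies exactly one compound.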

    Using a Search Term to access the Atom feed

    For example:

    http://www.chemspider.com/rdf.ashx?q=cyclohexane
    Or:
    http://rdf.chemspider.com/cyclohexane

    Note: when searching with a term other than a ChemSpider ID (CSID), an Atom feed is returned, because multiple results must be handled. The Atom feed contains links to the individual RDF permalinks as well as a human-readable XHTML summary fragment.
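The search route can be sketched with Python's standard library alone. This is a hypothetical illustration: it assumes the two search URL patterns above and that the term should be URL-encoded, the function names are invented, and the sample feed is a hand-written placeholder rather than real ChemSpider output. Filtering on `.rdf` is likewise an assumption about how the permalinks in the feed look.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace


def search_urls(term: str) -> list[str]:
    """Both search-URL forms from the examples above, with the term URL-encoded."""
    return [
        "http://www.chemspider.com/rdf.ashx?" + urlencode({"q": term}),
        "http://rdf.chemspider.com/" + quote(term),
    ]


def rdf_links_from_atom(feed_xml: str) -> list[str]:
    """Collect link hrefs ending in '.rdf' from every Atom <entry>.

    Assumption: RDF permalinks follow the Chemical-Structure.<CSID>.rdf pattern.
    """
    root = ET.fromstring(feed_xml)
    links = []
    for entry in root.iter(f"{ATOM}entry"):
        for link in entry.iter(f"{ATOM}link"):
            href = link.get("href", "")
            if href.endswith(".rdf"):
                links.append(href)
    return links


# Hypothetical feed for illustration only; NOT actual ChemSpider output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Search results</title>
  <entry>
    <title>Example entry</title>
    <link rel="alternate" type="application/rdf+xml"
          href="http://www.chemspider.com/Chemical-Structure.7787.rdf"/>
    <link rel="related" type="text/html"
          href="http://www.chemspider.com/Chemical-Structure.7787.html"/>
  </entry>
</feed>"""

print(search_urls("cyclohexane"))
print(rdf_links_from_atom(SAMPLE_FEED))
```

A client would then fetch each extracted permalink to get the RDF for the individual records behind the search results.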

  4. Egon Willighagen says:

    The thing with a semantic web is that there needs to be a web. In and out. Out is easier, though ChemSpider is not linking out to other resources yet. Just linking out via the InChI to http://rdf.openmolecules.net/ would be a start. In requires ChemSpider to make a table with two columns available for download (under CC0): column 1 containing the InChI, column 2 the ChemSpider ID. That way, other databases can point to the ChemSpider RDF. Until at least one of these is done, ChemSpider cannot be seen as part of the semantic web. And both are easy to do.
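A two-column table like the one proposed here is simple to consume. The Python sketch below is hypothetical: it assumes a tab-separated file with an InChI in column 1 and a CSID in column 2, uses invented placeholder rows rather than real data, and reuses the permalink pattern from Richard’s earlier comment to turn a local lookup into a link to the ChemSpider RDF.

```python
import csv
import io
from typing import Optional


def load_inchi_index(tsv_text: str) -> dict[str, int]:
    """Parse the proposed two-column table: InChI <TAB> ChemSpider ID."""
    index: dict[str, int] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        inchi, csid = row
        index[inchi.strip()] = int(csid)
    return index


def chemspider_rdf_for(inchi: str, index: dict[str, int]) -> Optional[str]:
    """Return the RDF permalink for an InChI, or None if it is not in the table."""
    csid = index.get(inchi)
    if csid is None:
        return None
    return f"http://www.chemspider.com/Chemical-Structure.{csid}.rdf"


# Placeholder rows for illustration only; these are not real InChI/CSID pairs.
SAMPLE_TSV = "InChI=1S/placeholder-A\t1001\nInChI=1S/placeholder-B\t1002\n"

index = load_inchi_index(SAMPLE_TSV)
print(chemspider_rdf_for("InChI=1S/placeholder-B", index))
```

Each lookup is a local dictionary access, so linking a whole database this way needs no per-compound web-service calls, which is the trade-off debated in the replies that follow.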

  5. Richard says:

    In a connected world where data changes, I don’t get the stipulation to provide a download in full. And it’s making it difficult to apply more clarity about the licensing.

    If you want the right answer, now, why is the web service not the best answer rather than a download that’s out of date as soon as it’s generated?

    InChIToCSID

    InChIKeyToCSID

  6. Egon Willighagen says:

    “In a connected world where data changes, I don’t get the stipulation to provide a download in full.”

    Can you elaborate? The need for such an index is clear to me, but I can get together a group of people who are interested in this index. Also, I am not sure what you mean by a ‘download in full’. A full download of ChemSpider is not being requested.

    “And it’s making it difficult to apply more clarity about the licensing.”

    Do you mean that people that provided ChemSpider with data claim copyright on the chemical structures (InChIs) they provided? Or, how is the licensing unclear? Isn’t ChemSpider/RSC sole owner of their CSID identifiers?

    “If you want the right answer, now, why is the web service not the best answer rather than a download that’s out of date as soon as it’s generated?”

    Because it does not make sense to query a web interface iteratively for a set of compounds in some database. That would needlessly stress the ChemSpider web services.

  7. Richard says:

    It’s not a copyright issue. We measure the value of providing ChemSpider by it being used – while we’ve always been prepared to discuss this with individual collaborators, there are risks to us of providing a public data download as well as benefits.

    And I still have to be convinced that the semantic web should depend on static downloads of fluid database resources being shipped around. Am happy to discuss further but might be better done offline.
