Some of you might be keeping an eye on our partner blog, the Open Chemistry Web, hosted by Will Griffiths. If you are, then you will be aware that we have reached an agreement with the Royal Society of Chemistry as described here. ChemSpider is all about building community. The chemistry community is not just chemists: it includes publishers, policy-makers, vendors, academics and corporations. Our intention is to co-exist within that community, so we navigate the challenges as best we can, always knowing that we might get the odd slap on the hand. My judgment is that if this happens (and it has), conversation and a modicum of emotional intelligence can keep us in relationship with the community and bring us to mutual agreement. We have already done this in a couple of situations.

With this as a lead-in, we are presently working through a potential three-way relationship issue. I've posted previously asking whether people would be interested in seeing us connect to CrystalEye. The response, both on and off the blog, suggests we should do it, so I initiated a conversation with PMR (Peter Murray-Rust) and have copied the comments below. The list of journals presently indexed is given here. You'll quickly see the issue regarding three-way relationships.

  1. ChemSpiderMan Says:
    October 26th, 2007 at 12:01 am

Peter, I asked previously how to obtain an SDF file of the structures on CrystalEye so that we could link to CrystalEye records via ChemSpider. This was based on my question to the community at

http://www.chemspider.com/blog/?p=191

Your comment was that the data was Open but that an SDF was not available and we should scrape the data. I was looking at this possibility today. I was pleasantly surprised to see a number of the journals listed included ACS journals and Elsevier journals (http://wwmm.ch.cam.ac.uk/crystaleye/summary/index.html). There has been a lot of traffic of late about their Open Access policies but now I see that they are supporting your Open Data efforts. This is excellent. I would like confirmation that they are aware of the Open Data posted from their journals before we scrape them. Are they aware? I want to make sure I am respecting all parties. Thanks

  1. pm286 Says:
    October 26th, 2007 at 7:54 am

(1) All data come from Free sources, i.e. visible without a subscription. Some journals (Acta Crystallographica and RSC, for example) do not copyright the data. Others, like ACS, add copyright notices. It is our contention, and Elsevier has agreed for its own material, that facts are not copyrightable. We have therefore extracted and transformed facts and mounted these. Where the original material (CIF) does not carry copyright we mount it on our pages; where it does, we do not, but we have the transformed data. In those cases it would be possible to recreate the original CIF data in semantic form, but not the exact typographical layout, which contains meaningless whitespace.

I am not aware that ACS or Elsevier have ever made statements of any kind about our Open Data efforts.

You may scrape anything, but you must honour the source and the metadata, and you should add the Open Data sticker. If you scrape the link (simplest), you may simply point to our site. If you scrape more data, you should ensure that the integrity of the data is maintained and that, if it is re-used, the re-used data still clearly shows our metadata.

Our intention is to scrape the InChIs, the title of the article, the journal name, the volume and page details, and the DOI. We will de-duplicate the structures against the database or create new structure records as appropriate. My concern is whether or not the ACS will allow us to scrape their Open Data, so I have put the question directly to them below. I am hoping for an affirmative response, and then I will move on to confirm with the other publishers.
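The scrape-and-merge step described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not ChemSpider's actual pipeline: the `Citation` and `StructureRecord` classes, the `merge_scraped` function and the field names are all hypothetical; structures are matched on an exact InChI string (a production system would more likely normalise to standard InChI or InChIKey first); and the article link is simply formed through the doi.org resolver rather than through a Crossref API call. The DOIs in the usage section are placeholders, not real articles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Citation:
    # Hypothetical bibliographic fields scraped alongside each structure.
    title: str
    journal: str
    volume: str
    pages: str
    doi: str

    def doi_url(self) -> str:
        # Link to the article through the public DOI resolver.
        return f"https://doi.org/{self.doi}"


@dataclass
class StructureRecord:
    # Hypothetical structure record keyed by its InChI string.
    record_id: int
    inchi: str
    citations: list[Citation] = field(default_factory=list)


def merge_scraped(db: dict[str, StructureRecord],
                  scraped: list[tuple[str, Citation]]) -> dict[str, int]:
    """De-duplicate scraped (InChI, citation) pairs into db.

    Assumption: two structures are "the same" when their InChI strings
    are identical. Existing records gain the new citation; unknown
    InChIs get a new record. A citation with a DOI already attached
    to the record is not added twice.
    """
    next_id = max((r.record_id for r in db.values()), default=0) + 1
    counts = {"matched": 0, "created": 0}
    for inchi, citation in scraped:
        record = db.get(inchi)
        if record is None:
            record = StructureRecord(next_id, inchi)
            db[inchi] = record
            next_id += 1
            counts["created"] += 1
        else:
            counts["matched"] += 1
        if citation.doi not in {c.doi for c in record.citations}:
            record.citations.append(citation)
    return counts


if __name__ == "__main__":
    benzene = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
    db: dict[str, StructureRecord] = {}
    scraped = [
        (benzene, Citation("Placeholder A", "Placeholder J.", "1", "1-5",
                           "10.0000/placeholder.a")),
        (benzene, Citation("Placeholder B", "Placeholder J.", "2", "6-9",
                           "10.0000/placeholder.b")),
    ]
    print(merge_scraped(db, scraped))  # {'matched': 1, 'created': 1}
    print([c.doi_url() for c in db[benzene].citations])
```

Keying the merge on InChI keeps the de-duplication deterministic; how aggressively to normalise InChIs before matching is a design decision this sketch deliberately leaves out.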

Colleagues,
I am the host of ChemSpider, an online resource for chemists. www.chemspider.com. For an overview of what we are doing please visit: http://www.chemspider.com/docs/ChemSpider_Overview_SLides_August_2007.pdf

I am presently considering utilizing the data from the CrystalEye online database as I have outlined here: http://www.chemspider.com/blog/?p=191

The CrystalEye database is run from the University of Cambridge by Professor Murray-Rust. I have looked at the sources of data populated on the database and see that there are a number of ACS journals represented there, including JACS. Please see http://wwmm.ch.cam.ac.uk/crystaleye/summary/index.html

I am seeking confirmation that we will not be breaking any copyright if we scrape the data from the CrystalEye database and populate it onto ChemSpider. I have put the question to Peter here: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=737#comment-62799 and he has answered. I am now seeking your confirmation that it is appropriate for me to access the data, since it is marked as Open Data on Peter's site. I welcome your comments. Thank you.

The list of all publishers is given below. If we can deposit the Open Data structures from CrystalEye into ChemSpider and link to the articles using DOI lookup through Crossref, then we will be continuing our project of making articles structure-searchable. Exciting times.


One Response to “Intention to Scrape CrystalEye Content and Staying in Relationship with Publishers”

  1. Antony Williams says:

    Today I was talking to one of the managers at the copyright office of ACS. The conversations continue, and a group has been brought together to discuss this, made up of people from CAS in Ohio and ACS in Washington. I can't believe there will be any outcome other than allowing the scraping of the data into CrystalEye, because if they didn't, the public relations fallout would be a nightmare.
