Since ChemSpider went live in March of this year we have received a lot of feedback and questions regarding our understanding of science, our purpose and our passions. We have an excellent Advisory Group who participate in dialogs and constructive discussions. Much of the feedback we have received has come from one individual, Peter Murray-Rust (PMR).
Before proceeding with this post I want to clarify my perceptions. I believe PMR brings a lot of value to the Chemistry Blogosphere. Over the past decade I have watched Peter’s activities with interest as he has worked with many other evangelists to pursue the cause of ODOSOS (Open Data, Open Source and Open Standards). I will confess to a level of hero-worship over the years. I have enjoyed watching what he was doing to enable the web for chemists. He is prolific…I don’t know where he finds the time to write so much. He travels the world and informs us all of what is going on “out there”. He does a great service. In contrast to these positive traits, which I honor, I am of the opinion that Peter is overly harsh and judgmental in some cases. Often he posts without the necessary research and his perceptions become the “truth”. This is dangerous when he has such a public profile and such influence. For evidence of influence visit the graph here and notice the incredible spike in traffic resulting from his post about the Monkeys at ChemZoo in April of this year. It is unlikely those visitors ever returned to our site or blog to hear our comments. Potential damage was done. This blog post is in regards to his most recent judgments of ChemSpider.
When ChemSpider was set up for the benefit of the chemistry community I had assumed that this humble effort by a small group of dedicated individuals would be welcomed by PMR and other Open Access advocates. In general I believe that’s true. Our actions, policies and status have drawn a significant amount of feedback from PMR on his blog. New feedback was posted late last week and I’ll get to that shortly. As a review, in keeping with the trend being set by Rich Apodaca (1,2,3), I am listing what’s happened to date.
“Constructive Feedback” for Newbies
The Challenge to ChemSpider Chemistry
When Sodium chloride dimers are bad science..but are on NIST Webbook and PubChem
Calcium Carbonate is not soluble and can’t have a logP PLUS Lipinski says Calcium Carbonate CAN have a logP
Prussian Blue on ChemSpider is Terrible…but still as good as Pubchem and Emolecules.
Is Stereochemistry on Taxol important? Should the public data be curated?
ChemSpider VERSUS PubChem or ChemSpider SUPPORTS PubChem
ChemSpider ripped off PubChem…damn them.
ChemSpider and Their Openness and non-Web 2.0
ChemSpider don’t understand what Web 2.0 is.
ChemSpider contribute to the community…and support PubChem
Spectral Data are Declared Open Data
Helping out the community with Web Services
There are a lot more…and so to the latest. I’ll identify the recent post comments in italics.
PMR> Recently the Chemspider company has announced an “Open Chemistry Web” which in my opinion misuses the word “Open”.
Open Chemistry Web is the name of a new blog set up and hosted by Will Griffiths. It’s not ODOSOS. It’s a NAME of a blog. If we are in an environment where the name of a blog cannot include the word “Open” then we are living in sad times. Will’s passion is in text-mining OPEN ACCESS Chemistry Articles..or others if people will allow it. Can he not name his own blog? Hmmm….
PMR> Chemspider.com and its associates are commercial organization which have aggregated a large number of chemical connection tables and have started by calculating their properties and extracting literature references which they make freely accessible but not Open. The freedom is for an unspecified timescale and you cannot download significant amounts of the data and you cannot re-use it without permission.
Yes we are “commercial”. I dealt with this same comment previously. If you have interest in this please browse it. A later post outlines the present status of the project and whether or not it will survive.
Yes, we have aggregated a large number of connection tables and have started by calculating their properties and extracting literature references, which we make freely accessible. We have done a lot more. We have made multiple services available to the community (1,2,3,4) but, to no surprise, have received no acknowledgment.
Regarding “not open”. We are giving away the ChemSpider database to those who ask for it. It will be published in PubChem. We USE Open Source components (1,2,3,4). We have not generated any Open Source components yet and our source code is not Open. We index Open Access articles on ChemRefer. We work with the Open Source data community to help.
Regarding “you cannot download significant amounts of the data and you cannot re-use it without permission”. We are giving away the ChemSpider database to those who ask for it. We do NOT have a server farm to support downloads. The FAQ page says:
“May I download the data and use it in my own database(s)?
You have limited rights in this regard. You can only assemble a database of 5000 structures or less, and their associated properties, from our database without our permission. You can download up to 1000 structures per day from the website. Please contact us at feedbackATchemspiderDOTcom to request an extension outside this constraint. We are willing to provide the ENTIRE database of ChemSpider structures at your request – the file will consist of InChI Strings, InChIKeys and ChemSpider IDs. These constraints are under regular review so please feel free to engage us in conversation.”
PMR>”Initially I was concerned about the complete lack of quality in these calculations and said so – I believe there has been some improvement in quality but I do not check and do not intend to do so. I do not follow Chemspider regularly but they appear to have added the ability for anyone to add annotations and curation. I have serious concerns about the lack of thought given to metadata and I do not expect Chemspider to be able to scale or to compete against modern approaches.”
I acknowledge the judgments and opinions. A question…in terms of online data sources for chemistry I believe that approximately 20 million structures ranks in the top 3. We have about 1500 chemists per day using the site with thousands of transactions including text and structure/substructure searching. Please compare with other services in this domain and, if you do this, provide quantitative information. We welcome any feedback on metadata. We are presently working on RDF’ing ChemSpider thanks to the guidance and support of Egon Willighagen. I have dealt with the metadata discussion previously here and abstracted below.
“Other comments include “I see very little difference between Chemfinder and Chemspider. They are both closed, proprietary, do not expose data, or metadata, or algorithms; have closed code, do not allow downloads or re-use. They lose metadata in their aggregation process. I have nothing personal against Chemspider (or, if they are associated, ACDLabs) – I just think the Web 1.0 model is out of date for chemistry.”
To respond…yes, the code is proprietary and closed..we don’t know of any Open Source code that would quickly search >10 million structures by structure and substructure (that will be covered in a separate blog as I have the utmost respect for the commercial entities that do this well! It’s DIFFICULT.) Oh…but Open Source isn’t part of the Web 2.0 definition. We don’t expose algorithms…correct…many are provided by collaborators and we do not have the right to expose their code. But that isn’t part of Web 2.0 either.
And next…the beloved “metadata” term. What exactly IS metadata? Let’s refer again to our web-friendly Wikipedia regarding metadata. In brief it’s “data about data” and a perfect example is an XML schema vs XML. An XML schema is metadata. According to my interpretation this means InChI and SMILES are not metadata since these data can be interchanged with the structure itself. I may be wrong. The hypothetical entity describing what data can be bound to a structure would be metadata not necessarily data related somehow to the structure, but rather more general data describing the datamodel – for example the source of the data – this IS metadata. ChemSpider doesn’t lose the metadata…we retain the only metadata currently available, the data source, and use it as our link out to the provider. Our primary role again, for now, is to connect silos of information via chemical structures.”
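To make the distinction in the quoted passage concrete, here is a minimal sketch in Python. It is only an illustration of the idea that InChI and SMILES are data (interchangeable with the structure itself) while the data source is metadata used for the link out. The record layout, the field names, the `link_out` helper and the example provider and URL are all assumptions made for this example; they do not describe how ChemSpider actually stores or serves its records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureRecord:
    # Data: interchangeable representations of the structure itself.
    inchi: str
    smiles: str
    # Metadata: information about the record, here the data source.
    source_name: str  # hypothetical field name
    source_url: str   # hypothetical field name


def link_out(record: StructureRecord) -> str:
    """Build a link-out label pointing back to the original provider."""
    return f"{record.source_name}: {record.source_url}"


# Methane, attributed to a made-up provider for illustration only.
record = StructureRecord(
    inchi="InChI=1S/CH4/h1H4",
    smiles="C",
    source_name="ExampleSource",                  # hypothetical provider
    source_url="https://example.org/compound/1",  # hypothetical URL
)

print(link_out(record))
```

Dropping `source_name` and `source_url` would leave the chemistry intact but lose the only way back to the provider, which is the sense in which the source is metadata worth retaining.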
PMR> Chemspider also encourages Uploading Spectra Onto ChemSpider. These spectra by default all belong to Chemspider. They are not Open. If you can convince the world at large to donate IPR to you for free, you deserve some form of congratulations for sheer bravado. Note that even if you upload data and metadata you are not allowed to download it (there is a limit of 100 structures).
Thanks, again, for the judgments. We have been testing out the system with two of our advisory group and myself. Only JC Bradley’s Lab and Bob Lancashire have deposited and with the understanding, I believe, that the data would be “Open”. Since PMR’s blog posts continue to do damage to our reputation we have no choice but to respond. We do this with coding. Within 24 hours of his comments Open Data was declared and spectra can now be downloaded. The intention was always there to do this…we just had higher priorities.
PMR>”We have ca. 250,000 calculations on molecules and 130,000 crystal structures which Chemspider have suggested we upload to them. I’m not yet sure why we should do this.”
Well, if they are Open Data, as marked at the CrystalEye website, and seeing that people would like to access the data via ChemSpider, we should just be able to download them. But we don’t want all the data..we just want the structures and the appropriate URL structure to link back to CrystalEye. This is what we do with all data sources including NMRShiftDB.
PMR>”Chemrefer appears to allow searching of Open chemistry articles by keyword. Unexceptional, but why shouldn’t we simply use Pubchem? AFAIK it will index all these journals.”
PubChem indexes these journals? No, I think it’s PubMed. We’ll check on whether everything ChemRefer indexes is in PubMed. However, what they don’t do, yet, or ever, is connect the chemical names in those journals to chemical structures. That’s what’s been done for patents.
“PMR> The IPR model of Chemspider seems clear. No data, metadata and author contributions are Open. ”
“PMR>That allows them, at some stage in the future to close some or all of the site and to charge for data and services”
The site, as it exists today, is intended to stay free for all. We may, OPENLY acknowledged, open services that are for charge.
“PMR> and – like eMolecules and their tie-up with Wiley (Wiley and eMolecules: unacceptable; an explanation would be welcome) – I predict this will happen within 5 years (unless Chemspider fails to survive in its current form).
I have posted on what I believe is an inappropriate judgment by Peter that the data on Chemgate is extracted from the journals. I put a trackback to Peter’s original post. He never responded. He did comment separately though about busyness and commenting. Unfortunately Wiley and Chemgate now show up again…with no effort to clean up the previous comments and, unfortunately, more incorrect information about ChemSpider.
“PMR> So all the authors who are contributing metadata are, in effect, donating IP to Chemspider. I have no moral objection to this – it just seems retrograde when we have Open collections of molecules such as PubChem and our own crystalEye.”
ChemSpider data will all go onto PubChem shortly. This was announced at the recent PubChem meeting. I have asked PMR to point me to where I can download the CrystalEye collection if it is indeed Open Data.
“PMR>But a number of my friends in the Open Chemistry area are on the Chemspider advisory board, so I must be missing something. Perhaps they can show how donating IP to a commercial closed company advances the cause of Open Chemistry.”
I hope they discuss with you. This group is a powerful team of intellect, capabilities, insight and support. I value the opportunity to work with them.
“PMR> And I applaud Chemspiderman’s efforts to clean up chemistry. Sometimes this gets muddled with the association with a commercial organisation based on possessing chemical IP so sometimes my messages have been less than generous and I apologized.”
Yes, you did. And I accept it willingly. It was very gentlemanly.
“PMR> I am not anti-capitalist – I do not attack companies per se. But I do attack people who use the word “Open” incorrectly and to promote themselves. I have done this when publishers come up with “Open Access” offerings which appear to be less than satisfactory (see “open access products” at Nature obscures the debate, Why Open Access metrics are necessary) and for which the community has to pay. “Open” is now used by commercial organisations in the same way as “healthy” – please feel good about us and our activities as we use the word “Open”. We know it’s meaningless, but it makes us look good. Well, it isn’t meaningless. A number of people are trying carefully to describe what is meant by Open access, Open Data, Open source and Open Services. And when others use it to mean something less, I take exception. If nothing else it makes our job much harder.”
I will comment on this in a couple of later posts. I do not support the “marketing” use of Open and do not believe we are doing so. However, I want to comment more on this, but at a later date. Marketing statements bug me too. You’d think that “…the world’s most comprehensive openly accessible search engine for chemical structures” would be PubChem. But it’s not according to this marketing statement …who is it?
There have been comments about PubChem being the model of Openness. I think the effort is great. FULLY support it. But let’s wake up. If funding ceases then PubChem could go away. The data is Open. The software is NOT. PubChem is built around some home-built services and on top of commercial modules such as CACTVS and OpenEye. I discussed it here and it has not been challenged. Am I wrong?
“PMR>: There is nothing Open about this. Even the blog is not Open (it does not carry a CC licence). The services may be free, and they may be useful, but they are not Open. The text that they index may indeed be Open Access in its own right (and probably is because otherwise the publishers will sue them) but this is no especial credit to Chemrefer. We also index Open resources but we make our results Open. Chemrefer could disappear tomorrow. Only if the data, and the source code are made Openly available under licence can they be called Open.”
There is a CC license on the page. Peter acknowledged this. Who said the services were Open? If we did, point me to it and we will rectify it. I have asked Peter separately whether all articles linked to CrystalEye are Open Access or whether some are included with permission from the publishers. This is very interesting.
This has been a long post. I understand I have likely added fuel to the fire. I have done it in a public way. I judge that ChemSpider is being harmed by the ongoing misinformation. I wish it to stop. What I want is advice and support to make this a better service for our users. However, I refuse to make it my personal mission to satiate PMR’s requests and objectives. ChemSpider is developed for its users and the community in general, NOT for its non-users. PMR is not a user. Not everything has to be Open for it to be of high value. I believe we deliver value.