Last a week I had a pleasant chat with a reporter from Nature magazine, a Mr Geoff Brumfiel. Geoff was interested in ChemSpider…what it was, how it ran, who used it, who supported it, who liked it, who curated it, who didn’t like it and so on.
The results of that discussion, and others he spoke to about ChemSpider, are here in his article.
Chemists spin a web of data p139
Chemspider website provides free information on millions of molecules.
Full Text | PDF
It is a rule at Nature, at least for this type of article, that I could not see the article before it went to press and therefore I didn’t get the chance to proofread and comment. Geoff has accurately captured the spirit of our discussions but a few detailed clarifications are needed too. I have pasted in black the article content and in italics the clarification.
“providing the community with an open-access source of chemical information”
I giggled and commented please don’t say it’s Open Access. Say it’s Free Access. Say there are Open Data. And now we have Creative Commons licenses. But don’t say it’s Open Access, not Strong, not weak, not gold, not green. Just Free Access. No price barriers to usage.
“Chemist Antony Williams is hoping to change this in a move likely to ruffle the feathers of the American Chemical Society.”
I commented that we are not purposely in competition with anyone. It’s not what drives us to do this. Whether others see us to be competitive is for them not us. We don’t intentionally try to ruffle feathers. It doesn’t mean that what we are doing won’t ruffle feathers of course. Whether it’s ACS or others. It’s not the goal..it might be an outcome.
“The modest project has made chemists interested in open access take notice — last week, the number of daily users of the site surpassed 5,000.”
We have crossed 5500 users for the past two nights. The trend is positive.
“Other potential sources of information, such as Wikipedia, lack the algorithms needed to search chemicals according to their structure. “
Structure searching is “feasible” of course with InChI Strings. But substructure isn’t and Wikipedia is treated as a text-based search by almost all of its users
“The site is maintained with modest profits from advertising and the work of about 30 active volunteers who double-check the data pulled in from outside.”
The original investment in hardware and software costs has finally been recouped. Modest profits? No one gets paid for the work we do. There is a phenomenal sweat equity investment in the platform numbering many thousands of hours to get here. We are indebted to the many software collaborators, providers of tools and the people curating and depositing to the system. There have BEEN about 30 active volunteers. RIght now I would say the number of active depositors and curators is around 10. But it is growing. I hadn’t checked the number of REGISTERED users for a long time. We have over 1150 registered users…those who CAN login and curate data, deposit data, see new features etc. People do NOT have to register to use the site…but >1150 did. Wow. I didn’t know it was that many until i just checked (BIG SMILE)
““There’s an awful lot of chemical information, but there’s an awful lot of rubbish as well,” says Barrie Walker, a retired industrial chemist in Yorkshire, UK, who helps maintain the site.”
Don’t know whether Barrie said this or not. He IS an honest guy and he is our QUALITY GURU and we are proud that he is willing to give us his fine eyes. There IS garbage on the site still. But, after a year online and active curating it has been much reduced. About 200 edits a day are made to the site: names changed/deleted/added, spectra/structures/URLs/Publications added etc. It’s quite the pace. We have cleaned up 100s of thousands of incorrect associations from the external data sources. It’s been and will remain an enormous task with an enormous payback for the community
“Williams adds that the site still has problems with certain searches. For example, it struggles to distinguish between isomers: molecules with the same chemical formula arranged in different structures.”
We can distinguish isomers no problem. The PROBLEM is that there is a mixture of isomeric species submitted from multiple data sources and data are mixed and intermingled in way that the user cannot get to the correct structure. Search taxol or Ginkgolide on the ChemSpider blog and read the mutliple blog posts about this. We can of course search all isomers for a particular chemical formula…
“But Williams nevertheless believes that the service may be able to compete with for-profit services. “What I’m doing is highly disruptive,” he says. “I think it can be done and it needs to be done.””
I think what WE are doing…its not me..it’s we…is disruptive. In a good way. Many chemists will benefit. Will it have an impact on for-profit services? Yes, maybe. As an outcome but not as the target. Our team of people, both internal to ChemSpider’s development and Advisory Group, and the people we don’t even know who are cleaning and depositing into the system for their colleagues in the community, are creating a powerful resource for Chemists. The FOCUS of this effort is to Build a Structure Centric Community for Chemists. We will change that soon…the focus on Structure-Centric will be to cover Chemistry in general and to Build a Community for Chemists.
We are well on our way and thanks to Nature, and Geoff in particular for exposing it. My comments above are not meant to detract from Geoff’s reporting abilities but it was a long discussion and some clarification statements are of value i believe.