…. and it did. But not quite in the way that Cambridge had imagined. Over the last few weeks, around 200,000 articles from contributing publishers have been added to ChemSpider’s literature search (as ChemRefer is now styled), though even this is not yet in the final form we envision.
Another 40,000 or so articles will follow next week as the resource grows. The indexer is running hot 24 hours a day, seven days a week. Tens of thousands more articles will follow after that. On top of this, we can now index text from image PDFs (many journal articles are still in this form), which also opens up the possibility of users sending in scanned images of their data-rich documents as a way of submitting chemical information to ChemSpider.
The main issue now is a lack of time and resources to index everything we have permission for. We have still barely scratched the surface of Highwire, for instance, and adding updates from the resources we already index is not yet properly implemented. But these are nice problems to have.
Once we have a critical mass of journal articles text-indexed, the “cited in” feature can be implemented, and we can open up the chemical names found in the indexed content for download and curation by the ChemSpider community… and that’s when things get really interesting.
Despite scant resources, we are still on track to create a community-curated cheminformatics-text search that we hope will eventually gain unstoppable momentum thanks to its community backing. Mozilla Firefox competes with Microsoft’s Internet Explorer because it has the backing of a user and developer community, and that is a role model worth considering for ChemSpider and for the chemistry world as a whole.
The turnaround in interest in having published material text-indexed is highly significant in the long run, since thousands of references will pour into ChemSpider structure records and make the database more useful.
These references will, of course, be free for anyone to download, and so will make a material contribution to the openness of chemical data (which is what I want Open Chemistry Web to be all about), as opposed to debating the definitions, licenses, copyrights and other such distractions (as I see them) surrounding open access and open data.