When I was at the SciFoo meeting earlier this year I got very excited about the Google Datasets project. I must admit that my creative spirit and need to hang out with innovators have, for years, called out to me to “Take Chemistry to Google”. When I left SciFoo I left with a hard drive to put data onto. I had great ideas about using the ChemSpider dataset of InChIs and CSIDs to connect chemists. I had hoped to put the data into the Google Datasets Project and actually work with Google to “do something” with them, rather than just host them for other people to download. If I do a search on Google today I get the following result…let me know what you get! I’ll admit my naivety on this, but maybe there is a limit on the number of hits shown, etc. (David Bradley…any ideas?)

Considering that part of the story for InChI, and I have given the story many times myself (!), is that the internet can be made structure searchable by InChI, this is a limited result set, especially considering that there are 21.5 million of them on ChemSpider. Then there’s PubChem, DrugBank, and so many more.

My hope was that Google might be interested in connecting Google Scholar to structure searching and would work with us to enable it. I couldn’t get anyone interested. I was in California for a week and asked whether I could stop by and talk about ChemSpider and how we could help Google with Chemistry: no interest. Overall I will say that I couldn’t get any traction with Google about Chemistry, and it’s a great shame. I’ve heard similar things from others. One guy who used to be at Google who WAS interested in Chemistry was Simon Quellen-Field, who runs the Sci-Toys website. I think Google needed an advocate for Chemistry in their Datasets Team so that the project could have been more than just hosting data, and instead about doing something WITH the data for the community.

I’m disappointed that the project has come to an end since I was hopeful for its purpose and its impact. I think that someone else will pick it up. If not, then they should…

The letter said…

“Thank you very much for trying out Google Research Datasets, providing interesting datasets, and giving us extremely useful feedback. We have learned a lot about the issues facing researchers and dataset producers from this testing period.

As you know, Google is a company that promotes experimentation with innovative new products and services. At the same time, we have to carefully balance that with ensuring that our resources are used in the most effective possible way to bring maximum value to our users.

It has been a difficult decision, but we have decided not to continue work on Google Research Datasets, but to instead focus our efforts on other activities such as Google Scholar, our Research Programs, and publishing papers about research here at Google.

The Google Research Datasets service will remain active until the end of January 2009 during which time any datasets may be downloaded. For those datasets that are impractical to download, we will also happily provide interested users with a copy via hard drive shipment.

Once again, we’d like to thank you for helping us test Google Research Datasets, it’s been a very useful experience, and we look forward to finding new ways to provide you with useful services in the future.”


One Response to “The Google Datasets Project Comes to An End – Oh My Chemistry – Who Cares For You?”

  1. Joerg Kurt Wegner says:

    Focus, I guess most of us have heard that more than once, especially those of us in industry. Anyway, it’s maybe not only the syntax problem of the InChI, but also the ‘deep web’ problem, which makes indexing difficult for search services.

    And would that not be a beautiful follow-up paper, taking the work of Peter, Henry, and of course, Rich into account?

    Maybe we need to find a ‘deep-web’ and search engine expert to write a comment?
