Archive for April, 2009

Jean-Claude Bradley recently blogged about an all too familiar issue, which you will need to read about there to understand this post.

They’ve done the work; now they have to keep track of what they’ve done in one place, in an updatable way, without creating a bureaucratic headache. At ChemSpider, organising data in a cost-effective manner has caused headaches every now and then, though it has also been one of the biggest drivers of innovation.

I have only followed it from time to time, but the activity around what is called ‘Open Notebook Science’ is reaching an industrial scale.

The ONSchallenge site, which I looked at most closely, is annotated with links to comprehensive experimental, structural and relevant external data sources, yet it is still reader friendly.

So anyway, after some discussion over at UsefulChem, I wrote a small script to collect Google Spreadsheet URLs (those that link directly to the XLS format) and associate them with the wiki page on which they are linked.

This covers only the first level of connections initially (starting from this base URL), and no extra metadata such as InChI is collected yet. But we have to determine the best format for this service before it is scaled.
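To make the idea concrete, here is a minimal sketch of the first-level collection step. It is not the actual script: the names `LinkCollector`, `is_spreadsheet_link` and `collect_spreadsheet_links` are made up for illustration, and the rule for spotting a spreadsheet link (host ends with `spreadsheets.google.com`) is an assumption that would need checking against the real wiki pages. Fetching the page is left out, so the function just takes the HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_spreadsheet_link(url):
    # Assumption: spreadsheet links are identified by their host name only.
    host = urlparse(url).netloc.lower()
    return host.endswith("spreadsheets.google.com")


def collect_spreadsheet_links(page_url, html):
    """Return (wiki page URL, spreadsheet URL) pairs for one page."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if is_spreadsheet_link(absolute) and absolute not in links:
            links.append(absolute)
    return [(page_url, link) for link in links]
```

Calling `collect_spreadsheet_links` once per wiki page linked from the base URL would give the rows of the page-to-spreadsheet table; going deeper than the first level would mean feeding the non-spreadsheet links back in.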

For example, a web service might be a good idea (but it could face spam requests), or perhaps a locally run script (which would assume that configuration by every potential user is convenient).

The generated excel spreadsheet is << here >>.

The literature search now has a (partially enabled) feature which extracts references from documents where the author has cited themselves. It also counts citations of that reference in other documents.

This allows users to follow a chain of research backwards and forwards (‘cited in’ links will always point to later work, while references point to earlier work).
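As a rough illustration of how ‘cited in’ links relate to references (a sketch only, not how the search engine actually does it), the reverse index can be built by inverting each document's reference list. The document IDs and the function name `build_cited_in` are hypothetical.

```python
from collections import defaultdict


def build_cited_in(references):
    """Invert {doc: [docs it references]} into {doc: [docs that cite it]}."""
    cited_in = defaultdict(list)
    for doc, refs in references.items():
        for ref in refs:
            cited_in[ref].append(doc)
    return dict(cited_in)


# Hypothetical example: B references A; C references A and B.
references = {"A": [], "B": ["A"], "C": ["A", "B"]}
cited_in = build_cited_in(references)

# The 'cited in' count for a document is the number of documents citing it.
counts = {doc: len(citers) for doc, citers in cited_in.items()}
```

Following `references` from a document walks backwards through the chain, and following `cited_in` walks forwards. The counts only cover documents that are in the index.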

These features are in testing but are also live for users in the text search. FYI, an example search:

Suzuki coupling

There is no engineering to make cited papers ‘preferred’ by the search engine in terms of ranking as, whilst the association between keyword and relevance is strong, it is not clear (to me) that this is true for the association between citation count and relevance.

And anyway, as we are only indexing a couple of hundred thousand docs so far, we don’t have a big enough sample for ‘cited in’ counts to be comprehensive. Their main use is for following chains of research.