Archive for December, 2007

For those interested, here are some stats from indexing PMC’s OAI subset:

  • 3,114,818 unique terms
  • 58,807 articles

How to get stuff with PMC’s OAI service.

To get ONE article:

http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=GetRecord&metadataPrefix=pmc&identifier=oai:pubmedcentral.nih.gov:1088242

Where the number at the end is the article’s PubMed Central ID (the PMCID without its ‘PMC’ prefix), not its PubMed ID.
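If you would rather build that URL in code than paste it together by hand, here is a minimal Python sketch. The names `BASE_URL`, `get_record_url` and `record_id` are made up for illustration; only the query parameters come from the URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
BASE_URL = "http://www.pubmedcentral.nih.gov/oai/oai.cgi"


def get_record_url(record_id, metadata_prefix="pmc"):
    """Build a GetRecord URL for one article (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:pubmedcentral.nih.gov:{record_id}",
    }
    # safe=":" leaves the colons in the OAI identifier unescaped,
    # so the result matches the hand-written URL.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    print(get_record_url(1088242))
```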

To get lots of articles:

http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=ListRecords&metadataPrefix=pmc&set=pmc-open&from=2007-11-17&until=2007-11-18

Where metadataPrefix=pmc means harvest the full text (change that to oai_dc to get just the metadata), and the ‘from’ and ‘until’ parameters (YYYY-MM-DD) restrict the harvest to records whose OAI datestamp falls in that range. The set=pmc-open parameter restricts the results to the open access articles.
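The same request can be assembled from Python dates, which makes it easy to step through a date range one window at a time. This is only a sketch: `list_records_url` and its argument names are hypothetical, and the defaults simply mirror the example URL above.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
BASE_URL = "http://www.pubmedcentral.nih.gov/oai/oai.cgi"


def list_records_url(from_date=None, until_date=None,
                     metadata_prefix="pmc", oai_set="pmc-open"):
    """Build a ListRecords URL, optionally restricted by date."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": oai_set,
    }
    # 'from' and 'until' are optional; dates are sent as YYYY-MM-DD.
    if from_date is not None:
        params["from"] = from_date.isoformat()
    if until_date is not None:
        params["until"] = until_date.isoformat()
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(date(2007, 11, 17), date(2007, 11, 18)))
```

Passing `metadata_prefix="oai_dc"` gives the metadata-only variant.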

Or you can leave off the ‘from’ and ‘until’ arguments. The OAI service will then serve you the first batch of matching articles and give you a ‘resumptionToken’ for harvesting the rest afterwards. I won’t provide an example here (because these tokens expire) but the basic construction is:

http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=ListRecords&resumptionToken=STICK-IT-HERE
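A rough sketch of the token-following loop in Python, assuming the response uses the standard OAI-PMH 2.0 namespace and that the last batch carries an empty (or no) resumptionToken element. The names `extract_resumption_token`, `harvest` and `fetch` are hypothetical; `fetch` is whatever function you use to download a URL, passed in so the loop can be tried without touching the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
BASE_URL = "http://www.pubmedcentral.nih.gov/oai/oai.cgi"
# Standard OAI-PMH 2.0 XML namespace, in ElementTree's {uri} form.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def extract_resumption_token(xml_text):
    """Return the resumptionToken in a response, or None if there is none."""
    root = ET.fromstring(xml_text)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None  # missing or empty token: nothing left to harvest
    return element.text.strip()


def harvest(first_url, fetch):
    """Yield each response body, following resumption tokens until done."""
    url = first_url
    while url:
        xml_text = fetch(url)
        yield xml_text
        token = extract_resumption_token(xml_text)
        if token:
            query = urlencode({"verb": "ListRecords", "resumptionToken": token})
            url = f"{BASE_URL}?{query}"
        else:
            url = None
```

For a real run, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`, with a pause between calls.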

Happy harvesting. Don’t hack them off by making more than 1 request every 3 seconds, or more than 100 requests during their peak hours.
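One way to stay under the 1-request-per-3-seconds limit is a small throttle called before every request. This is a sketch only: the `Throttle` class is hypothetical, and it does not handle the 100-request peak-hours cap, since what counts as peak hours isn't spelled out here.

```python
import time


class Throttle:
    """Make sure successive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the last request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: create one Throttle() and call throttle.wait() before each request.
```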

It is not the largest single source we have indexed (IUCr: ~83,000).

So far in terms of indexing, these are complete:

Hindawi, Electrochemical Science Group, Repositorium (Universidade do Minho Eprints), Medknow, MDPI

The next few to go:

ACBI, IUCr, PubMed Central*, PubMed**, Bentham***, Nature****

* Full text indexing for Open Access list only; ** Bibliographic data and chemical names only; *** Bibliographic data only except Bentham Open (Full Text Indexing); **** OTMI

 All sources are full text indexed unless otherwise stated.