Structure Images on ChemSpider Tagged for InChI Searching. Web Service Enabled.
Posted by: Antony Williams in ChemSpider Services. Copyright ©2007 Antony Williams
If you’ve been frequenting our blog(s) you will have seen our passion for InChI adoption (1,2). A few months ago I started a discussion with Martin Walker (Walkerma), a very active member of the Chemistry community on Wikipedia. We were chatting about how to make chemical structure images searchable. While I was managing ChemSketch at ACD/Labs, we embedded information into PNG images to facilitate exactly this.
It just made sense to facilitate this via ChemSpider too, so all structure images on ChemSpider are now tagged, inside the image file itself, with BOTH InChI Strings and InChIKeys, as well as the ChemSpider ID.
There is a fairly standard way to embed tags into the PNG format. An effort to standardize the approach is described at: http://pmt.sourceforge.net/exif/drafts/d020.html. We have used the following fields to hold the information:
1. DocumentName - contains http://www.chemspider.com/RecordView.aspx?id=#
2. Software - contains ChemSpider (http://www.chemspider.com/)
3. Artist - ChemSpider
4. ImageTitle - InChI
5. ImageDescription - InChIKey
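For readers who want to see what such tagging looks like at the byte level, here is a minimal sketch using only the Python standard library. It rests on an assumption: that each field name in the list above is stored as the keyword of a PNG tEXt chunk. The post does not say exactly how ChemSpider writes the fields, so treat this as an illustration and not as a description of the ChemSpider implementation. The helper names (make_chunk, add_text_tags, read_text_tags, tiny_png) are made up for this example, the InChI and InChIKey are the present-day standard identifiers for methane, and the record id 0 is a placeholder.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (chunk_type, chunk_data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        yield chunk_type, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text_tags(png: bytes, tags: dict) -> bytes:
    """Insert one tEXt chunk per tag directly after the IHDR chunk."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    ihdr_end = 8 + 12 + 13  # signature + chunk overhead + 13-byte IHDR body
    extra = b"".join(
        # tEXt body is: keyword, a NUL separator, then Latin-1 text
        make_chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
        for key, value in tags.items()
    )
    return png[:ihdr_end] + extra + png[ihdr_end:]


def read_text_tags(png: bytes) -> dict:
    """Collect all tEXt chunks into a keyword -> text dictionary."""
    tags = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            tags[keyword.decode("latin-1")] = text.decode("latin-1")
    return tags


def tiny_png() -> bytes:
    """Build a 1x1 grayscale PNG in memory so the sketch needs no input file."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


# Field names follow the list above; the values are illustrative only.
tags = {
    "DocumentName": "http://www.chemspider.com/RecordView.aspx?id=0",  # placeholder id
    "Software": "ChemSpider (http://www.chemspider.com/)",
    "Artist": "ChemSpider",
    "ImageTitle": "InChI=1S/CH4/h1H4",                 # methane
    "ImageDescription": "VNWKTOKETHGBQD-UHFFFAOYSA-N",  # methane InChIKey
}

tagged = add_text_tags(tiny_png(), tags)
print(read_text_tags(tagged)["ImageTitle"])  # prints InChI=1S/CH4/h1H4
```

To inspect a downloaded structure image in the same way, read the file in binary mode and pass the bytes to read_text_tags. If the tags turn out to be stored in some other chunk type, the function will simply return an empty dictionary.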
If there are other tags you would like included in the image please let us know. The format for the time being is PNG. I assume there will be requests for SVG but let’s see…
There are 18,630,699 structures in the database right now. Not all of them have structure images generated yet, but they will be done shortly after we have updated the database to over 20 million compounds (we are presently updating the structures associated with the Surechem Patent Database).
A new web service, GetRecordImage, has been published online. It allows you to get the record image based on a search by systematic name, synonym, trade name, InChI etc. As an example of this in action, visit the test page at http://www.chemspider.com/WSSearch.aspx. Simply type in a compound name or some text string and hit return. For Xanax, for example, one image is returned.
In some cases you will see MULTIPLE structures, if the text string you search on is associated with more than one record in the DB. For example, diazonamide A has SIX hits.
We hope you find value in this service to access structure images directly from ChemSpider. You will notice our ChemSpider logo and our URL on the image. We acknowledge they can be removed. Our request is that you respect our efforts and leave them there.
September 25th, 2007 at 12:46 am
For those do-it-yourselfers out there, many programming languages support reading and writing PNG image metadata:
http://depth-first.com/articles/2007/08/29/never-draw-the-same-molecule-twice-writing-png-image-metadata-with-python
http://blog.modp.com/2007/08/python-pil-and-png-metadata-take-2.html
http://baoilleach.blogspot.com/2007/08/access-embedded-molecular-information.html
http://chem-bla-ics.blogspot.com/2007/08/jchempaint-too-png-embedded.html
September 25th, 2007 at 2:29 am
Tags above were chosen mostly arbitrarily. I still can’t find any pointers on which keywords should be used for tEXt. Any attempt to standardize? Any example of libpng direct usage for storing cheminfo? Any hints where/how IIOMetadata gets stored in PNG? What about other image formats?
October 10th, 2007 at 7:35 am
Hi Antony:
Gotta ask, but why not XMP? That IMHO is the way forward to getting extensible metadata in XML (RDF/XML to boot) into media files:
“An XMP Packet is embedded in a PNG graphic file by adding a chunk of type iTXt. This
chunk is semantically equivalent to the tEXt and zTXt chunks, but the textual data is in the
UTF-8 encoding of the Unicode character set, instead of Latin-1.
The Chunk Data portion is the XMP Packet. The packet must be marked as read-only.”
Details on p.97, XMP Spec:
http://www.adobe.com/devnet/xmp/pdfs/xmp_specification.pdf
Cheers,
Tony