<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Hacking PubChem &#8211; Technology Easy, Quality Difficult</title>
	<atom:link href="http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html</link>
	<description>Building Community for Chemists</description>
	<lastBuildDate>Fri, 10 Feb 2012 11:07:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
	<item>
		<title>By: ChemSpider Blog &#187; Blog Archive &#187; STOP COUNTING the Number of Chemical Entities in Public Compound Databases and There are Ghosts in the Closet</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-188314</link>
		<dc:creator>ChemSpider Blog &#187; Blog Archive &#187; STOP COUNTING the Number of Chemical Entities in Public Compound Databases and There are Ghosts in the Closet</dc:creator>
		<pubDate>Sat, 11 Jul 2009 19:46:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-188314</guid>
		<description>[...] simple example is that of methane in PubChem that I have blogged about many times&#8230;one example here. Here are some of the names associated with the structure of methane on PubChem: [...]</description>
		<content:encoded><![CDATA[<p>[...] simple example is that of methane in PubChem that I have blogged about many times&#8230;one example here. Here are some of the names associated with the structure of methane on PubChem: [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: A naive biochemist wakes up to the closed world of chemical abstracts and such &#171; The Omics world</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-125819</link>
		<dc:creator>A naive biochemist wakes up to the closed world of chemical abstracts and such &#171; The Omics world</dc:creator>
		<pubDate>Fri, 05 Dec 2008 16:19:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-125819</guid>
		<description>[...] providers using a suitable lookup id . Naively I assumed this would be the CAS id which is the &#8220;unique id&#8221; associated with each molecule . An hour of googling later I woke up to the realization that CAS is [...]</description>
		<content:encoded><![CDATA[<p>[...] providers using a suitable lookup id . Naively I assumed this would be the CAS id which is the &#8220;unique id&#8221; associated with each molecule . An hour of googling later I woke up to the realization that CAS is [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ChemSpider Blog &#187; Blog Archive &#187; Enforcing Copyright of CAS Numbers</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-24350</link>
		<dc:creator>ChemSpider Blog &#187; Blog Archive &#187; Enforcing Copyright of CAS Numbers</dc:creator>
		<pubDate>Sun, 09 Mar 2008 07:07:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-24350</guid>
		<description>[...] ID&#8221; for a compound. Check out my earlier posts about the need for curation (1,2,3,4 and many others). CAS is very highly curated and are the authority for the CAS numbers. PubChem [...]</description>
		<content:encoded><![CDATA[<p>[...] ID&#8221; for a compound. Check out my earlier posts about the need for curation (1,2,3,4 and many others). CAS is very highly curated and are the authority for the CAS numbers. PubChem [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Unilever Centre for Molecular Informatics, Cambridge - petermr&#8217;s blog &#187; Blog Archive &#187; Comments on comments and agents and eyeballs</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-3010</link>
		<dc:creator>Unilever Centre for Molecular Informatics, Cambridge - petermr&#8217;s blog &#187; Blog Archive &#187; Comments on comments and agents and eyeballs</dc:creator>
		<pubDate>Mon, 01 Oct 2007 19:10:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-3010</guid>
		<description>[...] ChemSpiderMan Says: October 1st, 2007 at 3:55 pm ePeter, I&#8217;ve given many examples of the issue of Data Quality on the blog. Some links are: http://www.chemspider.com/blog/?p=64 http://www.chemspider.com/blog/?p=164 http://www.chemspider.com/blog/?p=168 http://www.chemspider.com/blog/?p=137 [...]</description>
		<content:encoded><![CDATA[<p>[...] ChemSpiderMan Says: October 1st, 2007 at 3:55 pm ePeter, I&#8217;ve given many examples of the issue of Data Quality on the blog. Some links are: <a href="http://www.chemspider.com/blog/?p=64" rel="nofollow">http://www.chemspider.com/blog/?p=64</a> <a href="http://www.chemspider.com/blog/?p=164" rel="nofollow">http://www.chemspider.com/blog/?p=164</a> <a href="http://www.chemspider.com/blog/?p=168" rel="nofollow">http://www.chemspider.com/blog/?p=168</a> <a href="http://www.chemspider.com/blog/?p=137" rel="nofollow">http://www.chemspider.com/blog/?p=137</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Egon Willighagen</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-2542</link>
		<dc:creator>Egon Willighagen</dc:creator>
		<pubDate>Sun, 16 Sep 2007 06:27:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-2542</guid>
		<description>Yet another reason to use InChI instead. I *know* a lot of people use the CAS registry number as identifiers; some people also can&#039;t stop smoking/drinking/...

To borrow from Rich&#039;s blog post of last week: InChI is to chemoinformatics what the forward pass was to football. And this blog item shows the score :)

Cheers!</description>
		<content:encoded><![CDATA[<p>Yet another reason to use InChI instead. I *know* a lot of people use the CAS registry number as identifiers; some people also can&#8217;t stop smoking/drinking/&#8230;</p>
<p>To borrow from Rich&#8217;s blog post of last week: InChI is to chemoinformatics what the forward pass was to football. And this blog item shows the score <img src='http://www.chemspider.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Cheers!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Antony Williams</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-2491</link>
		<dc:creator>Antony Williams</dc:creator>
		<pubDate>Fri, 14 Sep 2007 22:41:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-2491</guid>
		<description>A friend sent me these comments offline &quot;Some of those seem to be formulations for CH4 and a non-structured component.  This is one of the most common ways that CAS numbers get associated improperly.  I seem to recall that Methane is simply 74-82-8, not even in your subset.&quot;

He is right that 74-82-8 is not in the subset, but it is bolded and is the FIRST of all of the synonyms, showing that it has been validated by a curator.</description>
		<content:encoded><![CDATA[<p>A friend sent me these comments offline &#8220;Some of those seem to be formulations for CH4 and a non-structured component.  This is one of the most common ways that CAS numbers get associated improperly.  I seem to recall that Methane is simply 74-82-8, not even in your subset.&#8221;</p>
<p>He is right that 74-82-8 is not in the subset, but it is bolded and is the FIRST of all of the synonyms, showing that it has been validated by a curator.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joerg Kurt Wegner</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-2480</link>
		<dc:creator>Joerg Kurt Wegner</dc:creator>
		<pubDate>Fri, 14 Sep 2007 17:16:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-2480</guid>
		<description>Agreed! This gives a *big* minus for PubChem. Are they really expecting that others find the duplicates and the wrong structures?

Anyway, can you not offer something for this? E.g. another web-service called &quot;Check pubchem cid in curated chemspider data&quot;

And thanks for *doing* this 
http://www.chemspider.com/blog/?p=135

You could even go a step further by offering a service called &quot;return curated unique inchikey or chemspider csid from non-curated pubchem cid&quot;!</description>
		<content:encoded><![CDATA[<p>Agreed! This gives a *big* minus for PubChem. Are they really expecting that others find the duplicates and the wrong structures?</p>
<p>Anyway, can you not offer something for this? E.g. another web-service called &#8220;Check pubchem cid in curated chemspider data&#8221;</p>
<p>And thanks for *doing* this<br />
<a href="http://www.chemspider.com/blog/?p=135" rel="nofollow">http://www.chemspider.com/blog/?p=135</a></p>
<p>You could even go a step further by offering a service called &#8220;return curated unique inchikey or chemspider csid from non-curated pubchem cid&#8221;!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rich Apodaca</title>
		<link>http://www.chemspider.com/blog/hacking-pubchem-technology-easy-quality-difficult.html/comment-page-1#comment-2473</link>
		<dc:creator>Rich Apodaca</dc:creator>
		<pubDate>Fri, 14 Sep 2007 15:38:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=137#comment-2473</guid>
		<description>The question of quality will always be present in any database, and your post raises valid concerns. It _would_ be great if CAS created a free Web service for CAS number lookup. In fact, such a move might just save their franchise.

I&#039;ve seen many examples of the same substance having multiple CAS numbers - it&#039;s quite common. It might be interesting for a reader with access to SciFinder to get a list of all of the CAS numbers for methane. Clearly, many of those CAS numbers are for &quot;Carbon&quot; and its mixtures/derivatives/allotropes - which could be attributed to the molfile format not having a consistent way to represent implicit hydrogens. (The data format used for compound registration matters - a lot). So the methane example may be the worst case scenario for a molecule unlikely to appear in most databases.

During the transition away from CAS numbers as a chemical identifier, conversions such as CAS number-&gt; PubChem CID (or CAS number-&gt; IUPAC name) will become very important. PubChem records contain IUPAC names (in several flavors), so it might be possible to use that field as a consistency check - as you&#039;ve done.

Bottom line: limbo-land is where we&#039;re at. As the Chinese say, may you live in interesting times...</description>
		<content:encoded><![CDATA[<p>The question of quality will always be present in any database, and your post raises valid concerns. It _would_ be great if CAS created a free Web service for CAS number lookup. In fact, such a move might just save their franchise.</p>
<p>I&#8217;ve seen many examples of the same substance having multiple CAS numbers &#8211; it&#8217;s quite common. It might be interesting for a reader with access to SciFinder to get a list of all of the CAS numbers for methane. Clearly, many of those CAS numbers are for &#8220;Carbon&#8221; and its mixtures/derivatives/allotropes &#8211; which could be attributed to the molfile format not having a consistent way to represent implicit hydrogens. (The data format used for compound registration matters &#8211; a lot). So the methane example may be the worst case scenario for a molecule unlikely to appear in most databases.</p>
<p>During the transition away from CAS numbers as a chemical identifier, conversions such as CAS number-&gt; PubChem CID (or CAS number-&gt; IUPAC name) will become very important. PubChem records contain IUPAC names (in several flavors), so it might be possible to use that field as a consistency check &#8211; as you&#8217;ve done.</p>
<p>Bottom line: limbo-land is where we&#8217;re at. As the Chinese say, may you live in interesting times&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>

