<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: STOP COUNTING the Number of Chemical Entities in Public Compound Databases and There are Ghosts in the Closet</title>
	<atom:link href="http://www.chemspider.com/blog/stop-counting-the-number-of-chemical-entities-in-public-compound-databases-and-there-are-ghosts-in-the-closet.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.chemspider.com/blog/stop-counting-the-number-of-chemical-entities-in-public-compound-databases-and-there-are-ghosts-in-the-closet.html</link>
	<description>Building Community for Chemists</description>
	<lastBuildDate>Tue, 14 May 2013 21:50:31 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
	<item>
		<title>By: Antony Williams</title>
		<link>http://www.chemspider.com/blog/stop-counting-the-number-of-chemical-entities-in-public-compound-databases-and-there-are-ghosts-in-the-closet.html/comment-page-1#comment-189398</link>
		<dc:creator>Antony Williams</dc:creator>
		<pubDate>Mon, 13 Jul 2009 20:05:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=1405#comment-189398</guid>
		<description><![CDATA[Martin,
We have a very active project to curate the data. Not all of this depends on humans: regular expressions can clean up the names associated with structures, and we have cleaned up many tens of thousands of names this way. We are also developing approaches to cluster structures.]]></description>
		<content:encoded><![CDATA[<p>Martin,<br />
We have a very active project to curate the data. Not all of this depends on humans: regular expressions can clean up the names associated with structures, and we have cleaned up many tens of thousands of names this way. We are also developing approaches to cluster structures.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Martin Walker</title>
		<link>http://www.chemspider.com/blog/stop-counting-the-number-of-chemical-entities-in-public-compound-databases-and-there-are-ghosts-in-the-closet.html/comment-page-1#comment-189357</link>
		<dc:creator>Martin Walker</dc:creator>
		<pubDate>Mon, 13 Jul 2009 18:40:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=1405#comment-189357</guid>
		<description><![CDATA[Great post, Antony! I think this is a major problem, and you illustrate the issues very nicely. I think, though, that each major database (such as ChemSpider) needs a specific project for dealing with the problem. Just asking people to curate things will help fix a few dozen things; an organized project will fix hundreds; and if a simple automation tool is available to assist curators (e.g., to find things of identical skeleton), it will fix thousands - assuming you can build a critical mass of people to help.

Please keep reminding us about this!]]></description>
		<content:encoded><![CDATA[<p>Great post, Antony! I think this is a major problem, and you illustrate the issues very nicely. I think, though, that each major database (such as ChemSpider) needs a specific project for dealing with the problem. Just asking people to curate things will help fix a few dozen things; an organized project will fix hundreds; and if a simple automation tool is available to assist curators (e.g., to find things of identical skeleton), it will fix thousands &#8211; assuming you can build a critical mass of people to help.</p>
<p>Please keep reminding us about this!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ChemSpider Blog &#187; Blog Archive &#187; Wolfram Alpha and It&#8217;s Support for Chemistry #scifoo09</title>
		<link>http://www.chemspider.com/blog/stop-counting-the-number-of-chemical-entities-in-public-compound-databases-and-there-are-ghosts-in-the-closet.html/comment-page-1#comment-188783</link>
		<dc:creator>ChemSpider Blog &#187; Blog Archive &#187; Wolfram Alpha and It&#8217;s Support for Chemistry #scifoo09</dc:creator>
		<pubDate>Sun, 12 Jul 2009 23:03:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=1405#comment-188783</guid>
		<description><![CDATA[[...] STOP COUNTING the Number of Chemical Entities in Public Compound Databases and There are Ghosts in t...       12 07 [...]]]></description>
		<content:encoded><![CDATA[<p>[...] STOP COUNTING the Number of Chemical Entities in Public Compound Databases and There are Ghosts in t&#8230;       12 07 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Markus Sitzmann</title>
		<link>http://www.chemspider.com/blog/stop-counting-the-number-of-chemical-entities-in-public-compound-databases-and-there-are-ghosts-in-the-closet.html/comment-page-1#comment-188514</link>
		<dc:creator>Markus Sitzmann</dc:creator>
		<pubDate>Sun, 12 Jul 2009 13:50:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=1405#comment-188514</guid>
		<description><![CDATA[Antony,

We are working hard on curating the Chemical Identifier Resolver or the Lookup service, respectively. I am fully aware of the problem with many of the names (our name index is still my main headache); that's why we called the service &quot;beta&quot;. My plan is to let you trace back the source of any name we have, but as you can imagine, that is a bigger effort.

Markus]]></description>
		<content:encoded><![CDATA[<p>Antony,</p>
<p>We are working hard on curating the Chemical Identifier Resolver or the Lookup service, respectively. I am fully aware of the problem with many of the names (our name index is still my main headache); that&#8217;s why we called the service &#8220;beta&#8221;. My plan is to let you trace back the source of any name we have, but as you can imagine, that is a bigger effort.</p>
<p>Markus</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Antony Williams</title>
		<link>http://www.chemspider.com/blog/stop-counting-the-number-of-chemical-entities-in-public-compound-databases-and-there-are-ghosts-in-the-closet.html/comment-page-1#comment-188363</link>
		<dc:creator>Antony Williams</dc:creator>
		<pubDate>Sun, 12 Jul 2009 06:57:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=1405#comment-188363</guid>
		<description><![CDATA[An interesting response to the original post to the CHMINF letter has been posted by Harry Gottlieb. Notice the comments regarding &quot;CAS has 30 stereoisomers, isotopomers, or anions.&quot;

&quot;Unique structures may not all be equally good choices, or valid representations, of what they purport to represent. If you say that structures just represent themselves, then each unique structure may be equally valid. Curation is hard. If you need it, either someone else does it and you pay for that, or you get to, and so does the next person... 

To toss in an organic example, someone looking for, &quot;the structure&quot; of curcumin, expecting a unique structure, may be confused to find all of the following (I&#039;ve just pasted SMILES below) as PubChem Compound entries for that name (along with 13 substring hits): 
442783	COC1=C(C=CC(=C1)C=CC(=CC(=O)C=CC2=CC(=C(C=C2)O)OC)O)O 
24884282	COC1=C(C=CC(=C1)/C=C/C(=O)CC(=O)/C=C\C2=CC(=C(C=C2)O)OC)O 
5281767	COC1=C(C=CC(=C1)/C=C/C(=C/C(=O)/C=C/C2=CC(=C(C=C2)O)OC)/O)O 
2889	COC1=C(C=CC(=C1)C=CC(=O)CC(=O)C=CC2=CC(=C(C=C2)O)OC)O 
969516	COC1=C(C=CC(=C1)/C=C/C(=O)CC(=O)/C=C/C2=CC(=C(C=C2)O)OC)O 
10666836	[2H]/C(=C\C(=C\C(=O)/C=C(\[2H])/C1=CC(=C(C=C1)O)OC)\O)/C2=CC(=C(C=C2)O)OC 
10595440	COC1=C(C=CC(=C1)/[14CH]=C/C(=C/C(=O)/C=[14CH]/C2=CC(=C(C=C2)O)OC)/O)O 
25245785	[HH].COC1=C(C=CC(=C1)/C=C/C(/C=C(/C=C/C2=CC(=C(C=C2)O)OC)\O)O)O 
All but the last have the same molecular formula in this database (including the dideuterated 10666836). &quot;IUPAC: (1E,4Z,6E)-5-hydroxy-1,7-bis(4-hydroxy-3-methoxyphenyl)hepta-1,4,6-trien-3-one&quot; for 10595440 ignores the labeling shown in the PubChem structure (only spotted after looking at the SMILES string).

In Chemical Abstracts Registry, RN 458-37-7 was the only retrieval for, &quot;curcumin,&quot; having a structure (presumably tautomer-normalized) matching 969516 above: https://scifinder.cas.org/scifinder/view/link_v1/substance.jsf?l=t7c60yhXV6v5ScoFE0KHlByibptEUn5iKfX7oG5zA4kLRoa9GLLZbUX-kdfHIjXb 

So, if you wanted &quot;the structure&quot; for curcumin, CAS answers your need? Maybe not. Acta Cryst. (2007). E63, o860-o862 [doi:10.1107/S160053680700222X] suggests that the crystal structure for curcumin is the enol (5281767 above and CAS 147556-16-9, associated with &quot;curcumin enol&quot; in Registry), (1E,4Z,6E)-5-hydroxy-1,7-bis(4-hydroxy-3-methoxyphenyl)hepta-1,4,6-trien-3-one. CAS&#039; 458-37-7 record includes five deleted RNs. Beyond the curcumin (normalized dione) and curcumin enol structures, CAS has 30 stereoisomers, isotopomers, or anions.

Harry Gottlieb
Dept. Chemistry
Temple University&quot;]]></description>
		<content:encoded><![CDATA[<p>An interesting response to the original post to the CHMINF letter has been posted by Harry Gottlieb. Notice the comments regarding &#8220;CAS has 30 stereoisomers, isotopomers, or anions.&#8221;</p>
<p>&#8220;Unique structures may not all be equally good choices, or valid representations, of what they purport to represent. If you say that structures just represent themselves, then each unique structure may be equally valid. Curation is hard. If you need it, either someone else does it and you pay for that, or you get to, and so does the next person&#8230; </p>
<p>To toss in an organic example, someone looking for, &#8220;the structure&#8221; of curcumin, expecting a unique structure, may be confused to find all of the following (I&#8217;ve just pasted SMILES below) as PubChem Compound entries for that name (along with 13 substring hits):<br />
442783	COC1=C(C=CC(=C1)C=CC(=CC(=O)C=CC2=CC(=C(C=C2)O)OC)O)O<br />
24884282	COC1=C(C=CC(=C1)/C=C/C(=O)CC(=O)/C=C\C2=CC(=C(C=C2)O)OC)O<br />
5281767	COC1=C(C=CC(=C1)/C=C/C(=C/C(=O)/C=C/C2=CC(=C(C=C2)O)OC)/O)O<br />
2889	COC1=C(C=CC(=C1)C=CC(=O)CC(=O)C=CC2=CC(=C(C=C2)O)OC)O<br />
969516	COC1=C(C=CC(=C1)/C=C/C(=O)CC(=O)/C=C/C2=CC(=C(C=C2)O)OC)O<br />
10666836	[2H]/C(=C\C(=C\C(=O)/C=C(\[2H])/C1=CC(=C(C=C1)O)OC)\O)/C2=CC(=C(C=C2)O)OC<br />
10595440	COC1=C(C=CC(=C1)/[14CH]=C/C(=C/C(=O)/C=[14CH]/C2=CC(=C(C=C2)O)OC)/O)O<br />
25245785	[HH].COC1=C(C=CC(=C1)/C=C/C(/C=C(/C=C/C2=CC(=C(C=C2)O)OC)\O)O)O<br />
All but the last have the same molecular formula in this database (including the dideuterated 10666836). &#8220;IUPAC: (1E,4Z,6E)-5-hydroxy-1,7-bis(4-hydroxy-3-methoxyphenyl)hepta-1,4,6-trien-3-one&#8221; for 10595440 ignores the labeling shown in the PubChem structure (only spotted after looking at the SMILES string).</p>
<p>In Chemical Abstracts Registry, RN 458-37-7 was the only retrieval for, &#8220;curcumin,&#8221; having a structure (presumably tautomer-normalized) matching 969516 above: <a href="https://scifinder.cas.org/scifinder/view/link_v1/substance.jsf?l=t7c60yhXV6v5ScoFE0KHlByibptEUn5iKfX7oG5zA4kLRoa9GLLZbUX-kdfHIjXb" rel="nofollow">https://scifinder.cas.org/scifinder/view/link_v1/substance.jsf?l=t7c60yhXV6v5ScoFE0KHlByibptEUn5iKfX7oG5zA4kLRoa9GLLZbUX-kdfHIjXb</a> </p>
<p>So, if you wanted &#8220;the structure&#8221; for curcumin, CAS answers your need? Maybe not. Acta Cryst. (2007). E63, o860-o862 [doi:10.1107/S160053680700222X] suggests that the crystal structure for curcumin is the enol (5281767 above and CAS 147556-16-9, associated with &#8220;curcumin enol&#8221; in Registry), (1E,4Z,6E)-5-hydroxy-1,7-bis(4-hydroxy-3-methoxyphenyl)hepta-1,4,6-trien-3-one. CAS&#8217; 458-37-7 record includes five deleted RNs. Beyond the curcumin (normalized dione) and curcumin enol structures, CAS has 30 stereoisomers, isotopomers, or anions.</p>
<p>Harry Gottlieb<br />
Dept. Chemistry<br />
Temple University&#8221;</p>
]]></content:encoded>
	</item>
</channel>
</rss>
