<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Open Notebook Science NMR Study Part 2</title>
	<atom:link href="http://www.chemspider.com/blog/open-notebook-science-nmr-study-part-2.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.chemspider.com/blog/open-notebook-science-nmr-study-part-2.html</link>
	<description>Building Community for Chemists</description>
	<lastBuildDate>Fri, 10 Feb 2012 11:07:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
	<item>
		<title>By: Sanford Dickert</title>
		<link>http://www.chemspider.com/blog/open-notebook-science-nmr-study-part-2.html/comment-page-1#comment-4171</link>
		<dc:creator>Sanford Dickert</dc:creator>
		<pubDate>Mon, 29 Oct 2007 12:37:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=219#comment-4171</guid>
		<description>In regard to the Open Data question, we have been doing our own work on curating community data with the Red Hen Spectra project, a version of which you can see at alpha.redhenspectra.com:3005.

We are still making minor modifications and need to improve the dataset, but the development concepts are available for all to see.

I would be happy to discuss the design of our product with others, as well as the web service we make available for use outside our website.

Email me at sanford [AT] cooper dot edu</description>
		<content:encoded><![CDATA[<p>In regard to the Open Data question, we have been doing our own work on curating community data with the Red Hen Spectra project, a version of which you can see at alpha.redhenspectra.com:3005.</p>
<p>We are still making minor modifications and need to improve the dataset, but the development concepts are available for all to see.</p>
<p>I would be happy to discuss the design of our product with others, as well as the web service we make available for use outside our website.</p>
<p>Email me at sanford [AT] cooper dot edu</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Antony Williams</title>
		<link>http://www.chemspider.com/blog/open-notebook-science-nmr-study-part-2.html/comment-page-1#comment-4045</link>
		<dc:creator>Antony Williams</dc:creator>
		<pubDate>Sat, 27 Oct 2007 11:47:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=219#comment-4045</guid>
		<description>I added a new comment to PMR&#039;s blog today entitled Open NMR calculations: Intermediate conclusions at http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=750

Peter, you have some interesting conclusions in this post, and some are contrary to earlier observations made by others. First, some comments:
1) Regarding &quot;It is also unclear how many of these could be robotically downloaded without the publishers sending lawyers - at least Elsevier allow us to do this.&quot; It is excellent news that one of the biggest publishers around allows you to robotically download spectra from their papers. Very good indeed!
2) Regarding &quot;the only Open collection of spectra is NMRShiftDB - open nmr database on the web.&quot; Just to clarify, these are actually NOT NMR spectra. Unless NMRShiftDB has a capability I am not aware of, it is a database of molecular structures with associated assignments (and maybe in some cases just a list of shifts, since perhaps not all of them have to be assigned). To an NMR spectroscopist, the spectrum is what comes off the instrument: the data that can be re-referenced, phased, baseline corrected, etc. NMRShiftDB is limited (I think) to peak listings. This should not detract from the value of the data collection, but it may cause confusion. Certainly one conversation I have had in the past 24 hours suggests that people think NMRShiftDB contains NMR &quot;spectra&quot;. But Christoph named it appropriately, as a SHIFT database.

3) Regarding &quot;We downloaded the whole of NMRShiftDB. When we started we had NO idea of the quality.&quot; I think you did have an idea, and I would point you to your own blog postings: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=278; http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=346. I recall you followed the scientific discourse between Wolfgang Robien and ACD/Labs regarding the quality and supported our conclusions that the data was of good quality. I recommend following the NMRShiftDB homepage (http://nmrshiftdb.ice.mpg.de/), where Christoph posts such reports as they occur:

a) NMRShiftDB Critique   2007-04-05 02:01 - NMRShiftDB
Prof. Wolfgang Robien from Vienna, maker of the CSearch system, has evaluated NMRShiftDB&#039;s data quality and found a number of partly severe errors. Robien&#039;s critique is summarized on his own site here. 

b) NMRShiftDB review   2007-05-03 04:12 - NMRShiftDB
Antony Williams published an NMRShiftDB quality review in his ChemSpider blog. See here

c) Quality Campaign   2007-07-02 08:03 - NMRShiftDB
Between 2007-3-10 and today, altogether 72 spectra and/or structures in NMRShiftDB have been edited by the community to correct errors identified in analyses by Wolfgang Robien and Antony Williams as well as internal cross-checks.

4) Regarding &quot;We knew in advance that certain calculations would be inappropriate. Large molecules (&gt; 20 heavy atoms) would take too long.&quot; The 20-heavy-atom limit is a real constraint. I judge that most pharmaceuticals in use today have more than 20 heavy atoms (Xanax, sildenafil, ketoconazole and Singulair, for example). I would hope that members of the NMR community are watching your work, as it should be of value to them, but I believe 20 heavy atoms is a severe constraint. That said, I know that with more time you could do larger molecules, but a day per molecule is likely already enough of a time investment.

5) Regarding &quot;Molecules with floppy groups cannot be easily analysed.&quot; So that rules out anything with a side chain, then.

6) Regarding &quot;So we have a final list of about 300 candidates.&quot; Out of a total of over 20,000 individual structures, your analysis was performed on 1.5% of the dataset. Out of interest, how many data points was this? A structure is clearly not a data point, since each structure has multiple nuclear centers and you are predicting individual shifts. I&#039;d estimate about 3000 shifts. The earlier validation I reported on covered 214,000 shifts (http://www.chemspider.com/blog/?p=37), but that was an old version of the database and it has grown since then.

7) Regarding &quot;probably 20% of entries have misassignments and transcription errors. Difficult to say, but probably about 1-5%&quot;. This suggests that about 25% of my estimated 3000 shifts are in error. That is about 750 data points, and this conclusion was drawn from a study of 300 molecules. For sure the 25% does not carry over to the entire database; it is of MUCH higher quality than that. My earlier posting suggested that there were about 250 BAD points. The subjective criteria are discussed here (http://www.chemspider.com/blog/?p=44). Wolfgang suggested about 300 bad points, but we were both being very conservative. As you likely recall, you discussed the difference between 250 and 300 on your blog: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=346

8) Regarding &quot;We realise that other groups have access to larger and, they claim, better data sets. But they are closed. I shall argue in a later post that closed approaches hold back the quality of scientific data.&quot; I think your comments refer to Wolfgang Robien and ACD/Labs. It is true that we have access to larger datasets, but we can limit the conversation to NMRShiftDB, since we ALL have access to that. Robien&#039;s and ACD/Labs&#039; algorithms can adequately handle the NMRShiftDB dataset. With the neural network and increment-based approaches, over 200,000 data points can be calculated in less than 5 minutes (http://www.chemspider.com/blog/?p=213). You have access to the same dataset and can handle 300 of the structures. Your statement is therefore moot: it is NOT about database size but about algorithmic capabilities.</description>
		<content:encoded><![CDATA[<p>I added a new comment to PMR&#8217;s blog today entitled Open NMR calculations: Intermediate conclusions at <a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=750" rel="nofollow">http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=750</a></p>
<p>Peter, you have some interesting conclusions in this post, and some are contrary to earlier observations made by others. First, some comments:<br />
1) Regarding &#8220;It is also unclear how many of these could be robotically downloaded without the publishers sending lawyers &#8211; at least Elsevier allow us to do this.&#8221; It is excellent news that one of the biggest publishers around allows you to robotically download spectra from their papers. Very good indeed!<br />
2) Regarding &#8220;the only Open collection of spectra is NMRShiftDB &#8211; open nmr database on the web.&#8221; Just to clarify, these are actually NOT NMR spectra. Unless NMRShiftDB has a capability I am not aware of, it is a database of molecular structures with associated assignments (and maybe in some cases just a list of shifts, since perhaps not all of them have to be assigned). To an NMR spectroscopist, the spectrum is what comes off the instrument: the data that can be re-referenced, phased, baseline corrected, etc. NMRShiftDB is limited (I think) to peak listings. This should not detract from the value of the data collection, but it may cause confusion. Certainly one conversation I have had in the past 24 hours suggests that people think NMRShiftDB contains NMR &#8220;spectra&#8221;. But Christoph named it appropriately, as a SHIFT database.</p>
<p>3) Regarding &#8220;We downloaded the whole of NMRShiftDB. When we started we had NO idea of the quality.&#8221; I think you did have an idea, and I would point you to your own blog postings: <a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=278" rel="nofollow">http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=278</a>; <a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=346" rel="nofollow">http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=346</a>. I recall you followed the scientific discourse between Wolfgang Robien and ACD/Labs regarding the quality and supported our conclusions that the data was of good quality. I recommend following the NMRShiftDB homepage (<a href="http://nmrshiftdb.ice.mpg.de/" rel="nofollow">http://nmrshiftdb.ice.mpg.de/</a>), where Christoph posts such reports as they occur:</p>
<p>a) NMRShiftDB Critique   2007-04-05 02:01 &#8211; NMRShiftDB<br />
Prof. Wolfgang Robien from Vienna, maker of the CSearch system, has evaluated NMRShiftDB&#8217;s data quality and found a number of partly severe errors. Robien&#8217;s critique is summarized on his own site here. </p>
<p>b) NMRShiftDB review   2007-05-03 04:12 &#8211; NMRShiftDB<br />
Antony Williams published an NMRShiftDB quality review in his ChemSpider blog. See here</p>
<p>c) Quality Campaign   2007-07-02 08:03 &#8211; NMRShiftDB<br />
Between 2007-3-10 and today, altogether 72 spectra and/or structures in NMRShiftDB have been edited by the community to correct errors identified in analyses by Wolfgang Robien and Antony Williams as well as internal cross-checks.</p>
<p>4) Regarding &#8220;We knew in advance that certain calculations would be inappropriate. Large molecules (> 20 heavy atoms) would take too long.&#8221; The 20-heavy-atom limit is a real constraint. I judge that most pharmaceuticals in use today have more than 20 heavy atoms (Xanax, sildenafil, ketoconazole and Singulair, for example). I would hope that members of the NMR community are watching your work, as it should be of value to them, but I believe 20 heavy atoms is a severe constraint. That said, I know that with more time you could do larger molecules, but a day per molecule is likely already enough of a time investment.</p>
<p>5) Regarding &#8220;Molecules with floppy groups cannot be easily analysed.&#8221; So that rules out anything with a side chain, then.</p>
<p>6) Regarding &#8220;So we have a final list of about 300 candidates.&#8221; Out of a total of over 20,000 individual structures, your analysis was performed on 1.5% of the dataset. Out of interest, how many data points was this? A structure is clearly not a data point, since each structure has multiple nuclear centers and you are predicting individual shifts. I&#8217;d estimate about 3000 shifts. The earlier validation I reported on covered 214,000 shifts (<a href="http://www.chemspider.com/blog/?p=37" rel="nofollow">http://www.chemspider.com/blog/?p=37</a>), but that was an old version of the database and it has grown since then.</p>
<p>7) Regarding &#8220;probably 20% of entries have misassignments and transcription errors. Difficult to say, but probably about 1-5%&#8221;. This suggests that about 25% of my estimated 3000 shifts are in error. That is about 750 data points, and this conclusion was drawn from a study of 300 molecules. For sure the 25% does not carry over to the entire database; it is of MUCH higher quality than that. My earlier posting suggested that there were about 250 BAD points. The subjective criteria are discussed here (<a href="http://www.chemspider.com/blog/?p=44" rel="nofollow">http://www.chemspider.com/blog/?p=44</a>). Wolfgang suggested about 300 bad points, but we were both being very conservative. As you likely recall, you discussed the difference between 250 and 300 on your blog: <a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=346" rel="nofollow">http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=346</a></p>
<p>8) Regarding &#8220;We realise that other groups have access to larger and, they claim, better data sets. But they are closed. I shall argue in a later post that closed approaches hold back the quality of scientific data.&#8221; I think your comments refer to Wolfgang Robien and ACD/Labs. It is true that we have access to larger datasets, but we can limit the conversation to NMRShiftDB, since we ALL have access to that. Robien&#8217;s and ACD/Labs&#8217; algorithms can adequately handle the NMRShiftDB dataset. With the neural network and increment-based approaches, over 200,000 data points can be calculated in less than 5 minutes (<a href="http://www.chemspider.com/blog/?p=213" rel="nofollow">http://www.chemspider.com/blog/?p=213</a>). You have access to the same dataset and can handle 300 of the structures. Your statement is therefore moot: it is NOT about database size but about algorithmic capabilities.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

