<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Further Comments on the Quality of NMRShiftDB and NMR Prediction Algorithm Validation</title>
	<atom:link href="http://www.chemspider.com/blog/further-comments-on-the-quality-of-nmrshiftdb-and-nmr-prediction-algorithm-validation.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.chemspider.com/blog/further-comments-on-the-quality-of-nmrshiftdb-and-nmr-prediction-algorithm-validation.html</link>
	<description>Building Community for Chemists</description>
	<lastBuildDate>Tue, 21 May 2013 16:16:29 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
	<item>
		<title>By: Ryan Sasaki</title>
		<link>http://www.chemspider.com/blog/further-comments-on-the-quality-of-nmrshiftdb-and-nmr-prediction-algorithm-validation.html/comment-page-1#comment-476</link>
		<dc:creator>Ryan Sasaki</dc:creator>
		<pubDate>Wed, 06 Jun 2007 12:46:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=44#comment-476</guid>
		<description><![CDATA[Before anyone takes Robien&#039;s results about the NMR prediction comparisons AS FACT...please read my latest post about the details of the comparison. 

http://acdlabs.typepad.com/my_weblog/2007/06/robiens_and_mod.html

There are still some remaining questions about how Modgraph came to an average deviation of 1.40 ppm. 

So before we can make a final decision on performance, I think Modgraph needs to make very clear the following: 

1. What is the overlap between NMRShiftDB and Modgraph&#039;s NMR prediction databases? Further, with several different database sources how much duplication of data exists across the databases and within the entire package?
   
2. Once that overlap is removed from the dataset, what is the final deviation produced by NMRPredict?

I think this information needs to be made very clear from Modgraph before they can claim to be, &quot;the most accurate carbon 13 NMR predictor in an independent evaluation?&quot;]]></description>
		<content:encoded><![CDATA[<p>Before anyone takes Robien&#8217;s results about the NMR prediction comparisons AS FACT&#8230;please read my latest post about the details of the comparison. </p>
<p><a href="http://acdlabs.typepad.com/my_weblog/2007/06/robiens_and_mod.html" rel="nofollow">http://acdlabs.typepad.com/my_weblog/2007/06/robiens_and_mod.html</a></p>
<p>There are still some remaining questions about how Modgraph came to an average deviation of 1.40 ppm. </p>
<p>So before we can make a final decision on performance, I think Modgraph needs to make very clear the following: </p>
<p>1. What is the overlap between NMRShiftDB and Modgraph&#8217;s NMR prediction databases? Further, with several different database sources how much duplication of data exists across the databases and within the entire package?</p>
<p>2. Once that overlap is removed from the dataset, what is the final deviation produced by NMRPredict?</p>
<p>I think this information needs to be made very clear from Modgraph before they can claim to be, &#8220;the most accurate carbon 13 NMR predictor in an independent evaluation?&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wolfgang Robien</title>
		<link>http://www.chemspider.com/blog/further-comments-on-the-quality-of-nmrshiftdb-and-nmr-prediction-algorithm-validation.html/comment-page-1#comment-473</link>
		<dc:creator>Wolfgang Robien</dc:creator>
		<pubDate>Wed, 06 Jun 2007 08:24:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=44#comment-473</guid>
		<description><![CDATA[NMRShiftDB Debate


CSEARCH performs at 2.22ppm/2.19ppm before/after correction with a certain
network and a certain parameter setting.

ACD&#039;s CNMR-Predictor performs at 1.59ppm as can be seen from the table above.

MODGRAPH&#039;s NMRPredict performs at 1.40ppm as can be seen from their 
website.

Those are the facts that have come out of the discussion so far ....


I think it&#039;s good to know that, verified on the same dataset. I initiated
this discussion, therefore I want to recall the starting point of this
discussion:

The central question of my webpage at nmrpredict.orc.univie.ac.at was:
(Statements copied from: 
nmrpredict.orc.univie.ac.at/csearchlite/Robien2Ryan_May31_2007.html
nmrpredict.orc.univie.ac.at/csearchlite/enjoy_its_free.html)

&quot;Is there a visible improvement when you spent a few hours on 
data-correction, both on a statistical basis and also with a few specific 
examples?&quot;

The answer was:

&quot;This small improvement - which is far away from being perfect 
and/or complete - without the use of one, single literature citation
improves the assignment quality by ca. 6,200 ppm !&quot;

&quot;My finding was, that spending less than an afternoon you can improve a 
collection of some 20,000 NMR-spectra having more than 200,000 shiftvalues 
by 0.03ppm or ca. 6,200ppm in total.&quot;

My summary was:

&quot;That&#039;s an amazing result ! NMRShiftDB has been online for about 5 years, 
the 20 most important contributors are mentioned on their homepage by name 
and I think many other people have contributed into this project. All 
these people together were unable to spend less than an afternoon on 
data-correction within 5 years! That&#039;s exactly the point nobody seems to 
(be willing to) understand !&quot;

Peter Murray-Rust stated on his blog with respect to NMRShiftDB-data:

&quot;...... It contains mechanisms for assessing data quality automatically. 
For example software can be run that will indicate whether values are 
seriously in error.........&quot;

My answer was:

&quot;That&#039;s really a great idea AFTER being 5 years on the web ! ....... let 
me know which data-correction protocol has been applied to the 
NMRShiftDB-data leading to deviations of about 130ppm (= 2/3 of the usual 
C-NMR shift range) BEFORE you put them on the web ! ......&quot;

Final remarks:

Only 4 main-contributors are involved in this discussion:

That&#039;s me - I have started this &#039;avalanche&#039;
Antony Williams and Ryan Sasaki from ACD
Peter Murray-Rust from Cambridge

I couldn&#039;t find any comment by NMRShiftDB people including Christoph 
Steinbeck.

In order to improve science with a project like NMRShiftDB, the first step 
should be to look around what is already there (&quot;State-of-the-art&quot;), the 
second step should be to avoid the errors other people have already paid for
(&quot;Avoid starting from scratch&quot;) and the third step should be avoiding
hacking the same numbers/strings into a computer (&quot;share resources&quot;).
The community needs chemical data, THEY SHOULD BE VALIDATED ACCORDING
TO THE STATE-OF-THE-ART (when you cite me, please use both parts of this 
sentence!) - I have shown by spending a few minutes CPU-time, that the 
second part of this sentence was not true for NMRShiftDB as downloaded
on March 10th, 2007.

What&#039;s the result of this discussion:

a) We know how accurately 3 programs perform on the same dataset
b) There was severe error-correction performed on NMRShiftDB after
   my analysis

Both items are valuable contributions for the scientific community.

At the end a personal remark to Christoph Steinbeck:

    Copied from: sourceforge.net/forum/forum.php?forum_id=681882

    Start citation ----

    ..... We also feel that this makes a strong case for our open access, 
open source policy, which gave our reviewer the chance to access our full 
material and run this test. As Eric Raymond puts it: &quot;Given enough 
eyeballs, all bugs are shallow&quot;

    End of citation ----

    I clearly state here: Don&#039;t use the &quot;open access, open source policy&quot;
    as an excuse. You simply have the responsibility as database supplier
    and/or project manager to apply basic statistical tests on the data,
    BEFORE you make them available to the scientific community in order
    to obtain a reasonable quality of the data you provide. I am  
    talking only about errors, which can be found just by &#039;snapping
    fingers&#039;. Obviously this point has been missed over 5 years.

Wolfgang Robien, June 6th, 2007]]></description>
		<content:encoded><![CDATA[<p>NMRShiftDB Debate</p>
<p>CSEARCH performs at 2.22ppm/2.19ppm before/after correction with a certain<br />
network and a certain parameter setting.</p>
<p>ACD&#8217;s CNMR-Predictor performs at 1.59ppm as can be seen from the table above.</p>
<p>MODGRAPH&#8217;s NMRPredict performs at 1.40ppm as can be seen from their<br />
website.</p>
<p>Those are the facts that have come out of the discussion so far &#8230;.</p>
<p>I think it&#8217;s good to know that, verified on the same dataset. I initiated<br />
this discussion, therefore I want to recall the starting point of this<br />
discussion:</p>
<p>The central question of my webpage at nmrpredict.orc.univie.ac.at was:<br />
(Statements copied from:<br />
nmrpredict.orc.univie.ac.at/csearchlite/Robien2Ryan_May31_2007.html<br />
nmrpredict.orc.univie.ac.at/csearchlite/enjoy_its_free.html)</p>
<p>&#8220;Is there a visible improvement when you spent a few hours on<br />
data-correction, both on a statistical basis and also with a few specific<br />
examples?&#8221;</p>
<p>The answer was:</p>
<p>&#8220;This small improvement &#8211; which is far away from being perfect<br />
and/or complete &#8211; without the use of one, single literature citation<br />
improves the assignment quality by ca. 6,200 ppm !&#8221;</p>
<p>&#8220;My finding was, that spending less than an afternoon you can improve a<br />
collection of some 20,000 NMR-spectra having more than 200,000 shiftvalues<br />
by 0.03ppm or ca. 6,200ppm in total.&#8221;</p>
<p>My summary was:</p>
<p>&#8220;That&#8217;s an amazing result ! NMRShiftDB has been online for about 5 years,<br />
the 20 most important contributors are mentioned on their homepage by name<br />
and I think many other people have contributed into this project. All<br />
these people together were unable to spend less than an afternoon on<br />
data-correction within 5 years! That&#8217;s exactly the point nobody seems to<br />
(be willing to) understand !&#8221;</p>
<p>Peter Murray-Rust stated on his blog with respect to NMRShiftDB-data:</p>
<p>&#8220;&#8230;&#8230; It contains mechanisms for assessing data quality automatically.<br />
For example software can be run that will indicate whether values are<br />
seriously in error&#8230;&#8230;&#8230;&#8221;</p>
<p>My answer was:</p>
<p>&#8220;That&#8217;s really a great idea AFTER being 5 years on the web ! &#8230;&#8230;. let<br />
me know which data-correction protocol has been applied to the<br />
NMRShiftDB-data leading to deviations of about 130ppm (= 2/3 of the usual<br />
C-NMR shift range) BEFORE you put them on the web ! &#8230;&#8230;&#8221;</p>
<p>Final remarks:</p>
<p>Only 4 main-contributors are involved in this discussion:</p>
<p>That&#8217;s me &#8211; I have started this &#8216;avalanche&#8217;<br />
Antony Williams and Ryan Sasaki from ACD<br />
Peter Murray-Rust from Cambridge</p>
<p>I couldn&#8217;t find any comment by NMRShiftDB people including Christoph<br />
Steinbeck.</p>
<p>In order to improve science with a project like NMRShiftDB, the first step<br />
should be to look around what is already there (&#8220;State-of-the-art&#8221;), the<br />
second step should be to avoid the errors other people have already paid for<br />
(&#8220;Avoid starting from scratch&#8221;) and the third step should be avoiding<br />
hacking the same numbers/strings into a computer (&#8220;share resources&#8221;).<br />
The community needs chemical data, THEY SHOULD BE VALIDATED ACCORDING<br />
TO THE STATE-OF-THE-ART (when you cite me, please use both parts of this<br />
sentence!) &#8211; I have shown by spending a few minutes CPU-time, that the<br />
second part of this sentence was not true for NMRShiftDB as downloaded<br />
on March 10th, 2007.</p>
<p>What&#8217;s the result of this discussion:</p>
<p>a) We know how accurately 3 programs perform on the same dataset<br />
b) There was severe error-correction performed on NMRShiftDB after<br />
   my analysis</p>
<p>Both items are valuable contributions for the scientific community.</p>
<p>At the end a personal remark to Christoph Steinbeck:</p>
<p>    Copied from: sourceforge.net/forum/forum.php?forum_id=681882</p>
<p>    Start citation &#8212;-</p>
<p>    &#8230;.. We also feel that this makes a strong case for our open access,<br />
open source policy, which gave our reviewer the chance to access our full<br />
material and run this test. As Eric Raymond puts it: &#8220;Given enough<br />
eyeballs, all bugs are shallow&#8221;</p>
<p>    End of citation &#8212;-</p>
<p>    I clearly state here: Don&#8217;t use the &#8220;open access, open source policy&#8221;<br />
    as an excuse. You simply have the responsibility as database supplier<br />
    and/or project manager to apply basic statistical tests on the data,<br />
    BEFORE you make them available to the scientific community in order<br />
    to obtain a reasonable quality of the data you provide. I am<br />
    talking only about errors, which can be found just by &#8216;snapping<br />
    fingers&#8217;. Obviously this point has been missed over 5 years.</p>
<p>Wolfgang Robien, June 6th, 2007</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan's Blog on NMR Software</title>
		<link>http://www.chemspider.com/blog/further-comments-on-the-quality-of-nmrshiftdb-and-nmr-prediction-algorithm-validation.html/comment-page-1#comment-392</link>
		<dc:creator>Ryan's Blog on NMR Software</dc:creator>
		<pubDate>Thu, 31 May 2007 18:13:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=44#comment-392</guid>
		<description><![CDATA[&lt;strong&gt;More Dialogue on NMRShiftDB Debate&lt;/strong&gt;

Peter Murray-Rust and Tony Williams have added their two cents to this debate on their respective blogs. Peter provides a great justification on providing open access to scientific information: In the case of NMRShiftDB I am firmly of the opinion that]]></description>
		<content:encoded><![CDATA[<p><strong>More Dialogue on NMRShiftDB Debate</strong></p>
<p>Peter Murray-Rust and Tony Williams have added their two cents to this debate on their respective blogs. Peter provides a great justification on providing open access to scientific information: In the case of NMRShiftDB I am firmly of the opinion that</p>
]]></content:encoded>
	</item>
</channel>
</rss>
