<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Physical Property Predictions &#8211; Filtering Out Potential Problematic Data on ChemSpider&#8230;or is it NOT a problem?</title>
	<atom:link href="http://www.chemspider.com/blog/physical-property-predictions-filtering-out-potential-problematic-data-on-chemspideror-is-it-not-a-problem.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.chemspider.com/blog/physical-property-predictions-filtering-out-potential-problematic-data-on-chemspideror-is-it-not-a-problem.html</link>
	<description>Building Community for Chemists</description>
	<lastBuildDate>Fri, 24 May 2013 06:45:34 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
	<item>
		<title>By: ChemSpider Blog &#187; Blog Archive &#187; Prediction Errors and Filtering the ChemSpider Database - How Accurate Does a Prediction Need to Be?</title>
		<link>http://www.chemspider.com/blog/physical-property-predictions-filtering-out-potential-problematic-data-on-chemspideror-is-it-not-a-problem.html/comment-page-1#comment-230</link>
		<dc:creator>ChemSpider Blog &#187; Blog Archive &#187; Prediction Errors and Filtering the ChemSpider Database - How Accurate Does a Prediction Need to Be?</dc:creator>
		<pubDate>Tue, 22 May 2007 11:05:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=26#comment-230</guid>
		<description><![CDATA[[...] had previously added comments to his post regarding my questions. Based on this feedback and other comments on blog postings and email exchanges it&#8217;s time to summarize our path forward and the reasons [...]]]></description>
		<content:encoded><![CDATA[<p>[...] had previously added comments to his post regarding my questions. Based on this feedback and other comments on blog postings and email exchanges it&#8217;s time to summarize our path forward and the reasons [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Pearl</title>
		<link>http://www.chemspider.com/blog/physical-property-predictions-filtering-out-potential-problematic-data-on-chemspideror-is-it-not-a-problem.html/comment-page-1#comment-203</link>
		<dc:creator>Greg Pearl</dc:creator>
		<pubDate>Mon, 21 May 2007 12:16:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=26#comment-203</guid>
		<description><![CDATA[All of these questions essentially boil down to the philosophical question of whether or not any value is better than nothing. The problem with this is that the answer depends on the objective of the individual user and is hence much like an opinion: everyone has one.

Predicting Isotopes: The only isotope with the potential to significantly affect the calculated physical properties (excluding MW) would be deuterium when it replaces hydrogen in a potential hydrogen bond.

Multi-component systems: There is potential value in providing the predicted properties for these systems. It might be useful to add a text field that holds a count of the number of molecules/ions in a record. Users could then easily limit their search to compounds containing multiple molecules. Additionally, we can put a notice on the multi-component systems indicating that the predicted value is for the largest molecule/ion and is based upon the protonation of the molecule/ion to form a neutral species.

Filtering Data: Generally speaking, the more search options the better, with the caveat that the interface must remain uncluttered. So I would recommend creating an additional search panel that enables the user to search any data field or metadata that is available.

So there are several possible solutions...
1. Create 2 different systems (1 for curated data and 1 for non-curated)
2. Design the data structure so that users can easily select which type of data they want to use
3. Assign a weighting factor to the data that describes its quality. So, using the Wikipedia concept, allow the users to assign a grade to the data and then aggregate the data accordingly
4......

Good luck. Glad to see that progress is being made to improve access to chemical data on the internet....]]></description>
		<content:encoded><![CDATA[<p>All of these questions essentially boil down to the philosophical question of whether or not any value is better than nothing. The problem with this is that the answer depends on the objective of the individual user and is hence much like an opinion: everyone has one.</p>
<p>Predicting Isotopes: The only isotope with the potential to significantly affect the calculated physical properties (excluding MW) would be deuterium when it replaces hydrogen in a potential hydrogen bond.</p>
<p>Multi-component systems: There is potential value in providing the predicted properties for these systems. It might be useful to add a text field that holds a count of the number of molecules/ions in a record. Users could then easily limit their search to compounds containing multiple molecules. Additionally, we can put a notice on the multi-component systems indicating that the predicted value is for the largest molecule/ion and is based upon the protonation of the molecule/ion to form a neutral species.</p>
<p>Filtering Data: Generally speaking, the more search options the better, with the caveat that the interface must remain uncluttered. So I would recommend creating an additional search panel that enables the user to search any data field or metadata that is available.</p>
<p>So there are several possible solutions&#8230;<br />
1. Create 2 different systems (1 for curated data and 1 for non-curated)<br />
2. Design the data structure so that users can easily select which type of data they want to use<br />
3. Assign a weighting factor to the data that describes its quality. So, using the Wikipedia concept, allow the users to assign a grade to the data and then aggregate the data accordingly<br />
4&#8230;&#8230;</p>
<p>Good luck. Glad to see that progress is being made to improve access to chemical data on the internet&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Antony Williams</title>
		<link>http://www.chemspider.com/blog/physical-property-predictions-filtering-out-potential-problematic-data-on-chemspideror-is-it-not-a-problem.html/comment-page-1#comment-116</link>
		<dc:creator>Antony Williams</dc:creator>
		<pubDate>Wed, 16 May 2007 12:29:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.chemspider.com/blog/?p=26#comment-116</guid>
		<description><![CDATA[This is a link back to Peter Murray-Rust&#039;s comments at his blog to retain linkage. 
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=327#comments

#  pm286 Says:
May 16th, 2007 at 7:39 am

(1)

1) Do people believe that isotopes will make a difference (within prediction error) to the calculation of the predicted physicochemical properties? I have my own judgments but put this question out there for public feedback.

PMR&gt; Deuterium has a significant influence on many physical properties - e.g. boiling point of D2O - and obviously on vibrational frequencies. But in general it depends on the accuracy and precision of the property. We, for example, compute phonons of crystalline materials and these are certainly isotope dependent.

2) Should all multi-component systems be excluded? I demonstrated clearly in an earlier post that prediction of LogP for CaCO3 was appropriate so should it be excluded or not?

PMR&gt; It depends what your properties are. Until you address the aspect of physical state I would strongly suggest you omit multi-component systems. For example we are working with calcite, vaterite and other forms of CaCO3 and these have many properties that depend on the polymorph. In principle (log)P should be independent of polymorph but I would be suspicious of this for many systems.

You commented &#8220;These are very close to the filters based on molecular formula which I would recommend. Since I don&#8217;t have knowledge of your metadata (e.g. date, format, contributor, etc.) I can&#8217;t comment, but it may be that these are also useful filters.&#8221; So, the ones I have suggested are close&#8230;what additional ones would you suggest?

PMR&gt; I don&#8217;t know what your properties actually are - the only ones displayed are MW, (log)P, polar surface area and volume. Since I don&#8217;t know the algorithm for the last two I can&#8217;t comment, but I would expect both to depend on molecular flexibility.

Also, we DO have the date, format and contributor data available. How would you use these data yourself to make a decision to predict physchem properties? Assuming all data are available as MOL/SDF files, how would date of submission and contributor info be used?

PMR&gt; Since I assume you have compared experiment with prediction I would look to see if outliers showed any predictability of source, data, etc. For example some submitters may routinely get molecular formulae garbled (e.g. hydrogen atoms). We have found that some have garbled Celsius and Kelvin - several crystallographic experiments were reported at 298 degC, which is almost certainly an error.]]></description>
		<content:encoded><![CDATA[<p>This is a link back to Peter Murray-Rust&#8217;s comments at his blog to retain linkage.<br />
<a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=327#comments" rel="nofollow">http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=327#comments</a></p>
<p>#  pm286 Says:<br />
May 16th, 2007 at 7:39 am</p>
<p>(1)</p>
<p>1) Do people believe that isotopes will make a difference (within prediction error) to the calculation of the predicted physicochemical properties? I have my own judgments but put this question out there for public feedback.</p>
<p>PMR> Deuterium has a significant influence on many physical properties &#8211; e.g. boiling point of D2O &#8211; and obviously on vibrational frequencies. But in general it depends on the accuracy and precision of the property. We, for example, compute phonons of crystalline materials and these are certainly isotope dependent.</p>
<p>2) Should all multi-component systems be excluded? I demonstrated clearly in an earlier post that prediction of LogP for CaCO3 was appropriate so should it be excluded or not?</p>
<p>PMR> It depends what your properties are. Until you address the aspect of physical state I would strongly suggest you omit multi-component systems. For example we are working with calcite, vaterite and other forms of CaCO3 and these have many properties that depend on the polymorph. In principle (log)P should be independent of polymorph but I would be suspicious of this for many systems.</p>
<p>You commented &#8220;These are very close to the filters based on molecular formula which I would recommend. Since I don&#8217;t have knowledge of your metadata (e.g. date, format, contributor, etc.) I can&#8217;t comment, but it may be that these are also useful filters.&#8221; So, the ones I have suggested are close&#8230;what additional ones would you suggest?</p>
<p>PMR> I don&#8217;t know what your properties actually are &#8211; the only ones displayed are MW, (log)P, polar surface area and volume. Since I don&#8217;t know the algorithm for the last two I can&#8217;t comment, but I would expect both to depend on molecular flexibility.</p>
<p>Also, we DO have the date, format and contributor data available. How would you use these data yourself to make a decision to predict physchem properties? Assuming all data are available as MOL/SDF files, how would date of submission and contributor info be used?</p>
<p>PMR> Since I assume you have compared experiment with prediction I would look to see if outliers showed any predictability of source, data, etc. For example some submitters may routinely get molecular formulae garbled (e.g. hydrogen atoms). We have found that some have garbled Celsius and Kelvin &#8211; several crystallographic experiments were reported at 298 degC, which is almost certainly an error.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
