29 July 2008
Another 1/2 million compounds added to ChemSpider
Posted by: Antony Williams in Community Building, Quality and Content
Copyright ©2008 Antony Williams
We’ve been enhancing our deposition system so that adding tens of thousands of new compounds to ChemSpider doesn’t have too big an impact on ChemSpider’s performance. Depositing each structure requires calculating its associated properties and deduplicating it against the database, and that processing needed to be optimized. With the improved processing in place, we are now clearing our backlog of new structures. We know this is well overdue, but we didn’t want to put undue stress on the servers for our users. New data from the following companies are now in the database. There are more to come…
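To illustrate what deduplication on deposit involves, here is a minimal Python sketch. It is not ChemSpider's actual code: the `Registry` class, the `deposit_batch` helper, and the idea of a precomputed canonical structure key are all assumptions made for the example, and the property calculation step is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Registry:
    """Toy in-memory stand-in for a compound database (hypothetical)."""
    next_id: int = 1
    ids_by_key: Dict[str, int] = field(default_factory=dict)         # structure key -> record ID
    sources_by_id: Dict[int, Set[str]] = field(default_factory=dict)  # record ID -> depositors

    def deposit(self, structure_key: str, source: str) -> Tuple[int, bool]:
        """Return (record_id, is_new); a duplicate reuses the existing ID."""
        # Naive normalization, an assumption for this sketch only.
        key = structure_key.strip().upper()
        existing = self.ids_by_key.get(key)
        if existing is not None:
            # Known structure: just remember the additional depositor.
            self.sources_by_id[existing].add(source)
            return existing, False
        record_id = self.next_id
        self.next_id += 1
        self.ids_by_key[key] = record_id
        self.sources_by_id[record_id] = {source}
        return record_id, True


def deposit_batch(registry: Registry,
                  records: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Deposit (structure_key, source) pairs; return (new_count, duplicate_count)."""
    new_count = 0
    duplicate_count = 0
    for structure_key, source in records:
        _, is_new = registry.deposit(structure_key, source)
        if is_new:
            new_count += 1
        else:
            duplicate_count += 1
    return new_count, duplicate_count
```

Because each lookup is a single dictionary access, the cost of a batch grows with the number of deposited records rather than with the size of the existing collection. In a real system the key would have to come from a proper canonicalization of the structure, which this sketch does not attempt.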



July 29th, 2008 at 4:01 am
Antony, I guess this is not 0.5M new compounds, is it? It would be interesting to know how many new unique compounds have been added…
July 29th, 2008 at 9:35 am
Egon…they are definitely NOT all unique compounds. But I would say that the majority are…probably over 200,000, but this is an estimate. I can tell this by watching the streams of new CSIDs that come through:
http://www.chemspider.com/Chemical-Structure.21351436.html
http://www.chemspider.com/Chemical-Structure.21351437.html
http://www.chemspider.com/Chemical-Structure.21351438.html
http://www.chemspider.com/Chemical-Structure.21351439.html
http://www.chemspider.com/Chemical-Structure.21351440.html
http://www.chemspider.com/Chemical-Structure.21351441.html
http://www.chemspider.com/Chemical-Structure.21351442.html
http://www.chemspider.com/Chemical-Structure.21351443.html
http://www.chemspider.com/Chemical-Structure.21351444.html
http://www.chemspider.com/Chemical-Structure.21351445.html
http://www.chemspider.com/Chemical-Structure.21351446.html
http://www.chemspider.com/Chemical-Structure.21351447.html
http://www.chemspider.com/Chemical-Structure.21351448.html
http://www.chemspider.com/Chemical-Structure.21351449.html
http://www.chemspider.com/Chemical-Structure.21351450.html
http://www.chemspider.com/Chemical-Structure.21351451.html
http://www.chemspider.com/Chemical-Structure.21351452.html
http://www.chemspider.com/Chemical-Structure.21351453.html
http://www.chemspider.com/Chemical-Structure.21351454.html
http://www.chemspider.com/Chemical-Structure.21351455.html
(the list shows only the first 20 structures)
July 29th, 2008 at 8:20 pm
Tony, looks like Egon’s comment has disappeared. My question also related to uniqueness, quantitatively speaking. I’m wondering what story would be told by just a simple chart showing percentage of “new” compounds as a function of time for ChemSpider. Maybe even a chart showing number of duplicate compound submissions as a function of time.
There are many ways to slice it, but it’s all pretty interesting given how new the concept of a large, public-facing chemical database is.