Copyright © 2009 Antony Williams
A few weeks ago I noticed that PubChem had grown substantially after a deposition from the ZINC group. I had thought, incorrectly, that this was due to the deposition of protonated forms of the ZINC database, because they produce such forms as part of their docking procedures. I had discussed this possibility with Evan Bolton from the PubChem team when we were at the InChI meeting in Glasgow. In fact, the growth was not due to different protonation states but to ZINC having deposited 12M make-on-demand compounds that they hold in their catalogs. For me these are virtual chemicals. The vendors who deposit such chemistry into the ZINC database have done research to demonstrate that the chemistry involved in producing these compounds, when ordered, has a good probability of succeeding, but for the time being they are virtual compounds only. In the early days of ChemSpider we had an internal discussion about whether or not we should open ourselves to the deposition of virtual compounds, and we did add a dataset from the UsefulChem team at Drexel University. Since then, however, we have steered away from the deposition of such libraries. As explained on the ZINC blog, a decision was made to remove 12 million of the make-on-demand chemicals as “Pubchem’s rules require that compounds have been made somewhere before they be included”. I’m fairly sure that what is left on PubChem does not fully exclude such compounds, as they are deposited by a number of vendors who have the ability to submit such collections, but I appreciate the effort made by ZINC to remove this class of compounds from their deposition.
I am interested in community feedback on this matter. Should ChemSpider host collections of virtual chemistry? There is certainly value for people who wish to perform activities such as virtual screening, but we don’t allow downloads of our entire database the way that ZINC and PubChem do. At present we are focused on layering on more information associated with a chemical compound: physicochemical properties, spectra, article links, patents, etc. We want to make sure that the chemistry represented in the backfile of RSC articles makes it onto ChemSpider in the future. This parallels some of the efforts being made by FIZ Chemie and InfoChem to make available the backfile of Chemisches Zentralblatt. We also want to make sure that the compounds in the Natural Product Updates file from RSC make it onto ChemSpider. We have a lot to do, but the focus is on getting real data and real structures onto the database and removing “junk chemistry” from the deposited data.

That said, we are interested in your comments. What are your thoughts regarding “virtual chemistry”? Should we support virtual compounds or not? For sure there will always be some virtual chemistry on there in some form. For example, products whose structures were once thought to have been elucidated but were later shown to be something else are virtual chemistry. Compounds that have been deposited with incomplete stereochemistry can be called “partial chemistry”, if you like. Your thoughts and comments are welcomed.