An announcement on the Blue Obelisk Discussion List this week described a new database that currently holds 4 million molecules, with up to 50 million expected in the future. It is hosted at molecules.gnu-darwin.org/ and is introduced there with the following comments:

“Some facts: The Molecules website contains more than 4 million small molecule structure files in pdb format, and molecular graphics representations. About 50 million molecules are still in the pipe, and they are expected to appear here over the course of the next few weeks and months. The pdb format is readable by common FOSS molecule viewer software, such as RasMol and PyMOL. In due course, we plan to provide high quality structures via energy minimization refinement, and additional resources.

Molecules@gnu-darwin.org is founded in the spirit of free software, open source, and public access. It is hoped that access to these files will be a wonderful community resource for science education, research, and entertainment as well. We are looking for investment or funding to expedite and expand this work, and lead the field, with an eye towards an advanced, complete, synthetic, structural, and informatical bioorganome. Meanwhile, the site is already an exceptional lab resource, and molecular catalog, providing the means and building blocks towards additional novel structures. We aim to be the best.

The structural biology, protein crystallography, and molecular graphics talent that is building the Molecules archive is available to work for you in a contract or consulting arrangement. Wide-ranging expertise is available. Molecules@gnu-darwin.org is built entirely with FOSS, free and open source software, GNU-Darwin OS, and it is under the aegis of The GNU-Darwin Distribution. Here is a link to the Distribution résumé. Our founder is an X-ray laboratory admin for the Department of Biophysics and Biophysical Chemistry of Johns Hopkins University School of Medicine. You can also read his CV. We would like to build a community around this website, and we are looking for volunteers and collaborators to help. Regarding any aspect of the work of this site, please feel free to contact us, molecules@gnu-darwin.org, with gdmolecules in the subject line. Cheers!”

I’m always interested in databases that we could connect to in order to add capabilities and diversity to ChemSpider’s information. I browsed the database and searched on some common molecules (Xanax, aspirin, Taxol and others) but found no hits. This seemed strange, but the site does say “Search warning: not yet fully spidered.”

The statement that 50 million molecules are coming in total suggests that the database is a republication of PubChem. The SDF archives point the same way, since they redirect to PubChem for the download: http://molecules.gnu-darwin.org/ftp.ncbi.nlm.nih.gov/pubchem/Substance/CURRENT-Full/

At present the database therefore appears to be the PubChem database in PDB format. I hope that there is some additional information added to warrant our linking to this new database.
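
For anyone who wants to look inside one of these files without opening a viewer, here is a minimal Python sketch that pulls the atoms out of PDB-format text. It assumes the files follow the standard fixed-column layout for ATOM/HETATM records, which I have not verified for this site. The names `Atom`, `parse_pdb_atoms` and `SAMPLE` are my own, and the water coordinates are made up for illustration rather than taken from the database.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    x: float
    y: float
    z: float
    element: str


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Extract atoms from ATOM/HETATM records using the fixed PDB columns."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip END, CONECT and other record types
        line = line.ljust(80)  # tolerate writers that drop trailing columns
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
            element=line[76:78].strip(),   # columns 77-78
        ))
    return atoms


# Illustrative water molecule; the coordinates are invented for this example.
SAMPLE = """\
HETATM    1  O   HOH A   1       0.000   0.000   0.117  1.00  0.00           O
HETATM    2  H1  HOH A   1       0.000   0.757  -0.469  1.00  0.00           H
HETATM    3  H2  HOH A   1       0.000  -0.757  -0.469  1.00  0.00           H
END
"""

if __name__ == "__main__":
    for atom in parse_pdb_atoms(SAMPLE):
        print(atom.serial, atom.name, atom.element, atom.x, atom.y, atom.z)
```

Because PDB is a fixed-width format, the sketch slices by column position instead of splitting on whitespace, which would break as soon as two coordinate fields run together.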
