Today I had the privilege of meeting with many members of the team creating the RCSB Protein Data Bank. This resulted from the wonderful networking opportunity offered by the Scifoo camp held earlier this year at Google, where I met Helen Berman, director of the PDB team, which is part of the worldwide Protein Data Bank. Helen and I talked while sitting outside the Google offices in California and shared our opinions and visions regarding the quality of small molecule data available online. Today was an opportunity to take those conversations further, meet with members of the team, and determine whether ChemSpider’s curation efforts could benefit the PDB and whether ChemSpider users could benefit from access to PDB information via hosting of the PDB ligand dictionary.
I gave a presentation (online here and based on others I have delivered previously) and received a one-on-one review of the deposition and curation processes of the PDB. I also participated in a group discussion about how to continue the stringent and exacting process of validation and curation associated with small molecule structure sets. We discussed the complex relationships between systematic names, trivial names, registry IDs, database IDs, tautomers, charged states, SMILES and InChIs. It was a particularly validating day, spent with a group of people who are responsible for building one of the most valuable resources in the world and who have faced the many challenges associated with validating structure-based data. There is a distinction between people who talk about what it takes to curate structure collections and those who actually do the job for a living. This team is made up of dedicated, passionate and skilled individuals who care deeply about the quality of their data and who do the heavy lifting and grunt work so that the users of the PDB enjoy the benefits. They have been working on a multi-year process to curate and improve the PDB data and are in the final major phase of the effort to clean up the archive and apply the processes to all new data moving forward. ChemSpider and the PDB will be more integrated in the near future, and we look forward to supporting their efforts to provide high-quality structure data to the community and to continuing to expand the network of integrated online chemistry.