Oh boy, do we have a lot of things to do with ChemSpider. Not only now, while shifting ChemSpider to the RSC infrastructure, but in the future as we do the work necessary to make ChemSpider the primary internet resource for structure-based chemistry. We don’t have small eyes in terms of what we want to deliver to the community. Far from it…we have big eyes and big ideas regarding what is possible and even, in most cases, how to get there. What is clear is that we need the appropriate skill sets to make it happen. At present all ChemSpider platform development work is done by our team over here in the US. We are looking to add a team member in the RSC offices in Cambridge. We’re looking for someone with established Cheminformatics skills to work with us. They need to have an established track record of working in the field of Cheminformatics, a deep knowledge of handling chemical structures, experience in working with web-based systems and, of course, a big appetite for making a difference and a desire to work with a fast-moving team. If you’re interested in talking with us about the opportunity ping me at antonyDOTwilliamsATchemspiderDOTcom.
There are a small number of primary chemical vendors serving the industry. These include companies such as Sigma Aldrich, Spectrum Chemical, Alfa Aesar, ThermoFisher and many others. There are also thousands of smaller companies serving the industry with their chemicals. Their catalogs can vary from a dozen to a few hundred chemicals but rarely number into the tens of thousands offered by the larger companies. The large chemical companies offer excellent services in terms of delivery of catalogs to the door and circulation of updated CDs of information. I find the Aldrich catalog an excellent tool and have one on my desk, underneath my Merck Index.
Those smaller chemical companies are in the long tail of suppliers that the majority of chemists will never even hear of. Not unless there is some way for those suppliers to deliver their message regarding their list of products, availability and overall their existence, to interested parties. In China specifically there are many hundreds of small chemical companies popping up now. They cannot afford to market themselves via CD distribution and catalogs to their potential userbase and have to depend on their website to market their wares. They likely deposit their collections to the Available Chemical Directory from Symyx (a GREAT product and with a lot of quality work going into it in the background!), maybe into ChemACX from Cambridgesoft, onto ChemExper or onto the eMolecules site. Some of these offer up to date pricing and procurement systems while others offer simply “Get me a Quote” services whereby a chemist can request a quote directly from the vendor for the material of interest.
ChemSpider has been depositing chemical compound collections for chemical vendors, both large and small, for many months. The word seems to have got out that there is value to doing this. Despite the fact that we do not have, at present, the ability to list real-time pricing or availability for compounds, chemical vendors appear to be deriving value from the listings and chemists are finding chemicals for purchase via ChemSpider.
If there is a certain small molecule chemical vendor that you think we should list on ChemSpider let them know to contact us OR point us to their URL and we will contact them. One example of data added just today is the data set, small though it is, from Asiaron. They offer rich compound pages like this and are a good addition to the database.
I have ChemMobi running on my iPhone now and, I am happy to say, it looks just like it should. While visiting the RSC in Cambridge a couple of weeks ago I had a chance to hang out with James Jack, the Symyx consultant responsible for developing ChemMobi. That’s him on the left. No, that’s not him trying to hunt sharks with hand held harpoons, it’s him driving the “ChemSpider punt” in a race against the IT team from the RSC. Since we weren’t locals it seemed appropriate to challenge us to a speed punt down the river. This was of course preceded by the imbibing of adequate amounts of flavored water and juices.
Strangely enough all of us in the ChemSpider punt did appear to have some undiscovered talents for punting. We very quickly lost the IT team back at the “juice house” and found them when we had finished our loop back from our destination. We realized that we had an unfair advantage since we had adopted a strategy of punting from the surface of the vessel. They had not told us that they were doing the whole race in their own way…pushing with a pole while immersed. That’s our colleague Doug Spooner from the IT team showing us how to do it “IT style”.
ChemMobi will soon be posted to the App Store for you all to download and use. I’ll let you know when…hopefully within a week. All glory, love and adoration for the App should go to James Jack and to Symyx for allowing him to do what he does best…get creative with software and structures!
It’s been a long time since I blogged here on the ChemSpider blog. Now I am officially an employee of the Royal Society of Chemistry and have spent a week in Cambridge meeting my new colleagues, discussing the transfer of ChemSpider to their servers for hosting and working on plans for a relaunch of ChemSpider later in the year. More about that later. I’ll be back in action on this blog in the coming week.
I actually write on two blogs. This one will now be dedicated to ChemSpider activities specifically and focus on new functionality, plans and vision for ChemSpider as a service. My other blog, the ChemConnector blog (www.chemconnector.com/chemunicating) will be more of a personal blog. My views of cheminformatics, activities in Chemistry and Science, Open Science, Open Access and Open Data and other things that interest me.
Glad to be back and looking forward to connecting with everyone again.
A couple of days ago I asked whether readers could see any issues with the structure of Micrococcin P1 published in the C&E News article this week. A few people took a stab on blog and off blog but only Stuart Cantrill from the Nature Publishing Group got it right. One double bond in the wrong place. Subtle, but rather important. General structure drawing tools will help with things like this. For example, a human might not see the issue in the structure of Taxol to the left very easily. Software tools designed to flag valency issues will show the issue easily.
In the expanded image the pentavalent carbon is marked. The same type of tools would have shown a positive charge on the sulphur in the ring for the incorrect structure of Micrococcin. In the same way, software tools can recognize charge imbalances and incomplete stereochemistry.
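To give a feel for what such a valency check does, here is a minimal sketch in Python. It is not how any particular drawing package works: the `MAX_VALENCE` table, the `flag_valence_errors` function and the atom/bond representation are all assumptions made up for illustration, and the sketch ignores charges, so it would not catch the sulphur case mentioned above.

```python
# Hypothetical maximum valences for a few neutral atoms.
# Real tools use much richer rules (charges, radicals, hypervalent sulphur...).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def flag_valence_errors(atoms, bonds):
    """Return (index, element, total bond order) for each over-valent atom.

    atoms: list of element symbols, e.g. ["C", "H"]
    bonds: list of (atom_index_a, atom_index_b, bond_order) tuples
    """
    # Sum the bond orders arriving at every atom.
    totals = [0] * len(atoms)
    for a, b, order in bonds:
        totals[a] += order
        totals[b] += order

    # Flag any atom whose total exceeds the assumed maximum for its element.
    flagged = []
    for index, (element, total) in enumerate(zip(atoms, totals)):
        limit = MAX_VALENCE.get(element)
        if limit is not None and total > limit:
            flagged.append((index, element, total))
    return flagged


# A carbon drawn with five single bonds to hydrogen: a pentavalent carbon.
atoms = ["C", "H", "H", "H", "H", "H"]
bonds = [(0, i, 1) for i in range(1, 6)]
print(flag_valence_errors(atoms, bonds))  # [(0, 'C', 5)]
```

A human scanning a structure the size of Taxol can easily miss one extra bond. A loop like this one checks every atom in the same way every time.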
I sent an email to the editor of C&E News when I noticed the structure issue but didn’t get a response. Nevertheless it is an advantage of online publications that images can be swapped out easily. This has been done for the online article here at this point and the change, while subtle, is there (shown below).
The structure is now on the ChemSpider database here.
Drawing accurate representations of chemical structures is difficult. Copying them from publications can be fraught with errors and it is common to see that structures in publications are incomplete in their definitions of stereochemistry and that groups are missing anyway. Such is the nature of the beast. I have blogged recently about an observation of a structure drawing error in C&E News and the editor was kind enough to comment. Here’s an image of a structure from a C&E News article about Micrococcin P1 from this week’s magazine. Check out the structure…can you see any issues?
Now that ChemSpider is part of the RSC we will be able to offer some of our experiences in identifying potential errors in structures before they are published. There are ways to do this so that both authors and editors alike get flagged to such issues. This is way down the road from migrating ChemSpider to RSC servers but would definitely bring value to helping to ensure quality of data in Chemistry.
Feel free to post your comments regarding any issues you see with the structure as drawn.
PhysChim62 (PC) is someone I meet with regularly on the Wikipedia Chemistry IRC chats. We’ve never met but I judge we have mutual respect, earned through many hours of working to improve the chemistry on Wikipedia. PC has been at it for a long time and has a broad reach in the WP community…I’m focused primarily on structure validation and delivering tools which can be of value to Wikipedians. If you have an interest in Chemistry on Wikipedia it’s one to add to your blogroll/reader as PC will likely touch on this quite regularly, as well as other things of interest. The blog is at http://phoscarb.blogspot.com/.
I’m heading over to the UK shortly for a week-long meeting with the RSC. In case there is any confusion I WILL be an employee of the RSC working on ChemSpider and we are building our ChemSpider team at present. I’m really looking forward to the meeting as I have already met many of the people and they are skilled, focused and yet lighthearted and funny. Yes, funny. Maybe it comes with the territory of working with a young, passionate team of people. One thing about the RSC that I enjoyed during my last visit was the ENERGY in the building. The place is buzzing. There is a lot of young passionate energy with mature skills in the building and it is focused on growing the reputation and impact of the society. Even the “older guys” of which I am now one (!) have this youthful spirit that they bring to RSC. It’s great.
BUT, enough is enough. Okay, I might still run 5km a few days a week, and I might still lift weights a few times a week but gravity is not my friend and I do not have the lithe, supple physique that I had as a 30 year old. Add to that twin boys tearing me apart and bilateral rotator cuff injuries from said boys and I have not been able to stay in shape to the level I had hoped this past year. So, imagine my surprise when I am told that for the inaugural ChemSpider presentation to RSC staff in June I will be expected to dress appropriately. Here’s me thinking that meant a shirt and tie (and best behavior) but no…here comes a package with a “party dress” for me. Sure…make fun of the ChemSpiderman moniker why don’t you! Look at that costume. I wouldn’t wear it when I was young and lithe. Not my thing that. Sorry guys, I have my limits..it’ll be shirt and tie and maybe best behavior but no Lycra Spandex Spidey suit for me for my presentation at RSC!
The logo to the left says it all really. The Royal Society of Chemistry has acquired ChemSpider. Is that a good thing? ABSOLUTELY it’s a good thing. One of the most prestigious, forward-looking, high-quality and innovative societies in the world, who have already demonstrated their commitment to the Chemistry community, have chosen to bring ChemSpider under their wing and give it a home. This is good for us for a number of reasons. Specifically we will no longer have to deal with our very significant resource limitations but more than that it lends credence and validation to the work that we have been doing over the past 2 years. It seems so long ago now but ChemSpider was first unveiled to the world at the ACS Spring meeting 2007. What began then only as a hobby project is now being recognized by the community as one of the primary resources for internet chemistry.
ChemSpider has an interesting story really. It was started to release our creativity on the world of internet chemistry to see if we could deliver value and something more than was already available. It was clear that PubChem was becoming a valuable resource for the world of drug discovery, that Wikipedia was gaining traction for encyclopedic articles and that eMolecules/Chmoogle was out to help people purchase chemicals. It didn’t seem that anyone was going after the challenge of becoming a centralized resource for integrating these resources together (and others of course). The development of a structure-centric platform for the community allowing depositions, curation and annotation and expansion to allow linking to articles, blogposts, wikis and the hosting of analytical data, prediction engines and other software utilities for the community seemed appropriate. And so we began. We were applauded for our efforts by some and dismissed and ridiculed by others. Nevertheless we plodded forwards forming relationships, expanding our network, increasing our visibility and expanding our reach in terms of integrated resources. With a clear focus on serving the community, a passion for quality and an intention to stay in relationship with our users, contributors and supporters we worked hard. Very hard.
Building ChemSpider has not been easy. It has not only been a labor of love but it has been done under duress at times, under severe time and resource constraints and with lots of late night hours. This time was given willingly, not only by our own intimate team but with significant contributions from some of our Advisory Group and by members of the community at large. We thank you all. We had support through sponsorship and this allowed us to cover the costs associated with improving our hardware and purchasing software and covering travel costs as necessary. Members of the commercial chemistry software community provided tools to us to use, at no cost. We were made welcome at conferences and round tables discussing the future of Open Chemistry. We grew our reputation by word of mouth only and by doing what we said we would do. Some of our early critics are now some of our loudest advocates. It’s all been very humbling, incredibly enlightening and genuinely invigorating (while also being very tiring!)
Over the past 2 years we have been approached by a number of organizations to merge/acquire/consume. In all cases things didn’t feel quite right. The experiences and instincts covered a diverse range: we might be acquired and switched off, we might be engulfed by bureaucracy and process that would prevent us from producing at the speed to which we and our users have become accustomed, and we might be offered career paths that could be destructive in terms of life balance (I’ve had parts of my life where I have not seen my own home for almost 3 months because of travel schedules and will not do that to my family again).
When we were approached by the RSC, and engaged in discussions with them about their interest in what we were doing, it was clear that we are like-minded. Our desire is to have a positive impact on the flow of data, knowledge and information in the domain of chemistry. We are honest in our relationships and focused on producing results. We are doers and not talkers. We want what we produce to enhance the ability of chemists to access chemistry-related resources and speed up their research. Bottom line we want to help advance the chemical sciences. Do a search on “advancing the chemical sciences” on Google and see what comes out on top. Or don’t..just look below.
The RSC is focused on advancing the chemical sciences and we want to help! In fact, we’ve been destined to do so since ChemSpider went online and when RSC approached us it felt as if this could be a marriage made in heaven. Over the past few months of discussions matching up our interests and ideas with those of the RSC, and then going through the entire due diligence process, it became clear that we are indeed well-matched. No, I’ll say ideally matched.
Things will never be the same again. Not just for us but for internet chemistry. We can now TRULY get to work and not worry about bandwidth constraints and how to buy our next disk drive. The community can stop worrying that their investments in time into expanding and enhancing ChemSpider will be lost. There is no need to worry about ChemSpider “going away”.
Watch this space. We will announce the new and improved ChemSpider later in the year but the present version will remain active for everyone for the time-being. We will be migrating the present version to RSC servers for improved performance over the next few weeks. Our long term goal is simple: To deliver the primary online platform where chemists will resource information and collaborate across the worldwide community of chemistry.
Tell us what you think. Please do. If you read this blog and have remained quiet previously please give us feedback about this announcement. We hope you will celebrate this path forward the way we are. It’s going to be just great!
Maybe it is the success of the Spectral Game that is driving more depositions of spectral data onto ChemSpider, or ChemSpider itself is garnering a greater following or we simply have some great supporters. Either way, there has been a significant increase in the number of spectra making their way onto ChemSpider with an increase in the number of IR spectra being deposited and an increase in the number of very high quality NMR spectra. I especially acknowledge the contributions being made by Heinz Kolshorn who is not only depositing spectra but also assignments. As an example see the spectra here and the associated assignments in the images section as shown below. Contributions from scientists such as Heinz continue to enhance ChemSpider and make it a rich resource for the community.
We recently released ChemSpider’s WikiBox service. Then we made a call for support so we could release multilingual support. Our friends on the Wikipedia Chemistry team like what we’re up to and PhysChim62 already gave us guidance for German and Spanish mapping. Now it is possible to generate ChemBoxes in both languages. Simply use the pulldown menu to choose the appropriate language. Simple.
In order to perform some routine maintenance ChemSpider will go offline tonight for approximately one hour around 10pm. Please don’t be surprised if ChemSpider is non-responsive at that time.
Steve Ritter from C&E News has given some wonderful feedback to my previous blogpost regarding where C&E News sources its structures. I admit to being overjoyed to have someone from the ACS organization respond in such a willing and open way regarding their processes as my previous attempts to connect with the organization regarding Open Data were interesting in a very different way (1,2,3). I will be following up with Steve to say thanks and see how we can help source structure images for him if necessary. I’ve copied his comments below and have inserted my own into his post.
Steve Ritter said:
Thanks to Antony for pointing out the mistake in the C&EN structure. I did inadvertently leave out the stereochemistry at the methyl on the side chain, and the geometry for one of the double bonds is incorrect. We publish several hundred structures per year in our 51 print issues and on our website, and inevitably we get some wrong–on the average five or fewer per year that I am aware of.
AJW> Steve and his colleagues do have a tough challenge as their efforts are seen by thousands of people every week and with the variation in quality out there it is not difficult to generate some mistakes. My experience would support his estimates.
We are grateful to our readers for pointing out the mistakes. In this case, a revised structure is being posted on our website and a correction will run in an upcoming print edition. Please check to see that the new structure is correct.
AJW> The one on Wikipedia has been edited by me tonight. Steve…feel free to grab the image from Wikimedia Commons here.
As for where we source our structures, our primary source is the researcher and peer-reviewed papers, because many compounds are novel. For known compounds, knowing that those can sometimes be wrong in papers, we always double check them against one or more primary sources, typically Merck Index and SciFinder.
AJW> I gave three ladies from the Merck Index at the ACS in Salt Lake City an overview of ChemSpider and they GAVE ME a copy of the latest Merck Index. I agree..it’s a RICH resource of correct and valid information.
Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.
AJW> It’s not a surprise that they have to pay as I have experience of Fortune 500 America and what “internal services” cost. But, it’s a shame that cost might have been a barrier here, if it was.
To tell a woeful story, one that demonstrates it is never easy to make sure a structure is “correct,” I received a structure of domoic acid from the researcher I wrote about, as there was not one in the paper. But the structure was wrong–it was missing a methylene in one of the short carboxylic acid side chains. The researcher was not aware of that until I pointed it out, and that structure had been used in several published papers already. I noticed the error by checking the structure in the Merck Index.
When it came time for our artist to draw the structure, I did not really like its orientation in the versions I had. I checked SciFinder, and the structure there is identical to the Merck version, but SciFinder does indicate the absolute stereochemistry. I also checked the Web, and found the Wikipedia entry and several other references with the structure. As Antony noted, domoic acid is well known in the literature, but one sees it drawn myriad ways. I liked the orientation of the Wikipedia entry the best, and used that as a model to draw out the structure by hand for our artist to redraw. I checked my version against Merck, but I was focusing on the double bond geometry and missed the stereocenter when I drew it. That’s the long-winded version.
AJW> Steve, I am smiling at your long-winded version. Been there, done that. It’s HARD work!
It’s embarrassing to make any kind of mistake, especially in C&EN. But it is a bit more so for me because every structure that appears in C&EN comes across my desk for scrutiny. It’s not the first time I missed something in a structure, and probably isn’t the last. We have a great staff of writers and editors that make such mistakes rare.
AJW> Join the club. It’s easy to make mistakes with complex structures. That’s why a public resource of validated structures is critical and I believe a combination of Wikipedia, Wikimedia Commons and ChemSpider can provide exactly that, with time.
As a rule, we at C&EN don’t use Wikipedia as a primary source for structures or chemical information, and I recommend that policy to anyone. We don’t even use articles or structures previously published in C&EN as a primary source without rechecking, in case we made a mistake the first time around. The only two sources for checking structures that I really trust are Merck Index and SciFinder, with Merck being a little better because sometimes the SciFinder structures are drawn awkwardly, but that is just my personal opinion.
AJW> I agree with your opinion regarding structure images in SciFinder. They are far from attractive BUT they do carry clear nomenclature on the image which is VERY necessary for structures such as ajmaline.
It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.
AJW> For encyclopedic articles I agree..and we are working on it with the team with our Wikipedia services. We discussed today on the Wikipedia Chemistry IRC Chat the need to change the display format for structures to ACS settings and will do so. For the other millions of structures that don’t make their way to Wikipedia then ChemSpider can provide that role.
As for the structure of domoic acid on the NOAA page that Antony noted, I believe the stereochemistry for each of the three ring carbons is backward.
AJW> High five!
Apologies for rambling on, but thanks again for pointing out our mistake. We at C&EN know we are considered authoritative and held to a high standard by the chemistry community that we serve, and accuracy is paramount to maintaining that trust. We take our responsibility seriously.
AJW> Steve..I am happy you took the time out of your day to post here. It shows a true commitment to your responsibility and intentions to provide high standards of support for us all. C&E news is a weekly read for me and I respect your work. My thanks to all of your colleagues for a great magazine.
An update regarding the Domoic Acid chemical structure – I am seeing a lot of conflicting information now about the E/Z orientation for the side chain adjacent to the ring in Domoic Acid and am working to bring everything together at present. I believe that the original structure I marked as correct is actually INCORRECT. Oh, the joy of structure curation. We’ve resolved one stereo center and now there is double bond orientation confusion. Curation is a long and tiring job…
UPDATE: Okay…everything is checked. The structure I originally suggested IS the correct structure and a new PNG image file has been provided to the Wikipedia Chemistry team today and will be uploaded shortly. The problem is that the images from Wikipedia have already proliferated as seen here with Zemanta, a plug-in for images I use on the WordPress blog. Notice no stereo on the side chain methyl…
In the blogpost regarding Wikipedia Services yesterday I discussed “Domoic Acid“. Domoic Acid is very well documented in the literature and I would expect the structure to be well known. On ChemSpider the structure has been curated and is believed to be that shown below.
On Wikipedia the structure is lacking the stereocenter on the side chain as shown below
The April 6, 2009 issue of C&E News had an article on Page 27 in the Science and Technology Concentrates about “Algal Neurotoxin Lingers in the Ocean“. Unless you are an ACS member and have an ACS ID you won’t be able to read the article. However, the structure from the article is shown below. Do you notice a similarity between the structures?
Unfortunately, both are wrong. They are both lacking the side chain stereocenter for the methyl group based on my research. Previously I had been using C&E News as a source of news about chemical compounds and association with records on ChemSpider. On a couple of occasions however I observed that the structures were wrong. Since C&E News is an ACS magazine I had assumed that the writers would have access to SciFinder to get the correct structures. Since the structure is wrong maybe it’s wrong in SciFinder (!).
In theory the presence of an article on Wikipedia means a related page will exist on CommonChemistry.org. Unfortunately CommonChemistry.org does NOT have all Wikipedia structures. The estimated overlap is somewhere between 50-70%. Fortunately someone had already checked the structure of Domoic Acid on SciFinder and confirmed to me that the curated structure on ChemSpider is “consistent” with that on SciFinder…let’s assume that this means it’s correct. I did actually confirm that structure at MANY other sites too.
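A missing stereocenter of this kind can also be spotted by comparing identifiers rather than pictures. The sketch below splits two standard InChI strings into their layers and reports whether they differ only in the stereo layers (`b`, `t`, `m`, `s`). The function names are my own and the parsing is deliberately naive, so treat it as an illustration for simple standard InChIs only. I have used alanine, with and without its stereocenter defined, rather than guess at the InChI strings for domoic acid.

```python
STEREO_PREFIXES = {"b", "t", "m", "s"}  # double bond, tetrahedral, and related stereo layers


def inchi_layers(inchi):
    """Split a simple standard InChI into a dict of layer prefix -> content."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    # The first two parts (version, formula) carry no prefix letter.
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def differs_only_in_stereo(inchi_a, inchi_b):
    """True if the two InChIs are different but match once stereo layers are dropped."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)

    def without_stereo(layers):
        return {k: v for k, v in layers.items() if k not in STEREO_PREFIXES}

    return a != b and without_stereo(a) == without_stereo(b)


# Alanine drawn without stereo, and L-alanine with the stereocenter defined.
alanine_flat = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
alanine_l = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
print(differs_only_in_stereo(alanine_flat, alanine_l))  # True
```

Two drawings that pass this test have the same atoms and connections and disagree only about stereochemistry. That is exactly the situation with the structures discussed here.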
So, the structure in C&E News is identical, both in layout and in chemistry, to that on Wikipedia but is NOT consistent with that in SciFinder. Surely C&E News is not sourcing their chemical structures from Wikipedia when they have access to the most highly curated compound database available?
Note to C&E News reporters…there is a LOT of work going on to validate and curate the ChemBoxes and DrugBoxes on Wikipedia but the work is not complete yet. I recommend using SciFinder to source your chemical structures for now.
Conspiracy theories are fun. Most of us have seen a movie or read a book regarding some form of conspiracy theory – whether it’s something that is in our distant history, some interpretation of what happened on 9/11 (and there are no shortages of those) or some view on industrial espionage. They are fun. What is surprising is how many of them turn out to be true. There is a new conspiracy theory in our own domain and it relates to the InChI, the International Chemical Identifier. How does that story go?
I use Google Alerts to keep my eye on what is being said on the web about ChemSpider. It’s also how we keep track of what people think should be uploaded to ChemSpider using the loadtochemspider tag. So it was that I was made aware of an article mentioning ChemSpider. Later that day two people pointed me to the same article. Daniel Pollock at Outsell had published an article on March 30th 2009 entitled “Chemical Bonding InChI by InChI”. He discussed the InChI Resolver and the efforts to raise enthusiasm for the InChI. He also discussed the efforts of both Nature Publishing Group and the Royal Society of Chemistry to proliferate the use of InChIs. ChemSpider is a user and producer of InChIs. We like them..and also acknowledge they are not perfect. The mainstream chemistry software vendors like them. The cheminformatics domain has embraced them. Societies see InChI as an enabling standard. The InChI subcommittee continues to expand with participants. InChIs are added to many online databases now. InChI has arrived, warts and all, and we should be working together now to support its enhancements and use it to integrate information. Any publisher or producer in the domain of chemistry publishing and chemistry related information should be embracing the opportunities InChI offers – if not now then for sure in the future. There won’t be much choice because information will become increasingly available and interconnected and groups ignoring the InChI will become less relevant. It’s taken a decade for InChI to gain traction..but now momentum is increasing quickly.
Daniel’s article went on to comment on the present level of acceptance for InChI by the American Chemical Society and CAS and stated “However, given that CAS has been criticised for its proprietary approach in the past, and took until April 2008 to release a web based version of its flagship SciFinder database, in Outsell’s opinion we may have to wait a while yet.” Overall I thought that Daniel’s article was well-written and balanced and concluded with “Meanwhile, whilst we can see the reaction of the big chemistry publishers and abstraction services, we can reflect on a sobering question: why is it taking government and voluntary contributions to build an industry standard? Surely that should have be the territory of the information providers? In chemistry it seems, as everywhere, the web changes everything.” Good question.
I’d like to recommend that you go and read the article. Why not? Well, the article is not there anymore. It’s been withdrawn! While the first article was, in my opinion quite balanced, the retraction puzzles me. It states “in the Implications section we published information about Chemical Abstract Service’s highly-regarded SciFinder product that was incorrect, and we did not cite a sufficiently balanced set of references in developing our argument.” In the original article there is one mention of SciFinder and it says “and took until April 2008 to release a web based version of its flagship SciFinder database”. One statement, one reference..back to CAS’s own press release.
The retraction also stated “Further, it is our practice to avoid speculating about an organization’s stance on a topic without reaching out to the organization for on-the-record research briefings. Overall, the tone of the piece could be taken to single out CAS as being late in responding to the trends, and in our view the research and analysis did not support it.” I’ll interpret this as “no one spoke to CAS”. Ok…that’s fair comment. Someone should have spoken to CAS about this article and asked for their opinion. Maybe some questions might be: 1) It appears that InChI is already changing the way that chemistry related information can be linked for the benefit of the community. What are your observations and thoughts? 2) InChI has been around for over a decade and I am interested to know whether ACS and CAS will embrace the perceived value of InChI and the potential benefits to the community and either include it in ACS articles or integrate it into the CAS registry? 3) You recently released the CommonChemistry.org website and it is an interesting shift towards Openness by CAS. Congratulations. It would be an ideal opportunity to allow integration via InChIs. What type of feedback have you received from the community? 4) It would appear that the ongoing growth in informational resources such as PubChem, ChEBI, ChemSpider, Google Scholar, Wikipedia and many other rich resources can impact the business model of CAS. InChI-integrated resources and efforts such as the InChI Resolver allow connection of such resources in a seamless manner and will lead to a web-centric view of chemistry resources. How does CAS expect to respond to this potential threat? 5) There are LOTS more questions that I believe the community would like to ask. Who in the scientific reporting community would get an audience with CAS to ask such questions?
Conspiracy theories are already moving around the community. The majority of people I have discussed this with believe that the retraction was likely forced by CAS and as Stuart Cantrill from Nature Chemistry points out in his blog “Outsell now say that the original article wasn’t balanced and that the ‘tone of the piece could be taken to single out CAS as being late in responding to the trends’. Surely readers could make that judgement for themselves?”.
I say decide for yourself. The article is in the Google Archives here. Welcome to the power of the web. Now then…can the removal of THAT article from the Google Archives be enforced? Hmm…..
We at ChemSpider have an implicit belief that InChIs will soon dominate the communication of chemistry across the internet and will be one of the primary keys on all chemistry databases exposed on the web. This is already gaining momentum and the unveiling of the InChI resolver at the ACS meeting will facilitate this process. The InChI is NOT perfect…it was acknowledged, emphasized and dragged across hot coals that the InChI is not perfect at the recent all day meeting at the ACS Meeting in Salt Lake City. But, it already has value and there are already plans afoot to enhance InChI – one of the meetings at the ACS included discussions about funding and resources. InChI will continue to be developed.
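To make the "primary key" idea a little more concrete, here is a minimal sketch in Python of how two resources could be linked when both expose an InChI for each record. The record layout, the field names `id` and `inchi`, and the identifiers are all hypothetical and are not taken from ChemSpider or any other database; the point is only that an exact string match on the InChI is enough to join the two lists.

```python
# Hypothetical records from two imaginary resources. Only the InChI strings
# are real (ethanol, benzene, water); the ids and field names are made up.
SOURCE_A = [
    {"id": "A-1", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"id": "A-2", "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"},
]
SOURCE_B = [
    {"id": "B-7", "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"},
    {"id": "B-8", "inchi": "InChI=1S/H2O/h1H2"},
]


def link_by_inchi(source_a, source_b):
    """Pair up record ids from two sources that share the same InChI string."""
    # Index the second source by InChI so each lookup is a dictionary hit.
    index_b = {record["inchi"]: record for record in source_b}
    linked = []
    for record in source_a:
        match = index_b.get(record["inchi"])
        if match is not None:
            linked.append((record["id"], match["id"]))
    return linked


links = link_by_inchi(SOURCE_A, SOURCE_B)
```

Here only benzene appears in both lists, so `links` holds the single pair of ids for that compound. A join like this is only as good as the InChIs fed into it, which is one reason the quality of the underlying structures matters so much.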
An interesting article penned by Daniel Pollock was released today: Chemical Bonding InChi by InChI. Definitely worth a read from an industry observer about where he believes InChI is going…
I just returned from the ACS Meeting in Salt Lake City. As is usual with these events it was hectic, interesting, full of great conversations, demanding in terms of presentations and, overall, a great meeting. I got to share time with people I appreciate, I got to meet new people who I already think of as friends and I had some unique challenges.
I was supposed to give two talks on Wednesday (yesterday) at 2pm and 3:55pm in Rajarshi’s session. At 9am on Wednesday morning Alex Tropsha and I drove his kids to the Alta ski hill outside Salt Lake City to ski. By the time we turned around to drive back to the meeting it had snowed enough, and was still snowing hard enough that the road was closed. That was at 10am and we were told the road would not open until 2pm. So, we found a ski lodge with wireless and went to work…cell calls, twittering, emails and finally Rajarshi and I got to talk. Rajarshi called to suggest that maybe I could give the presentation remotely…he had downloaded my slides and was willing to click through them while I presented via Skype. Fortunately I had a webcam with me in my bag so 10 minutes before I was due to present I plugged it in while chatting with Rajarshi via Skype and BOOM…blue screen of death!!!
So, a full reboot and at 5 minutes to 2pm I was on Skype with Rajarshi, people were sitting in the room and we did a quick test and away we went with a presentation. With kids running around in the background, and with skiers walking through the lodge, I presented to the people in room 251F at the conference center while I was at 8400 feet above sea level on a ski hill. Immediately afterwards we drove down the hill since the road was now open. I walked into the CINF room 10 minutes before my next presentation and gave that one live. What an adventure.
There are very few people who would take the risk and be as creative as Rajarshi was to enable me to give two presentations that way. This is just who Rajarshi is…fast-thinking, pushing technology and focused on providing a solution. Quite the guy!
I was fortunate enough to share dinner with Rajarshi, Christoph Steinbeck (ChEBI) and Martin Walker (who I work with on Wikipedia Chemistry). What a great, fun evening discussing Open Science, Chemistry Software and opening up the internet to more chemists.
About a decade ago I sat with my soon to be friend Gary Martin. Gary is larger than life. Literally. Tall, heavy, highly published, respected, damn funny, disruptive in a good way and one of my best friends. Gary is an incredible NMR spectroscopist and has contributed more to small volume NMR than any other NMR jock (in my opinion). We were in Florida at the ENC conference and I took Gary for a couple of cocktails and told him about my idea for an automated structure verification software platform that would verify suggested structures using NMR data…initially 1D NMR spectra and ultimately a combination of 2D data also. Then, the ultimate goal would be to automatically elucidate chemical structures from a combination of spectral data. It was a “napkin” plan where he and I signed our names to what it would take to do it. Over the next 10 years we would work together to execute on this plan as he moved from Pfizer to Schering Plough. The technical team at ACD/Labs would improve their software for analytical data processing, for multinuclear NMR prediction and for computer-assisted structure elucidation. Part of that development was using a combination of 1D and 2D HSQC spectra to verify structures and resulted in the publication below…
Automated structure verification based on a combination of 1D 1H NMR and 2D 1H–13C HSQC spectra, Sergey S. Golotvin, Eugene Vodopianov, Rostislav Pol, Brent A. Lefebvre, Antony J. Williams, Randy D. Rutkowske and Timothy D. Spitzer, Magn. Reson. Chem. 2007; 45: 803–813, DOI: 10.1002/mrc.2034
The paper described a method for structure validation based on the simultaneous analysis of a 1D 1H NMR and 2D 1H–13C single-bond correlation spectrum such as HSQC or HMQC. When compared with the validation of a structure by a 1D 1H NMR spectrum alone, the advantage of including a 2D HSQC spectrum in structure validation is that it adds not only the information of 13C shifts, but also which proton shifts they are directly coupled to, and an indication of which methylene protons are diastereotopic. Using multiple real-life data sets of chemical structures and the corresponding 1D and 2D data, it was possible to unambiguously identify at least 90% of the correct structures.
ACD/Labs has provided us with the 30 sets of 1D and 2D spectra together with the pairs of correct and incorrect structures. We are offering these up in a new 2D NMR Spectral Game for users to start to learn how to use the combined information available from both 1D H1 and 2D HSQC data for identifying the correct structure. The development of this game has required the development of a new tool for visualizing 2D NMR spectra on the web (kudos to Andrew Lang!) and requires the player to use the H1 NMR spectrum in the standard JSpecView applet together with the 2D data display to interrogate the data and decide on the most appropriate match. The 2D NMR Spectral Game can be played here. Good luck.
Today we had an entire session dedicated to InChIs. It was a bit of a love fest for the work that has been done to develop InChI but also an acknowledgment that there are limitations and where they are. My presentation is available here on Slideshare. I did embed it on this page but it pulls down the site when it’s fed to the home page via RSS feed so I’ve had to remove it.
Computer software for the generation of systematic names from structures and the conversion of chemical names to structures has been the subject of numerous discussions on this blog as we have discussed ChemMantis and the need for high quality conversion of chemical names to structures. There are a number of software programs available for the generation of names from structures and vice versa. The ones I have most knowledge of are those of ACD/Labs, Cambridgesoft and OpenEye. I have worked with all of them and they all have their strengths and weaknesses but all three companies are working to improve their products. What is interesting is Cambridgesoft’s position in regards to Name to Structure conversion…they seem to have it patented.
The chemical names that you see on ChemSpider are generated using OpenEye software as shown below. We owe them a debt of gratitude for providing us the software.
The nomenclature “guru” at OpenEye is Roger Sayle and Steven Bachrach’s post here pointed me to Roger’s recent publication “Foreign Language Translation of Chemical Nomenclature by Computer”. While at ACD/Labs we included both French and German nomenclature support for generating chemical names from structures. So, I have a real appreciation for the issues of multilingual nomenclature. In fact, when I process files myself for deposition in ChemSpider I do use their desktop tool for generating French and German names so don’t be surprised to see accents in some of the identifiers on ChemSpider.
Roger’s paper is an amusing and educational read and I recommend it to anyone interested in the complexities of nomenclature. He discusses multilingual support including Chinese and Japanese and even wanders into Klingon! OpenEye did the right thing in making this Authors Choice so the PDF is available to everyone here: http://pubs.acs.org/doi/abs/10.1021/ci800243w. Clearly OpenEye have a great path forward for nomenclature and it’s going to be interesting to watch their product develop.
An additional comment…we have to deal with a lot of complex issues regarding fonts on web-based articles. It’s not easy but we do proofread everything on ChemMantis and the ChemSpider Journal of Chemistry and try to catch all the issues. Take a look at the HTML version of the article here: http://pubs.acs.org/doi/full/10.1021/ci800243w. Oh those exploding fonts!
In the past few days we have added a number of RSS feeds to ChemSpider so that the community can track, if they are interested, the depositions of new compounds, of new descriptions associated with ChemSpider, of new data sources as they are added and now, of new spectral data as they are added to ChemSpider. We’re going to stop here for now and see whether or not the RSS feeds provide value to the community, who might use them and gather feedback on their implementation. I’m off to Salt Lake City shortly, have 4 presentations to write and the blog will be a lot quieter for the next couple of weeks.
The RSS feed is here: http://www.chemspider.com/rss.ashx?c=spectra
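If you want to track a feed like this programmatically rather than in a feed reader, here is a minimal sketch using only the Python standard library. It assumes the feed is plain RSS 2.0 with `title`, `link` and `pubDate` elements inside each `item`; the sample document below is made up for illustration and is not a copy of what ChemSpider actually returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an RSS 2.0 document. The titles, links and dates
# are invented; the real feed's fields and contents may differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example spectra feed</title>
    <item>
      <title>Example spectrum A</title>
      <link>http://example.org/spectra/1</link>
      <pubDate>Mon, 23 Mar 2009 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Example spectrum B</title>
      <link>http://example.org/spectra/2</link>
      <pubDate>Tue, 24 Mar 2009 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def list_feed_items(rss_text):
    """Return a (title, link, pubDate) tuple for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return items


for title, link, published in list_feed_items(SAMPLE_FEED):
    print(published, title, link)
```

To point this at a live feed you could download the XML first, for example with `urllib.request.urlopen(url).read()`, and pass the result to `list_feed_items`. Whether the real feed uses exactly these element names is something to check against the feed itself.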
If you’re there…see you in Salt Lake City.
An article entitled “Computer-assisted methods for molecular structure elucidation: realizing a spectroscopist’s dream” has been published online at the Journal of Cheminformatics (Journal of Cheminformatics 2009, 1:3). This was a review article of what’s possible with computer assisted structure elucidation and in particular focused on the ACD/Structure Elucidator software package I was involved with during my tenure at ACD/Labs. An outline of the article is provided below.
Computer-assisted methods for molecular structure elucidation: realizing a spectroscopist’s dream
Journal of Cheminformatics 2009, 1:3, doi:10.1186/1758-2946-1-3
Published: 17 March 2009
This article coincides with the 40 year anniversary of the first published works devoted to the creation of algorithms for computer-aided structure elucidation (CASE). The general principles on which CASE methods are based will be reviewed and the present state of the art in this field will be described using, as an example, the expert system Structure Elucidator.
The developers of CASE systems have been forced to overcome many obstacles hindering the development of a software application capable of drastically reducing the time and effort required to determine the structures of newly isolated organic compounds. Large complex molecules of up to 100 or more skeletal atoms with topological peculiarity can be quickly identified using the expert system Structure Elucidator based on spectral data. Logical analysis of 2D NMR data frequently allows for the detection of the presence of COSY and HMBC correlations of “nonstandard” length. Fuzzy structure generation provides a possibility to obtain the correct solution even in those cases when an unknown number of nonstandard correlations of unknown length are present in the spectra. The relative stereochemistry of big rigid molecules containing many stereocenters can be determined using the StrucEluc system and NOESY/ROESY 2D NMR data for this purpose.
The StrucEluc system continues to be developed in order to expand the general applicability, provide improved workflows, usability of the system and increased reliability of the results. It is expected that expert systems similar to that described in this paper will receive increasing acceptance in the next decade and will ultimately be integrated directly to analytical instruments for the purpose of organic analysis. Work in this direction is in progress. In spite of the fact that many difficulties have already been overcome to deliver on the spectroscopist’s dream of “fully automated structure elucidation” there is still work to do. Nevertheless, as the efficiency of expert systems is enhanced the solution of increasingly complex structural problems will be achievable.
We have over 150 data sources fed into ChemSpider now and we continue to add data from various sources. As we add data sources you might be interested in seeing who has joined the ChemSpider web so you can subscribe to the ChemSpider RSS feed of data sources here. If you see any specific data sources where you would like to see the information expanded please let me know.
In the past few days we’ve been adding thousands of new chemical entities to ChemSpider including contributions from publishers and chemical vendors. We and other users have also been adding a few entries into the description sections of the individual records and generally expanding the content of the database.
There isn’t an easy way for the community to find out about our efforts in these areas so we have set up RSS feeds of the new chemical compounds, the descriptions being added to the individual records as well as a list of the new articles added to the ChemSpider Journal of Chemistry.
The ChemSpider compounds RSS feed is here.
The ChemSpider descriptions RSS feed is here.
The ChemSpider Journal of Chemistry RSS feed is here.
I urge caution to those of you who might want to simply accept the InChIs and convert back to structures. GOOD LUCK. Be very careful with structure depictions and the introduction of confusing stereochemistry.
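One simple precaution before converting anything back is to check which InChIs carry stereochemistry layers at all, so those records can be reviewed by eye. The sketch below is a simplified reading of the InChI string in Python, not a substitute for the official InChI software: it assumes the second slash-separated segment is the formula, keeps only the first occurrence of each layer, and treats the `/b`, `/t`, `/m` and `/s` layers as the stereo-related ones. The function names are my own.

```python
# Layer prefixes that carry stereochemistry in an InChI string:
# /b double-bond stereo, /t tetrahedral stereo, /m and /s stereo type flags.
STEREO_PREFIXES = {"b", "t", "m", "s"}


def split_inchi_layers(inchi):
    """Split an InChI into a dict of layers keyed by their one-letter prefix.

    Simplification: the second segment is assumed to be the formula, and
    only the first occurrence of each prefix is kept.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi.split("/")
    layers = {"version": parts[0][len("InChI="):]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            layers.setdefault(part[0], part[1:])
    return layers


def has_stereo_layers(inchi):
    """Return True if the InChI contains any stereo-related layer."""
    layers = split_inchi_layers(inchi)
    return any(prefix in layers for prefix in STEREO_PREFIXES)


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

for name, inchi in [("L-alanine", L_ALANINE), ("ethanol", ETHANOL)]:
    flag = "check stereo" if has_stereo_layers(inchi) else "no stereo layers"
    print(name, flag)
```

This only tells you where to look. Whether the structure that comes back from a given conversion tool is depicted sensibly still needs a chemist’s eye.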
If you want to proliferate your stories about chemistry simply tag them with “loadtochemspider” and we’ll take care of the rest.