Archive for the Uncategorized Category

It’s nice to be acknowledged! An email in my inbox yesterday acknowledged the Mobilizing Chemistry presentation from the SLA.

“Mobilizing Chemistry – Chemistry in Our Hands” is being tweeted more than any other document on SlideShare right now. So we’ve put it on the homepage of (in the “Hot on Twitter” section).

Well done, you!

- SlideShare Team

I have been in New Orleans for two days at the SLA conference, talking to librarians about ChemSpider and its direction, grand vision and progress. What a reception. There were two instances where I blushed…and that doesn’t happen often…in fact I’m done for this year now! We were showered with praise for our efforts…and of course given a long list of things to do! That’s always good. I gave two presentations…one as a general overview and the other on “Mobile Chemistry”, my views of what is going on in the domain and an overview of a series of mobile applications etc. They are embedded below and on SlideShare.


Originally uploaded by Bio-IT World

A couple of weeks ago Valery and I were in Boston at the Bio-IT meeting and received the Bio-IT Best Practices Award for Community Contribution. That’s us receiving the award (Valery on the left and me in the middle, with Kevin Davies, Bio-IT World Editor-in-Chief, on the right) looking distinctly uncomfortable in shirts and ties! We don’t get to stand on an awards stage very often!

I haven’t yet seen the movie about the bottled water industry, so I can’t comment on the accuracy of what is represented. But, as a chemist, a father and a water drinker, I am definitely going to go see this movie. I encourage you to watch the trailer and decide for yourself whether it’s worth seeing too. When I have seen the movie I will make my comments about it…

In the past 48 hours I have read book reviews on Amazon, movie reviews on Netflix and articles on Wikipedia. I haven’t written any book reviews for Amazon, ever. I have not written any movie reviews for Netflix, ever. But I have edited and curated articles on Wikipedia. Let’s bottom line it though…I am a taker from the resources more than a giver. I’m a busy guy and I believe that other people can review books and movies as well as I can (though of course we might differ on opinions). Where I feel an obligation to comment is in those places where I am really passionate…in the blogosphere when there is something being said that doesn’t sit squarely with me. I tend to challenge things I disagree with rather than applaud things I do agree with…except for my friends, where I feel obliged to give them recognition for their efforts. Friends do that. I read a lot of blogs…a lot of web pages…a lot of resources. But I very rarely go out of my way to comment on the contribution the writer might have made to my day. I judge most of us operate in this mode. It is what it is…

As we work to produce a platform for sharing synthetic procedures by developing ChemSpider SyntheticPages, we run into the same challenge that we have with ChemSpider. It is related to the same human condition of us being users and takers rather than contributors. There is nothing inherently objectionable about this…we all do it. We contribute to something we care about, believe in, feel compelled to participate in. But it does limit the rate of growth of, the participation in and the success of a platform. For a crowdsourcing platform, success can be measured by the number of visitors, the number of contributors, the quality of the content, the changes the platform can effect and a myriad of other factors. In terms of traffic, the number of visitors to ChemSpider continues to increase. The plot below shows the growth from mid-July 2009 to the last week of April 2010. Overall we have seen a 3X growth. While the absolute numbers can be questioned, and differ from one measurement system to another, the trend is self-consistent. The dip in December is called “Holiday Season”.


During this period have we seen a threefold increase in the number of curators? No. We have seen an increase of about 2X in the number of people who are adding data, links, publication links and spectra to ChemSpider though. But let’s be clear about these numbers…this might max out at about 45 contributors…for a peak of 45,000 visitors. That’s a very small percentage! It categorically shows that we take more than we give.
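To put a number on that ratio, here is a quick back-of-the-envelope calculation in Python. It uses only the rough figures quoted above (about 45 contributors against a peak of about 45,000 visitors); the function name `contributor_share` is just an illustrative label and not anything taken from ChemSpider itself.

```python
def contributor_share(contributors: int, visitors: int) -> float:
    """Return contributors as a percentage of visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return 100.0 * contributors / visitors


# Rough figures from the post: ~45 contributors, ~45,000 visitors at peak.
share = contributor_share(45, 45_000)
print(f"{share:.1f}% of visitors contribute")
```

On those approximate numbers that works out to roughly one contributor for every thousand visitors.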

For ChemSpider SyntheticPages we are hoping for more contributions. More people to deposit their syntheses onto the system to share with the chemistry community. What can we offer to encourage such engagement?

1) Every record will have a DOI generated that you can list on your resume, should you choose. Basic development is done already. Testing is about to start.

2) You, the person who did the synthesis, get the recognition. You are the author. Each page can also be attributed to a research group so that the Group Leader would also be able to get aggregate recognition for contributions. It is why you see “From the Research Lab of ****” on pages, for example.

3) We will also host your analytical data and structures and perform mark-up of the article on your behalf until we have training materials in place for you to do your own markup. Your work will be “well-represented” in a free community resource for chemists that is destined to become one of the major contributors to the domain.

4) Your work will be repeated, peer-reviewed, critiqued and hopefully expanded upon…all good for your science and your reputation ultimately.

5) We will periodically offer recognition, rewards and acknowledgment for masterful synthetic procedures in a public forum. We intend to put in place a full recognition system, above and beyond the one in place at present.

So, what is standing in the way of adding your syntheses onto ChemSpider SyntheticPages? Other than some work, what is in the way? It’s a real question. Is it that: 1) your boss won’t let you; 2) you don’t see the value or point in sharing your syntheses; 3) you are concerned about copyright transfer and think you won’t be able to use the synthesis in a future publication; 4) you don’t know how; 5) one of many other reasons? Let us know please…we need your feedback to position and develop CS|SP for you.

As we expand the presence of analytical data on ChemSpider through the addition of various forms of spectral data, it made sense to start work on expanding the collection of CIFs available on ChemSpider also. At present we have NMR spectra related to ChemSpider SyntheticPages waiting to go online and a large number of Raman spectra waiting to be processed and deposited. For now, however, our efforts are focused on the deposition of CIFs associated with the eCrystals platform at Southampton. This is manual work, unfortunately, as we need to confirm that the CIF itself matches the molfile that is online. When there are multiple components in a unit cell we need to ensure that we deposit against the correct structure etc. We should have a few hundred CIFs deposited in the next few weeks.

Since returning with early feedback from the American Chemical Society meeting in San Francisco a few weeks ago, work has progressed on improving workflows and usability, specifically for depositors of new submissions to synthetic pages. Shortly an update to CS|SP will be made providing improved access to analytical data within a synthetic page, facile deposition of new pages (but we welcome your input to improve further!), a number of bug fixes and improved integration into the ChemSpider database. I am interested in talking to readers who might want to contribute to ChemSpider SyntheticPages but don’t know where to start. Please ping me at tonyATchemspiderDOTcom.

JC Bradley gave his own overview of CS|SP over at his blog recently….

The integration to NMRShiftDB has been switched off for the time being while some new bugs regarding the integration are resolved. We’ve been in discussions with Egon Willighagen regarding the nature of the integration challenges and it comes down to how the SMILES that are being passed to NMRShiftDB are being interpreted. Check out the comments section for more details. There is no timeline associated with fixing this integration at present but we do want it resolved. We will be focusing our efforts on doing direct lookups into the database for the immediate future.

My article on Mobile Chemistry is now available online…

Mobile chemistry – chemistry in your hands and in your face

The technology we’ve got used to accessing through our desktops is moving at high speed to our mobile phones, says Antony Williams

It is amusing to watch movies from the 1980s and see the stars of the period holding a so-called ‘mobile phone’ to their ear. This mobile device used to be the size of a brick, with a pull-out antenna to boot. It served one function: to allow two people to talk to each other across a connection challenged by static and dropouts. How things have changed.


Read the rest of the article here

I will be presenting at the OpenSciNY 2010 conference on May 14th. OpenSciNY is a free, one-day conference on the impact of publicly accessible scientific tools & resources, open access publishing in the sciences, and open data/notebook efforts. I am looking forward to spending time with the attendees interested in these areas and being on the agenda with my fellow presenters, most of whom I know personally and have presented with on numerous occasions. In these gatherings, and with such a common mindset, the future of Open Science and its impact and contributions to society are clear. While there is much work to be done, the momentum continues to gather. The future of Open Science is exciting, stimulating and fun to envisage. Come along to OpenSciNY and engage with us!

Every year, Chemistry World and Education in Chemistry offer an internship over the summer for a would-be science writer to gain some experience working with two of the best chemistry magazines around.

The position is for 8 weeks (start/end dates negotiable) and comes with a bursary of £1750 sponsored by the Marriott bequest.

Activities undertaken would include researching and writing blog posts and news articles and recording podcasts for Chemistry World, writing a feature article for Education in Chemistry and pieces aimed at sixth-formers. They will also help lay out and proofread the print issue of Chemistry World.

For more details see :

Applicants should be members of the Royal Society of Chemistry. You can join up as affiliates at

A couple of weeks ago I gave a talk at the Lawrence Berkeley National Laboratory at the end of the ACS meeting. It was great to meet the attendees and share some good conversations about Open Data, Open Science and our efforts with ChemSpider. The talk was turned into a screencast and is shown below.

We’ve been depositing a lot of new data into ChemSpider over the past few weeks. We’ve been adding millions of new compounds from chemical vendors, from RSC databases and articles, from updated government databases, contributions from academia and from some of the online Open resources.

Recently I sat in on a presentation by Rich Apodaca, who talked about ChemPedia. Rich shares a lot of the views that many of us do about the value of having open resources of chemical compounds online and has contributed ChemPedia to the domain. On Slide 20 of his presentation Rich gave an overview of a Missing Service that needed to provide a number of capabilities. These were: provide an on-demand unique ID, expose a URL to link to the structure, support synonyms and integrate peer review. ChemSpider does all this with maybe one caveat…we expect the ID to include the URL….so or is the link to the structure that we assert is the structure of Xanax. If you want to add additional synonyms you can do so. If you want to curate, add comments etc. you can (peer review). If you want to add new compounds you can, and you are issued a new ChemSpider ID. I would agree that our IDs are not as distinct as those that Rich and ChemPedia are generating…but they are of a similar format to PubChem IDs, i.e. “just numbers”. Check out ChemPedia and contribute! We are taking advantage of the fact that Rich makes the data Open for download: we downloaded the latest iteration (664 compounds) and deposited them to ChemSpider here.


Tuesday morning at the ACS meeting here in San Francisco…two talks done, one two-hour training session completed, one poster presented and two talks left to give before heading off to the Lawrence Berkeley National Laboratory to give my final talk before the dreaded red-eye home. I am so looking forward to sitting on a cramped plane overnight…

My presentations delivered so far are already on SlideShare and are linked below for display.

For the past few months we have been busily developing new functionality and capabilities for the ChemSpider platform with the intention of making navigation easier, enhancing integration to external resources, adding new rich data sources and providing access to brand new capabilities. This new functionality has been described in a series of blog posts today and is outlined below.

Improving the ChemSpider interface using tabbed infoboxes

Introducing NMR prediction capabilities to ChemSpider

Linking Google Patents searching to ChemSpider

Integrating RSC Databases into ChemSpider

Integrating RSC Publishing Beta into ChemSpider – includes integrations to Google Scholar, Google Books and Microsoft Academic Search


The LBNL Library is hosting a seminar for researchers interested in online collaboration, data storage and curation, data exchange, crowdsourcing, and open access.

This seminar will explore ChemSpider ( – a free access service providing a structure centric community for chemists and the richest single source of structure-based chemistry information.


March 24, 2010 – Wednesday
3:00 p.m. – 4:30 p.m.
Building 50 Auditorium, Lawrence Berkeley National Laboratory

Bring your laptop for a hands-on demo session. For non-Berkeley Lab personnel: Please contact Jeffery Loo ( by Monday, March 22, 12:00 p.m. for a visitor pass and shuttle bus directions. A visitor pass is required for entry into the Berkeley Lab by guests.


The increasing availability of free and open access resources for scientists on the internet presents us with a revolution in data availability. The Royal Society of Chemistry hosts ChemSpider, a free access website for chemists built with the intention of building community for chemists (

ChemSpider is an aggregator of chemistry-related information, at present holding over 20 million unique chemical entities linked out to over 300 separate data sources. ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. It is also a public deposition platform where chemists can deposit their own data, including novel structures, analytical data and synthesis procedures, and host data associated with the growing activities around Open Notebook Science.

This presentation will examine chemistry on the internet, the dubious quality of what is available and how the ChemSpider crowdsourced curation platform is fast becoming one of the centralized hubs for resourcing information about chemical entities.

We will also review our efforts to provide free resources for synthesis procedures, spectral data and structure-based searching of the chemistry literature and how chemists can contribute directly to each of these projects.

Following the presentation and a question and answer session, a hands-on session showing how to search for, curate and deposit data on ChemSpider will be given for interested parties.



Antony Williams, PhD, is a leader in the domain of free access chemistry. He is the Vice President of Strategic Development at the Royal Society of Chemistry and is the host of ChemSpider, a free online structure centric community for chemists.

ChemSpider began as a hobby project in a basement and went on to become one of the most popular Chemistry websites with the highest quality of data available online. Antony spent over a decade in the commercial scientific software business as Chief Science Officer for ACD/Labs, one of the domain leaders in scientific software. He is an accomplished NMR spectroscopist with over 100 peer-reviewed publications. During his career he was the NMR Technology Leader for the Eastman-Kodak company and has worked in both academia and national government research institutions.

We are presently receiving sign-ups for our training session on ChemSpider. The session will be on Monday afternoon between 4 and 6 pm (details below). It is free to attend and we’d love to see you there if you are in San Francisco at that time. Sign up here…

Royal Society of Chemistry
How to get started with ChemSpider – Searching, Structure Deposition and Database Curation
Instructor(s): Antony Williams, VP Strategic Development ChemSpider
Where: Moscone Center
Room: 110
When: Monday, March 22, 4:00 PM – 6:00 PM
>> Click here to register for this workshop
This session will give the opportunity to learn more about how to search ChemSpider, how to deposit your structures and how you can participate in curation of the data.

I was sitting down today to review what presentations are coming up in the next few weeks and how much writing and travel was ahead of me. Ugh. Painful. During the next few weeks of conference season there will be a lot of talks and, as usual, a lot of late nights before the presentations to write new talks or modify existing talks. I will be at the ACS meeting in San Francisco this spring and will be giving four presentations, a poster and leading a training session on ChemSpider. The presentations are outlined below. Looking forward to seeing you there and it would be great to hear from any of you who would like to get together and connect about community chemistry over a coffee.

Presentation: Utilizing ChemSpider as a platform for education and exposure of student data to the community.

Educators and students now have access to rich internet resources of information. RSC’s ChemSpider is a community resource of structure-based chemistry delivering data including chemical compound collections, reaction synthesis procedures, physicochemical property and various forms of spectral data. ChemSpider offers the opportunity for the community to participate in populating, annotating and curating the data on ChemSpider. We believe that ChemSpider offers an opportunity for educators and students to participate in the ongoing development of a rich resource for the chemistry community. This presentation will suggest some potential uses of the ChemSpider website in terms of integrating into lesson plans. We will also outline how students can expose their structure and reaction-based research work via the ChemSpider platform for the benefit of the community and their online scientific reputation.

Presentation: ChemSpider – How An Online Resource of Chemical Compounds, Reaction Syntheses, and Property Data Can Support Green Chemistry

ChemSpider is an online database containing in excess of 20 million chemical compounds and associated experimental and predicted physicochemical data, reaction synthesis details and analytical data. A significant amount of the data contained within the database has been harvested and collated from a number of inventory systems and integrated to provide a centralized resource for the community. The ChemSpider database has the added benefit of being available for community deposition, annotation and curation. As a result it offers the potential for researchers to share their latest research with the public and participate in the creation of a rich resource of chemistry related information for the Green Chemistry community. This presentation will provide an overview of present capabilities and discuss the future vision for the platform.

Presentation: ChemSpider, how a free community resource of data can support teaching NMR spectroscopy

ChemSpider is an online database of chemical compounds, reaction syntheses and analytical data. Provided by the Royal Society of Chemistry, our intention is to provide a free internet resource of chemistry related data for the community. ChemSpider is unique in its role of allowing user depositions of chemical structures, synthesis procedures and analytical data and, in so doing, provides an environment for crowdsourced gathering of information. To date over 2000 1D and 2D NMR spectra have been deposited online by the community and are available for reuse. The data have been used as the basis of a spectral game whereby students can learn NMR by interacting with the data. This presentation will provide an overview of the tools and capabilities presently available on ChemSpider to support teaching NMR in the undergraduate curriculum and will outline how the community can participate in enriching this resource for the benefit of all.

Presentation: Enhancing discoverability across Royal Society of Chemistry content by integrating to ChemSpider, an online database of chemical structures

The ability to query across a chemistry publisher’s content using chemical structure searching can dramatically enhance discoverability. RSC has been applying a number of procedures to integrate RSC’s ChemSpider community resource with our published content and databases. These include: 1) entity extraction procedures, 2) chemical name conversion procedures using software algorithms and curated dictionaries, 3) semantic markup and 4) crowdsourced curation processes. This presentation will provide an overview of the processes we have utilized in order to provide structure-based integration to RSC content. We will discuss our ongoing efforts to extend the approaches to the mining of data from the rich supplementary information sections of many RSC publications. Our intention is to provide access to synthesis procedures and analytical data and further enrich the ChemSpider database for the benefit of the chemistry community.

Poster: Utilizing ChemSpider as a platform for education and exposure of student data to the community

Recently I announced the release of ChemSpider SyntheticPages. We are honored to have an editorial board of chemists to assist in directing the project and they are introduced below:

  • Kevin Booker-Milburn

    Kevin Booker-Milburn is a Professor of Synthetic Chemistry in the School of Chemistry at the University of Bristol, UK. He has 20 years research experience in broad aspects of synthetic chemistry and in recent years has focused on the development of new synthetic methods for use in the total synthesis of natural products such as terpenes and alkaloids; specifically developing and applying novel photochemical and transition metal techniques. He is Director of the Bristol Chemical Synthesis Doctoral Training Centre, an EPSRC and Industry funded initiative which has a bold vision to train a new generation of researchers for the chemical industry and academe.

  • Jean-Claude Bradley

    Jean-Claude Bradley is an Associate Professor of Chemistry at Drexel University. He leads the UsefulChem project, an initiative started in the summer of 2005 to make the scientific process as transparent as possible by publishing all research work in real time to a collection of public blogs, wikis and other web pages. Jean-Claude coined the term Open Notebook Science (ONS) to distinguish this approach from other more restricted forms of Open Science. Jean-Claude has a Ph.D. in organic chemistry and has published articles and obtained patents in the areas of synthetic and mechanistic chemistry, gene therapy, nanotechnology and scientific knowledge management.

  • Stephen Caddick

    Stephen Caddick is a Professor of Organic Chemistry and Chemical Biology and Head of Department of Chemistry at UCL. He was previously at the University of Sussex (1993 – 2003). His research interests include Organic Synthesis and Synthetic Methodology, Chemical Biology and Structural Biology and Catalysis.

  • Peter Scott

    Peter Scott is a Professor of Chemistry at the University of Warwick, UK, and was formerly at the University of Sussex. His research is focussed on metallo-organic chemistry and mechanism, and specifically in chiral systems for enantioselective catalysis, polymer synthesis, materials science and healthcare. He has interests in how universities and industry can work together, and is Director of Warwick Chemistry’s EPSRC funded PhD with Industrial Collaboration, and also of Warwick Knowledge Transfer Secondments.

  • Martin A. Walker

    Martin A. Walker is an assistant professor of organic chemistry at the State University of New York at Potsdam. He previously worked in the fine chemicals industry for 12 years. His interests center on organic synthesis methodology, particularly green chemistry, as well as chemical information. He is active on Wikipedia, where he contributes to chemistry content and coordinates the Wikipedia 1.0 project, preparing offline releases of Wikipedia.

Stephen Caddick, Peter Scott, Kevin Booker-Milburn and Max Hammond were the original founders of SyntheticPages, an online database for chemical transformations. The data from SyntheticPages has been used as the seed data for ChemSpider SyntheticPages.

    Please note the headline, and don’t waste your time trying this at home

A couple of days ago I came across a video on YouTube about “Water Marbles”. I’ve inserted it below…I recommend watching it…it’s excellent!

It’s excellent because by the time I had finished watching this I was both excited and confused. Confused because how could I not have heard of this experiment? Even if it was to work, why were those spheres so big and uniform? Excited because I’d been looking for some good kitchen chemistry to do with my kids and this would be a great example. I couldn’t really get my head around how the observations were working, but on a rushed grocery expedition prior to going into ScienceOnline2010 #scio10 this past weekend I threw everything necessary into the grocery basket to repeat the experiment.

At ScienceOnline2010 I was involved in a number of discussions, as usual, regarding data quality, curation and assertions…this being based on my experience with curating the ChemSpider database. Today I sat in on a discussion entitled “Getting the Science Right: The importance of fact checking mainstream science publications — an underappreciated and essential art — and the role scientists can and should (but often don’t) play in it – Rebecca Skloot, Sheril Kirshenbaum, and David Dobbs.” It was an interesting exchange with comments such as “newspapers and magazines don’t check facts” and the urban myth that a one-minute kiss burns 26 calories, while the fact is that a Hershey’s Kiss contains 26 calories.

Post ScienceOnline2010 I got home this afternoon to find my kids desperately wanting to do kitchen chemistry so, with pessimism, I started to work through the experiment with them. They mixed and stirred and cooled and heated. They got to see a lot of fizzing and to see crystals grow, which they thought was great. It of course failed dismally, as it has for many other people, including this guy, but they had a great time. In parallel I was doing some fact-checking to see whether or not to prepare them for disappointment.

There have been a lot of exchanges online about this topic of water marbles with chemists exchanging concepts about the science behind it if it did work. See here for example. The video has gone viral across many sites. Very impressive for a hoax really…and it did get me interested in doing kitchen chemistry. The truth is a lot easier though…and still good chemistry! Watch Steve Spangler in action below…

The polymer beads can be bought here.

There’s more Kitchen Chemistry to come but I think I’ll stick to some of Theodore Gray’s guidance …maybe time for some Mad Science at home

I’m off to ScienceOnline2010 in a few minutes. It’s the last day of the conference and the experience has been a highly positive one. I’ve finally met people face to face that I have been connected with for over 2 years….and congruency is always good…they are as interesting, passionate and generally nice people face to face as they are online. I also managed to catch up with a number of old friends. I got to meet some new people focused on changing the flow of communication for ScienceOnline and working hard to do so. #scio10 is different….there’s an energy in the air that I haven’t experienced at any other scientific gathering other than SciFoo. This is an audience that is introducing me to social networking tools that I’ve never heard of…that doesn’t happen often. It has to be that over half the attendees are twittering. iPhones are everywhere. Flips are out capturing video in the sessions and are uploaded online shortly thereafter. The conversations are open, opinionated, full of energy and motivating. This is MY type of conference and I’m fortunate to live less than half an hour away.

The dinner event was fun and giggly, and five-minute “Ignite” talks were given (I gave two…one on Curating Chemistry online and one with JC Bradley regarding the spectral game). The first of those is linked here and shown below.

Today I will be giving a live demo of ChemSpider to anyone interested and around at the end of the conference. It’s nasty weather so people might be leaving early.

I found myself a virtual running partner for my 1000 miles in a year challenge, assuming my calf muscle tear heals. We’re going to try and figure out how to raise money for asthma. If anyone wants to join us to form a virtual team, let me know…

Bora and Anton have done a tremendous job organizing the conference. Clearly there is a great team supporting them and the Sigma Xi facility is excellent. Terrific conference all around….glad I spent the weekend this way…

Wired Magazine is my favorite monthly read. I get a lot of magazines delivered to the house for our family to browse through and these include Popular Mechanics, Popular Science, Science Illustrated and then additionally Chemistry World, C&E News, Drug Discovery News and a lot of the other trade magazines. Nevertheless, after the books I am reading (and I am presently reading Dr Mary’s Monkey as a follow-on regarding the SV-40 cancer-causing monkey virus in polio vaccines) Wired magazine is always the next thing I pick up. It’s an easy read, with some great short snippets for when I’m sitting on a stationary bike flipping pages and some long interesting articles, always well written. I recently read an old Wired magazine that had been on my stack for a few weeks and wish I’d read it earlier. We’ve been discussing the importance of user interface on ChemSpider and its impact and influence on the users of the website. This connected to the article on Craigslist that was covered in Wired Magazine.

Now, if you don’t know what Craigslist is then how about eBay? I’ll assume you know, and use, eBay. I use eBay…I like it. I’ve used Craigslist and like it, but for a different reason than I like eBay. Here is an interesting statement about Craigslist from the article: “With more than 47 million unique users every month in the US alone – nearly a fifth of the nation’s adult population – it is the most important community site going and yet the most underdeveloped.” The article goes on to tell the story about how confusing the site is, how poor the aesthetics are and how non-Web 2.0 it is in terms of integration, access etc. I recommend it as a fun read, if nothing else to get a handle on Craig Newmark, the interesting (and VERY rich) man behind the initial concept. As a historical article regarding how early technology can morph over time into something more flashy but not necessarily more successful it’s a great read. Wired was convinced that people would want to give some input on how Craigslist should be improved and set up their Extreme MakeOver: Craigslist Edition for user comments. I doubt that Newmark and colleagues will pay much attention and, based on stats available to date, they don’t need to.

Another interesting read is a separate article regarding eBay vs Craigslist and the fact that Google and Microsoft actually tried to get into the same sector and both failed. What’s the magic, the secret sauce, the USP (unique selling point) for Craigslist? I’ve read a number of suggestions but am not sure of the conclusions. I think it’s a combination of: 1) old and less complicated technology for novice users (searching means scrolling in a lot of cases), 2) traction…it’s been around a long time and 3) price for people to post ads. The bottom line though, of relevance to our discussions, is that “it ain’t the user interface!”.

Let’s be honest, technology is fun, especially when you work in our domain of building an internet for chemistry. Over the past few years I have upgraded from computer to computer, operating system to operating system (with Vista the worst transition but now loving Windows 7), from browser to browser (I have three installed: IE8, Firefox and Google Chrome, with Firefox my preferred). I would say that while I am not at the bleeding edge of technologies I have access to more advanced systems than the majority of users in schools, homes and the rest of the world, especially when taking into account that I have good, solid high speed access, both wireless-N and cabled, in our house. If you truly want to see how a site works in the “hands of the masses” it is necessary to look at it on another computer where the latest and greatest browser isn’t installed and which is still running on 512 MB of RAM. In my new “personal adventure” of running 1000 miles in a year I am using the NikePlus website to track my performance but it uses so much Flash, so much animation and “looks” so modern and beautiful that I am struggling to use it even on my most recent laptop. It needs a “dumb down” button (maybe it’s there but I’m dumb enough to not see it).

We know we need to change some of the ChemSpider website for ease of navigation, for ease of use and to cater for all of the browser dependencies that we see with just things such as copy and paste of long strings, word wrapped strings etc. They can all be fixed. We know that there is an abundance of functionality on the site that only a fraction of the user base will care about. Our focus since starting the ChemSpider project has been to establish a high-quality dataset (much progress but a long way to go), provide useful functionality to our diverse user base (lots in place, more to add, some to remove) and provide a “successful” experience, meaning that users can get answers to the questions/queries they ask and that the experience isn’t so challenging or mundane as to provide no value. Feedback to date suggests we’re doing okay but we’d like your feedback. Ultimately I’ll likely assemble this into a SurveyMonkey questionnaire but for brevity and early feedback I am interested in your comments on some of the following questions:

1) What is your favorite piece of functionality on ChemSpider?

2) What is your LEAST favorite piece of functionality on ChemSpider?

3) If there was one new function you would like to see added/improved what would it be?

4) Assuming a scoring system of 1 to 10, 10 being the best, how well does the ChemSpider interface support your usage of the system?

5) Which public dataset would you most like to see integrated into ChemSpider?

Any other comments are of course welcomed. We will be working on usability over the next few months and it’s hard to please everybody but we’ll do what we can with the resources we have. A SurveyMonkey questionnaire will show up in the future with more questions. Watch this space and check out the Craigslist article…I think you’ll enjoy it.

For those of you who read this blog you will be aware that it can take a lot of time just to get a single chemical curated against its correct associations of chemical names and synonyms. I’ve shown this for vancomycin, Taxol (1,2,3), Ginkgolide B and it is presently underway with Digitonin, though not yet complete. Working on one structure is hard enough. Building a database of a few thousand curated structures is difficult work yet the EBI did it, and did it well when they built ChEBI. ChEBI is also not perfect as we discovered working on vancomycin and I still find occasional small issues.

The EBI recently released the ChEMBL database. This is a much bigger resource as described at the home page for the resource here. The site states “ChEMBL is a database of ca. 500,000 bioactive compounds, their quantitative properties and bioactivities (binding constants, pharmacology and ADMET, etc). The data is abstracted and curated from the primary scientific literature and the data made available due to funding by the Wellcome Trust.” It is MUCH harder to curate larger databases and 1/2 a million records is a challenge.

I downloaded the data from the FTP site and took a browse of the data. There are definitely structures in the data file that we don’t have in ChemSpider but I found an issue with charge balance for many hundreds of records where the counterions were charged (for example, chloride or bromide) but the primary component was neutral. An example is here where the compound is named as a hydrochloride but the compound has the chloride anion. I think this likely arises from treatment with some type of standardizer so it should be a matter of changing the standardizer settings and regenerating. We deal with over 23 million compounds and have been through such issues ourselves when it comes to generation of structure images.
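The kind of charge-balance check described above can be sketched in a few lines. This is only a rough illustration, assuming each record is available as a SMILES string and that every charged atom is written as a bracket atom such as `[Cl-]` or `[NH3+]`; the function names `net_charge` and `is_charge_balanced` are hypothetical and are not part of ChEMBL, ChemSpider or any standardizer. A real pipeline would use a proper cheminformatics toolkit rather than a regular expression.

```python
import re

# Bracket atoms are the only place a formal charge can appear in SMILES.
_BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# A charge is a run of '+' or '-' signs, optionally followed by a number.
_CHARGE = re.compile(r"(\++|-+)(\d*)")


def net_charge(smiles: str) -> int:
    """Sum the formal charges written in the bracket atoms of a SMILES string."""
    total = 0
    for atom in _BRACKET_ATOM.findall(smiles):
        match = _CHARGE.search(atom)
        if match is None:
            continue  # neutral bracket atom, e.g. [13CH4]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        # "[Fe+3]" carries an explicit number, "[Ca++]" repeats the sign.
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


def is_charge_balanced(smiles: str) -> bool:
    """Return True when the formal charges in the record sum to zero."""
    return net_charge(smiles) == 0


# A neutral amine paired with a chloride anion is flagged as unbalanced,
# while the protonated amine with the same counterion is balanced.
print(is_charge_balanced("CN.[Cl-]"))       # neutral parent, charged counterion
print(is_charge_balanced("C[NH3+].[Cl-]"))  # charged parent, charged counterion
```

Running a check like this over a downloaded file would only list candidate records for review; whether a flagged record is genuinely wrong still needs a chemist’s eye.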

For an example of a rich record in ChEMBL take a look at this record showing the target, assay, activity type, value and reference all listed. ChEMBL is sure to be an invaluable reference for the Life Sciences.

I have never met Warren DeLano. But, I have respected him from afar for a long time. Warren is the developer of PyMOL, an Open Source molecular visualization system that has made enormous contributions to the community and can produce stunning visualizations of proteins. His impact on the field of protein visualization has been recognized many times by the community and his tools are used in labs all over the world. He has garnered respect across our community.

A few months ago I had the opportunity to spend an hour on the phone with him after he had made such positive comments when the RSC acquired ChemSpider. We talked about Open Science, Open Source and models of business. We talked about the adventure of trying to change the world one step at a time by making our humble contributions to the world of science. By the end of our conversation I knew that when I met Warren we would be able to talk for many more hours as we shared many common views and, primarily, a want to make a difference.

Today I learned of the sad news that Warren had passed away. Despite the fact that I hadn’t yet managed to sit with Warren face to face I was immediately saddened. My truth is that there is a specific type of shock I feel when someone younger than myself passes away. Warren and I talked about the impact of our chosen career paths on our relationships with our wives and the hours spent in front of a screen instead of spending them with those we share our lives with. We both reflected on the fact that we have given too much to the keyboard over the years, driven by our need to make a difference. Warren’s hard work and superior programming skills are paralleled by the fact that he was clearly a charitable contributor to science by giving his code away to the world and was, even based on only one phone call, a kind man.

My thoughts go out to his wife and family for his loss.

I’ve been in discussions with JC Bradley and Andy Lang about the Open Notebook Science Solubility Data project. Specifically we’ve been comparing logP predictions from the CDK versus those listed on ChemSpider. We actually have six values of logP listed for some records. For example, for toluene we have 4 predicted values, 1 experimental value from a database and 1 experimental value from a publication. These are shown below:

[Figure: logP values listed for toluene on ChemSpider]

There are three predicted logP values from three different algorithms (ACD/LogP, XlogP and AlogPs) as shown at the top of the figure. There is a predicted value and a database value from the EPISuite from the EPA (middle of the figure) and there is a LogP value from a publication with the link out indicated by the arrow (this datum was deposited by Egon Willighagen when he deposited the data from his publication). If you examine the list of data, both experimental and predicted, you will see a general value of around 2.65, give or take the error. This should be compared with the CDK value listed in the ONS spreadsheet that gives a predicted value of 0.64. This was the primary reason that we were discussing the comparison…the values of predicted logP from CDK were different from the predicted values listed on ChemSpider for a number of examples in the spreadsheet.

Egon and I exchanged a couple of emails discussing the fact that logP predictions could be generated by a number of parties if there was a good Open Data training set available. A recent publication entitled “Calculation of Molecular Lipophilicity: State of the Art and Comparison of Log P Methods on More Than 96000 Compounds” performed a thorough analysis of different logP methods on a very large dataset. The publication is available online here. They compared “the predictive power of representative methods for one public (N = 266) and two in house datasets from Nycomed (N = 882) and Pfizer (N = 95 809). A total of 30 and 18 methods were tested for public and industrial datasets, respectively.” During the work they derived a simple equation based on the number of carbon atoms, NC, and the number of hetero atoms, NHET: log P = 1.46(±0.02) + 0.11(±0.001) NC - 0.11(±0.001) NHET. This equation was shown to outperform a large number of programs benchmarked in this study. This would certainly be easy to implement on ChemSpider and, just out of interest, applying this equation to toluene (C7H8, so NC = 7 and NHET = 0) gives us a value of 1.46 + 0.77 = 2.23. Compare this with the values listed above.
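To show just how little is involved, here is a minimal sketch of the equation quoted above. It assumes that “hetero atoms” means every atom that is neither carbon nor hydrogen, and that the input is a flat molecular formula such as C7H8 with no parentheses or charges. The helper names `count_atoms` and `simple_logp` are made up for this illustration and are not taken from the publication or from ChemSpider.

```python
import re

# One element symbol (e.g. "C", "Cl") followed by an optional count.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(formula: str) -> dict:
    """Count atoms per element in a flat molecular formula such as 'C7H8'."""
    counts = {}
    for symbol, number in _ELEMENT.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def simple_logp(formula: str) -> float:
    """Apply log P = 1.46 + 0.11 * NC - 0.11 * NHET to a molecular formula.

    Assumption: NHET counts every atom that is not carbon or hydrogen.
    """
    counts = count_atoms(formula)
    n_carbon = counts.get("C", 0)
    n_hetero = sum(n for symbol, n in counts.items() if symbol not in ("C", "H"))
    return 1.46 + 0.11 * n_carbon - 0.11 * n_hetero


# Toluene, C7H8: seven carbons and no hetero atoms.
print(round(simple_logp("C7H8"), 2))
```

The central coefficients are used without their quoted uncertainties, so the result is a point estimate only.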

Unfortunately there don’t appear to be many Open logP datasets available for people to use as training sets. Also, with the thorough work reported in the publication above, is it necessary to build yet another logP prediction algorithm? ACD/Labs have made their logP prediction software free for download, the VCCLab software is available for free, the EPISuite software is available for free and if you just want to predict a value for a compound not on ChemSpider then you can use the services here:

However, even though there are a lot of predictors available it still makes sense to gather data and provide it as an experimental dataset, made available as Open Data, so that the developers of such algorithms can take advantage of structural diversity and fresh data to potentially improve their models. If you have any logP data available please point me to the data to download or contact me offline to discuss. We are presently working on enhancing our data model to provide improved access to experimental data on ChemSpider as well as access to the predicted data via web services. More to follow…