Archive for the Uncategorized Category

For the past few months we have been busily developing new functionality and capabilities for the ChemSpider platform with the intention of making navigation easier, enhancing integration with external resources, adding rich new data sources and providing access to brand new capabilities. This new functionality is described in a series of blog posts published today and outlined below.

Improving the ChemSpider interface using tabbed infoboxes

Introducing NMR prediction capabilities to ChemSpider

Linking Google Patents searching to ChemSpider

Integrating RSC Databases into ChemSpider

Integrating RSC Publishing Beta into ChemSpider – includes integrations to Google Scholar, Google Books and Microsoft Academic Search

OVERVIEW

The LBNL Library is hosting a seminar for researchers interested in online collaboration, data storage and curation, data exchange, crowdsourcing, and open access.

This seminar will explore ChemSpider (http://www.chemspider.com/) – a free access service providing a structure centric community for chemists and the richest single source of structure-based chemistry information.

EVENT DETAILS

March 24, 2010 – Wednesday
3:00 p.m. – 4:30 p.m.
Building 50 Auditorium, Lawrence Berkeley National Laboratory

Bring your laptop for a hands-on demo session.

For non-Berkeley Lab personnel: Please contact Jeffery Loo (JLLoo@lbl.gov) by Monday, March 22, 12:00 p.m. for a visitor pass and shuttle bus directions. A visitor pass is required for entry into the Berkeley Lab by guests.

ABSTRACT

The increasing availability of free and open access resources for scientists on the internet presents us with a revolution in data availability. The Royal Society of Chemistry hosts ChemSpider, a free access website for chemists built with the intention of building community for chemists (http://www.chemspider.com/).

ChemSpider is an aggregator of chemistry-related information, at present holding over 20 million unique chemical entities linked out to over 300 separate data sources. ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. It is also a public deposition platform where chemists can deposit their own data, including novel structures, analytical data and synthesis procedures, and host data associated with the growing activities of Open Notebook Science.

This presentation will examine chemistry on the internet, the dubious quality of what is available and how the ChemSpider crowdsourced curation platform is fast becoming one of the centralized hubs for resourcing information about chemical entities.

We will also review our efforts to provide free resources for synthesis procedures, spectral data and structure-based searching of the chemistry literature and how chemists can contribute directly to each of these projects.

Following the presentation and a question and answer session, a hands-on session showing how to search for, curate and deposit data on ChemSpider will be given for interested parties.

SPEAKER

Antony Williams, PhD, is a leader in the domain of free access chemistry. He is the Vice President of Strategic Development at the Royal Society of Chemistry and is the host of ChemSpider, a free online structure centric community for chemists.

ChemSpider began as a hobby project in a basement and went on to become one of the most popular Chemistry websites with the highest quality of data available online. Antony spent over a decade in the commercial scientific software business as Chief Science Officer for ACD/Labs, one of the domain leaders in scientific software. He is an accomplished NMR spectroscopist with over 100 peer-reviewed publications. During his career he was the NMR Technology Leader for the Eastman-Kodak company and has worked in both academia and national government research institutions.

We are presently receiving sign-ups for our training session on ChemSpider. The session will be on Monday afternoon between 4 and 6 pm (details below). It is free to attend and we’d love to see you there if you are in San Francisco at that time. Sign up here…

Royal Society of Chemistry
How to get started with ChemSpider – Searching, Structure Deposition and Database Curation
Instructor(s): Antony Williams, VP Strategic Development ChemSpider
Where: Moscone Center
Room: 110
When: Monday, March 22, 4:00 PM – 6:00 PM
>> Click here to register for this workshop
This session will give the opportunity to learn more about how to search ChemSpider, how to deposit your structures and how you can participate in curation of the data.  Presenter: Antony Williams, VP Strategic Development ChemSpider

I was sitting down today to review what presentations are coming up in the next few weeks and how much writing and travel was ahead of me. Ugh. Painful. During the next few weeks of conference season there will be a lot of talks and, as usual, a lot of late nights before the presentations to write new talks or modify existing talks. I will be at the ACS meeting in San Francisco this spring and will be giving four presentations, a poster and leading a training session on ChemSpider. The presentations are outlined below. Looking forward to seeing you there and it would be great to hear from any of you who would like to get together and connect about community chemistry over a coffee.

Presentation: Utilizing ChemSpider as a platform for education and exposure of student data to the community.

Educators and students now have access to rich internet resources of information. RSC’s ChemSpider is a community resource of structure-based chemistry delivering data including chemical compound collections, reaction synthesis procedures, physicochemical property and various forms of spectral data. ChemSpider offers the opportunity for the community to participate in populating, annotating and curating the data on ChemSpider. We believe that ChemSpider offers an opportunity for educators and students to participate in the ongoing development of a rich resource for the chemistry community. This presentation will suggest some potential uses of the ChemSpider website in terms of integrating into lesson plans. We will also outline how students can expose their structure and reaction-based research work via the ChemSpider platform for the benefit of the community and their online scientific reputation.

Presentation: ChemSpider – How An Online Resource of Chemical Compounds, Reaction Syntheses, and Property Data Can Support Green Chemistry

ChemSpider is an online database containing in excess of 20 million chemical compounds and associated experimental and predicted physicochemical data, reaction synthesis details and analytical data. A significant amount of the data contained within the database has been harvested and collated from a number of inventory systems and integrated to provide a centralized resource for the community. The ChemSpider database has the added benefit of being available for community deposition, annotation and curation. As a result it offers the potential for researchers to share their latest research with the public and participate in the creation of a rich resource of chemistry related information for the Green Chemistry community. This presentation will provide an overview of present capabilities and discuss the future vision for the platform.

Presentation: ChemSpider, how a free community resource of data can support teaching NMR spectroscopy

ChemSpider is an online database of chemical compounds, reaction syntheses and analytical data. Provided by the Royal Society of Chemistry, our intention is to provide a free internet resource of chemistry related data for the community. ChemSpider is unique in its role of allowing user depositions of chemical structures, synthesis procedures and analytical data and, in so doing, provides an environment for crowdsourced gathering of information. To date over 2000 1D and 2D NMR spectra have been deposited online by the community and are available for reuse. The data have been used as the basis of a spectral game whereby students can learn NMR by interacting with the data. This presentation will provide an overview of the tools and capabilities presently available on ChemSpider to support teaching NMR in the undergraduate curriculum and will outline how the community can participate in enriching this resource for the benefit of all.

Presentation: Enhancing discoverability across Royal Society of Chemistry content by integrating to ChemSpider, an online database of chemical structures

The ability to query across a chemistry publisher’s content using chemical structure searching can dramatically enhance discoverability. RSC has been applying a number of procedures to integrate RSC’s ChemSpider community resource with our published content and databases. These include: 1) entity extraction procedures, 2) chemical name conversion procedures using software algorithms and curated dictionaries, 3) semantic markup and 4) crowdsourced curation processes. This presentation will provide an overview of the processes we have utilized in order to provide structure-based integration to RSC content. We will discuss our ongoing efforts to extend the approaches to the mining of data from the rich supplementary information sections of many RSC publications. Our intention is to provide access to synthesis procedures and analytical data and further enrich the ChemSpider database for the benefit of the chemistry community.

Poster: Utilizing ChemSpider as a platform for education and exposure of student data to the community

Recently I announced the release of ChemSpider SyntheticPages. We are honored to have an editorial board of chemists to assist in directing the project and they are introduced below:

  • Kevin Booker-Milburn

    Kevin Booker-Milburn is a Professor of Synthetic Chemistry in the School of Chemistry at the University of Bristol, UK. He has 20 years research experience in broad aspects of synthetic chemistry and in recent years has focused on the development of new synthetic methods for use in the total synthesis of natural products such as terpenes and alkaloids; specifically developing and applying novel photochemical and transition metal techniques. He is Director of the Bristol Chemical Synthesis Doctoral Training Centre, an EPSRC and Industry funded initiative which has a bold vision to train a new generation of researchers for the chemical industry and academe.

  • Jean-Claude Bradley

    Jean-Claude Bradley is an Associate Professor of Chemistry at Drexel University. He leads the UsefulChem project, an initiative started in the summer of 2005 to make the scientific process as transparent as possible by publishing all research work in real time to a collection of public blogs, wikis and other web pages. Jean-Claude coined the term Open Notebook Science (ONS) to distinguish this approach from other more restricted forms of Open Science. Jean-Claude has a Ph.D. in organic chemistry and has published articles and obtained patents in the areas of synthetic and mechanistic chemistry, gene therapy, nanotechnology and scientific knowledge management.

  • Stephen Caddick

    Stephen Caddick is a Professor of Organic Chemistry and Chemical Biology and Head of Department of Chemistry at UCL. He was previously at the University of Sussex (1993 – 2003). His research interests include Organic Synthesis and Synthetic Methodology, Chemical Biology and Structural Biology and Catalysis.

  • Peter Scott

    Peter Scott is a Professor of Chemistry at the University of Warwick, UK, and was formerly at the University of Sussex. His research is focussed on metallo-organic chemistry and mechanism, and specifically in chiral systems for enantioselective catalysis, polymer synthesis, materials science and healthcare. He has interests in how universities and industry can work together, and is Director of Warwick Chemistry’s EPSRC funded PhD with Industrial Collaboration, and also of Warwick Knowledge Transfer Secondments.

  • Martin A. Walker

    Martin A. Walker is an assistant professor of organic chemistry at the State University of New York at Potsdam. He previously worked in the fine chemicals industry for 12 years. His interests center on organic synthesis methodology, particularly green chemistry, as well as chemical information. He is active on Wikipedia, where he contributes to chemistry content and coordinates the Wikipedia 1.0 project, preparing offline releases of Wikipedia.

Stephen Caddick, Peter Scott, Kevin Booker-Milburn and Max Hammond were the original founders of SyntheticPages.org, an online database for chemical transformations. The data from SyntheticPages has been used as the seed data for ChemSpider SyntheticPages.

A couple of days ago I came across a video on YouTube about “Water Marbles”. I’ve inserted it below…I recommend watching it…it’s excellent!

It’s excellent because by the time I had finished watching this I was both excited and confused. Confused because how could I not have heard of this experiment? Even if it were to work, why were those spheres so big and uniform? Excited because I’d been looking for some good kitchen chemistry to do with my kids and this would be a great example. I couldn’t really get my head around how the observations were working but on a rushed grocery expedition prior to going into ScienceOnline2010 #scio10 this past weekend I threw everything necessary into the grocery basket to repeat the experiment.

At ScienceOnline2010 I was involved in a number of discussions, as usual, regarding data quality, curation and assertions….this being based on my experience with curating the ChemSpider database. Today I sat in on a discussion entitled “Getting the Science Right: The importance of fact checking mainstream science publications — an underappreciated and essential art — and the role scientists can and should (but often don’t) play in it – Rebecca Skloot, Sheril Kirshenbaum, and David Dobbs.” It was an interesting exchange, with comments such as “newspapers and magazines don’t check facts” and the urban myth that a one minute kiss burns 26 calories, while the fact is that a Hershey’s Kiss contains 26 calories.

Post ScienceOnline2010 I got home this afternoon to find my kids desperately wanting to do kitchen chemistry so, with pessimism, I started to work through the experiment with them. They mixed and stirred and cooled and heated. They got to see a lot of fizzing and to see crystals grow, which they thought was great. It of course failed dismally, as it has for many other people, including this guy, but they had a great time. In parallel I was doing some fact-checking to see whether or not to prepare them for disappointment.

There have been a lot of exchanges online about this topic of water marbles with chemists exchanging concepts about the science behind it if it did work. See here for example. The video has gone viral across many sites. Very impressive for a hoax really…and it did get me interested in doing kitchen chemistry. The truth is a lot easier though…and still good chemistry! Watch Steve Spangler in action below…

The polymer beads can be bought here.

There’s more Kitchen Chemistry to come but I think I’ll stick to some of Theodore Gray’s guidance …maybe time for some Mad Science at home

I’m off to ScienceOnline2010 in a few minutes. It’s the last day of the conference and the experience has been a highly positive one. I’ve finally met people face to face that I have been connected with for over 2 years….and congruency is always good…they are as interesting, passionate and generally nice people face to face as they are online. I also managed to catch up with a number of old friends. I got to meet some new people focused on changing the flow of communication for ScienceOnline and working hard to do so. #scio10 is different….there’s an energy in the air that I haven’t experienced at any other scientific gathering other than SciFoo. This is an audience that is introducing me to social networking tools that I’ve never heard of…that doesn’t happen often. It has to be that over half the attendees are twittering. iPhones are everywhere. Flips are out capturing video in the sessions and are uploaded online shortly thereafter. The conversations are open, opinionated, full of energy and motivating. This is MY type of conference and I’m fortunate to live less than half an hour away.

The dinner event was fun and giggly, and five-minute “Ignite” talks were given (I gave two…one on Curating Chemistry online and one with JC Bradley regarding the spectral game). The first of those is linked here and shown below.

Today I will be giving a live demo of ChemSpider to anyone interested and around at the end of the conference. It’s nasty weather so people might be leaving early.

I found myself a virtual running partner for my 1000 miles in a year challenge, assuming my calf muscle tear heals. We’re going to try and figure out how to raise money for asthma. If anyone wants to join us to form a virtual team, let me know…

Bora and Anton have done a tremendous job organizing the conference. Clearly there is a great team supporting them and the Sigma Xi facility is excellent. Terrific conference all around….glad I spent the weekend this way…

Wired Magazine is my favorite monthly read. I get a lot of magazines delivered to the house for our family to browse through and these include Popular Mechanics, Popular Science, Science Illustrated and then additionally Chemistry World, C&E News, Drug Discovery News and a lot of the other trade magazines. Nevertheless, after the books I am reading (and I am presently reading Dr Mary’s Monkey as a follow on regarding the SV-40 cancer causing monkey virus in polio vaccines) Wired magazine is always the next thing I pick up. It’s an easy read, with some great short snippets for when I’m sitting on a stationary bike flipping pages and some long interesting articles, always well written. I recently read an old Wired magazine that had been on my stack for a few weeks and wish I’d read it earlier. We’ve been discussing the importance of user interface on ChemSpider and its impact and influence on the users of the website. This connected to the article on Craigslist that was covered in Wired Magazine.

Now, if you don’t know what Craigslist is then how about eBay? I’ll assume you know, and use, eBay. I use eBay…I like it. I’ve used Craigslist and like it, but for a different reason than I like eBay. Here is an interesting statement about Craigslist from the article: “With more than 47 million unique users every month in the US alone—nearly a fifth of the nation’s adult population—it is the most important community site going and yet the most underdeveloped.” The article goes on to tell the story about how confusing the site is, how poor the aesthetics are and how non-Web 2.0 it is in terms of integration, access etc. I recommend it as a fun read, if nothing else to get a handle on Craig Newmark, the interesting (and VERY rich) man behind the initial concept. As a historical article regarding how early technology can morph over time into something more flashy but not necessarily more successful it’s a great read. Wired was convinced that people would want to give some input on how Craigslist should be improved and set up their Extreme MakeOver: Craigslist Edition for user comments. I doubt that Newmark and colleagues will pay much attention and, based on stats available to date, they don’t need to.

Another interesting read is a separate article regarding eBay vs Craigslist and the fact that Google and Microsoft actually tried to get into the same sector and both failed. What’s the magic, the secret sauce, the USP (unique selling points) for Craigslist? I’ve read a number of suggestions but am not sure of the conclusions. I think it’s a combination of: 1) old and less complicated technology for novice users (searching means scrolling in a lot of cases), 2) traction…it’s been around a long time and 3) price for people to post ads. The bottom line though, of relevance to our discussions, is that “it ain’t the user interface!”.

Let’s be honest, technology is fun, especially when you work in our domain of building an internet for chemistry. Over the past few years I have upgraded from computer to computer, operating system to operating system (with Vista the worst transition but now loving Windows 7), and from browser to browser (I have three installed: IE8, Firefox and Google Chrome, with FF my preferred). I would say that while I am not at the bleeding edge of technologies I have access to more advanced systems than the majority of users in schools, homes and the rest of the world, especially when taking into account that I have good, solid high speed access, both wireless-N and cabled, in our house. If you truly want to see how a site works in the “hands of the masses” it is necessary to look at it on another computer where the latest and greatest browser isn’t installed and that is still running on 512 MB of RAM. In my new “personal adventure” of running 1000 miles in a year I am using the NikePlus website to track my performance, but it uses so much Flash, so much animation and “looks” so modern and beautiful that I am struggling to use it even on my most recent laptop. It needs a “dumb down” button (maybe it’s there but I’m dumb enough to not see it).

We know we need to change some of the ChemSpider website for ease of navigation, for ease of use and to cater for all of the browser dependencies that we see with just things such as copy and paste of long strings, word wrapped strings etc. They can all be fixed. We know that there is an abundance of functionality on the site that only a fraction of the user base will care about. Our focus since starting the ChemSpider project has been to establish a high-quality dataset (much progress but a long way to go), provide useful functionality to our diverse user base (lots in place, more to add, some to remove), and provide a “successful” experience, meaning that users could get answers to the questions/queries they asked and that the experience wouldn’t be so challenging or mundane as to provide no value. Feedback to date suggests we’re doing okay but we’d like your feedback. Ultimately I’ll likely assemble this into a SurveyMonkey questionnaire but for brevity and early feedback I am interested in your comments on some of the following questions:

1) What is your favorite piece of functionality on ChemSpider?

2) What is your LEAST favorite piece of functionality on ChemSpider?

3) If there was one new function you would like to see added/improved what would it be?

4) Assuming a scoring system of 1 to 10, 10 being the best, how well does the ChemSpider interface support your usage of the system?

5) Which public dataset would you most like to see integrated to ChemSpider?

Any other comments are of course welcomed. We will be working on usability over the next few months and it’s hard to please everybody but we’ll do what we can with the resources we have. A Survey Monkey questionnaire will show up in the future with more questions. Watch this space and check out the Craigslist article…I think you’ll enjoy it.

For those of you who read this blog you will be aware that it can take a lot of time just to get a single chemical curated against its correct associations of chemical names and synonyms. I’ve shown this for vancomycin, Taxol (1,2,3), Ginkgolide B and it is presently underway with Digitonin, though not yet complete. Working on one structure is hard enough. Building a database of a few thousand curated structures is difficult work yet the EBI did it, and did it well when they built ChEBI. ChEBI is also not perfect as we discovered working on vancomycin and I still find occasional small issues.

The EBI recently released the ChEMBL database. This is a much bigger resource as described at the home page for the resource here. The site states “ChEMBL is a database of ca. 500,000 bioactive compounds, their quantitative properties and bioactivities (binding constants, pharmacology and ADMET, etc). The data is abstracted and curated from the primary scientific literature and the data made available due to funding by the Wellcome Trust.” It is MUCH harder to curate larger databases and 1/2 a million records is a challenge.

I downloaded the data from the FTP site and took a browse of the data. There are definitely structures in the data file that we don’t have in ChemSpider but I found an issue with charge balance for many hundreds of records where the counterions were charged (for example, chloride or bromide) but the primary component was neutral. An example is here where the compound is named as a hydrochloride but the compound has the chloride anion. I think this likely arises from treatment with some type of standardizer so it should be a matter of changing the standardizer settings and regenerating. We deal with over 23 million compounds and have been through such issues ourselves when it comes to generation of structure images.
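The charge balance issue described above is easy to screen for once each record is broken into components with formal charges. What follows is only a minimal sketch of that idea: the record layout, the function names and the two example records are all invented for illustration, and it says nothing about how ChEMBL or ChemSpider actually standardize structures.

```python
from typing import List, Tuple

# A record is modelled as a list of (component name, net formal charge) pairs.
# This layout is an assumption made for the sketch, not a ChEMBL file format.
Record = List[Tuple[str, int]]


def net_charge(record: Record) -> int:
    """Sum the formal charges of all components in a record."""
    return sum(charge for _, charge in record)


def is_charge_balanced(record: Record) -> bool:
    """A salt record should carry no overall charge."""
    return net_charge(record) == 0


# Hypothetical examples: a neutral parent paired with a chloride anion
# (the pattern described above) and a properly balanced salt.
unbalanced = [("neutral parent", 0), ("chloride", -1)]
balanced = [("protonated parent", 1), ("chloride", -1)]

for name, record in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, net_charge(record), is_charge_balanced(record))
```

Running a check like this over a whole download would give a list of records to revisit after changing the standardizer settings.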

For an example of a rich record in ChEMBL take a look at this record showing the target, assay, activity type, value and reference all listed. ChEMBL is sure to be an invaluable reference for the Life Sciences.

I have never met Warren DeLano. But, I have respected him from afar for a long time. Warren is the developer of PyMol, an Open Source molecular visualization system that has made enormous contributions to the community and can produce stunning visualizations of Proteins. His impact on the field of protein visualization has been recognized many times by the community and his tools are used in labs all over the world. He has garnered respect across our community.

A few months ago I had the opportunity to spend an hour on the phone with him after he had made such positive comments when the RSC acquired ChemSpider. We talked about Open Science, Open Source and models of business. We talked about the adventure of trying to change the world one step at a time by making our humble contributions to the world of science. By the end of our conversation I knew that when I met Warren we would be able to talk for many more hours as we shared many common views and, primarily, a want to make a difference.

Today I learned of the sad news that Warren had passed away. Despite the fact that I hadn’t yet managed to sit with Warren face to face I was immediately saddened. My truth is that there is a specific type of shock I feel when someone younger than myself passes away. Warren and I talked about the impact of our chosen career paths on our relationships with our wives and the hours spent in front of a screen instead of spending them with those we share our lives with. We both reflected on the fact that we have given too much to the keyboard over the years, driven by our need to make a difference. Warren’s hard work and superior programming skills are paralleled by the fact that he was clearly a charitable contributor to science by giving his code away to the world and was, even based on only one phone call, a kind man.

My thoughts go out to his wife and family for his loss.

I’ve been in discussions with JC Bradley and Andy Lang about the Open Notebook Science Solubility Data project. Specifically we’ve been comparing logP predictions from the CDK versus those listed on ChemSpider. We actually have six values of logP listed for some records. For example, for toluene we have 4 predicted values, 1 experimental value from a database and 1 experimental value from a publication. These are shown below:

[Figure: logP values listed for toluene on ChemSpider]

There are three predicted logP values from three different algorithms (ACD/LogP, XlogP and AlogPs) as shown at the top of the figure. There is a predicted value and a database value from the EPISuite from the EPA (middle of the figure) and there is a LogP value from a publication with the link out indicated by the arrow (this datum was deposited by Egon Willighagen when he deposited the data from his publication). If you examine the list of data, both experimental and predicted, you will see a general value of around 2.65 +/- error. This should be compared with the CDK value listed in the ONS spreadsheet that gives a predicted value of 0.64. This was the primary reason that we were discussing the comparison…the values of predicted logP from CDK were different from the predicted values listed on ChemSpider for a number of examples in the spreadsheet.

Egon and I exchanged a couple of emails discussing the fact that logP predictions could be generated by a number of parties if there was a good Open Data training set available. A recent publication entitled “Calculation of Molecular Lipophilicity: State of the Art and Comparison of Log P Methods on More Than 96000 Compounds” performed a thorough analysis of different logP methods on a very large dataset. The publication is available online here. They compared “the predictive power of representative methods for one public (N = 266) and two in house datasets from Nycomed (N = 882) and Pfizer (N = 95 809). A total of 30 and 18 methods were tested for public and industrial datasets, respectively.” During the work they derived a simple equation based on the number of carbon atoms, NC, and the number of hetero atoms, NHET: log P = 1.46(±0.02) + 0.11(±0.001) NC – 0.11(±0.001) NHET. This equation was shown to outperform a large number of programs benchmarked in this study. This would certainly be easy to implement on ChemSpider and, just out of interest, applying this equation to toluene gives us a value of 2.23. Compare this with the values listed above.
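To show just how little code the equation needs, here is a minimal sketch in Python. It uses only the central coefficients quoted above and ignores the stated uncertainties. The function and variable names are my own choices for illustration and are not part of ChemSpider or the publication.

```python
def simple_logp(n_carbon: int, n_hetero: int) -> float:
    """Estimate logP as 1.46 + 0.11 * NC - 0.11 * NHET (central coefficients only)."""
    return 1.46 + 0.11 * n_carbon - 0.11 * n_hetero


# Toluene is C7H8: seven carbon atoms and no hetero atoms.
toluene_estimate = simple_logp(n_carbon=7, n_hetero=0)

# Values quoted in the post, for comparison.
chemspider_consensus = 2.65
cdk_prediction = 0.64

print(f"simple equation: {toluene_estimate:.2f}")
print(f"difference from ChemSpider values: {toluene_estimate - chemspider_consensus:+.2f}")
print(f"difference from CDK value: {toluene_estimate - cdk_prediction:+.2f}")
```

For toluene this reproduces the 2.23 quoted above, which sits much closer to the roughly 2.65 listed on ChemSpider than to the CDK value of 0.64.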

Unfortunately there don’t appear to be many Open logP datasets available for people to use as training sets. Also, with the thorough work reported in the publication above, is it necessary to build yet another logP prediction algorithm? ACD/Labs have made their logP prediction software free for download (http://www.acdlabs.com/download/logp.html), the VCCLab software is available for free (http://www.vcclab.org/lab/alogps/), the EPISuite software is available for free (http://www.epa.gov/oppt/exposure/pubs/episuite.htm) and if you just want to predict a value for a compound not on ChemSpider then you can use the services here: http://www.chemspider.com/Services.aspx.

However, even though there are a lot of predictors available it still makes sense to gather data and provide it as an experimental dataset, made available as Open Data, so that the developers of such algorithms can take advantage of structural diversity and fresh data to potentially improve their models. If you have any logP data available please point me to the data to download or contact me offline to discuss. We are presently working on enhancing our data model to provide improved access to experimental data on ChemSpider as well as access to the predicted data via web services. More to follow…

We get a lot of kudos for what we do with ChemSpider and we appreciate it. Sometimes there is an email that comes in that just makes me smile. One from this week is shown below…it’s nice to be appreciated!

“Dr ChemSpider,
GOD BLESS you and your website! My classmate and I just wanted you to know that we appreciate your website to the UTMOST!! you saved us hours upon hours of work… we have been spending hours trying to figure out a structure from our lab reaction product. THANKS for the awesome website, we are now able to further our knowledge in organic chemistry!!!”

The ChemSpider blog has become very quiet in many ways. For that I am both saddened and realistic….we are very busy with working on improvements to ChemSpider both in the functionality and to the overall infrastructure. You will see these roll out in the near future. I personally am traveling a lot more than previously and engaged in the writing of many articles and presentations. I have a backlog of over half a dozen articles, and more than that in presentations to prepare. Add to that H1N1 through the household, one little boy in our family with pneumonia and my intention to participate in a mini-triathlon next year, and to say that I am distracted would be an understatement.

I hope this “bad news” post is the first of many to get me active on the blog. This bad news post is actually a good news post, we hope. We have been seeing some conflicts between backups and server performance and need to apply some Microsoft Hotfixes and will be taking the system down on Wednesday for about 30 minutes as announced on the HomePage. Our apologies if it causes a disruption.

Service Interruption 07/10/2009
Due to essential maintenance ChemSpider will be unavailable during the following period:
07/10/2009 from 10:30 GMT until 11:00am GMT
We apologise for any inconvenience this may cause.


The ACS meeting in Washington was good for ChemSpider and the team in a number of ways. ChemSpider garnered a lot of attention so that was a relief. More than that, though, the ACS was the culmination of weeks of effort by an extended team of people in the informatics group, our internal and external marketing groups and the development team. ChemSpider was “everywhere” at the ACS…it was really about “getting you there”…see the side of the bus below!

[Image: ChemSpider branding on the side of a bus]

We showed a number of new things at the ChemSpider booth. We certainly had our new look and feel in terms of the logo and visual aesthetics. The two most exciting capabilities we introduced, which had the majority of people smiling, were the integration to the SureChem patent portal described previously and our new integration to the PubMed web services. If you haven’t seen the PubMed integration yet you’ll likely appreciate this!

I can explain the process in detail but I think the video itself tells the story best. What we are doing is using validated synonyms to look up articles in PubMed. If there are cases where there are no PubMed articles it is VERY common that the synonym validation process will result in articles being recovered. This lends even more value to the structure-name curation process. The YouTube movie is below but an SWF form of the movie, easier to watch in my opinion, is here. Let me know which format you find better. It is easier to make YouTube only but I think for details SWF is better. Comments welcomed.
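For readers who want a feel for the synonym-based lookup described above, here is a minimal sketch in Python. It is not the ChemSpider implementation, which is not described in this post. It assumes the lookup goes through the public NCBI E-utilities `esearch` endpoint, and the function names, the OR-combined phrase query and the `retmax` default are all illustrative choices.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (an assumption about how a lookup
# like this could be done; the post does not say which service calls are used).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(synonyms):
    """Combine validated synonyms into a single OR query of quoted phrases."""
    # Drop empty entries and surrounding whitespace before quoting.
    cleaned = [s.strip() for s in synonyms if s.strip()]
    return " OR ".join(f'"{s}"' for s in cleaned)


def build_esearch_url(synonyms, retmax=20):
    """Build the esearch URL for a list of synonyms (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(synonyms),
        "retmax": retmax,
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_esearch_url(["aspirin", "acetylsalicylic acid"]))
```

The point of the sketch is that every validated synonym widens the query, which is why cleaning up names on a record can surface articles that the original name alone did not find.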


I am writing an editorial piece at present that necessitates the communication of what types of data we can host from users if they choose to use ChemSpider as a platform to host their data and interesting chemistry pieces. For example:

Hosting Reaction Details: The Synthesis of cis-Bicyclo[3.3.0]octane-3,7-dione

Chemistry Movie: Photochromism in action

Spectral data in abundance: Spectra of aspirin (click on the green image to view)

Open Notebook Science report: An analysis of the spectrum of Cholesterol

List of publications: A long list of publications associated with cholesterol

The Linked Wikipedia Article: Xanax


We are just about to head off to the IUPAC Congress in Glasgow and unveil a spiffing new booth. In preparation for the unveiling of our new logo we’ve done some editing to the website and changed the look and feel of some of the pages. These are mostly cosmetic at present and there is little change to the core functionality of the site but we hope that some of the changes make the site a little easier to navigate.

This is the first work we are doing to improve the website and to roll out a redesign of the logo (look out for that logo at the ACS meeting in Washington in a couple of weeks…you’ll see it in a few places and we will have our own booth there too). Over the next few weeks we will be working further to improve the usability and flow of the website and to enhance the core functionality of the platform. Watch this space.

We welcome your feedback on the new logo. If you don’t see it on the ChemSpider website, please refresh the stylesheet using Ctrl-F5.


ChemMobi, an application written by James Jack from Symyx, has finally been posted to the App Store. It can be downloaded for free and enables your iPhone to search both Symyx’s Discovery Gate and ChemSpider (using our web services). I’ve posted before about the work done by James (1,2) and it has now come to fruition with the first version of ChemMobi. If you are an iPhone user try it out and give us your feedback!


Since it was easy to do, we will bring ChemSpider back online in Read Only mode for you to continue using if you need it. This means that the web services will all be restored as well. The only things that will not be enabled are deposition, annotation and curation. In order to block these we have disabled login. While it will be possible to add comments, please note that these will be dealt with on the RSC system following rollover to their systems.


Since the RSC acquired ChemSpider we have been working hard with the IT team in Cambridge to transfer ChemSpider from our servers and onto the RSC servers. This has been quite a significant undertaking as now we will be dealing with development servers, staging servers and live servers. This is a significant departure from the environment we have been working in for the past couple of years where code was published to the live environment for testing. Some would say this was risky but with the limited resources we had available at the time it was what it was….oh, and it worked!

We have already started testing the system on the RSC servers that will go live sometime early next week. At present the intended schedule is that we will be switching over sometime between Monday and Wednesday. Of course, this is an intention at present and, based on testing, this may change. For right now we have stopped depositions onto ChemSpider. If curation activities continue we will sync these over to the live server next week so no issues there. ChemSpider will go offline next week sometime and, as the actual date becomes clearer, the announcements will be updated.

Watch this space…ChemSpider is moving to the RSC servers and there will be disruptions in the next few days.


When I present on ChemSpider and talk about community participation, one of the common questions is “how many people curate, deposit or annotate records on ChemSpider?” It’s a low number for each but, in my estimation, it is in keeping with how we operate as individuals. If you compare the number of people reading Wikipedia articles to the number writing them, I judge it has to be a pretty high ratio, likely >5000:1. Even if it’s 1000:1 you get the point. More people use than contribute. It is the same for most everything that we use…Amazon book reviews, Netflix DVD reviews, things like that. It’s only when it’s “about us” that the majority of us tend to contribute: to our blogs, our LinkedIn profiles, our Twitter account, our Friendfeed discussions, our Facebook pages etc. I judge this is because it makes us directly visible…we are showing what we are interested in and taking ownership of our comments, activities etc. This is of course human nature…the majority of us have that “look at me” mentality and want to “connect with like minds” and it is, in many cases, that need for incoming voyeurism and participation that has driven the incredible shift to social networking we are encountering.

There are then the “servants for the community”. In this case I mean servants with the most positive connotation. Those who slave away on Wikipedia articles and don’t immediately have their names up in lights. You actually have to dig under an article to find out who wrote/contributed to it. It’s not upfront and center. On Wikipedia chemistry there are a very small number of dedicated individuals who contribute large blocks of time to working on Wikipedia to improve its quality and content. There is a Long Tail of contribution of course but you might be quite surprised by the small number of “primary” contributors. If you check out their Wiki pages however these individuals are recognized and commended within their own community of participation yet may never be known by the readers of the articles.

On ChemSpider we have a similar situation. There are a very small number of primary curators (I will name them: myself, Heinz Kolshorn and Barrie Walker; these people are enhancing ChemSpider literally daily). We have a somewhat larger group of secondary contributors who add a spectrum once in a while, annotate a record occasionally or curate out bad data. I would say this is about 30 other people. We also have people who provide us data to deposit and they do it willingly but don’t want to take a hands-on approach to depositing data onto the database.

When I was in the UK recently during my first week of employment with the RSC I gave a number of presentations. There was a lot of interest in what ChemSpider could bring to the organization and offer the community and a lot of discussions regarding “what if”. Of the audiences I would suggest that only a small portion actually laid their hands on the system to investigate its capability and an even smaller fraction chose to jump in, feet first, and use the system and participate fully. There was one spike in particular. During the evening after one of the presentations I noticed that one individual in particular was adding comments to individual records, questioning names, suggesting that structure layouts be changed and examining links to external resources. The first evening there were a few edits. The next night, even more, and since then this individual has continued, unabated, making edits and now enhancing the articles with new information, in this case YouTube videos.

David Sharpe is fairly new to the RSC and is one of those people who just cares. A silent contributor in the background (until today!) who is cleaning and enhancing ChemSpider for the sake of the community. To be clear, his work on these activities has been done in the evenings and weekends and this past weekend he was exchanging emails with me about adding “Element Videos” to the elements on ChemSpider. David’s been moving across the elements on ChemSpider and using the YouTube embed functionality to put the Periodic Table videos from the University of Nottingham into the Description section of the appropriate records.

Check out for example the video for Sulphur here. As we move forward we will layer on a recognition system for individuals contributing to ChemSpider so that we can track the spectral depositions, curations and so on. We believe that such efforts warrant recognition and applause. Of course some will choose to be anonymous and remain in the background making their difference in a silent manner. We honor you all.


Oh boy do we have a lot of things to do with ChemSpider. Not only now, while shifting ChemSpider to the RSC infrastructure, but in the future as we do the work necessary to make ChemSpider the primary internet resource for structure-based chemistry. We don’t have small eyes in terms of what we want to deliver to the community. Far from it…we have big eyes and big ideas regarding what is possible and even, in most cases, how to get there. What is clear is that we need the appropriate skill sets to make it happen. At present all ChemSpider platform development work is done by our team over here in the US. We are looking to add a team member into the RSC Offices in Cambridge. We’re looking for someone with established Cheminformatics skills to work with us. They need to have an established track record in working in the field of Cheminformatics, have a deep knowledge of handling chemical structures, experience in working with web-based systems and, of course, have a big appetite for making a difference and want to work with a fast-moving team. If you’re interested in talking with us about the opportunity ping me at antonyDOTwilliamsATchemspiderDOTcom.


There are a small number of primary chemical vendors serving the industry. These include companies such as Sigma Aldrich, Spectrum Chemical, Alfa Aesar, ThermoFisher and many others. There are also thousands of smaller companies serving the industry with their chemicals. Their catalogs can vary from a dozen to a few hundred chemicals but rarely number into the 10s of thousands offered by the larger companies. The large chemical companies offer excellent services in terms of delivery of catalogs to the door and circulation of updated CDs of information. I find the Aldrich catalog an excellent tool and have one on my desk, underneath my Merck Index.

Those smaller chemical companies are in the long tail of suppliers that the majority of chemists will never even hear of. Not unless there is some way for those suppliers to deliver their message regarding their list of products, availability and overall their existence, to interested parties. In China specifically there are many hundreds of small chemical companies popping up now. They cannot afford to market themselves via CD distribution and catalogs to their potential userbase and have to depend on their website to market their wares. They likely deposit their collections to the Available Chemical Directory from Symyx (a GREAT product and with a lot of quality work going into it in the background!), maybe into ChemACX from Cambridgesoft, onto ChemExper or onto the eMolecules site. Some of these offer up to date pricing and procurement systems while others offer simply “Get me a Quote” services whereby a chemist can request a quote directly from the vendor for the material of interest.

ChemSpider has been depositing chemical compound collections for chemical vendors, both large and small, for many months. The word seems to have got out that there is value in doing this. Despite the fact that we do not have, at present, the ability to list real-time pricing or availability for compounds, chemical vendors appear to be deriving value from the listings and chemists are finding chemicals for purchase via ChemSpider.

If there is a certain small molecule chemical vendor that you think we should list on ChemSpider, let them know to contact us OR point us to their URL and we will contact them. One example of data added just today is the data set, small though it is, from Asiaron. They offer rich compound pages like this and are a good addition to the database.


I have ChemMobi running on my iPhone now and, I am happy to say, it looks just like it should. While visiting the RSC in Cambridge a couple of weeks ago I had a chance to hang out with James Jack, the Symyx consultant responsible for developing ChemMobi. That’s him on the left. No, that’s not him trying to hunt sharks with hand held harpoons, it’s him driving the “ChemSpider punt” in a race against the IT team from the RSC. Since we weren’t locals it seemed appropriate to challenge us to a speed punt down the river. This was of course preceded by the imbibing of adequate amounts of flavored water and juices.

Strangely enough all of us in the ChemSpider punt did appear to have some undiscovered talents for punting. We very quickly lost the IT team back at the “juice house” and found them when we had finished our loop back from our destination. We realized that we had an unfair advantage since we had adopted a strategy of punting from the surface of the vessel. They had not told us that they were doing the whole race in their own way…pushing with a pole while immersed. That’s our colleague Doug Spooner from the IT team showing us how to do it “IT style”.

ChemMobi will soon be posted to the App Store for you all to download and use. I’ll let you know when…hopefully within a week. All glory, love and adoration for the App should go to James Jack and to Symyx for allowing him to do what he does best…get creative with software and structures!


It’s been a long time since I blogged here on the ChemSpider blog. Now I am officially an employee of the Royal Society of Chemistry and have spent a week in Cambridge meeting my new colleagues, discussing the transfer of ChemSpider to their servers for hosting and working on plans for a relaunch of ChemSpider later in the year. More about that later. I’ll be back in action on this blog in the coming week.

I actually write on two blogs. This one will now be dedicated to ChemSpider activities specifically and focus on new functionality, plans and vision for ChemSpider as a service. My other blog, the ChemConnector blog (www.chemconnector.com/chemunicating), will be more of a personal blog, covering my views of cheminformatics, activities in Chemistry and Science, Open Science, Open Access and Open Data and other things that interest me.

Glad to be back and looking forward to connecting with everyone again.


A couple of days ago I asked whether readers could see any issues with the structure of Micrococcin P1 published in the C&E News article this week. A few people took a stab on blog and off blog but only Stuart Cantrill from the Nature Publishing Group got it right. One double bond in the wrong place. Subtle, but rather important. General structure drawing tools will help with things like this. For example, a human might not see the issue in the structure of Taxol to the left very easily. Software tools designed to flag valency issues will show the issue easily.

In the expanded image the pentavalent carbon is marked. The same type of tools would have shown a positive charge on the sulphur in the ring for the incorrect structure of Micrococcin. In the same way, software tools can recognize charge imbalances and incomplete stereochemistry.
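To illustrate the kind of valency check mentioned above, here is a minimal sketch in Python. It is not the algorithm used by any particular drawing package. It assumes a molecule given as a list of element symbols plus explicit bonds, uses a small hypothetical table of common valences for neutral atoms, and ignores charges, implicit hydrogens and stereochemistry.

```python
# Common maximum valences for a few neutral atoms (illustrative table only;
# real tools also account for charges and elements with several valence states).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def find_valence_violations(atoms, bonds):
    """Return (index, symbol, bond_order_sum) for atoms exceeding MAX_VALENCE.

    atoms: list of element symbols, e.g. ["C", "H", "H"]
    bonds: list of (atom_index_a, atom_index_b, bond_order) tuples
    """
    # Sum the bond orders around each atom.
    totals = [0] * len(atoms)
    for a, b, order in bonds:
        totals[a] += order
        totals[b] += order

    violations = []
    for index, (symbol, total) in enumerate(zip(atoms, totals)):
        limit = MAX_VALENCE.get(symbol)
        # Elements missing from the table are skipped rather than guessed at.
        if limit is not None and total > limit:
            violations.append((index, symbol, total))
    return violations


if __name__ == "__main__":
    # A carbon drawn with five single bonds, as in the pentavalent example.
    atoms = ["C", "H", "H", "H", "H", "H"]
    bonds = [(0, k, 1) for k in range(1, 6)]
    print(find_valence_violations(atoms, bonds))
```

A check like this is cheap to run on every drawn structure, which is why a misplaced double bond that a reader can easily miss is caught immediately by software.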

I sent an email to the editor of C&E News when I noticed the structure issue but didn’t get a response. Nevertheless it is an advantage of online publications that images can be swapped out easily. This has been done for the online article here at this point and the change, while subtle, is there (shown below).

The structure is now on the ChemSpider database here.
