I will be presenting at the OpenSciNY 2010 conference on May 14th. OpenSciNY is a free, one-day conference on the impact of publicly accessible scientific tools & resources, open access publishing in the sciences, and open data/notebook efforts. I am looking forward to spending time with the attendees interested in these areas and being on the agenda with my fellow presenters, most of whom I know personally and have presented with on numerous occasions. In these gatherings, and with such a common mindset, the future of Open Science and its impact and contributions to society are clear. While there is much work to be done the momentum continues to gather. The future of OpenScience is exciting, stimulating and fun to envisage. Come along to OpenSciNY and engage with us!
Every year, Chemistry World and Education in Chemistry offer an internship over the summer for a would-be science writer to gain some experience working with two of the best chemistry magazines around.
The position is for 8 weeks (start/end dates negotiable) and comes with a bursary of £1750 sponsored by the Marriott bequest.
Activities undertaken would include researching and writing blog posts and news articles and recording podcasts for Chemistry World, writing a feature article for Education in Chemistry and pieces aimed at sixth-formers. The intern will also help lay out and proofread the print issue of Chemistry World.
For more details see: http://www.rsc.org/AboutUs/rscwork/Sciencewriterinternship.asp
Applicants should be members of the Royal Society of Chemistry. You can join up as affiliates at www.rsc.org/join.
A couple of weeks ago I gave a talk at the Lawrence Berkeley National Laboratory at the end of the ACS meeting. It was great to meet the attendees and share some good conversations about Open Data, Open Science and our efforts with ChemSpider. The talk was turned into a screencast and is shown below.
We’ve been depositing a lot of new data into ChemSpider over the past few weeks. We’ve been adding millions of new compounds from chemical vendors, from RSC databases and articles, from updated government databases, contributions from academia and from some of the online Open resources.
Recently I sat in on the presentation of Rich Apodaca who talked about ChemPedia. Rich shares a lot of the views that many of us do about the value of having open resources of chemical compounds online and has contributed ChemPedia to the domain. On Slide 20 of his presentation Rich gave an overview of a Missing Service that needed to provide a number of capabilities. These were to issue an on-demand unique ID, expose a URL to link to the structure, support synonyms and integrate peer review. ChemSpider does all this with maybe one caveat…we expect the URL to include the ID…so http://www.chemspider.com/Chemical-Structure.2034.html or http://www.chemspider.com/2034 is the link to the structure that we assert is the structure of Xanax. If you want to add additional synonyms you can do so. If you want to curate, add comments etc. you can (peer review). If you want to add new compounds you can, and you are issued a new ChemSpider ID. I would agree that our IDs are not as distinct as those that Rich and ChemPedia are generating…but they are of a similar format to PubChem IDs, i.e. “just numbers”. Check out ChemPedia and contribute! We are taking advantage of the fact that Rich makes the data Open for download: we downloaded the last iteration (664 compounds) and deposited them to ChemSpider here.
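For anyone who wants to link to records programmatically, here is a minimal sketch of how the two link forms quoted above can be built from a ChemSpider ID. The function name `chemspider_urls` is made up for illustration, and the check that IDs are positive integers is my assumption rather than a documented rule.

```python
def chemspider_urls(csid: int) -> tuple[str, str]:
    """Build the long and short record links for a ChemSpider ID.

    The two URL patterns are the ones quoted in the post; the
    positive-integer check is an assumption made for this sketch.
    """
    if csid <= 0:
        raise ValueError("expected a positive integer ChemSpider ID")
    base = "http://www.chemspider.com"
    long_form = f"{base}/Chemical-Structure.{csid}.html"
    short_form = f"{base}/{csid}"
    return long_form, short_form


# Example: the Xanax record mentioned in the post (ID 2034)
long_url, short_url = chemspider_urls(2034)
print(long_url)
print(short_url)
```

Because the ID is "just a number", the link is nothing more than string formatting, which is exactly why the ID and the URL can be treated as one and the same.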
Tuesday morning at the ACS meeting here in San Francisco…two talks done, one 2 hour training session completed, one poster presented and two talks left to give before heading off to the Lawrence Berkeley National Laboratory to give my final talk before the dreaded red-eye home. I am so looking forward to sitting on a cramped plane overnight…
My presentations delivered so far are already on SlideShare and are linked below for display.
For the past few months we have been busily developing new functionality and capabilities for the ChemSpider platform with the intention of making navigation easier, enhancing integration to external resources, adding new rich data sources and providing access to brand new capabilities. This new functionality has been described in a series of blog posts published today and is outlined below.
The LBNL Library is hosting a seminar for researchers interested in online collaboration, data storage and curation, data exchange, crowdsourcing, and open access.
This seminar will explore ChemSpider (http://www.chemspider.com/) – a free access service providing a structure centric community for chemists and the richest single source of structure-based chemistry information.
March 24, 2010 – Wednesday
3:00 p.m. – 4:30 p.m.
Building 50 Auditorium, Lawrence Berkeley National Laboratory
Bring your laptop for a hands-on demo session. For non-Berkeley Lab personnel: Please contact Jeffery Loo (JLLoo@lbl.gov) by Monday, March 22, 12:00 p.m. for a visitor pass and shuttle bus directions. A visitor pass is required for entry into the Berkeley Lab by guests.
The increasing availability of free and open access resources for scientists on the internet presents us with a revolution in data availability. The Royal Society of Chemistry hosts ChemSpider, a free access website for chemists built with the intention of building community for chemists (http://www.chemspider.com/).
ChemSpider is an aggregator of chemistry related information, at present holding over 20 million unique chemical entities linked out to over 300 separate data sources. ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. It is also a public deposition platform where chemists can deposit their own data including novel structures, analytical data and synthesis procedures, and host data associated with the growing activities around Open Notebook Science.
This presentation will examine chemistry on the internet, the dubious quality of what is available and how the ChemSpider crowdsourced curation platform is fast becoming one of the centralized hubs for resourcing information about chemical entities.
We will also review our efforts to provide free resources for synthesis procedures, spectral data and structure-based searching of the chemistry literature and how chemists can contribute directly to each of these projects.
Following the presentation and a question and answer session, a hands on session showing how to search for, curate and deposit data on ChemSpider will be given for interested parties.
Antony Williams, PhD, is a leader in the domain of free access chemistry. He is the Vice President of Strategic Development at the Royal Society of Chemistry and is the host of ChemSpider, a free online structure centric community for chemists.
ChemSpider began as a hobby project in a basement and went on to become one of the most popular Chemistry websites with the highest quality of data available online. Antony spent over a decade in the commercial scientific software business as Chief Science Officer for ACD/Labs, one of the domain leaders in scientific software. He is an accomplished NMR spectroscopist with over 100 peer-reviewed publications. During his career he was the NMR Technology Leader for the Eastman-Kodak company and has worked in both academia and national government research institutions.
We are presently receiving sign ups for our training session on ChemSpider. The session will be on Monday afternoon between 4-6pm (details below). It is free to attend and we’d love to see you there if you are in San Francisco at that time. Sign up here…
Royal Society of Chemistry
How to get started with ChemSpider – Searching, Structure Deposition and Database Curation
Instructor(s): Antony Williams, VP Strategic Development ChemSpider
Where: Moscone Center
When: Monday, March 22, 4:00 PM – 6:00 PM
>> Click here to register for this workshop
This session will give the opportunity to learn more about how to search ChemSpider, how to deposit your structures and how you can participate in curation of the data. Presenter: Antony Williams, VP Strategic Development ChemSpider
I was sitting down today to review what presentations are coming up in the next few weeks and how much writing and travel was ahead of me. Ugh. Painful. During the next few weeks of conference season there will be a lot of talks and, as usual, a lot of late nights before the presentations to write new talks or modify existing talks. I will be at the ACS meeting in San Francisco this spring and will be giving four presentations, a poster and leading a training session on ChemSpider. The presentations are outlined below. Looking forward to seeing you there and it would be great to hear from any of you who would like to get together and connect about community chemistry over a coffee.
Presentation: Utilizing ChemSpider as a platform for education and exposure of student data to the community.
Educators and students now have access to rich internet resources of information. RSC’s ChemSpider is a community resource of structure-based chemistry delivering data including chemical compound collections, reaction synthesis procedures, physicochemical property and various forms of spectral data. ChemSpider offers the opportunity for the community to participate in populating, annotating and curating the data on ChemSpider. We believe that ChemSpider offers an opportunity for educators and students to participate in the ongoing development of a rich resource for the chemistry community. This presentation will suggest some potential uses of the ChemSpider website in terms of integrating into lesson plans. We will also outline how students can expose their structure and reaction-based research work via the ChemSpider platform for the benefit of the community and their online scientific reputation.
Presentation: ChemSpider – How An Online Resource of Chemical Compounds, Reaction Syntheses, and Property Data Can Support Green Chemistry
ChemSpider is an online database containing in excess of 20 million chemical compounds and associated experimental and predicted physicochemical data, reaction synthesis details and analytical data. A significant amount of the data contained within the database has been harvested and collated from a number of inventory systems and integrated to provide a centralized resource for the community. The ChemSpider database has the added benefit of being available for community deposition, annotation and curation. As a result it offers the potential for researchers to share their latest research with the public and participate in the creation of a rich resource of chemistry related information for the Green Chemistry community. This presentation will provide an overview of present capabilities and discuss the future vision for the platform.
Presentation: ChemSpider, how a free community resource of data can support teaching NMR spectroscopy
ChemSpider is an online database of chemical compounds, reaction syntheses and analytical data. Provided by the Royal Society of Chemistry, our intention is to provide a free internet resource of chemistry related data for the community. ChemSpider is unique in its role of allowing user depositions of chemical structures, synthesis procedures and analytical data and, in so doing, provides an environment for crowdsourced gathering of information. To date over 2000 1D and 2D NMR spectra have been deposited online by the community and are available for reuse. The data have been used as the basis of a spectral game whereby students can learn NMR by interacting with the data. This presentation will provide an overview of the tools and capabilities presently available on ChemSpider to support teaching NMR in the undergraduate curriculum and will outline how the community can participate in enriching this resource for the benefit of all.
Presentation: Enhancing discoverability across Royal Society of Chemistry content by integrating to ChemSpider, an online database of chemical structures
The ability to query across a chemistry publisher’s content using chemical structure searching can dramatically enhance discoverability. RSC has been applying a number of procedures to integrate RSC’s ChemSpider community resource with our published content and databases. These include: 1) entity extraction procedures 2) chemical name conversion procedures using software algorithms and curated dictionaries 3) semantic markup and 4) crowdsourced curation processes. This presentation will provide an overview of the processes we have utilized in order to provide structure-based integration to RSC content. We will discuss our ongoing efforts to extend the approaches to the mining of data from the rich supplementary information sections of many RSC publications. Our intention is to provide access to synthesis procedures and analytical data and further enrich the ChemSpider database for the benefit of the chemistry community.
Poster: Utilizing ChemSpider as a platform for education and exposure of student data to the community
Recently I announced the release of ChemSpider SyntheticPages. We are honored to have an editorial board of chemists to assist in directing the project and they are introduced below:
A couple of days ago I came across a video on YouTube about “Water Marbles”. I’ve inserted it below…I recommend watching it…it’s excellent!
It’s excellent because by the time I had finished watching this I was both excited and confused. Confused because how could I not have heard of this experiment? And even if it were to work, why were those spheres so big and uniform? Excited because I’d been looking for some good kitchen chemistry to do with my kids and this would be a great example. I couldn’t really get my head around how the observations were working but on a rushed grocery expedition prior to going into ScienceOnline2010 #scio10 this past weekend I threw everything necessary into the grocery basket to repeat the experiment.
At ScienceOnline2010 I was involved in a number of discussions, as usual, regarding data quality, curation and assertions…this being based on my experience with curating the ChemSpider database. Today I sat in on a discussion entitled “Getting the Science Right: The importance of fact checking mainstream science publications — an underappreciated and essential art — and the role scientists can and should (but often don’t) play in it – Rebecca Skloot, Sheril Kirshenbaum, and David Dobbs.” It was an interesting exchange with comments such as “newspapers and magazines don’t check facts” and the urban myth that a one minute kiss burns 26 calories, while the fact is that a Hershey’s Kiss contains 26 calories.
Post ScienceOnline2010 I got home this afternoon to find my kids desperately wanting to do kitchen chemistry so, with pessimism, I started to work through the experiment with them. They mixed and stirred and cooled and heated. They got to see a lot of fizzing and to see crystals grow, which they thought was great. It of course failed dismally, as it has for many other people, including this guy, but they had a great time. In parallel I was doing some fact-checking to see whether or not to prepare them for disappointment.
There have been a lot of exchanges online about this topic of water marbles with chemists exchanging concepts about the science behind it if it did work. See here for example. The video has gone viral across many sites. Very impressive for a hoax really…and it did get me interested in doing kitchen chemistry. The truth is a lot easier though…and still good chemistry! Watch Steve Spangler in action below…
The polymer beads can be bought here.
There’s more Kitchen Chemistry to come but I think I’ll stick to some of Theodore Gray’s guidance …maybe time for some Mad Science at home…
I’m off to ScienceOnline2010 in a few minutes. It’s the last day of the conference and the experience has been a highly positive one. I’ve finally met people face to face that I have been connected with for over 2 years….and congruency is always good…they are as interesting, passionate and generally nice people face to face as they are online. I also managed to catch up with a number of old friends. I got to meet some new people focused on changing the flow of communication for ScienceOnline and working hard to do so. #scio10 is different….there’s an energy in the air that I haven’t experienced at any other scientific gathering other than SciFoo. This is an audience that is introducing me to social networking tools that I’ve never heard of…that doesn’t happen often. It has to be that over half the attendees are twittering. iPhones are everywhere. Flips are out capturing video in the sessions and are uploaded online shortly thereafter. The conversations are open, opinionated, full of energy and motivating. This is MY type of conference and I’m fortunate to live less than half an hour away.
The dinner event was fun and giggly, and five minute “Ignite” talks were given (I gave two…one on Curating Chemistry online and one with JC Bradley regarding the spectral game). The first of those is linked here and shown below.
Today I will be giving a live demo of ChemSpider to anyone interested and around at the end of the conference. It’s nasty weather so people might be leaving early.
I found myself a virtual running partner for my 1000 miles in a year challenge, assuming my calf muscle tear heals. We’re going to try and figure out how to raise money for asthma. If anyone wants to join us to form a virtual team, let me know…
Bora and Anton have done a tremendous job organizing the conference. Clearly there is a great team supporting them and the Sigma Xi facility is excellent. Terrific conference all around….glad I spent the weekend this way…
Wired Magazine is my favorite monthly read. I get a lot of magazines delivered to the house for our family to browse through and these include Popular Mechanics, Popular Science, Science Illustrated and then additionally Chemistry World, C&E News, Drug Discovery News and a lot of the other trade magazines. Nevertheless, after the books I am reading (and I am presently reading Dr Mary’s Monkey as a follow on regarding the SV-40 cancer causing monkey virus in polio vaccines) Wired magazine is always the next thing I pick up. It’s an easy read, with some great short snippets for when I’m sitting on a stationary bike flipping pages and some long interesting articles, always well written. I recently read an old Wired magazine that had been on my stack for a few weeks and wish I’d read it earlier. We’ve been discussing the importance of user interface on ChemSpider and its impact and influence on the users of the website. This connected to the article on Craigslist that was covered in Wired Magazine.
Now, if you don’t know what Craigslist is then how about eBay? I’ll assume you know, and use, eBay. I use eBay…I like it. I’ve used Craigslist and like it, but for a different reason than I like eBay. Here is an interesting statement about Craigslist from the article: “With more than 47 million unique users every month in the US alone—nearly a fifth of the nation’s adult population—it is the most important community site going and yet the most underdeveloped.” The article goes on to tell the story about how confusing the site is, how poor the aesthetics are and how non-Web 2.0 it is in terms of integration, access etc. I recommend it as a fun read, if nothing else to get a handle on Craig Newmark, the interesting (and VERY rich) man behind the initial concept. As a historical article regarding how early technology can morph over time into something more flashy but not necessarily more successful it’s a great read. Wired was convinced that people would want to give some input on how Craigslist should be improved and set up their Extreme MakeOver: Craigslist Edition for user comments. I doubt that Newmark and colleagues will pay much attention and, based on stats available to date, they don’t need to.
Another interesting read is a separate article regarding eBay vs Craigslist and the fact that Google and Microsoft actually tried to get into the same sector and both failed. What’s the magic, the secret sauce, the USP (unique selling point) for Craigslist? I’ve read a number of suggestions but am not sure of the conclusions. I think it’s a combination of: 1) old and less complicated technology for novice users (searching means scrolling in a lot of cases) 2) traction…it’s been around a long time and 3) price for people to post ads. The bottom line though, of relevance to our discussions, is that “it ain’t the user interface!”.
Let’s be honest, technology is fun, especially when you work in our domain of building an internet for chemistry. Over the past few years I have upgraded from computer to computer, operating system to operating system (with Vista the worst transition but now loving Windows 7), from browser to browser (I have three installed: IE8, FireFox and Google Chrome, with FF my preferred). I would say that while I am not at the bleeding edge of technologies I have access to more advanced systems than the majority of users in schools, homes and the rest of the world, especially when taking into account that I have good, solid high speed access, both wireless-N and cabled, in our house. If you truly want to see how a site works in the “hands of the masses” it is necessary to look at it on another computer where the latest and greatest browser isn’t installed and which is still running on 512 MB of RAM. In my new “personal adventure” of running 1000 miles in a year I am using the NikePlus website to track my performance but it uses so much Flash, so much animation and “looks” so modern and beautiful that I am struggling to use it even on my most recent laptop. It needs a “dumb down” button (maybe it’s there but I’m dumb enough to not see it).
We know we need to change some of the ChemSpider website for ease of navigation, for ease of use and to cater for all of the browser dependencies that we see with just things such as copy and paste of long strings, word wrapped strings etc. They can all be fixed. We know that there is an abundance of functionality on the site that only a fraction of the user base will care about. Our focus since starting the ChemSpider project was to establish a high-quality dataset (much progress but a long way to go), provide useful functionality to our diverse user base (lots in place, more to add, some to remove), and provide a “successful” experience, meaning that users could get answers to the questions/queries they asked and that the experience wouldn’t be so challenging or mundane as to provide no value. Feedback to date suggests we’re doing okay but we’d like your feedback. Ultimately I’ll likely assemble this into a SurveyMonkey questionnaire but for brevity and early feedback I am interested in your comments on some of the following questions:
1) What is your favorite piece of functionality on ChemSpider?
2) What is your LEAST favorite piece of functionality on ChemSpider?
3) If there was one new function you would like to see added/improved what would it be?
4) Assuming a scoring system of 1 to 10, 10 being the best, how well does the ChemSpider interface support your usage of the system?
5) Which public dataset would you most like to see integrated to ChemSpider?
Any other comments are of course welcomed. We will be working on usability over the next few months and it’s hard to please everybody but we’ll do what we can with the resources we have. A Survey Monkey questionnaire will show up in the future with more questions. Watch this space and check out the Craigslist article…I think you’ll enjoy it.
For those of you who read this blog you will be aware that it can take a lot of time just to get a single chemical curated against its correct associations of chemical names and synonyms. I’ve shown this for vancomycin, Taxol (1,2,3), Ginkgolide B and it is presently underway with Digitonin, though not yet complete. Working on one structure is hard enough. Building a database of a few thousand curated structures is difficult work yet the EBI did it, and did it well when they built ChEBI. ChEBI is also not perfect as we discovered working on vancomycin and I still find occasional small issues.
The EBI recently released the ChEMBL database. This is a much bigger resource as described at the home page for the resource here. The site states “ChEMBL is a database of ca. 500,000 bioactive compounds, their quantitative properties and bioactivities (binding constants, pharmacology and ADMET, etc). The data is abstracted and curated from the primary scientific literature and the data made available due to funding by the Wellcome Trust.” It is MUCH harder to curate larger databases and 1/2 a million records is a challenge.
I downloaded the data from the FTP site and took a browse of the data. There are definitely structures in the data file that we don’t have in ChemSpider but I found an issue with charge balance for many hundreds of records where the counterions were charged (for example, chloride or bromide) but the primary component was neutral. An example is here where the compound is named as a hydrochloride but the compound has the chloride anion. I think this likely arises from treatment with some type of standardizer so it should be a matter of changing the standardizer settings and regenerating. We deal with over 23 million compounds and have been through such issues ourselves when it comes to generation of structure images.
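The kind of charge-balance check described above can be sketched in a few lines. This is an illustration only and not how we or the EBI actually process the data: the `Component` class, the record identifiers and the formal charges are all hypothetical, and in practice the charges would be read from the structure file with a cheminformatics toolkit. Note also that a non-zero net charge is not always an error (some records are legitimately ions), so a check like this only flags candidates for a human to review.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One fragment of a multi-component record (hypothetical model)."""
    name: str
    formal_charge: int


def net_charge(components: list[Component]) -> int:
    """Sum the formal charges over all fragments in a record."""
    return sum(c.formal_charge for c in components)


def flag_unbalanced(records: dict[str, list[Component]]) -> list[str]:
    """Return the IDs of records whose fragments do not sum to zero charge."""
    return [rid for rid, comps in records.items() if net_charge(comps) != 0]


# Made-up records: a "hydrochloride" drawn as a neutral base plus a
# chloride anion (unbalanced) and one drawn as a protonated base plus
# chloride (balanced).
records = {
    "example-1": [Component("neutral base", 0), Component("chloride", -1)],
    "example-2": [Component("protonated base", +1), Component("chloride", -1)],
}

print(flag_unbalanced(records))
```

Only the first made-up record is flagged, which mirrors the situation described above: a charged counterion paired with a neutral primary component.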
For an example of a rich record in ChEMBL take a look at this record showing the target, assay, activity type, value and reference all listed. ChEMBL is sure to be an invaluable reference for the Life Sciences.
I have never met Warren DeLano. But, I have respected him from afar for a long time. Warren is the developer of PyMol, an Open Source molecular visualization system that has made enormous contributions to the community and can produce stunning visualizations of Proteins. His impact on the field of protein visualization has been recognized many times by the community and his tools are used in labs all over the world. He has garnered respect across our community.
A few months ago I had the opportunity to spend an hour on the phone with him after he had made such positive comments when the RSC acquired ChemSpider. We talked about Open Science, Open Source and models of business. We talked about the adventure of trying to change the world one step at a time by making our humble contributions to the world of science. By the end of our conversation I knew that when I met Warren we would be able to talk for many more hours as we shared many common views and, primarily, a want to make a difference.
Today I learned of the sad news that Warren had passed away. Despite the fact that I hadn’t yet managed to sit with Warren face to face I was immediately saddened. My truth is that there is a specific type of shock I feel when someone younger than myself passes away. Warren and I talked about the impact of our chosen career paths on our relationships with our wives and the hours spent in front of a screen instead of spending them with those we share our lives with. We both reflected on the fact that we have given too much to the keyboard over the years, driven by our need to make a difference. Warren’s hard work and superior programming skills are paralleled by the fact that he was clearly a charitable contributor to science, giving his code away to the world, and was, even based on only one phone call, a kind man.
My thoughts go out to his wife and family for their loss.
I’ve been in discussions with JC Bradley and Andy Lang about the Open Notebook Science Solubility Data project. Specifically we’ve been comparing logP predictions from the CDK versus those listed on ChemSpider. We actually have six values of logP listed for some records. For example, for toluene we have 4 predicted values, 1 experimental value from a database and 1 experimental value from a publication. These are shown below:
There are three predicted logP values from three different algorithms (ACD/LogP, XlogP and AlogPs) as shown at the top of the figure. There is a predicted value and a database value from the EPISuite from the EPA (middle of the figure) and there is a LogP value from a publication with the link out indicated by the arrow (this datum was deposited by Egon Willighagen when he deposited the data from his publication). If you examine the list of data, both experimental and predicted, you will see a general value of around 2.65+/- error. This should be compared with the CDK value listed in the ONS spreadsheet that gives a predicted value of 0.64. This was the primary reason that we were discussing the comparison…the values of predicted logP from CDK were different from the predicted values listed on ChemSpider for a number of examples in the spreadsheet.
Egon and I exchanged a couple of emails discussing the fact that logP predictions could be generated by a number of parties if there was a good Open Data training set available. A recent publication entitled “Calculation of Molecular Lipophilicity: State of the Art and Comparison of Log P Methods on More Than 96000 Compounds” performed a thorough analysis of different logP methods on a very large dataset. The publication is available online here. They compared “the predictive power of representative methods for one public (N = 266) and two in house datasets from Nycomed (N = 882) and Pfizer (N = 95 809). A total of 30 and 18 methods were tested for public and industrial datasets, respectively.” During the work they derived a simple equation based on the number of carbon atoms, NC, and the number of hetero atoms, NHET: log P = 1.46(±0.02) + 0.11(±0.001) NC – 0.11(±0.001) NHET. This equation was shown to outperform a large number of programs benchmarked in this study. This would certainly be easy to implement on ChemSpider and, just out of interest, applying this equation to toluene (NC = 7, NHET = 0) gives us a value of 1.46 + 0.77 = 2.23. Compare this with the values listed above.
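To show just how easy the equation is to implement, here is a minimal sketch in Python. The function name `simple_logp` is my own, the coefficients are the central values quoted from the publication (the ± uncertainties are ignored), and I am assuming that "hetero atoms" means every atom that is neither carbon nor hydrogen. Counting the atoms from a structure is left out; the counts are passed in by hand.

```python
def simple_logp(n_carbon: int, n_hetero: int) -> float:
    """Estimate logP from atom counts using the equation quoted above.

    log P = 1.46 + 0.11 * NC - 0.11 * NHET
    Uncertainties on the coefficients are not propagated in this sketch.
    """
    if n_carbon < 0 or n_hetero < 0:
        raise ValueError("atom counts cannot be negative")
    return 1.46 + 0.11 * n_carbon - 0.11 * n_hetero


# Toluene, C7H8: seven carbons and no hetero atoms
print(round(simple_logp(7, 0), 2))  # 2.23
```

One consequence worth noticing is that, since the two coefficients are equal in size, each hetero atom exactly cancels the contribution of one carbon, so any molecule with as many hetero atoms as carbons is predicted to sit at the intercept of 1.46.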
Unfortunately there don’t appear to be many Open logP datasets available for people to use as training sets. Also, with the thorough work reported in the publication above, is it necessary to build yet another logP prediction algorithm? ACD/Labs have made their logP prediction software free for download (http://www.acdlabs.com/download/logp.html), the VCCLab software is available for free (http://www.vcclab.org/lab/alogps/), the EPISuite software is available for free (http://www.epa.gov/oppt/exposure/pubs/episuite.htm) and if you just want to predict a value for a compound not on ChemSpider then you can use the services here: http://www.chemspider.com/Services.aspx.
However, even though there are a lot of predictors available it still makes sense to gather data and provide it as an experimental dataset, made available as Open Data so that the developers of such algorithms can take advantage of structural diversity and fresh data to potentially improve their models. If you have any logP data available please point me to the data to download or contact me offline to discuss. We are presently working on enhancing our data model to provide improved access to experimental data on ChemSpider as well as access to the predicted data via web services. More to follow…
We get a lot of kudos for what we do with ChemSpider and we appreciate it. Sometimes there is an email that comes in that just makes me smile. One from this week is shown below…it’s nice to be appreciated!
“GOD BLESS you and your website! My classmate and I just wanted you to know that we appreciate your website to the UTMOST!! you saved us hours upon hours of work… we have been spending hours trying to figure out a structure from our lab reaction product. THANKS for the awesome website, we are now able to further our knowledge in organic chemistry!!!”
The ChemSpider blog has become very quiet in many ways. For that I am both saddened and realistic….we are very busy with working on improvements to ChemSpider both in the functionality and to the overall infrastructure. You will see these roll out in the near future. I personally am traveling a lot more than previously and engaged in the writing of many articles and presentations. My backlog of articles is over half a dozen and more than that in presentations to prepare. Add to that H1N1 through the household, one little boy in our family with pneumonia and my intention to participate in a mini-triathlon next year and to see that I am distracted would be an understatement.
I hope this “bad news” post is the first of many to get me active on the blog. This bad news post is actually a good news post, we hope. We have been seeing some conflicts between backups and server performance and need to apply some Microsoft Hotfixes and will be taking the system down on Wednesday for about 30 minutes as announced on the HomePage. Our apologies if it causes a disruption.
Service Interruption 07/10/2009
Due to essential maintenance ChemSpider will be unavailable during the following period:
07/10/2009 from 10:30 GMT until 11:00am GMT
We apologise for any inconvenience this may cause.
The ACS meeting in Washington was good for ChemSpider and the team in a number of ways. ChemSpider garnered a lot of attention so that was a relief. More than that though was the fact that the ACS was the culmination of weeks of efforts by an extended team of people in the informatics group, our internal and external marketing groups and the development team. ChemSpider was “everywhere” at the ACS…it was really about “getting you there”…see the side of the bus below!
We showed a number of new things at the ChemSpider booth. We certainly had our new look and feel in terms of the logo and visual aesthetics. The two most exciting capabilities that we introduced, and the ones that had the majority of people smiling, were the integration to the SureChem patent portal described previously and our new integration to the PubMed web services. If you haven’t seen the PubMed integration yet you’ll likely appreciate this!
I can explain the process in detail but I think the video itself tells the story best. What we are doing is using validated synonyms to look up articles in PubMed. If there are cases where there are no PubMed articles it is VERY common that the synonym validation process will result in articles being recovered. This lends even more value to the structure-name curation process. The YouTube movie is below but an SWF form of the movie, easier to watch in my opinion, is here. Let me know which format you find better. It is easier to make YouTube only but I think for details SWF is better. Comments welcomed.
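For readers who prefer code to video, the general idea of a synonym-based PubMed lookup can be sketched in a few lines of Python. This is not the ChemSpider implementation, which is not described here. It is only an illustration that assumes the public NCBI E-utilities `esearch` endpoint, and the helper name `build_pubmed_query` and the example synonyms are my own. The sketch only builds the search URL and does not call the service.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumed here for illustration).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(synonyms, max_results=20):
    """Build an esearch URL that matches any of the given synonyms.

    Each synonym is quoted so that multi-word names are searched as
    phrases, and the phrases are joined with OR. Hypothetical helper.
    """
    term = " OR ".join(f'"{name}"' for name in synonyms)
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example with two validated names for the same structure.
url = build_pubmed_query(["aspirin", "acetylsalicylic acid"])
print(url)
```

The better the curated synonym list for a structure, the more names go into the query, which is why the validation work tends to surface articles that a single name would miss.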
I am writing an editorial piece at present that necessitates the communication of what types of data we can host from users if they choose to use ChemSpider as a platform to host their data and interesting chemistry pieces. For example:
Hosting Reaction Details: The Synthesis of cis-Bicyclo[3.3.0]octane-3,7-dione
Chemistry Movie: Photochromism in action
Spectral data in abundance: Spectra of aspirin (click on the green image to view)
Open Notebook Science report: An analysis of the spectrum of Cholesterol
List of publications: A long list of publications associated with cholesterol
The Linked Wikipedia Article: Xanax
We are just about to head off to the IUPAC Congress in Glasgow and unveil a spiffing new booth. In preparation for the unveiling of our new logo we’ve done some editing to the website and changed the look and feel of some of the pages. These are mostly cosmetic at present and there is little change to the core functionality of the site but we hope that some of the changes make the site a little easier to navigate.
This is the first work we are doing to improve the website and to roll out a redesign of the logo (look out for that logo at the ACS meeting in Washington in a couple of weeks…you’ll see it in a few places and we will have our own booth there too). Over the next few weeks we will be working further to improve the usability and flow of the website and to enhance the core functionality of the platform. Watch this space.
We welcome your feedback on the new logo and, if you don’t see it on the ChemSpider website, please refresh the stylesheet using Ctrl-F5.
ChemMobi, an application written by James Jack from Symyx, has finally been posted to the App Store. It can be downloaded for free and enables your iPhone to search both Symyx’s Discovery Gate and ChemSpider (using our web services). I’ve posted before about the work done by James (1,2) and it has now come to fruition with the first version of ChemMobi. If you are an iPhone user try it out and give us your feedback!
Since it was easy to do we will bring ChemSpider back online in Read Only mode for you to continue using if you need it. This will mean that the web services will all be returned also. The only things that will not be enabled are deposition, annotation and curation. In order to block these we have disabled login. While it will be possible to add comments please note that these will be dealt with on the RSC system following rollover to their systems.
Since the RSC acquired ChemSpider we have been working hard with the IT team in Cambridge to transfer ChemSpider from our servers and onto the RSC servers. This has been quite a significant undertaking as now we will be dealing with development servers, staging servers and live servers. This is a significant departure from the environment we have been working in for the past couple of years where code was published to the live environment for testing. Some would say this was risky but with the limited resources we had available at the time it was what it was….oh, and it worked!
We have already started testing the system on the RSC servers that will go live sometime early next week. At present the intended schedule is that we will be switching over sometime between Monday and Wednesday. Of course, this is an intention at present and, based on testing, this may change. For right now we have stopped depositions onto ChemSpider. If curation activities continue we will sync these over to the live server next week so no issues there. ChemSpider will go offline next week sometime and, as the actual date becomes clearer, the announcements will be updated.
Watch this space…ChemSpider is moving to the RSC servers and there will be disruptions in the next few days.
When I present on ChemSpider and talk about community participation one of the common questions is “how many people curate? deposit? annotate? records on ChemSpider”. It’s a low number for each but, in my estimation, it is in keeping with how we operate as individuals. If you compare the number of people reading Wikipedia articles to writing them I judge it has to be a pretty high ratio of likely >5000:1. Even if it’s 1000:1 you get the point. More people use than contribute. It is the same for most everything that we use…Amazon book reviews, Netflix DVD reviews, things like that. It’s only when it’s “about us” that the majority of us tend to contribute – to our blogs, our LinkedIn profiles, our Twitter account, our Friendfeed discussions, our Facebook pages etc. I judge this is because it makes us directly visible…we are showing what we are interested in and taking ownership for our comments, activities etc. This is of course human nature…the majority of us have that “look at me” mentality and “connect with like minds” and it is, in many cases, that need for incoming voyeurism and participation that has driven the incredible shift to social networking we are encountering.
There are then the “servants for the community”. In this case I mean servants with the most positive connotation. Those who slave away on Wikipedia articles and don’t immediately have their names up in lights. You actually have to dig under an article to find out who wrote/contributed to it. It’s not upfront and center. On Wikipedia chemistry there are a very small number of dedicated individuals who contribute large blocks of time to working on Wikipedia to improve its quality and content. There is a Long Tail of contribution of course but you might be quite surprised by the small number of “primary” contributors. If you check out their Wiki pages however these individuals are recognized and commended within their own community of participation yet may never be known by the readers of the articles.
On ChemSpider we have a similar situation. There are a very small number of primary curators (I will name them: Myself, Heinz Kolshorn and Barrie Walker – these people are enhancing ChemSpider literally daily). We have a small number of secondary contributors who add a spectrum once in a while, annotate a record occasionally or curate out bad data. I would say this is about 30 other people. We also have people who provide us data to deposit and they do it willingly but don’t want to take a hands-on approach to depositing data onto the database.
When I was in the UK recently during my first week of employment with the RSC I gave a number of presentations. There was a lot of interest in what ChemSpider could bring to the organization and offer the community and a lot of discussions regarding “what if”. Of the audiences I would suggest that only a small portion actually laid their hands on the system to investigate its capability and an even smaller fraction chose to jump in, feet first, and use the system and participate fully. There was one spike in particular. During the evening after one of the presentations I noticed that one individual in particular was adding comments to individual records, questioning names, suggesting that structure layouts be changed and examining links to external resources. The first evening there were a few edits. The next night, even more, and since then this individual has continued, unabated, making edits and now enhancing the articles with new information, in this case YouTube videos.
David Sharpe is fairly new to the RSC and is one of those people who just cares. A silent contributor in the background (until today!) who is cleaning and enhancing ChemSpider for the sake of the community. To be clear, his work on these activities has been done in the evenings and weekends and this past weekend he was exchanging emails with me about adding “Element Videos” to the elements on ChemSpider. David’s been moving across the elements on ChemSpider and using the YouTube embed functionality to put the Periodic Table videos from the University of Nottingham into the Description section of the appropriate records.
Check out for example the video for Sulphur here. As we move forward we will layer on a recognition system for individuals contributing to ChemSpider so that we can track the spectral depositions, curations and so on. We believe that such efforts warrant recognition and applause. Of course some will choose to be anonymous and remain in the background making their difference in a silent manner. We honor you all.