Archive for April, 2008

I am off to Bio-IT in Boston this coming week and I am honored to have been asked to talk on ChemSpider. I wasn’t on the agenda as of 72 hours ago but was offered an opportunity as a result of a cancellation in one of the sessions. I’m looking forward to seeing what’s new in the world of informatics this year and what is due to be unveiled at Bio-IT. At SBS there was only one person in the room when I talked who had even heard of ChemSpider. I didn’t take offense…about 60 people went away informed! I hope for a similar opportunity at Bio-IT. The blog will be fairly quiet this week. Catch-up time is next week.

As an FYI, I recently wrote an article entitled “Public Chemical Compound Databases” in Current Opinion in Drug Discovery & Development 2008 11(3). The abstract is:

“The internet has rapidly become the first port of call for all information searches. The increasing array of chemistry-related resources that are now available provides chemists with a direct path to the information that was previously accessed via library services and was limited by commercial and costly resources. The diversity of the information that can be accessed online is expanding at a dramatic rate, and the support for publicly available resources offers significant opportunities in terms of the benefits to science and society. While the data online do not generally meet the quality standards of manually curated sources, there are efforts underway to gather scientists together and ‘crowdsource’ an improvement in the quality of the available data. This review discusses the types of public compound databases that are available online and provides a series of examples. Focus is also given to the benefits and disruptions associated with the increased availability of such data and the integration of technologies to data mine this information.”

It’s not an Open Access article but it’s out there if anyone is interested or is subscribed. Enjoy.


Those of you frequenting the blog will know that we have a dedicated subset on ChemSpider for Molbank and that I have found the MDPI management and editorial team a pleasure to work with. I discussed my wish to stay in relationship with them in a recent blogpost and, as stated in that posting, followed up with them to make them aware of an error in their article and the ongoing discussions in the blogosphere about their “openness”. In case the readers of the blog aren’t set up to catch the comments on the blogposts I am pointing to a comment made today by a member of MDPI.

“We are aware that our current MDPI copyright statement is not in line with the BBB definitions on open access. We are currently smoothly moving to a CC By Attribution License v3.0. Marine Drugs (http://www.mdpi.org/marinedrugs/) has already been published under that license since January 2008. IJMS (http://www.mdpi.org/ijms/) and other MDPI journals will start publishing under this license in the May respectively June 2008 issues. All previous content published by MDPI will be released under the CC By license within a couple of months on our new publication platform (now under testing). So this discussion about MDPI and open access will soon be part of history.”

My experience of working in the domain of creating a community for chemists is quite a simple one. If you want to know what a group is up to just ask them. Seems that MDPI has a clear path forward.


Those who frequent the ChemSpider blog will know that we have worked with JC Bradley and his team to support their Open Notebook Science work. We have added functionality for them and they have been frequent depositors of both structures and spectra. It seemed appropriate to give them their own dedicated website as we have done with many others of late. So, now, UsefulChem has a dedicated subset on ChemSpider.


Recently I posted about our intention to post the full Molbank articles on ChemSpider. PMR commented on my potential over-extension of their Open Access nature:

“PMR: I also support publishers who make their material available. I don’t want to appear churlish, but Molbank use what is effectively a NC (non-commercial) license and this is what concerned me (and others) when I posted about 1 year ago. I don’t think it has changed. So sorry, Antony, it’s not “as Open Access as they can be” especially if one has to ask permission to mount the material.”

He may be right. What I do know is that I prefer to get into relationship with the groups/people I work with in the community. Simply grabbing their content/data without some connection doesn’t feel comfortable. AND, I realize in these days of search engines and scraping that’s quite acceptable.

When I approached MDPI, the publishers of Molbank, they were gracious in their willingness to have ChemSpider support, integrate and utilize their content. This is contrary to some of my experiences with some other advocates of Open Data and Open Access where trying to get their “Open Data” is like pulling teeth. MDPI appear to be the opposite, in my experience.

I commented on Peter’s blog tonight:

“Regarding your comment “especially if one has to ask permission to mount the material.” I think that’s a comment on the fact that I asked permission? I asked permission for the reason that I am focused on building a community for chemists and this includes me staying in relationship with publishers. I think you know this about me from my previous comments about CrystalEye

“http://www.chemspider.com/blog/intention-to-scrape-crystaleye-content-and-staying-in-relationship-with-publishers.html”

I judge it’s a better way to Build the Structure Centric Community for Chemists on ChemSpider. So, while I didn’t have to ask for permission, I did. The result was an excellent exchange, newfound relationships and an opportunity to build an enhanced relationship WITH support and permission.

Many bloggers, it appears, assume that “concerned parties” read their blogs. For example, when you posted this: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1048 did you make the editors at Molbank aware of the error or did you just scrape their content and blog? I have adopted a new approach of late - when I see issues with people’s data, websites etc I inform them directly to help them clean up errors. I’ve done this for Drugbank, PubChem, a number of blogsites, and so on.

In case you didn’t inform them I will send them your blog link tonight…also to the original author since I’m sure they will appreciate it too. This, I believe, is being a member of the community and since the authors and the publishers are taking actions to contribute to the Open Access community it’s part of my personal charge to help.”

I have sent an email to the original author and to the MDPI editors with the hope they might clean the article or post an Erratum. This is what I feel is appropriate as an active member of the community. If you see errors on ChemSpider please do let us know directly. We have an “Add: Feedback” option on every record page and do pay attention to your input.


I’ve blogged previously about my view of webstat monitors such as Alexa and Compete. I trust more those things that we measure ourselves and can objectively analyze. Three of those measures are our own webstats, our Feedburner count for the blog and the number of registered users on ChemSpider who register to curate, annotate and add content. The numbers are as follows now:

Website stats: Over 4000 users per day

Feedburner: Oscillating between 475 and 525 people on a weekly basis

Registered Users: 728 as of today

We are now starting to see traffic to WiChempedia, a consistent stream of depositions, new links to external websites and an increase in the number of DOI linked articles. Overall, traffic is where we hoped it would be at this stage of exposure.


I’ve blogged previously about the work being done on PlayStations to support computational chemistry. Now, Zsolt Zsoldos, CTO of SimBioSys, announces that their “eHITS Lightning” program will be unveiled at Bio-IT in Boston next week. I sat in on his talk at ACS where he reviewed the initial results on the Cell processor in both the PlayStation 3 and IBM Blade servers. His talk is online as discussed on the SimBioSys blog. They are VERY impressive initial numbers. Is this the future…computational chemistry on gaming machines that can be purchased for about $400 and run free Linux? Why not…


I have blogged previously about our integration to IUCr. I am happy to announce that we have been busily curating their data and are presently active in indexing thousands of chemical structures and articles. We will definitely have problems in accurately representing organometallics with our present structure rendering and handling and due to the limitations of InChI. For the time being we are focused on organic molecules. As we index the molecules we are adding the Author and Title, linked via the DOI to the original article on the IUCr website, to the record. So, you will see something like this in the Supplementary Data section.

iucr-link.png

Expanding the index will grow the connectivities to IUCr structure by structure. We presently have >1300 structures on the dedicated website, iucr.chemspider.com. Enjoy.


Since day one ChemSpider has been focused on developing a social network via our intention to build a structure centric community for chemists. While I am a member of a number of social communities I felt it appropriate to approach someone who has both a passion and masterful knowledge of this domain. I had the pleasure of meeting Gerry Mckiernan at the ACS-New Orleans meeting early one morning pre-show. We chatted for an hour about our mutual passion for building community and Gerry graciously accepted my invitation to join our advisory group. Gerry’s details are below.

Gerry Mckiernan currently has primary responsibilities for Collection Development, Instruction, and Reference and Research Services in Chemical and Biological Engineering; Civil, Construction, and Environmental Engineering; Environment Sciences; Industrial and Manufacturing Systems Engineering; and Mechanical Engineering, at Iowa State University (ISU), Ames. He has been employed by the ISU Library since April 1987. He has served as the Museum Librarian at the Carnegie Museum of Natural History, Pittsburgh (1983-1987), and as an Assistant Librarian with the Library of the New York Botanical Garden in the Bronx, New York City (1978-1983), his hometown. His research interests have included alternative peer review practices and philosophies, emerging information technologies, and scholar-based innovations in publishing. His current research interests relate to Web 2.0 - the Participatory Web, most notably blogs, online social networks, wikis, and communities of participation.


Some of you may be aware of the Molbank Open Access Journal. I recently blogged about our dedicated website for this Open Access Journal described here. Murray-Rust has discussed MDPI journals previously and their nature of Open Access. I am happy to validate that they are as Open Access as they can be. They have given us the right to mirror their articles on our site and in the next few weeks we will do exactly that, host Molbank articles connected directly to the chemical structures. Watch this space for our expanding integrations with Open Access publishers.


I blogged previously about our intention to build a structure/substructure searchable version of Wikipedia. We declared we would call it WiChempedia. Since rolling out the new website we have had the ability to provide access to subsets of data (See Molecule of the Day and Molbank as two examples). With this newfound ability it became easier to rollout WiChempedia and the first version is now available at www.wichempedia.org.

The difference between ChemSpider and WiChempedia, for now, is the presence of the first paragraph of the Wikipedia text on the WiChempedia site and a link out to the original article on Wikipedia. An example is shown below. Notice the link to the GNU Free Documentation License.
wichempedia1.png

Hopefully we will receive feedback on the site quite quickly and get it out of beta at speed so please do let your colleagues know about it. We will design a new logo header shortly and we are aware that some minor typos in the data resulting from the scraping process have slipped in so we will resolve those too. An example of how much information is starting to be populated can be seen by looking at the record for Cocaine here. Here you will see the Wiki first paragraph content, a link out to a GC run on the Phenomenex website, a series of validated identifiers and an IR spectrum. The content continues to expand as we source more information.

I also point you to another implementation of a Wikipedia chemistry system, chempedia.net, that you might be interested in reviewing.


We made our web services available with the intention that third parties might take advantage of the capabilities. Waters recently integrated our MassSpecAPI web service to their MarkerLynx software.

To perform an online search of the ChemSpider database the user can set up a series of specific databases they might want to search. In the figure below the user has selected the Human Metabolome Database, Lipidmaps and the KEGG database.

waters1.png

Searches of the selected ChemSpider subset databases can be made based on either mass or elemental composition. Each search returns the compound name in the ID column and retains the link to the ChemSpider ID.
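To make the idea of searching a subset by mass or by elemental composition concrete, here is a minimal sketch in Python. It does not call the MassSpecAPI web service or reproduce the MarkerLynx integration; the `SUBSET` list, the function names and the default tolerance are all hypothetical, and the search runs over a tiny in-memory list purely for illustration.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotopes.
# Only a few elements are listed to keep the sketch short.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Hypothetical in-memory stand-in for a subset database.
SUBSET = [
    {"name": "Caffeine", "formula": "C8H10N4O2"},
    {"name": "Cocaine", "formula": "C17H21NO4"},
]


def parse_formula(formula):
    """Turn a simple formula such as 'C8H10N4O2' into {'C': 8, 'H': 10, ...}."""
    counts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(count) if count else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def search_by_mass(records, target, tolerance=0.005):
    """Return names of records whose mass lies within +/- tolerance of target."""
    return [
        r["name"]
        for r in records
        if abs(monoisotopic_mass(r["formula"]) - target) <= tolerance
    ]


def search_by_formula(records, formula):
    """Return names of records with the same elemental composition."""
    wanted = parse_formula(formula)
    return [r["name"] for r in records if parse_formula(r["formula"]) == wanted]
```

Comparing parsed compositions rather than raw strings means the order in which elements are written does not matter, which is one reasonable way to treat an elemental composition query.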

waters2.png

ChemSpider records can then be viewed by clicking on the View Hit Details. The chemical structure and associated information can then be reviewed online.

waters3.png

Web service integrations of this type have started to expand in number and you can expect to see others appearing very shortly.


A couple of days ago I blogged about building the first dedicated website for Molecule of the Day. To continue our “proof of concept” demonstrations in this vein we now unveil our first support of a free-access publisher. Molbank is defined to be an Open Access journal on Wikipedia but based on some of the conversations I have seen on Murray-Rust’s blog this is in question. As I have expressed previously I hope to stay in relationship with publishers as we navigate our way through building our structure centric community for chemists. I have exchanged numerous emails with the editorial team at Molbank and have found them very supportive of our integration so away we went.

The data was scraped from the Molbank website, specifically the titles, authors, URL link to the article and the molfile itself. A couple of scripts later and an SDF was constructed from the molfiles and the text. This SDF file was then opened and reviewed visually to remove “errors in the data”. There were a number of different types of errors and some examples are listed below:

http://www.mdpi.org/molbank/molbank2007/m558.htm includes HA and HB annotations

http://www.mdpi.org/molbank/molbank2007/m555.htm includes R groups - should be expanded

http://www.mdpi.org/molbank/molbank2005/m407.htm the mol file is for CH2=CH2

http://www.mdpi.org/molbank/molbank2005/m409.htm the mol file is for ethane

There are other examples and Rich Apodaca has made a number of similar observations previously.
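The step of combining each molfile with its scraped text into an SDF can be sketched as follows. This is not the script we used; the function names, the field names (TITLE, AUTHORS, URL) and the placeholder metadata are assumptions for illustration. The atom-count helper is a hypothetical pre-check for the kind of error listed above, where an article's molfile turns out to be a two-carbon placeholder; the actual review was done visually.

```python
def build_sdf_record(molfile, fields):
    """Combine one molfile and its metadata into a single SDF record.

    Each data item is written as '> <NAME>', the value, then a blank line;
    the record is closed by the '$$$$' delimiter.
    """
    lines = [molfile.rstrip("\n")]
    for name, value in fields.items():
        # Collapse whitespace: an embedded blank line would end the data item early.
        text = " ".join(str(value).split())
        lines.extend([f"> <{name}>", text, ""])
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


def atom_count(molfile):
    """Read the atom count from the V2000 counts line (4th line, first 3 columns)."""
    return int(molfile.split("\n")[3][:3])


# A minimal V2000 molfile for ethane, used only as test input.
ETHANE_MOLFILE = "\n".join([
    "ethane",
    "  hypothetical-example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5400    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

# Placeholder metadata; not taken from a real Molbank article.
record = build_sdf_record(
    ETHANE_MOLFILE,
    {
        "TITLE": "Example article title",
        "AUTHORS": "A. Author; B. Author",
        "URL": "http://example.org/article.htm",
    },
)

# Flag suspiciously small structures for manual review (threshold is arbitrary).
needs_review = atom_count(ETHANE_MOLFILE) < 3
```

Concatenating the records produced this way gives a multi-structure SDF that can be opened in a structure viewer for the visual review.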

Our belief is that we have created from this dataset a high quality, curated (but likely not perfect) dataset as a subset at molbank.chemspider.com. The structures show names, identifiers, supplementary info where appropriate and a link to the original article. An example is shown below for the linkages.

molbank21.png

Notice the link to the article from the data sources, from the supplementary info and the miscellaneous safety and tox data scraped from MSDS sheets online. We will now keep this dataset updated as Molbank expands. With the permission of the editorial staff we would be interested in extracting the analytical data also.

Our proofs of concept have shown that we can host different datasets on ChemSpider and we urge anybody interested in such a service to approach us for discussions.


Last week I gave two presentations - one at the American Chemical Society meeting in New Orleans and one at the Society of Biomolecular Screening meeting in St Louis. As I give public presentations or put together general informational presentations I will publish them to our presentations page for review. Both the ACS and SBS presentations are now available online for anyone to review and comment. Feel free to post your comments here on this blog post or provide feedback offline.


With the rollout of the new website we are now set up to provide DEDICATED websites to subsets of data. We will unveil a number of these in the next few weeks as examples of how we can host data with a dedicated purpose. As an example of a website I point you to a dedicated Molecule of the Day website. I have discussed Molecule of the Day recently. Now we unveil the website here. Notice the logo on the upper right hand side.

motd.png


Over the past year we have been interested in our website statistics and our growing traffic. I have blogged previously about Alexa and was challenged to review the Compete statistics. After growing in rankings for a few weeks we removed the Alexa widget and saw our rankings plummet. We then installed the Compete widget and saw ourselves go up the rankings quite dramatically before removing that widget and seeing our rankings decrease. Meanwhile, our own website statistics have shown consistent month to month growth with an average of about 4000 unique users per day at present (as shown in the figure below).

stats.png

Bottom line, based on our observations, neither Alexa nor Compete give anywhere near valid statistics. At the SBS conference in St Louis this past week I asked the audience, about 60 people, how many in the room knew of or had heard of ChemSpider. ONE hand went up…and that was someone I had informed many months earlier. As I expressed to the audience…this was not disappointing news to me…it was quite exciting to know what the potential growth is as people are informed of the service. I expect the growth to continue, especially after the visits to the ACS and the SBS.


To celebrate our one year anniversary since going live at ACS Spring in Chicago in 2007 we lifted the title of beta from ChemSpider and released a new website design. I’ve already received some positive feedback on the new design and on us growing the service over the past year. We welcome any and all feedback though. What do you think of our contributions over the past year? What do you think of our new website? Let us know!

And Happy 1st Birthday ChemSpider…


For anyone with an interest in Chem(o)informatics and who lives in the Research Triangle area I highly recommend taking advantage of the lecture below to listen to “one of the greats” - Professor Johann Gasteiger. I guarantee you will not be disappointed.

Chemoinformatics - Addressing fundamental topics of a chemist

Prof. Dr. Johann Gasteiger

Computer-Chemie-Centrum
Institut für Organische Chemie I
Friedrich-Alexander-Universität Erlangen-Nürnberg
Nägelsbachstraße 25
91052 Erlangen
Deutschland / Germany

Thursday, April 17th, 2008

3:00 PM - 5:00 PM

Kerr Hall, Rm. 2001, School of Pharmacy

University of North Carolina

Chapel Hill, NC 27599

Host: Prof. Alexander Tropsha

Sponsor: CECCR


Over the past year ChemSpider has been working hard to build a functional and stable platform for the hosting, deposition and curation of structure-based data. This is to form the foundation of our mission to build a Structure-Based Community for Chemists. Our deposition system is in place and well-tested. Our indexing of articles is proven, and continues. We have indexed multiple Open Access articles. We support the deposition of analytical data (spectra and CIF files) into ChemSpider.

It is now time to take this to the next level and I would like to extend an invitation to Open Access publishers to work with us to design an interface (preferably a web service) to facilitate direct deposition of data into ChemSpider. We’d like to design an interface where you can feed your articles in with Title, Authors, Journal reference, DOI and Abstract. We would associate the article with the chemical structures in one of two specific ways - 1) extract the chemical names from the title and/or abstract and convert on the fly to deposit and/or associate with structures on ChemSpider and 2) allow the publisher to pass us a series of SMILES strings, InChI Strings, molfiles or chemical names to deposit on ChemSpider. Based on what we have already done it is clear this process is feasible, and will require some manual intervention until we optimize processes. If we do this we can design an interface and input format that can be made public, reusable by other groups for the deposition of information into their systems and, potentially, move away from the need for extracting information out of PDF files (and other formats). The outcome of this work would be a freely accessible structure and substructure searchable index of Open Access articles with links back to the Open Access article. We are already indexing articles so, with permission from even the non-Open Access publishers we could use similar processes to index abstracts and make articles structure/substructure searchable based on titles and abstracts.
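As a starting point for discussion, the kind of input format outlined above could look something like the sketch below. Nothing here is an existing ChemSpider interface: the class names, field names and the list of accepted structure formats are assumptions drawn only from the items named in this post (Title, Authors, Journal reference, DOI, Abstract, plus SMILES, InChI, molfiles or chemical names), and the example article is a placeholder.

```python
import json
from dataclasses import dataclass, field, asdict

# Structure formats named in the post; the lowercase keys are hypothetical.
ALLOWED_FORMATS = {"smiles", "inchi", "molfile", "name"}


@dataclass
class StructureRef:
    format: str  # one of ALLOWED_FORMATS
    value: str   # the SMILES string, InChI string, molfile text or chemical name


@dataclass
class ArticleDeposit:
    title: str
    authors: list
    journal_reference: str
    doi: str
    abstract: str
    structures: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the deposit looks usable."""
        problems = []
        if not self.title.strip():
            problems.append("missing title")
        if not self.doi.strip():
            problems.append("missing DOI")
        for s in self.structures:
            if s.format not in ALLOWED_FORMATS:
                problems.append(f"unknown structure format: {s.format}")
        return problems

    def to_json(self):
        """Serialize the deposit, including nested structures, to a JSON string."""
        return json.dumps(asdict(self), ensure_ascii=False)


# Placeholder article with ethanol supplied in two representations.
deposit = ArticleDeposit(
    title="Example article title",
    authors=["A. Author", "B. Author"],
    journal_reference="Example Journal 2008, 1(1)",
    doi="10.0000/example",
    abstract="Example abstract text.",
    structures=[
        StructureRef("smiles", "CCO"),
        StructureRef("inchi", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ],
)
```

If the publisher supplies no structures, the list stays empty and the first route described above (name extraction from the title and abstract) would apply instead. A plain JSON document like this is easy for other groups to reuse, which is the point of making the format public.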

So, my question. Are there any Open Access/Free Access publishers willing to discuss the possibilities I have outlined? If any of you will be at the ACS meeting and would like to discuss please post a response here or contact me at the usual email address (antonyDOTwilliamsATchemspiderDOTcom) and let’s talk about building a disruptive and enabling technology for chemists around the world.


During a recent discussion about ChemSpider interest was expressed in whether or not ChemSpider would be supporting toxicity and safety data. It’s been on our list for a while but questions result in action…so, check out the following links. Scroll to the bottom to the supplementary information.

Sodium Acetate Trihydrate

sodium-acetate.png

Benzoyl Peroxide

benzoyl-peroxide.png

There are about another 3000 records with such information on the website now. Click on any of the question marks and up pops a dialog box explaining what the property is…and admittedly the example below is rather obvious!

help-boxes.png

Also, notice the wiki link wikilink.png which takes you out to the originating site for the data.

As an example of the process used to map the fields see below how we take the original fields and then map them to other fields to “homogenize”. Notice the meta info layer too, specifically the associated units.

mapping-fields.png

Mapping choices are made according to the pulldown menu shown below.

mapping-fields-2.png

The process to gather, map and publish this data has now been tested on two different datasets. It is not yet perfect but we improve with every iteration and, I believe, will shortly have an optimized process for scraping and publishing. I believe that the processes we are developing here will provide a smooth and highly functional system for gathering and depositing data, integrating it efficiently into our database, and making it easy to expand the data and knowledge associated with the structures on ChemSpider.
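The field mapping described above, where original field names are mapped onto a common set with associated units, can be sketched roughly as follows. The source field names, the target names, the units and the values are all hypothetical placeholders (not real safety data and not our actual mapping table); the sketch only shows the shape of the "homogenize" step and the meta info layer.

```python
# Hypothetical mapping: source field name -> (homogenized name, unit).
FIELD_MAP = {
    "Flash Pt": ("flash_point", "°C"),
    "Flash point": ("flash_point", "°C"),
    "LD50 oral (rat)": ("ld50_oral_rat", "mg/kg"),
}


def homogenize(raw_record):
    """Split a scraped record into mapped fields (with units) and unmapped leftovers."""
    mapped, unmapped = {}, {}
    for source_name, value in raw_record.items():
        target = FIELD_MAP.get(source_name.strip())
        if target is None:
            # Keep unknown fields so a curator can decide how to map them later.
            unmapped[source_name] = value
        else:
            name, unit = target
            mapped[name] = {"value": value, "unit": unit}
    return mapped, unmapped


# Placeholder values only; these are not real safety data.
mapped, unmapped = homogenize({"Flash Pt": "0", "Odour": "placeholder text"})
```

Keeping the unmapped fields rather than discarding them leaves room for the manual mapping choices made through the pulldown menu.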


Molecule of the Day (MOTD) is one of those fun blogs that the public will likely enjoy…and there’s enough there for even us chemists to remember just how much fun chemistry is. It seemed like a good idea to try out the ability to link URLs to ChemSpider on a few of the Molecule of the Day articles. With about an hour’s manual work the entire MOTD blog archive could be made structure searchable. As it is I’ve linked up a few tonight…about 60 seconds each to click on the MOTD blog post, search the structure in ChemSpider and paste the article title and URL. Voila…searchable Molecule of the Day. Some examples linked up…scroll to the bottom of the record:

Sodium Acetate Trihydrate

Benzoyl Peroxide 

1,3,5-Trioxane


Excuse the silence of late regarding this blog. There has been, and continues to be, a lot of work going on in preparation for the rollout of a new look and feel for ChemSpider, hopefully to coincide with the anniversary of our rollout at Spring ACS in Chicago last year. Fingers crossed we will lift the beta label next week for New Orleans.

In celebration (it’s all serendipity actually!) I will be giving a talk on Sunday in New Orleans, coincident with the day we went live. The details and abstract are below (of course…when the abstract was written it was a placeholder…I had no idea what would be presented because we didn’t know where we’d be).

PAPER ID: 1168843
PAPER TITLE: “ChemSpider: Building a structure-centric community for chemists”

DIVISION: Division of Chemical Information
SESSION: Cheminformatics: From Teaching to Research
SESSION START TIME: Sunday, April 6, 2008, 1:40 PM
DAY & TIME OF PRESENTATION: Sunday, April 6, 2008 from 3:40 PM to 4:05 PM
LOCATION: Marriott Convention Center, Room: Blaine Kern D

ChemSpider – Building a Structure Centric Community for Chemists

Antony J. Williams

Scientists commonly find themselves in a state of overwhelm in regards to the availability of information accessible to them. The distribution of resources now includes the entire space of the worldwide web, access to primary databases such as CAS and, commonly, a plethora of internally developed systems. While the web has provided improved access to chemistry-related information there has not been an online central resource allowing integrated chemical structure-searching of chemistry databases, chemistry articles, patents and web pages such as blogs and wikis. ChemSpider has built a structure centric community for chemists by providing free access to an online database and collaboration tool for chemists. The online database offers an environment for curating the data on ChemSpider as well as the deposition of chemical structures, analytical data and associated information and provides a significant knowledge base and resource for chemists working in different domains. An overview of present and future capabilities will be given.

On Tuesday I fly off to St Louis to the SBS Meeting presenting on ChemSpider and the ongoing collaborations around NISS/NCSU’s ChemModLab and SimBioSys’ LASSO. My talk is on Thursday morning.

 

How a Structure-Centric Community for Chemists Can Benefit Drug Discovery - Virtual Screening Experiments Utilizing a Publicly Accessible Ligand Database, QSAR Modeling Tools and a Virtual Docking Software Package

ChemSpider is an online database of over 20 million chemical structures assembled from almost a hundred data sources including chemical and screening library vendors, publicly accessible databases and resources, commercial databases and Open Access literature articles. Such a public resource provides a rich source of ligands for the purpose of virtual screening experiments. These can take many forms. This work will present results from two specific types of studies: 1) Quantitative Structure Activity Relationship (QSAR) based analyses and 2) In-silico docking into protein receptor sites. We will review results from the application of both approaches to a number of specific examples using the software outlined below.

The QSAR analyses utilize the ChemModLab environment which is a free, web-based toolbox for fitting and assessing quantitative structure-activity relationships. Its elements include a cheminformatics front end to supply molecular descriptors, a set of statistical methods for fitting models, and methods for validating the resulting model. Five molecular descriptor sets are used with 16 math modeling methods to give a total of 80 QSAR models. The input is a file of compounds and a text file for biological activity.

The in-silico docking experiments are conducted using a combination QSAR/Docking approach using the SimBioSys eHITS and Lasso software programs. The docking procedure allows for the screening of a complete molecular database to obtain the correct binding poses and estimated binding affinities. The ligand based screening tool utilizes a novel conformation independent 3D QSAR descriptor, ideally suited for scaffold hopping.
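The model count quoted in the abstract above (five descriptor sets combined with 16 modeling methods to give 80 QSAR models) is simply every descriptor set paired with every method. A small sketch, with placeholder names rather than the actual ChemModLab descriptor sets or methods:

```python
from itertools import product

# Placeholder names; the real descriptor sets and methods are not listed in the abstract.
descriptor_sets = [f"descriptor_set_{i}" for i in range(1, 6)]
modeling_methods = [f"method_{j}" for j in range(1, 17)]

# One QSAR model per (descriptor set, modeling method) pair.
model_grid = list(product(descriptor_sets, modeling_methods))
```

Here `len(model_grid)` is 5 × 16 = 80, matching the total given in the abstract.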

If you are interested in what we’re up to and where we’re going I hope we get to meet-up at one of these gatherings!
