Archive for December, 2008

The ChemSpider Journal of Chemistry Editorial Board is made up of nine people and the group of participants is listed below. We are on schedule to release a series of articles by the first week of January. At present we have a mixed bag of articles submitted to us by authors as well as a series of marked up articles from the Open Access Chemistry Journals.

Antony Williams (Editor-in-chief)

Antony Williams is the President of ChemZoo Inc. and the Host of ChemSpider. He has spent over a decade in the commercial scientific software business as Chief Science Officer for Advanced Chemistry Development (ACD/Labs) and during his tenure oversaw their product development, marketing and sales teams. He is an accomplished NMR spectroscopist with over 100 peer-reviewed publications. During his career he was the NMR Technology Leader for the Eastman-Kodak company and has worked in both academia and national government research institutions. He has recently taken his passion for providing access to chemistry related information and software services to the masses by hosting the ChemSpider service, working with the Wikipedia Chemistry system and advocating Open Notebook Science.

Board of Editors

Jean-Claude Bradley (Focus: Organic Chemistry and Open Science)

Jean-Claude Bradley is an Associate Professor of Chemistry at Drexel University. He leads the UsefulChem project, an initiative started in the summer of 2005 to make the scientific process as transparent as possible by publishing all research work in real time to a collection of public blogs, wikis and other web pages. Jean-Claude coined the term Open Notebook Science (ONS) to distinguish this approach from other more restricted forms of Open Science. Jean-Claude has a Ph.D. in organic chemistry and has published articles and obtained patents in the areas of synthetic and mechanistic chemistry, gene therapy, nanotechnology and scientific knowledge management.

Sean Ekins (Focus: Medicinal Chemistry and Computational ADME/Tox)

Sean Ekins (M.Sc., Ph.D. and D.Sc.) is currently Collaborations Director at Collaborative Drug Discovery Inc. He consults for several companies and is Adjunct Associate Professor, School of Pharmacy Department of Pharmaceutical Sciences, University of Maryland and Adjunct Professor, Department of Pharmacology, UMDNJ. He is on the scientific advisory boards of Assay Depot, Emiliem Inc. and ChemSpider, and on the advisory board for Chemical Informatics at Indiana University. Sean’s interests in computational and in vitro approaches have resulted in ~100 peer-reviewed papers and book chapters as well as three edited books to date. He also serves on the editorial boards of several journals.

Rajarshi Guha (Focus: Cheminformatics and Open Science)

Dr. Rajarshi Guha is a visiting Assistant Professor in the School of Informatics at Indiana University. His research focuses on the development of cheminformatics methodologies to address problems in chemical data mining and drug discovery in particular. His research makes extensive use of statistical methods and has been applied to a variety of biological systems. Along with algorithm development, he is extensively involved in cheminformatics software development, including development of toolkits, web services and integration of these into distributed infrastructures. As a believer in Open Source and Open Data, much of his research and software is available under Open Source licenses.

Robert Lancashire (Focus: Spectroscopy and Open Data)

Professor Robert Lancashire, BSc (Honours), Ph.D. (Monash), held a three-year I.C.I. Postdoctoral position at the University of Wales, Cardiff before being appointed to The University of the West Indies at Mona in 1979. He was promoted to Professor of Computational Chemistry in October 2002. His current research is aimed at applying modern computer methods, using personal computers and the Internet, to assist in the delivery of chemical information. His work with MDL Information Systems Inc. in producing a convenient spectroscopic display plug-in for Internet browsers provided the first widely used, public means of freely displaying chemical spectroscopic data. In 2006, MDL estimated there were over 2 million downloads of CHIME worldwide. In March 2006, he released the Java project JSpecView as Open Source. This has now been incorporated into ChemSpider. He serves on the International Union of Pure and Applied Chemistry (IUPAC) Committee on Printed and Electronic Publications (CPEP) as well as on its subcommittee for electronic data (SEDS). This subcommittee develops protocols (the JCAMP-DX format) for storing spectral data, which are used by all instrument manufacturers in their spectrometers.

Cameron Neylon (Focus: Biological Chemistry and Open Science)

Cameron Neylon is Senior Scientist in Biomolecular Sciences at the Science and Technology Facilities Council (UK) and an advocate of the adoption of modern communication tools and open practice in research. His research is generally based on the use of structural and biophysical techniques to unravel the chemistry of biological molecules and the use of biological systems as tools for the chemical modification of proteins. The main current focus is on developing the capability of small angle scattering and allied techniques for the structural analysis of biomacromolecules and their complexes. He writes on Open Science and the application of new tools for the communication and dissemination of research at his blog “Science in the Open”.

Igor Presniakov (Focus: Inorganic and Solid State Chemistry, Mössbauer spectroscopy)

Igor Presniakov is Senior Scientist at the Chemistry Department, Lomonosov Moscow State University (Russia). His research is in the synthesis and characterization of new inorganic materials. He has been involved in the preparation and study of materials with unusual electronic and magnetic properties, such as metal-insulator transitions, superconductivity or colossal magnetoresistance. He has more than 110 publications in international journals and has been invited to deliver lectures at international conferences on Solid State Chemistry, Inorganic Chemistry, Materials Science, Crystallography, Magnetism, Superconductivity and Colossal Magnetoresistance. He has participated in 9 Research Programs and Projects funded by the Russian Government and the European Community.

Alex Tropsha (Focus: Computational Chemistry and Molecular Modeling)

Alexander Tropsha is K.H. Lee Distinguished Professor and Chair of the Division of Medicinal Chemistry and Natural Products in the School of Pharmacy, UNC-Chapel Hill. He received his Ph.D. in Chemical Enzymology in 1986 from Moscow State University, Russia. His research interests are in the areas of Computer-Assisted Drug Design, Computational Toxicology, Cheminformatics, and Structural Bioinformatics. He has authored or co-authored more than 120 peer-reviewed research papers and reviews and co-edited two monographs in the area of computational drug discovery and cheminformatics. He is an elected member of the Board and vice-chair of the international Cheminformatics and QSAR Society.

Joerg Wegner (Focus: Structure-based Drug Design and Cheminformatics)

Jörg is working as a scientist in structure-based drug design, molecular modeling, and cheminformatics. He is interested in antiviral drug targets and drug resistance, especially for the human immunodeficiency virus (HIV). The open data, open source, and open standards (ODOSOS) community was critical for his early work as an open source project administrator, and for some of his publications. Later, this led to a beautiful relationship with other life-science-informatics community members. Now, he is contributing in his private time from the standpoint of a person working in industry, being fully aware that intellectual property and confidentiality are critical for industrial partners.

A few months ago I met with Adam Azman in Chapel Hill to discuss how the names in our ChemSpider database could be used to expand his Chemical Dictionary. It seemed that we would be sitting on a treasure trove of name fragments that could help him in his efforts. So, we supplied Adam with 1.3 million identifiers and Adam has worked for the last few months to generate his Chemical Dictionary. He extracted over 100,000 name fragments from our collection as he has described in his blogpost here.

Extracted from Adam’s blog are his so-called Administrivia: “The dictionary is licensed under the Creative Commons Attribution 3.0 License. … The dictionary is compatible for Microsoft Office (Windows or Mac), and Open Office (Windows or Linux). The install file includes instructions for upgrading old versions and installing it for the first time. The dictionary should be useful for all chemists. However, I am an organic chemist. Thus, the dictionary was created from an organic chemist’s mindset. It will probably be most useful for organic chemists.”

Adam has explained in detail how he did the work. I encourage you to read his post to fully understand the nature of the work and how much heavy lifting he actually did. It’s been a pleasure to help Adam and the community by supplying our own form of a “dictionary” to him for his particular treatment. It took a few hours of work from our side and months of hard work from him. I encourage you to take advantage of his efforts…if you are a chemist this is a real gift for the season. The dictionary can be downloaded from our site here.

Now I want you to consider timing. We are working hard on our ChemMantis project, a system for entity extraction and document markup. Part of this includes the generation of dictionaries for finding chemical names. We’ve already expanded our chemical dictionary using the database of identifiers from ChemSpider but for those of you working with other systems such as OSCAR3 or the other commercial markup systems dependent on chemical dictionaries you will likely find Adam’s contribution significant. Enjoy.

We are working hard to prepare the ChemSpider Journal of Chemistry for prime time and as a result the ChemMantis service will be disrupted from time to time and will go offline. We are choosing to do this over the holiday season while the majority of you are enjoying the festivities. We are hoping that our work on ChemMantis doesn’t disturb those of you who have been reviewing our marked-up articles. Please bear with us through the holiday season as we upgrade the system to support the journal.

When I was at the SciFoo meeting earlier this year I got very excited about the Google Datasets project. I must admit that my creative spirit and need to hang out with innovators has, for years, called out to me to “Take Chemistry to Google”. When I left SciFoo I left with a hard drive to put data onto. I had great ideas about using the ChemSpider dataset of InChIs and CSIDs to connect chemists. I had hoped to put the data into the Google Datasets Project and actually work with Google to “do something” with them rather than just host them for other people to download. If I do a search on Google today I get the following result…let me know what you get! I’ll admit my naivety on this but maybe there is a limit on the number of hits shown (David Bradley…any ideas?)

Considering that part of the story for InChI, and I have given the story many times myself (!), is that the internet can be made structure searchable by InChI, this is a limited result set, especially considering that there are 21.5 million of them on ChemSpider. Then there’s PubChem, Drugbank, and so many more.

My hope was that Google might be interested in connecting Google Scholar to structure searching and work with us to enable it. Couldn’t get anyone interested. I was in California for a week and asked whether I could stop by and talk about ChemSpider and how we could help Google with Chemistry – no interest. Overall I will say that I couldn’t get any traction with Google about Chemistry and it’s a great shame. I’ve heard similar things from others. One guy who used to be at Google who WAS interested in Chemistry was Simon Quellen-Field who runs the Sci-Toys website. I think Google needed an advocate for Chemistry in their Datasets Team so that it could have been more than just hosting data but rather doing something WITH the data for the community.

I’m disappointed that the project has come to an end since I was hopeful for its purpose and its impact. I think that someone else will pick it up. If not, then they should…

The letter said…

“Thank you very much for trying out Google Research Datasets, providing interesting datasets, and giving us extremely useful feedback. We have learned a lot about the issues facing researchers and dataset producers from this testing period.

As you know, Google is a company that promotes experimentation with innovative new products and services. At the same time, we have to carefully balance that with ensuring that our resources are used in the most effective possible way to bring maximum value to our users.

It has been a difficult decision, but we have decided not to continue work on Google Research Datasets, but to instead focus our efforts on other activities such as Google Scholar, our Research Programs, and publishing papers about research here at Google.

The Google Research Datasets service will remain active until the end of January 2009 during which time any datasets may be downloaded. For those datasets that are impractical to download, we will also happily provide interested users with a copy via hard drive shipment.

Once again, we’d like to thank you for helping us test Google Research Datasets, it’s been a very useful experience, and we look forward to finding new ways to provide you with useful services in the future.”

The editorial board for the ChemSpider Journal of Chemistry is growing and has expanded to 6 people:

Sean Ekins (Focus: Medicinal Chemistry and Computational ADME/Tox)

Jean-Claude Bradley (Focus: Organic Chemistry and Open Science)

Rajarshi Guha (Focus: Cheminformatics and Open Science)

Alex Tropsha (Focus: Computational Chemistry and Molecular Modeling)

Robert Lancashire (Focus: Spectroscopy and Open Data)

Joerg Wegner (Focus: Structure-based Drug Design and Cheminformatics)

I hope to round out the board in the next few days and move forward with gathering manuscripts for publication and review.



Collaborative Drug Discovery is gaining increasing traction as a collaborative platform for scientists to work together on Drug Discovery. They provide “a web-based software platform to organize preclinical research data to help scientists advance new drug candidates more effectively.” Certainly the support of the Gates Foundation and their investment of almost $1.9M validates their approach and the importance of their work: Collaborative Drug Discovery Receives Gates Foundation Grant to Support the Development of a Database to Accelerate Discovery of New Therapies Against Tuberculosis.

I have started to work with their platform and compliment them on the ease-of-use, the aesthetics and the intention of their work. I am not managing any of my own data on the platform yet but over the next couple of weeks hope to start actively managing some data that I am collaborating on with one of our editorial board. In order to execute on their mission CDD has to provide privacy and security for certain data but in parallel has made available public access data (Click on the thumbnail for a view of their present public access data). There will be those who will likely criticize that not all of their data are Open, but as I have explained myself previously, it is the decision of the depositor to declare data Open Data. In the case of CDD their business model, the wishes of their users and the very nature of the drug discovery that their users are engaged in demand that they offer a secure and private platform in parallel to their Public Access offerings. Their approach works.

We have been working with CDD to allow their users to access ChemSpider directly from within the CDD platform. This is in place now and has been discussed in a recent blog post. The integration in their interface is clear. See the entire blogpost for more details. We look forward to working with CDD in the future. Their approach is fresh, innovative and gaining a lot of support from very significant names in the arena of drug discovery.


I am happy to announce early members of the editorial board for the ChemSpider Journal of Chemistry. To date four people have stepped up to serve the ChemSpider community. These are:

Sean Ekins (Focus: Medicinal Chemistry and Computational ADME/Tox)

Jean-Claude Bradley (Focus: Organic Chemistry and Open Science)

Rajarshi Guha (Focus: Cheminformatics and Open Science)

Alex Tropsha (Focus: Computational Chemistry and Molecular Modeling)

The editorial board will likely grow in the next few days but I am fortunate that people I trust and respect, and who care about the development of a community for chemistry, have chosen to help us with the journal.

Things are coming together for the release of ChemSpider Journal of Chemistry by the end of 2008. We are presently developing an editorial board and invitations have been sent out to those who support what we have been trying to deliver in terms of ChemSpider as a resource for chemists, those who have a passion for sharing chemistry with the community and specifically people who are willing to actively participate.

One of the discussions that we hope to have completed before we go live is the “peer-review process”. The journal is going to be a mixture of content including interesting posts from the blogosphere, other Open Access content, reviews and original research articles. Our intention is that every research article will go online as a “preprint” for feedback from the community. If any issues are seen with the initial submission (missing figures, missing sections etc) then feedback will be given immediately to the submitter. When the “preprint” is put online then up to four people will be invited to provide feedback. This will be public feedback. In parallel the entire community is invited to participate in the feedback process.

Based on the feedback the submitter(s) will edit their article and respond to the community, if they wish, and, after a certain embargo period, may reissue their article. Feedback may then continue.

This journal format represents the shifts occurring in the reporting of science today. While today some scientists report via their blogs and wikis this is a very small community. With the ChemSpider Journal of Chemistry we are offering an environment where, in theory, the traffic exposure of a scientist’s work will be much higher than on a blog or wiki, a peer review process will be operating and, if the journal gains the intended credibility, the work will be appropriate for listing on a CV (not that blogs and wikis are inappropriate!).

This is a process in development…

This is a short announcement to inform users that anyone looking at ChemMantis articles at present will see issues with structures linked to chemical names. We are reworking structure image generation at present and are debugging some of the structure display issues. We are doing this live on the production system for a number of reasons including gathering feedback from certain collaborators.

We have also been expanding our dictionaries for mark-up on ChemMantis. The present list is shown below – we have been working on hardware vendors, software vendors and chemical vendors recently and have introduced the promised separation of genes, proteins and enzymes from the species list.

We are focused on providing tools to our users to ensure that they can add information of interest to structure-based records in ChemSpider. We have introduced DOI-based associations recently allowing users to connect publications of interest to chemical compounds on our database. The process is simple. Find the structure record of interest, use the Add DOI function and Publish. The process is outlined graphically below.

First, log in, then navigate to the structure record of interest. In this case we are interested in associating a publication with the structure of Chaetoglobin A.

Find the paper of interest and the associated DOI. In this case we will associate the following RSC article. Click on Add DOI and enter the DOI. Click on Lookup, confirm that the data is correct and click on OK or Cancel as appropriate.

The associated DOI will be held in embargo until a curator confirms it, generally within a few hours. If we see no issues with the process we will remove the curation step. When approved you can see the information associated with the record as shown below. The DOI is linked directly to the article and will deliver traffic to the publishers, serving both the users of ChemSpider and the publishing community. Simple.
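For readers who think in code, the embargo step described above can be sketched roughly as follows. This is only an illustration and not how ChemSpider is implemented: the class names, the `csid` value and the DOI `10.1000/example` are made-up placeholders. The only outside fact it relies on is that a DOI can be resolved by appending it to `https://doi.org/`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    EMBARGOED = "embargoed"  # submitted by a user, waiting for a curator
    APPROVED = "approved"    # confirmed by a curator, shown on the record


@dataclass
class DoiLink:
    doi: str
    submitted_by: str
    status: Status = Status.EMBARGOED


@dataclass
class StructureRecord:
    # Hypothetical stand-in for a structure record; not the real schema.
    csid: int
    name: str
    links: list = field(default_factory=list)

    def add_doi(self, doi, user):
        """The 'Add DOI' step: store the DOI, but keep it in embargo."""
        link = DoiLink(doi=doi.strip(), submitted_by=user)
        self.links.append(link)
        return link

    def approve(self, doi):
        """The curator step: release a matching DOI from embargo."""
        for link in self.links:
            if link.doi == doi:
                link.status = Status.APPROVED

    def visible_links(self):
        """Only approved DOIs are shown, as resolver URLs."""
        return [
            "https://doi.org/" + link.doi
            for link in self.links
            if link.status is Status.APPROVED
        ]


# Placeholder values throughout (csid 0 and the DOI are not real data).
record = StructureRecord(csid=0, name="Chaetoglobin A")
record.add_doi(" 10.1000/example ", user="some_user")
before = record.visible_links()   # nothing visible while embargoed
record.approve("10.1000/example")
after = record.visible_links()    # the link appears once approved
```

If the curation step were removed, as suggested above, `add_doi` in this sketch would simply create the link with the approved status.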


The ChemSpider Journal of Chemistry is an experiment. We intend to demonstrate how modern web technologies can be used to dramatically enhance the type of information that can be communicated using web-based tools over standard online publishing approaches. There are some publishers who are working on delivering additional value to their readers by providing enhanced HTML articles and adding information to their articles such as InChIs to allow structure-based queries online. These publishers include the Royal Society of Chemistry with their Project Prospect and the Nature Publishing Group with their Nature Chemical Biology papers. The majority of articles presented by the commercial publishers are not of a “just-in-time” nature and are delayed by the “processes of publishing”. They are generally fairly lengthy documents and report successful results. They are commonly peer-reviewed and have endured a significant timeline from initial writing to submission, publisher processing, review and publication. Science is, however, being reported in near real-time under Open Notebook Science (ONS) initiatives. We believe that an online journal can exist between the immediate nature of the blogging and wiki tools hosting ONS efforts and the more standard processes of the scientific publishers. Some publishers are already allowing online and open peer-review whereby readers provide their feedback to the author in a public forum. Papers can enter a period of online peer review and commentary during which readers provide feedback to the author(s). As a result of this process the authors can engage in public discourse with the commentators and issue a final form of the manuscript. We will offer similar facilities.

We invite manuscripts from anybody interested in exposing their work in the field of chemistry and intersecting fields. In general we expect these communications to be 1500-3000 words in length but there is no limit. We encourage submissions relating to chemistry, biochemistry and chemical biology; regarding synthesis, the analytical sciences and computational chemistry; as research, as commentaries and as questions to the community. Provided the submission relates to the domain of the chemical sciences we will find a place for it within the ChemSpider Journal of Chemistry. We encourage submissions from academia and industry, from students and senior scientists, from individuals and teams, for successful research or failed experiments. We encourage submitters to challenge us to host your manuscripts in a manner which most clearly communicates your science. This may include hosting various forms of data made available to the public as Open Data, providing visualization tools for the display of molecules, spectra, images and videos. We intend to not be constrained and to make full use of web-based tools available today and coming online tomorrow.

All articles will be Open Access articles. We will abide by the Budapest Open Access Initiative which declares “By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.” Authors must agree to allow unrestricted reading, downloading, distribution, printing, searching and linking to the published work.

Over the past 2 years we believe we have demonstrated our passion for public science, our willingness to serve the community, and integrity in our actions. We hope that the ChemSpider Journal of Chemistry will provide a vehicle to all scientists operating within the domain of the chemical sciences to expose their work and interests to the community. We intend to deliver a facile process of submission and superior tools for delivery. We welcome your support and look forward to expanding the communication of chemistry.


I’ve caught wind of some growing confusion in the world of “Chemical ‘Pedias” relative to Chemistry on Wikipedia. I think we might have added to the confusion so I want to clear it up here.

We originally released WiChempedia in April of this year as announced here. What is it? It is a subset of the ChemSpider database made up of structure-based records on Wikipedia. So, when you visit www.wichempedia.org what you will see is a redirect to wikipedia.chemspider.com, the Wikipedia subset. All of the records under this subset are linked to Wikipedia articles. For example, for this record you will see:

Wikipedia Article(s)

Quinacrine (trade name: Atabrine) is a drug with a number of different medical applications being initially used in the 1930s as an antimalarial drug. It has also been used as an antibiotic in the treatment of Giardiasis (an intestinal parasite), and in research as an inhibitor of phospholipase A2. It has also been proposed for use in systemic lupus erythematosus. Read more… or Edit at Wikipedia…
Notice that this is linked out to Wikipedia for you to read the entire article and that it is even possible to edit the article at Wikipedia. We do not grab the entire article for a compound. We grab only the beginning of the article and display this with a link to the original article. This dramatically reduces the work we would have to do if we hosted all of the Wikipedia Chemistry articles since we would need to stay updated with changes to all of the articles. Too much work. What does WiChempedia offer to chemists and to Wikipedians interested in Chemistry? It offers structure and substructure searching of Wikipedia and access to a LOT more supporting information. For example, for this record you can see publications, spectra, safety/tox information etc. We are adding to the information available to Wikipedians, not just showing Wikipedia records again.
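As a rough illustration of the "grab only the beginning and link back" idea, here is a minimal sketch. It is not the code we run: the function name, the 400-character limit and the plain-text input are all assumptions, and the URL is just an example.

```python
def make_excerpt(article_text, article_url, max_chars=400):
    """Return the opening of an article plus a link to the full text.

    Hypothetical helper: takes the first paragraph, collapses whitespace,
    and trims at a word boundary if it is longer than max_chars.
    """
    # The first paragraph is everything before the first blank line.
    first_paragraph = article_text.strip().split("\n\n", 1)[0]
    first_paragraph = " ".join(first_paragraph.split())
    if len(first_paragraph) > max_chars:
        # Cut back to the last space inside the limit, then mark the cut.
        first_paragraph = first_paragraph[:max_chars].rsplit(" ", 1)[0] + "..."
    return {"excerpt": first_paragraph, "read_more": article_url}


text = "Quinacrine is a drug.\n\nA second paragraph that is never copied."
result = make_excerpt(text, "https://en.wikipedia.org/wiki/Quinacrine")
```

Because only the opening text is stored, nothing beyond it has to be kept in step with later edits to the article; the link always leads to the current version on Wikipedia.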
Another online resource tapping into Wikipedia Chemistry is Chempedia from Rich Apodaca. This is not to be confused with the OTHER ChemPedia. (People ask why we use weird names like ChemSpider and ChemMantis – try finding something NOT claimed on the web already!) Rich has taken a similar approach to accessing the Wikipedia monographs for display as detailed here. To use Chempedia is simple…a Google-like entry page where you enter a name. Entering Quinacrine provides the same Wikipedia text as on ChemSpider (it’s a short article), an image of the structure and InChI, and Mw. A comparison between the ChemBox on Wikipedia and Chempedia is shown below.
Rich has done a lot more work on Chempedia to integrate with information on Wikipedia than we have done with ChemSpider. For Quinacrine, for example, Rich includes this information about the latest edits and who made them.
7 edits since May 19, 2008. Last edited Jul 10, 2008 by Lightbot (31).
Rich is working on structure and substructure searching at present I believe.
There is confusion, I believe, about WiChempedia (our own approach), Chempedia and Wikipedia chemistry. I saw this this week, confirming my belief. The way this reads, over 3000 people have contributed to Chempedia since Nov 2008. I was interested in whether this was true.
Chempedia
Chempedia is different from most other chemical databases in that its textual content is created and updated in real-time by a large and diverse community of volunteers worldwide through Wikipedia. This means every one of Chempedia’s compound monographs can be changed and adapted by you. And if you find a Compound Monograph is missing from Chempedia, you can create it and make it available for others to use. More then 3200 have contributed to this Wiki site as of Nov. 2008.
http://chempedia.com/
I clicked on the contributors link for Chempedia and saw that there were indeed 3207 contributors. However, just looking at page 1 we see that for MOST of the people listed there are no listed monographs and no contributions. In fact for the first 10 people listed there were 4 contributions. Maybe there is some historical issue here? Maybe current contributions only include those from this year. Not sure. There are pages where there is only one structure.
I was interested to see whether I was listed as a contributor since I have contributed to Wikipedia but not to Chempedia directly. As I clean data on ChemSpider I’ll make edits to Wikipedia. There is benefit to moving information from both Chempedia and ChemSpider back to Wikipedia and Rich Apodaca and I both contribute to Wikipedia. It says on the contributors list that I have contributed 15 times. I’m not sure what that means, maybe in terms of contributions to ChemBoxes, but I have left a lot of comments on Wikipedia. I think contributions must be edits…not sure.
Relative to the comments “This means every one of Chempedia’s compound monographs can be changed and adapted by you. And if you find a Compound Monograph is missing from Chempedia, you can create it and make it available for others to use. ” ChemPedia directs the user to Wikipedia to write an article and then links to it. The process is simple. When a search is done if an article doesn’t exist you get the response:

Suggestions:

  • Re-check your CAS number, monograph title, PubChem CID, or structure.
  • Remove keywords. Chempedia does not yet perform keyword searches.
  • If your article doesn’t exist on Wikipedia, create it. You can then add it to Chempedia.
If you click on create you go to Wikipedia to write the article. Then you link it back to Chempedia here.
This is great…users are directed to help Wikipedia and everyone wins. When the article is written ChemSpider will pick up the link too and we’ll all be integrated. We haven’t introduced that onto ChemSpider…it’s a good idea though. Should we?
All is actually made clear here on the About ChemPedia page: “Chempedia is different from most other chemical databases in that its textual content is created and updated in real-time by a large and diverse community of volunteers worldwide through Wikipedia. This means every one of Chempedia’s compound monographs can be changed and adapted by you. And if you find a Compound Monograph is missing from Chempedia, you can create it and make it available for others to use.”
There are some examples of structures on ChemPedia not on Wikipedia yet (see below…is that the correct structure? It came from PubChem but I don’t know) and the same situation is true for ChemSpider. Eventually we will have systems in place to exchange such information on the fly.
Chempedia and Wichempedia are serving a valuable purpose. We are both dependent on the contributors to Wikipedia and are indebted to them!

As posted previously I gave a talk on Monday at the Library of Congress. This meeting was about “Making the Web Work for Science and the Impact of e-Science and the Cyberinfrastructure.” It was one of the few occasions where I looked out at the audience, about 150 people, and didn’t know anyone (well, except for the person who invited me and one of my fellow bloggers, Michael Nielsen). I gave a talk of a very different flavor. It wasn’t about ChemSpider…it was about chemistry and access to information. I provided an overview of how access to information has changed over the past 20 years for me. I talked about the challenges for publishers serving the chemistry community and how their business models are being challenged and how I empathize with the struggle to figure out how to deal with it. I talked about quality and how care must be taken when using information online. We are ALL challenged with errors – whether you consider PubChem, ChemSpider, Wikipedia or any of the other online databases they all have errors – how do you find them? Some of them are obvious and I pointed to obvious examples in the talk. I hoped to educate the attendees in regards to the value of InChI which, while not a perfect fit yet, is a great start to structure-based communication of chemistry. I think I achieved my goals there.

I publicly blessed publishers such as the RSC and Nature Publishing Group for the efforts they are making to support InChI and improve the quality of document presentation online. I blessed CAS as a treasure trove of information and the gold standard of curated chemistry. We need them all to be successful for the sake of our science. The challenge is how to fit into the ongoing proliferation of free access to information without modifying the business models.

I also announced the ChemSpider Journal to be released this month.

The movie has been posted to SciVee and the talk is on Slideshare here (it’s already been read 67 times in 48 hours). The movie is about an hour long compared to the 25 minute presentation I gave. Not sure how that happened…maybe I was more relaxed sitting on the couch than standing in front of the crowd. I struggled to upload the movie to SciVee and received for SURE the best technical support ever for a free service. For those of you not visiting SciVee I encourage you to patronize it.

IF THE MOVIE BELOW DOES NOT SHOW IN THIS BLOG WINDOW PLEASE VISIT THE SCIVEE SITE HERE TO SEE IT

I have been giving a lot of presentations of late regarding ChemSpider, ChemMantis, chemistry document markup and the challenges to publishers. These have been both closed door presentations where people are seeking input regarding the business challenges for chemistry publishers as well as in more open forums. One of the more common questions that is coming up now is around ChemSpider and ChemMantis. How are they related and how are they different? I’d like to declare that here…

ChemSpider is a website providing access to a database of structure-based content. It is also a “linkbase” providing a way to navigate from structure-based records out to a multitude of resources with information about the chemical entities on ChemSpider. It is also a platform for the deposition of new content, the annotation and curation of existing content and access to a series of services for the prediction of properties and integration to other resources. The value of ChemSpider is, in many ways, dramatically reduced without the content.

ChemMantis is a platform for document markup, specifically focused on identifying chemistry related terms in various documents. At present we have algorithms and dictionaries for extraction of chemical names (trivial, trade and systematic), chemical groups, reactions and chemical families. We are also working on dictionaries for something we are loosely terming “species” – at present this includes bacteria, fungi, etc. These will be segregated appropriately in the very near future.

Following the extraction of these various entities we are connecting them out to allow searching of resources such as Wikipedia, ChemSpider, NCBI’s Entrez and Google. ChemMantis does NOT depend on ChemSpider but can make use of what is available on ChemSpider to the benefit of the user. ChemMantis will be a “product” in the future. It is something that can be installed inside an organization and used for document markup and indexing of chemistry related documents. It will also serve as the basis of our ChemSpider Journal. More detail to follow on that….
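
The markup-then-link-out flow described above can be illustrated with a minimal sketch. This is not the ChemMantis implementation: the tiny dictionary, the entity type labels, the function names and the two search URL templates are all assumptions chosen for illustration, and a real system would need far larger dictionaries plus name-recognition algorithms rather than plain lookups.

```python
import re
from urllib.parse import quote_plus

# Hypothetical mini-dictionary (term -> entity type); illustration only.
DICTIONARY = {
    "aspirin": "chemical name",
    "acetylsalicylic acid": "chemical name",
    "ketone": "chemical group",
    "escherichia coli": "species",
}

# Assumed link-out templates; only two of the resources named in the post
# are sketched here, and the URL shapes are not taken from ChemMantis.
SEARCH_TEMPLATES = {
    "Wikipedia": "https://en.wikipedia.org/w/index.php?search={q}",
    "Google": "https://www.google.com/search?q={q}",
}


def build_pattern(terms):
    """Compile one case-insensitive regex covering all dictionary terms."""
    # Longest terms first so multi-word names win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def mark_up(text, dictionary=DICTIONARY):
    """Return the dictionary terms found in text, with type and search links."""
    hits = []
    for match in build_pattern(dictionary).finditer(text):
        term = match.group(1)
        hits.append({
            "text": term,
            "start": match.start(),
            "end": match.end(),
            "type": dictionary[term.lower()],
            "links": {
                name: template.format(q=quote_plus(term))
                for name, template in SEARCH_TEMPLATES.items()
            },
        })
    return hits


if __name__ == "__main__":
    sentence = "Aspirin (acetylsalicylic acid) was tested against Escherichia coli."
    for hit in mark_up(sentence):
        print(hit["type"], "|", hit["text"], "|", hit["links"]["Wikipedia"])
```

Keeping the dictionary separate from the link templates mirrors the point made above: the markup step does not depend on any one resource, and further targets can be added without touching the extraction code.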

I’ve blogged previously about the honor of ChemSpider starting to be indexed by the Chemical Abstracts Service. I take this as a blessing of the value we are offering to the community. Interestingly this has resulted in some confusion within the community. Now when people find structures of interest in the CAS registry they would like to hop over to our site for details. Based on what I’ve heard/seen that’s not so easy…there is no ChemSpider ID number to use and no link into our database. Hrmmpphhh…a little user friendliness would go a long way.

Anyhow, today a comment on one of the ChemSpider records caught my eye. “…why do these records not have a CAS RN associated with them? CAS acknowledges you...”. Well, the answer to that is simple. We don’t receive CAS numbers. I am not aware that ANYONE who has chemicals indexed in the CAS registry receives CAS numbers back as an outcome of being indexed. Am I right? I think CAS numbers have to be paid for when you register a compound. And if they are registered for you, so be it: there is no cost, but you don’t get the numbers for your own usage. Anybody know different?

I have received some interesting comments off-blog (consistent with the way I said blogs work for me) regarding how CAS’ indexing of us is helping people find information through us. For example, for the record at which the comment was made (CSID 9727274) the user could find their way out to PubChem, where we sourced the Thomson Pharma information that is also on the record. Hmmm…CAS is indexing PubChem via ChemSpider. That’s an interesting state of affairs considering the historical collisions regarding ACS and PubChem.

There have also been examples where people searching for IP issues about chemical structures have ended up on ChemSpider through our indexing on CAS and then from us out to the SureChem database of patents. The searchers have not been able to find information about certain structures in the CAS patent database but have found it in the registry and from there via ChemSpider out to the SureChem patent database. I’m not sure I fully understand the segregation of the data on the registry and the patent database since I don’t have access to CAS tools but I think it’s great that online resources such as ChemSpider, PubChem and SureChem are all now being brought together through the CAS registry and indexing processes. This is very beneficial to the community.

I am honored to be invited to join the Editorial Board of the Journal of Cheminformatics, a new journal from Chemistry Central. Over the years I’ve co-authored a lot of papers in ACS’ JCIM, previously JCICS (my list of publications is here) and my co-authors and I have always wondered about when another journal of the nature of JCIM would show up in the world of Open Access. Now it’s coming and I’m excited to be involved. We are writing a submission to the journal at present. The editors-in-chief and editorial board are listed below. I know the majority of these people personally and believe that this group will ensure the highest standards for the journal.

Editors-in-Chief
Christoph Steinbeck (United Kingdom)
David J. Wild (United States)

Editorial Board
Jean-Claude Bradley (United States)
Curt Breneman (United States)
Robert D. Clark (United States)
Jeremy Frey (United Kingdom)
Johann Gasteiger (Germany)
Val Gillet (United Kingdom)
Robert Glen (United Kingdom)
Jonathan Goodman (United Kingdom)
Rajarshi Guha (United States)
Mic Lajiness (United States)
Yvonne Martin (United States)
Peter Murray Rust (United Kingdom)
Alexander Tropsha (United States)
Wendy Warr (United Kingdom)
Ian Watson (United States)
Peter Willett (United Kingdom)
Antony Williams (United States)

Recently I gave a presentation at Drexel University and it was captured using Camtasia. The presentation was over an hour long and resulted in a >300Mbyte file. I have previously created ChemMantis movies and when I put them online on this blog the resolution was poor when shown through YouTube and I ended up creating a QuickTime movie instead. YouTube appears to have a 100Mbyte limit, so the question was what I could do with a 300Mbyte movie to share it. It reduced to 200Mbytes when I created a QuickTime movie but was still too big for YouTube.

Thanks go to my good friend Jean-Claude Bradley from Drexel University, who pointed out that SciVee might be a good home for the movie. He was right. I had heard of SciVee a few months ago but had been distracted with other things and hadn’t gone back to check it out. What a great site…seamless to work with, great content on there and an ideal home for all of my future presentations too. I’m surprised more people haven’t heard of this resource. I recommend it highly. My SciVee movie is shown below…enjoy the “how to get Camtasia working” moment at the beginning…technologist at work :-)

Today I had the privilege of meeting with many members of the team creating the RCSB Protein Data Bank. This resulted from the wonderful networking opportunity offered by the Scifoo camp held earlier this year at Google where I met Helen Berman, director of the PDB team, part of the worldwide Protein Data Bank. Helen and I shared some conversations sitting outside the Google offices in California and shared our opinions and visions regarding the quality of small molecule data available online. Today was an opportunity to take those conversations further, meet with members of the team and determine whether ChemSpider’s efforts could bring benefit to the PDB in terms of our curation efforts and whether ChemSpider users could benefit from having access to information on the PDB via hosting of the PDB ligand dictionary.

I gave a presentation (online here and based on others I have delivered previously), received a one-on-one review of the deposition and curation processes of the PDB, and participated in a group discussion about how to continue the stringent and exacting process of validation and curation associated with small molecule structure sets. We discussed the complex relationships between systematic names, trivial names, registry IDs, database IDs, tautomers, charged states, SMILES and InChIs. It was a particularly validating day to spend time with a group of people who have responsibility for building one of the most valuable resources in the world and have faced the many challenges associated with validating structure-based data. There is a distinction between people who talk about what it takes to curate structure collections and those who actually do the job for a living. This team is made up of dedicated, passionate and skilled individuals who deeply care about the quality of their data and who do the heavy lifting and grunt work so that the users of the PDB enjoy the benefits. They have been working on a multi-year process to curate and improve the PDB data and are in the final major phase of the effort to clean up the archive and apply the processes to all new data moving forward. ChemSpider and PDB will be more integrated in the near future and we look forward to supporting their efforts to provide high quality structure data to the community and to continuing to expand the network of integrated online chemistry.

I’ll be presenting at this conference next Monday in Washington if anyone is interested in stopping by to say hi….

Workshop: Making The Web Work For Science

Making the Web Work for Science:

The Impact of e-Science and the Cyber-Infrastructure

A One-Day Workshop Co-sponsored by CENDI and NFAIS and Hosted by FLICC

Library of Congress, 101 Independence Ave, SE, Washington, DC 20540

Mumford Room / December 8, 2008 / 9:00am – 4:30pm

AGENDA (11-4-08)

8:30am – 9:00am: Registration/Coffee

9:00am – 9:15am: Welcome / Opening Remarks Roberta Shaffer, Director of FLICC, Library of Congress

9:15am – 10:00am: Making the Web Work for Science: The Current Landscape

The opening keynote will provide an overview of how the Web is currently being utilized for the advancement of science and scholarly communication. Roberta Shaffer will introduce Dr. Christine Borgman, Professor & Presidential Chair in Information Studies, University of California, Los Angeles, and author of Scholarship in the Digital Age: Information, Infrastructure, and the Internet.
10:00am – 10:15am: Break and Networking Opportunity

10:15am – 11:45am: Making the Web Work for Science: The Content Providers’ Perspective

This session will focus on how innovative content providers, including Federal STI program leaders, librarians, and publishers are leveraging current Web technologies in order to maximize global access to and use of scientific and scholarly information. The use of Web 2.0 features such as wikis, RSS feeds and blogs will be discussed as will plans for the future.

The panel participants are Dr. Walter Warnick, Director, Office of Scientific and Technical Information, Department of Energy; Dr. Sayeed Choudhury, Johns Hopkins University; and Howard Ratner, Executive Vice President and Chief Technology Officer, Nature Publishing Group. Karen Spence, DOE/OSTI, will moderate.

11:45am – 12:45pm: Lunch

12:45pm – 2:00pm: Making the Web Work for Science: What Scientists Really Need!

In this session, two practicing scientists will discuss their use of conventional and Web-based information tools for scientific research, what works and what does not, and what they believe the information community needs to provide in order to maximize the full potential of the Web as an effective and essential resource for scientific discovery.

The panel participants are Dr. Antony Williams, Founder, ChemSpider; and Dr. Alberto Conti, Astrophysicist, Space Telescope Science Institute. Jill O’Neill, NFAIS, will moderate.

2:00pm – 3:30pm: Making the Web Work for Science: Challenges to Implementation

In this session, three experts will discuss the technological, legal and cultural challenges that all organizations must overcome – libraries, publishing institutions, scientific laboratories, etc. – so that each can utilize the full potential of the Internet and the Web in the fulfillment of their common mission – to build the world’s knowledgebase through enabling research and managing the flow of scholarly communication.

The participants are Dr. Michael R. Nelson, Visiting Professor at Georgetown University; Fred Haber, Vice President and General Counsel, Copyright Clearance Center; Dr. Michael Nielsen, Physicist and Science Writer, Perimeter Institute for Theoretical Physics (Canada). Bonnie C. Carroll, Executive Director of the CENDI Secretariat, will moderate.

3:30pm – 3:45pm: Break

3:45pm – 4:30pm: Making the Web Work for Science: What the Future Holds

This final keynote will explore the future promise of the Web and the various ways in which the cyber-infrastructure can ultimately re-engineer not only how scientific research is conducted, but also how the resultant information is communicated, shared, verified, and built upon as scientists and scholars around the globe increasingly collaborate in building the world’s knowledgebase of scientific and scholarly information.

Ellen Herbst, NTIS Director, will introduce Dr. Christopher Greer, recently of the National Science Foundation’s Cyber-Infrastructure Office, and currently the Director of Networking and Information Technology Research and Development (NITRD) of the National Coordination Office.

4:30pm: Adjournment

PDF Version Available At

[http://cendievents.infointl.com/nfais_cendi_120808/docs/Agenda.pdf]

Source

[http://cendievents.infointl.com/nfais_cendi_120808/docs/agenda.html]

General Information / Registration / Etc.

There is a two-fee structure for this workshop to allow the sponsors’ and host’s members an opportunity to attend at a reduced cost. CENDI, NFAIS, and FLICC members will be charged $65.00; all others have a registration fee of $95.00.

[http://cendievents.infointl.com/nfais_cendi_120808/]

I think the press release here, and copied below, speaks for itself… When I posted the blog about the need for an InChIKey Resolver it resulted in a great discussion and series of comments. Since that time I’ve had many discussions with interested parties about the need. The RSC and ChemSpider share a mutual view regarding the need for the InChI resolver and we are honored to be entrusted to develop a resolver for the community. Will it be “the” resolver? Only time will tell. There are various ways to deliver a system to do this so we’ll start here and garner feedback. There are many ways to “hunt a Welshman” (I can say that since I’m Welsh!) so there may be other efforts to deliver a resolver coming too.

“RSC and ChemSpider develop InChI Resolver

01 December 2008

An InChI Resolver, a unique free service for scientists to share chemical structures and data, will be developed by a collaboration between ChemZoo Inc., host of ChemSpider, and the Royal Society of Chemistry. 

Using the InChI – an IUPAC standard identifier for compounds – scientists can share and contribute their own molecular data and search millions of others from many web sources. The RSC/ChemSpider InChI Resolver will give researchers the tools to create standard InChI data for their own compounds, create and use search engine-friendly InChIKeys to search for compounds, and deposit their data for others to use in the future. 

The future of publishing

‘The wider adoption and unambiguous use of the InChI standard will be an important development in the way chemistry is published in the future, and the further development of the semantic web,’ comments Robert Parker, Managing Director of RSC Publishing. 

The InChI Resolver will be based on ChemSpider’s existing database of over 21 million chemical compounds and will provide the first stable environment to promote the use and sharing of compound data. ‘ChemSpider hosts the largest and most diverse online database of chemical structures sourced from over 150 different data sources’ adds Antony Williams of ChemSpider, ‘We have embraced the InChI identifier as a key component of our platform and the basis of our structure searches and integration path to a number of other resources. We have delivered a number of InChI-based web services and, with the introduction of the InChI Resolver, we hope to continue to expand the utility and value of both InChI and the ChemSpider service.’ 

Society support

‘As a learned society publisher it is important that RSC provide support for the standard and contribute to the development of the resolver, which promises to be a valuable service for the chemical science community.’ continues Parker, ‘our collaboration with ChemSpider on this project will enable this to be delivered quickly and sustainably.’ 

The imminent adoption of the InChI generation protocol will be a welcome and necessary step to the wider adoption of the InChI standard. “
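
To make the idea of a resolver concrete: an InChIKey is a fixed-length hash of an InChI, so the structure cannot be recovered from the key alone and a lookup service is needed to go from key back to InChI. A minimal sketch of that lookup follows. It is not the RSC/ChemSpider design; the class name, its methods and the in-memory storage are assumptions for illustration, and generating an InChI or InChIKey from a structure would need the InChI software, which is not shown here.

```python
import re

# An InChIKey is 27 characters: 14 letters, a hyphen, 10 letters,
# a hyphen and 1 final letter, all upper case.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


class InChIResolver:
    """Toy in-memory resolver (hypothetical): InChIKey -> InChI plus data."""

    def __init__(self):
        self._records = {}

    def deposit(self, inchikey, inchi, **data):
        """Store an InChI and optional deposited data under its InChIKey."""
        if not INCHIKEY_RE.fullmatch(inchikey):
            raise ValueError("not a well-formed InChIKey: %r" % inchikey)
        if not inchi.startswith("InChI="):
            raise ValueError("not an InChI string: %r" % inchi)
        record = self._records.setdefault(inchikey, {"inchi": inchi, "data": {}})
        if record["inchi"] != inchi:
            # Different InChIs under one key would mean a hash collision
            # (or a bad deposit); refuse rather than overwrite silently.
            raise ValueError("InChIKey already registered to another InChI")
        record["data"].update(data)

    def resolve(self, inchikey):
        """Return the InChI for a key, or None if the key is unknown."""
        record = self._records.get(inchikey.strip().upper())
        return record["inchi"] if record else None

    def data_for(self, inchikey):
        """Return a copy of the data deposited for a key (empty if unknown)."""
        record = self._records.get(inchikey.strip().upper())
        return dict(record["data"]) if record else {}


if __name__ == "__main__":
    resolver = InChIResolver()
    # Water and ethanol, with their standard InChI strings and keys.
    resolver.deposit("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "InChI=1S/H2O/h1H2", name="water")
    resolver.deposit("LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
                     "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", name="ethanol")
    print(resolver.resolve("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
```

Because the key has a fixed, punctuation-light format it is easy to validate and friendly to text search engines, which is the property the press release highlights; the trade-off is that everything else depends on someone maintaining the lookup table.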

I’ve had a number of people encourage me to “Twitter“. If you don’t know what twittering is then don’t distract yourself with it …I’ll let you know whether it has any value. I’m not so sure I want to share myself that much with the world…let’s see. However, if you want to see what a twit I am over the next few weeks then I’ll be twittering here: http://twitter.com/ChemSpiderman