Archive for the Vision Category

Last week I had a pleasant chat with a reporter from Nature magazine, Mr Geoff Brumfiel. Geoff was interested in ChemSpider: what it was, how it ran, who used it, who supported it, who liked it, who curated it, who didn’t like it and so on.

The results of that discussion, and others he spoke to about ChemSpider, are here in his article.

Chemists spin a web of data p139
Chemspider website provides free information on millions of molecules.
Geoff Brumfiel
doi:10.1038/453139a

It is a rule at Nature, at least for this type of article, that I could not see the article before it went to press, so I didn’t get the chance to proofread and comment. Geoff has accurately captured the spirit of our discussions, but a few detailed clarifications are needed. Below, each excerpt from the article is followed by my clarification.

providing the community with an open-access source of chemical information

I giggled and commented: please don’t say it’s Open Access. Say it’s Free Access. Say there are Open Data. And now we have Creative Commons licenses. But don’t say it’s Open Access: not strong, not weak, not gold, not green. Just Free Access. No price barriers to usage.

Chemist Antony Williams is hoping to change this in a move likely to ruffle the feathers of the American Chemical Society.

I commented that we are not purposely in competition with anyone. That’s not what drives us to do this. Whether others see us as competitors is for them to decide, not us. We don’t intentionally try to ruffle feathers, though that doesn’t mean, of course, that what we are doing won’t ruffle some, whether at the ACS or elsewhere. It’s not the goal, but it might be an outcome.

The modest project has made chemists interested in open access take notice — last week, the number of daily users of the site surpassed 5,000.

We have crossed 5500 users for the past two nights. The trend is positive.

“Other potential sources of information, such as Wikipedia, lack the algorithms needed to search chemicals according to their structure.”

Exact structure searching is of course “feasible” with InChI strings, but substructure searching isn’t, and almost all of Wikipedia’s users treat it as a text-based search.
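To make the distinction concrete, here is a minimal Python sketch, assuming a simple in-memory index keyed on standard InChI strings (the index and its three entries are hypothetical, for illustration only). Exact structure search reduces to string equality, which any text search can handle; substructure search has no equivalent string test and needs graph matching against stored structures, which this sketch does not attempt.

```python
from typing import Optional

# Hypothetical index for illustration: standard InChI string -> page title.
INCHI_INDEX = {
    "InChI=1S/CH4O/c1-2/h2H,1H3": "Methanol",
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3": "Ethanol",
    "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H": "Benzene",
}


def exact_structure_search(inchi: str) -> Optional[str]:
    """Exact structure search is a plain equality lookup on the InChI string."""
    return INCHI_INDEX.get(inchi.strip())


if __name__ == "__main__":
    print(exact_structure_search("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))  # Ethanol
    # Dimethyl ether shares the formula C2H6O but has a different InChI, so it
    # is not matched; there is no string test for "contains this fragment".
    print(exact_structure_search("InChI=1S/C2H6O/c1-3-2/h1-2H3"))  # None
```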

“The site is maintained with modest profits from advertising and the work of about 30 active volunteers who double-check the data pulled in from outside.”

The original investment in hardware and software has finally been recouped. Modest profits? No one gets paid for the work we do. There has been a phenomenal investment of sweat equity in the platform, many thousands of hours, to get here. We are indebted to the many software collaborators, providers of tools and the people curating and depositing to the system. There have BEEN about 30 active volunteers. Right now I would say the number of active depositors and curators is around 10, but it is growing. I hadn’t checked the number of REGISTERED users for a long time. We have over 1150 registered users… those who CAN log in and curate data, deposit data, see new features etc. People do NOT have to register to use the site… but >1150 did. Wow. I didn’t know it was that many until I just checked (BIG SMILE).

“There’s an awful lot of chemical information, but there’s an awful lot of rubbish as well,” says Barrie Walker, a retired industrial chemist in Yorkshire, UK, who helps maintain the site.

I don’t know whether Barrie said this or not. He IS an honest guy, he is our QUALITY GURU, and we are proud that he is willing to give us his fine eyes. There IS garbage on the site still. But after a year online and active curation it has been much reduced. About 200 edits a day are made to the site: names changed/deleted/added, spectra/structures/URLs/publications added etc. It’s quite the pace. We have cleaned up hundreds of thousands of incorrect associations from the external data sources. It has been, and will remain, an enormous task with an enormous payback for the community.

Williams adds that the site still has problems with certain searches. For example, it struggles to distinguish between isomers: molecules with the same chemical formula arranged in different structures.

We can distinguish isomers, no problem. The PROBLEM is that a mixture of isomeric species has been submitted from multiple data sources, and the data are mixed and intermingled in a way that prevents the user from getting to the correct structure. Search for taxol or Ginkgolide on the ChemSpider blog and read the multiple blog posts about this. We can, of course, search all isomers for a particular chemical formula…
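Searching all isomers of a formula is the easy part: if every record stores a canonical molecular formula, the search is a simple grouping. The Python sketch below assumes atom counts have already been extracted for each record and writes formulas in Hill order (C first, then H, then the other elements alphabetically; purely alphabetical when there is no carbon). The records are hypothetical. Grouping by formula returns every isomer with that composition; it does nothing to untangle a record in which data for different isomers have been mixed together, which is the curation problem described above.

```python
from collections import Counter, defaultdict


def hill_formula(atoms: Counter) -> str:
    """Format atom counts in Hill order (C, H, then others alphabetically)."""
    def term(element):
        n = atoms[element]
        return element if n == 1 else f"{element}{n}"

    present = [el for el in atoms if atoms[el] > 0]
    if atoms.get("C", 0) > 0:
        order = ["C"] + (["H"] if atoms.get("H", 0) > 0 else [])
        order += sorted(el for el in present if el not in ("C", "H"))
    else:
        order = sorted(present)  # no carbon: everything alphabetical
    return "".join(term(el) for el in order)


# Hypothetical records: (record id, name, atom counts), possibly from several sources.
RECORDS = [
    (1, "ethanol", Counter({"C": 2, "H": 6, "O": 1})),
    (2, "dimethyl ether", Counter({"C": 2, "O": 1, "H": 6})),
    (3, "methanol", Counter({"C": 1, "H": 4, "O": 1})),
]


def build_formula_index(records):
    """Map each Hill formula to the records that share it."""
    index = defaultdict(list)
    for rec_id, name, atoms in records:
        index[hill_formula(atoms)].append((rec_id, name))
    return index


INDEX = build_formula_index(RECORDS)


def isomers(formula: str):
    """All records with the given formula (empty list if none)."""
    return INDEX.get(formula, [])


if __name__ == "__main__":
    print(isomers("C2H6O"))  # [(1, 'ethanol'), (2, 'dimethyl ether')]
```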

“But Williams nevertheless believes that the service may be able to compete with for-profit services. ‘What I’m doing is highly disruptive,’ he says. ‘I think it can be done and it needs to be done.’”

I think what WE are doing (it’s not me, it’s we) is disruptive. In a good way. Many chemists will benefit. Will it have an impact on for-profit services? Yes, maybe. As an outcome, but not as the target. Our team of people, both internal to ChemSpider’s development and on the Advisory Group, and the people we don’t even know who are cleaning and depositing into the system for their colleagues in the community, are creating a powerful resource for chemists. The FOCUS of this effort is to Build a Structure Centric Community for Chemists. We will change that soon: the focus will broaden from structure-centric to chemistry in general, to Build a Community for Chemists.

We are well on our way, and thanks go to Nature, and to Geoff in particular, for exposing it. My comments above are not meant to detract from Geoff’s reporting abilities, but it was a long discussion and I believe some clarifications are of value.

Buy me a Coffee

We have made significant advances in the structure deposition system on ChemSpider. We’ve reported on our advances previously and are working hard to polish the system. In parallel we’ve done work to support deposition of batches of structures (100s to many thousands) as well as the deposition of CSV files to support Open Notebook Science. We are going to roll out deposition in phases: single deposition first, batch deposition next, and then CSV-file-based batch deposition.
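As a rough sketch of what CSV-based batch deposition might involve on the depositor’s side, the Python below parses a CSV file, skips rows that carry no structure, and splits the rest into fixed-size batches. The column names (`name`, `smiles`, `note`), the sample rows and the batch size are all hypothetical choices for illustration; they are not the actual ChemSpider deposition format.

```python
import csv
import io

# Hypothetical CSV layout for illustration; the real deposition format may differ.
SAMPLE_CSV = """name,smiles,note
ethanol,CCO,test deposition
benzene,c1ccccc1,
,,
acetic acid,CC(=O)O,
"""


def read_depositions(text):
    """Yield cleaned rows, skipping any row that has no structure."""
    for row in csv.DictReader(io.StringIO(text)):
        smiles = (row.get("smiles") or "").strip()
        if not smiles:
            continue  # nothing to deposit without a structure
        yield {
            "name": (row.get("name") or "").strip(),
            "smiles": smiles,
            "note": (row.get("note") or "").strip() or None,
        }


def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


if __name__ == "__main__":
    for number, batch in enumerate(batches(read_depositions(SAMPLE_CSV), size=2), start=1):
        print(number, [row["name"] for row in batch])
```

Using generators keeps memory flat, which matters when a file holds many thousands of structures rather than the four sample rows here.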

So… why are we encouraging the deposition of structures onto ChemSpider? We agree that we could accept RSS feeds (and we will). Our view is that people might want to have “bragging rights” on their latest synthesis, might want to expose their latest paper on ChemSpider, or might have a link to an article online that they want to share. While there are MILLIONS of structures online, there is new chemistry reported every day. What other system is available as a structure-based community for chemists, where people can deposit their structures, stories, links and comments to share with others (and open up a conversation about synthesis, analysis etc.)? Think of it a little like Flickr or YouTube for chemical structures. Anyone can post their structures for people to browse.

I’ve been doing some example depositions to show what’s feasible… these are simple to do, a few minutes’ work at most.

1) I was a co-author of a publication and received a copy today. I wanted to put a link to the paper and associate it with the structure we analyzed. The structure already existed in the database, so this was information to be added to the existing record. Scroll down to the end of the page for this record to see this Supplemental Information.

Martin, G.E., Hilton, B.D., Blinov, K.A. and Williams, A.J. “Using indirect covariance spectra to identify artifact responses in unsymmetrical indirect covariance calculated spectra”, Magnetic Resonance in Chemistry

[DOI: 10.1002/mrc.2141]

2) A new publication was released this week regarding a new compound Quesnoin. David Bradley blogged about it on Spinneret. In this case I wanted to add the structure, information about the structure as well as a link to the recently published article. Scroll to the bottom of this record.

There are many other examples online too (1,2,3). Look at the Supplemental Information in each case.

There are some final tweaks being made at present but single deposition is now rolled out. We are looking for people NOW to start using the system so please ping me. An overview of the system is available here.

Deposition workflow

The future will include users creating their own “catalogs” of structures, “social networking” and discussions around structures, team-based discussions, public and private structure collections and so on. It’s coming…in stages. We start here with the single deposition process.

Over the past few weeks I have had a few discussions with a member of the ChemSpider Advisory group regarding a concept to create WiChempedia. I’ve enjoyed these conversations with Alex Tropsha (professor and Chair in the Division of Medicinal Chemistry and Natural Products in the School of Pharmacy, UNC-Chapel Hill). We are like-minded in a number of ways, but specifically in what can be done to facilitate delivery of quality information to the chemistry community.

As you will notice if you frequent this blog, I am rather a stickler for accuracy and quality (1,2,3). I think it’s important (4). Over the past few weeks I’ve spent more time looking at the quality of data on Wikipedia and trying to figure out the best way to bring our efforts on ChemSpider together with Wikipedia, to enhance the capabilities of integrated information and to support the quality efforts being made by the WP:CHEM team. I also intend to facilitate the development of our own wiki environment for chemistry and to generally enhance the tools available to chemists, not only for Wikipedia-type annotation but also to support Open Notebook Science.

Now, I don’t want to reinvent the wheel. Wikipedia has a lot of what is necessary in terms of being a known system, with a following of people and committed supporters in the WP:CHEM team. What I have been hoping for was a shift toward structure and substructure searching on the MediaWiki platform, but I know that is a tough request as the platform is not built for that type of thing. The InChIKey holds some promise for exact structure searching but does not offer an opportunity for substructure searching without a lookup across a larger database. I want to facilitate information and data sharing further. I do want to provide the type of service that Wikipedia does in terms of general information, but also to layer cheminformatics tools onto that knowledge and information and to allow the addition of analytical data, analysis tools and, ultimately, real-time predictions and analysis. This platform should certainly be wiki-enabled.

Decision made. Our intention is to deliver wiki capabilities in ChemSpider and to use the Open Content associated with chemicals and drugs on Wikipedia inside the system. We will then provide an environment for people to continue to add to, enhance and curate the Wikipedia content as well as add their own. Last night (and well into the early morning) I spent some time talking to Martin Walker from WP:CHEM about my concern that we might offend the Wikipedians with our efforts; I did not want them to feel that we were ripping off their hard work, but rather to see our efforts as supportive and enabling. My intention, as we work through downloading the data, is to check, validate and correct what is sitting on Wikipedia directly, for the benefit of the community. We will also, of course, need to leave all Wikipedia content under the appropriate licensing for others to use. Martin commented that there are tens of mirrors of Wikipedia out there, ripped purely for the purpose of exposing the content and earning ad revenue. We are not working from that model. Our intention, as usual, is to build a structure centric community for chemists, and with so much excellent work done on Wikipedia I want to take advantage of it and also give back through the work we will do.

Two domain names have been grabbed for this project: WiChempedia, for compatibility with Wikipedia, and also WeChempedia, to emphasize the community aspects of the project.

If you frequent this blog you will recall that we have made a commitment to Microsoft Sharepoint as our future platform for wiki’ing ChemSpider. That is where we believe this work will be done ultimately but we don’t have the platform in our hands yet.

The Xmas vacation is going to be full of holiday movies and manual examination and curation of the Wikipedia data. Wish us luck!

Following my recent post on high performance computing and the Cell B.E., I saw this today regarding gamers handing over their compute cycles to PS3GRID.
I quote an excerpt here but point you to the full article for details:

PS3GRID is coordinated by researchers at the Research Unit on Biomedical Informatics (GRIB) at the Instituto Municipal de Investigación Médica and the Universidad Pompeu Fabra in Barcelona, Spain. The distributed infrastructure enables any PS3 to do computations on atomic and molecular simulations.

The researchers, headed by GRIB scientist Gianni De Fabritiis, chose the PS3 because it is the first consumer device to contain the IBM Cell processor. “The Cell,” which is more than an order of magnitude faster than standard Intel or AMD processors, optimizes the types of computation commonly used in graphics applications. In addition, the Cell offers an inexpensive and powerful method to perform highly detailed molecular dynamics simulations of biomedical systems. Using the Cell, a PS3 has the computational power equivalent to about 20 PCs.

I think the image below will tell the story of what’s coming soon to ChemSpider. As part of a collaboration with a member of our advisory group we will be unveiling this new capability for beta testing in the very near future. I’m sure some of you will see where we are going next…watch this space.

I subscribe to Scientific Computing so that it drops into my email inbox. I read Rob Farber’s article this week entitled “The Future Looks Bright for Teraflop Computing”. His opening question was “Wouldn’t it be great to have a teraflop of computing power sitting in your lab, desktop workstation, or remote instrument server?” What would that mean to your work?

For those of you using ChemSpider, you will know that we have about 20 million compounds in the database. With that many compounds, populating the database with properties such as InChI strings, InChIKeys, physchem properties and systematic names can take many days if not weeks. With only three computers in our hands, one of them the web server and one the database server, we are limited to a single system for these calculations. Even that dual-processor system provides slow throughput. Oh, the joys of having access to teraflop processors!!!
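A back-of-envelope calculation shows why property population over 20 million compounds is measured in days. The per-compound cost below (50 ms for one property) is a hypothetical figure chosen for illustration, not a measured ChemSpider timing, and the calculation assumes the work splits perfectly across processors.

```python
SECONDS_PER_DAY = 86_400


def days_to_process(n_compounds, seconds_per_compound, n_workers=1):
    """Wall-clock days for a batch job, assuming independent per-compound
    calculations that split perfectly across workers (an idealization)."""
    total_seconds = n_compounds * seconds_per_compound / n_workers
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    n_compounds = 20_000_000  # "about 20 million compounds"
    per_compound = 0.05       # hypothetical: 50 ms per compound for one property
    for workers in (1, 2, 64):
        days = days_to_process(n_compounds, per_compound, workers)
        print(workers, "worker(s):", round(days, 1), "days")
```

Under these assumptions a single processor needs about 11.6 days for one property. Several properties, or slower ones such as systematic naming, multiply that figure, which is consistent with the “many days if not weeks” above.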

In my previous post on focused libraries I commented on ongoing discussions regarding the potential to perform online docking. Evangelists such as Jean-Claude Bradley (a member of our advisory group) have been talking about this possibility; for Jean-Claude it is part of his approach to Open Notebook Science. Docking can be very time-consuming, so the speed of calculations is very important. I have been working on a project regarding the value of porting docking software to the Cell Broadband Engine processor from IBM. The development of that processor is an interesting story in itself, since it was driven specifically by the gaming industry’s need for better performance in its calculations. Now SimBioSys are porting their docking software to the Cell processor, as described in this White Paper. The improvement in performance is quite amazing!!!

While working for a commercial software company we saw productivity gains from moving to clusters. Dual processors in our laptops and annual performance gains from general technology shifts offer faster calculations every year. Teraflops on the desktop (and even the laptop) are likely a few years away… but gigaflops are here.

When we first started the ChemSpider project we made a commitment to “Build a Structure Centric Community for Chemists”. We believe we are well on the way to facilitating that. We have talked about a “wiki” environment for collaboration. In this context we use “wiki” to mean a “collaborative environment”, not necessarily adherence to a specific wiki platform. Our intention is to provide the ability for users of ChemSpider to collaborate in the co-management of content on the ChemSpider site. A number of our readers have taken our statements to mean that we will be using the same wiki platform as Wikipedia. We have looked at and considered a number of “wiki” tools, platforms, interfaces and user experiences. At this time we have made a decision to use Microsoft Sharepoint as the platform on which to construct our wiki environment. With a clear commitment to Web 2.0 already declared, and our platform built on SQL Server and ASP.NET, we feel it is the appropriate platform for us to build on. We believe our technology choices have already demonstrated that we can deploy a good solution very quickly.

Now, we realize that this might result in a series of jabs about us not using Open Source solutions and so on but we are more focused on delivering an appropriate scalable solution than building ChemSpider only on Open Source software. We will support anyone who wishes to do the same on Open Source though.

We will keep you informed of our progress. Now we need to migrate to .NET 3, and we hope the switchover will cause only a short-term disruption. Watch this space.

For those of you who have been watching the blog of late you will be aware of the recent discussions about Open Data (1,2). We have offered the possibility to submitters of spectral data to declare their data either Open or Closed. Noel posted a comment on the blog asking the question “Why is the default Closed? Why even offer the option of Closed?”

So, my response to “Why even offer the option of Closed?”: in my opinion, this is the submitter’s decision. It’s not our role to force “Openness” of data onto users. We are working to create an environment that provides value to ChemSpider users rather than one that forces them into a policy regarding openness. Personally, I would prefer to have access to data to help answer a question, even if they are NOT Open Data, than to not have access to those data. I have asked all of the people who have submitted data, or had me submit data, to ChemSpider whether they would like to have their data moved to Open. Three said yes, two said no. I do NOT intend to force people to make their data Open. That is their choice, not mine. We are creating a community for collaboration. There is value in having access to data whether they are Open or not. If you look at the recent conversations about RSC and their Free Access versus Open Access, we must agree that there IS value in Free Access to their articles despite the fact that they are not Open Access.

My friend Gary Martin has allowed us to deposit some of his data onto ChemSpider. He has commented twice (1,2) and I refer you to those blog postings for his opinions. They are interesting to read.

The reality is that our policies, as they stand, appear to be appropriate for getting people to deposit their data. We already have over 100 spectra deposited on ChemSpider, with more to come based on recent conversations. Some of these ARE Open Data and the depositors are acknowledged for this. They are sharing their data with you through us. That’s the benefit of building a community for chemists.

This week I was privileged to attend a PubChem Working Group meeting in Washington and sit around the table with interested parties discussing the present and future state of PubChem. I had the opportunity to give an overview of ChemSpider, our vision of ourselves and where we are going. If you are interested in reviewing the commentary, please find a PDF file of the presentation here (shared with permission of PubChem). I welcome any comments, feedback or questions, either as a blog response or offline.

Seth Godin is a mentor to many marketers out there today. I’ve read a number of his books over the years and he has a lot to say. He is a self-professed “idea-giver”… read his latest blog posting. I specifically like his comment “ideas are easy, doing stuff is hard”. How true that is. Over the years I’ve had lots of ideas. I’ve shared many “beverage-based conversations” where big ideas have been put out. The trick is in the “money where your mouth is” execution of these ideas. Over the years I’ve had the pleasure of working with people who tend to deliver as well as talk. WAY more motivating than just listening to promises of what could be.

A few years ago, at a meeting in Washington, I sat in on probably the earliest public forum discussion on the potential of InChI. Through excellent teamwork between NIST and IUPAC, and by doing rather than just talking, they got it done. There was some negativity expressed about InChI during the initial meetings, but it did not distract the team from producing the prototype versions, the initial release and now the latest update with InChIKey support.

Now, I’ll guarantee that Seth Godin doesn’t know what an InChIKey is (Seth, if you’re reading this, prove me wrong :-) ). But I want to take the position of supporting the Big Idea of structure searching the web, and suggest the InChIKey as one way to execute on this now. There is a lot of passion around doing this and it has shown up in a number of postings by Rich, by Joerg (in regards to Wikipedia in this discussion), by Egon (discussing RDF’ing molecular space) and Jim, among others.

I am reading and hearing exchanges about the web being made structure searchable and my mind drifts immediately to the “it’s not enough” stance. The InChIKey should address some of the issues seen with InChI string searches and likely will be way more popular with the search engines. As commented last night on ChemSpider news the InChI keys on ChemSpider now link directly to a Google search.

The challenge remains: once all of those keys are out there, how will the web be made SUBstructure searchable or SIMILARITY searchable? The solution would appear to be a centralized repository of structures with their associated InChI strings and InChIKeys, because the InChIKey cannot be reversed to the structure. A centralized repository of millions of structures and their associated InChI strings and keys could be searched by substructure or similarity, and once structures of interest are identified, a Google search on the corresponding strings or keys could be kicked off. Maybe the discussion regarding the creation of such a centralized repository has happened already, so I’d be interested in hearing what the path forward is. If it’s happening, then the questions are who will host it, how it will be funded, whether there is a timeline etc. If it’s not happening, or is way in the future, then I have an interest in opening the discussion regarding using the ChemSpider database and appropriate services (presently under development) to provide an interim service.
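The two-step workflow described above (search a central repository by similarity or substructure, then hand the matching keys to a web search engine) can be sketched in Python. This is a minimal sketch under stated assumptions: the repository is a hypothetical in-memory dictionary, the “fingerprints” are toy sets of feature IDs rather than real chemical fingerprints, and only similarity (Tanimoto) search is shown; a real substructure search would add graph matching on the stored structures. The InChIKeys are the standard keys for ethanol, methanol and benzene, and the search URL is an ordinary Google query string.

```python
from urllib.parse import quote_plus

# Hypothetical central repository: InChIKey -> (name, toy fingerprint).
# The fingerprints are made-up sets of feature IDs, not real chemical fingerprints.
REPOSITORY = {
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": ("ethanol", {1, 2, 3, 7}),
    "OKKJLVBELUTLKV-UHFFFAOYSA-N": ("methanol", {1, 3, 7}),
    "UHOVQNZJYSORNB-UHFFFAOYSA-N": ("benzene", {4, 5, 6}),
}


def tanimoto(a, b):
    """Tanimoto similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_keys(query_bits, threshold=0.5):
    """Return InChIKeys whose fingerprints meet the threshold, best match first."""
    scored = [(tanimoto(query_bits, bits), key) for key, (_, bits) in REPOSITORY.items()]
    return [key for score, key in sorted(scored, reverse=True) if score >= threshold]


def web_search_url(inchikey):
    """Build a plain Google search URL for an InChIKey found in the repository."""
    return "https://www.google.com/search?q=" + quote_plus(inchikey)


if __name__ == "__main__":
    # Step 1: similarity search in the repository; step 2: hand off to the web.
    for key in similar_keys({1, 2, 3, 7}):
        print(REPOSITORY[key][0], web_search_url(key))
```

The repository is what makes step 1 possible at all: because an InChIKey is a hash, the web pages that mention it cannot be searched by similarity directly.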

Structure searching of the web is, of course, going to provide high value. It should not stop there, though. Let’s have the proactive dialog now about the next phase, to facilitate substructure and similarity searching. If the conversations are going on elsewhere, please post the links as comments so that readers can follow them. I’m sure that Egon, Joerg, Rich and PMR will all have thoughts about how this should look. The bottom line is that, if this is the path, the underlying system needs to be able to handle at least 25 million structures (ChemSpider has 17 million already) in the short term and be scalable to many tens of millions. There aren’t too many open platforms that can do that yet. I am aware of commercial platforms supporting many millions but no Open Source platforms yet…

Recently I posted some statistics regarding traffic to the ChemSpider website, examined using various tools: our own and the Alexa Rank engine. Peter Schneider has commented on the performance of the various rank engines. He also asked an interesting question: “But the real question is: Does emolecules generate more income with an Alexa Rank of 400 000? It is not the question, if a site has more visitors or not… The question is, which project will survive…” It’s definitely worth commenting on!

I am looking into the Alexa Toolbar issue, and if Peter is correct in his judgment of its bias we will likely take it down. What we are looking for is accurate representation. We are now tracking with Google Analytics and have signed up on compete.com as he suggested, so only time will tell.

I think Peter is right that there needs to be some standard way to compare sites. Certainly ChemSpider is not out to “beat” eMolecules or PubChem, or any of the new systems which might come online in the near future. I believe we all share the same space and bring value in our own ways. I have great respect for what Klaus and the group are up to. I collaborated with the team directly while I was at ACD/Labs: integrating ChemSketch into Chmoogle (as it was then), arranging exposure at Reactive Reports, and then again with the logP donations, working with the PhysChem product manager at ACD/Labs.

Does eMolecules generate more revenue than ChemSpider with a lower Alexa rank? I would hope so… they are a business! I am not sure of their business plan, but it does include exposing companies’ catalogs through their site (for revenue, I would expect; see the example with an NCH skin on top of the eMolecules engine at http://nchlab.emolecules.com/). I have also heard that in certain cases compounds sold via the website result in a percentage going to eMolecules. I don’t know that it is true, but it is rumored to be that way. (By the way, I suggested to Klaus that we exchange our relevant structure collections, index each other’s structure collections and link between the sites, but haven’t got a response yet. This type of exchange/integration is what Joerg is talking about here.)

ChemSpider, on the other hand, is a passion project. Until about a month ago it was non-revenue generating… more bank-account draining :-) All computer software, hardware, ISP fees etc. were paid for out of our bank accounts. Yes, we founded a corporation to do this… we’re an overly “litigious society”.

Recently I chose a period of personal sabbatical, so now I am the non-revenue generating member of the household (but a great chauffeur for the children). I am happy to say that now we actually have sponsors for the site. We did try the AdSense approach, but the $2.50 per day wasn’t worth the reputation ding and the annoying screens. We’ve added “Buy me a Coffee” to the blogs… but so far we haven’t had one. So, we are depending on the kindness of our sponsors to keep the site going at present. If you look at the home page you will note that Waters was kind enough to sponsor the site and is a gold-level sponsor based on the magnitude of their support. We have recently received support from one of our other collaborators and their logo will be posted soon.

I can confirm that in my downtime I am looking for additional funding to keep ChemSpider going, in whatever way it comes: sponsorship, anonymous donations, grants, collaborations, begging, borrowing (no stealing…). ChemSpider can continue to move while there are free cycles to support it and enough income (or family monies available) to keep it exposed. If there is no way to create a revenue stream from the system, its pace will certainly suffer when those of us working on it now get tired, or some of us “go back to work” and have new career objectives to distract us. ChemSpider IS still a passion project. The intention is that there will always be an Open Access ChemSpider for chemists to use. I see no reason why anything you have access to now would ever be taken away. The majority of what we have in our development plans is for the good of all. I don’t know how else to commit to a deeper level of permanence for the site. We are not yet done with the conversations about Open Sourcing the code in the future.

So, thanks Peter, for asking the question about “which project will survive”. If any readers have thoughts about garnering financial support for the system through sponsorship, grants, collaborative work etc., please contact me at the usual address (antony.williams AT chemspider DOT com) and open the discussion. What we want is for ChemSpider to be around for many years to come, and I believe we can make that happen even in our spare time. That said, with dedicated effort the reach of this project can be truly massive…

This past week I received some inquiries and comments regarding the traffic coming to the ChemSpider site. It was commented that it was not possible to compare eMolecules traffic and ChemSpider traffic on Compete. I confirmed this and have now registered ChemSpider so that this should be possible in the future. There are many analytics tools out there to measure traffic at a site. We use WebLog Expert as our internal analytics tool. The plot below shows fairly linear growth in the number of unique visitors to the ChemSpider site since we went live on March 27th, just in time for the Spring ACS meeting.

WebLog Expert plot

We also use Alexa to monitor our performance. The statistics below show the increase in global users accessing the site, the overall traffic rank and the number of page views per user.

Alexa Rank 1

The geographical distribution of visitors is actually quite surprising. Until recently the UK was the most common visiting country, but US visits increased dramatically when the announcement regarding our integrated patent searching went online. What is quite surprising is the low number of visitors from Germany, China and India. Based on my previous experience in the chemoinformatics world I would expect Germany to be much HIGHER, and there should certainly be more traffic from India. That said, India wasn’t even on the list a week ago and is growing now as the message spreads. If any of you can help spread the message outside of the USA, please do!

Alexa Rank 2

Addressing the original statement about being unable to compare stats on www.compete.com I’ve shown the geographical traffic ranks for eMolecules. Clearly there are a lot more countries for ChemSpider to provide value to! Hopefully our penetration will increase with time.

Emolecules on Alexa

Interestingly, there are also all kinds of rumors about the validity of Alexa’s data, but Alexa disputes them. It’s difficult to know what’s right, so what’s reported here is simply what’s given online. What we are happy to report is ongoing growth in the usage of the system. It validates our efforts.

For those of you watching the progress of ChemSpider since its initial exposure in March of this year: we have been incrementally adding new features, specifically integration with other rich sources of information. We have delivered integration with multiple data sources (click on the Data Sources checkbox under the Advanced Search for the list) as well as integration with text-based searching of 50,000 Open Access articles via the ChemRefer service. Now we have extended this to include review of patents.

In a collaboration with Reel Two, we now provide structure and substructure searching of, and access to, millions of chemical structures linked to patents from the US, European and Asian patent offices via their SureChem Portal. Following a search, simply click through to the Detailed Results page for a particular structure and look in the Data Sources list for the word SureChem. See below as an example… note SureChem highlighted in red.

Surechem Link

Clicking on any of the names in the Data Sources list launches a new browser window containing the External Substances links, as shown below.

links to Surechem Data Sources

Clicking on any of the External Links will take you to the actual patent, hosted on the Patent Analysis website and identified via the SureChem query. For example, see here.

We have a number of ideas to enhance the delivery of patent information via ChemSpider, but for the time being we believe that the ChemSpider and Reel Two SureChem integration offers a powerful means by which a chemist can navigate from a chemical structure to a patent. We welcome your feedback.

Over the past 48 hours there has been an interesting discussion on CHMINF about how to teach large classes of students about literature searching, structure searching, property searching and so on. The tools are out there to perform such searches and to help students learn about the types of resources they will need to access if and when they enter industry. The premise of the exchange was that some of the gold standard resources, while excellent, are commonly not affordable at the level necessary to train large classes of students. Below is a posting I placed back onto CHMINF. My question to you readers is: “Is there an academic who would like to work with me on a Lesson Plan involving ChemSpider?” If so…please contact me.

The exchange follows. Lines beginning with > are comments made by one of the respondents to the original post, and I used them as the basis for my own feedback (marked AJW>).

Colleagues,

I wonder whether it might be possible to use the ChemSpider service as one of the resources for the classes? Relative to some of the comments made below, it is possible to perform the majority of the searches at www.chemspider.com - this includes structure searches, property searches and name searches as well as LITERATURE searches of Open Access articles. See details below…

>1. Ability to use a chemical drawing program to insert chemical drawing in a lab report.

AJW> On ChemSpider …refer to http://www.chemspider.com/news/?p=39

>2. Ability to identify a compound by multiple methods, such as CAS registry number, IUPAC or CAS index name, common name.

AJW> For searching by numeric identifiers, systematic names or common names use the search page at http://www.chemspider.com/Search.aspx
and review the comments made at http://www.chemspider.com/news/?p=29 and http://www.chemspider.com/news/?p=23

>3. Ability to locate basic property information on a given compound in standard sources such as CRC Handbook, Lange’s, Merck, Dictionary of Organic Compounds, MSDS (whatever basic reference tools you have).

AJW> I am not suggesting that ChemSpider is a reference tool as yet…but for searching on basic property information use the Advanced Search at http://www.chemspider.com/Search.aspx?t=adv

Select the appropriate check box to perform searches by structure and substructure, intrinsic properties, predicted properties, identifiers and data source. The nice thing about this approach is that students will find the linkages into reference sources such as the NIST Webbook, PubChem, Wikipedia and other rich sources of information.

>4. Ability to locate 3-5 articles about a topic related to a specific compound.

AJW> Use the ability to search the text of >50,000 Open Access chemistry articles. We are presently adding another 60,000 Open Access articles. Perform the search here:

http://www.chemspider.com/chemrefer.aspx

For example, search for dithiazoles and get this result set:
http://www.chemspider.com/ChemRefer.aspx?zoom_query=dithiazole&zoom_and=0

>5. Ability to identify the parts of a research paper and to summarize the relevance of the paper.
>6. Ability to cite articles using a standard citation style, such as ACS.

>… An assignment might be to locate information (defined by you and the lab director) about an organic compound of interest to them - why the molecule is of interest to them, some basic properties, locate 3 current articles on the compound - summarize relevance of one article in 2-3 sentences, cite all three articles according to a preferred style.

>It’s tempting to throw every possible nuance into such an assignment, but I’d stick to basics: compounds have names and properties, and you can find current literature about compounds by searching relevant article databases.

AJW> And bring it all together using a system like ChemSpider (I should think a PubChem and PubMed combination would do the same). You would be able to interrogate structures, articles, properties and even spectra (scroll to the bottom of http://www.chemspider.com/RecordView.aspx?id=5557 to see an example of spectra…more examples will show up shortly).

I am VERY interested in working with someone, hopefully from this list, to potentially develop a lesson plan that could be posted on ChemSpider for others to use as a skeleton to build on. If anyone is interested in doing this please contact me directly. Thanks
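The ChemRefer links in the posting above follow a simple query-string pattern, so a lesson plan could hand students pre-built search links. The sketch below is a minimal illustration, assuming the `zoom_query` and `zoom_and` parameter names copied from the dithiazole example link stay the same for other queries; the helper name `chemrefer_search_url` is hypothetical, the meaning of any `zoom_and` value other than 0 is unverified, and the service address may no longer be live.

```python
from urllib.parse import urlencode

# Base address of the ChemRefer search page, taken from the CHMINF posting.
# It may no longer resolve; it is used here only to illustrate the pattern.
CHEMREFER_URL = "http://www.chemspider.com/ChemRefer.aspx"


def chemrefer_search_url(query: str, zoom_and: int = 0) -> str:
    """Build a ChemRefer results URL for a free-text query (hypothetical helper).

    The parameter names zoom_query and zoom_and are copied from the
    dithiazole example link; zoom_and=0 is the only value seen there, so
    the meaning of any other value is an unverified assumption.
    """
    # urlencode percent-escapes special characters and turns spaces into '+'
    params = {"zoom_query": query, "zoom_and": zoom_and}
    return f"{CHEMREFER_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Reproduces the dithiazole example link from the posting.
    print(chemrefer_search_url("dithiazole"))
    # A multi-word query: the space is encoded as '+'.
    print(chemrefer_search_url("dithiazole synthesis"))
```

Using `urlencode` rather than string concatenation keeps names containing commas or spaces (common in systematic chemical names) safe to paste into a link.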

You’ll likely have noticed me talking recently about books I read on this blog. My friends and colleagues call me a connector..read Malcolm Gladwell’s The Tipping Point for what that means. I generally get “connected” to books by other connectors. My most recent read, and one directly connected to the future direction of ChemSpider, is “Wikinomics - How Mass Collaboration Changes Everything”.

For those of you who have not managed to keep up with the blog postings, I want to reiterate part of the future mission for ChemSpider. Our intention is to Wiki-enable ChemSpider and allow people to add information to each and every chemical structure in the database. We have already enabled curation of the data on ChemSpider, as blogged previously (1,2,3). This is one level of community participation…check the results out here. Please keep curating..

What I like about the book Wikinomics is the historical overview (of a VERY SHORT history) of how mass collaboration has affected the way we share information, how collaboration has impacted the world of software development (and how corporations are benefiting from the effort) and how technology in the time of Web 2.0 allows news to spread quickly and help people. This collaboration has brought many benefits to research, pharmaceuticals and even the world of mining, a fascinating story. I preferred the first half of the book; it did get a little preachy and repetitive, but it is nevertheless a great read. By the way…one of the authors is Anthony D. Williams….no relation to the author of this blog…that would be Antony J. Williams. But hey, if people want to talk about me as a supporter of the world of Wiki then go for it…

At present the registration and structure deposition systems on ChemSpider are being completed and, fingers crossed, you’ll see them very shortly. The question then is what mass collaboration could mean in terms of extending the information associated with the chemical structures in the database. Imagine the addition of reaction details, images, connections to other websites, etc. Just imagine…and we’ll see if we can make it happen.

I get a lot of “stuff” sent to me in a day. Other than the usual >150 work emails, the inbox is peppered with absurd photos, chain letters, a little spam now that the Bayesian filter is trained and, once in a while, something that is truly visionary in nature.

I was blown away by this example of the potential of the semantic web for knitting together images using two technologies - Seadragon and Photosynth. Watch this presentation..take the 7 minutes. Shoot me for saying it…I know many who criticize Microsoft at every opportunity in favor of Open Source, but I’m one of those who believe that Microsoft and Open Source can absolutely co-exist. When I look at what came out of Microsoft Live Labs with Photosynth, all I can say is a big Hoo-haa….this is great stuff! Check out the blog too…keep watching…

Ah…the cathartic nature of being back on the blog…family vacations and work travel are very distracting….

I’ve blogged previously about the question “What is Web 2.0?” In the list of what it takes to be Web 2.0 according to Wikipedia, I noted that one of the criteria is “A rich, interactive, user-friendly interface based on Ajax or similar frameworks.”

If you’ve been using ChemSpider in the past couple of days you will have noticed an improvement in usability on the Search screen and the Services screen. Why? Ajax. With literally a couple of hours of work these screens were ajaxified (if the word doesn’t exist I’m using it in Scrabble and demanding it gets included in Webster’s!) and the flow of using the screens improved significantly as ChemSpider took on more of a “desktop feel”. It feels good to have made one more step towards delivering “Web 2.0 compatibility”. The real excitement, though, is around the social networking system now under development: when completed it will extend the curation aspects of the database and specifically allow users to add their own data into the system. This should be unveiled in its first form within the month..hopefully sooner.

Back to Ajax and a new feature, “ChemSpider Suggest”. For those of you who locate records by typing in a text string, we have noticed that spelling errors abound. Some of these are subtle…asprin instead of aspirin (phonetically correct, some would say), while others are dramatic differences mostly driven by linguistic differences….when your first language is not English all spellings are phonetic in nature…and the phonetic result is based on how you pronounce things in your own language. A tough situation to deal with. With ChemSpider Suggest you can start typing the first few letters of the word you are interested in searching for and it will give you a list of candidate terms, as shown below. Imagine not being sure how to spell prostaglandin or erythromycin…such a tool dramatically helps you find the right word to search and improves the chances of ChemSpider finding what you’re interested in. There are two parameters we can tune at present and we’d like your input: the number of letters to type before a suggestion shows up and the number of rows to suggest. Let us know your thoughts on the blog or directly at feedback@chemspider.com. Enjoy!

ChemSpider Suggest
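The two tunable parameters described for ChemSpider Suggest map naturally onto a prefix lookup. The sketch below is a minimal illustration, assuming a plain in-memory list of names kept sorted so that all entries sharing a prefix sit next to each other; it is not ChemSpider’s actual implementation, which the post does not describe, and the class name `PrefixSuggester` and its default values are hypothetical.

```python
import bisect
from typing import Iterable, List


class PrefixSuggester:
    """Hypothetical prefix-based name suggester (illustration only)."""

    def __init__(self, terms: Iterable[str], min_chars: int = 3, max_rows: int = 10):
        # Store (lowercased key, original name) pairs sorted by key, so every
        # name sharing a prefix forms one contiguous run found by binary search.
        self._entries = sorted((t.lower(), t) for t in set(terms))
        self._keys = [key for key, _ in self._entries]
        self.min_chars = min_chars  # letters to type before suggestions appear
        self.max_rows = max_rows    # maximum number of suggestions shown

    def suggest(self, typed: str) -> List[str]:
        prefix = typed.strip().lower()
        if len(prefix) < self.min_chars:
            return []  # too few letters typed so far: show nothing
        # First position whose key is >= prefix; matches start here, if any.
        i = bisect.bisect_left(self._keys, prefix)
        results: List[str] = []
        while i < len(self._entries) and len(results) < self.max_rows:
            key, name = self._entries[i]
            if not key.startswith(prefix):
                break  # left the run of matching names
            results.append(name)
            i += 1
        return results


if __name__ == "__main__":
    names = ["Aspirin", "Aspartame", "Asparagine", "Erythromycin",
             "Prostaglandin E2", "Prostaglandin F2alpha", "Prostacyclin"]
    suggester = PrefixSuggester(names, min_chars=3, max_rows=3)
    print(suggester.suggest("as"))    # []: below the 3-letter threshold
    print(suggester.suggest("asp"))   # ['Asparagine', 'Aspartame', 'Aspirin']
    print(suggester.suggest("pros"))  # ['Prostacyclin', 'Prostaglandin E2', 'Prostaglandin F2alpha']
```

Note that a prefix lookup only helps when the first letters are right: typing “asp” offers Aspirin, but the full misspelling “asprin” matches nothing, which is why suggestions appear after only a few letters.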

For those of you frequenting this blog, you will have seen a number of comments regarding the alleged failures of ChemSpider…many of these have been aimed at either inorganic compounds or organometallic complexes. Each of these issues has been addressed on this blog in detail.

What I am interested in hearing about is the other side of the coin. How are you using ChemSpider? How are you deriving value from it? Are you focused more on the searching aspects of ChemSpider or all of the services we provide? Is the speed of searching sufficient for your needs? Do you prefer to use the Browser add-ins, the ChemSketch integration or the structure drawing applet on the site? Are you curating data….if not, why not?

ChemSpider presently has >600 people per day on average using the site, and this is growing. Judging by the successes we see in the transaction log shown below (one page of MANY), users are getting value. So let us, and others, know whether ChemSpider is living up to your expectations! If not, what can we do to improve it?

In this post I am going to excerpt from another blog (the excerpts are in bold) regarding ChemSpider and its supposed non-Web 2.0 status. As my previous post noted, this is the way of the blogosphere, and pages from the ChemSpider blog are being excerpted there in the same way.

The question I posed to readers of the ChemSpider blog was whether or not the curation of data should be supported by the community. Whatever the answer, the fact is that curation is already underway and continues. Here I share comments posted elsewhere, with parts of the material extracted for discussion.

To the question “Should the curation of data on ChemSpider be supported by the community?” the comments made were

“…only if the community has time on its hands and wants to donate significant goods to commercial organisation(s) who will then own and control the content. (People already do this, of course - they are called scientists and as authors they donate their goods to commercial publishers. ) Put simply Chemspider is Web 1.0; The chemical blogosphere, Pubchem, Blue Obelisk, CrystalEye is Web 2.0. Chemspider’s business model was fine for the early web. No public content, significant effort to extract it, few alternative sites.”

So, some comments.

Yes, we scientists do donate our goods to commercial publishers. Over the past 12 months I’ve been author/co-author of almost a dozen peer-reviewed publications in some of the top journals in the world from some of the top publishers (ACS, Wiley and Elsevier, for example). Some of the review processes have been slower than hoped, and I do take issue with situations where editors receive two “Publish as is” reviews and hold a paper up for months for one reviewer who comments “It’s too long.” The articles, when published, have been exposed to many people and resulted in follow-up from many scientists. I like the results and feel that the publishers do a stellar job of creating quality output with a generally seamless process. I’m not going to comment on profit margins for the publishers…you can find those rants elsewhere. By contrast, the publications we have submitted to Open Access journals have produced no interest..yet the work was of similar caliber. The time of Open Access journal exposure is here, though, and I judge there will be increasing interest. I believe ChemSpider will help with this and will explain why in a later post.

Web 2.0. I’ve asked people what it is and they generally all point to “community web”. Asked for examples, they talk about reviews on Amazon, voting on eBay, Flickr, YouTube, blogs, Wikipedia and so on. I’m sure you can add a few of your own “Web 2.0 definitions”. The general feeling is that Web 2.0 is about building community.

From the comments above about “The chemical blogosphere, Pubchem, Blue Obelisk, CrystalEye is Web 2.0” I have to assume that the intent here is to identify Web 2.0 as being connected to Open Source, downloadable content and integration.

With MySpace, YouTube and Flickr as the poster children of Web 2.0, I’m not sure how this matches up with this intent. Certainly these sites are big business. They are not Open Source to the best of my knowledge. Downloadable content…I don’t think it’s possible to download their databases. But these sites are major contributors to community building on Web 2.0.

I turned to Wikipedia for a more formal definition, extracted below. Wikipedia gives the definition of Web 2.0 as:

1) The transition of web sites from isolated information silos to sources of content and functionality, thus becoming computing platforms serving web applications to end-users
2) A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use, and “th