Archive for January, 2008

Martin Walker from Wikipedia Chemistry has posted a request on CHMINF for feedback regarding how Wikipedia might be used by chemists. I’ve posted it here for those of you who don’t read CHMINF but might be interested in providing feedback. Feedback on these types of questions is very important right now, as decisions are being made about which ChemBox fields to expose or hide. Please give your feedback if you are interested.

“Thanks to work by Antony Williams, we (the chemists on Wikipedia) are currently validating the structural data on Wikipedia, and we are discussing the best way to present the information such as SMILES, InChIs, InChIKeys. To that end, I’d like to ask group members who use Wikipedia to reply to me (no need to clutter up the listserve) with their thoughts on the following:

1. Do you ever search Wikipedia, or the Internet in general, for a structure using SMILES, InChI or InChIKey?
2. Do you ever copy/paste such identifiers FROM Wikipedia into Google, etc, in order to do a search?
(I am well aware of the reliability question with Wikipedia, but let’s not open up Pandora’s box with that issue!)

We mainly want to find out if people need to SEE such identifiers in the article - bearing in mind they are designed for machines. We could hide them so a machine would see them but a casual reader would not, or “semi-hide” them (reader clicks to see). We could also place them on data pages such as this one:
http://en.wikipedia.org/wiki/Methanol_%28data_page%29

We are discussing this issue in our next IRC meeting - please join us if this is of special interest to you.
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemistry/IRC_discussions

Thanks for your time,

Martin A. Walker
Department of Chemistry
SUNY College at Potsdam
Potsdam, NY 13676 USA
+1 (315) 267-2271
walkerma@potsdam.edu


There is a new contributor to the blogosphere…SimBioSys. I recommend adding the blog to your Google Reader. There are some very exciting things going on there right now. I have commented previously about how high-performance computing engines such as the Cell Broadband Engine are being brought to bear on scientific problems. SimBioSys appears to be the only group that has chosen to port its virtual high-throughput screening and docking solution to the Cell processor. Their white paper makes for an interesting read.

In their most recent post, “Roping in your next scaffold hop with LASSO”, they talked about their LASSO publication: “LASSO - ligand activity by surface similarity order: a new tool for ligand based virtual screening”. We are presently in the middle of a very exciting project regarding LASSO. We have teamed up to provide the virtual screening results for 40 target families on the full ChemSpider Library, currently containing over 18 million molecules. Using the LASSO similarity search tool, SimBioSys has screened the ChemSpider database against all 40 target families from the Database of Useful Decoys (DUD) dataset.

LASSO descriptors (Ligand Activity by Surface Similarity Order) contain a count of the different Interacting Surface Point Types (ISPT) found on a molecule. LASSO descriptors use 23 different surface point types, ranging from hydrogen bond donors/acceptors, to hydrophobic sites, to pi stacking interactions. Figure 1 shows a “histidine-like” fragment of a molecule. The triangles are the surface point types of this fragment, colored by type. Based on the idea that ligands must have surface properties compatible with the target site in order to bind, LASSO uses a descriptor of Interacting Surface Point Types (ISPT) to find molecules with diverse chemical scaffolds but similar surface properties.

lasso1.png
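
As a rough illustration of a count-based descriptor, here is a minimal Python sketch. It is not the LASSO implementation: the four type labels are a hypothetical subset of the 23 surface point types, and the similarity function (a generalized Tanimoto score on counts) is a stand-in chosen for illustration, since the actual LASSO scoring method is not described here.

```python
from collections import Counter

# Hypothetical subset of surface point type labels (LASSO itself uses 23 types).
ISPT_TYPES = ["hbond_donor", "hbond_acceptor", "hydrophobic", "pi_stacking"]

def descriptor(surface_points):
    """Count how many surface points of each type a molecule carries."""
    counts = Counter(surface_points)
    return [counts.get(t, 0) for t in ISPT_TYPES]

def count_similarity(a, b):
    """Stand-in similarity on count vectors: sum of minima / sum of maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0

# Two made-up molecules with different scaffolds but overlapping surface types.
mol_1 = descriptor(["hbond_donor", "hbond_acceptor", "pi_stacking", "pi_stacking"])
mol_2 = descriptor(["hbond_donor", "hbond_acceptor", "hydrophobic", "pi_stacking"])
print(mol_1, mol_2, count_similarity(mol_1, mol_2))
```

Because the descriptor records only counts of surface point types and not the underlying connectivity, two molecules with quite different scaffolds can still score as similar.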

We are presently populating the ChemSpider database with tens of millions of LASSO descriptors, which will allow screening of the ChemSpider database to:

- Find molecules which have a higher likelihood of binding to targets.
- Find molecules with better selectivity for a target.
- Reduce toxicity issues.

The 40 target receptor families included in the screening results were chosen to cover a wide range of receptor classes of interest in drug discovery. Each target family had tens to hundreds of known active molecules, which were used as the basis for the query files used by LASSO, one query for each family. The similarity screening was performed on the full ChemSpider database across all 40 targets, and the similarity scores for each structure/target pair are available via the ChemSpider website. Thus, for each structure in the ChemSpider database, you can find its similarity score (based on surface properties) relative to the actives of each of the 40 target receptors. In addition to allowing instant ranking results for a particular target of interest (retrieving molecules that are likely to be active for a receptor), this matrix of screening results can be used to find molecules that have predicted affinity for one target but low predicted affinity for all other targets. Performing such searches promises to improve selectivity and can be a guide to reducing toxicity concerns. More detail about this collaborative project will be forthcoming, but the overview is provided here.
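
The selectivity search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the molecule IDs, the scores and the two thresholds (`min_on`, `max_off`) are invented for the example, and the score matrix is held as a plain dictionary rather than anything resembling the ChemSpider database.

```python
def selective_hits(scores, target, min_on=0.8, max_off=0.3):
    """Return IDs scoring >= min_on for `target` and <= max_off for every other target."""
    hits = []
    for mol_id, row in scores.items():
        on = row[target]
        # Highest score against any target other than the one of interest.
        off = max((s for t, s in row.items() if t != target), default=0.0)
        if on >= min_on and off <= max_off:
            hits.append((mol_id, on))
    # Rank the survivors by their score for the target of interest.
    hits.sort(key=lambda h: h[1], reverse=True)
    return [mol_id for mol_id, _ in hits]

# Made-up similarity scores: molecule ID -> {target family: score}.
scores = {
    "mol_a": {"ace": 0.91, "cdk2": 0.12, "egfr": 0.20},
    "mol_b": {"ace": 0.95, "cdk2": 0.85, "egfr": 0.10},  # active-like, not selective
    "mol_c": {"ace": 0.40, "cdk2": 0.10, "egfr": 0.15},  # low score for the target
    "mol_d": {"ace": 0.85, "cdk2": 0.05, "egfr": 0.25},
}
print(selective_hits(scores, "ace"))
```

Here `mol_b` is dropped even though it has the highest score for the chosen target, because it also scores highly against a second target.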

Watch this space for updates and an unveiling date.


I previously blogged about the fact that we had embedded a 3D optimizer under Jmol so that 3D molecules could be displayed. There were two problems with the approach we took: 1) waiting for real-time 3D optimization of molecules was very time-consuming, and 2) the 3D optimizer would sometimes fail to optimize a structure, depending on the starting geometry.

3dimage.png

We have just finished publishing millions of pre-optimized structures onto the ChemSpider database. In MOST, but not all, cases the molecules are now pre-optimized, which makes display of the 3D molecule in Jmol much faster. Since we were optimizing millions of molecules, we set a threshold for the time within which each molecule should reach a minimum. As a result, some molecules were not optimized and their 3D coordinates are still not available, so a real-time optimization is attempted using smi23d, as discussed previously. If you find any structures which don’t optimize, please send the ChemSpiderID and a comment to feedback|at|chemspider|DOT|com.
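
The lookup-then-fallback behaviour can be pictured with a short Python sketch. All names here are hypothetical: `precomputed` stands in for the stored coordinates, and `stub_optimizer` stands in for the on-demand optimizer call; nothing below reflects the actual ChemSpider code.

```python
def get_3d_coordinates(cs_id, precomputed, optimize_now):
    """Prefer stored coordinates; fall back to a real-time optimization."""
    coords = precomputed.get(cs_id)
    if coords is not None:
        return coords, "pre-optimized"
    # No stored geometry (e.g. the batch run hit its time threshold),
    # so attempt the optimization on demand.
    return optimize_now(cs_id), "real-time"

# Made-up store: ChemSpider ID -> list of (x, y, z) atom positions.
precomputed = {101: [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]}

def stub_optimizer(cs_id):
    # Stand-in for an external optimizer call; returns fixed coordinates.
    return [(0.0, 0.0, 0.0)]

print(get_3d_coordinates(101, precomputed, stub_optimizer)[1])
print(get_3d_coordinates(202, precomputed, stub_optimizer)[1])
```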


We have made significant advances in the structure deposition system on ChemSpider. We’ve reported on our advances previously and are working hard to polish the system. In parallel, we’ve done work to support deposition of batches of structures (hundreds to many thousands) as well as the deposition of CSV files to support Open Notebook Science. We are going to roll out deposition in phases - single deposition first, batch deposition next and then CSV file based batch deposition.

So…why are we encouraging the deposition of structures onto ChemSpider? We agree that we could accept RSS feeds (and we will). Our view is that people might want to have “bragging rights” on their latest synthesis, might want to expose their latest paper on ChemSpider, or might have a link to an online article that they want to expose to people. While there are MILLIONS of structures online, there is new chemistry reported every day. What other system is available as a structure-based community for chemists where people can deposit their structures, stories, links and comments to share with others? (And open up a conversation with others about synthesis, analysis etc.) Think of it as a little like Flickr or YouTube for chemical structures. Anyone can post their structures for people to browse.

I’ve been doing some example depositions to show what’s feasible…these are simple to do…a few minutes work maximum.

1) I was a co-author of a publication and received a copy today. I wanted to put a link to the paper and associate it with the structure we analyzed. The structure already existed on the database, so this was information to be added to the existing structure. Scroll down to the end of the page for this record to see this Supplemental Information.

Martin, G.E., Hilton, B.D., Blinov, K.A. and Williams, A.J. “Using indirect covariance spectra to identify artifact responses in unsymmetrical indirect covariance calculated spectra”, Magnetic Resonance in Chemistry

[DOI: 10.1002/mrc.2141]

2) A new publication was released this week regarding a new compound, Quesnoin. David Bradley blogged about it on Spinneret. In this case I wanted to add the structure, information about the structure, and a link to the recently published article. Scroll to the bottom of this record.

There are many other examples online too here (1,2,3). Look at the Supplemental Information in each case.

There are some final tweaks being made at present but single deposition is now rolled out. We are looking for people NOW to start using the system so please ping me. An overview of the system is available here.

deposition_workflow1.png

The future will include users creating their own “catalogs” of structures, “social networking” and discussions around structures, team-based discussions, public and private structure collections and so on. It’s coming…in stages. We start here with the single deposition process.


A few months ago we rolled out the ability to post analytical data onto ChemSpider. The deposition process at this point appears to be seamless. We have had no bugs or failures reported during the depositions of the last 80 spectra. We have had an initial deposition from a publisher, as discussed here, and believe that ChemSpider offers many other potential contributors an opportunity to expose their data to the public. There was an early perception that depositors were transferring copyright of the data to us, but that is not the case. We enabled the facility for users to declare their data as Open Data for others to download - some depositors declare it Open and some don’t - it’s their choice.

I encourage all users to consider the deposition of analytical data to ChemSpider. Instrument vendors in particular might wish to expose data from their latest and greatest instruments (new NMR probes, new processing algorithms etc).

We will soon also open the ability to deposit images and CIF crystal structure files. We will use Jmol to display the CIF file. Image deposition will allow us to support 2D spectral data (since JSpecView does not support them yet) as well as photographs of crystals, surfaces etc.

We welcome any further suggestions for online exposure of data.


A short (and already outdated) article on ChemSpider has been published in IUPAC’s Chemistry International magazine. This is the first of many articles referring to ChemSpider, and a number of other review articles will mention our efforts in the next few months.


Frequent visitors to the blog will know that we are advocates of InChIStrings, InChIKeys and the benefits of this standard. Recently I posted on the settings we use for InChI when generating them…see here. Today IUPAC issued a comment regarding their suggested standard options. They have requested comments, so I point you to their request here. Please provide them with your unbiased feedback. We expect to follow the standard definitions resulting from the discussion and poll.


I have recently posted on the ChemConnector Blog about work I have initiated to curate the chemical structure records on Wikipedia. Overall, my view is that the information on Wikipedia is excellent, but there are some issues to be aware of. If you are interested in this work, visit the post.


Don’t be surprised if, when you visit ChemSpider in the next couple of days, you are greeted with a message saying that ChemSpider is under maintenance.

As many of you will know, we integrated with the SureChem online chemistry patent search a few months ago. We are now upgrading the system. We learned a lot during our proof of concept integration and are presently stripping all 8 million structures associated with SureChem from the database and depositing the LATEST files from SureChem, almost 9 million chemical structures. If you look at the SureChem homepage you will see the exact number…as of today it says: 8,995,224 unique structures. When we did the initial integration we used SMILES strings to connect, but due to the need to convert to SMILES on the fly and the heterogeneity of SMILES converters it was far from perfect. So, this time around we will be hooking together using InChIKeys. This will make direct connections far easier. We expect the entire process of stripping the old records, depositing the new structures and deduplication to take a few days, and you might get timeouts on your searches because of the load on the database. We simply don’t have the system resources offline to do this and will be working directly on the live servers running the system. We hope to get some form of support eventually for ChemSpider, but for now we have no choice other than to do what we can in parallel with running the service. Add to this the >1 million additional structures to be deposited from chemical vendors and the system will be under a bit of stress for a while.
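
To show why a fixed-format key makes the linking step simpler than matching SMILES strings, here is a minimal Python sketch of joining two record sets on InChIKey. The record layout and the record IDs are invented for the example, and this is an assumption about the general approach rather than a description of the actual integration code.

```python
def link_by_inchikey(chemspider_records, surechem_records):
    """Pair records from the two sources that share an InChIKey."""
    by_key = {}
    for rec in chemspider_records:
        # Deduplicate: the first record seen for a given key wins.
        by_key.setdefault(rec["inchikey"], rec["id"])
    links = []
    for rec in surechem_records:
        cs_id = by_key.get(rec["inchikey"])
        if cs_id is not None:
            links.append((cs_id, rec["id"]))
    return links

# Made-up record IDs; each key is compared as an exact string.
chemspider_records = [
    {"id": "CS-1", "inchikey": "OKKJLVBELUTLKV-UHFFFAOYSA-N"},  # methanol
    {"id": "CS-2", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},  # ethanol
    {"id": "CS-3", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},  # duplicate of CS-2
]
surechem_records = [
    {"id": "SC-9", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},  # ethanol
    {"id": "SC-7", "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N"},  # benzene, no match
]
print(link_by_inchikey(chemspider_records, surechem_records))
```

The same molecule can be written as many different SMILES strings depending on the converter, whereas the key is compared as an exact string, so the join reduces to a dictionary lookup.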

We apologize in advance for the poor system performance as we perform this major upgrade, but what is waiting on the other side of this work is a far better integration with SureChem.


I’ve reported previously on how Microsoft have started to use the ChemSpider web services and have hooked them into InfoMesa. Sam has used our services to add a search capability that returns the InChI, InChIKey, SMILES, and a URL link to ChemSpider based on a search of a trade name, SMILES, etc. He’s also using the ability to return the chemical structure image. I’ll let the image on his blog tell the story more fully.


For the past 8 months I’ve been running two blogs associated with ChemSpider…the ChemSpider blog and the ChemSpider News blog. These had different intents - the ChemSpider blog (this one) was to cover the general directions, vision, activities, challenges and community conversations about the ChemSpider service. The ChemSpider News blog was to cover details of new functionality as it was rolled out. What has happened is that the postings on ChemSpider News regarding new functionality have been missed by the readers of this blog, because the readerships are different (with different Feedburner subscriptions in place). This has not been good for ChemSpider users, since they have not been getting the latest and greatest news about new functionality. It’s a great shame to deliver new functionality and not have people use it for weeks on end because they don’t know about it.

What I’ve also been posting on the ChemSpider blog are my own personal views regarding a whole variety of issues…none of them ChemSpider related. For example, my views on vaccines, on fluoride, on drugs, and on various research undertakings. John Doe took me to task in a recent response on the ChemSpider blog. While I vehemently disagree with his comment regarding the reasons behind my posting, I have decided to separate news regarding ChemSpider and its mission, direction and new functionality from my own personal commentaries, pleas, musings and, once in a while, thigh-slapping fun exchanges (especially the ones about erectile dysfunction drugs and how the drug industry can learn from the golfing industry).

So, the ChemSpider News site will remain online for past reference but will be discontinued, and all new functionality comments will be posted to the ChemSpider blog. The ChemSpider blog will become just that…all about ChemSpider. Once in a while I’ll put a list of what’s been discussed on ChemConnector onto this blog in case any of the readers want to click over for a browse of a particular post.

chemconnector-logo.png

The ChemConnector blog will be a communication vehicle to accompany the website www.chemconnector.com, to go online shortly. As many of you know me from this site, you will likely be aware that I took a sabbatical after 10 years working at a company and have taken a few months off. For the time being my intention is to spend some time consulting for groups and organizations interested in utilizing my skills, and growing the business of ChemConnector - Connecting Scientists to Problem Solvers. This is, of course, in parallel with the ongoing work on ChemSpider. More on this will come shortly….
