I am posting this in order to help one of my “neighbors”, IUPAC in Research Triangle Park. Their office is about 30 minutes from where I live. This is a beautiful area of the world and I encourage people to contact the Secretariat directly should you have an interest in this role.

Post-doctoral Position in Chemistry Informatics

Develop, implement, and support web based applications to enable IUPAC Staff and Committee members to work more effectively. The emphasis will be on development of tools for communication and collaboration to allow scientists working on IUPAC projects to accomplish their project goals while minimizing the need for travel. This will build on the new architecture of the IUPAC web site that uses XML technology to organize the information used by IUPAC members as well as the general scientific public. In addition, methods will be developed to organize and present IUPAC’s information, now contained in books and journal articles, to make it more accessible and more useful.

This position is located at the IUPAC Secretariat in Research Triangle Park, North Carolina, USA and will require considerable travel.

Required background: PhD or equivalent in Chemistry or a related discipline so as to combine a reasonable chemical knowledge with computing expertise; experience with SQL databases and XML coding; excellent written English and the ability to deal with multiple projects simultaneously.

Salary and benefits are competitive and will depend on experience and qualifications.

IUPAC was formed in 1919 by chemists from industry and academia. For almost nine decades, the Union has succeeded in fostering worldwide communications in the chemical sciences and in uniting academic, industrial and public sector chemistry in a common language. IUPAC is recognized as the world authority on chemical nomenclature, terminology, standardized methods for measurement, atomic weights and many other critically evaluated data. In more recent years, IUPAC has been pro-active in establishing a wide range of conferences and projects designed to promote and stimulate modern developments in chemistry, and also to assist in aspects of chemical education and the public understanding of chemistry.

More information about IUPAC and its activities is available at <www.iupac.org>.

Contact:

John W. Jost, Executive Director

IUPAC Secretariat

P.O. Box 13757

Research Triangle Park, NC 27709-3757, USA

E-mail: secretariatATiupacDOTorg

I had commented recently on my pleasant experiences of working with MDPI regarding Molbank articles. See the posts here and here. Since Peter Murray-Rust and I had both blogged on this issue (from different points of view), Dietrich Rordorf from MDPI went out of his way to make us both aware, via email, of a recent publication they had posted on their site:

“Just for your information and in reply to the blog posts regarding use of Creative Commons By Attribution License v3.0: We recently published an editorial “Changes Coming to MDPI Journals: Digital Object Identifier (DOI) and Creative Commons Attribution License” at http://www.mdpi.org/molecules/papers/13051079.pdf”

The paper is entitled “Changes Coming to MDPI Journals: Digital Object Identifier (DOI) and Creative Commons Attribution License” and speaks for itself. I recommend that interested parties read the entire paper and commend MDPI on their decisions. EXCELLENT news.

Last week I had a pleasant chat with a reporter from Nature magazine, Mr Geoff Brumfiel. Geoff was interested in ChemSpider…what it was, how it ran, who used it, who supported it, who liked it, who curated it, who didn’t like it and so on.

The results of that discussion, and others he spoke to about ChemSpider, are here in his article.

Chemists spin a web of data p139
Chemspider website provides free information on millions of molecules.
Geoff Brumfiel
doi:10.1038/453139a

It is a rule at Nature, at least for this type of article, that I could not see the article before it went to press and therefore I didn’t get the chance to proofread and comment. Geoff has accurately captured the spirit of our discussions but a few detailed clarifications are needed too. I have pasted in black the article content and in italics the clarification.

providing the community with an open-access source of chemical information

I giggled and commented please don’t say it’s Open Access. Say it’s Free Access. Say there are Open Data. And now we have Creative Commons licenses. But don’t say it’s Open Access, not Strong, not weak, not gold, not green. Just Free Access. No price barriers to usage.

Chemist Antony Williams is hoping to change this in a move likely to ruffle the feathers of the American Chemical Society.

I commented that we are not purposely in competition with anyone. It’s not what drives us to do this. Whether others see us as competitive is for them to decide, not us. We don’t intentionally try to ruffle feathers. It doesn’t mean that what we are doing won’t ruffle feathers of course, whether it’s ACS or others. It’s not the goal…it might be an outcome.

The modest project has made chemists interested in open access take notice — last week, the number of daily users of the site surpassed 5,000.

We have crossed 5500 users for the past two nights. The trend is positive.

“Other potential sources of information, such as Wikipedia, lack the algorithms needed to search chemicals according to their structure.”

Structure searching is “feasible” of course with InChI Strings. But substructure searching isn’t, and Wikipedia is treated as a text-based search by almost all of its users.

“The site is maintained with modest profits from advertising and the work of about 30 active volunteers who double-check the data pulled in from outside.”

The original investment in hardware and software costs has finally been recouped. Modest profits? No one gets paid for the work we do. There is a phenomenal sweat equity investment in the platform numbering many thousands of hours to get here. We are indebted to the many software collaborators, providers of tools and the people curating and depositing to the system. There have BEEN about 30 active volunteers. Right now I would say the number of active depositors and curators is around 10. But it is growing. I hadn’t checked the number of REGISTERED users for a long time. We have over 1150 registered users…those who CAN log in and curate data, deposit data, see new features etc. People do NOT have to register to use the site…but >1150 did. Wow. I didn’t know it was that many until I just checked (BIG SMILE)

““There’s an awful lot of chemical information, but there’s an awful lot of rubbish as well,” says Barrie Walker, a retired industrial chemist in Yorkshire, UK, who helps maintain the site.”

Don’t know whether Barrie said this or not. He IS an honest guy and he is our QUALITY GURU and we are proud that he is willing to give us his fine eyes. There IS garbage on the site still. But, after a year online and active curating, it has been much reduced. About 200 edits a day are made to the site: names changed/deleted/added, spectra/structures/URLs/publications added etc. It’s quite the pace. We have cleaned up 100s of thousands of incorrect associations from the external data sources. It’s been and will remain an enormous task with an enormous payback for the community.

Williams adds that the site still has problems with certain searches. For example, it struggles to distinguish between isomers: molecules with the same chemical formula arranged in different structures.

We can distinguish isomers no problem. The PROBLEM is that there is a mixture of isomeric species submitted from multiple data sources, and data are mixed and intermingled in a way that the user cannot get to the correct structure. Search taxol or Ginkgolide on the ChemSpider blog and read the multiple blog posts about this. We can of course search all isomers for a particular chemical formula…

“But Williams nevertheless believes that the service may be able to compete with for-profit services. “What I’m doing is highly disruptive,” he says. “I think it can be done and it needs to be done.””

I think what WE are doing…it’s not me, it’s we…is disruptive. In a good way. Many chemists will benefit. Will it have an impact on for-profit services? Yes, maybe. As an outcome but not as the target. Our team of people, both internal to ChemSpider’s development and Advisory Group, and the people we don’t even know who are cleaning and depositing into the system for their colleagues in the community, are creating a powerful resource for Chemists. The FOCUS of this effort is to Build a Structure Centric Community for Chemists. We will change that soon…the focus on Structure-Centric will be to cover Chemistry in general and to Build a Community for Chemists.

We are well on our way, and thanks to Nature, and Geoff in particular, for exposing it. My comments above are not meant to detract from Geoff’s reporting abilities, but it was a long discussion and some clarification statements are of value, I believe.

Peter Murray-Rust responded to my recent comments about a Free Lunch. There are a number of comments to be made and an exciting opportunity to use Open Data and linking from ChemSpider.

I’d asked the question about how many records there were on CrystalEye. In our world a unique record is a unique InChI, not so on CrystalEye and appropriately so as the crystal structure itself is presumably the unique record. Makes sense.

PMR> We don’t know how many unique structures there are. I’m guessing that there are about 130,000+ entries but that many are duplicates. We (or rather Nick) does a good job on disambiguating by cell dimensions but this is not foolproof and indeed no method is.

What we will do with multiple crystal structures for a single chemical structure is link all unique crystal structures from the unique chemical structure. In this way people can query the chemical structure and find all associated analytical data: spectra and crystallographic files. If we were to list the number of unique depositions on ChemSpider I think we would be around 40 million depositions…an estimate though!
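As a rough illustration of the linking idea above, here is a minimal Python sketch that groups crystal-structure URLs under the InChI of the chemical structure they belong to. The function name, the input format (pairs of InChI string and URL) and the `example.org` URLs are all assumptions made for illustration; this is not ChemSpider's or CrystalEye's actual code or data model.

```python
from collections import defaultdict


def link_crystal_structures(entries):
    """Group crystal-structure URLs by the InChI of the chemical structure.

    `entries` is an iterable of (inchi, url) pairs (a hypothetical input
    format). The same URL is only recorded once per InChI.
    """
    links = defaultdict(list)
    for inchi, url in entries:
        if url not in links[inchi]:
            links[inchi].append(url)
    return dict(links)


# Hypothetical depositions: two distinct crystal structures of one compound,
# one of another, and a verbatim duplicate that should be ignored.
entries = [
    ("InChI=1S/CH4/h1H4", "https://example.org/crystal/0001"),
    ("InChI=1S/CH4/h1H4", "https://example.org/crystal/0002"),
    ("InChI=1S/H2O/h1H2", "https://example.org/crystal/0003"),
    ("InChI=1S/CH4/h1H4", "https://example.org/crystal/0001"),
]
links = link_crystal_structures(entries)
```

Keying on the InChI means one chemical record can point at any number of crystal structures, so the count of "unique records" and the count of "depositions" can differ.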

PMR> The main duplication comes from the Crystallography Open Database which has about 45,000 structures.

I looked at the Crystallography Open Database this morning. It states on the home page “Updated daily: 68268 entries in the COD”. We may have an opportunity with the COD to link up to their data and reduce the need for us to host CIFs. Excellent…we’re all for reducing workload and providing links into other systems. It’s what we do.

PMR> The only thing stopping us putting them (AJW> The structures from CrystalEye) in Pubchem, or anywhere, is work. We need to make sure that we have data integrity and referential integrity. We’re going to do it, but at present Nick is writing his thesis. We have some limited funding earmarked for this and hope to start it soon. When it’s finished it will be in RDF/CML.

This is great news. This means that after the summer we can download the data directly via PubChem and link up to CrystalEye that way. Perfect. We’ll stop working on integrating to CrystalEye now, wait for the integration path via PubChem, and focus on other data sources. Thank you Peter, Nick, Andrew and Jim!!! That said, I don’t believe that PubChem will take CML; they will convert it using their tools to produce their compatible formats, InChI being one of them. That will break organometallics etc. UNLESS PubChem are going to adopt CML now, and that would be an interesting positive shift in terms of a sign of support for the format. A strong positive. I’ll chat with the PubChem team so that if CML is coming we can consider adopting it in some way and be ready.

From my post “AJW> It would be good to see CML be a standard. I’ve been following it for a decade and when it gets accepted by a larger majority then we might adopt it.”

PMR: Chicken and egg… :-) You won’t adopt it until other people adopt it and they won’t adopt it till you do. But we make progress. It’s now mainstream in part of Accelrys software (funded by DTI). It’s being put into compchem codes by the COST project, and it’s really the only choice for datuments (combined data and documents) as in semantic publishing and the results of test-mining.

It’s nice to know that ChemSpider has that type of influence now. It’s good to see it going into Accelrys’ software and I had heard that from Dan’s blog and had added the CML Blog to my reader. I’m definitely watching and willing to follow. We’re busy leading so many other things right now we’ll wait for adoption and then jump on it like a “hobo on a muffin”.

One of the blogs I really enjoy reading is Deepak Singh’s Business,Bytes,Genes and Molecules. Today there was a blog post about ChemSpider but something strange happened…I could ONLY read it in Google Reader. When I tried to navigate to the actual website it asked me to Save a file. See below.

It may be harmless but I’ve suffered enough at the hands of “bad files” to not grab it. Anyone else seeing this symptom? It’s in both browsers (IE and FF) and on two computers.

Anyhow, thankfully I can read it in Google Reader. There’s a point Deepak raises and I insert it here..

“On the web, data should be available as an addressable resource. The fact that data is available as RDF is great (and I wish more data was available as such). However, my personal preference is that data, especially open data, needs to be accompanied by APIs and bindings that allow the data to be accessed in a number of formats (not a dump per se). I think over time the acceptable formats will be established, much like XML/JSON/RSS have become the standard transport formats. The key aspect here are the business models. Is the business in providing a service on top of the data? For example for more than X number of API calls, there could be a fee associated.”

Just in case people have missed them we have a whole series of Web Services available already and they are being used. You can find details about them here:

Mass Spec Web Services

Taverna Hooks to ChemSpider Web Services for Metabolomics

Web Services Demo Pages and Example Code

Microsoft Hook Web Services into Infomesa

Waters Deliver Integration Via Web Services

There are more examples. We have thousands of calls a day using the Web Services at present and welcome more feedback on them!

Jean-Claude Bradley was “asked by the Institute for the Future to highlight a dozen “Signals” that may point to new trends in science as part of the X2 Project”. He has listed his selections on his blog posting and people are encouraged to vote. JC mentioned ChemSpider twice and I am honored and humbled that he feels our efforts deserve recognition.

JC has recognized our efforts in depositing analytical data on ChemSpider and our web services to generate InChIStrings and InChIKeys.

In a recent post about ChemSpider we’ve been accused of wanting a Free Lunch. I copy a segment of the post and comment with insertions.

“Data are normally produced for a particular purpose and to reuse them for another costs money. I’ll exemplify this by taking CrystalEye data - about 120,000 crystal structures and 1 million molecular fragments - which were aggregated, transformed and validated by Nick Day as part of his thesis. (BTW Nick is writing up - it’s a tribute to his work that CrystalEye runs without attention for months on end).

AJW> It is true…it is a tribute to Nick that CrystalEye can run for months without attention. Kudos. I am interested in how much pressure the site is under. How many searches/users in a day etc.? We find that our struggles in uptime (and these are negligible) are primarily based on stress on the servers. For nighttime users tonight things will have been slow…we deposited over 100,000 new molecules from 5 new data sources. That does create some slowness. We will hit about 40,000 transactions today. Our problems are ISP issues and powercuts. But we are also not in a University using thick pipes etc.

One comment…it was 130,000 structures according to a previous blog and has been expanding since then from daily depositions. Right now I would expect it to be 140,000 rather than 120,000. When we did try scraping the data our best estimate was about 90,000. We might have missed something in our scraping and it’s why we asked for a dump of the data.

The primary purpose of CrystalEye was to allow Nick to test the validity of QM calculations in high-throughput mode. It turned out that the collection might be useful so we have posted it as Open Data. To add to its value we have made it browsable by journal and article, searchable by cell dimensions, searchable by chemical substructure and searchable by bond-length. This is a fair range of what the casual visitor might wish to have available. Andrew Walkingshaw has transformed it into RDF and built a SPARQL endpoint with the help of Talis. It has a Jmol applet and 2D diagrams, and links back to the papers. So there is a lot of functionality associated with it.

AJW> The team has done a good job in putting the site together. The Jmol applet is an excellent utility for us all to use and thanks to that team for sure! Egon has been challenging us to RDF the site and it’s on our list, but keeps getting pushed down based on other requests. Since he’s the only voice asking, it will keep getting pushed down unfortunately.

This has come under some criticism to the effect that we haven’t really made it Openly available. For example Antony Williams (Chemspider blog) writes (Acting as a Community Member to Help Open Access Authors and Publishers):

“This [interaction with MDPI] is contrary to some of my experiences with some other advocates of Open Data and Open Access where trying to get their “Open Data” is like pulling teeth.”

PMR: I assume this relates to CrystalEye - I don’t know of any other case.

AJW> There are other examples and he’s right. He doesn’t know of them and I’d prefer he not rant on my behalf so I’ll not name them.

Antony and I have had several discussions about CrystalEye - basically he would like to import it into his database (which is completely acceptable) but it’s not in the format he wants (multi-entry files in MDL’s SDF format, whereas CrystalEye is in CML and RDF).

AJW> To clarify, again. I DON’T want to import CrystalEye into ChemSpider. I DON’T! All I want is the set of structures and unique associated URLs so that users of ChemSpider can find that there is crystal structure information over on CrystalEye and can click the link and be on CrystalEye and get the benefit of Nick, Andrew and Peter’s work. I don’t want to reproduce their effort. I want to integrate to it. I’ve said it many times on Peter’s blog and on this one.

This type of problem arises everywhere in the data world. For example the problem of converting between map coordinates (especially in 3D) can be enormous. As Rich says, it costs money. There is generally no escape from the cost, but certain approaches such as using standards such as XML and RDF can dramatically lower the costs. Nevertheless there is a cost. Jim Downing made this investment by creating an Atom feed mechanism so that CrystalEye could be systematically downloaded but I don’t think Chemspider has used this.

AJW> If Jim can contact me by email and provide me with detailed instructions to download the entire file of structures ONLY and their associated URLs that would be excellent. I’ll send the request to him tonight.

The real point is that Chemspider wishes to use the data for a different purpose from which it was intended.

AJW> The problem is that stories keep getting made up about what we want. ALL I want to do is drive traffic to CrystalEye so that people who don’t know about it can use it. No more than that. I don’t get how trying to provide an integration path is so difficult. I’ll ask Jim to help.

That’s fine. But as Rich says it costs money. It’s unrealistic to expect we should carry out the conversion for a commercial company for free. We’d be happy to a mutually acceptable business proposition and it could probably be done by hiring a summer student.

AJW> I am interested in what commercial benefit integrating to CrystalEye can have. It’s work on our side. I’m not sure what a mutually acceptable business proposition would look like. It can’t be that much work to send us a set of InChIStrings and URLs for the CrystalEye dataset…they already exist on CrystalEye. So, I’ll assume that this is a last comment on “No thanks to CrystalEye data in ChemSpider”. I have to ask why not put them in PubChem. Since PubChem is held as the standard of OpenData why not put CrystalEye there?

I continue to stress that CrystalEye is completely Open. If you want it enough and can make the investment then all the mechanisms are available. There’s a downloader and converters and they are all Open (though it may cost money to integrate them).

AJW> Just fyi ChemSpider has adopted Creative Commons licenses.

FWIW we are continuing to explore the ways in which CrystalEye is made available. We’re being funded by Microsoft as part of the OREChem project and the result of this could represent some of the way in which the Web technology is influencing scientific disciplines. We’d recommend that those interested in mashups and re-use in chemistry took a close look at RDF/SPARQL/CML/ORE as those are going to be standard in other fields.

AJW> It would be good to see CML be a standard. I’ve been following it for a decade and when it gets accepted by a larger majority then we might adopt it.

I have spoken previously about the challenges of Scraping CrystalEye Content and staying in relationship with publishers. I have approached CAS and spoken with the Copyright team at ACS. In December of last year I spoke about the 5 month delay in discussing with ACS whether or not we could scrape CIF files from ACS journals directly. Well, I had a nice chat with two ACS people in New Orleans, one of them from ACS Pubs. We talked about ChemSpider and I answered a lot of questions about what we were doing, where we were going, how we are “funded” (we are not!) etc. Many pages of notes were taken. At the end of the meeting I asked the question “So, relative to my question about CrystalEye and scraping CIFs. Are Supplementary Data ok to scrape or not?”

The answer? “We haven’t made a decision yet. We need to discuss”.

Are crystal structures really that special? It’s been difficult to get JUST the structures associated with even Open Data. Now I’ve been waiting over 7 months for a question to be answered by ACS…and it’s binary. YES or NO.

At this point I give up. Peter Murray-Rust has had ACS CIFs scraped from their publications for a LONG time. And continues to scrape them. Cambridge University/Unilever School of Informatics didn’t get permission and have been very vocal about what they’ve done and no legal action re. copyright has been taken so I’ll assume it’s not an issue. If it’s not an issue then we can go ahead.

If we can go ahead then why wouldn’t we? We have…we already have scraped the collection of CIFs from ACS, from a broader range of ACS journals than CrystalEye taps into. It’s Supplementary Data, it’s non-copyrightable and now it’s ours to publish. We already support CIF displays on ChemSpider so what we need to do now is to mass convert/handle the data and deposit onto ChemSpider. We also have the IUCR CIFs to deposit. I guess ChemSpider will soon become “CrystalEye 2” as we host the data. That said we are NOT crystallographers so I have an open request to the community for someone with interest/skills in crystallography to join our advisory group and support this effort. Feel free to ping me.

Over the past year ChemSpider has been challenged over the nature of our offering in terms of Open Data etc. A small number of people focused a lot of time talking about this while we remained focused on improving the website and having it available for people to use as a Free Access website. I spoke to Peter Suber about Open Access and then John Wilbanks about Creative Commons.

Since ChemSpider is the aggregate of a number of people’s work (including provision of software by collaborators) I had to get into conversation to see what licenses would be acceptable to those groups.

With the redesign of the website we have structured ourselves in a way to add licenses as we see appropriate now. So, as of today we have added the Creative Commons Attribution Share Alike 3.0 United States License and the appropriate logo is on all sections of a Record View except for the predicted properties. Once we get approval from our collaborators for this same license (and discussions are underway) then the whole record view will be Licensed.

At that point, you are free:

  • to Remix — to make derivative works

Under the following conditions:

  • Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
  • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

I am in catch up mode tonight reading a week long backlog of blog posts. I’ve caught up tonight with some of Peter’s posts about semantic chemical authoring (1,2). I’ll respond shortly with comments regarding our own efforts in pulling together the web. I agree with Peter that improved semantic chemical authoring tools are necessary but we are focused right now on doing what we can with what is already available online. What it takes is coding, some regular expressions, some visual inspection and work. Lots of work. More later…

One of the things we are working on is connecting blog posts and wiki pages to ChemSpider as evidenced by our work with Molecule of the Day and our integrations to TotallySynthetic posts on an ongoing basis. What we expect of the authors though is that they author with care. We are generally using name to structure conversion capabilities to generate the chemical structures for connecting to on ChemSpider. Paul Doherty at TotallySynthetic used to provide us with InChIStrings and InChIKeys to connect up to but stopped because it was a lot of work, I believe. Molecule of the Day generally discusses fairly simple molecules relative to TotallySynthetic’s COMPLEX molecules. Manual inspection is unfortunately necessary even in the simplest of cases. And it IS time-consuming. Robots will gather information and, in my judgment, PROLIFERATE incorrect data unless someone is going to do the work to inspect OR the system provides a curation platform to quickly remove errors.

I blogged tonight on the ChemConnector blog about the importance of dashes and spaces in systematic names. It should be very clear from that post how important it is. It is a major challenge to use name to structure conversion tools on chemical names that are imperfect and do not represent the structure they are meant to represent. There needs to be respect for chemical names and as we move them from system to system, database to database we need to do our best to retain their integrity. This HAS BEEN a major challenge for us as we scrape data from various data sources OR when people provide us data files such as Wikipedia and we need to check name-structure connections. It is not difficult to lose the integrity of a chemical name.

Back to Peter Murray-Rust’s discussions about semantic chemical authoring. Peter is talking about building a site of aggregated information from various websites.

PMR> “We’re in the process of aggregating a repository of common chemicals (somewhere in the range 1000-10000 entries) and we are taking data from various publicly available web sites. Typical sources are Wikipedia, any aggregator with Open Data policies and MSDS sheets (chemical safety information). One such site is INCHEM (Chemical Safety Information from Intergovernmental Organizations) which lists about 1500 materials (most are chemical compounds though some are mixtures).”

Readers of this blog will know we’ve already done this. Both for NIOSH and for the Oxford MSDS set. We took a select subset of information. We integrated this with our Wikipedia set of data on ChemSpider (and, of course, also on WiChempedia).

PMR> “…From this we extract the most important information and turn it into CML - names, formula, connection tables, properties, etc.”

Our process was extraction of the same (but there aren’t any connection tables to grab from NIOSH or Oxford MSDS), then we converted names to structures and ran some “confirmation processes” including visual inspection when necessary.

PMR> “There are a large number of simple but niggly lexical problems, such as the degrees symbol for temperature (totally inconsistent within and between documents) And the semantics - how do you record a boiling point as “between 120 and 130 at 20 mm Hg”? (CML can do this, but it takes work to do the conversion.)”

Oh yes…these are problems. The inconsistencies between records are a pain but can be dealt with by mapping as shown here. Recording a boiling point “between 120 and 130 at 20 mm Hg” is no issue really. See this figure for something just as complex regarding “loss of waters”.
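To make the boiling point example concrete, here is a small Python sketch that turns a phrase like the one quoted into a structured record. The `BoilingPoint` class, the field names and the regular expression are illustrative assumptions, not the mapping ChemSpider or CML actually uses. Note that the quoted phrase gives no temperature unit, so the sketch deliberately does not guess one.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoilingPoint:
    low: float                      # lower bound, unit not stated in the text
    high: float                     # upper bound, unit not stated in the text
    pressure_mmhg: Optional[float]  # pressure in mm Hg, if one was given


# Matches "between <low> and <high>" with an optional "at <p> mm Hg" suffix.
_RANGE = re.compile(
    r"between\s+(?P<low>-?\d+(?:\.\d+)?)\s+and\s+(?P<high>-?\d+(?:\.\d+)?)"
    r"(?:\s+at\s+(?P<p>\d+(?:\.\d+)?)\s*mm\s*Hg)?",
    re.IGNORECASE,
)


def parse_boiling_point(text):
    """Return a BoilingPoint for a recognised range phrase, else None."""
    match = _RANGE.search(text)
    if match is None:
        return None
    pressure = match.group("p")
    return BoilingPoint(
        low=float(match.group("low")),
        high=float(match.group("high")),
        pressure_mmhg=float(pressure) if pressure else None,
    )


bp = parse_boiling_point("between 120 and 130 at 20 mm Hg")
```

Anything the pattern does not recognise comes back as `None`, which is the cue to route the record to manual inspection rather than store a guess.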

PMR> “And the sites have errors. Here’s a rather subtle one which the average human would miss (we needed a machine to find it). You’ll have to go to the page for chloromethylmethylether - I daren’t try to transcribe it into Wordpress. The error is in the displayed page (no need to scroll down).”

There are a couple of issues here. We actually prefer NOT to use either the molecular formula or the molecular weight. In our Wikipedia work we found a lot of errors around these parameters and for the Wikipedia work at least the name, SMILES, InChI etc were more correct while MFs and MW would be wrong.

There may absolutely be value in using both MF and MW to confirm the structure and I definitely see the value. This would definitely help resolve some of the Nomenclature-Structure issues that can arise from converting the names! One of the things that occurred in the blog post was that my earlier comments came to pass regarding removal of a space in the chemical name.
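A check of MF against MW can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the atomic weight table is a tiny hand-picked subset of rounded values, the parser handles only simple formulae without brackets, charges or isotopes, and the tolerance is arbitrary. The formula C2H5ClO for chloromethyl methyl ether is supplied here for the example; it is not quoted from the INCHEM page.

```python
import re

# Rounded standard atomic weights for a few elements (illustrative subset).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula):
    """Compute the molecular weight of a simple formula such as 'C2H5ClO'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in _TOKEN.findall(formula):
        # A KeyError here means the element is missing from the small table.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


def mf_mw_consistent(formula, reported_mw, tolerance=0.05):
    """True when the reported MW agrees with the MW computed from the MF."""
    return abs(molecular_weight(formula) - reported_mw) <= tolerance


mw = molecular_weight("C2H5ClO")
```

A record whose stated MW disagrees with its MF, or whose MF disagrees with the structure generated from its name, is a good candidate for the visual inspection step.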

The names on the ORIGINAL INCHEM page were:

CHLOROMETHYL METHYL ETHER, Chloromethoxymethane with a CAS Number of 107-30-2 and an EINECS number of 203-480-1.

There was NOT a name listed as “chloromethylmethylether” which PMR listed in his post. The only difference is dropping one space. It’s only an accidental removal but dramatically changes the meaning of the record. This is where Peter’s use of either MF or MW becomes crucial! That loss of a space CAN cause big problems as described here. Does it cause a problem this time? Check below…look at the name with and without the space and the result of conversion in a commercial Name to Structure software package.
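One inexpensive guard against this kind of accident is to flag any pair of names that become identical once whitespace is removed. The Python sketch below is a hypothetical helper, not part of any name to structure package; it compares case-insensitively and only reports that two names differ in spacing, it does not decide which one is right.

```python
def differs_only_by_spacing(name_a, name_b):
    """True when two names have the same letters but different word breaks.

    The comparison ignores case. Names whose words match exactly (apart from
    case or repeated spaces) are NOT flagged.
    """
    words_a = name_a.lower().split()
    words_b = name_b.lower().split()
    return words_a != words_b and "".join(words_a) == "".join(words_b)


original = "CHLOROMETHYL METHYL ETHER"
transcribed = "chloromethylmethylether"
flagged = differs_only_by_spacing(original, transcribed)
```

A flagged pair is worth sending through name to structure conversion in both forms and comparing the results before the name is stored.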

The CORRECT structure is on ChemSpider here and already includes the following Supplemental Information.

User Data

  • experimental physchem properties
    • Boiling Point: 138 °F

    • Freezing Point: -154 °F

    • Specific Gravity: 1.06

    • Solubility: Reacts

    • Ionization Potential: 10.25 eV

  • miscellaneous
    • Appearance: Colorless liquid with an irritating odor.

    • First Aid: Eye: Irrigate immediately Skin: Soap wash immediately Breathing: Respiratory support Swallow: Medical attention immediately

    • Exposure Routes: inhalation, skin absorption, ingestion, skin and/or eye contact

    • Symptoms: Irritation eyes, skin, mucous membrane; pulmonary edema, pulmonary congestion, pneumonitis; skin burns, necrosis; cough, wheezing, pulmonary congestion; blood-stained sputum; weight loss; bronchial secretions; [potential occupational carcinogen]

    • Target Organs: Eyes, skin, respiratory system Cancer Site [in animals: skin & lung cancer]

    • Incompatibilities and Reactivities: Water [Note: Reacts with water to form hydrochloric acid & formaldehyde.]

    • Personal protection and Sanitation: Skin: Prevent skin contact Eyes: Prevent eye contact Wash skin: When contaminated/Daily Remove: When wet (flammable) Change: Daily Provide: Eyewash, Quick drench

    • Exposure Limits: NIOSH REL : Ca See Appendix A OSHA PEL : [1910.1006] See Appendix B

Peter IS right. We DO Need Semantic Chemical Authoring Tools. However, we’ve already gone a long way without them and what is already online CAN be dealt with. Incredible care is needed with nomenclature, and even a single space can mess things up! I know we have errors in our database, in both structures and names. What else is to be expected with 20 million structures and associated data? However, we are cleaning them up rather quickly. We are scraping and integrating data at an increasing rate, having learned a lot of lessons over the past year.

I’ll comment on Peter’s other Semantic Chemical Authoring posts in the next couple of days.

It’s nice to see good press for ChemSpider. I was happy to see a positive comment regarding our contribution to Free Access Chemistry on the web from a recent interview with members of the Microsoft Bio-IT Alliance team. I quote:

“As an example of a BioIT Alliance success story, Potenzone noted that ChemZoo has built a website called ChemSpider, with 20 million free, searchable chemical compounds that is “a bit like a Wikipedia for chemistry.””

“If you type a compound name into Microsoft Word … you can right click on that and query the ChemSpider database,” said Jordan. That functionality could also involve querying a proprietary database. This is an application for a rather underutilized feature in Word called Smart Tags, which Novartis has implemented for its own proprietary searches, said Potenzone.”

I enjoyed sitting in on the Bio-IT Alliance lunch at Bio-IT. Clearly the alliance has traction and momentum and we are proud to be a part of it.

I’ve blogged previously about us adding safety and toxicity data to ChemSpider. We are busily sourcing new information from other data sources to add information and in the past couple of days we have added NIOSH data as it is a rich source of additional safety information. For example, the record for 1,2,3-trichloropropane shows:

  • First Aid: Eye: Irrigate immediately Skin: Soap wash Breathing: Respiratory support Swallow: Medical attention immediately

  • Exposure Routes: inhalation, skin absorption, ingestion, skin and/or eye contact

  • Symptoms: Irritation eyes, nose, throat; central nervous system depression; in animals: liver, kidney injury; [potential occupational carcinogen]

  • Target Organs: Eyes, skin, respiratory system, central nervous system, liver, kidneys Cancer Site [in animals: forestomach, liver & mammary gland cancer]

  • Incompatibilities and Reactivities: Chemically-active metals, strong caustics & oxidizers

  • Personal protection and Sanitation: Skin: Prevent skin contact Eyes: Prevent eye contact Wash skin: When contaminated Remove: When wet or contaminated Change: No recommendation Provide: Eyewash, Quick drench

Some additional examples are here: Temefos, Warfarin and Allyl Alcohol. Note that each of these also has a coincident extract from Wikipedia. We are therefore integrating Wikipedia articles, safety, toxicity, experimental and predicted properties. Our plan for semanticising and integrating the chemistry web is clearly well underway.

The past three weeks have been rather extreme in terms of travel and late night work. The result has been a significant drop in blog postings and in responsiveness to other blog postings, and general silence. There has been a lot going on, and I now have a couple of weeks at home and will be playing catch up. I’ve started with a few postings on the ChemConnector Blog. These might be of interest to readers:

A Green Solution for Virtual Screening Using the IBM Cell processor - An introduction to eHITS Lightning

FPGAs, GPUs and Cells - A Call for Comments from the Community

My New Found Friend for Microsoft Outlook - Meet XOBNI!

I am off to Bio-IT in Boston this coming week and I am honored to have been asked to talk on ChemSpider. I wasn’t on the agenda as of 72 hours ago but was offered an opportunity as a result of a cancellation in one of the sessions. I’m looking forward to seeing what’s new in the world of informatics this year and what is due to be unveiled at Bio-IT. At SBS there was only one person in the room when I talked who had even heard of ChemSpider. I didn’t take offense…about 60 people went away informed! I hope for a similar opportunity at Bio-IT. The blog will be fairly quiet this week. Catch-up time is next week.

As an FYI, I recently wrote an article entitled “Public Chemical Compound Databases” in Current Opinion in Drug Discovery & Development 2008 11(3). The abstract is:

“The internet has rapidly become the first port of call for all information searches. The increasing array of chemistry-related resources that are now available provides chemists with a direct path to the information that was previously accessed via library services and was limited by commercial and costly resources. The diversity of the information that can be accessed online is expanding at a dramatic rate, and the support for publicly available resources offers significant opportunities in terms of the benefits to science and society. While the data online do not generally meet the quality standards of manually curated sources, there are efforts underway to gather scientists together and ‘crowdsource’ an improvement in the quality of the available data. This review discusses the types of public compound databases that are available online and provides a series of examples. Focus is also given to the benefits and disruptions associated with the increased availability of such data and the integration of technologies to data mine this information.”

It’s not an Open Access article but it’s out there if anyone is interested or is subscribed. Enjoy.

Those of you frequenting the blog will know that we have a dedicated subset on ChemSpider for Molbank and that I have found the MDPI management and editorial team a pleasure to work with. I discussed my wish to maintain a relationship with them in a recent blogpost and, as stated in that posting, followed up with them to make them aware of an error in their article and of the ongoing discussions in the blogosphere about their “openness”. In case readers of the blog aren’t set up to catch the comments on the blogposts, I am pointing to a comment made today by a member of MDPI.

“We are aware that our current MDPI copyright statement is not in line with the BBB definitions on open access. We are currently smoothly moving to a CC By Attribution License v3.0. Marine Drugs (http://www.mdpi.org/marinedrugs/) has already been published under that license since January 2008. IJMS (http://www.mdpi.org/ijms/) and other MDPI journals will start publishing under this license in the May respectively June 2008 issues. All previous content published by MDPI will be released under the CC By license within a couple of months on our new publication platform (now under testing). So this discussion about MDPI and open access will soon be part of history.”

My experience of working in the domain of creating a community for chemists is quite a simple one. If you want to know what a group is up to just ask them. Seems that MDPI has a