Archive for the Quality and Content Category

I announced in July of this year that we were performing predictions using the EPI Suite of prediction tools. I’m glad to say that one of our servers is now in “cooling mode” after running red hot for over 4 months. We’ve been feeding it every single-component, non-radical ChemSpider entity with a molecular weight below 500. The results are now posted on ChemSpider under the EPISuite tab. We hope you find them of value, and we thank the EPA for providing us access to the software.
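To make the selection rule concrete, here is a minimal Python sketch of the kind of filter described above. It assumes a hypothetical record holding a SMILES string, a precomputed molecular weight and a radical flag; the field names are illustrative and are not ChemSpider’s schema. Treating a '.' in the SMILES as the marker of a multi-component entry is a simplification.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    # Hypothetical record layout, for illustration only.
    csid: int
    smiles: str
    mol_weight: float
    is_radical: bool

def is_single_component(smiles: str) -> bool:
    # In SMILES, '.' separates disconnected components (e.g. a salt and its counterion).
    return "." not in smiles

def eligible_for_epi(c: Compound, max_mw: float = 500.0) -> bool:
    """Single component, molecular weight below max_mw, and not a radical."""
    return is_single_component(c.smiles) and c.mol_weight < max_mw and not c.is_radical

compounds = [
    Compound(1, "CC(=O)Oc1ccccc1C(=O)O", 180.16, False),  # aspirin
    Compound(2, "[Na+].[Cl-]", 58.44, False),             # sodium chloride: two components
    Compound(3, "[CH3]", 15.03, True),                    # methyl radical
]
queue = [c.csid for c in compounds if eligible_for_epi(c)]
print(queue)
```

Only aspirin passes here: the salt is rejected as multi-component and the methyl radical by its flag.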

A lot of people have been helping to improve the quality of ChemSpider content over the past few months by depositing new data and “cleaning up” errors in the data. It’s been a long climb, and our thanks go to all of you who have contributed. I’ll be the first to put my hand up and acknowledge that I have not made contributing to the curation process very easy, since I’ve been feeding the information out via the blog in chunks as it developed. Following a recent “long flight” I am happy to announce that the first version of the Curators Handbook/Bible is now available online here. This document gives detailed guidance on how to curate the ChemSpider database. As always we welcome feedback: if something is not clear, let us know and we will expand or enhance it as appropriate.

I also want to thank those people who have commented on how truly impressed they are with the rate at which we are cleaning the data. In general, most curation requests identified on the site are addressed within 24 hours. There are some outstanding issues that we don’t have solutions for at present, specifically regarding the handling of organometallic data, but we are still thinking about a path forward.

It is finally time to roll out more attractive structure depictions. We have needed them for a while, but they have become an absolute must-have as we roll out the following new capabilities:

1) The ability to make YOUR chemical blog structure searchable (watch this space…). We suggested one path previously…this is BETTER…

2) Structure balloons for use with our document markup tools, both browser-based and Microsoft Word-based

We all judge visual quality quickly; we know a good structure when we see one. We will be rolling out new structure depictions across the site in the next few days, so you will see better-looking structures everywhere: during deposition, during service-based predictions, during searches and elsewhere. The algorithms are not perfect yet, but with a little more tweaking the entire database will be supported by the new structure depiction algorithms. You should already see some examples on the database now, one of which is shown below. We welcome your feedback!

Frequent users of ChemSpider might have noticed a change in the layout of the record view pages of late. As we layer more information onto a record view page (EPI Suite predictions, SimBioSys LASSO scores, spectral data, and MORE predictions to come) the pages become increasingly heavy, and we have had to balance page weight against user experience. Since we recently added the ability to perform structure searching on PubMed and are now adding a new update for patent searching, we have chosen to hide the Data Source outlinks until you choose to see them.

So, if you are looking for original data sources and a list of potential commercial vendors please click on the button indicated below to fold out the list. Commercial vendors are indicated as discussed previously here.

Users of ChemSpider might have noticed some performance issues in the past 2-3 weeks with our web services, service availability and the speed of searches. I put my hand in the air and say “Yup, acknowledged”. Hopefully they have not been too disruptive, BUT ultimately it is for the overall benefit of the service. We have been streaming in 8 MILLION links to PubMed in order to make PubMed structure and substructure searchable. We are NOT rolling this out with full fanfare yet, but I do want to explain the performance issues you might be experiencing. We work on Microsoft technology, and while we are advocates of the .NET, IIS and SQL Server platforms, we are definitely putting them under pressure as we keep expanding the database and adding more value. We have thoughts about how to resolve this but want to finish populating the tables first.

The upside…the majority of links are already in place. To see an example, visit a structure, look for PubMed as a data source and click on one of the links. For Valium, for example, here you will see a series of PubMed IDs next to the PubMed data source in the data source table…

  16971504, 17673, 874970, 406430, 17881, 327854, 879884, 577681, 560225, 195649, …

These will link you out to PubMed directly. Try it out…

Now, do we have implementation issues? YES. The lists of external IDs can be long, so right now we show only the first 10; we will deal with the display of the others shortly. We also need to provide a way to curate out “junk” entries. For example, “methyl” is on ChemSpider as a fragment and has links to PubMed IDs. You’ll see why if you click them: the links were generated by text mining. These issues will be resolved, but for now we announce that PubMed is structure and substructure searchable via ChemSpider. We will explain how we did it shortly, but for now we acknowledge the massive contribution of our colleagues at SureChem. More to come…
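A minimal sketch of the display rule described above (show the first 10 IDs and report how many remain hidden), assuming the current public PubMed URL pattern https://pubmed.ncbi.nlm.nih.gov/<PMID>/. The function name pubmed_links is hypothetical; this is an illustration, not ChemSpider’s code. The usage runs it on the Valium IDs listed above with a smaller limit so the truncation is visible.

```python
PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"  # current public URL pattern

def pubmed_links(pmids, limit=10):
    """Return the links to display and the number of IDs left hidden."""
    unique = list(dict.fromkeys(pmids))  # drop repeated IDs, keep original order
    shown = [PUBMED_URL.format(pmid=p) for p in unique[:limit]]
    hidden = max(len(unique) - limit, 0)
    return shown, hidden

valium_pmids = [16971504, 17673, 874970, 406430, 17881,
                327854, 879884, 577681, 560225, 195649]
shown, hidden = pubmed_links(valium_pmids, limit=3)
for url in shown:
    print(url)
print(f"...and {hidden} more")
```

Deduplicating before truncating means a repeated ID never wastes one of the displayed slots.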

There has been an outpouring of offers from the ChemSpider community in terms of helping to examine/clean and enhance information regarding carbohydrates on ChemSpider. Almost 2 dozen users have now made an offer to help. Very exciting really!

I’ve already outlined the necessity to improve the quality of associations between structures and identifiers on the database. However, I am also hoping that users will write articles about carbohydrates using the rich-text formatting capabilities (ADD Description), will add spectra if they have them, will link up articles if they have interesting papers and will add URLs to interesting online content also.

We have now delivered the ability to curate and enhance records on ChemSpider and look forward to having our users help, starting with Carbohydrates…

As the number of spectra uploaded to ChemSpider increases (and it is now increasing at quite a rate) we have noticed that records with a large number of spectra can take a very long time to load, especially if the spectra are “heavy”, for example C13 spectra acquired at high frequency and with zero-filling. The more spectra a record carries, the bigger the challenge.

With this in mind we have introduced the ability to Load a Spectrum when the user wants to see the spectrum and not automatically on loading the page. An example is shown here for recently uploaded spectra from the Drexel University laboratory of Jean-Claude Bradley.
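The idea is deferred (lazy) loading: the page holds only a reference to each spectrum and fetches the data when the user asks for it. Below is a minimal Python sketch of that pattern using functools.cached_property; SpectrumStub and the loader callable are hypothetical stand-ins for whatever actually fetches spectrum data, not ChemSpider’s implementation.

```python
from functools import cached_property

class SpectrumStub:
    """What the record page holds: an ID, not the (possibly heavy) spectrum data."""

    def __init__(self, spectrum_id, loader):
        self.spectrum_id = spectrum_id
        self._loader = loader  # hypothetical callable: spectrum_id -> list of (x, y) points

    @cached_property
    def data(self):
        # Runs only on first access (the "Load Spectrum" click); later reads use the cache.
        return self._loader(self.spectrum_id)

calls = []

def fake_loader(spectrum_id):
    calls.append(spectrum_id)  # record each expensive fetch
    return [(0.0, 1.0), (1.0, 0.5)]

page = [SpectrumStub(i, fake_loader) for i in (101, 102, 103)]
assert calls == []        # rendering the page fetched no spectra
points = page[0].data     # user asks for the first spectrum
points = page[0].data     # second read is served from the cache
print(calls)
```

Caching the result means a user who expands the same spectrum twice pays the loading cost only once.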

Please test it out and let us know if you see any issues. The example listed above has a “heavy” C13 spectrum, so loading might take a while.

An announcement was made on the Blue Obelisk Discussion List this week regarding a new database that currently holds 4 million molecules and may grow to 50 million in the future. It is called molecules.gnu-darwin.org/ and describes itself with the following comments:

“Some facts: The Molecules website contains more than 4 million small molecule structure files in pdb format, and molecular graphics representations. About 50 million molecules are still in the pipe, and they are expected to appear here over the course of the next few weeks and months. The pdb format is readable by common FOSS molecule viewer software, such as RasMol and PyMOL. In due course, we plan to provide high quality structures via energy minimization refinement, and additional resources.

Molecules@gnu-darwin.org is founded in the spirit of free software, open source, and public access. It is hoped that access to these files will be a wonderful community resource for science education, research, and entertainment as well. We are looking for investment or funding to expedite and expand this work, and lead the field, with an eye towards an advanced, complete, synthetic, structural, and informatical bioorganome. Meanwhile, the site is already an exceptional lab resource, and molecular catalog, providing the means and building blocks towards additional novel structures. We aim to be the best.

The structural biology, protein crystallography, and molecular graphics talent that is building the Molecules archive is available to work for you in a contract or consulting arrangement. Wide-ranging expertise is available. Molecules@gnu-darwin.org is built entirely with FOSS, free and open source software, GNU-Darwin OS, and it is under the aegis of The GNU-Darwin Distribution. Here is a link to the Distribution résumé. Our founder is an X-ray laboratory admin for the Department of Biophysics and Biophysical Chemistry of Johns Hopkins University School of Medicine. You can also read his CV. We would like to build a community around this website, and we are looking for volunteers and collaborators to help. Regarding any aspect of the work of this site, please feel free to contact us, molecules@gnu-darwin.org, with gdmolecules in the subject line. Cheers!”

I’m always interested in potential databases to connect to that will add capabilities and diversity to ChemSpider’s information. I browsed the database, searched on some common molecules (Xanax, aspirin, Taxol and others) and found no hits. This seemed strange, but the site does say “Search warning: not yet fully spidered”.

The statement that a total of 50 million molecules is coming suggests that the database is a republication of PubChem, and the SDF archives suggest so too, since they redirect to PubChem for the download: http://molecules.gnu-darwin.org/ftp.ncbi.nlm.nih.gov/pubchem/Substance/CURRENT-Full/

At present the database therefore appears to be the PubChem database in PDB format. I hope that there is some additional information added to warrant our linking to this new database.

We have added the compound collection from Trans World Chemicals to ChemSpider. This is a collection of almost 1600 compounds. The collection can be viewed here.

ChemSpider has been working hard to support Wikipedia for a number of months now. We have been curating the structures on Wikipedia, I have been an active member of the WP:Chem team, we have extended our integration of Wikipedia to show the lede of the Wikipedia article on associated record views, and we have a lot of background activities going on regarding Wikipedia at present (info will be released shortly). New articles are released on Wikipedia on an ongoing basis and we stay up to date as best we can by monitoring bots for updates. Harvesting monographs out of Wikipedia based only on Chemboxes and Drugboxes is certainly not sufficient, since not every article about drugs and chemicals on Wikipedia has an associated Drugbox or Chembox.

For example, you have likely heard of Rember for Alzheimer’s already? A search on Google for Rember Alzheimers gives about 2 million hits, and it’s already being discussed in the blogosphere, including Derek Lowe’s In the Pipeline. Rember turns out to be methylene blue. There is already an article on Wikipedia about Rember, but there is no Chembox as yet. As I was researching Rember out of interest I noticed that we did not have methylene blue linked to Wikipedia and that Rember wasn’t associated with methylene blue. Adding the name was of course easy: 5 seconds of work after login.

We have now also added the ability to associate data sources directly. What does this mean? On a record view page is a list of “Data Sources” associated with a compound. This is where depositions about a compound came from, and it generally links back to the associated web pages. Previously, populating the Data Source table required depositing the structure and associated info as an SDF file. TOO MUCH work. So now we have made it easy. To add a data source, simply login and select “Edit” (top right-hand side of the data source table).
To add a new data source, click Add and input the information into the pop-up box. The inputs are the name to be listed in the Data Source table, the URL to the information on the Data Source page (if info exists) and the name of the Data Source. There is one caveat to adding such links: the data source must already exist. If you want to add data associated with your own website, you need to register yourself, add a Data Source and wait for us to approve it. Wikipedia is a special case, since when the link is made we grab the lede of the article directly and show it in the Record View. For methylene blue there are two related Wikipedia articles, so we have linked to them both, as you can see on the record view. Simply go to ChemSpider and search for rember and you’ll see two linked Wikipedia articles.

We’ve been enhancing our deposition system so that adding tens of thousands of new compounds to ChemSpider doesn’t have too big an impact on its performance. Depositing every structure requires calculating the associated properties and deduplicating against the database, and both steps needed to be optimized. As a result of our improved processing we are now clearing our backlog of new structures. This is well overdue, we know, but we didn’t want to overly stress the servers for our users. New data are now on the database from the following companies. There are more to come…
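Deduplication is what keeps a large deposition cheap: only structures that are genuinely new need their properties calculated. Below is a minimal Python sketch, assuming each incoming structure already carries a canonical identifier (for example a Standard InChIKey computed upstream) and that existing records can be looked up by that key. The keys below are placeholders, not real identifiers, and the function is hypothetical, not ChemSpider’s pipeline.

```python
def deduplicate(incoming, existing_keys):
    """Split incoming (key, record) pairs into new and duplicate records.

    existing_keys maps a canonical structure key to an existing record ID.
    Returns (new, dupes) as lists of (record, assigned_or_matched_id).
    """
    seen = dict(existing_keys)                    # copy; do not mutate the caller's index
    next_id = max(seen.values(), default=0) + 1
    new, dupes = [], []
    for key, record in incoming:
        if key in seen:
            dupes.append((record, seen[key]))    # attach to the existing record
        else:
            seen[key] = next_id                  # only these need property calculation
            new.append((record, next_id))
            next_id += 1
    return new, dupes

new, dupes = deduplicate(
    [("KEY-A", "vendor row 1"), ("KEY-B", "vendor row 2"), ("KEY-A", "vendor row 3")],
    {"KEY-B": 7},
)
print(new, dupes)
```

A duplicate within the same batch (KEY-A twice) is caught as well as a duplicate of an existing record (KEY-B).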

In keeping with our commitment to continue to index Open Access journals for searching on ChemSpider we are happy to announce our indexing of Libertas Academica. Most people I have spoken to about our indexing of Open Access journals have never heard of this Open Access publisher. Libertas Academica offers “Open access journals on clinical medicine, bioinformatics, biology, chemistry, pharmacology, gene signalling, systems biology, informatics, virology, substance abuse, translational science and complimentary medicine.” I know of LA-press because of their Analytical Chemistry Insights journal.

Their list of Popular Journals is given below and their full list of journals is given on the third tab.

The publisher allows direct commenting on articles on their website as shown here for their article on “High-Performance Liquid Chromatographic Method for Determination of Phenytoin in Rabbits Receiving Sildenafil” (This article is already linked from the structures of Phenytoin and Sildenafil)

Following our previous approach of using Taxol and Paclitaxel as a measure of the potential contribution to search results on ChemSpider, a search of Libertas Academica gives 6 hits for Taxol and 23 hits for Paclitaxel.

Our growing list of Open Access Publishers is rather impressive at this point…see below. It will continue to grow.

The Environmental Protection Agency has given ChemSpider permission to use its EPI Suite™ software to predict a number of physical properties for the chemicals on the ChemSpider database. The properties include:
KOWWIN™: Estimates the log octanol-water partition coefficient, log KOW, of chemicals using an atom/fragment contribution method.
AOPWIN™: Estimates the gas-phase reaction rate for the reaction between the most prevalent atmospheric oxidant, hydroxyl radicals, and a chemical. Gas-phase ozone radical reaction rates are also estimated for olefins and acetylenes. In addition, AOPWIN™ informs the user if nitrate radical reaction will be important. Atmospheric half-lives for each chemical are automatically calculated using assumed average hydroxyl radical and ozone concentrations.
HENRYWIN™: Calculates the Henry’s Law constant (air/water partition coefficient) using both the group contribution and the bond contribution methods.
MPBPWIN™: Melting point, boiling point, and vapor pressure of organic chemicals are estimated using a combination of techniques. Included is the subcooled liquid vapor pressure, which is the vapor pressure a solid would have if it were liquid at room temperature. It is important in fate modeling.
BIOWIN™: Estimates aerobic and anaerobic biodegradability of organic chemicals using 7 different models; two of these are the original Biodegradation Probability Program (BPP™).  The seventh and newest model estimates anaerobic biodegradation potential.
BioHCWIN: Estimates biodegradation half-life for compounds containing only carbon and hydrogen (i.e. hydrocarbons).
PCKOCWIN™: The ability of a chemical to sorb to soil and sediment, its soil adsorption coefficient (Koc), is estimated by this program. EPI’s Koc estimations are based on the Sabljic molecular connectivity method with improved correction factors.
WSKOWWIN™: Estimates an octanol-water partition coefficient using the algorithms in the KOWWIN™ program and estimates a chemical’s water solubility from this value. This method uses correction factors to modify the water solubility estimate based on regression against log Kow.
WATERNT™: Estimates water solubility directly using a “fragment constant” method similar to that used in the KOWWIN™ model.
HYDROWIN™: Acid- and base-catalyzed hydrolysis constants for specific organic classes are estimated by HYDROWIN™. A chemical’s hydrolytic half-life under typical environmental conditions is also determined. Neutral hydrolysis rates are currently not estimated.
BCFWIN™: This program calculates the BioConcentration Factor and its logarithm from the log Kow. The methodology is analogous to that for WSKOWWIN™. Both are based on log Kow and correction factors.
KOAWIN: KOA is the octanol/air partition coefficient and has multiple uses in chemical assessment. The model estimates KOA using the ratio of the octanol/water partition coefficient (KOW) from KOWWIN™ and the dimensionless Henry’s Law constant (KAW) from HENRYWIN™.
AEROWIN™: Estimates the fraction of airborne substance sorbed to airborne particulates, i.e. the parameter phi (φ), using three different methods. AEROWIN™ results are also displayed with AOPWIN™ output as an aid in interpretation of the latter.
WVOLWIN™: Estimates the rate of volatilization of a chemical from rivers and lakes and calculates the half-life for these two processes from their rates. The model makes certain default assumptions (water body depth, wind velocity, etc.).
STPWIN™: Using several outputs from EPI Suite™, this program predicts the removal of a chemical in a Sewage Treatment Plant; values are given for the total removal and three contributing processes (biodegradation, sorption to sludge, and stripping to air) for a standard system and set of operating conditions.
LEV3EPI™: This level III fugacity model predicts partitioning of chemicals between air, soil, sediment, and water under steady state conditions for a default model “environment”; various defaults can be changed by the user.
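The KOAWIN description can be made concrete with a little arithmetic. Assuming the dimensionless KAW is obtained from a Henry’s Law constant H (atm·m³/mol) as KAW = H/(RT) at 25 °C, and that KOA = KOW/KAW, the Python sketch below reproduces the Xanax values reported further down this post (bond-method H = 9.77E-12 atm-m3/mole, experimental log Kow = 2.12). It is an illustration of the relationship, not EPI Suite’s code.

```python
import math

R_ATM_M3 = 8.205736e-5  # gas constant in atm·m³/(mol·K)

def log_kaw_from_hlc(hlc_atm_m3_per_mol, temp_k=298.15):
    """Dimensionless air/water partition coefficient: KAW = H / (R T), as log10."""
    return math.log10(hlc_atm_m3_per_mol / (R_ATM_M3 * temp_k))

def log_koa(log_kow, log_kaw):
    """KOA = KOW / KAW, so log KOA = log KOW - log KAW."""
    return log_kow - log_kaw

lkaw = log_kaw_from_hlc(9.77e-12)
lkoa = log_koa(2.12, lkaw)
print(round(lkaw, 3), round(lkoa, 3))
```

This matches the “Log Kaw used: -9.399” and “Log Koa (KOAWIN v1.10 estimate): 11.519” lines in the Xanax output.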

The values for individual structures are available in the Record View under the EPI Summary.

For example, the information for Xanax is below.

 Log Octanol-Water Partition Coef (SRC):
    Log Kow (KOWWIN v1.67 estimate) =  3.87
    Log Kow (Exper. database match) =  2.12
       Exper. Ref:  BioByte (1995)

 Boiling Pt, Melting Pt, Vapor Pressure Estimations (MPBPWIN v1.42):
    Boiling Pt (deg C):  441.81  (Adapted Stein & Brown method)
    Melting Pt (deg C):  185.42  (Mean or Weighted MP)
    VP(mm Hg,25 deg C):  1.65E-008  (Modified Grain method)
    Subcooled liquid VP: 7.84E-007 mm Hg (25 deg C, Mod-Grain method)

 Water Solubility Estimate from Log Kow (WSKOW v1.41):
    Water Solubility at 25 deg C (mg/L):  13.1
       log Kow used: 2.12 (expkow database)
       no-melting pt equation used

 Water Sol Estimate from Fragments:
    Wat Sol (v1.01 est) =  0.15855 mg/L

 ECOSAR Class Program (ECOSAR v0.99h):
    Class(es) found:
       Aliphatic Amines
Henrys Law Constant (25 deg C) [HENRYWIN v3.10]:
   Bond Method :   9.77E-012  atm-m3/mole
   Group Method:   Incomplete
 Henrys LC [VP/WSol estimate using EPI values]:  5.117E-010 atm-m3/mole

 Log Octanol-Air Partition Coefficient (25 deg C) [KOAWIN v1.10]:
  Log Kow used:  2.12  (exp database)
  Log Kaw used:  -9.399  (HenryWin est)
      Log Koa (KOAWIN v1.10 estimate):  11.519
      Log Koa (experimental database):  None

 Probability of Rapid Biodegradation (BIOWIN v4.10):
   Biowin1 (Linear Model)         :   0.6009
   Biowin2 (Non-Linear Model)     :   0.2660
 Expert Survey Biodegradation Results:
   Biowin3 (Ultimate Survey Model):   2.2574  (weeks-months)
   Biowin4 (Primary Survey Model) :   3.1733  (weeks       )
 MITI Biodegradation Probability:
   Biowin5 (MITI Linear Model)    :  -0.1488
   Biowin6 (MITI Non-Linear Model):   0.0042
 Anaerobic Biodegradation Probability:
   Biowin7 (Anaerobic Linear Model): -0.4906
 Ready Biodegradability Prediction:   NO

Hydrocarbon Biodegradation (BioHCwin v1.01):
    Structure incompatible with current estimation method!

 Sorption to aerosols (25 Dec C)[AEROWIN v1.00]:
  Vapor pressure (liquid/subcooled):  0.000105 Pa (7.84E-007 mm Hg)
  Log Koa (Koawin est  ): 11.519
   Kp (particle/gas partition coef. (m3/ug)):
       Mackay model           :  0.0287
       Octanol/air (Koa) model:  0.0811
   Fraction sorbed to airborne particulates (phi):
       Junge-Pankow model     :  0.509
       Mackay model           :  0.697
       Octanol/air (Koa) model:  0.866 

 Atmospheric Oxidation (25 deg C) [AopWin v1.92]:
   Hydroxyl Radicals Reaction:
      OVERALL OH Rate Constant =   7.6246 E-12 cm3/molecule-sec
      Half-Life =     1.403 Days (12-hr day; 1.5E6 OH/cm3)
      Half-Life =    16.834 Hrs
   Ozone Reaction:
      No Ozone Reaction Estimation
   Fraction sorbed to airborne particulates (phi): 0.603 (Junge,Mackay)
    Note: the sorbed fraction may be resistant to atmospheric oxidation

 Soil Adsorption Coefficient (PCKOCWIN v1.66):
      Koc    :  2.151E+006
      Log Koc:  6.333 

 Aqueous Base/Acid-Catalyzed Hydrolysis (25 deg C) [HYDROWIN v1.67]:
    Rate constants can NOT be estimated for this structure!

 Bioaccumulation Estimates from Log Kow (BCFWIN v2.17):
   Log BCF from regression-based method = 0.932 (BCF = 8.559)
       log Kow used: 2.12 (expkow database)

 Volatilization from Water:
    Henry LC:  9.77E-012 atm-m3/mole  (estimated by Bond SAR Method)
    Half-Life from Model River: 1.053E+008  hours   (4.388E+006 days)
    Half-Life from Model Lake : 1.149E+009  hours   (4.786E+007 days)

 Removal In Wastewater Treatment:
    Total removal:               2.37  percent
    Total biodegradation:        0.10  percent
    Total sludge adsorption:     2.27  percent
    Total to Air:                0.00  percent
      (using 10000 hr Bio P,A,S)

 Level III Fugacity Model:
           Mass Amount    Half-Life    Emissions
            (percent)        (hr)       (kg/hr)
   Air       0.000217        33.7         1000
   Water     21              900          1000
   Soil      78.9            1.8e+003     1000
   Sediment  0.094           8.1e+003     0
     Persistence Time: 1.48e+003 hr
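The AopWin half-life lines above follow from pseudo-first-order kinetics. Assuming a constant OH concentration of 1.5E6 molecules/cm³ during a 12-hour day (the settings printed in the output), t½ = ln 2 / (k_OH [OH]). The Python sketch below recovers about 16.83 hours, or about 1.403 twelve-hour days, from the reported rate constant; the last-digit difference from the printed 16.834 is rounding and says nothing about AopWin’s internals.

```python
import math

def oh_half_life(k_oh, oh_conc=1.5e6, oh_hours_per_day=12.0):
    """Half-life for reaction with OH radicals.

    k_oh: rate constant in cm3/(molecule*s); oh_conc: OH radicals per cm3.
    Returns (hours of OH exposure, days assuming OH is present oh_hours_per_day per day).
    """
    seconds = math.log(2) / (k_oh * oh_conc)
    hours = seconds / 3600.0
    return hours, hours / oh_hours_per_day

hours, days = oh_half_life(7.6246e-12)  # OVERALL OH rate constant for Xanax
print(f"{hours:.2f} h, {days:.3f} days")
```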

We started the calculations a number of weeks ago and are updating our progress on the ChemSpider Forum here. We now have values predicted for 3 million compounds.

It is NOT possible at present to search on these properties in the same way that other properties can be searched on the Search Predicted Properties page as shown below.

After all EPI Suite properties are predicted we will selectively make some of these available for searching. The interest so far appears to be in Henry’s Law values, Water Solubility and Melting Point (something that is very difficult to predict with accuracy!). We welcome your comments.
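Since Henry’s Law values are of interest, here is how the “VP/WSol estimate” line in the Xanax output can be reproduced with the textbook relationship H ≈ VP / S, where the solubility S is converted from mg/L (equal to g/m³) to mol/m³. The molecular weight used (308.77 g/mol, from alprazolam’s formula C17H13ClN4) is our input and does not appear in the EPI output; that EPI Suite computes the line exactly this way is an assumption, though the numbers agree.

```python
MMHG_PER_ATM = 760.0

def hlc_from_vp_ws(vp_mmhg, mol_weight, ws_mg_per_l):
    """Henry's law constant in atm·m³/mol from vapor pressure and water solubility.

    mg/L equals g/m³, so dividing by the molecular weight gives mol/m³.
    """
    vp_atm = vp_mmhg / MMHG_PER_ATM
    ws_mol_per_m3 = ws_mg_per_l / mol_weight
    return vp_atm / ws_mol_per_m3

# Xanax: VP 1.65E-8 mm Hg (Modified Grain), WS 13.1 mg/L (WSKOW), MW 308.77 (assumed)
h = hlc_from_vp_ws(1.65e-8, 308.77, 13.1)
print(f"{h:.3e} atm-m3/mole")
```

Note how far this sits from the bond-method estimate (9.77E-12): two routes to the same property can differ by more than an order of magnitude, which is one reason we want to think carefully before making these values searchable.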

We will be able to extract experimental values for some properties and display them directly. For example, logP shows an “experimental database match” for Xanax:

Log Octanol-Water Partition Coef (SRC):
Log Kow (KOWWIN v1.67 estimate) = 3.87
Log Kow (Exper. database match) = 2.12

Exper. Ref: BioByte (1995)
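The experimental value matters because downstream estimates reuse it: BCFWIN in the Xanax output reports “log Kow used: 2.12 (expkow database)”. The sketch below assumes the linear regression log BCF = 0.77 log Kow − 0.70 for non-ionic compounds with log Kow between 1 and 7, without BCFWIN’s structural correction factors. We have not confirmed that this is exactly BCFWIN’s implementation, but it reproduces the log BCF = 0.932 (BCF ≈ 8.56) reported for Xanax.

```python
def log_bcf_nonionic(log_kow):
    """Assumed BCF regression for non-ionic compounds, no correction factors."""
    if not 1.0 <= log_kow <= 7.0:
        raise ValueError("this sketch only covers 1 <= log Kow <= 7")
    return 0.77 * log_kow - 0.70

lb = log_bcf_nonionic(2.12)  # experimental log Kow, rather than the 3.87 estimate
bcf = 10 ** lb
print(round(lb, 3), round(bcf, 2))
```

Feeding in the KOWWIN estimate (3.87) instead would raise log BCF to about 2.28, which shows why using the experimental match when one exists makes a real difference.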

It is going to take a number of weeks to generate EPI Suite values for 21.5 million molecules, but we are moving in that direction. Our sincere thanks to the EPA for allowing us to use their EPI Suite software on ChemSpider for the benefit of the community.

I have spoken on this blog many times about the challenges of cleaning up data in chemistry databases. We’re expending a lot of effort, with the assistance of many others, on cleaning up the data on ChemSpider and, as a benefit, on helping to clean up data in other databases too. The effort to curate the chemical structure data on Wikipedia continues, and the work is now focused on delivering bots that will push a cleansed data file to the individual records. Over the past few months I have developed a great appreciation for the efforts, dedication and commitment of the many contributors to Wikipedia Chemistry. There are many tens of people editing and contributing to the articles, and then there is the “core WP:Chem team” who show up for the IRC chats most Tuesdays at noon. Much of the past few weeks has focused on how to curate the data, use bots and control curated data moving forward. I am honored to share “IRC-space” with them!

Over the past few weeks I have been similarly blessed to interact with the ChEBI team via email as we have done our work to deposit their Entities of the Month (1,2). During the process of doing so we have exchanged many emails and have cleaned a number of errors in our mutual datasets. In my opinion a PERFECT example of the results of such detailed efforts is for Vancomycin. One week ago a search on vancomycin would give a dozen hits. Many of these had incomplete stereochemistry. Now a search on ChemSpider gives one hit for vancomycin here. This is the result of working with Kirill Degtyarenko at ChEBI. The conversation was initiated by my observation regarding stereo in the structure on ChEBI.

For details on how this is identified to be the correct structure read the description on that page. VERY DETAILED and includes links out to three publications.

Compare this with a search for vancomycin on PubChem giving 66 hits. Some of these differences are due to the different approaches for our text searches – the PubChem results list includes VANCOMYCIN HYDROCHLORIDE and Gatifloxacin & Vancomycin for example. However, there are a number of “vancomycins” also.
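One way to see how text-search approaches change hit counts: substring matching returns every name that merely contains the query, while exact synonym matching returns only names equal to it. The Python sketch below is a hypothetical illustration using names from the PubChem results mentioned above; we make no claim that either ChemSpider or PubChem works exactly this way.

```python
def substring_hits(query, names):
    """Names containing the query anywhere, case-insensitively."""
    q = query.casefold()
    return sorted({n for n in names if q in n.casefold()})

def exact_hits(query, names):
    """Names equal to the query, case-insensitively."""
    q = query.casefold()
    return sorted({n for n in names if n.casefold() == q})

synonyms = ["Vancomycin", "VANCOMYCIN HYDROCHLORIDE", "Gatifloxacin & Vancomycin"]
print(substring_hits("vancomycin", synonyms))
print(exact_hits("vancomycin", synonyms))
```

Substring matching finds the salt and the mixture as well as the parent compound, which inflates the hit count without adding more “vancomycins”.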

We believe we have the correct vancomycin identified at this point…we welcome any challengers!

Thanks to the efforts of contributors such as Heinz Kolshorn, new compounds and associated analytical data are finding their way onto ChemSpider on a regular basis. These are chemical compounds that have been synthesized and fully characterized. Unless they are published, they are unlikely to find their way into chemical registry systems or into the training databases for commercial NMR prediction packages such as those of ACD/Labs, Bio-Rad, Modgraph or Wolfgang Robien’s collection. As a result this type of information will be “Lost Chemistry”. These particular data from Heinz will almost certainly find their way into the NMRShiftDB, since Heinz is hosting the database at his lab at the University of Mainz.

Heinz has been putting actual experimental spectra and the associated shift assignments onto ChemSpider of late. An example is here. This is enabled by our ability to upload and store both spectra and images. A better way to display the shift assignments would be a mouseover display linking structure atoms and peaks; this is not yet available on the system but is clearly a nice-to-have. For now the information is there for others to use and shows the value of integrating images and spectral data. I can envisage other pairings, such as a UV spectrum alongside a photo of the colored solution.

Over the past few months we have recognized those people who have spent their time contributing content to ChemSpider, either as depositors or as curators. Recently I commented about one of our Advisory Group, Chris Singleton, taking on a major project to deposit spectral data to ChemSpider. If you visit the spectral data page and scroll through, you will see that there are now 33 pages of spectra, each page containing 20 spectra. The majority of these are NMR spectra, and the largest single collection is the one deposited by Chris over the past few weeks. The data were obtained from the Madison Metabolomics Consortium Database and described in a publication by Q. Cui, et al., “Metabolite identification via the Madison Metabolomics Consortium Database”, Nature Biotechnology, 26, 162 (2008). Our sincere thanks to Chris for all of his work!

There is another raft of spectra waiting to be processed and deposited so the spectral data collection will continue to grow.

I have blogged previously about ChEBI Entities of the Month and our work to include that information in ChemSpider. In order to do so we had to introduce rich text support; this work is done and reported here. As of today nearly all of the ChEBI Entity of the Month information is posted to ChemSpider. During the process we have provided feedback to the ChEBI team suggesting changes to some structure depictions and have also noted some differences in stereochemistry between our reference structures and those on ChEBI. This type of interaction keeps us all very vigilant about accuracy, and it was great (and fast) to work with the group at ChEBI to cross-validate the limited dataset. Everyone gains.

The rich text editor has worked perfectly and without failure, and we think it is ready to roll out to the general public, but we would still like some beta-testers to help test it, please.


Okay, this is clearly a rather tongue-in-cheek blog post but I couldn’t resist.

Search “sex” on ChemSpider and you get two hits…here

Click on the first structure and you will find that one of the identifiers for this compound is SEX, and it is an explosive.

Just READ the second structure and you will see it is SEX. It’s CLEAN sex, though. The dirty sex was described in a recent C&E News article, which points back to the poor image originally published by the New York Times in Pamela Paul’s review of Mary Roach’s book “Bonk: The Curious Coupling of Science and Sex”. To get CLEAN sex I removed inappropriate substitutions and bonds.

It still looks like sex though…

ChemSpider added the Directory of Useful Decoys over the weekend. This dataset is well known to the community of scientists performing computational docking experiments and is outlined below. The dataset contributed over 128,000 molecules to the collection.

“DUD, a directory of useful decoys for benchmarking virtual screening. DUD is designed to help test docking algorithms by providing challenging decoys. It contains:

  • A total of 2,950 active compounds against a total of 40 targets
  • For each active, 36 “decoys” with similar physical properties (e.g. molecular weight, calculated LogP) but dissimilar topology.

DUD is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF). To cite DUD, please reference Huang, Shoichet and Irwin, J. Med. Chem., 2006, 49(23), 6789-6801. doi 10.1021/jm0608356. There is a DUD wiki page where you can discuss DUD and an errata page where problems are reported and explained.”

In an ongoing commentary about the DailyMed dataset (1,2) I have been showing some of the struggles involved in creating curated datasets from publicly available data. This post shows an example of what happens when trade names collide. The DailyMed record for Sclerosol shows no chemical structure in the label…but describes the compound as follows:

“Sclerosol® Intrapleural Aerosol (sterile talc powder 4 g) is a sclerosing agent for intrapleural administration supplied as a single-use, pressurized spray canister with two delivery tubes of 15 cm and 25 cm in length. Each canister contains 4.0 g of talc, either white or off-white to light grey, asbestos-free, and brucite-free grade of talc of controlled granulometry. The composition of the talc is ≥ 95% talc as hydrated magnesium silicate. The empirical formula is Mg3 Si4 O10 (OH)2 with molecular weight of 379.3.”
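As a quick arithmetic check of the label’s figures, the formula Mg3Si4O10(OH)2 expands to Mg3Si4O12H2, and summing standard atomic weights reproduces the stated molecular weight of 379.3. A minimal Python sketch:

```python
# Standard (conventional) atomic weights in g/mol
ATOMIC_WEIGHT = {"Mg": 24.305, "Si": 28.085, "O": 15.999, "H": 1.008}

# Mg3 Si4 O10 (OH)2 expands to Mg3 Si4 O12 H2
talc = {"Mg": 3, "Si": 4, "O": 12, "H": 2}

mw = sum(ATOMIC_WEIGHT[element] * count for element, count in talc.items())
print(round(mw, 1))
```

The label is at least self-consistent: the formula and molecular weight describe talc, not DMSO (C2H6OS, about 78.13).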

Sclerosol is Talc. An online search on Sclerosol, however, returns numerous hits for dimethyl sulfoxide on ChemIndustry, in the Comparative Toxicogenomics Database and on MeSH. So, is Sclerosol also DMSO?

The PubChem record merges Talc and DMSO rather effectively. Visit the record here. The substance summary is as follows:

“A highly polar organic liquid, that is used widely as a chemical solvent. Because of its ability to penetrate biological membranes, it is used as a vehicle for topical application of pharmaceuticals. It is also used to protect tissue during CRYOPRESERVATION. Dimethyl sulfoxide shows a range of pharmacological activity including analgesia and anti-inflammation.”

Further information comes from the MeSH details shown below.

The image of the associated structure is shown below…notice it’s representative of talc.

It appears that DMSO and Talc were meshed somehow.

Sclerosol on ChemSpider is Talc. I am not stating that the structure representation of talc is appropriate, but it IS the same as the one displayed on PubChem. DMSO on ChemSpider is here and never had the name Sclerosol associated with it. Since we derived some of our data from PubChem, I am not sure how we managed to separate the DMSO and Sclerosol association in our processes…but we did.

So, MAYBE Sclerosol is a name for DMSO…but I don’t think so.

Why is this important? Because we are working on text mining and will use a lookup dictionary of chemical names and structures as part of the process, we are putting in the work to create a high-quality dictionary. It’s important for us moving forward.
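To show why a single conflated name matters for text mining, here is a minimal dictionary-lookup sketch in Python. It is not ChemSpider’s pipeline: the entries and the structure IDs (“talc”, “dmso”) are hypothetical, and matching is a naive whole-word, case-insensitive regular expression.

```python
import re
from collections import defaultdict


def build_dictionary(entries):
    """Map each lower-cased name to the set of structure IDs it points to."""
    lookup = defaultdict(set)
    for name, structure_id in entries:
        lookup[name.lower()].add(structure_id)
    return lookup


def tag_text(text, lookup):
    """Find dictionary names in text and flag names that map to more than one
    structure. Whole-word regex matching only; a real pipeline needs proper
    chemical-name tokenization."""
    hits = []
    for name, ids in lookup.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
            hits.append((name, sorted(ids), len(ids) > 1))
    return hits


# Hypothetical entries: "Sclerosol" wrongly points at two different structures.
entries = [
    ("Sclerosol", "talc"),
    ("Talc", "talc"),
    ("Sclerosol", "dmso"),
    ("DMSO", "dmso"),
]
lookup = build_dictionary(entries)
for name, ids, ambiguous in tag_text("Sclerosol was given intrapleurally.", lookup):
    print(name, ids, "AMBIGUOUS" if ambiguous else "ok")
```

With the bad Sclerosol/DMSO association in the dictionary, every mention of Sclerosol gets tagged with two structures; flagging such names lets a curator resolve them before the dictionary is used.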

I’ve started a review of the DailyMed dataset as it is representative of some of the struggles involved in preparing a curated dataset of chemical structures, chemical names and trade names. In the first post I pointed to issues with structure representations. I believe one of the worst is shown for QVAR to the left. An examination of the QVAR record gives the name as beclomethasone dipropionate. This particular compound has the chemical structure shown below. Not only is the stereochemistry missing from the structure on DailyMed, but half the ring has also been lost, maybe during a scanning process? I wonder whether the label circulating to the public has this issue. Would the public care? Probably not. But when trying to build a curated dataset it’s rather important.


The past couple of days have seen an interesting exchange on the SimBioSys blog.

Zsolt Zsoldos is someone I respect, not only for his passion for his science but also for his desire to educate others about the challenges of what he does in developing software. I believe his blog post entitled “Crystal Structure Errors in CSD too” was an honest attempt to tell people to be “careful” when using data from databases. I don’t care whether the database is ChemSpider, PubChem, the CAS Registry or any of the other databases available via free access or commercial transaction; they ALL have errors. It is inevitable. Zsolt’s attempt to highlight that such errors exist was made, I believe, with pedagogical intent.

“J” then came back and gave some appropriate comments in response to Zsolt’s post, and they should be read together with it. It appears there was some type of backroom conversation, likely with the CCDC, about how these comments were not prominent enough. Zsolt then posted this:

Update: Since the posting of this blog entry, we have received 2 public comments — displayed in a standard way as all comments by the WordPress blog software, and some private emails originating from CCDC. One of the complaints from CCDC was that the second comment — which explains the problems and directs the blame on my naivity for my wrong expectations about the data — was not displayed as prominently as the original article.”

He then posted the comment into the body of the original article. Huh? How comments are displayed is a WordPress issue, and I’m not sure why Zsolt should have felt obliged to insert the text into the article for anyone. Zsolt then went on to comment about the license agreement and permission to use the CSD. What is more interesting to me is his view here:

“On a personal opinion: such restrictions on the use of scientific facts do not seem to make much sense to me. As the IUCr position paper explains: There is a long-standing acceptance within crystallography of the principle that such primary data sets should be freely available for sharing and re-use (with appropriate credit) within the structural science community. Also the FAQ on the CrystalEye site explains: “As this supplementary data is a set of facts and is not part of the article full-text it does not fall under the copyright, and it should therefore be free to both view and download”. Nevertheless, CCDC has the legal right to stop us from using the data, since we signed a licensing agreement containing such conditions. That was a mistake on our part, one that we have to live with now. Let this case be a warning for others who have not yet made such mistake to sign the draconian agreement.”

Those of you who have been watching the discussion between ACS and me over the past few months will know I have been trying to get confirmation that “supplementary data” are Open Data and that we could scrape the CIFs if we chose to…it has been a MANY-month conversation at this point. The Unilever Centre at Cambridge, via Nick Day’s work, has generated CrystalEye and, after many conversations, we were provided the data source and have it on ChemSpider now. We are awaiting constructive feedback from Nick and Peter Murray-Rust regarding our implementation of their data on our site. This is especially important when there are licensing issues, such as those that appear to have been enforced on SimBioSys, as evidenced by this Public Apology to CCDC. Read the post for details. It is Zsolt’s concluding statement that feeds directly into the value of Open Data in science and the value of CrystalEye to the community.

He comments: “One lesson I learned from this exchange is the importance of Open Data for scientific advancement (some scientists believe that research data must be free), e.g. such that is available from CrystalEye. When even non-profit organizations (registered as a charity) use draconian license agreements protecting data created and published by others, then fully commercial entities (like pharmaceutical companies) must be guarding their own data even stronger. It makes it difficult to make scientific progress if a single blog mention of an error in a data entry invites the wrath of the company who sells services on the data.”

As efforts like CrystalEye prevail, as the copyright status of supplementary data and the position of publishers on it are resolved, and as groups such as ChemSpider apply their efforts to gathering Open Data and developing algorithms from these data, we are likely to see increasing tension of the kind shown here.

There has been a response to my post about Chemical Names and Structures here.

PMR> “For certain purposes, it is valuable to collect as many names as possible, for example for location of lookup. But these should be accompanied with metadata. A similar example is from ChemSpiderMan (ed.):

On a record view we list “Names and Synonyms”. The question marks Peter sees are for a French name shown here: Looks fine in my broswer and pasted in here too: N-{2-[({5?-[(dim�th?ylamino)m?�thyl]fur?an-2-yl}m?�thyl)sul?fanyl]�th?yl}-N’-m�?thyl-2-ni?tro�th�ne?-1,1-diam?ine. So, not junk (saying that the French name is junk would offend the Parisians). Notice that the Z- has been removed (for now) and that the name is labeled French on the record. If any of you are seeing issues in your browser let us know and we will investigate at our end.

PMR: Without the metadata giving the language, information is lost. For example, what does “pain” mean? If the language is not given there is a tendency to interpret this as English. We have to acknowledge that the language of science is currently English (it wasn’t when I started and we had to read French and German papers). So RDF, for example, provides a language qualifier (e.g. @en or @fr). The addition of that qualifier transforms the information from junk to meaningful.”
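Peter’s point about language qualifiers can be illustrated with N-Triples, where a literal carries an optional language tag such as "pain"@fr. The Python sketch below simply formats such lines by hand. The subject IRI is a hypothetical example.org placeholder, and rdfs:label is one reasonable predicate choice, not necessarily what any particular database uses.

```python
from typing import Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def ntriples_literal(text: str, lang: Optional[str] = None) -> str:
    """Serialize a string as an N-Triples literal, language-tagged if lang is given."""
    # Escape backslashes first, then quotes and line breaks.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def label_triple(subject_iri: str, name: str, lang: Optional[str] = None) -> str:
    """One N-Triples statement attaching a (possibly language-tagged) name."""
    return f"<{subject_iri}> <{RDFS_LABEL}> {ntriples_literal(name, lang)} ."


compound = "http://example.org/compound/ranitidine"  # hypothetical IRI
print(label_triple(compound, "ranitidine", "en"))
print(label_triple(compound, "ranitidine", "fr"))
print(ntriples_literal("pain", "fr"), "is not", ntriples_literal("pain", "en"))
```

The untagged form "pain" is exactly the ambiguous case Peter describes; the tag travels with the string wherever the statement goes.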

First of all, it’s interesting to note that the French name has been rendered as “junk” in Peter’s blog as shown here.

This probably relates to his original comment that the name is junk in his browser too…but acceptable in mine. On the other hand, his blog post may look fine to him but looks bad in my browser! Oh, those dependencies…I see similar things show up in WordPress regularly.
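We don’t know exactly what goes wrong between the two browsers, but one common mechanism is UTF-8 text being read with the wrong character encoding. The Python below shows how the “é” in the French name can come out as mojibake, as U+FFFD replacement characters or as “?”. The looks_garbled check is a crude, hypothetical heuristic a curation script might run on names, not something ChemSpider is known to use.

```python
def show_mismatches(text: str) -> dict:
    """Render one UTF-8 encoded string the way programs assuming other
    encodings might display it."""
    raw = text.encode("utf-8")  # "é" becomes the two bytes 0xC3 0xA9
    return {
        "correct (utf-8)": raw.decode("utf-8"),
        "read as latin-1": raw.decode("latin-1"),                 # each byte -> one char
        "read as ascii": raw.decode("ascii", errors="replace"),   # bad bytes -> U+FFFD
        "forced to ascii": text.encode("ascii", errors="replace").decode("ascii"),  # -> '?'
    }


def looks_garbled(name: str) -> bool:
    """Crude, hypothetical curation check: flag names containing U+FFFD or '?'."""
    return "\ufffd" in name or "?" in name


for label, rendering in show_mismatches("méthyl").items():
    print(f"{label:16} {rendering}  garbled={looks_garbled(rendering)}")
```

Note that the Latin-1 mojibake slips past this simple check, which is one reason such heuristics can only assist a human curator.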

Peter suggests that there should be metadata giving the language information. Good idea. See my previous blog post about that particular issue and the fact that we allow curators to layer on metadata AND we capture and retain it WHEN it is available.

If you look at this record you will see that there are names labeled as Polish, German and Dutch.

Chloroprene [Wiki]

1,3-Butadiene, 2-chloro-

126-99-8 [RN]

204-818-0 [EINECS]

2-Chloor-1,3-butadieen [Dutch]

2-Chlor-1,3-butadien [German]

2-Chlorbuta-1,3-dien [German]

2-Chloro-1,3-butadiene

2-Chlorobutadiene

Chloropren [Polish]

Most labels were captured during the deposition process; one was added manually. Notice also the direct links to Wikipedia, the Registry Number link to perform a search of PubChem and the link to EINECS.
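Because names carry labels such as [RN] and [EINECS], those identifier labels can be sanity-checked mechanically. The sketch below implements the published check-digit rules for CAS Registry Numbers and EC (EINECS) numbers. Whether ChemSpider runs such checks is not stated here, and the function names are hypothetical.

```python
import re


def is_valid_cas(rn: str) -> bool:
    """CAS RN check: multiply each digit (excluding the check digit) by its
    position counted from the right, sum, and compare the sum mod 10."""
    m = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(int(d) * i for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def is_valid_ec(ec: str) -> bool:
    """EC number check: multiply each of the first six digits by its position
    counted from the left, sum, and compare the sum mod 11."""
    m = re.fullmatch(r"(\d{3})-(\d{3})-(\d)", ec)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(int(d) * i for i, d in enumerate(digits, start=1))
    return total % 11 == int(m.group(3))


print(is_valid_cas("126-99-8"), is_valid_ec("204-818-0"))
```

Both identifiers on this record pass: for 126-99-8 the weighted sum is 58 (last digit 8), and for 204-818-0 it is 99, which is 0 mod 11.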

As I commented in my post on ranitidine, and as quoted in Peter’s post: “Notice … that the name is labeled French on the record.” So, what Peter suggests is already in place on ChemSpider. I display below the labels presently available to curators for tagging names. Notice that these include language, EINECS numbers, CAS Registry Numbers, INNs, JANs, etc.


The list of languages is easy to expand. Anybody have any requests?

A further comment: “PMR: I very much like the idea of regarding chemical names as social identifiers. But, of course, that only works for humans. The machines can aggregate the tags but they cannot make inferences from them. The problem is that when they are put into databases they lose their social context and are managed by hard boolean logic. That fails immediately and often dramatically. A major cause is the loss of metadata and authorities. In this world you cannot use voting (which is why Chempedia cannot be seen as an authority for CAS numbers, only a useful guide). We have to use authorities (provenance) in our information. Thus the statements: Ranitidine is the Z-isomer and Ranitidine is the E-isomer may be seen as contradictory. That’s why people have suggested that RDF should have quads, not triples, such as:

Antony_Williams asserts ranitidine hasIsomer Z

Wikipedia asserts ranitidine hasIsomer E

Both these are true. That is the language we should use in the semantic web. PeterMR still deliberately fails to make an assertion about this isomerism and is waiting to see what others think.”
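Peter’s quad idea can be made concrete in a few lines of Python: every statement carries a fourth element naming who asserts it, so conflicting values coexist without contradiction. This is a plain-tuple sketch using the two example statements from his comment, not an RDF library; real triple stores usually express the fourth element as a named graph.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # who asserts the triple: the "fourth" element


store = {
    Quad("ranitidine", "hasIsomer", "Z", "Antony_Williams"),
    Quad("ranitidine", "hasIsomer", "E", "Wikipedia"),
}


def assertions(store, subject, predicate):
    """Return {object: sorted list of sources} so each value stays attributed."""
    result = {}
    for q in store:
        if q.subject == subject and q.predicate == predicate:
            result.setdefault(q.obj, []).append(q.source)
    return {obj: sorted(sources) for obj, sources in result.items()}


print(assertions(store, "ranitidine", "hasIsomer"))
```

Querying by subject and predicate returns both values, each with its authority, rather than forcing a single "true" answer.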

This leads us into a deeper discussion about the retention of metadata and authorities. We retain metadata when it is deposited or when we can harvest it. Let’s consider the information below, extracted from the same compound on ChemSpider:

Notice all of the safety data below, and note that it all links through to the original source of information, in this case NIOSH.

  • Appearance: Colorless liquid with a pungent, ether-like odor.

  • First Aid: Eye: Irrigate immediately; Skin: Soap wash immediately; Breathing: Respiratory support; Swallow: Medical attention immediately

  • Exposure Routes: inhalation, skin absorption, ingestion, skin and/or eye contact

  • Symptoms: Irritation eyes, skin, respiratory system; anxiety, irritability; dermatitis; alopecia; reproductive effects; [potential occupational carcinogen]

  • Target Organs: Eyes, skin, respiratory system, reproductive system; Cancer Site: [lung & skin cancer]

  • Incompatibilities and Reactivities: Peroxides & other oxidizers [Note: Polymerizes at room temperature unless inhibited with antioxidants.]

  • Personal protection and Sanitation: Skin: Prevent skin contact; Eyes: Prevent eye contact; Wash skin: When contaminated; Remove: When wet (flammable); Change: No recommendation; Provide: Eyewash, Quick drench

  • Exposure Limits: NIOSH REL: Ca C 1 ppm (3.6 mg/m3) [15-minute], See Appendix A; OSHA PEL†: TWA 25 ppm (90 mg/m3) [skin]
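The two units in the exposure limits are linked by the usual industrial-hygiene conversion, mg/m3 = ppm × molar mass / 24.45, which assumes an ideal gas at 25 °C and 1 atm. A quick Python check for chloroprene (C4H5Cl), using abridged atomic weights:

```python
MOLAR_VOLUME_L = 24.45  # litres per mole of ideal gas at 25 °C and 1 atm


def ppm_to_mg_per_m3(ppm: float, molar_mass: float) -> float:
    """Convert a gas-phase concentration from ppm (by volume) to mg/m3."""
    return ppm * molar_mass / MOLAR_VOLUME_L


CHLOROPRENE_MW = 4 * 12.011 + 5 * 1.008 + 35.45  # C4H5Cl, about 88.53 g/mol

for ppm in (1, 25):
    print(ppm, "ppm ->", round(ppm_to_mg_per_m3(ppm, CHLOROPRENE_MW), 1), "mg/m3")
```

This gives about 3.6 mg/m3 for 1 ppm and about 90.5 mg/m3 for 25 ppm, consistent with the 3.6 and (rounded) 90 mg/m3 in the NIOSH entry.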

There are also experimental properties, and each piece of data links out to its original source. For this record it is the same source; some records already have multiple sources.

Experimental physchem properties

  • Boiling Point: 139 °F

  • Flash Point: -4 °F

  • Freezing Point: -153 °F

  • Specific Gravity: 0.96

  • Solubility: Slight

  • Ionization Potential: 8.79 eV

  • Vapor Pressure: 188 mmHg
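The NIOSH temperatures are given in °F. Here is a hypothetical sketch of keeping the source attached to each value while converting units only for display, using °C = (°F - 32) × 5/9; the class and field names are illustrative, not ChemSpider’s schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    name: str
    value_f: float  # value as deposited, in degrees Fahrenheit
    source: str     # provenance stays attached to the value

    @property
    def value_c(self) -> float:
        """Derived on demand; the deposited value is never overwritten."""
        return (self.value_f - 32) * 5 / 9


measurements = [
    Measurement("Boiling Point", 139, "NIOSH"),
    Measurement("Flash Point", -4, "NIOSH"),
    Measurement("Freezing Point", -153, "NIOSH"),
]
for m in measurements:
    print(f"{m.name}: {m.value_f} °F = {m.value_c:.1f} °C [source: {m.source}]")
```

Storing the value as deposited and deriving other units on demand keeps the link back to the original source intact.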

This particular structure has been deposited onto the ChemSpider database a total of 18 times from the source databases listed below. Where possible (i.e., when the structure is available online on the supplier’s website and can be hyperlinked to), each external ID links to the depositor. There is an error! The Aldrich depositions are for the polymer forms! Curators can knock this info out.

Data source: External ID(s)
ChemDB: 6681768
ChemIDplus: 000126998, 014523898
DiscoveryGate: 31369
DTP/NCI: 18589
EINECS: N/A
EPA DSSTox: 1084_NTPBSI_v2b, 325_CPDBAS_v5b, 326_CPDBAS_v5b, 724_HPVCSI_v2c
Istituto Superiore di Sanità: 601
NIOSH: EI9625000
NIST: 2143397875
NIST Chemistry WebBook: 2143397875
PubChem: 31369
Sigma-Aldrich: 205397_ALDRICH, 205400_ALDRICH
Thomson Pharma: 00243363
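As a consistency check, counting the external IDs in the deposition table reproduces the stated total of 18 depositions. The FLAGGED entry is a hypothetical curator annotation mirroring the note that the Aldrich depositions are polymer forms.

```python
# External IDs per data source, transcribed from the deposition table.
DEPOSITIONS = {
    "ChemDB": ["6681768"],
    "ChemIDplus": ["000126998", "014523898"],
    "DiscoveryGate": ["31369"],
    "DTP/NCI": ["18589"],
    "EINECS": ["N/A"],
    "EPA DSSTox": ["1084_NTPBSI_v2b", "325_CPDBAS_v5b", "326_CPDBAS_v5b", "724_HPVCSI_v2c"],
    "Istituto Superiore di Sanità": ["601"],
    "NIOSH": ["EI9625000"],
    "NIST": ["2143397875"],
    "NIST Chemistry WebBook": ["2143397875"],
    "PubChem": ["31369"],
    "Sigma-Aldrich": ["205397_ALDRICH", "205400_ALDRICH"],
    "Thomson Pharma": ["00243363"],
}

# Hypothetical curator flag: these depositions describe the polymer, not the monomer.
FLAGGED = {"Sigma-Aldrich": "polymer form"}


def count_depositions(depositions, exclude=()):
    """Count external IDs, skipping any source named in exclude."""
    return sum(len(ids) for source, ids in depositions.items() if source not in exclude)


print(count_depositions(DEPOSITIONS))           # all depositions
print(count_depositions(DEPOSITIONS, FLAGGED))  # after removing flagged sources
```

The first call returns 18 and the second 16.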

Master curators can also see who has been editing the names and synonyms, as well as a full record of depositions: by whom and when.

So, names are labeled with language and links to Wikipedia and other info. The predicted properties and systematic name are generally labeled according to the provider of the algorithm(s). We keep track of every URL and publication deposition and know which user deposited what and when…if the site is “vandalized” then we know which user did so.
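The kind of provenance tracking described here (who deposited or edited what, and when) is often modeled as an append-only audit log. This is a minimal Python sketch with hypothetical class, user and action names; it is not ChemSpider’s implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Edit:
    user: str
    record_id: int
    action: str   # e.g. "add_synonym", "deposit_url" (hypothetical action names)
    detail: str
    when: datetime


@dataclass
class AuditLog:
    """Append-only log: entries are added but never modified or deleted."""
    entries: list = field(default_factory=list)

    def record(self, user: str, record_id: int, action: str, detail: str) -> None:
        self.entries.append(Edit(user, record_id, action, detail, datetime.now(timezone.utc)))

    def by_user(self, user: str) -> list:
        return [e for e in self.entries if e.user == user]

    def history(self, record_id: int) -> list:
        return sorted((e for e in self.entries if e.record_id == record_id), key=lambda e: e.when)


log = AuditLog()
log.record("curator_a", 1, "add_synonym", "Chloropren [Polish]")
log.record("user_b", 1, "add_synonym", "junk text")
log.record("user_b", 2, "deposit_url", "http://example.org/spam")
# If record 1 looks vandalized, list everything the suspect user touched.
for edit in log.by_user("user_b"):
    print(edit.record_id, edit.action, edit.detail)
```

Because nothing is overwritten, a suspect user’s edits can be reviewed (and reverted) across every record they touched.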

Overall I’d say we have a lot of metadata for this record. The same is true for tens of thousands of records on ChemSpider and the amount of such information is growing literally daily. We’re not done yet of course – there is much more to add. We put a lot of thought into the design of this system and associated metadata but we also chose to jump off the cliff and start “doing”. There is a lot to learn from managing 20 million molecules and the complexity that comes with doing so. We continue to morph and extend as necessary and welcome input.

To clarify re. ranitidine… I am NOT asserting that ranitidine is the Z-isomer. I am stating that ranitidine has multiple names on ChemSpider, some with no stereochemistry and some with Z-stereochemistry. I also report that a published crystal structure shows a Z-orientation. I also report that a commercial software package suggests that the three tautomeric structures below are possible for ranitidine.

I also report, just for fun of course, that the InChI algorithm will declare two of these tautomers, the bottom two, as equivalent when “mobile protons” are taken into account. Compare the InChIKeys below, generated with mobile-proton perception switched ON in the InChI algorithm. Need more information?

With the curation capabilities we have in place, the retained metadata, the linkages to depositors and other sites, and the revision history, I would say that we are well equipped to manage the data and to keep enhancing our platform for chemists worldwide.

Recently I posted on whether or not there is “a right structure for a compound”. I talked about trade names and registered chemical entities and posed the question “whether a Registered Trade Name is absolute? I’m asking the question since I’m actually not sure.”

There were two responses…

1) Rich Apodaca commented: “you’d probably find agreement among chemists that a trade name uniquely identifies one specific chemical entity. Ditto CAS Number.”

2) Peter Murray-Rust, as is his way (does anyone ever get a comment on their blog from PMR?), posted a detailed and thoughtful response on his own blog here.

I, like Rich, am of the opinion that a CAS Number does uniquely identify a specific chemical entity, not necessarily a unique structure. Of course, CAS numbers can be confusing too as I have commented here. Aspirin, for example, has 6 CAS numbers! So Rich and I agree on this…can anyone from CAS confirm or not whether our belief is right?

So, what about Trade Names? There were a number of purposeful errors in my original post to stimulate thought and feedback about my question. There is a LOT of confusion about identifiers and chemicals. The relationships are convoluted and even I struggle with certain aspects. So, let’s examine the confusions!

I commented that “Zantac is a registered trade name for the chemical here. ” Check out the chemical structure there.

Now check out the Wikipedia text on that record view: “Ranitidine (INN) is a histamine H2-receptor antagonist that inhibits stomach acid production. It is commonly used in the treatment of peptic ulcer disease (PUD) and gastroesophageal reflux disease (GERD). It is currently marketed over the counter under the trade name Zinetac and Zantac by GlaxoSmithKline and by many other companies under various other names. ”

One might therefore assume that I am correct in my statement about Zantac. Check out the DailyMed label for Zantac here. This declares: “The active ingredient in ZANTAC Injection and ZANTAC Injection Premixed is ranitidine hydrochloride (HCl)”. Ah-ha…Zantac is a hydrochloride form of Ranitidine then? A search for Zantac gives THREE results on DailyMed…different in formulations but all pointing to the HCl form of ranitidine as the active component. So, based on this statement, is it correct to label the structure here with the label Zantac? It doesn’t have the HCl, so in theory, no. Is Wikipedia correct in saying that Ranitidine is “marketed over the counter as Zantac”? No. Hmm. A conundrum? No. It’s clear. Zantac should ONLY be a Ranitidine HCl formulation. A couple of button clicks and the record now says Zantac (as HCl). But there are a LOT of other trade names associated with that Ranitidine record that don’t have such definitions (yet).

There is a Ranitidine Hydrochloride on ChemSpider here. It came as part of the recent CrystalEye deposition and is at this record. The associated publication is here, the title of the article is “Ranitidine hydrochloride, a polymorphic crystal form” and the abstract says:

“In the title compound, dimethyl({5-[2-(1-methylamino-2-nitroethenylamino)ethylthiomethyl]-2-furyl}methyl)ammonium chloride, C13H23N4O3S+·Cl-, protonation occurs at the dimethylamino N atom. The ranitidine molecule adopts an eclipsed conformation. Bond lengths indicate extensive electron delocalization in the N,N′-dimethyl-2-nitro-1,1-ethenediamine system of the molecule. The nitro and methylamino groups are trans across the side chain C=C double bond, while the ethylamino and nitro groups are cis. The Cl- ions link molecules through hydrogen bonds.”

When I take the orientation information and draw the molecule from the crystal structure then I get:

and when I name this I get: (Z)-N-{2-[({5-[(dimethylamino)methyl]furan-2-yl}methyl)sulfanyl]ethyl}-N’-methyl-2-nitroethene-1,1-diamine, a Z-orientation.

Let’s return to Peter’s analysis of the list of identifiers associated with Ranitidine on the ChemSpider record in question. He comments:

“PMR: ….. It is clear that

(Z)-N-{2-[({5-[(Dimethylamino)methyl]furan-2-yl}methyl)sulfanyl]ethyl}-N’-methyl-2-nitroethen-1,1-diamin

and

N-[2-[[[-5-[(Dimethylamino)methyl]-2-furanyl]methyl]thio]ethyl]-N’-methyl-2-nitro-1,1-ethenediamine

are not identical. One describes a compound whose stereochemistry is asserted, the other describes one where the stereochemistry is not asserted. Butene and 1-butene and 2-butene and (Z)-butene are all different. They all have different InChIs. Some of them may refer to the same concept in some contexts, but they are not synonyms. Fowler (Modern English Usage) says “perfect synonyms are extremely rare”.”

We are in absolute agreement about this issue. The names are not identical. One declares stereo and the other doesn’t. The question then is which synonyms are useful to ChemSpider users trying to locate the structure from a systematic name. One might assume that the more the merrier. Variations in bracket styles and dashes alone could give rise to dozens of names that are all consistent with the structure, and the names shown come from different sources.

Additionally, the comment is made: “If we are representing something in a machine, and we assert the two are to be used interchangeably then we have to be very sure that they can be. Adding a “(Z)” may appear a reasonable thing to do – in this case it is a disastrous act that corrupts information.” This is the problem with identifiers: they are confounded by complexity, which supports the concept that there are no absolutes in names associated with compounds.
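One way to reconcile these two positions is to normalize only typography when building lookup keys, so bracket, dash and prime variants collapse together, while never discarding stereodescriptors. The Python sketch below does that. The regular expression for descriptors such as (Z)-, (2E)- or (1R,2S)- is a simple heuristic, and the function names are hypothetical.

```python
import re

# Typographic variants mapped onto plain ASCII; bracket styles collapsed to ().
_TRANSLATION = str.maketrans({
    "\u2019": "'",   # right single quotation mark used as a prime
    "\u2032": "'",   # prime
    "\u2010": "-",   # hyphen
    "\u2011": "-",   # non-breaking hyphen
    "\u2013": "-",   # en dash
    "\u200b": None,  # zero-width space (deleted)
    "[": "(", "]": ")", "{": "(", "}": ")",
})

# Heuristic pattern for stereodescriptors such as (Z)-, (2E)-, (R)- or (1R,2S)-.
_STEREO = re.compile(r"\(\d*[EZRS](?:,\d*[EZRS])*\)-?", re.IGNORECASE)


def lookup_key(name: str) -> str:
    """Normalize typography and case but keep stereodescriptors."""
    return name.translate(_TRANSLATION).lower()


def relation(a: str, b: str) -> str:
    """Classify two names: same key, stereo-only difference, or different."""
    key_a, key_b = lookup_key(a), lookup_key(b)
    if key_a == key_b:
        return "formatting variant"
    if _STEREO.sub("", key_a) == _STEREO.sub("", key_b):
        return "differ in stereo: not synonyms"
    return "different"


z_name = "(Z)-N-{2-[({5-[(Dimethylamino)methyl]furan-2-yl}methyl)sulfanyl]ethyl}-N\u2019-methyl-2-nitroethene-1,1-diamine"
plain = "N-{2-[({5-[(Dimethylamino)methyl]furan-2-yl}methyl)sulfanyl]ethyl}-N'-methyl-2-nitroethene-1,1-diamine"
print(relation(z_name, plain))
print(relation("N\u2019-methyl-2-nitroethene-1,1-diamine", "N'-methyl-2-nitroethene-1,1-diamine"))
```

The first comparison reports a stereo-only difference, so the two names are kept apart; the second collapses a typographic apostrophe into a plain prime and reports a formatting variant.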

When discussing Wikipedia, Peter has previously described it as “Open, re-usable, very highly curated, and the first place that students look. That – or a derivative – is where the world’s chemistry should reside.” I have covered the complexity of Taxol/paclitaxel previously (1,2,3), so where does Wikipedia stand on Ranitidine?

Wikipedia actually shows and names an E-orientation as shown below

So, Wikipedia says E, and ChemSpider says Z and no specific stereochemistry (in its identifiers). The crystal structure specifies Z-stereo. Oh dear, what can the matter be?

I then searched PubChem and found two E’s and a Z under Zantac. I searched MeSH for ranitidine and found no stereochemistry specified. I searched ChEBI for both ranitidine and Zantac and found nothing.

Further down the rabbit hole we go…

PMR> “The robotic aggregation of chemical names and identifiers, if done without metadata and ontology, corrupts information. That’s a strong statement, but we can see it in the current case. First there is junk out there. Robotic name harvesting harvests junk. (Christoph Steinbeck described it in worse terms at the RSC meeting.) Here’s a snip from page 571454:

Validated by Experts, Validated by Users, Non-Validated, Removed by Users, Redirected by Users, Redirect Approved by Experts

Ranitidine [Wiki]

(Z)-N-{2-[({5-[(Dim?thylamino)m?thyl]furan-2-yl}m?thyl)sulfanyl]?thyl}-N’-m?thyl-2-nitro?th?ne-1,1-diamine

(Z)-N-{2-[({5-[(Dimethylamino)methyl]furan-2-yl}methyl)sulfanyl]ethyl}-N’-methyl-2-nitroethen-1,1-diamin

The “?” characters show up in my browser – I don’t know what they are, but they are not normal “e”s (ASCII 101). The first name is not a synonym – I’m sorry, but it’s junk. Associating junk with good information degrades the good information rather than increasing the quality of the junk (There is a more formal proof somewhere by Shannon – I believe – that machines cannot act as 100% proofreaders).”

On a record view we list “Names and Synonyms”. The question marks Peter sees are for a French name shown here:

Looks fine in my browser and pasted in here too: N-{2-[({5-[(diméthylamino)méthyl]furan-2-yl}méthyl)sulfanyl]éthyl}-N’-méthyl-2-nitroéthène-1,1-diamine. So, not junk (saying that the French name is junk would offend the Parisians). Notice that the Z- has been removed (for now) and that the name is labeled French on the record. If any of you are seeing issues in your browser let us know and we will investigate at our end.

Further:

“PMR: A trade name represents a product, not a compound and certainly not a connection table. In some cases it may refer to a pure substance, which itself is describable by a connection table, but these are not synonyms. And aggregating them as synonyms adds error rather than clarity. However there is an even stronger reason why “Zantac” does not describe ranitidine. See the FDA page, “Zantac (Ranitidine Hydrochloride) Tablets”. Zantac contains (not “is”) ranitidine hydrochloride.”

A Trade Name DOES represent a product. It can represent MANY formulations also. The active component is commonly the material of interest that we would like to see as a connection table.

However, if one wants to find the active component in Zantac, what would YOU do? Type Zantac into Wikipedia maybe? Look where it takes you: http://en.wikipedia.org/wiki/Zantac. So, Zantac redirects to Ranitidine. Don’t forget the earlier statement about Wikipedia: “Open, re-usable, very highly curated, and the first place that students look. That – or a derivative – is where the world’s chemistry should reside.” Should the same be true for ChemSpider? I think so. But this is a choice we have to make to provide a service to users. On MeSH a search on Zantac takes you to Ranitidine. On PubChem Zantac takes you to Ranitidine(s). So, association of Zantac with Ranitidine is appropriate BUT there is a need for ontologies, I agree. ChEBI has a good model for this (more later).
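A search can still take a user from Zantac to ranitidine while keeping Peter’s distinction between “contains” and “is”. Below is a minimal Python sketch using typed links instead of a flat synonym list; the predicate names are hypothetical and not taken from ChEBI or any other ontology.

```python
from collections import defaultdict

# Typed links instead of a flat synonym list. Predicate names are hypothetical.
LINKS = [
    ("Zantac", "contains_active_ingredient", "ranitidine hydrochloride"),
    ("ranitidine hydrochloride", "is_salt_form_of", "ranitidine"),
]

graph = defaultdict(list)
for subject, predicate, obj in LINKS:
    graph[subject].append((predicate, obj))


def resolve(term):
    """Follow typed links from a search term, recording each step so a user
    can see that Zantac contains a salt of ranitidine rather than is it."""
    chain, seen = [], {term}
    while graph.get(term):
        predicate, obj = graph[term][0]  # follow the first link only, for brevity
        if obj in seen:  # guard against cycles
            break
        chain.append((predicate, obj))
        seen.add(obj)
        term = obj
    return chain


for step in resolve("Zantac"):
    print(step)
```

The user still lands on ranitidine, but the path records that the trade name names a product containing the hydrochloride salt.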

Interestingly, a search on Ranitidine on ChemSpider provides the following list of names:

PMR comments: “But the current aggregations of chemicals (Chemspider, eMolecules, Chempedia) are designed for use by machines as well as humans. And unless high-quality metadata is given, along with a structured ontology then machine aggregation of chemistry corrupts rather than enhances. For that reason we are building molecular repositories based on metadata and ontologies. In the current era of the web it’s becoming essential. ”

I look forward to seeing how Zantac and Ranitidine are handled in this new world. If it’s a structured ontology, then it sounds like an integration of MeSH with structures? Wikipedia now covers over 5,000 organic compounds and is the culmination of thousands of hours of work by many dedicated individuals, and it is not error-free. Any other effort will be prone to similar issues, so it’s going to be a major undertaking and I look forward to the results. The ChEBI team are already doing a good job in this area; you can see an ontology Tree View here. So, I’m definitely excited to see what will be better! Exciting times.

PMR comments: “Now, I suggested that the “(Z)” should not have been added to “ranitidine” to indicate the stereochemistry. You can find pages out there with “(E)”. What is the “correct structure”? Or is this a meaningless question?”

In my opinion this is NOT a meaningless question; it is a good question. You saw what the crystal structure showed. Should the name include stereochemistry? If so, when?

Please stay engaged in these discussions with both Peter and me. They are important and meaningful.

Following the announcement by JC Bradley that Drexel University now has an eCrystals Repository, I connected with Simon Coles. We’ve exchanged a few emails and have the go-ahead to scrape the eCrystals structures and DOIs from their eCrystals repository in Southampton; we will be doing so over the next few days and adding the data to ChemSpider. Watch out for the new collection as it goes online.