Pages

Monday, August 15, 2016

Alzheimer’s disease, PaDEL-Descriptor, CDK versions, and QSAR models

A new paper in PeerJ (doi:10.7717/peerj.2322) caught my eye for two reasons. First, it's nice to see a paper using the CDK in PeerJ, one of the journals of an innovative, gold Open Access publishing group. Second, that's what I call a graphical abstract (shown on the right)!

The paper describes a collection of Alzheimer-related QSAR models. It primarily uses fingerprints and the PaDeL-Descriptor software (doi:10.1002/jcc.21707) for it particularly. I just checked the (new) PaDeL-Descriptor website and it still seems to use CDK 1.4. The page has the note "Hence, there are some compatibility issues which will only be resolved when PaDEL-Descriptor updates to CDK 1.5.x, which will only happen when CDK 1.5.x becomes the new stable release." and I hope Yap Chun Wei will soon find time to make this update. I had a look at the source code, but with no NetBeans experience and no install instructions, I was unable to compile the source code. AMBIT is now up to speed with CDK 1.5, so the migration should not be too difficult.

Mind you, PaDEL is used quite a bit, so the impact of such an upgrade would be substantial. The Wiley webpage for the article mentions 184 citations, Google Scholar counts 369.

But there is another thing. The authors of the Alzheimer paper compare various fingerprints and the predictive powers of models based on them. I am really looking forward to a paper where the authors compare the same fingerprint (or set of descriptors) but with different CDK versions, particularly CDK 1.4 against 1.5. My guess is that the models based on 1.5 will be better, but I am not entirely convinced yet that the increased stability of 1.5 is actually going to make a significant impact on the QSAR performance... what do you think?


Simeon, S., Anuwongcharoen, N., Shoombuatong, W., Malik, A. A., Prachayasittikul, V., Wikberg, J. E. S., Nantasenamat, C., Aug. 2016. Probing the origins of human acetylcholinesterase inhibition via QSAR modeling and molecular docking. PeerJ 4, e2322+. 10.7717/peerj.2322

Yap, C. W., May 2011. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. Journal of Computational Chemistry 32 (7), 1466-1474. 10.1002/jcc.21707

Saturday, August 13, 2016

The Groovy Cheminformatics scripts are now online

My Groovy Cheminformatics with the Chemistry Development Kit book sold more than 100 times via Lulu.com now. An older release can be downloaded as CC-BY from Figshare and was "bought" 39 times. That does not really make a living, but does allow me to financially support CiteULike, for example, where you can find all the references I use in the book.

The content of the book is not unique. The book exists for convenience, it explains things around the APIs, gives tips and tricks. In the first place for myself, to help me quickly answer questions on the cdk-user mailing list. This list is a powerful source of answers, and the archive covers 14 years of user support:


One of the design goals of the book was to have many editions allowing me to keep all scripts updated. In fact, all scripts in the book are run each time I make a new release of the book, and, therefore, which each release of the CDK that I make a book release for. That also explains why a new release of the book currently takes quite a bit of time, because there are so many API updates at the moment, as you can read about in the draft CDK 3 paper.

Now, I had for a long time also the plan to make the scripts freely available. However, I never got around to making the website to go with that. I have given up on the idea of a website and now use GitHub. So, you can now, finally, find the scripts for the two active book releases on GitHub. Of course, without the explanations and context. For that you need the book.

Happy CDK hacking!

Sunday, July 17, 2016

Use of the BridgeDb metabolite ID mapping database in PathVisio

A long time ago Martijn van Iersel wrote a PathVisio plugin that visualizes 2D chemical structures of metabolites in pathways as found on WikiPathways. Some time ago I tried to update it to a more recent CDK version, but did not have enough time at the time to get it going. However, John May's helpful DepictionGenerator made it a lot easier, so I set out this morning in updating the code base to use this class and CDK 1.5.13 (well, strictly speaking it's running a prerelease (snapshot) of CDK 1.5.14). With success:


The released version is a bit more tweaked and shows the 2D structure diagram more filling the Structure tab. I have submitted the plugin to the PathVisio Plugin Repository.

Now, you may know that these GPML pathways only contain identifiers, and no chemical structures. But this is where the metabolite identifier mapping database helps (doi:10.6084/m9.figshare.3413668.v1): it contains SMILES strings for many of the compounds. It does not contains SMILES string from Wikidata, but I will start adding those in upcoming releases too. The current SMILES strings come from HMDB.

To show how all this works, check out the below PathVisio screenshot. The selected node in the pathway has a label uracil and the left most front dialog was used to search in the metabolite identifier mapping database and it found many hits in HMDB and Wikidata (middle dialog). The Wikidata identifier was chosen for the data node, allowing PathVisio to "interpret" the biological nature of that node in the pathway. However, along with many mapped identifiers (see the Backpage on the right), this also provides a SMILES that is used by the updated ChemPaint plugin.


Sunday, July 10, 2016

Setting up a local SPARQL endpoint

... has never been easier, and I have to say, with Virtuoso it already was easy.

Step 1: download the jar and fire up the server
OK, you do need Java installed, and for many this is still the case, despite Oracle doing their very best to totally ruin it for everyone. But seriously, visit the Blazegraph website (@blazegraph) and download the jar and type:

$ java -jar blazegraph.jar

It will give some output on the console, including a webpage with SPARQL endpoint, upload form etc.


That it tracks past queries is a nice extra.

Step 2: there is no step two

Step 3: OK, OK, you also want to try a SPARQL from the command line
Now, I have to say, the webpage does not have a "Download CSV" button on the SPARQL endpoint. That would be great, but doing so from the command line is not too hard either.

$ curl -i -H "Accept: text/csv" --data-urlencode \
  query@list.rq http://192.168.0.233:9999/blazegraph/sparql

But it would be nice if you would not have to copy/paste the query into a file, or go to the command line in the first place. Also, I had some trouble finding the correct SPARQL endpoint URL, as it seems to have changed at least twice in recent history, given the (outdated) documentation I found online (common problem; no complaint!).

HT to Andra who first mentioned Blazegraph to me, and the Blazegraph team.

Friday, July 08, 2016

Metabolomics 2016 Write up #1: some interesting bits

A good conference needs some time to digest. A previous supervisor advised me that a conference travel of 5 days takes 5 full day to follow up on everything. I think he is right, though few of us actually block our schedules to make time for that. Anyway, I started following up on things last weekend, resulting in a first two blog posts:
The second was pretty much how I have been blogging a lot: it's my electronic lab notebook. The first is about how people can link out to WikiPathways. That post explains how people can create links between identifiers and pathways.

But there was a lot of very interesting stuff at Metabolomics 2016. I hope to be blogging about more things, but please find some initial coverage in the slides of a presentation I gave yesterday at our department:

Also check the Twitter hashtag #metsocdublin2016.