
Saturday, December 09, 2017

inventaire.io: every house a library

Two weeks ago someone made me aware of inventaire.io. This new social site lets you catalog the books you own. That in itself is already very useful. In the past I have made some efforts to create overviews of the books I own, if only to get a count, e.g. for insurance purposes. But a very neat feature is the ability to find other people in your neighborhood with whom you may exchange books: for each book you can indicate who can see that you own it (no one, by default; if I'm not mistaken, it inherits the ugo (user/group/others) approach from UNIX), and whether others can borrow or buy the book.

This map shows users with public profiles in The Netherlands; uptake has not been massive yet, but I'm hoping this post changes that :)


But the killer feature I am waiting for is a map like this, but for individual books. WorldCat has a feature that lists the nearest library holding a copy of the book you are looking for:


And such a feature (map or list) would be brilliant for inventaire.io: every house would potentially be a library. That idea appeals to me.

Oh, and it's Wikidata-based but that should hardly be a surprise.

Sunday, December 03, 2017

The Nationaal Plan Open Science Estafette: my first Open Science steps

openscience.nl
Earlier this year a meeting was organized in Delft for Dutch researchers to hear about and give feedback on the Nationaal Plan Open Science (doi:10.2777/061652). I count myself lucky that I could and was allowed to think along on this, because more access to scientific knowledge is close to my heart. During lunch, people could show their Open Science. From this grew the idea to set up a relay ("estafette") to show how much Open Science is actually already being done in The Netherlands. Here is the start: each next relay runner tells something about their Open Science research. Whether the focus is on Open Data, Open Access, or Open Source does not matter much, because the diversity in the Dutch Open Science community is simply large.

My Open Science story goes back to the time when I was a chemistry student at what is now called Radboud University. We got access to the internet there in 1994, and this opened up a world of Open knowledge for me! Our library was well stocked, but sometimes I had to visit departments to consult certain journals there. Always awkward, as a student, to walk into a coffee room full of senior researchers.

I learned HTML and later Java. Java, with its applets, brought the internet to life. It could show 3D models of chemical structures. A journal cannot do that. Twenty years later, journals still cannot, but that aside. In the three, four years that followed I got to know three projects, each Open Source: Jmol (now JSmol), JChemPaint, and the file format Chemical Markup Language (CML). The first was for showing 3D structures on the internet, and the second visualized two-dimensional (2D) chemical structures. CML was a format in which I could store both 2D and 3D coordinates. But the thing was that Jmol and JChemPaint could not read CML at all.

But that is where Open Science came in. After all, I could download the source code of Jmol and JChemPaint, modify it, and share it with others. I was convinced! And I got to work. Of course, I could have just used my modifications myself, but because I thought they might be useful to others too, I sent my changes ("patches") to the authors of Jmol and JChemPaint. I was overjoyed and proud when the scientists in Germany and the United States included them in their versions!

And it has served me well. In the final year of my studies I submitted an abstract to an international conference. It was accepted, so I had to go to Washington (Georgetown, to be precise) to talk about my work. On top of that, I had arranged to meet the authors of Jmol and JChemPaint in South Bend, where we laid the foundation for a new Open Science project, the Chemistry Development Kit (CDK). Expensive, but fortunately I received a grant from a Dutch company. It was a wondrous trip: having dinner in the sleeper train with a soldier who had seen action on D-Day, stepping off the sidewalk in New York because some tough guys were approaching (who turned out to be a famous boy band), and standing in the WTC (a year before 9/11) hearing tourists at the musical ticket booth ask "What is Broadway?".

I am proud that I was able to contribute to these Open Science projects and that I am a co-designer of the CDK. Because of their Open character, these projects have had considerable impact, and after twenty years they still do. Of course, it is not the function of a protein or metabolite, but these projects have certainly helped more than just my own research. With thanks to others, naturally: Hens Borkent, Dick Wife, Dan Gezelter, Christoph Steinbeck, and Peter Murray-Rust.

By the way, speaking of relays, Open Science is itself a relay: you take the baton from the people before you, give it your own twist, and then pass the baton on to the next person. And the baton gets more beautiful every day!

This Nationaal Plan Open Science Estafette also continues. I get to pass my baton to Rosanne Hertzberger. I am very curious about her Open Science story! And, of course, about all the runners who join the relay after that!

Sunday, November 26, 2017

Winter solstice challenge: what is your Open Knowledge score?

Source: Wikimedia, CC-BY 2.0.
Hi all, welcome to this winter solstice challenge! Umm, so as not to put our southern hemisphere colleagues at a disadvantage, as their winter solstice has already passed: for you this is a summer solstice challenge!

Introduction
So, you know ImpactStory and Altmetric.com (if not, browse my blog); these are wonderful tools to see what people are doing with your work. I hope you already know about OpenCitations, a collaboration of publishers, CrossRef, and many others to make all citation data available. They just passed the 50% milestone; congratulations on that amazing achievement! For the younger scientists it may be worth realizing that for at least the past 20 years this data was copyrighted and not to be used unless you paid. Elsevier is, BTW, still the major culprit claiming IP on this, but RT this if you are surprised.

So, the reason I introduce both ImpactStory and OpenCitations is the following. Scientific articles are data- and knowledge-dense documents, and they stay that compact only because we redirect the reader to other literature: the cited works give a more complete sketch of the context, describe a measurement protocol, describe how certain knowledge was derived, etc. Therefore, just having your article Open Access is not enough: the articles you cite should be Open Access too. That's the next phase of really making an effort to have all of humanity benefit from the fruits of science.

I know it is already hard to calculate an "Open Access" score, though ImpactStory does a great job at that! Calculating this for your papers and the papers they cite is even harder. You may need to brush up your algorithm and programming skills.
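The first step, checking a single research object, is not the hard part. Here is a minimal sketch, assuming the Unpaywall (oadoi) REST API that ImpactStory itself builds on; the helper name and the email address are mine:

```python
# Minimal sketch: check whether a DOI resolves to an Open Access work,
# using the Unpaywall REST API (the data source behind ImpactStory's
# OA badges). Unpaywall's usage policy asks for an email parameter.
import requests

UNPAYWALL = "https://api.unpaywall.org/v2/{doi}?email={email}"

def is_open_access(doi, email="you@example.com"):
    """Return True if Unpaywall reports an OA copy of this DOI."""
    reply = requests.get(UNPAYWALL.format(doi=doi, email=email))
    reply.raise_for_status()
    return reply.json().get("is_oa", False)

# A plain Open Access score: the fraction of your works that are OA.
my_dois = ["10.1093/nar/gkx1064"]  # replace with your own output
score = sum(is_open_access(d) for d in my_dois) / len(my_dois)
print(f"Open Access score: {score:.0%}")
```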

Eligibility
Anyone is allowed to participate. Submission of your entry is done online, e.g. in your blog, in a public write-up, or even an open notebook! However, you need at least one citable research object; that is, it needs a DOI. Otherwise, I cannot give you the prize (see below). The score should be based on all your products. Bonus points for those who include software and data citations. Excluding citable objects to boost your score (for example, I would have to exclude my book chapters) is seen as cheating the system.

Depth
Your article B may cite three articles (C, D, J), but article D also cites articles (F, I). So, your Open Knowledge score is recursive. Source: Wikipedia, CC-BY-SA 4.0
Calculating your Open Knowledge score can be done at multiple levels. After all, your article depends on (cites) other articles, and your software depends on libraries, but those cited articles and software dependencies recursively also cite articles and/or software. The complexity is non-trivial, making it a perfect solstice challenge indeed!
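To make that recursion concrete, here is a hedged sketch of one possible scoring scheme, certainly not *the* metric: fetch_references() is a hypothetical helper (OpenCitations would be a natural backend to plug in), is_open_access() is the Unpaywall lookup sketched above, and the equal weighting is just one choice among many.

```python
# Hedged sketch of a recursive Open Knowledge score.
# is_open_access() is the Unpaywall lookup from the earlier sketch;
# fetch_references() is hypothetical and needs a real backend.
from functools import lru_cache

MAX_DEPTH = 2  # how deep into the citation graph to recurse

def fetch_references(doi):
    """Hypothetical helper: return the DOIs cited by this work.
    OpenCitations would be a natural backend to plug in here."""
    return []

@lru_cache(maxsize=None)
def open_knowledge_score(doi, depth=0):
    """Average a work's own OA status with that of everything it cites."""
    own = 1.0 if is_open_access(doi) else 0.0
    cited = fetch_references(doi)
    if depth >= MAX_DEPTH or not cited:
        return own
    cited_score = sum(open_knowledge_score(c, depth + 1)
                      for c in cited) / len(cited)
    return (own + cited_score) / 2  # equal weighting: one choice of many
```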

Prizes
The prize I have to offer is my continued commitment to Open Science, but you already get that for free and it may not be boon enough. So, instead, soon after the winter/summer solstice at the end of this year, I will blog about your research, boosting your #altmetrics scores. Yes, I will actually read and try to understand it!

And because there are the results and the method, neither of which exists yet, there are two categories! I just doubled your chance of winning! That's because humanity is worth it! One prize for the best tool to calculate your Open Knowledge score, and one prize for the researcher with the highest score.

Audience Prize
If someone feels a need to organize an audience prize, this is very much encouraged! (Assuming Open approaches, of course :)

Wednesday, November 22, 2017

Monitoring changes to Wikidata pages of your interest

Source: User:Cmglee, Wikipedia, CC-BY-SA 3.0
Wikidata is awesome! In just 5 years they have bootstrapped one of the most promising platforms for the future of science. Whether you like the tools more, or the CCZero license, there is coolness for everyone. I'm proud to have been able to contribute my small 1x1 LEGO bricks to this masterpiece and hope to continue this for many years to come. There are many people doing awesome stuff, and many have way more time, better skills, etc. Yes, I'm thinking here of Finn, Magnus, Andra, the whole Su team, and many, many more.

The point of this post is to highlight something that matters, something that comes up over and over again, and for which solutions already exist, like the one implemented by Wikidata: provenance. We're talking a lot about FAIR data. Most of FAIR data is not technological, it's social. And most of the technical work going on now is basically to overcome those social barriers.

We teach our students to cite primary literature and only that. There is a clear reason for that: the primary literature has the arguments (experiments, reasoning, etc.) that back a conclusion. Not just any citation is good enough: it has to be exactly the right shape (think about that LEGO brick). This track record of our experiments is a wonderful and essential idea. It removes the need for faith and even trust. Faith is for the religious, trust is for the lazy. Now, without being lazy, it is hard to make progress. But as I have said before (Trust has no place in science #2), every scholar should realize that "trust" is just a social equivalent of saying you are lazy. There is nothing wrong with being lazy: a side effect of it is innovation.

Ideally, we do not have to trust any data source. If we must, we just check where that source got its data from. That works for scholarly literature, and it works for other sources too. Sadly, scholarly literature has a horrible track record here: we only cite what we find more trustworthy. For example, we prefer to cite articles from journals with high impact factors. Second, we don't cite data. Nor software. As a scholarly community, we don't care much about that (this is where lazy is evil, btw!).

Wikidata made the effort to make a rich provenance model. It has a rich system of referring to information sources. It has version control. And it keeps track of who made the changes.

Of all the awesomeness of Wikidata, Magnus is one of the people who knows how to use that awesomeness. He developed many tools that make doing the right thing a lot easier. I'm a big fan of his SourceMD, QuickStatements, and two newer tools, ORCIDator and SPARQL-RC. This latter tool leverages SPARQL (and thus the Wikidata RDF) and the version control system: given a query and a time period, it lists all changes to the matching items. I am still looking for a tool that can show me all changes to items I originally created, but this already is a great tool to monitor the quality of crowdsourcing for data in Wikidata I care about. No trust, but the ability to verify.
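If you want to approximate what SPARQL-RC does yourself, here is a minimal sketch against the public Wikidata endpoints: select items with SPARQL, then pull each item's revision history from the MediaWiki API. The example query (works authored by Douglas Adams, Q42) is just an illustration; substitute your own.

```python
# Minimal SPARQL-RC-style monitoring sketch: select items with SPARQL,
# then list their recent revisions via Wikidata's MediaWiki API.
import requests

WDQS = "https://query.wikidata.org/sparql"
WD_API = "https://www.wikidata.org/w/api.php"

def items_from_query(sparql):
    """Return the QIDs matched by a SPARQL query on Wikidata."""
    r = requests.get(WDQS, params={"query": sparql, "format": "json"})
    r.raise_for_status()
    return [b["item"]["value"].rsplit("/", 1)[-1]
            for b in r.json()["results"]["bindings"]]

def recent_revisions(qid, since="2017-11-01T00:00:00Z"):
    """List revisions of one item, newest first, back to a timestamp."""
    r = requests.get(WD_API, params={
        "action": "query", "prop": "revisions", "titles": qid,
        "rvprop": "timestamp|user|comment", "rvend": since,
        "rvlimit": 50, "format": "json"})
    r.raise_for_status()
    page = next(iter(r.json()["query"]["pages"].values()))
    return page.get("revisions", [])

# Example: monitor all works authored (P50) by Douglas Adams (Q42).
query = "SELECT ?item WHERE { ?item wdt:P50 wd:Q42 . }"
for qid in items_from_query(query):
    for rev in recent_revisions(qid):
        print(qid, rev["timestamp"], rev["user"], rev.get("comment", ""))
```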

Here's a screenshot of the changes to (some of) the scientific output I am an author of:


Sunday, November 12, 2017

New paper: "WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research"


Focus on metabolic pathways increases the number of annotated metabolites, further improving the usability in metabolomics. Image: CC-BY.
TL;DR: the WikiPathways project (many developers in the USA and Europe, contributors from around the world, and many people curating content, etc.) has published a new paper (doi:10.1093/nar/gkx1064), with a slight focus on metabolism.

Full story
Almost six years ago my family and I moved back to The Netherlands for personal reasons. Work-wise, I had a great time in Stockholm and Uppsala (two wonderful universities; thanks to Ola Spjuth, Bengt Fadeel, and Roland Grafström), but being an immigrant in another country is not easy, not even for a western immigrant in a western country. ("There is evil among us.")

We had decided to return to our home country, The Netherlands. By sheer coincidence, I spoke with Chris Evelo in the week directly following that weekend. I had visited his group in March that year, while attending a COST action about NanoQSAR in Maastricht. I had never been to Maastricht University before, and this group, with their Open Source and Open Data projects, particularly WikiPathways, gave us enough to talk about. Chris had a position open on the Open PHACTS project. I was interested, applied, and ended up in the European WikiPathways group led by Martina Kutmon (the USA node is the group of Alex Pico).

Fast forward to now. It was clear to me that biological textbook knowledge was unusable for any kind of computation or machine learning. It was hidden, wrongly represented, and horribly badly annotated. In fact, it still is a total mess. WikiPathways offered machine-readable textbook knowledge, just what I needed to link the chemical and biological worlds. The more accurate the biological annotation we put in these pathways, or semantically link to these pathways, the more precise our knowledge becomes and the better computational approaches can find and learn patterns not obvious to the human eye (it goes both ways, of course! Just read my PhD thesis.)

Over the past 5-6 years I got more and more involved in the project. Our Open PHACTS tasks did involve the WikiPathways RDF (doi:10.1371/journal.pcbi.1004989), but Andra Waagmeester (now Micelio) was the lead on that. I focused on the Identifier Mapping Service, based on BridgeDb (together with great work from Carole Goble's lab, e.g. Alasdair and Christian). And I focused on metabolomics.

Metabolomics
Indeed, there was plenty to be done in terms of metabolic pathways in WikiPathways. The database at the time had a strong focus on the gene and protein aspects of the pathways. In fact, many metabolites were not DataNodes and therefore did not have identifiers. And without identifiers, we cannot map metabolomics data onto these pathways. I started working on improving these pathways, and we did some projects using them for metabolomics data (e.g. a DTL Hotel Call project led by Lars Eijssen).
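That mapping of identifiers between databases is exactly what BridgeDb does. As a small aside, here is a hedged sketch against the public BridgeDb webservice; the URL pattern and the "Ch" (HMDB) system code are assumptions from my reading of the BridgeDb documentation, so double-check before relying on them:

```python
# Hedged sketch: map one metabolite identifier to other databases using
# the public BridgeDb webservice. URL pattern and system codes are
# assumptions based on the BridgeDb docs; adjust as needed.
import requests

def map_identifier(organism, system_code, identifier):
    """Ask BridgeDb for cross-references of one identifier."""
    url = (f"https://webservice.bridgedb.org/"
           f"{organism}/xrefs/{system_code}/{identifier}")
    r = requests.get(url)
    r.raise_for_status()
    # The reply is tab-separated: mapped identifier, then data source name.
    return [line.split("\t") for line in r.text.strip().splitlines()]

# Example: cross-references for glucose via its (assumed) HMDB identifier.
for mapped_id, source in map_identifier("Human", "Ch", "HMDB0000122"):
    print(source, mapped_id)
```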

The point of this long introduction is: I am standing on the shoulders of giants. The top right figure shows, besides WikiPathways itself and the people I just mentioned, more giants. This includes Wikidata, which we previously envisioned as a hub of metabolite information (see our Enabling Open Science: Wikidata for Research (Wiki4R) proposal). Wikidata allows me to solve the problem that CAS registry numbers are hard to link to chemical structures (SMILES): it has some 70 thousand CAS numbers.


SPARQL query that lists all CAS registry numbers in Wikidata, along with the matching SMILES (canonical and isomeric), database entry, and name of the compound. Try it. A lot more about CAS registry numbers is found in my blog.
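For those who want to run it themselves, the query in the screenshot is roughly the following (a reconstruction, wrapped in Python; P231 is Wikidata's CAS Registry Number property, P233 and P2017 the canonical and isomeric SMILES):

```python
# Reconstruction of the screenshot's query: all CAS registry numbers in
# Wikidata with matching SMILES and compound names. Remove the LIMIT to
# fetch everything, but mind the endpoint's query timeout.
import requests

query = """
SELECT ?compound ?compoundLabel ?cas ?canonicalSmiles ?isomericSmiles WHERE {
  ?compound wdt:P231 ?cas .
  OPTIONAL { ?compound wdt:P233 ?canonicalSmiles . }
  OPTIONAL { ?compound wdt:P2017 ?isomericSmiles . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""
r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": query, "format": "json"})
r.raise_for_status()
for b in r.json()["results"]["bindings"]:
    print(b["cas"]["value"], b["compoundLabel"]["value"])
```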
Finally, but certainly not least, there is Denise Slenter, who started this spring in our group. She very quickly picked up the things I and others were doing (for example this great work from Maastricht Science Programme students), gave them her own twist, and is now leading the practical work in taking this to the next level. This new WikiPathways paper shows the fruits of her work.

Metabolism
Of course, there are plenty of other pathway databases. KEGG is still the gold standard for many. And there is the great work of Reactome, RECON, and many others (see the references in the NAR article). Not to mention the important resources that integrate pathway resources. To me, the unique strengths of WikiPathways include the community approach, the very liberal license (CCZero), the many collaborations (do we have a slide on that?), and, importantly, its expressiveness. The latter allows our group to do the systems biology work that we do: analyzing microRNA/RNASeq data, studying diseases at a molecular interaction level, seeing the effects of personal genetics (SNPs, GWAS), and visually integrating and summarizing the combination of experimental data and textbook knowledge.

OK, this post is already long enough. And from its length you can see how impressed I am with WikiPathways and where it is going. Clearly, there is still a lot left to do. And I am just another person contributing to the project, honored that we could give this WikiPathways paper a metabolomics spin. HT to Alex, Tina, and Chris for that!

Slenter, D. N., Kutmon, M., Hanspers, K., Riutta, A., Windsor, J., Nunes, N., Mélius, J., Cirillo, E., Coort, S. L., Digles, D., Ehrhart, F., Giesbertz, P., Kalafati, M., Martens, M., Miller, R., Nishida, K., Rieswijk, L., Waagmeester, A., Eijssen, L. M. T., Evelo, C. T., Pico, A. R., Willighagen, E. L., Nov. 2017. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Research. http://dx.doi.org/10.1093/nar/gkx1064