Here are the things I discovered this week that fascinates me.

Computational biology

Digital twin of a cancer patient

I discovered this interesting event by the newsletter of Society of Industrial and Applied Mathematics (SIAM): NCI-DOE collaboration 2020 Ideas Lab: toward building a cancer patient “digital twin”. Unfortunately, researchers in industry are not eligible to apply. However, I find the idea of building a digital twin of a cancer patient daring and interesting, and it is particularly interesting that the idea lab aims to bring mathematical, active learning, and ensemble model approaches together. Therefore, it is an interesting approach to bring mechanistic modelling, machine learning, and multi-scale modelling together. The application deadline is April 30th, 2020.

Protein-protein interactions in mouse brain

Following the paper last week about binary protein interactions in human, a new paper on Cell Systems reports protein-protein interactions that are found in mouse brain: BraInMap Elucidates the Macromolecular Connectivity Landscape of Mammalian Brain. I was impressed by the identification of a 28-member (!) RNA binding complex that the authors suggested to be associated with amyotrophic lateral sclerosis.

I wonder what would be the difference between human and mouse version of the dataset, how dynamic and stochastic this network is in reality, and whether it is possible to generate hypotheses about drug targets by integrating such protein-protein interaction data and single-cell or even spatial-resolved RNA expression data. Until we answer these questions, the resource can be checked out here:

Drug discovery

Machine learning for drug-target binding: half empty or half full?

Sturm and Mayr *et al. reported their evaluation of performance of machine-learning models trained with public data when applied to data in the pharma industry when it comes to drug target prediction. Colleagues from AstraZeneca and Janssen as well as academic collaborators joined the stdy.

The good news is that the performance is comparable when validated with either public or proprietary data. In particular, deep neural networks outperform other approaches. The (expected) bad news is that the comparable performance has a median F1 score lower 0.5 (ROC-AUC around 0.7). It varies by target families.

Is it half empty or half full? The results leave room for interpretation and discussion for experts. My take is that the result reflects to a certain extent the best performance we can achieve in such tasks. In another word, further optimization of machine-learning models, let it be boosting- or neural-network based methods, is unlikely to significantly improve the prediction accuracy. High-throughput methods to test drug-target binding and better computational modelling approaches that complement machine-learning approaches may be necessary to overcome the hurdle.

Last but not least, a bravo for the authors: though the industrial dataset is not available, the software code is available at


This week I went on reading the chapter 3 and 4 of Fühlen, Denken, Handeln (German) by Gerhard Roth. It was a good refreshment of basic (molecular) neurobiology.

Other gems

  • In an interview with Nature, Jim Yong Kim, the former head of the World Bank, said something that I resonate with: I think it’s very hard to convince others to do things that are complicated and labour-intensive, unless you’ve done it yourself, or at least started doing it. … You need to actually do it and say, it’s not perfect, but here it is.
  • How to ask questions the smart way by Eric S. Raymond.

Happy weekend!