Here are the things I discovered this week that fascinate me.

This is a drawing by my daughter, Agnes Boya Zhang, inspired by Alice’s Adventures in Wonderland, which the English mathematician Charles L. Dodgson wrote under the pseudonym Lewis Carroll.

Alice's Adventures in Wonderland, drawn by Agnes Boya Zhang on

Sharing is like a muscle

In an interesting blog post on Stack Overflow, Ben James shares what he learned from interviewing top-notch developers: great developers share a lot.

They may share primarily for their own benefit rather than to serve others. The sharing mindset, however, helps both them and other people. And it comes before success; it is not the result of it.

A comment by Eric Lippert was particularly interesting: How do you become an expert on something? Well, find a pile of questions or a place where people are asking questions about your topic. If you try and answer each one, you’ll become an expert quickly enough.

Ben argues that though it’s important to create things for yourself, it may be wise to keep them public by default. This “public by default” philosophy means that we put what we try, what we learn, what we understand, and what we build in public. These can be both success stories and failures from which one can draw lessons. It may be daunting at the beginning. However, sharing is like a muscle: the more regularly you do it, the more efficient you become.

This opinion was also expressed by others, including Ruan Yifeng in his blog series ruanyf/weekly. I am trying to live up to it.

Drug discovery

Characterizing small molecules at multiple scales

Duran-Frigola et al. reported in Nature Biotechnology their effort to characterize approved drugs at multiple scales: 2D fingerprints, targets, interactome, cellular information, and up to clinical outcome.

They used vector representations of the data at each level to cluster approved drugs, to make predictions about their biological properties, and to link disease data with compound data, with the hope of repurposing compounds for diseases other than the one for which they were originally designed.

I like the concept of the paper because it supports some of the proposals that my colleagues and I made in the review article Multiscale modelling of drug mechanism and safety, published at the end of 2019 in Drug Discovery Today, in particular that we have to integrate information and models at multiple scales. While statistical modelling and machine learning can tell us a lot when we analyse drugs retrospectively, I think it is still an open question how to construct and integrate models at different scales effectively in a prospective setting.


TASL identified as an adapter in TLR7-9 signalling

Toll-like receptors (TLRs) play a critical role in recognizing pathogens and initiating immune responses. Heinz et al., Nature (2020) showed that CXorf21, a previously uncharacterised protein, interacts with the endolysosomal transporter SLC15A4.

Both CXorf21 and SLC15A4 have been linked with autoimmune diseases before by genetic studies. In this study, the authors found that loss of the CXorf21 protein, which the authors named ‘TLR adapter interacting with SLC15A4 on the lysosome’ (TASL), abrogated responses to endolysosomal TLR agonists in human immune cells. Deletion of either SLC15A4 or TASL impaired the activation of the interferon response pathway, mediated by IRF5, without affecting either NF-kB or MAPK signalling.

TASL works as an adapter, interacting with SLC15A4 and activating IRF5 through a conserved pLxIS motif, in which p denotes a hydrophilic residue and x denotes any residue. Its role in TLR7/8/9 signalling is comparable to that of the IRF3 adapters, including STING1 (stimulator of interferon response cGAMP interactor 1), MAVS (mitochondrial antiviral signalling protein), and TICAM1 (toll-like receptor adapter molecule 1, also known as TRIF).

Chronic obstructive pulmonary disease (COPD)

Collaborating with Boehringer Ingelheim, Nature published a special Nature Outlook about chronic obstructive pulmonary disease (COPD). It contains several articles introducing the disease and relevant research.

Interestingly, almost at the same time, Rao et al. reported in Cell that regenerative metaplastic clones in the COPD lung drive inflammation and fibrosis. Metaplastic means that cells show an abnormal change in their nature. The authors cloned individual cells from lung tissue of patients with and without COPD. In normal lungs, normal distal airway progenitors are most prevalent. In contrast, in COPD lungs, three variant progenitors are found, which are epigenetically committed to distinct metaplastic lesions.

When transplanted in immunodeficient mice, these variant clones induced pathology including metaplasia, neutrophilic inflammation, and fibrosis seen in COPD. Surprisingly, these variants actually exist in control and fetal lung as minor constituents, and they may act in normal processes of immune surveillance, a natural physiological function to allow recognition and destruction of transformed cells before they grow into tumours, and to kill tumours after they are formed.

This suggests that abnormal expansion of these cells is associated with COPD etiology. Despite the interesting cell-transplantation data, I am not fully sure whether their expansion is the major cause of COPD. Nevertheless, these findings need to be confirmed, and we need to understand whether modulating the expansion process provides clinically meaningful benefits.

CRISPR as a broad-spectrum antiviral

Abbott et al. from Stanford University reported in Cell their application of the CRISPR (clustered regularly interspaced short palindromic repeats) technique as an antiviral therapeutic.

Instead of the commonly known Cas9, which cleaves DNA, they used Cas13, which has at least four known subtypes. The Zhang lab originally showed that Cas13 cleaves RNA, protecting bacteria from RNA phages and serving as a platform for RNA manipulation.

Abbott et al. identified regions that are conserved between viral genomes, for instance pan-coronavirus targets, and designed CRISPR RNAs (crRNAs) that target these conserved RNAs. They used such crRNAs as a prototype of a prophylactic antiviral therapy named PAC-MAN (prophylactic antiviral CRISPR in human cells).

They used all available coronavirus genomes and performed bioinformatics analysis to identify conserved regions. They identified 22-nucleotide crRNAs that match SARS-CoV-2 genomes perfectly and have at most one mismatch to SARS or MERS. After removing candidates with fewer than 2 mismatches to the human transcriptome, analysing crRNA expression, and removing sequences containing UUUU (poly-T sequences), they were left with about 3.2k crRNA candidates. They tested only 40 crRNAs, covering both the RNA-dependent RNA polymerase (RdRp) and N genes. They used reporter assays to show that some of the crRNA candidates worked.
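The filtering steps described above can be sketched in a few lines of Python. This is only my toy reading of the procedure, not the authors' pipeline: the function names are hypothetical, sequences are plain strings of target sites (the reverse-complement step that turns a target site into a crRNA spacer is ignored), each reference is a single sequence rather than a collection of genomes or transcripts, and the crRNA expression analysis is left out.

```python
def kmers(sequence, k=22):
    """Yield every window of k nucleotides in a sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def mismatches(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(candidate, reference):
    """Smallest mismatch count between a candidate and any window of a reference.

    Assumes the reference is at least as long as the candidate.
    """
    return min(mismatches(candidate, window)
               for window in kmers(reference, len(candidate)))


def select_candidates(target, related, off_target, k=22):
    """Keep k-mers of `target` that are conserved in `related` and that
    differ from `off_target` in at least two positions."""
    selected = []
    for candidate in kmers(target, k):
        if "UUUU" in candidate:
            continue  # drop sequences containing a UUUU stretch
        if min_mismatches(candidate, related) > 1:
            continue  # not conserved: more than one mismatch to the related virus
        if min_mismatches(candidate, off_target) < 2:
            continue  # too similar to the host transcriptome
        selected.append(candidate)
    return selected
```

Scanning every window against every reference is quadratic and would be far too slow for real genomes and a full transcriptome; a real analysis would use an indexed aligner instead.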

Since they did not have access to live SARS-CoV-2 virus, they tested their approach using an H1N1 strain of influenza A virus (IAV), using more or less the same bioinformatics and lab approaches described for SARS-CoV-2. They showed that at higher concentrations (MOI), some crRNAs could indeed specifically down-regulate IAV genome segments. However, as the authors pointed out, no evidence was provided about the in vivo efficacy.

Finally, they used bioinformatics analysis to show that 6 crRNAs are able to target 90% of known CoV genomes. This suggests that the approach may be used even for genomes with high mutation frequencies and high genomic diversity.
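I do not know how the authors arrived at the set of six crRNAs. One plausible way to pose the question "how few crRNAs cover most genomes?" is as a set-cover problem, solved approximately with a greedy heuristic. The sketch below is an assumption of mine for illustration; the function name and the toy identifiers in the usage note are made up.

```python
def greedy_cover(coverage, goal=0.9):
    """Greedily pick crRNAs until a `goal` fraction of genomes is covered.

    `coverage` maps a crRNA name to the set of genome IDs it matches.
    Returns the chosen names and the fraction of genomes they cover.
    """
    all_genomes = set().union(*coverage.values())
    if not all_genomes:
        return [], 0.0
    covered, chosen = set(), []
    while len(covered) < goal * len(all_genomes):
        # Pick the crRNA that adds the most genomes not covered so far.
        best = max(coverage, key=lambda name: len(coverage[name] - covered))
        if not coverage[best] - covered:
            break  # nothing left to gain
        chosen.append(best)
        covered |= coverage[best]
    return chosen, len(covered) / len(all_genomes)
```

For example, greedy_cover({"cr1": {1, 2, 3, 4}, "cr2": {3, 4, 5}, "cr3": {5, 6, 7, 8, 9, 10}}) picks cr3 first and then cr1. The greedy choice is not guaranteed to give the smallest possible set, but it is a common and simple approximation.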

It remains to be seen whether this approach would yield benefit in vivo. And the toxicity in human cells, as far as I can see, remains to be rigorously assessed. The delivery of the system is also a big challenge. On the other hand, I am happy to see that bioinformatics approaches can assist high-throughput and fast candidate selection for sequence-based vaccination and antiviral therapies.

Stimulating dendritic cells enhances T-cell therapy

Lai et al. reported in Nature Immunology their effort to engineer mouse T cells to secrete Fms-like tyrosine kinase 3 ligand (Flt3L), a cytokine and a dendritic cell (DC) growth factor. They observed enhanced T-cell and dendritic-cell activation and inhibition of tumour growth.

The data suggests that by augmenting dendritic cells, engineered T cells may deliver better efficacy profiles for tumours that lack the antigen recognized by the T cells.

Other interesting stuff in biology

  • Simultaneous Readout of Lineage Histories and Gene Expression Profiles in Single Cells with CRISPR array repair lineage tracing (CARLIN) by Bowling et al. in Cell. The technique allows in vivo tracking of individual cells. I could imagine that such techniques can be used to understand time-dependent responses to drugs both in vitro and in vivo.
  • Kitchen et al. reported in Cell that Aquaporin-4 (AQP4) mediates flux of water into astrocytes under hypoxic conditions, causing central nervous system (CNS) edema. Its translocation to the cell surface is mediated by calmodulin (CaM, human genes CALM1, CALM2, and CALM3) and protein kinase A (PKA). Inhibition of AQP4 localization by trifluoperazine, a typical antipsychotic used to treat schizophrenia, ablated CNS edema. The authors argue that this is because trifluoperazine modulates calmodulin, although it is also believed to block dopamine receptors D1 and D2.
  • Kim et al. reported in Cell a highly complex architecture of the SARS-CoV-2 transcriptome. Using DNA nanoball sequencing, which uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs, and Nanopore direct RNA sequencing, they identified many discontinuous transcription events, unknown ORFs due to fusion, deletion, and frameshift, and modification sites on viral transcripts. Whether any functions are associated with these events, though, is not clear yet.

Computer sciences

MathJax 3 for Jekyll

I updated the MathJax script used by my blog to version 3.0, following the Get Started tutorial.

The new version failed to display the mathematics at first, for two reasons. First, the default inline delimiters are now ‘\(’ and ‘\)’ rather than ‘$’. Second, kramdown in Jekyll did not work well with this version. I got helpful information from the blog 11011110, and found a solution in the last point of V2 API changes.

See the _includes/custom-head.html file of the source of this blog for the solutions to both problems.

Versioneer for Python

I discovered the tool python-versioneer. It manages version numbers in distutils-based Python projects. Once it is set up, the developer does not need to update the embedded version string any more. Instead, one just records a new tag in the version-control system, for instance with git tag 1.0.0 in git. The versioneer tool will take care of giving the release a proper name.
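To give an idea of what "a proper name" looks like, here is a small sketch that turns the output of git describe --tags --long --dirty into a version string of the form TAG[+DISTANCE.gHEX[.dirty]], which is how I understand versioneer's default pep440 style. It is my own simplified re-implementation for illustration, not versioneer's code, and it only handles the case in which a tag exists; the function name is hypothetical.

```python
import re

# Shape of `git describe --tags --long --dirty` output: TAG-DISTANCE-gHEX[-dirty]
DESCRIBE_PATTERN = re.compile(
    r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def version_from_describe(describe):
    """Render a `git describe` string as TAG[+DISTANCE.gHEX[.dirty]]."""
    match = DESCRIBE_PATTERN.match(describe)
    if match is None:
        raise ValueError(f"unexpected git describe output: {describe!r}")
    version = match["tag"]
    distance = int(match["distance"])  # commits since the tag
    dirty = match["dirty"] is not None  # uncommitted changes present
    if distance or dirty:
        version += f"+{distance}.g{match['sha']}"
        if dirty:
            version += ".dirty"
    return version
```

A checkout sitting exactly on the tag 1.0.0 gets the plain version 1.0.0, while a checkout three commits later gets a suffix that records the distance and the abbreviated commit hash.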

Syncing a forked project with the upstream project

I consulted this help document of GitHub to do that.

First, set the upstream project by git remote add upstream URL.

## fetch commits to upstream/master
git fetch upstream
## checkout my local master
git checkout master
## merge changes from upstream/master into my local master
git merge upstream/master
## now I can push my changes and changes in the upstream
git push origin master

Hypothesis testing for Python

This is about software testing, not about statistical hypothesis testing.

I found an interesting Python library for running tests, named hypothesis. With the library, I can write tests that are parametrized either by concrete examples or by rules. From the rules, the library generates simple and comprehensible examples that make my tests fail.

The documentation to get started is available at Read the Docs. My implementation of the example in the getting-started document is available in my sandbox on GitHub.
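The idea of a test parametrized by a rule can be sketched with the standard library alone. The block below is a hand-rolled illustration, not hypothesis itself: it checks the rule "decoding an encoded string gives back the original string" on randomly generated inputs. With hypothesis, the same rule would be written as a test function decorated with @given and fed by a strategy from hypothesis.strategies, and the library would also shrink a failing input to a simpler one, which this sketch does not do. The function names are my own.

```python
import random
from itertools import groupby


def encode(text):
    """Run-length encode a string into (character, count) pairs."""
    return [(char, len(list(group))) for char, group in groupby(text)]


def decode(pairs):
    """Invert encode(): expand (character, count) pairs into a string."""
    return "".join(char * count for char, count in pairs)


def check_round_trip(trials=200, seed=0):
    """Check the rule decode(encode(s)) == s on random strings."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        # A tiny alphabet makes repeated characters, and hence long runs, likely.
        text = "".join(rng.choice("ab ") for _ in range(length))
        assert decode(encode(text)) == text, repr(text)
    return trials
```

Calling check_round_trip() raises an AssertionError that shows the offending string as soon as the rule is violated.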

Tricks that I learned

  • How to install gcc-10 in Linux Mint 19.3? Run sudo add-apt-repository ppa:ubuntu-toolchain-r/test, and then sudo apt-cache search gcc-10; you should see the gcc-10 packages. Note that add-apt-repository may fail when a secure connection to the server is not possible, for instance when the machine is sitting behind a firewall.
  • I learned to use IFTTT to send a Twitter message when my blog is updated, inspired by an article on the website ictsolved.
  • In Python 3 (>=3.4), I can use importlib.reload(module), where module is an already imported package or module, to clear the cache and reload it from its source. I discovered the solution on Stack Overflow.
  • It is not a good idea to name a Python module import, because the name clashes with the keyword import and the module can then not be imported with a normal import statement.
  • Axis in Python versus MARGIN in R
  • Using git pull --rebase origin master replays my local commits on top of the changes in the master branch of the remote repository. To make git pull rebase by default, in git > 1.7.9, use git config --global pull.rebase true. I learned the idea from the interesting article Comparing Workflows by Atlassian (the company behind Bitbucket), and I discovered the trick of global configuration here.
  • The difference between git checkout, git reset, and git revert for local and remote staged files and commits, also learned from the git tutorials by Atlassian. git checkout checks out a branch or a commit for trying things. git reset deals with local commits that I want to undo. git revert can undo commits that are published. While git reset changes the history, git revert does not: it only appends a revert step to the history. Since a private history is less sensitive than a public history, use git reset only for local changes that I want to undo, and use git revert to revert committed changes in the public repository. Finally, I learned that HEAD~2 is a shortcut for the commit two commits before the current one.
  • Use the tee command to write the output of another program foo to both the standard output and a given file: foo | tee outfile. Source: StackOverflow.
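The importlib.reload trick from the list above can be tried out with a throwaway module. This is a minimal sketch: the module name reload_demo_mod and the function reload_demo are made up for the demonstration, and writing bytecode is switched off temporarily so that a cached .pyc file cannot mask the edit.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo():
    """Import a throwaway module, edit its source, and reload it."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # no .pyc, so the edited source is always recompiled
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "reload_demo_mod.py"
        source.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("reload_demo_mod")
            before = module.VALUE
            # Change the source on disk; the imported module is unaffected ...
            source.write_text("VALUE = 'changed'\n")
            # ... until it is reloaded. reload() takes the module object itself.
            importlib.reload(module)
            after = module.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)
            sys.dont_write_bytecode = old_flag
    return before, after
```

Note that names imported earlier with from module import name keep pointing to the old objects after a reload; only attribute access through the module object sees the new values.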

Computational biology

Predicting cell lineages using autoencoders and optimal transport

Yang et al. reported their strategy for analysing single-cell imaging data in PLOS Computational Biology (2020). The corresponding author is Caroline Uhler, with whom I had pleasant interactions during the Graphical Models Summer School two years ago in Basel, co-organized by Niko Beerenwinkel, her, and other colleagues. She has recently also joined D-BSSE (Department of Biosystems Science and Engineering) of ETH Zürich, located in Basel.

Modelling infectious disease dynamics

In a Perspective paper in Nature, Sarah Cobey discusses lessons that we can learn from modelling infectious disease dynamics. I posted my learning notes here.

Other interesting stuff in computational biology


Discrete mathematics: an open introduction

Discrete mathematics is a branch of mathematics that underlies computer science and those areas of applied mathematics that are important in drug discovery, such as bioinformatics and graph and network analysis.

Oscar Levin wrote a free, open-source textbook for a first- or second-year undergraduate course for math majors. The book comes with exercises and can be used as an interactive online ebook at openmathbooks.

Geometry of data

Philipp Mekler made me aware of a paper by Breiding et al. about learning algebraic varieties from samples.

Simply put, an algebraic variety is the set of points at which every polynomial in a given system equals zero. Applied researchers can read ‘variety’ as ‘manifold’ or ‘shape’. Learning algebraic varieties from few samples, in my limited understanding, means learning the shape (geometry) of the data.
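A minimal example, chosen by me for illustration and not taken from the paper, is the unit circle in the plane, which is the variety defined by a single polynomial of degree two:

```latex
V = \{(x, y) \in \mathbb{R}^2 : f(x, y) = 0\},
\qquad f(x, y) = x^2 + y^2 - 1 .
```

Learning this variety from samples means recovering f, up to a constant factor, from points such as (1, 0), (0, 1), and (-1, 0) that lie on the circle. A general polynomial of degree two in x and y has six coefficients, and each sample point imposes one linear condition on them, so five distinct points on the circle are enough to pin down f up to scale.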

Bernd Sturmfels gave a lecture guiding us through the main ideas of the paper, available on YouTube. The learning algorithm is implemented in a software package written in Julia, a scientific computing language, available on GitHub at PBrdng/LearningAlgebraicVarieties.

Other gems

  • I discovered learn x in y minutes thanks to a Hacker News thread. I used it to refresh my knowledge of Python and to learn to use Docker.
  • In an interview with Bennie Mols, neuroscientist Christof Koch talked about the Integrated Information Theory (Wikipedia link). He distinguished between intelligence and consciousness. According to him, intelligence is about behaviour, for instance what to do in a new environment. Consciousness is about being. And he argues that consciousness cannot be deduced from behaviour. Rather, we have to check whether the system has causal power.
  • The Rijksmuseum in Amsterdam published the largest and most detailed photograph ever taken of The Night Watch by Rembrandt, available on the museum’s website.

photo of operation nightwatch, copyright of Rijks

  • From a Hacker News user: Scrum is a way to take a below average or poor developer and turn them into an average developer. It’s also great at taking great developers and turning them into average developers.
  • The Jargon File, a collection of hacker slang that reflects hackers’ tradition, folklore, and humour. I discovered it via Ruan Yifeng’s blog and Wikipedia when I tried to trace the source of Hanlon’s razor: Never attribute to malice that which can be adequately explained by stupidity.

Have a happy weekend!