Here are the things I discovered this week that fascinates me.

Before we dive into the topics, here is an piano exercise piece that I recorded this week: the simplified, arranged version of the Maple Leaf Rag by Scott Joplin.

Shameless Advertisement

Bachelor-level course Applied Mathematics and Informatics in Drug Discovery

In the coming fall semester, I will teach the course Applied Mathematics and Informatics in Drug Discovery at the Department of Mathematics and Computer Science, University of Basel again. In the last fall semester, around 35 students of various disciplines including mathematics, computer science, physics, chemistry, and drug sciences took the course and the exam.

The information about the course is available at the course directory of the University Basel.

HBVouroboros automates sequencing-based HBV genotyping and expression profiling

This week I published an open-source bioinformatics tool, named HBVouroboros, on GitHub BEDApub/HBVouroboros.

The ouroboros (or uroboros) is an ancient symbol depicting a serpent or dragon eating its own tail. In its twisted form, an ouroboros reassembles covalently closed circular DNA (cccDNA) of hepatitis B virus (HBV).

HBVouroboros implements a Snakemake-based workflow to map Illumina RNA-sequencing reads to HBV cccDNA. It offers following functionalities:

  • Mapping reads to reference genomes of of eight major HBV genotypes (A-H). The mapping procedure takes care of reads that span the junctions between the two ends of the linear form of the cccDNA. This holds true also for mapping procedures described later.
  • De novo assembly of the HBV genome.
  • Inference of the reference strain from which the reads are likely generated and genotype calling using BLAST and data from HBVdb.
  • Base-level, gene-level, and HBV-genome-level quantification of read counts, as well as structural variants with regard to the inferred reference strain.

Check it out and let me know if you have suggestions how to improve it: BEDApub/HBVouroboros.

Drug discovery

Feasibility and safety profiles of CRISPR-edited T cell therapy

Lu et al. reported on Nature Medicine the results of a first-in-human phase I clinical trial of CRISPR-Cas9 PD-1-edited T cells with advanced non-small-cell lung cancer.

All primary and secondary endpoints were met. Of 22 patients enrolled, 12 (54%) were able to receive treatment. The authors used sequencing to determine mutation frequency at selected sites: The median mutation frequency of off-target events was 0.05% (range, 0-0.25%) at 18 candidate sites.

Aviv Regev joining Roche as head of Genentech Research and Early Development

I wrote another post about that.


APOE4 and blood-brain barrier

Montagne et al. reported in Nature that genetic variant APOE4 of the apolipoprotein E (APOE) gene causes cognitive impairment by destabilizing the blood-brain barrier (BBB). They also reported the soluble form of platelet-derived growth factor B (PDGFB) as a biomarker of APOE4-associated BBB destabilization, independent of Tau and Amyloid-beta status.

The mechanism suggested by the authors is that APOE4 induces cyclophilin A (CypA, Peptidylprolyl isomerase A, or the PPIA gene) and matrix metalloproteinase 9 (MMP9). They are components of the inflammatory pathways in pericytes. Pericytes are multifunctional mural cells that wrap around endothelial cells that line the capillaries and venules. Paracrine signalling of CypA and MMP9 in other pericytes and perhaps endothelial cells leads to destabilization of BBB.

The freely available News and Views article by Ishii and Iadecola summarized the findings.

Migroglia and border-associated macrophages

Utz et al. explained on Cell their work to dissect two types of macrophages in the central nervous system: microglia and border-associated macrophages (BAMs). They are different in phenotype, transcriptional expression profile, and localization. Both are present in the yolk sac. Whereas the development of microglia depends on TGF-beta, BAMs develop independent of it.

Doubling the half-life of ex vivo expanded Tregs with IL-2

Furlan et al. reported on Blood Advances that adding Interleukin-2 (IL-2) in the presence of rapamycin, an immunosuppressant drug, increases the half-life of of the regulatory T cells (Tregs), a subpopulation of T cells that modulate the immune system, maintain tolerance to self-antigens, and prevent autoimmune diseases.

Single-cell sequencing showed that early after addition of IL-2, cells showed gene expression profiles similar to those of activated Tregs. Interestingly, the most activated ones seem also to show signs of p53-mediated apoptosis. After 20 days of in vitro culture, however, the expression profiles resemble rather those of resting Tregs.

A human skeletal muscle atlas

Xi et al. reported on Cell Stem Cell a collection of human skeletal muscle cell expression profiles measured by single-cell RNA sequencing. The profiled muscle cells include embryonic, fetal, and postnatal stages, as well as cell cultures derived from human pluripotent stem cells (hPSC). An important finding is that pluripotent stem-cell derived muscle cells produced by all the methods they tried resembled muscle progenitor cells at an early stage, and did not align to adult muscle stem cells.

Cell-specific drug effects in pancreatic islets

Marquina-Sanchez et al reported on Genome Biology an interesting finding using single-cell RNA sequencing with spike-in reference cells as controls.

They observed contamination by cell-free RNA, which can constitute up to 20% in human primary tissue samples. By assuming linear mixing model, the authors could decontaminate the data. Using this and other analysis methods, they investigated the effect of FOXO1, a transcription factor of the Forkhead family, and compounds that inhibits FOXO in human and mouse pancreatic islets.

The computational pipeline is available at Github. 10X and Drop-seq sequencing data are available in NCBI GEO, under accession number GSE147203 and GSE147202, respectively.

Computer science

Probabilistic programming and Bayesian methods for programmers

I discovered this Github project of CamDavidsonPilon sometime ago. It introduces Bayesian Methods to programmers and hackers.

Review on deep learning and graph networks

I read the review by Battaglia et al. on Relational inductive biases, deep learning, and graph networks (2018), which was discovered from 集智俱乐部 on WeChat.

The learning notes were posted in the blog post Deep learning with graph networks.

Interview with Donald Knuth

The Quanta Magazine published an interview with Donald Knuth, among others the author of The Art of Computer Programming, under the title The computer scientist who can’t stop telling stories.

The following sentences by Knuth impressed me most.

  • “The best way to communicate from one human being to another is through story.”
  • “The ultimate test of whether I understand something is if I can explain it to a computer. … In most of life, you can bluff, but not with computers.”
  • “A person’s success in life is determined by having a high minimum, not a high maximum. … if almost everything you do is up there, then you’ve got a good life. And so I try to learn how to get through things that others find unpleasant.”

The dual role of story-telling and programming intrigued me. What is the difference between a story and a computer program? With a story you appeal to emotions, including curiosity, fantasy, and satisfaction of hearing a good story. With a program you instruct, give facts, and design logic. Despite apparent differences, they are both founded on verbal or written communication.

His words led me to realise that communication with language is key to both effective collaboration with human and powerful instructions to machines. In my opinion, this is one of the most important soft skills that I should improve and sharpen as a computational biologist working in drug discovery.

Tricks that I learned

  • Clean unused packages in conda with conda clean --all.
  • Update dependent packages using pip with pip install PACKAGE --upgrade --upgrade-strategy [eager|only-if-needed].
  • A trick to delete files with special character in their names in Linux: use ls -lito display the index number (inode) of the files. Suppose the file you wish to delete has a inode number of 1234, which is shown in the first column of the output of ls -li, then use find . -inum 1234 -delete to delete it. The link above also introduces other methods.

Bioinformatics and computational biology

MuSiC: Bulk tissue cell-type deconvolution with multi-subject single-cell expression reference

Wang et al. reported on Nature Communications in 2019 the method of Multi-subject Single Cell Deconvolution, or MuSiC.

MuSiC starts from single-cell RNA sequencing data from multiple subjects. It classifies cells into cell types, and constructs a hierarchical clustering tree that reflects the similarity between cell types. MuSiC then determines the group-consistent genes and calculates cross-subject mean and cross-subject variance for these genes in each cell type. The method up-weights genes with low cross-subject variance (namely genes consistently highly expressed in a cell type irrespective of the donor) and down-weights genes with high cross-subject variance. The informative genes, which show stable expression profiles across individuals, are used to deconvolute bulk-sequencing data.

The code is implemented in the R package MuSiC and is available at xuranw/MuSiC on GitHub. I discovered this project thanks to Gregor Sturm.

Other gems

  • hackl/tikz-network visualizes complex networks in Latex, for instance multilayer networks.
  • bness/GGN is a software that learns from dynamics in order to infer and reconstruct the network structure.
  • goldendict is a feature-rich desktop dictionary programm. The code is stable (the last update is 2011). I used it on my Linux machine this week. Several functionalities are important for me: supporting both local dictionaries and online ones including Wiktionary and Google Translation, Hunspelling-supported spell check, and cross-platform support.
  • MuseScore is an open-source software for music notation. I started using it to arrange some of my exercise piano pieces. This week I followed the compact and informative tutorial series of the software made by Dr. George Hess on YouTube. It was well made and introduces someone totally new to the software like me the basics to start inputting and laying out notes quickly.

Have a nice weekend!