Here are some discoveries that fascinate me this week.

Before we start, here is a small piece A theme from the Musetta’s Walzer that I recorded this week. It was a simplified version of the Piccolo Valzer composed by Giacomo Puccini, and the theme is used in his Opera La Boheme.

JDZhang · 20200614-Thema-aus-Mesettas-Waltzer


Drug discovery

DNA-encoded library, machine learning, and virtual screening

I documented in another blog post my learning of the paper by authors from Google and X-Chem, describing a machine learning approach to learn from DEL screening hits and to perform virtual screening with made-on-demand libraries. It is a nice application of graph network methods.

Biology

DNA of neutrophil extracellular traps promotes cancer metastasis via CCDC25

Neutrophils are cells in the blood that are promptly (within minutes) and strongly activated upon infection by microorganisms. They have a very special trick to clear microorganisms: they can secret neutrophil extracellular traps, which consist of chromatin DNA filaments that are coated with proteins that are usually stored in granules in the cells. These traps made of DNA and proteins coat and lock the microorganisms so that the whole thing can be cleared by phagocytotic cells like macrophages.

The traps, alas, seem not only provide advantage to the host, though. Yang et al. reported on Nature that they found abundant such traps in the metastases, and the traps in the serum can predict the occurrence of metastases. They report that DNA in the traps, known as neutrophil-extracellular-trap (NET) DNA, attract cancer cells rather than acting only as a ‘trap’. They found that the protein CCDC25 (coiled-coil domain containing 25, GeneID 55246) acts as a NET-DNA sensor, which can act the ILK-beta-parvin pathway to enhance cell motility.

They showed that NET-mediated metastasis can be abrogated in CCDC25-knockout cells. The results suggest that targeting CCDC25 may be a therapeutic strategy for the prevention of cancer metastasis.

A News and Views article by Emma Nolan and Ilaria Malanchi summarizes the findings.

Notch3 drives synovial fibrosis and arthritis

The synovium, also known as the synovial membrane, is a connective tissue that lines the inner surface of capsules of synovial joints and tendon sheath. It is a mesenchymal tissue composed mainly of fibroblasts that surrounds the joints.

In rheumatoid arthritis (RA), the synovial tissue gets enlarged substantially (known as hyperplasia) and becomes inflamed and invasive so that it can destroy the joint.

Wei et al. reported on Nature that NOTCH3 (Notch receptor 3, GeneID 4854) signalling drives differentiation of fibroblasts that express the CD90 protein, product of the THY1 (Thy-1 cell surface antigen, GeneID 7070) gene. They used single-cell sequencing and synovial tissue organoids to demonstrate that NOTCH3 signalling drives both transcriptional and spatial gradients, spreading out (emanating as the authors said) from vascular endothelial cells outwards in fibroblasts.

NOTCH3 itself and target genes of the Notch signaling pathways are upregulated in synovial fibroblasts from patients of active rheumatoid arthritis. The link between NOTCH3 and inflammatory arthritis is supported by mouse models. The authors argue that synovial fibroblasts possess a positional identity which is regulated by Notch signalling derived from endothelial cells. They suggest that the crosstalk between endothelial cells and fibroblasts via Notch signalling underlying inflammation and pathology in inflammatory arthritis. The results, the authors reckoned, provide a molecular basis by which stromal cells, defined as connective tissue cells of any organ, which usually include fibroblasts and pericytes, can be therapeutically targeted in rheumatoid arthritis by modulation of NOTCH3 signalling.

The study identified JAG2 and DLL4, both ligands of Notch3 signalling, expressed in both synovial tissue and synovial organoid (Figure 3A). The RNA-sequencing data is available at ImmPort under accession code SDY1599 for human studies, and at the Gene Expression Omnibus under accession code GSE145286. Intermediate data (UMAP projections and normalized counts) are available at the single-cell portal of the Broad Institute. And the code for data analysis is available on GitHub immunogenomics/notch.

Other gems in biology

Here are other gems in biology that I found.

  • Deczkowska et al. explores the TREM2 receptor pathway as a major pathology-induced immune signalling mediator.
  • Beumer et al reports transcriptomics profiling and secretome analysis of human enteroendocrine cells in a human organoid biobank.
  • Yeo et al. reports on Nature Biotechnology the EPIC tool, a web-based analytical and discovery platform for mass cytometry data from immune cells in a standarized manner.
  • Javdan et al. reports on Cell their effort of personalized mapping of drug metabolism by the human gut microbiome.

Programming

Submitting Snakemake jobs to Slurm

See my other blog post Submitting Snakemake jobs to Slurm.

SQLite as a file format

SQLite as an application file format was a hot thread on HackerNews. The HDF5 file format (website of the HDF group, or Wikipedia on HDF files) may be an alternative option, especially for scientific data.

Other programming tricks that I learned

  • Often we have a list of strings in a tab-delimited file, like S1,S2,S3,...,S10,..., which we would like to sort by the numeric part of the string. The solution is to use sort -V. -V or --version-sort performs natural sort of (version) numbers within text. Source: nitin@StackOverflow.
  • gensub in awk can perform pattern matching and back reference. Below is an example. Source: GNU manual of gawk.
echo "S10_10" | awk '{
  a=gensub(/S(.+)_(.+)/, "\\1", "g");
  b=gensub(/S(.+)_(.+)/, "\\2", "g");
  print a,b;}'
## output: 10 10
  • If files in a directory in Linux shows question marks ? in every property when we use ls to list them, it may be that the permissions have gone wrong. The solution is to apply chmod -R a+rX DIR_NAME to the directory. Source: StackOverflow.

Statistics and machine learning

  • I summarize three classical approaches to hypothesis testing in this blog post.
  • I summarize the mathematical concept of constraint and the statistical concept of degree of freedom, and its derivation in various settings, in this blog post.

Other gems

Here are other things that I did and gems that I found.

  • I wrote a short blog post about time.
  • If you are impressed, touched, or fascinated by the author Wolfgang Herrndorf like I am, here is a website about him that may interest you: ueberwolfgang.de (in German).
  • I learned about what how we can use software to save our voice for the future self or future generation, and when it does not help from this Hacker News thread My wife might lose the ability to speak in 3 weeks - how to prepare?.
  • Hacker news thread Which tools have made you a much better programmer?. New things that I want to try: i3 for windows management and improved tiling, strace for diagnostic, debugging and userspace utility for linux, and functional programming.
  • I enjoyed reading the Technology Quarterly of the Economist magazine (June 2020) on artificial intelligence and its limits. The main message that I got: A full-blown AI winter is unlikely. But the autumn is coming.
  • From chaos to free will, an article on the online life, philosophy, and culture magazine Aeon (more exactly, its brand Psyche). I still have to read it, but the beginning reads interesting.
  • Another an interesting article on Aeon on how to overcome a fear of heights by Poppy Brown.