
Publications
Alternative DNA conformations formed by sequences called flipons potentially alter the readout of genetic information by directing the shape-specific assembly of complexes on DNA. The biological roles of G-quadruplexes formed by motifs rich in guanosine repeats have been investigated experimentally using many different methodologies, including G4-seq, G4 ChIP-seq, permanganate nuclease footprinting (KEx), KAS-seq, and CUT&Tag, with varying degrees of overlap between the results. Here we trained the large language model DNABERT on existing data generated by KEx, a rapid chemical footprinting technique performed on live, intact cells using potassium permanganate. The snapshot of flipon state, when combined with results from other in vitro methods that are performed on permeabilized cells, allows a high-confidence mapping of G-flipons to proximal enhancer and promoter sequences. Using G4-DNABERT predictions together with data from ENdb, Zoonomia cCREs, and single-cell G4 CUT&Tag experiments, we found support for a model where G-quadruplexes regulate gene expression through chromatin loop formation.
Authors: Maria Poptsova, Alan Herbert, Dmitry Konovalov, Dmitry Umerenkov
A long-standing question concerns the role of Z-DNA in transcription. Here we use a deep learning approach, DeepZ, that predicts Z-flipons based on DNA sequence, structural properties of nucleotides, and omics data. We examined Z-flipons that are conserved between human and mouse genomes after generating whole-genome Z-flipon maps and then validated them by orthogonal approaches based on high-resolution chemical mapping of Z-DNA and the transformer algorithm Z-DNABERT. For human and mouse, we revealed a similar pattern of transcription factors, chromatin remodelers, and histone marks associated with conserved Z-flipons. We found significant enrichment of Z-flipons in alternative and bidirectional promoters associated with neurogenesis genes. We show that conserved Z-flipons are associated with increased experimentally determined transcription reinitiation rates compared to promoters without Z-flipons, but without affecting elongation or pausing. Our findings support a model where Z-flipons engage Transcription Factor E and impact phenotype by enabling the reset of preinitiation complexes when active, and the suppression of gene expression when engaged by repressive chromatin complexes.
Authors: Nazar Beknazarov, Dmitry Konovalov, Alan Herbert, Maria Poptsova
A long-standing question concerns the role of Z-DNA in transcription. Here we use a deep learning approach based on the published DeepZ algorithm that predicts Z-flipons based on DNA sequence, structural properties of nucleotides, and omics data. We examined Z-flipons that are conserved between human and mouse genomes after generating whole-genome Z-flipon maps by training DeepZ on ChIP-seq Z-DNA data, then overlapping the results with a common set of omics data features. We revealed a similar pattern of transcription factors and histone marks associated with conserved Z-flipons, showing enrichment for transcription regulation coupled with chromatin organization. Of the conserved Z-flipons, 15% and 7% fell in alternative and bidirectional promoters, respectively. We found that conserved Z-flipons in CpG-promoters are associated with increased transcription initiation rates. Our findings empower further experimental explorations to examine how the flip to Z-DNA alters the readout of genetic information by facilitating the transition of one epigenetic state to another.
Authors: Nazar Beknazarov, Dmitry Konovalov, Alan Herbert, Maria Poptsova
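The promoter percentages reported in the abstract above come down to counting how many predicted regions overlap an annotation set. A minimal sketch of that kind of overlap count follows. The interval representation (chromosome, half-open start and end) and the function names are assumptions made for illustration, not the authors' pipeline, and the coordinates in the usage note are invented.

```python
from typing import List, Tuple

# Hypothetical interval type: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(regions: List[Interval], annotation: List[Interval]) -> float:
    """Fraction of `regions` that overlap any interval in `annotation`.

    Quadratic scan for clarity; a real genome-scale analysis would use
    sorted intervals or an interval tree instead.
    """
    if not regions:
        return 0.0
    hits = sum(any(overlaps(r, a) for a in annotation) for r in regions)
    return hits / len(regions)
```

For example, with four made-up regions `[("chr1", 100, 120), ("chr1", 500, 520), ("chr2", 100, 120), ("chr1", 190, 210)]` and a single made-up promoter `("chr1", 50, 200)`, two of the four regions overlap the promoter, so the function returns 0.5.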
Z-DNA and Z-RNA have been shown to play an important role in various processes of genome function, acting as flipons that launch or suppress genetic programs. Genome-wide experimental detection of Z-DNA remains a challenge due to the dynamic nature of its formation. Recently we developed a deep learning approach, DeepZ, based on CNN and RNN architectures, that predicts Z-DNA regions using additional information from omics data collected from different cell types. Here we took advantage of the transformer algorithm that trains attention maps to improve classifier performance. We started with pretrained DNABERT models and fine-tuned their performance by training with experimental Z-DNA regions from mouse and human genome-wide studies. The resulting DNABERT-Z outperformed DeepZ. We demonstrated that DNABERT-Z fine-tuned on human data sets also generalizes to predict Z-DNA sites in the mouse genome.
Authors: Dmitry Umerenkov, Vladimir Kokh, Alan Herbert, Maria Poptsova
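The original DNABERT models represent a DNA sequence as overlapping k-mer tokens before it reaches the transformer. The sketch below illustrates only that tokenization step. The function name is hypothetical, and the default k = 6 is an assumption, since the abstract above does not state which k was used.

```python
from typing import List


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    k = 6 is an assumed default; the abstract does not specify k.
    Sequences shorter than k produce an empty list.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

For instance, `kmer_tokens("ACGTAC", k=3)` yields `["ACG", "CGT", "GTA", "TAC"]`. A sequence of length n produces n - k + 1 tokens, so neighbouring tokens share k - 1 bases.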
Here we describe an approach that uses deep learning neural networks such as CNN and RNN to aggregate information from DNA sequence; physical, chemical, and structural properties of nucleotides; and omics data on histone modifications, methylation, chromatin accessibility, and transcription factor binding sites, as well as data from other available NGS experiments. We explain how, with the trained model, one can perform whole-genome annotation of Z-DNA regions and feature importance analysis in order to define key determinants for functional Z-DNA regions.
Authors: Nazar Beknazarov, Maria Poptsova
Identifying roles for Z-DNA remains challenging given its dynamic nature. Here, we perform genome-wide interrogation with the DNABERT transformer algorithm trained on experimentally identified Z-DNA forming sequences (Z-flipons). The algorithm yields large performance enhancements (F1 = 0.83) over existing approaches and implements computational mutagenesis to assess the effects of base substitution on Z-DNA formation. We show Z-flipons are enriched in promoters and telomeres, overlapping quantitative trait loci for RNA expression, RNA editing, splicing, and disease-associated variants. We cross-validate across a number of orthogonal databases and define BZ junction motifs. Surprisingly, many effects we delineate are likely mediated through Z-RNA formation. A shared Z-RNA motif is identified in SCARF2, SMAD1, and CACNA1 transcripts, whereas other motifs are present in noncoding RNAs. We provide evidence for a Z-RNA fold that promotes adaptive immunity through alternative splicing of KRAB domain zinc finger proteins. An analysis of OMIM and presumptive gnomAD loss-of-function datasets reveals an overlap of Z-flipons with disease-causing variants in 8.6% and 2.9% of Mendelian disease genes, respectively, greatly extending the range of phenotypes mapped to Z-flipons.
Authors: Dmitry Umerenkov, Alan Herbert, Dmitrii Konovalov, Anna Danilova, Nazar Beknazarov, Vladimir Kokh, Aleksandr Fedorov, Maria Poptsova
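Computational mutagenesis, as mentioned in the abstract above, can be pictured as substituting each base in turn and recording how a model's score changes. The sketch below shows the general idea only. The scoring function is a toy stand-in (the fraction of adjacent bases that alternate between purine and pyrimidine, a pattern known to favour Z-DNA), not the trained DNABERT model, and all names are hypothetical.

```python
from typing import Callable, Dict, Tuple

PURINES = frozenset("AG")


def toy_z_score(seq: str) -> float:
    """Toy stand-in for a model score, NOT the authors' trained classifier.

    Returns the fraction of adjacent base pairs that alternate
    between purine (A, G) and pyrimidine (C, T).
    """
    if len(seq) < 2:
        return 0.0
    alternating = sum((a in PURINES) != (b in PURINES) for a, b in zip(seq, seq[1:]))
    return alternating / (len(seq) - 1)


def mutagenesis_scan(
    seq: str, score: Callable[[str], float] = toy_z_score
) -> Dict[Tuple[int, str, str], float]:
    """Map (position, reference base, alternate base) to the score change.

    Every single-base substitution is scored against the unmutated sequence.
    """
    baseline = score(seq)
    effects: Dict[Tuple[int, str, str], float] = {}
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                mutant = seq[:i] + alt + seq[i + 1:]
                effects[(i, ref, alt)] = score(mutant) - baseline
    return effects
```

With the toy scorer, replacing the C at position 2 of `CGCGCG` with A breaks the alternation on both sides and lowers the score, while replacing it with T (another pyrimidine) leaves the score unchanged. With a trained classifier in place of `toy_z_score`, the same loop would produce a per-substitution effect map.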