scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain (2310.02713v1)
Abstract: Single-cell RNA sequencing (scRNA-seq) has made significant strides in unraveling the intricate cellular diversity within complex tissues. This is particularly critical in the brain, which presents a greater diversity of cell types than other tissues, for gaining a deeper understanding of brain function in various cellular contexts. However, analyzing scRNA-seq data remains challenging because of inherent measurement noise stemming from dropout events and the limited use of the extensive gene expression information available. In this work, we introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. Specifically, inspired by the recent Hyena operator, we design a novel Transformer architecture called single-cell Hyena (scHyena) that is equipped with a linear adaptor layer, positional encoding via gene embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data. In particular, our model learns generalizable features of cells and genes by pre-training scHyena on full-length scRNA-seq data. We demonstrate the superior performance of scHyena compared to other benchmark methods on downstream tasks, including cell type classification and scRNA-seq imputation.
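The three components named in the abstract (a linear adaptor layer, positional encoding via gene embedding, and a bidirectional long-convolution operator) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, shapes, the single gating step, and the explicitly stored filter are all assumptions made for illustration. The actual Hyena operator parameterizes its filters implicitly and stacks several gated convolutions.

```python
import numpy as np


def embed_cell(expr, w, b, gene_emb):
    """Hypothetical adaptor: lift each scalar expression value to d channels
    with a linear map, then add a per-gene embedding as positional encoding.

    expr: (G,) expression values; w, b: (d,); gene_emb: (G, d).
    Returns tokens of shape (G, d), one token per gene.
    """
    return expr[:, None] * w[None, :] + b[None, :] + gene_emb


def bidirectional_long_conv(u, filt):
    """Non-causal convolution along the gene axis, computed with the FFT.

    u: (G, d) input; filt: (2G-1, d), where filt[G-1+k] weights lag k
    for k in -(G-1)..(G-1). Output: y[t] = sum_s filt[G-1+t-s] * u[s],
    so every position sees both earlier and later positions.
    """
    G = u.shape[0]
    n = 3 * G - 2  # length of the full linear convolution: G + (2G-1) - 1
    full = np.fft.irfft(
        np.fft.rfft(u, n=n, axis=0) * np.fft.rfft(filt, n=n, axis=0),
        n=n,
        axis=0,
    )
    return full[G - 1 : 2 * G - 1]  # keep the G outputs aligned with the input


def gated_block(x, w_v, w_g, filt):
    """Simplified Hyena-style step (assumption): gate * long_conv(value)."""
    value = x @ w_v
    gate = x @ w_g
    return gate * bidirectional_long_conv(value, filt)


# Toy usage with random parameters (illustrative sizes only).
rng = np.random.default_rng(0)
G, d = 6, 4
expr = rng.poisson(1.0, size=G).astype(float)
tokens = embed_cell(expr, rng.normal(size=d), rng.normal(size=d),
                    rng.normal(size=(G, d)))
out = gated_block(tokens, rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                  rng.normal(size=(2 * G - 1, d)))
```

The FFT keeps the cost of the convolution near O(G log G) in the number of genes G, which is the property that makes long, full-length inputs tractable compared with quadratic attention.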
Authors:
- Gyutaek Oh
- Baekgyu Choi
- Inkyung Jung
- Jong Chul Ye