Single-cell technologies such as single-cell RNA-seq (scRNA-seq) are rapidly revolutionizing a wide range of biomedical research areas. Unlike traditional bulk RNA-seq technologies that measure global gene expression averaged over a heterogeneous cell population, scRNA-seq examines steady-state transcription in individual cells and is thus useful for unveiling transcriptomic heterogeneity at the single-cell level. Life scientists are facing substantial statistical and computational challenges in analyzing single-cell data because they are massive, sparse, heterogeneous, and noisy. At the same time, data complexity is growing exponentially. Thus, data must be carefully managed, analyzed, and ultimately made available so that they can become the driving force behind discovery. The complexity of single-cell data and its large volume have made it essential to implement novel statistical machine learning methods, as well as quantum computing algorithms for situations where an exhaustive search is required.
To facilitate the analysis of single-cell data, The Texas A&M Regional Center for Excellence in Cancer (TREC) has established a Single Cell Data Science (SCDS) Core at TAMU. Drs. James Cai, Yang Ni and Robert Chapkin, along with collaborators, are developing novel statistical methods to address the challenges in:
- Identifying cell types and their subsets,
- Characterizing cell differentiation dynamics,
- Elucidating cell-cell communication and gene regulation at the single-cell level,
- Monitoring structural, functional, or phenotypic changes under different experimental conditions,
- Relating rare transitional cells or cell phenotypes to disease progression,
- Analyzing spatial transcriptomics data, and
- Integrating multi-omics single-cell data.
The unique challenge of data integration in single-cell multi-omics is that each observation/cell can be assayed only by one modality. Therefore, neither horizontal nor vertical data integration applies here. This line of research has been supported by CPRIT RP230204 and R01GM148974-01.
The CORE: The mission of SCDS is to provide bioinformatics and informatics research services that have a measurable impact on the ability of research investigators at Texas A&M University to share their findings and publish their work. SCDS has merged scientific experience with technical know-how to tackle basic research and clinical projects that require skills in single-cell data management, processing, and analytics.
Spatial transcriptomics data analysis – Spatial transcriptomics is a rapidly growing field that promises to comprehensively characterize tissue organization and architecture at single-cell resolution. SCDS provides data analysis services for spatial transcriptomics, which is a powerful approach for TREC cancer research projects, enabling investigators to obtain a holistic understanding of cells in their specific context. For this purpose, we have developed STGEATOOL (Spatial Transcriptomic Gene Expression Analysis Tool), which allows users with limited coding experience to conduct analyses through graphical user interfaces. For example, it enables users to choose any gene of interest and display its spatially resolved expression on the original tissue section. Because the whole transcriptome is measured, users can choose any number of genes in any combination to view and analyze at the same time. Knowledge of the relative locations of cells and their gene expression profiles helps to characterize the local cellular environment within the tissue and to better understand the role of different cell types in normal function and tumor pathology.
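As a rough illustration of the gene-on-tissue display described above, the idea can be sketched in a few lines of Python. This is not STGEATOOL code or its interface: the function names `spatial_expression` and `render`, the data layout, and the toy spots and counts are all assumptions made for illustration only.

```python
def spatial_expression(coords, counts, genes):
    """Sum the expression of the requested genes at every spot.

    coords: dict mapping spot id -> (row, col) position on the tissue grid
    counts: dict mapping gene -> {spot id: count}; missing spots count as 0
    Returns a dict mapping (row, col) -> summed expression.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    missing = [g for g in genes if g not in counts]
    if missing:
        raise KeyError(f"genes not measured: {missing}")
    return {pos: sum(counts[g].get(spot, 0) for g in genes)
            for spot, pos in coords.items()}


def render(values, n_rows, n_cols, shades=".:*#"):
    """Draw a text heatmap: one character per spot, blank where no spot exists."""
    peak = max(values.values(), default=0)
    lines = []
    for r in range(n_rows):
        row = ""
        for c in range(n_cols):
            v = values.get((r, c))
            if v is None:
                row += " "            # no spot at this grid position
            elif peak == 0:
                row += shades[0]      # gene not detected anywhere
            else:
                row += shades[round(v / peak * (len(shades) - 1))]
        lines.append(row)
    return "\n".join(lines)


# Toy data (made up): four spots on a 2 x 2 grid, two genes.
coords = {"s1": (0, 0), "s2": (0, 1), "s3": (1, 0), "s4": (1, 1)}
counts = {"EPCAM": {"s1": 9, "s2": 3}, "CD3E": {"s3": 6}}

single = spatial_expression(coords, counts, ["EPCAM"])
combined = spatial_expression(coords, counts, ["EPCAM", "CD3E"])
heatmap = render(single, n_rows=2, n_cols=2)
print(heatmap)
```

Passing a list of several genes sums their counts per spot, which is one simple way to view a gene combination at once; a real tool would overlay the values on the tissue image rather than print characters.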
Multi-omics single-cell data integration and analysis – An integrated analysis of multi-omics single-cell data can generate new knowledge and hypotheses that cannot otherwise be obtained from data collected with any single modality. One of the primary analytic challenges arises when each cell is assayed with a single modality, which makes it difficult to match samples across modalities. To address this challenge, SCDS provides cutting-edge single-cell data integration tools that allow joint analyses of multi-omics single-cell data. For example, we have recently developed integrative clustering algorithms that identify cell subtypes of human lymphoblastoid cells using scRNA-seq gene expression data and scATAC-seq chromatin accessibility data. The algorithm automatically identifies marker genes of each cell subtype. Our analysis can detect uncharacterized cell subtype structure, revealing that lymphoblastoid cells are more heterogeneous than previously thought. These new approaches can allow TREC investigators to gain a more comprehensive understanding of tumor heterogeneity in a less biased, data-driven fashion than traditional bulk multi-omics and single-modality single-cell data analyses.
Virtual gene knockout analysis – SCDS's advanced machine learning tools allow virtual knockout (KO) analysis to be performed with scRNA-seq data. Gene KO experiments are a proven, powerful approach to study gene function. However, systematic KO experiments targeting many genes are usually prohibitive due to limited experimental and animal resources. Our method, scTenifoldKnk, is an efficient virtual KO tool that enables systematic KO investigation of gene function using scRNA-seq data. In scTenifoldKnk analysis, a gene regulatory network (GRN) is first constructed with data from wild-type samples, and a target gene is then virtually deleted from the constructed GRN. Manifold alignment is used to align the resulting reduced GRN to the original GRN to identify differentially regulated genes, which are then used to infer the functions of the target gene in the analyzed cells. Virtual KO analysis with scTenifoldKnk can recapitulate the main findings of real-animal KO experiments and recover the expected functions of the knocked-out gene(s) in relevant cell types. Systematic virtual KO analysis can be applied to knock out a large number of expressed genes individually to obtain a perturbation profile for each gene. The perturbation profile can be used to infer the function of the KO gene, to identify genes that have similar perturbation profiles, to identify genes that tend to be perturbed together by the same KO gene, and to identify genes involved in specific expression programs characterized in malignant vs. nonmalignant cells.
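The three steps above (build a GRN from wild-type data, delete the target gene, compare the two networks) can be illustrated with a deliberately simplified Python sketch. It is not the scTenifoldKnk implementation: the network here is a thresholded Pearson correlation graph, and the manifold alignment step is replaced by a direct comparison of each gene's edge weights before and after deletion, so only direct neighbors of the target are detected. All function names, the threshold, and the toy expression values are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_grn(expr, threshold=0.3):
    """Step 1 (simplified): correlation network from wild-type expression.

    expr: dict mapping gene -> list of expression values, one per cell.
    Returns a dict-of-dicts weighted adjacency; weak edges are dropped.
    """
    genes = list(expr)
    grn = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                grn[g][h] = r
                grn[h][g] = r
    return grn


def knock_out(grn, target):
    """Step 2: copy the GRN with every edge touching the target gene removed."""
    return {g: ({} if g == target
                else {h: w for h, w in nbrs.items() if h != target})
            for g, nbrs in grn.items()}


def perturbation_profile(grn, ko_grn, target):
    """Step 3 (stand-in for manifold alignment): score each remaining gene by
    the Euclidean distance between its edge-weight vectors before and after KO."""
    genes = list(grn)
    return {g: math.sqrt(sum((grn[g].get(h, 0.0) - ko_grn[g].get(h, 0.0)) ** 2
                             for h in genes))
            for g in genes if g != target}


# Toy wild-type data (made up): B tracks A closely, C is unrelated.
expr = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12.5],
    "C": [5, 1, 4, 2, 6, 3],
}
grn = build_grn(expr)
ko_grn = knock_out(grn, "A")
profile = perturbation_profile(grn, ko_grn, "A")
ranked = sorted(profile, key=profile.get, reverse=True)
print(ranked)
```

Running the same three steps once per expressed gene would yield one perturbation profile per gene, which is the input for the systematic comparisons described above.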
Impact: SCDS brings together the Data Sciences and Life Sciences communities at TAMU in a novel way. Specifically, the core complements and improves traditional hypothesis-driven research by flipping the model for Cancer Biology and Life Sciences research around so that data drives the experiments, leading to novel hypotheses. SCDS contributes bioinformatics and advanced computation to support data-science-driven solutions to life science problems, and it maintains the strong informatics skill set necessary for unraveling and integrating data sets originating from various single-cell sources and technologies.
Novelty: The proposed single-cell multi-omics research described in our TREC projects is highly innovative because it enables the characterization of previously unapproachable clinical phenomena, such as “deep landscapes” of cancer heterogeneity that reflect the dynamics of the tumor microenvironment (TME). We anticipate that the application of single-cell profiling tools will allow Texas A&M University investigators to integrate two or more -omics layers (DNA, RNA and metabolites) in single cells (Figure 1). The development of these novel capabilities is significant because profiling multiple aspects of individual tumor cells enables monitoring of how diverse cell types interact to influence the negative effects of dysfunctional or exhausted cells in the TME. The latter is noteworthy because cancer cell “immune escape” influences patient responsiveness to immunotherapy throughout different stages of tumor initiation, progression and dissemination.
Figure 1. Overview of basic single cell multi-omics strategies to assess the tumor microenvironment (TME). Phenotypes include the assessment of cell heterogeneity, anti-tumor immunity (T cell exhaustion), and stem cell homeostasis. Single cells from tumors will be isolated for the purpose of analyzing the relevant biological molecules (small molecular weight metabolites, mRNA, protein, and DNA). The parallel single cell proteomic, transcriptomic and epigenomic data will be computationally integrated to generate a quantitative dynamic portrait (deep phenotyping) of the tissue or tumor.