DNA Deconvolution industry
Our research focused on identifying high-level evidence, published recently in high-impact scientific journals, that supports applications of DNA deconvolution. The studies included in this document identify new uses for DNA deconvolution in (i) predicting the risk of systemic diseases; (ii) identifying biomarkers for inflammatory diseases and aging; (iii) identifying novel biomarkers for cancers and tumors; (iv) determining cellular composition across different tissues (from single or multiple sources); and (v) resolving DNA mixtures in forensic sexual offense cases. The data outlined in this document support the promise of these applications. To cover expert opinion, we outline the study investigators' recommendations and conclusions.
- Deconvolution is a generic term for a procedure that estimates the proportion of each cell type in a bulk sample together with the corresponding cell-type-specific gene expression profiles (GEPs). DNA methylation (DNAm) can be used as a biomarker of cell types and, through deconvolution approaches, to infer the underlying cell-type proportions.
- Cell-type deconvolution algorithms have two main categories: reference-based and reference-free.
- 1. Reference-based algorithms are supervised methods that determine the underlying composition of cell types within a sample by leveraging differentially methylated regions (DMRs) specific to cell type, identified from DNAm measures of purified cell populations.
- 2. Reference-free algorithms are unsupervised methods for use when cell-type-specific DMRs are not available, allowing scientists to estimate putative cellular proportions, or control for potential confounding from cell type.
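The reference-based idea above can be sketched as a small linear mixing problem: the bulk methylation profile is modeled as a weighted sum of purified cell-type profiles, and the weights are constrained to be non-negative. The sketch below is a minimal illustration under stated assumptions, not the method of any study cited here. The reference matrix, the mixture, and the function name `deconvolve_reference_based` are hypothetical, and the non-negative least-squares problem is solved with plain projected gradient descent.

```python
import numpy as np

def deconvolve_reference_based(reference, mixture, n_iter=2000):
    """Estimate cell-type proportions by non-negative least squares.

    reference: (n_cpgs, n_cell_types) beta values of purified cell types.
    mixture:   (n_cpgs,) beta values of the bulk sample.
    """
    reference = np.asarray(reference, dtype=float)
    mixture = np.asarray(mixture, dtype=float)
    gram = reference.T @ reference
    # Step size 1/L, where L is the largest eigenvalue of R^T R.
    step = 1.0 / np.linalg.eigvalsh(gram)[-1]
    weights = np.full(reference.shape[1], 1.0 / reference.shape[1])
    for _ in range(n_iter):
        gradient = gram @ weights - reference.T @ mixture
        # Projecting onto the non-negative orthant keeps all weights >= 0.
        weights = np.maximum(weights - step * gradient, 0.0)
    # Rescale so the estimated proportions sum to one.
    return weights / weights.sum()

# Hypothetical reference: 6 cell-type-specific CpGs (rows) x 3 cell types (columns).
reference = np.array([
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
])
true_proportions = np.array([0.6, 0.3, 0.1])
mixture = reference @ true_proportions  # noise-free synthetic bulk sample
estimated_proportions = deconvolve_reference_based(reference, mixture)
```

On real data the mixture is noisy and the reference is incomplete, so the estimate would only approximate the true composition.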
I. Identifying cell-type-specific DNA methylation signals: implications in predicting the risk of smoking-related diseases.
- A recent meta-analysis (Chenglong Y. et al., Nature 2020) used a cell-type deconvolution algorithm to identify cell-type-specific DNA methylation signals in seven large Epigenome-Wide Association Studies (EWAS).
- The study investigated the highly reproducible smoking-associated DNA methylation changes in whole blood, which could have important implications for understanding and predicting the risk of smoking-related diseases.
- The study indicated that most of the highly reproducible smoking-associated hypomethylation signatures are prominent in the myeloid lineage.
- The study also found that a 'myeloid-specific, smoking-associated hypermethylation signature' is enriched for DNase Hypersensitive Sites in Acute Myeloid Leukemia. These findings have broader implications for how smoking affects immune-cell subtypes and may also influence the risk of smoking-related diseases.
II. DNA methylation analysis - identifying the cellular composition (in tissues)
- Another study (Schmidt Marco et al., 2020) deconvoluted cellular subsets in human tissues using targeted DNA methylation analysis at individual CG dinucleotides (CpG sites).
- DNA methylation (DNAm) at CG dinucleotides (CpGs) is a stable and heritable modification that is directly associated with cellular differentiation. DNAm can be quantified at single-base resolution, and every cell has only two alleles, which makes DNAm ideally suited for deconvolution approaches.
- DNA methylation (DNAm) profiles have been used to establish an atlas for multiple human tissues and cell types. DNAm is suitable for deconvolution of cell types because each CG dinucleotide (CpG site) has only two states per DNA strand, methylated or non-methylated, and these epigenetic modifications are very consistent during cellular differentiation. So far, deconvolution of DNAm profiles has relied on complex signatures of many CpGs, often measured by genome-wide analysis with Illumina BeadChip microarrays.
- In this study, the investigators compiled and curated 579 Illumina 450k BeadChip DNAm profiles of 14 different non-malignant human cell types. A training and validation strategy was used to identify and test cell-type-specific CpGs.
- This proof-of-concept study indicated that "DNAm analysis (at individual CpGs) reflects the cellular composition of cellular mixtures and different tissues."
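To illustrate why a single, well-chosen CpG can carry information about composition, consider a simplified two-component mixing model. If the target cell type has methylation level `beta_target` at the CpG and all other cells have `beta_background`, the measured level in the mixture is a weighted average of the two, and the target proportion follows by solving for the weight. This is a sketch under that assumption only. The function name and the numeric values are hypothetical and are not taken from the study.

```python
def proportion_from_single_cpg(beta_mixture, beta_target, beta_background):
    """Solve beta_mixture = p * beta_target + (1 - p) * beta_background for p."""
    if beta_target == beta_background:
        raise ValueError("CpG does not discriminate the target cell type")
    p = (beta_mixture - beta_background) / (beta_target - beta_background)
    # Measurement noise can push the estimate outside [0, 1]; clamp it.
    return min(max(p, 0.0), 1.0)

# Hypothetical CpG: 95% methylated in the target cell type, 5% elsewhere,
# and measured at 25% in the mixed sample.
estimate = proportion_from_single_cpg(0.25, beta_target=0.95, beta_background=0.05)
```

The closer `beta_target` and `beta_background` are to each other, the more a small measurement error inflates the error of the estimated proportion, which is why CpGs with a large methylation difference are preferred.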
III. DNA mixture deconvolution of Sexual Offense Samples - test with high sensitivity and specificity.
- A study (Victoria R.W. et al., 2018) investigated whether cells from DNA mixtures in sexual offense evidence could be selectively collected, identified, and analyzed using enhanced DNA mixture deconvolution (the DEPArray™ system). The analysis could be performed on a single cell (from a single source) or on a group of cells (from heterogeneous sources).
- Interpreting DNA mixtures in forensic evidence from multiple sources has always been a significant challenge. Single-cell separation technology, specifically the DEPArray™ system from Menarini Silicon Biosystems, can be used to address this mixture separation challenge.
- The study reported that sperm profiles were identified in 27 of 32 DEPArray™ processed samples, with 26 of 27 (96.2%) yielding single-source profiles. In contrast, single-source profiles were obtained from 9 of 28 (32.1%) differentially extracted samples.
- The study showed that the DEPArray™ workflow (i) leads to fewer mixture samples, (ii) enables purification of sperm and epithelial cell fractions without the need for differential extraction, (iii) improves the amplification success rate of samples, and (iv) improves the interpretation of low-template DNA samples.
- The findings suggest that DNA quantitation can be performed through cell counting rather than more laborious qPCR methods.
- The study has broader implications: the DEPArray™ system permits more sensitive and more specific sperm cell identification than microscopic and differential extraction methods in forensic analysis, which means that this test eliminates the need for additional confirmatory tests for the presence of human sperm.
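The point about quantitation by cell counting can be made concrete with a rough calculation. The sketch below assumes commonly cited approximate values of about 6.6 pg of DNA per diploid human cell and about 3.3 pg per haploid sperm cell. These constants and the function name are assumptions for illustration and do not come from the study.

```python
# Assumed approximate DNA content per cell, in picograms.
PG_DNA_PER_DIPLOID_CELL = 6.6   # e.g. an epithelial cell
PG_DNA_PER_HAPLOID_CELL = 3.3   # e.g. a sperm cell

def estimate_dna_pg(n_sperm_cells=0, n_epithelial_cells=0):
    """Estimate total template DNA (pg) from the number of recovered cells."""
    if n_sperm_cells < 0 or n_epithelial_cells < 0:
        raise ValueError("cell counts must be non-negative")
    return (n_sperm_cells * PG_DNA_PER_HAPLOID_CELL
            + n_epithelial_cells * PG_DNA_PER_DIPLOID_CELL)

# Hypothetical recovery: 20 sperm cells collected into one tube.
sperm_fraction_pg = estimate_dna_pg(n_sperm_cells=20)
```

Because the number of cells in each recovered fraction is known, an estimate of this kind can indicate whether a sample falls into the low-template range before amplification.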
IV. A reference-free deconvolution of complex DNA methylation data - Cancer Samples
- Deconvolution methods dissect methylomes of cell mixtures into their basic constituents. When reference DNA methylomes of purified cell types are available, they can be used to infer the proportions of different cell types across the samples.
- The methylation states of particular CpGs can be used as powerful biomarkers for various conditions, including cancer, inflammatory diseases, and aging.
- This study proposed a three-stage protocol (shown below) for reference-free deconvolution of DNA methylomes: (i) data preprocessing, confounder adjustment, and feature selection, (ii) deconvolution with multiple parameters, and (iii) guided biological inference and validation of deconvolution results.
- This protocol simplifies the analysis and integration of DNA methylomes derived from complex samples (including tumors).
- Applying this protocol to lung cancer methylomes from The Cancer Genome Atlas (TCGA) revealed components linked to stromal cells and tumor-infiltrating immune cells, as well as associations with clinical parameters.
- Another schematic (shown below) represents the evaluation of independent component analysis (ICA) on the TCGA lung adenocarcinoma (TCGA-LUAD) dataset. An overview of the ICA procedure:
- (a) Components linked to confounding factors (here sex, age, ethnicity or race) are removed from the contribution matrix and an adjusted DNA methylation matrix is constructed.
- (b) Associations of the confounding factors sex and ethnicity with the entries of the proportion matrix M produced by ICA. P-values were computed using one-way ANOVA; points within the violin plots represent the median, and the thick line represents 50% of the samples.
- (c) Beta-value distributions of the transformed (D*) and the untransformed (D) DNA methylation matrices.
- (d) Associations between latent methylation component (LMC) proportions and qualitative phenotypic traits. The color represents the absolute difference of the mean LMC proportions between the groups defined by the phenotypic traits. Significant p-values according to a two-sided t-test are indicated by a bold border.
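The deconvolution stage of the three-stage protocol above can be illustrated with a generic matrix factorization. In the sketch below, a methylation matrix D (CpGs x samples) is approximated as the product of a matrix of latent component profiles and a matrix of per-sample proportions, both constrained to be non-negative. Plain non-negative matrix factorization with multiplicative updates is used here as a stand-in. It is not the algorithm of the protocol, and the synthetic data, the function name `nmf_deconvolve`, and all parameter values are hypothetical.

```python
import numpy as np

def nmf_deconvolve(D, n_components, n_iter=1000, seed=0):
    """Factorize D (CpGs x samples) ~ T @ A with non-negative T and A."""
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(seed)
    T = rng.uniform(0.1, 0.9, size=(D.shape[0], n_components))
    A = rng.uniform(0.1, 0.9, size=(n_components, D.shape[1]))
    eps = 1e-12  # avoids division by zero in the updates
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        A *= (T.T @ D) / (T.T @ T @ A + eps)
        T *= (D @ A.T) / (T @ A @ A.T + eps)
    rel_error = np.linalg.norm(D - T @ A) / np.linalg.norm(D)
    # Normalize each sample's weights so they read as proportions.
    proportions = A / A.sum(axis=0, keepdims=True)
    return T, proportions, rel_error

# Synthetic example: 50 CpGs, 10 samples, 2 hidden components.
rng = np.random.default_rng(1)
true_profiles = rng.uniform(0.0, 1.0, size=(50, 2))
true_proportions = rng.dirichlet([1.0, 1.0], size=10).T  # shape (2, 10)
D = true_profiles @ true_proportions
profiles, proportions, rel_error = nmf_deconvolve(D, n_components=2)
```

A factorization of this kind recovers components only up to ordering and scaling, and the recovered components need not correspond to pure cell types, which is why the protocol's third stage (guided biological inference and validation) is needed to interpret them.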