Everything on Multiomics

An illustration of the cell with genomics, transcriptomics, epigenomics, proteomics and metabolomics

Imagine a city full of factories, hundreds or even thousands of factories producing different products. These factories are working simultaneously with the goal of keeping the economy running and, ideally, also letting the economy grow.

This picture is similar to our body. We have not just thousands of factories but millions of these factories inside of us. We call these factories tissues. Inside the tissues, we have the cells that make up most of these tissues.

Until recently, it was hard to understand what was happening inside each of these tissues. We were able to understand the average performance of these tissues, but we couldn’t have exact measurements of their performance. It is similar to knowing how the economy is doing in general without realizing the shortcomings of each of the individual factories.

Multiomics is a way to measure the performance of these tissues and of the cells inside them. It measures performance on several levels: the genes themselves, which parts of the genes are being transcribed into messenger RNA, which of these RNAs are being translated into proteins, and which proteins are leading to changes inside the cell or the tissue.

If we compare this to a city full of factories, then measuring multiomics on the tissue level is like measuring the mean performance of the factory.

Measuring multiomics on the cellular level is like measuring everything that is happening inside the factory. And I mean literally everything—what every single worker is doing, what they are talking about, what they are eating, what plans have been laid out by their managers, what products are being produced, which are useful, which are not. All this information we are able to analyze with the help of multiomics. You can imagine that by gathering so much information, we are also creating a lot of noise, a lot of useless information too.

Multiomics analysis is a rapidly growing field in biotechnology that allows us to analyze several aspects of cell biology and integrate these different analyses on the cell level and the tissue level. There are many levels to multiomics, and we will talk about most of them here.

The key thing to understand in multiomics is the meaning of the suffix -omics, which means the study of the totality of something. I will repeat it: the study of the totality of something. If you get this, then you will get what multiomics is all about. If we swap the prefix multi- for something else, as in genomics, we mean the study of the totality of the genes. If we say proteomics, then we are talking about the totality of the proteins. Thus, by changing the prefix (geno-, proteo-, and so on), we are looking at different layers of the cell. We can also integrate all this information to understand everything happening inside the cell, from the blueprint (genomics) to the final products (metabolomics).

Now we will dive into a more detailed description of multiomics. I will talk about the meaning of the different levels of multiomics, how they could be integrated, and the challenges in multiomic analysis.

Let’s start on the cell level, specifically inside the nucleus, and work our way up. We will start with genes, which represent the blueprint of the cell.

Genomics

Genomics describes the study of the whole genome. It tries to make sense of the roughly 3.2 billion base pairs that we humans carry inside the nucleus of the cell. With genomics, we are interested in understanding both the coding part of the DNA (the genes) and the non-coding part. We look at the reference sequence as well as the places where an individual's DNA differs from it. These variants include single nucleotide variations (SNVs), insertions, deletions, copy number variations (CNVs), duplications, and inversions: essentially, any feature of the genome that could be associated with disease or other outcomes.

Here are examples of some of the DNA anomalies we can detect using genomics:

Single Nucleotide Variations (SNVs): A single nucleotide variation (SNV) is a variation at a single nucleotide position in the DNA sequence among individuals.

Example: SNP rs1799853: A well-known SNV in the CYP2C9 gene where cytosine (C) is replaced by thymine (T) at coding position 430 (c.430C>T). This changes an arginine to a cysteine in the enzyme and affects drug metabolism.

Insertions: An insertion is the addition of one or more nucleotide base pairs into a DNA sequence.

Example: BRCA1 c.5382insC: An insertion of a cytosine (C) nucleotide in the BRCA1 gene. This frameshift mutation is associated with an increased risk of breast and ovarian cancer.

Deletions: A deletion is the loss of one or more nucleotide base pairs from a DNA sequence.

Example: DeltaF508 in CFTR: A three-base-pair deletion in the CFTR gene resulting in the loss of a phenylalanine (F) at position 508. This is the most common mutation causing cystic fibrosis.

Copy Number Variations (CNVs): CNVs are large segments of DNA that are either duplicated or deleted, resulting in a variation in the number of copies of a particular region.

Example: CYP2D6 Gene Deletion: Some individuals have a deletion of the entire CYP2D6 gene, which affects drug metabolism, leading to a poor metabolizer phenotype.

Duplications: A duplication is a type of CNV where a segment of the genome is duplicated, resulting in multiple copies of that segment.

Example: Charcot-Marie-Tooth Disease Type 1A (CMT1A): Caused by a duplication of a 1.5 Mb region on chromosome 17 that includes the PMP22 gene. This duplication leads to the overexpression of PMP22, causing peripheral neuropathy.

Inversions: An inversion occurs when a segment of DNA is reversed end to end.

Example: Hemophilia A Inversion: About 45% of severe cases of hemophilia A are caused by an inversion in intron 22 of the F8 gene. This inversion disrupts the gene, preventing the production of functional clotting factor VIII.
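To make the categories above more concrete, here is a minimal sketch of how a small variant could be sorted into one of these types from its reference and alternate alleles. It assumes VCF-style alleles in which insertions and deletions share their leading base. The function name and the toy alleles are hypothetical and are not the real coordinates of the variants named above. Large CNVs and long inversions are detected from read depth and read orientation rather than from allele strings, so this sketch does not cover them.

```python
# Hypothetical helper: classify a small variant from REF/ALT allele strings.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_variant(ref, alt):
    """Classify a variant given reference and alternate alleles."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    # VCF-style indels keep the shared leading base(s) in both alleles.
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    if len(ref) == len(alt) and alt == reverse_complement(ref):
        return "inversion"
    return "complex"


if __name__ == "__main__":
    toy_variants = [("C", "T"), ("A", "AC"), ("ATCT", "A"), ("AAGC", "GCTT")]
    for ref, alt in toy_variants:
        print(ref, "->", alt, ":", classify_variant(ref, alt))
```

Real variant callers report much richer information (position, quality, genotype), but the basic idea of comparing the sample to a reference is the same.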

Another way to understand multiomics is to look at the different tools used in the analysis. Methods to analyze genomics:

Next-Generation Sequencing (NGS):

  • Whole Genome Sequencing (WGS): Sequencing the entire genome to identify genetic variations.
  • Whole Exome Sequencing (WES): Sequencing only the exonic regions (coding parts) of the genome to identify mutations associated with diseases.
  • Targeted Sequencing: Sequencing specific regions of interest for more focused studies.

Variant Calling:

Tools: GATK (Genome Analysis Toolkit), FreeBayes, Samtools.

Purpose: Identifying single nucleotide polymorphisms (SNPs), insertions, deletions, and other genetic variations from sequencing data.

Genome-Wide Association Studies (GWAS):

Tools: PLINK, SNPTEST.

Purpose: Identifying genetic variants associated with specific traits or diseases.

Structural Variant Analysis:

Tools: Manta, LUMPY, DELLY.

Purpose: Identifying larger structural variations like deletions, duplications, inversions, and translocations.
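As a rough illustration of the idea behind a GWAS, the sketch below tests whether one allele is more common in cases than in controls, using a plain 2x2 chi-square test on allele counts. This is a simplified stand-in and not how PLINK or SNPTEST are implemented: real studies run such a test across millions of variants and correct for covariates, population structure, and multiple testing. The function name and the counts are made up for illustration.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Made-up counts: the alternate allele is seen 30 times among 100 case
    # alleles and 10 times among 100 control alleles.
    chi2, p, odds = allelic_association(30, 70, 10, 90)
    print(f"chi2={chi2:.2f}, p={p:.2e}, odds ratio={odds:.2f}")
```

A small p-value only says that the variant is associated with the trait in this sample. It does not say that the variant causes it.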

Epigenomics

Epigenomics studies chemical changes to the DNA or histones that affect how accessible DNA sections are to transcription. If DNA isn’t accessible, the gene can’t be expressed or influence the cell’s characteristics. The epigenome explains why some genes are active in some cells and inactive in others. The epigenome consists of various types of data:

DNA Methylation: This involves adding methyl groups to cytosines at CpG sites, which cluster in regions called CpG islands. Methylation of CpG islands in gene promoters usually represses gene expression.

Histone Modifications: Changes to histone proteins can alter DNA accessibility.

Open Chromatin Profiling: This measures how exposed sections of DNA are for transcription.

3D DNA Structure: The three-dimensional arrangement of DNA within a cell can show which sections are in contact or inaccessible.

A cell or tissue’s epigenomic profile is unique and can help in identification, similar to transcriptomics. The epigenome can change in response to the environment or disease, leading to activation and also deactivation of certain genes.

Methods to analyze epigenomics:

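Tools: Bismark for aligning bisulfite sequencing reads and extracting methylation calls, MACS2 for calling peaks in ChIP-seq data.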

Techniques: Bisulfite sequencing for DNA methylation, ChIP-seq for histone modifications.
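Bisulfite sequencing turns methylation into something we can count: at each CpG site, some reads support the methylated state and some the unmethylated state. A minimal sketch of the per-site methylation level, assuming we already have those read counts, could look like this. The site names, the counts, and the minimum coverage of 5 reads are illustrative assumptions, not values taken from Bismark.

```python
def methylation_level(methylated_reads, unmethylated_reads, min_coverage=5):
    """Fraction of reads supporting methylation at one CpG site.

    Returns None when the site is covered by too few reads to be trusted.
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage < min_coverage:
        return None
    return methylated_reads / coverage


if __name__ == "__main__":
    # Hypothetical CpG sites: (methylated reads, unmethylated reads)
    cpg_sites = {
        "chr1:1000": (18, 2),
        "chr1:1050": (1, 19),
        "chr1:1100": (1, 1),
    }
    for site, (meth, unmeth) in cpg_sites.items():
        level = methylation_level(meth, unmeth)
        label = "low coverage" if level is None else f"{level:.2f}"
        print(site, label)
```

A value close to 1 means the site is methylated in almost all cells of the sample, and a value close to 0 means it is mostly unmethylated.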

Transcriptomics

Transcriptomics describes the study of the RNA transcripts that are being produced from the genome. These transcripts usually lead to building proteins, but they can also have a regulatory function in the cell.

The transcriptome represents all the RNA molecules, including messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), and non-coding RNAs (ncRNAs) such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), produced in a cell or a group of cells.

Methods to analyze transcriptomics:

RNA Sequencing (RNA-Seq):

Tools: STAR, HISAT2 for alignment; DESeq2, edgeR for differential expression analysis.

Purpose: Profiling gene expression levels by sequencing RNA.

Microarrays:

Tools: Affymetrix and Agilent platforms for measurement; limma for differential expression analysis.

Purpose: Measuring gene expression levels using hybridization-based techniques.

Single-Cell RNA Sequencing (scRNA-Seq):

Tools: Cell Ranger, Seurat, Scanpy for data analysis.

Purpose: Profiling gene expression at the single-cell level.

Quantitative PCR (qPCR):

Tools: Standard qPCR machines and SYBR Green or TaqMan assays.

Purpose: Quantifying specific RNA molecules with high sensitivity.

Spatial Transcriptomics:

Tools: 10x Genomics Visium, Slide-seq, MERFISH.

Purpose: Mapping gene expression in the spatial context of tissues.
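To give a feeling for what "profiling gene expression levels" means in practice, here is a minimal sketch that normalizes raw read counts to counts per million (CPM) and computes a log2 fold change between two samples. It is a deliberately simplified illustration: DESeq2 and edgeR use more careful normalization and proper statistical models with replicates. The gene names, the counts, and the pseudocount of 1 are assumptions for the example.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count / total * 1_000_000 for gene, count in counts.items()}


def log2_fold_change(cpm_control, cpm_treated, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in cpm_control
        if gene in cpm_treated
    }


if __name__ == "__main__":
    # Hypothetical raw counts for three genes in two samples.
    control = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
    treated = {"GENE_A": 400, "GENE_B": 300, "GENE_C": 300}
    lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))
    for gene, value in lfc.items():
        print(f"{gene}: log2FC = {value:+.2f}")
```

A positive value means the gene is more highly expressed in the treated sample, and a negative value means it is less expressed.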

Proteomics

Proteomics refers to the quantitative and qualitative study of the full set of proteins expressed from the genome. One would think that the resulting proteins are simply the end products of the RNA transcripts. This is only part of the picture, because many proteins undergo modifications such as phosphorylation, glycosylation, acetylation, and ubiquitination, which alter their function.

Ways to analyze proteomics:

Mass Spectrometry (MS):

Techniques: Tandem MS (MS/MS), Time-of-Flight (TOF) MS, Orbitrap MS.

Tools: MaxQuant, Proteome Discoverer, Mascot for data analysis.

Purpose: Identifying and quantifying proteins and their modifications.

Protein Microarrays:

Tools: Standard microarray platforms; ArrayExpress, a public data archive, for depositing and retrieving array data.

Purpose: High-throughput screening of protein interactions, functions, and expression levels.

Western Blotting:

Tools: Standard laboratory equipment and chemiluminescence or fluorescence detection systems.

Purpose: Detecting specific proteins using antibody-based techniques.

Quantitative Proteomics:

Techniques: SILAC (Stable Isotope Labeling by Amino acids in Cell culture), iTRAQ (Isobaric Tags for Relative and Absolute Quantitation).

Tools: Specialized software for analyzing labeled proteomics data.

Purpose: Quantifying protein abundance in different conditions.

Protein-Protein Interaction (PPI) Analysis:

Techniques: Co-immunoprecipitation (Co-IP), yeast two-hybrid screening.

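Tools: STRING for known and predicted interactions, Cytoscape for network visualization.

Purpose: Identifying which proteins interact with each other and how they form networks.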
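The quantitative proteomics methods listed above come down to comparing intensities between conditions. As a minimal sketch for a SILAC-style experiment, assume each peptide of a protein has a "light" intensity (condition 1) and a "heavy" intensity (condition 2). We can then summarize the protein as the median log2 heavy/light ratio. The function name and the intensities are hypothetical, and tools like MaxQuant do far more (peak detection, normalization, filtering) than this.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_intensities):
    """Median log2(heavy / light) over the peptides of one protein.

    peptide_intensities is a list of (light, heavy) intensity pairs.
    Peptides with a missing (zero) intensity in either channel are skipped.
    """
    log_ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_intensities
        if light > 0 and heavy > 0
    ]
    return median(log_ratios) if log_ratios else None


if __name__ == "__main__":
    # Hypothetical peptide intensities for one protein: (light, heavy)
    peptides = [(100.0, 200.0), (50.0, 100.0), (10.0, 80.0)]
    print("log2 heavy/light ratio:", protein_log2_ratio(peptides))
```

Using the median rather than the mean keeps one outlier peptide from dominating the protein-level estimate.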

Metabolomics

Metabolomics is about analyzing the very small products of a cell or a tissue, the things that we weren’t able to analyze before because the technology wasn’t there yet. Now, with the help of mass spectrometry, we are able to analyze small molecules that weigh less than 2000 Da. These molecules include drug metabolites, lipids, amino acids, and sugars. The field of metabolomics has been growing fast in recent years because of increased sensitivity and the introduction of high-resolution mass spectrometry, which has made single-cell metabolomics possible.

Methods to analyze metabolomics:

Mass Spectrometry (MS):

Techniques: Liquid Chromatography-MS (LC-MS), Gas Chromatography-MS (GC-MS).

Tools: XCMS, MetaboAnalyst for data analysis.

Purpose: Identifying and quantifying metabolites.

Nuclear Magnetic Resonance (NMR) Spectroscopy:

Tools: Chenomx, MetaboMiner.

Purpose: Profiling metabolites based on their magnetic properties.

Metabolite Profiling:

Techniques: Targeted and untargeted metabolite profiling.

Tools: MZmine, Compound Discoverer.

Purpose: Comprehensive analysis of metabolite concentrations and profiles.

Metabolic Flux Analysis:

Techniques: Isotopic labeling experiments.

Tools: OpenFLUX, INCA for computational analysis.

Purpose: Studying the rates of metabolic reactions.

Metabolite-Metabolite Interaction Analysis:

Techniques: Correlation analysis, network analysis.

Tools: MetScape, MetExplore for network visualization.

Purpose: Understanding interactions and dependencies between different metabolites.
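The correlation and network analysis mentioned above can be sketched in a few lines: compute the Pearson correlation between every pair of metabolites across samples and keep the strongly correlated pairs as edges of a network. This is only a minimal illustration. The metabolite values and the 0.8 threshold are assumptions, and this is not how MetScape or MetExplore work internally.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sd_x * sd_y)


def correlation_edges(profiles, threshold=0.8):
    """Return (metabolite_1, metabolite_2, r) for pairs with |r| >= threshold."""
    edges = []
    for (name_1, x), (name_2, y) in combinations(profiles.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            edges.append((name_1, name_2, round(r, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical concentrations of three metabolites in four samples.
    profiles = {
        "glucose": [1.0, 2.0, 3.0, 4.0],
        "lactate": [2.0, 4.0, 6.0, 8.0],
        "citrate": [5.0, 1.0, 4.0, 2.0],
    }
    for edge in correlation_edges(profiles):
        print(edge)
```

The resulting edge list is the kind of input a network visualization tool could draw as a graph.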

We can think about genomics as what can happen. Transcriptomics is what might happen. Proteomics is what makes it happen. Metabolomics is about what is happening right now.

Integration of multiomics:

The different omics layers that we talked about earlier each provide a snapshot of what is happening at a certain time during a certain experiment. Earlier, they could only be performed on the tissue level, so we only had an average measurement of one type of omics across the whole tissue. We could only read out a portion of the information contained in the tissue samples.

Now it is possible to conduct multiomic analysis on the cellular level. This is called single-cell multiomics, and it gives us a snapshot of a single cell inside a tissue at a specific time in a specific experiment.

And it is not just one type of omic analysis. It is possible to combine different types of omics to better understand how these layers work together. We can integrate genomics and transcriptomics of the same cell to understand which genes are actually being transcribed. It is also possible to combine other types of omics on the single-cell level. This comes, of course, with certain limitations.

Another type of multiomic analysis is spatial multiomics, which allows for multiomic analysis on the tissue level to understand the interplay between the cellular and intracellular structures within the tissue. This method of profiling preserves the structure of the tissue, which allows us to understand where specific changes are coming from within the sample.

One popular method used in spatial multiomics is seqFISH:

Sequential Fluorescence In Situ Hybridization (seqFISH) is an advanced technique used in spatial multiomics to visualize and quantify RNA molecules within their spatial context in tissues and cells.

The technique relies on iterative rounds of hybridization, imaging, and probe stripping to detect numerous RNA targets using a limited number of fluorescent labels.

Hybridization refers to the process where two complementary strands of nucleic acids (DNA or RNA) pair to form a double-stranded molecule through base pairing. It relies on the principle that adenine (A) pairs with thymine (T) in DNA or uracil (U) in RNA, and cytosine (C) pairs with guanine (G).
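The base-pairing rule can be written down directly. As a small sketch, the function below builds the DNA probe sequence that would hybridize to a given RNA target: it pairs each base with its partner and reverses the order, because the two strands run in opposite directions. The function name and the target sequence are hypothetical.

```python
# Which DNA base pairs with each RNA base.
DNA_PARTNER_OF_RNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def dna_probe_for_rna(target_rna):
    """Return the DNA probe (5'->3') complementary to an RNA target (5'->3')."""
    return "".join(DNA_PARTNER_OF_RNA[base] for base in reversed(target_rna.upper()))


if __name__ == "__main__":
    print(dna_probe_for_rna("AUGGC"))
```

Real probe design also considers length, melting temperature, and off-target binding, which are left out here.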

How seqFISH works:

Probe Design: RNA-specific probes are designed to bind to target RNA sequences. Each probe set is tagged with a unique fluorescent barcode that can be read out over multiple rounds of hybridization and imaging.

Hybridization: In the first round, a subset of RNA probes is hybridized to their target sequences within the tissue or cells. These probes are fluorescently labeled, allowing visualization under a microscope.

Imaging: The sample is imaged to capture the fluorescent signals corresponding to the bound probes. Each signal represents the presence of a specific RNA molecule at a particular location.

Probe Stripping: After imaging, the fluorescent probes are stripped from the sample, and the process is repeated with a new set of probes. This iterative process allows for the detection of a large number of RNA species using a limited set of fluorophores.

Data Analysis: The images from multiple rounds are aligned and analyzed to reconstruct the spatial distribution of numerous RNA molecules within the sample. The unique fluorescent barcodes are decoded to identify each RNA species.
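The barcoding idea behind these steps can be sketched as follows. With F fluorescent colors and N rounds, there are F to the power of N possible color sequences, which is why a handful of fluorophores can distinguish thousands of RNA species. The code below assumes that spot detection and image alignment have already been done, so each spot is simply a position with one color per round. The codebook, gene names, and coordinates are made up, and real pipelines add error correction on top of this.

```python
from collections import Counter


def barcode_capacity(n_colors, n_rounds):
    """Number of distinct barcodes readable with n_colors over n_rounds."""
    return n_colors ** n_rounds


def decode_spots(spots, codebook):
    """Map each spot position to a gene, or None if its barcode is unknown.

    spots: dict of (x, y) -> sequence of colors, one color per round.
    codebook: dict of color tuple -> gene name.
    """
    return {position: codebook.get(tuple(colors)) for position, colors in spots.items()}


if __name__ == "__main__":
    # Hypothetical codebook for 3 rounds of imaging.
    codebook = {
        ("red", "green", "blue"): "GENE_A",
        ("green", "green", "red"): "GENE_B",
        ("blue", "red", "red"): "GENE_C",
    }
    spots = {
        (10, 12): ["red", "green", "blue"],
        (48, 7): ["green", "green", "red"],
        (51, 9): ["green", "green", "red"],
        (90, 33): ["blue", "blue", "blue"],  # not in the codebook
    }
    decoded = decode_spots(spots, codebook)
    print("capacity with 4 colors, 8 rounds:", barcode_capacity(4, 8))
    print(Counter(gene for gene in decoded.values() if gene is not None))
```

Because the positions are kept alongside the decoded gene names, the result is a spatial map of which RNA was found where.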

There are three main types of multiomic integration:

Vertical integration: We integrate different levels of multiomics from the same experiment, same tissue, same time, same cell. The cell is the anchor. One example is integrating genomics and transcriptomics on a single-cell level. This is also called matched integration because we are collecting data from the same cell.

Horizontal integration: We integrate the same omic, for example, transcriptomics, from different samples. This could be from the same individual but at different time points of the experiment or even the same type of tissue but from different individuals.

Diagonal or mosaic integration: We integrate different omics from different experiments. One example would be integrating multiomics data during disease progression. This is the most challenging type of integration because there is no anchor.

Horizontal and diagonal integrations are examples of unmatched integration because we are collecting data from different samples and not from the same cell.
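To make the difference between matched and unmatched integration concrete, here is a minimal sketch using plain dictionaries. Vertical integration joins two omics layers on the cell they were measured in (the anchor), while horizontal integration stacks the same omic from different samples on the features they share. The cell names, gene names, and values are hypothetical, and real integration methods also have to deal with batch effects and normalization, which this sketch ignores.

```python
def vertical_integration(genomics, transcriptomics):
    """Join two omics layers measured in the same cells; the cell ID is the anchor."""
    shared_cells = sorted(set(genomics) & set(transcriptomics))
    return {
        cell: {"genomics": genomics[cell], "transcriptomics": transcriptomics[cell]}
        for cell in shared_cells
    }


def horizontal_integration(samples):
    """Stack the same omic from several samples, keeping only shared genes.

    samples: dict of sample name -> dict of cell ID -> dict of gene -> value.
    """
    gene_sets = [set(profile) for cells in samples.values() for profile in cells.values()]
    if not gene_sets:
        return {}
    shared_genes = sorted(set.intersection(*gene_sets))
    return {
        (sample, cell): {gene: profile[gene] for gene in shared_genes}
        for sample, cells in samples.items()
        for cell, profile in cells.items()
    }


if __name__ == "__main__":
    # Vertical: variant calls and expression from the same cells.
    genomics = {"cell_1": {"GENE_A": "variant"}, "cell_2": {"GENE_A": "reference"}}
    transcriptomics = {"cell_1": {"GENE_A": 12}, "cell_3": {"GENE_A": 40}}
    print(vertical_integration(genomics, transcriptomics))

    # Horizontal: expression from two time points of the same experiment.
    samples = {
        "day_0": {"cell_1": {"GENE_A": 5, "GENE_B": 1}},
        "day_7": {"cell_9": {"GENE_A": 8, "GENE_C": 3}},
    }
    print(horizontal_integration(samples))
```

Diagonal integration has neither shared cells nor shared features to join on, which is why it cannot be written as a simple join like the two cases above.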

