Question

Comparing Gene Expression Before and After Treatment Using Nanopore Sequencing Data

12 weeks ago

Dim • 0

Dear Community,

I am currently working on a project involving the comparison of two distinct sets of genes sequenced using nanopore technology under different conditions. My primary objective is to determine if the expression levels of specific genes have been altered due to the applied conditions, essentially comparing gene abundance "before" and "after" treatment.

For alignment, I am using the K-mer alignment (KMA) tool (reference: KMA Tool - BMC Bioinformatics), which generates a results table for each condition. The resulting *.mapstat files include the following columns:

refSequence: Name of the template sequence.
readCount: Number of reads mapped to the template.
fragmentCount: Number of fragments mapped to the template.
mapScoreSum: Accumulated mapping score (ConClave score).
refCoveredPositions: Number of covered positions in the template with a minimum depth of 1.
refConsensusSum: Total number of bases identical to the template.
bpTotal: Total number of bases aligned to the template.
depthVariance: Variance of the depth over the template.
nucHighDepthVariance: Number of positions in the template where the depth is more than 3 standard deviations higher than the mean depth.
depthMax: Maximum depth at any position in the template.
snpSum: Total number of SNPs.
insertSum: Total number of insertions.
deletionSum: Total number of deletions.
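To make the table layout concrete, here is a minimal sketch of how I currently imagine reading one of these files in Python. It assumes the file is tab-separated, that metadata lines start with `##`, and that the column header line starts with a single `#`; these are my assumptions about the layout and should be checked against the actual files. The function name `read_mapstat` and the example rows are hypothetical.

```python
import io


def read_mapstat(handle):
    """Parse a mapstat-style table into {refSequence: {column: value}}.

    Assumptions (not verified against every KMA version): fields are
    tab-separated, metadata lines start with '##', and the header line
    starts with a single '#'.
    """
    header = None
    rows = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata lines
        if line.startswith("#"):
            # Header line: drop the leading '#' and split the column names.
            header = [name.strip() for name in line.lstrip("#").split("\t")]
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        record = dict(zip(header, line.split("\t")))
        name = record.pop("refSequence").strip()
        # Store every remaining column as a float for simplicity.
        rows[name] = {column: float(value) for column, value in record.items()}
    return rows


# Made-up example content, only to illustrate the assumed layout.
example = io.StringIO(
    "## method\tKMA\n"
    "# refSequence\treadCount\tbpTotal\n"
    "geneA\t120\t96000\n"
    "geneB\t30\t21000\n"
)
table = read_mapstat(example)
```

With a real file, `read_mapstat(open(path))` would be called once per condition, giving one dictionary for "before" and one for "after".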

Below are examples of the results tables I obtain:

[Image: "before" treatment results table]

[Image: "after" treatment results table]

As illustrated, more genes are identified in the "before" condition than in the "after" condition, and some genes disappear entirely post-treatment.

My main questions are as follows:

1. Which columns should be primarily considered for my calculations?
   Based on my understanding, the most informative columns appear to be:
   - readCount: Indicates the raw abundance of each gene.
   - fragmentCount: Useful for paired-end sequencing data.
   - depthMax: Shows the peak coverage.
   - refCoveredPositions: Indicates the number of covered positions with a minimum depth of 1.
   - bpTotal: Provides a sense of overall coverage.
2. How can I compare these two tables given the difference in the number of genes identified?
3. Normalization:
   - What value(s) should be normalized, and how should this be done?
   - Would Log2 normalization be appropriate for this context?
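To make these questions concrete, here is a rough sketch of the kind of comparison I have in mind; I am not sure it is the right approach. It assumes readCount is the value to compare, scales each condition to counts per million of its own mapped reads, treats genes missing from one table as zero counts, and adds a pseudocount before taking the log2 ratio. The function name, the pseudocount of 1, and the example counts are all hypothetical.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Return {gene: log2 fold change (after vs before)} from read counts.

    before / after: {gene: readCount} for each condition.
    Counts are scaled to counts per million (CPM) of the total mapped
    reads in each condition (an assumed normalization, not a validated one).
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("each condition needs at least one mapped read")

    result = {}
    # Union of genes, so genes seen in only one condition are kept.
    for gene in sorted(set(before) | set(after)):
        cpm_before = before.get(gene, 0) / total_before * 1e6
        cpm_after = after.get(gene, 0) / total_after * 1e6
        # The pseudocount avoids log2(0) for genes absent in one condition.
        result[gene] = math.log2(
            (cpm_after + pseudocount) / (cpm_before + pseudocount)
        )
    return result


# Made-up counts: geneB disappears after treatment.
before_counts = {"geneA": 100, "geneB": 100}
after_counts = {"geneA": 200}
fold_changes = log2_fold_changes(before_counts, after_counts)
```

Here a positive value would mean the gene is relatively more abundant after treatment and a negative value less abundant. Since I have only one table per condition, this would give descriptive fold changes only, without any measure of statistical significance.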

This is a new area for me, and I would greatly appreciate any insights or recommendations from those with more experience in this field.

Thank you in advance for your assistance.

sequencing normalisation nanopore • 308 views
