Question

Generate read counts from BAM file

0

13 months ago

singhankit973 • 0

I am currently working on a project related to LHON disease (a rare mitochondrial disorder that leads to progressive visual loss).

I have 9 RNA-seq FASTQ files: 3 from carriers, 3 from affected individuals and 3 from controls. The data were downloaded from here: link

For read mapping of this dataset, what should I use as the reference genome: the human reference genome or the mitochondrial reference genome?

After read mapping, I have to generate read counts so that I can run DESeq2 on them.

If I align the FASTQ files to the mitochondrial reference genome, I will need an annotation file (GTF) in later steps to get read counts, but I am unable to find one for the mitochondrial genome. How can I get this annotation file?

My other question is: can I use the human reference genome (hg38) for read mapping, since an annotation file is available for it, and then generate read counts?

Please tell me which approach is better.

RNAseq reference_genome Deseq2 read_counts • 1.0k views

ADD COMMENT • link updated 13 months ago by NextGenSeek ▴ 10 • written 13 months ago by singhankit973 • 0

0

Are you analysing RNA or DNA heteroplasmy levels?

ADD REPLY • link 13 months ago by NextGenSeek ▴ 10

score 2 · Answer 1 · 2023-11-28

2

13 months ago

ATpoint 86k

I don't follow. You always align to the full genome (which includes the mt genome in the case of human reference genomes).

Anyway, if you go to the link you provide and scroll down to "Supplementary file", the authors already provide a matrix of raw counts, so why not just use that?
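To make the raw-counts suggestion concrete, here is a minimal Python sketch of loading such a matrix before passing it on to DESeq2. It assumes a tab-separated file with a header row, gene IDs in the first column and one integer column per sample; the actual layout of the supplementary file may differ. The function name `read_counts`, the `min_total` prefilter and the sample names are made up for illustration.

```python
import csv
import io


def read_counts(handle, min_total=10):
    """Parse a tab-separated count matrix: a header row of sample names,
    then one row per gene (gene ID followed by integer counts)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], [int(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} counts, got {len(values)}"
            )
        # Optional prefilter: drop genes with almost no reads overall.
        if sum(values) >= min_total:
            counts[gene] = values
    return samples, counts


# Hypothetical three-sample excerpt; the real dataset has nine samples.
example = (
    "gene\tcontrol_1\tcarrier_1\taffected_1\n"
    "MT-ND4\t1520\t1310\t980\n"
    "GAPDH\t5000\t5100\t4900\n"
    "LOWGENE\t1\t0\t2\n"
)
samples, counts = read_counts(io.StringIO(example))
print(samples)
print(sorted(counts))
```

The values are kept as raw integers on purpose, since DESeq2 expects unnormalized counts; the low-count filter is only a convenience and can be turned off with `min_total=0`.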

ADD COMMENT • link 13 months ago by ATpoint 86k

score 0 · Answer 2 · 2023-11-28

0

13 months ago

Enrique • 0

Hello, I recommend using the mitochondrial reference genome. For the GTF file (or GFF, they are in general the same), check out this post: Where are the associated annotation (GFF) files for mitochondrial genomes on NCBI?

ADD COMMENT • link 13 months ago by Enrique • 0

3

No, absolutely not. Mapping to such a tiny subset leads to false positives. Use the entire genome that includes the mt reference.
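If only the mitochondrial genes are of interest afterwards, one option is to map and count against the full genome and annotation, then pull out the mitochondrial records. A rough Python sketch of that filtering step follows. It assumes a standard tab-separated GTF in which the first column is the contig name, written as `chrM` in UCSC/GENCODE-style files and `MT` in Ensembl-style files. The helper `mito_records` and the toy annotation are invented for illustration.

```python
import io

# Mitochondrial contig names: "chrM" (UCSC/GENCODE style), "MT" (Ensembl style).
MITO_NAMES = {"chrM", "MT"}


def mito_records(gtf_handle, mito_names=MITO_NAMES):
    """Yield GTF records whose first column (seqname) is a mitochondrial contig."""
    for line in gtf_handle:
        if line.startswith("#"):
            continue  # header/comment lines
        seqname = line.split("\t", 1)[0]
        if seqname in mito_names:
            yield line.rstrip("\n")


# Made-up two-record GTF, for illustration only.
example_gtf = (
    "#description: toy annotation\n"
    'chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A";\n'
    'chrM\ttoy\tgene\t200\t800\t.\t+\t.\tgene_id "GENE_M";\n'
)
mito = list(mito_records(io.StringIO(example_gtf)))
print(len(mito))
```

This also covers the missing-annotation problem from the question: the full-genome GTF already carries the mitochondrial genes, so no separate mitochondrial GTF is needed.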

ADD REPLY • link 13 months ago by ATpoint 86k

0

Good point. If you don't use restrictive mapping parameters, it is better to use the entire genome to avoid the false positives related to the "low complexity" of the mitochondrial genome.

ADD REPLY • link 13 months ago by Enrique • 0

0

It has nothing to do with low complexity. You always map to the entire genome since the reads can come from the entire genome. If you take away the true origin of the reads during mapping, the aligner will still try to match them elsewhere, leading to false results.
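A toy Python illustration of that last point, not how a real aligner works: the "aligner" below just slides a read along each contig and keeps the ungapped placement with the fewest mismatches. The sequences, the function `best_hit` and the `max_mismatches` cutoff are all invented for the example. The read comes from a nuclear region that resembles part of the mitochondrial sequence.

```python
def best_hit(read, reference, max_mismatches=2):
    """Return (contig, position, mismatches) for the best ungapped placement
    of `read`, or None if every placement exceeds `max_mismatches`."""
    best = None
    for contig, seq in reference.items():
        for pos in range(len(seq) - len(read) + 1):
            window = seq[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if best is None or mismatches < best[2]:
                best = (contig, pos, mismatches)
    if best is not None and best[2] <= max_mismatches:
        return best
    return None


# Invented sequences: chr1 carries a diverged copy of a chrM segment.
chrM = "ACGTTGCAAGGCTTAACCGT"
chr1 = "GGGGCCCC" + "TGCTAGGCTTCA" + "TTTTAAAA"
read = "TGCTAGGCTTCA"  # truly originates from chr1

full_reference = {"chr1": chr1, "chrM": chrM}
mito_only = {"chrM": chrM}

print(best_hit(read, full_reference))
print(best_hit(read, mito_only))
```

With both contigs available, the read is placed on chr1 with no mismatches. With only chrM offered, the same read is still reported as a hit, now on chrM with two mismatches, and would be counted towards a mitochondrial gene it never came from.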

ADD REPLY • link 13 months ago by ATpoint 86k