Question

htseq, stringtie, featurecounts all not working on nanopore direct RNA

0

Entering edit mode

5 months ago

CL • 0

I am analyzing Nanopore direct RNA seq data. I tried to analyze the reads using htseq (gives me 0 count for all reads), featureCounts (0 successful alignment assignment), and stringtie2 (WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences).

Based on some searches, it should be a mismatch between the naming convention of chromosome in my bam file versus the gtf file. I used minimap2 to do the alignment.

I found that in my bam file, there is no chromosome number information. Usually, the chromosome number should be in the third column, but instead, it looks like this:

77407ca3-9603-4b25-8a43-bcab8384e080    0       AT2G34250_ID9   1135    0       2S52M1D9M3I4M1D62M1D22M5D5M1D75M5D42M1I24M1D6M1D14M2D3M3D16M5S    *       0       0       GTTGATTTCATGGGAGCCATCGGGTCGGGAACCGGAATTCTG ...

The third column is the transcript ID.

Could it be the reason why all these methods do not work? If so, how should I address it?

These are the codes I am using.

htseq-count -f bam -q $a.sorted.bam AtRTD2_19April2016_sorted.gtf > a.htseq.txt
featureCounts -F gtf -s 2 -p -T 15 -a AtRTD2_19April2016_sorted.gtf  -o a_featurecounts_m6A.txt a.sorted.bam
stringtie -L -G AtRTD2_19April2016_sorted.gtf -o a_stringtie.gtf a.sorted.bam

stringtie minimap2 htseq quantification nanopore • 579 views

ADD COMMENT • link updated 4 months ago by colindaven 7.0k • written 5 months ago by CL • 0

2

Entering edit mode

AT2G34250_ID9

Looks like you are using transcript sequence to align against? Your GTF file must match. Otherwise use the entire genome for alignment and then use the GTF file with gene models in it.

If you are aligning to transcripts you could simply count number reads aligning to each transcript by using samtools idxstats.

ADD REPLY • link 5 months ago by GenoMax 147k

0

Entering edit mode

Thanks! But I would also love to use StringTie to annotate. For example, I would want to calculate percentage of reads mapped to CDS vs 5'UTR, 3'UTR, etc. Do you have any advice on that?

ADD REPLY • link 5 months ago by CL • 0

score 1 · Answer 1 · 2024-07-02

As Genomax says, you should use a genome for alignment, and a GTF from that genome to quantify the reads.

Those tools will all work, you have just apparently aligned reads to transcripts and not a genome.

By the way, there are many, many tutorials on RNA-seq available. The ones at Galaxy Training are particularly nice.