Question

Htseq is giving me 0 counts using the GFF3 of miRBase

2

Entering edit mode

2.8 years ago

Rafael Soler ★ 1.3k

Hello!

I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0 reads.

for i in `seq 3 8`
do
htseq-count -f bam -r pos -s no -m union -t=miRNA -i=ID --additional-attr=Name L852${i}.sorted.bam mmu.gff3 > L852${i}_count.txt
done

Result:

MIMAT0000372    mmu-miR-294-3p  0
MIMAT0000373    mmu-miR-295-3p  0
MIMAT0000374    mmu-miR-296-5p  0
MIMAT0000375    mmu-miR-297a-5p 0
MIMAT0000375_1  mmu-miR-297a-5p 0
MIMAT0000375_2  mmu-miR-297a-5p 0
MIMAT0000375_3  mmu-miR-297a-5p 0
MIMAT0000376    mmu-miR-298-5p  0

However, when I extract the GTF miRNAs from mm10

cat Mus_musculus.GRCm38.100.gtf | grep 'miRNA' -> miRNA.gtf

And I do the counting:

for i in `seq 3 8`
do
htseq-count -f bam -r pos -s no -m union --additional-attr=gene_name L852${i}.sorted.bam miRNA.gtf > L852${i}_count.txt
done

Results:

ENSMUSG00000077903  Mir294  14
ENSMUSG00000077921  Mir872  5202
ENSMUSG00000077925  Mir453  0
ENSMUSG00000077928  Mir208b 0
ENSMUSG00000077931  Mir877  1046

So I don't know what could be happening, has de coordinates are the same...

Any idea? Thanks!

The mmu.gff3 is like this:

chr1    .   miRNA_primary_transcript    12425986    12426106    .   +   .   ID=MI0021869;Alias=MI0021869;Name=mmu-mir-6341
chr1    .   miRNA   12426016    12426038    .   +   .   ID=MIMAT0025084;Alias=MIMAT0025084;Name=mmu-miR-6341;Derives_from=MI0021869
chr1    .   miRNA_primary_transcript    20679010    20679082    .   +   .   ID=MI0000249;Alias=MI0000249;Name=mmu-mir-206
chr1    .   miRNA   20679017    20679039    .   +   .   ID=MIMAT0017004;Alias=MIMAT0017004;Name=mmu-miR-206-5p;Derives_from=MI0000249
chr1    .   miRNA   20679055    20679076    .   +   .   ID=MIMAT0000239;Alias=MIMAT0000239;Name=mmu-miR-206-3p;Derives_from=MI0000249

And the miRNA.gtf is like this:

1   ensembl gene    5644645 5644745 .   -   .   gene_id "ENSMUSG00000093015"; gene_version "1"; gene_name "Gm22463"; gene_source "ensembl"; gene_biotype "miRNA";
1   ensembl transcript  5644645 5644745 .   -   .   gene_id "ENSMUSG00000093015"; gene_version "1"; transcript_id "ENSMUST00000175274"; transcript_version "1"; gene_name "Gm22463"; gene_source "ensembl"; gene_biotype "miRNA"; transcript_name "Gm22463-201"; transcript_source "ensembl"; transcript_biotype "miRNA"; tag "basic"; transcript_support_level "NA";
1   ensembl exon    5644645 5644745 .   -   .   gene_id "ENSMUSG00000093015"; gene_version "1"; transcript_id "ENSMUST00000175274"; transcript_version "1"; exon_number "1"; gene_name "Gm22463"; gene_source "ensembl"; gene_biotype "miRNA"; transcript_name "Gm22463-201"; transcript_source "ensembl"; transcript_biotype "miRNA"; exon_id "ENSMUSE00000945922"; exon_version "1"; tag "basic"; transcript_support_level "NA";

Counts GFF3 miRNA HTSeq GTF • 1.0k views

ADD COMMENT • link 2.8 years ago by Rafael Soler ★ 1.3k

score 1 · Answer 1 · 2022-03-10

1

Entering edit mode

2.8 years ago

Carambakaracho ★ 3.3k

Your first column sequence identifiers don't match between the mmu.gff3 an miRNA.gtf. Those need to match the fasta headers

ADD COMMENT • link 2.8 years ago by Carambakaracho ★ 3.3k

1

Entering edit mode

Thank you so much! It was a headache! Please post it as a reply as I can mark it as correct!

ADD REPLY • link 2.8 years ago by Rafael Soler ★ 1.3k