Htseq is giving me 0 counts using the GFF3 of miRBase
1
2
Entering edit mode
2.8 years ago
Rafael Soler ★ 1.3k

Hello!

I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0 reads.

for i in `seq 3 8`
do
htseq-count -f bam -r pos -s no -m union -t=miRNA -i=ID --additional-attr=Name L852${i}.sorted.bam mmu.gff3 > L852${i}_count.txt
done

Result:

MIMAT0000372    mmu-miR-294-3p  0
MIMAT0000373    mmu-miR-295-3p  0
MIMAT0000374    mmu-miR-296-5p  0
MIMAT0000375    mmu-miR-297a-5p 0
MIMAT0000375_1  mmu-miR-297a-5p 0
MIMAT0000375_2  mmu-miR-297a-5p 0
MIMAT0000375_3  mmu-miR-297a-5p 0
MIMAT0000376    mmu-miR-298-5p  0

However, when I extract the GTF miRNAs from mm10

cat Mus_musculus.GRCm38.100.gtf | grep 'miRNA' -> miRNA.gtf

And I do the counting:

for i in `seq 3 8`
do
htseq-count -f bam -r pos -s no -m union --additional-attr=gene_name L852${i}.sorted.bam miRNA.gtf > L852${i}_count.txt
done

Results:

ENSMUSG00000077903  Mir294  14
ENSMUSG00000077921  Mir872  5202
ENSMUSG00000077925  Mir453  0
ENSMUSG00000077928  Mir208b 0
ENSMUSG00000077931  Mir877  1046

So I don't know what could be happening, has de coordinates are the same...

Any idea? Thanks!


The mmu.gff3 is like this:

chr1    .   miRNA_primary_transcript    12425986    12426106    .   +   .   ID=MI0021869;Alias=MI0021869;Name=mmu-mir-6341
chr1    .   miRNA   12426016    12426038    .   +   .   ID=MIMAT0025084;Alias=MIMAT0025084;Name=mmu-miR-6341;Derives_from=MI0021869
chr1    .   miRNA_primary_transcript    20679010    20679082    .   +   .   ID=MI0000249;Alias=MI0000249;Name=mmu-mir-206
chr1    .   miRNA   20679017    20679039    .   +   .   ID=MIMAT0017004;Alias=MIMAT0017004;Name=mmu-miR-206-5p;Derives_from=MI0000249
chr1    .   miRNA   20679055    20679076    .   +   .   ID=MIMAT0000239;Alias=MIMAT0000239;Name=mmu-miR-206-3p;Derives_from=MI0000249

And the miRNA.gtf is like this:

1   ensembl gene    5644645 5644745 .   -   .   gene_id "ENSMUSG00000093015"; gene_version "1"; gene_name "Gm22463"; gene_source "ensembl"; gene_biotype "miRNA";
1   ensembl transcript  5644645 5644745 .   -   .   gene_id "ENSMUSG00000093015"; gene_version "1"; transcript_id "ENSMUST00000175274"; transcript_version "1"; gene_name "Gm22463"; gene_source "ensembl"; gene_biotype "miRNA"; transcript_name "Gm22463-201"; transcript_source "ensembl"; transcript_biotype "miRNA"; tag "basic"; transcript_support_level "NA";
1   ensembl exon    5644645 5644745 .   -   .   gene_id "ENSMUSG00000093015"; gene_version "1"; transcript_id "ENSMUST00000175274"; transcript_version "1"; exon_number "1"; gene_name "Gm22463"; gene_source "ensembl"; gene_biotype "miRNA"; transcript_name "Gm22463-201"; transcript_source "ensembl"; transcript_biotype "miRNA"; exon_id "ENSMUSE00000945922"; exon_version "1"; tag "basic"; transcript_support_level "NA";
Counts GFF3 miRNA HTSeq GTF • 1.0k views
ADD COMMENT
1
Entering edit mode
2.8 years ago
Carambakaracho ★ 3.3k

Your first column sequence identifiers don't match between the mmu.gff3 an miRNA.gtf. Those need to match the fasta headers

ADD COMMENT
1
Entering edit mode

Thank you so much! It was a headache! Please post it as a reply as I can mark it as correct!

ADD REPLY

Login before adding your answer.

Traffic: 1662 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6