htseq-count not working with miRbase gff?
1
2
Entering edit mode
9.4 years ago
manekineko ▴ 150

I have tried to count mapped reads on the human miRNA regions from the miRBAse human GFF

the output was just:

__no_feature    70216353
__ambiguous     0
__too_low_aQual 0
__not_aligned   1383340
__alignment_not_unique  0

So there should be something I do wrong?

I have run it - htseq-count SAMfile Gfffile (SAMfile is from bowtie with option -S)

The GFF is from MirBase:

chr1    .    miRNA_primary_transcript    17369    17436    .    -    .    ID=MI0022705;Alias=MI0022705;Name=hsa-mir-6859-1
chr1    .    miRNA    17409    17431    .    -    .    ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705
chr1    .    miRNA    17369    17391    .    -    .    ID=MIMAT0027619;Alias=MIMAT0027619;Name=hsa-miR-6859-3p;Derives_from=MI0022705
chr1    .    miRNA_primary_transcript    30366    30503    .    +    .    ID=MI0006363;Alias=MI0006363;Name=hsa-mir-1302-2
chr1    .    miRNA    30438    30458    .    +    .    ID=MIMAT0005890;Alias=MIMAT0005890;Name=hsa-miR-1302;Derives_from=MI0006363
chr1    .    miRNA_primary_transcript    187891    187958    .    -    .    ID=MI0026420;Alias=MI0026420;Name=hsa-mir-6859-2
chr1    .    miRNA    187931    187953    .    -    .    ID=MIMAT0027618_1;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0026420
chr1    .    miRNA    187891    187913    .    -    .    ID=MIMAT0027619_1;Alias=MIMAT0027619;Name=hsa-miR-6859-3p;Derives_from=MI0026420
chr1    .    miRNA_primary_transcript    632325    632413    .    -    .    ID=MI0022558;Alias=MI0022558;Name=hsa-mir-6723
chr1    .    miRNA    632382    632403    .    -    .    ID=MIMAT0025855;Alias=MIMAT0025855;Name=hsa-miR-6723-5p;Derives_from=MI0022558
htseq-count mirbase gff • 4.6k views
ADD COMMENT
0
Entering edit mode

Oh I forgot to use -t, is it also necessary to use -i ID?

And should the SAM file be sorted?

I will read the manual.

ADD REPLY
0
Entering edit mode

Sorry, I just edited my answer to say this as you submitted this.

Yes you should use -i ID.

Your Sam file is likely already sorted (many aligners do this for you) or HTSeq would have thrown an error.

ADD REPLY
0
Entering edit mode

Thanks!

since my SAM is quite big (15G) I wait quite a lot, befor I see results. Is there a way to see if it is counting properly with this flags in the middle of the run?

ADD REPLY
0
Entering edit mode

You can use samtools to subsection a random percentage of your input file, for example 5% would use

samtools view -s 0.05 infile.sam > sampledfile.sam

Sample Sam File

ADD REPLY
0
Entering edit mode

Is there a way to count reads but with option if they are longer than XXnt?

ADD REPLY
0
Entering edit mode

Not as far as I am aware.

You could control this during the preprocessing or alignment stages.

It is also possible to extract only reads of a specified length from a sam file Bam : Extract Only Alignment With A Specific Length

Also, please mark my answer if accepted if it has solved your original issue with read counting.

ADD REPLY
0
Entering edit mode

I found the solution - BBmap have tool to reformat SAM

reformat.sh in=file.sam out=file.sam minlength=50 maxlength=200
ADD REPLY
0
Entering edit mode

Hi manekineko,

What program exactly did you download to reformat your sam file? Also, what do the options minlength and maxlength do?

Thanks!

ADD REPLY
3
Entering edit mode
9.4 years ago
CraigM ▴ 90

Did you use the -t flag to change the feature type to miRNA_primary_transcript or miRNA? As far as HTSeq would be concerned (without using this flag) you have given it no features to look for.

The default looks for exon in the third column

You should also use the -I flag to use 'ID' as the id attribute (default looks in the last column for 'gene_id')

Ensure you read the manuals before using these tools.

http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

ADD COMMENT

Login before adding your answer.

Traffic: 1615 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6