Hi, I have a problem running featureCounts to generate a count matrix for miRNA.
This is my featureCounts comand: featureCounts -F GFF -R -t "miRNA" -g ID -o output.counts -a miRNA-annotation.gff ccc_sorted.bam
With this comand I obtain:
========= Running =======
Load annotation file miRNA-annotation.gff ...
Features : 16770
Meta-features : 16726
Chromosomes/contigs : 43
Process BAM file ccc_sorted.bam...
Single-end reads are included.
Assign reads to features...
Total reads : 668600
Successfully assigned reads : 38347 (5.7%)
Running time : 0.02 minutes
==============================
This rate is to low! So I tried to put "=" in the comand: featureCounts -F GFF -R -t "miRNA" -g ID= -o output.counts -a miRNA-annotation.gff ccc_sorted.bam And with this comand I obtain:
=========== Running =============
Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'ID=' The attributes included in your GTF annotation are 'ID=miR1120-3p-898.path1'
Load annotation file miRNA-annotation.gff ...
Features : 16770
Meta-features : 16726
Chromosomes/contigs : 43
Process BAM file ccc_sorted.bam...
Single-end reads are included.
Assign reads to features...
Total reads : 668600
Successfully assigned reads : 668600 (100.0%)
Running time : 0.02 minutes
=====================
I don't know what is the problem. I try to convert my gff file in gtf but I lost the information of miRNA.
Any help would be appreciated
Can you post a few lines from your GFF annotation file?
GFF annotation file:
Please don't post additional information related to the question as an answer, please provide such info as comment or response to previous comments.
Have you tried using
-g ID
(orName
without the=
sign)? You should also examine the alignments (using the annotation file) to see if the reads are aligning outside of the features you are interested in.Yes, I tried -g ID and Name, and for both, I tried with = and without =. The results for ID and Name, are the same. Both without =, resulted in 5,7 % and both with = resulted in 100%. The problem with the = is the warning.
Also try using the
-M
option to count multi-mapping reads. Have you checked the alignments using a genome viewer?-M option isn't the solution, the results are the same. Nothing change.
Yes, everything is correct. This bam is a subset of my data, which contains only the reads that mapped in the regions annotated as miRNAs. So the 100% is normal and expected.
Maybe the problem is the gff.
Are there counts in the output file you get or are all values 0?
In the output I get one line (ignoring the header), with all chromosomes and the count 668600.
The output summary, I get this: Status ccc_sorted.bam
Assigned 668600
Unassigned_Ambiguity 0
Unassigned_MultiMapping 0
Unassigned_NoFeatures 0
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragmentLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0