Question

Warning when using gff format in featureCounts - miRNAs

7.7 years ago

MeiNB ▴ 10

Hi, I have a problem running featureCounts to generate a count matrix for miRNA.

This is my featureCounts command:

featureCounts -F GFF -R -t "miRNA" -g ID -o output.counts -a miRNA-annotation.gff ccc_sorted.bam

With this command I obtain:

========= Running =======

Load annotation file miRNA-annotation.gff ...
Features : 16770
Meta-features : 16726
Chromosomes/contigs : 43

Process BAM file ccc_sorted.bam...
Single-end reads are included.
Assign reads to features...
Total reads : 668600
Successfully assigned reads : 38347 (5.7%)
Running time : 0.02 minutes

==============================

This rate is too low! So I tried adding "=" to the -g value:

featureCounts -F GFF -R -t "miRNA" -g ID= -o output.counts -a miRNA-annotation.gff ccc_sorted.bam

With this command I obtain:

=========== Running =============

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'ID=' The attributes included in your GTF annotation are 'ID=miR1120-3p-898.path1'

Load annotation file miRNA-annotation.gff ...
Features : 16770
Meta-features : 16726
Chromosomes/contigs : 43

Process BAM file ccc_sorted.bam...
Single-end reads are included.
Assign reads to features...
Total reads : 668600
Successfully assigned reads : 668600 (100.0%)
Running time : 0.02 minutes

=====================
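
The warning above is consistent with `-g` being matched against the attribute name only, that is, the text before `=` in each `key=value` pair of column 9. The sketch below imitates that lookup on the attribute string quoted in the warning. `has_attr_key` is a hypothetical helper written for illustration, not part of featureCounts, and it is an assumption that featureCounts compares names in exactly this way.

```shell
# Attribute string exactly as quoted in the featureCounts warning.
attrs='ID=miR1120-3p-898.path1'

# has_attr_key KEY ATTRS (hypothetical helper): exit 0 if KEY is the name
# of one of the ';'-separated key=value pairs in ATTRS, otherwise exit 1.
has_attr_key() {
  printf '%s\n' "$2" | tr ';' '\n' |
    awk -F'=' -v key="$1" '$1 == key { found = 1 } END { exit !found }'
}

if has_attr_key 'ID' "$attrs"; then echo "ID: attribute found"; fi
if ! has_attr_key 'ID=' "$attrs"; then echo "ID=: no such attribute"; fi
```

Because `ID=` is not an attribute name, the 100% from that run likely reflects the failed lookup rather than a real improvement; later in the thread the counts file from that run is reported to contain a single row.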

I don't know what the problem is. I tried converting my GFF file to GTF, but the miRNA information was lost.

Any help would be appreciated.

miRNA featureCounts raw counts • 4.7k views

ADD COMMENT • link 7.7 years ago by MeiNB ▴ 10

Can you post a few lines from your GFF annotation file?

ADD REPLY • link 7.7 years ago by GenoMax 152k

GFF annotation file:

chr1A     data     miRNA   755777  755886  100     +       .       ID=miR1120-3p-898.path1;Name=miR1120-3p-898;Target=miR1120-3p-898 1 110;Gap=M110

chr1A     data     miRNA   755784  755886  100     +       .       ID=miR1120-3p-1946.path4;Name=miR1120-3p-1946;Target=miR1120-3p-1946 8 110;Gap=M103

chr1A     data    miRNA   1239102 1239256 100     -       .       ID=miR1125-5p-312.path1;Name=miR1125-5p-312;Target=miR1125-5p-312 1 155;Gap=M155

chr1A     data    miRNA   1736279 1736371 100     -       .       ID=miR1131-5p-73.path1;Name=miR1131-5p-73;Target=miR1131-5p-73 1 93;Gap=M93

ADD REPLY • link updated 7.7 years ago by GenoMax 152k • written 7.7 years ago by MeiNB ▴ 10
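
The first two records posted above lie on the same contig and their coordinates overlap (755777-755886 and 755784-755886). By default featureCounts leaves a read unassigned when it overlaps more than one meta-feature (it is tallied as Unassigned_Ambiguity) unless `-O` is given, so overlapping annotations like these are one possible reason for a low assignment rate; the thread does not confirm that this is the cause here. A minimal sketch for listing such pairs follows, assuming a tab-separated file sorted by contig and start. `gff_sample` and `find_overlaps` are hypothetical names, and the sample simply re-types the four records from this thread.

```shell
# The four annotation records posted in this thread, re-typed with tabs.
gff_sample() {
  printf 'chr1A\tdata\tmiRNA\t755777\t755886\t100\t+\t.\tID=miR1120-3p-898.path1;Name=miR1120-3p-898;Target=miR1120-3p-898 1 110;Gap=M110\n'
  printf 'chr1A\tdata\tmiRNA\t755784\t755886\t100\t+\t.\tID=miR1120-3p-1946.path4;Name=miR1120-3p-1946;Target=miR1120-3p-1946 8 110;Gap=M103\n'
  printf 'chr1A\tdata\tmiRNA\t1239102\t1239256\t100\t-\t.\tID=miR1125-5p-312.path1;Name=miR1125-5p-312;Target=miR1125-5p-312 1 155;Gap=M155\n'
  printf 'chr1A\tdata\tmiRNA\t1736279\t1736371\t100\t-\t.\tID=miR1131-5p-73.path1;Name=miR1131-5p-73;Target=miR1131-5p-73 1 93;Gap=M93\n'
}

# find_overlaps (hypothetical helper): print the IDs of consecutive records
# on the same contig whose coordinates overlap. Strand is ignored, which
# matches unstranded counting (-s 0, the featureCounts default).
# Only adjacent records are compared, so the input must be sorted.
find_overlaps() {
  awk -F'\t' '
    /^#/ { next }                      # skip GFF comment lines
    {
      split($9, kv, ";")               # first attribute is expected to be ID=...
      id = kv[1]; sub(/^ID=/, "", id)
      if ($1 == prev_chr && $4 + 0 <= prev_end + 0)
        print prev_id "\t" id
      prev_chr = $1; prev_end = $5; prev_id = id
    }'
}

gff_sample | find_overlaps
```

On these four records the sketch reports one pair, miR1120-3p-898.path1 and miR1120-3p-1946.path4. Running it on the full annotation would show how common such overlaps are.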

Please don't post additional information related to the question as an answer; add it as a comment or as a reply to previous comments.

ADD REPLY • link 7.7 years ago by Sej Modha 5.3k

Have you tried using -g ID (or Name without the = sign)? You should also examine the alignments (using the annotation file) to see if the reads are aligning outside of the features you are interested in.

ADD REPLY • link 7.7 years ago by GenoMax 152k

Yes, I tried -g with both ID and Name, each with and without the = sign. ID and Name give the same results: without =, 5.7% of reads are assigned; with =, 100% are assigned. The problem with = is the warning.

ADD REPLY • link 7.7 years ago by MeiNB ▴ 10

Also try using the -M option to count multi-mapping reads. Have you checked the alignments using a genome viewer?

ADD REPLY • link 7.7 years ago by GenoMax 152k

The -M option isn't the solution; the results are the same and nothing changes.

Yes, everything is correct. This BAM is a subset of my data that contains only the reads mapped to regions annotated as miRNAs. So the 100% is normal and expected.

Maybe the problem is the GFF.

ADD REPLY • link 7.7 years ago by MeiNB ▴ 10

Are there counts in the output file you get or are all values 0?

ADD REPLY • link 7.7 years ago by GenoMax 152k

In the output I get a single line (ignoring the header) that lists all chromosomes, with a count of 668600.

In the output summary, I get this:

Status                      ccc_sorted.bam
Assigned                    668600
Unassigned_Ambiguity        0
Unassigned_MultiMapping     0
Unassigned_NoFeatures       0
Unassigned_Unmapped         0
Unassigned_MappingQuality   0
Unassigned_FragmentLength   0
Unassigned_Chimera          0
Unassigned_Secondary        0
Unassigned_Nonjunction      0
Unassigned_Duplicate        0

ADD REPLY • link 7.7 years ago by MeiNB ▴ 10
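
To read a summary like the one above at a glance, the assigned fraction can be computed from it directly. A minimal sketch follows, with the values re-typed from this post. `summary_sample` and `assigned_pct` are hypothetical names; with real data the input would be the `.summary` file that featureCounts writes next to the counts table. The sketch assumes a single sample column and that the rows add up to the total number of reads.

```shell
# Summary values re-typed from the post above (one sample column).
summary_sample() {
  cat <<'EOF'
Status ccc_sorted.bam
Assigned 668600
Unassigned_Ambiguity 0
Unassigned_MultiMapping 0
Unassigned_NoFeatures 0
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragmentLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
EOF
}

# assigned_pct (hypothetical helper): percentage of reads in the
# "Assigned" row relative to the sum of all rows below the header.
assigned_pct() {
  awk 'NR > 1 { total += $2 }
       $1 == "Assigned" { assigned = $2 }
       END { if (total > 0) printf "%.1f\n", 100 * assigned / total }'
}

summary_sample | assigned_pct   # prints 100.0
```

For the first run, the same arithmetic on the figures in the log gives 38347 / 668600, about 5.7%. The per-category breakdown for that run was not posted, and it is the part that would show where the remaining reads went (for example Unassigned_Ambiguity or Unassigned_NoFeatures).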