I run featureCounts like this:
featureCounts -p -F GFF -a ./NCBI/GCF_000612305.1_Egrandis1_0_genomic.gff -t exon -g ID -o ./featureCounts/A1.counts.txt -f --extraAttributes gene ./raw_data/A1.sam.bam
I also extracted the exon sequences from the whole-genome FASTA with AGAT like this:
agat_sp_extract_sequences.pl -f ./NCBI/GCF_000612305.1_Egrandis1_0_genomic.fna -gff ./NCBI/GCF_000612305.1_Egrandis1_0_genomic.gff -t exon --split -o ./NCBI/GCF_000612305.1_Egrandis1_0_genomic.exons.fna
AGAT returns a few exon sequences that featureCounts ignored.
Relevant files: GCF_000612305.1_Egrandis1_0_genomic.fna.gz GCF_000612305.1_Egrandis1_0_genomic.gff.gz
Here is a list of the features that featureCounts did not consider:
NC_014570.1 RefSeq gene 539 1600 . - . ID=gene-EucgrC_p001;Dbxref=GeneID:9829619;Name=psbA;gbkey=Gene;gene=psbA;gene_biotype=protein_coding;locus_tag=EucgrC_p001
NC_014570.1 RefSeq CDS 539 1600 . - 0 ID=cds-YP_003933945.1;Parent=gene-EucgrC_p001;Dbxref=Genbank:YP_003933945.1,GeneID:9829619;Name=YP_003933945.1;gbkey=CDS;gene=psbA;locus_tag=EucgrC_p001;product=photosystem II protein D1;protein_id=YP_003933945.1;transl_table=11
NC_014570.1 RefSeq gene 151110 157952 . - . ID=gene-EucgrC_p083;Dbxref=GeneID:9829735;Name=ycf2;gbkey=Gene;gene=ycf2;gene_biotype=protein_coding;locus_tag=EucgrC_p083
NC_014570.1 RefSeq CDS 151110 157952 . - 0 ID=cds-YP_003934017.1;Parent=gene-EucgrC_p083;Dbxref=Genbank:YP_003934017.1,GeneID:9829735;Name=YP_003934017.1;gbkey=CDS;gene=ycf2;locus_tag=EucgrC_p083;product=hypothetical chloroplast RF21;protein_id=YP_003934017.1;transl_table=11
However, these are considered by featureCounts:
NC_014570.1 RefSeq gene 12775 14090 . - . ID=gene-EucgrC_p006;Dbxref=GeneID:9829628;Name=atpF;gbkey=Gene;gene=atpF;gene_biotype=protein_coding;locus_tag=EucgrC_p006
NC_014570.1 RefSeq CDS 13946 14090 . - 0 ID=cds-YP_003933950.1;Parent=gene-EucgrC_p006;Dbxref=Genbank:YP_003933950.1,GeneID:9829628;Name=YP_003933950.1;gbkey=CDS;gene=atpF;locus_tag=EucgrC_p006;product=ATP synthase CF0 subunit I;protein_id=YP_003933950.1;transl_table=11
NC_014570.1 RefSeq CDS 12775 13184 . - 2 ID=cds-YP_003933950.1;Parent=gene-EucgrC_p006;Dbxref=Genbank:YP_003933950.1,GeneID:9829628;Name=YP_003933950.1;gbkey=CDS;gene=atpF;locus_tag=EucgrC_p006;product=ATP synthase CF0 subunit I;protein_id=YP_003933950.1;transl_table=11
NC_014570.1 RefSeq exon 12778 13245 . - . ID=id-EucgrC_p006-2;Parent=gene-EucgrC_p006;Dbxref=GeneID:9829628;exon_number=2;gbkey=exon;gene=atpF;locus_tag=EucgrC_p006;number=2
NC_014570.1 RefSeq exon 13932 14090 . - . ID=id-EucgrC_p006-1;Parent=gene-EucgrC_p006;Dbxref=GeneID:9829628;exon_number=1;gbkey=exon;gene=atpF;locus_tag=EucgrC_p006;number=1
NC_014570.1 RefSeq gene 74329 74442 . + . ID=gene-EucgrC_p044;Dbxref=GeneID:9829682;Name=rps12;exception=trans-splicing;gbkey=Gene;gene=rps12;gene_biotype=protein_coding;locus_tag=EucgrC_p044;part=1/2
NC_014570.1 RefSeq gene 145633 146435 . + . ID=gene-EucgrC_p044;Dbxref=GeneID:9829682;Name=rps12;exception=trans-splicing;gbkey=Gene;gene=rps12;gene_biotype=protein_coding;locus_tag=EucgrC_p044;part=2/2
NC_014570.1 RefSeq CDS 74329 74442 . + 0 ID=cds-YP_003933983.1;Parent=gene-EucgrC_p044;Dbxref=Genbank:YP_003933983.1,GeneID:9829682;Name=YP_003933983.1;exception=trans-splicing;gbkey=CDS;gene=rps12;locus_tag=EucgrC_p044;product=ribosomal protein S12;protein_id=YP_003933983.1;transl_table=11
NC_014570.1 RefSeq CDS 145633 145863 . + 0 ID=cds-YP_003933983.1;Parent=gene-EucgrC_p044;Dbxref=Genbank:YP_003933983.1,GeneID:9829682;Name=YP_003933983.1;exception=trans-splicing;gbkey=CDS;gene=rps12;locus_tag=EucgrC_p044;product=ribosomal protein S12;protein_id=YP_003933983.1;transl_table=11
NC_014570.1 RefSeq CDS 146409 146435 . + 0 ID=cds-YP_003933983.1;Parent=gene-EucgrC_p044;Dbxref=Genbank:YP_003933983.1,GeneID:9829682;Name=YP_003933983.1;exception=trans-splicing;gbkey=CDS;gene=rps12;locus_tag=EucgrC_p044;product=ribosomal protein S12;protein_id=YP_003933983.1;transl_table=11
NC_014570.1 RefSeq exon 145633 145863 . + . ID=id-EucgrC_p044-2;Parent=gene-EucgrC_p044;Dbxref=GeneID:9829682;exon_number=2;gbkey=exon;gene=rps12;locus_tag=EucgrC_p044;number=2
I noticed that the features featureCounts does not consider have no exon entry in the third column. Is this the reason? Is there a way to bypass this, for example by editing the GFF file?
Thank you!
It is certainly worth a try. The -t exon option specifies the feature type, in column 3 of the GTF/GFF3 file, that featureCounts uses for counting. I suggest a small experiment: change the CDS term in the third column to exon for a couple of genes and see whether it works. Please do post your findings here for the benefit of the community.
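If hand-editing becomes tedious, the same experiment could be scripted. The following is only a minimal sketch, not a tested fix for this annotation: it assumes a tab-separated GFF3 with a single Parent per feature, and it appends an exon row after every CDS row whose parent has no exon child at all, rather than overwriting the CDS row. The function names and the exon-<parent>-<n> ID scheme are made up for illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 column-9 string ('key=value;key=value') into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def add_exons_for_cds_only_genes(lines):
    """Return the input rows plus one synthetic exon row for each CDS row
    whose Parent has no exon child anywhere in the file."""
    rows = [line.rstrip("\n") for line in lines]

    # First pass: collect every parent that already owns at least one exon.
    parents_with_exon = set()
    for row in rows:
        cols = row.split("\t")
        if not row.startswith("#") and len(cols) == 9 and cols[2] == "exon":
            parents_with_exon.add(parse_attributes(cols[8]).get("Parent"))

    # Second pass: copy every row; after a qualifying CDS row, add an exon row.
    out = []
    counter = defaultdict(int)
    for row in rows:
        out.append(row)
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        parent = attrs.get("Parent")
        if parent is None or parent in parents_with_exon:
            continue
        counter[parent] += 1
        # Hypothetical ID scheme, kept unique so that -g ID -f stays per-exon.
        new_attrs = [f"ID=exon-{parent}-{counter[parent]}", f"Parent={parent}"]
        if "gene" in attrs:
            new_attrs.append(f"gene={attrs['gene']}")
        # Same coordinates and strand as the CDS; phase is '.' for exons.
        exon_cols = cols[:2] + ["exon"] + cols[3:7] + ["."] + [";".join(new_attrs)]
        out.append("\t".join(exon_cols))
    return out


# Usage (file names are placeholders):
# with open("annotation.gff") as src, open("annotation.with_exons.gff", "w") as dst:
#     dst.write("\n".join(add_exons_for_cds_only_genes(src)) + "\n")
```

With this approach psbA and ycf2 would each gain one exon row spanning their CDS. A gene that already has at least one exon row, such as rps12 above, is left untouched even if some of its CDS segments have no matching exon, so such cases would still need checking by hand.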