Why does featurecounts give me an output file with only 0s?
0
0
Entering edit mode
3.2 years ago
Shraddha ▴ 90

Hello, I'm trying to run featurecounts on my .bam files, but the resulting file yields only 0s in every row and column. Here are the steps I have taken so far:

  1. (de novo) Assembled 40 transcripts from RNASeq data
  2. Used bowtie2 to align my transcripts against a reference + generate .bam files from the fastq files. Note: The reference transcriptome does not come from the same organism, but a closely related one.
  3. Ran featurecounts with the command:

    featureCounts -a ~/NAS/new_canna_out/cannabis_reference/cannabis_ref.gff3 -g gene_id -T 10 -o ~/NAS/new_canna_out/sp_fc *.bam

This gives me the following error:

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
An example of attributes included in your GTF annotation is 'Parent=transcript:evm.model.01.1;Name=evm.model.01.1-E1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=evm.model.01.1-E1;rank=1'

To counter this, I have tried giving the following command instead:

 featureCounts -a ~/NAS/new_canna_out/cannabis_reference/cannabis_ref.gff3 -g gene_id -t exon_id -T 10 -o ~/NAS/new_canna_out/sp_fc *.bam

as well as:

featureCounts -a ~/NAS/new_canna_out/cannabis_reference/cannabis_ref.gff3 -g Name -T 10 -o ~/NAS/new_canna_out/sp_fc *.bam

The first option (using -t exon_id) gives the error that no features were loaded in the format GTF. The second (using -g Name) does produce a result, but with only 0s

as seen here

I have checked that the reference and annotation contain the same gene IDs. I expected the -t flag to accept exon IDs (as seen in the featurecounts guide) but that doesn't work either.

What could the problem be? Thanks!

featurecounts • 1.9k views
ADD COMMENT
1
Entering edit mode

Show us a few lines from cannabis_ref.gff3. As error says your file likely contains Parent=transcript:evm.model.01.1;Name=evm.model.01.1-E1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=evm.model.01.1-E1;rank=1' which deos not include the field gene_id.

In fact only Name (or Parent depending what that is pointing to) may be useful.

ADD REPLY
0
Entering edit mode

Thanks for your response! Actually, yes, some lines contain a gene_id and some contain an exon_id, as seen in this example:

1       CS10    exon    13599   13709   .       +       .       Parent=transcript:evm.model.01.1;Name=evm.model.01.1-E1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=evm.model.01.1-E1;rank=1
1       CS10    gene    20925   22139   .       +       .       ID=gene:novel_gene_1_5bd9a17a;biotype=protein_coding;gene_id=novel_gene_1_5bd9a17a;logic_name=gff3_genes
1       CS10    gene    27010   31685   .       +       .       ID=gene:evm.TU.01.2;biotype=protein_coding;description=AT-rich interactive domain-containing protein 3 [Source:Projected from Arabidopsis thaliana (AT2G17410) UniProtKB/Swiss-Prot%3BAcc:Q940Y3];gene_id=evm.TU.01.2;logic_name=gff3_genes

I have tried running featureCounts with -g Name, but that is what yielded a result file with all 0s.

ADD REPLY
1
Entering edit mode

You can try using -g ID. That should generate counts. Since you have exon entries in your GTF hopefully they are tied to parent gene entries otherwise you may get counts that span entire gene region which will include introns.

ADD REPLY
0
Entering edit mode

No luck, -g ID just yielded the same error, stating that featurecounts failed to find the gene identifier attirbute in the 9th column of the provided GTF file.

ADD REPLY
1
Entering edit mode

You can try following:

  1. Take a back up of gtf
  2. Can you try features with following options: -g ID and -t ID? or
  3. -g Parent and -t exon

Since you posted 3 lines of gtf, it is difficult to check other features.

ADD REPLY
0
Entering edit mode

Thanks for your response! I tried both options 2 and 3. Running it with -g, -t ID gives me the 'no features were loaded' error. -g Parent, -t exon gives me the same output file with no assigned alignments.

Why the backup of the gtf file, btw?

ADD REPLY
0
Entering edit mode

I forgot to type the third option. It seems gene_id field is in your gtf, but not present at required place to parse. You can try reshuffling gene_id information to place at the correct position. For that reason, I suggested to take a back up of gtf.

In addition, check if gtf has same chromosome/contig information as reference (for eg chr1 vs 1) and strand information (you can get this from either core or inferexperiments.py from RSeqC). Please load the bam into IGV or any visualizer and check the coverage at place/gene of interest.

Btw, where did you get the gtf from?

ADD REPLY
0
Entering edit mode

I see! Yes, I'll try to make a copy of the gff with the ID replaced with gene_id. I've checked the annotation with the reference transcriptome, and they have matching gene IDs.

And I got the gff from ENSEMBL, here: http://ftp.ensemblgenomes.org/pub/plants/release-51/

ADD REPLY
0
Entering edit mode

I was talking chromosome notation in reference genome (genome.fa) and gtf (first column). I hope you checked if they are same. Please do not change ID just like that. There are other features starting with ID in gtf.

ADD REPLY

Login before adding your answer.

Traffic: 3341 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6