featureCounts/HTSeq-count problem with different annotation files
1
0
Entering edit mode
9.1 years ago

I'm analysing RNA-Seq data ( I should add that my bioinformatic knowledge is quite limited) and after aligning using TopHat I want to count reads in order to perform differential analysis of expression.

To achieve this I used both HTSeq-count and features counts and I have different GFF/GTF files that I use as annotation for the features.

I'm going to show you the example of the annotation file for the same gene

One of the files look like this (lets called it gene.gtf):

scaffold_5    XENBASE_XT_7_1_GENE_MODEL_2    gene    72053579    72062232    .    +    .    Name=t;ID=4652532;GeneName=T,%20brachyury%20homolog;GeneFunction=T-box%20transcription%20factor;GeneSynonyms=ntl,brachyury,Xbrachyury,bra,X-bra,Xbra

And the other like this (exon.gtf):

scaffold_5    xenTro7_1_genome_v7_2    start_codon    72055079    72055081    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    CDS    72055079    72055284    0.000000    +    0    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    exon    72053579    72055284    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    CDS    72057175    72057439    0.000000    +    1    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    exon    72057175    72057439    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    CDS    72058300    72058434    0.000000    +    0    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    exon    72058300    72058434    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";

Basically the first file only has the whole gene feature while the other is divided by exons.

When I run HTSeq-count or featureCounts with the same bam file using either one or the other annotation file I get completely different results. With the gene.gtf I get around 5000 reads for this gene while I get 0 reads for any feature using the exon.gtf file.

Am I missing something really obvious?

RNA-Seq HTSeq-count featureCounts • 3.6k views
ADD COMMENT
0
Entering edit mode

Can anyone help me ? I want to count my miRNA by using featureCounts. I have a bam file after mapped with the hg38 by bowtie2. I tried a lot but I couldn't. Thanks in advance

ADD REPLY
0
Entering edit mode

I tried a lot but I couldn't

You need to specify what exactly you tried to get meaningful help. Also don't post new questions as ANSWERS on existing threads.

ADD REPLY
0
Entering edit mode
9.1 years ago
Kamil ★ 2.3k

Could I ask you to post the commands that you are executing? It would be helpful to see all of the options you use with each command. My guess is that you may be missing some command line option.

On the other hand, you might consider the possibility that you have reads outside of exons. The exon intervals span 2715 nucleotides while the one gene interval spans 8653 nucleotides.

ADD COMMENT
0
Entering edit mode

This is the command I used with featureCounts

featureCounts -p -a ../gene.gtf -t gene -g Name -o counts_name.txt accepted_hits.bam
featureCounts -p -a ../exon.gtf -t exon -g gene_id -o counts_exon.txt accepted_hits.bam

and with HTSeq-count

htseq-count -s no -m intersection-nonempty -t gene -i Name -f bam accepted_hits_nsorted_paired_end.bam ../gene.gtf  > filename.counts
htseq-count -s no -m intersection-nonempty -t exon -i gene_id -f bam accepted_hits_nsorted_paired_end.bam ../gene.gtf  > filename.counts

I did not include the all the exon present in the second file, otherwise it would be too long. But the first coordinate of the first exon correspond to the first coordinate of the gene and the last coordinate of the last exon corresponds to the last coordinate of the gene (when comparing the two files).

ADD REPLY

Login before adding your answer.

Traffic: 2431 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6