Question

featureCounts/HTSeq-count problem with different annotation files

0

Entering edit mode

9.8 years ago

monteiro.s.rita • 0

I'm analysing RNA-Seq data ( I should add that my bioinformatic knowledge is quite limited) and after aligning using TopHat I want to count reads in order to perform differential analysis of expression.

To achieve this I used both HTSeq-count and features counts and I have different GFF/GTF files that I use as annotation for the features.

I'm going to show you the example of the annotation file for the same gene

One of the files look like this (lets called it gene.gtf):

scaffold_5    XENBASE_XT_7_1_GENE_MODEL_2    gene    72053579    72062232    .    +    .    Name=t;ID=4652532;GeneName=T,%20brachyury%20homolog;GeneFunction=T-box%20transcription%20factor;GeneSynonyms=ntl,brachyury,Xbrachyury,bra,X-bra,Xbra

And the other like this (exon.gtf):

scaffold_5    xenTro7_1_genome_v7_2    start_codon    72055079    72055081    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    CDS    72055079    72055284    0.000000    +    0    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    exon    72053579    72055284    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    CDS    72057175    72057439    0.000000    +    1    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    exon    72057175    72057439    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    CDS    72058300    72058434    0.000000    +    0    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";
scaffold_5    xenTro7_1_genome_v7_2    exon    72058300    72058434    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t";

Basically the first file only has the whole gene feature while the other is divided by exons.

When I run HTSeq-count or featureCounts with the same bam file using either one or the other annotation file I get completely different results. With the gene.gtf I get around 5000 reads for this gene while I get 0 reads for any feature using the exon.gtf file.

Am I missing something really obvious?

RNA-Seq HTSeq-count featureCounts • 3.9k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.8 years ago by monteiro.s.rita • 0

0

Entering edit mode

Can anyone help me ? I want to count my miRNA by using featureCounts. I have a bam file after mapped with the hg38 by bowtie2. I tried a lot but I couldn't. Thanks in advance

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 8.2 years ago by k.kathirvel93 ▴ 310

0

Entering edit mode

I tried a lot but I couldn't

You need to specify what exactly you tried to get meaningful help. Also don't post new questions as ANSWERS on existing threads.

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 8.2 years ago by GenoMax 153k

Ram · Answer 1 · 2015-10-23

0

Entering edit mode

9.8 years ago

Kamil ★ 2.3k

Could I ask you to post the commands that you are executing? It would be helpful to see all of the options you use with each command. My guess is that you may be missing some command line option.

On the other hand, you might consider the possibility that you have reads outside of exons. The exon intervals span 2715 nucleotides while the one gene interval spans 8653 nucleotides.

ADD COMMENT • link updated 5.8 years ago by Ram 45k • written 9.8 years ago by Kamil ★ 2.3k

0

Entering edit mode

This is the command I used with featureCounts

featureCounts -p -a ../gene.gtf -t gene -g Name -o counts_name.txt accepted_hits.bam
featureCounts -p -a ../exon.gtf -t exon -g gene_id -o counts_exon.txt accepted_hits.bam

and with HTSeq-count

htseq-count -s no -m intersection-nonempty -t gene -i Name -f bam accepted_hits_nsorted_paired_end.bam ../gene.gtf  > filename.counts
htseq-count -s no -m intersection-nonempty -t exon -i gene_id -f bam accepted_hits_nsorted_paired_end.bam ../gene.gtf  > filename.counts

I did not include the all the exon present in the second file, otherwise it would be too long. But the first coordinate of the first exon correspond to the first coordinate of the gene and the last coordinate of the last exon corresponds to the last coordinate of the gene (when comparing the two files).

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 9.8 years ago by monteiro.s.rita • 0