Question

Contents of .gff3 files necessary for gene counts

7.5 years ago

oma219 ▴ 40

Hello,

I'm having second thoughts about how I made my annotation file, and I wanted to get a second opinion. Originally, I had an annotation file in .gff3 format that gave me an error whenever I ran it with htseq-count or featureCounts, because not every line had a gene_id tag. So, for getting gene counts, are the only lines I need the ones that are of type exon and have a gene_id tag?

Here are the top lines of my annotation:

##gff-version 3
#!genome-build JGI ASM20922v1
#!genome-version GCA_000209225.1
#!genome-date 2007-09
#!genome-build-accession GCA_000209225.1
#!genebuild-last-updated 2007-09
NEMVEscaffold_1 JGI supercontig 1   3256212 .   .   .   ID=supercontig:NEMVEscaffold_1;Alias=DS469507.1
NEMVEscaffold_1 JGI gene    25970   37222   .   -   .   ID=gene:NEMVEDRAFT_v1g196074;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN5];gene_id=NEMVEDRAFT_v1g196074;logic_name=jgi_nemve
NEMVEscaffold_1 JGI gene    40959   43943   .   -   .   ID=gene:NEMVEDRAFT_v1g196075;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN6];gene_id=NEMVEDRAFT_v1g196075;logic_name=jgi_nemve
NEMVEscaffold_1 JGI pseudogene  46093   47031   .   +   .   ID=gene:NEMVEDRAFT_v1g5443;biotype=pseudogene;gene_id=NEMVEDRAFT_v1g5443;logic_name=jgi_nemve
NEMVEscaffold_1 JGI gene    47232   48589   .   +   .   ID=gene:NEMVEDRAFT_v1g237689;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN7];gene_id=NEMVEDRAFT_v1g237689;logic_name=jgi_nemve
NEMVEscaffold_1 JGI gene    48731   52946   .   -   .   ID=gene:NEMVEDRAFT_v1g43493;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN8];gene_id=NEMVEDRAFT_v1g43493;logic_name=jgi_nemve
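One way to check which feature types carry a gene_id tag is to tally column 9 by feature type. The following is a minimal Python sketch, not something from the original post: the function names are hypothetical, the file is assumed to be tab-separated with nine columns, and the exon coordinates in the sample are made up for illustration (only its attribute string is taken from the featureCounts warning further down the thread).

```python
from collections import Counter

def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs

def gene_id_coverage(lines):
    """Count, per feature type (column 3), lines with and without gene_id."""
    with_id, without_id = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        target = with_id if "gene_id" in attrs else without_id
        target[cols[2]] += 1
    return with_id, without_id

# Sample input: the pseudogene line is from the post; the exon
# coordinates are hypothetical placeholders.
sample = [
    "##gff-version 3\n",
    "\t".join(["NEMVEscaffold_1", "JGI", "pseudogene", "46093", "47031", ".", "+", ".",
               "ID=gene:NEMVEDRAFT_v1g5443;biotype=pseudogene;"
               "gene_id=NEMVEDRAFT_v1g5443;logic_name=jgi_nemve"]) + "\n",
    "\t".join(["NEMVEscaffold_1", "JGI", "exon", "100", "200", ".", "+", ".",
               "Name=EDO49941-1;Parent=transcript:EDO49941;exon_id=EDO49941-1;rank=1"]) + "\n",
]

with_id, without_id = gene_id_coverage(sample)
print(dict(with_id), dict(without_id))  # {'pseudogene': 1} {'exon': 1}
```

To run it on a real file, pass an open file handle instead of the sample list, e.g. `gene_id_coverage(open("annotation.gff3"))` (the file name here is a placeholder).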

sequencing rna-seq • 2.3k views

ADD COMMENT • link updated 4.9 years ago by Biostar 20 • written 7.5 years ago by oma219 ▴ 40

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar.

ADD REPLY • link 4.9 years ago by WouterDeCoster 47k

Can you copy / paste the errors and commands used? Which versions?

ADD REPLY • link 7.5 years ago by h.mon 35k

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
        v1.5.0-p1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           P /home/ubuntu/data/rnaseq/nematostella/venu ... ||
||                                                                            ||
||             Output file : DICDvenus1_counts_featureCounts.txt              ||
||             Annotations : /home/ubuntu/data/rnaseq/nematostella/venus/ ... ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||       Read orientations : fr                                               ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/ubuntu/data/rnaseq/nematostella/venus/genom ... ||

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
The attributes included in your GTF annotation are 'Name=EDO49941-1;Parent=transcript:EDO49941;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=EDO49941-1;rank=1'

||    Features : 160436                                                       ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 4910                                              ||
||                                                                            ||
|| Process BAM file /home/ubuntu/data/rnaseq/nematostella/venus/alignment ... ||
||    Paired-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 59521278                                                  ||
||    Successfully assigned reads : 29642258 (49.8%)                          ||
||    Running time : 0.82 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//
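The warning in this log shows an exon line whose attributes contain Parent=transcript:... but no gene_id, and the run reports a single meta-feature, which suggests the reads were not being grouped per gene. If the file follows the usual GFF3 hierarchy (exon, then transcript, then gene), the gene for each exon can be recovered by following the Parent links. The sketch below is a hedged Python illustration only: the function names are hypothetical, the IDs in the sample (GENE_A, TX_A) are placeholders rather than records from this annotation, and it assumes that transcript lines carry both ID= and Parent= attributes and that each feature has a single parent.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs

def exon_to_gene(lines):
    """Map each exon label to a gene ID by climbing Parent links."""
    parent_of = {}  # feature ID -> Parent ID
    exons = []      # (exon label, Parent ID)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"]
        if cols[2] == "exon" and "Parent" in attrs:
            label = attrs.get("exon_id", attrs.get("Name"))
            exons.append((label, attrs["Parent"]))
    resolved = {}
    for label, parent in exons:
        seen = set()  # guards against accidental cycles
        while parent in parent_of and parent not in seen:
            seen.add(parent)
            parent = parent_of[parent]
        # Drop an Ensembl-style 'gene:' / 'transcript:' prefix if present.
        resolved[label] = parent.split(":", 1)[-1]
    return resolved

# Placeholder records (hypothetical IDs and coordinates).
sample = [
    "\t".join(["scaffold_1", "JGI", "gene", "1", "500", ".", "+", ".",
               "ID=gene:GENE_A;gene_id=GENE_A"]),
    "\t".join(["scaffold_1", "JGI", "mRNA", "1", "500", ".", "+", ".",
               "ID=transcript:TX_A;Parent=gene:GENE_A"]),
    "\t".join(["scaffold_1", "JGI", "exon", "1", "200", ".", "+", ".",
               "Name=TX_A-1;Parent=transcript:TX_A;exon_id=TX_A-1;rank=1"]),
]

print(exon_to_gene(sample))  # {'TX_A-1': 'GENE_A'}
```

If a transcript line is missing from the file, the exon resolves only as far as its transcript ID, so it is worth spot-checking the result before using it to add gene_id tags to exon lines. An alternative that avoids editing the file is to tell the counting tool which attribute to group by (featureCounts reads this from its -g option, which defaults to gene_id as the log above shows).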