Contents of .gff3 files necessary for gene counts
7.5 years ago
oma219 ▴ 40

Hello,

I'm having second thoughts about how I made my annotation file, and I wanted a second opinion. Originally, I had an annotation file in .gff3 format, and whenever I ran it with htseq-count or featureCounts it gave me an error because not every line had a gene_id tag. So for getting gene counts, are the only lines I need the ones that are of type exon and have a gene_id tag?

Here are the top lines of my annotation:

##gff-version 3
#!genome-build JGI ASM20922v1
#!genome-version GCA_000209225.1
#!genome-date 2007-09
#!genome-build-accession GCA_000209225.1
#!genebuild-last-updated 2007-09
NEMVEscaffold_1 JGI supercontig 1   3256212 .   .   .   ID=supercontig:NEMVEscaffold_1;Alias=DS469507.1
NEMVEscaffold_1 JGI gene    25970   37222   .   -   .   ID=gene:NEMVEDRAFT_v1g196074;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN5];gene_id=NEMVEDRAFT_v1g196074;logic_name=jgi_nemve
NEMVEscaffold_1 JGI gene    40959   43943   .   -   .   ID=gene:NEMVEDRAFT_v1g196075;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN6];gene_id=NEMVEDRAFT_v1g196075;logic_name=jgi_nemve
NEMVEscaffold_1 JGI pseudogene  46093   47031   .   +   .   ID=gene:NEMVEDRAFT_v1g5443;biotype=pseudogene;gene_id=NEMVEDRAFT_v1g5443;logic_name=jgi_nemve
NEMVEscaffold_1 JGI gene    47232   48589   .   +   .   ID=gene:NEMVEDRAFT_v1g237689;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN7];gene_id=NEMVEDRAFT_v1g237689;logic_name=jgi_nemve
NEMVEscaffold_1 JGI gene    48731   52946   .   -   .   ID=gene:NEMVEDRAFT_v1g43493;biotype=protein_coding;description=Predicted protein  [Source:UniProtKB/TrEMBL%3BAcc:A7REN8];gene_id=NEMVEDRAFT_v1g43493;logic_name=jgi_nemve
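
One way to see which lines actually carry a gene_id tag is to tally it per feature type. The following is only a minimal Python sketch, not something from the original post: the function names `parse_attributes` and `gene_id_by_type` are made up for illustration, and it assumes the file uses tab-separated columns as real GFF3 files do.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gene_id_by_type(lines):
    """Count, per feature type, how many lines have or lack a gene_id attribute."""
    tally = defaultdict(lambda: {"with": 0, "without": 0})
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines, pragmas, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed nine-column feature line
        attrs = parse_attributes(cols[8])
        key = "with" if "gene_id" in attrs else "without"
        tally[cols[2]][key] += 1  # column 3 is the feature type
    return {ftype: dict(counts) for ftype, counts in tally.items()}
```

Passing an open file handle to `gene_id_by_type` gives one entry per feature type (gene, exon, and so on), which shows at a glance whether the type being counted has the attribute the counting tool is asked to group by. In featureCounts those two choices are the `-t` and `-g` options; in htseq-count they are `--type` and `--idattr`.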

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar.


Can you copy / paste the errors and commands used? Which versions?

  ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
        v1.5.0-p1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           P /home/ubuntu/data/rnaseq/nematostella/venu ... ||
||                                                                            ||
||             Output file : DICDvenus1_counts_featureCounts.txt              ||
||             Annotations : /home/ubuntu/data/rnaseq/nematostella/venus/ ... ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||       Read orientations : fr                                               ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/ubuntu/data/rnaseq/nematostella/venus/genom ... ||

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
The attributes included in your GTF annotation are 'Name=EDO49941-1;Parent=transcript:EDO49941;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=EDO49941-1;rank=1'

||    Features : 160436                                                       ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 4910                                              ||
||                                                                            ||
|| Process BAM file /home/ubuntu/data/rnaseq/nematostella/venus/alignment ... ||
||    Paired-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 59521278                                                  ||
||    Successfully assigned reads : 29642258 (49.8%)                          ||
||    Running time : 0.82 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

That's a warning. Not an error.
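
The attributes printed in the warning show that the exon lines carry `Parent=transcript:...` rather than `gene_id`, and the log reports `Meta-features : 1`, which may indicate that the exons were not grouped by gene. A rough Python sketch of how exons could be traced back to a gene through the Parent chain is given here. It assumes that transcript lines have `ID=transcript:...` and `Parent=gene:...` attributes, which is not shown anywhere in this thread, and `exon_to_gene` and `parse_attributes` are hypothetical helper names.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def exon_to_gene(lines):
    """Map each exon (by exon_id, else Name) to a gene ID via its Parent transcript."""
    transcript_gene = {}  # e.g. "transcript:X" -> "gene:Y" (assumed layout)
    exons = []            # (exon label, list of parent transcript IDs)
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines, pragmas, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed nine-column feature line
        attrs = parse_attributes(cols[8])
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        if cols[2] == "exon":
            exons.append((attrs.get("exon_id") or attrs.get("Name"), parents))
        elif "ID" in attrs and any(p.startswith("gene:") for p in parents):
            # a transcript-like line: remember which gene it belongs to
            transcript_gene[attrs["ID"]] = next(
                p for p in parents if p.startswith("gene:")
            )
    mapping = {}
    for label, parents in exons:
        gene = next(
            (transcript_gene[p] for p in parents if p in transcript_gene), None
        )
        # drop the "gene:" prefix so the value looks like the gene_id attribute
        mapping[label] = gene.split(":", 1)[1] if gene else None
    return mapping
```

Exons whose parent transcript cannot be resolved map to `None`, so they are easy to spot. The resulting table could be used to write a gene_id onto each exon line, or the annotation could be converted to GTF with a dedicated tool instead.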
