Question

Does featureCounts output contain gene_id or transcript_id?

Asked by Bane, 3.5 years ago

Hello! A month ago I completed a transcriptome study. For the read-counting step before normalization I used featureCounts. My featureCounts command was:

featureCounts -a Beta_vulgaris_ncbi.gtf -g transcript_id -o results.txt <input BAM files>

The GTF file was downloaded from the NCBI database. I wanted transcript_id, but the first column of my results table says gene_id. I did not notice this until yesterday. Now I am trying to annotate the results with biomaRt and I am stuck: I don't know which filter to use in biomaRt, because I don't know whether I have gene IDs or transcript IDs. I tried both filters and neither worked. How can I figure out what kind of IDs these are?
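One quick way to guess what kind of IDs you have is to look at their prefixes. The helper below is hypothetical (it is not part of featureCounts or biomaRt) and assumes NCBI naming conventions: RefSeq accessions NM_/XM_ for mRNAs, NR_/XR_ for non-coding RNAs and NP_/XP_ for proteins (X marks predicted models), plus NCBI's LOC gene symbols. It also strips a leading rna- tag like the one in the featureCounts output discussed here.

```python
import re

# RefSeq accession prefixes (assumed NCBI convention; X = predicted model).
REFSEQ_PREFIXES = {
    "NM": "RefSeq mRNA (curated)",
    "XM": "RefSeq mRNA (predicted)",
    "NR": "RefSeq non-coding RNA (curated)",
    "XR": "RefSeq non-coding RNA (predicted)",
    "NP": "RefSeq protein (curated)",
    "XP": "RefSeq protein (predicted)",
}


def classify_id(raw_id: str) -> str:
    """Guess the ID type from its prefix (hypothetical helper)."""
    # Drop a leading "rna-" tag, e.g. "rna-XM_010673718.2" -> "XM_010673718.2".
    ident = raw_id[4:] if raw_id.startswith("rna-") else raw_id
    match = re.fullmatch(r"([A-Z]{2})_\d+(\.\d+)?", ident)
    if match and match.group(1) in REFSEQ_PREFIXES:
        return REFSEQ_PREFIXES[match.group(1)]
    # NCBI uses "LOC" + Gene ID as a placeholder gene symbol.
    if re.fullmatch(r"LOC\d+", ident):
        return "NCBI LOC gene symbol (LOC + Gene ID)"
    return "unknown"


for example in ["rna-XM_010673718.2", "LOC104882799", "some_other_id"]:
    print(example, "->", classify_id(example))
```

IDs that come back as "unknown" need a closer look at the annotation file they came from.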

[Image: My featureCounts result]

Tags: biomaRt, DESeq, FeatureCounts, Annotation

Hi Bane

The rna-XM_XXXXXXXX entries are transcript IDs.

#gtf-version 2.2
#!genome-build RefBeet-1.2.2
#!genome-build-accession NCBI_Assembly:GCF_000511025.2
#!annotation-source NCBI Beta vulgaris subsp. vulgaris Annotation Release 101
NC_025812.2 Gnomon  gene    9973    24201   .   +   .   gene_id "LOC104882799"; db_xref "GeneID:104882799"; gbkey "Gene"; gene "LOC104882799"; gene_biotype "protein_coding"; 
NC_025812.2 Gnomon  exon    9973    10441   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "1"; 
NC_025812.2 Gnomon  exon    11579   11706   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "2"; 
NC_025812.2 Gnomon  exon    12998   13148   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "3"; 
NC_025812.2 Gnomon  exon    13280   13483   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "4"; 
NC_025812.2 Gnomon  exon    19664   20088   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "5"; 
NC_025812.2 Gnomon  exon    21206   21398   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "6"; 
NC_025812.2 Gnomon  exon    21482   21577   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "7"; 
NC_025812.2 Gnomon  exon    23113   23268   .   +   .   gene_id "LOC104882799"; transcript_id "XM_010673718.2"; db_xref "GeneID:104882799"; gbkey "mRNA"; gene "LOC104882799"; model_evidence "Supporting evidence includes similarity to: 2 ESTs, 9 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 24 samples with support for all annotated introns"; product "2,3-bisphosphoglycerate-independent phosphoglycerate mutase, transcript variant X1"; exon_number "8";
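Because the GTF already links every transcript_id to its gene_id and product, it can serve directly as an annotation table. A minimal sketch, assuming a tab-separated GTF with NCBI-style key "value"; attributes as in the excerpt above; the function names are hypothetical and the demo lines are shortened copies of that excerpt:

```python
import re

# Matches key "value" pairs in GTF column 9.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> dict:
    """Turn the GTF attribute column into a dict."""
    return dict(ATTR_RE.findall(field))


def transcript_table(gtf_lines):
    """Map transcript_id -> (gene_id, product) using exon records."""
    table = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        if tid:
            # All exons of a transcript carry the same gene_id/product.
            table.setdefault(tid, (attrs.get("gene_id"), attrs.get("product")))
    return table


demo = [
    "#gtf-version 2.2",
    "\t".join(["NC_025812.2", "Gnomon", "gene", "9973", "24201", ".", "+", ".",
               'gene_id "LOC104882799"; gbkey "Gene";']),
    "\t".join(["NC_025812.2", "Gnomon", "exon", "9973", "10441", ".", "+", ".",
               'gene_id "LOC104882799"; transcript_id "XM_010673718.2"; '
               'product "2,3-bisphosphoglycerate-independent phosphoglycerate '
               'mutase, transcript variant X1";']),
]
print(transcript_table(demo))
```

For the real file, pass an open file handle instead of the demo list.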

Reply by andres.firrincieli, 3.5 years ago

Thanks a lot. I feel silly now that I didn't think to look at the annotation file. Thank you!

Reply by Bane, 3.5 years ago

Because you used -g transcript_id, featureCounts used the transcript IDs from the annotation file in your results table. By default (without the -g option) it would have used gene_id. This also means you are not summarizing counts at the gene level: reads overlapping exon features (-t exon, the default) are summed per transcript_id, so your counts are at the transcript level. (True per-exon counts would require -f.)
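The roles of -t and -g can be pictured as grouping: -t selects which feature type is counted, and -g selects the attribute used to merge those features into meta-features. The sketch below reproduces only that grouping, not the read counting; the function name, gene LOC1 and transcripts XM_1.1/XM_2.1 are hypothetical, with one exon shared by both isoforms.

```python
import re
from collections import defaultdict

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def meta_features(gtf_lines, feature_type="exon", attr_type="gene_id"):
    """Group features of `feature_type` (like -t) by `attr_type` (like -g)."""
    groups = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        key = attrs.get(attr_type)
        if key is not None:
            groups[key].append((cols[0], int(cols[3]), int(cols[4])))
    return dict(groups)


def exon_line(start, end, transcript):
    # Hypothetical exon record, tab-separated like a real GTF.
    return "\t".join(["chr1", "demo", "exon", str(start), str(end), ".", "+", ".",
                      f'gene_id "LOC1"; transcript_id "{transcript}";'])


demo = [
    exon_line(100, 200, "XM_1.1"),
    exon_line(300, 400, "XM_1.1"),
    exon_line(100, 200, "XM_2.1"),  # exon shared by both isoforms
    exon_line(500, 600, "XM_2.1"),
]

print(meta_features(demo))                             # like -g gene_id
print(meta_features(demo, attr_type="transcript_id"))  # like -g transcript_id
```

With -g transcript_id, a read falling in the shared exon (100-200) overlaps two meta-features. By default featureCounts does not count such reads unless -O is given, so transcript-level counts from featureCounts are not simply gene counts split between isoforms.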

Reply by GenoMax, 3.5 years ago

The first column of the featureCounts output is always named Geneid; it does not change to reflect the attribute specified with -g on the command line. However, the first line of the output file records the command that was run, and in my experience you can trust what is specified there.
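Following this advice, the grouping attribute can be read back from that first line. A minimal sketch, assuming the header has the form # Program:featureCounts vX; Command:"featureCounts" "-a" ... with each argument in double quotes; the helper name is hypothetical, and it falls back to gene_id, the default for -g:

```python
import shlex


def grouping_attribute(header_line: str) -> str:
    """Return the -g value recorded in a featureCounts header line."""
    _, sep, command = header_line.partition("Command:")
    if not sep:
        raise ValueError("no 'Command:' field in header line")
    tokens = shlex.split(command)  # removes the quotes around each argument
    if "-g" in tokens:
        i = tokens.index("-g")
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return "gene_id"  # featureCounts' default when -g is not given


header = ('# Program:featureCounts vX; Command:"featureCounts" "-a" '
          '"Beta_vulgaris_ncbi.gtf" "-g" "transcript_id" "-o" "results.txt"')
print(grouping_attribute(header))

# On a real file:
# with open("results.txt") as fh:
#     print(grouping_attribute(fh.readline()))
```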

Reply by Michael, 3.5 years ago

Oh, that's a relief, thank you.

I have another question now. Are NCBI transcript and gene IDs compatible with Ensembl or Phytozome? When I try to convert my NCBI transcript_ids to Ensembl gene_ids, or to anything else that could give me a clue, biomaRt returns no results. I also manually searched for some of the NCBI transcript_ids in the annotation file I downloaded from Phytozome and found nothing.

How does biomaRt actually work? I don't think biomaRt simply keeps a table of all kinds of IDs and compares the given data against its database.
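Conceptually, biomaRt's getBM() sends a query (filters with their values, plus the attributes you want back) to a BioMart server, which returns the matching rows of a dataset. The toy below is not the biomaRt API and says nothing about how the server is implemented; it only mimics that filter/attribute logic on an in-memory table with invented IDs, to show why IDs that never appear in the filter column return no rows.

```python
# Toy model of a BioMart-style query; NOT the biomaRt API.
# A dataset is treated as a table: the filter picks rows, attributes pick columns.
def toy_get_bm(table, attributes, filter_column, values):
    wanted = set(values)
    return [
        {attr: row[attr] for attr in attributes}
        for row in table
        if row.get(filter_column) in wanted
    ]


# Hypothetical rows; every ID here is invented for illustration.
toy_mart = [
    {"gene_id": "GENE_A", "transcript_id": "TX_A1", "description": "example protein A"},
    {"gene_id": "GENE_B", "transcript_id": "TX_B1", "description": "example protein B"},
]

print(toy_get_bm(toy_mart, ["gene_id", "description"], "transcript_id", ["TX_A1"]))
# An ID from another namespace matches nothing in the filter column.
print(toy_get_bm(toy_mart, ["gene_id"], "transcript_id", ["XM_010673718.2"]))  # -> []
```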

Reply by Bane, 3.5 years ago

XM_* IDs are predicted transcript IDs. They will not translate directly to Ensembl or Phytozome IDs. If you are interested in Ensembl IDs, it may be best to use Ensembl's version of the beet genome and the corresponding annotation file. You can find that here.

Reply by GenoMax, 3.5 years ago

But I deleted all the earlier data; I only have the featureCounts results. Should I redo all the normalization steps from the beginning?
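If the alignment files are gone, one partial workaround is to sum the existing transcript-level counts per gene, using a transcript-to-gene map taken from the GTF. This is only an approximation: by default featureCounts did not count reads that overlapped more than one transcript, so the sums can differ from (and are usually lower than) a real -g gene_id run, and it still gives NCBI IDs, not Ensembl IDs. A minimal sketch, assuming the usual featureCounts table layout (comment line, then Geneid, Chr, Start, End, Strand, Length, then one column per sample); the function name and all IDs and counts below are invented:

```python
import csv
from collections import defaultdict


def collapse_to_genes(fc_text, tx_to_gene):
    """Sum transcript-level featureCounts rows into per-gene totals."""
    # Skip the '# Program:... Command:...' comment line and blank lines.
    lines = [ln for ln in fc_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    totals = defaultdict(lambda: [0] * len(samples))
    unmapped = []
    for row in reader:
        tid = row[0][4:] if row[0].startswith("rna-") else row[0]
        gene = tx_to_gene.get(tid)
        if gene is None:
            unmapped.append(row[0])  # report rather than silently drop
            continue
        for i, value in enumerate(row[6:]):
            totals[gene][i] += int(value)
    return samples, dict(totals), unmapped


# Invented example table and mapping (hypothetical IDs and counts).
fc_text = "\n".join([
    '# Program:featureCounts vX; Command:"featureCounts" "-g" "transcript_id"',
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam",
    "rna-XM_1.1\tchr1\t100\t400\t+\t202\t10\t20",
    "rna-XM_2.1\tchr1\t100\t600\t+\t202\t5\t7",
    "rna-XM_3.1\tchr2\t1\t100\t+\t100\t1\t1",
])
tx_to_gene = {"XM_1.1": "GENE1", "XM_2.1": "GENE1"}

samples, totals, unmapped = collapse_to_genes(fc_text, tx_to_gene)
print(samples, totals, unmapped)
```

Getting Ensembl IDs, or exact gene-level counts, would still require re-counting against the Ensembl annotation.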

Reply by Bane, 3.5 years ago