Extraction of original gene IDs from reference annotation file
6.4 years ago

Hi,

I have used StringTie for transcript assembly. StringTie assigns its own labels (i.e. gene IDs and transcript IDs), while I need the original gene IDs. Can someone kindly suggest a way to get the original IDs for the assembled transcripts from the genome annotation file? The StringTie output and the genome annotation look like this:

StringTie output file:

chr1    StringTie   transcript  328661  330868  1000    +   .   gene_id "MSTRG.4"; transcript_id "MSTRG.4.1"; 
chr1    StringTie   exon    328661  329729  1000    +   .   gene_id "MSTRG.4"; transcript_id "MSTRG.4.1"; exon_number "1"; 
chr1    StringTie   exon    329840  330067  1000    +   .   gene_id "MSTRG.4"; transcript_id "MSTRG.4.1"; exon_number "2"; 
chr1    StringTie   exon    330758  330868  1000    +   .   gene_id "MSTRG.4"; transcript_id "MSTRG.4.1"; exon_number "3"; 
chr1    StringTie   transcript  580963  583751  1000    -   .   gene_id "MSTRG.5"; transcript_id "MSTRG.5.1"; 
chr1    StringTie   exon    580963  582109  1000    -   .   gene_id "MSTRG.5"; transcript_id "MSTRG.5.1"; exon_number "1"; 
chr1    StringTie   exon    583479  583751  1000    -   .   gene_id "MSTRG.5"; transcript_id "MSTRG.5.1"; exon_number "2";

Genome annotation file:

chr4    GLEAN   mRNA    123284514   123288477   0.999991    -   .   ID=Cotton_A_18927_BGI-A2_v1.0;Name=Cotton_A_18927;source_id=CottonA_GLEAN_10022228;identical_support_id=CUFF67.1103.1;evid_id=Cot030308.1
chr4    GLEAN   CDS 123288376   123288477   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS 123287662   123287826   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS 123287427   123287536   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS 123287129   123287237   .   -   1   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS 123286939   123287051   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS 123286180   123286330   .   -   1   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS 123284514   123285671   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0
chr9    GLEAN   mRNA    17802711    17803334    1   +   .   ID=Cotton_A_16149_BGI-A2_v1.0;Name=Cotton_A_16149;source_id=CottonA_GLEAN_10030787;evid_id=Cot023903.1
chr9    GLEAN   CDS 17803146    17803334    .   +   0   Parent=Cotton_A_16149_BGI-A2_v1.0
chr9    GLEAN   CDS 17802984    17803035    .   +   1   Parent=Cotton_A_16149_BGI-A2_v1.0
chr9    GLEAN   CDS 17802711    17802862    .   +   0   Parent=Cotton_A_16149_BGI-A2_v1.0

Thanks in anticipation.

rna-seq • 2.3k views

Can you please upload the StringTie syntax?


Yes, here is the general usage:

stringtie <aligned_reads.bam> [options]*

And here is its manual:

http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual


May I know the answer to the above query?
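One possible way to approach this, sketched here under stated assumptions rather than as a confirmed solution: match each StringTie transcript to the annotated mRNAs that overlap it on the same chromosome and strand, and report the annotation `ID` values. The function names below (`parse_stringtie_transcripts`, `parse_reference_mrnas`, `map_to_reference`) are hypothetical helpers written for this illustration and are not part of StringTie. Coordinate overlap alone does not prove that two models describe the same gene, so the result should be treated as a list of candidates. The StringTie manual also documents a `-G` option for supplying a reference annotation at assembly time, which may be the more direct route if the assembly can be rerun.

```python
import re
from collections import defaultdict


def _fields(line):
    # Split into the 9 GTF/GFF columns. Splitting on any whitespace tolerates
    # the space-separated form pasted in the post as well as real tab-separated
    # files (assumes the first 8 columns contain no internal spaces).
    return line.strip().split(None, 8)


def parse_stringtie_transcripts(gtf_lines):
    """Yield (chrom, start, end, strand, transcript_id) for 'transcript' rows."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = _fields(line)
        if len(f) < 9 or f[2] != "transcript":
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        if m:
            yield f[0], int(f[3]), int(f[4]), f[6], m.group(1)


def parse_reference_mrnas(gff_lines):
    """Yield (chrom, start, end, strand, ID) for 'mRNA' rows of the annotation."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = _fields(line)
        if len(f) < 9 or f[2] != "mRNA":
            continue
        m = re.search(r"(?:^|;)ID=([^;]+)", f[8])
        if m:
            yield f[0], int(f[3]), int(f[4]), f[6], m.group(1)


def map_to_reference(gtf_lines, gff_lines):
    """Return {stringtie_transcript_id: [overlapping reference mRNA IDs]}."""
    # Group reference mRNAs by (chromosome, strand) so that only plausible
    # candidates are compared against each assembled transcript.
    by_locus = defaultdict(list)
    for chrom, start, end, strand, ref_id in parse_reference_mrnas(gff_lines):
        by_locus[(chrom, strand)].append((start, end, ref_id))

    mapping = {}
    for chrom, start, end, strand, tid in parse_stringtie_transcripts(gtf_lines):
        # Both formats use 1-based inclusive coordinates, so two intervals
        # overlap when each one starts no later than the other ends.
        mapping[tid] = [
            ref_id
            for s, e, ref_id in by_locus.get((chrom, strand), [])
            if s <= end and start <= e
        ]
    return mapping
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example `map_to_reference(open("stringtie_out.gtf"), open("annotation.gff"))`, where both file names are placeholders. The linear scan per transcript is fine for a quick check; for a full genome, a dedicated comparison tool or an interval index would scale better.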
