Extract transcript_id and gene_id from cuffcompare output
4.2 years ago

Hi all. I have a cuffcompare output and I want to extract the transcript_id and gene_id from column 9, which is a string, using grep or AWK. Thank you for your guidance.

1 Cufflinks exon 2899 3255 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 3354 3616 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "2"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 4357 4455 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "3"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 5457 5560 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "4"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 7136 7944 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "5"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 8028 8150 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "6"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 8408 8608 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "7"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 9210 9615 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "8"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 10102 10187 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "9"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 10274 10430 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "10"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1 Cufflinks exon 10504 10817 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "11"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";

Assembly • 1.0k views


It's hard to tell the format of your file, but if it's in GTF format you can use this Perl one-liner.

perl -pe 's/.+gene_id\s\"(\w+)\".+transcript_id\s\"(\w+)\".+/$1\t$2/' file.txt
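
Since the question asked for grep or AWK, here is a rough awk sketch of the same extraction. The function name `extract_ids` is made up for illustration, and it assumes each record sits on its own line with attributes written as `gene_id "..."` and `transcript_id "..."`, as in the sample above. It searches the whole line rather than relying on column 9, so it should not matter whether the columns are separated by tabs or spaces.

```shell
# Hypothetical helper: read GTF-like lines on stdin and print
# gene_id and transcript_id separated by a tab, one pair per input line.
extract_ids() {
  awk '
    {
      gene = ""; tx = ""
      # match() sets RSTART and RLENGTH for the leftmost match on the line.
      # The key, the space and the opening quote take 9 characters for gene_id
      # and 15 for transcript_id; one more character is the closing quote.
      if (match($0, /gene_id "[^"]+"/))       gene = substr($0, RSTART + 9,  RLENGTH - 10)
      if (match($0, /transcript_id "[^"]+"/)) tx   = substr($0, RSTART + 15, RLENGTH - 16)
      # Lines that lack either attribute are skipped.
      if (gene != "" && tx != "") print gene "\t" tx
    }'
}
```

Something like `extract_ids < file.txt | sort -u` would then list each gene/transcript pair once instead of once per exon. Unlike the Perl one-liner, this skips lines that have no gene_id or transcript_id rather than printing them unchanged.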

hello. amazing, as usual.
