Question

Extracting START and STOP codon position from Augustus GFF

5.1 years ago

mhfk2901 ▴ 20

Hello everyone. This is my first time posting a question here. I am new to bioinformatics and am not yet comfortable with the command line. I am trying to identify the positions of the START and STOP codons in my AUGUSTUS predictions, but every attempt to do so with grep has failed so far. I went through related questions, but I don't really understand the commands given there.

Example of my GFF output:

# Predicted genes for sequence number 73 on both strands
# start gene g3
254 AUGUSTUS    gene    1   491 0.98    -   .   g3
254 AUGUSTUS    transcript  1   491 0.98    -   .   g3.t1
254 AUGUSTUS    intron  109 168 1   -   .   transcript_id "g3.t1"; gene_id "g3";
254 AUGUSTUS    CDS 1   108 0.98    -   1   transcript_id "g3.t1"; gene_id "g3";
254 AUGUSTUS    CDS 169 491 1   -   0   transcript_id "g3.t1"; gene_id "g3";
254 AUGUSTUS    start_codon 489 491 .   -   0   transcript_id "g3.t1"; gene_id "g3";
# protein sequence = [MSSRSLAALAVVGAVALCARSASASGVTSDTSGIAGQTYDYIVVGAGLAGTTVAARLAENSAISILLIEAGGDDRGNS
# QVYDIYEYAQAFNGPLDWAWQSDRGKVLHGGKTLGGSSSINGGHWTRGLNAQYDAMSSLLEDSEQ]
# end gene g3
###
#

If I could just extract a line such as 254 AUGUSTUS start_codon 489 491 . - 0 transcript_id "g3.t1"; gene_id "g3"; that would be good enough for me.

You can simply use the grep command to extract the line:

grep "start_codon" file
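Since the question asks for both START and STOP codons, here is a minimal sketch that pulls out both feature types in one pass. It matches on the third column rather than anywhere in the line, so comment lines starting with # are skipped. The function name extract_codons and the file name augustus_output.gff are made up for illustration, and note that the example gene above has no stop_codon line, so only its start_codon line would be returned.

```shell
# Hypothetical helper: print every start_codon and stop_codon feature line
# from a GFF file produced by AUGUSTUS.
# Usage: extract_codons augustus_output.gff   (file name is an assumption)
extract_codons() {
    # Skip comment lines (starting with #) and keep only lines whose
    # third whitespace-separated column names one of the two features.
    awk '!/^#/ && ($3 == "start_codon" || $3 == "stop_codon")' "$1"
}
```

To save the result, redirect it to a file, for example: extract_codons augustus_output.gff > codons.gff. The start and end coordinates are then in columns 4 and 5 of each line, and the strand is in column 7.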

5.1 years ago by Prakash ★ 2.2k

Yes, Prakash. I just realized how easy it was. Thank you!

5.1 years ago by mhfk2901 ▴ 20

This problem has been solved. Thank you.

5.1 years ago by mhfk2901 ▴ 20