5.2 years ago
mhfk2901
Hello everyone. This is my first time posting a question here. I am new to bioinformatics and not yet comfortable with the command line. I am trying to find the positions of the start and stop codons in my AUGUSTUS predictions, but every attempt with grep has failed so far. I did go through related questions, but I don't really understand the commands given in the answers.
Example of my GFF output:
# Predicted genes for sequence number 73 on both strands
# start gene g3
254 AUGUSTUS gene 1 491 0.98 - . g3
254 AUGUSTUS transcript 1 491 0.98 - . g3.t1
254 AUGUSTUS intron 109 168 1 - . transcript_id "g3.t1"; gene_id "g3";
254 AUGUSTUS CDS 1 108 0.98 - 1 transcript_id "g3.t1"; gene_id "g3";
254 AUGUSTUS CDS 169 491 1 - 0 transcript_id "g3.t1"; gene_id "g3";
254 AUGUSTUS start_codon 489 491 . - 0 transcript_id "g3.t1"; gene_id "g3";
# protein sequence = [MSSRSLAALAVVGAVALCARSASASGVTSDTSGIAGQTYDYIVVGAGLAGTTVAARLAENSAISILLIEAGGDDRGNS
# QVYDIYEYAQAFNGPLDWAWQSDRGKVLHGGKTLGGSSSINGGHWTRGLNAQYDAMSSLLEDSEQ]
# end gene g3
###
#
If I can just extract the line
254 AUGUSTUS start_codon 489 491 . - 0 transcript_id "g3.t1"; gene_id "g3";
that would be good enough for me.
You can simply use the grep command to extract the line.
Yes, Prakash. I just realized that it was so easy to do so. Thank you!
This problem has been solved. Thank you.
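For anyone landing here with the same question, the grep approach suggested above might look something like the sketch below. The file names (augustus_example.gff, codons.gff, codon_positions.txt) are hypothetical, and the sample lines are copied from the output in the question so that the example runs on its own. The awk step is an optional addition, not something from the thread; it assumes the feature type is in the third whitespace-separated column, as in the lines shown.

```shell
# Hypothetical file name; substitute your own AUGUSTUS output file.
gff=augustus_example.gff

# Recreate a few lines of the output from the question so the example is self-contained.
cat > "$gff" <<'EOF'
# start gene g3
254 AUGUSTUS gene 1 491 0.98 - . g3
254 AUGUSTUS CDS 169 491 1 - 0 transcript_id "g3.t1"; gene_id "g3";
254 AUGUSTUS start_codon 489 491 . - 0 transcript_id "g3.t1"; gene_id "g3";
# end gene g3
EOF

# Keep every line that mentions start_codon or stop_codon.
grep -E '(start|stop)_codon' "$gff" > codons.gff

# Optional: print only sequence name, feature, start, end, and strand.
awk '$3 == "start_codon" || $3 == "stop_codon" { print $1, $3, $4, $5, $7 }' "$gff" > codon_positions.txt
cat codon_positions.txt
```

With the sample lines above, this prints "254 start_codon 489 491 -". Gene g3 in the question has no stop_codon line, so only the start codon is reported for it.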