I need a script to select specific sequences from the many results in a local BLAST output file. Briefly, I have a BLAST output file containing many alignments in different frame shifts, with E-values and bit scores. Each sequence has results for several frame shifts.
Now I want to select only the frame shift with the highest score in each alignment, and the best alignments based on bit score or E-value. There is no restriction on the file type.
Could anyone point me to a way to learn how to write such scripts, as I am new to Perl scripting, or to an existing script that I could edit for this purpose?
Thank you in advance.
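It doesn't need to be in Perl.
If you're working with tab-separated values (so -outfmt 6), you can easily filter with awk.
Assuming the E-value is in the 4th column, this command prints only the lines where the E-value is lower than 0.02 (in the default outfmt 6 layout the E-value is in column 11 and the bit score in column 12, so you would use $11 there):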
awk -F'\t' '$4 < 0.02 {print ;}'
The same can be done with any other field. Always remember to declare the field separator, which is given after the -F option.
Fields are counted starting from $1, because $0 is the whole line.
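As a rough sketch of combining conditions on several fields, here is a filter on both E-value and bit score. It assumes the default outfmt 6 column order (E-value in $11, bit score in $12); the file name demo_hits.tsv, the sample rows and the bit score cutoff of 50 are made up for illustration.

```shell
# Hypothetical sample: 12 tab-separated fields in the default outfmt 6 order,
# so the E-value is field 11 and the bit score is field 12.
printf 'q1\ts1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t185\n' > demo_hits.tsv
printf 'q1\ts2\t70.0\t40\t12\t0\t1\t40\t1\t40\t0.5\t30\n' >> demo_hits.tsv
printf 'q2\ts3\t95.0\t80\t4\t0\t1\t80\t1\t80\t1e-40\t150\n' >> demo_hits.tsv

# Keep lines whose E-value is below 0.02 AND whose bit score is at least 50.
# A pattern with no action block prints the matching line.
filtered=$(awk -F'\t' '$11 < 0.02 && $12 >= 50' demo_hits.tsv)
printf '%s\n' "$filtered"
```

If your file was written with a custom column list, adjust the field numbers to match it.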
To get just the first result for each query, assuming your BLAST output lists hits from the most to the least representative, you can use another awk trick. $1 is used assuming your query ID is in the 1st field.
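The command for this trick is not shown in the thread, so what follows is one common way to do it, not necessarily what was originally posted. The first awk call keeps the first line seen for each query ID. The second is a variant that does not rely on the order of the hits and keeps the line with the highest bit score per query, assuming the bit score is in $12 (default outfmt 6). The file name sample_hits.tsv and its rows are made up, and the q2 hits are deliberately out of order to show the difference.

```shell
# Hypothetical sample in the default outfmt 6 order (query ID in field 1, bit score in field 12).
# The q2 hits are listed with the weaker one first on purpose.
printf 'q1\ts1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t185\n' > sample_hits.tsv
printf 'q1\ts2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120\n' >> sample_hits.tsv
printf 'q2\ts3\t85.0\t80\t12\t0\t1\t80\t1\t80\t1e-20\t90\n' >> sample_hits.tsv
printf 'q2\ts4\t95.0\t80\t4\t0\t1\t80\t1\t80\t1e-40\t150\n' >> sample_hits.tsv

# First line per query ID: seen[$1]++ is 0 (false) the first time an ID appears,
# so !seen[$1]++ is true only for that first line.
first_hits=$(awk -F'\t' '!seen[$1]++' sample_hits.tsv)

# Highest bit score per query ID, whatever the line order.
# "+ 0" forces a numeric value; sort is used because the order of "for (q in line)" is unspecified.
best_hits=$(awk -F'\t' '
    !($1 in best) || $12 > best[$1] { best[$1] = $12 + 0; line[$1] = $0 }
    END { for (q in line) print line[q] }
' sample_hits.tsv | sort)

printf 'first hit per query:\n%s\n' "$first_hits"
printf 'best bit score per query:\n%s\n' "$best_hits"
```

On this sample the first call keeps s3 for q2 (the first line), while the second keeps s4 (the higher bit score). If two hits for a query tie on bit score, the second variant keeps the earlier one.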
And why does it have to be Perl?
Why not load the whole result into a pandas DataFrame in Python and then filter it as you want? It's not easy to create a dataframe in Perl, by the way.