I have an output file from a tblastn search in outfmt 6 (tabular, tab-delimited), with the columns: subject gi, evalue, subject start, subject end, e.g.
595625618 0.0 472083 473231
341932553 3e-128 53534 54640
152022606 4e-95 2695055 2693919
388532432 0.0 840617 841774
574094067 0.0 10789 11946
I would like to generate a new file with the start and end columns modified by a fixed value. I just need a simple script to do this.
However, whether the value is added to or subtracted from the start column depends on which strand the gene is on (+ or -), i.e. I wish to "look upstream". If start < stop, subtract 2000 bp from start to give the new start. If start > stop, add 2000 bp to the start value. In both cases the new stop value takes the value of the old start.
The evalue needn't be preserved in the new file, but the gi should.
Any pointers in scripting would be appreciated.
Thanks Ram, yes awk seems to be equipped for the job, I will give that a go.
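For anyone who finds this later, the rule in the question can also be sketched in Python instead of awk. This is only a rough sketch under a few assumptions: the four columns are whitespace-separated in the order given above, the output is tab-delimited as gi, new start, new stop, and the function names (upstream_window, convert) are made up for illustration.

```python
UPSTREAM = 2000  # bp to look upstream, as stated in the question


def upstream_window(line, flank=UPSTREAM):
    """Return (gi, new_start, new_stop) for one line: gi, evalue, start, stop."""
    gi, _evalue, start, stop = line.split()
    start, stop = int(start), int(stop)
    # Plus strand (start < stop): upstream lies at lower coordinates.
    # Minus strand (start > stop): upstream lies at higher coordinates.
    # start == stop is treated as minus strand here (an arbitrary choice).
    new_start = start - flank if start < stop else start + flank
    # The new stop is always the old start; the evalue is dropped.
    return gi, new_start, start


def convert(lines, flank=UPSTREAM):
    """Convert an iterable of input lines to tab-delimited output lines."""
    out = []
    for line in lines:
        if line.strip():  # skip blank lines
            gi, new_start, new_stop = upstream_window(line, flank)
            out.append(f"{gi}\t{new_start}\t{new_stop}")
    return out
```

To process a file, pass an open file handle to convert and write out the returned lines. Note that the sketch does not clamp coordinates, so a plus-strand hit within 2000 bp of the start of its sequence would get a new start below 1.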