I got Table A from tBLASTN; a small segment of it is given below. I want to filter the data: where several rows have the same values in column 2 (subject ID), column 9 (s. start) and column 10 (s. end), i.e., the rows are redundant, I want to keep only the row with the lowest e-value. Can anybody help me with an R script for this?
query id            subject id      % identity  alignment length  mismatches  gap opens  q. start  q. end  s. start  s. end  evalue    bit score
Chr1_FK1            ADDD02134481.1  89.77       88                9           0          11        98      1         264     7.00E-23  92.4
Chr2_FK1            ADDD02134481.1  75          88                22          0          11        98      1         264     3.00E-20  85.5
Chr2_FK3            ADDD02134481.1  76.14       88                21          0          11        98      1         264     6.00E-21  87.4
ENSGALG00000028120  ADDD02134481.1  76.14       88                21          0          11        98      1         264     5.00E-21  87.4
Chr2_FK1            ADDD02198275.1  78.41       88                19          0          11        98      1         264     3.00E-21  87.4
Chr2_FK3            ADDD02198275.1  79.55       88                18          0          11        98      1         264     5.00E-22  89.7
ENSGALG00000028120  ADDD02198275.1  78.41       88                19          0          11        98      1         264     4.00E-22  89.7
ChrUn2_FK2          ADDD02198275.1  78.41       88                19          0          11        98      1         264     2.00E-21  87.8
ChrUn2_FK3          ADDD02198275.1  78.41       88                19          0          11        98      1         264     3.00E-21  87.4
ChrUn2_FK4          ADDD02198275.1  79.55       88                18          0          11        98      1         264     5.00E-22  89.7
ENSGALG00000027303  ADDD02271118.1  89.69       97                10          0          4         100     1         291     3.00E-41  139
Chr27_FK34          ADDD02271118.1  88.66       97                11          0          4         100     1         291     5.00E-40  136
Chr27_FK35          ADDD02271118.1  88.66       97                11          0          4         100     1         291     1.00E-40  137
Chr27_FK36          ADDD02271118.1  88.66       97                11          0          4         100     1         291     1.00E-40  137
I used the following script, but it shows the following error
Please check the headers once you import the data and tweak the query, substituting the header names in my query with the ones actually found in the data frame.
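In case it helps, here is a minimal base R sketch of the filtering itself: sort by e-value, then keep the first row for each subject ID / s. start / s. end combination. It assumes the tabular output has no header row, so the column names (`query_id`, `subject_id`, `s_start`, and so on) are labels I made up for illustration and should be replaced with whatever your data frame actually uses. The six rows are copied from the sample table in the question so the example runs on its own.

```r
# Column labels are assumptions for illustration; rename to match your data.
cols <- c("query_id", "subject_id", "pct_identity", "aln_length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score")

# A few rows from the sample table, inlined so the sketch is self-contained.
# For a real file, something like read.table("your_file.txt", col.names = cols)
# would take the place of the text = argument (the file name is hypothetical).
txt <- "Chr1_FK1 ADDD02134481.1 89.77 88 9 0 11 98 1 264 7.00E-23 92.4
Chr2_FK1 ADDD02134481.1 75 88 22 0 11 98 1 264 3.00E-20 85.5
Chr2_FK3 ADDD02198275.1 79.55 88 18 0 11 98 1 264 5.00E-22 89.7
ENSGALG00000028120 ADDD02198275.1 78.41 88 19 0 11 98 1 264 4.00E-22 89.7
ENSGALG00000027303 ADDD02271118.1 89.69 97 10 0 4 100 1 291 3.00E-41 139
Chr27_FK34 ADDD02271118.1 88.66 97 11 0 4 100 1 291 5.00E-40 136"

hits <- read.table(text = txt, col.names = cols, stringsAsFactors = FALSE)

# Sort by e-value (ascending) so the best hit in each group comes first.
hits <- hits[order(hits$evalue), ]

# duplicated() flags every row after the first one seen for a given
# (subject_id, s_start, s_end) combination; dropping those keeps the
# lowest e-value row per combination.
key <- hits[, c("subject_id", "s_start", "s_end")]
filtered <- hits[!duplicated(key), ]

print(filtered)
```

If two rows in a group share the same lowest e-value, only one of them is kept (whichever comes first after sorting), so add a tie-breaker such as bit score to `order()` if that matters for your analysis.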