Hello, I have 10 sample files, each containing multiple FASTA sequences. I ran an "all_vs_all" blastn across all the samples for a specific analysis, so I have output files such as sample_1_vs_sample_2.tsv, sample_2_vs_sample_1.tsv, and so on. Now I want to filter out all the duplicate and reverse-duplicate BLAST hits and keep only the unique hits for each sample file against all the others. For example, consider the following scenario:
query subject pident
108.tig00000003_nlr_1:522-1277 108.tig00000003_nlr_3:513-1256 92.063
108.tig00000003_nlr_1:522-1277 108.tig00000005_nlr_1:524-1267 88.243
108.tig00000003_nlr_1:522-1277 108.tig00000005_nlr_2:524-1267 85.789
108.tig00000003_nlr_2:495-1265 108.tig00000003_nlr_4:3-374 98.387
108.tig00000003_nlr_3:513-1256 108.tig00000005_nlr_1:524-1267 94.631
108.tig00000003_nlr_3:513-1256 108.tig00000003_nlr_1:522-1277 92.063
108.tig00000003_nlr_3:513-1256 108.tig00000005_nlr_2:524-1267 88.636
108.tig00000003_nlr_4:3-374 108.tig00000003_nlr_2:495-1265 98.387
108.tig00000005_nlr_1:524-1267 108.tig00000003_nlr_3:513-1256 94.631
108.tig00000005_nlr_1:524-1267 108.tig00000005_nlr_2:524-1267 88.503
108.tig00000005_nlr_1:524-1267 108.tig00000003_nlr_1:522-1277 88.243
108.tig00000005_nlr_2:524-1267 108.tig00000003_nlr_3:513-1256 88.667
108.tig00000005_nlr_2:524-1267 108.tig00000005_nlr_1:524-1267 88.533
108.tig00000005_nlr_2:524-1267 108.tig00000003_nlr_1:522-1277 85.827
108.tig00000008_nlr_1:1019-1360 108.tig00000110_nlr_1:1005-1346 88.824
108.tig00000009_nlr_5:618-1259 108.tig00000013_nlr_1:739-1290 81.703
108.tig00000010_nlr_3:1981-2334 108.tig00000010_nlr_4:1026-1376 85.507
You can see that
108.tig00000003_nlr_1:522-1277 108.tig00000003_nlr_3:513-1256 92.063
and
108.tig00000003_nlr_3:513-1256 108.tig00000003_nlr_1:522-1277 92.063
on the first and sixth lines are an example of a reverse duplicate. I want to keep only one hit from each such pair and remove the other occurrences, where the same two sequences appear with query and subject swapped. How can I do that in R?
Thank you.
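To make the request concrete, here is a minimal base R sketch of one way this could be done, not necessarily the best one. It assumes the table has the three columns shown above with a header row. Real BLAST tabular output usually has no header, so the column names would need to be set when reading the file. The names `hits`, `pair_key` and `unique_hits` are just illustrative. The idea is to build an order-independent key from the two IDs, so that A-vs-B and B-vs-A get the same key, and then keep the first row per key.

```r
# Toy subset of the hits shown above (whitespace-separated, with a header row).
hits <- read.table(text = "
query subject pident
108.tig00000003_nlr_1:522-1277 108.tig00000003_nlr_3:513-1256 92.063
108.tig00000003_nlr_1:522-1277 108.tig00000005_nlr_1:524-1267 88.243
108.tig00000003_nlr_3:513-1256 108.tig00000003_nlr_1:522-1277 92.063
108.tig00000005_nlr_1:524-1267 108.tig00000003_nlr_1:522-1277 88.243
108.tig00000008_nlr_1:1019-1360 108.tig00000110_nlr_1:1005-1346 88.824
", header = TRUE, stringsAsFactors = FALSE)

# Order-independent key: the alphabetically smaller ID always comes first,
# so a hit and its reverse duplicate end up with the same key.
pair_key <- paste(pmin(hits$query, hits$subject),
                  pmax(hits$query, hits$subject),
                  sep = "\t")

# Keep only the first row seen for each unordered query/subject pair.
unique_hits <- hits[!duplicated(pair_key), ]
print(unique_hits)
```

Note that this compares only the query and subject columns, not pident. In the example above, some reverse pairs have slightly different identities (88.636 vs 88.667), so they would not be caught by a comparison of whole rows. If the better-scoring direction should be the one kept, the table could be sorted by pident in decreasing order before building the key.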
Wow. Thank you so much for this solution. Just FYI, I also found a solution which is as follows: