Hi
I'm trying to find a way to plot the results of my blastn script (.table output file).
The output file is quite big since I had to characterize more than 200 gRNA target sites.
The file also contains a lot of duplicated hits in the subject column, and I only want to keep the best hits (in the BLAST script I set the maximum e-value to 0).
Below is an example. As you can see, for a single gRNA I have multiple results from different subjects that refer to similar sequences; I would like to retain only the best alignment.
kmer_100288745.0 gi|57165544|gb|AY754180.1| Anopheles gambiae clone GAM_RSP30 satellite AgY477, partial sequence; junction region, complete sequence; and satellite AgY53B, partial sequence 100 25 0 0 1 25 99 123 7.23E-07 50.1
kmer_100288745.0 gi|57165543|gb|AY754179.1| Anopheles gambiae clone GAM_RSP27 satellite AgY477, partial sequence; junction region, complete sequence; and satellite AgY53B, partial sequence 100 25 0 0 1 25 99 123 7.23E-07 50.1
kmer_100288745.0 gi|57165542|gb|AY754178.1| Anopheles gambiae clone GAM_RSP22 satellite AgY477, partial sequence; junction region, complete sequence; and satellite AgY53B, partial sequence 100 25 0 0 1 25 99 123 7.23E-07 50.1
kmer_100288745.0 gi|57165541|gb|AY754177.1| Anopheles gambiae clone GAM_RSP24 satellite AgY477, partial sequence; junction region, complete sequence; and satellite AgY53B, partial sequence 100 25 0 0 1 25 98 122 7.23E-07 50.1
kmer_100288745.0 gi|57165540|gb|AY754176.1| Anopheles gambiae clone GAM_RSP3 satellite AgY477, partial sequence; junction region, complete sequence; and satellite AgY53B, partial sequence 100 25 0 0 1 25 99 123 7.23E-07 50.1
Do you have any suggestions on how to use R to make a nice plot?
Are there any tools or packages?
Thanks!
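For the "retain only the best alignment" part, here is a minimal sketch in Python (not R, but the same logic carries over). It assumes the .table file is tab-separated with the 13 columns that the example rows appear to have (query, subject ID, subject title, percent identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). That layout, the column names, and the `best_hits` helper are my assumptions, so adjust them to your actual output format. "Best" here means highest bit score, then lowest e-value, and on a full tie the first hit seen is kept.

```python
# Assumed column layout, inferred from the example rows (not confirmed).
COLUMNS = ["qseqid", "sseqid", "stitle", "pident", "length", "mismatch",
           "gapopen", "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Keep one hit per query: highest bit score, then lowest e-value.

    On an exact tie the first hit encountered is kept.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        # Smaller key is better: negate the bit score so higher scores win.
        key = (-float(row["bitscore"]), float(row["evalue"]))
        current = best.get(row["qseqid"])
        if current is None or key < current[0]:
            best[row["qseqid"]] = (key, row)
    return {query: row for query, (_, row) in best.items()}


# Tiny made-up input, only to show the behaviour (not real BLAST output).
demo_lines = [
    "kmer_A\tsubj1\tmade-up title one\t96\t25\t1\t0\t1\t25\t99\t123\t1e-05\t46.1",
    "kmer_A\tsubj2\tmade-up title two\t100\t25\t0\t0\t1\t25\t99\t123\t7.23E-07\t50.1",
    "kmer_A\tsubj3\tmade-up title three\t100\t25\t0\t0\t1\t25\t98\t122\t7.23E-07\t50.1",
    "kmer_B\tsubj1\tmade-up title one\t100\t25\t0\t0\t1\t25\t10\t34\t7.23E-07\t50.1",
]

for query, row in best_hits(demo_lines).items():
    print(query, row["sseqid"], row["evalue"], row["bitscore"], sep="\t")
```

With a real file you would pass the open file handle instead of `demo_lines`, for example `best_hits(open("results.table"))`, where the file name is a placeholder. Note that in the example above all five hits are exact ties, so this would simply keep the first one listed.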
What sort of information would you like to plot? For example, the distribution of e-values, percent identity, coverage, etc.?
I would like to plot a sort of map showing how many times each specific ID was found, so that I can compare the matches from different kmers.
Column1
kmer-1 --> matches Sat1 1000 times
Column2
kmer-2 --> matches Sat3 300 times
and so on...
It could be a diagram or a density plot; basically anything would do.
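The "kmer-1 matches Sat1 N times" summary can be sketched as a simple count of (kmer, satellite) pairs. The Python sketch below again assumes a tab-separated file whose first three columns are query, subject ID, and subject title. It takes the first word after "satellite" in the title as the label (for example "AgY477" in the rows shown earlier) and falls back to the subject ID when no such word is found. That labelling rule and the helper names are assumptions for illustration, not something BLAST provides.

```python
import re
from collections import Counter

# Hypothetical labelling rule: first word following "satellite" in the title.
SATELLITE = re.compile(r"satellite (\w+)")


def subject_label(sseqid, stitle):
    """Return the satellite name from the title, or the subject ID if none."""
    match = SATELLITE.search(stitle)
    return match.group(1) if match else sseqid


def count_matches(lines):
    """Count how many hits each kmer has against each subject label."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        qseqid, sseqid, stitle = fields[0], fields[1], fields[2]
        counts[(qseqid, subject_label(sseqid, stitle))] += 1
    return counts


def text_bars(counts, width=40):
    """Render the counts as tab-separated rows with a crude text bar."""
    if not counts:
        return []
    top = max(counts.values())
    rows = []
    for (kmer, label), n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        rows.append(f"{kmer}\t{label}\t{n}\t{bar}")
    return rows


# Made-up input, only to show the shape of the result.
demo_lines = [
    "kmer_A\tid1\tclone X satellite Sat1, partial sequence",
    "kmer_A\tid2\tclone Y satellite Sat1, partial sequence",
    "kmer_A\tid3\tclone Z satellite Sat1, partial sequence",
    "kmer_B\tid4\tclone W satellite Sat3, partial sequence",
]

for row in text_bars(count_matches(demo_lines)):
    print(row)
```

The first three columns of each printed row (kmer, label, count) form a small table that could be saved and read into R or any other plotting tool, where a bar chart or a kmer-by-satellite heat map would be a natural fit.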