Hello
I have BLAST results in tabular format. The first three columns are shown below.
The first column is the query id (this example has 2 queries, 60317531 and 60317532).
The second column is the hit against the query sequence and has 3 parts:
a) Swiss-Prot id (sp|Q10CQ1|)
b) gene name (MAD14)
c) organism (ORYSJ)
I would like to make a bar chart of the genes that are present and how many times each one appears for each query. For example, for the first query (60317531):
MAD14 2 times
MAD15 1 time
AGL8 2 times
AP1 3 times
# Fields: query id subject id % identity
gi|60317531|gb|AAX18712.1| sp|Q10CQ1|MAD14_ORYSJ 84.21
gi|60317531|gb|AAX18712.1| sp|P0C5B1|MAD14_ORYSI 83.40
gi|60317531|gb|AAX18712.1| sp|Q6Q9I2|MAD15_ORYSJ 68.91
gi|60317531|gb|AAX18712.1| sp|Q42429|AGL8_SOLTU 57.20
gi|60317531|gb|AAX18712.1| sp|O22328|AGL8_SOLCO 58.00
gi|60317531|gb|AAX18712.1| sp|Q41276|AP1_SINAL 65.79
gi|60317531|gb|AAX18712.1| sp|D7KWY6|AP1_ARALL 65.79
gi|60317531|gb|AAX18712.1| sp|Q8GTF4|AP1C_BRAOB 64.21
gi|60317532|gb|AAX18713.1| sp|B4YPV4|AP1C_BRAOA 64.21
gi|60317532|gb|AAX18713.1| sp|Q96355|1AP1_BRAOT 64.21
gi|60317532|gb|AAX18713.1| sp|P0DI14|AP1_BRARP
The genes should be on the x axis and their frequencies on the y axis, with the query id as the title of the graph.
Is there an automated way to do this? I have ~40,000 queries and ~100 hits per query in a single file.
Thanks in advance
In R, you could use strsplit() on the subject id column to extract the gene name, then call table() on the result to get the number of occurrences of each gene. A simple barplot() should then work.
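
If Python is an option, the same split, count, plot idea could be sketched roughly as below. This is only an illustration, not the R approach named above. It assumes the columns are separated by whitespace, that every query id looks like gi|NUMBER|... and that every subject id looks like sp|ACCESSION|GENE_ORGANISM. The function names count_genes and plot_query are made up for this example, and plot_query assumes matplotlib is installed.

```python
import os
from collections import Counter, defaultdict


def count_genes(lines):
    """Return {query_id: Counter({gene_name: number_of_hits})}."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):   # skip blank and comment lines
            continue
        fields = line.split()
        if len(fields) < 2:                     # need query id and subject id
            continue
        query = fields[0].split("|")[1]         # gi|60317531|gb|... -> 60317531
        entry = fields[1].split("|")[2]         # sp|Q10CQ1|MAD14_ORYSJ -> MAD14_ORYSJ
        gene = entry.rsplit("_", 1)[0]          # MAD14_ORYSJ -> MAD14
        counts[query][gene] += 1
    return counts


def plot_query(query, gene_counts, out_dir="."):
    """Save one bar chart (genes on x, hit counts on y) as <query>.png."""
    import matplotlib
    matplotlib.use("Agg")                       # write files, no display needed
    import matplotlib.pyplot as plt

    genes = sorted(gene_counts)
    fig, ax = plt.subplots()
    ax.bar(genes, [gene_counts[g] for g in genes])
    ax.set_title(query)
    ax.set_xlabel("gene")
    ax.set_ylabel("number of hits")
    fig.savefig(os.path.join(out_dir, query + ".png"))
    plt.close(fig)                              # free memory between queries


# Small demo using the rows posted in the question.
sample = """\
# Fields: query id subject id % identity
gi|60317531|gb|AAX18712.1| sp|Q10CQ1|MAD14_ORYSJ 84.21
gi|60317531|gb|AAX18712.1| sp|P0C5B1|MAD14_ORYSI 83.40
gi|60317531|gb|AAX18712.1| sp|Q6Q9I2|MAD15_ORYSJ 68.91
gi|60317531|gb|AAX18712.1| sp|Q42429|AGL8_SOLTU 57.20
gi|60317531|gb|AAX18712.1| sp|O22328|AGL8_SOLCO 58.00
gi|60317531|gb|AAX18712.1| sp|Q41276|AP1_SINAL 65.79
gi|60317531|gb|AAX18712.1| sp|D7KWY6|AP1_ARALL 65.79
gi|60317531|gb|AAX18712.1| sp|Q8GTF4|AP1C_BRAOB 64.21
gi|60317532|gb|AAX18713.1| sp|B4YPV4|AP1C_BRAOA 64.21
gi|60317532|gb|AAX18713.1| sp|Q96355|1AP1_BRAOT 64.21
gi|60317532|gb|AAX18713.1| sp|P0DI14|AP1_BRARP
"""
counts = count_genes(sample.splitlines())
print(dict(counts["60317531"]))
# {'MAD14': 2, 'MAD15': 1, 'AGL8': 2, 'AP1': 2, 'AP1C': 1}
```

For the real file, passing the open file handle to count_genes and then calling plot_query(query, gene_counts) for each item in counts would write one PNG per query, so about 40,000 files for the data described. Note that this sketch counts AP1C separately from AP1, which gives AP1 2 and AP1C 1 for the first query rather than AP1 3. Merging such names would need a rule of your own.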