I did the pan-genome analysis, from which I got the core, accessory, and unique gene sequences. Now I need to know specifically which strains share more genes among themselves in the accessory gene cluster. Hence, I opted for a strategy where I first extracted all the gene sequences for each strain from the accessory gene cluster and saved them in a single FASTA file. Then I did an ANI analysis. Based on the ANI values, should I consider that the pairs with the top ANI values share more genes among them? Or should I go for blastn?
I need to know: what is the difference between ANI and blastn?
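In short, blastn is an alignment tool, while ANI (average nucleotide identity) is a genome-level summary statistic that is often computed from blastn alignments. In the BLAST-based variant (ANIb), one genome is cut into fragments of roughly 1 kb, each fragment is aligned against the other genome with blastn, and the identities of the best hits that pass identity and alignment-length cut-offs are averaged. A high ANI therefore says that the aligned sequence is very similar; it does not by itself say how many accessory genes two strains share. The Python sketch below is a simplified illustration, not the exact procedure of any particular ANI tool: it assumes standard blastn `-outfmt 6` tabular output and uses the commonly cited ANIb cut-offs (more than 30% identity over more than 70% of the fragment length) as adjustable defaults. The function name `ani_from_blast_tab` is hypothetical.

```python
from statistics import mean


def ani_from_blast_tab(lines, fragment_lengths, min_identity=30.0, min_coverage=0.7):
    """Simplified ANIb-style estimate from blastn tabular output (-outfmt 6).

    lines: tab-separated rows with the 12 default outfmt 6 columns
           (qseqid sseqid pident length mismatch gapopen qstart qend
            sstart send evalue bitscore).
    fragment_lengths: dict mapping each query fragment ID -> its length in nt.
    Returns (ani, fraction_of_fragments_counted); ani is None if no hit passes.
    """
    best = {}  # query fragment ID -> (bitscore, pident, alignment length)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid = cols[0]
        pident, length, bitscore = float(cols[2]), int(cols[3]), float(cols[11])
        # Keep only the best-scoring hit for each query fragment
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, pident, length)

    # Apply the identity and alignment-length (coverage) filters
    kept = [
        pident
        for q, (_, pident, length) in best.items()
        if pident > min_identity and length / fragment_lengths[q] > min_coverage
    ]
    if not kept:
        return None, 0.0
    # ANI = mean identity of the retained best hits; the fraction is
    # the share of all fragments that contributed to that mean
    return mean(kept), len(kept) / len(fragment_lengths)
```

Such a table could come from a command like `blastn -query fragments_A.fna -subject genome_B.fna -outfmt 6 -out A_vs_B.tsv` (file names are placeholders). The second returned value, the fraction of fragments that passed the filters, is closer to "how much sequence is shared" than the ANI value itself.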
Why don't you run a cluster analysis on the accessory gene cluster frequency table (a binary 1/0, i.e. presence/absence, matrix) to find which strains share a similar accessory pan-genome?

@andres.firrincieli I have used the BPGA pipeline for my analysis, and its output does not include the following file:
accessory gene cluster frequency table (binary 1/0, i.e. presence/absence, matrix). In BPGA I can obtain the core sequences, accessory sequences, and unique sequences as three individual files. The sequences of all strains are combined in a single file, and that is where I am facing this problem.
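One way around this is to rebuild the presence/absence table yourself from the single accessory FASTA file. The Python sketch below assumes, hypothetically, that every header carries the cluster ID and the strain name as the first two `|`-separated fields (for example `>cluster_12|strainA|gene_0045`); BPGA's real header layout may differ, so check your file and adapt `parse_header` accordingly. All function names are illustrative. The sketch builds the 1/0 matrix and ranks strain pairs by the number of accessory clusters they share, with Jaccard similarity (shared clusters divided by the union of both strains' clusters) as a size-normalized tie-breaker.

```python
from itertools import combinations


def parse_header(header):
    """HYPOTHETICAL header layout: '>cluster_id|strain|anything else'.
    Adjust this to match the headers in your own BPGA accessory file."""
    cluster_id, strain = header.lstrip(">").split("|")[:2]
    return cluster_id.strip(), strain.strip()


def presence_absence(fasta_lines):
    """Build {strain: set of cluster IDs} from FASTA headers; sequence lines are ignored."""
    clusters_by_strain = {}
    for line in fasta_lines:
        if line.startswith(">"):
            cluster_id, strain = parse_header(line.strip())
            # A set records presence once, even if a strain has paralogs in a cluster
            clusters_by_strain.setdefault(strain, set()).add(cluster_id)
    return clusters_by_strain


def binary_matrix(clusters_by_strain):
    """Return (strains, clusters, rows) where rows[i][j] is 1 if strain i has cluster j."""
    strains = sorted(clusters_by_strain)
    clusters = sorted(set().union(*clusters_by_strain.values()))
    rows = [[1 if c in clusters_by_strain[s] else 0 for c in clusters] for s in strains]
    return strains, clusters, rows


def pairwise_sharing(clusters_by_strain):
    """List (strain_a, strain_b, shared_clusters, jaccard) for every strain pair."""
    results = []
    for a, b in combinations(sorted(clusters_by_strain), 2):
        sa, sb = clusters_by_strain[a], clusters_by_strain[b]
        shared = len(sa & sb)
        union = sa | sb
        jaccard = shared / len(union) if union else 0.0
        results.append((a, b, shared, jaccard))
    # Most shared clusters first; ties broken by Jaccard similarity
    results.sort(key=lambda r: (r[2], r[3]), reverse=True)
    return results
```

The 0/1 rows from `binary_matrix` are the kind of input a cluster analysis needs; for example, they could be passed to `scipy.cluster.hierarchy.linkage(rows, method="average", metric="jaccard")` and drawn as a dendrogram to see which strains group together by accessory gene content.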