Could someone suggest a tool for manipulating FASTA files, calculating distances, and building phylogenetic trees in R?
My specific task is the following. I recoded a VCF into a multi-FASTA file in which each header corresponds to an individual and each SNP is represented by one nucleotide (N if the position was not read, and an IUPAC ambiguity code such as R, M, or S for a heterozygous site). The sequences have the same length for every individual, so the file is effectively an already "aligned" FASTA. I would like to load the FASTA as a data frame, with individuals as row names and one nucleotide per column cell, so that I can operate on the columns: for example, remove all heterozygous SNPs or all positions containing N. After that, I would like to calculate distances between the samples (trying different methods) and build an NJ tree with bootstrap support.
I tried to do this with ape/phangorn but have had no success so far (I tried to load the FASTA as a data frame so I could operate on it, but failed). Maybe my idea is totally wrong and I should choose another tool or approach. If somebody could suggest some tutorials, I would be grateful.
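To make the filtering step described above concrete, here is a minimal sketch in Python (not R, and not using ape or phangorn). It assumes all sequences have equal length, treats anything other than A, C, G, or T as a site to drop, and uses a plain proportion of differing sites (p-distance) as the distance. The function names and the three-sample example are hypothetical, made up only for illustration.

```python
from itertools import combinations


def read_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


def keep_unambiguous_columns(records):
    """Drop every column where any sample is not a plain A/C/G/T.

    Assumes all sequences have the same length (already "aligned").
    """
    names = list(records)
    columns = zip(*(records[n] for n in names))
    kept = [col for col in columns if set(col) <= set("ACGT")]
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def p_distance(a, b):
    """Proportion of sites that differ between two equal-length sequences."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy data: column 3 has a heterozygous R, column 5 has an N.
example = """>ind1
ACGTNA
>ind2
ACRTTA
>ind3
TCGTTG
"""

records = read_fasta(example)
filtered = keep_unambiguous_columns(records)
distances = {
    (a, b): p_distance(filtered[a], filtered[b])
    for a, b in combinations(filtered, 2)
}
```

The same column-wise logic should carry over to a character matrix in R with individuals as rows. The resulting distances would then be passed to whatever NJ and bootstrap implementation is chosen, which this sketch does not cover.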
If you needed individual variant information, why recode the VCF to a multi-fasta? Why not work directly on the VCF file?
And why do you wish to use R to get from what you have to a phylogenetic tree? Why not look for available tools that could go from VCF to tree?
Yeah, I understand that you mean tools like SNPhylo, and I agree that would work. But the key thing in my case is the ability to manipulate the SNPs, namely to remove some of them (heterozygous or unread ones) and see how the tree changes.
So why not work directly on the VCF? You can manipulate the VCF to get variants to a ./. state and use that to re-generate the tree.
What software should I use in this case?
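As a rough illustration of what setting variants to a ./. state could look like, here is a small Python sketch that rewrites heterozygous genotypes as missing in plain VCF text. It is not any particular tool's method. It assumes an uncompressed, tab-separated VCF with a GT entry in the FORMAT column, and the function name and the example record are hypothetical.

```python
def mask_heterozygous(vcf_text):
    """Return VCF text with every heterozygous GT replaced by './.'."""
    out = []
    for line in vcf_text.splitlines():
        # Header lines and blank lines pass through unchanged.
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        fields = line.split("\t")
        fmt = fields[8].split(":")  # FORMAT column, e.g. GT:DP
        if "GT" not in fmt:
            out.append(line)
            continue
        gt_idx = fmt.index("GT")
        for i in range(9, len(fields)):  # sample columns start at index 9
            parts = fields[i].split(":")
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = parts[gt_idx].replace("|", "/").split("/")
            if len(set(alleles)) > 1:
                parts[gt_idx] = "./."
                fields[i] = ":".join(parts)
        out.append("\t".join(fields))
    return "\n".join(out)


# Hypothetical two-sample record: ind1 is heterozygous, ind2 is homozygous.
header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
     "FORMAT", "ind1", "ind2"]
)
record = "\t".join(
    ["chr1", "100", ".", "A", "G", ".", "PASS", ".", "GT:DP", "0/1:12", "1/1:9"]
)
vcf = "\n".join(["##fileformat=VCFv4.2", header, record])

masked = mask_heterozygous(vcf)
```

The masked file could then be fed to a VCF-to-tree tool again to see how the tree changes. Dedicated VCF tools may well offer this kind of filtering directly, so this is only meant to show the idea.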