I am working on 1001 genomes from the 1001 Genomes Project. I wish to implement the VCF files into FASTA files, incorporating the indels etc., so that the genomes are easier to work with. I know about GATK, but is there any other tool that can help me achieve this, preferably in a Python environment?
First, do you really mean ‘1001’ genomes or ‘1000’ genomes? Second, what do you mean by ‘implement’? Do you really mean convert? Finally, it’s not clear how introducing indels into a FASTA file makes any sense. Why would having the information in FASTA format help?
Thank you for your response. The 1001 Genomes Project is a catalog of Arabidopsis thaliana genetic variation. I want to incorporate the SNPs from each VCF into the reference genome, so that I end up with a separate FASTA file for each individual genome, one per VCF.
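To make the request concrete, here is a minimal pure-Python sketch of what "incorporating a VCF into the reference" could look like for one chromosome. The function names `parse_vcf_lines` and `apply_variants` are hypothetical, not from any library. It assumes a single-sample VCF, takes the first ALT allele of every record without looking at the genotype, skips symbolic alleles, and assumes variants do not overlap. For real data a maintained tool (for example `bcftools consensus`) is likely a safer choice; this only illustrates the idea.

```python
def parse_vcf_lines(lines, chrom):
    """Yield (pos, ref, alt) for records on `chrom` (hypothetical helper).

    Assumption: the first ALT allele is used and the genotype is ignored.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        alt = fields[4].split(",")[0]
        if alt in (".", "*") or alt.startswith("<"):
            continue  # skip missing, spanning-deletion and symbolic alleles
        yield int(fields[1]), fields[3], alt


def apply_variants(reference, variants):
    """Return `reference` with each (pos, ref, alt) applied.

    Positions are 1-based, as in VCF. Variants are applied from right to
    left so that an indel does not shift the coordinates of variants that
    are still to be applied. Assumption: variants do not overlap.
    """
    seq = reference
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)].upper() != ref.upper():
            raise ValueError(f"REF allele does not match reference at {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


# Toy example: one SNP, one deletion, one insertion on a made-up sequence.
toy_reference = "ACGTACGT"
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t2\t.\tC\tT\t.\t.\t.",
    "chr1\t4\t.\tTA\tT\t.\t.\t.",
    "chr1\t7\t.\tG\tGCC\t.\t.\t.",
]
toy_consensus = apply_variants(toy_reference, parse_vcf_lines(toy_vcf, "chr1"))
print(toy_consensus)  # ATGTCGCCT
```

Running this per chromosome and per VCF, then writing each result under a FASTA header line, would give one FASTA per individual. Note that once indels are applied, positions in the new sequence no longer line up with reference coordinates.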