Hi everyone,
I have a list of variants (SNPs, in a VCF file) that I'm trying to design allele-specific PCR primers for, using WASP (https://bioinfo.biotec.or.th/WASP).
The input required for WASP is a FASTA file. The problem is that my reference sequence (GRCm38) is massive, so when I create the FASTA file using vcf-consensus (from VCFtools):
cat GRCm38_68.fa | vcf-consensus vcf_file.vcf.gz > out.fa
the output file is 2.8 GB and contains the sequences of entire chromosomes.
Is there any way to get, say, the nearest 20 bp (both upstream and downstream) of each variant in FASTA format?
Kind Regards,
Kyle
Yep, that works - thank you Pierre!
I'm working on getting the co-ordinates of the variants via:
bcftools query -f '%CHROM %POS\n' vcf_file.vcf.gz
and then passing the co-ordinates +/- 25 bp to samtools.
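For anyone following along, the two steps above could be sketched roughly as below. This is a minimal sketch, not a tested pipeline: the function names flank_regions and extract_flanks are made up for illustration, and it assumes samtools faidx is the samtools subcommand used for the extraction and that the chromosome names in the VCF match those in the reference. Note that it returns the reference sequence around each position, not the alternate allele.

```shell
#!/usr/bin/env bash
# Sketch only: flank_regions and extract_flanks are hypothetical helper names.

# Read "CHROM POS" pairs on stdin and print samtools-style regions
# (CHROM:START-END) covering POS +/- flank bp, clamping START at 1.
flank_regions() {
    local flank="${1:-25}"
    awk -v f="$flank" '{
        start = $2 - f
        if (start < 1) start = 1
        print $1 ":" start "-" ($2 + f)
    }'
}

# Pull the reference sequence around every variant in a VCF.
# Assumes the reference FASTA can be indexed by samtools faidx and that
# its chromosome names match those used in the VCF.
extract_flanks() {
    local vcf="$1" ref="$2" flank="${3:-25}"
    bcftools query -f '%CHROM %POS\n' "$vcf" \
        | flank_regions "$flank" \
        | xargs samtools faidx "$ref"
}
```

With those definitions sourced, something like `extract_flanks vcf_file.vcf.gz GRCm38_68.fa 25 > flanks.fa` (flanks.fa being a placeholder output name) should give one short FASTA record per variant. The region arithmetic can be checked without any genome files, e.g. `printf '1 100\n' | flank_regions 25` prints `1:75-125`.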