Hello Everyone,
I would like to modify a reference genome fasta file using a list of position in bed file format. The bed file contains positions like this:
chr1 17716 C G
chr1 17925 A G
chr1 18115 C T
So for example I would like to change the position at chr1 17716 from C to G in the fasta file. I have seen some python packages and a GATK tool for modifying a fasta file using a list of VCF positions but was wondering if there was any tool that would do this using a bed file as input rather than a VCF. Or would it be better to convert the bed file to VCF format by adding a VCF header (I'm not sure if this would work?).
Thanks in advance for your help.
this works well thank you! I am trying to run it changing over a million positions in a 2gb fasta file. Do you recommend any change in approach for this? I'm a little concerned as it produces no intermediate output that I can't tell how long it will take. For example would sorting the example.bed speed it up or does it search through the file for each position without sorting making a difference. The only improvement I can think of would be to sort and split the fasta and bed files into smaller pieces, run independently and then merge at the end.