Masking Variable Sites in a Fasta File
9.4 years ago
Jautis ▴ 580

Hi, I have a fasta file representing a reference genome, and I would like to modify it so that variable sites are masked when I map reads to it. I'm interested in doing this because I have bisulfite reads from several related species, but BSmap and Bismark don't offer an option to mask variable sites while mapping.

The initial genome is in a fasta file. The sites I would like masked are in a vcf file.

Thank you!

fasta vcf • 2.7k views
9.4 years ago

If you can convert your VCF to BED format (see Converting a VCF with SNPs and indels to BED format), you can use the pyfaidx faidx command to mask those sites in your FASTA file, either by replacing them with a default character (-m) or by converting them to lowercase letters (-M):

vcf2bed < variable_sites.vcf | faidx genome.fasta --bed - -m

Note that the -m and -M options modify your FASTA file in place, so you probably want to run the command on a copy rather than on the original.
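
If you would rather not touch the original file at all, the same idea can be sketched in plain Python with no extra dependencies. This is only a rough illustration, not what faidx does internally: the function names (read_vcf_intervals, read_fasta, mask_fasta) are made up for this example, it assumes an uncompressed, tab-delimited VCF whose CHROM values match the first word of each FASTA header, and it loads one sequence at a time into memory. Each record masks every reference base it spans (POS through POS + len(REF) - 1), and the result is written to a new file.

```python
from collections import defaultdict


def read_vcf_intervals(vcf_path):
    """Return {chrom: [(start, end), ...]} as 0-based, half-open intervals."""
    intervals = defaultdict(list)
    with open(vcf_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and VCF header lines
            fields = line.rstrip("\n").split("\t")
            chrom, pos, ref = fields[0], int(fields[1]), fields[3]
            # VCF POS is 1-based; cover every reference base the record spans.
            intervals[chrom].append((pos - 1, pos - 1 + len(ref)))
    return intervals


def read_fasta(fasta_path):
    """Yield (header, sequence) pairs; header excludes the leading '>'."""
    header, chunks = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def mask_fasta(fasta_path, vcf_path, out_path, mask_char="N", width=60):
    """Write a masked copy of fasta_path to out_path.

    With mask_char=None the sites are soft-masked (lowercased) instead of
    being replaced by a character.
    """
    intervals = read_vcf_intervals(vcf_path)
    with open(out_path, "w") as out:
        for header, seq in read_fasta(fasta_path):
            bases = list(seq)
            name = header.split()[0]  # assumption: ID is the first word
            for start, end in intervals.get(name, []):
                for i in range(max(start, 0), min(end, len(bases))):
                    bases[i] = mask_char if mask_char else bases[i].lower()
            masked = "".join(bases)
            out.write(f">{header}\n")
            for i in range(0, len(masked), width):
                out.write(masked[i:i + width] + "\n")
```

For example, mask_fasta("genome.fasta", "variable_sites.vcf", "genome.masked.fasta") would write a hard-masked copy, and passing mask_char=None would lowercase the sites instead. For a large genome the pyfaidx or BED-based route is likely to be more practical than this sketch.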
