Question

Fasta with Common SNPs masked

10.8 years ago

robert • 0

How can I mask SNPs in a sequence based on minor allele frequency (MAF)? The sequence I am interested in is from human build 37 (hg19), and I'd like to mask SNPs with a frequency above 1% or above 5% in dbSNP. Is there a resource out there with common SNPs already masked?

SNP fasta • 4.1k views


Ram · Answer 1 · 2014-06-25

10.8 years ago

Pierre Lindenbaum 165k

Get a BED file of the SNPs you want to discard from the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables?command=start (group: variation, track: All_Snp138, then filter -> create -> avHet).

Then use bedtools maskfasta to mask the reference: http://bedtools.readthedocs.org/en/latest/content/tools/maskfasta.html
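To make the masking step concrete, here is a minimal Python sketch of what hard-masking a sequence with BED intervals amounts to. It is not how bedtools is implemented; `read_bed`, `mask_sequence`, and the `rs_example_*` names are made up for illustration, and the only thing it relies on is that BED coordinates are 0-based and half-open.

```python
def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    intervals = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines some exports include.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals


def mask_sequence(seq, intervals, mask_char="N"):
    """Return seq with every base inside an interval replaced by mask_char."""
    bases = list(seq)
    for start, end in intervals:
        # Clamp to the sequence so out-of-range intervals are ignored safely.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = mask_char
    return "".join(bases)


# Hypothetical toy data: two SNP intervals on a 10-base sequence.
bed = ["chr1\t2\t3\trs_example_1", "chr1\t5\t7\trs_example_2"]
snps = read_bed(bed)
masked = mask_sequence("ACGTACGTAC", snps["chr1"])
print(masked)
```

On a real genome the equivalent call would be along the lines of `bedtools maskfasta -fi reference.fa -bed snps.bed -fo masked.fa`, where the file names are placeholders.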


Thanks, Pierre. That seems like it would work but I can't find any documentation on what the "avHet" filter is. Is that average heterozygosity?

Reply by robert • written 10.8 years ago • updated 3.4 years ago by Ram

Yes. Click on "table description": "Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters."
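Since the question is phrased in terms of MAF while the filter is on heterozygosity, a rough conversion may help. The sketch below assumes a biallelic SNP under Hardy-Weinberg proportions, where expected heterozygosity is 2p(1-p) for minor allele frequency p. dbSNP's avHet is an estimate, so treat the resulting cutoffs as approximate; the function names are made up for illustration.

```python
import math


def het_from_maf(maf):
    """Expected heterozygosity 2p(1-p) for a biallelic SNP with MAF p."""
    return 2.0 * maf * (1.0 - maf)


def maf_from_het(het):
    """Invert 2p(1-p) = het, taking the root with p <= 0.5."""
    if not 0.0 <= het <= 0.5:
        raise ValueError("biallelic heterozygosity must lie in [0, 0.5]")
    return (1.0 - math.sqrt(1.0 - 2.0 * het)) / 2.0


# Approximate avHet cutoffs corresponding to the MAF thresholds in the question.
for maf in (0.01, 0.05):
    print(maf, round(het_from_maf(maf), 4))
```

Under these assumptions, MAF > 1% corresponds to avHet above roughly 0.0198, and MAF > 5% to avHet above roughly 0.095.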

Reply by Pierre Lindenbaum 165k • written 10.8 years ago

Ram · Answer 2 · 2014-06-25

10.8 years ago

Ashutosh Pandey 12k

I have never used this tool but it seems useful for what you want to achieve.

http://genomecomb.sourceforge.net/docs/cg_genome_seq.html

(From the documentation: this command returns the sequences of the genomic regions given in the region file, in FASTA format, to stdout or to an output file. The region file is a tab-delimited file with at least the following columns: chromosome, begin, end. RepeatMasker repeats are soft-masked (lower case) in the output sequences. Optionally you can hard-mask repeats, and soft- or hard-mask known (dbSNP) variants based on frequency.)

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/snp138Mask/ (Already masked reference fasta based on dbSNP)


Thanks, Ashutosh. I'll take a look at this! I was originally using the fastas from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/snp138Mask/ but it included too many SNPs. I only want to mask the high-frequency SNPs and preferably only the SNPs which are high-frequency in Asian populations.
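One way to sketch the population-specific filter: if the variants are available as a VCF whose INFO column carries a per-population allele frequency, the common ones can be pulled out as BED intervals and then used for masking. Everything named here is an assumption for illustration: the `EAS_AF` tag, the `rs_*` IDs, and the function `common_snps_to_bed`. Check which tag your VCF actually provides. Note that this thresholds on the alternate-allele frequency, not strictly the minor allele.

```python
def common_snps_to_bed(vcf_lines, af_tag, min_af):
    """Yield (chrom, start, end, id) for records whose INFO af_tag >= min_af."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref = fields[0], int(fields[1]), fields[2], fields[3]
        # INFO is a ';'-separated list; keep only KEY=VALUE entries.
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        if af_tag not in info:
            continue
        # Multi-allelic records list one value per ALT allele; use the largest.
        values = [float(x) for x in info[af_tag].split(",") if x != "."]
        if values and max(values) >= min_af:
            # VCF POS is 1-based; BED is 0-based, half-open.
            yield chrom, pos - 1, pos - 1 + len(ref), vid


# Hypothetical records; the EAS_AF / EUR_AF tag names are assumptions.
vcf = [
    "##fileformat=VCFv4.1",
    "1\t1000\trs_a\tA\tG\t.\tPASS\tEAS_AF=0.12;EUR_AF=0.01",
    "1\t2000\trs_b\tC\tT\t.\tPASS\tEAS_AF=0.002;EUR_AF=0.30",
    "1\t3000\trs_c\tG\tA\t.\tPASS\tEUR_AF=0.2",
]
bed_rows = list(common_snps_to_bed(vcf, "EAS_AF", 0.05))
print(bed_rows)
```

Only the first record passes the 5% cutoff for the chosen tag. The chromosome names in the resulting BED need to match those in the FASTA being masked.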

Reply by robert • written 10.8 years ago • updated 3.4 years ago by Ram