Is there a quick method to extract regularly spaced features/SNPs from a VCF file?
7.9 years ago
mmats010

I am attempting to build a linkage map using variants from a sexual population that we have whole-genome sequenced. Marker density is the least of our problems, so I would like to reduce the ~400,000 SNPs I have high confidence in to ~2,000. However, I would like them to be regularly spaced.

In my head, this means: Select SNP, move 100kb down the contig, select the next SNP after the 100kb interval, move another 100kb, select another SNP, repeat. We aren't quite sure of the recombination rate, and some of my contigs are shorter than 100kb (and I'd like at least 1, maybe 2, markers from them).
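The procedure described above (keep a SNP, skip ahead 100 kb, keep the next SNP past that point) is a greedy minimum-distance thinning. Below is a minimal Python sketch of it, assuming an uncompressed VCF whose records are sorted by position within each contig; the function name `thin_vcf` and the toy records are hypothetical. Because the first record on every contig is always kept, contigs shorter than the interval still contribute one marker.

```python
import io
from typing import Dict, Iterable, Iterator


def thin_vcf(lines: Iterable[str], min_dist: int = 100_000) -> Iterator[str]:
    """Greedy thinning (illustrative sketch, not a library function).

    Header lines pass through unchanged. A record is kept only if it lies
    at least `min_dist` bp past the last kept record on the same contig;
    the first record on each contig is always kept.
    Assumes records are sorted by position within each contig.
    """
    last_kept: Dict[str, int] = {}  # contig -> position of last kept SNP
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos_i = int(pos)
        prev = last_kept.get(chrom)
        if prev is None or pos_i - prev >= min_dist:
            last_kept[chrom] = pos_i
            yield line


# Hypothetical toy VCF (only the first five columns, for brevity).
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "ctg1\t100\t.\tA\tG\n"
    "ctg1\t50000\t.\tC\tT\n"
    "ctg1\t100100\t.\tG\tA\n"
    "ctg1\t150000\t.\tT\tC\n"
    "ctg2\t10\t.\tA\tC\n"
)
kept = [l.split("\t")[:2] for l in thin_vcf(io.StringIO(example))
        if not l.startswith("#")]
print(kept)  # [['ctg1', '100'], ['ctg1', '100100'], ['ctg2', '10']]
```

On a real file you would stream it line by line, e.g. `open("calls.vcf")` (a placeholder name), and write the yielded lines to a new file.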

I know I could tell GATK to give me a random subset of my markers, but I'd like something more methodical. The window-based clustering criteria in VariantFiltration also don't seem very useful, since they use a sliding window rather than fixed bins. Does anyone know of a function in a common toolset that can do this? I'm not proficient in Perl or Python, so I'm unsure how to write my own script.
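For comparison, the "binned window" idea (as opposed to a sliding window) can be sketched as keeping the first SNP that falls in each fixed, non-overlapping 100 kb bin of a contig. This is a hypothetical helper, `first_snp_per_bin`, under the same sorted-VCF assumption; it is not a GATK feature.

```python
import io
from typing import Iterable, Iterator, Set, Tuple


def first_snp_per_bin(lines: Iterable[str], bin_size: int = 100_000) -> Iterator[str]:
    """Keep the first record in each bin [k*bin_size, (k+1)*bin_size)
    of every contig (illustrative sketch). Header lines pass through.
    Assumes records are sorted by position within each contig."""
    seen: Set[Tuple[str, int]] = set()  # (contig, bin index) already sampled
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        key = (chrom, int(pos) // bin_size)
        if key not in seen:
            seen.add(key)
            yield line


# Hypothetical toy records chosen to straddle a bin boundary.
example = (
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "ctg1\t90000\t.\tA\tG\n"
    "ctg1\t95000\t.\tC\tT\n"
    "ctg1\t110000\t.\tG\tA\n"
    "ctg1\t250000\t.\tT\tC\n"
    "ctg2\t10\t.\tA\tC\n"
)
kept = [l.split("\t")[:2] for l in first_snp_per_bin(io.StringIO(example))
        if not l.startswith("#")]
print(kept)
```

Unlike greedy thinning, fixed bins do not guarantee a minimum spacing: the SNPs at 90,000 and 110,000 are both kept because they fall on either side of a bin boundary, while the one at 95,000 is dropped.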

I have a .intervals file and a multisample VCF file, as well as a .txt table of everything in the VCF file, and I intend to load the data into JoinMap to perform the actual mapping.

Thanks,

Mike

SNP gatk vcf sequencing mapping
7.9 years ago

I'm not sure I fully understand why you want to arbitrarily select SNPs at a given distance, but I think you may be looking for the "--thin" flag in VCFtools. Also take a look at this. I have never used it, but I hope it helps.
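If VCFtools is installed, its "--thin" option performs this kind of distance-based thinning directly. The sketch below only builds and runs the command from Python; the input and output names (`calls.vcf`, `thinned`) are placeholders, and the helper names are hypothetical. With "--recode", VCFtools writes the filtered sites to `<prefix>.recode.vcf`, and "--recode-INFO-all" keeps all INFO fields.

```python
import shutil
import subprocess
from typing import List


def vcftools_thin_cmd(vcf: str, min_dist: int, out_prefix: str) -> List[str]:
    """Build a VCFtools command that thins sites so no two kept sites are
    closer than `min_dist` bp (hypothetical helper)."""
    # VCFtools reads gzip-compressed input with --gzvcf, plain text with --vcf.
    flag = "--gzvcf" if vcf.endswith(".gz") else "--vcf"
    return ["vcftools", flag, vcf, "--thin", str(min_dist),
            "--recode", "--recode-INFO-all", "--out", out_prefix]


def run_thin(vcf: str, min_dist: int, out_prefix: str) -> None:
    """Run the command; raises if vcftools is missing or exits non-zero."""
    if shutil.which("vcftools") is None:
        raise FileNotFoundError("vcftools not found on PATH")
    subprocess.run(vcftools_thin_cmd(vcf, min_dist, out_prefix), check=True)


print(" ".join(vcftools_thin_cmd("calls.vcf", 100_000, "thinned")))
```

The interval is the tuning knob: a larger value gives fewer, more widely spaced markers, so it can be adjusted until the output is near the ~2,000 markers wanted.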


Thanks, this did the trick. I mostly want to trim the dataset down because JoinMap 4.1 can't handle one as large as mine (JoinMap 5.0 can, however). So I want to give it a manageable dataset that it can then trim further internally. I also want my markers to be at least somewhat evenly spaced.
