Create a UCSC chain file from a VCF
10.6 years ago
rubic ▴ 270

Hi,

I have a VCF file (containing indels as well as SNPs) obtained from an individual's DNA (from a whole-genome microarray) with respect to the reference genome. Is there any tool out there that will produce a UCSC chain file with respect to the reference genome, given the VCF file?

genome vcf chain • 5.1k views

Huh? A chain file maps regions between two assemblies. How would you use a VCF file to build such a chain file?


The indels in the VCF file are, in essence, how the 'assembled' genome of the individual differs from the reference assembly in terms of coordinates. So given these indels, and hence a chain file, a personal genome can be created.


A VCF file is a list of local differences, many of which may be inexact, especially the longer variants. It is very unlikely that such a file contains information accurate enough to create an entire assembly. These files are not designed to correct for cumulative errors, yet when you remap intervals, a single error can affect all subsequent coordinates.
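To make the point about cumulative errors concrete, here is a minimal Python sketch. The function name `lift_position` and the toy variants are hypothetical, made up for illustration. It assumes left-anchored, non-overlapping indels given as `(pos, ref, alt)` with 1-based VCF positions, and it does not handle query positions that fall inside a deleted span.

```python
def lift_position(pos, indels):
    """Map a 1-based reference position to the personal-genome coordinate.

    indels: iterable of (vcf_pos, ref_allele, alt_allele), assumed
    left-anchored and non-overlapping (an assumption of this sketch).
    """
    shift = 0
    for vpos, ref, alt in indels:
        # A variant whose reference span ends before `pos` moves
        # everything downstream by the difference in allele lengths.
        if vpos + len(ref) - 1 < pos:
            shift += len(alt) - len(ref)
    return pos + shift


# Hypothetical calls: a 2 bp deletion at 10 and a 2 bp insertion at 50.
true_calls = [(10, "ACG", "A"), (50, "T", "TGG")]
# Same calls, but the deletion length is miscalled by a single base.
miscalled = [(10, "ACGT", "A"), (50, "T", "TGG")]

for p in (5, 20, 60):
    print(p, lift_position(p, true_calls), lift_position(p, miscalled))
```

Positions upstream of the miscalled deletion agree, while every position downstream of it is off by one, which is the effect described above.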


That may be true or not depending on what data the VCF was generated from.

I'm just asking whether anyone knows of a script or tool that takes a VCF (produced for a single individual's sample) and creates a chain file based on the indels in the VCF (the homozygous indels, if we are to be accurate), to save me writing one myself. That's all.
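For readers wondering what such a script would involve, the following is a rough Python sketch of the idea, not a replacement for an existing tool. The function name `indels_to_chain`, the example variants, and the use of the aligned base count as the chain score are all assumptions made for illustration. It expects variants already filtered (for example to homozygous calls), sorted, non-overlapping, and left-anchored as `(pos, ref, alt)` with 1-based VCF positions, for a single chromosome.

```python
def indels_to_chain(chrom, ref_len, variants, chain_id=1):
    """Return UCSC chain text mapping the reference to a personal genome.

    variants: sorted, non-overlapping (vcf_pos, ref, alt) tuples whose
    first base is a shared anchor base (an assumption of this sketch).
    """
    blocks = []        # (ungapped size, gap in reference, gap in personal genome)
    ref_cursor = 0     # 0-based reference start of the current ungapped block
    query_len = ref_len
    for pos, ref, alt in variants:
        dt, dq = len(ref) - 1, len(alt) - 1
        if dt == 0 and dq == 0:
            continue   # SNPs do not shift coordinates
        block_end = pos  # 0-based exclusive end; includes the anchor base
        if block_end <= ref_cursor:
            raise ValueError(f"overlapping or unsorted variant at {pos}")
        blocks.append((block_end - ref_cursor, dt, dq))
        ref_cursor = block_end + dt
        query_len += dq - dt
    last = ref_len - ref_cursor
    if last < 0:
        raise ValueError("variant extends past the end of the reference")
    # Placeholder score: number of aligned bases (an assumption).
    score = sum(size for size, _, _ in blocks) + last
    header = (f"chain {score} {chrom} {ref_len} + 0 {ref_len} "
              f"{chrom} {query_len} + 0 {query_len} {chain_id}")
    lines = [header] + [f"{s} {dt} {dq}" for s, dt, dq in blocks] + [str(last), ""]
    return "\n".join(lines)


# Hypothetical example: a 100 bp contig with one deletion and one insertion.
chain_text = indels_to_chain("chr1", 100, [(10, "ACG", "A"), (50, "T", "TGG")])
print(chain_text)
```

Each data line gives an ungapped block size followed by the gap on the reference side and the gap on the personal-genome side; the final line is the last block on its own. A real script would loop over all chromosomes and read the variants from the VCF itself.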


There are really two different questions here.

  1. Is the VCF format, theoretically speaking, sufficiently well specified to unambiguously describe all the differences between genomes?
  2. Do current tools fill in the VCF format in sufficient detail to allow a full genome reconstruction of acceptable accuracy?

I don't actually know the answer to either of these, but my guess would be no.

I'd be happy to learn more about what actually happens in practice.

10.6 years ago
rubic ▴ 270

There's a very simple answer to my question: the AlleleSeq package.

7.8 years ago
Malcolm.Cook ★ 1.5k

I'm late to the Q&A party, but I hope the following list of options is a useful update for the contemporary searcher:

  • vcf-consensus (part of VCFtools): "Apply VCF variants to a fasta file to create consensus sequence."

  • FastaAlternateReferenceMaker (part of the Genome Analysis Toolkit, GATK): "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."

  • vcf2chain (part of g2gtools): "create a Chain file from a VCF file"

  • vcf2diploid (part of AlleleSeq), as suggested above by @rubic
