Understanding the VCF files
2.1 years ago
Bane ▴ 30

Dear Biostars, please help us understand something. When and why should one create a VCF file? Is it mandatory to create a VCF file for every project, or only when a different reference genome is used? My colleague and I are doing SNP analysis on data from a Mutation Accumulation Lines (MAL) experiment. We are following a workflow in which the earlier MAL experiments used a VCF file created for reference genome version 3.0. We are using reference genome 4.0, so we are creating a new VCF file. What we are wondering is: can we also use the VCF file created for reference genome 4.0 with data from another MAL experiment that uses the same reference genome? Does the file depend strongly on the reference genome, or on the data we are using?

MAL VCF • 3.8k views

The following applies only if you have exactly the same sample input and the same reference genome; if either differs, make new VCF files.

VCF stands for Variant Call Format; a VCF file contains all of the variants called between a sample and a reference. If you are using the same reference they used to call their variants, then yes, you can use the same VCF. If you are using a newer reference, do not use the VCF file from the older version: the two reference genomes may differ, and the variant calls in the VCF would differ with them.
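One way to check which reference a VCF was called against is to read its header. Below is a minimal Python sketch, assuming the VCF header carries the optional `##reference` and `##contig` meta-lines (not every variant caller writes them). The example header, contig names, and lengths are made up for illustration, as are the helper names `vcf_header_info` and `matches_reference`.

```python
import re

def vcf_header_info(vcf_text):
    """Collect the ##reference value and ##contig IDs/lengths from VCF header lines."""
    reference = None
    contigs = {}
    for line in vcf_text.splitlines():
        if not line.startswith("##"):
            break  # meta-lines end at the #CHROM column header
        if line.startswith("##reference="):
            reference = line.split("=", 1)[1]
        elif line.startswith("##contig=<"):
            m_id = re.search(r"ID=([^,>]+)", line)
            m_len = re.search(r"length=(\d+)", line)
            if m_id:
                contigs[m_id.group(1)] = int(m_len.group(1)) if m_len else None
    return reference, contigs

def matches_reference(vcf_contigs, ref_lengths):
    """True if the VCF declares contigs and each one exists in the reference with the same length."""
    if not vcf_contigs:
        return False  # nothing declared, so nothing can be confirmed
    return all(
        name in ref_lengths and (length is None or ref_lengths[name] == length)
        for name, length in vcf_contigs.items()
    )

# Hypothetical header and contig lengths, for illustration only.
EXAMPLE_VCF = """##fileformat=VCFv4.2
##reference=file:///refs/genome_v4.0.fa
##contig=<ID=chr1,length=1000>
##contig=<ID=chr2,length=800>
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t101\t.\tA\tG\t50\tPASS\t.
"""
REF_V4 = {"chr1": 1000, "chr2": 800}  # e.g. taken from the reference's .fai index
REF_V3 = {"chr1": 990, "chr2": 800}

reference, contigs = vcf_header_info(EXAMPLE_VCF)
print(reference)
print(matches_reference(contigs, REF_V4), matches_reference(contigs, REF_V3))
```

Matching contig names and lengths is only a necessary condition: it shows the VCF is compatible with a given assembly, not that its variants apply to a different set of samples.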


Thank you very much. I was hoping to hear that. You made my day!

2.1 years ago
Jeremy ▴ 930

A VCF file contains variants that have been detected after mapping reads to a reference genome. This file is totally dependent on the data that you are using, so for each project you can use the same reference genome, but you'll need to make a new VCF file. See the links below for an explanation of the VCF file format and a sample GATK workflow for finding germline variants. Note the order of steps: FASTQ reads are aligned to a reference genome, which creates a SAM (or BAM) file; variants are then called from that alignment, producing the VCF file.
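The FASTQ to BAM to VCF order can be sketched as a short Python helper that only builds the shell commands, without running them. This is not the GATK workflow linked below: it swaps in bwa, samtools, and bcftools as a simpler illustration, assumes paired-end reads, and uses hypothetical file and sample names.

```python
import shlex

def build_variant_calling_commands(ref_fasta, fastq_1, fastq_2, sample):
    """Return shell command strings for a FASTQ -> BAM -> VCF run (nothing is executed)."""
    ref, fq1, fq2 = (shlex.quote(p) for p in (ref_fasta, fastq_1, fastq_2))
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf.gz")
    return [
        # 1. Index the reference once per reference genome version.
        f"bwa index {ref}",
        # 2. Align the reads and write a coordinate-sorted BAM.
        f"bwa mem {ref} {fq1} {fq2} | samtools sort -o {bam} -",
        # 3. Index the BAM.
        f"samtools index {bam}",
        # 4. Call variants against the same reference, producing a compressed VCF.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]

if __name__ == "__main__":
    # Hypothetical inputs: one MAL line sequenced as paired-end reads.
    for cmd in build_variant_calling_commands(
        "genome_v4.0.fa", "line5_R1.fastq.gz", "line5_R2.fastq.gz", "line5"
    ):
        print(cmd)
```

The reference FASTA appears in both the alignment and the calling step, and the reads appear in the alignment step, which is why the resulting VCF depends on both the reference version and the data.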

VCF Format

GATK Workflow


That is a slightly different answer from what Amy said above. So here is another question: is it possible to create a VCF file using both pooled sequencing and individual sequencing data? That is what we have been struggling with for 2 months.


I believe you can use both. For example, see the following paper:

Pool-seq Benchmarking


I guess the question is also: are your samples different from theirs?


Yes, the pool-seq data contains the ancestral population, while the individual sequencing data contains the fifth generation.
