Individual VCF files from main VCF file
4
1
9.8 years ago
win ▴ 990

Hi all,

The 1000 Genomes Project distributes one large VCF file in which every sample is represented as a column.

I want to generate one VCF file for each sample. How can this be done?

Also, can the script that does this stream the main VCF, so that I don't have to store it locally?

Thanks in advance

VCF • 8.3k views

5
7.7 years ago

already stated here and here:

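# Split every multi-sample VCF in the current directory into one bgzipped VCF per sample.
for file in *.vcf *.vcf.gz; do          # narrower than *.vcf*, which also matches .tbi/.csi index files
  [ -e "$file" ] || continue            # skip a pattern that matched nothing
  for sample in $(bcftools query -l "$file"); do
    # -s keeps only this sample; -c1 drops sites where it carries no non-reference allele
    bcftools view -c1 -Oz -s "$sample" -o "${file/.vcf*/.$sample.vcf.gz}" "$file"
  done
done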
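
The loop above works on local files and does not address the streaming part of the question. What follows is a minimal sketch, not a tested recipe, and it assumes your bcftools build can read remote files (htslib compiled with libcurl), so that a URL can be given wherever a file name is expected. The function name `split_vcf_per_sample` and the output naming are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write one bgzipped VCF per sample from a VCF given as a local path or a URL.
# split_vcf_per_sample is an illustrative helper name, not part of bcftools.
split_vcf_per_sample() {
  local src=$1 outdir=${2:-.} sample
  mkdir -p "$outdir"
  # "bcftools query -l" prints one sample name per line
  bcftools query -l "$src" | while IFS= read -r sample; do
    # Each call re-reads the whole source (for a URL: downloads it again)
    bcftools view -c1 -Oz -s "$sample" -o "$outdir/$sample.vcf.gz" "$src"
  done
}
```

It would be called as `split_vcf_per_sample "https://example.org/all_samples.vcf.gz" per_sample`, where the URL is a placeholder. Nothing is stored locally except the per-sample outputs, but the source is read once per sample, so for a file with thousands of samples a single local download is likely to be faster.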

2
9.8 years ago

I wrote Biostar130456 https://github.com/lindenb/jvarkit/wiki/Biostar130456

$   curl -sL "https://raw.githubusercontent.com/arq5x/bedtools2/bc2f97d565c36a82c1a0b12f570fed4398001e5f/test/map/test.vcf" |\
    java -jar dist/biostar130456.jar -x -z -p "sample.__SAMPLE__.vcf.gz" 
sample.NA00003.vcf.gz
sample.NA00001.vcf.gz
sample.NA00002.vcf.gz

$ gunzip -c sample.NA00003.vcf.gz
(...)
##source=myImputationProgramV3.1
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA00003
chr1    10  rs6054257   G   A   29  PASS    AF=0.5;DB;DP=14;H2;NS=3 GT:DP:GQ:HQ 1/1:5:43
chr1    20  rs6040355   A   G,T 67  PASS    AA=T;AF=0.333,0.667;DB;DP=10;NS=2   GT:DP:GQ 2/2:4:35
chr1    130 microsat1   GTC G,GTCT  50  PASS    AA=G;DP=9;NS=3  GT:DP:GQ    1/1:3:40
chr2    130 microsat1   GTC G,GTCT  50  PASS    AA=G;DP=9;NS=3  GT:DP:GQ    1/1:3:40

0

Does this code create a single VCF file for one specific sample ID, or one VCF file per sample in the original file?

0

It creates one VCF file per sample in the original file.

0
9.8 years ago
Lee Katz ★ 3.2k

You could do something with the new bcftools v1.1 like this:

bcftools query -H pooled.vcf.gz -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' --samples 'favoriteSample'

I don't think it keeps all the headers but it will give you the information you might want. If you don't want any headers at all, you can remove the -H.

0

I wanted each sample to have its own VCF.

0

The --samples argument lets you choose one sample at a time. So you'd have to run this command once per sample.
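
Running it once per sample could look roughly like the sketch below. The helper name `query_each_sample` and the `.tsv` output naming are hypothetical. Note that the result is a tab-separated genotype table per sample, not a VCF.

```shell
#!/usr/bin/env bash
# Sketch: one tab-separated genotype table per sample using "bcftools query".
# query_each_sample and the <sample>.tsv naming are illustrative, not part of bcftools.
query_each_sample() {
  local vcf=$1 sample
  # "bcftools query -l" lists the sample names, one per line
  bcftools query -l "$vcf" | while IFS= read -r sample; do
    # -s restricts the output to one sample; -H adds a header line
    bcftools query -H -s "$sample" \
      -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' "$vcf" > "$sample.tsv"
  done
}
```

For example, `query_each_sample pooled.vcf.gz` would write one `.tsv` file per sample into the current directory.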
