I have a VCF file from a GDM patient. It contains SNPs and indels from one sample only, and I want to split it so that each piece is small enough for these online tools, without breaking the VCF format. Any suggestions?
Shorter:
bgzip variants.vcf
tabix variants.vcf.gz
tabix -l variants.vcf.gz | parallel -j 5 'tabix -h variants.vcf.gz {} > {}.vcf'
# annotate, creating annot_chr*.vcf
bcftools concat annot_chr*.vcf > annot_variants.vcf
From the tabix manual:
-l, --list-chroms List the sequence names stored in the index file.
For things like that I split by chromosome, using bgzip, tabix, standard Unix commands, bcftools and (optionally) GNU parallel:
bgzip variants.vcf
tabix -p vcf variants.vcf.gz
zgrep -v '^#' variants.vcf.gz | cut -f1 | sort -u > chromosomes.txt
cat chromosomes.txt | parallel -j 5 --bar 'tabix variants.vcf.gz {} > {}.prevcf'
zgrep '^#' variants.vcf.gz > header
ls *.prevcf | parallel -j 5 'cat header {} > {.}.vcf'
rm *.prevcf
# annotate, creating annot_chr*.vcf
bcftools concat annot_chr*.vcf > annot_variants.vcf
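If bgzip and tabix are not at hand, the same idea (full header plus the records of one chromosome per file) can be sketched with plain awk. This is only a minimal sketch: demo.vcf and the split_ prefix are made-up names, and the demo records are invented so the snippet runs on its own. With a very large number of contigs some awk implementations may hit the open-file limit, in which case the tabix route above is the safer one.

```shell
# Build a tiny demo VCF (hypothetical data) so the sketch is self-contained.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n' >> demo.vcf
printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\n' >> demo.vcf
printf 'chr2\t150\t.\tG\tGA\t50\tPASS\t.\n' >> demo.vcf

# Collect all header lines, then append each record to split_<chrom>.vcf.
# The header is written the first time a chromosome is seen.
awk '
  /^#/ { header = header $0 "\n"; next }
  {
    out = "split_" $1 ".vcf"
    if (!(out in seen)) { printf "%s", header > out; seen[out] = 1 }
    print > out
  }
' demo.vcf
```

Each split_*.vcf then starts with the complete original header, so it should still be accepted as a valid VCF by the annotation tools.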
WouterDeCoster, someone posted a cool trick for getting the chromosomes: after indexing, running tabix -l variants.vcf.gz lists the chromosomes in the VCF.
Edit: it was Fin (finswimmer) :).
Yes I'd definitely recommend the answer of finswimmer: C: Splitting VCF file to decrease file size to run it on VEP and wANNOVAR
Try vcftools:
for i in chr{1..22}; do echo vcftools --chr "$i" --vcf input.vcf --recode --recode-INFO-all --out "$i"; done
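Remove echo when you are ready to execute. vcftools appends .recode.vcf to the --out prefix, so the output files are named chr1.recode.vcf, chr2.recode.vcf, and so on.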
If you are okay with GNU parallel and vcftools, you can try this:
$ parallel --dry-run vcftools --chr {} --vcf input.vcf --recode --recode-INFO-all --out {} ::: chr{1..22}
Remove --dry-run when you are ready to execute.
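If a single chromosome is still above the upload limit, one option is to split by a fixed number of records instead, repeating the header in every chunk. The following is a rough sketch with awk, not a tested recipe for any particular tool: big.vcf, the chunk_ prefix and n=2 are placeholders picked for the demo, and a real run would use a much larger n.

```shell
# Build a small demo VCF (hypothetical data) with five records.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > big.vcf
for pos in 100 200 300 400 500; do
  printf 'chr1\t%s\t.\tA\tG\t50\tPASS\t.\n' "$pos" >> big.vcf
done

# Start a new chunk every n records; each chunk begins with the full header.
awk -v n=2 '
  /^#/ { header = header $0 "\n"; next }
  count % n == 0 {
    if (out) close(out)                      # finish the previous chunk
    out = sprintf("chunk_%03d.vcf", ++part)  # chunk_001.vcf, chunk_002.vcf, ...
    printf "%s", header > out
  }
  { print > out; count++ }
' big.vcf
```

Because the records keep their original order, the annotated chunks can be put back together in numeric order afterwards.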
Related post: How to split vcf file by chromosome?
Hello S AR,
Don't forget to follow up on your threads.
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.