Splitting VCF/BCF file into separate gene files
14 months ago
Lynne-95 ▴ 20

I have a multi-sample bcf file which I would like to split into smaller files per gene so I can use this for some downstream eQTL analysis.

I've started a bash script that pipes bcftools query -f '%SAMPLE\t%POS\t%REF\t%ALT\t%GT\n' into an awk script that re-codes the genotypes and then, in theory, writes a multi-sample text file per gene (+/- 1 Mb) by running this in a while loop over a GTF file.

However, this is proving difficult, and I wasn't sure whether I've missed a function or package that can do this for me.

Any help is appreciated. I can also easily convert this file to VCF or split it by sample, depending on the approach.
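
For reference, the genotype re-coding step described above could look roughly like the sketch below. It is only an illustration under assumptions: the function name recode_gt is hypothetical, the input is assumed to be one tab-separated line per variant (CHROM, POS, REF, ALT, then one GT column per sample), samples are assumed diploid, and the re-coding is assumed to be a count of non-reference alleles (0, 1, 2, or NA when a call is missing). The actual re-coding used in the original script is not shown in the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recode GT columns (column 5 onwards) to allele dosage.
# Assumes diploid calls such as 0/0, 0|1, 1/1 and ./. for missing data.
recode_gt() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        for (i = 5; i <= NF; i++) {
            # Split on either the unphased (/) or phased (|) separator.
            n = split($i, a, "[/|]")
            if (n != 2 || a[1] == "." || a[2] == ".") { $i = "NA"; continue }
            # Count alleles that are not the reference allele "0".
            $i = (a[1] != "0") + (a[2] != "0")
        }
        print
    }'
}
```

One way to feed it would be bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' in.bcf | recode_gt, where in.bcf is a placeholder file name. Per-sample tags such as %GT go inside square brackets so that bcftools query loops over the samples; bcftools query -l lists the sample names in column order.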

genomics bcftools eqtl gwas • 651 views

What exactly is proving difficult? What results are you getting, and what would you like to get?

Also, you can use the code sample option to format your code.

14 months ago

I have a multi-sample bcf file which I would like to split into smaller files per gene so I can use this for some downstream eQTL analysis.

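# Print one region per gene as chrom:start-end (GTF coordinates are 1-based and inclusive, like bcftools regions)
awk -F '\t' '($3=="gene") {printf("%s:%d-%d\n",$1,$4,$5);}' in.gtf | while read -r R
do
      # input.vcf.gz must be indexed (bcftools index) for region queries to work
      bcftools view -O z -o "${R//[:-]/_}.vcf.gz" input.vcf.gz "${R}"
done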
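
The question also asks for a window of +/- 1 Mb around each gene and for one file per gene. A minimal sketch of that variant follows, under stated assumptions: the function names gtf_to_regions and split_by_gene are hypothetical, the GTF attribute column is assumed to contain gene_id "..." entries, and the BCF is assumed to be indexed. The start coordinate is clamped at 1 because regions are 1-based.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "gene_id<TAB>chrom:start-end" for every gene row
# in a GTF, padding each gene by PAD bases on both sides (default 1 Mb).
gtf_to_regions() {
    local gtf="$1" pad="${2:-1000000}"
    awk -F '\t' -v pad="$pad" '
        $3 == "gene" {
            # Pull the gene_id value out of the attribute column (column 9);
            # rows without a gene_id attribute are skipped.
            if (match($9, /gene_id "[^"]+"/)) {
                id = substr($9, RSTART + 9, RLENGTH - 10)
            } else {
                next
            }
            start = $4 - pad
            if (start < 1) start = 1    # clamp to the first base
            printf("%s\t%s:%d-%d\n", id, $1, start, $5 + pad)
        }' "$gtf"
}

# Hypothetical wrapper: write one compressed BCF per gene, named by gene_id.
# Requires bcftools and an indexed input file; defined here but not called.
split_by_gene() {
    local gtf="$1" bcf="$2"
    gtf_to_regions "$gtf" | while IFS=$'\t' read -r id region
    do
        bcftools view -r "$region" -O b -o "${id}.bcf" "$bcf"
    done
}
```

Calling split_by_gene in.gtf input.bcf (both placeholder names) would then produce one file per gene. Naming the files by gene_id rather than by coordinates avoids one file overwriting another when two genes share the same interval, although it assumes gene_id values are unique and safe to use as file names.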