Entering edit mode
8.8 years ago
SGMS
▴
130
Hi all,
I am trying to extract SNPs from the vcf file of the 1000G using bcftools. I managed to extract the SNPs in a separate file, but I want to get the SNPs only in the regions which I'm interested in. I found that -R option provides me with this but I tried it in several ways and I didn't make it work. So the following works:
bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz > extraction.txt
But then the following does not:
bcftools query -f'%CHROM\t%POS\t%ID\t%REF,%ALT\n' -R regions.bed ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz > extraction.txt
I'm suspecting that the problem is the bed file which initially was a .txt file. The original format was:
1:150748683-150800917
1:151758546-151824348
1:154520256-154572353
and now:
1:150718683-150830917 150718682 150830917
1:151728546-151854348 151728545 151854348
1:154490256-154602353 154490255 154602353
If someone could give me a hint, that would be great.
Thank you!
Try tab-delimited chrom, start, end and possibly identifier.
Did you look at tabix?
Loop over a set of regions in your bed file.
Sorry, I'm not sure whether I got this.. My initial file has the chr:start-end format and I don't know how to turn the bed file into a format accepted by bcftools..
@StatGen Its not for bcftools solution. Its is for your entire task to create a new VCF file with a set of regions in bed file.
Hi venu,
Thanks.. The task though is not to create a new vcf with a set of regions in a bed file.. I want to extract the SNPs from the 1000G vcf, whose positions overlap with the regions I will specify.. My regions though are in a text format which I turned into a bed format which is most probably not acceptable by the bcftools..
Can you just show how you are expecting the output file?
After I use bcftools I'm expecting:
i.e
CHR POS ID REF,ALT
Thanks
Select required fields with
cut
.