Extract a list of SNPs from a BCF file
6.2 years ago
Famf ▴ 30

Hi there, I have a very large BCF file from which I want to extract a (also very large) list of SNPs. The SNPs of interest are in a tab-delimited file with two columns: chromosome and position. I have tried bcftools view with the -T and -R options, as shown below, but I haven't had success.

bcftools view -T mylist.txt file.bcf -Ou -o filteredfile.bcf

Thanks in advance

bcf snp • 4.9k views

"but I haven't had success"

What does that mean? What is the exact problem you're facing? Can you also paste the first few lines of your mylist.txt file?


Hi Ram,

Can you please tell me what the format of this mylist.txt file should be?


Is it OK to do something like this:

vcftools --bcf gokind.bcf --snps mySNPs.txt --recode --recode-INFO-all --out SNPs_only

where mySNPs.txt looks like this:

rs12121
rs242343
rs2348724

What do your questions have to do with my comment?

"Is it OK to do something like this"

Did you try it? Did it work? Did the results match your expectations?


Hi,

I tried it and this is what I got:

Outputting VCF file...
Error: Expected type 7 for string. Found type 9.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
After filtering, kept 0 out of a possible 3 Sites
No data left for analysis!

I also tried this:

bcftools view -T mylist.txt gokind2.bcf -Ou -o filteredfile.bcf

where mylist.txt was a tab-separated file:

20  33371323
12  73950313
1   216957281

but I got this error:

[E::bcf_sr_regions_init] Could not parse the file mylist.txt, using the columns 1,2[,-1]
Failed to read the targets: mylist.txt

Can you please advise how mylist.txt should be formatted?

Thanks


Does using -R mylist.txt (instead of -T mylist.txt) work? What is the output of cat -te mylist.txt?
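
To rule out formatting problems before rerunning bcftools, the list can also be checked mechanically. The sketch below assumes the tab-delimited layout that the bcftools documentation describes for -R/-T files (CHROM and POS, or CHROM, BEG and END, with 1-based positions). check_region_list is a hypothetical helper name, not part of bcftools, and passing this check does not guarantee that bcftools will accept the file.

```shell
# Hypothetical helper (not part of bcftools): report lines of a regions/targets
# list that do not look like "CHROM<TAB>POS" or "CHROM<TAB>BEG<TAB>END".
# Returns non-zero if any line is flagged.
check_region_list() {
    awk -F'\t' '
        # Windows line endings leave a carriage return at the end of the line
        /\r$/ { printf "line %d: trailing carriage return\n", NR; bad = 1; next }
        # Spaces instead of tabs, or blank lines, show up as a wrong column count
        NF != 2 && NF != 3 { printf "line %d: found %d tab-separated column(s)\n", NR, NF; bad = 1; next }
        # Positions must be plain integers
        $2 !~ /^[0-9]+$/ { printf "line %d: \"%s\" is not an integer position\n", NR, $2; bad = 1; next }
        NF == 3 && $3 !~ /^[0-9]+$/ { printf "line %d: \"%s\" is not an integer end\n", NR, $3; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: check_region_list mylist.txt && echo "list looks well-formed"
```

This only covers the problems that cat -te would reveal by eye (spaces instead of tabs, stray carriage returns, non-numeric positions). It does not check whether the chromosome names match those in the BCF header.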


I tried what you suggested:

$ bcftools view -R mylist.txt gokind2.bcf -Ou -o filteredfile.bcf
[E::bcf_sr_regions_init] Could not parse the file mylist.txt, using the columns 1,2[,-1]
Failed to read the regions: mylist.txt

and:

 cat -te mylist.txt

gives me:

20^I33371323$
12^I73950313$
1^I216957281$
5^I174820027$
...

Your mylist.txt looks fine. What is the bcftools version you're using? bcftools --version should give you the version info.


It's this one:

bcftools --version
bcftools 1.10.2-32-ge677391
Using htslib 1.10.2-46-g9a10355
Copyright (C) 2019 Genome Research Ltd.

That's a new version. I have no idea what's going on here. Is there any chance you could go back to bcftools 1.9 and try this? It should not make a difference but just in case.
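
If switching versions is not possible, one fallback is to bypass the -T/-R file parser and filter the text VCF stream directly. The sketch below is a workaround under stated assumptions, not a fix for the error above: filter_vcf_by_positions is a hypothetical helper name, the list is assumed to be non-empty with exactly two tab-separated columns (CHROM, POS), and the chromosome names are assumed to match those in the BCF. It will likely be much slower than an indexed -R lookup on a very large file.

```shell
# Hypothetical fallback (not part of bcftools): read VCF text on stdin and keep
# header lines plus records whose CHROM and POS appear in a two-column,
# tab-delimited list given as the first argument.
filter_vcf_by_positions() {
    awk -F'\t' '
        NR == FNR { keep[$1 FS $2] = 1; next }   # first input: load the position list
        /^#/      { print; next }                # VCF header lines pass through
        ($1 FS $2) in keep                       # print records at a listed CHROM/POS
    ' "$1" -
}

# Usage, assuming bcftools is on PATH and the file names from the question:
# bcftools view file.bcf | filter_vcf_by_positions mylist.txt | bcftools view -Ob -o filteredfile.bcf -
```

Because the header passes through unchanged, the filtered stream can be piped back into bcftools view to write BCF again.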


I am doing this on a cluster, so ... but anyway, thank you so much for the debugging tips!
