Hello all,
What is the most efficient way to retrieve many subsets of regions from a vcf.gz file? I have about 1000 10kb regions that I need to extract from a whole-genome VCF file. I think tabix is the best way to do this, but I haven't been able to work out how to use it on a large scale to retrieve hundreds of regions. For instance, can one input a file with a list of regions (e.g. 1:11345-112345 for a position on chrom1) and automate the process?
Thanks very much for any help and comments in advance!
Thanks, very helpful
@Rm: can you tell us where you have seen this? The online manual is outdated and from version 0.1.1. I have version 0.2.5 (r964) with the option. It wouldn't make much sense to me to remove that option...
The option to read the locations from a file was dropped from the latest version.
@Rm: You are right. Now that's a strange move... The latest 1.1 version I have just downloaded (tabix is now part of HTSlib) doesn't allow for the -B option although the manual (again out-of-date) still indicates it does: http://www.htslib.org/doc/tabix-1.1.html
Possible solution:
I don't understand this solution. I have the same problem with tabix v1.1, and I was not able to install previous versions such as v0.2.5 (make fails). Did you find any substitute in tabix v1.1 for the -B option?
Since version 1.2, tabix has the -R option instead. See my answer for more info.
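For anyone landing here with a list of regions written as 1:11345-112345, here is a minimal sketch of how the -R route could be automated. It assumes tabix 1.2 or later, where -R accepts a tab-delimited file with CHROM, START and END columns (1-based, inclusive), and it assumes the vcf.gz is bgzip-compressed and already indexed. The function names and the file names regions.tsv and genome.vcf.gz are placeholders for illustration, not anything from tabix itself.

```python
import re

# Matches region strings such as "1:11345-112345" or "chrX:1,000-2,000".
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def region_to_row(region):
    """Convert 'CHROM:START-END' into a tab-delimited 'CHROM<TAB>START<TAB>END' row."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"not a CHROM:START-END region: {region!r}")
    # Thousands separators are tolerated on input and removed here.
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start is greater than end: {region!r}")
    return f"{match['chrom']}\t{start}\t{end}"


def regions_to_rows(lines):
    """Convert an iterable of region strings, skipping blank lines and # comments."""
    rows = []
    for line in lines:
        text = line.strip()
        if text and not text.startswith("#"):
            rows.append(region_to_row(text))
    return rows


def tabix_command(vcf_path, regions_path):
    """Build the argument list for: tabix -h -R REGIONS VCF (-h keeps the VCF header)."""
    return ["tabix", "-h", "-R", regions_path, vcf_path]
```

To use it, write the rows from regions_to_rows() to regions.tsv, one per line, and then run the list returned by tabix_command("genome.vcf.gz", "regions.tsv"), for example with subprocess.run(..., check=True, stdout=out_file). All regions are then fetched in a single tabix call rather than one call per region.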