Tabix outputs a VCF record outside the specified region
6.0 years ago

I ran the following command to extract a subset of regions in my VCF file using tabix:

 tabix -h myfile.vcf.gz "11:5247360-5247664" > myfilenew.vcf.gz

For some reason, no matter how I vary my regions, one particular site, 5247358, always comes up in my output. Why is this the case?

P.S. noob in variant calling and analysing VCF files.

SNP next-gen • 1.5k views

Can you show the full VCF line that comes up? Probably has to do with the length of the variant.


Hello and welcome Mehulsharma.253,

The quotation marks shouldn't be necessary. Could you please show the variant(s) that you don't expect? I guess it will be an indel that overlaps the region you specify.

fin swimmer

BTW: redirecting with > myfilenew.vcf.gz will not create a compressed file, because tabix writes plain text to standard output. You have to pipe the output of tabix through bgzip:

$ tabix -h myfile.vcf.gz 11:5247360-5247664 | bgzip -c > myfilenew.vcf.gz
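
One quick way to see whether a file named .vcf.gz is really compressed is to look at its first two bytes. This is only a sketch: `is_gzip` is a made-up helper name, and it relies on BGZF (bgzip) output starting with the usual gzip magic bytes 1f 8b.

```shell
# Hypothetical helper: succeeds if the file starts with the gzip magic
# bytes 1f 8b, which bgzip (BGZF) output also carries.
is_gzip() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# Usage sketch with the file name from this thread:
# is_gzip myfilenew.vcf.gz && echo "compressed" || echo "plain text"
```

A file written with the plain redirect should fail this check, while one piped through bgzip should pass.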
6.0 years ago

one particular site: 5247358 always comes up in my output

Isn't it a variant that is NOT a single-nucleotide variant? Is there an END attribute in the INFO column?


Just checked it. Yes, there is an END attribute. The record starts before my region, but its END coordinate falls inside the POS range I asked for.
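
This matches how tabix appears to select records: it returns anything whose span overlaps the query, and a record with END in INFO spans from POS to END. If only records that start inside the region are wanted, the output can be filtered on the POS column afterwards. A minimal sketch, assuming tab-separated VCF text on stdin; `filter_by_pos` is a hypothetical helper name, not part of tabix.

```shell
# Hypothetical helper: keep header lines plus records whose POS (column 2)
# lies inside [start, end]. Reads VCF text on stdin, writes to stdout.
filter_by_pos() {
    awk -F'\t' -v start="$1" -v end="$2" \
        '/^#/ || ($2 >= start && $2 <= end)'
}

# Usage sketch (file name and region taken from the question):
# tabix -h myfile.vcf.gz 11:5247360-5247664 | filter_by_pos 5247360 5247664 | bgzip -c > myfilenew.vcf.gz
```

Note that this drops the overlapping record entirely, which may or may not be what you want if the variant really does cover part of the region.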


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
