Question

Remove a list of positions from a VCF file

1

8.1 years ago

shinken123 ▴ 150

Hi

I have a list of chromosomes and positions that looks like this:

And now I would like to use it to remove them from my vcf file. Do you know how to do this?

SNP vcf filter • 12k views

ADD COMMENT • link updated 6.0 years ago by zx8754 12k • written 8.1 years ago by shinken123 ▴ 150

3

8.1 years ago

Jorge Amigo 14k

a simple grep would do:

grep -vf list.txt file.vcf

ADD COMMENT • link 8.1 years ago by Jorge Amigo 14k

2

Though this was posted a while ago, I just have to say that grep with only the -vf flags removes the positions in list.txt from file.vcf, but it also removes any longer position that contains the same sequence of digits. For example, you may want to remove position 10045, but if the VCF also contains the positions 100450, 1004511, 100453489, etc., these will be removed as well.

In this case the -w flag should also be added to the command above, so that grep matches whole words only: a pattern matches only when it is not immediately preceded or followed by a letter, digit, or underscore.
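A small made-up example may make the difference visible. The file names and records below are invented for illustration only, and the list is assumed to be tab-separated like the VCF columns.

```shell
# Toy data (hypothetical): exclude position 10045 on chromosome 1.
tmp=$(mktemp -d)
printf '1\t10045\n' > "$tmp/list.txt"
printf '#CHROM\tPOS\tID\tREF\tALT\n'  > "$tmp/file.vcf"
printf '1\t10045\t.\tA\tG\n'         >> "$tmp/file.vcf"
printf '1\t100450\t.\tC\tT\n'        >> "$tmp/file.vcf"
printf '1\t20000\t.\tG\tA\n'         >> "$tmp/file.vcf"

# Without -w, the pattern also matches inside the longer position 100450,
# so that record is dropped together with 10045.
grep -vf "$tmp/list.txt" "$tmp/file.vcf" > "$tmp/loose.vcf"

# With -w, only the whole-word match (position 10045) is dropped.
grep -wvf "$tmp/list.txt" "$tmp/file.vcf" > "$tmp/strict.vcf"
```

Comparing loose.vcf and strict.vcf shows whether the 100450 record survived.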

ADD REPLY • link 7.5 years ago by Earendil ▴ 50

0

Thank you very much. The only problem with grep for me was that it was very slow and memory-consuming, so I used this link

So I converted my list to a BED file like this:

1   6405767 6405767
1   8108895 8108895
1   8623336 8623336
.
.
.

Maybe it is not the most elegant way to do it, but it works for me.
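The conversion itself could be sketched with awk, assuming the list is tab-separated chromosome and position (list.txt and list.bed are hypothetical names). The position is simply repeated as start and end to reproduce the layout above.

```shell
# Toy positions list (hypothetical): chromosome<TAB>position.
tmp=$(mktemp -d)
printf '1\t6405767\n1\t8108895\n1\t8623336\n' > "$tmp/list.txt"

# Print the position twice, as start and end, separated by tabs.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $2 }' "$tmp/list.txt" > "$tmp/list.bed"
```

Strict BED intervals are 0-based and half-open, so depending on the tool that reads the file, a start of position minus one may be needed; that is worth checking against the tool's documentation.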

ADD REPLY • link 8.1 years ago by shinken123 ▴ 150

1

6.0 years ago

YocelynGG ▴ 70

Hi!!

If anyone has the same question, this command solved the problem for me:

grep -Fwvf list_snp_exclude file.vcf > new_filter.vcf

list_snp_exclude is a list with the format Chromosome_name"\t"Position (tab-separated):

Chrom_177   4393715
Chrom_177   4394618
Chrom_177   4395751
Chrom_215   4395751
Chrom_215   4396373
. . .

ADD COMMENT • link updated 6.0 years ago by finswimmer 16k • written 6.0 years ago by YocelynGG ▴ 70

0

How is this different from Jorge's answer above?

A: remove a list of positions from a vcf file

ADD REPLY • link 6.0 years ago by zx8754 12k

1

My answer was very simple. This one adds more grep functionality: the -F option looks for fixed strings rather than regular expressions, and the -w option matches whole words only rather than any substring. I don't know how -F works in conjunction with -w, but it looks like an overall faster option. If performance is a concern, a better-targeted regex (which needs the -P option) might be even faster:

sed 's/^/^/; s/$/\\t/' list.txt | grep -vPf - file.vcf
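A different technique from the grep commands in this thread, shown only as a sketch: awk can load the list into a lookup table and compare the first two VCF columns exactly, which avoids substring matches altogether. The file names and records are invented, and both files are assumed to be tab-separated.

```shell
# Toy data (hypothetical).
tmp=$(mktemp -d)
printf 'Chrom_177\t4393715\n' > "$tmp/list.txt"
printf '##fileformat=VCFv4.2\n'              > "$tmp/file.vcf"
printf 'Chrom_177\t4393715\t.\tA\tG\n'      >> "$tmp/file.vcf"
printf 'Chrom_177\t43937150\t.\tC\tT\n'     >> "$tmp/file.vcf"
printf 'Chrom_215\t4393715\t.\tG\tA\n'      >> "$tmp/file.vcf"

# First file: remember each chromosome/position pair.
# Second file: keep header lines and records whose pair is not in the list.
awk -F '\t' '
    NR == FNR { skip[$1 FS $2]; next }
    /^#/ || !(($1 FS $2) in skip)
' "$tmp/list.txt" "$tmp/file.vcf" > "$tmp/filtered.vcf"
```

Only the record whose chromosome and position both match a list entry is dropped; the same position on another chromosome and the longer position are kept.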

ADD REPLY • link 6.0 years ago by Jorge Amigo 14k

score 7 · Accepted Answer · 2018-11-25

7

6.0 years ago

finswimmer 16k

bcftools can do this:

$ bcftools view -T ^list_snp_exclude.txt input.vcf > output.vcf

With the ^ before the name of the file with the coordinates, one tells bcftools to exclude these positions.

fin swimmer

ADD COMMENT • link 6.0 years ago by finswimmer 16k

0

Just wondering: if I wish to add two more columns, "Alternate" and "Reference", what should I change in the above command? For me this didn't work.

ADD REPLY • link 5.2 years ago by ijlal.hyder2012 ▴ 20