Hi
I have a list of chromosomes and positions that looks like this:
1 10045
1 93056
1 109272
1 127711
1 127822
.
.
.
Now I would like to use this list to remove those positions from my VCF file. Do you know how to do this?
bcftools can do this:
$ bcftools view -T ^list_snp_exclude.txt input.vcf > output.vcf
With the ^ before the file with the coordinates, one tells bcftools to exclude these regions.
fin swimmer
a simple grep would do:
grep -vf list.txt file.vcf
Though this was posted a while ago, I just have to say that if you grep with only the -vf flags, it will remove the positions in list.txt from file.vcf, but it will also remove any other positions that have more digits and still contain the digit sequence of a listed position. For example, you may want to remove position 10045, but if the VCF contains positions 100450, 1004511, 100453489, etc., these will be removed as well.
In this case the -w flag should also be added to the command above. It makes grep match whole words only, that is, a pattern matches only when it is not directly preceded or followed by a letter, digit, or underscore.
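To make the difference concrete, here is a small sketch on made-up data. The file names and records are hypothetical, and the list is assumed to be tab-separated so that it lines up with the VCF columns.

```shell
# Toy data: a tab-separated exclusion list and a tiny VCF-like file (made-up records).
tmp=$(mktemp -d)
printf '1\t10045\n' > "$tmp/list.txt"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$tmp/file.vcf"
printf '1\t10045\t.\tA\tG\n1\t100450\t.\tC\tT\n1\t1004511\t.\tG\tA\n' >> "$tmp/file.vcf"

# Without -w: positions 100450 and 1004511 are dropped too,
# because "1<TAB>10045" is a substring of those lines.
grep -vf "$tmp/list.txt" "$tmp/file.vcf" > "$tmp/no_w.vcf"

# With -w: only the record at position 10045 is dropped.
grep -vwf "$tmp/list.txt" "$tmp/file.vcf" > "$tmp/with_w.vcf"

# Count the remaining non-header records in each output.
awk '!/^#/ {n++} END {print n+0}' "$tmp/no_w.vcf"     # prints 0
awk '!/^#/ {n++} END {print n+0}' "$tmp/with_w.vcf"   # prints 2
```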
Thank you very much. The only problem with grep for me was that it was very slow and memory-consuming, so I used this link
So I transformed my file into a BED file like this:
1 6405767 6405767
1 8108895 8108895
1 8623336 8623336
.
.
.
Maybe it is not the most elegant way to do it, but it works for me.
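For anyone wanting to reproduce the conversion, a minimal sketch with awk follows. It assumes a two-column, whitespace-separated list (the file names here are hypothetical) and writes the position twice, matching the layout shown above. Note that standard BED uses a 0-based start, so a tool that follows the BED convention strictly may expect start = position - 1 instead; check what your tool expects.

```shell
# Toy input: chromosome and position, one pair per line (values taken from the post above).
tmp=$(mktemp -d)
printf '1\t6405767\n1\t8108895\n1\t8623336\n' > "$tmp/list.txt"

# Emit chrom, pos, pos as tab-separated columns.
awk 'BEGIN {OFS="\t"} {print $1, $2, $2}' "$tmp/list.txt" > "$tmp/list.bed"

cat "$tmp/list.bed"
```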
Hi!!
If someone has the same question, this command solved the problem:
grep -Fwvf list_snp_exclude file.vcf > new_filter.vcf
list_snp_exclude is a list with the format Chromosome_name"\t"Position:
Chrom_177 4393715
Chrom_177 4394618
Chrom_177 4395751
Chrom_215 4395751
Chrom_215 4396373
. . .
My answer was very simple. This one adds more grep functionality: the -F option looks for fixed strings rather than regular expressions, and the -w option matches whole words rather than arbitrary substrings. I don't know how -F works in conjunction with -w, but it looks like an overall faster option. If performance is a concern, a more tightly anchored regex (which needs the -P option) might be even faster:
sed 's/^/^/; s/$/\\t/' list.txt | grep -vPf - file.vcf
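A different approach, not one proposed earlier in this thread, is to compare the chromosome and position columns directly with awk instead of matching text patterns. The sketch below assumes a tab-separated list and VCF; the file names and records are made up. Because the comparison is on whole fields, position 100450 or chromosome 11 cannot be confused with 1 and 10045, and header lines are always kept.

```shell
# Toy data: an exclusion list and a tiny VCF-like file (made-up records).
tmp=$(mktemp -d)
printf '1\t10045\n1\t127711\n' > "$tmp/list.txt"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$tmp/file.vcf"
printf '1\t10045\t.\tA\tG\n1\t100450\t.\tC\tT\n11\t10045\t.\tG\tA\n1\t127711\t.\tT\tC\n' >> "$tmp/file.vcf"

awk -F'\t' '
    NR == FNR { skip[$1 FS $2]; next }   # first file: remember each chrom/pos pair
    /^#/ || !(($1 FS $2) in skip)        # second file: keep headers and unlisted records
' "$tmp/list.txt" "$tmp/file.vcf" > "$tmp/filtered.vcf"

cat "$tmp/filtered.vcf"
```

The list is held in memory as a lookup table, so each VCF line needs only one lookup rather than a scan over every pattern.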
Just wondering: if I wish to add two more columns, "Alternate" and "Reference", what should I change in the above command? For me this didn't work.
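One likely reason a grep pattern with four columns fails is that in a VCF the ID column sits between POS and REF, so a list line of chromosome, position, reference, and alternate is never a contiguous piece of a VCF line. A possible workaround is to compare the individual columns with awk. The sketch below assumes the list columns are in the order chromosome, position, reference, alternate (adjust the field numbers if your list differs); the file names and records are hypothetical.

```shell
# Toy data: a four-column exclusion list and a tiny VCF-like file (made-up records).
tmp=$(mktemp -d)
printf '1\t10045\tA\tG\n' > "$tmp/list4.txt"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$tmp/file.vcf"
printf '1\t10045\t.\tA\tG\n1\t10045\t.\tA\tT\n1\t93056\t.\tC\tT\n' >> "$tmp/file.vcf"

awk -F'\t' '
    NR == FNR { skip[$1 FS $2 FS $3 FS $4]; next }   # list: chrom, pos, ref, alt
    /^#/ || !(($1 FS $2 FS $4 FS $5) in skip)        # VCF: CHROM=1, POS=2, REF=4, ALT=5
' "$tmp/list4.txt" "$tmp/file.vcf" > "$tmp/filtered4.vcf"

cat "$tmp/filtered4.vcf"
```

This compares the ALT field as a whole string, so a multiallelic record such as G,T would not match a list entry of G.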