Remove duplicates
3.0 years ago
am29 ▴ 40

Is there an easy way to remove duplicates from a VCF file? I want to get rid of every duplicated SNP: not keep one copy of each duplicate, but delete all of them. I already tried bcftools, but it didn't work. I am now trying to delete them with --exclude-intervals from GATK, but I would like to find another solution if possible. Is there a quick way to just delete lines/SNPs from a VCF file?

remove duplicates vcf • 1.3k views
3.0 years ago

A Java program:

compile:

wget -O picard.jar "https://github.com/broadinstitute/picard/releases/download/2.26.8/picard.jar"
javac -cp picard.jar Biostar9502101.java

execute:

cat in.vcf | java -cp picard.jar:. Biostar9502101

Thank you, this worked!


Thanks for the quick answer, but I have already seen those, and that's not what I need. I want to delete not only the duplicate but also the record it duplicates. So, if SNP1 4356789 appears twice, I want to get rid of both copies rather than keep one of them.
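
For the behaviour described here (drop every copy of a duplicated position, keep nothing), a two-pass awk filter is one possible sketch. It is not the Java program from the answer above. It assumes an uncompressed, tab-delimited VCF and treats two records as duplicates when CHROM and POS match; the function name `drop_all_duplicated` and the file names in the usage comment are made up for illustration.

```shell
# Drop every record whose CHROM+POS occurs more than once (all copies are removed).
# Assumption: the input is an uncompressed, tab-delimited VCF file on disk.
drop_all_duplicated() {
    # The file is read twice, so it must be a real file, not a pipe.
    # Pass 1 (NR == FNR): count each CHROM/POS pair among non-header lines.
    # Pass 2: print header lines, plus records whose pair was seen exactly once.
    awk -F'\t' '
        NR == FNR { if ($0 !~ /^#/) seen[$1 FS $2]++; next }
        /^#/ || seen[$1 FS $2] == 1
    ' "$1" "$1"
}

# Usage (placeholder file names):
#   drop_all_duplicated in.vcf > out.vcf
```

If duplicates should instead be defined by position plus alleles, the key could be extended with the REF and ALT columns (`$4` and `$5`). A bgzipped file would need to be decompressed first.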
