How to filter orphan breakends from a VCF?
6.0 years ago
joannew • 0

After several rounds of filtering, I have vcf files in which some of the break-ends are missing their mates (which presumably didn't pass the other filter parameters). Does anyone know of a way to filter out these orphan break-ends? I don't see that there is any flag or annotation within the remaining BND record, just that I have ID# 28061_1, for example, included in the file, but not the mate 28061_2. Thanks for any suggestions!

vcf bcftools • 1.3k views
6.0 years ago

Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

Loop over all the BND records, build an associative map ID -> MATEID, and print the keys whose mate is not found.

java -jar ~/src/jvarkit-git/dist/bioalcidaejdk.jar -e 'final Map<String,String> map=stream().filter(V->V.hasID() && V.getAttributeAsString("SVTYPE","").equals("BND")).collect(Collectors.toMap(V->V.getID(),V->V.getAttributeAsString("MATEID",""))); for(String k:map.keySet()) if(!map.containsKey(map.get(k))) println(k); ' input.vcf > id.list

This prints a list of orphan IDs that you can use to filter the VCF, for example with GATK SelectVariants and its --excludeIDs option: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php#--excludeIDs
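If you would rather avoid extra tools, the same idea (map ID -> MATEID, then report IDs whose mate is absent) can be sketched in plain Python. This is only a minimal sketch, not part of jvarkit or GATK: the function names `info_field`, `orphan_bnd_ids` and `drop_orphans` are made up for illustration, and it assumes an uncompressed, tab-separated VCF with `SVTYPE` and `MATEID` in the INFO column. Where `MATEID` holds a comma-separated list, it assumes a record counts as an orphan only when none of its listed mates is present.

```python
import sys


def info_field(info, key):
    """Return the value of `key` in a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def orphan_bnd_ids(lines):
    """Return the IDs of BND records none of whose mates appear in the file."""
    mates = {}  # BND record ID -> list of mate IDs
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        rec_id, info = fields[2], fields[7]
        if rec_id == "." or info_field(info, "SVTYPE") != "BND":
            continue
        mate = info_field(info, "MATEID") or ""
        mates[rec_id] = [m for m in mate.split(",") if m]
    # Assumption: a record with no MATEID at all is also treated as an orphan.
    return {rid for rid, ms in mates.items()
            if not any(m in mates for m in ms)}


def drop_orphans(lines):
    """Return the VCF lines with orphan BND records removed."""
    lines = list(lines)
    orphans = orphan_bnd_ids(lines)
    kept = []
    for line in lines:
        if not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) > 2 and fields[2] in orphans:
                continue
        kept.append(line)
    return kept


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python drop_orphans.py input.vcf > filtered.vcf
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(drop_orphans(handle))
```

Unlike the id.list approach, this writes the filtered VCF directly, so no second tool is needed to exclude the IDs. It reads the whole file into memory, which should be fine for typical SV call sets but is worth keeping in mind for very large files.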
