4.3 years ago
lopezpower86
Hello,
I'm dealing with a fairly large VCF file of 10 individuals called with freebayes. I want to find the variants unique to 1 of the individuals and then compare it with 3 other individuals. I have searched for quite some time and I can't find the correct answer myself. I would appreciate your help.
See GATK SelectVariants with a JEXL expression.
e.g:
Have you looked at bcftools view -x? https://samtools.github.io/bcftools/bcftools.html#view
To be honest, selecting individuals wasn't that big of a problem; I managed to do it with the vcftools --indv option. The issue is that I don't know how to select the variants unique to 1 individual versus 3 others (with --diff-site from vcftools I managed to compare one individual to another).
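To make "unique to 1 individual versus 3 others" concrete, here is a minimal plain-Python sketch of the selection logic, not a replacement for the tools mentioned in this thread. It assumes a tab-separated VCF with a GT field, and it assumes "unique" means the target carries an ALT allele while every comparison sample is called homozygous reference (so a missing genotype in a comparison sample disqualifies the site). The sample names ind1 to ind4 and the function names are made up for illustration.

```python
def carries_alt(gt):
    """True if a GT string such as '0/1' or '1|1' has a called non-reference allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def is_hom_ref(gt):
    """True if every allele in the GT string is the reference allele."""
    return all(a == "0" for a in gt.replace("|", "/").split("/"))


def unique_to(vcf_lines, target, others):
    """Yield data lines where `target` carries ALT and all `others` are hom-ref."""
    columns = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None:
            raise ValueError("no #CHROM header line seen before data lines")
        gt_idx = fields[8].split(":").index("GT")  # column 9 is FORMAT

        def gt(sample):
            return fields[columns[sample]].split(":")[gt_idx]

        if carries_alt(gt(target)) and all(is_hom_ref(gt(s)) for s in others):
            yield line


# Tiny hypothetical input: four samples, four sites.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3\tind4",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT\t0/1\t0/0\t0/0\t0/0",  # private to ind1
    "chr1\t200\t.\tC\tT\t50\t.\t.\tGT\t1/1\t0/1\t0/0\t0/0",  # shared with ind2
    "chr1\t300\t.\tG\tA\t50\t.\t.\tGT\t0/0\t0/0\t1/1\t0/0",  # ind1 is reference
    "chr1\t400\t.\tT\tC\t50\t.\t.\tGT\t0/1\t./.\t0/0\t0/0",  # ind2 is uncalled
]
hits = list(unique_to(demo, "ind1", ["ind2", "ind3", "ind4"]))
print([h.split("\t")[1] for h in hits])
```

Whether an uncalled genotype in a comparison sample should block a site is a judgment call; relaxing `is_hom_ref` to accept "." would give the more permissive answer.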
Further to Pierre's comment, another option I've used extensively is SnpSift filter (from SnpEff). It's available in Galaxy, so it is quite simple to use in the cloud; we also use it locally and pass on full lists of filters and results.
For two samples, I use
(isHom( GEN[1] ) & isVariant( GEN[0] ) & isRef( GEN[1] ))
https://toolshed.g2.bx.psu.edu/repository/display_tool?repository_id=65063aa2c697f935&render_repository_actions_for=tool_shed&tool_config=%2Fsrv%2Ftoolshed%2Fmain%2Fvar%2Fdata%2Frepos%2F001%2Frepo_1363%2FsnpSift_filter.xml&changeset_revision=2b3e65a4252f
As I understand this filter: the individual at GEN[1] is a reference homozygote and the one at GEN[0] carries any variant? So will these types of variants be written to a new file or erased? Also, will long expressions work, like (isHom( GEN[1] ) & isVariant( GEN[0] ) & isRef( GEN[1] )) | (isHom( GEN[3] ) & isVariant( GEN[2] ))?
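One way to reason about such an expression before running it is to emulate it on plain GT strings. The sketch below is an approximation under stated assumptions: `is_ref`, `is_variant` and `is_hom` are hypothetical Python stand-ins for my reading of isRef (genotype is 0/0), isVariant (genotype is not 0/0) and isHom (both alleles identical), and how SnpSift itself treats missing genotypes is not modelled here. As far as I know, SnpSift filter prints the records that match the expression and leaves the input file untouched, so nothing is erased.

```python
def _alleles(gt):
    """Split a GT string such as '0/1' or '1|1' into its allele codes."""
    return gt.replace("|", "/").split("/")


def is_ref(gt):
    """Assumed stand-in for isRef: every allele is the reference allele."""
    return all(a == "0" for a in _alleles(gt))


def is_variant(gt):
    """Assumed stand-in for isVariant: at least one called non-reference allele."""
    return any(a not in ("0", ".") for a in _alleles(gt))


def is_hom(gt):
    """Assumed stand-in for isHom: fully called and all alleles identical."""
    alleles = _alleles(gt)
    return "." not in alleles and len(set(alleles)) == 1


def keep(gen):
    """Two-clause filter from the question, read as (first clause) | (second clause).

    `gen` is a list of GT strings indexed like GEN[0], GEN[1], ...
    """
    first = is_hom(gen[1]) and is_variant(gen[0]) and is_ref(gen[1])
    second = is_hom(gen[3]) and is_variant(gen[2])
    return first or second


print(keep(["0/1", "0/0", "0/0", "0/0"]))  # first clause matches
print(keep(["0/0", "0/0", "0/1", "1/1"]))  # second clause matches although GEN[3] is hom ALT
print(keep(["0/0", "0/0", "0/0", "0/0"]))  # nothing is variant
```

The second call shows a likely pitfall: the second clause has no isRef( GEN[3] ) term, so a site where GEN[3] is homozygous for the ALT allele also passes, which is probably not what "unique" is meant to capture.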
Yep, have a play with it, but be careful: include a positive and a negative control, etc. Make sure you generate summaries of the SNVs common to all samples, and then iteratively improve your queries until you're happy with them. It is very easy to make mistakes with wide-reaching consequences.
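A rough sketch of what such a control and summary could look like in plain Python: tally sites by the set of samples that carry an ALT allele, then check that a known private site and a known shared site land where expected. The genotype dictionaries, the sample names and the `carrier_summary` helper are all hypothetical; in practice the genotypes would be pulled from the VCF.

```python
from collections import Counter


def has_alt(gt):
    """True if a GT string such as '0/1' or '1|1' has a called non-reference allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def carrier_summary(sites):
    """Count sites by the sorted tuple of samples that carry an ALT allele."""
    counts = Counter()
    for genotypes in sites:
        carriers = tuple(sorted(name for name, gt in genotypes.items() if has_alt(gt)))
        counts[carriers] += 1
    return counts


# Hypothetical control sites, one dict of sample -> GT per site.
sites = [
    {"ind1": "0/1", "ind2": "0/0", "ind3": "0/0", "ind4": "0/0"},  # positive control: private to ind1
    {"ind1": "1/1", "ind2": "0/1", "ind3": "1/1", "ind4": "0/1"},  # negative control: common to all
    {"ind1": "0/0", "ind2": "0/0", "ind3": "0/1", "ind4": "0/0"},  # private to ind3
]
summary = carrier_summary(sites)
for carriers, n in sorted(summary.items()):
    print(n, ",".join(carriers))
```

If the count for the "common to all" group or for a single-sample group changes unexpectedly after a filter is edited, that is a cheap signal that the expression no longer does what was intended.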