Shared variants
3.0 years ago
priya.bmg ▴ 60

Hello

I have exome data sets from 6 individuals, of whom 4 are affected and 2 are unaffected. I need to identify the variants that are shared by all four affected individuals. I ran joint genotyping on the 4 affected individuals and hard-filtered the SNPs and indels. How do I identify the variants shared between the 4 affected individuals?

Thanks

Priya

filtering variants GATK • 1.2k views
3.0 years ago

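You can do this with VcfFilterJdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

The expression below keeps a variant only if all four listed samples are heterozygous or homozygous-variant and no other sample in the VCF is. Replace SAMPLE1 to SAMPLE4 with the names of your affected samples: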

 java -jar dist/vcffilterjdk.jar -e 'final Set<String> sns = new HashSet<>(Arrays.asList("SAMPLE1","SAMPLE2","SAMPLE3","SAMPLE4")); return sns.stream().map(S->variant.getGenotype(S)).allMatch(G->G.isHet() || G.isHomVar()) && variant.getGenotypes().stream().filter(G->!sns.contains(G.getSampleName())).noneMatch(G->G.isHet() || G.isHomVar());' input.vcf
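
For anyone who finds the Java lambda hard to read, here is a rough Python sketch of the same predicate. It is not part of jvarkit: `zygosity`, `keep_variant` and the `CONTROL1` sample are names made up for illustration, and the classification only approximates what `isHet()` and `isHomVar()` do for called genotypes.

```python
def zygosity(gt):
    """Classify a VCF GT string such as '0/1', '1|1' or './.'."""
    alleles = gt.replace('|', '/').split('/')
    if all(a == '.' for a in alleles):
        return 'no_call'
    if '.' in alleles:
        return 'mixed'      # partially called, e.g. './1'
    if len(set(alleles)) > 1:
        return 'het'        # also covers het-non-ref such as '1/2'
    return 'hom_ref' if alleles[0] == '0' else 'hom_var'


def keep_variant(genotypes, carriers):
    """genotypes maps sample name -> GT string for a single variant.

    True when every sample in `carriers` is het or hom_var
    and no other sample in `genotypes` is.
    """
    def is_carrier(sample):
        return zygosity(genotypes[sample]) in ('het', 'hom_var')

    others = [s for s in genotypes if s not in carriers]
    return (all(is_carrier(s) for s in carriers)
            and not any(is_carrier(s) for s in others))


carriers = {'SAMPLE1', 'SAMPLE2', 'SAMPLE3', 'SAMPLE4'}
site = {'SAMPLE1': '0/1', 'SAMPLE2': '1/1', 'SAMPLE3': '0|1',
        'SAMPLE4': '0/1', 'CONTROL1': '0/0'}
print(keep_variant(site, carriers))  # True
```

If the VCF contains only the four affected samples, the "no other sample" half of the test has nothing to check, so the filter reduces to "all four carry the variant".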
3.0 years ago
sbstevenlee ▴ 480

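If you are a Python user, you may want to check out the pyvcf submodule from the fuc package I wrote. In the example below, filter_sampall keeps only the rows where every listed sample carries the variant: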

>>> from fuc import pyvcf
>>> data = {
...     'CHROM': ['chr1', 'chr1', 'chr1', 'chr1'],
...     'POS': [100, 101, 102, 103],
...     'ID': ['.', '.', '.', '.'],
...     'REF': ['G', 'T', 'T', 'T'],
...     'ALT': ['A', 'C', 'A', 'C'],
...     'QUAL': ['.', '.', '.', '.'],
...     'FILTER': ['.', '.', '.', '.'],
...     'INFO': ['.', '.', '.', '.'],
...     'FORMAT': ['GT', 'GT', 'GT', 'GT'],
...     'Affected1': ['0/0', '0/0', '1/1', '0/1'],
...     'Affected2': ['0/1', '0/0', '0/1', '0/0'],
...     'Affected3': ['0/0', '0/0', '1/1', '0/0'],
...     'Affected4': ['0/0', '0/0', '0/1', '0/1'],
...     'Unaffected1': ['0/0', '0/0', '0/0', '0/0'],
...     'Unaffected2': ['0/1', '0/1', '0/0', '0/0'],
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Affected1 Affected2 Affected3 Affected4 Unaffected1 Unaffected2
0  chr1  100  .   G   A    .      .    .     GT       0/0       0/1       0/0       0/0         0/0         0/1
1  chr1  101  .   T   C    .      .    .     GT       0/0       0/0       0/0       0/0         0/0         0/1
2  chr1  102  .   T   A    .      .    .     GT       1/1       0/1       1/1       0/1         0/0         0/0
3  chr1  103  .   T   C    .      .    .     GT       0/1       0/0       0/0       0/1         0/0         0/0
>>> affected_samples = ['Affected1', 'Affected2', 'Affected3', 'Affected4']
>>> filtered_vf = vf.filter_sampall(samples=affected_samples)
>>> filtered_vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Affected1 Affected2 Affected3 Affected4 Unaffected1 Unaffected2
0  chr1  102  .   T   A    .      .    .     GT       1/1       0/1       1/1       0/1         0/0         0/0
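
Since the original data also include two unaffected individuals, it may be worth additionally dropping variants that any of them carry. Below is a minimal standard-library sketch of that idea, run on the same toy table written out as VCF text. It does not use fuc; `carries_alt` and `shared_variants` are hypothetical helper names, and the code assumes a tab-delimited VCF whose FORMAT starts with GT (the VCF specification requires GT to come first when it is present).

```python
import io


def carries_alt(sample_field):
    """True if the GT subfield is fully called and has a non-reference allele."""
    alleles = sample_field.split(':')[0].replace('|', '/').split('/')
    return '.' not in alleles and any(a != '0' for a in alleles)


def shared_variants(vcf_lines, affected, unaffected=()):
    """Yield records carried by every affected and by no unaffected sample."""
    columns = None
    for line in vcf_lines:
        line = line.rstrip('\n')
        if not line or line.startswith('##'):
            continue  # skip blank lines and meta-information lines
        if line.startswith('#CHROM'):
            columns = line.lstrip('#').split('\t')
            continue
        if columns is None:
            raise ValueError('no #CHROM header line before the first record')
        record = dict(zip(columns, line.split('\t')))
        if (all(carries_alt(record[s]) for s in affected)
                and not any(carries_alt(record[s]) for s in unaffected)):
            yield record


header = ['#CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO', 'FORMAT',
          'Affected1', 'Affected2', 'Affected3', 'Affected4',
          'Unaffected1', 'Unaffected2']
rows = [
    ['chr1', '100', '.', 'G', 'A', '.', '.', '.', 'GT', '0/0', '0/1', '0/0', '0/0', '0/0', '0/1'],
    ['chr1', '101', '.', 'T', 'C', '.', '.', '.', 'GT', '0/0', '0/0', '0/0', '0/0', '0/0', '0/1'],
    ['chr1', '102', '.', 'T', 'A', '.', '.', '.', 'GT', '1/1', '0/1', '1/1', '0/1', '0/0', '0/0'],
    ['chr1', '103', '.', 'T', 'C', '.', '.', '.', 'GT', '0/1', '0/0', '0/0', '0/1', '0/0', '0/0'],
]
toy_vcf = io.StringIO('\n'.join('\t'.join(r) for r in [header] + rows) + '\n')

affected = ['Affected1', 'Affected2', 'Affected3', 'Affected4']
unaffected = ['Unaffected1', 'Unaffected2']
hits = list(shared_variants(toy_vcf, affected, unaffected))
print([h['POS'] for h in hits])  # ['102']
```

For a real file, pass an open file handle instead of the `io.StringIO` object. Note that this only works if the unaffected samples are in the same VCF, so they would need to be included in the joint genotyping.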