It sounds like you are already familiar with Python, so here's one solution using the pyvcf submodule from the fuc package I wrote.
Let's imagine you have 3 controls (C1-C3) and 3 affected samples (A1-A3).
>>> from fuc import pyvcf
>>> data = {
... 'CHROM': ['chr1', 'chr1', 'chr1'],
... 'POS': [100, 101, 102],
... 'ID': ['.', '.', '.'],
... 'REF': ['G', 'T', 'T'],
... 'ALT': ['A', 'C', 'A'],
... 'QUAL': ['.', '.', '.'],
... 'FILTER': ['.', '.', '.'],
... 'INFO': ['.', '.', '.'],
... 'FORMAT': ['GT', 'GT', 'GT'],
... 'C1': ['0/1', '0/1', '0/0'],
... 'C2': ['0/0', '1/1', '0/0'],
... 'C3': ['0/1', '0/1', '0/0'],
... 'A1': ['0/0', '0/1', '1/1'],
... 'A2': ['0/0', '1/1', '0/1'],
... 'A3': ['0/0', '0/0', '0/1'],
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> # vf = pyvcf.VcfFrame.from_file('in.vcf')
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT   C1   C2   C3   A1   A2   A3
0  chr1  100  .   G   A    .      .    .     GT  0/1  0/0  0/1  0/0  0/0  0/0
1  chr1  101  .   T   C    .      .    .     GT  0/1  1/1  0/1  0/1  1/1  0/0
2  chr1  102  .   T   A    .      .    .     GT  0/0  0/0  0/0  1/1  0/1  0/1
You can then remove variants that are absent from all of the affected samples, i.e. keep only variants carried by at least one of A1-A3.
>>> filtered_vf = vf.filter_sampany(['A1', 'A2', 'A3'])
>>> filtered_vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT   C1   C2   C3   A1   A2   A3
0  chr1  101  .   T   C    .      .    .     GT  0/1  1/1  0/1  0/1  1/1  0/0
1  chr1  102  .   T   A    .      .    .     GT  0/0  0/0  0/0  1/1  0/1  0/1
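If it helps to see what this kind of filter does conceptually, here is a minimal plain-Python sketch of the same idea on the toy data above. It does not use fuc at all: the helper names has_variant and keep_if_any_carrier are hypothetical, made up for illustration, and I'm assuming that a sample "has the variant" when its GT field contains at least one non-reference, non-missing allele.

```python
# Plain-Python sketch of the filtering idea. The helper names are
# hypothetical and are NOT part of the fuc API.

def has_variant(sample_field):
    """Return True if the genotype carries at least one non-reference allele.

    Assumes GT is the first FORMAT field (e.g. '0/1' or '0/1:30') and that
    '.' denotes a missing allele.
    """
    gt = sample_field.split(':')[0]
    alleles = gt.replace('|', '/').split('/')
    return any(allele not in ('0', '.') for allele in alleles)


def keep_if_any_carrier(rows, samples):
    """Keep rows where at least one of the given samples carries the variant."""
    return [row for row in rows if any(has_variant(row[s]) for s in samples)]


# Same genotypes as the toy VcfFrame above, reduced to POS and sample columns.
rows = [
    {'POS': 100, 'C1': '0/1', 'C2': '0/0', 'C3': '0/1', 'A1': '0/0', 'A2': '0/0', 'A3': '0/0'},
    {'POS': 101, 'C1': '0/1', 'C2': '1/1', 'C3': '0/1', 'A1': '0/1', 'A2': '1/1', 'A3': '0/0'},
    {'POS': 102, 'C1': '0/0', 'C2': '0/0', 'C3': '0/0', 'A1': '1/1', 'A2': '0/1', 'A3': '0/1'},
]

kept = keep_if_any_carrier(rows, ['A1', 'A2', 'A3'])
print([row['POS'] for row in kept])  # [101, 102]
```

The variant at position 100 is dropped because none of A1-A3 carries it, which matches the filtered table above.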
Optionally write the VCF data to an output file.
>>> # filtered_vf.to_file('out.vcf')
Let me know if you have any questions.
Thank you for your reply. I'm not sure I understand the command. Do I have to replace ("S1","S2","S3","S4") with the list of my unaffected samples?
Yes.
Unfortunately, when I try to install it with:
I get the error:
Do you have permission to create directories under /shared/home/quentin67100/.gradle?
You're right, it's an issue with the disk space quota on my home directory.