Entering edit mode
9 months ago
pablo
▴
310
Hello,
I have 12 Illumina samples : 11 mutants and 1 wild. I did the SNP calling :
bcftools mpileup -Ou -f ref *.bam | bcftools call -mv -Ob -o calls.bcf
bcftools view -i '%QUAL>=20' calls.bcf
bcftools convert -O v -o calls.vcf calls.bcf
I need to extract the SNP which are present in my mutants (at least present in 5 samples / 11), but not in my wild sample. For example (Sample1 = wild, I show 8 others = 8 mutants / 11) :
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9
0 chr1 100 . G A . . . GT 0/1 1/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0
1 chr1 200 . T C . . . GT 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0
2 chr1 300 . T A . . . GT 0/0 0/1 0/0 0/0 0/1 0/0 1/1 0/1 0/1
3 chr1 400 . C A . . . GT 0/0 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1
4 chr1 500 . C A . . . GT 0/0 1/1 1/1 1/1 0/0 1/1 1/1 1/1 1/1
I should get the last three lines : because the SNP is 0/0 with the reference, and at least present in 5 mutants (no matter if it is hom 1/1 or het 0/1).
Any tool to do that? Before trying to deal with a perl script..
Best
Thanks Pierre. I guess I need to change
G.isHomVar()).count()>4L
by the number I want to select the number of filtered samples?yes !
And does the sample1 in my case is considered as the wild?
anything that is not in ("sample1","sample2","sample3","sample4","sample5","sample6","sample7","sample8","sample9","sample10") is wild.
Thanks Pierre.