7.8 years ago
slimane.khayi
Dear colleagues, I have a tab-delimited file in this format:
#CHROM POS REF a1 a2 b1 b2 c1 c2
NW_008246507.1 16 T C/C C/C T/T C/C C/C T/C
NW_008246507.1 1624 A C/C C/C C/C C/C C/C C/C
NW_008246507.1 1656 C T/T T/T T/T T/T T/T T/T
NW_008246507.1 1666 C T/T T/T T/T T/T T/T T/T
NW_008246507.1 1679 C T/T T/T T/T T/T T/T T/T
NW_008246507.1 1681 G A/A A/A A/A A/A A/A A/A
NW_008246507.1 1682 T A/A A/A A/A A/A A/A A/A
NW_008246507.1 1695 T C/C C/C C/C C/C C/C C/C
I want to identify the SNPs that are unique to each species a, b, c (not to each strain a1, a2, b1, ...). Do you have a Python script, or any idea how to do this? I am not familiar with scripting languages. Thank you in advance for your help. Sincerely.
Could you please clarify what counts as a unique SNP in your example data? I find it difficult to see. (I also think you are missing a couple of line breaks, as there are currently two positions per line.) Would T/C for species c at NW_008246507.1:16 be what you are looking for?
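Until the definition is settled, here is a minimal Python sketch of one possible reading, not a definitive answer. It assumes that the species label is the sample name without its trailing digits (a1 -> a), and that a genotype is "unique" to a species when at least one of its strains carries it and no strain of any other species does. The names `species_of`, `private_genotypes`, and `require_all_strains` are made up for this illustration; setting `require_all_strains=True` applies the stricter reading in which every strain of the species must share the genotype.

```python
import csv
import io
import re
from collections import defaultdict


def species_of(sample):
    # Assumption: species label = sample name minus trailing digits ("a1" -> "a").
    return re.sub(r"\d+$", "", sample)


def private_genotypes(handle, require_all_strains=False):
    """Yield (chrom, pos, species, genotype) for genotypes seen in one species only.

    Genotype strings are compared as written, so "T/C" and "C/T" count as different.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[3:]  # columns after #CHROM, POS, REF
    for row in reader:
        if not row:
            continue  # skip blank lines
        chrom, pos = row[0], row[1]
        by_species = defaultdict(list)
        for sample, genotype in zip(samples, row[3:]):
            by_species[species_of(sample)].append(genotype)
        for species, genotypes in by_species.items():
            # Genotypes carried by any strain of any other species.
            others = {g for s, gs in by_species.items() if s != species for g in gs}
            for genotype in sorted(set(genotypes) - others):
                if require_all_strains and set(genotypes) != {genotype}:
                    continue  # strains of this species disagree
                yield chrom, pos, species, genotype


# First two rows of the example data from the question.
example = (
    "#CHROM\tPOS\tREF\ta1\ta2\tb1\tb2\tc1\tc2\n"
    "NW_008246507.1\t16\tT\tC/C\tC/C\tT/T\tC/C\tC/C\tT/C\n"
    "NW_008246507.1\t1624\tA\tC/C\tC/C\tC/C\tC/C\tC/C\tC/C\n"
)
for hit in private_genotypes(io.StringIO(example)):
    print(*hit, sep="\t")
```

On these two rows the loose reading reports T/T for species b and T/C for species c at position 16, while the strict reading reports nothing, because the two strains of b and of c disagree there. To run it on a real file, pass an open file handle, for example `open("snps.tsv", newline="")` (the file name is a placeholder), instead of the `io.StringIO` object.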