Dear Biostars,
I'm trying to get all the high-quality SNPs for the strains 129S1 and 129S5 from the Sanger Mouse sequencing project. Previously I have used tabix for this, specifying the chromosomal ranges of interest and then filtering by strain and call quality (i.e. high-confidence SNPs). I usually use the following command:
tabix -h ftp://ftp-mouse.sanger.ac.uk/REL-1111-SNPs/mouse-snps-all.annots.vcf.gz 7:123000000-124000000 > 7_123Mb_124Mb_129S1_5_SNPs_Sanger.txt
However, this always gives me gigantic (>1 GB) files, which I then have to process to find the 129S1 and 129S5 SNPs relative to the C57BL/6 reference.
This time I want to cover the whole genome and get all high-confidence SNPs using tabix, but I'm confused about the syntax and about how to tell the program to retrieve only the 129S1 and 129S5 high-confidence SNPs relative to the reference.
Has anyone done something similar?
Thanks!!
Sakti
For the whole genome, don't limit the query to a specific region as you did above. It's better to download the vcf.gz and vcf.gz.tbi files and run tabix on them locally.
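tabix only selects by region, not by sample, so the strain and quality filtering has to happen in a step after it. Below is a minimal sketch of such a step, written as a shell function around awk so the output can be filtered on the fly instead of saving the full file first. The function name filter_strains is made up for illustration. The sample column names 129S1 and 129S5 are an assumption (check the #CHROM header line of your release), and treating FILTER = PASS as "high confidence" is also an assumption: this release may carry a per-genotype confidence field, so look at the ##FORMAT lines in the header and adapt the test if needed.

```shell
#!/bin/sh
# filter_strains SAMPLE1 SAMPLE2   (hypothetical helper, not part of tabix)
# Reads a VCF on stdin and writes, on stdout:
#   - all ## meta-information lines
#   - the #CHROM line reduced to the 9 fixed columns plus the two samples
#   - records whose FILTER is PASS or "." and where at least one of the
#     two samples carries a non-reference allele in its GT subfield
filter_strains() {
    awk -F '\t' -v OFS='\t' -v s1="$1" -v s2="$2" '
        # GT is the first colon-separated subfield of a sample column;
        # any digit 1-9 in it means an ALT allele is present.
        function is_alt(sample,    parts) {
            split(sample, parts, ":")
            return parts[1] ~ /[1-9]/
        }
        /^##/ { print; next }
        /^#CHROM/ {
            # Locate the two sample columns by name (samples start at column 10).
            for (i = 10; i <= NF; i++) {
                if ($i == s1) c1 = i
                if ($i == s2) c2 = i
            }
            if (!c1 || !c2) {
                print "sample column not found in header" > "/dev/stderr"
                exit 1
            }
            print $1, $2, $3, $4, $5, $6, $7, $8, $9, $c1, $c2
            next
        }
        {
            # Assumption: PASS (or an unset filter) marks a reliable site.
            if ($7 != "PASS" && $7 != ".") next
            if (is_alt($c1) || is_alt($c2))
                print $1, $2, $3, $4, $5, $6, $7, $8, $9, $c1, $c2
        }
    '
}

# Possible usage (not run here; sample names are assumptions):
#   one region, streamed straight from the FTP site:
#     tabix -h ftp://ftp-mouse.sanger.ac.uk/REL-1111-SNPs/mouse-snps-all.annots.vcf.gz 7:123000000-124000000 | filter_strains 129S1 129S5 > 7_123Mb_124Mb_129S1_5.vcf
#   whole genome, from a locally downloaded copy:
#     gzip -dc mouse-snps-all.annots.vcf.gz | filter_strains 129S1 129S5 > 129S1_5_whole_genome.vcf
```

Because only two sample columns are written out, the result should be far smaller than the full multi-strain output. Sites where both strains match the reference (0/0) or are uncalled (./.) are dropped.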