Hi all
I usually use PLINK 1.9 locally on my laptop to manipulate genotype data at the variant and individual level, for example to extract genotypes by rsID.
Is there a way to remotely extract genotype data for a given population from the 1000 Genomes Project phase 1 dataset, using multiple sets of rsIDs at once, the way PLINK 1.9 does locally with the --extract flag?
As far as I know, the only way to do that is a for loop around tabix that extracts the SNPs one by one by chromosome and genomic position.
However, a for loop may take a long time if I have a large number of SNPs. Any suggestions for doing this more efficiently?
Thanks
While you wait for an answer to your question, have you tried using a loop? Extraction with tabix is reasonably fast with a region (instead of one SNP).
To be honest, I have not tried it yet.
I am just wondering whether there is a method to extract genotypes remotely by rsID instead of by region.
Or is tabix the standard (and fastest) method for extracting genotypes remotely by position?
If so, it may be worth trying that first.
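One way to avoid a per-SNP loop, sketched here under some assumptions: if the rsIDs have already been mapped to chromosome and position, the positions can be collapsed into a regions file and handed to tabix in a single call per VCF. Recent htslib builds of tabix accept a tab-delimited regions file (CHROM, start, end; 1-based, inclusive) via -R. The helper names below (build_regions, format_regions), the max_gap parameter, and the example coordinates are hypothetical and only for illustration.

```python
from collections import defaultdict


def build_regions(snps, max_gap=0):
    """Collapse (chrom, pos) pairs into sorted (chrom, start, end) regions.

    Positions are 1-based. Positions on the same chromosome that lie
    within max_gap bp of the previous one are merged into one region.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[str(chrom)].append(int(pos))

    regions = []
    for chrom in sorted(by_chrom):
        positions = sorted(set(by_chrom[chrom]))  # drop duplicates
        start = end = positions[0]
        for pos in positions[1:]:
            if pos - end <= max_gap:
                end = pos  # extend the current region
            else:
                regions.append((chrom, start, end))
                start = end = pos
        regions.append((chrom, start, end))
    return regions


def format_regions(regions):
    """Render regions as tab-delimited lines: CHROM, start, end."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in regions)


if __name__ == "__main__":
    # Made-up coordinates, not real rsID positions.
    example = [("1", 100), ("1", 101), ("1", 500), ("2", 50)]
    with open("regions.txt", "w") as handle:
        handle.write(format_regions(build_regions(example, max_gap=1)))
```

The resulting regions.txt could then be passed to something like `tabix -h -R regions.txt <remote VCF URL>`, once per VCF if the data are split by chromosome. Note that this still selects by position rather than by rsID, and a merged region can include neighbouring variants, so a local filter on the ID column (for example PLINK's --extract on the downloaded subset) may still be needed afterwards.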