Extracting genotype data from the 1000 Genomes Project phase 1 remotely by rsID?
5.0 years ago
Yean ▴ 140

Hi all

I usually use PLINK 1.9 locally on my laptop to manipulate variants and individuals in genotype data, for example to extract genotypes by rsID.

Is there a way to extract genotype data for a given population, for a whole set of rsIDs at once, remotely from the 1000 Genomes Project phase 1 dataset, the way PLINK 1.9 does locally with the --extract flag?

As far as I know, the only method is a for loop over tabix calls that extracts the SNPs one by one by chromosome and genomic position.

However, a for loop may take a long time with a large number of SNPs. Any suggestions for doing this more efficiently?

Thanks

SNP 1000 genome project API • 1.1k views

While you wait for an answer to your question, have you tried using a loop? Extraction with tabix is reasonably fast when you query a region (instead of a single SNP).
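
If the positions are already known, the loop can probably be avoided altogether: tabix accepts a regions file through -R, so one call can fetch many sites. The sketch below is only an illustration under some assumptions: snps.tsv is a hypothetical table (placeholder rsIDs, chromosome, 1-based position) that you would have to build yourself from an rsID-to-position lookup on the same genome build as the VCF, and the remote fetch only runs when RUN_REMOTE=1 is set, so the script can be tried offline. The URL is the one quoted elsewhere in this thread.

```shell
#!/bin/sh
set -eu

# Hypothetical input: placeholder rsID, chromosome, 1-based position.
# Chromosome names must match the VCF (here "2", not "chr2").
printf 'rsA\t2\t39967768\nrsB\t1\t1000000\nrsC\t2\t1500\n' > snps.tsv

# Build a CHROM<TAB>POS regions file, sorted by chromosome, then position.
TAB=$(printf '\t')
cut -f2,3 snps.tsv | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n -u > regions.txt

URL=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz

# One tabix call for all positions; opt in explicitly because it needs
# network access and an installed tabix.
if [ "${RUN_REMOTE:-0}" = 1 ]; then
    tabix -h -R regions.txt "$URL" > G1000.subset.vcf
fi
```

tabix works on coordinates only, so the rsIDs themselves are not used in the query; they can be checked afterwards in the ID column of the output.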


To be honest, I have not tried it yet.

I am just wondering whether there is a method to extract genotypes remotely by rsID instead of by region.

Or is tabix the standard (and fastest) method for extracting genotypes remotely by position?

If so, it is probably worth trying that first.

5.0 years ago
Shicheng Guo ★ 9.6k

First, fetch the 1000G data for the region you need with tabix (here a single position on chromosome 2):

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:39967768-39967768 > G1000.vcf

Then use bcftools to extract the records for the SNPs you want (SNP.bed lists their positions):

bcftools view -T SNP.bed G1000.vcf -Oz -o SNP.vcf.gz
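
Since the question is about rsIDs rather than positions, a downloaded VCF can also be filtered on its ID column. The following is a minimal sketch with plain awk, not part of the commands above: toy.vcf is a made-up stand-in for G1000.vcf with placeholder IDs and two invented samples, and ids.txt is a hypothetical list with one rsID per line.

```shell
#!/bin/sh
set -eu

# Toy stand-in for G1000.vcf (made-up records) so the sketch runs offline.
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf '2\t100\trsA\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n'
  printf '2\t200\trsB\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1\n'
  printf '2\t300\trsC\tG\tA\t.\tPASS\t.\tGT\t1|1\t0|0\n'
} > toy.vcf

# Hypothetical list of wanted IDs, one per line.
printf 'rsA\nrsC\n' > ids.txt

# First file: load the wanted IDs. Second file: keep header lines and
# records whose ID (column 3) is in the list.
awk -F '\t' 'NR == FNR { keep[$1]; next } /^#/ || ($3 in keep)' \
    ids.txt toy.vcf > SNP_by_id.vcf
```

With bcftools installed, the filter expression -i 'ID=@ids.txt' should give the same selection, and -S with a file of sample names restricts the output to one population; it is worth checking both against the bcftools manual for your version.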