Subsetting .vcf.gz based on .txt file
4.6 years ago
chvbs2000 • 0

I am preprocessing a set of .vcf.gz files and want to subset each of them to a list of SNPs. I stored the SNP IDs of interest in a text file, and I want to extract all rows from these .vcf.gz files whose SNP ID appears in the SNP_ID file:

SNP_ID file:

rs61733845
rs1320571
rs9729550
rs1815606
rs7515488
rs11260562
rs6697886
rs6603785
rs11804831

In Python I would process each line with a conditional statement or do an inner join, but Python may not be the best choice because these .vcf.gz files are huge. Is there a way to subset a .vcf.gz based on a text file with bash commands such as awk, sed, or cat? Thanks!
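One way to do this with only gzip and awk is sketched below, assuming the SNP IDs are in the VCF ID column (column 3) and the SNP_ID file has one ID per line. The function name `filter_vcf_by_ids` and all file names are hypothetical. awk first loads the IDs into a hash, then streams the decompressed VCF and prints header lines plus any line whose ID is in that hash, so memory use depends on the size of the ID list rather than the VCF. The output is plain gzip, not BGZF; if you need to index it with tabix, swap `gzip -c` for `bgzip -c`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep the VCF header plus every data line whose ID
# column (column 3) exactly matches an ID listed in the SNP ID file.
# Usage: filter_vcf_by_ids snp_ids.txt input.vcf.gz output.vcf.gz
filter_vcf_by_ids() {
  local ids_file=$1 in_vcf=$2 out_vcf=$3
  # awk reads two inputs. First the ID file (NR == FNR): each line is stored
  # as a key of the array "ids" (a trailing CR is stripped in case the file
  # has Windows line endings). Then "-" (stdin), the decompressed VCF: header
  # lines starting with "#" and lines whose column 3 is in "ids" are printed.
  # Assumes the ID file is non-empty; otherwise NR == FNR would also hold
  # for the VCF lines and nothing would be printed.
  gzip -dc "$in_vcf" |
    awk -F'\t' 'NR == FNR { sub(/\r$/, "", $1); ids[$1]; next } /^#/ || ($3 in ids)' "$ids_file" - |
    gzip -c > "$out_vcf"
}

# Example: subset every .vcf.gz in the current directory (hypothetical names).
# for f in *.vcf.gz; do
#   filter_vcf_by_ids SNP_ID.txt "$f" "${f%.vcf.gz}.subset.vcf.gz"
# done
```

Because each file is streamed once, the loop in the final comment can also be parallelised (e.g. one job per file) without changing the function.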

SNP gene vcf genome sequencing
4.6 years ago
Yean ▴ 140

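What about PLINK? Its --extract option keeps only the variants whose IDs are listed in the given file (one ID per line):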

   plink1.9 --vcf input.vcf.gz --extract snp.snplist --make-bed --out extract_snp
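The command above handles one file; since the question involves a list of .vcf.gz files, a small loop can run the same extraction on each one. This is a sketch: the function name `extract_snps_all` and the `PLINK` variable (used to choose the binary, defaulting to `plink1.9` as in the answer) are hypothetical. Note that `--make-bed` writes PLINK binary files (.bed/.bim/.fam) rather than VCF; if VCF output is needed, PLINK 1.9's `--recode vcf` can be used in place of `--make-bed`.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: run the PLINK 1.9 extraction from the answer on each
# VCF passed as an argument. The first argument is the SNP list (one ID per
# line). Set PLINK to override the binary name (default: plink1.9).
extract_snps_all() {
  local snplist=$1
  shift
  local plink_bin=${PLINK:-plink1.9}
  local vcf
  for vcf in "$@"; do
    # Derive the output prefix from the input name (chr1.vcf.gz ->
    # chr1.extract_snp) so results from different files do not overwrite
    # each other.
    "$plink_bin" --vcf "$vcf" --extract "$snplist" --make-bed \
      --out "${vcf%.vcf.gz}.extract_snp"
  done
}

# Example (hypothetical file names):
# extract_snps_all snp.snplist *.vcf.gz
```

Setting `PLINK=echo` turns the loop into a dry run that only prints the commands it would execute.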