I have a file that I have filtered my SNPs for LD (in the example below;my.filtered.snp.id). I want to keep only these SNPs in my genotype matrix (geno_snp), I figured how to do it in R but the file is so big (~7GB) can not load it in R. I want to keep those lines (the whole line including snp.id and genotype information) in the genotype matrix where snp.id matches with snp.id in my my.filtered.snp.id and delete those that are not match. Can anyone help me to know how can I do it in Perl or python?
head my.filtered.snp.id)
Chr10_31458
Chr10_31524
Chr10_45901
Chr10_102754
Chr10_102828
Chr10_103480
head (geno_snp)
XRQChr10_103805 NA NA NA 0 NA 0 NA NA NA NA NA 0 0
XRQChr10_103937 NA NA NA 0 NA 1 NA NA NA NA NA 0 2
XRQChr10_103990 NA NA NA 0 NA 0 NA NA NA NA NA 0 NA
Yes, grep can easily do the job for me. I did not try Python scripts though .... thanks guys