I want to re-call SNPs from an Illumina HumanOmni2.5-4v1 array. I have the raw data (Grn.idat and Red.idat files) and a matching FinalReport.csv, which includes the following columns:
SNP Name, Sample Name, GC Score, Allele1 - Forward, Allele2 - Forward, Allele1 - Top, Allele2 - Top, Allele1 - Design, Allele2 - Design, Allele1 - AB, Allele2 - AB, Theta, R, X, Y, X Raw, Y Raw, B Allele Freq, Log R Ratio
And I have thousands of such files. How can I get the number of calls for a specific SNP?
Should I use the raw idat files or the CSVs? And if the answer is the CSVs, which column should I use, and how do I interpret it?
Thanks
I have the PLINK files, but I want to process the raw data, because the bim-bed-fam files have many missing calls and I'm trying to figure out why, or whether I can re-call them.
I tried CRLMM. It reads the idat files and gives two numbers for each SNP: a call and a confidence. But what do I do with these two? How do I know what the genotype is when a SNP has a call of 10210 and a confidence of 0.4898242?
Chances are there are a lot of missing calls because they have not been imputed. Check out IMPUTE2.
That's probably true, and yet I want to see what the genotypes were in the raw calls, before any imputation was done.
In addition, it is not obvious how to use IMPUTE2 with these files.
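On the original question of counting calls for one SNP from the FinalReport files, here is a minimal Python sketch rather than a definitive method. It makes several assumptions that should be checked against the actual files: that each report is comma-delimited, that it may start with a `[Header]` block followed by a `[Data]` line, that the genotype is read from the `Allele1 - AB` and `Allele2 - AB` columns, and that a no-call is written as `-` in those columns. The function name `count_calls` and the SNP ID `rs1` are hypothetical.

```python
import csv
import itertools
from collections import Counter

NO_CALL = "-"  # assumption: no-calls appear as "-" in the allele columns


def _norm(name):
    """Normalise a column name: drop all whitespace, lower-case."""
    return "".join(name.split()).lower()


def _data_reader(lines, delimiter):
    """Return a csv.reader positioned at the column-header row."""
    it = iter(lines)
    first = next(it)
    if first.strip().startswith("[Header]"):
        # Skip the preamble up to and including the "[Data]" line.
        for line in it:
            if line.strip().startswith("[Data]"):
                break
    else:
        # No preamble: the first line is already the column header.
        it = itertools.chain([first], it)
    return csv.reader(it, delimiter=delimiter)


def count_calls(lines, snp_name, delimiter=","):
    """Count AB genotypes (and no-calls) for one SNP in one report."""
    reader = _data_reader(lines, delimiter)
    col = {_norm(h): i for i, h in enumerate(next(reader))}
    snp_i = col["snpname"]
    a1_i, a2_i = col["allele1-ab"], col["allele2-ab"]
    counts = Counter()
    for row in reader:
        if not row or row[snp_i].strip() != snp_name:
            continue
        a1, a2 = row[a1_i].strip(), row[a2_i].strip()
        if NO_CALL in (a1, a2):
            counts["no_call"] += 1
        else:
            # Sort so that "BA" and "AB" are counted together.
            counts["".join(sorted(a1 + a2))] += 1
    return counts


# Tiny made-up report, for illustration only (not real data).
sample_report = """[Header]
Content,example
[Data]
SNP Name,Sample Name,GC Score,Allele1 - AB,Allele2 - AB
rs1,S1,0.91,A,B
rs1,S2,0.00,-,-
rs2,S1,0.85,B,B
rs1,S3,0.88,A,A
"""

result = count_calls(sample_report.splitlines(), "rs1")
print(dict(result))
```

With thousands of files, the same function can be applied to each open file handle in turn and the resulting `Counter` objects added together, since `Counter` supports `+`. Column names are normalised by stripping whitespace, so both `Allele1-AB` and `Allele1 - AB` are matched.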