Hello,
I ran the QC step on my .CEL files with:
apt-geno-qc \
--cdf-file BI_SNP.cdf \
--qcc-file BI_SNP_1.qcc \
--qca-file BI_SNP.qca \
--cel-files cell.txt \
--dm-out outDM1 \
--out-file qc1.txt
And my qc1.txt file looks like this:
#%guid=7c31419b-b5cf-41f5-9dc5-9d3bad34d263
cel_files qc-call-rate-all qc-call-rate-nsp qc-call-rate-sty qc-call-rate-nsp-sty-overlap em-cluster-chrX-het-contrast_gender em-cluster-chrX-het-contrast_gender_chrX_het_rate
ALIKE_g_1LTX827_BI_SNP_F01_33250.CEL 0.95731 0.94602 0.93559 0.97104 female 0.25601
BURRY_g_3KYJ479_BI_SNP_A12_40182.CEL 0.96459 0.97172 0.93076 0.97412 male 0.03026
ABAFT_g_4RWG569_BI_SNP_E12_35136.CEL 0.86433 0.98586 0.59742 0.90819 female 0.39013
MILLE_g_5AVC089_BI_SNP_F02_35746.CEL 0.96294 0.96401 0.92915 0.97535 male 0.02721
…
The output .txt files in outDM1 look like this:
#%cel-file=./UNTIL_g_3ECO791_BI_SNP_H10_36454.CEL
#%number-SNPs=3022
#%dm-thresh=0.33
#%dm-het-mult=1.25
probeset_id call confidence
FQC-10090295 1 0.187500
FQC-10119363 2 0.023438
...
I then ran the following analyses on the files in outDM1:
1) Merge all the files into a matrix where rows are individuals and columns are SNPs.
2) Calculate the mean and variance of the confidence for each SNP and for each sample.
3) Look at the distribution of the mean confidence for each sample.
4) Look at the distribution of the variance of the confidence for each sample.
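For reference, the merge and summary steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script I used: it assumes every file in outDM1 is whitespace-delimited with the `#%` header lines and the `probeset_id call confidence` columns shown earlier, and the function names (`read_dm_file`, `load_dm_dir`, `summarise`) are made up for this example.

```python
import statistics
from pathlib import Path


def read_dm_file(path):
    """Return (cel_name, {probeset_id: confidence}) for one DM output file."""
    cel_name = Path(path).name  # fallback if no #%cel-file= header is present
    confidences = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("#%cel-file="):
                cel_name = Path(line.split("=", 1)[1]).name
            if not line or line.startswith("#") or line.startswith("probeset_id"):
                continue  # skip blank lines, header comments and the column header
            fields = line.split()
            if len(fields) < 3:
                continue  # skip anything that is not a full data row
            probeset_id, _call, confidence = fields[:3]
            confidences[probeset_id] = float(confidence)
    return cel_name, confidences


def load_dm_dir(directory):
    """Build {sample: {snp: confidence}} from every .txt file in a directory."""
    return dict(read_dm_file(p) for p in sorted(Path(directory).glob("*.txt")))


def summarise(matrix):
    """Return per-sample and per-SNP (mean, sample variance) of the confidence."""
    def stats(values):
        variance = statistics.variance(values) if len(values) > 1 else 0.0
        return statistics.mean(values), variance

    per_sample = {s: stats(list(row.values())) for s, row in matrix.items() if row}
    by_snp = {}
    for row in matrix.values():
        for snp, conf in row.items():
            by_snp.setdefault(snp, []).append(conf)
    per_snp = {snp: stats(vals) for snp, vals in by_snp.items()}
    return per_sample, per_snp
```

Usage would be along the lines of `per_sample, per_snp = summarise(load_dm_dir("outDM1"))`, after which the per-sample means and variances can be plotted as histograms.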
This is the plot of these results:
I am not sure how to use the results of this QC analysis to proceed with the apt-probeset-genotype step.
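To make the question concrete, this is the kind of thing I imagine doing, though I do not know whether it is the right approach: drop samples whose `qc-call-rate-all` in qc1.txt falls below some cutoff and write a new CEL list for the next step. The sketch below is only an illustration. The cutoff is a placeholder that I have not validated, the function names are made up, and I am assuming apt-probeset-genotype takes the same kind of `--cel-files` list (a `cel_files` header followed by one file per line) as apt-geno-qc above.

```python
def passing_cel_files(qc_report_path, min_call_rate, column="qc-call-rate-all"):
    """Return CEL file names whose call rate in the QC report is >= min_call_rate."""
    passing = []
    header = None
    rate_index = None
    with open(qc_report_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # skip blank lines and #% header comments
            if header is None:
                header = fields  # first non-comment line holds the column names
                rate_index = header.index(column)
                continue
            if len(fields) <= rate_index:
                continue  # skip rows that do not contain the call-rate column
            if float(fields[rate_index]) >= min_call_rate:
                passing.append(fields[0])
    return passing


def write_cel_list(cel_files, out_path):
    """Write a CEL list file: a 'cel_files' header, then one file name per line."""
    with open(out_path, "w") as handle:
        handle.write("cel_files\n")
        for name in cel_files:
            handle.write(name + "\n")
```

For example, `write_cel_list(passing_cel_files("qc1.txt", min_call_rate=0.9), "cel_pass.txt")` with a purely hypothetical cutoff of 0.9 would exclude the ABAFT sample shown above (0.86433). Note that qc1.txt lists bare file names, so the paths may need adjusting to wherever the .CEL files actually live.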
I found this workflow: http://www.bioinf.wits.ac.za/h3a/scripts/phase1-apt/celfiles_prep_apt_geno_qc_h3data.txt. However, the samples in that example were genotyped on an Affy 6.0 chip, which has different QC options (e.g. contrast QC) from the 5.0 chip I am using, so I cannot use any of the QC metrics specific to the 6.0 chip.
Any advice on this would be appreciated.
Thanks,
Ana