3.6 years ago
SHN
Hello All,
I am using SHAPEIT to phase my genotype data, with 1000G Phase 3 as the reference panel. In the first step, I run shapeit -check to identify the problematic SNPs, as described in the thread linked here: Phasing with SHAPEIT
However, when I ran the second round of shapeit -check to exclude the problematic variants using --exclude-snp Prephased/MyData_chr"${chr}"_alignments.snp.strand.exclude (see the link above), I still got an error and a new set of alignments.snp.strand.exclude files.
So here are my questions:
- Is it expected that the first round of shapeit -check does not catch all the problematic SNPs/variants?
- Should I keep repeating this step (below) to remove the newly found problematic SNPs/variants until I get no error?
Thank you for your responses,
for chr in X {1..22}; do
  plink --bfile MyData --chr "${chr}" --make-bed --out temp
  if [ "${chr}" != "X" ]
  then
    srun --mem=8 --cpus-per-task=4 --partition=serial \
      shapeit \
      -check \
      -B temp \
      -M library/1000GP_Phase3/genetic_map_chr"${chr}"_combined_b37.txt \
      --input-ref library/1000GP_Phase3/1000GP_Phase3_chr"${chr}".hap.gz library/1000GP_Phase3/1000GP_Phase3_chr"${chr}".legend.gz library/1000GP_Phase3/1000GP_Phase3.sample \
      --exclude-snp Prephased/MyData_chr"${chr}"_alignments.snp.strand.exclude \
      -T 8 ;
  fi
done ;
rm temp.* ;
Hi, I wrote the code in the other threads. Can you show the first command for step 1? What error message(s) are you receiving?
Hi Kevin,
Thanks for your message. I am actually doing the same steps. Here is the code for the first step:
After this step, I get the files "data_qced_chr2_alignments.snp.strand.exclude" and "data_qced_chr2_alignments.snp.strand".
I use the "_alignments.snp.strand.exclude" file for the second round of shapeit -check, and I still get missing sites, with the variants saved in ".snp.strand.exclude".
The output is as below:
Parameters :
Reading SNPs to exclude from input file in [path/data_qced_chr5_alignments.snp.strand.exclude]
Reading site list in [PATH/qc_plink/unphased_chr/temp.bim]
Reading sample list in [PATH/qc_plink/unphased_chr/temp.fam]
Reading genotypes in [/PATH/qc_plink/unphased_chr/temp.bed]
Reading sample list [/PATH/1000GP_Phase3/1000GP_Phase3_chr5.legend.gz]
Reading SNPs in [PATH/1000GP_Phase3/1000GP_Phase3_chr5.hap.gz]
ERROR: Reference and Main panels are not well aligned:
Should I continue excluding variants with another round of shapeit -check and then pre-phase? Or are these the new variants that should be used for the pre-phasing step (step 3 in your code) mentioned here: Phasing with SHAPEIT
Thanks,
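Before deciding whether to run another round, it can help to see what kinds of problems the report actually lists. The sketch below counts the rows of a .snp.strand report by their first column. It assumes the report is whitespace-separated with one header line and the problem type (for example Missing or Strand) in column 1; that layout is from memory of the SHAPEIT documentation, so check it against your own file first. The function name is made up for illustration.

```shell
# Count problematic sites by type in a SHAPEIT -check report.
# Assumptions: one header line, whitespace-separated columns,
# and the problem type (e.g. Missing / Strand) in column 1.
summarise_strand_report() {
    awk 'NR > 1 { n[$1]++ } END { for (t in n) print t, n[t] }' "$1" | sort
}

# Hypothetical usage (file name taken from this thread):
# summarise_strand_report data_qced_chr5_alignments.snp.strand
```

If nearly every row is of the Missing type, those are probably sites on the array that are simply absent from the reference panel, which is a different situation from genuine strand mismatches.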
Hello again. Hmm, the --exclude-snp option does not actually remove variants; so, if you run it again, it will just produce the same result. I wonder, have you done QC on your input data? Which array is it? Prior to doing the imputation (the code for which I shared here on Biostars in the other threads), I first performed QC, and then successfully pre-phased and imputed ImmunoChip and Illumina GSA against 1000 Genomes Phase III.
By QC, I mean filtering out variants with high missingness, multi-allelic records, duplicates, etc.
You can also just try to proceed with the pre-phasing - not sure.
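As a rough illustration of one of the QC checks mentioned above (duplicates), here is a minimal sketch that lists positions occurring more than once in a PLINK .bim file. It relies only on the standard .bim column order (chromosome, variant ID, cM, bp position, allele 1, allele 2). The function name is hypothetical, and this is not the QC code from the other threads.

```shell
# List chr:pos keys that occur more than once in a PLINK .bim file,
# together with how many times each occurs.
# .bim columns: chromosome, variant ID, cM, bp position, allele 1, allele 2.
dup_positions() {
    awk '{ key = $1 ":" $4; n[key]++ }
         END { for (k in n) if (n[k] > 1) print k, n[k] }' "$1" | sort
}

# Hypothetical usage on the per-chromosome file from this thread:
# dup_positions temp.bim
```

Empty output means no two records share a chromosome and position; anything printed is worth inspecting for duplicated or multi-allelic records before phasing.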
Thanks for your input. Yes, I did the QC on the genotypes and ran shapeit -check; I was running step 2 when I stopped here.
I thought that when we exclude variants in the second step, the program should not report additional variants.
My array is the Illumina Infinium Omni2.5, and I want to pre-phase and impute it against 1000G Phase III.
Misalignments can affect imputation accuracy. I should probably redo everything just to make sure everything is running correctly.
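One way to check whether a second round really reports additional variants, or just repeats the first list, is to compare the two exclude files directly. The sketch below assumes each .snp.strand.exclude file holds one site per line, keyed on its first field, and that copies from both rounds were kept under different names; the function and file names are made up for illustration.

```shell
# Print entries found in the round-2 exclude list but not in round 1.
# Assumption: one site per line, keyed on the first field.
# Caveat: if the round-1 file is empty, awk's NR == FNR idiom misfires
# and nothing is printed, so check that the file is non-empty first.
new_exclusions() {
    awk 'NR == FNR { seen[$1]; next } !($1 in seen)' "$1" "$2"
}

# Hypothetical usage:
# new_exclusions round1.snp.strand.exclude round2.snp.strand.exclude
```

No output would suggest that the second round is flagging the same sites again rather than finding new ones.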