Hi all,
I have a set of annotated variant files from HLA-typed cases and controls, with a single tab-delimited file for each individual.
#Chromo Position Reference Change Change_type
5 32548555 C A SNP Hom
4 32548561 C G SNP Hom
I wonder whether I can create a PED file from these files. I thought of creating a BED file and then converting it to PED. However, since not every variant is present in every individual, I'm not sure how to automate the creation of the PED, because the variant genotypes have to be in exactly the same order for all individuals. One approach would be to combine all the annotated tab files into a single VCF with sample names and convert that to PED, but I'm not sure how to do that. Even the smallest clue would be highly appreciated. Thanks in advance :)
Thank you very much. I will try that and let you know the progress.
Correction: you probably want to use --merge-list instead of --bmerge for the merge step.
Dear chrchang,
Sorry for bothering you again. Do you know what criteria '--merge-list' uses for merging the BED files? My per-individual BED files do not contain the SNPs in the same order (some SNPs are present in only some of the files), so I would want PLINK to merge the BED files using the chromosome and position fields. Is that possible? I couldn't find any clue :(
PLINK's merge will automatically sort by chromosome and position. (However, you first need to convert your files to PLINK-readable formats.)
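A minimal sketch of how the merge step might be set up, assuming each individual has already been converted to a PLINK binary fileset. The prefixes sample1, sample2, sample3, the list file name merge_list.txt, and the output prefix merged are all hypothetical. The script only writes the list file and prints the command; it does not run PLINK.

```python
import os
import tempfile


def write_merge_list(prefixes, path):
    """Write one binary fileset prefix per line, as --merge-list expects."""
    with open(path, "w") as handle:
        for prefix in prefixes:
            handle.write(prefix + "\n")


def plink_merge_command(reference_prefix, merge_list_path, out_prefix):
    """Build (but do not run) a PLINK 1.9 command line for the merge."""
    return [
        "plink",
        "--bfile", reference_prefix,       # first individual acts as the reference fileset
        "--merge-list", merge_list_path,   # every other individual is listed in this file
        "--make-bed",
        "--out", out_prefix,
    ]


# Hypothetical per-individual fileset prefixes (sample1.bed/.bim/.fam, ...).
sample_prefixes = ["sample1", "sample2", "sample3"]

merge_list_path = os.path.join(tempfile.mkdtemp(), "merge_list.txt")
write_merge_list(sample_prefixes[1:], merge_list_path)

command = plink_merge_command(sample_prefixes[0], merge_list_path, "merged")
print(" ".join(command))
```

Replacing --make-bed with --recode in the command should produce the .ped/.map pair asked about at the top of the thread.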
I was able to convert the files to TPED using a bash script. The only problem is that I don't have any values for the 2nd and 3rd fields (rsIDs and genetic distances). Since the merge depends on the chromosome and position (1st and 4th fields), as you said, I hope I won't get into any trouble when merging the files :)
It's safe to use '0' for all the centimorgan coordinates.
Lack of rsIDs is a bigger problem. You may want to use PLINK 1.9's --set-missing-var-ids flag to address this.
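For illustration, here is a rough sketch of the conversion discussed above: one row of a per-individual table becomes one single-sample TPED line, with '0' in the centimorgan field and a placeholder ID built from chromosome and position (the same kind of name the @:# template of --set-missing-var-ids is meant to generate). The function names are made up for this example, and two assumptions are not confirmed anywhere in the thread: that the sixth column holds zygosity, and that "Hom" means two copies of the changed allele while any other value means heterozygous.

```python
def variant_row_to_tped(line):
    """Convert one whitespace/tab separated variant row to a single-sample TPED line."""
    chrom, pos, ref, alt, _change_type, zygosity = line.split()[:6]
    # Assumption: "Hom" = two copies of the changed allele; anything else is
    # treated as heterozygous (reference allele + changed allele).
    alleles = (alt, alt) if zygosity == "Hom" else (ref, alt)
    variant_id = f"{chrom}:{pos}"  # placeholder ID in place of an rsID
    # TPED columns: chromosome, variant ID, centimorgans, bp position, allele 1, allele 2
    return " ".join([chrom, variant_id, "0", pos, *alleles])


def table_to_tped(lines):
    """Convert all data rows, skipping blank lines and '#' header lines."""
    return [
        variant_row_to_tped(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# The two example rows from the original post.
example_rows = [
    "#Chromo Position Reference Change Change_type",
    "5 32548555 C A SNP Hom",
    "4 32548561 C G SNP Hom",
]

tped_lines = table_to_tped(example_rows)
print("\n".join(tped_lines))
```

Each .tped written this way still needs a matching one-line .tfam for the individual before PLINK can read it with --tfile.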