multisample BED files to PED conversion
10.4 years ago

Hi all,

I have a set of annotated variant files from HLA-typed cases and controls, with a single tab-delimited file for each individual.

#Chromo    Position    Reference    Change   Change_type   
5    32548555    C    A    SNP    Hom
4    32548561    C    G    SNP    Hom

I wonder whether I can create a PED file from these files. I thought of creating a BED file and then converting it to PED. But since not every variant is present in every individual, I'm not sure how to automate the creation of the PED file, given that the variant genotypes must be in exactly the same order for all individuals. One approach would be to combine all the annotated tab-delimited files into a single VCF with sample names and convert that to PED, but I'm not sure how to do that either. Even the smallest clue is highly appreciated. Thanks in advance :)
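The ordering concern can be handled by first collecting the union of all variant sites across individuals and sorting it by chromosome and position, so every individual is later written against the same site list. A minimal Python sketch, assuming the whitespace-separated layout shown above and treating a site as the (chromosome, position, reference, change) tuple; `Site`, `read_sites`, `chrom_key` and `union_of_sites` are hypothetical names:

```python
from collections import namedtuple

# One variant call parsed from a per-individual annotated file (hypothetical type)
Site = namedtuple("Site", ["chrom", "pos", "ref", "alt"])

def chrom_key(chrom):
    """Sort key: numeric chromosomes first (1, 2, ... 22), then X, Y, MT, ... alphabetically."""
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)

def read_sites(lines):
    """Parse one individual's file; header lines starting with '#' are skipped."""
    sites = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()  # works for tabs or spaces
        sites.append(Site(fields[0], int(fields[1]), fields[2], fields[3]))
    return sites

def union_of_sites(per_individual):
    """Return every site seen in any individual, in chromosome/position order."""
    all_sites = set()
    for sites in per_individual.values():
        all_sites.update(sites)
    return sorted(all_sites, key=lambda s: (chrom_key(s.chrom), s.pos, s.ref, s.alt))
```

Whether a site absent from an individual's file should later be treated as reference or as missing depends on how the variant calls were produced; this sketch only fixes the order.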

10.4 years ago

One way you could do this:

  1. Write a short script to convert one file at a time to TPED (see http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr) or VCF format.
  2. Convert these files to PLINK binary format one at a time (--make-bed).
  3. Merge the binary filesets (--bmerge) and output the final result as PED (--recode).
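Step 1 could look roughly like the Python sketch below, which turns one individual's annotated file into TPED lines plus a one-row TFAM. Assumptions: "Hom" means homozygous for the Change allele and "Het" (if present) means Reference/Change; the variant ID is written as '.' (missing) and the genetic distance as '0'; `genotype_alleles`, `to_tped_lines` and `tfam_line` are hypothetical names. The IDs still need to be filled in before merging, e.g. with --set-missing-var-ids as suggested later in the thread.

```python
def genotype_alleles(ref, alt, zygosity):
    """Map the zygosity label to an allele pair (assumes Hom = homozygous for the change)."""
    if zygosity == "Hom":
        return alt, alt
    if zygosity == "Het":
        return ref, alt
    raise ValueError(f"unexpected zygosity label: {zygosity!r}")

def to_tped_lines(lines):
    """Convert one individual's annotated file into TPED lines (a single sample)."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, ref, alt, _vtype, zyg = line.split()[:6]
        a1, a2 = genotype_alleles(ref, alt, zyg)
        # TPED columns: chromosome, variant ID ('.' = missing), cM ('0'), bp position, alleles
        out.append("\t".join([chrom, ".", "0", pos, a1, a2]))
    return out

def tfam_line(sample_id, is_case, sex=0):
    """One TFAM row: FID IID father mother sex phenotype (2 = case, 1 = control)."""
    return " ".join([sample_id, sample_id, "0", "0", str(sex), "2" if is_case else "1"])
```

Each result can then be written to files such as `ind1.tped` and `ind1.tfam` (hypothetical names), one pair per individual.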

Thank you very much. I will try that and let you know the progress.

Correction: you probably want to use --merge-list instead of --bmerge for the merge step.
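Steps 2 and 3, using --merge-list, could be driven from Python roughly as follows. This sketch assumes PLINK 1.9 (where a merge-list line may contain a single binary-fileset prefix; PLINK 1.07 expects explicit .bed/.bim/.fam names), a `plink` executable on the PATH, and hypothetical prefixes such as `ind1`. The function only writes the merge list and builds argument lists; it does not run PLINK.

```python
from pathlib import Path

def write_merge_list(prefixes, path):
    """Write one binary-fileset prefix per line (PLINK 1.9 merge-list style)."""
    Path(path).write_text("".join(p + "\n" for p in prefixes))

def plink_commands(prefixes, merge_list="mergelist.txt", out="merged"):
    """Build PLINK argument lists for steps 2 and 3; nothing is executed here."""
    # Step 2: each individual's TPED/TFAM -> binary fileset (.bed/.bim/.fam)
    cmds = [["plink", "--tfile", p, "--make-bed", "--out", p] for p in prefixes]
    # Step 3: first fileset is the reference, the rest come from the merge list,
    # and --recode writes the merged result as PED/MAP
    write_merge_list(prefixes[1:], merge_list)
    cmds.append(["plink", "--bfile", prefixes[0], "--merge-list", merge_list,
                 "--recode", "--out", out])
    return cmds
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order.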

Dear chrchang,

Sorry for bothering you again. Do you know what criteria '--merge-list' uses to merge the BED files? My individual BED files do not contain the SNPs in the same order (some SNPs are present only in some individuals' files), so I would like PLINK to merge them using the chromosome and position fields. Is that possible? I couldn't find any clue :(

PLINK's merge will automatically sort by chromosome and position. (However, you first need to convert your files to PLINK-readable formats.)

I was able to convert the files to TPED using a bash script. The only problem is that I don't have any values for the 2nd and 3rd fields (rsIDs and genetic distances). Since the merge depends on chromosome and position (the 1st and 4th fields), as you said, I hope I won't run into any trouble when merging the files :)

It's safe to use '0' for all the centimorgan coordinates.

Lack of rsIDs is a bigger problem. You may want to use PLINK 1.9's --set-missing-var-ids flag to address this.
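In PLINK 1.9 the flag takes a template, e.g. `--set-missing-var-ids @:#`, where '@' stands for the chromosome and '#' for the bp position, and it applies to variants whose ID is '.'. If you prefer to fill IDs in your own conversion script instead, so the same site gets the same ID in every individual's file before merging, a rough Python equivalent for TPED lines could be the sketch below (`fill_missing_ids` is a hypothetical helper, not part of PLINK):

```python
def fill_missing_ids(tped_lines, template="{chrom}:{pos}"):
    """Replace a '.' variant ID (TPED column 2) with a chromosome:position ID.

    Roughly mirrors PLINK 1.9's --set-missing-var-ids with the template '@:#'.
    """
    filled = []
    for line in tped_lines:
        fields = line.split("\t")
        if fields[1] == ".":
            # column 1 = chromosome, column 4 = bp position
            fields[1] = template.format(chrom=fields[0], pos=fields[3])
        filled.append("\t".join(fields))
    return filled
```

Note that chromosome:position IDs collide if two individuals carry different alternate alleles at the same position, so such sites may need an allele-aware ID.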