How to convert multiple .vcf files into a single .ped file (PLINK-compatible)?
3.2 years ago
isml2688

Hi everyone,

I am new to bioinformatics and need to analyse WGS data from several case samples. I now have several individual .vcf files and would like to use PLINK for quality control, population stratification and, ultimately, GWAS.

Since PLINK needs .bed, .bim and .fam files for these analyses, I first want to create a single .ped file from my .vcf files.
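For reference, here is a minimal sketch of the two PLINK 1.9 conversions involved. It assumes plink 1.9 is on PATH, and the function names and prefixes are hypothetical placeholders. --make-bed turns a .ped/.map pair (read with --file) into the binary .bed/.bim/.fam set. The same flag can also be applied directly to a VCF, so the intermediate .ped is optional.

```shell
#!/usr/bin/env bash
# Sketch only: assumes plink 1.9 is on PATH; function names are hypothetical.

ped_to_binary() {
  # Reads <prefix>.ped + <prefix>.map, writes <prefix>.bed/.bim/.fam
  local prefix=$1
  plink --file "$prefix" --allow-extra-chr --make-bed --out "$prefix"
}

vcf_to_binary() {
  # Skips the .ped step: reads a VCF and writes <out>.bed/.bim/.fam directly.
  # --double-id keeps each VCF sample ID whole (used as both FID and IID).
  local vcf=$1 out=$2
  plink --vcf "$vcf" --double-id --allow-extra-chr --make-bed --out "$out"
}
```

For example, ped_to_binary merged would read merged.ped and merged.map and write merged.bed, merged.bim and merged.fam.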

So, I tried using a for loop:

for file in /path-to-folder-with-all-vcf-files/*.vcf
do
  plink --vcf ${file} --allow-extra-chr --recode --out /path-where-I-want-to-save-the-new-ped-file/${file}.ped
done

but the job keeps failing with the same error:

Exited with exit code 5.

and the output ends with:

Error: Missing --vcf parameter.
For more information, try "plink --help <flag name>" or "plink --help | more"

I have read the PLINK documentation but cannot find the mistake. As I understand it, I am passing ${file} to --vcf and then applying the remaining options (--allow-extra-chr, --recode and --out) to it, so why does PLINK say the parameter is missing?

Alternatively, is there another way you would recommend to convert multiple .vcf files into a single .ped file?

Thank you in advance for your help!

vcf ped PLINK WGS

What is the output of ls /path-to-folder-with-all-vcf-files/*.vcf*? If your files are actually .vcf.gz files, the *.vcf pattern in your loop will not match them. You might also want to consider plink2, as it is faster and better in most respects.
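To see the effect described above, this sketch prints what a for loop over a pattern actually iterates over. It assumes bash with default glob options, and show_loop_input is a hypothetical helper. When nothing matches, bash passes the pattern itself to the loop body, so plink would receive a path that does not exist.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash with default glob options (no nullglob/failglob).
# show_loop_input is a hypothetical helper, not part of PLINK.
show_loop_input() {
  local dir=$1 pattern=$2
  local f
  # $pattern is left unquoted on purpose so bash expands it as a glob
  for f in "$dir"/$pattern; do
    if [[ -e $f ]]; then
      echo "file: $f"
    else
      # No match: bash hands the unexpanded pattern to the loop body
      echo "literal: $f"
    fi
  done
}
```

Running show_loop_input /path-to-folder-with-all-vcf-files '*.vcf' prints a "literal:" line if the folder holds only .vcf.gz files. In that case, the loop body ran once with the pattern itself as ${file}.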

Thank you for the fast response! This was indeed a problem. I changed the command to:

for file in /path-to-folder-with-all-vcf-files/*.vcf.gz; do
  filebase=$(basename "$file" .vcf.gz)
  # --out takes a prefix; PLINK appends .ped/.map/.log/.nosex itself
  plink --vcf "$file" --allow-extra-chr --recode --out "/path-where-I-want-to-save-the-new-ped-file/$filebase"
done

However, this produced 50 separate sets of .ped, .map, .log and .nosex files. I tried combining the 50 .ped files into one, but ran into problems with my sample names. They contain an underscore that marks the sample type (e.g. X21-001_a or X21-002_b), and this underscore shifts the columns to the right: the first part is read as the family ID (FID = X21-001) and the second part as the sample ID (IID = a).
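PLINK 1.9 can split VCF sample IDs at an underscore into family and within-family IDs, which matches what you describe. Rather than fixing the .ped files by hand, one option is to redo the conversion with --double-id, which sets both FID and IID to the full sample ID. This is a sketch, assuming plink 1.9 is on PATH and the inputs are named <sample>.vcf.gz. convert_each and its directory arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes plink 1.9 on PATH and inputs named <sample>.vcf.gz.
# convert_each is a hypothetical helper name.
set -euo pipefail

convert_each() {
  local in_dir=$1 out_dir=$2
  local f base
  mkdir -p "$out_dir"
  shopt -s nullglob                 # no matches -> loop body never runs
  for f in "$in_dir"/*.vcf.gz; do
    base=$(basename "$f" .vcf.gz)   # X21-001_a.vcf.gz -> X21-001_a
    # --double-id: FID and IID are both the full VCF sample ID, so the
    # underscore is not used to split it; --out is a prefix, not a file name
    plink --vcf "$f" --double-id --allow-extra-chr --recode --out "$out_dir/$base"
  done
  shopt -u nullglob
}
```

convert_each /path-to-folder-with-all-vcf-files /path-where-I-want-to-save-the-new-ped-file would then write X21-001_a.ped, X21-001_a.map and so on. --const-fid, which sets every FID to 0 and keeps the full ID as the IID, is the other PLINK 1.9 option for this.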

So I wanted to ask about plink2: I tried searching on Google but got lost. Could you point me to the kind of command I need to convert my 50 .vcf.gz files into a single .ped file, or give me an example? Thank you in advance.
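One common route to a single file is to merge the single-sample VCFs into one multi-sample VCF with bcftools, then run PLINK once. This is a sketch, assuming bcftools and plink 1.9 are installed and the inputs are bgzip-compressed (bcftools cannot index plain gzip). It also assumes all files share the same reference and chromosome names. merge_and_convert and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bcftools + plink 1.9 on PATH and bgzipped *.vcf.gz
# inputs; merge_and_convert is a hypothetical helper name.
set -euo pipefail

merge_and_convert() {
  local in_dir=$1 out_prefix=$2
  local list="${out_prefix}.vcflist.txt"

  shopt -s nullglob
  local files=("$in_dir"/*.vcf.gz)
  shopt -u nullglob
  if [[ ${#files[@]} -eq 0 ]]; then
    echo "no .vcf.gz files in $in_dir" >&2
    return 1
  fi

  # bcftools merge needs an index next to every input (-f: overwrite old ones)
  local f
  for f in "${files[@]}"; do
    bcftools index -f "$f"
  done
  printf '%s\n' "${files[@]}" > "$list"

  # One multi-sample VCF; a site absent from a sample's file becomes ./.
  bcftools merge --file-list "$list" -Oz -o "${out_prefix}.vcf.gz"

  # --double-id keeps IDs like X21-001_a intact as both FID and IID
  plink --vcf "${out_prefix}.vcf.gz" --double-id --allow-extra-chr \
    --make-bed --out "$out_prefix"
}
```

For example, merge_and_convert /path-to-folder-with-all-vcf-files cohort would write cohort.vcf.gz plus cohort.bed, cohort.bim and cohort.fam. Swap --make-bed for --recode to get cohort.ped and cohort.map instead. Because each input holds one sample, a variant called in one sample but not present in another's file is missing in the merged VCF rather than homozygous reference, which affects missingness-based QC. If you switch to plink2, .ped output is requested with --export ped rather than --recode. Check the plink2 documentation for the VCF import options in your version.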
