Hello everyone,
I called SNPs with GATK, and the results are in VCF format. I want to create a binary PED file set (.bed/.bim/.fam), which is an intermediate input for another tool.
java -jar GenomeAnalysisTK.jar -T VariantsToBinaryPed -R reference.fasta -V variants.vcf -m metadata.fam -bed output.bed -bim output.bim -fam output.fam
I have the reference genome (-R) and the VCF input file (-V). After some exploration, I found that a .fam file containing the metadata also has to be given to GATK as input (-m). I do not have the family ID, paternal ID, or maternal ID. What can I provide here in order to run GATK?
My metadata format is:
family_id individual_id paternal_id maternal_id sex phenotype
F545 1 1
I found this description of the PED format at https://easygwas.ethz.ch/faq/view/15/ :
The PED file has 6 fixed columns at the beginning followed by the SNP information. The columns should be separated by a whitespace or a tab. The first six columns hold the following information:
Family ID (if unknown use the same id as for the sample id in column two)
Sample ID
Paternal ID (if unknown use 0)
Maternal ID (if unknown use 0)
Sex (if unknown use 0)
Not used, set to 0
Rest of the columns: SNPs
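A minimal sketch of how the placeholder rules quoted above could be turned into metadata lines, assuming the only known value is the sample ID. The function name `make_fam_lines` is hypothetical, and the default of 0 for the sixth column simply follows the quoted description (PLINK itself also treats -9 as a missing phenotype). This only illustrates the six-column layout; it is not a confirmed recipe for what GATK accepts.

```python
def make_fam_lines(sample_ids, sex="0", phenotype="0"):
    """Return one six-column metadata line per sample.

    Columns: family_id individual_id paternal_id maternal_id sex phenotype.
    Unknown values use the placeholders from the quoted description.
    """
    lines = []
    for sample_id in sample_ids:
        # Unknown family: reuse the sample ID as the family ID.
        # Unknown parents: use 0 for both paternal and maternal ID.
        fields = [sample_id, sample_id, "0", "0", sex, phenotype]
        lines.append(" ".join(fields))
    return lines


if __name__ == "__main__":
    # "F545" is the ID from the post; it is assumed here to be the
    # sample name as it appears in the VCF header.
    for line in make_fam_lines(["F545"]):
        print(line)
```

The printed lines could be saved to a file such as metadata.fam and passed with -m; whether the individual ID must match the VCF sample name exactly is an assumption worth checking against the GATK documentation.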
Is it correct to put 0 in the paternal ID and maternal ID columns of my metadata file? Following the description above, I am also using the same value for the family ID and the sample ID. Please correct me if I am doing anything wrong. I am new to this area of data analysis.
Thank you in advance
Archana