How can I convert from PLINK format (bed, bim and fam) to the following input genotype formats?
Format 1:
The first value in each line is the individual’s id. The remaining values are the genotypes of the individual at each locus, either 0, 1, or 2 (or 9 if missing).
Example:
id1 1 1 2 0 1 1 1 1 1 0
id2 0 2 1 1 0 1 1 1 2 2
id3 1 2 0 1 2 1 0 1 2 0
id4 2 1 1 1 1 1 1 1 2 1
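Format 1 only needs per-locus allele counts, which PLINK can export as text with `plink --bfile mydata --recode A --out mydata` (the prefix `mydata` is a placeholder). That writes `mydata.raw` with a header line, six leading columns (FID IID PAT MAT SEX PHENOTYPE), and then one 0/1/2/NA column per SNP. A minimal sketch for reshaping those lines, assuming the IID column is the id you want and that the allele PLINK counted is the one your tool expects; `raw_to_format1` is a made-up helper name:

```python
def raw_to_format1(raw_lines, missing="9"):
    """Turn the lines of a PLINK .raw file into 'id g1 g2 ...' strings."""
    rows = iter(raw_lines)
    next(rows, None)  # skip the header line (FID IID PAT MAT SEX PHENOTYPE snp...)
    out = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        iid = fields[1]  # assumption: use IID (second column) as the individual's id
        # Genotype columns start at index 6; PLINK writes NA for missing calls.
        genotypes = [missing if g == "NA" else g for g in fields[6:]]
        out.append(" ".join([iid] + genotypes))
    return out


# Tiny made-up example in the .raw layout, not real data.
sample = [
    "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G rs3_T",
    "fam1 id1 0 0 1 -9 1 NA 2",
    "fam2 id2 0 0 2 -9 0 2 1",
]
for row in raw_to_format1(sample):
    print(row)
```

For a real file, pass the open file handle instead of `sample`, e.g. `raw_to_format1(open("mydata.raw"))`, and write the returned lines to the output file.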
Format 2:
For each individual there are two lines. The first line gives the individual’s id and the read counts for the reference allele. The second line gives the individual’s id and the read counts for the alternative allele.
Example:
id1 0 0 0 2 1 1 2 0 2 1
id1 3 1 2 1 2 2 1 4 4 3
id2 1 0 1 1 4 2 1 2 3 1
id2 0 1 2 1 1 1 3 2 3 2
id3 0 2 1 3 2 1 3 1 2 2
id3 2 3 3 1 2 2 0 2 1 2
id4 1 1 4 0 0 1 1 2 1 1
id4 1 3 2 3 2 1 2 2 1 3
The thing is, PLINK .bed files don't really include an entry for read counts, which makes me suspect you have a second file (perhaps a .fasta, a .bam, or a .vcf file?). What other inputs would you be drawing from here?
Sure! I should have been more precise. I have a full set of PLINK format files (bed, bim and fam) that I can use. I will edit the question. Thanks for pointing this out.
So I guess we are looking at some weird version of the EIGENSTRAT format (https://reich.hms.harvard.edu/software/InputFileFormats). Converting from .ped to EIGENSTRAT and then adjusting the file might be the way to go. Can anyone confirm this?
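If the EIGENSTRAT route is taken, the "adjusting" step for Format 1 would mostly be a transpose: an EIGENSTRAT .geno file has one line per SNP with one character (0, 1, 2 or 9) per individual, while Format 1 has one line per individual. A rough sketch, assuming the .ind file lists the sample id in its first column in the same order as the .geno columns; `eigenstrat_to_format1` is a hypothetical helper, not part of EIGENSOFT:

```python
def eigenstrat_to_format1(geno_lines, ind_lines):
    """Transpose EIGENSTRAT .geno rows (SNPs) into one 'id g1 g2 ...' line per sample."""
    # Assumption: first whitespace-separated field of each .ind line is the sample id.
    ids = [line.split()[0] for line in ind_lines if line.strip()]
    snps = [line.strip() for line in geno_lines if line.strip()]
    for row in snps:
        if len(row) != len(ids):
            raise ValueError("a .geno line does not match the number of individuals")
    # Character i of every SNP line belongs to individual i.
    return [
        " ".join([sample_id] + [row[i] for row in snps])
        for i, sample_id in enumerate(ids)
    ]


# Made-up toy input: 3 SNPs (rows) x 3 individuals (columns).
geno = ["120", "091", "212"]
ind = ["id1 M pop1", "id2 F pop1", "id3 U pop2"]
for row in eigenstrat_to_format1(geno, ind):
    print(row)
```

One caveat: EIGENSTRAT counts copies of the reference allele, so depending on which allele the downstream tool expects, the values may need flipping (2 minus the count, leaving 9 untouched). Format 2 cannot be produced this way, since neither format stores read counts.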