I'm trying to use admixture but now I have a problem. It tells me that I need to use integers for chromosomes, but does not tell me how I might do that. The same with the admixture guide; it only tells me I need to use numbers, but doesn't tell me how I can change the files to do that.
I have the plink.bed, plink.bim, and plink.fam files produced by the plink program, which were made off the zipped VCF file I received from PGP. They do seem to use a "chr#" naming system, where # is replaced with the autosomal number, or X, Y, or M (for mitochondrial DNA), but I'm not sure how to change that.
What would be the best way of resolving this issue?
My first guess would be to change chr1 to 1, chrX to 23, chrY to 24, and chrM to 25.
I agree, but where? I tried opening the unzipped VCF file in WordPad, but it was a binary file, so no go there. Even with a hex editor, I kinda don't wanna mess with a binary file unless I'm really sure of its format.
Other than that, I don't know where I can change the chromosomes' designations.
Do you know?
You can change that in the PLINK .bim file.
And chrM becomes 25, right?
This thread seems relevant: VCF files: Change Chromosome Notation
I think the first awk solution wouldn't fix the chrX issue.
Sam seems to have the right idea:
It seems chrX is 23 and chrY is 24. The only thing I'm not sure about is chrM (MT-DNA). Would that be 25?
Right, it wouldn't. It needs some more Unix magic, but changing the chromosome identifiers while converting to PLINK format would be the most convenient, flexible, and error-proof approach.
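For anyone who would rather script the renaming than do a manual search-and-replace, here is a minimal Python sketch that rewrites only the first column of a .bim file. The function names and the mapping table are made up for illustration. On the chrM question: as far as I know, PLINK's own human coding is X = 23, Y = 24, XY (pseudo-autosomal) = 25 and MT = 26, so the default below sends chrM to 26 rather than 25. I'm assuming ADMIXTURE is happy with any integer code, so pass a different mapping if your setup expects otherwise.

```python
# Assumed default: PLINK's human chromosome codes (X=23, Y=24, XY=25, MT=26).
PLINK_CODES = {"X": "23", "Y": "24", "XY": "25", "M": "26", "MT": "26"}


def to_plink_code(chrom, codes=PLINK_CODES):
    """Turn names such as 'chr1', 'chrX' or 'MT' into integer-style codes."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    # Autosomes ('1'..'22') are not in the table and pass through unchanged.
    return codes.get(name.upper(), name)


def remap_bim_lines(lines, codes=PLINK_CODES):
    """Rewrite the first column of each .bim line.

    The number and order of lines is preserved, which matters because the
    .bed file is matched to the .bim file by position.
    """
    out = []
    for line in lines:
        parts = line.split(None, 1)  # chromosome column, then the rest
        if len(parts) < 2:
            out.append(line)  # leave blank or malformed lines untouched
            continue
        chrom, rest = parts
        out.append(to_plink_code(chrom, codes) + "\t" + rest)
    return out


def remap_bim_file(src, dst, codes=PLINK_CODES):
    """Read a .bim file from src and write the remapped copy to dst."""
    with open(src) as fh:
        lines = fh.readlines()
    with open(dst, "w") as fh:
        fh.writelines(remap_bim_lines(lines, codes))
```

Something like `remap_bim_file("plink.bim.bak", "plink.bim")` would read from a backup copy and write the remapped file, so the original stays available if anything goes wrong.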
I see lots of commands in PLINK, but their explanation is quite a thing to wade through. I ended up opening the BIM file in an IDE and using the Replace function.
But now I have a new problem. (Won't I always?) It now ends in a message that says:
"Error: detected that all genotypes are missing for a SNP locus. Please apply quality-control filters to remove such loci."
No idea what to do about this.
If I can solve it by re-running PLINK, what commands should I use?
Try the following code:
That should help you modify the .bim file. As for your error, it is likely that your replace changed the number of lines in the .bim file, which would lead to the problem.
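One way to test the line-count theory is to check that the three files still agree with each other. The sketch below is only an illustration with made-up function names. It assumes the default variant-major PLINK 1 .bed layout: 3 header bytes followed by one block of ceil(n_samples / 4) bytes per variant, at 2 bits per genotype.

```python
import math
import os


def count_records(lines):
    """Count non-empty lines, i.e. variants in a .bim or samples in a .fam."""
    return sum(1 for line in lines if line.strip())


def expected_bed_size(n_variants, n_samples):
    """Size in bytes of a variant-major .bed file (assumed layout)."""
    return 3 + n_variants * math.ceil(n_samples / 4)


def check_fileset(prefix):
    """Return (n_variants, n_samples, sizes_match) for prefix.bed/.bim/.fam."""
    with open(prefix + ".bim") as fh:
        n_variants = count_records(fh)
    with open(prefix + ".fam") as fh:
        n_samples = count_records(fh)
    actual = os.path.getsize(prefix + ".bed")
    return n_variants, n_samples, actual == expected_bed_size(n_variants, n_samples)
```

If `check_fileset("plink")` reports a mismatch, the .bim file no longer has the number of lines the .bed file was written for, and restoring the backup before editing again is the safer route.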
Thanks. I had made a backup of plink.bim before modifying it, so I was able to restore it. I used your sed function to move the backup to plink.bim while doing the modification listed above. It succeeded.
Unfortunately, I am still having this error: "Error: detected that all genotypes are missing for a SNP locus. Please apply quality-control filters to remove such loci."
Does PLINK produce empty lines by default, and is there an option to turn it off?
Strange. PLINK does not produce empty lines. Maybe you should try adding --geno 0.1 to your plink command to remove SNPs with high missingness?
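To see which loci such a filter would drop, the per-variant missing rate can be read straight from the .bed file. This is a rough sketch with hypothetical function names, not something PLINK ships. It assumes the variant-major PLINK 1 format, where each genotype takes 2 bits starting from the low-order bits of each byte and the code 01 means missing. My understanding is that --geno 0.1 removes variants whose missing rate exceeds 0.1, so the second function flags rates strictly above the threshold.

```python
import math

MAGIC = b"\x6c\x1b\x01"  # header of a variant-major PLINK 1 .bed file
MISSING = 0b01           # 2-bit genotype code assumed to mean "missing"


def missing_rates(bed_bytes, n_samples):
    """Return the fraction of missing genotypes for every variant."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if bed_bytes[:3] != MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed file")
    block = math.ceil(n_samples / 4)  # bytes used per variant
    body = bed_bytes[3:]
    if len(body) % block:
        raise ValueError(".bed size does not match the sample count")
    rates = []
    for start in range(0, len(body), block):
        chunk = body[start:start + block]
        missing = 0
        for i in range(n_samples):  # padding bits past n_samples are ignored
            code = (chunk[i // 4] >> (2 * (i % 4))) & 0b11
            if code == MISSING:
                missing += 1
        rates.append(missing / n_samples)
    return rates


def failing_geno(rates, threshold=0.1):
    """Indices of variants whose missing rate exceeds the threshold."""
    return [i for i, rate in enumerate(rates) if rate > threshold]
```

The indices line up with line numbers in the .bim file (counting from 0). Note that if the fileset holds a single sample, every locus where that sample has no call has a missing rate of 1.0, which is exactly the all-missing case the error message describes.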