Hi there,
This is my first post on BioStars. I am just starting out as a bioinformatician, and I have a problem that should be easy to solve but that I cannot work out, so I would be glad of any help.
The question is simple. I have a VCF file with genotype data for many samples. Each row is a SNP, and the columns are the standard VCF fields (#CHROM,..., INFO) followed by one column per sample ID. I would like to use these genotypes, with HapMap as a reference, to identify and remove the non-European samples. I was told I need to run a PCA. I have tried several tools for this (Shellfish, Beagle, SNPRelate) but could not solve the problem. With SNPRelate I could run the PCA, but it only clusters my unlabeled samples, and I need to relate them to the HapMap populations (CEU, YRI, JPT, CHB). Shellfish, on the other hand, fails with an uninformative error while running:
Exception: command gtool -P --ped shellfish-temp-15479/146134504516.ped --map shellfish-temp-15479/146134504516.map --og shellfish-temp-15479/146134504516.gen --os shellfish-temp-15479/146134504516.sample --discrete_phenotype 0 >> shellfish.log exited with code 256 (1)
And in file shellfish.log:
...
Note: No phenotypes present.
--recode to plink.ped + plink.map ... done.
Unknown parameter: 0
What steps and tools would you recommend? I am happy to use any tool you think is suitable.
Sorry if this turns out to be an easy problem, but I have spent two days trying several tools without success.
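To make clearer what I am after, here is a minimal sketch of the kind of analysis I have in mind. It assumes the study and HapMap genotypes have already been merged into one sample-by-SNP matrix of 0/1/2 allele counts with no missing calls. The function names (pca_scores, assign_population, simulate_toy_data), the made-up toy data and the nearest-centroid rule are all my own assumptions for illustration; none of this comes from Shellfish or SNPRelate, and I do not know whether it is the recommended approach:

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project samples (rows) onto the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, no missing values.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # centre each SNP column
    # SVD of the centred matrix; u * s gives the PC scores of each sample
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def assign_population(scores, ref_labels):
    """Label every sample with the population of the nearest reference centroid.

    ref_labels: one entry per sample; a population name for reference
    (HapMap) samples and None for unlabeled study samples.
    """
    pops = sorted({p for p in ref_labels if p is not None})
    centroids = []
    for p in pops:
        mask = np.array([lab == p for lab in ref_labels])
        centroids.append(scores[mask].mean(axis=0))
    centroids = np.array(centroids)
    # Euclidean distance from each sample to each population centroid
    dists = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return [pops[i] for i in dists.argmin(axis=1)]


def simulate_toy_data(seed=0, n_snps=500, n_ref=20):
    """Made-up genotypes: three labeled reference groups plus 8 unlabeled samples."""
    rng = np.random.default_rng(seed)
    pops = ["CEU", "YRI", "JPT"]
    # Hypothetical, independently drawn allele frequencies per population
    freqs = {p: rng.uniform(0.05, 0.95, n_snps) for p in pops}
    rows, labels, truth = [], [], []
    for p in pops:  # labeled reference samples
        for _ in range(n_ref):
            rows.append(rng.binomial(2, freqs[p]))
            labels.append(p)
            truth.append(p)
    for p, n in [("CEU", 5), ("YRI", 3)]:  # unlabeled study samples
        for _ in range(n):
            rows.append(rng.binomial(2, freqs[p]))
            labels.append(None)
            truth.append(p)
    return np.array(rows), labels, truth


genotypes, ref_labels, truth = simulate_toy_data()
scores = pca_scores(genotypes)
predicted = assign_population(scores, ref_labels)

# Indices of unlabeled study samples that fall closest to the CEU centroid
keep = [i for i, (lab, pop) in enumerate(zip(ref_labels, predicted))
        if lab is None and pop == "CEU"]
print(keep)
```

The study samples that do not end up in keep would be the ones I want to drop. With real data I assume the populations would separate far less cleanly than in this toy example, so a plain nearest-centroid rule may well be too crude.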
Many thanks.