3.5 years ago
Mollie
Hi, I am wondering if someone can provide me with a data set to test across PLINK and EpiGen, and possibly some others. I need an x.gen file and an x.csv file. I'm comparing the methods of analysis and don't actually care about the data, so the smaller the better. Thanks
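For a throwaway test set, one option is to simulate a few SNPs yourself. Below is a minimal Python sketch, assuming the Oxford GEN layout: five leading columns (an ID column, rsID, position, allele A, allele B), followed by one AA/AB/BB probability triplet per sample. Putting "1" in the first column is an assumed chromosome code; the rsIDs, positions and alleles are made up; and the genotypes are random hard calls. This is only useful for checking that files load and that the programs agree on format, not for any biological result.

```python
import random


def make_toy_gen(n_snps=5, n_samples=4, seed=1):
    """Build a tiny synthetic GEN file as a list of lines.

    Each line: ID column, rsID, position, allele A, allele B, then one
    (AA, AB, BB) probability triplet per sample. Genotypes are random
    hard calls, so every triplet is 1 0 0, 0 1 0 or 0 0 1.
    """
    rng = random.Random(seed)  # fixed seed -> identical toy file on every run
    lines = []
    for i in range(n_snps):
        # "1" in the first column is an assumed chromosome code; check what
        # each program expects in that slot. rsIDs/positions are made up.
        fields = ["1", f"rs{1000 + i}", str(10000 + 100 * i), "A", "G"]
        for _ in range(n_samples):
            triplet = ["0", "0", "0"]
            triplet[rng.randrange(3)] = "1"  # index 0 = AA, 1 = AB, 2 = BB
            fields.extend(triplet)
        lines.append(" ".join(fields))
    return lines


def write_lines(path, lines):
    """Write the lines to a text file, one per line."""
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```

For example, `write_lines("x.gen", make_toy_gen(n_snps=10, n_samples=20))`. The matching phenotype file would need one entry per simulated sample, in the same order.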
Hello and good evening Mollie. Are you trying to follow a tutorial on the World Wide Web? There are test datasets here: https://www.cog-genomics.org/plink/1.9/resources
My problem with that data is that I also want to use it in SNP_TEST and GenEpi. SNP_TEST requires a .gen file and a .sample file of phenotypes, and GenEpi requires a .gen file and a .csv file of phenotypes. I have figured out how to convert a .bed file to a .gen file, but I'm not sure how to handle the phenotype part using the example sets provided for PLINK 1.9. Thank you
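On the phenotype part: in a PLINK binary fileset, the per-sample information (family ID, individual ID, father, mother, sex, phenotype) lives in the .fam file, while the .bim file describes variants. Below is a minimal Python sketch for building an Oxford-style .sample file for SNPTEST from .fam lines. It assumes the usual two header rows (column names, then column types: 0 for the ID_1, ID_2 and missing columns, D for a discrete covariate, B for a binary phenotype), PLINK's case/control coding (1 = control, 2 = case, anything else missing), and NA for missing values. The column names sex and pheno are my choice, not required names.

```python
def plink_status_to_binary(status):
    """Map PLINK case/control coding (1/2) to 0/1; 0, -9 or anything else -> NA."""
    return {"1": "0", "2": "1"}.get(status, "NA")


def fam_to_sample(fam_lines):
    """Convert PLINK .fam lines into the lines of an Oxford-style .sample file."""
    out = ["ID_1 ID_2 missing sex pheno",  # header row 1: column names
           "0 0 0 D B"]                    # header row 2: column types
    for line in fam_lines:
        fid, iid, _father, _mother, sex, status = line.split()[:6]
        # .fam sex coding: 1 = male, 2 = female, 0 = unknown
        sex = sex if sex in ("1", "2") else "NA"
        # "0" is a placeholder for the per-sample missingness column.
        out.append(f"{fid} {iid} 0 {sex} {plink_status_to_binary(status)}")
    return out
```

Usage, with a hypothetical path: `with open("hapmap1.fam") as fh: sample_lines = fam_to_sample(fh)`. Row order must match the sample order in the .gen file, which presumably holds if both were exported from the same fileset, but it is worth checking. If the .bed to .gen conversion was done with PLINK's `--recode oxford`, it may already have written a .sample file next to the .gen.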
Can you please link us to the documentation page(s) for these programs where the phenotype file's format is described.
I have a hapmap1.gen file, and I think the .bim file could be used as the phenotype file because of its column arrangement? Here are the GenEpi docs: https://genepi.readthedocs.io/en/latest/format.html#input-phenotype-data

Trying to run the SNP_TEST sample data gives me this error:

!! Error: mismatch in column names or types between the sample files "./example/cohort1.sample", "./example/cohort2.sample".

This doesn't make sense, since the example data should be ready to go?
And SNP_TEST: https://www.well.ox.ac.uk/~gav/snptest/#input_file_formats
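Without knowing what is in those example files, one way to see exactly what SNPTEST is objecting to is to compare the two header rows (column names, then column types) of the two .sample files field by field. A minimal Python sketch; it only reports differing header fields and does not diagnose anything else.

```python
def read_sample_header(path):
    """Return the first two lines (names, types) of a .sample file, split into fields."""
    with open(path) as fh:
        return [fh.readline().split(), fh.readline().split()]


def header_mismatches(header_a, header_b):
    """List the differences between two .sample headers given as [names, types]."""
    problems = []
    for row, label in ((0, "name"), (1, "type")):
        a, b = header_a[row], header_b[row]
        if len(a) != len(b):
            problems.append(f"{label} row length differs: {len(a)} vs {len(b)}")
        # Compare the overlapping fields position by position.
        for i, (x, y) in enumerate(zip(a, b)):
            if x != y:
                problems.append(f"column {i}: {label} {x!r} vs {y!r}")
    return problems
```

Usage, with the paths from the error message: `print(header_mismatches(read_sample_header("./example/cohort1.sample"), read_sample_header("./example/cohort2.sample")))`. An empty list means the two headers agree field for field once whitespace (including Windows line endings) is split away.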
Hey again, thanks! GenEpi states:
So, I guess that this means a file like this:
Here, the last column is the outcome, encoded 0 and 1, while the other columns are other phenotypes that you may have. I guess that each row, then, is a sample, and that the order should correspond to the order in your GEN file. I am not sure what is happening with the other program, SNP_TEST...
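Under that reading (comma-separated, a header row, one row per sample in GEN order, 0/1 outcome in the last column), here is a minimal Python sketch that derives such a file from a PLINK .fam file. The header row and the column names sex and outcome are assumptions, not taken from the GenEpi docs; check them against the GenEpi format page linked above before relying on this.

```python
import csv
import io


def fam_to_genepi_csv(fam_lines):
    """Build a phenotype CSV with one row per sample and the outcome in the last column.

    Rows keep .fam order, which is assumed to match the sample order in the GEN file.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sex", "outcome"])  # hypothetical header names
    for line in fam_lines:
        fields = line.split()
        iid, sex, status = fields[1], fields[4], fields[5]
        if status not in ("1", "2"):
            # Dropping the row would silently shift every later sample out of
            # line with the GEN file, so stop instead.
            raise ValueError(f"sample {iid}: no case/control status ({status})")
        writer.writerow([sex, int(status) - 1])  # PLINK 1/2 -> 0/1
    return buf.getvalue()
```

Raising on a missing status, rather than skipping the sample, is deliberate: the CSV rows are matched to GEN samples by position only.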
I am acutely aware that using these programs is very frustrating. Apart from the fact that they all want different input formats, I have also come across situations where the documentation is incorrect and test data does not load as advertised.