9.5 years ago
Paula Sanchez
Dear all,
I am pretty new to genomics and I just received a genotype file. The R commands I was using were too slow, so I decided to use the snpStats package, but I am unable to read my file into it.
My file is a data frame with 10,000 rows (animals) and 600,000 columns (SNPs) coded as 0, 1 and 2. I found several functions for converting data to a snpStats object, but none of them apply to my case, e.g. read.snps.long expects one call per row, etc.
Any help for me to get started?
Thanks in advance.
What is the objective of your analysis?
I want to create the genomic relationship matrix, PCA and genomic predictions. Thanks.
If you want to use snpStats you should format the data as a pedigree file or as a PLINK file, which is a kind of standard for genomic analysis. To transform the file you will need a bit of scripting (in any language: R, bash, Python...). However, to the best of my knowledge snpStats only deals with diallelic data.
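To give an idea of what that scripting could look like, here is a minimal Python sketch that writes a 0/1/2 matrix out as text PLINK .ped/.map files. Several things in it are assumptions rather than anything from your data: the function name write_ped_map is made up for illustration, the alleles are written as placeholder letters A and B because a 0/1/2 coding does not record the real alleles, chromosome and position are set to 0, and each animal ID is reused as the family ID. Missing genotypes (NaN or any other value) are written as "0 0".

```python
import numpy as np

# Placeholder allele letters (an assumption): 0/1/2 counts do not say
# which nucleotides the two alleles really are.
ALLELE_PAIRS = {0: "A A", 1: "A B", 2: "B B"}
MISSING = "0 0"  # PLINK's default missing-genotype code


def write_ped_map(genotypes, animal_ids, snp_ids, prefix):
    """Write a 0/1/2 genotype matrix (animals x SNPs) as prefix.ped / prefix.map."""
    genotypes = np.asarray(genotypes)
    n_animals, n_snps = genotypes.shape
    if len(animal_ids) != n_animals or len(snp_ids) != n_snps:
        raise ValueError("ID lists do not match the matrix dimensions")

    # .map: chromosome, SNP id, genetic distance, base-pair position.
    # Zeros are placeholders; replace them with the real map if you have it.
    with open(prefix + ".map", "w") as map_file:
        for snp in snp_ids:
            map_file.write(f"0\t{snp}\t0\t0\n")

    # .ped: family id, individual id, sire, dam, sex, phenotype, then two
    # alleles per SNP. Unknown sire/dam/sex are 0 and phenotype is -9.
    with open(prefix + ".ped", "w") as ped_file:
        for animal, row in zip(animal_ids, genotypes):
            calls = [ALLELE_PAIRS.get(g, MISSING) for g in row.tolist()]
            ped_file.write(f"{animal} {animal} 0 0 0 -9 " + " ".join(calls) + "\n")
```

You would call it as write_ped_map(matrix, animal_ids, snp_ids, "mydata"), and PLINK should then be able to convert the pair to its binary format. With 600,000 SNPs the text .ped file will be large, so it may be worth converting to binary straight away.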
Other software can generate a GRM (e.g., GCTA, PLINK, LDAK), and some can also run the PCA (e.g., PLINK). However, I think that all of them require diallelic data (but it is worth checking).
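If you just want to see what the GRM and PCA steps amount to on a 0/1/2 matrix, here is a rough NumPy sketch. It uses one common GRM definition (VanRaden's first method, G = ZZ' / (2 Σ p(1-p)), with Z the genotypes centred by twice the allele frequency), which is my choice for illustration and not necessarily what GCTA, PLINK or LDAK compute by default. The function names are made up, and the sketch assumes there are no missing genotypes and at least one polymorphic SNP.

```python
import numpy as np


def vanraden_grm(genotypes):
    """GRM from an animals x SNPs matrix of 0/1/2 allele counts (no missing values)."""
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0            # frequency of the counted allele per SNP
    keep = (p > 0.0) & (p < 1.0)        # drop SNPs where only one allele is present
    z = m[:, keep] - 2.0 * p[keep]      # centre each SNP column
    denom = 2.0 * np.sum(p[keep] * (1.0 - p[keep]))
    return z @ z.T / denom


def pca_from_grm(grm, n_components=2):
    """Principal component scores of the animals from the GRM's eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(grm)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    top_vals = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    scores = eigvecs[:, order] * np.sqrt(top_vals)
    return scores, top_vals
```

Keep in mind that 10,000 x 600,000 values stored as 64-bit floats is about 48 GB, so in practice you would probably accumulate ZZ' over blocks of SNP columns or leave this step to one of the dedicated tools above.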