8.1 years ago
forever
Hi everyone, I have SNP data in the format below and I need to perform a SNP association analysis. I do not know how to use an R package or PLINK with data in this format.
SNP Name          Sample ID   Allele1 - Top   Allele2 - Top   GC Score   SNP
chr1:109457160    2           C               C               0.8609     [T/G]
chr1:109457233    2           C               C               0.7725     [T/G]
I have little experience converting data files, so I would appreciate any help.
I assume this is array data (although it would be great if you could give more detail in your question). First of all, you should know which genome assembly (build) the positions refer to. Then, for every position, find the major and minor allele and encode the genotypes as 0, 1 or 2: 0 for homozygous major allele, 1 for heterozygous, 2 for homozygous minor allele. Finally, you will need to combine the different samples into one larger file for the PLINK association analysis.
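A minimal Python sketch of the 0/1/2 additive coding described above, for one SNP across all samples. It assumes biallelic SNPs, treats Illumina's `-` as a missing call (returned as `None`), and takes the major allele to be the most frequent allele among the samples given; on a tie, the allele seen first wins. The function name `encode_additive` is hypothetical. PLINK works out the minor allele from the data itself (its `--recode A` output is this kind of count), so this is mainly to make the encoding concrete.

```python
from collections import Counter


def encode_additive(genotypes):
    """Encode one SNP's genotypes as minor-allele counts (0, 1 or 2).

    `genotypes` is a list of (allele1, allele2) tuples, one per sample.
    Assumes a biallelic SNP; calls containing "-" are returned as None.
    """
    # Count every called allele across all samples at this SNP.
    counts = Counter(a for pair in genotypes for a in pair if a != "-")
    if not counts:
        # No called genotypes at all: everything is missing.
        return [None] * len(genotypes)
    # Major allele = most frequent; on ties, the allele seen first wins.
    major = counts.most_common(1)[0][0]
    encoded = []
    for a1, a2 in genotypes:
        if a1 == "-" or a2 == "-":
            encoded.append(None)
        else:
            # Number of copies of the non-major (minor) allele: 0, 1 or 2.
            encoded.append(int(a1 != major) + int(a2 != major))
    return encoded
```

For example, four samples called CC, CT, TT and CC make C the major allele, giving `[0, 1, 2, 0]`.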
Thank you for your reply. Actually, it is a SNP association study; the data were extracted from Ilimuna. PLINK can read this data as a long-format file, but I need to create the map and fam files. So do I need lgen, fam and map files to use PLINK? The map file should contain the chromosome and position of every SNP.
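A minimal Python sketch of building the three files that PLINK 1.x reads with `--lfile` (.lgen, .map and .fam) from a tab-delimited Illumina-style final report with the columns shown in the question. Assumptions, all of which may need changing for real data: every SNP Name has the form `chrN:position` (as in the example rows), so chromosome and base-pair position are parsed from it and are on whatever build the array manifest used; the Sample ID is used as both family and individual ID; the .map genetic distance is set to 0; and the .fam file gets sex `0` (unknown) and phenotype `-9` (missing), so the real case/control status must be filled in before any association test. The function names `convert` and `write_plink_long` are hypothetical.

```python
import csv


def convert(lines):
    """Turn Illumina report lines into PLINK long-format rows.

    Returns (lgen_rows, map_rows, fam_rows) as lists of string tuples.
    Assumes SNP names look like "chr1:109457160".
    """
    lines = iter(lines)
    # Skip any [Header] block: advance to the column-name line.
    for line in lines:
        if line.startswith("SNP Name"):
            header = line.rstrip("\r\n").split("\t")
            break
    else:
        raise ValueError("no 'SNP Name' header line found")

    reader = csv.DictReader(lines, fieldnames=header, delimiter="\t")
    lgen, snps, samples = [], {}, {}
    for row in reader:
        snp, sample = row["SNP Name"], row["Sample ID"]
        # Illumina writes "-" for a missing allele; PLINK expects "0".
        a1 = "0" if row["Allele1 - Top"] == "-" else row["Allele1 - Top"]
        a2 = "0" if row["Allele2 - Top"] == "-" else row["Allele2 - Top"]
        # .lgen: FID IID SNP allele1 allele2 (FID = IID = Sample ID here).
        lgen.append((sample, sample, snp, a1, a2))
        if snp not in snps:
            chrom, pos = snp.split(":")
            chrom = chrom[3:] if chrom.startswith("chr") else chrom
            # .map: chromosome, SNP ID, genetic distance (0), bp position.
            snps[snp] = (chrom, snp, "0", pos)
        # .fam: FID IID father mother sex(0=unknown) phenotype(-9=missing).
        samples.setdefault(sample, (sample, sample, "0", "0", "0", "-9"))
    return lgen, list(snps.values()), list(samples.values())


def write_plink_long(lines, prefix):
    """Write prefix.lgen, prefix.map and prefix.fam as tab-separated files."""
    lgen, map_rows, fam = convert(lines)
    for ext, rows in ((".lgen", lgen), (".map", map_rows), (".fam", fam)):
        with open(prefix + ext, "w") as out:
            for row in rows:
                out.write("\t".join(row) + "\n")
```

For instance, `write_plink_long(open("FinalReport.txt"), "mydata")` (the file name is hypothetical) would produce mydata.lgen, mydata.map and mydata.fam, which PLINK can then load with something like `plink --lfile mydata --make-bed --out mydata` once the phenotype column in the .fam file has been filled in.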
Please use ADD REPLY to answer earlier comments. "Data is extracted from Ilimuna" is meaningless, given that Illumina (which is most likely what you mean) is a company with several technology platforms, including sequencing and arrays.