I'm trying to do a cis-eQTL analysis, so I have a linear regression where gene expression is the dependent variable and the SNPs are the independent variables. I have expression for 20,000 genes and genotypes for 700,000 SNPs, each measured in 1,000 patients. I need to reduce the number of SNPs in the model, so for each gene I will keep only the SNPs within 1,000 bp upstream and 1,000 bp downstream of its TSS (the cis window). Then I will combine the upstream SNPs into one variable (I'm not sure how yet) and the downstream SNPs into another, and add those two combined variables to the model.
Does this make sense?
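To make the window selection concrete, here is a minimal sketch in plain Python. The `Snp` record, the `cis_snps` function and the toy coordinates are all made up for illustration; it assumes SNP positions and the TSS use the same coordinate system, and it also requires the SNP to be on the gene's chromosome.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    # Hypothetical record layout, chosen only for this example.
    snp_id: str
    chrom: str
    pos: int


def cis_snps(snps, gene_chrom, tss, window=1000):
    """Return the SNPs on the gene's chromosome within `window` bp of the TSS."""
    return [
        s for s in snps
        if s.chrom == gene_chrom and abs(s.pos - tss) <= window
    ]


# Toy data: one gene with its TSS at position 10,000 on chromosome 1.
snps = [
    Snp("rs1", "1", 9500),    # 500 bp from the TSS: kept
    Snp("rs2", "1", 10400),   # 400 bp from the TSS: kept
    Snp("rs3", "1", 12000),   # 2,000 bp away: outside the window
    Snp("rs4", "2", 10000),   # same position, different chromosome
]
selected = cis_snps(snps, gene_chrom="1", tss=10000)
print([s.snp_id for s in selected])  # ['rs1', 'rs2']
```

With 700,000 SNPs and 20,000 genes, a scan like this per gene is slow; sorting the SNPs by position once and using binary search would be the obvious next step.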
I think it would make more sense to use a published method such as FastQTL.
Although it might be very interesting to reinvent the wheel, often that's not necessary.
Thanks for your answer, but my advisor wants me to write my own code, so I'm trying to think about how to do that.
Then it makes sense to look up how published methods do their job, and try to replicate that.
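As a starting point, tools such as FastQTL test each SNP in the cis window against the gene separately, rather than merging SNPs into a single variable, and then deal with multiple testing (FastQTL uses permutations for that). The sketch below is only an illustration of that general idea, not FastQTL's actual algorithm: it fits a simple regression of expression on one SNP's genotype (coded 0/1/2) and gets an empirical p-value by shuffling the expression values. The function names and the toy data are hypothetical, and covariates are ignored.

```python
import random


def slope_and_r(x, y):
    """Least-squares slope and Pearson correlation of y on a single predictor x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # No variation (e.g. a monomorphic SNP): no association can be estimated.
        return 0.0, 0.0
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Empirical two-sided p-value for the SNP-expression correlation."""
    rng = random.Random(seed)
    observed = abs(slope_and_r(x, y)[1])
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any genotype-expression link
        if abs(slope_and_r(x, y_perm)[1]) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy example: 10 patients, one SNP, one gene (made-up numbers).
genotype = [0, 1, 2, 0, 1, 2, 0, 1, 2, 1]
expression = [1.1, 2.9, 5.2, 0.8, 3.1, 4.9, 1.0, 3.2, 5.1, 2.8]
slope, r = slope_and_r(genotype, expression)
p = permutation_pvalue(genotype, expression)
print(f"slope={slope:.2f}, r={r:.2f}, permutation p={p:.4f}")
```

You would run this for every SNP in each gene's window and then correct for the number of tests. For real data an analytic p-value from the t-distribution (for example via SciPy) is much faster than permuting every pair.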