For example, the RNA-seq expression values of gene1 in 10 people are GENE1 = [0, 0, 1, 2, 3, 4, 3, 4, 2, 7]. SNP1 has alleles A and G, and its genotypes in the same 10 people are SNP1 = [0, 1, 2, 1, 1, 2, 2, 2, 0, 0], where 0 means GG, 1 means AG, and 2 means AA.
What I want to do is an eQTL analysis. Simply put, I want to fit a linear model to find out whether the expression of GENE1 is regulated by SNP1. Should I remove the zero values from the GENE1 expression values before fitting the regression model? Note that for many genes, removing the zeros would also remove most of the samples.
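To make the question concrete, here is a minimal sketch that uses only the Python standard library and the toy values above. It fits a simple ordinary least squares slope of expression on genotype dosage (the 0/1/2 encoding counts A alleles) twice: once on all 10 samples and once after dropping the zero-expression samples. It is an illustration under these assumptions, not a full eQTL pipeline. There are no p-values, normalization, or covariates, and `ols_slope` and `fit` are hypothetical helper names.

```python
# Toy data from the question: expression of GENE1 and SNP1 genotype
# dosage (0 = GG, 1 = AG, 2 = AA, i.e. number of A alleles) in 10 people.
GENE1 = [0, 0, 1, 2, 3, 4, 3, 4, 2, 7]
SNP1 = [0, 1, 2, 1, 1, 2, 2, 2, 0, 0]


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Can happen if filtering leaves only one genotype class.
        raise ValueError("genotype has no variation; slope is undefined")
    return sxy / sxx


def fit(x, y, drop_zero_expression=False):
    """Return (slope, sample size), optionally dropping zero-expression samples."""
    if drop_zero_expression:
        pairs = [(xi, yi) for xi, yi in zip(x, y) if yi != 0]
        x = [p[0] for p in pairs]
        y = [p[1] for p in pairs]
    return ols_slope(x, y), len(x)


slope_all, n_all = fit(SNP1, GENE1)
slope_nz, n_nz = fit(SNP1, GENE1, drop_zero_expression=True)
print(f"all samples:   n={n_all}, slope={slope_all:.3f}")
print(f"zeros removed: n={n_nz}, slope={slope_nz:.3f}")
```

With these toy values, the slope is about 0.058 on all 10 samples. After the two zeros are dropped, it is about -0.636 on 8 samples. The sign flips, which shows how strongly the filtering choice can change the fit. It does not by itself show which choice is correct.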