Hello everyone, I am trying to compute the correlation between depth of coverage for individuals with homozygous genotypes and individuals with heterozygous genotypes.
Here are the first few lines of my datasets (note: I have only one position across all individuals):
head(homozygotes)
chrom pos dp ind_id genotype_id
1 1 115258827 12 HG00099 0|0
3 1 115258827 8 HG00101 0|0
4 1 115258827 6 HG00103 0|0
8 1 115258827 2 HG00114 0|0
9 1 115258827 8 HG00115 0|0
12 1 115258827 8 HG00128 0|0
head(heterozygotes)
chrom pos dp ind_id genotype_id
2 1 115258827 5 HG00100 0|1
14 1 115258827 5 HG00133 0|1
16 1 115258827 5 HG00138 0|1
19 1 115258827 2 HG00160 1|0
27 1 115258827 4 HG00232 1|0
33 1 115258827 9 HG00251 1|0
These two datasets differ in length, so when I simply try cor.test(homozygotes$dp, heterozygotes$dp), I get an error message:
"Error in cor : incompatible dimensions"
I have searched for a fix but have not found a solid one. I am now quite stuck. Does anyone have an idea how I can proceed? I would sincerely appreciate it.
Thanks a lot
You need the two dp columns to be the same length. Downsample the larger dataset to the size of the smaller one.
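A minimal sketch of that downsampling, in case it helps. The vectors hom_dp and het_dp are stand-ins for homozygotes$dp and heterozygotes$dp; the first six values of each come from the head() output above and the rest are made up so the lengths differ. The seed is also just an assumption to make the draw repeatable.

```r
# Stand-ins for homozygotes$dp and heterozygotes$dp (hypothetical names).
# The last two values in hom_dp are invented so the lengths differ.
hom_dp <- c(12, 8, 6, 2, 8, 8, 10, 7)
het_dp <- c(5, 5, 5, 2, 4, 9)

set.seed(1)  # assumed seed, only so the random subset is reproducible

# Target size: the length of the shorter vector
n <- min(length(hom_dp), length(het_dp))

# Randomly keep n values from each vector (a no-op reshuffle for the shorter one)
hom_sub <- hom_dp[sample(seq_along(hom_dp), n)]
het_sub <- het_dp[sample(seq_along(het_dp), n)]

# Both inputs now have the same length, so cor.test() runs
res <- cor.test(hom_sub, het_sub)
print(res$estimate)
```

Keep in mind that the two groups are different individuals, so the pairing created here is arbitrary and the estimate will change with the seed.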