Hi everyone
I'm working on a ChIP-seq experiment in wheat, which has a very large and repetitive genome.
I'm a bit baffled by the "effective genome size" parameter in MACS2. I understand it is related to the repetitiveness of the genome, but I'm not sure how to calculate it. I tried GEM, but it gave me an error, so while I try to solve the GEM problem, maybe someone can suggest an alternative?
Secondly, since I'm looking for peaks in repetitive as well as non-repetitive regions of the genome, I thought I should perhaps use the full genome length rather than the mappable length. Am I correct?
Finally, if I have a control sample (no antibody), can it be used to estimate the mappability of the genome?
Thanks!
Thanks for the input and for the links.
Do you think the "effective genome size" should be calculated the same way I'm doing the mapping? For example, if I retain only uniquely mapped reads, I should calculate mappability from the uniquely mappable regions; but if I also retain reads that map two or three times, maybe I should define mappability as the regions that are mappable up to two or three times?
Is there a site you know of that explains the statistics behind the effective genome size?
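A minimal sketch of the idea in this question, assuming a toy genome held in memory as a Python dict and reads of one fixed length k: count every position whose k-mer (pooled with its reverse complement, since a read can align to either strand) occurs at most max_hits times in the genome. With max_hits=1 this approximates "uniquely mappable" positions; raising it mirrors keeping reads that map two or three times. The function names are hypothetical, exact matching is assumed (real aligners tolerate mismatches), and this in-memory approach would not scale to a genome the size of wheat, which is what dedicated tools such as GEM are for.

```python
from collections import Counter

# Complement table; N stays N so ambiguous bases can be skipped below.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def canonical(kmer: str) -> str:
    # A k-mer and its reverse complement are the same target for a read,
    # so both are counted under one key (the lexicographically smaller).
    rc = revcomp(kmer)
    return kmer if kmer <= rc else rc

def _kmers(genome: dict[str, str], k: int):
    """Yield every forward-strand k-mer that contains no N (hypothetical helper)."""
    for seq in genome.values():
        seq = seq.upper()  # soft-masked (lowercase) repeats are kept, not dropped
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                yield kmer

def effective_genome_size(genome: dict[str, str], k: int, max_hits: int = 1) -> int:
    """Count start positions whose length-k read would match the genome
    at most max_hits times (exact matches only; an assumption)."""
    counts = Counter(canonical(km) for km in _kmers(genome, k))
    return sum(1 for km in _kmers(genome, k) if counts[canonical(km)] <= max_hits)

if __name__ == "__main__":
    toy = {"chr1": "ACGTTACGTAAAAAGGC", "chr2": "TTTTTGCCNNACGT"}  # made-up sequence
    for hits in (1, 2, 3):
        print(hits, effective_genome_size(toy, k=5, max_hits=hits))
```

Running it with max_hits=1 and then max_hits=3 on the same genome gives the two numbers the question contrasts; the relaxed count can never be smaller than the unique one.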
Thanks
There are no complicated statistics behind "effective genome size". The thread "Why Does Macs Use A Genome Size Of 2.7 Billion Instead Of 3 Billion For Human?" might be useful, and maybe also this paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377
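To see where the number enters, here is a rough sketch assuming the simplified MACS model: the genome-wide background rate is the expected pileup if all reads were spread uniformly, lambda_bg = reads × fragment length / effective genome size, and a position's significance is a Poisson upper-tail probability. MACS2 also considers local background estimates from the control, so this shows only the genome-wide part. The function names, read count, fragment length, pileup, and both genome sizes are made-up illustrative values, not wheat-specific figures.

```python
import math

def background_lambda(total_reads: int, fragment_len: int, genome_size: float) -> float:
    """Expected pileup per base if all fragments were spread uniformly over genome_size bp."""
    return total_reads * fragment_len / genome_size

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail (requires lam > 0)."""
    if k <= 0:
        return 1.0
    # Start at P(X = k), computed in log space to avoid overflow for large k.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Keep adding terms while they still rise (i < lam) or still matter numerically.
    while term > 0.0 and (i < lam or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)

if __name__ == "__main__":
    reads, frag = 20_000_000, 200  # hypothetical library
    pileup = 5                     # hypothetical pileup at a candidate peak
    for label, gsize in [("full length (illustrative)", 1.4e10),
                         ("mappable only (illustrative)", 1.0e10)]:
        lam = background_lambda(reads, frag, gsize)
        print(f"{label}: lambda_bg={lam:.3f}, P(pileup >= {pileup})={poisson_sf(pileup, lam):.2e}")
```

In this model, using the full genome length rather than the mappable length lowers lambda_bg, so the same pileup looks more significant than it would against the smaller, mappable-only size.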
Thanks a lot for the links!