10.5 years ago
marina-orlova
Hello,
Could you please advise how to choose the effective genome size parameter for MACS (v1.4.2) when calling peaks on Drosophila melanogaster reads? Should I use dm2 (1.20e+08) or dm3 (1.52e+08)? Sorry if my question seems strange, I'm a newbie in bioinformatics.
Thanks a lot! I didn't do the alignment myself; I just received BAM files for the input and treatment samples (the alignment was done with Bowtie2). Is it possible to recover this information from the files? Or does Bowtie2 only use dm3?
Just compare the chromosome sizes in the header of the BAM files. Info for dm3 is here, and dm2 is here. It looks like the chromosome names may differ slightly as well.
Alternatively, just ask whoever performed the alignments.
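If it helps, the header comparison can be scripted. The sketch below assumes you have dumped the BAM header to text (for example with `samtools view -H file.bam`) and that you have a table of chromosome sizes for each candidate assembly. The function names, the example header, and the two reference tables are placeholders made up for illustration; they are not real dm2 or dm3 values.

```python
def parse_sq_sizes(header_text):
    """Return {chromosome_name: length} from the @SQ lines of a SAM/BAM header."""
    sizes = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each @SQ line holds tab-separated TAG:VALUE fields, e.g. SN:chr2L and LN:12345.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        sizes[fields["SN"]] = int(fields["LN"])
    return sizes


def matches_assembly(bam_sizes, reference_sizes):
    """True if at least one chromosome is shared and all shared ones have equal length."""
    shared = set(bam_sizes) & set(reference_sizes)
    return bool(shared) and all(bam_sizes[c] == reference_sizes[c] for c in shared)


# Placeholder header text and reference tables (NOT real Drosophila values).
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chrA\tLN:1000\n"
    "@SQ\tSN:chrB\tLN:2500\n"
)
reference_a = {"chrA": 1000, "chrB": 2500}
reference_b = {"chrA": 1100, "chrB": 2400}

bam_sizes = parse_sq_sizes(example_header)
print("matches reference_a:", matches_assembly(bam_sizes, reference_a))
print("matches reference_b:", matches_assembly(bam_sizes, reference_b))
```

Note that if the chromosome names differ between assemblies, no names are shared and the check returns False, which is itself a hint about which assembly was used.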
I compared chromosome sizes, it is dm3 as you said. Thank you for your help!
A remark in case anyone stumbles over this: the effective genome size here refers to the part of the genome that is actually uniquely mappable with standard sequencing read lengths. Some regions of the genome are repetitive and therefore not mappable with short reads. They would only accumulate multimappers, which are typically excluded from analysis due to low reliability (low or 0 MAPQ score). These regions are therefore excluded from the genome size that macs2 uses for its Poisson statistics. Please check the original paper for details on that.
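To make the effect of this parameter concrete, here is a rough sketch of a genome-wide Poisson background model. It assumes the expected read count in a window is simply total reads times window length divided by effective genome size; this is a simplification for illustration, not the actual MACS implementation. The read count, window size, and observed count are hypothetical; the two genome sizes are the ones quoted in the question.

```python
import math


def expected_background(n_reads, window_bp, effective_genome_size):
    """Expected reads per window if reads fall uniformly on the mappable genome."""
    return n_reads * window_bp / effective_genome_size


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the sum of P(X = i) for i < k."""
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


n_reads = 10_000_000   # hypothetical number of mapped reads
window_bp = 200        # hypothetical window length in bp
observed = 30          # hypothetical read count in a candidate peak window

for label, genome_size in [("1.20e+08", 1.20e8), ("1.52e+08", 1.52e8)]:
    lam = expected_background(n_reads, window_bp, genome_size)
    p = poisson_upper_tail(observed, lam)
    print(f"genome size {label}: lambda = {lam:.2f}, P(X >= {observed}) = {p:.3g}")
```

A smaller effective genome size gives a larger expected background, so the same window looks less significant. That is why picking the size that matches the assembly matters.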