I was wondering about methods for handling multi-mapping reads when investigating transposable elements (TEs) with ChIP-seq and ATAC-seq data. Many publications remove reads that map to multiple genomic regions. However, TEs from younger families are more likely to produce multi-mapping reads, because they integrated into the human genome more recently and have had less time to accumulate mutations. Removing multi-mapping reads may therefore bias the results towards older TE families.
Would you remove or keep multi-mapped reads for ChIP-seq and ATAC-seq datasets? And if they are kept, which method would be the best way to assign them to a region: random assignment, fractional assignment, or expectation-maximization (EM)? Recently developed tools for RNA-seq data seem to commonly use the EM algorithm for multi-mapped reads when studying TEs.
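To make the three options concrete, here is a minimal Python sketch of how each one would distribute reads. It is a toy illustration, not the implementation used by any published tool. The input format (each read given as a list of candidate locus names), the function names, and the loci `"A"` and `"B"` are all assumptions made up for this example. The EM version also ignores locus length, mappability, and alignment scores, which real tools may take into account.

```python
import random
from collections import defaultdict


def assign_random(reads, seed=0):
    """Give each read entirely to one candidate locus, chosen uniformly."""
    rng = random.Random(seed)
    counts = defaultdict(float)
    for loci in reads:
        counts[rng.choice(loci)] += 1.0
    return dict(counts)


def assign_fractional(reads):
    """Split each read equally among its candidate loci (1/n each)."""
    counts = defaultdict(float)
    for loci in reads:
        for locus in loci:
            counts[locus] += 1.0 / len(loci)
    return dict(counts)


def assign_em(reads, n_iter=50):
    """Split each read in proportion to the current abundance estimates.

    Simplified EM: abundance is the share of reads per locus, with no
    correction for locus length or alignment quality (an assumption).
    """
    all_loci = sorted({locus for loci in reads for locus in loci})
    abundance = {locus: 1.0 / len(all_loci) for locus in all_loci}
    counts = {locus: 0.0 for locus in all_loci}
    for _ in range(n_iter):
        # E-step: share each read among its loci by relative abundance.
        counts = {locus: 0.0 for locus in all_loci}
        for loci in reads:
            total = sum(abundance[locus] for locus in loci)
            for locus in loci:
                counts[locus] += abundance[locus] / total
        # M-step: re-estimate abundance from the expected counts.
        n_reads = sum(counts.values())
        abundance = {locus: c / n_reads for locus, c in counts.items()}
    return counts


# Hypothetical data: three reads unique to A, one unique to B,
# and two reads that map equally well to both A and B.
reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"], ["A", "B"]]
print(assign_random(reads))
print(assign_fractional(reads))
print(assign_em(reads))
```

With this toy input, fractional assignment gives A = 4 and B = 2, while the EM version converges to roughly A = 4.5 and B = 1.5, because the uniquely mapped reads suggest A is more abundant and so it receives a larger share of the ambiguous reads. Random assignment keeps whole-read counts but the split of the two ambiguous reads depends on the seed.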
This message is too late to assist @ahnje770, but in case anyone with similar questions/comments stumbles upon this page, the following resources (PubMed or bioRxiv entries) may be useful: