4.5 years ago
elb
Hi guys, suppose you are analysing acetylation ChIP-seq data. You have a list of peaks (with associated read counts) in some genomic regions, and you annotate each peak to its nearest gene. At this point many peaks map to the same gene name. How do you collapse this information into a single read count per gene? Do you take the sum of the reads over all peaks of that gene? The mean? The median? It is a crude analysis, just to get one read count per gene in order to perform differential "expression" analysis as for RNA-seq. Thank you in advance.
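To make the three options concrete, here is a minimal sketch of the collapsing step. The peak IDs, gene names and read counts are made up for illustration, and `collapse_by_gene` is a hypothetical helper, not part of any ChIP-seq package.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical annotated peaks: (peak_id, nearest_gene, read_count)
peaks = [
    ("peak1", "GENE_A", 120),
    ("peak2", "GENE_A", 30),
    ("peak3", "GENE_A", 60),
    ("peak4", "GENE_B", 45),
]

def collapse_by_gene(peaks):
    """Group read counts by gene and report sum, mean, median and peak count."""
    counts = defaultdict(list)
    for _peak_id, gene, reads in peaks:
        counts[gene].append(reads)
    return {
        gene: {
            "sum": sum(values),
            "mean": mean(values),
            "median": median(values),
            "n_peaks": len(values),
        }
        for gene, values in counts.items()
    }

per_gene = collapse_by_gene(peaks)
for gene, stats in per_gene.items():
    print(gene, stats)
```

One thing to keep in mind when choosing: the sum stays an integer count, while the mean and median generally do not, which matters if the numbers are later fed to a count-based tool.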
If I may ask, what's the next step in your analysis? I don't see any problem with more than one peak being annotated to a single gene, especially for acetylation peaks. You might want to check the literature on super-enhancers for ways to collapse nearby peaks into a single region.
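As a rough idea of what collapsing nearby peaks into one region could look like, here is a small sketch that merges peaks lying within a fixed distance of each other and sums their reads. The `max_gap` of 12500 bp is an assumed value chosen only for illustration, `merge_nearby_peaks` is a hypothetical helper, and the coordinates are invented.

```python
def merge_nearby_peaks(peaks, max_gap=12500):
    """Merge peaks on the same chromosome separated by at most max_gap bp.

    peaks: list of (chrom, start, end, reads) tuples.
    Returns a list of merged (chrom, start, end, summed_reads) tuples.
    """
    merged = []
    # Sorting tuples orders by chromosome name first, then by start position.
    for chrom, start, end, reads in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            last = merged[-1]
            # Extend the previous region and accumulate its read count.
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + reads)
        else:
            merged.append((chrom, start, end, reads))
    return merged

# Hypothetical peaks: (chrom, start, end, reads)
example_peaks = [
    ("chr1", 1000, 2000, 50),
    ("chr1", 5000, 6000, 30),
    ("chr1", 100000, 101000, 20),
    ("chr2", 1500, 2500, 10),
]

regions = merge_nearby_peaks(example_peaks)
for region in regions:
    print(region)
```

With these made-up inputs the first two chr1 peaks end up in one region, while the distant chr1 peak and the chr2 peak stay separate.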
Having said that, if you have many samples, you can do a correlation analysis between peaks and genes (possibly with expression data) across all samples and narrow the peaks down to the single most highly correlated peak per gene (this might give some false positives, though).
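A minimal sketch of that correlation idea, assuming you have per-sample signal for each peak of a gene and per-sample expression for the same gene. Pearson correlation is used here as one possible choice; the data, `pearson` and `best_peak` are all hypothetical and only meant to show the shape of the calculation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector has no defined correlation; treat as 0
    return cov / (sd_x * sd_y)

def best_peak(peak_signal, expression):
    """Return the peak whose signal correlates best with expression."""
    return max(peak_signal, key=lambda p: pearson(peak_signal[p], expression))

# Hypothetical data for one gene across four samples
peak_signal = {
    "peak1": [10, 20, 30, 40],
    "peak2": [40, 10, 35, 15],
}
gene_expression = [1.0, 2.0, 3.0, 4.0]

chosen = best_peak(peak_signal, gene_expression)
print(chosen)
```

With only a handful of samples a high correlation can easily arise by chance, which is where the false positives mentioned above would come from.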
First, thank you for your help. I would like to perform a sort of differential expression analysis with the peak counts, which is why I need a single count per gene.
Differential expression? Or differential enrichment of acetylation? If it's the latter, just do the differential enrichment analysis with all peaks and then annotate the differential peaks; that would be a better option.
To be honest, the first is what my boss asked me for... I think the second makes more sense. I mean, I have to apply the edgeR pipeline to acetylation ChIP-seq data.