Hi all,
I'm a real beginner in ChIP-seq analysis and in bioinformatics generally (so please be patient), and I wondered if you could help. I am trying to analyse my TFBS ChIP-seq data with Galaxy, following the guidelines in this paper:
Bailey et al. "Practical guidelines for the comprehensive analysis of ChIP-seq Data", PLOS Computational Biology 2013
So far I have managed to run FastQC, then trim and groom my FASTQ data, before aligning with Bowtie2. I now want to look at the quality metrics of my sequence reads, specifically my library complexity. I can see from my BAM files that around 68% of my reads are uniquely mapped, which is maybe a little below ideal, but I would like to proceed and look at the library complexity, i.e. the number of distinct genomic locations that my uniquely aligned reads map to.
Where/how can I find this information and generate scores/ratios for this? Is there a tool in Galaxy that will do this for me? I tried the "Estimate Library Complexity" function but I didn't seem to get anything useful back from it. There seems to be a "Collect RNA-seq Metrics" function that also didn't give me what I'm looking for.
Is there something I'm missing in my BAM file, or is the tool I need simply not available in Galaxy? I have absolutely no experience with R and it would take me a long time to get up and running with it, so any non-R solutions would be greatly appreciated!
For what it's worth, our group (we do a LOT of ChIP-seq) doesn't bother calculating library complexity, though we do use CollectAlignmentSummaryMetrics from Picard (that's in Galaxy). You might find the deepTools suite quite useful (it's available in Galaxy, and we also have a dedicated public Galaxy server). It is primarily intended for QC and normalization of ChIP-seq data and can give you a nice graphical depiction of things like "Did my IP work and, if so, what sort of signal/peaks can I expect?".
If you were doing this a month or two from now I'd just point you to the Galaxy workflow we're putting together for ChIP-seq, but sadly that isn't done yet.
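For anyone who does still want a library-complexity number without R, here is a minimal sketch in Python. It assumes you have already extracted one (chromosome, start, strand) record per uniquely mapped read from the BAM file. That extraction step is not shown, and the function name `library_complexity` and the toy reads are made up for illustration. The two ratios follow the definitions I understand the ENCODE guidelines to use for the non-redundant fraction (NRF) and PCR bottleneck coefficient (PBC1), so please check them against the paper before relying on the numbers.

```python
from collections import Counter


def library_complexity(read_positions):
    """Summarise library complexity from uniquely mapped read positions.

    read_positions: iterable of (chrom, start, strand) tuples, one per read.
    This input format is an assumption made for this sketch.
    """
    # Count how many reads fall on each distinct (chrom, start, strand).
    counts = Counter(read_positions)
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no reads supplied")

    distinct_positions = len(counts)
    # Positions covered by exactly one read.
    singletons = sum(1 for n in counts.values() if n == 1)

    return {
        "total_reads": total_reads,
        "distinct_positions": distinct_positions,
        # NRF: distinct positions / total uniquely mapped reads.
        "nrf": distinct_positions / total_reads,
        # PBC1: positions with exactly one read / distinct positions.
        "pbc1": singletons / distinct_positions,
    }


# Toy data, invented purely to show the calculation.
reads = [
    ("chr1", 100, "+"), ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"), ("chr2", 40, "+"), ("chr2", 40, "+"),
    ("chr3", 7, "-"),
    ("chr3", 900, "+"),
]
metrics = library_complexity(reads)
print(metrics)
```

A value close to 1 for either ratio means most reads land on their own position, while low values suggest many duplicate reads. What counts as acceptable depends on sequencing depth, so treat any threshold as a guideline rather than a hard rule.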
Hi Devon,
Thanks for the help, that's great. For what it's worth, I'll probably still be doing this ChIP-seq analysis two months from now, so it would be great if you could point me to your workflow when it's complete!! We have more samples being sequenced as we speak, so I'll have more analyses to do.
I'll check out Picard and the deepTools suite as you recommend. Thank you!!!