I recently performed a p300 ChIP-seq on two different genotypes.
I aligned my reads with STAR and made UCSC .bedGraph files, which I loaded into IGV. I noticed that the reads for one of my genotypes, genotype X (red: combined replicates; purple and pink: individual replicates), look very blocky. The reads for the other genotype, genotype Y (black: combined replicates; green and blue: individual replicates), look better.
Hnrnpa0 is a known target of p300 which is why I chose it as an example.
What could be the reason why the reads look 'blocky'?
How can I troubleshoot this?
Could it be due to duplicate reads, or is it just a very bad ChIP?
I attempted to run samtools markdup. I had the following commands set up for one of my genotype replicates:
First, convert my SAM file to BAM format:
samtools view -b genotypeX_R1.sam > genotypeX_R1.bam
Then I followed the samtools markdup steps as follows:
samtools sort genotypeX_R1.bam -o sorted_genotypeX_R1.bam
samtools index sorted_genotypeX_R1.bam
samtools markdup -r sorted_genotypeX_R1.bam marked_duplicates_genotypeX_R1.bam
samtools flagstat marked_duplicates_genotypeX_R1.bam
Are these the right commands to use to mark duplicates in my files?
I was introduced to bioinformatics recently and am still struggling to troubleshoot errors that come up in my analysis workflow. 80% of the time I have no idea how to approach issues with my data or what I am doing to solve them. :(
Thank you so much for your time and help! Appreciate it!
Yes, the ChIP is from my data! Would you say the data for genotype Y is okay?
No, from this screenshot it all looks like pure noise.
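On the markdup question: the samtools manual describes running the reads through samtools fixmate -m on name-collated input before coordinate sorting and markdup, which the commands above skip. Also note that -r removes duplicates instead of only flagging them. Below is a minimal sketch of that documented order, assuming paired-end reads (as far as I know the fixmate step matters mainly for paired data, so treat that as an assumption for single-end libraries). The run helper, the markdup_pipeline function, the DRY_RUN variable and the intermediate file names are all made up for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the duplicate-marking order shown in the samtools markdup manual.
# Assumption: paired-end reads. Helper and intermediate file names are hypothetical.

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

markdup_pipeline() {
    in_bam="$1"
    prefix="$2"
    # 1. Group reads by name so that mates sit next to each other (needed by fixmate).
    run samtools collate -o "${prefix}.collated.bam" "$in_bam" || return 1
    # 2. Add mate score (ms) tags, which markdup uses to choose the read to keep.
    run samtools fixmate -m "${prefix}.collated.bam" "${prefix}.fixmate.bam" || return 1
    # 3. Coordinate-sort, which markdup expects.
    run samtools sort -o "${prefix}.sorted.bam" "${prefix}.fixmate.bam" || return 1
    # 4. Flag duplicates; without -r they stay in the file and are only marked.
    run samtools markdup "${prefix}.sorted.bam" "${prefix}.markdup.bam" || return 1
    # 5. Summarise the flags, including how many reads were marked as duplicates.
    run samtools flagstat "${prefix}.markdup.bam" || return 1
}

# Dry run by default: this only prints the five commands.
markdup_pipeline genotypeX_R1.bam genotypeX_R1
```

To actually run it, call it with DRY_RUN=0 set in the environment. Keeping the duplicates marked instead of removed lets you compare the duplicate count from flagstat between genotype X and genotype Y before deciding whether to filter them.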