Hi,
I have mouse H3K27ac ChIP-Seq data with 23,316,540 reads in total after trimming. Of these, 85.75% (19,994,088) align to the mouse mm9 genome with Bowtie2. I then aligned the unaligned reads (3,322,452) to the human hg19 reference genome, and 34.14% of them (1,134,255) align to hg19. Are these reads contamination from human tissue? Many thanks.
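For reference, the two-step screen described above could look roughly like this. It is a minimal sketch, assuming single-end reads and Bowtie2 on the PATH. The index prefixes (`mm9_index`, `hg19_index`) and all file names are hypothetical placeholders. `--un` is Bowtie2's option for writing reads that fail to align, and that file becomes the input for the human alignment. By default the sketch only prints the commands (dry run).

```python
import shlex
import subprocess


def screen_commands(reads, mouse_index="mm9_index", human_index="hg19_index", threads=4):
    """Build the two Bowtie2 commands: mouse first, then human on the leftovers.

    Index prefixes and output file names are hypothetical placeholders.
    """
    unaligned = "unaligned_to_mm9.fastq"
    mouse_cmd = [
        "bowtie2", "-p", str(threads),
        "-x", mouse_index, "-U", reads,
        "--un", unaligned,  # reads that fail to align to mm9 are written here
        "-S", "mouse.sam",
    ]
    human_cmd = [
        "bowtie2", "-p", str(threads),
        "-x", human_index, "-U", unaligned,  # only the mm9-unaligned reads
        "-S", "human.sam",
    ]
    return [mouse_cmd, human_cmd]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Bowtie2 writes its alignment summary (the percentages) to stderr
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(screen_commands("trimmed.fastq"))
```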
Try aligning the 85% that mapped to mouse against human as well; you'll see a lot of homology between the species. Whether it comes out near 34%, I don't know.
It is not 34.14% of the total reads; it is 34.14% of the 14.25% that did not map to mouse, i.e. about 4.86% of all reads.
Edit: OK, I see what you mean: check whether a similar percentage of the mouse-mapped reads also maps to the human genome.
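For the record, here is the arithmetic using only the counts from the question:

```python
total_reads = 23_316_540    # reads after trimming
mouse_aligned = 19_994_088  # aligned to mm9
human_aligned = 1_134_255   # of the mm9-unaligned reads, aligned to hg19

unaligned = total_reads - mouse_aligned
print(f"unaligned to mm9: {unaligned:,} ({unaligned / total_reads:.2%} of total)")
# unaligned to mm9: 3,322,452 (14.25% of total)
print(f"aligned to hg19: {human_aligned / unaligned:.2%} of unaligned, "
      f"{human_aligned / total_reads:.2%} of total")
# aligned to hg19: 34.14% of unaligned, 4.86% of total
```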
Yes, then we would know whether that's normal. I believe a lot of mammalian genes are shared. In exonic regions you could see 34%, but that sounds too high for ChIP-seq. I really have no idea. But you can test how much is shared by aligning your known mouse reads to human; that gives you a kind of background rate of sequence similarity.
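Once you have both numbers, one way to compare them: treat the mm9-aligned reads that also align to hg19 as the background rate, and set it against the hg19 rate among the mm9-unaligned reads. This is a minimal sketch using a plain two-proportion z-test. `compare_to_background` is a hypothetical helper, and the background counts must come from your own alignment of the mouse-mapped reads (they are not in this thread).

```python
import math


def compare_to_background(hits, total, bg_hits, bg_total):
    """Compare the hg19 hit rate of mm9-unaligned reads with a background rate.

    hits / total:       mm9-unaligned reads that aligned to hg19 (1,134,255 / 3,322,452 here)
    bg_hits / bg_total: mm9-aligned reads that also align to hg19 (the background)
    Returns (observed rate, background rate, fold change, two-proportion z-score).
    """
    p_obs = hits / total
    p_bg = bg_hits / bg_total
    # pooled proportion under the null hypothesis that both rates are equal
    pooled = (hits + bg_hits) / (total + bg_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total + 1 / bg_total))
    z = (p_obs - p_bg) / se if se > 0 else float("nan")
    fold = p_obs / p_bg if p_bg > 0 else float("inf")
    return p_obs, p_bg, fold, z
```

With millions of reads nearly any difference comes out "significant", so the fold change is the more useful number: a rate in the same range as the background fits the homology explanation, while a much higher one is worth following up.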
You may want to run a few of the leftover reads (those that don't align to mouse) through BLAST.
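To pick "a few" reads without bias, a random sample is better than the first N. This is a minimal sketch using reservoir sampling, assuming an uncompressed FASTQ with exactly 4 lines per record (for a .gz file, open it with `gzip.open(path, "rt")`). The function names are hypothetical.

```python
import itertools
import random


def read_fastq(lines):
    """Yield (name, sequence) pairs from FASTQ lines (4 lines per record, no wrapping)."""
    lines = iter(lines)
    while True:
        record = list(itertools.islice(lines, 4))
        if len(record) < 4:
            return
        name = record[0].rstrip("\n")[1:].split()[0]  # drop '@' and any description
        yield name, record[1].rstrip("\n")


def sample_reads(lines, k=100, seed=0):
    """Reservoir-sample k reads so the whole file never has to fit in memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, read in enumerate(read_fastq(lines)):
        if i < k:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = read
    return reservoir


def to_fasta(reads):
    """Format (name, sequence) pairs as FASTA text for BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in reads)
```

Open the file of mouse-unaligned reads, pass the handle to `sample_reads`, and write `to_fasta(...)` to a file you can paste into NCBI's web BLAST.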
If you are convinced they are truly contaminants, you could use BBSplit from the BBMap package to separate them.
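The idea behind competitive binning of this kind is to give each read to whichever genome it matches best and to set aside reads that match both about equally. As I understand it, BBSplit does this by mapping against both references at once. Below is a hypothetical sketch of just that decision, assuming you already have per-read alignment scores from each genome. `assign_reads` and `margin` are made-up names, not BBSplit's code or options.

```python
def assign_reads(mouse_scores, human_scores, margin=0):
    """Bin reads by the genome they align to best (a hypothetical illustration).

    mouse_scores / human_scores: dict of read name -> alignment score, where higher
    is better (e.g. Bowtie2's AS:i tag); a read absent from a dict did not align there.
    Reads whose two scores differ by no more than `margin` are called ambiguous.
    """
    bins = {"mouse": [], "human": [], "ambiguous": []}
    for read in sorted(set(mouse_scores) | set(human_scores)):
        m = mouse_scores.get(read)
        h = human_scores.get(read)
        if h is None:
            bins["mouse"].append(read)
        elif m is None:
            bins["human"].append(read)
        elif abs(m - h) <= margin:
            bins["ambiguous"].append(read)  # near-equal hits, e.g. conserved sequence
        elif m > h:
            bins["mouse"].append(read)
        else:
            bins["human"].append(read)
    return bins
```

Reads in the ambiguous bin are the conserved sequence discussed in this thread, so on their own they are not evidence of contamination.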
H3K27ac marks active regions, so it is quite possible that some fraction of these reads also matches the human genome.
There are many ultra-conserved regions between human and mouse, and H3K27ac is also found over gene bodies, so in theory some of the signal should fall over orthologous active genes.
You can run FastQC on the raw files to check quality and overrepresented sequences.
But I think it's fine.
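FastQC is the right tool for this. To illustrate what its overrepresented-sequences check is looking for, here is a rough sketch that counts identical read prefixes and reports those above a set fraction of all reads. The 50 bp prefix and the 0.1% cutoff are my assumptions about FastQC's defaults, not taken from its code. Counting happens in memory, so for a large file feed it only the first few hundred thousand reads.

```python
import itertools
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4) from FASTQ lines."""
    for line in itertools.islice(lines, 1, None, 4):
        yield line.rstrip("\n")


def overrepresented(seqs, prefix_len=50, min_fraction=0.001):
    """Return (sequence, count, fraction) for sequences above min_fraction of all reads.

    Reads are truncated to prefix_len so that small length differences after
    trimming do not split identical sequences into separate counts.
    """
    counts = Counter(s[:prefix_len] for s in seqs)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(seq, n, n / total) for seq, n in counts.most_common() if n / total > min_fraction]
```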