ChIP-Seq contaminated by human tissue?
9.5 years ago
Gary

Hi,

I have a mouse H3K27ac ChIP-Seq dataset with 23,316,540 reads in total after trimming. Of these, 85.75% (19,994,088) align to the mouse mm9 genome with Bowtie2. I then aligned the unaligned reads (3,322,452) to the human hg19 reference genome, and 34.14% (1,134,255) of them align. Are these reads contamination from human tissue? Many thanks.

Try aligning the 85% that mapped to mouse against human as well; you'll see a lot of homology between the species. Whether it would come close to 34%, I don't know.

It is not 34.14% of the total reads; it is 34.14% of the 14.25% unmapped reads, i.e. 4.86% of the total.

Edit: OK, I see what you mean: check whether a similar percentage of the mapped reads would also map to the human genome.
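
The percentages in the reply above can be checked with a few lines of Python. This is a minimal sketch whose only inputs are the read counts given in the question; the variable and function names are just illustrative.

```python
# Counts reported in the question (mouse H3K27ac ChIP-Seq, after trimming).
total_reads = 23_316_540
mm9_mapped = 19_994_088
hg19_mapped_of_unmapped = 1_134_255

# Only the reads that failed to align to mm9 were aligned to hg19.
mm9_unmapped = total_reads - mm9_mapped


def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


print(f"mm9 unmapped: {mm9_unmapped:,} reads ({pct(mm9_unmapped, total_reads)}% of total)")
print(f"hg19 hits as % of mm9-unmapped reads: {pct(hg19_mapped_of_unmapped, mm9_unmapped)}%")
print(f"hg19 hits as % of all reads: {pct(hg19_mapped_of_unmapped, total_reads)}%")
```

This prints 14.25%, 34.14% and 4.86% respectively, so the human-mappable reads are about 1 in 20 of the whole library.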

Yes, then we would know whether that's normal. I believe a lot of mammalian genes are shared. In exons you might see 34%, but that sounds too high for ChIP-seq. I really have no idea. But you can test how much is shared by aligning your known-mouse reads to human; that gives a rough background rate of cross-species sequence similarity.

You may want to run a few of the leftover reads (those that don't align to mouse) through BLAST.
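
A minimal sketch for pulling a few random reads out of the unaligned FASTQ and writing them as FASTA for BLAST. It assumes an uncompressed FASTQ with exactly 4 lines per record; the function names are hypothetical, and the sampling is plain reservoir sampling (Algorithm R), so the file is read once and never has to fit in memory.

```python
import io
import random
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (read_id, sequence)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (read_id, sequence) from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line
        handle.readline()       # quality line, not needed for BLAST
        yield header[1:].split()[0], seq


def reservoir_sample(records: Iterator[Record], k: int, seed: int = 0) -> List[Record]:
    """Draw k records uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records: List[Record]) -> str:
    """Format records as FASTA text for a BLAST query."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)


# Usage on a tiny in-memory FASTQ; in practice, open the file of unaligned reads.
demo = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2 extra\nTTTTGGGG\n+\nIIIIIIII\n"
    "@r3\nCCCCAAAA\n+\nIIIIIIII\n"
)
picked = reservoir_sample(read_fastq(demo), k=2, seed=1)
print(to_fasta(picked))
```

For a gzipped file, the handle from `gzip.open(path, "rt")` can be passed to `read_fastq` instead of the `StringIO` object. The resulting FASTA can then be pasted into NCBI BLAST or used as the query for a local `blastn`.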

If you believe they are truly contaminants, you could try BBSplit from the BBMap package to separate them.
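
BBSplit's own logic is more involved, but the basic idea of competitive binning can be sketched as follows: assign each read to whichever reference it matches best, and set aside ties. The score dictionary, genome labels and function name are hypothetical; this is not BBSplit's implementation.

```python
from typing import Dict, Optional

# Per-read alignment scores: genome -> score (higher is better), None = no alignment.
Scores = Dict[str, Optional[int]]


def bin_read(scores: Scores) -> str:
    """Assign a read to the genome with the single best score.

    Returns that genome's name, 'unaligned' if no genome was hit,
    or 'ambiguous' if two or more genomes tie for the best score.
    """
    hits = {genome: s for genome, s in scores.items() if s is not None}
    if not hits:
        return "unaligned"
    best = max(hits.values())
    winners = [genome for genome, s in hits.items() if s == best]
    return winners[0] if len(winners) == 1 else "ambiguous"


# Hypothetical scores, e.g. Bowtie2 end-to-end AS:i values (0 = perfect match).
reads: Dict[str, Scores] = {
    "r1": {"mm9": -2, "hg19": -12},
    "r2": {"mm9": None, "hg19": 0},
    "r3": {"mm9": -6, "hg19": -6},
    "r4": {"mm9": None, "hg19": None},
}
bins = {rid: bin_read(s) for rid, s in reads.items()}
print(bins)
```

The "ambiguous" reads are the ones that cannot be attributed to either species. Whether to discard them, keep them, or report them separately is a judgment call.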

H3K27ac marks active regions, so it is quite possible that some fraction of the reads is shared with human.

There are many ultra-conserved regions between human and mouse. H3K27ac is also found over gene bodies, so in theory it should cover orthologous active genes.

You can run FastQC on the raw files to check quality and overrepresented sequences.

But I think it's fine.
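
To illustrate what the FastQC overrepresented-sequences check looks for, here is a toy version that counts exact duplicate sequences and reports those above a given fraction of all reads. The 0.1% default mirrors FastQC's documented warning threshold. FastQC's real module differs in details (for example, it does not track every read in the file), so treat this only as an illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overrepresented(seqs: Iterable[str], min_fraction: float = 0.001) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for exact sequences above min_fraction.

    Results are sorted from most to least frequent.
    """
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total > min_fraction
    ]


# Toy input: one sequence makes up 75% of the reads.
toy_seqs = ["AAAAAAAA"] * 3 + ["CCCCGGGG"]
print(overrepresented(toy_seqs, min_fraction=0.5))
```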

9.5 years ago

Since the samples were probably handled and processed by humans, I wouldn't be too surprised by human DNA contamination. It's a recurring issue: see the NY Times (http://www.nytimes.com/2011/02/17/science/17genome.html) or this more recent publication: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0110808

In that case I'd be more worried about the reads that map equally well to the mouse and human genomes, as those have the potential to bias your results. I don't have much experience with it, but there are a couple of tools you can use to detect human contamination; just google it.
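
One way to find the reads that map to both genomes is to align the same reads to mouse and to human separately, then intersect the names of the mapped reads. Below is a minimal sketch over SAM text, assuming uncompressed SAM input. The function name and the optional MAPQ cut-off are illustrative choices, not part of any particular tool.

```python
from typing import Iterable, Set

FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def mapped_read_names(sam_lines: Iterable[str], min_mapq: int = 0) -> Set[str]:
    """Collect QNAMEs of mapped alignments from SAM text lines.

    Header lines (starting with '@') are skipped; alignments with the
    unmapped bit set or MAPQ below min_mapq are ignored.
    """
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & FLAG_UNMAPPED or mapq < min_mapq:
            continue
        names.add(qname)
    return names


# Hypothetical SAM records from aligning the SAME reads to each genome.
mouse_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t0\tchr2\t200\t42\t8M\t*\t0\t0\tTTTTGGGG\tIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tCCCCAAAA\tIIIIIIII",
]
human_sam = [
    "r1\t16\tchr5\t300\t30\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII",
    "r3\t0\tchr7\t400\t42\t8M\t*\t0\t0\tCCCCAAAA\tIIIIIIII",
]
both = mapped_read_names(mouse_sam) & mapped_read_names(human_sam)
print(sorted(both))
```

Raising `min_mapq` keeps only confident alignments. In the toy data, `mapped_read_names(human_sam, min_mapq=31)` drops r1, so the overlap becomes empty. For BAM files, a library such as pysam would be used instead of splitting text lines.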

9.5 years ago
Gary

Hi,

Thanks for all your valuable suggestions. Although the 1,134,255 reads make up only 4.86% of the total (1,134,255 / 23,316,540) for this mouse H3K27ac ChIP-Seq sample, I am still quite worried about contamination, for two reasons: (1) we know that a mouse H3K27me3 ChIP-Seq sample prepared by the same labmate was contaminated by yeast: of 22,889,979 reads in total, only 29.46% (6,741,623) aligned to the mouse mm9 reference genome, while 31.66% (7,246,204) aligned to the yeast sacCer3 genome; (2) for another mouse H3K27me3 ChIP-Seq sample, also prepared by the same labmate, only 40,699 unaligned reads (0.23%, 40,699 / 17,733,214) could be aligned to the human hg19 genome. This suggests that that sample is probably not contaminated by human tissue, whereas the H3K27ac sample I first reported may be, at least partially. Any additional suggestions are very welcome, and thanks again.

First, you should not post your comment as an answer.

Regarding your problem: instead of mapping only the unmapped reads to potential contaminants, map (a random subsample of) all your reads as one of the initial quality-control steps. You probably already run FastQC on all your sequencing runs; add MGA or FastQ_Screen to this workflow and include common contaminants in their databases. If contamination is common, use a method to filter it out, e.g. BBSplit as genomax2 suggested.
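
The kind of per-reference table FastQ_Screen reports (how many reads hit only one genome versus several) can be sketched from a dictionary of read-to-genome hits. Such a dictionary could be built, for example, by aligning the subsample to each reference and collecting the mapped read names. This is a simplified, hypothetical summary loosely modelled on FastQ_Screen's output categories, not its actual code or report format.

```python
from typing import Dict, Iterable, Set, Tuple


def screen_summary(
    hits: Dict[str, Set[str]], genomes: Iterable[str]
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Summarise which reference genomes each sampled read aligned to.

    `hits` maps read ID -> set of genome names the read aligned to (empty set = none).
    Returns per-genome percentages of reads that hit only that genome or that
    genome plus at least one other, and the percentage of reads with no hit.
    """
    total = len(hits)
    if total == 0:
        return {}, 0.0
    per_genome: Dict[str, Dict[str, float]] = {}
    for g in genomes:
        only = sum(1 for s in hits.values() if s == {g})
        shared = sum(1 for s in hits.values() if g in s and len(s) > 1)
        per_genome[g] = {
            "only_this_genome": 100 * only / total,
            "also_other_genomes": 100 * shared / total,
        }
    no_hit = 100 * sum(1 for s in hits.values() if not s) / total
    return per_genome, no_hit


# Hypothetical hits for four subsampled reads screened against three references.
sample_hits = {
    "r1": {"mm9"},
    "r2": {"mm9", "hg19"},
    "r3": {"sacCer3"},
    "r4": set(),
}
table, pct_no_hit = screen_summary(sample_hits, ["mm9", "hg19", "sacCer3"])
for genome, row in table.items():
    print(genome, row)
print("no hit:", pct_no_hit)
```

In a real screen, a large "also_other_genomes" share for hg19 mostly reflects conserved sequence. A sizeable hg19-only or sacCer3-only share is what would point to contamination.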
