Hi, I was trying to map the reads generated after immunoprecipitation of 5hmC using bowtie. For the input DNA (genomic DNA without IP), I had a relatively high alignment rate:
6835542 reads; of these:
6835542 (100.00%) were unpaired; of these:
294415 (4.31%) aligned 0 times
5329518 (77.97%) aligned exactly 1 time
1211609 (17.73%) aligned >1 times
For the samples after IP, however, the alignment rate seems low:
6622988 reads; of these:
6622988 (100.00%) were unpaired; of these:
1734630 (26.19%) aligned 0 times
2223155 (33.57%) aligned exactly 1 time
2665203 (40.24%) aligned >1 times
73.81% overall alignment rate
Another sample has more or less the same mapping rate. Is this result acceptable, or is it alarming? Any comments or suggestions would be appreciated.
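For anyone comparing the two summaries, the rates can be recomputed from the raw counts. This is only a minimal sketch: it assumes the bowtie2-style summary text exactly as pasted above, and the names `parse_summary` and `rates` are made up for illustration. The input block above has no "overall alignment rate" line, so the script derives it as unique plus multi-mapped reads over total reads.

```python
import re

# Summaries copied verbatim from the post above.
INPUT_SUMMARY = """\
6835542 reads; of these:
6835542 (100.00%) were unpaired; of these:
294415 (4.31%) aligned 0 times
5329518 (77.97%) aligned exactly 1 time
1211609 (17.73%) aligned >1 times
"""

IP_SUMMARY = """\
6622988 reads; of these:
6622988 (100.00%) were unpaired; of these:
1734630 (26.19%) aligned 0 times
2223155 (33.57%) aligned exactly 1 time
2665203 (40.24%) aligned >1 times
"""

# Hypothetical helper: pull the read counts out of the summary text.
PATTERNS = {
    "zero": r"(\d+) \([\d.]+%\) aligned 0 times",
    "unique": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
    "multi": r"(\d+) \([\d.]+%\) aligned >1 times",
}


def parse_summary(text):
    """Return (total_reads, {category: count}) from a summary block."""
    total = int(re.search(r"(\d+) reads; of these:", text).group(1))
    counts = {name: int(re.search(pat, text).group(1))
              for name, pat in PATTERNS.items()}
    return total, counts


def rates(text):
    """Return per-category percentages plus the overall alignment rate."""
    total, counts = parse_summary(text)
    pct = {name: 100.0 * n / total for name, n in counts.items()}
    # Overall rate = reads aligned at least once / all reads.
    pct["overall"] = pct["unique"] + pct["multi"]
    return pct


for label, summary in (("input", INPUT_SUMMARY), ("IP", IP_SUMMARY)):
    r = rates(summary)
    print(f"{label}: overall {r['overall']:.2f}%, "
          f"unique {r['unique']:.2f}%, multi {r['multi']:.2f}%")
```

Besides the lower overall rate, the IP sample also has a much larger share of multi-mapped reads (about 40% versus about 18% in the input), which may be worth keeping in mind when interpreting peaks.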
@igor. Hi, igor. Thank you for your kind comments and suggestions. Yes, I can still call peaks (see the attached image). I was wondering whether you have any ideas about a related issue. About a quarter of the reads could not be aligned (aligned 0 times), which suggests that they are not of mouse origin but come from a contaminant (?). I wonder where they came from. I tried mapping the post-IP sample against human, E. coli, and phage indexes (the species we most commonly work with), but no significant proportion of reads mapped to any of them. Alignment result against the human index:
Alignment result against E. coli:
Alignment against phage:
Do you have any idea about that? Thanks!
Hard to say what they are. My first guess would be adapter dimers. Try BLASTing a few of the sequences. If there is an obvious contaminant and it's a quarter of the reads, you should find it fairly quickly.
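A rough way to try both suggestions is sketched below. It rests on assumptions that are not stated in the thread: that the unmapped reads are in a plain 4-line FASTQ file, and that the library used Illumina TruSeq-style adapters, whose sequences commonly begin with `AGATCGGAAGAGC`. The helper names and the demo reads are made up for illustration; substitute the real adapter sequence for your kit.

```python
from itertools import islice

# Assumption: common start of Illumina TruSeq-style adapters.
# Replace with the adapter actually used for the library.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(seqs, adapter=ADAPTER_PREFIX):
    """Fraction of reads containing the adapter prefix anywhere in the read."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(adapter in s for s in seqs) / len(seqs)


def to_fasta(seqs, n=5):
    """Format the first n reads as FASTA text, ready to paste into BLAST."""
    return "\n".join(f">read_{i}\n{s}"
                     for i, s in enumerate(islice(seqs, n), 1))


# Made-up demo records; with real data, pass an open file handle instead,
# e.g. open("unmapped.fastq") (hypothetical file name).
demo_reads = [
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",  # starts with the adapter prefix
    "TTGACCGATTAGGCATCGATCGGATTACAGGCTA",  # no adapter prefix
]
demo_fastq = []
for i, seq in enumerate(demo_reads, 1):
    demo_fastq.extend([f"@r{i}", seq, "+", "I" * len(seq)])

seqs = list(read_sequences(demo_fastq))
print(f"fraction with adapter: {adapter_fraction(seqs):.2f}")
print(to_fasta(seqs, n=2))
```

A high adapter fraction would point towards adapter dimers or short inserts; a low one means the BLAST results are the better clue.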
@igor. I collected the unmapped reads using the bowtie2 option "--un" and BLASTed some of them:
Sequences producing significant alignments:
Description | Max score | Total score | Query cover | E value | Ident | Accession
Mus musculus chromosome 1, clone RP23-271O17, complete sequence | 283 | 283 | 76% | 9e-73 | 97% | AC157543.8
Mus musculus chromosome 5, clone RP24-273B9, complete sequence | 110 | 110 | 31% | 2e-20 | 96% | AC115853.8
Mus musculus BAC clone RP24-91H6 from chromosome y, complete sequence | 169 | 169 | 65% | 2e-38 | 99% | AC184151.2
...
Basically, it seems that all of the "unmapped" reads produce significant alignments against mouse, albeit with only modest query coverage.