Recently, I obtained several ChIP-seq datasets from Saccharomyces cerevisiae.
After Illumina sequencing, each FASTQ file contains around 20 million 50 bp reads. I aligned the reads to the sacCer3 genome with either BWA-MEM or Bowtie2, but the mapping rate was very low (20% mapped, 80% unmapped).
I can't figure out what is making the reads unmappable. Even the input DNA aligns poorly to the genome (50%). I tried switching genomes, but I always got the same overall mapping rate.
Hi Laszlo, did you try taking the unmapped reads and BLASTing them? Check whether there is a high level of contamination; if the reads do match S. cerevisiae, you might have to tweak the alignment parameters.
To me this looks like a classic mappability problem caused by reads mapping to repetitive regions. For example, if you map 30-mers to the human genome, approx. 25% of the genome will be unmappable when only unique positions are reported (check the Bowtie parameters). One of the first things I usually do is create a mappability track (GEM-mappability tool) for the reference species. Then I map the reads, create a track of the mapped reads, and upload it to one of the genome browsers (UCSC or Ensembl). Together, the two tracks tell me which regions are mappable and which are not, and where the mapped reads align.
Unfortunately, UCSC does not have a mappability track for S. cerevisiae, so you will need to make one yourself.
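To get a rough feel for what a mappability track measures, here is a minimal Python sketch. It is not the GEM algorithm: it only counts exact k-mers on the forward strand, ignoring mismatches and the reverse complement, and the function name `kmer_uniqueness` is made up for illustration.

```python
from collections import Counter


def kmer_uniqueness(sequence: str, k: int) -> float:
    """Return the fraction of k-mer start positions whose k-mer occurs
    exactly once in the sequence (forward strand, exact matches only).

    Simplified stand-in for a real mappability score; dedicated tools
    also handle mismatches and both strands.
    """
    seq = sequence.upper()
    n_positions = len(seq) - k + 1
    if n_positions <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n_positions))
    # A k-mer seen once contributes exactly one uniquely mappable position.
    unique_positions = sum(1 for c in counts.values() if c == 1)
    return unique_positions / n_positions


if __name__ == "__main__":
    toy = "ACGTACGTTT"  # toy sequence, not real yeast data
    print(kmer_uniqueness(toy, 4))
```

For a whole genome you would run this per chromosome with k set to the read length (50 here), though memory use grows quickly and a dedicated tool is the better choice for anything beyond a sanity check.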
Cheers
mxs
S. cerevisiae doesn't have that many repetitive regions. Even if you ignored them when mapping, you would expect roughly 80% mapped and 20% unmapped, not the other way around.
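To double-check which way round the numbers are, the mapped fraction can be recomputed directly from the aligner's SAM output. The sketch below is only an illustration (the function name `mapping_rate` and the file name in the comment are hypothetical). It relies on the SAM FLAG bits 0x4 (read unmapped), 0x100 (secondary alignment) and 0x800 (supplementary alignment), and skips secondary and supplementary records so that each read is counted once.

```python
def mapping_rate(sam_lines):
    """Count mapped and total reads from an iterable of SAM text lines.

    Header lines (starting with '@') are skipped. Secondary (0x100) and
    supplementary (0x800) records are ignored so each read counts once.
    Returns a (mapped, total) tuple.
    """
    mapped = total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total


# Example usage with a hypothetical file name:
# with open("sample.sam") as handle:
#     mapped, total = mapping_rate(handle)
#     print(f"{mapped}/{total} reads mapped ({100 * mapped / total:.1f}%)")
```

If the rate from this count differs a lot from the aligner's own summary, it is worth checking whether supplementary or secondary alignments are being included in one of the two numbers.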
Thank you for the answers.
The data are free of TruSeq adapters. First, I ran FastQC to check the quality, and everything was OK.
I used the default parameters of the aligners.
I tried aligning the reads to the human, mouse, and E. coli genomes, but the alignment rate was under 1%.
I will try to blast the unmapped reads to find the source.
Thanks again for the answers. I'll update this thread with the BLAST results.
Maybe you need to clean your data?
Try blasting a few of the unmapped reads. Perhaps you got the wrong samples back or your samples had a high level of contamination by another species.
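One way to pick "a few" reads is to draw a random sample and convert it to FASTA, which can then be pasted into the BLAST web form. The sketch below assumes the unmapped reads have already been written to a standard four-line-per-record FASTQ file. The function name `sample_fastq_to_fasta` and the file name in the comment are made up for illustration.

```python
import random


def sample_fastq_to_fasta(fastq_lines, n, seed=0):
    """Reservoir-sample up to n reads from 4-line FASTQ records and
    return them as FASTA-formatted text.

    Assumes each record is exactly four lines (header, sequence, '+',
    qualities); a truncated final record is silently dropped.
    """
    rng = random.Random(seed)
    reservoir = []
    it = iter(fastq_lines)
    # zip(it, it, it, it) consumes the same iterator four lines at a time.
    for i, (header, seq, _plus, _qual) in enumerate(zip(it, it, it, it)):
        record = (header.strip().lstrip("@"), seq.strip())
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive bounds (Algorithm R)
            if j < n:
                reservoir[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)


# Example usage with a hypothetical file name:
# with open("unmapped.fastq") as handle:
#     print(sample_fastq_to_fasta(handle, n=20))
```

Reservoir sampling keeps memory use constant, so it works on a 20-million-read file without loading it all. A few dozen reads are usually enough to see whether the hits point to one contaminating species.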