Question

Low mapping rate

2

10.2 years ago

hlsz.laszlo ▴ 50

Dear all,

Recently, I obtained several ChIP-seq datasets from Saccharomyces cerevisiae.

After Illumina sequencing, each FASTQ file contains around 20 million 50 bp reads. I aligned the reads to the sacCer3 genome with both BWA-MEM and Bowtie2, but the mapping rate was very low (20% mapped, 80% unmapped).

I can't figure out what is making the reads unmappable. Even the input DNA does not align well to the genome (50%). I tried switching genomes, but I always got the same overall mapping rate.

What could have happened?

Kind Regards,
Laszlo

mapping ChIP-seq • 9.5k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by hlsz.laszlo ▴ 50

2

Hi Laszlo, did you try taking the unmapped reads and BLASTing them? Check whether there is a high level of contamination. If the reads do map to S. cerevisiae, you might have to tweak the alignment parameters.
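One way to pull out the unmapped reads for BLAST is to filter the aligner's SAM output on flag bit 0x4 ("read unmapped") and write the sequences as FASTA. This is only a minimal sketch, assuming uncompressed SAM text as input; the function name `unmapped_to_fasta` is made up for illustration and is not part of any tool.

```python
UNMAPPED = 0x4  # SAM flag bit meaning "read is unmapped"

def unmapped_to_fasta(sam_lines, out):
    """Write unmapped reads from SAM text lines to `out` as FASTA.

    `sam_lines` is any iterable of SAM lines (for example an open file),
    `out` is a writable text stream. Returns the number of reads written.
    """
    written = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED and seq != "*":
            out.write(f">{name}\n{seq}\n")
            written += 1
    return written
```

Typical use would be something like `unmapped_to_fasta(open("aln.sam"), open("unmapped.fa", "w"))`, where both file names are placeholders.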

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by marina.v.yurieva ▴ 580

1

Thank you for the answers.

The data are free of TruSeq adapters. First, I ran FastQC to check the quality, and everything was OK.

I used the default parameters of the aligners.

I also tried aligning the reads to the human, mouse, and E. coli genomes, but the alignment rate was under 1%.

I will try to BLAST the unmapped reads to find their source.

Thanks again for the answers. I'll update this thread with the BLAST results.

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by hlsz.laszlo ▴ 50

0

Maybe you need to clean your data?

ADD REPLY • link 10.2 years ago by GouthamAtla 12k

0

Try blasting a few of the unmapped reads. Perhaps you got the wrong samples back or your samples had a high level of contamination by another species.
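To pick "a few" reads without bias toward the start of the file, a small random sample can be drawn in one pass. The sketch below uses reservoir sampling and assumes a plain four-line-per-record FASTQ containing the unmapped reads; `sample_fastq` is a hypothetical helper name.

```python
import random

def sample_fastq(lines, k, seed=0):
    """Reservoir-sample up to k (name, sequence) pairs from FASTQ lines.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    rng = random.Random(seed)
    reservoir = []
    record = []
    n_seen = 0
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        name, seq = record[0][1:], record[1]   # drop the leading "@"
        record = []
        n_seen += 1
        if len(reservoir) < k:
            reservoir.append((name, seq))
        else:
            j = rng.randrange(n_seen)          # uniform in [0, n_seen)
            if j < k:
                reservoir[j] = (name, seq)     # keep with probability k / n_seen
    return reservoir
```

The sampled sequences can then be pasted into a BLAST search by hand; a few dozen reads are usually enough to see whether one foreign species dominates.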

ADD REPLY • link 10.2 years ago by Devon Ryan 105k

Ram · Answer 1 · 2015-02-26

Hi,

to me this looks like a classic mappability problem caused by reads that fall in repetitive regions. For example, if you map 30-mers to the human genome, approximately 25% of the genome will be unmappable when only unique positions are reported (check the Bowtie parameters). One of the first things I usually do is create a mappability track for the reference species (with the GEM-mappability tool). I then map the reads, create a track of the mapped reads, and upload the tracks to one of the genome browsers (UCSC or Ensembl). Together, the two tracks show which regions are mappable, which are not, and where the mapped reads align.

Unfortunately, UCSC does not provide a mappability track for S. cerevisiae, so you will need to make one yourself.
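To illustrate what a mappability estimate measures, here is a toy sketch that computes the fraction of k-mer start positions in a sequence whose k-mer occurs exactly once, counting both strands. This is not how GEM-mappability works: it allows no mismatches, so it is likely optimistic, and it holds every k-mer in memory. The function names are made up for the example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(s):
    """Reverse complement of a DNA string (other characters are left unchanged)."""
    return s.translate(_COMPLEMENT)[::-1]

def canonical(kmer):
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))

def unique_kmer_fraction(seq, k):
    """Fraction of k-mer start positions whose k-mer is unique on both strands.

    Exact matches only; a crude stand-in for a real mappability score.
    """
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(canonical(seq[i:i + k]) for i in range(n))
    unique = sum(1 for i in range(n) if counts[canonical(seq[i:i + k])] == 1)
    return unique / n
```

With k set to the read length (50 here), a value well above the observed 20% mapping rate would suggest that repeats alone do not explain the unmapped reads.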

Cheers
mxs