Question

Adaptor trimming issue

2

7.8 years ago

1769mkc ★ 1.3k

I am doing adaptor trimming. The adaptor is the Illumina universal adaptor, and I am using cutadapt to trim the adaptor sequence.

Alignment without adaptor trimming

Left reads:
          Input     :  46627933
           Mapped   :  29928631 (64.2% of input)
            of these:  11992814 (40.1%) have multiple alignments (8801 have >20)
Right reads:
          Input     :  46627933
           Mapped   :  29469536 (63.2% of input)
            of these:  11724006 (39.8%) have multiple alignments (8688 have >20)
63.7% overall read mapping rate.

Aligned pairs:  28562130
     of these:  11404825 (39.9%) have multiple alignments
                  155607 ( 0.5%) are discordant alignments
60.9% concordant pair alignment rate.

Alignment after adaptor trimming

Left reads:
          Input     :  46624601
           Mapped   :  44668679 (95.8% of input)
            of these:  29803908 (66.7%) have multiple alignments (56945 have >20)
Right reads:
          Input     :  46624601
           Mapped   :  43936907 (94.2% of input)
            of these:  29470503 (67.1%) have multiple alignments (56649 have >20)
Unpaired reads:
          Input     :       226
           Mapped   :       181 (80.1% of input)
            of these:        95 (52.5%) have multiple alignments (0 have >20)
95.0% overall read mapping rate.

Aligned pairs:  42110389
     of these:  28673148 (68.1%) have multiple alignments
                34996114 (83.1%) are discordant alignments
15.3% concordant pair alignment rate.

The cutadapt command that I used, for reference:

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o HL60_trimmed1.fastq -p HL60_trimmed2.fastq FRED_6_150224_BC6BK7ANXX_P1881_1001_1_123bp.fastq FRED_6_150224_BC6BK7ANXX_P1881_1001_2_123bp.fastq

I don't understand why the concordant pair alignment rate goes down after trimming the adaptor.

Any suggestion or help would be highly appreciated.

alignment • 2.8k views

ADD COMMENT • link updated 7.8 years ago by Brian Bushnell 20k • written 7.8 years ago by 1769mkc ★ 1.3k

0

You have a large amount of multi-mappers. What kind of dataset is this?

ADD REPLY • link 7.8 years ago by GenoMax 152k

0

It's an HL60 data set.

ADD REPLY • link 7.8 years ago by 1769mkc ★ 1.3k

0

And is it RNA-seq? If so, did you enrich your RNA samples?

ADD REPLY • link 7.8 years ago by cpad0112 21k

0

I'm not sure what enriching an RNA sample means. Could you explain it?

ADD REPLY • link 7.8 years ago by 1769mkc ★ 1.3k

0

One reason for having multi-mappers in your dataset is the presence of rRNA in the reads. I think @cpad0112 is asking whether you know if these were removed by ribo-depletion or by some other mechanism that enriches for the transcripts of actual interest.

ADD REPLY • link 7.8 years ago by GenoMax 152k

0

Okay, my FastQC results only show Illumina adaptors; I don't see anything else.

As for the "presence of rRNA in reads", I am not sure whether that is the case. Is there a way to check it?

ADD REPLY • link 7.8 years ago by 1769mkc ★ 1.3k

1

See rRNA detection (for contamination) in RNA-seq and threads linked from it.

ADD REPLY • link 7.8 years ago by GenoMax 152k

0

Okay, I will look into it, but do you think that is the only issue leading to the low concordant pair rate?

ADD REPLY • link 7.8 years ago by 1769mkc ★ 1.3k

0

It is one of the possibilities. I am not sure what aligner you are using, but if it needs you to provide the insert size as one of its parameters, are you providing a number that reflects the actual distribution in your data?
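One rough way to get such a number is to take it from a preliminary alignment. The sketch below is only an illustration under some assumptions: a SAM file from a first-pass alignment already exists, all reads have the same length `read_len`, and the function name `inner_distance_stats` is made up for this example. It reads TLEN (SAM column 9) from properly paired records and converts the mean fragment length into a mate inner distance. For TopHat2, for example, these two values would correspond to `-r/--mate-inner-dist` and `--mate-std-dev`.

```python
import statistics


def inner_distance_stats(sam_lines, read_len):
    """Estimate (mean inner distance, standard deviation) from SAM records.

    Hypothetical helper: uses TLEN (column 9) of properly paired records
    (FLAG bit 0x2) and keeps only positive TLEN so each pair counts once.
    """
    fragment_lengths = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or empty lines
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & 0x2 and tlen > 0:
            fragment_lengths.append(tlen)
    if len(fragment_lengths) < 2:
        raise ValueError("need at least two properly paired records")
    mean_fragment = statistics.mean(fragment_lengths)
    # With a fixed read length, the spread of the inner distance equals
    # the spread of the fragment length.
    sd = statistics.stdev(fragment_lengths)
    return mean_fragment - 2 * read_len, sd
```

After adapter trimming the read lengths vary, so the fixed `read_len` here is only an approximation.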

ADD REPLY • link 7.8 years ago by GenoMax 152k

0

I used TopHat2 as my aligner.

ADD REPLY • link 7.8 years ago by 1769mkc ★ 1.3k

0

Since you are processing PE data, you can use AfterQC (https://github.com/OpenGene/AfterQC) to cut adapters without having to supply the adapter sequences.

Just run:

python AfterQC/after.py -1 read1.fq -2 read2.fq

ADD REPLY • link 7.8 years ago by chen ★ 2.5k

1

Moving to a comment since this is not addressing OP's question of why % concordant alignment is decreasing after trimming of adapters.

ADD REPLY • link 7.8 years ago by GenoMax 152k

score 7 · Accepted Answer · 2017-10-09

Normally when the concordant rate decreases dramatically after adapter-trimming it indicates that pairing was broken, which can happen if the files are trimmed independently and some reads were discarded. You could try running BBMap's reformat.sh like this:

reformat.sh in1=trimmed1.fq in2=trimmed2.fq vpair

...to verify that, according to the read names, the reads are still properly paired after trimming. I don't see anything wrong with your trimming command, though.
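If BBMap is not at hand, the same idea can be sketched in a few lines of Python. This is not what reformat.sh does internally, just a minimal name comparison under two assumptions: the files are uncompressed four-line FASTQ, and mates share the same name up to an optional `/1` or `/2` suffix. The function names are made up for this example.

```python
from itertools import zip_longest


def read_names(fastq_lines):
    """Yield the read name of every FASTQ record (4 lines per record)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            # Drop the leading '@' and anything after the first whitespace.
            fields = line.strip()[1:].split()
            name = fields[0] if fields else ""
            # Drop an old-style /1 or /2 mate suffix.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(lines1, lines2):
    """Return (record_index, name1, name2) for the first record whose
    names differ between the two files, or None if all records pair up.
    A file that ends early shows up as a name of None."""
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None
```

Calling `first_mismatch(open("HL60_trimmed1.fastq"), open("HL60_trimmed2.fastq"))` on the two trimmed files should return `None` if pairing survived the trimming.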

In this case, I think the problem might be TopHat2/Bowtie2 calculating concordant pairs incorrectly. Before trimming, reads with adapters (whose mates fully overlap) probably just did not map at all. After trimming, the newly mappable reads would map 100% overlapping; and indeed, many of the reads that previously mapped with a few mismatches at the end (and not fully overlapping, because of the adapter overhang) would also now map fully overlapping. Perhaps your version of TopHat2/Bowtie2 does not consider that concordant. I suggest you try a different aligner such as BBMap or STAR and see what it reports for the concordance rate.
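To make the overlap argument concrete, here is a small geometric sketch. It assumes ideal trimming (every adapter base removed and nothing else) and takes 123 bp as the read length, which is a guess based on the "123bp" in the file names. It says nothing about how any particular aligner classifies such pairs; the function names are made up for this example.

```python
def trimmed_mate_intervals(fragment_len, read_len):
    """Return the half-open intervals (start, end) on the fragment that
    each mate covers after ideal adapter trimming."""
    # A read longer than the fragment runs into adapter; trimming cuts it
    # back to the fragment length.
    kept = min(read_len, fragment_len)
    left = (0, kept)                             # mate 1, from the fragment start
    right = (fragment_len - kept, fragment_len)  # mate 2, from the fragment end
    return left, right


def overlap_fraction(fragment_len, read_len):
    """Fraction of a trimmed mate that is shared with the other mate.
    Assumes fragment_len and read_len are positive."""
    left, right = trimmed_mate_intervals(fragment_len, read_len)
    shared = max(0, min(left[1], right[1]) - max(left[0], right[0]))
    return shared / (left[1] - left[0])
```

With these assumptions, a fragment of 80 bp gives an overlap fraction of 1.0 (both mates trimmed to the same 80 bases), while a 300 bp fragment gives 0.0. Any fragment shorter than the read length, which is exactly the case where adapter is present, ends up fully overlapping.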