I am doing adapter trimming. The adapter is the Illumina universal adapter, and I am using cutadapt to trim the adapter sequence.
Alignment without adaptor trimming
Left reads:
Input : 46627933
Mapped : 29928631 (64.2% of input)
of these: 11992814 (40.1%) have multiple alignments (8801 have >20)
Right reads:
Input : 46627933
Mapped : 29469536 (63.2% of input)
of these: 11724006 (39.8%) have multiple alignments (8688 have >20)
63.7% overall read mapping rate.
Aligned pairs: 28562130
of these: 11404825 (39.9%) have multiple alignments
155607 ( 0.5%) are discordant alignments
60.9% concordant pair alignment rate.
Alignment after adaptor trimming
Left reads:
Input : 46624601
Mapped : 44668679 (95.8% of input)
of these: 29803908 (66.7%) have multiple alignments (56945 have >20)
Right reads:
Input : 46624601
Mapped : 43936907 (94.2% of input)
of these: 29470503 (67.1%) have multiple alignments (56649 have >20)
Unpaired reads:
Input : 226
Mapped : 181 (80.1% of input)
of these: 95 (52.5%) have multiple alignments (0 have >20)
95.0% overall read mapping rate.
Aligned pairs: 42110389
of these: 28673148 (68.1%) have multiple alignments
34996114 (83.1%) are discordant alignments
15.3% concordant pair alignment rate.
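As a sanity check on the two summaries above, the concordant pair alignment rate can be reproduced from the other reported numbers. The formula below, (aligned pairs - discordant pairs) / input pairs, is an assumption about how the summary line is derived, but it does match both reported values. The function name is made up for illustration.

```python
def concordant_rate(input_pairs, aligned_pairs, discordant_pairs):
    """Percent of input pairs that aligned as a pair and were not flagged discordant.

    Assumed formula, chosen because it reproduces the rates reported above.
    """
    return 100.0 * (aligned_pairs - discordant_pairs) / input_pairs


# Numbers copied from the two alignment summaries in this post
before = concordant_rate(46627933, 28562130, 155607)
after = concordant_rate(46624601, 42110389, 34996114)

print(f"without trimming: {before:.1f}%")  # 60.9%
print(f"after trimming:   {after:.1f}%")   # 15.3%
```

So the drop is driven almost entirely by the jump in discordant pairs (155607 to 34996114), not by fewer pairs aligning.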
The cutadapt command that I used, for reference:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o HL60_trimmed1.fastq -p HL60_trimmed2.fastq FRED_6_150224_BC6BK7ANXX_P1881_1001_1_123bp.fastq FRED_6_150224_BC6BK7ANXX_P1881_1001_2_123bp.fastq
I don't understand why the concordant rate goes down after adapter trimming.
Any suggestion or help would be highly appreciated.
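The cutadapt command above sets no minimum read length, so one thing that may be worth checking (this is a guess, not a confirmed cause) is how many reads became very short or empty after trimming. A minimal sketch, assuming an uncompressed FASTQ with four lines per record; the function names and the tiny in-memory example are made up for illustration.

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Count read lengths in an uncompressed, 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fraction_shorter_than(counts, min_len):
    """Fraction of reads whose length is below min_len."""
    total = sum(counts.values())
    short = sum(n for length, n in counts.items() if length < min_len)
    return short / total if total else 0.0


# Tiny in-memory stand-in for a trimmed FASTQ file (hypothetical reads)
example = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACG\n+\nIII\n"
    "@r3\n\n+\n\n"
)
counts = read_length_counts(example)
print(fraction_shorter_than(counts, 5))
```

For the real data, the same function could be called on `open("HL60_trimmed1.fastq")` and on the second file. If a large share of reads turn out to be very short, cutadapt's `-m` / `--minimum-length` option can be used to discard them at trimming time.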
You have a large number of multi-mappers. What kind of dataset is this?
It's an HL60 data set.
And is it RNA-seq? If so, did you enrich your RNA samples?
I'm not sure about enriching the RNA sample. Could you explain what that means?
One reason for having many multi-mappers in your dataset is the presence of rRNA in the reads. I think @cpad0112 is asking whether you know if rRNA was removed by ribo-depletion, or by some other mechanism that enriches for the transcripts that are of actual interest.
Okay. My FastQC results only show Illumina adapters; apart from that I don't see anything.
But as for the
"presence of rRNA in reads"
I am not sure if that is the case. Is there a way to check it?
See "rRNA detection (for contamination) in RNA-seq" and the threads linked from it.
Okay, I will look into it. But do you think that is the only issue leading to the low concordant pair rate?
It is one of the possibilities. I am not sure which aligner you are using, but if it requires the insert size as one of its parameters, are you providing a number that reflects the actual distribution in your data?
I used TopHat2 as my aligner.
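For TopHat2, the relevant parameter is `-r` / `--mate-inner-dist`, which is the expected distance between the two mates, i.e. the fragment length minus both read lengths (not the fragment length itself). A minimal sketch of that arithmetic follows. The 123 bp read length is taken from the input file names, and the 300 bp mean fragment length is purely a hypothetical value for illustration; the real one would have to come from the library prep or from the data.

```python
def mate_inner_distance(mean_fragment_len, read1_len, read2_len):
    """Expected gap between the inner ends of the two mates.

    This is the quantity TopHat2's -r/--mate-inner-dist expects.
    """
    return mean_fragment_len - read1_len - read2_len


# 300 is a hypothetical mean fragment length; 123 comes from the "123bp" file names
print(mate_inner_distance(300, 123, 123))  # 54
```

Note that after trimming the reads are no longer all 123 bp, so the inner distance implied by the data shifts as well; a value can be negative when the mates overlap.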
Since you are processing PE data, you can use AfterQC (https://github.com/OpenGene/AfterQC) to cut adapters without having to supply the adapter sequences.
Just run:
Moving to a comment since this is not addressing OP's question of why % concordant alignment is decreasing after trimming of adapters.