Adapter removal using cutadapt and MultiQC report

Hi, I noticed some discrepancies in the FastQC reports after trimming the reads with cutadapt, depending on which adapter sequence I used.
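
Before/after FastQC reports like the ones in (a) to (e) can be produced by running FastQC on each set of files and aggregating the results with MultiQC. This is a minimal sketch that assumes fastqc and multiqc are installed and on the PATH. The function name qc_report and the output directory names are hypothetical.

```shell
# Hypothetical helper: run FastQC on a set of FASTQ files and summarise them with MultiQC.
# Usage: qc_report <output_dir> <fastq files...>
qc_report() {
    local outdir="$1"
    shift
    # FastQC expects the output directory to exist already
    mkdir -p "$outdir/fastqc"
    fastqc -o "$outdir/fastqc" "$@"
    # MultiQC scans the FastQC output and writes multiqc_report.html into $outdir
    multiqc "$outdir/fastqc" -o "$outdir"
}
```

For example, run qc_report qc_raw *.fastq.gz on the untrimmed reads and qc_report qc_trimmed trimmed-*.fastq.gz after trimming, then compare the two MultiQC reports.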

(a) FastQC report before cutadapt: I downloaded some of the sequencing data from the paper (178 fruit fly samples; Illumina HiSeq 2500, 100 bp single-end reads). These samples have a high adapter content.

(b) Since I have no information about the adapter sequences, I used the Illumina TruSeq single-index adapter AGATCGGAAGAGCACACGTCTGAACTCCAGTCA from the link above, because it appeared among the overrepresented sequences. After running cutadapt with it, the adapters were trimmed, but FastQC still reported a smaller number of overrepresented sequences, with a warning about a universal adapter.

(c) I then tried using only the first 16 bp of that adapter, cutadapt -a AGATCGGAAGAGCACA. The result showed poly-A sequences, and there was still an overrepresented-sequence warning for a universal adapter.

(d) Next I ran cutadapt -a GATCGGAAGAGCACA, i.e. the adapter from (c) without the leading A. Cutadapt warned that the adapter might be incomplete for 100 samples, but the adapter content was very low and there was no overrepresented-sequence warning.

(e) Finally, I combined the adapters from (c) and (d), cutadapt -a AGATCGGAAGAGCAC -a GATCGGAAGAGCAC, and got lower adapter content with no overrepresented-sequence warning.
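
The overrepresented-sequence table used in (b) to pick the adapter can also be read directly from FastQC's text output instead of the HTML report. This sketch assumes the FastQC results were unpacked, for example with fastqc --extract, so that each sample folder contains fastqc_data.txt. The function name overrepresented_seqs is hypothetical. It prints each sequence together with FastQC's "Possible Source" annotation.

```shell
# Hypothetical helper: list the overrepresented sequences recorded in a FastQC fastqc_data.txt.
# Prints "<sequence><TAB><possible source>" for each row of that module's table.
overrepresented_seqs() {
    awk -F'\t' '
        /^>>Overrepresented sequences/ { in_module = 1; next }  # module header line
        in_module && /^>>END_MODULE/   { in_module = 0; next }  # end of the module
        in_module && !/^#/             { print $1 "\t" $4 }     # skip the "#Sequence ..." header
    ' "$1"
}
```

Usage: overrepresented_seqs sample_fastqc/fastqc_data.txt (the folder name is a placeholder). Running it on the reports from each trimming attempt shows which adapter-like sequences remain.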

Question: Based on these results, is the multiple-adapter approach in (e), cutadapt -a AGATCGGAAGAGCAC -a GATCGGAAGAGCAC (with and without the leading A), the best choice? And what about the poly-A sequences that appeared? Should I remove them as well?
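
Whether the leading A belongs to the adapter can be checked by counting which base directly precedes the shorter adapter in the reads. This is the point behind the with/without-A comparison and cutadapt's "incomplete adapter" warning in (d). If almost every occurrence is preceded by A, the A should be treated as part of the adapter sequence to trim. This is a rough sketch for uncompressed FASTQ on standard input. The function name is hypothetical, and this is not how cutadapt itself computes its warning.

```shell
# Hypothetical helper: count which base immediately precedes the first occurrence of an
# adapter in each read of an uncompressed FASTQ file read from stdin.
# Usage: zcat sample.fastq.gz | preceding_base_counts GATCGGAAGAGCACA
preceding_base_counts() {
    awk -v adapter="$1" '
        NR % 4 == 2 {                        # sequence line of each 4-line FASTQ record
            i = index($0, adapter)
            if (i > 1) count[substr($0, i - 1, 1)]++
        }
        END { for (b in count) print b "\t" count[b] }
    ' | sort
}
```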

Hi ayeraselvan,

Yes. Based on your results, and in general, specifying multiple adapter variants, including the complete adapter, is a good approach. The poly-A stretches detected in scenario (c) are also likely genuinely present in the reads, so you should run cutadapt with the --poly-a option in addition to the adapters. In Bash, a loop that processes every .fastq.gz file in the current working directory could look like this:

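# Trim both adapter variants and poly-A tails from each gzipped FASTQ file in the current directory
for f in *.fastq.gz; do
    cutadapt -a AGATCGGAAGAGCAC -a GATCGGAAGAGCAC --poly-a -o "trimmed-${f}" "$f"
done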
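
This variant of the loop above also keeps cutadapt's per-sample report, so MultiQC can summarise how many reads and bases were trimmed next to the FastQC results. It is a sketch that assumes single-end *.fastq.gz files in the current directory, outputs named trimmed-<input>, and reports written to a logs/ directory (the function name and directory are hypothetical). It also skips files produced by an earlier run, which the plain loop would otherwise trim a second time.

```shell
# Hypothetical helper: trim both adapter variants and poly-A tails from every *.fastq.gz in
# the current directory, saving cutadapt's text report for each sample under logs/.
trim_all() {
    mkdir -p logs
    for f in *.fastq.gz; do
        [ -e "$f" ] || continue                   # no matching files: the glob stays literal
        case "$f" in trimmed-*) continue ;; esac  # skip outputs of a previous run
        # trimmed reads go to -o; cutadapt's report goes to stdout, redirected to a log file
        cutadapt -a AGATCGGAAGAGCAC -a GATCGGAAGAGCAC --poly-a \
            -o "trimmed-${f}" "$f" > "logs/${f%.fastq.gz}.cutadapt.log"
    done
}
```

Afterwards, multiqc logs -o multiqc_trimming should pick up the saved reports through MultiQC's cutadapt module.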

Hope this is helpful.

Maze

Thank you so much, Maze.

Is it advisable to trim the poly-A as well? Poly-A stretches are runs of adenine at the ends of the reads. Is it necessary to remove them?

Hi ayeraselvan,

Generally yes, for RNA-seq analysis of mRNA transcripts. During polyadenylation, poly-A tails are added post-transcriptionally to the 3' end of most eukaryotic mRNA molecules as part of RNA processing. Some single-cell gene expression (GEX) library kits, such as 10x Genomics' 3' and 5' assays, rely on them to capture mRNA transcripts in GEMs for subsequent sequencing. However, poly-A tails are not encoded at the corresponding genomic loci and, as you mentioned, are homopolymeric. When they are abundant in the reads, they can complicate aligning the reads to a reference and the subsequent counting of transcripts.
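
To judge whether poly-A trimming matters for a particular dataset, you could first count how many reads end in a long run of A's. This sketch is deliberately simple: it uses exact matching of a trailing A run on uncompressed FASTQ from standard input, and it is not cutadapt's --poly-a algorithm. The function name and the default minimum run length of 10 are assumptions.

```shell
# Hypothetical helper: count reads (uncompressed FASTQ on stdin) that end in a run of at
# least MIN_LEN A's (default 10). Exact matching only; not cutadapt's --poly-a algorithm.
# Usage: zcat sample.fastq.gz | polya_tail_count 10
polya_tail_count() {
    awk -v min="${1:-10}" '
        BEGIN { min += 0 }                               # force numeric comparison
        NR % 4 == 2 {                                    # sequence line of each FASTQ record
            reads++
            if (match($0, /A+$/) && RLENGTH >= min) hits++
        }
        END { printf "%d of %d reads end in >= %d A\n", hits, reads, min }
    '
}
```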

Maze
