Question

fastqc adapter content

0

6.8 years ago

prasundutta87 ▴ 670

Hi,

I made a MultiQC plot collating many FastQC reports from paired-end WGS reads. I got a warning in the adapter content module, but nothing was reported under overrepresented sequences. What could be the issue? How do I interpret this? The read length is 250 bp and no trimming has been done.

I am unable to upload the image somehow. Here is the link: https://ibb.co/hOvRac

sequencing fastqc quality control • 13k views

ADD COMMENT • link updated 6.8 years ago by GenoMax 147k • written 6.8 years ago by prasundutta87 ▴ 670

2

FastQC computes overrepresented sequences on only the first 50 bp of each read. Your adapter sequences show up further downstream, so that module does not see them.
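
As a rough illustration of that point, here is a minimal Python sketch (not FastQC's actual code) that reports where an adapter first appears in each read and whether a check limited to the first 50 bases would see it. The function name `adapter_offsets`, the toy reads, and the use of the 13-base TruSeq prefix `AGATCGGAAGAGC` are assumptions made for illustration.

```python
# Toy illustration only: this is NOT how FastQC is implemented.
ADAPTER = "AGATCGGAAGAGC"  # assumed: common start of the Illumina TruSeq adapters
WINDOW = 50                # number of leading bases inspected, per the comment above


def adapter_offsets(reads, adapter=ADAPTER, window=WINDOW):
    """Return (offset, visible_in_window) per read; offset is -1 if the adapter is absent."""
    results = []
    for read in reads:
        offset = read.find(adapter)
        # A window-limited check only sees the adapter if it lies
        # entirely inside the first `window` bases of the read.
        visible = offset != -1 and offset + len(adapter) <= window
        results.append((offset, visible))
    return results


if __name__ == "__main__":
    insert_long = "ACGT" * 30   # 120 bp of hypothetical genomic sequence
    insert_short = "ACGT" * 5   # 20 bp of hypothetical genomic sequence
    reads = [insert_long + ADAPTER, insert_short + ADAPTER, insert_long]
    for offset, visible in adapter_offsets(reads):
        print(offset, visible)
```

With 250 bp reads, an adapter that begins at base 120 is plain to see in a per-position adapter content plot but falls outside a 50-base window.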

ADD REPLY • link 6.8 years ago by michael.ante ★ 3.9k

score 1 · Answer 1 · 2018-02-06

1

6.8 years ago

GenoMax 147k

You have adapter contamination in your data. You should scan/trim the reads. There are many popular options (bbduk.sh from the BBMap suite, trimmomatic, cutadapt). Take your pick.

ADD COMMENT • link 6.8 years ago by GenoMax 147k

0

Thanks, genomax. I scanned the reads and found adapters. My sequencing facility did not remove them. Since my main goal is calling variants from the reads, the adapter bases will anyway have lower base qualities, and they may also be soft-clipped during mapping to the genome (using BWA-MEM here). Furthermore, GATK (my variant caller of choice) handles bad base and mapping qualities itself (through its default parameters; not getting into details here).

It has also been mentioned on their website (in various places) not to trim the reads at all. I personally haven't done any benchmarking at my end, but there's a trend of not trimming reads at all. How advisable is this in my case?

ADD REPLY • link 6.8 years ago by prasundutta87 ▴ 670

1

I cannot agree with your statements. Adapters do not necessarily have worse base qualities than other nucleotides. The sequencer does not know what is the intended DNA and what is adapter. Especially in variant calling, proper trimming is essential so that no false-positive variants are introduced. Adapter content may also interfere with proper alignment. I do not know the GATK documentation in detail, but no trimming at all is not an option, imho. If you do not want to trim adapters, at least remove poor-quality bases at the 3' end prior to alignment. I still recommend trimming adapters, as otherwise you introduce nucleotides into your sequence that are simply not in the genome.

ADD REPLY • link 6.8 years ago by ATpoint 85k

1

I concur with @ATpoint. Perhaps I am old-fashioned, but I like to make sure that there is no extraneous/unwanted sequence in the data I am analyzing.

ADD REPLY • link 6.8 years ago by GenoMax 147k

0

Thanks, I understand the points. On just grepping my data for the TruSeq adapter sequences found at https://support.illumina.com/bulletins/2016/12/what-sequences-do-i-use-for-adapter-trimming.html , I also found adapters in the middle of the reads. Can they just be artifacts? In my MultiQC output as well, the line starts rising from the middle of the read (total length is 250).

ADD REPLY • link 6.8 years ago by prasundutta87 ▴ 670

1

"Thanks, I understand the points. On just grepping my data for the TruSeq adapter sequences found at https://support.illumina.com/bulletins/2016/12/what-sequences-do-i-use-for-adapter-trimming.html , I also found adapters in the middle of the reads"

Can easily happen. Just because it's Illumina does not imply that everything that they develop is perfect. In fact, their products, generally, are of lesser quality than other vendors because they became too confident over time in their dominance of the market. It's quite surprising, in fact, that the MiSeq is approved for clinical use when it produces as much error as it does quality data.

ADD REPLY • link 6.8 years ago by Kevin Blighe 88k

1

Sequencing can never be perfect, at least looking at the current scenario with the advent of long reads. It's also difficult to draw lines between what is right and what is an artifact or error. These are evolving technologies, and marketing plays a key role in making things even more difficult.

ADD REPLY • link 6.8 years ago by prasundutta87 ▴ 670

1

prasundutta87: Where you have short inserts, you will start reading into adapters once you run out of insert sequence. You may also have primer dimers, which have no insert. Don't trust grep for looking at this. Use a proper scan/trim program. There is a core sequence that is common to all TruSeq adapters. Once that is found in a read, everything to its right is generally trimmed. If you have paired-end data, be sure to trim the two files together to prevent the reads from getting out of sync.
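
The trimming logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a replacement for bbduk.sh, trimmomatic, or cutadapt: it does exact matching only (real trimmers tolerate mismatches and partial adapters at the read end), the core sequence `AGATCGGAAGAGC` is assumed to be the shared TruSeq prefix, and the names `trim_read`, `trim_pair`, and `min_len` are hypothetical.

```python
CORE = "AGATCGGAAGAGC"  # assumed: prefix shared by the TruSeq adapters


def trim_read(seq, qual, core=CORE):
    """Cut the read at the first exact match of the core; keep what lies to its left."""
    pos = seq.find(core)
    if pos == -1:
        return seq, qual  # no adapter found, read is returned unchanged
    return seq[:pos], qual[:pos]


def trim_pair(r1, r2, min_len=20, core=CORE):
    """Trim both mates and keep or drop them together so the files stay in sync.

    r1 and r2 are (name, sequence, quality) tuples. Returns the trimmed pair,
    or None if either mate ends up shorter than min_len.
    """
    name1, seq1, qual1 = r1
    name2, seq2, qual2 = r2
    seq1, qual1 = trim_read(seq1, qual1, core)
    seq2, qual2 = trim_read(seq2, qual2, core)
    if len(seq1) < min_len or len(seq2) < min_len:
        return None  # drop BOTH mates, never just one
    return (name1, seq1, qual1), (name2, seq2, qual2)


if __name__ == "__main__":
    # Short insert: both mates run into the adapter after 40 bases.
    r1 = ("read1/1", "ACGT" * 10 + CORE + "ACAC", "I" * 57)
    r2 = ("read1/2", "TTGCA" * 8 + CORE, "I" * 53)
    print(trim_pair(r1, r2))

    # Dimer with no insert: mate 1 is adapter from the first base.
    d1 = ("read2/1", CORE + "ACAC", "I" * 17)
    d2 = ("read2/2", "TTGCA" * 8, "I" * 40)
    print(trim_pair(d1, d2))
```

Returning `None` for the whole pair, rather than for a single mate, is what keeps the R1 and R2 output files aligned record by record.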