Question

FastQC on nanopore data: high proportion of polyA and polyG. Why ?

12 months ago

Matt • 0

Dear all,

I have received my first ever nanopore sequences (MinION). They come from the genomic DNA of a vertebrate species. I have some experience analyzing Illumina sequences but none with ONT data. I ran a quick quality control with NanoPlot (N50 close to 15 kb) and FastQC. The FastQC report shows no overrepresented sequences. However, in the 'Adapter content' section I was surprised to see a high level of polyA, which quickly reaches a plateau around 26%, and of polyG, which quickly reaches a plateau around 3%: [image: fastQC_ONT_polyA]

I don't know how to interpret that. Here are the possibilities I am thinking of:

Could this just be an artifact of FastQC, which is not tailored to ONT data?
Could it mean that the DNA is contaminated with RNA?
Could it mean that there are indeed adapters that need to be removed?

Did some of you already observe that and/or have an explanation ?

Many thanks !

Matt

fastQC polyG ONT nanopore polyA • 1.5k views

ADD COMMENT • link updated 1 day ago by lieven.sterck 15k • written 12 months ago by Matt • 0

FastQC isn't suitable for Nanopore data. Whether adapter trimming is necessary at all, and which kind, depends on the basecalling software. With dorado, I think the adapters have likely been removed already. You could check for the presence of adapters with porechop. If the polyA concerns you, you could extract some of those reads and BLAST them against NT (the flanking sequence, that is, not the A's themselves). Possibly you have picked up something completely unrelated.
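
As a rough illustration of the "extract some of those sequences" step, here is a minimal Python sketch using only the standard library. It counts reads that contain a long run of a single base and collects the flanking sequence, which is the part worth submitting to BLAST. The function names, the 12-base threshold and the flank size are assumptions made for this example, not settings taken from FastQC or any other tool.

```python
import io
import re


def read_fastq(handle):
    """Yield (read_id, sequence) from an uncompressed 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            break
        if not header.strip():    # tolerate stray blank lines between records
            continue
        seq = handle.readline().strip()
        handle.readline()         # '+' separator line
        handle.readline()         # quality string (unused here)
        yield header[1:].split()[0], seq


def scan_homopolymers(records, base="A", min_run=12, flank=200):
    """Find the first run of `base` of at least `min_run` bases in each read.

    Returns (hits, total_reads); each hit is
    (read_id, run_start, run_length, left_flank, right_flank).
    min_run=12 and flank=200 are illustrative defaults (assumptions).
    """
    pattern = re.compile(re.escape(base) + "{%d,}" % min_run)
    hits, total = [], 0
    for read_id, seq in records:
        total += 1
        match = pattern.search(seq.upper())
        if match is None:
            continue
        left = seq[max(0, match.start() - flank):match.start()]
        right = seq[match.end():match.end() + flank]
        hits.append((read_id, match.start(), match.end() - match.start(), left, right))
    return hits, total


def flanks_to_fasta(hits):
    """Format the non-empty flanks as FASTA text, leaving out the homopolymer itself."""
    lines = []
    for read_id, start, _length, left, right in hits:
        if left:
            lines.append(f">{read_id}_left_of_{start}\n{left}")
        if right:
            lines.append(f">{read_id}_right_of_{start}\n{right}")
    return "\n".join(lines)


# Tiny in-memory demo: r1 carries a 15-base polyA run, r2 does not.
demo = io.StringIO(
    "@r1 desc\nACGT" + "A" * 15 + "GGCC\n+\n" + "I" * 23 + "\n"
    "@r2\nACGTACGTACGT\n+\n" + "I" * 12 + "\n"
)
hits, total = scan_homopolymers(read_fastq(demo), base="A", min_run=12, flank=4)
print(f"{len(hits)}/{total} reads contain a run of at least 12 A")
print(flanks_to_fasta(hits))
```

For a real run, the in-memory demo would be replaced by an open file handle, for example `gzip.open(path, "rt")` for a compressed FASTQ. Checking where in the reads the runs start (near the ends or scattered throughout) may also help distinguish adapter-like signal from ordinary genomic repeats.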

ADD REPLY • link 12 months ago by Michael 55k

I prefer fastp for nanopore data, as it produces much better and more appropriately scaled graphics. It is also quick, and it filters and reports appropriately; the % Q20 and % Q30 values are especially useful.

https://github.com/OpenGene/fastp
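
For readers unfamiliar with the % Q20 and % Q30 figures, the following is a minimal sketch of how such percentages can be computed from FASTQ quality strings. It assumes Phred+33 encoding and is only an illustration of the metric; it is not fastp's own code, and `q_fractions` is a hypothetical helper name.

```python
def q_fractions(quality_strings, thresholds=(20, 30), offset=33):
    """Return {threshold: percent of bases with Phred score >= threshold}.

    Assumes Phred+33 encoded quality strings (an assumption of this sketch).
    """
    total = 0
    counts = {t: 0 for t in thresholds}
    for qual in quality_strings:
        for ch in qual:
            score = ord(ch) - offset   # decode one ASCII character to a Phred score
            total += 1
            for t in thresholds:
                if score >= t:
                    counts[t] += 1
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: 100.0 * counts[t] / total for t in thresholds}


# 'I' = Q40, '?' = Q30, '5' = Q20, '+' = Q10 under Phred+33.
example = ["II5+", "?5++"]
result = q_fractions(example)
print(result)  # {20: 62.5, 30: 37.5}
```

In this toy input, 5 of 8 bases are at Q20 or above and 3 of 8 are at Q30 or above.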

ADD REPLY • link 12 months ago by colindaven 7.3k

They recently released a version specifically for long reads: fastplong

ADD REPLY • link 1 day ago by lieven.sterck 15k

If you have access to the sequencing summary file from the run, then pycoQC (https://github.com/a-slide/pycoQC) is useful. It can account for barcoded samples, etc.

ADD REPLY • link 12 months ago by GenoMax 150k

@Matt -- Hi, did you figure out the cause of the poly-A spike in your reads? I have a similar problem in genomic HiFi reads.

ADD REPLY • link 2 days ago by vkaz • 0

Original question was about nanopore data. Are you referring to PacBio HiFi data?

ADD REPLY • link 1 day ago by GenoMax 150k

Yes, I am referring to PacBio HiFi data (even though original question was about nanopore data)

ADD REPLY • link 1 day ago by vkaz • 0

Have you considered contacting PacBio support? If they have any specific explanations please come back to this thread to add them here.

ADD REPLY • link 1 day ago by GenoMax 150k