FastQC on nanopore data: high proportion of polyA and polyG. Why?
12 months ago
Matt • 0

Dear all,

I have received my first ever nanopore sequences (MinION). They come from the genomic DNA of a vertebrate species. I have some experience analyzing Illumina sequences but none with ONT. I ran a quick quality control with NanoPlot (N50 close to 15 kb) and FastQC. The FastQC report shows no overrepresented sequences. However, in the 'Adapter content' section I was surprised to see a high level of polyA, which quickly reaches a plateau around 26%, and of polyG, which quickly reaches a plateau around 3%:

[Figure: fastQC_ONT_polyA]

I don't know how to interpret that. Here are the possibilities I am thinking of:

  1. This is just an artifact of FastQC, which is not tailored to ONT data.
  2. Could it mean that the DNA is contaminated with RNA?
  3. Could it mean that there are indeed adapters that need to be removed?

Has anyone already observed this and/or have an explanation?

Many thanks!

Matt

Tags: fastQC, polyG, ONT, nanopore, polyA

FastQC isn't suitable for Nanopore data. Whether adapter trimming is necessary at all, and which kind, depends on the basecalling software. With dorado, I think the adapters have likely been removed already. You could check for the presence of adapters with porechop. If the polyA concerns you, you could extract some of those sequences and blast them against NT (without the A's, that is). Possibly you have picked up something completely unrelated.
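A minimal Python sketch of that extraction step, in case it helps. It assumes a plain 4-line-per-record FASTQ and treats a run of 12 or more A's as "polyA"; that threshold is an assumption, so adjust `min_run` as needed. The function names (`read_fastq`, `polya_reads`, `to_fasta`) are made up for this example. Runs are masked with N's rather than deleted, so the flanking sequence keeps its coordinates and is not artificially joined before the BLAST search.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).strip().upper()
        next(it)  # '+' separator line
        next(it)  # quality line (not needed here)
        yield header.strip()[1:].split()[0], seq


def polya_reads(records: Iterable[Tuple[str, str]],
                min_run: int = 12) -> Tuple[float, List[Tuple[str, str]]]:
    """Return the fraction of reads with an A-run >= min_run,
    plus those reads with every such run masked by N's."""
    run = re.compile("A{%d,}" % min_run)
    total = 0
    hits = []
    for read_id, seq in records:
        total += 1
        if run.search(seq):
            masked = run.sub(lambda m: "N" * len(m.group()), seq)
            hits.append((read_id, masked))
    fraction = len(hits) / total if total else 0.0
    return fraction, hits


def to_fasta(hits: Iterable[Tuple[str, str]]) -> str:
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(">%s\n%s\n" % (read_id, seq) for read_id, seq in hits)


# Tiny in-memory demo; with real data, pass an open file handle instead.
demo = [
    "@read1 some description\n", "ACGT" + "A" * 15 + "CCGT\n", "+\n", "I" * 23 + "\n",
    "@read2\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
]
fraction, hits = polya_reads(read_fastq(demo))
print("fraction of reads with a polyA run:", fraction)
print(to_fasta(hits), end="")
```

With a real file, something like `polya_reads(read_fastq(open("reads.fastq")))` would do; write the `to_fasta` output to disk and blast a handful of records. Note that this only looks for A's on the read as sequenced, so a polyT run (the reverse-strand equivalent) would need a second pattern.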


I prefer fastp for nanopore data, as it produces much better and more appropriately scaled graphics. It is also quick, and it filters and reports appropriately; the % Q20 and % Q30 figures are especially useful.
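For anyone unsure what the % Q20 and % Q30 figures mean, here is a rough Python sketch that computes the percentage of bases at or above a Phred quality threshold. It is not fastp's implementation; it assumes Phred+33 encoded quality strings, and the function name `q_percentages` is made up for this example.

```python
from typing import Dict, Iterable, Tuple


def q_percentages(quality_strings: Iterable[str],
                  thresholds: Tuple[int, ...] = (20, 30),
                  offset: int = 33) -> Dict[int, float]:
    """Percentage of bases whose Phred score is >= each threshold."""
    total = 0
    passed = {t: 0 for t in thresholds}
    for qual in quality_strings:
        for ch in qual:
            score = ord(ch) - offset  # Phred+33 decoding (assumed)
            total += 1
            for t in thresholds:
                if score >= t:
                    passed[t] += 1
    return {t: (100.0 * passed[t] / total if total else 0.0)
            for t in thresholds}


# 'I' decodes to Q40, '5' to Q20, '+' to Q10.
print(q_percentages(["II5+"]))
```

Feed it the fourth line of each FASTQ record (stripped of the newline) to get run-wide figures.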

https://github.com/OpenGene/fastp


They recently released a version specific to long reads: fastplong.


If you have access to the sequencing summary file from the run, then pycoQC is useful (https://github.com/a-slide/pycoQC). It can account for barcoded samples, etc.


@Matt -- Hi, did you figure out the cause of the poly-A spike in your reads? I have a similar problem in genomic HiFi reads.


The original question was about nanopore data. Are you referring to PacBio HiFi data?


Yes, I am referring to PacBio HiFi data (even though the original question was about nanopore data).


Have you considered contacting PacBio support? If they have a specific explanation, please come back to this thread and add it here.
