Nanopore Data Quality Check
3 months ago
Umer ▴ 130

Hello.

I have received my first-ever nanopore sequences. They come from the genomic DNA of a fungal species. I have experience analyzing Illumina sequences but none with ONT data.

Background Information:

  • Organism: Fusarium
  • Target: Genome Assembly
  • I have Illumina data too for the same samples.

We received the data in the following files:

  • fast5.pass.tar
  • fast5.fail.tar
  • fastq.pass.tar
  • fastq.fail.tar

What I did:

  • Used only the fastq.pass data for downstream analysis, merging all FASTQ files into one FASTQ file per sample.
  • Ran NanoStat on the raw FASTQ
  • Ran Porechop on the raw FASTQ
  • Ran NanoStat on the Porechop output

and I got the following results

Nanostat results on raw and porechop processed fastq

When I run FastQC on my raw nanopore data, it shows adapter content that is polyA and polyG (image attached).

fastQC adapters

Even after running Porechop, these polyA and polyG stretches still show up in the FastQC report. I see some overrepresented sequences in the raw-data FastQC report, but after running Porechop there are none.

The MultiQC report is below.

MultiQC report Adapter Content

Assembling the same sample with Flye 2.9.4-b1799 (3 iterations), before and after running QC, gives the following results.

Flye Assembly
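Since the comparison between the two assemblies rests on N50 and contig count, here is a minimal Python sketch of how those two numbers can be recomputed directly from an assembly FASTA. The function names (`n50`, `fasta_lengths`) and the file names in the usage note are illustrative assumptions, not part of Flye or NanoStat.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def fasta_lengths(path):
    """Yield the length of each record in a (possibly line-wrapped) FASTA file."""
    length = None
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length
```

For example, `lengths = list(fasta_lengths("assembly.fasta"))` followed by `len(lengths)` and `n50(lengths)` gives the contig count and N50 for one assembly (the path is hypothetical); running it on both the raw and the Porechop-processed assembly makes the comparison explicit.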

QUESTIONS (keeping the target, "Genome Assembly and annotation", in mind):

  1. Is it necessary to run Porechop on the raw data again? The report from the sequencing company says that they removed the adapters and did basic QC on the data.
  2. Is it important to remove these polyA and polyG stretches? Will they affect the assembly?
  3. If YES to question 2, which tool can remove both? And should I run this tool on data already processed by Porechop?
  4. Based on the assembly stats, N50 increased with a negligible increase in the number of contigs. What is your opinion on this: should I use the Porechop-processed data for downstream analysis, or just the raw FASTQ files?
QC Fungi genome-assembly Nanopore • 510 views
ADD COMMENT

Use a proper program for nanopore data: https://github.com/a-slide/pycoQC

Do you actually see those poly-A/G stretches in your data? Perhaps they are some sort of artifact of FastQC?

ADD REPLY

I checked, and there are long stretches of A and G in the FASTQ files.

I did run pycoQC on the summary.txt file for the same data (HTML file: Google Drive link).

This does not show anything related to adapters.
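One way to make the "are the A/G stretches really in the reads" check more informative than a manual search is to count the runs and note where in the read they fall, since adapter-like artifacts would be expected near read ends. The following is a rough Python sketch under stated assumptions: the FASTQ is uncompressed with exactly four lines per record, and the thresholds `min_run=20` and `end_window=100` are arbitrary illustrative choices, not values from any tool.

```python
import re
from collections import Counter


def classify_runs(seq, min_run=20, end_window=100):
    """Return (base, run_length, region) for each A or G run of >= min_run bases.

    region is "start" or "end" when the run touches the first or last
    end_window bases of the read, otherwise "internal".
    """
    pattern = re.compile(r"A{%d,}|G{%d,}" % (min_run, min_run))
    hits = []
    for match in pattern.finditer(seq.upper()):
        if match.start() < end_window:
            region = "start"
        elif match.end() > len(seq) - end_window:
            region = "end"
        else:
            region = "internal"
        hits.append((match.group()[0], match.end() - match.start(), region))
    return hits


def summarize_fastq(path, min_run=20, end_window=100):
    """Count poly-A/poly-G runs per (base, region) over a 4-line-per-record FASTQ."""
    counts = Counter()
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                for base, _, region in classify_runs(line.strip(), min_run, end_window):
                    counts[(base, region)] += 1
    return counts
```

If most hits are "internal", they are less likely to be leftover adapter or tail artifacts and more likely to be real sequence or basecalling errors; if they cluster at "start" or "end", trimming may be worth a closer look.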

ADD REPLY

It has been a while since I worked with fungal sequences, but long stretches of poly-A/-G nonetheless sound suspicious. Perhaps someone else will have some input.

If you want to remove poly-A/-G, then bbduk.sh (or, for that matter, fastp) should be able to do this. The question, though, is whether they are real and should be left alone.

The report from sequencing company says that they removed the adapters and did basic QC on the data.

If that is the case then running any additional chopping is likely not warranted.

ADD REPLY

Having a few poly-A and poly-G stretches in the reads is not such a problem. The real issue is whether they end up in the assemblies. Why not do a few k-mer analyses, or even BLAST or grep, to find out whether long stretches of your assemblies are problematic? I doubt it.

I don't do these checks on our nanopore assemblies and have never been confronted by errors on fungal or plant genomes.

That said, I think the nanopore QC tool ecosystem could definitely be improved, especially regarding adapters. It's worth noting there is a fork of Porechop which was still being maintained when I last looked: https://github.com/bonsai-team/Porechop_ABI
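The grep-style check on the assembly suggested above can be sketched in Python as follows. This is only an illustration: the helper names (`read_fasta`, `homopolymer_runs`) and the default threshold of 20 bases are assumptions, and wrapped FASTA lines are joined first so that a run spanning a line break is not missed (something a plain line-based grep can overlook).

```python
import re


def read_fasta(path):
    """Yield (name, sequence) for each FASTA record, joining wrapped lines."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the contig name.
                name, chunks = (line[1:].split() or [""])[0], []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def homopolymer_runs(seq, min_run=20):
    """Yield (start_1based, base, length) for every single-base run >= min_run."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    for match in pattern.finditer(seq.upper()):
        yield match.start() + 1, match.group(1), len(match.group())
```

Looping over `read_fasta("assembly.fasta")` (a hypothetical path) and printing the contig name with each hit from `homopolymer_runs` gives coordinates that can then be inspected against the Illumina data or a read alignment.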

ADD REPLY

I just did a quick CTRL+F search in one of the assembly.fasta files and found that:

Runs of A are present up to 27 consecutive bases (searching for a string such as AAAAAAAAAAAAAAAA), giving 3 results.

Runs of G are present up to 23 consecutive bases (searching for a string such as GGGGGGGGGGGGGG), giving 1 result, and at 20 bases, giving 3 results.

Will these be problematic? If yes, how should I remove them?

ADD REPLY
