Hello,
I am trying to filter a FASTQ file. I ran FastQC to get a quality report, and it flags an overrepresented sequence:
sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 39317 percentage: 0.13862182817994162
The fastq file has 28362777 sequences and the read length is 125.
I used cutadapt to remove it:
gunzip -c SRR9667734_S_sp_2.fastq.gz | cutadapt -m 20 -e 0.1 -z -a NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN - -o SRR9667734_S_sp_cutadapt_2.fastq.gz
but the resulting file still has those overrepresented sequences, and the number of reads in the FASTQ file was reduced to 68122 after running cutadapt.
Overrepresented sequences:
sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 39317 percentage: 57.71556912597986
sequence: ANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 1172 percentage: 1.7204427350929214
sequence: GNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 1014 percentage: 1.488505915856845
sequence: CNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 895 percentage: 1.3138193241537244
sequence: TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 864 percentage: 1.268312733037785
Any idea of what's happening?
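(For reference, read counts like the ones above can be double-checked directly, assuming a standard 4-line-per-record FASTQ. The demo file below is made up for illustration; for the real data you would pipe `gunzip -c SRR9667734_S_sp_2.fastq.gz` into `wc -l` instead.)

```shell
# A FASTQ record is 4 lines (header, sequence, '+', quality),
# so number of reads = number of lines / 4.
# Hypothetical 2-read demo file standing in for the real data:
printf '@r1\nACGT\n+\nIIII\n@r2\nNNNN\n+\n!!!!\n' | gzip > demo.fastq.gz
reads=$(( $(gunzip -c demo.fastq.gz | wc -l) / 4 ))
echo "$reads"   # prints 2
```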
Not answering your question, but you can try bbduk.sh
from the BBMap suite with the maxns option to remove reads with Ns (from the BBDuk docs: "maxns=-1: if non-negative, reads with more Ns than this (after trimming) will be discarded").

For starters, maybe put the -o option before the input. And I'm pretty sure cutadapt can handle gzipped files, so there's no need to decompress first.
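Putting the two suggestions together, a sketch (not a tested pipeline): the commented cutadapt line shows the corrected argument order with direct gzip handling, and the awk filter below it is a dependency-free stand-in for bbduk's maxns filtering that drops the all-N reads seen in the FastQC report. The demo file is hypothetical.

```shell
# Corrected cutadapt call (sketch): -o comes before the input file, and
# cutadapt reads/writes .gz directly, so no gunzip pipe is needed:
#   cutadapt -m 20 -e 0.1 \
#       -a NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN \
#       -o SRR9667734_S_sp_cutadapt_2.fastq.gz SRR9667734_S_sp_2.fastq.gz
#
# Dependency-free stand-in for bbduk's maxns for this case: keep only
# 4-line FASTQ records whose sequence line is not entirely N.
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nNNNNNNNN\n+\n!!!!!!!!\n' > demo.fastq
awk 'NR%4==1{h=$0} NR%4==2{s=$0} NR%4==3{p=$0}
     NR%4==0{if (s !~ /^N+$/) print h"\n"s"\n"p"\n"$0}' demo.fastq > demo_noN.fastq
```

The awk pass buffers each record's first three lines and, on the quality line, prints the record only if the sequence line contains at least one non-N base; here the all-N read r2 is dropped and only r1 survives.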