Question

Fastx_toolkit with nanopore data

1

7.9 years ago

glgowers ▴ 10

Hi, I am trying to convert a FASTQ file (containing multiple sequences) to a FASTA file using fastx_toolkit (Hannon lab). However, I get this error message:

fastq_to_fasta: Error: invalid quality score data on line 68 (quality_tok = "ATATGCGTGCCATTG...etc)

I noticed on another thread that you can pass -Q33 to tell it you are using Illumina quality scores. Does anyone know if there is an equivalent flag to tell it I am using nanopore data?

If not, can anyone recommend another way to convert these files?

Thank you!

nanopore fastx fastq fasta • 4.6k views

ADD COMMENT • link updated 7.9 years ago by Botond Sipos ★ 1.7k • written 7.9 years ago by glgowers ▴ 10

2

I think that this package could do what you want:

https://poretools.readthedocs.io/en/latest/index.html

It has a subcommand called poretools fasta.

ADD REPLY • link 7.9 years ago by IP ▴ 770

0

You could try -Q33 with Nanopore data, which I am sure uses Sanger FASTQ format (Phred+33 quality encoding).

You could also use reformat.sh in=your.fastq out=your.fasta from the BBMap suite to achieve the same result.

Edit: If you are using FAST5 format files as input, then use poretools as suggested by Iñigo Prada.

ADD REPLY • link 7.9 years ago by GenoMax 148k

0

You might have to add qin=33 to the reformat.sh call, making it reformat.sh qin=33 in=your.fastq out=your.fasta. Otherwise you might get this warning: Warning! Changed from ASCII-33 to ASCII-64 on input ;: 59 -> 28.
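
If you are unsure which offset a file uses, you can look at the range of characters on its quality lines. Below is a minimal sketch, assuming strict 4-line FASTQ records; the function names quality_range and guess_offset are made up for illustration and are not part of BBMap or fastx_toolkit. It relies on the fact that the +64 encodings never use characters below ';' (ASCII 59), so anything lower points to Phred+33.

```python
def quality_range(fastq_lines):
    """Return (lowest, highest) ASCII code seen on the quality lines.

    Assumes strict 4-line FASTQ records, so the quality string is
    every 4th line (index 3, 7, 11, ...).
    """
    lowest, highest = None, None
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:
            continue
        codes = [ord(char) for char in line.rstrip("\r\n")]
        if not codes:
            continue
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))
    return lowest, highest


def guess_offset(lowest):
    """Return 33 when the data can only be Phred+33, else None (ambiguous)."""
    # Characters below ';' (59) cannot occur in the +64 encodings.
    if lowest is not None and lowest < 59:
        return 33
    return None


if __name__ == "__main__":
    # "your.fastq" is a placeholder path, as in the commands above.
    with open("your.fastq") as handle:
        low, high = quality_range(handle)
    print("quality characters span ASCII", low, "to", high)
    print("offset guess:", guess_offset(low))
```

A result of None only means the file contains no low-quality characters that would settle the question, which is the situation where passing qin=33 explicitly is the safer choice.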

ADD REPLY • link 5.5 years ago by opplatek ▴ 300

score 1 · Accepted Answer · 2017-01-27

1

7.9 years ago

Botond Sipos ★ 1.7k

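You can easily do that conversion using Biopython:

from Bio import SeqIO

# "fastq" is Biopython's name for the Sanger (Phred+33) variant.
# convert() writes output.fasta and returns the number of records.
count = SeqIO.convert("input.fastq", "fastq", "output.fasta", "fasta")
print("Converted %i records" % count)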
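
If Biopython is not available, the same conversion can be sketched with the standard library alone. This is only an illustration, assuming strict 4-line FASTQ records (no wrapped sequence or quality lines); fastq_to_fasta is a hypothetical helper name, not a Biopython or fastx_toolkit function. It raises an error on records that do not fit the 4-line layout, which may also be what is behind the "invalid quality score data" message in the question, since the reported quality token there looks like sequence.

```python
from itertools import islice


def fastq_to_fasta(lines):
    """Yield FASTA lines (header, sequence) from strict 4-line FASTQ input."""
    # Skip blank lines, e.g. a trailing empty line at the end of the file.
    non_blank = (line for line in lines if line.strip())
    record_number = 0
    while True:
        record = [line.rstrip("\r\n") for line in islice(non_blank, 4)]
        if not record:
            return
        record_number += 1
        if len(record) < 4:
            raise ValueError("record %d is truncated" % record_number)
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("record %d is not in 4-line layout" % record_number)
        if len(sequence) != len(quality):
            raise ValueError("record %d: sequence/quality length mismatch" % record_number)
        yield ">" + header[1:]
        yield sequence


if __name__ == "__main__":
    # Same placeholder file names as in the Biopython example.
    with open("input.fastq") as source, open("output.fasta", "w") as target:
        for fasta_line in fastq_to_fasta(source):
            target.write(fasta_line + "\n")
```

Because blank lines are skipped, a record with an empty sequence would be misread; Biopython handles such edge cases and wrapped records more carefully, so prefer it when you can install it.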

ADD COMMENT • link 7.9 years ago by Botond Sipos ★ 1.7k