Question

Modify query names longer than 252

4.0 years ago

leonardo.caserta ▴ 10

Hello

I'm trying to align a FASTQ file obtained by concatenating reads basecalled with Guppy (high-accuracy mode) on a MinIT. I'm receiving the following error message:

[WARNING] wrong FASTA/FASTQ record. Continue anyway.

[M::worker_pipeline::2.165*0.99] mapped 67504 sequences

[E::sam_parse1] query name too long

[W::sam_read1] Parse error at line 67541

[main_samview] truncated file.

Alignment pipeline failed.

Here's the command I used:

mini_align -i Deerpox_2.fq.gz -r /home/lcc88/Desktop/References/Deerpox.fasta -p Deerpox_2

Apparently the solution is to modify the query names that are longer than 252 characters, but I have no idea how to do that. Does anyone know how, or is there possibly another solution to this problem?

Thank you,

Leonardo

samtools fastq awk minimap2 • 2.3k views

A solution for renaming the FASTQ headers is given below, but keep in mind that you will lose the original headers, which matters if you ever need to trace the sequences back to them.

Is mini_align an ONT-supplied program? If so, I am surprised that it has this issue with standard ONT read names.
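
The "wrong FASTA/FASTQ record" warning in the log hints that the concatenated file itself may be malformed, which could also explain an unexpectedly long name. A rough sanity check along these lines might help narrow it down. This is only a sketch assuming 4-line FASTQ records; `check_fastq` and the demo file are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list structurally broken records in a gzipped FASTQ.
# Assumes 4-line records; prints "record_number<TAB>problem" for each issue.
check_fastq() {
    gzip -cd "$1" | paste - - - - |
        awk 'BEGIN { FS = "\t" }
        substr($1, 1, 1) != "@"  { print NR "\theader does not start with @" }
        substr($3, 1, 1) != "+"  { print NR "\tseparator line does not start with +" }
        length($2) != length($4) { print NR "\tsequence and quality lengths differ" }'
}

# Tiny demo on made-up reads: the second record has a truncated quality line.
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII\n' | gzip -c > "$demo_dir/demo.fq.gz"
check_fastq "$demo_dir/demo.fq.gz"
```

If a line is missing somewhere in the file, every record after that point will be flagged, so the first record number reported is the one worth inspecting.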

4.0 years ago by GenoMax 148k

score 1 · Answer 1 · 2020-12-22

4.0 years ago

Mensur Dlakic ★ 28k

This solution uses seqtk, but you will probably get other suggestions as well.

seqtk rename Deerpox_2.fq.gz my_prefix_ | gzip -c - > Deerpox_2_renamed.fq.gz

In this case my_prefix_ can be whatever you want, but should be reasonably short. When all is done, use Deerpox_2_renamed.fq.gz as your query.
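
To confirm that over-long names are really the problem, and that renaming fixed it, something like the following could be run before and after. It is a sketch assuming 4-line FASTQ records, and assuming the aligner uses the header text up to the first whitespace as the read name; `count_long_names` is a made-up helper, and the default cutoff of 252 is simply the figure from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads whose name (text after "@" up to the first
# whitespace) is longer than a given limit. Assumes 4-line FASTQ records.
count_long_names() {
    local fq_gz=$1 limit=${2:-252}
    gzip -cd "$fq_gz" |
        awk -v limit="$limit" 'NR % 4 == 1 {
            name = substr($1, 2)              # drop the leading "@"
            if (length(name) > limit) n++
        }
        END { print n + 0 }'
}

# Tiny demo on made-up reads: one 300-character name and one short name.
demo_dir=$(mktemp -d)
long_name=$(printf '%300s' '' | tr ' ' 'x')
printf '@%s\nACGT\n+\nIIII\n@short\nGGCC\n+\nIIII\n' "$long_name" | gzip -c > "$demo_dir/demo.fq.gz"
count_long_names "$demo_dir/demo.fq.gz"        # default limit of 252
count_long_names "$demo_dir/demo.fq.gz" 500    # custom limit
```

A count of zero on Deerpox_2_renamed.fq.gz would suggest the names are no longer the issue.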

score 1 · Answer 2 · 2020-12-22

4.0 years ago

5heikki 11k

How about

zcat yourFile.fq.gz | paste - - - - | awk 'BEGIN{FS="\t";OFS="\n"}{print "@"NR,$2,$3,$4 > "new.fq"; print NR"\t"$1 > "map"}'

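This assumes there are no line breaks within the FASTQ sequences (I have never seen a FASTQ file with line breaks in the sequences). The renumbered reads go to new.fq, and map records which original header each number replaced. Also, this is untested and written on my phone from memory.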
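
The same idea can be wrapped in a small shell function so the output names are not hard-coded. This is a minimal sketch under the same 4-line-record assumption; `rename_fastq` and the file names in the demo are made up for illustration, `gzip -cd` is swapped in for `zcat` only for portability, and the leading "@" is stripped from the original header before it is written to the map.

```shell
#!/usr/bin/env bash
# Hypothetical helper: renumber FASTQ reads and keep a map of new -> old names.
# Assumes strict 4-line records (no wrapped sequence or quality lines).
rename_fastq() {
    local in_fq=$1 out_fq=$2 map_tsv=$3
    gzip -cd "$in_fq" | paste - - - - |
        awk -v out="$out_fq" -v map="$map_tsv" 'BEGIN { FS = "\t"; OFS = "\n" }
        {
            print "@" NR, $2, $3, $4 > out        # short numeric read name
            print NR "\t" substr($1, 2) > map     # new name, old header without "@"
        }'
}

# Tiny demo on made-up reads
demo_dir=$(mktemp -d)
printf '@read_A some description\nACGT\n+\nIIII\n@read_B\nGGCC\n+\nIIII\n' | gzip -c > "$demo_dir/demo.fq.gz"
rename_fastq "$demo_dir/demo.fq.gz" "$demo_dir/new.fq" "$demo_dir/map.tsv"
cat "$demo_dir/new.fq"
cat "$demo_dir/map.tsv"
```

The map file is what lets you get back to the original read names later, which addresses the concern raised in the comment above about losing the headers.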
