Modify query names longer than 252
4.0 years ago

Hello

I'm trying to align a FASTQ file obtained by concatenating reads basecalled with Guppy (high-accuracy model) on a MinIT. I'm receiving the following error message:

[WARNING] wrong FASTA/FASTQ record. Continue anyway.

[M::worker_pipeline::2.165*0.99] mapped 67504 sequences

[E::sam_parse1] query name too long

[W::sam_read1] Parse error at line 67541

[main_samview] truncated file.

Alignment pipeline failed.

Here's the command I used:

mini_align -i Deerpox_2.fq.gz -r /home/lcc88/Desktop/References/Deerpox.fasta -p Deerpox_2

Apparently, modifying the query names longer than 252 characters is the solution, but I have no idea how to do it. Does anyone know how, or is there another solution to this problem?

Thank you,

Leonardo

samtools fastq awk minimap2 • 2.3k views

There is a solution for renaming the FASTQ headers below, but keep in mind that you will lose the original headers, which matters if you ever need to correlate the sequences back to them.

Is mini_align an ONT-supplied program? If so, I am surprised that it is having this issue with standard ONT read names.
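
If you want to check whether the names really are the problem, something like the sketch below counts the reads whose name (the first whitespace-delimited token of the header, without the leading @) is longer than a limit. count_long_names is a made-up helper name, and the default of 252 is just the number quoted in this thread. It assumes a gzipped FASTQ with exactly four lines per record.

```shell
# Hypothetical helper: count reads whose name exceeds a length limit.
# $1 = gzipped FASTQ file, $2 = limit (defaults to 252, the value from this thread).
# Assumes four lines per record, i.e. no line-wrapped sequences.
count_long_names() {
  gzip -dc "$1" | awk -v limit="${2:-252}" '
    NR % 4 == 1 {
      name = substr($1, 2)              # drop the leading "@"
      if (length(name) > limit) n++
    }
    END { print n + 0 }                 # prints 0 when nothing matched
  '
}
```

For example: count_long_names Deerpox_2.fq.gz. If this prints 0, the names themselves are probably fine, and the "wrong FASTA/FASTQ record" warning may instead point to a malformed record left over from the concatenation.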

4.0 years ago
Mensur Dlakic ★ 28k

This solution uses seqtk, but you will probably get other suggestions as well.

seqtk rename Deerpox_2.fq.gz my_prefix_ | gzip -c - > Deerpox_2_renamed.fq.gz

In this case my_prefix_ can be whatever you want, but it should be reasonably short. When all is done, use Deerpox_2_renamed.fq.gz as your query.

4.0 years ago
5heikki 11k

How about

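zcat yourFile.fq.gz | paste - - - - | awk 'BEGIN{FS="\t";OFS="\n"}{print "@"NR,$2,$3,$4 > "new.fq"; print NR"\t"$1 > "map"}'

This writes the reads to new.fq with their record number as the new name, and writes each number next to its original header in map. It assumes there are no line breaks within the FASTQ sequences (I have never seen a FASTQ file with line breaks in the sequences). Also, this is not tested and was written on my phone from memory.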
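
If you later need to go from a renamed read back to its original header, a lookup against the map file could look like this. It is only a sketch: lookup_original is a hypothetical helper, and it assumes the map is tab-separated with the new name in the first column and the original header in the second, which is the layout the awk command above is meant to produce.

```shell
# Hypothetical helper: print the original header for a renamed read.
# $1 = new read name (e.g. 42), $2 = tab-separated map file
#      (column 1 = new name, column 2 = original header, including its "@").
lookup_original() {
  awk -F '\t' -v id="$1" '$1 == id { print $2; exit }' "$2"
}
```

For example, lookup_original 42 map would print the header that read 42 had before renaming. It prints nothing if the name is not in the map.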
