How to check if all read names are paired properly in paired end fastq.gz files?
5 months ago
DNAngel ▴ 250

Hi all,

I was wondering if there is a command that can be used to check if all the sequence names in the R1 and R2 fastq.gz files are paired and identical?

I've got some files causing issues: bwa-mem fails with an error saying there are mismatched names. This is strange because I used these files months ago without any problems with the same program (the only change is that I've downloaded them onto a new cluster).

I would like to check whether all my files have this issue, but I don't see any command in samtools that can check for this problem. If there is another program that can do this, please advise.

samtools illumina paired-end • 709 views

If they worked before and now, after moving them, they are not working, I would be concerned there's a deeper issue. Is the file corrupted/incomplete?
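
One way to rule out a truncated or corrupted transfer is to test each archive's gzip integrity and confirm that the decompressed line count is a multiple of four. This is only a minimal sketch: `check_fastq_gz` is a hypothetical helper name, and it assumes standard 4-line FASTQ records and a POSIX shell with `gzip` available.

```shell
# check_fastq_gz FILE
# Hypothetical helper: verifies gzip integrity, then checks that the
# decompressed line count is a multiple of 4 (one FASTQ record = 4 lines).
check_fastq_gz() {
    fq_file=$1
    if ! gzip -t "$fq_file" 2>/dev/null; then
        echo "$fq_file: gzip integrity check FAILED"
        return 1
    fi
    # tr strips the padding some wc implementations add
    n_lines=$(gzip -dc "$fq_file" | wc -l | tr -d ' ')
    if [ $((n_lines % 4)) -ne 0 ]; then
        echo "$fq_file: $n_lines lines, not a multiple of 4"
        return 1
    fi
    echo "$fq_file: OK, $((n_lines / 4)) records"
}
```

Running `check_fastq_gz read1.fq.gz` and `check_fastq_gz read2.fq.gz` should report the same record count for both mates. If the original copies still exist, comparing checksums against them is a stronger test.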


If they're not paired, seqkit pair could help to match up paired-end reads from two fastq files.


repair.sh from BBMap suite will also re-pair the files.
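
Both tools mentioned in these replies take the two mate files and write re-synchronized output. A hedged sketch of typical invocations follows: the wrapper function names and all file arguments are placeholders, and the options shown should be checked against `seqkit pair --help` and the usage text of `repair.sh` for the installed versions. Note that re-pairing fixes read order and orphaned mates; it would not repair a file that was corrupted in transfer.

```shell
# repair_with_seqkit R1 R2 OUTDIR
# Hypothetical wrapper around `seqkit pair`: paired reads are written to
# OUTDIR and, with -u, reads without a mate are saved as well.
repair_with_seqkit() {
    command -v seqkit >/dev/null 2>&1 || { echo "seqkit not found in PATH" >&2; return 1; }
    seqkit pair -1 "$1" -2 "$2" -O "$3" -u
}

# repair_with_bbmap R1 R2 OUT1 OUT2 SINGLETONS
# Hypothetical wrapper around BBMap's repair.sh: reads whose mate is
# missing go to the SINGLETONS file.
repair_with_bbmap() {
    command -v repair.sh >/dev/null 2>&1 || { echo "repair.sh not found in PATH" >&2; return 1; }
    repair.sh in1="$1" in2="$2" out1="$3" out2="$4" outs="$5"
}
```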

5 months ago
LChart 4.7k

Well, `zcat $fq | awk 'NR % 4 == 1'` gives you the read names. So you could do something like:

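    # Named pipes let paste read both name streams without temporary files
    mkfifo r1_names r2_names

    # Line 1 of every 4-line FASTQ record is the header
    zcat read1.fq.gz | awk 'NR % 4 == 1' > r1_names &
    zcat read2.fq.gz | awk 'NR % 4 == 1' > r2_names &

    # Print header pairs that differ. Whole header lines are compared, so
    # /1 vs /2 suffixes or 1:N:0 vs 2:N:0 comment fields also count as mismatches.
    paste r1_names r2_names | awk -F "\t" '$1 != $2'

    rm r1_names r2_names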

which will dump all of the mismatching read names to the terminal.
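
Because `awk 'NR % 4 == 1'` keeps the whole header line, mates whose headers differ only in a `/1` vs `/2` suffix or in the comment field after the first space would all be reported. A variant that compares only the ID portion and caps the terminal output might look like the sketch below. The function names are hypothetical, and the ID normalization (first whitespace-delimited field, trailing `/1` or `/2` removed) is an assumption about how the names should be compared, not something confirmed for bwa in this thread.

```shell
# read_ids FILE
# Print one read ID per record: the first whitespace-delimited field of each
# header line, without the leading "@" and without a trailing /1 or /2.
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 {
        id = $1
        sub(/^@/, "", id)
        sub(/\/[12]$/, "", id)
        print id
    }'
}

# report_mismatched_pairs R1 R2 [MAX]
# Print at most MAX (default 10) mismatching ID pairs, then the total count.
report_mismatched_pairs() {
    pair_tmp=$(mktemp -d)
    mkfifo "$pair_tmp/r1" "$pair_tmp/r2"
    read_ids "$1" > "$pair_tmp/r1" &
    read_ids "$2" > "$pair_tmp/r2" &
    # Appending "" forces a string comparison even for numeric-looking IDs
    paste "$pair_tmp/r1" "$pair_tmp/r2" |
        awk -F '\t' -v max="${3:-10}" '
            ($1 "") != ($2 "") { if (++n <= max) print }
            END { print "mismatched pairs: " (n + 0) }'
    wait
    rm -r "$pair_tmp"
}
```

For example, `report_mismatched_pairs read1.fq.gz read2.fq.gz 20` would list up to 20 offending pairs followed by the total. If one file has more records than the other, the extra records show up as pairs with an empty field and are counted as mismatches.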


> which will dump all of the mismatching read names to the terminal.

Clever solution, but if the data is badly mismatched, tons of stuff would be sent to the terminal :-)


That's what `| head` is for :p
