Question

BWA paired reads have different names error

10.6 years ago

crysis405 ▴ 30

[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (144, 213, 328)
[infer_isize] low and high boundaries: 100 and 696 for estimating avg and std
[infer_isize] inferred external isize from 111344 pairs: 247.404 +/- 131.705
[infer_isize] skewness: 1.110; kurtosis: 0.633; ap_prior: 2.92e-05
[infer_isize] inferred maximum insert size: 1103 (6.50 sigma)
[bwa_sai2sam_pe_core] time elapses: 10.79 sec
[bwa_sai2sam_pe_core] changing coordinates of 6435 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 29497 out of 36462 Q17 singletons are mated.
[bwa_paired_sw] 1384 out of 8191 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 4.24 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 1.88 sec
[bwa_sai2sam_pe_core] print alignments... [bwa_sai2sam_pe_core] paired reads have different names: "I@ILLUMINA:381:D1HHHACXX:1:2312:14474:27286", "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286"

Using BWA version 0.7.9a-r786.

Can anyone shed some light on what might be causing this error? Stampy had no problem with the exact same files.

EDIT:

grep -n "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286"  forward.fastq
9:@I@ILLUMINA:381:D1HHHACXX:1:2312:14474:27286/1

grep -n "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286"  reverse.fastq
9:@ILLUMINA:381:D1HHHACXX:1:2312:14474:27286/2

grep -B 2 "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286"  forward.fastq
+
CCCFFFFFHHHHHJJJJJJJJJJJJJIIJJJJJJIIJJJJJJJJJIJJJJJJJJJBDHHIJJJJJJHHHHHHFFFFFFEEEEEEDDDDDDDDDDCDEECC
@I@ILLUMINA:381:D1HHHACXX:1:2312:14474:27286/1

grep -B 2 "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286"  reverse.fastq
+
BBCFFFF;FHHHHJHIJGHIJJJJJJGGJJIJ?F?BFGGGHGJJJJJIIIHGHDFF@DDD9@ABBDDBC@ACDDD>AB9@D?BCCDADEEEDDDCCDCC@
@ILLUMINA:381:D1HHHACXX:1:2312:14474:27286/2

alignment • 8.9k views

ADD COMMENT • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by crysis405 ▴ 30

Run these commands on your pair of FASTQ files and paste the output.

grep -n "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286" forward.fastq
grep -n "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286" reverse.fastq
grep -B 2 "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286" forward.fastq
grep -B 2 "ILLUMINA:381:D1HHHACXX:1:2312:14474:27286" reverse.fastq

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by Ashutosh Pandey 12k

Added the output to the question.

ADD REPLY • link 10.6 years ago by crysis405 ▴ 30

There is no issue with the ordering of the read pairs in the two files: both mates sit at line 9. The problem is the read ID itself, since read 1 carries an extra I@ at the start of its name. I am sure you have figured it out by now. Correct the read name and run the aligner again. Before doing that, I would make sure that the extra I@ doesn't belong to the quality score string of the previous read.
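
To see whether this is the only affected pair, the two files can be walked in lockstep and every header pair compared. The following is a minimal sketch, not what BWA does internally: the function names are made up for illustration, and it assumes uncompressed FASTQ with exactly four lines per record and names that should match once a trailing /1 or /2 is dropped.

```python
def read_name(header):
    """Return the read name from a FASTQ header line.

    Drops the leading '@', anything after the first whitespace, and a
    trailing /1 or /2 mate suffix (an assumption about how mates are named).
    """
    parts = header.rstrip("\n")[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def find_name_mismatches(forward_lines, reverse_lines):
    """Yield (line_number, forward_name, reverse_name) for mismatched pairs.

    Assumes both inputs are FASTQ with exactly four lines per record.
    """
    pairs = zip(forward_lines, reverse_lines)
    for line_number, (fwd, rev) in enumerate(pairs, start=1):
        if line_number % 4 != 1:  # only header lines carry the read name
            continue
        fwd_name, rev_name = read_name(fwd), read_name(rev)
        if fwd_name != rev_name:
            yield line_number, fwd_name, rev_name


# Example use with the file names from the question:
# with open("forward.fastq") as fwd, open("reverse.fastq") as rev:
#     for line_number, a, b in find_name_mismatches(fwd, rev):
#         print(line_number, a, b)
```

If only line 9 is reported, the rest of the pairing is intact and only that one header needs fixing.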

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2014-08-20

10.6 years ago

Istvan Albert 102k

You have an extra I@ at the start of the read name for read 1:

I@ILLUMINA:381:D1HHHACXX:1:2312:14474:27286

That's pretty strange. Investigate why that happened.
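
One way to start investigating is to scan the whole file for records that look malformed. The sketch below is only a heuristic under stated assumptions: `scan_fastq` is a hypothetical helper, the file is assumed to be uncompressed FASTQ with four lines per record, and an '@' inside a read name is treated as suspicious even though the format does not strictly forbid it.

```python
def scan_fastq(lines):
    """Return a list of (line_number, problem) tuples for suspicious records.

    Assumes four lines per record: header, sequence, separator, quality.
    """
    problems = []
    record = []
    for line_number, line in enumerate(lines, start=1):
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        header, sequence, separator, quality = record
        first_line = line_number - 3  # line number of this record's header
        if not header.startswith("@"):
            problems.append((first_line, "header does not start with '@'"))
        elif "@" in header[1:]:
            problems.append((first_line, "extra '@' inside read name"))
        if not separator.startswith("+"):
            problems.append((first_line + 2, "separator line does not start with '+'"))
        if len(sequence) != len(quality):
            problems.append((first_line + 3, "sequence and quality lengths differ"))
        record = []
    if record:  # leftover lines that do not form a full record
        problems.append((line_number - len(record) + 1, "truncated record"))
    return problems


# Example use with the file name from the question:
# with open("forward.fastq") as handle:
#     for line_number, problem in scan_fastq(handle):
#         print(line_number, problem)
```

A length mismatch in the record just before the bad header would suggest that the stray characters leaked out of a quality string rather than being added to the name.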

ADD COMMENT • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by Istvan Albert 102k

Yeah, I don't know why only 1 out of 103 files would suddenly have I@ incorporated. I was thinking of just deleting it and seeing if that works.
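
Deleting it could look roughly like the sketch below. It only touches header lines (every fourth line, starting at line 1), so a quality string that happens to begin with the same characters is left alone. The function name and the output file name are hypothetical, and it assumes uncompressed FASTQ with four lines per record.

```python
def strip_header_prefix(lines, prefix="I@"):
    """Yield FASTQ lines with a stray prefix removed from read names.

    Only header lines (lines 1, 5, 9, ...) that start with '@' followed by
    `prefix` are changed; all other lines pass through untouched.
    """
    marker = "@" + prefix
    for line_number, line in enumerate(lines, start=1):
        if line_number % 4 == 1 and line.startswith(marker):
            line = "@" + line[len(marker):]
        yield line


# Example use; forward.fixed.fastq is a made-up output name:
# with open("forward.fastq") as src, open("forward.fixed.fastq", "w") as dst:
#     dst.writelines(strip_header_prefix(src))
```

Writing to a new file keeps the original around in case the prefix turns out to be part of something else.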

ADD REPLY • link 10.6 years ago by crysis405 ▴ 30

Looks like the I@ was produced by Picard's SamToFastq when using INTERLEAVE=TRUE.

ADD REPLY • link 10.6 years ago by crysis405 ▴ 30