Paired-end reads merge tool: Multiple @ lines in merged output of FLASH tool
5.6 years ago
Ankit ▴ 500

Hi everyone,

I have a query regarding merging paired-end read files. I am using FLASH for merging data. I ran flash as follows:

./flash sample_rep1_R1.fastq sample_rep1_R2.fastq -m 5 -t 5 -o sample_merge 2>&1 | tee flash.log

In sample_merge.extendedFrags.fastq I noticed some quality-score lines that contain multiple "@" characters. For example:

> > <B/B-:@@D@D@:-:@-D@-:@D@@D-DDD@D@:@---D--::D:DDD BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFDDDDDDDDDDDDD#::DDDDDDDD@@DDDDDDDDDDDDDDDDDDDDDDDDF#FFFFFFFFFFFFFFFFF<<
> BBBBBF/BFFFBFFFFFDDDD:DDDDDDDDDDDDDDDDD@D:D@DDDDDDD:@D::@@DD@D-D@DDDDDFDB@FFFDF@:::
> BBBBBFFFFFDDD-D@DDDD@-D@-:D:@DDDDD-@DDDDDDD@:D@D-@D-D-@-D-5D@D@FFFFFFFFFF<<<-:7:
> BB@@@DF<FFF<DDDDDDDDDDFDDDDFFFFFDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<DDDDDDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBB
> BBBBBFFFFBFFFFB/FFFFFFFF<FFFFFFFFFFFFFFFFFFFFFB<FBFFFBFFBFBBFFFFBD@DDDDDDDD@D#::
> BB@@@DDDDDDDDDD@DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFFF#FFFFFFFF#F#FFFFFFDDF@DD@D@DDDDDDDDDDDDDDDDDDDDDDDDDDDD@DDDDDDDDDDDDFBB

The unmerged files, sample.notCombined_1.fastq and sample.notCombined_2.fastq, do not have these lines.

I am wondering whether these lines with multiple "@" characters in extendedFrags.fastq are normal or are related to the parameters I have chosen.

My reads are 2 x 125 bp.

It would be very helpful if someone could guide me.

Thanks

Ankit

flash paired-end read merge fastq • 2.8k views
5.6 years ago
gb ★ 2.2k

I am not fully sure I understand you, but the "@" character stands for a certain quality score: it is ASCII code 64, which corresponds to Q31 in the Phred+33 encoding that your quality strings appear to use (https://www.drive5.com/usearch/manual/quality_score.html). If you look up the merged read in sample_rep1_R1.fastq and sample_rep1_R2.fastq, you will see those "@" characters in the quality line, i.e. the line after the one starting with "+".
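To make the encoding concrete, here is a minimal Python sketch that decodes a few of the quality characters seen in the question. It assumes Phred+33 (the quality strings contain characters such as "#" and "-", which fall below the Phred+64 range); the function name `phred33_score` is made up for illustration.

```python
def phred33_score(symbol: str) -> int:
    """Return the Phred quality encoded by one FASTQ quality character.

    Assumes the Phred+33 encoding: quality = ASCII code - 33.
    """
    return ord(symbol) - 33


# Print character, ASCII code and decoded quality for a few symbols
# that appear in the quality lines quoted in the question.
for symbol in "#-:@BF":
    print(symbol, ord(symbol), phred33_score(symbol))
```

Under that assumption "@" is just one more quality value among the others, not a record marker.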

One possible reason why you see them less often in the non-merged files is that FLASH does not merge a pair when the mismatch density in the overlap exceeds --max-mismatch-density, i.e. when there are too many mismatches. Mismatches can arise because the sequencer called a base wrongly, and such miscalled bases mostly have a lower quality score. The "@" character stands for a relatively high score.

You should check how to interpret FASTQ files. Where the two reads disagree in the overlap, FLASH chooses the base with the higher quality score.
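The two ideas above (mismatch density in the overlap, and picking the higher-quality base at a mismatch) can be illustrated with a small Python sketch. This is a simplified illustration, not FLASH's actual implementation: it assumes the two overlap segments are already aligned, equal in length, and that read 2 has already been reverse-complemented. The function names and the tie-breaking rule (read 1 wins ties) are arbitrary choices made for the example.

```python
def mismatch_density(seq1: str, seq2: str) -> float:
    """Fraction of positions at which two aligned overlap segments differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("overlap segments must be non-empty and equal in length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


def pick_bases(seq1: str, qual1: str, seq2: str, qual2: str) -> str:
    """At each position keep the base whose quality character is higher.

    Both quality strings are assumed to use the same encoding, so the
    characters can be compared directly by ASCII code.
    """
    merged = []
    for b1, q1, b2, q2 in zip(seq1, qual1, seq2, qual2):
        if b1 == b2 or ord(q1) >= ord(q2):
            merged.append(b1)  # match, or read 1 has the better (or equal) quality
        else:
            merged.append(b2)  # mismatch and read 2 has the better quality
    return "".join(merged)


# Toy overlap: one mismatch at the fourth base, where read 1 has quality "#".
overlap1, quals1 = "ACGTA", "FFF#F"
overlap2, quals2 = "ACGAA", "FFFFF"
print(mismatch_density(overlap1, overlap2))
print(pick_bases(overlap1, quals1, overlap2, quals2))
```

In this toy case the low-quality "T" from read 1 is replaced by the "A" from read 2, which is why low-quality characters tend to be rarer in merged reads.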

Hi, thanks for the reply. Yes, you are right. I checked the read counts in both *.extendedFrags.fastq and *.notCombined_1/_2.fastq, and they add up to the total in the original FASTQ files. My mistake was counting reads by searching for "@"; the numbers did not add up properly, so I thought there might be a problem. Now that I count the "@" header lines instead, it looks fine.

Could you also suggest appropriate values for -m and -M for 125 bp reads? I am currently using -m 5 and the default -M (65).
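For anyone hitting the same counting problem, here is a small Python sketch of why searching for "@" overcounts. It assumes strict four-line FASTQ records (no wrapped sequence lines); the function names and the toy records are made up for illustration.

```python
def count_fastq_reads(lines) -> int:
    """Count reads assuming every record is exactly four lines long."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


def count_at_lines(lines) -> int:
    """Count lines containing '@', roughly what a plain text search reports."""
    return sum(1 for line in lines if "@" in line)


# Two toy records; the quality line of the first one starts with "@".
example = [
    "@read1", "ACGT", "+", "@@DD",
    "@read2", "ACGT", "+", "BBFF",
]
print(count_fastq_reads(example))  # number of records
print(count_at_lines(example))     # inflated by the quality line
```

To use it on a real file, pass an open file handle, e.g. `with open("sample_merge.extendedFrags.fastq") as fh: print(count_fastq_reads(fh))`.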

Thanks for the quick help.
