Mismatch during Merge Paired-End Reads in FLASH Algorithm
1
1
Entering edit mode
10.0 years ago
clear.choi ▴ 30

I am planning to merge two FastQ file from NGS.

And I have question with FLASH algorithm.

I have below File Forward and reverse

1. Forward

@CP000143_994500_994663_0:0:0_0:0:0_0/1
GCTTCTGCGACCGCGCCCTCGTCGTCTACCGCGGCACGCTGAACGGCGAGTTCGCGGGCGAGACGCTCGACAGCGACCTGCTCCTGGCCGCCGCCTCGGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

2. Reverse

@CP000143_994500_994663_0:0:0_0:0:0_0/2
GCTCTCGCTGGAGGACGGGGACAGGGCCATGGTCATTCAGGCCTCCTTTCGTTGGGCCCGCGCGCCCGAGGCGGCGGCTAGGAGCAGGTCGCTGTCGCGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

If Reverse Complement and Merge together results will be.

GCTTCTGCGACCGCGCCCTCGTCGTCTACCGCGGCACGCTGAACGGCGAGTTCGCGGGCGAGACTCGCGACAGCGACCTGCTCCTAGCCGCCGCCTCGGGCGCGCGGGCCCAACGAAAGGAGGCCTGAATGACCATGGCCCTGTCCCCGTCCTCCAGCGAGAGC

Compare both Sequence

GCTTCTGCGACCGCGCCCTCGTCGTCTACCGCGGCACGCTGAACGGCGAGTTCGCGGGCGAGACGCTCGACAGCGACCTGCTCCTGGCCGCCGCCTCGGG
                                                                ^----------------------------------^
TCGCGACAGCGACCTGCTCCTAGCCGCCGCCTCGGGCGCGCGGGCCCAACGAAAGGAGGCCTGAATGACCATGGCCCTGTCCCCGTCCTCCAGCGAGAGC 
^----------------------------------^

And There is 3 mismatch during merge two sequences

But that mismatch part Results will follow reverse complement. Anyone has idea why it happen like this?

Thank you!

paired FLASH rna merge fastq • 4.3k views
ADD COMMENT
0
Entering edit mode

It's a little difficult to tell exactly what your question is. Are you asking why the reads disagree in the overlapped region (this does happen on occasion)? Are you asking why FLASH merges them the way it does, in which case is the 3rd sequence the output from FLASH or something else?

ADD REPLY
0
Entering edit mode

Sorry for confused, I mean I understand that two sequences can combine together by Mistach Ratio. However, results showed me reverse complement sequences was used for combined sequences in case of different postion. so I want to know why reverse complement sequence used!

ADD REPLY
0
Entering edit mode

BTW, it looks like the quality scores are fake, that's generally a bad idea since having actual base quality scores can help resolves ambiguous base calls like you're running into here.

ADD REPLY
0
Entering edit mode

So I see actually that is just for sample, do you know what is algorithm to check quality and how can i determine ambiguous by quality score?

ADD REPLY
2
Entering edit mode
10.0 years ago

In the case of mismatches between the two sequences, FLASH will use the nucleotide with the highest base quality. Of course with fake qualities you're going to get weird results on occasion. I should note that mismatches like this only happen is something like 0.01% of alignments in bisulfite sequencing datasets...so in RNAseq it should be even less common (so it shouldn't be common enough to care about).

BTW, in the source code for FLASH (combine_reads.c, around line 400), this is how mismatches with equal phred scores are handled:

if (*qual_1 > *qual_2) {
    *combined_seq = *seq_1;
} else if (*qual_1 < *qual_2) {
    *combined_seq = *seq_2;
} else {
    /* Same quality value; take the base from the
     * first read if the base from the second read
     * is an 'N'; otherwise take the base from the
     * second read. */
    if (*seq_2 == 'N')
        *combined_seq = *seq_1;
    else
        *combined_seq = *seq_2;
}

One could argue about their choice there at the end, but that's why you get the results that you do.

ADD COMMENT
0
Entering edit mode

Awesome! Thanks!

ADD REPLY
0
Entering edit mode

Oh And could you tell me what is second read is an 'N'?

ADD REPLY
0
Entering edit mode

That's asking if the base seen is an N rather than one of A, T, C, or G. Obviously if one read has an N at a position, then using the other read at that position is the right idea.

ADD REPLY

Login before adding your answer.

Traffic: 1941 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6