I am planning to merge two FASTQ files from an NGS run,
and I have a question about the FLASH algorithm.
I have the forward and reverse reads below:
1. Forward
@CP000143_994500_994663_0:0:0_0:0:0_0/1
GCTTCTGCGACCGCGCCCTCGTCGTCTACCGCGGCACGCTGAACGGCGAGTTCGCGGGCGAGACGCTCGACAGCGACCTGCTCCTGGCCGCCGCCTCGGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
2. Reverse
@CP000143_994500_994663_0:0:0_0:0:0_0/2
GCTCTCGCTGGAGGACGGGGACAGGGCCATGGTCATTCAGGCCTCCTTTCGTTGGGCCCGCGCGCCCGAGGCGGCGGCTAGGAGCAGGTCGCTGTCGCGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
If I reverse-complement the reverse read and merge the two reads, the result is:
GCTTCTGCGACCGCGCCCTCGTCGTCTACCGCGGCACGCTGAACGGCGAGTTCGCGGGCGAGACTCGCGACAGCGACCTGCTCCTAGCCGCCGCCTCGGGCGCGCGGGCCCAACGAAAGGAGGCCTGAATGACCATGGCCCTGTCCCCGTCCTCCAGCGAGAGC
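Comparing the two sequences (the forward read, then the reverse complement of the reverse read, with the overlap marked under each):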
GCTTCTGCGACCGCGCCCTCGTCGTCTACCGCGGCACGCTGAACGGCGAGTTCGCGGGCGAGACGCTCGACAGCGACCTGCTCCTGGCCGCCGCCTCGGG
                                                                ^----------------------------------^
TCGCGACAGCGACCTGCTCCTAGCCGCCGCCTCGGGCGCGCGGGCCCAACGAAAGGAGGCCTGAATGACCATGGCCCTGTCCCCGTCCTCCAGCGAGAGC
^----------------------------------^
There are 3 mismatches in the overlap when the two sequences are merged,
but at each mismatched position the merged result follows the reverse-complement read. Does anyone have an idea why this happens?
Thank you!
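For anyone who wants to reproduce the comparison, here is a minimal Python sketch using the two reads from the post. It reverse-complements the reverse read and lists the disagreeing positions in the overlap. The 36-base overlap length is an assumption derived from the read lengths (100 + 100) and the 164-base merged sequence; the function names are made up for illustration and are not part of FLASH.

```python
# Reads copied from the post above.
FORWARD = "GCTTCTGCGACCGCGCCCTCGTCGTCTACCGCGGCACGCTGAACGGCGAGTTCGCGGGCGAGACGCTCGACAGCGACCTGCTCCTGGCCGCCGCCTCGGG"
REVERSE = "GCTCTCGCTGGAGGACGGGGACAGGGCCATGGTCATTCAGGCCTCCTTTCGTTGGGCCCGCGCGCCCGAGGCGGCGGCTAGGAGCAGGTCGCTGTCGCGA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(fwd, rev_rc, overlap_len):
    """Compare the last overlap_len bases of fwd with the first
    overlap_len bases of rev_rc; return (offset, fwd_base, rc_base)
    for every position where they disagree."""
    fwd_tail = fwd[-overlap_len:]
    rc_head = rev_rc[:overlap_len]
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(fwd_tail, rc_head))
        if a != b
    ]


rev_rc = reverse_complement(REVERSE)
OVERLAP = 36  # assumed: 100 + 100 - 164
mismatches = overlap_mismatches(FORWARD, rev_rc, OVERLAP)

for offset, fwd_base, rc_base in mismatches:
    # Position is 1-based within the forward read.
    pos = len(FORWARD) - OVERLAP + offset + 1
    print(f"forward position {pos}: forward={fwd_base} reverse-complement={rc_base}")
```

With these reads the sketch reports three disagreeing positions, which matches the count given in the question.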
It's a little difficult to tell exactly what your question is. Are you asking why the reads disagree in the overlapped region (this does happen on occasion)? Are you asking why FLASH merges them the way it does, in which case is the 3rd sequence the output from FLASH or something else?
Sorry for the confusion. I understand that the two sequences can be combined based on the mismatch ratio. However, the results show that the reverse-complement sequence was used in the merged sequence at the positions where the reads differ, so I want to know why the reverse-complement sequence was used.
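As a rough illustration of merging by mismatch ratio: for the reads above, 3 mismatches over a 36-base overlap gives a ratio of about 0.083. The sketch below tries every overlap length and keeps the one with the lowest ratio. It is a simplified model, not FLASH's actual code; the function names, the toy reads, and the `min_overlap` and `max_ratio` values are assumptions chosen for illustration, so check the help output of your FLASH version for its real parameters.

```python
def mismatch_ratio(fwd, rev_rc, overlap_len):
    """Fraction of disagreeing bases when the last overlap_len bases
    of fwd are laid over the first overlap_len bases of rev_rc."""
    tail = fwd[-overlap_len:]
    head = rev_rc[:overlap_len]
    mismatches = sum(1 for a, b in zip(tail, head) if a != b)
    return mismatches / overlap_len


def best_overlap(fwd, rev_rc, min_overlap=10, max_ratio=0.25):
    """Scan all overlap lengths and return (length, ratio) for the
    lowest ratio not exceeding max_ratio, or None if no overlap
    qualifies. Both thresholds are illustrative assumptions."""
    best = None
    longest = min(len(fwd), len(rev_rc))
    for n in range(min_overlap, longest + 1):
        ratio = mismatch_ratio(fwd, rev_rc, n)
        if ratio <= max_ratio and (best is None or ratio < best[1]):
            best = (n, ratio)
    return best


# Toy reads (hypothetical, not from the post): "GGTCA" is shared.
toy_fwd = "ACGTACGGTCA"
toy_rc = "GGTCATTTT"
print(best_overlap(toy_fwd, toy_rc, min_overlap=4))
```

Note that the ratio only decides where the reads overlap. It says nothing about which base to keep at a mismatched position; that is a separate step.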
BTW, it looks like the quality scores are fake. That's generally a bad idea, since actual base quality scores can help resolve ambiguous base calls like the ones you're running into here.
I see. Those scores are actually just for this sample. Do you know what algorithm is used to check quality, and how I can resolve ambiguous bases using the quality scores?
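To make the quality-score idea concrete, here is a small Python sketch. It assumes the common Phred+33 encoding, where each quality character maps to Q = ord(char) - 33 and the error probability is 10^(-Q/10), so the `I` characters in the sample reads mean Q40. The rule of keeping the higher-quality base is a common consensus approach and is not confirmed here as FLASH's exact behavior; the function names and the choice to return `N` on a tie are assumptions for illustration.

```python
def phred33_to_scores(qual):
    """Decode a Phred+33 quality string into integer Q scores."""
    return [ord(c) - 33 for c in qual]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def resolve_base(base_f, q_f, base_r, q_r):
    """Pick a consensus base for one overlap position.

    Agreeing bases are kept. On a mismatch the higher-quality base
    wins. If the qualities tie there is no evidence either way, so
    this sketch returns 'N'; real tools break ties in their own way.
    """
    if base_f == base_r:
        return base_f, max(q_f, q_r)
    if q_f > q_r:
        return base_f, q_f - q_r
    if q_r > q_f:
        return base_r, q_r - q_f
    return "N", 0


scores = phred33_to_scores("IIII")
print(scores, error_probability(scores[0]))

# With real qualities the better-supported base wins:
print(resolve_base("G", 40, "T", 20))
# With identical (placeholder) qualities the call is ambiguous:
print(resolve_base("G", 40, "T", 40))
```

This is why uniform placeholder qualities are a problem: every mismatch becomes a tie, and the outcome then depends entirely on the tool's tie-breaking rule rather than on the data.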