Actually, I am writing some code to perform sequence alignment, and I need to know which identifier marks the start of a sequence. I read that it is @, but I have also found some sequences that start with >. I have also found that some quality lines begin with @, such as the read below:
@r129
GTTGACTGAATTTTTTATCTATTAATGAATAAGTGCTTACTTCTTCTTTTTGACCTACAAAACCAATTTTAACATTTCCGATATCGCATTTTTCACCATGCTCATCAAAGACAGTAAGATAAAACATTGTAACAAAGGAATAGTCATTCCAACCATCTGCTCGTAGGAATG
+
@1=@!85H(!-H>#4@@$-4+D>6)DD*C-&=+?F3:0?.,?8;=?1&<-6!!4&7.C(:)H.442#;%G4(F$,C+;?*96C:&D0H@+;AE@$B&+3#HB)>@*?D0,!;&=0B=1E3421':<(*)4F6"-*3+@*$./H8;#'0&),=+<=B=*@"E#.@C@'#&'@
How can I reliably distinguish the sequence identifier line from the other lines of a read?
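Simply put, a valid FASTQ record (in the common, unwrapped layout) always has exactly 4 lines: the first must start with @ and the third with +. Because the lines come in fixed groups of four, an @ at the beginning of the 4th line is just a quality character (in Phred+33 encoding it stands for Q31), not a new header. A header starting with > belongs to the FASTA format, which has no quality line at all. See the FASTQ format entry on Wikipedia for reference.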
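The 4-line rule can be sketched in Python. This is a minimal sketch, assuming unwrapped records (sequence and quality each on a single line) and Phred+33 quality encoding (Sanger / Illumina 1.8+). The names read_fastq, FastqRecord and phred33 are hypothetical helpers, not part of any library. The key idea is that lines are consumed in fixed groups of four, so a leading @ is only ever treated as a header when it appears on the first line of a group.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # header line without the leading '@'
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming 4 unwrapped lines per record.

    Lines are read strictly in groups of four, so an '@' at the start of a
    quality line is never mistaken for the header of a new record.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # assumption: tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            # e.g. a '>' header means this is FASTA, not FASTQ
            raise ValueError(f"expected '@' header, got {header[:20]!r}")
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator line after {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred33(quality: str) -> list[int]:
    """Decode Phred+33 quality characters to integer scores ('@' -> 31)."""
    return [ord(c) - 33 for c in quality]


# Demo: the first record's quality line starts with '@' and is parsed correctly.
example = io.StringIO("@r1\nACGT\n+\n@@II\n@r2\nGGCC\n+\n!!@@\n")
for rec in read_fastq(example):
    print(rec.name, rec.sequence, phred33(rec.quality))
```

For real pipelines, an established parser such as Biopython's SeqIO.parse(handle, "fastq") is usually a safer choice than a hand-rolled one, since it also handles details this sketch ignores.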