10.1 years ago
jack
Hi,
I have a read file in FASTQ format, but I don't fully understand the format.
I looked at the first few lines, and they look like this:
@HISEQ:10:C3B24ACXX:6:2303:18849:17686 2:N:0:ACAGTG
ATCTTCACAAATAAAACAAGCAATTCAATCGATTGATGAAGTATTTGCAAAGGAGAGGAAACATAGGAGTGGAAAAAAAGATGCAGAGTTCAGAT
+
?@?D7DFDHHHFHIIIIIIIIIGGIIIIHDHCB@F9<DD94BG9?DC<<DDFHG@CHE?F;C@AGADHCEIGEE>=;BDFCDCDBB?3(5>:>ACC:>CCD
@HISEQ:10:C3B24ACXX:6:2303:18827:17745 2:N:0:ACAGTG
adinfo@wks-12-49:/MMCI/MS/DeNovoAssembly/work/mirRegression/DE/Data_Martin_lab/140718/Sample_A1$ head -6 A1_ACAGTG_L006_R2_004.fastq
@HISEQ:10:C3B24ACXX:6:2303:18849:17686 2:N:0:ACAGTG
ATCTTCACAACACATAACAAGCAATTCAATCGATTGATGAAGTAAAAGGAGAGGAAACATAGGAGATGTGGAAAAAGATGCAGAGTTCAGAT
+
?@?D7DFDHHHFHIIIIIIIIIGGIIIIHDHCB@F9<DD94BG9?DC<<DDFHG@CHE?F;C@AGADHCEIGEE>=;BDFCDCDBB?3(5>:>ACC:>CCD
@HISEQ:10:C3B24ACXX:6:2303:18827:17745 2:N:0:ACAGTG
All I understand so far is that there is a read sequence. What do the other lines mean?
Basically, every four lines make up one read record.
Line 1: The sequence identifier, which always starts with @.
Line 2: The actual DNA sequence.
Line 3: Usually just a +, but it can repeat the identifier or contain extra information.
Line 4: The per-base quality scores, encoded as ASCII characters (one character per base in line 2).
Source: http://en.wikipedia.org/wiki/FASTQ_format
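To make the four-line layout concrete, here is a minimal Python sketch that splits a file into records and turns the quality string into numbers. The function names read_fastq and phred_scores are made up for this example, and the offset of 33 (Phred+33) is an assumption: it fits your sample, since characters such as ( and 3 in the quality line could not occur in the older Phred+64 encoding, but check how your data was produced. The sketch also assumes each sequence sits on a single line, as in your file.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) for each four-line record.

    Hypothetical helper for illustration; assumes unwrapped sequences.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between or after records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of record, got: {header!r}")
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated record at end of input")
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator, got: {plus!r}")
        yield header[1:], seq.rstrip("\n"), qual.rstrip("\n")


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# Tiny made-up record, not taken from the file above.
example = ["@read1 example\n", "ACGT\n", "+\n", "?@?D\n"]
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))
# prints: read1 example ACGT [30, 31, 30, 35]
```

A score Q corresponds to an estimated error probability of 10^(-Q/10), so Q = 30 means roughly a 1 in 1000 chance that the base call is wrong. For a real file you could pass an open file handle, e.g. read_fastq(open("A1_ACAGTG_L006_R2_004.fastq")).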