Hi biostars!
I want to assemble some reads from a real dataset such as http://sra.dnanexus.com/studies/ERP000108/runs. However, I am confused by the description of accession ERR011087. It says "library layout: paired", and I do not know what "paired" means here.
The first 24 lines of ERR011087.fastq are:
@ERR011087.1 I330_1_FC30JM6AAXX:4:1:0:199 length=88
TTCANATATGGAAAAACAGGGAGCGGAAATCACGTTACTTGCGTATCATCGGAAAAGGCAGGCTGTCCATGCTCCAACCGGTTAATGA
+ERR011087.1 I330_1_FC30JM6AAXX:4:1:0:199 length=88
IIII"9I;III<+<-45CI13;-=93+046/0<1:-06>4.2+4:I86III0.863;GA@7I:5./2$62110='0(2(0$+++&+(
@ERR011087.2 I330_1_FC30JM6AAXX:4:1:0:242 length=88
ACAANCTTCTCAATCTCGGTCTTTTTCTTGGGGAACTCCTTGGTAATAGAACTTGGAACACAGTCCTTGGATGAATACCGTTCTTTTG
+ERR011087.2 I330_1_FC30JM6AAXX:4:1:0:242 length=88
@?;+"IIIIIIF+FII@9<16I<<bd+b6+4>1&&4%-08)/$$+III4.I@III3CIE:,@+04>8799H015./21/@/51791
@ERR011087.3 I330_1_FC30JM6AAXX:4:1:0:394 length=88
ATCANTTTCACTCAAACCATTAATAACATCTACCTGGTTCTTCAGGCTTCGATTCGTTTAAGGGTGATCAAGAGGCAATCATCAGAAA
+ERR011087.3 I330_1_FC30JM6AAXX:4:1:0:394 length=88
2BI;"IIIIIIIG:8CCB<e?i7i c1ei4i)4<7;212+f5="" ;6iiif<7gi8c?i8'70="7@=$<7+2.-+,4&/*.24,&4*&*"
@ERR011087.4 I330_1_FC30JM6AAXX:4:1:0:438 length=88
ACCANCAATATCGGTAACAGTACCCGTCTTGGAACCCTTAACCTGAAGATTGATGGCTTTGGCAGCTTTGGCAACTGGCGTTGCTTTG
+ERR011087.4 I330_1_FC30JM6AAXX:4:1:0:438 length=88
<2="">."I7IIII8;=8)(CI;/II81):2>548,+7(&:6?&+-06+DIGCBII6-GIB9<i7i= 911?+4+21;-)43:.20---+-=""
@ERR011087.5 I330_1_FC30JM6AAXX:4:1:0:740 length=88
ACTGNTCTTTGGCATGGCTCATGAGCATTCCCATCTTGTTTGTCAGCCAGATAGGTGCCAACAACCACCGTCTTGAAGTTTCTACCAT
+ERR011087.5 I330_1_FC30JM6AAXX:4:1:0:740 length=88
3iii"iiiiiiiii="">;IIIF5I>3;45=IB3=):2<d6;ah0:*5h6ibiiic9iii:ii1d=282>3;-11ID:.0,H<,6-'5/7
@ERR011087.6 I330_1_FC30JM6AAXX:4:1:0:753 length=88
ATGANCGCTATGCATGATGATACGACTGTTTTTGTCGCGCGCCTCAGCGTGTGCACCTTTACGCCCAGATATGACGCGACAGCGTTGG
+ERR011087.6 I330_1_FC30JM6AAXX:4:1:0:753 length=88
IIII"IIII3I=I6I=5I18I+;:+4959A&0>&,++(&(-(,90IIIAB;IA;IDIIIF;@G56:+9=?034,0+210'+204+&
I did not see anything that suggests any kind of pairing in the file. Does this file contain only single-end reads, or does it contain paired-end reads? If it contains paired-end reads, how can I figure out which two reads belong to the same pair?
Thanks.
Yes, I used fastq-dump to convert the SRA file to a FASTQ file.
Do you mean that if I run fastq-dump directly, without any extra options, the output will be wrong?
It depends on how you ran fastq-dump. For paired-end Illumina runs you would want to use the --split-files option, which writes the two mates of each pair to two separate FASTQ files.
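To make the pairing concrete: when fastq-dump is run without a split option it writes both mates of a spot joined into one record, so the length=88 reads above may well be two mates back to back. With --split-files the mates go to separate files, and the Nth record of one file is the mate of the Nth record of the other. Below is a minimal Python sketch of walking the two files in lockstep. It assumes the plain four-line-per-record layout, assumes both files give a pair the same first header token (for example ERR011087.1), and the function names read_fastq and paired_reads are made up for illustration.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Assumes exactly four lines per record (header, sequence, '+', quality).
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Take the lines four at a time; an incomplete trailing record is dropped.
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        # '@ERR011087.1 I330_... length=44' -> 'ERR011087.1'
        name = header[1:].split()[0]
        yield name, seq, qual


def paired_reads(lines_1, lines_2):
    """Yield (name, mate1_sequence, mate2_sequence), pairing by position."""
    reads_1 = read_fastq(lines_1)
    reads_2 = read_fastq(lines_2)
    while True:
        rec1 = next(reads_1, None)
        rec2 = next(reads_2, None)
        if rec1 is None and rec2 is None:
            return  # both files ended together
        if rec1 is None or rec2 is None:
            raise ValueError("the two files have different numbers of reads")
        if rec1[0] != rec2[0]:
            raise ValueError(f"name mismatch: {rec1[0]} vs {rec2[0]}")
        yield rec1[0], rec1[1], rec2[1]


# Hypothetical usage, assuming --split-files produced these two files:
#
# with open("ERR011087_1.fastq") as f1, open("ERR011087_2.fastq") as f2:
#     for name, mate1, mate2 in paired_reads(f1, f2):
#         print(name, len(mate1), len(mate2))
```

If the name check fails on real data, the files were probably filtered or sorted separately after the dump, and positional pairing can no longer be trusted.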