How to use a fastq file from paired library layout
8.4 years ago
thustar ▴ 130

Hi biostars!

I want to assemble some reads from a real dataset such as http://sra.dnanexus.com/studies/ERP000108/runs. However, I am confused by the description of accession ERR011087. It says "library layout: paired", and I do not know what "paired" means here.

The first 24 lines of ERR011087.fastq are:

@ERR011087.1 I330_1_FC30JM6AAXX:4:1:0:199 length=88
TTCANATATGGAAAAACAGGGAGCGGAAATCACGTTACTTGCGTATCATCGGAAAAGGCAGGCTGTCCATGCTCCAACCGGTTAATGA
+ERR011087.1 I330_1_FC30JM6AAXX:4:1:0:199 length=88
IIII"9I;III<+<-45CI13;-=93+046/0<1:-06>4.2+4:I86III0.863;GA@7I:5./2$62110='0(2(0$+++&+(
@ERR011087.2 I330_1_FC30JM6AAXX:4:1:0:242 length=88
ACAANCTTCTCAATCTCGGTCTTTTTCTTGGGGAACTCCTTGGTAATAGAACTTGGAACACAGTCCTTGGATGAATACCGTTCTTTTG
+ERR011087.2 I330_1_FC30JM6AAXX:4:1:0:242 length=88
@?;+"IIIIIIF+FII@9<16I<<BD+B6+4>1&&4%-08)/$$+III4.I@III3CIE:,@+04>8799H015./21/@/51791
@ERR011087.3 I330_1_FC30JM6AAXX:4:1:0:394 length=88
ATCANTTTCACTCAAACCATTAATAACATCTACCTGGTTCTTCAGGCTTCGATTCGTTTAAGGGTGATCAAGAGGCAATCATCAGAAA
+ERR011087.3 I330_1_FC30JM6AAXX:4:1:0:394 length=88
2BI;"IIIIIIIG:8CCB<E?I7I/C1EI4I)4<7;212+F5/;6IIIF<7GI8C?I8'70=7@=$<7+2.-+,4&/*.24,&4*&*
@ERR011087.4 I330_1_FC30JM6AAXX:4:1:0:438 length=88
ACCANCAATATCGGTAACAGTACCCGTCTTGGAACCCTTAACCTGAAGATTGATGGCTTTGGCAGCTTTGGCAACTGGCGTTGCTTTG
+ERR011087.4 I330_1_FC30JM6AAXX:4:1:0:438 length=88
<2>."I7IIII8;=8)(CI;/II81):2>548,+7(&:6?&+-06+DIGCBII6-GIB9<I7I=/911?+4+21;-)43:.20---+-
@ERR011087.5 I330_1_FC30JM6AAXX:4:1:0:740 length=88
ACTGNTCTTTGGCATGGCTCATGAGCATTCCCATCTTGTTTGTCAGCCAGATAGGTGCCAACAACCACCGTCTTGAAGTTTCTACCAT
+ERR011087.5 I330_1_FC30JM6AAXX:4:1:0:740 length=88
3III"IIIIIIIII>;IIIF5I>3;45=IB3=):2<D6;AH0:*5H6IBIIIC9III:II1D=282>3;-11ID:.0,H<,6-'5/7
@ERR011087.6 I330_1_FC30JM6AAXX:4:1:0:753 length=88
ATGANCGCTATGCATGATGATACGACTGTTTTTGTCGCGCGCCTCAGCGTGTGCACCTTTACGCCCAGATATGACGCGACAGCGTTGG
+ERR011087.6 I330_1_FC30JM6AAXX:4:1:0:753 length=88
IIII"IIII3I=I6I=5I18I+;:+4959A&0>&,++(&(-(,90IIIAB;IA;IDIIIF;@G56:+9=?034,0+210'+204+&

I do not see anything in the file that suggests the reads are paired. Does this file contain only single-end reads, or paired-end reads? If it contains paired-end reads, how can I figure out which two reads belong to the same pair?

Thanks.

8.4 years ago
Jenez ▴ 540

To answer your first question:

'Library layout: Paired' simply means that it's a paired-end sequencing run. If you look up ERR011087 at ENA (which in my opinion is simpler to use for raw read downloads), this is what you find: http://www.ebi.ac.uk/ena/data/view/ERR011087&display=html. You can see that two fastq files are listed there.
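
With two files, mates are normally matched by position: the n-th record of the first file belongs with the n-th record of the second. Here is a minimal Python sketch of reading two such files in lockstep and checking that the read names agree. It assumes plain four-line FASTQ records; the function names are made up for illustration.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes the simple four-line-per-record layout (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def spot_id(header):
    """Return the read name without '@' and without a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def paired_records(lines_1, lines_2):
    """Yield (mate1, mate2) tuples, checking that the names agree."""
    for rec1, rec2 in zip_longest(read_fastq(lines_1), read_fastq(lines_2)):
        if rec1 is None or rec2 is None:
            raise ValueError("files contain different numbers of reads")
        if spot_id(rec1[0]) != spot_id(rec2[0]):
            raise ValueError(f"name mismatch: {rec1[0]} vs {rec2[0]}")
        yield rec1, rec2
```

You would pass it two open file handles, one per mate file. If it raises, the two files are not in the same order or do not come from the same run.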

I'm guessing that you retrieved yours through NCBI SRA with fastq-dump? You have to be careful here and explicitly ask for both reads of each pair to be extracted separately, as I believe the default is not to split them...
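
When fastq-dump is run without a splitting option, it writes both mates of a spot as one concatenated record. That would explain a length of 88 for a paired run, for example two reads of 44 bases each, but this is an assumption about this file, not something its headers state. Under that assumption, and assuming both mates have the same length, a concatenated record could be split in Python roughly like this (`split_spot` is a hypothetical helper name):

```python
def split_spot(header, seq, qual, read_len=None):
    """Split one concatenated spot into (mate1, mate2) FASTQ records.

    read_len is the length of the first mate. By default the spot is
    assumed to hold two mates of equal length; this is an assumption
    that should be checked against the run's metadata.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if read_len is None:
        if len(seq) % 2:
            raise ValueError("odd spot length, cannot split evenly")
        read_len = len(seq) // 2
    name = header.split()[0]  # e.g. '@ERR011087.1'
    mate1 = (f"{name}/1", seq[:read_len], qual[:read_len])
    mate2 = (f"{name}/2", seq[read_len:], qual[read_len:])
    return mate1, mate2
```

Re-running the conversion with the reads split at the source is the safer route; this only shows what the concatenated layout means.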

Yes, I used fastq-dump to convert the SRA file to a FASTQ file.

Do you mean that if I run fastq-dump with its default settings, the output will be wrong?

It depends on how you used fastq-dump. For paired-end Illumina runs you would want to use the --split-files option to get the two PE files.
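
As a rough sketch, the call can be wrapped in Python like this. It assumes fastq-dump from the SRA Toolkit is installed and on your PATH. The helper names are made up for illustration, and the input file name ERR011087.sra used below is an assumption about what you downloaded. With --split-files, the two mates are expected to end up in `<accession>_1.fastq` and `<accession>_2.fastq`.

```python
import subprocess
from pathlib import Path


def fastq_dump_command(sra_path, outdir="."):
    """Build the fastq-dump argument list for a paired-end run."""
    return ["fastq-dump", "--split-files", "--outdir", str(outdir), str(sra_path)]


def expected_outputs(sra_path, outdir="."):
    """Paths --split-files is expected to write: <accession>_1.fastq and _2.fastq."""
    accession = Path(sra_path).stem
    return [Path(outdir) / f"{accession}_{mate}.fastq" for mate in (1, 2)]


def dump_paired(sra_path, outdir="."):
    """Run fastq-dump (must be on PATH) and return the expected output paths."""
    subprocess.run(fastq_dump_command(sra_path, outdir), check=True)
    return expected_outputs(sra_path, outdir)
```

Calling `dump_paired("ERR011087.sra")` should then leave two files in the current directory, one per mate, with reads in matching order.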

8.4 years ago
qingxiangg ▴ 40

Simple answer to your question:

ERR011088 is the paired file for ERR011087.

ERR011090 is the paired file for ERR011089.

Check the design description: paired-end files share the same sample, e.g. "Solexa sequencing of MetaHit individual MH0001 random pair end library" for both ERR011088 and ERR011087.

Oh, really? I do not agree with you.

Could you please show me the link to the description? I did not notice that.
