Question

How to deal with SRA files that generate three FASTQ files?

1

6.1 years ago

fanglujing ▴ 60

Hi, I downloaded an SRA file from NCBI, SRR4242282.sra, and got three FASTQ files after using fastq-dump to extract the reads. Command: fastq-dump --split-3 --gzip SRR4242282.sra. I don't understand this result, as I haven't encountered it before. Any suggestions would be appreciated.

sra fastq-dump fastq • 3.8k views

updated 6.1 years ago by thomaskuilman ▴ 850 • written 6.1 years ago by fanglujing ▴ 60

1

It looks like the submitter may have submitted the index sequences in a separate file, since the corresponding ENA entry also shows three FASTQ files. Examine the files to see which one contains the index sequences. It should be readily apparent because its reads are short.

Edit: I will leave this here in case other submitters have done this.

OP: Please confirm if t.kuilman's explanation is applicable in your case.
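
One way to examine the files, as suggested above, is to tabulate read counts and read lengths per file. The following is a minimal sketch, assuming standard 4-line FASTQ records. The function name read_length_summary is hypothetical, and the file name in the usage note is assumed from the command in the question.

```python
import gzip
from collections import Counter


def read_length_summary(path):
    """Return (read_count, Counter of read lengths) for a FASTQ file.

    Assumes standard 4-line FASTQ records. Files ending in .gz are
    opened with gzip; anything else is read as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_number % 4 == 1:
                lengths[len(line.rstrip("\r\n"))] += 1
    return sum(lengths.values()), lengths
```

For example, read_length_summary("SRR4242282_1.fastq.gz") returns the number of reads and a length histogram. A file of index reads would show up as one dominated by very short lengths, while a file of unpaired reads would typically show a much smaller read count than the _1 and _2 files.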

6.1 years ago by GenoMax 147k

0

I have checked the FASTQ contents, and I think t.kuilman's suggestion applies in this situation. Thanks for the reply.

6.1 years ago by fanglujing ▴ 60

0

6.1 years ago

Santosh Anand 5.8k

Either use only the _1 and _2 files, or use the option --split-files instead of --split-3.

See the manual/help page:

  --split-files                    Dump each read into separate file.Files 
                                   will receive suffix corresponding to read 
                                   number 
  --split-3                        Legacy 3-file splitting for mate-pairs: 
                                   First biological reads satisfying dumping 
                                   conditions are placed in files *_1.fastq and 
                                   *_2.fastq If only one biological read is 
                                   present it is placed in *.fastq Biological 
                                   reads and above are ignored.
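
To make the --split-3 description above concrete, here is a toy model of the routing it describes. This is only a sketch based on one reading of the help text, not fastq-dump's actual implementation. The function name split_3 and the (name, reads) input layout are hypothetical, and treating reads beyond the second as ignored is an assumption about what the last sentence of the help text means.

```python
def split_3(spots):
    """Toy model of --split-3 routing, following the help text.

    `spots` is an iterable of (name, biological_reads) pairs, where
    biological_reads is a list of sequences for that spot.
    Returns three lists standing in for *_1.fastq, *_2.fastq and *.fastq.
    This is an illustration only, NOT fastq-dump's code.
    """
    first, second, single = [], [], []
    for name, reads in spots:
        if len(reads) >= 2:
            # Paired spot: the first two biological reads go to the paired
            # files. Any further reads are dropped (assumed from the help text).
            first.append((name, reads[0]))
            second.append((name, reads[1]))
        elif len(reads) == 1:
            # A lone biological read goes to the unsuffixed file.
            single.append((name, reads[0]))
    return first, second, single
```

Under this model, an archive that mixes paired and single-read spots yields three outputs: two paired files of equal length plus one file of leftover single reads.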

score 4 · Accepted Answer · 2018-10-10

4

6.1 years ago

thomaskuilman ▴ 850

Please see my previous post: this happens because BOTH paired and unpaired reads are included in these SRA files. The --split-files option does not work here, since it would produce FASTQ files that are incomplete. What you did is correct; simply use the files ending in _1 and _2. The remaining file contains the unpaired reads and can be discarded.
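
Before discarding the third file, it may be worth confirming that the _1 and _2 files really are properly paired. The following is a minimal sketch, assuming 4-line FASTQ records and that both mates carry the same read name before the first space in the header line. The function names read_names and mates_in_sync are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record.

    The name is taken as the header text after '@' up to the first
    whitespace. Files ending in .gz are opened with gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The header is the first line of every 4-line record.
            if line_number % 4 == 0:
                fields = line[1:].split()
                yield fields[0] if fields else ""


def mates_in_sync(path_1, path_2):
    """Return True if both files hold the same read names in the same order."""
    # zip_longest pads the shorter file with None, so a length mismatch
    # is reported as out of sync rather than silently ignored.
    for name_1, name_2 in zip_longest(read_names(path_1), read_names(path_2)):
        if name_1 != name_2:
            return False
    return True
```

If mates_in_sync returns True for the _1 and _2 files, they can be passed to a paired-end aligner as they are. If the headers carry mate-specific suffixes such as /1 and /2, the name comparison would need to strip those first.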