Sra File Fastq-Dump Command
13.1 years ago
Varun Gupta ★ 1.3k

Hi everyone, I am working with SRA files and need to convert them to FASTQ files. When I ran fastq-dump on my SRR786.sra file, it gave me three files: SRR786_1.fastq, SRR786_2.fastq and SRR786.fastq. According to the SRA documentation, running fastq-dump --split-3 SRR786.sra produces three files like the ones I got. In addition, SRR786.fastq is much smaller than the other two files, SRR786_1.fastq and SRR786_2.fastq.

It would be really kind if you could explain why three files were generated when I did not give the split option, and what the generation of this small FASTQ file means.

Regards Varun

sra • 18k views

Perhaps _1 and _2 are the paired-end reads and the other file is a single-end library.


But the single-end library, as you call it, is far too small compared to _1 and _2. Any suggestions on this?

Regards Varun

13.1 years ago
Nico ▴ 190

It's documented here

section 5.5 Basic Execution of ‘fastq-dump’ Utility

"The file, ‘SRR000001.fastq’, contains fragment read sequences where only a single biological/application read exists (or remained after filtering) for a spot. The first spot in the file will look like this"...

It's not very clear to me what the "small" file is, but I guess that what you want to use are the 2 "large" ones.
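If the two large files are the ones to use, one quick sanity check is that they contain the same reads in the same order. The sketch below is only an illustration: it assumes strict 4-line FASTQ records and that both mates share the same first token in their header line, and the sample records and the SRR786 names are made up for the example.

```python
import io
from typing import Iterator, TextIO


def read_names(handle: TextIO) -> Iterator[str]:
    """Yield the read name of each record, assuming 4 lines per record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:  # header line of a record
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header")
            fields = line[1:].split()
            # keep only the first token, e.g. "SRR786.1"
            yield fields[0] if fields else ""


# Tiny in-memory stand-ins for the _1 and _2 files (hypothetical data).
mate1 = "@SRR786.1 length=4\nACGT\n+\nIIII\n@SRR786.2 length=4\nGGCC\n+\nIIII\n"
mate2 = "@SRR786.1 length=4\nTTGA\n+\nIIII\n@SRR786.2 length=4\nCCAA\n+\nIIII\n"

names1 = list(read_names(io.StringIO(mate1)))
names2 = list(read_names(io.StringIO(mate2)))

# The files are consistent if they list the same reads in the same order.
paired_ok = names1 == names2
print(len(names1), len(names2), paired_ok)
```

For real data, the `io.StringIO(...)` objects could be replaced with `open(...)` on the two large files. For files of this size, comparing the names in a streaming fashion rather than building lists would be kinder to memory.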


Hi Nico, thanks for your reply. I read that, but I could not understand why the third file (.fastq) is generated at all when I simply used the command fastq-dump and not fastq-dump --split-3. By "small file" I meant that SRR786.fastq was far smaller than the other two files.


The linked documentation suggests that the smaller the third file is, the better. It holds the reads for which only one biological read could be dumped: "--split-3

Legacy 3-file splitting for mate-pairs used for the 1000 Genomes fastq files. First 2 biological reads satisfying dumping conditions are placed in files _1.fastq and _2.fastq. If only 1 biological read is dumpable - it is placed in *.fastq Biological reads 3 and above are ignored."
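To make the quoted rule concrete, here is a toy model of it in Python. This is not the toolkit's code, just a sketch of the routing the text describes: each spot is represented as a list of its dumpable biological reads (an assumption of this example), and the sequences and the SRR786 accession are made up.

```python
from typing import Dict, List


def split3(accession: str, spots: List[List[str]]) -> Dict[str, List[str]]:
    """Route each spot's dumpable reads as the quoted --split-3 text describes."""
    first = f"{accession}_1.fastq"
    second = f"{accession}_2.fastq"
    single = f"{accession}.fastq"
    files: Dict[str, List[str]] = {first: [], second: [], single: []}
    for reads in spots:
        if len(reads) >= 2:
            # First two biological reads go to _1 and _2; reads 3+ are ignored.
            files[first].append(reads[0])
            files[second].append(reads[1])
        elif len(reads) == 1:
            # Only one dumpable read: it lands in the unsuffixed file.
            files[single].append(reads[0])
        # Spots with no dumpable reads produce no output.
    return files


# Hypothetical spots: a pair, a lone read, a spot with three reads, an empty spot.
spots = [["ACGT", "TTGA"], ["GGCC"], ["AAAA", "CCCC", "GGGG"], []]
result = split3("SRR786", spots)
for name, reads in result.items():
    print(name, reads)
```

In this model the unsuffixed file only collects spots where a single read survived, which would explain why it is so much smaller than the _1 and _2 files.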

13.1 years ago

Maybe you could try -v -v -v -v

‘-v’ or ‘--verbose’ Increases the verbosity level of the program. Use multiple times for more verbosity.

Ha ha.

Really, I don't know why you are seeing what you do. Could you try getting the sra.lite version of the file instead, just to see if it's doing the same thing?

I don't know if this helps at all, but maybe check your version of the sra toolkit and contact the help desk over there if you keep having problems.

How To Convert Sra-Lite Paired-End Submission To Fastq?
