Hello biostars,
I am having a go at using sratoolkit for the first time, and wanted to know if the code I am using is appropriate for my data. I am using the same code for several different types of sequencing experiments and am not sure whether this is optimal. I'll lay out the sequence types and code below. I think --split-3 is irrelevant for single-end data, but I do not know of any negative consequences of using it there. I also think there may be a parameter I am missing when applying fastq-dump to miRNA-seq data.
Example of code:
./fastq-dump --outdir /fastq --split-3 -I -F -B --skip-technical SRR7663647
This code is being used on the following studies:
Study | Assay Type | Library Layout | Instrument
SRP052803 | RNA-seq | PAIRED | Illumina HiSeq 2000
SRP156883 | miRNA-seq | SINGLE | NextSeq 500
SRP156882 | RNA-seq | SINGLE | NextSeq 500
SRP047031 | miRNA-seq | SINGLE | Illumina HiSeq 2000
Any help or advice will be very appreciated.
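To make the flags in the command above easier to read, here is a minimal sketch that spells each short option out in its long form. The comments paraphrase the fastq-dump help text. The DRY_RUN variable and the run_fastq_dump function are illustrative names only, not part of sra-tools, and by default the script just prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: the fastq-dump call from the question, with long-form options.
# DRY_RUN and run_fastq_dump are hypothetical names used for illustration.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

run_fastq_dump() {
  accession="$1"
  # --outdir /fastq   write output files to /fastq
  # --split-3         paired reads go to *_1.fastq and *_2.fastq,
  #                   a single read per spot goes to *.fastq
  # --readids         (-I) append the read id after the spot id on the defline
  # --origfmt         (-F) defline contains only the original sequence name
  # --dumpbase        (-B) base-space sequence (default for non-SOLiD data)
  # --skip-technical  dump only biological reads
  set -- ./fastq-dump --outdir /fastq --split-3 --readids --origfmt \
    --dumpbase --skip-technical "$accession"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_fastq_dump SRR7663647
```

Running it with DRY_RUN=0 would execute the command for real, assuming ./fastq-dump exists in the current directory as in the original call.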
I am going to recommend Phil Ewels' sra-explorer tool (https://ewels.github.io/sra-explorer/#). Search using your study numbers. It uses a shopping cart model: add sequences to the cart, then get direct URLs for downloading the fastq files from EBI-ENA in one click. You can even get a nice bash script to download all the files. Use those links with this guide: Fast download of FASTQ files from the European Nucleotide Archive (ENA)
Hi @genomax, thanks for pointing out this tool to me. I am running the files I intend to download through a loop in R and would like to understand the code for fastq-dump in more depth, so I will not be using sra-explorer for now. I may do in the future, though, as it looks very quick and easy to use.
The link posted by @Santosh Anand gives a nice overview of the options for the fastq-dump command. There is no problem if you use --split-3: if the SRA entry doesn't contain paired-end reads, the parameter will be ignored. I recommend first caching the SRA files using prefetch and then running fastq-dump.
Hi @arup, I had a quick google on what prefetch does. Am I right to think it is a way to speed up the download and avoid downloading the same SRR file multiple times? I am running my code in a loop through R, so I don't think I will encounter this issue.
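The prefetch-then-fastq-dump workflow recommended above could be sketched roughly as follows. This assumes prefetch and fastq-dump are on the PATH and that fastq-dump can resolve the accession from wherever prefetch stored it, which depends on the toolkit version and configuration. OUTDIR, DRY_RUN, run and fetch_and_dump are hypothetical names, and by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: cache each run with prefetch, then convert it with fastq-dump.
# OUTDIR, DRY_RUN, run and fetch_and_dump are illustrative names only.

OUTDIR="${OUTDIR:-/fastq}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# For every accession given: download first, convert only if that succeeded.
fetch_and_dump() {
  for acc in "$@"; do
    run prefetch "$acc" &&
      run fastq-dump --outdir "$OUTDIR" --split-3 --skip-technical "$acc"
  done
}

fetch_and_dump SRR7663647
```

Several accessions can be passed at once, for example the SRR runs belonging to one of the studies listed in the question, so the loop does not need to live in R.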
fastq-dump is used to download the data. Can you provide more details about what you mean?
Since miRNAs are so much smaller than mRNAs, I was thinking there may be an option to target smaller reads.
https://edwards.sdsu.edu/research/fastq-dump/