Hi all, I am trying to optimize an RNA-seq pipeline and I want to estimate the RAM requirements for FASTQ files of different sizes. So far I have tested on files from typical RNA-seq experiments of ~30 to 40 million reads. I now want to test on much larger data, where the file is close to 50 GB in size.
I was wondering where I can obtain such files for testing. Can anyone point me to publicly available datasets with more sequences than the ones I have already used? Anything with >= 150 million reads would also be okay.
Thanks, RK
You can try merging multiple samples into one big FASTQ file, using the cat or zcat command.
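For example, here is a minimal sketch assuming gzipped single-end FASTQ files (the sample names are placeholders; for paired-end data, merge the R1 and R2 files separately, keeping the samples in the same order):

    # Gzip streams can be concatenated directly, no need to decompress first
    cat sample1.fastq.gz sample2.fastq.gz sample3.fastq.gz > merged.fastq.gz

    # Sanity check: each FASTQ record is 4 lines, so reads = lines / 4
    echo $(( $(zcat merged.fastq.gz | wc -l) / 4 ))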
Hi, thanks for your reply. I used some samples from ENCODE that were ~30 GB (~200 million reads) to run jobs on AWS Batch, and I got OutOfMemory errors. I am aligning to the human genome. I used the same pipeline before for a basic RNA-seq experiment with 25 to 30 million reads and didn't have any problems then.
Do you have any idea what the problem could be?
Without details on the pipeline this is impossible to answer. Please add comments via
ADD COMMENT/REPLY
to keep things organized.