I have FASTQ files for 10 samples. Each sample has 2 FASTQ files (paired-end). The average file size is 4 GB compressed and 16 GB uncompressed, so the 20 uncompressed files total about 320 GB. Now I want to map them with BWA. I have 10 folders, each containing the 2 files for one sample.
Is it possible to give compressed FASTQ files directly to BWA as input?
What method would you use to map all these files quickly and easily?
Should I just split each file into chunks and map them separately?
I have seen some similar posts and tutorials but did not find an efficient solution, and I think many people here do this often. I would really appreciate your help.
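For reference, `bwa mem` reads gzip-compressed FASTQ directly, so the files do not need to be decompressed first. Below is a minimal Python sketch that walks the sample folders and builds one `bwa mem` command per sample. It assumes a hypothetical layout: one sub-folder per sample holding exactly two files ending in `.fastq.gz`, whose names sort read 1 before read 2 (e.g. `*_R1*` before `*_R2*`). The reference name `ref.fa` and the output naming are placeholders.

```python
from pathlib import Path


def build_bwa_commands(root, ref="ref.fa", threads=8):
    """Return (sample, command, sam_path) for every sample folder under root.

    Hypothetical layout: one sub-folder per sample containing exactly two
    paired-end files ending in .fastq.gz, whose names sort R1 before R2.
    """
    jobs = []
    for sample_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        reads = sorted(sample_dir.glob("*.fastq.gz"))
        if len(reads) != 2:
            raise ValueError(
                f"{sample_dir}: expected 2 FASTQ files, found {len(reads)}"
            )
        r1, r2 = reads
        # bwa mem accepts gzipped FASTQ directly; no need to gunzip first.
        cmd = ["bwa", "mem", "-t", str(threads), ref, str(r1), str(r2)]
        # bwa mem writes SAM to stdout; this is where it would be redirected.
        sam_path = sample_dir / f"{sample_dir.name}.sam"
        jobs.append((sample_dir.name, cmd, sam_path))
    return jobs
```

Calling `build_bwa_commands("samples")` on a folder laid out this way returns the commands without running anything, which makes it easy to check the R1/R2 pairing before starting a long run.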
What compute resources do you have available? A cluster or a single machine?
I have a single machine with 32 GB RAM. I was thinking of running the mapping for all samples at the same time in 10 "screen" sessions. Or should I run them one by one?
You are probably better off running one-at-a-time and using multiple threads (approximately as many threads as you have cores), but you may need to experiment. The point, of course, is to have all the cores busy all the time.
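As a rough illustration of the one-at-a-time approach, here is a minimal Python sketch that runs `bwa mem` on each sample in sequence, setting `-t` to the number of cores reported by `os.cpu_count()` so that each run can keep all cores busy. The sample names, file paths and `ref.fa` (which would need to be indexed beforehand with `bwa index`) are placeholders; with `dry_run=True` the function only returns the commands it would run.

```python
import os
import subprocess
from pathlib import Path


def map_one_at_a_time(jobs, ref, threads=None, dry_run=False):
    """Run bwa mem on each sample in turn, giving each run all available cores.

    jobs: iterable of (sample_name, read1_path, read2_path); the names and
    paths are hypothetical placeholders for your own files.
    """
    threads = threads or os.cpu_count() or 1
    commands = []
    for name, r1, r2 in jobs:
        cmd = ["bwa", "mem", "-t", str(threads), ref, str(r1), str(r2)]
        commands.append(cmd)
        if dry_run:
            continue  # only collect the commands, do not execute bwa
        # bwa mem writes SAM to stdout; redirect it to one file per sample.
        with open(Path(f"{name}.sam"), "w") as sam:
            # check=True raises if bwa exits with an error, stopping the loop.
            subprocess.run(cmd, stdout=sam, check=True)
    return commands
```

Running the samples sequentially also means a failure in one sample is noticed immediately instead of being buried in one of ten screen sessions.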
Thanks for your reply. Could you explain why running them one by one is better? I thought that with 10 screen sessions I could process all samples at the same time.
You could run 10 samples at once, each using 1 core, or run the samples one at a time using 10 threads (or more) for each sample. The advantage of the second approach is that its memory usage will be about 1/10 of the first, because each separate bwa process loads its own copy of the reference index, while the threads of a single process share one copy. The time to complete all 10 samples should be similar.
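The memory argument can be made concrete with a back-of-the-envelope calculation. This Python sketch counts only the reference index, which every separate bwa process loads into its own memory while threads within one process share it. The 5 GB per-process figure is an assumption (roughly the scale of a human-genome index; the reference is not stated in the thread), and per-thread working buffers are ignored.

```python
def peak_index_memory_gb(index_gb, concurrent_processes):
    """Rough peak memory taken by the reference index alone.

    Each separate bwa process holds its own copy of the index, whereas the
    threads of a single `bwa mem -t N` process share one copy.
    """
    return index_gb * concurrent_processes


INDEX_GB = 5.0  # assumption: illustrative per-process index size; measure your own
RAM_GB = 32     # the machine described in the question

ten_processes = peak_index_memory_gb(INDEX_GB, 10)  # 10 samples at once, 1 thread each
one_process = peak_index_memory_gb(INDEX_GB, 1)     # 1 sample at a time, 10 threads

print(f"10 x 1 thread : {ten_processes:.0f} GB (fits in {RAM_GB} GB: {ten_processes <= RAM_GB})")
print(f"1 x 10 threads: {one_process:.0f} GB (fits in {RAM_GB} GB: {one_process <= RAM_GB})")
```

Under that assumption, ten concurrent processes would need about 50 GB for the index alone, more than the 32 GB available, while a single multithreaded run needs about 5 GB.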
Thanks a lot. I will try it.