Mapping Large FASTQ Files With BWA
12.5 years ago
Vikas Bansal ★ 2.4k

I have FASTQ files for 10 samples. Each sample has 2 FASTQ files (paired end); the average file size is 4 GB compressed and 16 GB uncompressed. That means I have 20 FASTQ files totalling about 320 GB uncompressed. I want to map them using BWA. I have 10 folders containing 2 files each.

Is it possible to give BWA compressed FASTQ files as input?

What method would you use to map all these files (something fast and easy)?

Should I just split each file and then map it?

I have seen some posts like this one and a tutorial, but did not find an efficient solution, and I think there are a lot of people here who do this often. I would really appreciate your help.

bwa mapping • 16k views

What compute resources do you have available? A cluster or a single machine?

I have a single machine with 32 GB of RAM. I was thinking of mapping all samples at the same time using "screen" (10 screens). Or should I do them one by one?

You are probably better off running one-at-a-time and using multiple threads (approximately as many threads as you have cores), but you may need to experiment. The point, of course, is to have all the cores busy all the time.

Thanks for your reply. Could you please explain why running them one by one is better? I thought that if I ran 10 screens, I could map all samples at the same time.

You could run 10 samples at once, each using 1 core, or run the samples one-at-a-time using 10 threads (or more) for each sample. The advantage of the second over the first is that the memory usage will be about 1/10 that of the first, since each separate BWA process loads its own copy of the index while threads within one process share it. The time to complete all 10 samples should be similar.
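
The one-at-a-time approach could be scripted roughly as follows. This is only a sketch under several assumptions: a BWA version that provides `bwa mem` with its `-t` thread option, a reference already indexed with `bwa index`, and one folder per sample holding exactly two `*.fastq.gz` files whose names sort as read 1 then read 2. The helper names (`find_pairs`, `bwa_mem_command`, `map_all`) and the output file naming are made up for illustration.

```python
import os
import subprocess
from pathlib import Path


def find_pairs(root):
    """Return (sample_name, read1, read2) for each sample folder under root.

    Assumes each folder holds exactly two gzipped FASTQ files whose sorted
    order is read 1 then read 2 (for example *_1.fastq.gz, *_2.fastq.gz).
    """
    pairs = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        reads = sorted(folder.glob("*.fastq.gz"))
        if len(reads) != 2:
            raise ValueError(f"{folder}: expected 2 FASTQ files, found {len(reads)}")
        pairs.append((folder.name, reads[0], reads[1]))
    return pairs


def bwa_mem_command(reference, read1, read2, threads):
    """Build a `bwa mem` command line; -t sets the number of threads."""
    return ["bwa", "mem", "-t", str(threads),
            str(reference), str(read1), str(read2)]


def map_all(root, reference, threads=None):
    """Map the samples one at a time, each run using all available cores."""
    threads = threads or os.cpu_count() or 1
    for name, read1, read2 in find_pairs(root):
        # Hypothetical output location: <root>/<sample>/<sample>.sam
        out_path = Path(root) / name / f"{name}.sam"
        with open(out_path, "w") as out:
            # bwa mem writes SAM to stdout, so redirect it to a file.
            subprocess.run(bwa_mem_command(reference, read1, read2, threads),
                           stdout=out, check=True)
```

Calling `map_all("samples", "reference.fa")` would then process the 10 folders in turn, where both paths are placeholders. You may still need to experiment with the thread count to keep all cores busy.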

Thanks a lot. I will try it.

12.5 years ago

[Edited]

Concerning compressed input in BWA, you should find your answer here: http://www.biostars.org/post/show/5474/bwa-index-on-all-human-grch37-sequences

Apart from that, 2Gb files are not that big, so you could process them separately (i.e. parallelization by data), which shouldn't take too long on a multi-threaded machine.

Thanks. For compressed FASTQ files, it's clear now. Now I am looking for an efficient technique, as mentioned in my original post.

Thanks a lot Sean and Leonor.

12.5 years ago
pinkiii1984v ▴ 20

I too work with compressed files and it is possible to use them with BWA.
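
Since the gzipped files can be passed to BWA as they are, there is no need to keep uncompressed copies on disk. If you want to sanity-check a compressed pair before mapping (for example, that both mates hold the same number of reads), the files can be streamed instead of unpacked. Below is a minimal sketch, assuming standard four-line FASTQ records; `is_gzipped` and `count_reads` are hypothetical helper names, not part of BWA.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def count_reads(path):
    """Count reads in a FASTQ file, streaming it if it is gzip-compressed.

    Assumes each record spans exactly four lines.
    """
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        lines = sum(1 for _ in handle)
    return lines // 4
```

Comparing `count_reads` on the two mate files of a sample is a cheap way to catch a truncated download before spending hours on mapping.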
