Dear all,
We have a few hundred large compressed BAM files (70GB <= size <= 300GB). They are not sorted, and we would like to convert them back to fastq in order to align them with a different algorithm.
We have paired-end reads and were planning to first sort the BAM by read name using sambamba (http://lomereiter.github.io/sambamba/docs/sambamba-sort.html) and then to convert the sorted BAM into fastq using bedtools (http://bedtools.readthedocs.io/en/latest/content/tools/bamtofastq.html). However, while the sorting is relatively fast (about 4h for each file), the conversion is very slow.
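For reference, the two-step plan above (name-sort with sambamba, then convert with bedtools) could be scripted roughly as follows. This is only a sketch: the function name `build_commands`, the output file naming, and the thread and memory values are made up for illustration, and the memory string format passed to `-m` is an assumption worth checking against the sambamba documentation. The function only builds the command lines; nothing is executed.

```python
from pathlib import Path


def build_commands(bam, threads=8, memory="8G"):
    """Return (sort_cmd, convert_cmd) as argument lists for one BAM file.

    Output names are hypothetical: <stem>.namesorted.bam and
    <stem>_R1.fastq / <stem>_R2.fastq next to the input file.
    """
    bam = Path(bam)
    sorted_bam = bam.with_name(bam.stem + ".namesorted.bam")
    prefix = bam.with_name(bam.stem)

    sort_cmd = [
        "sambamba", "sort",
        "-n",                   # sort by read name instead of coordinate
        "-t", str(threads),     # number of threads (illustrative value)
        "-m", memory,           # approximate memory limit (assumed format)
        "-o", str(sorted_bam),
        str(bam),
    ]
    convert_cmd = [
        "bedtools", "bamtofastq",
        "-i", str(sorted_bam),
        "-fq", f"{prefix}_R1.fastq",    # first mates
        "-fq2", f"{prefix}_R2.fastq",   # second mates
    ]
    return sort_cmd, convert_cmd


if __name__ == "__main__":
    for cmd in build_commands("sample.bam"):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the options have been verified for the installed tool versions.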
Is anyone aware of any other procedure that will make the conversion faster? I've seen that there are alternatives to bedtools such as picard (https://broadinstitute.github.io/picard/command-line-overview.html#FastqToSam) and biobambam2 (https://github.com/gt1/biobambam2). Does anyone know how these tools perform, whether they have already been benchmarked, and/or whether there are better tools?
Thank you very much in advance :)
Have you tried bam2fastq? https://gsl.hudsonalpha.org/information/software/bam2fastq
It says it's no longer supported, but it still works.
Hi Tonor. Do you know whether bam2fastq is faster than Picard's SamToFastq, and why it has been discontinued in favor of Picard's SamToFastq (as stated at the top of their web page)?
I've never benchmarked them, so I'm not sure which one is faster. I think they stopped actively developing it because Picard tools accomplishes the same task.
I don't think the BAM file has to be sorted for bam2fastq to work, though.
The BAM needs to be name sorted for PE data for bedtools bamtofastq.
Yes, it does need to be sorted if the reads are paired-end (as specified by both picard and bedtools).
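To illustrate why name sorting matters for paired-end data: the converter expects both mates of a pair to sit next to each other in the file. The sketch below checks that property on a plain list of read names. The function name `mates_are_grouped` is hypothetical, and how the names are pulled out of the BAM is left out on purpose.

```python
from itertools import groupby


def mates_are_grouped(read_names):
    """Return True if every read name forms one consecutive run.

    In a name-sorted (or name-grouped) file, both mates of a pair are
    adjacent, so a name never reappears after a different name.
    """
    seen = set()
    for name, _run in groupby(read_names):
        if name in seen:
            # The name came back after other reads: mates are separated.
            return False
        seen.add(name)
    return True


if __name__ == "__main__":
    print(mates_are_grouped(["r1", "r1", "r2", "r2"]))  # mates adjacent
    print(mates_are_grouped(["r1", "r2", "r1", "r2"]))  # mates separated
```

The check keeps every name it has seen in memory, so for files of this size it is probably better run on a subsample than on the whole file.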
I am interested in the fastest way to complete the task, but I will add this tool to the list of programs to benchmark, if no one has done this before.
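For the benchmark itself, a minimal timing harness could look like the sketch below. The helper names `time_command` and `benchmark` are made up, and the command shown is only a placeholder that runs an empty Python process; the real converter command lines would need to be filled in. Wall-clock time from a single run on one subsampled BAM is a rough guide at best, since disk and cache effects can dominate.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run one command (argument list) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises if the tool exits with an error
    return time.perf_counter() - start


def benchmark(commands):
    """Map each label to the elapsed seconds of its command."""
    return {label: time_command(cmd) for label, cmd in commands.items()}


if __name__ == "__main__":
    # Placeholder command; replace with the converter command lines to compare.
    results = benchmark({"noop": [sys.executable, "-c", "pass"]})
    for label, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{label}: {seconds:.2f} s")
```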
I recently used bam2fastq for single-end reads. The problem I found was that reads with more than one alignment in the BAM file ended up appearing more than once in the fastq file.
Yes, that's why for the reformat command I added the "primaryonly" flag; without that it has similar behavior.
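One likely reason for the duplicated reads is that extra alignments of a read are stored as separate records marked with the secondary (0x100) or supplementary (0x800) bit of the SAM flag. Assuming that is the cause here, keeping only primary records gives one entry per read. The sketch below works on plain `(name, flag)` tuples rather than a real BAM, the function names are hypothetical, and it is not claimed to be what the "primaryonly" flag does internally.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def is_primary(flag):
    """True if neither the secondary nor the supplementary bit is set."""
    return not flag & (SECONDARY | SUPPLEMENTARY)


def primary_only(records):
    """Yield only the (name, flag) records that are primary alignments."""
    for name, flag in records:
        if is_primary(flag):
            yield name, flag


if __name__ == "__main__":
    records = [
        ("r1", 0),      # primary, forward strand
        ("r1", 256),    # secondary alignment of the same read
        ("r2", 16),     # primary, reverse strand
        ("r2", 2064),   # supplementary (2048) on the reverse strand (16)
    ]
    for name, flag in primary_only(records):
        print(name, flag)
```

If an aligner writes multiple hits without setting these bits, this filter will not remove them, and the output would need to be deduplicated by read name instead.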