I have some RNA-seq data that my PI gave me from an old collaborator. Unfortunately, it looks like they only gave me the TopHat2 output, not the raw FASTQ files, so all I have are sample1_accepted_hits.bam and sample1_unmapped.bam.
I think it is possible to merge the two BAM files, group the records by read name, and then use samtools to extract the reads, but I have a few questions:
1) Will this actually give me all of the original reads from the FASTQ files?
2) Is something like the following the best way to do it?
samtools merge -@ 7 -o sample1_merged.bam sample1_accepted_hits.bam sample1_unmapped.bam
samtools collate -n 7 -u -O -@ 7 sample1_merged.bam | samtools fastq -F 0x900 -@ 7 -1 1.fastq.gz -2 2.fastq.gz -s sample.fastq.gz
3) This seems very slow. Is there a faster way to get the FASTQ files? I am doing this on a workstation with 32 GB of RAM and an 8-core AMD 5700X, so compute is limited.
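For question 1, one rough sanity check I could run after the conversion is to compare the number of primary records in the merged BAM with the number of records in the FASTQ output. This is only a sketch: the two helper function names are my own, and it assumes every original read appears exactly once as a primary (non-secondary, non-supplementary) record across the two TopHat2 BAM files, which I have not verified.

```shell
# Count FASTQ records in one or more gzipped FASTQ files.
# Assumes the standard 4-lines-per-record layout (no wrapped sequences).
count_fastq_reads() {
    gzip -cd "$@" | awk 'END { print NR / 4 }'
}

# Count primary records in a BAM file, i.e. skip secondary (0x100)
# and supplementary (0x800) alignments. Requires samtools on PATH.
count_primary_records() {
    samtools view -c -F 0x900 "$1"
}

# Example with the file names from my commands above:
#   count_primary_records sample1_merged.bam
#   count_fastq_reads 1.fastq.gz 2.fastq.gz sample.fastq.gz
```

If the two numbers differ, that would suggest either that reads were dropped before alignment or that some records went to stdout because no -0 file was given to samtools fastq.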