Question

Using tophat2 output for kallisto


9 months ago

Wintermute • 0

I have some RNASeq data that my PI gave me from an old collaborator. Unfortunately, it looks like they only gave me the output of tophat2, not the raw fastq files, so I have sample1_accepted_hits.bam and sample1_unmapped.bam files.

I think it is possible to merge the two BAM files, sort them by name, and then use samtools to extract the reads, but I have a few questions:

1) Will this actually give me all of the original reads from the FASTQ?

2) Is something like this the best way to handle this?

samtools merge -@ 7 -o sample1_merged.bam sample1_accepted_hits.bam sample1_unmapped.bam

samtools collate -n 7 -u -O -@ 7 sample1_merged.bam | samtools fastq -F 0x900 -@ 7 -1 1.fastq.gz -2 2.fastq.gz -s sample.fastq.gz

3) This seems very slow. Is there a faster way to get the FASTQ files? I am doing this on a workstation with 32 GB RAM and an 8-core AMD 5700X, so there are compute limitations.

bam tophat2 kallisto • 386 views

updated 9 months ago by dsull ★ 6.9k • written 9 months ago by Wintermute • 0

score 0 · Answer 1 · 2024-02-22

You should indeed use samtools to extract the original FASTQ files. It will be slow, but it won't take up much memory (it mostly needs disk space for temporary files), so just wait it out. I don't see why "compute limitations" would matter if you can simply wait a bit longer.

There may be faster tools (can't think of any off the top of my head), but I'd just say wait it out.
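
Once the extraction finishes, one way to sanity-check the result before running kallisto is to confirm that the two mate files contain the same number of records. The sketch below is only an illustration: the helper names `count_fastq_reads` and `check_pairs` are made up here, and it assumes standard 4-line FASTQ records in gzip-compressed files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of samtools or kallisto).

# Print the number of records in a gzip-compressed FASTQ file.
# Assumes exactly 4 lines per record (no wrapped sequence lines).
count_fastq_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Print both counts and return non-zero if the mate files disagree.
check_pairs() {
    local n1 n2
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    echo "R1=$n1 R2=$n2"
    [ "$n1" -eq "$n2" ]
}
```

With the file names from the question, this would be called as `check_pairs 1.fastq.gz 2.fastq.gz`. Matching counts do not prove that every original read was recovered, but a mismatch is a quick sign that something went wrong during merging or collation.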