Hi all,
I was wondering if anyone could offer me some advice on using paired-end reads with TopHat, specifically on handling its output. I'm planning to use TopHat as part of a pipeline for processing my sequence data. The reads that map will obviously be easy to deal with, but the unmapped.bam file is proving a bit problematic. I would like to turn that BAM file back into two FASTQ files containing the read pairs that didn't map to the reference genome (hg19 in this case). My plan was to convert it to SAM and then run Picard's SamToFastq, but that returns the following error:
MAPQ must be zero if RNAME is not specified;
I haven't been able to find anything about this error online. I'm also not sure how time-consuming this will be. For now I'm just experimenting with a random sample of my data to get everything working, but my actual data files will probably be 20 GB or more, at least in FASTQ format.
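As a rough alternative to Picard, the FLAG field alone is enough to split the unmapped reads into two FASTQ files. The sketch below is an assumption-laden illustration, not TopHat's or Picard's method: the function name `sam_to_paired_fastq` is hypothetical, and it expects SAM text such as the output of `samtools view unmapped.bam`. It uses bits 0x40/0x80 to tell mate 1 from mate 2, reverse-complements records flagged 0x10, and holds each read until its mate turns up so the two files stay in the same order.

```python
from typing import Dict, Iterable, TextIO, Tuple

# Translation table used to reverse-complement reads stored on the minus strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_paired_fastq(sam_lines: Iterable[str], out1: TextIO, out2: TextIO) -> int:
    """Write mate 1 to out1 and mate 2 to out2; return how many reads lacked a mate."""
    pending: Dict[str, Tuple[bool, str]] = {}  # QNAME -> (is_mate1, FASTQ record)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        if not flag & 0xC0:
            continue  # neither first (0x40) nor second (0x80) in pair
        if flag & 0x10:
            # Stored reverse-complemented: restore the original read orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        is_mate1 = bool(flag & 0x40)
        record = f"@{name}\n{seq}\n+\n{qual}\n"
        mate = pending.pop(name, None)
        if mate is None:
            pending[name] = (is_mate1, record)  # wait for the other mate
            continue
        mate_record = mate[1]
        first, second = (record, mate_record) if is_mate1 else (mate_record, record)
        out1.write(first)
        out2.write(second)
    return len(pending)
```

Reads whose mate never appears (for example because the mate mapped and ended up in accepted_hits.bam) are counted rather than written. The buffer keeps such reads in memory, so on a 20 GB dataset it helps to name-sort the input first (`samtools sort -n`), which makes mates arrive together and keeps the buffer small.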
I was also thinking of converting accepted_hits.bam to SAM and then writing a Unix script that takes the FASTQ files that were input into TopHat and writes every read that isn't present in accepted_hits into two new files.
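The filtering idea in the previous paragraph can be sketched in Python roughly as follows. All function names are hypothetical, and the sketch assumes that accepted_hits has been converted to SAM text, that the two input FASTQ files list mates in the same order, and that a SAM QNAME equals the FASTQ name without the leading `@`, any comment after whitespace, and any `/1` or `/2` suffix.

```python
from typing import Iterable, Iterator, List, Set, TextIO


def mapped_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect the QNAME of every alignment line in accepted_hits (SAM text)."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.strip() and not line.startswith("@"):
            names.add(line.split("\t", 1)[0])
    return names


def read_key(header: str) -> str:
    """Turn a FASTQ header into a SAM-style QNAME (drop '@', comment, /1 or /2)."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # end of file
        yield record


def write_unmapped_pairs(fq1: TextIO, fq2: TextIO, names: Set[str],
                         out1: TextIO, out2: TextIO) -> int:
    """Copy pairs whose name is absent from `names`; return the number kept."""
    kept = 0
    for rec1, rec2 in zip(fastq_records(fq1), fastq_records(fq2)):
        # Both mates share one QNAME, so a single lookup covers the pair.
        if read_key(rec1[0]) not in names:
            out1.writelines(rec1)
            out2.writelines(rec2)
            kept += 1
    return kept
```

Set membership keeps the run time roughly linear in the size of the input, which matters for 20 GB files; the trade-off is that the set holds every mapped read name in memory. Note that a pair is dropped if either mate mapped, since both mates carry the same name.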
What do you think?
Thank you very much, sir.
Or LENIENT. With LENIENT it will still work, but you'll get the warnings.
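Relaxing Picard's validation (its VALIDATION_STRINGENCY option accepts STRICT, LENIENT or SILENT) sidesteps the error; another option is to repair the offending records. A minimal sketch, assuming, as the error message implies, that the problem records have RNAME `*` but a nonzero MAPQ; the function name is hypothetical and it works on SAM text such as `samtools view -h` output.

```python
from typing import Iterable, Iterator


def zero_mapq_for_unplaced(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines, setting MAPQ (column 5) to 0 wherever RNAME (column 3) is '*'."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 11 and fields[2] == "*" and fields[4] != "0":
            fields[4] = "0"  # unplaced read: MAPQ must be zero
            line = "\t".join(fields) + "\n"
        yield line
```

The repaired SAM can then be converted back to BAM with samtools and passed to SamToFastq without lowering the validation stringency.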