3.1 years ago
noodle
Hi all,
I am mapping to a transcriptome for bisulfite sequencing and would like to revert the output bam to a 'genome-mapped' bam - is anyone aware of a simple command line tool that can take a genome/gff and transcriptome-mapped bam file and return a genome-mapped bam?
Thanks!
The only tool I've seen do that is rsem-tbam2gbam. I'm not sure if that'll be useful in your case.
This is basically exactly what I need, except that it would have to be adapted for a 'bisulfite genome', so it's a bit off... maybe there is some trick to make it work.
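For anyone wondering what the conversion boils down to: the core step is lifting transcript offsets onto the exon blocks from the GFF. Below is a minimal sketch of that step only, not what rsem-tbam2gbam actually does internally. The function name `transcript_to_genome` and the input layout (0-based, half-open exon intervals sorted by genomic start) are assumptions made for illustration.

```python
def transcript_to_genome(exons, strand, t_start, t_end):
    """Lift a transcript interval onto genomic blocks.

    exons   : list of (start, end) genomic intervals, 0-based half-open,
              sorted by ascending start (hypothetical input layout).
    strand  : "+" or "-" (strand of the transcript on the genome).
    t_start, t_end : 0-based half-open interval in transcript coordinates,
              measured from the transcript 5' end.
    Returns a list of genomic (start, end) blocks in ascending order.
    """
    length = sum(end - start for start, end in exons)
    if not 0 <= t_start < t_end <= length:
        raise ValueError("interval falls outside the transcript")

    if strand == "-":
        # Re-express the interval as offsets from the left-most genomic base.
        t_start, t_end = length - t_end, length - t_start

    blocks = []
    offset = 0  # transcript bases covered by the exons seen so far
    for start, end in exons:
        exon_len = end - start
        lo = max(t_start, offset)
        hi = min(t_end, offset + exon_len)
        if lo < hi:  # the interval overlaps this exon
            blocks.append((start + lo - offset, start + hi - offset))
        offset += exon_len
    return blocks
```

With two exons at 100-200 and 300-400, `transcript_to_genome([(100, 200), (300, 400)], "+", 90, 110)` gives two blocks, one on each side of the intron. A real tool also has to rewrite CIGAR strings, flags and mate fields, which this sketch ignores.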
May I ask why you're mapping to a transcriptome to begin with? I'm scratching my head about what the use case for that could be.
It's bisulfite treated RNA, so instead of 4 nucleotides you have 3 (all C become T except in cases where there is a methyl-modification, most commonly m5C, which blocks conversion)...so there is more potential for incorrect mappings. To map to a bisulfite genome, I would anyway need to take the '+' and '-' strand and do the C -> T conversion. Would you take a different approach?
I would like to convert back to a genome alignment so I don't have to worry about downstream analysis, or simple things like loading in IGV to the 'non-bisulfite' genome, which I do often to manually inspect how reads mapped to ensure there are no artifacts causing a signal (like a SNP).
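To make the '+' and '-' strand conversion described above concrete, here is a minimal sketch of the in-silico conversion of a reference sequence. The function names and the `_CT` / `_GA` suffixes are hypothetical and not taken from any particular aligner. It relies on the fact that a C>T conversion of the minus strand looks like a G>A change when written in plus-strand coordinates.

```python
# Complement table for the reverse-complement helper.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def convert_plus(seq):
    """Fully converted plus strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def convert_minus(seq):
    """Fully converted minus strand, written in plus-strand coordinates.

    C>T on the minus strand is G>A when viewed from the plus strand.
    """
    return seq.upper().replace("G", "A")


def double_reference(name, seq):
    """Both converted copies of one reference sequence.

    The suffixes are an assumption for illustration only.
    """
    return {
        f"{name}_CT": convert_plus(seq),
        f"{name}_GA": convert_minus(seq),
    }
```

The G>A copy is the same thing you get by reverse-complementing, converting C>T and reverse-complementing back, so either route should give an identical 'double' reference.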
Ahhhh, I had misunderstood the question :-D I thought you wanted to somehow change the alignment coordinates (e.g. ignore splicing). What you're describing makes a whole lot more sense. In the past I've used IGV to load the data as-is (http://software.broadinstitute.org/software/igv/interpreting_bisulfite_mode); what details are bothering you about it? And which downstream analyses of RNA-seq would actually care about the sequence and benefit from the full alphabet?
I have many other treatments to the RNA, so being able to quickly validate sites on the same genome would be very advantageous (for now I mainly mean visualizing in IGV). It seems rsem-tbam2gbam has many caveats when it converts the bam... for example, it doesn't handle soft-clipped reads. It also seems to mangle some reads that align 'perfectly' to the transcriptome but end up not mapping well to the genome.
You could email the RSEM developer and check with them on what you could do to make the script work better for you.
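On the soft-clipping caveat: in principle soft clips can be carried through a conversion, because S operations do not consume reference bases. Here is a rough sketch of lifting a transcript-space CIGAR to genome space, inserting N operations at exon junctions. It is not how RSEM does it. The function `lift_cigar` is hypothetical, it assumes a plus-strand transcript with 0-based half-open exons, and a minus-strand transcript would additionally need the CIGAR reversed.

```python
import re


def lift_cigar(exons, t_pos, cigar):
    """Lift a transcript-space alignment to genome space (plus strand only).

    exons : sorted list of (start, end) genomic intervals, 0-based half-open.
    t_pos : 0-based transcript position of the first aligned (unclipped) base.
    cigar : CIGAR string of the transcript-space alignment.
    Returns (genome_pos, genome_cigar).
    """
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

    # Locate the exon that contains t_pos.
    offset, idx = 0, None
    for i, (start, end) in enumerate(exons):
        if t_pos < offset + (end - start):
            idx = i
            break
        offset += end - start
    if idx is None:
        raise ValueError("t_pos lies beyond the transcript end")

    g = exons[idx][0] + (t_pos - offset)  # current genomic position
    g_start = g
    out = []

    def push(n, op):
        # Append an operation, merging with the previous one if identical.
        if n == 0:
            return
        if out and out[-1][1] == op:
            out[-1] = (out[-1][0] + n, op)
        else:
            out.append((n, op))

    for n, op in ops:
        if op in "SHIP":
            # Clips, insertions and padding consume no reference bases,
            # so they pass through unchanged.
            push(n, op)
            continue
        while n > 0:
            if g == exons[idx][1]:
                # Reached the exon end: skip the intron with an N operation.
                if idx + 1 >= len(exons):
                    raise ValueError("alignment runs off the transcript")
                push(exons[idx + 1][0] - g, "N")
                idx += 1
                g = exons[idx][0]
            step = min(n, exons[idx][1] - g)
            push(step, op)
            g += step
            n -= step

    return g_start, "".join(f"{n}{op}" for n, op in out)
```

For example, with exons at 100-200 and 300-400, a read at transcript position 95 with CIGAR `3S10M2S` keeps both clips and gains a 100 bp intron in the middle of the match.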
I still don't fully understand the issue with IGV. I seem to recall that I simply loaded my WGBS samples with other samples without giving it a second thought.
Because it's RNA-seq... I have a non-treated control which I can align to a genome to check for DNA contamination. IMO it's better practice to align to a transcriptome in this case.
But the IGV Browser does not care how a BAM file was generated. Have you tried loading your files in IGV? If yes, what bothered you about it?
It's about having the ability to view them alongside non-bisulfite-treated samples... even in a case where you align bisulfite RNA-seq to a genome, you need a 'double genome' in which the + and - strands are kept separate and each has a C>T-converted sequence.