Question

Convert transcriptomic bam to genomic bam

0

3.1 years ago

noodle ▴ 590

Hi all,

I am mapping bisulfite sequencing reads to a transcriptome and would like to convert the output BAM to a genome-mapped BAM. Is anyone aware of a simple command-line tool that takes a genome/GFF and a transcriptome-mapped BAM file and returns a genome-mapped BAM?

Thanks!

genome bisulfite bam transcriptome • 2.8k views

ADD COMMENT • link 3.1 years ago by noodle ▴ 590

2

The only tool I've seen do that is rsem-tbam2gbam. I'm not sure if that'll be useful in your case.

ADD REPLY • link 3.1 years ago by Ram 44k
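For readers unfamiliar with what such a conversion involves, the core step is translating a position on a spliced transcript into a position on the genome using the exon coordinates from the annotation. The following is a minimal sketch of that idea only, not RSEM's implementation; the function name `transcript_to_genome`, the 0-based half-open exon convention, and the example exon coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # assumed convention: 0-based, half-open genomic interval [start, end)


def transcript_to_genome(exons: List[Exon], strand: str, t_pos: int) -> int:
    """Map a 0-based transcript position to a 0-based genomic position."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if t_pos < 0:
        raise ValueError("transcript position must be non-negative")
    # Walk exons in transcript order: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = t_pos
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # The position falls inside this exon.
            if strand == "+":
                return start + remaining
            return end - 1 - remaining
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical two-exon transcript with exons at [100, 150) and [200, 260).
example_exons = [(100, 150), (200, 260)]
print(transcript_to_genome(example_exons, "+", 0))   # first base of exon 1
print(transcript_to_genome(example_exons, "+", 50))  # first base of exon 2
print(transcript_to_genome(example_exons, "-", 0))   # last genomic base of exon 2
```

A real converter also has to rewrite the CIGAR string, the mate fields, and the header, which is where most of the complications arise.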

0

This is basically exactly what I need, except that it would have to be adapted for a 'bisulfite genome', so it's a bit off... maybe there is some trick to make it work.

ADD REPLY • link 3.1 years ago by noodle ▴ 590

0

May I ask why you're mapping to a transcriptome to begin with? I'm scratching my head over what the use case for that could be.

ADD REPLY • link 3.1 years ago by Friederike 9.0k

0

It's bisulfite-treated RNA, so instead of 4 nucleotides you effectively have 3 (every C becomes T unless it carries a methyl modification, most commonly m5C, which blocks conversion), so there is more potential for incorrect mappings. To map to a bisulfite genome, I would need to take the '+' and '-' strands and do the C -> T conversion anyway. Would you take a different approach?

I would like to convert back to a genome alignment so I don't have to worry about downstream analysis, or simple things like loading in IGV to the 'non-bisulfite' genome, which I do often to manually inspect how reads mapped to ensure there are no artifacts causing a signal (like a SNP).

ADD REPLY • link 3.1 years ago by noodle ▴ 590
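The strand-wise C -> T conversion described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the poster's pipeline: the function names and the toy sequence are hypothetical. The point it shows is that converting C -> T on the '-' strand is equivalent to converting G -> A on the sequence written in '+' strand coordinates, so two converted copies of each reference sequence are needed.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bisulfite_convert_reference(seq: str) -> dict:
    """Build the two fully converted versions of a reference sequence.

    'CT' represents reads from the '+' strand (every C read as T).
    'GA' represents reads from the '-' strand, written in '+' strand coordinates.
    """
    seq = seq.upper()
    return {
        "CT": seq.replace("C", "T"),
        "GA": seq.replace("G", "A"),
    }


# Toy sequence, purely for illustration.
toy = "ACGTCCGA"
converted = bisulfite_convert_reference(toy)
print(converted["CT"])
print(converted["GA"])

# Converting C -> T on the reverse strand and flipping back gives the G -> A version.
minus_strand_converted = reverse_complement(reverse_complement(toy).replace("C", "T"))
print(minus_strand_converted == converted["GA"])
```

Methylated positions are deliberately ignored here: the converted reference assumes full conversion, and unconverted Cs in the reads are what the downstream methylation call is based on.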

0

Ahhhh, I had misunderstood the question :-D I thought you wanted to somehow change the alignment coordinates (e.g. ignore splicing). What you're describing makes a whole lot more sense. In the past I've used IGV to load the data as-is (http://software.broadinstitute.org/software/igv/interpreting_bisulfite_mode). What details are bothering you about it? And which downstream analyses of RNA-seq would actually care about the sequence and would benefit from the full alphabet?

ADD REPLY • link 3.1 years ago by Friederike 9.0k

0

"And which downstream analyses of RNA-seq would actually care about the sequence and would benefit from the full alphabet?"

I have many other treatments of the RNA, so being able to quickly validate sites on the same genome would be very advantageous (here I'm mainly referring to visualization in IGV). It seems rsem-tbam2gbam has many caveats when it converts the BAM... for example, it doesn't handle soft-clipped reads. It also seems to mangle some reads that align perfectly to the transcriptome but end up not mapping well to the genome.

ADD REPLY • link 3.1 years ago by noodle ▴ 590
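To make the soft-clipping issue concrete, here is a hedged sketch of how a transcript-space CIGAR could be rewritten in genome space while carrying soft clips through unchanged. It is not how rsem-tbam2gbam works internally; the function `splice_cigar`, the tuple-based CIGAR representation, and the exon coordinates are assumptions for illustration, and only '+' strand transcripts are handled (a '-' strand transcript would additionally need the read reverse-complemented and the CIGAR reversed).

```python
from typing import List, Tuple

Cigar = List[Tuple[str, int]]  # e.g. [("S", 5), ("M", 20)]
Exon = Tuple[int, int]         # assumed convention: 0-based, half-open [start, end)


def splice_cigar(exons: List[Exon], t_start: int, cigar: Cigar) -> Tuple[int, Cigar]:
    """Convert a '+' strand transcript alignment to (genomic_start, genomic_cigar)."""
    exons = sorted(exons)
    # Find the exon that contains the transcript start position.
    offset, idx = t_start, 0
    while idx < len(exons) and offset >= exons[idx][1] - exons[idx][0]:
        offset -= exons[idx][1] - exons[idx][0]
        idx += 1
    if idx == len(exons):
        raise ValueError("start lies beyond the end of the transcript")
    g_start = exons[idx][0] + offset
    left = exons[idx][1] - g_start  # reference bases left in the current exon

    out: Cigar = []

    def push(op: str, n: int) -> None:
        # Append an operation, merging it with the previous one if identical.
        if n <= 0:
            return
        if out and out[-1][0] == op:
            out[-1] = (op, out[-1][1] + n)
        else:
            out.append((op, n))

    for op, n in cigar:
        if op in ("S", "H", "I"):
            push(op, n)  # these consume no reference bases, so copy them as-is
            continue
        if op not in ("M", "D", "=", "X"):
            raise ValueError(f"unsupported CIGAR operation: {op}")
        while n > 0:
            if left == 0:
                # The current exon is used up: emit an intron skip (N).
                if idx + 1 >= len(exons):
                    raise ValueError("alignment runs past the transcript end")
                push("N", exons[idx + 1][0] - exons[idx][1])
                idx += 1
                left = exons[idx][1] - exons[idx][0]
            step = min(n, left)
            push(op, step)
            n -= step
            left -= step
    return g_start, out


# Hypothetical read: 5 soft-clipped bases, then 20 aligned bases that span
# the junction between exons [100, 150) and [200, 260).
start, genomic = splice_cigar([(100, 150), (200, 260)], 40, [("S", 5), ("M", 20)])
print(start, "".join(f"{n}{op}" for op, n in genomic))
```

Soft clips pass through untouched because they do not consume reference bases; only operations that advance along the reference can be split by an intron.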

0

You could email the RSEM developer and check with them on what you could do to make the script work better for you.

ADD REPLY • link 3.1 years ago by Ram 44k

0

I still don't fully understand the issue with IGV. I seem to recall that I simply loaded my WGBS samples with other samples without giving it a second thought.

ADD REPLY • link 3.1 years ago by Friederike 9.0k

0

Because it's RNA-seq... I have a non-treated control that I can align to a genome to check for DNA contamination. IMO it's better practice to align to a transcriptome in this case.

ADD REPLY • link 3.1 years ago by noodle ▴ 590

0

But the IGV Browser does not care how a BAM file was generated. Have you tried loading your files in IGV? If yes, what bothered you about it?

ADD REPLY • link 3.1 years ago by Friederike 9.0k

0

It's about having the ability to view them alongside non-bisulfite-treated samples... even in a case where you align bisulfite RNA-seq to a genome, you need a 'double genome' in which the + and - strands are kept separate and both have C>T-converted sequences.

ADD REPLY • link 3.1 years ago by noodle ▴ 590