Question

converting RSEM bam output into sam files for DEseq?

0

Entering edit mode

6.8 years ago

Sepd • 0

Hi everyone,

I have used RSEM to calculate expression values and then used ebseq to measure differential expression. Eb seq was not helpful in throwing out transcripts that were not consistent through out replicates so now I am trying to use DEseq. I am planning on converting my transcript.bam files to sam using samtools and then inputting this into DEseq. Does this make sense and is it likely to work?

Thank you

RNA-Seq deseq rsem sam bam • 3.5k views

ADD COMMENT • link 6.8 years ago by Sepd • 0

3

Entering edit mode

DESeq2 doesn't use sam format but uses read counts.

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

3

Entering edit mode

The input to DEseq is a (gene) counts table and not a sam file as far as I know.

In one of the output files of RSEM you should be able to find something read count like which could in theory serve as input for DEseq, however it is likely not the most appropriate one to be used as input

ADD REPLY • link 6.8 years ago by lieven.sterck 15k

0

Entering edit mode

I have a gene.results but they should be normalized read counts. Is there anyway I can get this from converting the BAM file into something? I just don't have time to start over.

ADD REPLY • link 6.8 years ago by Sepd • 0

3

Entering edit mode

I think the fifth column in the gene.results file is the 'expected_count' . This should be a NON normalized count (so diff from TPM or FKPM) . They are however not raw counts! (= what DEseq expects as input).

So if you really do not want to start over, that's probably your best option. However I must stress that starting over is likely the best approach!! (one you should consider doing ;-) )

You should have a bam file of your aligned reads, no. So it's just a matter of using a different tool (eg HTSeq) to get to the counts table, it's not that you will have to remap all your reads. The starting over should only take several minutes to perhaps an hour or so. Thus, do consider in stead of working with non-optimal input

ADD REPLY • link 6.8 years ago by lieven.sterck 15k

0

Entering edit mode

Thank you for your help. Can you explain how to do this? Do I use HTSeq and my bam files from RSEM? Also is DEseq only ran on Rstudio on my local computer? Is there anyway to do it on HPC? Sorry I am a beginner.

You should have a bam file of your aligned reads, no. So it's just a matter of using a different tool (eg HTSeq) to get to the counts table, it's not that you will have to remap all your reads. The starting over should only take several minutes to perhaps an hour or so. Thus, do consider in stead of working with non-optimal input

ADD REPLY • link 6.8 years ago by Sepd • 0

1

Entering edit mode

Yes indeed, you should be able to use the same input BAM file as you used for RSEM (perhaps with diff sorting, that you might need to check). What software did you use to create the BAM file?

No, you can get htseq also a linux distribution, so then you are able to run it in on HPC as well

EDIT: ok, you meant DEseq, my bad :-) , but anyway yes, you can run R and thus also DESeq in a linux environment

ADD REPLY • link 6.8 years ago by lieven.sterck 15k

1

Entering edit mode

Do I use HTSeq and my bam files from RSEM?

In fact, no, the same bam would not work, as RSEM uses as input bam with reads aligned to the transcriptome, and HTSeq uses as input bam with reads aligned to the genome.

One could fool HTSeq and use the transcriptome-aligned bam as input, but reads mapping to several isoforms of the same gene would not counted, because HTSeq discards multi-mappers.

Also is DEseq only ran on Rstudio on my local computer? Is there anyway to do it on HPC?

No need to run DESeq2 on HPC, unless you have hundreds or thousands of samples. A regular laptop / desktop will handle the analysis easily.

ADD REPLY • link 6.8 years ago by h.mon 35k

0

Entering edit mode

RSEM gave me a transcript.bam output file. I was just reviewing my work and I remembered I am trying to convert these bam files into sam for input into htseq because of the sentence below. Will this bam output from RSEM be OK to convert into sam and input into htseq? "Here I used BWA to align the reads to my reference and generate SAM files that are then processed by HTseq count."

ADD REPLY • link 6.8 years ago by Sepd • 0

0

Entering edit mode

I wanted to thank everyone who responded. I decided to use tximport and then DEseq.

ADD REPLY • link 6.8 years ago by Sepd • 0

0

Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. Upvote|Bookmark|Accept

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

score 2 · Answer 1 · 2018-03-15

2

Entering edit mode

6.8 years ago

h.mon 35k

I am planning on converting my transcript.bam files to sam using samtools and then inputting this into DEseq. Does this make sense and is it likely to work?

As others explained, no, it will not work.

RSEM uses a BAM of reads aligned to a reference transcriptome, not to a reference genome. HTSeq needs a bam aligned to the genome, and the genome annotation in gtf format. If you have a transcript to gene mapping, you can use the tximport R package to summarize RSEM transcript counts into gene counts, then use these counts with DESeq2.

edit:

I have a gene.results

If this gene.results is part of RSEM output, I think it is also appropriate (maybe rounding it to integers), and should be really close to the tximport counts.

ADD COMMENT • link 6.8 years ago by h.mon 35k

0

Entering edit mode

by transcript to gene mapping do you mean the transcript.bam output of RSEM?

ADD REPLY • link 6.8 years ago by Sepd • 0

1

Entering edit mode

No, it is just a table of which transcripts correspond to which genes, something like:

transcript_01    gene_01
transcript_02    gene_01
transcript_03    gene_02
transcript_04    gene_03
transcript_05    gene_03

Check the correct format at the tximport manual.

ADD REPLY • link 6.8 years ago by h.mon 35k

score 1 · Answer 2 · 2018-03-15

1

Entering edit mode

6.8 years ago

WouterDeCoster 47k

See the tximport vignette on how to use RSEM with DESeq2.

ADD COMMENT • link 6.8 years ago by WouterDeCoster 47k