Hi everyone,
I have used RSEM to calculate expression values and then used ebseq to measure differential expression. Eb seq was not helpful in throwing out transcripts that were not consistent through out replicates so now I am trying to use DEseq. I am planning on converting my transcript.bam files to sam using samtools and then inputting this into DEseq. Does this make sense and is it likely to work?
Thank you
DESeq2 doesn't use sam format but uses read counts.
The input to DEseq is a (gene) counts table and not a sam file as far as I know.
In one of the output files of RSEM you should be able to find something read count like which could in theory serve as input for DEseq, however it is likely not the most appropriate one to be used as input
I have a gene.results but they should be normalized read counts. Is there anyway I can get this from converting the BAM file into something? I just don't have time to start over.
I think the fifth column in the gene.results file is the 'expected_count' . This should be a NON normalized count (so diff from TPM or FKPM) . They are however not raw counts! (= what DEseq expects as input).
So if you really do not want to start over, that's probably your best option. However I must stress that starting over is likely the best approach!! (one you should consider doing ;-) )
You should have a bam file of your aligned reads, no. So it's just a matter of using a different tool (eg HTSeq) to get to the counts table, it's not that you will have to remap all your reads. The starting over should only take several minutes to perhaps an hour or so. Thus, do consider in stead of working with non-optimal input
Thank you for your help. Can you explain how to do this? Do I use HTSeq and my bam files from RSEM? Also is DEseq only ran on Rstudio on my local computer? Is there anyway to do it on HPC? Sorry I am a beginner.
You should have a bam file of your aligned reads, no. So it's just a matter of using a different tool (eg HTSeq) to get to the counts table, it's not that you will have to remap all your reads. The starting over should only take several minutes to perhaps an hour or so. Thus, do consider in stead of working with non-optimal input
Yes indeed, you should be able to use the same input BAM file as you used for RSEM (perhaps with diff sorting, that you might need to check). What software did you use to create the BAM file?
No, you can get htseq also a linux distribution, so then you are able to run it in on HPC as well
EDIT: ok, you meant DEseq, my bad :-) , but anyway yes, you can run R and thus also DESeq in a linux environment
In fact, no, the same bam would not work, as RSEM uses as input bam with reads aligned to the transcriptome, and HTSeq uses as input bam with reads aligned to the genome.
One could fool HTSeq and use the transcriptome-aligned bam as input, but reads mapping to several isoforms of the same gene would not counted, because HTSeq discards multi-mappers.
No need to run DESeq2 on HPC, unless you have hundreds or thousands of samples. A regular laptop / desktop will handle the analysis easily.
RSEM gave me a transcript.bam output file. I was just reviewing my work and I remembered I am trying to convert these bam files into sam for input into htseq because of the sentence below. Will this bam output from RSEM be OK to convert into sam and input into htseq? "Here I used BWA to align the reads to my reference and generate SAM files that are then processed by HTseq count."
I wanted to thank everyone who responded. I decided to use tximport and then DEseq.
If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Please use
ADD COMMENT
orADD REPLY
to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.