Hello again. I am new to bioinformatics and I need some guidance. I am trying to do an RNA-seq analysis, but the output seems a bit wrong. I have one cell type (call it h) at 0, 3 and 6 hours, with 2 replicates each, and I want to compare 0h to 3h and 0h to 6h. So my samples are h0b, h0c, h3b, h3c, h6b and h6c (each a .fastq file; h0b.fastq, for example, is about 5.8 GB). I got the hg19 GTF from the UCSC Table Browser (RefSeq Genes). My pipeline is:
- tophat2 --library-type fr-unstranded -g 1 -G ...hg19.gtf ....bowtie2index/hg19genome h0b.fastq
- cufflinks -g ...hg19.gtf -o ./cufflinks ...acceptedhits.bam
- cuffmerge -g ...hg19.gtf h0-3.txt
- cuffdiff -L h0,h3 ...h0-3merged.gtf h0bacceptedhits.bam,h0cacceptedhits.bam h3bacceptedhits.bam,h3cacceptedhits.bam
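If the TopHat2/Cufflinks route is rerun anyway, note that TopHat2 writes to ./tophat_out unless -o is given, so samples run one after another in the same directory overwrite each other's accepted_hits.bam. A minimal dry-run sketch that prints per-sample commands with separate output directories; GTF and BT2_INDEX are hypothetical placeholders for the elided paths above, and the flags are the ones already used in the pipeline plus -o.

```shell
#!/usr/bin/env bash
# Dry run: prints (does not execute) per-sample TopHat2 and Cufflinks commands.
# GTF and BT2_INDEX are hypothetical placeholders; point them at your own files.
set -euo pipefail

GTF="hg19_refseq.gtf"          # hypothetical path to the UCSC RefSeq GTF
BT2_INDEX="bowtie2index/hg19"  # hypothetical Bowtie2 index prefix
SAMPLES=(h0b h0c h3b h3c h6b h6c)

tophat_commands() {
  local s
  for s in "${SAMPLES[@]}"; do
    # Without -o, every TopHat2 run writes to ./tophat_out and replaces the previous sample.
    echo "tophat2 --library-type fr-unstranded -g 1 -G $GTF -o tophat_$s $BT2_INDEX $s.fastq"
    # Cufflinks reads the BAM from that sample's own TopHat2 directory.
    echo "cufflinks -g $GTF -o cufflinks_$s tophat_$s/accepted_hits.bam"
  done
}

# Review the printed commands, then run them with: tophat_commands | bash
tophat_commands
```

Printing the commands first makes it easy to check every path before anything long-running starts; the cuffmerge and cuffdiff steps stay as they are, pointed at the per-sample directories.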
I don't specify -p because I have seen many bugs reported when it is used. The outputs are small in size. I say the output is "a bit wrong" because my results differ from those of a colleague who ran the same process on usegalaxy (I am trying to learn to use the terminal instead), and the gene_exp.diff file (opened in Excel) looks a bit off.
Do you see any mistakes I made? Is there anything I could clarify for you? I am pretty sure I haven't explained everything as well as I could.
I uploaded an output so you can see the problem: https://files.fm/u/vss3dd6v. If you filter for only the significant genes, there are very few.
Thanks in advance.
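A minimal sketch for checking the "very few significant genes" observation from the terminal rather than Excel, assuming the standard cuffdiff gene_exp.diff layout with a tab-separated header row containing `status` and `significant` columns. Tallying the status values first shows how many genes were actually tested: genes reported as NOTEST, LOWDATA, HIDATA or FAIL were not successfully tested, so a large count there leaves few candidates for significance.

```shell
#!/usr/bin/env bash
# Sketch for inspecting cuffdiff's gene_exp.diff without Excel.
# Columns are located by header name ("status", "significant"), not by position.
set -euo pipefail

# Print the header plus every row cuffdiff marked significant == "yes".
significant_genes() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") c = i; print; next }
    c && $c == "yes"
  ' "$1"
}

# Tally the "status" column (OK, NOTEST, LOWDATA, HIDATA, FAIL), sorted by status name.
status_counts() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") c = i; next }
    c { n[$c]++ }
    END { for (s in n) print s "\t" n[s] }
  ' "$1" | sort
}

# Usage (the path is hypothetical):
#   status_counts cuffdiff_out/gene_exp.diff
#   significant_genes cuffdiff_out/gene_exp.diff | tail -n +2 | wc -l
```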
I strongly encourage you not to use tophat2 or any of the cuff* tools. They are outdated and no longer best practice. Use a more modern aligner, such as STAR or hisat2. You'd also be better served by featureCounts and DESeq2/edgeR/limma rather than cufflinks, and these will also prove to be much faster. Finally, for new projects, you'd be best served by using hg38 rather than hg19.
Thank you for your reply. So what pipeline should I use? I have no experience with these programs. Also, is hg38 complete? Someone told me that it isn't finished yet.
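A minimal dry-run sketch of the hisat2 -> featureCounts part of this suggestion, for the six single-end samples in the question. It only prints the commands; HISAT2_INDEX, GTF and THREADS are hypothetical placeholders, and the index and GTF must come from the same genome build (for example, both hg38).

```shell
#!/usr/bin/env bash
# Dry-run sketch of HISAT2 -> featureCounts for six single-end samples.
# Prints commands only; HISAT2_INDEX and GTF are hypothetical placeholders.
set -euo pipefail

HISAT2_INDEX="hisat2_index/hg38"   # hypothetical index prefix (built with hisat2-build)
GTF="hg38_annotation.gtf"          # hypothetical GTF; must match the genome build
THREADS=4
SAMPLES=(h0b h0c h3b h3c h6b h6c)

count_pipeline() {
  local s
  local bams=()
  for s in "${SAMPLES[@]}"; do
    # Align, keep the alignment summary (stderr) in a log, and sort straight into a BAM.
    echo "hisat2 -p $THREADS -x $HISAT2_INDEX -U $s.fastq 2> $s.hisat2.log | samtools sort -o $s.bam -"
    bams+=("$s.bam")
  done
  # One gene-level count matrix for all samples; -s 0 means unstranded,
  # matching --library-type fr-unstranded in the original TopHat2 call.
  echo "featureCounts -T $THREADS -s 0 -a $GTF -o counts.txt ${bams[*]}"
}

count_pipeline
```

counts.txt holds raw counts per gene and sample (after featureCounts' annotation columns Geneid, Chr, Start, End, Strand and Length), which is the kind of input DESeq2, edgeR and limma-voom expect.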
I have seen many posts suggesting HISAT over TopHat, but I have observed odd behaviour: on one human RNA-seq dataset aligned against GRCh38.p10, HISAT gave me an overall alignment rate of 59% while TopHat gave 91%.
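Before interpreting a gap like 59% vs 91%, it helps to read each number from the tool's own summary. A minimal sketch, assuming the HISAT2 summary was saved from stderr to a log file and TopHat2's align_summary.txt is in its output directory; the file names in the usage line are hypothetical. This only extracts the figures, it does not explain the difference.

```shell
#!/usr/bin/env bash
# Sketch: extract the headline mapping rate from HISAT2 and TopHat2 summaries
# so the two numbers can be compared side by side. Log file names are hypothetical.
set -euo pipefail

# HISAT2 writes its alignment summary to stderr (capture it with 2> sample.hisat2.log);
# the summary ends with a line like "NN.NN% overall alignment rate".
hisat2_rate() {
  grep 'overall alignment rate' "$1" | awk '{print $1}'
}

# TopHat2 writes align_summary.txt into its output directory; for single-end
# reads it ends with a line like "NN.N% overall read mapping rate."
tophat2_rate() {
  grep 'overall read mapping rate' "$1" | awk '{print $1}'
}

# Usage (hypothetical paths):
#   echo "HISAT2: $(hisat2_rate h0b.hisat2.log)  TopHat2: $(tophat2_rate tophat_h0b/align_summary.txt)"
```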
Hopefully you used hisat2 rather than hisat.
Yes, hisat2 was used.
Please use a current program for RNAseq analysis. There are plenty out there (STAR, HISAT2, BBMap and more) and you would have an overall better experience with any one of them.
You beat me by a minute!
We need to have a "sticky" text for TopHat questions that can be reused. TopHat still seems attractive to new users since it is a "complete" pipeline.
Perhaps we should have a bot which posts default reactions to questions mentioning 'Tophat'.
What about a 'latest technology' section for programs? Before I joined, even I was not aware that HISAT2 had replaced TopHat2. I gave up TopHat2 for Kallisto long ago, though.
Thank you very much. What would the pipeline look like with those tools? I could really use your help.
- SALMON - Getting started
- Kallisto - Getting started
- STAR - manual
- HISAT/StringTie/Ballgown - protocol (link for reference). If you use this option, use the latest version of HISAT2.
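A minimal dry-run sketch of the Salmon route from the list above, assuming a transcriptome FASTA (the hypothetical TRANSCRIPTS) that matches the annotation used downstream. Salmon quantifies reads directly against transcripts, so no genome index or BAM files are involved; the commands are printed, not executed.

```shell
#!/usr/bin/env bash
# Dry-run sketch of Salmon quantification for six single-end samples.
# TRANSCRIPTS is a hypothetical transcriptome FASTA; commands are printed only.
set -euo pipefail

TRANSCRIPTS="transcripts.fa"   # hypothetical transcriptome FASTA
THREADS=4
SAMPLES=(h0b h0c h3b h3c h6b h6c)

salmon_pipeline() {
  local s
  # Build the index once.
  echo "salmon index -t $TRANSCRIPTS -i salmon_index"
  for s in "${SAMPLES[@]}"; do
    # -l A lets Salmon infer the library type; -r is for single-end (unpaired) reads.
    echo "salmon quant -i salmon_index -l A -r $s.fastq -p $THREADS -o quant/$s"
  done
}

salmon_pipeline
```

Each quant/<sample>/quant.sf then holds transcript-level estimates, which are commonly summarised to gene level with the tximport R package before DESeq2.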
hg38 was released in December 2013. It will never be "finished", since the human genome itself is not likely to be finished for a while. At some point it will be superseded by a new major genome release.
DESeq2 and sleuth are both in R, right? Is there any other program out there that doesn't need R?
One that will pass peer review? Probably not. R is the standard environment for statistical analysis, and it is also what the standard Galaxy workflows use (STAR -> featureCounts -> DESeq2 or hisat2 -> featureCounts -> DESeq2).
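Whichever way the counts are produced, DESeq2 needs a sample table describing the design. A minimal sketch that builds one from the sample names in the question, assuming they always follow the h<hour><replicate> pattern; the file name samples.tsv and its column names are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: build a DESeq2-style sample table from names like h0b (time h0, replicate b).
# Assumes every name matches h<digits><one letter>; anything else is rejected.
set -euo pipefail

make_sample_sheet() {
  local s
  local re='^(h[0-9]+)([a-z])$'
  printf 'sample\ttime\treplicate\n'
  for s in "$@"; do
    if [[ $s =~ $re ]]; then
      # BASH_REMATCH[1] is the time point, BASH_REMATCH[2] the replicate letter.
      printf '%s\t%s\t%s\n' "$s" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
      echo "unexpected sample name: $s" >&2
      return 1
    fi
  done
}

make_sample_sheet h0b h0c h3b h3c h6b h6c > samples.tsv
```

Read into R as colData with `time` as a factor, a design of `~ time` gives both requested comparisons via `results(dds, contrast = c("time", "h3", "h0"))` and `results(dds, contrast = c("time", "h6", "h0"))`.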
dimitrischat: If you have access to commercial software packages (e.g. CLC Genomics Workbench, Partek Genomics Suite, GeneSpring), then yes.
...or do all of the calculations manually?