RNA seq pipeline
7.1 years ago
dimitrischat ▴ 210

Hello again. I am new to bioinformatics and I need some guidance. I am trying to do an RNA-seq analysis, but the output seems a bit wrong. I have one cell type (call it h) sampled at 0, 3 and 6 hours, with 2 replicates per time point, and I want to compare 0h to 3h and 0h to 6h. So my samples are: h0b, h0c, h3b, h3c, h6b, h6c (each a .fastq file; for example, h0b.fastq is about 5.8 GB). I got the GTF for hg19 from the UCSC Table Browser (RefSeq genes). My pipeline is:

  • tophat2 --library-type fr-unstranded -g 1 -G ...hg19.gtf ....bowtie2index/hg19genome h0b.fastq
  • cufflinks -g ...hg19.gtf -o ./cufflinks ...acceptedhits.bam
  • cuffmerge -g ...hg19.gtf h0-3.txt
  • cuffdiff -L h0,h3 ...h0-3merged.gtf h0bacceptedhits.bam,h0cacceptedhits.bam h3bacceptedhits.bam,h3cacceptedhits.bam

I don't specify -p because I have seen many bugs when I used it, such as unusually small output files. I say the output is a bit wrong because my results differ from those of a colleague who ran the same process on usegalaxy (I am trying to learn how to use the terminal instead), and gene_exp.diff (opened in Excel) seems a bit off.

Do you see any mistakes I made? Is there anything I should clarify? I am pretty sure I haven't explained everything as well as I could.

I uploaded an output file so you can see my problem: https://files.fm/u/vss3dd6v. If you select only the significant entries, there are very few.

Thanks in advance.

RNA-Seq • 3.5k views

I strongly encourage you not to use tophat2 or any of the cuff* tools. These are outdated and no longer best practice. You are encouraged to use more modern aligners, such as STAR or hisat2. You'd also be better served by featureCounts and DESeq2/edgeR/limma rather than cufflinks. These will also prove to be much faster. Finally, for new projects, you'd be best served by using hg38 rather than hg19.


Thank you for your reply. So what pipeline should I use? I am not familiar with these programs. Also, is hg38 complete? Someone told me that it isn't finished yet.


I have seen many posts suggesting the use of HISAT over TopHat, yet I have observed some odd behaviour. For one human RNA-seq dataset, hisat gave me an alignment rate of 59%, while tophat gave an overall rate of 91% for the same data against GRCh38.p10.


Hopefully you used hisat2 rather than hisat.


Yes, hisat2 was used.


Please use a current program for RNAseq analysis. There are plenty out there (STAR, HISAT2, BBMap and more) and you would have an overall better experience with any one of them.


You beat me by a minute!


We need to have a "sticky" text for TopHat questions that can be reused. TopHat still seems attractive to new users since it is a "complete" pipeline.


Perhaps we should have a bot which posts default reactions to questions mentioning 'Tophat'.


What about a 'latest technology' section for programs? As an example, even I, before I joined, was not aware that HISAT2 had replaced TopHat2. I gave up TopHat2 for Kallisto long ago though.


Thank you very much. What would the pipeline look like with those tools? I could really use your help.


SALMON - Getting started
Kallisto - Getting started
STAR - manual
HISAT/StringTie/Ballgown - protocol link for reference. Use HISAT2, the latest version, if you choose this option.
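
For the alignment-free option, a minimal kallisto sketch for the six samples named in the question might look like the script below. It assumes single-end reads, as the tophat2 command in the question suggests, and kallisto then requires `--single` together with a fragment length (`-l`) and standard deviation (`-s`). The values 200 and 20, the index name `transcripts.idx`, the `quant/` output folder and the `run` dry-run helper are all placeholders chosen for illustration, not something from this thread. The index would be built beforehand with something like `kallisto index -i transcripts.idx transcripts.fa`. By default the script only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch: kallisto quantification of six single-end samples (dry run by default).
set -eu

SAMPLES="h0b h0c h3b h3c h6b h6c"   # sample names from the question
INDEX="transcripts.idx"             # hypothetical kallisto index file
FRAG_LEN=200                        # placeholder: mean fragment length of your library
FRAG_SD=20                          # placeholder: fragment length standard deviation
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print commands, 0 = run them

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# One "kallisto quant" call per sample; -b 100 adds the bootstraps that sleuth uses.
quant_all() {
    for s in $SAMPLES; do
        run kallisto quant -i "$INDEX" -o "quant/$s" -b 100 \
            --single -l "$FRAG_LEN" -s "$FRAG_SD" "$s.fastq"
    done
}

quant_all
```

Each `quant/<sample>` folder can then be loaded into sleuth in R for the 0h vs 3h and 0h vs 6h comparisons; that part is not shown here.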

fastq file --> {STAR/BBMap/HISAT2/any splice-aware aligner} --> bam file --> featureCounts --> DESeq2/edgeR
fastq file --> kallisto/Salmon --> sleuth/DESeq2
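
As a rough sketch of the first route above, the script below strings together HISAT2, samtools and featureCounts for the six samples named in the question, assuming single-end, unstranded reads as in the original tophat2 command. The index prefix `hisat2_index/genome`, the annotation file `annotation.gtf`, the thread count and the `run` dry-run helper are assumptions made for illustration, not something from this thread. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch: HISAT2 -> sorted BAM -> featureCounts for six single-end samples.
set -eu

SAMPLES="h0b h0c h3b h3c h6b h6c"   # sample names from the question
INDEX="hisat2_index/genome"         # hypothetical HISAT2 index prefix
GTF="annotation.gtf"                # hypothetical gene annotation file
THREADS=4                           # assumed thread count
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print commands, 0 = run them

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Align each FASTQ file and coordinate-sort the result into a BAM file.
align_all() {
    for s in $SAMPLES; do
        run hisat2 -p "$THREADS" -x "$INDEX" -U "$s.fastq" -S "$s.sam"
        run samtools sort -@ "$THREADS" -o "$s.bam" "$s.sam"
    done
}

# Count reads per gene across all samples in a single featureCounts call.
count_all() {
    bams=""
    for s in $SAMPLES; do
        bams="$bams $s.bam"
    done
    # -s 0 = unstranded, matching --library-type fr-unstranded in the question.
    # $bams is left unquoted on purpose so it splits into one argument per file.
    run featureCounts -T "$THREADS" -s 0 -a "$GTF" -o counts.txt $bams
}

align_all
count_all
```

The resulting counts.txt is what you would then read into DESeq2 or edgeR in R to test 0h vs 3h and 0h vs 6h; that step is not shown here.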

hg38 was released in December 2013. It will never be "finished", since the human genome itself is not likely to be fully finished for a while. At some point it will be superseded by a new major genome release.


DESeq2 and sleuth are both in R, right? Is there any other program out there that doesn't need R?


One that will pass peer review? Probably not. R is the standard environment for statistical analysis. It is also what the standard Galaxy workflows use (STAR -> featureCounts -> DESeq2 or hisat2 -> featureCounts -> DESeq2).


dimitrischat : If you have access to commercial software packages (e.g. CLC Genomics Workbench, Partek Genomics Suite, GeneSpring etc) then yes.


...or do all of the calculations manually?
