Question

RNA seq pipeline

Asked by dimitrischat, 7.2 years ago

Hello again. I am new to bioinformatics and I need some guidance. I am trying to do an RNA-seq analysis, but the output seems a bit wrong. I have one cell type (say h) sampled at 0, 3 and 6 hours, with 2 replicates per time point, and I want to compare 0h to 3h and 0h to 6h. So my samples are h0b, h0c, h3b, h3c, h6b and h6c (one .fastq each; h0b.fastq, for example, is about 5.8 GB). I got the hg19 GTF (RefSeq genes) from the UCSC Table Browser. My pipeline is:

tophat2 --library-type fr-unstranded -g 1 -G ...hg19.gtf ....bowtie2index/hg19genome h0b.fastq
cufflinks -g ...hg19.gtf -o ./cufflinks ...acceptedhits.bam
cuffmerge -g ...hg19.gtf h0-3.txt
cuffdiff -L h0,h3 ...h0-3merged.gtf h0bacceptedhits.bam,h0cacceptedhits.bam h3bacceptedhits.bam,h3cacceptedhits.bam
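For reference, here is a dry-run sketch of the same TopHat2/Cufflinks commands looped over all six samples. It assumes single-end reads and uses placeholder paths (GTF, INDEX are hypothetical, since the real paths are elided above), and it only prints the commands unless DRY_RUN=0 is set. Two points it illustrates: without -o, every TopHat2 run writes to the same default tophat_out/ directory, so each sample needs its own output directory; and cuffdiff should accept all three time points in one call and test every pair of conditions, including h0 vs h3 and h0 vs h6.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the TopHat2 -> Cufflinks -> Cuffmerge -> Cuffdiff steps
# for all six samples. Commands are printed, not executed, unless DRY_RUN=0.
GTF="path/to/hg19.gtf"                   # placeholder: your RefSeq hg19 GTF
INDEX="path/to/bowtie2index/hg19genome"  # placeholder: bowtie2 index prefix
SAMPLES="h0b h0c h3b h3c h6b h6c"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

align_and_assemble() {
  local s
  for s in $SAMPLES; do
    # Separate -o per sample; otherwise every run writes tophat_out/accepted_hits.bam.
    run tophat2 --library-type fr-unstranded -g 1 -G "$GTF" -o "tophat_$s" "$INDEX" "$s.fastq"
    run cufflinks -g "$GTF" -o "cufflinks_$s" "tophat_$s/accepted_hits.bam"
  done
}

merge_assemblies() {
  local s
  # cuffmerge reads a text file listing one transcripts.gtf per line
  # (this list file is written even in dry-run mode).
  for s in $SAMPLES; do echo "cufflinks_$s/transcripts.gtf"; done > assemblies.txt
  run cuffmerge -g "$GTF" -o merged_asm assemblies.txt
}

differential() {
  # Replicates are comma-separated; each condition is a separate argument.
  run cuffdiff -o cuffdiff_out -L h0,h3,h6 merged_asm/merged.gtf \
    tophat_h0b/accepted_hits.bam,tophat_h0c/accepted_hits.bam \
    tophat_h3b/accepted_hits.bam,tophat_h3c/accepted_hits.bam \
    tophat_h6b/accepted_hits.bam,tophat_h6c/accepted_hits.bam
}
```

After sourcing the file and checking the printed commands, set DRY_RUN=0 and call align_and_assemble, merge_assemblies and differential in that order.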

I don't specify -p because I have run into bugs when using it (the outputs came out unusually small). I say the output seems a bit wrong because my results differ from those of a colleague who ran the same process on usegalaxy (I am trying to learn to use the terminal instead), and gene_exp.diff (opened in Excel) looks a bit off.

Do you see any mistakes I made? Is there anything I could clarify? I am pretty sure I haven't explained everything as well as I could.

I uploaded an output so you can see my problem: https://files.fm/u/vss3dd6v. If you select only the significant genes, there are very few.
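To see why so few genes are flagged, it may help to count the status column of gene_exp.diff before filtering: rows that cuffdiff did not actually test (for example NOTEST or LOWDATA) are reported as not significant. A small awk sketch, assuming the standard cuffdiff header with columns named status and significant:

```bash
#!/usr/bin/env bash
# Summarise a cuffdiff gene_exp.diff table (tab-separated, one header line).
# Columns are found by header name, so their position does not matter.

# Count rows per test status (e.g. OK, NOTEST, LOWDATA, HIDATA, FAIL).
status_counts() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
    { n[$col]++ }
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | sort
}

# Print the header plus the rows cuffdiff marked significant == "yes".
significant_rows() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; print; next }
    $col == "yes"
  ' "$1"
}
```

For example, status_counts gene_exp.diff shows how many genes were tested at all, and significant_rows gene_exp.diff > significant.tsv keeps a file that opens cleanly in Excel.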

Thanks in advance.

I strongly encourage you not to use tophat2 or any of the cuff* tools. They are outdated and no longer best practice. You are encouraged to use more modern aligners, such as STAR or hisat2. You'd also be better served with featureCounts and DESeq2/edgeR/limma rather than cufflinks. These will also prove to be much faster. Finally, for new projects, you'd be best served by using hg38 rather than hg19.
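As a rough sketch of the alignment step with STAR, assuming single-end reads, a STAR index built from a genome FASTA plus GTF, and placeholder paths (GENOME_FA, GTF and STAR_INDEX are hypothetical), STAR can write one coordinate-sorted BAM per sample directly. --sjdbOverhang should be read length minus 1; READ_LEN=100 below is an assumed value. Commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Dry-run sketch: build a STAR index once, then align each single-end sample
# to a coordinate-sorted BAM. Commands are printed unless DRY_RUN=0.
GENOME_FA="path/to/genome.fa"   # placeholder genome FASTA
GTF="path/to/annotation.gtf"    # placeholder annotation
STAR_INDEX="star_index"
THREADS=4
READ_LEN=100                    # assumption: set to your actual read length
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

build_index() {
  run mkdir -p "$STAR_INDEX"
  run STAR --runMode genomeGenerate --runThreadN "$THREADS" \
    --genomeDir "$STAR_INDEX" --genomeFastaFiles "$GENOME_FA" \
    --sjdbGTFfile "$GTF" --sjdbOverhang $((READ_LEN - 1))
}

align_all() {
  local s
  for s in h0b h0c h3b h3c h6b h6c; do
    # Output for each sample: ${s}_Aligned.sortedByCoord.out.bam
    run STAR --runThreadN "$THREADS" --genomeDir "$STAR_INDEX" \
      --readFilesIn "$s.fastq" --outSAMtype BAM SortedByCoordinate \
      --outFileNamePrefix "${s}_"
  done
}
```

The sorted BAMs can then go straight to featureCounts; hisat2 piped into samtools sort would fill the same role.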

Reply by Devon Ryan, 7.2 years ago

Thank you for your reply. So what pipeline should I use? I have no experience with these programs. Also, is hg38 complete? Someone told me that it isn't finished yet.

Reply by dimitrischat, 7.2 years ago

I have seen many posts recommending HISAT over TopHat, but I have observed some odd behaviour. On one human RNA-seq dataset, HISAT gave me an overall alignment rate of 59%, while TopHat gave 91% on the same data against GRCh38.p10.
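One way to check whether two such numbers measure the same thing is to recompute both rates from the BAM files with one shared rule. A sketch, assuming samtools is installed, single-end reads, and that both tools mark extra alignments of multi-mapped reads as secondary (0x100): count primary alignments only, so each read is counted once. HISAT2 keeps unaligned reads in its output by default (unless --no-unal is given), whereas TopHat2 writes them to a separate unmapped.bam, so the totals are gathered differently.

```bash
#!/usr/bin/env bash
# Recompute alignment rates from BAM files with one shared rule.
# Assumes samtools is on PATH and single-end reads.

# Percentage with one decimal: pct_aligned ALIGNED TOTAL
pct_aligned() {
  awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

# Mapped primary alignments: drop unmapped (0x4), secondary (0x100), supplementary (0x800).
primary_mapped() { samtools view -c -F 0x904 "$1"; }

# Every primary record, mapped or unmapped.
primary_records() { samtools view -c -F 0x900 "$1"; }

hisat2_rate() {
  # $1 = HISAT2 BAM written without --no-unal (unaligned reads are in the same file).
  pct_aligned "$(primary_mapped "$1")" "$(primary_records "$1")"
}

tophat2_rate() {
  # $1 = accepted_hits.bam, $2 = unmapped.bam from the same TopHat2 run.
  local mapped unmapped
  mapped=$(primary_mapped "$1")
  unmapped=$(samtools view -c "$2")
  pct_aligned "$mapped" "$((mapped + unmapped))"
}
```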

Reply by popayekid55, 7.2 years ago

Hopefully you used hisat2 rather than hisat.

Reply by Devon Ryan, 7.2 years ago

Yes, I used HISAT2.

Reply by popayekid55, 7.2 years ago

Please use a current program for RNAseq analysis. There are plenty out there (STAR, HISAT2, BBMap and more) and you would have an overall better experience with any one of them.

Reply by GenoMax, 7.2 years ago

You beat me by a minute!

Reply by Devon Ryan, 7.2 years ago

We need to have a "sticky" text for TopHat questions that can be reused. TopHat still seems attractive to new users since it is a "complete" pipeline.

Reply by GenoMax, 7.2 years ago

Perhaps we should have a bot which posts default reactions to questions mentioning 'Tophat'.

Reply by WouterDeCoster, 7.2 years ago

What about a 'latest technology' section for programs? As an example, even I, before I joined, was not aware that HISAT2 had replaced TopHat2. I gave up TopHat2 for Kallisto long ago though.

Reply by Kevin Blighe, 7.2 years ago

Thank you very much. What would the pipeline look like with those tools? I could really use your help.

Reply by dimitrischat, 7.2 years ago

SALMON - Getting started
Kallisto - Getting started
STAR - manual
HISAT/StringTie/Ballgown - protocol (link for reference). If you go this route, use the latest version of HISAT2.
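For the Salmon option, the commands for these six samples could look roughly like this. It is a sketch assuming single-end reads and a transcriptome FASTA at a placeholder path (TRANSCRIPTS is hypothetical); -l A asks Salmon to infer the library type, and each sample gets its own salmon_<sample>/quant.sf. Commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the Salmon route: index the transcriptome once, then
# quantify each single-end sample. Commands are printed unless DRY_RUN=0.
TRANSCRIPTS="path/to/transcripts.fa"   # placeholder transcriptome FASTA
THREADS=4
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

salmon_all() {
  run salmon index -t "$TRANSCRIPTS" -i salmon_index
  local s
  for s in h0b h0c h3b h3c h6b h6c; do
    # -l A: infer library type; -r: unpaired reads; results in salmon_$s/quant.sf
    run salmon quant -i salmon_index -l A -r "$s.fastq" -p "$THREADS" -o "salmon_$s"
  done
}
```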

Reply by GenoMax, 7.2 years ago

fastq file --> {STAR/BBMap/HISAT2/any splice-aware aligner} --> bam file --> featureCounts --> DESeq2/edgeR
fastq file --> KALLISTO/SALMON --> SLEUTH/DESeq2
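The counting step of the first route, sketched for these six samples: a single featureCounts call over all BAM files gives one gene-by-sample count table, which DESeq2 or edgeR then take together with a small table saying which sample belongs to which time point. Assumptions: an unstranded library (hence -s 0, matching fr-unstranded in the original TopHat2 command), a placeholder GTF path, and BAM names passed in as arguments. Commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Dry-run sketch of "bam file --> featureCounts": count reads per gene for all
# samples in one call. Commands are printed unless DRY_RUN=0.
GTF="path/to/annotation.gtf"   # placeholder: same annotation used for alignment
THREADS=4
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Usage: count_genes OUTPUT.txt SAMPLE1.bam SAMPLE2.bam ...
count_genes() {
  local out="$1"
  shift
  # -s 0: unstranded; -a: annotation; -o: output count table
  run featureCounts -T "$THREADS" -s 0 -a "$GTF" -o "$out" "$@"
}

# Sample sheet for DESeq2/edgeR: sample name and time point (h0, h3, h6),
# taken from the first two characters of each sample name.
sample_sheet() {
  local s
  printf 'sample\ttime\n'
  for s in h0b h0c h3b h3c h6b h6c; do
    printf '%s\t%s\n' "$s" "${s:0:2}"
  done
}
```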

hg38 was released in December 2013. It will never be "finished" in that sense, since the human genome sequence itself is not likely to be complete for a while. At some point it will be superseded by a new major genome release.

Reply by GenoMax, 7.2 years ago

DESeq2 and sleuth are both in R, right? Is there any other program out there that doesn't need R?

Reply by dimitrischat, 7.2 years ago

One that will pass peer review? Probably not. R is the standard environment for statistical analysis. It is also what the standard Galaxy workflows use (STAR -> featureCounts -> DESeq2 or hisat2 -> featureCounts -> DESeq2).

Reply by Devon Ryan, 7.2 years ago

dimitrischat : If you have access to commercial software packages (e.g. CLC Genomics Workbench, Partek Genomics Suite, GeneSpring etc) then yes.