Hi all,
I want to run the Cufflinks pipeline as explained on its site. I have used the paired-end sample from the Illumina Body Map project with the assay name B20G06AAX1_s7. After unzipping, each file of the pair is 13G.
For the reference I've used the genome and annotation files from the GENCODE site.
The first step of the pipeline is transcript assembly, which generates a .gtf file. For this purpose, I've used the following command:
cufflinks -b genome_38.fa -g gencode.v21.chr_patch_hapl_scaff.annotation.gtf ERR030885.sam
ERR030885.sam is the output of tophat alignment.
This step takes a lot of time. After 2 weeks only half of the processing is done, on a computer with a Core i7 processor and 32GB of RAM. I don't think it was supposed to take this long. Could you please tell me where I am going wrong?
Thanks.
Mansoor.
You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the "advisable" toolset for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using kallisto or salmon.
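To give a rough idea of what the newer workflow looks like for one paired-end sample, here is a minimal sketch. It is not taken from the paper's protocol; the index prefix, the FASTQ and output file names, and the thread count are placeholders I made up, so adjust them to your data. The script only prints the commands unless you set RUN=1.

```shell
#!/bin/sh
# Sketch of a HISAT2 -> samtools -> StringTie run for one paired-end sample.
# File names, index prefix and thread count below are assumptions/placeholders.
THREADS=8
GENOME=genome_38.fa
ANNOT=gencode.v21.chr_patch_hapl_scaff.annotation.gtf
INDEX=genome_38_index      # hypothetical index prefix
R1=ERR030885_1.fastq       # hypothetical read file names
R2=ERR030885_2.fastq

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# 1. Build the HISAT2 index (needed once per genome).
run hisat2-build "$GENOME" "$INDEX"
# 2. Align; --dta tailors the alignments for transcript assemblers.
run hisat2 -p "$THREADS" --dta -x "$INDEX" -1 "$R1" -2 "$R2" -S ERR030885.hisat2.sam
# 3. Sort by coordinate and convert to BAM.
run samtools sort -@ "$THREADS" -o ERR030885.sorted.bam ERR030885.hisat2.sam
# 4. Assemble transcripts, guided by the reference annotation.
run stringtie ERR030885.sorted.bam -p "$THREADS" -G "$ANNOT" -o ERR030885.stringtie.gtf
```

Run it once as is to check the printed commands, then again with RUN=1 in the environment to actually execute them.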
Thanks a lot. Yes, you are right, the most recent papers report that Cufflinks output is not reliable, but I just want to run its pipeline completely for a comparison.