Hi all,
I read in the Cufflinks manual that an input SAM file coming from mappers other than TopHat must be sorted this way:
sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
However, this takes ages and eventually crashes the server we run our calculations on. Is there any way to get the sort order Cufflinks wants without this sorting step? I tried samtools sort, but apparently that is not what Cufflinks needs.
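For reference, a minimal sketch of the manual's sort with a few GNU `sort` options that often matter on large files. It assumes GNU coreutils `sort`. The buffer size (`-S`), temporary directory (`-T`) and thread count (`--parallel`) are illustrative values, not tuned recommendations, and `hits.sam` is a hypothetical toy file built inside the script. It also assumes that Cufflinks only needs each reference's alignments grouped together and ordered by position: `LC_ALL=C` compares bytes, so reference names may come out in a different order than under the default locale.

```shell
#!/usr/bin/env bash
# Sort a SAM file as the Cufflinks manual asks (RNAME in column 3, then POS in
# column 4 numerically) while keeping the @ header lines at the top.
# Assumes GNU coreutils sort; -S, -T and --parallel values are illustrative.
set -euo pipefail

workdir=$(mktemp -d)
in="$workdir/hits.sam"            # hypothetical input
out="$workdir/hits.sam.sorted"

# Tiny tab-separated toy SAM so the sketch runs on its own.
printf '@HD\tVN:1.0\n@SQ\tSN:2L\tLN:1000\n@SQ\tSN:2R\tLN:1000\n' > "$in"
printf 'r1\t0\t2R\t50\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  >> "$in"
printf 'r2\t0\t2L\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$in"
printf 'r3\t0\t2L\t25\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  >> "$in"

{
  # Header lines bypass sort so they stay first.
  awk '/^@/' "$in"
  # LC_ALL=C: byte-wise comparison, usually much faster than locale rules.
  # -S: in-memory buffer size; -T: where temporary files go;
  # --parallel: number of sorting threads.
  awk '!/^@/' "$in" \
    | LC_ALL=C sort -t $'\t' -k3,3 -k4,4n -S 50% -T "$workdir" --parallel=4
} > "$out"

cat "$out"
```

Splitting header from body is deliberate: with the one-liner from the manual, any @ header lines would be moved by the sort along with the alignments.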
In case it matters, I also have alignments from STAR and MapSplice, and both are apparently also "too big" to be sorted the way Cufflinks wants. If you are wondering why I am not using TopHat for alignment: you probably can't imagine how slow it is. :P
If you know of a good alternative to Cufflinks, I am also open to new software.
Thanks in advance!
And how many reads did you sequence? :-D
You ask for an alternative; what is it that you're aiming to achieve?
Hi! I have 251,929,648 reads (x2, because they are paired-end). I am using Cufflinks for reference-based transcriptome assembly from RNA-seq reads, which have been aligned back to the reference genome and are currently in SAM/BAM format. The genome was indexed using the reference GTF file. In parallel, I am also running a de novo transcriptome assembly with Trinity. The final aim is to quantify the isoforms in this dataset.
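To put that depth into numbers, here is a back-of-the-envelope estimate of the SAM file size. The 300 bytes per line and one line per read are assumptions for illustration only; the real size depends on read length, optional tags, multi-mapped reads and whether unmapped reads are written.

```shell
#!/usr/bin/env bash
# Rough size of the SAM file for this run.
# ASSUMPTION: ~300 bytes per SAM line and one line per read.
pairs=251929648
reads=$((pairs * 2))            # paired-end: two mates per pair
bytes_per_line=300              # hypothetical average line length
est_bytes=$((reads * bytes_per_line))
echo "reads: $reads"
echo "estimated SAM size: ~$((est_bytes / 1000000000)) GB"
```

Under these assumptions the text file is on the order of 150 GB, which a plain-text sort has to read, spill to temporary files and write back out.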
Well, that's fairly deep sequencing, which might indeed cause memory issues. Since HISAT comes from the same group as TopHat, my guess is that its output might also work with Cufflinks as is. You don't state it explicitly, but I assume you tested without sorting? And do you have enough memory and disk space on your machine? Sorting creates temporary files...
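Since an external sort spills its data to temporary files, it helps to know how much space is free where those files actually go. A minimal sketch, assuming GNU `sort`, which writes temporaries to the `-T` directory if given, otherwise `$TMPDIR`, otherwise `/tmp`:

```shell
#!/usr/bin/env bash
# Check free space where GNU sort would write its temporary files.
# GNU sort uses the -T directory if given, otherwise $TMPDIR, otherwise /tmp.
tmp_target="${TMPDIR:-/tmp}"

# df -P: POSIX one-line-per-filesystem output; -k: 1024-byte blocks.
# Field 4 of the data line is the available space.
avail_kb=$(df -Pk "$tmp_target" | awk 'NR == 2 {print $4}')
echo "sort temp dir: $tmp_target"
echo "available: $((avail_kb / 1024 / 1024)) GiB"
```

On a cluster, `/tmp` is often a small local disk even when shared storage has terabytes free; passing `-T` a directory on the large volume avoids filling it.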
I tried both with and without sorting (if by sorting you mean samtools sort). My command was:
cufflinks --GTF-guide Drosophila_melanogaster.BDGP6.84.gtf --library-type fr-firststrand hisat2_alignment.sam
If I do not sort the input file, I get the error:
I have 3 TB of free disk space on the cluster, and RAM is not an issue either (~250 GB available).
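Before rerunning Cufflinks, one way to test whether a SAM file already follows the manual's order, without rewriting it, is `sort -c`. A minimal sketch, assuming GNU `sort`; `aln.sam` is a hypothetical toy file that is deliberately out of order. The check uses byte-wise (`LC_ALL=C`) comparison, so a file sorted under a different locale may be reported as unsorted even if its positions are in order.

```shell
#!/usr/bin/env bash
# Check whether the alignment lines of a SAM file are already ordered by
# RNAME (column 3) and then POS (column 4), without rewriting anything.
workdir=$(mktemp -d)
sam="$workdir/aln.sam"           # hypothetical file
printf '@HD\tVN:1.0\n' > "$sam"
printf 'r1\t0\t2L\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r2\t0\t2L\t25\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  >> "$sam"

# -c: only check order and report the first disorder (one streaming pass,
# no temporary files). -s: compare on the keys alone, so alignments that
# share RNAME and POS are not flagged because the rest of the line differs.
if awk '!/^@/' "$sam" | LC_ALL=C sort -c -s -t $'\t' -k3,3 -k4,4n; then
  status=sorted
else
  status=unsorted
fi
echo "$status"
```

On a failure, `sort -c` also prints the first out-of-order line to stderr, which shows where the ordering breaks.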
I actually meant the file just as produced by HISAT. With that amount of RAM and disk space, I wouldn't expect the server to go down quickly. Have you monitored why it crashed?
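If it is the job that dies rather than the whole server, its exit status already says a lot: a shell reports 128 + N for a process killed by signal N, and the Linux OOM killer sends SIGKILL (9), which shows up as 137. A minimal sketch, where the `bash -c` line is a hypothetical stand-in that kills itself; in practice it would be the real sort or cufflinks command. If the whole machine goes down there is no exit status to inspect, and the kernel log (e.g. `dmesg`) is the place to look instead.

```shell
#!/usr/bin/env bash
# Stand-in job that is killed by SIGKILL; replace with the real command.
bash -c 'kill -KILL $$'
status=$?

# A shell reports 128 + N for a process terminated by signal N.
if [ "$status" -gt 128 ]; then
  sig=$(kill -l $((status - 128)))
  echo "job was killed by signal $sig (exit status $status)"
else
  sig=none
  echo "job exited with status $status"
fi
```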