Cuffmerge running time
3.3 years ago
bart

Hi,

I'm new to RNA-seq and I'm trying to identify lncRNAs using this protocol: https://link.springer.com/protocol/10.1007%2F978-1-4939-9045-0_13.

Following the protocol, I ran this command, but with 2 threads instead of 4 because I'm working on a MacBook:

cuffmerge -p 2 -g gencode.v19.annotation.gtf_withproteinids -s GRCh37.p13.genome.fa assembly_GTF_list.txt

The command starts without errors, but it has been stuck at this message for almost 24 hours now:

[10:27:12] Assembling transcripts and estimating abundances

My assembly_GTF_list.txt contains only one GTF file, because I'm doing a proof of concept to see whether I can find lncRNAs in my RNA-seq data and to get to know the protocol better.

Does anyone know if this running time is normal for this command?

Thanks!


I don't know much about how long cuffmerge usually takes. But note that cuffmerge is meant for merging several individual assemblies into a single assembly GTF file suitable for downstream analysis. If you only want to test the idea of finding lncRNAs with a single sample, there is no need for cuffmerge. I would instead use cuffcompare or gffcompare directly to compare your single assembled GTF file with the reference annotation GTF and assign class codes to each transcript.
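For the single-sample case, a minimal sketch could look like this. It assumes your Cufflinks assembly is the default transcripts.gtf and that you use the GENCODE v19 annotation from the protocol; the function name classify_assembly and the prefix single_sample are hypothetical placeholders. gffcompare writes the .tmap and .refmap files next to the query GTF, named <prefix>.<query file name>.tmap.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare one assembled GTF against a reference annotation
# so that every assembled transcript gets a class code.
# Usage: classify_assembly <reference.gtf> <assembly.gtf> [output_prefix]
classify_assembly() {
  local ref="$1" query="$2" prefix="${3:-single_sample}"
  if [ -z "$ref" ] || [ -z "$query" ]; then
    echo "usage: classify_assembly <reference.gtf> <assembly.gtf> [output_prefix]" >&2
    return 1
  fi
  # -r: reference annotation, -o: prefix for the output files
  gffcompare -r "$ref" -o "$prefix" "$query"
}

# Example call (file names are placeholders):
# classify_assembly gencode.v19.annotation.gtf transcripts.gtf single_sample
```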

In the future, when you work with more than one sample, you can first merge the individual assemblies with StringTie's merge mode (stringtie --merge), which is newer than cuffmerge. The merged GTF can then be compared to the reference GTF with gffcompare:

stringtie --merge -p 4 -G Reference_genome.gtf -o merged_output.gtf Assembly_GTF_list.txt

gffcompare -r Reference_genome.gtf -o comparative_out merged_output.gtf

Hey, thanks for the response! I hope you don't mind a few more questions. So I should first run Cufflinks on one sample and then cuffcompare or gffcompare to get the class codes. As the reference GTF file, should I use the comprehensive gene annotation file or the lncRNA annotation file from the GENCODE website (https://www.gencodegenes.org/human/release_19.html)? Or does it not matter because the results will be the same? I'm only interested in the lncRNAs and will discard all other RNAs anyway.
And which cuffcompare output file contains the class codes I need: the refmap file or the tmap file?

Also, in the Cufflinks output named 'gene expression' there is a class code column too, but it contains only '-'. Is this OK?


First, I would suggest replacing your Cufflinks pipeline with StringTie.

cuffcompare or gffcompare to get the class codes, but as a GTF file, should I use the comprehensive gene annotation file or the Lnc annotation file

With gffcompare, you can try both: run it once against the comprehensive gene annotation file and once against the lncRNA annotation file, then decide based on the results. In other words, check which annotation gives you the most transcripts with the class codes you need.
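One quick way to compare the two runs is to tally the class codes in each run's .tmap file. A minimal sketch, assuming the tmap is tab-separated with a header row containing a class_code column (as gffcompare and cuffcompare tmap files have); count_class_codes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count assembled transcripts per class code in a .tmap file.
# Assumes a tab-separated file whose first line is a header containing "class_code".
# Usage: count_class_codes <file.tmap>
count_class_codes() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "class_code") col = i; next }
    col     { n[$col]++ }
    END     { for (c in n) print c "\t" n[c] }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr   # most frequent class code first
}

# Example: one call per annotation run (file names are placeholders)
# count_class_codes comprehensive.transcripts.gtf.tmap
# count_class_codes lncRNA_only.transcripts.gtf.tmap
```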

And which cuffcompare output file has the correct class codes? Is it refmap file or tmap file?

Look at the tmap file and the combined.gtf file for the class codes. I prefer using the GTF file.
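As a starting point for lncRNA candidates, you could pull out the IDs of assembled transcripts with selected class codes from the tmap file. A minimal sketch under a few assumptions: the default codes u (intergenic), i (fully within a reference intron) and x (exonic overlap on the opposite strand) are a common choice in lncRNA pipelines, not something prescribed here, so adjust them to your protocol; the ID column is assumed to be named qry_id (gffcompare) or cuff_id (older cuffcompare); select_by_class_code is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print assembled transcript IDs whose class code is in a given set.
# Assumes a tab-separated .tmap with a header naming "class_code" and "qry_id"/"cuff_id".
# Usage: select_by_class_code <file.tmap> ["space-separated codes", default "u x i"]
select_by_class_code() {
  local tmap="$1" codes="${2:-u x i}"
  awk -F'\t' -v codes="$codes" '
    BEGIN { n = split(codes, list, " "); for (k = 1; k <= n; k++) keep[list[k]] = 1 }
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "class_code") cc = i
        if ($i == "qry_id" || $i == "cuff_id") id = i
      }
      next
    }
    cc && id && ($cc in keep) { print $id }
  ' "$tmap"
}

# Example (file name is a placeholder):
# select_by_class_code single_sample.transcripts.gtf.tmap > lnc_candidates.txt
```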

Also, I saw that in the cufflink output named 'gene expression', there is also a column for class codes, but this is only filled with the '-' sign. Is this ok?

I'm not familiar with that output. But if you want transcript expression levels, you need to rerun StringTie on each sample after the stringtie --merge step, using the "-e" option.
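A minimal sketch of that re-estimation step for one sample, assuming you have the sample's sorted BAM file and the merged_output.gtf from the stringtie --merge command above; the helper quantify_sample and the output layout are hypothetical. The -e option tells StringTie to only estimate abundances for the transcripts given with -G instead of assembling new ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quantify one sample against the merged transcript set.
# Usage: quantify_sample <merged.gtf> <sample.bam> <out_dir> [threads]
quantify_sample() {
  local merged="$1" bam="$2" outdir="$3" threads="${4:-4}"
  if [ -z "$merged" ] || [ -z "$bam" ] || [ -z "$outdir" ]; then
    echo "usage: quantify_sample <merged.gtf> <sample.bam> <out_dir> [threads]" >&2
    return 1
  fi
  mkdir -p "$outdir"
  # -e: only estimate abundances of the transcripts given with -G (no novel assembly)
  # -A: also write a gene abundance table
  stringtie -e -p "$threads" -G "$merged" \
    -A "$outdir/gene_abund.tab" -o "$outdir/transcripts.gtf" "$bam"
}

# Example, run once per sample (names are placeholders):
# quantify_sample merged_output.gtf sample1.sorted.bam sample1
```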

Hope this helps.
