6.6 years ago
Sharon
Hi everyone,
Why are some gene annotations missing from the output when I run Cufflinks with the -g option? For example, this transcript only has a CUFF ID:
chr1 Cufflinks transcript 679399 679585 1000 . . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "12.3670741574"; frac "1.000000"; conf_lo "5.199859"; conf_hi "19.534289"; cov "90.723687"; full_read_support "yes";
But other records are annotated with the gene:
chr1 Cufflinks exon 621096 622034 1000 - . gene_id "OR4F16"; transcript_id "NM_001005277_1"; exon_number "1"; FPKM "0.0586063181"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.201545"; cov "0.396735";
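To see how the two records above differ, here is a minimal Python sketch that reads the attributes column of a GTF line and flags transcripts whose `transcript_id` starts with `CUFF.`. The function names `parse_gtf_line` and `is_novel` are hypothetical helpers written for this illustration, not part of Cufflinks. The sketch splits on any whitespace because the lines are pasted here with spaces; a real GTF file is tab-separated, which the same split also handles as long as the first eight fields contain no spaces.

```python
import re

# Matches attribute pairs such as: gene_id "CUFF.2";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')
COLUMNS = ("seqname", "source", "feature", "start", "end",
           "score", "strand", "frame")


def parse_gtf_line(line):
    """Parse one GTF record into a dict of fixed fields plus attributes."""
    parts = line.strip().split(None, 8)  # 8 fixed fields + attribute string
    if len(parts) != 9:
        raise ValueError("expected 9 GTF fields, got %d" % len(parts))
    record = dict(zip(COLUMNS, parts[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = dict(ATTRIBUTE.findall(parts[8]))
    return record


def is_novel(record, prefix="CUFF."):
    """True if the transcript ID carries the Cufflinks-assigned prefix."""
    return record["attributes"].get("transcript_id", "").startswith(prefix)


# The two records quoted in this post.
lines = [
    'chr1 Cufflinks transcript 679399 679585 1000 . . gene_id "CUFF.2"; '
    'transcript_id "CUFF.2.1"; FPKM "12.3670741574"; frac "1.000000"; '
    'conf_lo "5.199859"; conf_hi "19.534289"; cov "90.723687"; '
    'full_read_support "yes";',
    'chr1 Cufflinks exon 621096 622034 1000 - . gene_id "OR4F16"; '
    'transcript_id "NM_001005277_1"; exon_number "1"; FPKM "0.0586063181"; '
    'frac "1.000000"; conf_lo "0.000000"; conf_hi "0.201545"; cov "0.396735";',
]

for line in lines:
    rec = parse_gtf_line(line)
    print(rec["attributes"]["transcript_id"], is_novel(rec))
# prints:
# CUFF.2.1 True
# NM_001005277_1 False
```

Applied to every line of the Cufflinks output, a check like this would show how many transcripts carry reference IDs and how many only have a `CUFF` ID.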
I need to convert this GTF to FASTA format and use it in an lncRNA pipeline, but these CUFF annotations are causing some problems. I am using the same genome and GTF when running Cufflinks. I also use Salmon, but I don't know whether it can give me my assembled transcripts in GTF or FASTA format.
Thanks
Hi Sharon, can you please describe "some problems" more precisely? Have you tried using the gffread utility? What was the result?
Hi Fabio
Thanks. I am using lncscore, which is a tool for lncRNA analysis. It takes assembled transcripts in FASTA format, so I convert my transcripts.gtf to FASTA with the gffread tool from Cufflinks and give the resulting transcripts.fa to lncscore.
But when a record in the original transcripts.gtf has a "CUFF" ID instead of a gene ID or transcript ID, it shows up in transcripts.fa with the same CUFF code. This code is not recognized by lncscore, and the latter fails.
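One way to narrow the failure down would be to separate the FASTA records with `CUFF` IDs from the rest and run each set on its own. The sketch below is only an illustration under stated assumptions: it assumes the transcript ID is the first whitespace-delimited token of each FASTA header, the helper names `read_fasta` and `split_by_id_prefix` are hypothetical, and the sequences in the toy input are invented.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_id_prefix(records, prefix="CUFF."):
    """Split records into (prefixed, other) by the ID at the header start."""
    prefixed, other = [], []
    for header, seq in records:
        # Assumption: the ID is the first token of the header line.
        transcript_id = header.split()[0] if header.strip() else ""
        target = prefixed if transcript_id.startswith(prefix) else other
        target.append((header, seq))
    return prefixed, other


# Toy input; IDs are taken from this thread, sequences are made up.
toy = [
    ">CUFF.2.1 gene=CUFF.2",
    "ACGTACGT",
    "TTGA",
    ">NM_001005277_1 gene=OR4F16",
    "GGCCAATT",
]

novel, annotated = split_by_id_prefix(read_fasta(toy))
print(len(novel), len(annotated))  # prints: 1 1
```

With a real file, `read_fasta(open("transcripts.fa"))` would replace the toy list. Since novel lncRNA candidates are likely to be among the `CUFF` records, the split is meant for diagnosing the failure, not for discarding those transcripts.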
My goal is to detect novel lncRNAs, so I am open to suggestions for different pipelines too. I am new to this lncRNA stuff :)