Question

Finding lncRNA using cufflinks

3.3 years ago

bart ▴ 50

Hi all,

I'm trying to find lncRNAs using cufflinks and this protocol: https://pubmed.ncbi.nlm.nih.gov/30945188/

I have assembled transcripts with Cufflinks and obtained their class codes with cuffcompare (instead of cuffdiff), which produced a .tabular tmap file on the Galaxy website. In R, I used this tmap file to select transcripts with class codes u, i, o and x, which are the most likely to be lncRNAs.

However, according to the suggested steps, gffread needs a GTF file rather than a tabular file as input to create the 'selected.fa' file. This FASTA file can then be passed to the Coding Potential Calculator (CPC) to actually identify the lncRNAs (see steps below). Does anyone have an idea how to create this FASTA file with gffread without a GTF input file?

Thanks!

# Obtain the selected transcript sequences in FASTA format
# Fvesca_226.fa is the reference FASTA; selected.gtf has just been created by cuffdiff
gffread -w selected.fa -g Fvesca_226.fa selected.gtf
# Find the noncoding transcripts longer than 200 nt from the CPC output
awk '$4 < -1 && $2 > 200 {print $0}' cpc.txt | cut -f1 > non-coding-transcript.txt
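
One possible way to get a selected.gtf from the tmap selection is to filter the assembled-transcripts GTF by transcript ID. This is only a sketch under assumptions: the file names cuffcmp.tmap and transcripts.gtf are hypothetical, the tmap is assumed to be tab-separated with class_code in column 3 and cuff_id (the transcript ID) in column 5, and the GTF is assumed to be the Cufflinks assembly that was given to cuffcompare. The helper name select_gtf is made up for illustration.

```shell
# Sketch: keep only GTF records whose transcript has class code u, i, o or x.
# Assumed tmap layout: column 3 = class_code, column 5 = cuff_id.
select_gtf() {
    tmap=$1   # cuffcompare .tmap file (tab-separated)
    gtf=$2    # GTF of assembled transcripts used as cuffcompare input
    awk -F'\t' '
        # First file (tmap): remember IDs of transcripts with a wanted class code
        NR == FNR { if ($3 ~ /^[uiox]$/) keep[$5] = 1; next }
        # Second file (GTF): pull transcript_id out of the attribute column
        match($9, /transcript_id "[^"]+"/) {
            id = substr($9, RSTART + 15, RLENGTH - 16)
            if (id in keep) print
        }
    ' "$tmap" "$gtf"
}

# Example call (hypothetical file names):
#   select_gtf cuffcmp.tmap transcripts.gtf > selected.gtf
```

The resulting selected.gtf could then be used in the gffread step above. It is worth checking the column order of your own tmap file first, since the selection depends on it.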

lncRNA cuffcompare cufflinks • 938 views

ADD COMMENT • link updated 3.3 years ago by Ram 44k • written 3.3 years ago by bart ▴ 50

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.

ADD REPLY • link 3.3 years ago by Ram 44k