Hi all,
I'm trying to find lncRNAs using cufflinks and this protocol: https://pubmed.ncbi.nlm.nih.gov/30945188/
I have assembled transcripts using Cufflinks and obtained their class codes with cuffcompare (instead of cuffdiff), which produced a .tabular tmap file on the Galaxy website. I then used this tmap file in R to select transcripts with class codes u, i, o and x, which are the most likely to be lncRNAs.
However, according to the suggested steps, gffread needs a GTF file, not a tabular file, as input to create the 'selected.fa' file. This FASTA file can then be passed to the Coding Potential Calculator (CPC) to actually find the lncRNAs (see the steps below). Does anyone have an idea how to create this FASTA file with gffread without an input GTF file?
Thanks!
#Obtain the selected transcript sequences in the fasta format
gffread -w selected.fa -g Fvesca_226.fa selected.gtf  # Fvesca_226.fa is the reference FASTA; selected.gtf has just been created by cuffdiff
#Find the noncoding transcripts longer than 200 nt from the CPC output
cat cpc.txt | awk '$4 < -1 && $2 > 200 {print $0}' | cut -f1 > non-coding-transcript.txt
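One possible way to get a `selected.gtf` for the gffread step above is to filter the assembled-transcripts GTF by the transcript IDs picked from the tmap file. The sketch below is untested against Galaxy output and rests on assumptions: that the tmap is tab-separated with a header line, that `class_code` is column 3 and `cuff_id` is column 5, and that the GTF carries a `transcript_id "..."` attribute. The function names `select_ids` and `filter_gtf` are hypothetical helpers, not part of Cufflinks or gffread.

```shell
# Sketch only: the column positions (class_code = 3, cuff_id = 5) are
# assumptions; check them against the header line of your own tmap file.

# Print the IDs of assembled transcripts whose class code is u, i, o or x.
# $1 = tmap file (tab-separated, first line is a header)
select_ids() {
    awk -F'\t' 'NR > 1 && $3 ~ /^[uiox]$/ { print $5 }' "$1"
}

# Print only the GTF records whose transcript_id is in the ID list.
# $1 = file with one transcript ID per line, $2 = GTF file
filter_gtf() {
    awk -F'\t' '
        NR == FNR { ids[$1]; next }            # first file: remember the IDs
        match($0, /transcript_id "[^"]+"/) {   # second file: find the attribute
            id = substr($0, RSTART + 15, RLENGTH - 16)
            if (id in ids) print
        }
    ' "$1" "$2"
}
```

With these defined, something like `select_ids transcripts.tmap > selected_ids.txt` followed by `filter_gtf selected_ids.txt transcripts.gtf > selected.gtf` should give a GTF that gffread can take. Both input file names are placeholders for the cuffcompare tmap and the Cufflinks assembly GTF.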
Please use the formatting bar (especially the `code` option) to present your post better. You can use backticks for inline code (`` `text` `` becomes `text`), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.