Good evening,
I have .fasta
file transcripts_longestiso.fasta
(output of rnaSPAdes
) that looks like this
>NODE_1_length_24937_cov_13085.239743_g0_i0
CCTTTTTATTTCTCATCAAATGAATGGCATCTTCTTCTGGAA...
>NODE_7_length_14915_cov_350.128521_g1_i0
AAATAGAAGGGTCT...
etc
I need to discard transcripts with ORF length less than 100 amino acids (300 nucleotides). To do this, I launched
TransDecoder.LongOrfs -t transcripts_longestiso.fasta
I got longest_orfs.pep
containing the predicted amino acid sequences, if I understand correctly. Like this:
>NODE_1_length_24937_cov_13085.239743_g0_i0.p1 type:complete len:247 gc:universal NODE_1_length_24937_cov_13085.239743_g0_i0:9145-9885(+)
MRPLEAP...
>NODE_1_length_24937_cov_13085.239743_g0_i0.p2 type:complete len:247 gc:universal NODE_1_length_24937_cov_13085.239743_g0_i0:15793-15053(-)
MRPLE...
etc
And longest_orfs.gff3
NODE_1_length_24937_cov_13085.239743_g0_i0 transdecoder gene 1 24937 . + .
ID=GENE.NODE_1_length_24937_cov_13085.239743_g0_i0~~NODE_1_length_24937_cov_13085.239743_g0_i0.p1;Name=ORF%20type%3Acomplete%20len%3A247%20%28%2B%29
NODE_1_length_24937_cov_13085.239743_g0_i0 transdecoder mRNA 1 24937 . + .
ID=NODE_1_length_24937_cov_13085.239743_g0_i0.p1;Parent=GENE.NODE_1_length_24937_cov_13085.239743_g0_i0~~NODE_1_length_24937_cov_13085.239743_g0_i0.p1;Name=ORF%20type%3Acomplete%20len%3A247%20%28%2B%29
And other files that are transdecoder
output.
But I would like to get the same fasta file at the output, only filtered. With whole transcripts and in a nucleotide record. But those transcripts that do not meet my condition should not be included in this file.
For example, if in NODE_1_length_24937_cov_13085.239743_g0_i0
transcript transdecoder
found an ORF with appropriate length, and in NODE_7_length_14915_cov_350.128521_g1_i0
- did not find it, then the output fasta file should be
>NODE_1_length_24937_cov_13085.239743_g0_i0
CCTTTTTATTTCTCATCAAATGAATGGCATCTTCTTCTGGAA...
Could you please tell me how to do this?
Thank you in advance!
Best regards, Poecile
It's tl;dr and confusing.