extract the longest isoform from multi fasta file
1
0
Entering edit mode
8.0 years ago
Alex ▴ 50

Hi; I have a fasta file of transcript sequences and some of the transcripts are in multiple isoforms. I want to make a uniq list of the transcripts and choose the longest sequence where a transcript has several isoforms. Original:

PB.1.1|1:94-3818(+)|c14521/f1p1/240(240 is the sequence length)
AATGGGAGAAGGCTGGCATTGAACAAGACTATGTTAGTAGGATGTTGTTGAAGTATCCAT GGATTCTTTCAACGAGTGTGATAGAGAACTACAGTCAAATGCTGTTGTTTTTCAACCAAA AAAGGATTTCCAGTACAGTCCTCGCTATTGCTGTGAAAAGTTGGCCTCATATTCTTGGCT CCTCTTCAAAAAGAATGAATTCAGTTTTGGAGCTGTTTCATGTTCTGGGCATCAGTAAAA

PB.1.2|1:1699-3803(+)|c70289/f1p2/124(124 is the sequence length)
TTACAGTATTGAATTTGTTATGAAACCAAAGCTTGAGTTTCTGCTAAGAACCATGAAGAA GCCACTTAAAGCAGTTGTAGAATACCCAAGGTACTTCAGTTATTCACTCGAGGGGAAGATTTAC

ideal Unique isoform:

PB.1.1|1:94-3818(+)|c14521/f1p1/214 
AATGGGAGAAGGCTGGCATTGAACAAGACTATGTTAGTAGGATGTTGTTGAAGTATCCAT GGATTCTTTCAACGAGTGTGATAGAGAACTACAGTCAAATGCTGTTGTTTTTCAACCAAA AAAGGATTTCCAGTACAGTCCTCGCTATTGCTGTGAAAAGTTGGCCTCATATTCTTGGCT CCTCTTCAAAAAGAATGAATTCAGTTTTGGAGCTGTTTCATGTTCTGGGCATCAGTAAAA

Have any awk scripts or tools to obtain the sequence,Thank you very much

Thanks

Alex

RNA-Seq • 5.5k views
ADD COMMENT
0
Entering edit mode

Would PB.1.1 be the first isoform, and PB.1.2 be the second for the gene cluster, given your headers? You mentioned awk, but I like to do this with python, and a default dict where the "gene id" is the key, and the values are the sequence lengths. Then you can grab the longest sequence in each value list. Your approach would depend on how big your fasta file is, whether you would store this dict to memory or loop through is sequence and write out.

ADD REPLY
1
Entering edit mode
8.0 years ago
Farbod ★ 3.4k

Dear Alex, Hi.

You can have a look at perl Script to get the longest isoform for each 'gene' of Trinity team.

And please be aware that : The longest transcript isn't always the best transcript!

Instead you can collect longest ORF for each gene using tools like Transdecoder.

ADD COMMENT
0
Entering edit mode

Thanks ,I have solved it

ADD REPLY

Login before adding your answer.

Traffic: 1932 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6