How can I find the longest sequence in a FASTA file from the terminal?
2.2 years ago
logbio ▴ 30

I have a protein file in FASTA format and I want to find the longest sequence. What should I do? The code I found did not solve my problem, because it counted the header (the sequence name) as part of the length.

fasta sequence • 2.6k views

Thank you for the answer, but your code does not solve my problem :/


So, show us what you tried, what failed, and why.


I thought I had included the code, but apparently it was not posted, sorry.

The FASTA file I have looks like this:

>NP_000005.3 alpha-2-macroglobulin isoform a precursor [Homo sapiens]
MGKNKLLHPSLVLLLLVLLPTDAS....
>NP_000010.1 acetyl-CoA acetyltransferase, mitochondrial isoform b precursor [Homo sapiens]
MAVLAALLRSGARSRSPLLRLVQEI...
. . .

I want to find the longest sequence in this file, excluding the header names. However, when I run the code you gave, it produces neither an error nor any output.


The code was meant as inspiration. You should understand it and adapt it to your needs.

2.2 years ago
liorglic ★ 1.5k

Here is one option using GNU tools only:

grep -v '>' seq.fasta | awk '{print $0"\t"length($0)}' | sort -n -k2 -r
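
Note that the pipeline above measures each line on its own, so it assumes every sequence sits on a single line. If the file wraps sequences across several lines, the lines of each record need to be joined before measuring. A minimal sketch with plain awk, where the function name `longest_seq` is a made-up helper and not part of any tool:

```shell
# Hypothetical helper: print "<length><TAB><sequence>" for the longest record.
# Assumes a standard FASTA file whose header lines start with ">".
longest_seq() {
  awk '
    /^>/ { if (seq != "") print length(seq) "\t" seq; seq = ""; next }  # header: flush the previous record
    { seq = seq $0 }                                                    # sequence line: append to the current record
    END { if (seq != "") print length(seq) "\t" seq }                   # flush the last record
  ' "$1" | sort -k1,1nr | head -n 1
}
```

It would be called as `longest_seq seq.fasta`. Dropping the `head -n 1` lists every sequence, longest first.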

Your code works, but it does not output the sequence properties (ID, etc.).


Well, you need to specify what you want your output to look like. In general, if you want better control, I suggest you look into bioawk or Biopython's SeqIO.
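
If the desired output is simply the ID and the length of the longest record, one possible sketch stays with plain awk instead of bioawk or Biopython. It assumes the ID is the first word after `>` on the header line, and the function name `longest_id` is hypothetical:

```shell
# Hypothetical helper: print "<ID><TAB><length>" of the longest record in one pass.
# Assumes the ID is the first whitespace-delimited word after ">".
longest_id() {
  awk '
    function check() { if (len > max) { max = len; best = id } }  # keep the longest record seen so far
    /^>/ { check(); id = substr($1, 2); len = 0; next }           # new record: remember its ID, reset the length
    { len += length($0) }                                         # sequence line: add to the current length
    END { check(); if (best != "") print best "\t" max }          # account for the last record, then report
  ' "$1"
}
```

It would be called as `longest_id seq.fasta`. Because the header line is never counted, the reported length covers only the residues, and wrapped sequences are summed across their lines. On ties, the first record with the maximum length is reported.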
