5.0 years ago
Hi, I want to extract the first N amino acids from each sequence in a FASTA file. I have these sequences:
>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQTATVIDWDQIREASQTQRRQAAAIANAPVK
QGVVHEPIDAGVMAGNVPAEQRNAASIVQSIDGSKLSQISDRLPKFIKQGSDEVVYGKHV
VVSKLGPEVIGLILDLIKAQPANRALLLAKLQAISNDGNPEASNFMGFVFEYGLFGAVKN
For example, I want this sequence trimmed to only the first 30 aa, like:
>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQ
Is there a program that can do this for all sequences from the Linux terminal? I hope you can help me. Thank you.
You could convert to tabular format with seqkit and use the substring function from awk.
seqkit subseq -r 1:20 is enough.
Be careful! This approach makes a lot of assumptions about the structure of the FASTA file.
Yes, it does. Sorry, I thought the input file was in tabular format. I updated the comment.
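For anyone who prefers plain awk, here is a minimal sketch of the substring idea that does not assume one sequence line per record. It joins wrapped sequence lines before cutting, so a record split across several lines is still trimmed correctly. The function name fasta_head and its arguments are hypothetical, made up for this example, and the sketch assumes every record starts with a ">" header line.

```shell
# fasta_head N [FILE...]: print each FASTA record cut to its first N residues.
# The function name and interface are hypothetical, chosen for this sketch.
fasta_head() {
    n="$1"
    shift
    awk -v n="$n" '
        # Print the stored record, cut to at most n residues.
        function flush_record() {
            if (header != "") {
                print header
                print substr(seq, 1, n)
            }
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { flush_record(); header = $0; seq = ""; next }
        # Any other line is sequence: drop whitespace and append it.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        # Emit the last record at end of input.
        END { flush_record() }
    ' "$@"
}
```

It could be used as fasta_head 30 input.fasta > first30.fasta, where both file names are placeholders. Sequences shorter than N are printed whole, and each output sequence is written on a single line.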