Text editor with sequence (e.g. FASTA) search features?
2.2 years ago
raywong.chn ▴ 10

FASTA sequence files come with 60 characters per line.

If a sequence motif is split across two lines, searching for it (e.g. with Ctrl+F) misses it.

For example:

>sp|P30872|SSR1_HUMAN Somatostatin receptor type 1 OS=Homo sapiens OX=9606 GN=SSTR1 PE=1 SV=1
MFPNGTASSPSSSPSPSPGSCGEGGGSRGPGAGAADGMEEPGRNASQNGTLSEGQGSAIL
ISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLL
RHWPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIKAARYRRPTVAKV**VNLG**
**VWVL**SLLVILPIVVFSRTAANSDGTVACNMLMPEPAQRWLVGFVLYTFLMGFLLPVGAIC
LCYVLIIAKMRMVALKAGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQD
DATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLSWMDNAAEEPVDYYATALKS
RAYSVEDFQPENLESGGVFRNGTCTSRITTL

If I press Ctrl+F and search for the bold-faced "VNLGVWVL", I get nothing.

This seems like a basic but frequent need.
Is there a handy editor or plugin that overcomes this issue? I'm using VS Code.

Any suggestions would be appreciated!

Ray


"Sequence FASTA files comes with 60 characters per line."

Not always. That is simply done for formatting reasons by many programs. Internally programs will remove the line breaks and use the sequence as a single line.

You should use a proper sequence-focused tool such as seqkit grep (LINK) for this purpose, unless you specifically need to find these in an editor or other visual interface.
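
To illustrate the point about removing line breaks before searching, here is a minimal Python sketch. It is not how seqkit or any particular tool is implemented; the function names `read_fasta` and `find_motif` and the tiny demo record are made up for this example.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(text, motif):
    """Return (header, 1-based position) for every occurrence of motif."""
    hits = []
    for header, seq in read_fasta(text):
        start = seq.find(motif)
        while start != -1:
            hits.append((header, start + 1))
            start = seq.find(motif, start + 1)
    return hits


# Hypothetical two-line record; the motif straddles the line break.
example = """>demo hypothetical record
AAAAVNLG
VWVLCCCC
"""
print(find_motif(example, "VNLGVWVL"))  # [('demo hypothetical record', 5)]
```

Because the sequence is joined into one string first, the line width of the file no longer matters.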

2.2 years ago
iraun 6.2k

What about converting the multi-line FASTA file to single-line FASTA?

awk '/^>/ { printf("%s%s\n", (NR > 1 ? "\n" : ""), $0); next } { printf("%s", $0) } END { printf("\n") }' multiline.fa > singleline.fa
2.2 years ago
5heikki 11k

In Emacs, dna-mode's dna-isearch-forward doesn't care about line breaks.
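
For editors without such a mode, one possible workaround is a regular expression that allows an optional line break between neighbouring residues. The Python sketch below builds such a pattern; the helper name `linebreak_tolerant_pattern` is hypothetical, and whether a given editor's regex search accepts `\n` in a pattern is an assumption to check for your editor.

```python
import re


def linebreak_tolerant_pattern(motif):
    """Build a regex that matches motif even when it is wrapped across lines."""
    # Escape each residue and allow one optional newline between neighbours.
    return r"\n?".join(re.escape(ch) for ch in motif)


pattern = linebreak_tolerant_pattern("VNLGVWVL")
print(pattern)

# Hypothetical wrapped text with the motif split across two lines.
wrapped = "AAAAVNLG\nVWVLCCCC\n"
match = re.search(pattern, wrapped)
print(match is not None)
```

The printed pattern can be pasted into a regex-enabled search box. It only tolerates line breaks, so it will not match across a header line.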
