2.2 years ago
raywong.chn
Sequences in FASTA files come wrapped at 60 characters per line.
If a sequence motif spans two lines, searching for it (e.g. with Ctrl+F) misses it.
For example:
>sp|P30872|SSR1_HUMAN Somatostatin receptor type 1 OS=Homo sapiens OX=9606 GN=SSTR1 PE=1 SV=1
MFPNGTASSPSSSPSPSPGSCGEGGGSRGPGAGAADGMEEPGRNASQNGTLSEGQGSAIL
ISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLL
RHWPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIKAARYRRPTVAKV**VNLG**
**VWVL**SLLVILPIVVFSRTAANSDGTVACNMLMPEPAQRWLVGFVLYTFLMGFLLPVGAIC
LCYVLIIAKMRMVALKAGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQD
DATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLSWMDNAAEEPVDYYATALKS
RAYSVEDFQPENLESGGVFRNGTCTSRITTL
If I press Ctrl+F and search for the bold-faced "VNLGVWVL", I get no match.
This seems like a basic but frequently needed task.
Is there a handy editor or plugin that overcomes this issue? I'm using VS Code.
Any suggestions would be appreciated!
Ray
Not always. Many programs wrap sequences this way purely for formatting. Internally, programs remove the line breaks and treat the sequence as a single line.
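To illustrate what "remove the line breaks" means in practice, here is a minimal Python sketch. It is not how seqkit or any particular tool is implemented. It joins the wrapped lines of each record and then searches the unwrapped string, so a motif split across a line break is still found. The names `read_fasta` and `find_motif` are hypothetical, and matching is assumed to be exact and case-insensitive (no ambiguity codes, no regular expressions).

```python
def read_fasta(lines):
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines.

    Any sequence lines that appear before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # collect the wrapped pieces of one sequence
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_motif(records, motif):
    """Yield (header, 1-based start) for every match, including overlapping ones."""
    motif = motif.upper()
    for header, seq in records:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            yield header, start + 1
            start = seq.find(motif, start + 1)  # step by one to allow overlaps
```

For example, `list(find_motif(read_fasta(open("sstr1.fasta")), "VNLGVWVL"))` (the file name is hypothetical) would return one (header, position) pair for the SSR1_HUMAN record above, even though the motif crosses a line break in the file.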
You should use a proper sequence-focused tool like seqkit grep (LINK) for this purpose, unless you actually want to find these motifs in an editor or visual interface.
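If the goal really is to search inside VS Code, one workaround is to make an unwrapped copy of the file so that each sequence sits on a single line; Ctrl+F then finds motifs regardless of the original wrapping. The sketch below assumes plain FASTA input, and `linearize_fasta` and `linearize_file` are hypothetical helper names.

```python
def linearize_fasta(lines):
    """Return FASTA lines with each sequence unwrapped onto a single line."""
    out = []
    seq_chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if seq_chunks:  # flush the previous record's joined sequence
                out.append("".join(seq_chunks))
                seq_chunks = []
            out.append(line)  # keep the header line unchanged
        elif line.strip():
            seq_chunks.append(line.strip())  # blank lines are dropped
    if seq_chunks:
        out.append("".join(seq_chunks))
    return out


def linearize_file(in_path, out_path):
    """Write a one-line-per-sequence copy of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in linearize_fasta(src):
            dst.write(line + "\n")
```

Opening the linearized copy in the editor keeps the original file untouched. The trade-off is that long sequences become very long lines, which some editors render slowly.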