Hi,
I have a vector containing small strings of interest:
seq_vector <- c("NET", "NST", "NVT", "NIT", "NCT", "NYT", "NHT", "NRT", "NNT", "NDT", "NTT")
And I would like to find these small strings in larger strings, which are in my .txt file:
A0A0D9S786..........STDQNHSTETPNLAAAVPSSVSVPR
A0A0D9R8B0........ STEVQGMKVNGTKTDNNEGPK
A0A0D9RJY3........ STHNLQVAALDANGTVVEGPVPITIEVK
The file has about 3,600 entries.
I can do this with a .fasta file, using the seqinr, tidyverse and Biostrings packages, but I am having trouble doing the same with the data above.
Does anyone have any ideas and could help me? I'm only interested in the sequences that contain a match.
I would like to get something like:
A0A0D9S786.....STDQNHSTETPNLAAAVPSSVSVPR...... NHS
A0A0D9R8B0.....STEVQGMKVNGTKTDNNEGPK ............ NGT
Thank you in advance!
Read in your data using
Have you looked at grep() or the stringr package?
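To make the grep()/regex suggestion concrete, here is a minimal base R sketch. It rests on several assumptions: each line is an ID followed by a run of dots and/or spaces and then the sequence; the last example line is hypothetical and only there to show that non-matching lines are dropped; and NHS and NGT are appended to the motif list because the desired output shows them, even though they are not in the original list. In practice the lines would probably come from something like readLines() on your .txt file.

```r
# Motifs from the question, plus NHS and NGT (assumption: added because the
# desired output shows them, although the original list does not contain them).
motifs <- c("NET", "NST", "NVT", "NIT", "NCT", "NYT", "NHT", "NRT",
            "NNT", "NDT", "NTT", "NHS", "NGT")

# Collapse the vector into one regular expression: "NET|NST|...|NGT"
pattern <- paste(motifs, collapse = "|")

# Example lines copied from the question; the last one is hypothetical.
lines <- c("A0A0D9S786..........STDQNHSTETPNLAAAVPSSVSVPR",
           "A0A0D9R8B0........ STEVQGMKVNGTKTDNNEGPK",
           "A0A0D9RJY3........ STHNLQVAALDANGTVVEGPVPITIEVK",
           "HYPOTHET01........ AAGKLLPEVK")

# Assumed layout: ID, then dots and/or spaces, then the sequence.
id  <- sub("[. ]+.*$", "", lines)   # keep everything before the separator
seq <- sub("^.*[. ]+", "", lines)   # keep everything after the separator

# regexpr() gives the position of the first match per sequence, or -1 for none.
m     <- regexpr(pattern, seq)
found <- m != -1

# regmatches() returns the matched text for the matching sequences only,
# so it lines up with the filtered id and sequence columns.
result <- data.frame(id       = id[found],
                     sequence = seq[found],
                     motif    = regmatches(seq, m),
                     stringsAsFactors = FALSE)
print(result)
```

This reports only the first motif found in each sequence. If you need every occurrence, gregexpr() can be used in place of regexpr(), and stringr::str_extract(seq, pattern) should give the same first match, with NA for sequences that have none.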