5.4 years ago
Benn
Maybe a simple question; hopefully I'll get a simple answer.
I have imported a FASTA file of proteins into R with Biostrings.
> mySeqs <- readAAStringSet(fastaFile) # from package Biostrings
> mySeqs
A AAStringSet instance of length 318
width seq names
[1] 98 QVQLVQSGAEVKKPGASVKVSC...STAYMELRSLRSDDTAVYYCAR IGHV1-18*01
[2] 92 QVQLVQSGAEVKKPGASVKVSC...TTDTSTSTAYMELRSLRSDDTA IGHV1-18*02
[3] 98 QVQLVQSGAEVKKPGASVKVSC...STAYMELRSLRSDDMAVYYCAR IGHV1-18*03
[4] 98 QVQLVQSGAEVKKPGASVKVSC...STAYMELRSLRSDDTAVYYCAR IGHV1-18*04
[5] 98 QVQLVQSGAEVKKPGASVKVSC...STAYMELSRLRSDDTVVYYCAR IGHV1-2*01
... ... ...
[314] 91 QVQLVQSGSELKKPGASVKVSC...FSLDTSVSTAYLQISTLKAEDT IGHV7-4-1*03
[315] 98 QVQLVQSGSELKKPGASVKVSC...SMAYLQISSLKAEDTAVYYCAR IGHV7-4-1*04
[316] 98 QVQLVQSGSELKKPGASVKVSC...SMAYLQISSLKAEDTAVCYCAR IGHV7-4-1*05
[317] 98 QVQLVQSGHEVKQPGASVKVSC...STAYLQISSLKAEDMAMYYCAR IGHV7-81*01
[318] 98 EAQLTESGGDLVHLEGPLRLSC...YMLYMQMISLRTQNMAAFNCAG IGHV8-51-1*02
So how can I select only the sequences with a certain name, let's say all names that contain "IGHV1"? I have tried some grep calls, but that does not work on these objects.
Thanks in advance.
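For reference, here is a minimal sketch of name-based subsetting on an AAStringSet. It assumes the names are reachable through names(), which returns an ordinary character vector that grepl() can search; the resulting logical vector can then index the set directly. The three-sequence toy set and the variable names keep and ighv1 are made up for illustration, and the sequences are just short fragments taken from the output above.

```r
library(Biostrings)

# Toy stand-in for the imported FASTA (illustrative only; in practice
# mySeqs would come from readAAStringSet(fastaFile) as in the question).
mySeqs <- AAStringSet(c(
  "IGHV1-18*01" = "QVQLVQSGAEVKKPGASVKVSC",
  "IGHV1-2*01"  = "QVQLVQSGAEVKKPGASVKVSC",
  "IGHV7-81*01" = "QVQLVQSGHEVKQPGASVKVSC"
))

# grepl() is applied to the names, not to the AAStringSet itself.
# fixed = TRUE treats "IGHV1" as a literal string rather than a regex.
keep <- grepl("IGHV1", names(mySeqs), fixed = TRUE)

# Logical subsetting with [ returns another AAStringSet.
ighv1 <- mySeqs[keep]
ighv1
```

Note that a plain "contains IGHV1" match would also pick up a name such as IGHV10, if one existed; a pattern like "^IGHV1-" (without fixed = TRUE) would be stricter.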
Simpler than mine, +1. I always forget that Biostrings follows the same logic as IRanges/GenomicRanges when it comes to subsetting. You almost never need any apply-based method :)
Great solution, thanks!