Dear Biostar community,
I have around 50,000 sequence identifiers in a text file and around 1 million sequences in a large FASTA file.
I want to extract the sequences for these 50,000 identifiers from the FASTA file.
I checked the questions and answers already on the community pages, but most of them use Perl, Linux, or Python.
How can it be done in R (fasomerecord, grep, or something else)?
Any suggestions?
Thanks
You want this done with grep, but not in Linux?
I just want to know if this is possible in R.
Yes, it is (please see this post: How To Extract Multiple Fasta Sequences At A Time From A File Containing Sequences Ids Using R)
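For what it's worth, here is a minimal base-R sketch of one way this could look. It is not taken from the linked post. The function name `extract_fasta` and the demo data are made up for illustration, and it assumes each identifier matches the header text between ">" and the first whitespace.

```r
# Hypothetical helper (not from any package): return the lines of every
# FASTA record whose ID appears in `ids`.
extract_fasta <- function(fasta_path, ids) {
  lines <- readLines(fasta_path)
  is_header <- startsWith(lines, ">")
  # Assumption: record ID = header text after ">" up to the first whitespace.
  header_ids <- sub("^>([^[:space:]]*).*$", "\\1", lines[is_header])
  # cumsum() gives every line the index of the record it belongs to
  # (0 for any lines that come before the first header).
  record <- cumsum(is_header)
  keep <- record > 0 & header_ids[pmax(record, 1)] %in% ids
  lines[keep]
}

# Tiny demo with a temporary file standing in for the large FASTA file.
fasta <- tempfile(fileext = ".fa")
writeLines(c(">seq1 first record", "ACGT", "ACGT",
             ">seq2", "GGCC",
             ">seq3 third record", "TTAA"), fasta)
ids <- c("seq1", "seq3")  # in practice something like readLines("ids.txt")
hits <- extract_fasta(fasta, ids)
writeLines(hits, tempfile(fileext = ".fa"))  # write the subset back out
```

This reads the whole file into memory, which should be manageable for about a million sequences on a typical machine, but that depends on sequence length. Matching with `%in%` keeps the lookup fast even with 50,000 identifiers.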
samtools faidx is pretty handy.