14 months ago
praasu
Hi,
I would like to extract the first and last n bp of each sequence in multiple FASTA files using R. Let me know if you have any suggestions.
Many thanks for your time.
It's not an answer, but a suggestion: seqkit's subseq command does this, e.g. https://bioinf.shenwei.me/seqkit/tutorial/#play-with-mirna-hairpins
If you really want to do it in R, you would want to loop over the records as you stream the file; otherwise reading everything into memory at once would hog your memory badly. Alternatively, you can call seqkit from R with system() :).
https://stackoverflow.com/questions/42492351/stream-processing-large-csv-file-in-r
https://stackoverflow.com/questions/12626637/read-a-text-file-in-r-line-by-line
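To illustrate the streaming idea, here is a minimal sketch in Python rather than R (the same loop could be written in R by reading a file() connection in chunks with readLines()). It assumes plain, uncompressed FASTA files; the function name `first_last_bp` and the `_firstN`/`_lastN` header suffixes are hypothetical choices for illustration. Only the first n bases and a rolling window of the last n bases are kept per record, so whole sequences are never held in memory. A sequence shorter than n is returned whole for both ends.

```python
import sys
from typing import Iterable, Iterator, Tuple


def first_last_bp(lines: Iterable[str], n: int) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, first_n_bp, last_n_bp) for each FASTA record, streaming line by line."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    header = None
    head = ""
    tail = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, head, tail  # emit the previous record
            header = line[1:]
            head, tail = "", ""
        else:
            # fill the first n bases until we have enough
            if len(head) < n:
                head += line[: n - len(head)]
            # rolling window: keep only the last n bases seen so far
            tail = (tail + line)[-n:]
    if header is not None:
        yield header, head, tail  # emit the final record


def main(paths, n):
    # hypothetical output format: two FASTA records per input sequence
    for path in paths:
        with open(path) as fh:
            for header, head, tail in first_last_bp(fh, n):
                print(f">{header}_first{n}\n{head}")
                print(f">{header}_last{n}\n{tail}")


if __name__ == "__main__":
    # usage: python first_last.py N file1.fa file2.fa ...
    main(sys.argv[2:], int(sys.argv[1]))
```

For example, `python first_last.py 20 a.fa b.fa > ends.fa` would write the first and last 20 bp of every sequence in both files to `ends.fa`.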
OP asked the exact same question 2 years ago and got the seqkit subseq answer from shenwei himself. Really odd behavior.