I want to extract sequences from a FASTA file at specific positions. I have read my FASTA file (f) into R using the seqinr package.
library(seqinr)
f <- read.fasta("sequence.fasta")
I have a data frame (df1) that contains the positions at which I want to extract sequences. My data frame looks like this:
V1 strand
6002 -
5679 -
10123 -
7685 -
13563 -
15588 -
16064 -
21042 -
..
What I want is the sequence running from 50 back from this position to 5 ahead of it, added as a column to my data frame. df1 contains the negative strand, and I also have df2, which contains the positive strand. I want to extract these sequences from the FASTA file for both strands. Does anyone know how I can do this?
Desired output would look like:
V1 strand sequence
6002 - agggacc...
5679 - ctttgac...
10123 -
7685 -
13563 -
15588 -
16064 -
21042 -
..
Look into the stringr package: https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf
I think you could use str_sub with your coordinates (n-5 to n+50 or n-50 to n+5, I can't tell from your post).
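For illustration, here is a minimal sketch of the str_sub idea on a toy sequence. Several things here are assumptions rather than anything stated in the post: the toy `genome` string, the small `df` data frame, the shortened `back`/`ahead` values (3 and 2 instead of 50 and 5), and the helper `revcomp`. It also assumes that on the negative strand "back" means towards higher forward-strand coordinates, so the window is n-5 to n+50 followed by a reverse complement; if that is not what you mean, swap the offsets.

```r
library(stringr)

# Toy stand-in for the real sequence (assumption). With seqinr, reading the
# file with read.fasta("sequence.fasta", as.string = TRUE) and taking f[[1]]
# should give a comparable single string.
genome <- "acgtacgtaaccggttacgt"

# Hypothetical data frame in the same shape as df1/df2, with both strands.
df <- data.frame(V1 = c(8, 13), strand = c("+", "-"))

back  <- 3  # the post uses 50
ahead <- 2  # the post uses 5

# Forward-strand window: n-back..n+ahead for "+", n-ahead..n+back for "-"
# (the "-" orientation is an assumption, see the note above).
is_plus <- df$strand == "+"
start <- ifelse(is_plus, df$V1 - back, df$V1 - ahead)
end   <- ifelse(is_plus, df$V1 + ahead, df$V1 + back)

# str_sub is vectorised over start and end. Clamp start at 1, because
# negative indices in str_sub count from the end of the string.
window <- str_sub(genome, pmax(start, 1), end)

# Hypothetical helper: reverse complement of each string in a vector.
revcomp <- function(x) {
  comp <- chartr("acgtACGT", "tgcaTGCA", x)
  vapply(strsplit(comp, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# Keep "+" windows as they are; reverse-complement the "-" windows.
df$sequence <- ifelse(is_plus, window, revcomp(window))
print(df)
```

With the real data, `back` and `ahead` would be 50 and 5, and positions within 50 of either end of the sequence will give shorter windows because of the clamping.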
Is "n" my V1 column?