Question

Replace list of sequence names with sequences from a fasta file using R

5.0 years ago

shelley.w.peterson ▴ 10

I have a list of sequence names (A1, A2, A3, A1, A1, A2 etc) and a fasta file with the names and sequences, and I am trying to find a way to replace each item on the list with the corresponding sequence from the fasta file.

I've used:

test <- sequences[names(sequences) %in% list]

which returns A1, A2 and A3 only once each and drops the repeated entries from my list. What am I missing?

list of sequence names:

A1
A2
A3
A1
A1
A2

fasta file:

>A1
ATCATC
>A2
CCCGGG
>A3
GTGTGT
>A4
TCTATC
>A5
ATCTAC

output:

>A1
ATCATC
>A2
CCCGGG
>A3
GTGTGT

Desired output:

ATCATC
CCCGGG
GTGTGT
ATCATC
ATCATC
CCCGGG

R sequence • 1.3k views

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 5.0 years ago by shelley.w.peterson ▴ 10

Please provide representative input and output.

ADD REPLY • link 5.0 years ago by ATpoint 86k

list of sequence names:

A1
A2
A3
A1
A1
A2

fasta file:

>A1
ATCATC
>A2
CCCGGG
>A3
GTGTGT
>A4
TCTATC
>A5
ATCTAC

output:

>A1
ATCATC
>A2
CCCGGG
>A3
GTGTGT

Desired output:

ATCATC
CCCGGG
GTGTGT
ATCATC
ATCATC
CCCGGG

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 5.0 years ago by shelley.w.peterson ▴ 10

Please format your post better. I've done it for you this time.
This shows what you have and what you need, but not what you've tried. Your single line of R code does not show how you read or write files, so we don't know the packages or functions you're using.

ADD REPLY • link 5.0 years ago by Ram 44k

Is there a guide on how to properly format a post? I was proud enough of myself for thinking to put in "< br >" when pressing the enter button didn't work. I'm a biologist, not a programmer, so I don't know these things.

ADD REPLY • link 5.0 years ago by shelley.w.peterson ▴ 10

Apologies, we do not have a manual for the formatting bar yet. You did a great job with the <br> tags, but the formatting bar is your toolbelt for most tasks.

ADD REPLY • link 5.0 years ago by Ram 44k

@shelley.w.peterson, you could also try dedicated FASTA/FASTQ manipulation tools such as seqtk or seqkit. In R, indexing the sequences by name keeps the order and the repeats of your list:

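> library(Biostrings)
> # Read the FASTA records; their names come from the ">" header lines
> test <- readDNAStringSet("test.fa", format = "fasta")
> # Read the list of names, one per line, into a single column (V1)
> ids <- read.csv("file.txt", header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
> ids
  V1
1 A1
2 A2
3 A3
4 A1
5 A1
6 A2
> # Index by name so that the order and the repeats in the list are kept
> data.frame("sequences" = test[ids$V1])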
  sequences
1    ATCATC
2    CCCGGG
3    GTGTGT
4    ATCATC
5    ATCATC
6    CCCGGG
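
For context, here is a minimal base-R sketch of why the `%in%` attempt in the question returns each sequence only once, while indexing by name returns one sequence per list entry. It assumes a plain named character vector as a stand-in for the FASTA records, so it runs without Biostrings. The names `ids`, `filtered`, `looked_up` and `by_position` are illustrative and not from the original post.

```r
# Toy stand-in for the FASTA records: a named character vector (assumption,
# used here instead of a DNAStringSet so the example needs no packages).
sequences <- c(A1 = "ATCATC", A2 = "CCCGGG", A3 = "GTGTGT",
               A4 = "TCTATC", A5 = "ATCTAC")

# The list of names, including repeats.
ids <- c("A1", "A2", "A3", "A1", "A1", "A2")

# %in% asks, for each record, "is this name anywhere in ids?".
# That gives one TRUE/FALSE per record, so each matching record is
# returned once, in file order, regardless of how often it is listed.
filtered <- sequences[names(sequences) %in% ids]

# Indexing by name looks up each entry of ids in turn, so the result
# has one element per list entry and keeps the order and the repeats.
looked_up <- sequences[ids]

# match() does the same lookup through integer positions.
by_position <- sequences[match(ids, names(sequences))]

print(unname(filtered))
print(unname(looked_up))
```

In this sketch `filtered` has three elements and `looked_up` has six, which mirrors the "output" and "Desired output" in the question. One caveat: a name in the list that is missing from the sequences would give `NA` (or an error, depending on the object type), so it may be worth checking `all(ids %in% names(sequences))` first.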

ADD REPLY • link 5.0 years ago by cpad0112 21k

Thanks so much!!!! As someone who is new to coding, sometimes it's hard to figure out if I'm using the wrong tool/command or if I'm using the correct one the wrong way -_-'

ADD REPLY • link 5.0 years ago by shelley.w.peterson ▴ 10

I started the same way and learned along the way. Keep visiting Biostars, @shelley.w.peterson.

ADD REPLY • link 5.0 years ago by cpad0112 21k