Question

How can I retrieve FASTA sequences using IDs listed in another text file?

9.4 years ago

tcf.hcdg ▴ 70

Dear Biostars community,

I have around 50,000 sequence identifiers in a text file and around 1 million sequences in a large FASTA file.

I want to extract the sequences for these 50,000 IDs from the FASTA file.

I checked the existing questions and answers on the community pages, but most of them use Perl, Linux command-line tools, or Python.

How can it be done in R (faSomeRecords, grep, or something else)?

Any suggestion?

Thanks

fasta • 2.2k views

updated 24 months ago by Ram 44k • written 9.4 years ago by tcf.hcdg ▴ 70
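The core of the task is a set-membership filter. Read the IDs into a set, then stream through the FASTA file and keep only the records whose header ID is in that set. The sketch below is in Python only to show the logic. The same three steps apply in R or any other language. It assumes the ID is the first whitespace-delimited token after `>` and that the ID file holds one ID per line. `read_ids` and `filter_fasta` are hypothetical helper names.

```python
import io
from typing import Iterable, Iterator, Set, TextIO


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect one ID per non-empty line; a set makes each lookup O(1) on average."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta(fasta: TextIO, wanted: Set[str]) -> Iterator[str]:
    """Yield every line of the records whose ID is in `wanted`.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    keep = False
    for line in fasta:
        if line.startswith(">"):
            # A header starts a new record: decide once whether to keep it.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the ID list and the FASTA file.
    ids = read_ids(["seq1\n", "seq3\n", "\n"])
    fasta = io.StringIO(">seq1 first\nACGT\nACGT\n>seq2\nTTTT\n>seq3 third\nGGGG\n")
    print("".join(filter_fasta(fasta, ids)), end="")
```

With real files, you would pass open file handles instead of the `StringIO` objects and write the yielded lines to an output file. Only the ~50,000 IDs are held in memory. The 1 million sequences are streamed through once.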

You want this done with grep, but not in linux?

9.4 years ago by PoGibas 5.1k

I just want to know if this is possible in R.

updated 24 months ago by Ram 44k • written 9.4 years ago by tcf.hcdg ▴ 70

Yes, it is (please see this post: How To Extract Multiple Fasta Sequences At A Time From A File Containing Sequences Ids Using R)

9.4 years ago by PoGibas 5.1k

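samtools faidx is pretty handy: it builds an index (.fai) of the FASTA file once, after which individual records can be extracted by name without rereading the whole file.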

9.4 years ago by Ashutosh Pandey 12k
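The idea behind an index like faidx's can be sketched in Python. You record, once, where each record's sequence starts and ends in the file, then seek directly to the records you need instead of scanning all 1 million. This is a simplified analogue for illustration only, not samtools' implementation or its .fai format. It stores plain byte ranges rather than the line-length fields samtools uses. `build_offset_index` and `fetch` are hypothetical names, and the ID is assumed to be the first token after `>`.

```python
import io
from typing import BinaryIO, Dict, Optional, Tuple


def build_offset_index(fh: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Map each record ID to the byte range (start, end) of its sequence lines."""
    index: Dict[str, Tuple[int, int]] = {}
    current: Optional[str] = None
    start = 0
    fh.seek(0)
    while True:
        pos = fh.tell()
        line = fh.readline()
        if not line or line.startswith(b">"):
            # A new header (or end of file) closes the previous record.
            if current is not None:
                index[current] = (start, pos)
            if not line:
                break
            fields = line[1:].split()
            current = fields[0].decode() if fields else None
            start = fh.tell()  # sequence lines begin right after the header
    return index


def fetch(fh: BinaryIO, index: Dict[str, Tuple[int, int]], seq_id: str) -> str:
    """Seek straight to one record and return its sequence without newlines."""
    start, end = index[seq_id]
    fh.seek(start)
    chunk = fh.read(end - start)
    return b"".join(chunk.split()).decode()


if __name__ == "__main__":
    # In practice this would be a FASTA file opened in binary mode ("rb").
    fh = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nTTTT\n>seq3\nGGGG")
    idx = build_offset_index(fh)
    for wanted in ("seq3", "seq1"):
        print(wanted, fetch(fh, idx, wanted))
```

Building the index costs one pass over the file. After that, each of the 50,000 lookups is a single seek and read.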

Ram · Answer 1 · 2015-07-08

9.4 years ago

Devon Ryan 104k

If you really want to do this in R, you could use the seqinr package. You would load the whole FASTA file into memory with read.fasta(), subset the resulting list by its names() (the sequence IDs), and then write that subset to a file with write.fasta(). Alternatively, you could presumably pass only the wanted names to write.fasta().

updated 24 months ago by Ram 44k • written 9.4 years ago by Devon Ryan 104k
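The load, subset by name, write recipe above can be mirrored in Python as a rough analogue. This is not seqinr's API. `read_fasta` and `write_fasta` here are hypothetical helpers written for illustration. The 60-character output line width is an arbitrary choice, and the ID is assumed to be the first token after `>`. Like the seqinr approach, it holds every sequence in memory, which matters with 1 million records.

```python
import io
from typing import Dict, Iterable, List, Optional, TextIO


def read_fasta(fh: TextIO) -> Dict[str, str]:
    """Load every record into a dict keyed by ID (first token after '>')."""
    parts: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in fh:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else None
            if current is not None:
                parts[current] = []
        elif current is not None:
            parts[current].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(chunks) for name, chunks in parts.items()}


def write_fasta(records: Dict[str, str], ids: Iterable[str], out: TextIO, width: int = 60) -> None:
    """Write only the requested IDs that exist, wrapping sequences at `width`."""
    for name in ids:
        seq = records.get(name)
        if seq is None:
            continue  # ID from the list is not present in the FASTA file
        out.write(f">{name}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    records = read_fasta(io.StringIO(">a desc\nACGT\nAC\n>b\nTTTT\n>c\nGG\n"))
    out = io.StringIO()
    write_fasta(records, ["c", "missing", "a"], out, width=4)
    print(out.getvalue(), end="")
```

The dict lookup plays the role of subsetting by names(). The output follows the order of the ID list rather than the order of the FASTA file.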

score 2 · Answer 2 · 2015-07-08

9.4 years ago

royefendymarbun ▴ 30

From Google, this may help: a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter1.html
