How can I retrieve FASTA sequences using IDs in another text file?
9.4 years ago
tcf.hcdg ▴ 70

Dear Biostar community,

I have around 50,000 sequence identifiers in a text file and around 1 million sequences in a large FASTA file.

I want to get the sequences for these 50,000 identifiers from the FASTA file.

I checked the existing questions and answers on the community pages, but most of them use Perl, Linux, or Python.

How can it be done in R (fasomerecord/grep/something else)?

Any suggestions?

Thanks

fasta • 2.2k views

You want this done with grep, but not in linux?


I just want to know if this is possible in R.


samtools faidx is pretty handy.

9.4 years ago

If you really want to do this in R, you could use the seqinr package. You end up loading the fasta file into memory, subsetting the results according to the names() accessor, and then writing the results of that to a file. Presumably you could alternatively pass only the names you want to write.fasta().
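
The steps described above (load, subset by name, write) could look roughly like the following. This is only a minimal sketch, assuming the seqinr package is installed and that each identifier matches the first word of a FASTA header exactly. The file names, sequences, and IDs are made up so the example runs on its own; with a real 1-million-record file, `read.fasta()` loads everything into memory, so expect it to need a fair amount of RAM.

```r
# Minimal sketch with seqinr; file names and IDs below are hypothetical.
library(seqinr)

# Build a tiny FASTA file and an ID list so the example is self-contained.
fasta_file <- tempfile(fileext = ".fasta")
id_file    <- tempfile(fileext = ".txt")
out_file   <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "GGGGCCCC", ">seq3", "TTTTAAAA"),
           fasta_file)
writeLines(c("seq3", "seq1", "seq_missing"), id_file)

# Load every record into memory as a named list (one string per sequence).
seqs <- read.fasta(fasta_file, seqtype = "DNA", as.string = TRUE,
                   forceDNAtolower = FALSE)

# Read the wanted identifiers, one per line.
ids <- readLines(id_file)

# Keep only identifiers that are actually present in the FASTA file,
# in the order they appear in the ID list.
found  <- intersect(ids, names(seqs))
wanted <- seqs[found]

# Write the subset to a new FASTA file.
write.fasta(sequences = wanted, names = names(wanted),
            file.out = out_file, as.string = TRUE)
```

Identifiers that are missing from the FASTA file are silently dropped here; `setdiff(ids, names(seqs))` would list them if you want to check.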

9.4 years ago
Found via Google; this may help: a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter1.html