how to create a fasta file with subset of sequences using seqinr
1
0
Entering edit mode
5.9 years ago
Camilla ▴ 10

Hi everyone, I am very new in environment R. I am trying to obtain a final fasta file selecting a subset of sequences from a downloaded fasta file, using seqinr. For this purpose I type the following script:

 myfasta<- read.fasta(file = "MRP2.fas", seqtype = "AA",as.string = TRUE, set.attributes = FALSE)
  subsetlist<-read.table("list.txt", header=TRUE)
  myfasta[names(myfasta) %in% subsetlist$id]
  write.fasta(sequences = myfasta, names = names(myfasta), nbchar = 80, file.out = "parsedMRP2.fasta")
  parsedMRP2<- read.fasta("parsedMRP2.fasta", set.attributes = FALSE)"

The MRP2.fas contains 5039 sequences, while the list.txt is a list of 3000 sequences. When I run my script, I obtain final file (parsedMRP2.fasta) always with 5039 sequences instead of the 3000 sequences that I indicated in the list.txt.

Could you please help me? thank you very much in advance

R seqin fasta sequences • 6.4k views
ADD COMMENT
1
Entering edit mode

If you need an alternative then using faSomeRecords from Jim Kent's utilities is a great/fast option. Linux version is linked but macOS available as well. After downloading do not forget to add execute permissions (chmod a+x faSomeRecords).

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
ADD REPLY
2
Entering edit mode
5.9 years ago
Chirag Parsania ★ 2.0k

Because you are not saving data in new object. Before you write in the file, you need to save in another object. See the change below.

myfasta<- read.fasta(file = "MRP2.fas", seqtype = "AA",as.string = TRUE, set.attributes = FALSE)
subsetlist<-read.table("list.txt", header=TRUE)

my_fasta_sub <- myfasta[names(myfasta) %in% subsetlist$id]
write.fasta(sequences = my_fasta_sub, names = names(my_fasta_sub), nbchar = 80, file.out = "parsedMRP2.fasta")
ADD COMMENT
0
Entering edit mode

Thank you very much Chirag for the solution ! Now it works!

ADD REPLY
0
Entering edit mode

Hi. I would like to ask a similar question: I have a Fasta alignment, that I have created a object with: myotis <- read.dna("epfu.fas",format= "fasta") I am interested in using bootstrap to estimate confidence intervals, if it is actually possible. I would like to resample with replacement sequences from that alignment, and then using the function tajimaD from the package strataG or the function tajima.test$D from the package pegas.

ADD REPLY

Login before adding your answer.

Traffic: 1982 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6