How to write a list of DNAStringSets as a FASTA file in R
10.1 years ago
newscient ▴ 20

I'm new to using R and Bioconductor for bioinformatics projects. My question is:

I have a list of DNAStringSets (shown below) and want to save it as a FASTA file with writeXStringSet(), which takes a single DNAStringSet object as its argument. Does anyone know how to collapse the list of DNAStringSets into a single DNAStringSet object?

$NM_008866
  A DNAStringSet instance of length 13
     width seq                                                        names               
 [1]   693 ATGTGCGGCAACAACATGTCCGCTCCGA...GATAAGCTCCTACCTCCAATTGATTGA NM_008866
 [2]    72 ATGGATGGGCAGAAGCCTTTGCAGGTAT...AATACATCTGTCCACATGCCCCTGTGA NM_008866
 [3]   114 ATGGGCAGAAGCCTTTGCAGGTATCAAA...GAATATGGCTATGCCTTCTTGGTTTGA NM_008866
 [4]   213 ATGGCATTCCTTCTAACAGGATTATTTT...AGTGCCATGGAGATTGTGACCCTTTAG NM_008866
 [5]    63 ATGTCAAGCACTTCATTGATAAGCTCCT...TTGATTGACATCACTAAGAGGCCTTGA NM_008866
 ...   ... ...
 [9]   219 ATGGCCCTTCTATTGGGAGACCAGGCTT...CAGAGGCAGGCGGATCTCTGTCAATAG NM_008866
[10]   144 ATGTTATGCTTAAAACCAAATACTGTTC...CAGTCTCCTGTACAAATATTAAAATAA NM_008866
[11]    78 ATGTTGCAAAAATTATGGTTATTTCTGA...CCAACCAACCAAGAAGCACCTTTATAA NM_008866
[12]    75 ATGGTTATTTCTGAACGGTTGCTTTTCT...AGAAGCACCTTTATAAACAGGTGCTAA NM_008866
[13]    90 ATGTCTGGATTTAAAACAATTTCAAACA...AATTTACTTCAGTTATTCTATCTGTAA

$NM_001159750
  A DNAStringSet instance of length 9
    width seq                                                         names               
[1]   903 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_001159750
[2]   105 ATGGACCATCAACTGATAAAGACCCTGA...AGAGAAGAAAGTTCCAGCAGCAATGTAA NM_001159750
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_001159750
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_001159750
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_001159750
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_001159750
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_001159750
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_001159750
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_001159750

$NM_011541
  A DNAStringSet instance of length 9
    width seq                                                         names               
[1]   906 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_011541
[2]   108 ATGGACCATCAACTGATAAAGACCCTGA...GAAGAAAGTAGTTCCAGCAGCAATGTAA NM_011541
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_011541
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_011541
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_011541
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_011541
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_011541
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_011541
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_011541

$NM_001159751
  A DNAStringSet instance of length 9
    width seq                                                         names               
[1]   939 ATGTGTCCCTCGGTGTGTACCACTTTCC...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_001159751
[2]   108 ATGGACCATCAACTGATAAAGACCCTGA...GAAGAAAGTAGTTCCAGCAGCAATGTAA NM_001159751
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_001159751
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_001159751
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_001159751
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_001159751
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_001159751
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_001159751
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_001159751

Answered on Stack Exchange using: do.call(c, dna_list)


Thanks, I used do.call(), but first I had to remove the names of my list with:

names(dna_list) <- NULL
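A minimal sketch of the whole workflow, assuming Biostrings is installed; the toy list `dna_list` and its sequences are made up for illustration. `unname()` drops the list names without modifying `dna_list` itself, so it has the same effect here as `names(dna_list) <- NULL`. The combined set is written to a temporary file and read back as a check.

```r
library(Biostrings)

# Hypothetical toy list standing in for the list of DNAStringSets above
dna_list <- list(
  NM_A = DNAStringSet(c(NM_A = "ATGTGCGGCTGA", NM_A = "ATGGATTAA")),
  NM_B = DNAStringSet(c(NM_B = "ATGAGACAAATGTAA"))
)

# Drop the list names so c() receives the sets as unnamed arguments,
# then concatenate all sets into one DNAStringSet
combined <- do.call(c, unname(dna_list))

# Write a single FASTA file (to a temporary path here) and read it back
out_file <- tempfile(fileext = ".fa")
writeXStringSet(combined, filepath = out_file, format = "fasta")
reread <- readDNAStringSet(out_file)
```

The per-sequence names (here `NM_A`, `NM_A`, `NM_B`) become the FASTA headers, so headers may repeat when several sequences come from the same transcript.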
7.0 years ago
dillonchan97 ▴ 20

Hi, I know this post is very old, but I ran into a similar problem after putting many DNAStrings into a list. This is how I solved it:

list_of_DNAStrings # your list of DNAStrings
LargeDNAStringSet <- DNAStringSet(sapply(list_of_DNAStrings, `[[`, 1)) 
# sapply() extracts the first DNAString from each member of the list,
# so every sequence is kept only if each member holds a single sequence.
# This works if typeof(list_of_DNAStrings[[1]]) is S4, or DNAStrings.
writeXStringSet(LargeDNAStringSet, 'fname.fa')
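If the list holds plain DNAString objects rather than length-1 DNAStringSets, one alternative (a sketch, not the poster's method) is to convert each element to a character string and build the set from the resulting named character vector; the list names then become the FASTA headers. The list contents and the output path are hypothetical.

```r
library(Biostrings)

# Hypothetical named list of individual DNAString objects
list_of_DNAStrings <- list(
  seq1 = DNAString("ATGCCGTAA"),
  seq2 = DNAString("ATGTTTGGGTGA")
)

# Convert each DNAString to one character string; vapply keeps the list names
seqs_chr <- vapply(list_of_DNAStrings, as.character, character(1))

# Build one DNAStringSet; the names carry over and become FASTA headers
LargeDNAStringSet <- DNAStringSet(seqs_chr)
writeXStringSet(LargeDNAStringSet, file.path(tempdir(), "fname.fa"))
```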
10.1 years ago
Ram 44k

Maybe try the union() method to join all the DNAStringSets? The manual says it works on XStringSets, and DNAStringSet is derived from XStringSet.
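One caveat worth checking: union() is a set operation, so it is expected to drop sequences that occur more than once (the NM_001159750, NM_011541 and NM_001159751 sets above share several identical sequences). A small sketch comparing it with c(), assuming Biostrings is loaded; the toy sequences and names are made up.

```r
library(Biostrings)

# Hypothetical sets; y1 has the same sequence as x1
a <- DNAStringSet(c(x1 = "ATGAAATAA", x2 = "ATGCCCTGA"))
b <- DNAStringSet(c(y1 = "ATGAAATAA"))

kept_all <- c(a, b)      # plain concatenation keeps all three sequences
merged   <- union(a, b)  # set union keeps the repeated sequence only once
```

If every sequence must end up in the FASTA file, c() is the safer choice.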


Yes, but how can I use union() on a list of DNAStringSets? It seems to take individual DNAStringSets as arguments.


I'd either write a loop, like so:

# set is the list of DNAStringSets
growing <- set[[1]]
for (i in seq_along(set)[-1]) {
  growing <- union(growing, set[[i]])
}

(Sorry, not too familiar with R)

Or use R's Reduce() function, e.g. Reduce(union, set).
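A sketch of the Reduce() route, assuming Biostrings; `set` is a hypothetical toy list. Reduce() passes the elements to the combining function positionally, so the list names do not need to be removed first. Folding with c() keeps every sequence, while folding with union() should keep repeated sequences only once.

```r
library(Biostrings)

# Hypothetical named list of DNAStringSets; set2 repeats one sequence from set1
set <- list(
  set1 = DNAStringSet(c("ATGAAATAA", "ATGCCCTGA")),
  set2 = DNAStringSet(c("ATGAAATAA", "ATGGGGTAG"))
)

# Pairwise fold with c(): all four sequences are kept
all_seqs <- Reduce(c, set)

# Same fold with union(): the duplicated sequence appears once
unique_seqs <- Reduce(union, set)
```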

5.3 years ago
alslonik ▴ 320

In case someone needs to save each element as a separate FASTA file, as I did, you can do it with a for loop:

for (i in names(myAAStringSetList)) {
  writeXStringSet(myAAStringSetList[[i]], filepath = paste0("path/to/your/directory/",i, ".fasta"), format = "fasta")
}
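A self-contained sketch of the same loop, using a toy named list written to a temporary directory; the object names and the output directory are hypothetical. file.path() builds each path, and the loop iterates over names(), so each file is named after its list element (an unnamed list would produce no files).

```r
library(Biostrings)

# Hypothetical named list of sets; the same loop works for AAStringSets
myStringSetList <- list(
  NM_A = DNAStringSet(c(a1 = "ATGAAATAA", a2 = "ATGCCCTGA")),
  NM_B = DNAStringSet(c(b1 = "ATGGGGTAG"))
)

out_dir <- tempdir()
for (i in names(myStringSetList)) {
  # One FASTA file per list element, named after that element
  writeXStringSet(myStringSetList[[i]],
                  filepath = file.path(out_dir, paste0(i, ".fasta")),
                  format = "fasta")
}
```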