How to write pairwiseAlignment in R in Fasta Format with Headers?
1
0
Entering edit mode
9.3 years ago
Yosi • 0

Hi there,

I try to write an R-script with which I can align a bench of sequences in one file with a single sequence in another file.

So far, I'm really happy with the results, but I have one big problem. How to write the pairwiseAlignment output in fasta format WITH header names?

I did this (among others):

seq1 <- readDNAStringSet("file1.fasta", use.names=T)
seq2 <- readDNAStringSet("file2.fasta", use.names=T)

mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly= F)

# here I will introduce a for-loop to align every sequence against the sequence in file2
globalAlign <- pairwiseAlignment(seq1[1], seq2[1], type='global-local', substitutionMatrix=mat, gapOpening=10, gapExtension=-5)

r = BStringSet( c(toString(subject(globalAlign)), toString(pattern(globalAlign))) )
writeXStringSet(r,"out1.txt")

But my "out1.txt"-outfile looks like this:

>
ATGCGATGCTAGCTGCATAGCTCGATCG
>
ATGCGAT---AGCTGCATAGCT---TCG

Has anyone of you an idea how to include the sequence names so that it will look like this:

>seq_name_1
ATGCGATGCTAGCTGCATAGCTCGATCG
>seq_name_2
ATGCGAT---AGCTGCATAGCT---TCG

Many thanks in advance!

R sequence alignment • 3.1k views
ADD COMMENT
0
Entering edit mode
9.3 years ago
Brice Sarver ★ 3.8k

I tried this with two DNAString instances, a and b, with sequences "ACTG" and "GTCA". I then performed the global alignment the way you did, and I converted them to a BStringSet the same way you did. This is 'r'.

Objects of class XStringSet can have names assigned, but pairwiseAlignments cannot as far as I know. This makes sense; it's just a comparison, and you ought to know what you're comparing beforehand. This doesn't help when you want to write out the files. You can add the names this way:

> r
  A BStringSet instance of length 2
    width seq
[1]     4 GTCA
[2]     4 ACTG
> names(r)
NULL
> names(r) <- c("a", "b")
> r
  A BStringSet instance of length 2
    width seq                                               names
[1]     4 GTCA                                              a
[2]     4 ACTG                                              b​
ADD COMMENT

Login before adding your answer.

Traffic: 2516 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6