How can extract nucleotide frequency for position on aligned sequences in clustal omega?

How can extract nucleotide frequency for position on aligned sequences in clustal omega?

0

Entering edit mode

7.4 years ago

algas2006 • 0

hi

I am begginer in Bioinformatics and I have a laptop with OS Ubuntu.

The question is:

How can extract nucleotide frequency for position on aligned sequences in clustal omega?

Thanks

Cèsar

sequencing frecuency of nucleotides clustal omega • 4.4k views

ADD COMMENT • link updated 7.0 years ago by nhaituan ▴ 10 • written 7.4 years ago by algas2006 • 0

1

Entering edit mode

7.4 years ago

Jean-Karim Heriche 27k

You could use the seqinr package for R. First read in your alignment with the read.alignment() function then use the consensus() function with method="profile" to get a matrix of counts of each nucleotide at each position.

ADD COMMENT • link 7.4 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Hi Jean-Karim Heriche

Thanks for your answer. I had installed R and seqinr, but I got a error message.

The file with the alignement is in: home/Documents/A/B/file1.clustal

c@c-HP-245-G4-Notebook-PC:~$

c@c-HP-245-G4-Notebook-PC:~$cd Documents/

c@c-HP-245-G4-Notebook-PC:~/Documents$

c@c-HP-245-G4-Notebook-PC:~/Documents$ cd A/

c@c-HP-245-G4-Notebook-PC:~/Documents/A$

c@c-HP-245-G4-Notebook-PC:~/Documents/A$ cd B/

c@c-HP-245-G4-Notebook-PC:~/Documents/A/B$

c@c-HP-245-G4-Notebook-PC:~/Documents/A/B$ R

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" Copyright (C) 2013 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)

library("seqinr")

The manual say we have to that write:

clustal.res <- read.alignment(file = system.file("home/Documents/A/B/file1.clustal", package = "seqinr"), format="clustal") Error en read.alignment File is not readable

I tried save the file in fasta format in the site of clustal omega and change the command > fasta.res <- read.alignment but the result is the same.

can you help me to localizate the error?

Thanks

Cesar

ADD REPLY • link 7.4 years ago by algas2006 • 0

0

Entering edit mode

Don't use system.file() here. This is used for files that come with packages. Also note that you may be missing the first / in the file path. In any case, make sure the path is correct. You just need

read.alignment( file = "/correct/path/to/file.clustal", format="clustal")

If you still get an error, check also the file permissions.

ADD REPLY • link 7.4 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Jean-Karim Heriche Thanks for your answer!

I ran the correct command and I can see the alignment in the ubuntu terminal.

read.alignment( file = "/home/Documents/A/B/clustalomega1.clustal", format="clustal")

In the final lines I can read this.

$com

[1] NA

attr(,"class")

[1] "alignment"

For use the function “consensus”, I do not know the use of “matali”

I Ran

consensus(alignment, method = "profile")

Error en inherits(matali, "alignment") : objeto 'alignment' no encontrado

Could you help me to localizate the error?

Thanks Cesar

ADD REPLY • link 7.4 years ago by algas2006 • 0

0

Entering edit mode

In general, when programming, one assigns the result of a function to a variable so that it can be reused, e.g. in your case:

msa <- read.alignment( file = "/home/Documents/A/B/clustalomega1.clustal", format="clustal")
prof <- consensus(msa, method = "profile")

Although we do occasionally answer such questions here, this forum is not for programming questions. I would encourage you to first learn how to program and then to direct your programming questions to the Stack Overflow forum.

ADD REPLY • link 7.4 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Thanks for your attention!

ADD REPLY • link 7.4 years ago by algas2006 • 0

1

Entering edit mode

7.4 years ago

Joe 21k

Another similar option would be with BioPython, to extract a PSSM (Position Specific Score Matrix):

from Bio import SeqIO
from Bio.Align import AlignInfo

align = AlignIO.read('~/path/to/alignment.aln', 'format') # Whatever format your mutliple sequence alignment is in
summary_align = AlignInfo.SummaryInfo(align)
consensus = summary_align.dumb_consensus()
my_pssm = summary_align.pos_specific_score_matrix(consensus,chars_to_ignore = ['N'])
print(my_pssm)

Should end up looking something like:

Instructions from: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc301

ADD COMMENT • link 7.4 years ago by Joe 21k

0

Entering edit mode

Thanks for your answer!

Cesar

ADD REPLY • link 7.4 years ago by algas2006 • 0

0

Entering edit mode

7.0 years ago

nhaituan ▴ 10

Following method suggested by Healey,

I really like to output the PSSM ouput with frequency in each cell as you mentioned. However, I follow your method above, the method gives me something like: 0 7.0 0 7.0 0 0 0 7.0 in the cells ect ... within the output. It seems to be threshold 70%. But I really want actual frequencies.

Could anyone share how to output the actual frequencies of alleles within each cell, assuming that the left side is position of an alignment?

Highly appreciate you suggestions. Thank you.

ADD COMMENT • link 7.0 years ago by nhaituan ▴ 10

0

Entering edit mode

This would probably be better as a new question so you can demonstrate your code and input data.

The dumb consensus has a threshold of 0.7 by default (I think), but to my knowledge that doesn’t affect the PSSM; in my example code above the consensus sequence is purely used as a ‘marker sequence’ down the left hand side of the PSSM. There is nothing in the documentation, at a glance, about a default threshold but I may well be wrong (and can’t currently check).

ADD REPLY • link 7.0 years ago by Joe 21k

Login before adding your answer.

Similar Posts

Loading Similar Posts

Traffic: 1880 users visited in the last hour

Content Search
Users
Tags
Badges

Help About
FAQ

Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the

version 2.3.6