Warning: MSG: Replacing one seq (BioPerl)
10.4 years ago
ddusan1 ▴ 50

Hey

I am running a script that calculates Ka/Ks (dN/dS) ratios from a FASTA alignment file. It keeps returning the warning shown below.

I got the script from here and edited it to suit my needs. Thanks very much for the script man! https://github.com/MadsAlbertsen/miscperlscripts/blob/master/calc.dnds.pl

I understand that this generally means the headers are not unique, but in this instance they are. I suspect the script is not reading the entire header the way I want it to.

--------------------- WARNING ---------------------
MSG: Replacing one sequence [comp17867_c0_seq1|m.61203/1-641

Here's the code: http://freetexthost.com/lnhlri3kbk

If that's a shady site I apologize, still learning.

ka.ks perl bioperl • 2.3k views

To host a code example, you would be better off using a gist https://gist.github.com/ or pastebin http://pastebin.com/

They have syntax highlighting and other features handy for code.

10.4 years ago
Neilfws 49k

Tip: when you see a warning or error, simply Google search for the exact message. Someone else has almost always seen and discussed the issue. Examples: here and here.

In this case, the problem is that the sequence identifier is not unique.


I've read those and done that.

The sequence identifiers are all unique, that's the problem. They are all long strands and every single one is unique in some aspect. I'm theorizing that it's only taking the beginning of each identifier, which is why I asked people to specifically look at that particular script.


Problem is the data, not the script. If you read those links then you should understand that the warning means the identifiers are not unique, since one is replacing another. You may think they're unique, but that's not the same thing :)

"Identifier" has a very specific meaning in a FASTA file. It is the string immediately following the ">" in the header line, up to the first whitespace. If there is a space in the header, the part between ">" and the space is the identifier and the rest is the "description". Examples:

>myseq1 desc1
>myseq1 desc2

Same identifier (myseq1), different descriptions, unique header lines.

So: identifiers are not "long strands" and while the entire header line (identifier + description) may be "unique in some aspect", the identifier alone may not.

BioPerl sequence objects usually use the method display_id to get the identifier. I see that in the script; I don't see anything that would take only the beginning.
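
To check the data directly, independent of BioPerl, the identifier rule described above can be sketched in a few lines. This is a standalone Python sketch, not part of the original script: the function names `fasta_identifier` and `duplicate_identifiers` are hypothetical, and it assumes the identifier is everything between ">" and the first whitespace.

```python
from collections import Counter


def fasta_identifier(header_line):
    """Return the text after '>' up to the first whitespace (assumed ID rule)."""
    parts = header_line[1:].split(None, 1)
    return parts[0] if parts else ""


def duplicate_identifiers(lines):
    """Return {identifier: count} for identifiers seen more than once."""
    counts = Counter(
        fasta_identifier(line) for line in lines if line.startswith(">")
    )
    return {ident: n for ident, n in counts.items() if n > 1}


# The two example headers from above, plus one unrelated record.
example = [
    ">myseq1 desc1",
    "ATGGCC",
    ">myseq1 desc2",
    "ATGGCA",
    ">myseq2 desc3",
    "ATGGCT",
]
print(duplicate_identifiers(example))  # {'myseq1': 2}
```

Passing an open file handle instead of the example list (for instance `duplicate_identifiers(open("alignment.fasta"))`, where the filename is a placeholder) would list any identifiers that collide even though their full header lines differ.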


Brilliant! Thank you!
