Question

How do I understand the -Z option of Infernal cmscan?

2

2.8 years ago

Naemul ▴ 20

The user guide says:

But I found another explanation saying that -Z should be set to the size of the genome times 2.

I am confused now. How should I set -Z? I would also like to know what the E-value is.

RNA ncRNA cmscan infernal • 1.2k views

updated 2.8 years ago by Dunois ★ 2.9k • written 2.8 years ago by Naemul ▴ 20

2

I believe the question about -Z has been answered already.

As for the E-value, it is basically the number of matches you would expect to get purely by chance that are of a similar quality as the current match (or better). So if you have an E-value of 1, you could expect to get up to one extra match (purely by chance) whose alignment to the query is as good as the match you're looking at currently.

2.8 years ago by Dunois ★ 2.9k

score 3 · Accepted Answer · 2022-10-10

3

2.8 years ago

LDT ▴ 340

To calculate Z, please visit this link and use the esl-seqstat program to get the total number of nucleotides:

esl-seqstat my_reference.fasta

The reference FASTA should be your genome.
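As a rough illustration of the "genome size times 2" rule mentioned in the question, here is a minimal Python sketch. It assumes that -Z is given in megabases (Mb) and that the factor of 2 accounts for searching both strands; please check both points against the Infernal user guide for your version. The function names `total_nucleotides` and `z_in_megabases` are made up for this example and are not part of Infernal or Easel.

```python
def total_nucleotides(fasta_text: str) -> int:
    """Count residues in FASTA-formatted text, skipping '>' header lines."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def z_in_megabases(total_nt: int, strands: int = 2) -> float:
    """Search space in Mb, ASSUMING Z = nucleotides x strands / 1,000,000."""
    return total_nt * strands / 1_000_000


# Toy input; for a real genome, read the file contents instead,
# e.g. open("my_reference.fasta").read()
example = ">chr1 toy sequence\nACGTACGTAC\nGGCC\n>chr2\nTTTTAA\n"
nt = total_nucleotides(example)
print(nt, z_in_megabases(nt))
```

The nucleotide count should match the total residue count that esl-seqstat reports for the same file, so either route can feed the value passed to -Z.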

E-value: The E-value is the statistical significance of the hit: the number of hits we’d expect to score this highly in a database of this size (measured by the total number of nucleotides) if the database contained only nonhomologous random sequences. The lower the E-value, the more significant the hit.

Z: This option sets the search space size used to calculate E-values, so that the reported E-values are accurate for the database you are actually searching.
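To see why the search space size matters, the sketch below assumes that, for a fixed bit score, the E-value grows in proportion to the search space size. That is my understanding of how these E-values behave, not something stated in this thread, so treat it as an assumption. The helper `rescale_evalue` is hypothetical and only illustrates the arithmetic.

```python
def rescale_evalue(evalue: float, z_old_mb: float, z_new_mb: float) -> float:
    """Rescale an E-value between search-space sizes, ASSUMING linear scaling."""
    if z_old_mb <= 0 or z_new_mb <= 0:
        raise ValueError("search-space sizes must be positive")
    return evalue * (z_new_mb / z_old_mb)


# A hit reported with E = 1e-5 for a 1 Mb search space becomes
# ten times less significant if the true search space is 10 Mb.
print(rescale_evalue(1e-5, z_old_mb=1.0, z_new_mb=10.0))
```

Under that assumption, setting -Z too small makes hits look more significant than they are, and setting it too large does the opposite.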

I hope that this helps.

2.8 years ago by LDT ▴ 340

0

Thank you! Very useful instructions.

2.8 years ago by Naemul ▴ 20

1

You are welcome, Naemul. Feel free to upvote the question or accept the answer if it was useful. That way we might be able to help more members of the community solve a similar problem.

2.8 years ago by LDT ▴ 340