I understand the meaning of N50 contig length, but lately I've been coming across assembly statistics that report the N50 contig number. Can someone explain what is meant by N50 contig number and how it relates to assembly quality?
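N50 length is the length of the last contig that puts you over the top, i.e. the contig that takes the running total (summing from the longest contig down) past half the total assembly length. N50 contig number (i.e. N50) is just the rank of that contig.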
91 77 70 69 62 56 45 29 16 4
"N50 vs N50 length": Technically N50, as opposed to N50 length, refers to the ordinal of that last contig that pushes it over the brink - in this example 4 (since 69bp is the 4th largest contig). Unfortunately, a higher N50 implies the opposite of a longer N50 length. Some papers refer to N50 length as L50, while most have simply followed the lazy convention of dropping "length" off of "N50 length". I think it is important to include units with your N50 to minimize confusion.
http://jermdemo.blogspot.com/2008/11/calculating-n50-from-velvet-output.html
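To make the definition above concrete, here is a minimal Python sketch that computes both numbers for the example contigs. The function name n50_length_and_rank is made up for illustration, and the choice of ">= half the total" as the threshold is an assumption; some tools may use a strict ">" instead.

```python
def n50_length_and_rank(contig_lengths):
    """Return (N50 length, rank of that contig), per the definition above.

    Hypothetical helper for illustration, not taken from any assembler.
    """
    # Sort contigs from longest to shortest.
    ordered = sorted(contig_lengths, reverse=True)
    # Assumption: the "brink" is half of the summed contig length.
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # This contig pushes the running total over the brink.
            return length, rank
    raise ValueError("contig_lengths must not be empty")


contigs = [91, 77, 70, 69, 62, 56, 45, 29, 16, 4]
print(n50_length_and_rank(contigs))  # (69, 4)
```

The total here is 519bp, so the brink is 259.5bp; the running sums are 91, 168, 238, 307, and the 4th contig (69bp) is the one that crosses it.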
Recently there seems to be a nice change in nomenclature:
You can speak of L50 and N50:
At first I was very confused about the change of the meaning of N50 but now I am used to the names. It is very helpful to distinguish the different meanings.
I'm guessing that N50 contig number is the number of contigs with length >= N50 length. Presumably as assembly improves, one would like total contig number to decrease and proportion of total contig number with length > N50 length to increase.
Your very good blog post is the only source I can find for the definition of N50 number. Do you recall any other sources of the definition?
I don't remember how I derived the L50/N50 distinction other than by deduction. Google Scholar is not helping me find many L50 references. I see these terms used in the sorghum genome paper, but my blog post predates that. I see the Assemblathon http://assemblathon.org/assemblathon-1-results has actually switched the terms such that L50 is now a rank and N50 is a length - this is cute but incorrect.
It seems like this is a slightly different, and perhaps more precise, way of saying what other people have answered: that it is the number of contigs >= N50 length.
Right, the only boundary condition is the rare situation where you have a tie at the brink. Let's say these are your contigs:
91 77 70 69 69 56 45 29 16 4
I would say N50 is 4; others might say 5.
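A quick way to see where the 4 and the 5 come from is to compute both readings side by side. This is only a sketch: the names rank_at_brink and count_at_least are invented here, and the ">= half the total" threshold is an assumption.

```python
def rank_at_brink(lengths):
    """Rank and length of the contig that crosses half the total (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2  # assumed threshold: half the summed length
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return rank, length
    raise ValueError("lengths must not be empty")


def count_at_least(lengths, threshold):
    """Number of contigs whose length is >= threshold (hypothetical helper)."""
    return sum(1 for x in lengths if x >= threshold)


tied = [91, 77, 70, 69, 69, 56, 45, 29, 16, 4]
rank, n50_len = rank_at_brink(tied)
print(rank, count_at_least(tied, n50_len))  # 4 5
```

The rank reading stops at the first 69bp contig, while the "number of contigs >= N50 length" reading also counts the second 69bp contig, so the two only disagree when there is a tie at the brink.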
OK, I see. Thanks for your answer!