Interpreting ABySS stats
7.3 years ago
milanlove • 0

I used ABySS to assemble some reads, and after assembly it gave me stats like this:

n:114915

min:500

n50:739

sum:2471869

It tells me that the minimum contig size is 500, but in the output FASTA file I see contigs shorter than 500, which confuses me. Also, when I count nucleotides with grep (grep -v ">" input.fasta | wc | awk '{print $3-$1}'), I get 17458192, which is much larger than 2471869. What is the problem?

Abyss Assembly next-gen • 4.5k views

Could you provide more details on the parameters used, or any log output?

7.3 years ago
benv ▴ 730

Hi Sej,

Sorry for the late reply.

When calculating stats such as N50 and sum (a.k.a. reconstruction), ABySS discards all sequences below a minimum length threshold (default value: 500 bp). This is common practice in genome assembly because de novo assemblies typically contain a subset of "junk" sequences that are very short (e.g. k bp) and are caused by a mixture of sequencing errors, uncollapsed heterozygosity, and unresolved repeat sequences. The short sequences are still written to the output FASTA file; they are only excluded from the stats. That is why your grep count over the whole file (17458192) is much larger than the reported sum (2471869).

The min field reported by ABySS is the length of the shortest sequence that is above or equal to the length cutoff (default value: 500 bp).

Stats in ABySS are calculated by the abyss-fac program. If you want to calculate stats with a different minimum length cutoff, you can run abyss-fac on your FASTA file and specify a different cutoff with the -t option.
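To make the effect of the cutoff concrete, here is a minimal Python sketch that recomputes similar stats directly from a FASTA file. It is not abyss-fac's code: the function names `read_fasta_lengths` and `assembly_stats`, the `min_len` parameter, and the `n_kept` key are made up for illustration, and N50 uses the conventional definition (the length at which the running sum of contigs, sorted longest first, reaches half the total), which may differ in detail from what abyss-fac does. It also assumes that `n` counts all sequences in the file, which is consistent with the numbers in the question.

```python
from typing import Dict, Iterable, Iterator


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length


def assembly_stats(lengths: Iterable[int], min_len: int = 500) -> Dict[str, int]:
    """Compute n, min, N50 and sum, ignoring sequences shorter than min_len.

    Assumption: "n" counts every sequence, while min, N50 and sum
    only consider sequences of at least min_len bases.
    """
    all_lengths = list(lengths)
    kept = sorted((x for x in all_lengths if x >= min_len), reverse=True)
    total = sum(kept)

    # Conventional N50: walk from the longest contig down until the
    # running sum reaches half of the total.
    n50 = 0
    running = 0
    for x in kept:
        running += x
        if 2 * running >= total:
            n50 = x
            break

    return {
        "n": len(all_lengths),
        "n_kept": len(kept),
        "min": kept[-1] if kept else 0,
        "N50": n50,
        "sum": total,
    }
```

For example, `assembly_stats(read_fasta_lengths(open("input.fasta")))` applies the 500 bp cutoff, while passing `min_len=0` sums every sequence and should agree with the grep-based nucleotide count.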

For a more detailed description of the stats reported by ABySS, please see: https://github.com/bcgsc/abyss/wiki/ABySS-File-Formats#stats


Your answer is complete, thanks a lot.
