How do I access genome coverage using SPAdes?
9.3 years ago
fhsantanna ▴ 620

I have assembled bacterial genomes using SPAdes. I am now going to submit them to GenBank, but I need to know the coverage of each assembly. Should I provide the raw read coverage or the final coverage after filtering? If it is the latter, how do I get these values from the SPAdes log file?

SPADES genome coverage • 13k views

Hello All,

This post was useful, thanks! I got the result below by running bbmap.sh on my data. Could you please tell me how to interpret the coverage here? The average coverage is 209.654; what does this mean? I would really appreciate your input.

Genome:                 1
Key Length:             13
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             1821574 (553015125 bases)

Mapping:                1648.545 seconds.
Reads/sec:              1104.96
kBases/sec:             335.46


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  99.2931%         1808698        99.2358%          548788917
unambiguous:             80.4977%         1466326        83.8068%          463464066
ambiguous:               18.7954%          342372        15.4290%           85324851
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        4.4512%           81082         2.4810%           13720350
semiperfect site:        17.7655%          323611        14.4467%           79892367

Match Rate:                   NA               NA        10.2835%          499947774
Error Rate:              90.2312%         1639609        88.7859%         4316473287
Sub Rate:                28.4581%          517117         0.0276%            1340991
Del Rate:                68.6627%         1247684        88.7119%         4312877825
Ins Rate:                53.1733%          966223         0.0464%            2254471
N Rate:                  57.9214%         1052501         0.9307%           45245681

Reads:                                  1821574
Mapped reads:                           1529439
Mapped bases:                           408501909
Ref scaffolds:                          9236
Ref bases:                              1948456

Percent mapped:                         83.963
Percent proper pairs:                   0.000
Average coverage:                       209.654
Standard deviation:                     645.173
Percent scaffolds with any coverage:    76.09
Percent of reference bases covered:     77.89

Thanks!
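A note on reading that number: in this output, "Average coverage" is consistent with the mapped bases divided by the reference bases, i.e. the mean depth if the mapped bases were spread evenly over every base of the assembly. The sketch below only re-does that arithmetic with the two values printed above; that BBMap computes it exactly this way is an assumption inferred from the numbers, not something stated in the output.

```python
# Values copied from the BBMap summary above.
MAPPED_BASES = 408_501_909  # "Mapped bases"
REF_BASES = 1_948_456       # "Ref bases"


def average_coverage(mapped_bases: int, ref_bases: int) -> float:
    """Mean depth: mapped bases spread evenly over every reference base."""
    return mapped_bases / ref_bases


print(f"{average_coverage(MAPPED_BASES, REF_BASES):.3f}")  # prints 209.654
```

The standard deviation (645.173) is about three times the mean and only 77.89% of reference bases are covered, so the depth is very uneven across scaffolds and the single average should be read with caution.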


I tried to obtain the same output file with:

bbmap.sh in=reads.fq ref=contigs.fa covstats=covstats.txt

However, my bbmap.sh does not recognise covstats as a parameter. Would you mind posting the BBMap version you are using and the exact command you ran?

Thanks, Chiara

9.3 years ago

The best way to calculate coverage is by mapping, not by looking at the assembler's logs. For example, with BBMap:

bbmap.sh in=reads.fq ref=contigs.fa covstats=covstats.txt

That will print a message like this:

Average coverage:                       278.50
Percent scaffolds with any coverage:    100.00
Percent of reference bases covered:     99.98

...in addition to creating covstats.txt, which lists the coverage statistics for each individual scaffold. The reads you use for mapping should be the same ones you fed into SPAdes.
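If you want a single assembly-wide figure from the per-scaffold file, a length-weighted mean is one reasonable way to combine the rows. The following is a minimal sketch, assuming covstats.txt is tab-separated and that its header line contains columns named Avg_fold and Length (the header may start with "#"). Those column names are an assumption about the file layout, so check the first line of your own file before relying on it; the function name is hypothetical.

```python
import csv


def mean_fold_from_covstats(lines):
    """Length-weighted mean of per-scaffold average fold coverage.

    `lines` is any iterable of text lines, e.g. an open file handle.
    ASSUMPTION: tab-separated table whose header names include
    'Avg_fold' and 'Length'; a leading '#' on the header is ignored.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    fold_col = header.index("Avg_fold")
    len_col = header.index("Length")

    total_len = 0
    weighted_sum = 0.0
    for row in reader:
        if not row:  # skip blank lines
            continue
        length = int(row[len_col])
        total_len += length
        weighted_sum += float(row[fold_col]) * length
    return weighted_sum / total_len if total_len else 0.0
```

To use it on a real file, open it and pass the handle: `with open("covstats.txt", newline="") as fh: print(mean_fold_from_covstats(fh))`. Weighting by length keeps many short, unusually deep or shallow scaffolds from dominating the result.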


Thanks! BBMap was very helpful!

7.6 years ago

The contig headers written by SPAdes also contain length and coverage values, from which you can compute an average coverage.

$ grep '^>'  contigs.fasta | awk -F _  'BEGIN {OFS="\t"} {print $0,$4,$6}' | more
>NODE_1_length_766747_cov_499.885       766747  499.885
>NODE_2_length_581296_cov_457.579       581296  457.579
>NODE_3_length_399441_cov_525.578       399441  525.578
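To turn those two columns into one number, the per-contig values can be averaged with contig length as the weight. This is a rough sketch that assumes headers follow the NODE_<n>_length_<len>_cov_<cov> pattern shown above; the function name is hypothetical. It reports whatever coverage value the header carries, which the replies below point out is k-mer coverage and not read coverage.

```python
import re

# Matches headers such as ">NODE_1_length_766747_cov_499.885".
HEADER = re.compile(r"^>NODE_\d+_length_(\d+)_cov_([0-9.]+)")


def weighted_header_coverage(fasta_lines):
    """Length-weighted mean of the cov_ values found in contig headers.

    `fasta_lines` is any iterable of lines (an open contigs.fasta works);
    sequence lines and headers in other formats are skipped.
    """
    total_len = 0
    weighted_sum = 0.0
    for line in fasta_lines:
        match = HEADER.match(line)
        if match is None:
            continue
        length = int(match.group(1))
        total_len += length
        weighted_sum += float(match.group(2)) * length
    return weighted_sum / total_len if total_len else 0.0
```

For example, `with open("contigs.fasta") as fh: print(weighted_header_coverage(fh))` prints the weighted mean for the whole assembly.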

I am not 100% sure, but I think this is the k-mer coverage and not the read coverage. See also the thread "Confusion about the kmer coverage" and http://seqanswers.com/forums/showthread.php?t=6887


K-mer-based assemblers like SPAdes generally annotate contigs with k-mer coverage. For read coverage, you need to map the reads against the assembly.


Thanks for your explanation.

7.6 years ago

You could search for "Average coverage" throughout the spades.log file.
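A small sketch of that search, in case grep is not at hand. The exact wording and capitalisation in spades.log may differ between versions, so the match here is case-insensitive and the phrase is a parameter; the function name is hypothetical.

```python
def find_coverage_lines(log_lines, phrase="Average coverage"):
    """Return (line_number, text) for each log line containing `phrase`.

    Matching is case-insensitive; line numbers start at 1.
    """
    needle = phrase.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if needle in line.lower()
    ]
```

Usage: `with open("spades.log") as fh: print(find_coverage_lines(fh))`. Whatever values turn up are the assembler's own estimates, so they are not a substitute for read coverage obtained by mapping.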
