3.8 years ago
mewgia
Hello!
I have 4 samples of whole-metagenome shotgun sequencing data (soil communities): Illumina, single-end, read length ~76 bp. The raw reads were preprocessed with Trimmomatic to remove adapters, short reads, and low-quality reads. The remaining reads were assembled with SPAdes (not --meta, because --meta does not work with single-end reads). Now I'm trying to map the trimmed reads back to my assemblies (bowtie2, default settings), and only 8-11% of the reads map. Why could that be? What did I do wrong?
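To compare mapping percentages across samples without reading each log by hand, here is a minimal Python sketch that recomputes the overall alignment rate from bowtie2's standard summary for unpaired reads (the block bowtie2 prints to stderr). The function name and the example numbers are made up for illustration, and it assumes the default summary layout shown in the example.

```python
import re

def overall_alignment_rate(summary: str) -> float:
    """Recompute bowtie2's overall alignment rate (%) for unpaired reads.

    Assumes the default single-end summary, e.g. '700 (7.00%) aligned exactly 1 time'.
    """
    total = int(re.search(r"^(\d+) reads; of these:", summary, re.M).group(1))
    once = int(re.search(r"(\d+) \([\d.]+%\) aligned exactly 1 time", summary).group(1))
    multi = int(re.search(r"(\d+) \([\d.]+%\) aligned >1 times", summary).group(1))
    # Aligned = reads placed exactly once + reads placed more than once
    return 100.0 * (once + multi) / total

# Hypothetical summary, not one of the samples in this thread
EXAMPLE = (
    "10000 reads; of these:\n"
    "  10000 (100.00%) were unpaired; of these:\n"
    "    9100 (91.00%) aligned 0 times\n"
    "    700 (7.00%) aligned exactly 1 time\n"
    "    200 (2.00%) aligned >1 times\n"
    "9.00% overall alignment rate\n"
)

print(f"{overall_alignment_rate(EXAMPLE):.2f}%")  # 9.00%
```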
Maybe not much of the sequence was assembled and the rest was discarded by SPAdes? You could try other assemblers, e.g. IDBA-UD or MEGAHIT (both can work with single-end reads).
The assembly graph processing in SPAdes was not designed to deal with metagenomic sequencing data. Hence, most of your reads were probably not assembled into contigs, because there could be very high intraspecific diversity at the sequence level. I would not expect too much improvement, but I would follow @5heikki's suggestion. Some metagenomes can be very challenging to resolve, even with paired-end reads.
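A toy illustration of why sequence-level diversity hurts assembly (a simplification, not how SPAdes works internally): two strains that differ by a single SNP share most of their k-mers, but each has k k-mers the other lacks, which appears as a "bubble" in a de Bruijn graph. The sequence is invented and k=5 is chosen only to keep it readable.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct forward-strand k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def snp_variant(seq: str, pos: int, base: str) -> str:
    """Return seq with a single-base substitution at pos."""
    return seq[:pos] + base + seq[pos + 1:]

K = 5
strain_a = "ACGTTGCAAGGCTTACCGATG"      # toy sequence, not real data
strain_b = snp_variant(strain_a, 10, "T")  # one G->T substitution

only_a = kmers(strain_a, K) - kmers(strain_b, K)
only_b = kmers(strain_b, K) - kmers(strain_a, K)
shared = kmers(strain_a, K) & kmers(strain_b, K)
print(len(only_a), len(only_b), len(shared))  # 5 5 12
```

With many co-occurring strains, bubbles like this accumulate, and graph simplification tends to either collapse them or break contigs at them.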
Thanks. I tried MEGAHIT and the results are broadly similar (mapping %, metaSPAdes vs. MEGAHIT):
sample 1 - 11.64, 6.18
sample 2 - 15.35, 23.96
sample 3 - 36.37, 51.01
sample 4 - 13.48, 6.68
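Using only the numbers posted above, a short Python sketch that tabulates which assembler gave the higher mapping rate per sample and the mean across samples (the function name is just for illustration):

```python
from statistics import mean

# Mapping rates (%) as posted above: (metaSPAdes, MEGAHIT)
rates = {
    "sample 1": (11.64, 6.18),
    "sample 2": (15.35, 23.96),
    "sample 3": (36.37, 51.01),
    "sample 4": (13.48, 6.68),
}

def best_assembler(rates: dict) -> dict:
    """Name the assembler with the higher mapping rate for each sample."""
    return {s: ("metaSPAdes" if a >= b else "MEGAHIT") for s, (a, b) in rates.items()}

for sample, winner in best_assembler(rates).items():
    a, b = rates[sample]
    print(f"{sample}: {winner} (+{abs(a - b):.2f} points)")

print(f"mean metaSPAdes: {mean(a for a, _ in rates.values()):.2f}")
print(f"mean MEGAHIT:    {mean(b for _, b in rates.values()):.2f}")
```

Neither assembler wins on every sample: MEGAHIT is ahead on samples 2 and 3, metaSPAdes on samples 1 and 4.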
Then I took the unmapped reads and tried to reassemble them, but without success.
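For collecting the unmapped reads: bowtie2 can write unaligned single-end reads directly with `--un <file>`. If only the SAM output is available, here is a minimal pure-Python sketch that keeps records whose FLAG has bit 0x4 (unmapped) set and writes them back as FASTQ. It assumes an uncompressed, tab-separated SAM with QUAL present (not `*`); the function name is hypothetical.

```python
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for SAM records with FLAG bit 0x4 (unmapped) set."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & 0x4:  # read is mapped, skip it
            continue
        if flag & 0x10:  # SEQ stored reverse-complemented: restore original orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"

# Toy SAM records, not real data
sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tcontig_1\t100\t42\t4M\t*\t0\t0\tGGCC\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print("".join(unmapped_fastq(sam)), end="")
```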
51.01% and 23.96% are not that bad for a soil metagenome. As I said, short reads from metagenomic samples can be very hard to assemble into longer contigs. May I ask how many reads you have for each sample?
Yes, of course: sample 1: 79,329,568; sample 2: 81,473,469; sample 3: 78,824,132; sample 4: 92,191,436.
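For context, a rough upper bound on the sequenced bases per sample, assuming every read is the nominal ~76 bp (trimming shortens many reads, so the real totals are lower):

```python
READ_LENGTH_BP = 76  # nominal read length; trimmed reads are shorter, so these are upper bounds

read_counts = {
    "sample 1": 79_329_568,
    "sample 2": 81_473_469,
    "sample 3": 78_824_132,
    "sample 4": 92_191_436,
}

def max_gbp(n_reads: int, read_length: int = READ_LENGTH_BP) -> float:
    """Upper bound on sequenced bases (Gbp) if every read had full length."""
    return n_reads * read_length / 1e9

for sample, n in read_counts.items():
    print(f"{sample}: {n:,} reads, at most {max_gbp(n):.2f} Gbp")
```

That is roughly 6 to 7 Gbp per sample, which may still be modest for a community as diverse as soil.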
And what about normalizing or subsampling? I ran the khist script from BBNorm, and here is the resulting peaks file:
k 31
unique_kmers 3079249855
error_kmers 3079247658
genomic_kmers 2197
main_peak 127
genome_size_in_peaks 6906
genome_size 9775
haploid_genome_size 9775
fold_coverage 127
haploid_fold_coverage 127
ploidy 1
percent_repeat_in_peaks 71.431
percent_repeat 75.427
start center stop max volume
25 127 140 39 416
140 144 164 25 384
164 174 188 38 319
188 200 214 19 210
214 222 236 13 109
236 240 251 5 38
251 282 309 6 174
309 341 377 9 116
377 391 402 4 36
402 413 434 3 42
434 453 464 7 37
600 615 621 4 25
621 626 636 5 20
875 882 890 7 16
14924 14934 14944 31 31
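On the normalizing/subsampling question, here is a minimal Python sketch of the two ideas, assuming reads are held in memory as plain strings. `subsample` keeps each read independently with a fixed probability; `digital_normalize` is a simplified median k-mer ("digital") normalization that drops a read once the median count of its k-mers among the reads kept so far reaches `target`. Both function names are hypothetical, and BBNorm's actual algorithm differs in details (for example, this sketch counts forward-strand k-mers only). On real files, `reformat.sh samplerate=` and `bbnorm.sh target=` from BBTools cover these two jobs.

```python
import random
from collections import Counter
from statistics import median
from typing import Iterable, List

def subsample(reads: Iterable[str], fraction: float, seed: int = 42) -> List[str]:
    """Keep each read independently with probability `fraction` (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < fraction]

def digital_normalize(reads: Iterable[str], k: int = 31, target: int = 20) -> List[str]:
    """Keep a read only if the median count of its k-mers, among reads kept so far, is below `target`."""
    counts: Counter = Counter()
    kept = []
    for read in reads:
        kms = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kms:  # read shorter than k: dropped in this sketch
            continue
        if median(counts[km] for km in kms) < target:
            kept.append(read)
            counts.update(kms)  # only k-mers of kept reads are counted
    return kept

# Toy example: ten copies of the same read, normalized to a target depth of 3
reads = ["AAACCCGGGT"] * 10
print(len(digital_normalize(reads, k=4, target=3)))  # 3
```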