Problem with 110 kb plasmid assembly (Illumina)
3.9 years ago
mewgia • 0

Dear colleagues, I need your advice. I have a set of Illumina reads (2x250) for a 110 kb plasmid.

First, the reads were trimmed with Trimmomatic (adapters were also removed):

LEADING:10 TRAILING:30 SLIDINGWINDOW:4:15 MINLEN:50

Then I tried to assemble them with SPAdes, both with and without specifying k-mer sizes:

spades.py \
--pe1-1 aaa/trimmed/lane1_forward_paired.fastq \
--pe1-2 aaa/trimmed/lane1_reverse_paired.fastq \
--pe1-s aaa/trimmed/lane1_forward_unpaired.fastq \
--pe1-s aaa/trimmed/lane1_reverse_unpaired.fastq \
--careful --plasmid -o aaa/wk

This did not work: I got a lot of very short contigs.

Then I normalized the reads with BBNorm (for each of the four Trimmomatic output files, with a target of 20 or 50):

/home/bobii/bbmap/bbnorm.sh \
in=aaa/trimmed/lane1_forward_paired.fastq \
out=aaa/trimmed/20_norm_lane1_forward_paired.fastq \
target=20 min=5

I then had to repair the read pairing, because SPAdes failed with the error message 'Pair of read files aaa/trimmed/20_norm_lane1_forward_paired.fastq and aaa/20_norm_lane1_reverse_paired.fastq contain unequal amount of reads'.
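
The 'unequal amount of reads' error is what you would expect when the forward and reverse files are normalized independently, since different reads get discarded from each. Below is a minimal sketch for checking the pairing before running SPAdes. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names `count_reads` and `pairs_match` are made up for illustration:

```shell
# Count the records in a FASTQ file, assuming exactly 4 lines per record.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Succeed only if both mate files contain the same number of reads.
# Note: equal counts are necessary but not sufficient for correct pairing,
# since the read order is not checked here.
pairs_match() {
    [ "$(count_reads "$1")" = "$(count_reads "$2")" ]
}
```

Usage would be something like `pairs_match forward.fastq reverse.fastq || echo "pairing is broken"`, with placeholder file names. As far as I know, bbnorm.sh also accepts `in2=` and `out2=` so that both mates are normalized together, which should make a separate repair step unnecessary.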

So in the end I ran the SPAdes assembly on the trimmed, normalized and repaired data, but again without success: SPAdes does not produce any contigs and reports 'Skipping processing of contigs (empty file)'.

Any ideas? What am I doing wrong? Thank you.

illumina plasmid • 1.1k views

You may need to discard reads which are known to not be plasmid relevant (e.g. by aligning/mapping to the chromosome and discarding). This may help.

Also consider using plasmidSPAdes instead of regular spades. I doubt the trimming is really the issue (errors relating to borked files notwithstanding).


"(e.g. by aligning/mapping to the chromosome and discarding)"

I thought so too, but I don't have the chromosome sequence. The plasmid was isolated and sequenced separately. Or is there another way to discard reads that are not plasmid relevant?

"using plasmidSPAdes instead of regular spades"

When I run SPAdes I pass the --plasmid flag, which is the same thing.


So there is no reference sequence data for this organism at all? (Plasmid or chromosome?)


The coverage is about 100x, which I think is not high enough to cause such assembly problems.

Following your advice, I took the Pseudomonas KT2440 chromosome sequence from GenBank. This strain is the carrier of our plasmid. I mapped the reads to the chromosome and saved the unmapped reads to files. Then I trimmed these reads and tried the following:

  1. Only paired reads, SPAdes without normalization: 1000 contigs.

  2. Only paired reads, SPAdes after normalization (target 20): it iterates over the different k values and fails at k = 77. The k = 55 output contains 8 contigs with a total length of ~2000 bp.

  3. Unicycler after normalization (target 20), only paired reads: 1 contig of 1000 bp.

  4. Unicycler without normalization, only paired reads: 59 contigs of 1000-6500 bp, total length 89 kb.
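
For the step of keeping only reads that do not map to the chromosome, one option is to keep pairs where both mates are unmapped (SAM flag bits 0x4 and 0x8, which is what `samtools view -f 12` selects). The sketch below does the same filtering on SAM text with plain awk, purely as an illustration. `both_unmapped` is a hypothetical helper name, and it assumes tab-separated SAM records with the flag in column 2:

```shell
# Print SAM records where both the read (0x4) and its mate (0x8) are unmapped.
# Reads from the given files, or from stdin when no file is given.
both_unmapped() {
    awk -F '\t' '
        /^@/ { print; next }                          # pass header lines through
        int($2 / 4) % 2 == 1 && int($2 / 8) % 2 == 1  # bits 0x4 and 0x8 both set
    ' "$@"
}
```

Requiring both mates to be unmapped is the conservative choice: a pair with one mate on the chromosome is dropped, which may also discard reads from regions the plasmid shares with the chromosome.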


You could also try tadpole.sh, which is part of the BBMap suite and good for assembling small genomes. A guide is here.

Also consider the possibility that some parts of the plasmid may not have been sequenced or have repeats etc that can't be resolved using just short reads.


If we assume the raw data is good and representative of the real sequence, do you know anything else about this plasmid at all?

Is it repetitive? Does it carry e.g. phage components?

If it is a sequence of low complexity and high repetition etc, it may be that you simply will not get a single contig from this and you may need to do some long read sequencing instead (but I don't think we need to conclude that just yet, though plasmids are notoriously difficult).


I also suggest that you try the assembly with just the properly paired reads first (do not use the singletons). This is a small plasmid and your data may already be oversampled.


Thanks, but I'm still confused. I tried the following (using reads trimmed more strictly than described in the original post):

  1. Unicycler with the reads straight from Trimmomatic (not normalized, not repaired): thousands of short contigs.

  2. Unicycler with the normalized and repaired reads (without singletons): the resulting file contains 5 contigs with a total length of 11 kb.

  3. SPAdes with the normalized and repaired reads (without singletons), as you suggested: this is the best result so far, 47 contigs, but still not what I want.


Do you have some idea of the depth of coverage? If you have several hundred- or even thousand-fold coverage, this can cause de Bruijn graph assemblers like SPAdes to choke. You may need to randomly downsample your data.
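
A rough way to answer the coverage question is to compute the depth from the read count. The sketch below assumes paired-end reads of uniform length and a known replicon size. `estimate_depth` and `keep_fraction` are hypothetical helper names, and the 22000 read pairs in the example is an invented number, not taken from this dataset:

```shell
# Estimated depth = read pairs * 2 mates * read length / replicon size.
# Args: read_pairs read_length genome_size
estimate_depth() {
    awk -v n="$1" -v len="$2" -v g="$3" \
        'BEGIN { printf "%.1f\n", n * 2 * len / g }'
}

# Fraction of reads to keep in order to reach a target depth (capped at 1).
# Args: current_depth target_depth
keep_fraction() {
    awk -v d="$1" -v t="$2" \
        'BEGIN { f = t / d; if (f > 1) f = 1; printf "%.3f\n", f }'
}

# Example: 22000 pairs of 2x250 reads on a 110 kb plasmid.
# estimate_depth 22000 250 110000   # prints 100.0
# keep_fraction 400 100             # prints 0.250
```

The resulting fraction could then be passed to a subsampling tool (for example `seqtk sample`, run with the same seed on both mate files) so that the pairs stay in sync.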
