Question

De novo assembly of chloroplast genome

3.6 years ago

Eisuan ▴ 20

Hello everyone! I am performing a de novo genome assembly of a Prunus spp. chloroplast, starting from SRA datasets.

My data are Illumina HiSeq 2500 paired-end reads and ONT reads from the same species.

The final goal is to evaluate the performance of different assembly strategies and to get the best assembly. Our instructor told us to extract the chloroplast reads from the sequencing data by mapping them to the chloroplast genome of a single species of the same genus. However, I realized that in some cases there is structural variation even between species of the same genus. Therefore, to avoid biasing the read extraction, I decided to map the FASTQ data against more than one reference. I mapped my data with Bowtie2 using an index built from 10 Prunus spp. chloroplast genomes (I chose the most closely related ones based on phylogenetic studies and data availability). This gave a good number of mapped reads, approx. 3,200,000, which corresponds to an estimated coverage of about 4800x.
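For readers checking the coverage figure, here is a minimal sketch of the usual estimate (total sequenced bases divided by genome length). The mean read length of 240 bp and the plastome size of 160 kb are hypothetical values chosen for illustration; neither is stated in the post.

```python
def estimate_coverage(n_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


# Hypothetical inputs: 240 bp mean read length, 160 kb plastome.
coverage = estimate_coverage(3_200_000, 240, 160_000)
print(f"{coverage:.0f}x")  # prints "4800x"
```

With a different read length or genome size the estimate scales linearly, so it is worth plugging in the real values from the run.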

Here's the big question: I would like to use these mapped PE Illumina reads for scaffolding (using ABySS) and for error correction of the long reads (ONT). I am not sure how to deal with read pairs whose two mates mapped to different reference sequences ("chromosomes" in SAM terms). They account for 10% of all mapped read pairs. Do you think they might interfere with the downstream processes?
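To put a number on such pairs, one option is to look at the RNEXT column of the SAM output: it holds "=" when the mate lies on the same reference and the other reference's name otherwise. The sketch below counts pairs once via the first mate; the read and reference names (`refA`, `refB`) are made up for illustration. With an index of several closely related plastomes, some split pairs are probably expected simply because near-identical references compete for the same reads.

```python
from typing import Iterable, Tuple


def count_cross_reference_pairs(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (pairs with both mates mapped, pairs split across references)."""
    both_mapped = 0
    split = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Count each pair once, through its first mate (0x40), and ignore
        # secondary (0x100) and supplementary (0x800) records.
        if not flag & 0x1 or not flag & 0x40 or flag & 0x900:
            continue
        # Skip pairs where the read (0x4) or its mate (0x8) is unmapped.
        if flag & 0x4 or flag & 0x8:
            continue
        both_mapped += 1
        if rnext != "=" and rnext != rname:
            split += 1
    return both_mapped, split


# Tiny hypothetical example: three pairs, one of them split across references.
example = [
    "@HD\tVN:1.6",
    "r1\t65\trefA\t100\t42\t50M\trefB\t200\t0\tACGT\tIIII",
    "r1\t129\trefB\t200\t42\t50M\trefA\t100\t0\tACGT\tIIII",
    "r2\t99\trefA\t10\t42\t50M\t=\t150\t190\tACGT\tIIII",
    "r3\t73\trefA\t10\t42\t50M\t=\t10\t0\tACGT\tIIII",
]
print(count_cross_reference_pairs(example))  # prints (2, 1)
```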

Thank you for your attention,

Eisuan

Scaffolding Illumina assembly • 1.5k views

ADD COMMENT • link updated 3.6 years ago by shelkmike ★ 1.4k • written 3.6 years ago by Eisuan ▴ 20

score 2 · Accepted Answer · 2021-04-28

3.6 years ago

shelkmike ★ 1.4k

Structural variation probably won't be a problem if you map reads in the --local mode of Bowtie2. In this mode, Bowtie2 doesn't require the whole read to align; it reports a read as mapped if part of it aligns. So an inversion or some other structural variant probably won't affect the mappability of a read.
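In local mode the unaligned ends of a read show up as soft clips ("S" operations) in the CIGAR string of the SAM record. As a rough illustration, the sketch below computes which fraction of a read's bases was actually aligned; the CIGAR strings in the example are invented.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_fraction(cigar: str) -> float:
    """Fraction of the read's bases that are aligned rather than soft-clipped."""
    aligned = 0
    clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MI=X":  # read bases that belong to the alignment
            aligned += n
        elif op == "S":   # read bases left out of the local alignment
            clipped += n
        # D, N, H and P do not consume bases stored in the SEQ field.
    total = aligned + clipped
    if total == 0:
        raise ValueError(f"no read-consuming operations in CIGAR {cigar!r}")
    return aligned / total


print(aligned_fraction("100M"))    # end-to-end style alignment: 1.0
print(aligned_fraction("60M40S"))  # read crossing a breakpoint: 0.6
```

A read that spans a structural breakpoint would still be reported as mapped in the second case, just with part of it clipped.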

Also, did you try NOVOPlasty and GetOrganelle? With one of these two tools, you will probably be able to assemble the plastid genome even without Nanopore reads.

I have assembled many plastid genomes and I prefer de novo assembly to reference-based assembly, precisely because if the genomic polymorphism is too large, it can occasionally affect the reference-based assembly process. To perform a hybrid (Illumina+Nanopore) de novo assembly, you can downsample the Illumina reads to about 200x coverage (to reduce computation time and RAM consumption), then try Unicycler and some other assemblers and compare their results.
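A minimal sketch of the downsampling step, assuming plain 4-line FASTQ records and R1/R2 files that list the pairs in the same order. Each pair is kept with probability target/current coverage (here 200/4800), so both mates are always kept or dropped together. The function names are illustrative, not taken from any tool.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, ...]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield FASTQ records as 4-tuples (assumes 4 lines per record)."""
    it = iter(lines)
    while True:
        rec = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def downsample_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str],
                     current_coverage: float, target_coverage: float,
                     seed: int = 0) -> Iterator[Tuple[Record, Record]]:
    """Keep each read pair with probability target/current coverage."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    keep = min(1.0, target_coverage / current_coverage)
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rng.random() < keep:
            yield rec1, rec2


# Hypothetical in-memory example; real input would be two open FASTQ files.
r1 = ["@p1/1", "ACGT", "+", "IIII", "@p2/1", "GGCC", "+", "IIII"]
r2 = ["@p1/2", "TTGA", "+", "IIII", "@p2/2", "CCAA", "+", "IIII"]
for rec1, rec2 in downsample_pairs(r1, r2, current_coverage=4800, target_coverage=200):
    print(rec1[0], rec2[0])
```

The result is only approximately 200x, since the sampling is random; for a plastome that is usually close enough.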

Thanks a lot for your suggestions! I'll definitely give NOVOPlasty a try.

Regarding the mapping with Bowtie2: do you think I should map the reads again using a single species and local mode? Or should I use local mode with my multiple references?

3.6 years ago by Eisuan ▴ 20

I recommend combining all references in one FASTA file and mapping the reads to it. I also recommend using BWA-MEM instead of Bowtie2, because BWA-MEM is slightly more accurate.

3.6 years ago by shelkmike ★ 1.4k

Perfect! My references are already concatenated in a single FASTA file. Now I am going to re-run Bowtie2 in local mode. Then, from the output, should I keep all the mapped pairs or just the ones mapped concordantly to the same reference (i.e. the same chr)?

Which SAM flags would you use to filter the reads from the output? I previously used -f 1 -F 12. Should I change -f 1 to -f 3?
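For context on what these filters do: in samtools view, -f keeps records that have all of the given bits set, and -F drops records that have any of the given bits set. Bit 1 is "paired", 2 is "properly paired", 4 is "read unmapped" and 8 is "mate unmapped". A small sketch of the same logic, with example flag values picked for illustration:

```python
PAIRED = 0x1         # read is part of a pair
PROPER_PAIR = 0x2    # aligner considers the pair concordant
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # its mate is unmapped


def passes(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Mimic -f (all bits in `require` set) and -F (no bit in `exclude` set)."""
    return (flag & require) == require and (flag & exclude) == 0


examples = {
    99: "first mate, concordant pair",
    65: "first mate, both mapped, not concordant",
    73: "first mate, mate unmapped",
}
for flag, meaning in examples.items():
    f1 = passes(flag, require=PAIRED, exclude=UNMAPPED | MATE_UNMAPPED)
    f3 = passes(flag, require=PAIRED | PROPER_PAIR, exclude=UNMAPPED | MATE_UNMAPPED)
    print(f"{flag:>3} ({meaning}): -f 1 -F 12 -> {f1}, -f 3 -F 12 -> {f3}")
```

So -f 1 -F 12 keeps every pair with both mates mapped, while -f 3 -F 12 additionally drops pairs that the aligner did not mark as concordant.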

(I am sorry for all these trivial questions. I am a Master's student with no prior experience in the matter and I would like to better understand the topic)

3.6 years ago by Eisuan ▴ 20

I think it's better to keep reads that align discordantly too, because if your species has a structural variant that no reference species has, reads around this variant may consistently align discordantly.

Instead of using SAM flags, I think it's simpler to use the following procedure: map only the first reads of the read pairs and collect all mapped reads using the --al option of Bowtie2. Then retrieve the second reads that correspond to these first reads using a custom script.
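A minimal sketch of what such a custom script could look like, assuming 4-line FASTQ records and that mates share the same name up to the first whitespace, optionally followed by a /1 or /2 suffix. The function names and the example reads are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, ...]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield FASTQ records as 4-tuples (assumes 4 lines per record)."""
    it = iter(lines)
    while True:
        rec = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def pair_name(header: str) -> str:
    """Reduce a FASTQ header line to the name shared by both mates."""
    name = header[1:].split()[0]       # drop '@' and any description
    if name.endswith(("/1", "/2")):    # drop an old-style mate suffix
        name = name[:-2]
    return name


def select_mates(mapped_r1_lines: Iterable[str],
                 all_r2_lines: Iterable[str]) -> Iterator[Record]:
    """Yield the R2 records whose mate appears among the mapped R1 reads."""
    wanted = {pair_name(rec[0]) for rec in fastq_records(mapped_r1_lines)}
    for rec in fastq_records(all_r2_lines):
        if pair_name(rec[0]) in wanted:
            yield rec


# Hypothetical example: read1 and read3 mapped, read2 did not.
mapped_r1 = ["@read1/1", "ACGT", "+", "IIII", "@read3 1 length=4", "GGGG", "+", "IIII"]
all_r2 = ["@read1/2", "TTTT", "+", "IIII", "@read2/2", "CCCC", "+", "IIII",
          "@read3 2 length=4", "AAAA", "+", "IIII"]
for rec in select_mates(mapped_r1, all_r2):
    print("\n".join(rec))
```

One thing to keep in mind with this approach: a pair in which only the second read maps would be missed, unless the procedure is repeated with the roles of the two files swapped.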

Anyway, for a reference-based assembly it would be even simpler to use NOVOPlasty or GetOrganelle.

3.6 years ago by shelkmike ★ 1.4k