Question

Mapping Illumina sequencing reads to a bacterial reference genome

4.0 years ago

satwa • 0

I'm new to bioinformatics and I have FASTQ files from whole-genome sequencing of a bacterial genome. When I run a de novo assembly I get hundreds of contigs. How can I get the whole genome assembled? If I want to map these contigs to a reference genome, how do I choose the closest genome, and which tools can I use for mapping? How can I identify plasmid sequences?

assembly alignment genome • 2.2k views

Do you want to map the reads to the reference genome, or generate a de novo assembly from the raw sequencing data?

ADD REPLY • link 4.0 years ago by Arup Ghosh 3.2k

I de novo assembled the reads using SPAdes, but I got thousands of contigs, which is why I decided to map to a reference genome.

ADD REPLY • link 4.0 years ago by satwa • 0

If you are not sure about the organism, try classifying the reads with Kraken2.
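A minimal sketch of what such a classification run could look like when driven from Python. The database directory and file names are placeholders (assumptions, not from this thread), and the helper `kraken2_command` is hypothetical; it only builds the argument list, so nothing runs until you pass it to `subprocess.run`.

```python
import shlex


def kraken2_command(db_dir, reads_1, reads_2,
                    report="kraken2_report.txt",
                    output="kraken2_output.txt",
                    threads=4):
    """Build the argument list for a paired-end Kraken2 classification.

    All paths are placeholders; db_dir must point to a Kraken2 database
    that you have already downloaded or built.
    """
    return [
        "kraken2",
        "--db", db_dir,
        "--threads", str(threads),
        "--paired",
        "--report", report,   # per-taxon summary, the file to inspect first
        "--output", output,   # per-read classifications
        reads_1, reads_2,
    ]


cmd = kraken2_command("kraken2_db", "sample_R1.fastq", "sample_R2.fastq")
print(shlex.join(cmd))
# To actually run it (requires Kraken2 on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The report file lists the fraction of reads assigned to each taxon; the dominant species there is a reasonable first candidate when picking a reference genome.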

ADD REPLY • link 4.0 years ago by Arup Ghosh 3.2k

score 2 · Answer 1 · 2021-01-02

Hello, you can do reference-guided scaffolding using RagTag (https://github.com/malonge/RagTag). RagTag uses minimap2 or Nucmer under the hood to align your contigs to a reference genome, then orders and orients them along that reference to produce your scaffold.
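As a rough sketch, the scaffolding call can be assembled like this. The file names and output directory are placeholders (assumptions), and `ragtag_scaffold_command` is a hypothetical helper that only builds the command; the inputs are the reference FASTA and the contigs FASTA from your assembler.

```python
import shlex


def ragtag_scaffold_command(reference_fasta, contigs_fasta,
                            out_dir="ragtag_output"):
    """Build the argument list for `ragtag.py scaffold`.

    The reference comes first and the assembly to be scaffolded second.
    All paths are placeholders.
    """
    return [
        "ragtag.py", "scaffold",
        reference_fasta,
        contigs_fasta,
        "-o", out_dir,
    ]


cmd = ragtag_scaffold_command("reference.fasta", "contigs.fasta")
print(shlex.join(cmd))
# To actually run it (requires RagTag on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The quality of the result depends heavily on how close the reference is to your isolate, so it is worth settling on the reference before running this step.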

If you want to identify plasmids, obtain the plasmid sequence you want to use as your reference, then specify it as the reference sequence when running RagTag.

After you get your scaffolded sequence, you can compare it to other sequences using the BRIG software, which generates an image of the comparison result. I made a video on how to use BRIG here: https://youtu.be/pobQgE4z-5Q

score 0 · Answer 2 · 2021-01-02

If your contigs are large enough (larger than 3000 bp), take a subsequence of a contig (maybe 10000 bp) and run a BLAST search on the NCBI portal. You will get a list of results with links to the complete sequences.
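One possible way to pull out such a subsequence, as a sketch in plain Python with no extra libraries. The function names are hypothetical, and taking the middle of the longest contig is just one choice (an assumption, not something the answer prescribes).

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def blast_query_from_longest_contig(path, length=10000):
    """Return (header, subsequence) taken from the middle of the longest contig.

    If the contig is shorter than `length`, the whole contig is returned.
    Raises ValueError if the file contains no records.
    """
    header, seq = max(read_fasta(path), key=lambda rec: len(rec[1]))
    start = max(0, (len(seq) - length) // 2)
    return header, seq[start:start + length]
```

Calling `blast_query_from_longest_contig("contigs.fasta")` (the file name is a placeholder) gives a sequence you can paste into the BLAST web form.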

If your contigs aren't large enough, there are three options:

Your reads are messed up. Sometimes whoever sets up the sequencing run ends up with reads that do not overlap, for whatever reason (maybe they only wanted to map the reads, not assemble them). In that case, you are mostly helpless, but you can try software like Kraken2 or just BLAST some reads to find a reference.
Your reads contain adapter sequences. In that case, you should try tools like Trimmomatic or TrimGalore, then try assembling again.
Your assembler just isn't suited to that data. Sometimes there is nothing especially wrong with your data, but an assembler handles it badly and fails miserably. It happens more often than it seems. In these cases, you should find an assembler that does not have that weakness with your data. (cough... I don't want to advertise too obviously, but look at my profile to find software that will never leave you helpless, cough...)
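To get a quick idea of whether the second case applies, you could count how many reads contain the start of the common Illumina adapter before running a trimmer. This is only a rough diagnostic sketch: it assumes standard 4-line FASTQ records and the `AGATCGGAAGAGC` adapter prefix, and the function names are hypothetical. Other library kits use different adapters.

```python
import gzip

# First 13 bases shared by the common Illumina (TruSeq-style) adapters.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def adapter_fraction(path, adapter=ILLUMINA_ADAPTER_PREFIX, max_reads=100000):
    """Return the fraction of the first `max_reads` reads containing `adapter`.

    Assumes each record spans exactly four lines (no wrapped sequences).
    """
    total = hits = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # the sequence is the second line of each record
                continue
            total += 1
            if adapter in line:
                hits += 1
            if total >= max_reads:
                break
    return hits / total if total else 0.0
```

A noticeable fraction here suggests trimming is worth doing before re-assembling; a value near zero points toward one of the other two explanations.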

score 0 · Answer 3 · 2021-01-03

3.9 years ago

MSRS ▴ 590

Hi, you can find some answers here. Thank you.
