7.5 years ago
skbrimer
Greetings Hive Brain,
I am having issues with mapping reads of Streptococcus suis to the RefSeq genome. When I use it as a reference, only about 30% of the reads map to it. When I do a de novo build I get the expected number of base pairs, and when I BLAST the contigs they come back as Strep suis.
I know that Streptococcus pneumoniae has a lot of internal rearrangement, and I suspect that S. suis does as well. Has anyone had any experience with either organism and would be willing to give me advice on assembly?
I'm using Ion Torrent single-end data; the average fragment size is 280 bp.
Thanks,
Sean
What happens if you map your reads against your de novo assembly?
Pretty much the same thing that I get when I just map the reads. If I use bwa mem I get a lot of hard clipping. I think I need to play with the stringency. I was also going to try either the PacBio or Nanopore settings.
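To put a number on "a lot of hard clipping", one option is to compute, per alignment, the fraction of the original read that was clipped away. The sketch below is only an illustration and makes some assumptions: it works on the CIGAR string from column 6 of a SAM record, it counts both soft (S) and hard (H) clips, and the function name `clipped_fraction` is made up for this example. As far as I know, bwa mem by default soft-clips the primary alignment and hard-clips supplementary ones, so many hard-clipped records usually mean many split alignments.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clipped_fraction(cigar):
    """Return the fraction of the original read removed by S or H clipping.

    `cigar` is the CIGAR string from column 6 of a SAM record.
    Unmapped records ("*") are reported as 0.0.
    """
    if cigar == "*":
        return 0.0
    clipped = 0
    read_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "SH":
            clipped += n
        # Operations that account for bases of the original read.
        # H is included because hard-clipped bases were part of the read
        # even though they are absent from the SEQ column.
        if op in "MIS=XH":
            read_len += n
    return clipped / read_len if read_len else 0.0

if __name__ == "__main__":
    for cigar in ["100M", "50H100M", "10S80M10H"]:
        print(cigar, round(clipped_fraction(cigar), 3))
```

A histogram of these values over the whole file would show whether the clipping is a few bases at read ends or whether large parts of reads are being thrown away.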
By "pretty much the same thing" you mean only 30% of the reads map to the de novo assembly?
Oh, excellent question. I didn't look at that. What I meant by "basically the same" is that when I look at the mapping in IGV I get the same areas of coverage. I will look at the total mapping and get back to you. We currently don't have power due to a storm, but when it comes back I will look.
While you are at it, map the assembly you have to the reference using Mauve. That should give you an idea of what your assembly looks like at the genome scale compared to the reference.
I will download it and give it a try.
The Mauve alignment shows lots of gene shuffling. This was a great idea! Is it possible to use this alignment to order the contigs?
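For reference, the "total mapping" figure is simply the share of reads without the unmapped flag set. In practice `samtools flagstat` reports this directly; the sketch below only illustrates what the number means. It assumes plain-text SAM input and counts each read once by skipping secondary (0x100) and supplementary (0x800) records. The function name `mapped_fraction` is hypothetical.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of reads that are mapped.

    `sam_lines` is any iterable of SAM text lines. Header lines are skipped,
    and secondary/supplementary records are ignored so that each read is
    counted only once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0

if __name__ == "__main__":
    # Tiny made-up example: two mapped reads, two unmapped reads,
    # plus one supplementary record that is not counted.
    example = [
        "@HD\tVN:1.6",
        "r1\t0\tctg1\t10\t60\t50M\t*\t0\t0\t*\t*",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "r3\t16\tctg1\t90\t60\t50M\t*\t0\t0\t*\t*",
        "r3\t2048\tctg2\t5\t60\t20H30M\t*\t0\t0\t*\t*",
        "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(mapped_fraction(example))
```

Running the same count against both the RefSeq mapping and the de novo mapping gives two directly comparable percentages.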
Yes, it is possible to do that.
Thank you for the link!
Also, and I suspect unsurprisingly (yay, circular genomes), it looks like the start and the end of the genome are in one contig of the de novo build.
HAHA! The power is back on!
According to samtools flagstat, 82% of the reads map back to the assembly. However, they get clipped heavily, and the output in IGV matches the short-read mapping closely.
Also, because of the library, I know that the de novo mapping should leave gaps at the repeat areas, but I am getting large dropouts. It's more like I have a really bad reference. However, I have been reading other Strep suis papers and they all seem to map to this reference.
Since you did not mention the mapper you used, might it be that you are using a mapper that is tuned for Illumina instead of Ion Torrent data? If you have enough coverage (~60X), then the de novo assembly should be better, provided you have performed pre-processing or used an assembler that handles Ion Torrent data, like in this case.
Sorry, I am using BWA as the mapper and SPAdes as the de novo assembler. TMAP is a variant of BWA and I do use both. I have found that they give only slightly different builds now that the read quality of Ion Torrent has improved. For most builds I use BWA over TMAP, mostly because of the speed. There are some cases where the default BWA settings outperform the default TMAP settings, mostly in viral assemblies; however, if you adjust the min-seed-length flag in TMAP from 11 to 19 they perform exactly the same.