I have run MagicBLAST on a de novo assembled metagenomic RNA-Seq dataset (assembled with MEGAHIT). The contigs were queried against a specific database of genomes.
Now I am stuck on how to visualise the hit results from MagicBLAST (a .sam file). I could write a Python notebook to decode the MagicBLAST results (get the nucleotide start, stop and sequence for each query against its reference) and then plot them as a colour-coded horizontal bar plot showing reference vs contig overlap. But I expect there is already open source software that can do this.
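To make the "decode the .sam file" idea concrete, here is a minimal sketch using only the Python standard library. It assumes a standard tab-separated SAM file; the function names (`reference_span`, `parse_sam`) are made up for illustration and are not part of MagicBLAST. Note that `N` operations (which MagicBLAST uses for introns) count towards the reference span here, which may or may not be what you want for plotting.

```python
import re
from collections import defaultdict

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return (start, end) on the reference, 1-based and inclusive."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def parse_sam(lines):
    """Group aligned contigs by reference name.

    Returns {reference: [(contig, start, end, strand), ...]}.
    """
    hits = defaultdict(list)
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or rname == "*" or cigar == "*":
            continue  # skip unmapped records
        start, end = reference_span(pos, cigar)
        strand = "-" if flag & 16 else "+"  # flag bit 16 = reverse strand
        hits[rname].append((qname, start, end, strand))
    return dict(hits)
```

Usage would be something like `with open("magicblast.sam") as fh: hits = parse_sam(fh)`, where the file name is just a placeholder.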
IGV is a good example of what I am trying to do, but as far as I can tell it can only show one reference sequence at a time. I want to plot multiple reference sequences (e.g. 10, to save time) and for each one show the contigs that overlap it. Is there an open source tool or script that already does this?
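In case no existing tool fits, this is roughly the plot I have in mind, sketched with matplotlib: one panel per reference, the reference drawn as a black bar and each contig as a coloured bar stacked above it. Everything here is an assumption for illustration: the input layout (`hits` as `{reference: [(contig, start, end), ...]}`), the `ref_lengths` dictionary, the function names and the output file name. matplotlib is only imported when the plot function is called.

```python
def assign_rows(spans):
    """Greedy packing: put each (name, start, end) on the first row where it fits.

    Returns a list of (name, start, end, row), sorted by start position.
    """
    row_ends = []  # last occupied coordinate on each row
    placed = []
    for name, start, end in sorted(spans, key=lambda s: (s[1], s[2])):
        for row, last_end in enumerate(row_ends):
            if start > last_end:  # no overlap with the last bar on this row
                row_ends[row] = end
                break
        else:
            row = len(row_ends)  # open a new row
            row_ends.append(end)
        placed.append((name, start, end, row))
    return placed


def plot_references(hits, ref_lengths, out_png="contig_overlap.png"):
    """Draw one panel per reference with its overlapping contigs."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(len(hits), 1, figsize=(10, 2 * len(hits)), squeeze=False)
    for ax, (ref, spans) in zip(axes[:, 0], hits.items()):
        # Reference sequence as a black bar below the contigs.
        ax.broken_barh([(1, ref_lengths[ref])], (-1, 0.8), facecolors="black")
        for name, start, end, row in assign_rows(spans):
            ax.broken_barh([(start, end - start + 1)], (row, 0.8), facecolors="tab:blue")
        ax.set_title(ref)
        ax.set_xlabel("reference position (bp)")
        ax.set_yticks([])
    fig.tight_layout()
    fig.savefig(out_png)
```

The row packing is only there so that overlapping contigs do not get drawn on top of each other; colour-coding by strand or identity could be added in the same loop.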
What are examples of workflows (on desktop Linux; I prefer local to web-based tools) once you have generated BLAST hits against reference sequences? I am a novice. The aim is to identify genomic coverage of specific genes in metagenomic datasets.
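By "genomic coverage" I mean something like the fraction of a reference that is covered by at least one contig. A minimal sketch of that calculation, assuming the hits have already been reduced to 1-based inclusive (start, end) intervals on one reference (the function names are my own, not from any tool):

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def coverage_fraction(intervals, ref_length):
    """Fraction of the reference covered by at least one interval."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / ref_length
```

Merging first means bases covered by several contigs are only counted once.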
A related question: what is the advantage of de novo assembly vs alignment against specific sequences? I am struggling to work out what to do with the de novo data.
One way I may be able to accomplish this is:
BLAST the de novo assembled contigs against the nt database and find the matching gene sequences. Then run bowtie2 to align the de novo assembled contigs to the reference gene (and/or run bowtie2/bwa-mem on the raw reads to align them).
Is aligning de novo built contigs back to a reference (once identified) common practice? I am wondering if there are use cases where this would be preferable to just aligning the raw reads back to the reference gene.
Then plot the alignments for each reference gene in IGV.
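The align-then-view steps above could be scripted roughly as follows. This is only a sketch: it assumes bowtie2 and samtools are installed and on the PATH, the file names and prefix are placeholders, and whether bowtie2 (a short-read aligner) copes well with long contigs is part of what I am asking. IGV needs a coordinate-sorted, indexed BAM, hence the two samtools steps.

```python
import subprocess


def build_commands(ref_fasta, query_fasta, prefix):
    """Return the command lines for index, align, sort and index-BAM steps."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Build a bowtie2 index from the reference gene FASTA.
        ["bowtie2-build", ref_fasta, prefix],
        # Align FASTA queries (-f) as unpaired input (-U), writing SAM (-S).
        ["bowtie2", "-f", "-x", prefix, "-U", query_fasta, "-S", sam],
        # Sort by coordinate and index so IGV can load the BAM.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands("ref_gene.fa", "contigs.fa", "ref_gene"))` would leave `ref_gene.sorted.bam` to load in IGV alongside `ref_gene.fa`. For raw reads in FASTQ format the `-f` flag would be dropped.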
Thanks GenoMax
Refseqs are 10-30 kbp, so yes, quite long.
The tools you mention look like a good starting point :-)