Hi, posting again as my last post didn't get any comments.
I have some fungal genomes sequenced with Illumina short reads; a handful of them were also sequenced with Nanopore. All samples belong to one species complex.
I have generated:
- Nanopore assemblies with Flye, followed by polishing steps and removal of small contigs (< 5000 bp)
- Illumina assemblies with SPAdes v4.0
I then ran QC and checked all assembly stats.
Now I want to perform repeat annotation before going on to the genome annotation.
As I remember, RepeatModeler is used to generate a de novo database from the query genome, and then RepeatMasker is used to actually mask the FASTA file. Correct me if I am wrong.
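To make that two-step workflow concrete, here is a minimal sketch of the commands as I understand them. It only builds and prints the command lines; nothing is run. The file and database names are placeholders, the thread count is arbitrary, and the flags should be checked against the installed versions (for example, I believe older RepeatModeler releases use -pa rather than -threads).

```python
from pathlib import Path


def repeat_commands(genome_fasta, db_name, threads=8):
    """Return the three command lines as argument lists (nothing is executed)."""
    genome = Path(genome_fasta)
    # RepeatModeler writes its consensus library as <db_name>-families.fa
    library = f"{db_name}-families.fa"
    return [
        # 1. Build the sequence database RepeatModeler works on
        ["BuildDatabase", "-name", db_name, str(genome)],
        # 2. De novo repeat family discovery (assumed flag: -threads)
        ["RepeatModeler", "-database", db_name, "-threads", str(threads)],
        # 3. Soft-mask the genome with the custom library and write a GFF
        ["RepeatMasker", "-lib", library, "-pa", str(threads),
         "-xsmall", "-gff", str(genome)],
    ]


if __name__ == "__main__":
    # Placeholder names, for illustration only
    for cmd in repeat_commands("sample1.fasta", "sample1_db"):
        print(" ".join(cmd))
```

The third command is the one that would be repeated per assembly if a single shared library is used for masking.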
My question is:
Should I merge all my Nanopore-based assemblies into one ONE-BIG.fasta file and use that to generate the de novo repeat database with RepeatModeler, then mask each of the nanopore.fasta assemblies individually? And for the SPAdes assemblies, do the same (merge all FASTA -> annotate -> mask each individual FASTA)?
The second way that comes to my mind is to merge all Nanopore.fasta and Spades.fasta genomes into one ONE-REALLY-BIG.fasta, use RepeatModeler to generate the de novo database, and then mask repeats in all FASTA genomes individually using this database.
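For the merge step in either option, this is a minimal sketch of how I would build the combined file. It prefixes every contig name with the file name, since contig names from different assemblies can be identical (for example, each SPAdes run has its own NODE_1). The function name and the "|" separator are my own choices, not anything RepeatModeler requires.

```python
from pathlib import Path


def merge_fastas(fasta_paths, merged_path):
    """Concatenate FASTA files, prefixing each header with its file stem.

    Returns the number of sequences written.
    """
    n_records = 0
    with open(merged_path, "w") as out:
        for path in map(Path, fasta_paths):
            sample = path.stem  # e.g. "sampleA" for "sampleA.fasta"
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        # Make the contig name unique across assemblies
                        out.write(f">{sample}|{line[1:]}\n")
                        n_records += 1
                    else:
                        out.write(line + "\n")
    return n_records
```

Called as merge_fastas(["sampleA.fasta", "sampleB.fasta"], "ONE-BIG.fasta"), this would give the single input for database building, while the original per-sample files stay untouched for masking.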
Will merging these different assemblies create any bias or issue with my genome assemblies, technically or biologically?
Again, all samples belong to one species complex.
Kindly share your views about this. Thanks.