Hello, I was asked to build a phyloseq object from some metagenome-assembled genomes (MAGs) that I assembled. So far I have created a tax_table() slot from the GTDB-Tk classification of the MAGs and an otu_table() from CoverM output, and I have added a metadata table for the collected samples.
Now the goal is to fill the refseq() slot so that all the MAG FASTA files are stored in the phyloseq object. I know that the phyloseq utilities were built for ASVs/OTUs, but I want to know whether it is possible to add MAG FASTA files to the phyloseq object.
So, is there a way to add the MAG files to the phyloseq refseq() slot? Also, given that each MAG FASTA consists of fragmented contigs, should I first concatenate these contigs using something like \n before adding them to refseq(), if that is possible?
So far I have read all the MAG FASTA files into a list of DNAStringSet objects (one per MAG):
mags_seqs <- lapply(Sys.glob("*.fa"), Biostrings::readDNAStringSet)
Best,
Valentín.
This is just a suggestion:
From the output of the GTDB-Tk analysis you should have a folder called align, with a file inside named gtdbtk.bac120.user_msa.fasta.gz. Why don't you store the concatenated marker genes of each MAG from the MSA analysis instead of the entire genome sequence?
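To make both options concrete, here is a rough sketch rather than a tested recipe. Option A collapses each MAG's contigs into one sequence so that the result has exactly one entry per MAG, which is what refseq() expects (one sequence per taxon, with names matching taxa_names()). It joins contigs with a run of N instead of \n, because a newline is not a letter in the DNA alphabet that DNAStringSet accepts; the spacer length is my own arbitrary choice. Option B is left as comments because it needs your real files: the align/ path, the object name ps, and the assumption that the MSA record names equal your taxa_names() are all hypothetical. Note also that the bac120 MSA is a protein alignment, so it would be read as an AAStringSet; I would expect the refseq() slot to take it since it is declared for XStringSet in general, but I have not checked that.

```r
library(Biostrings)

# Toy stand-in for: lapply(Sys.glob("*.fa"), Biostrings::readDNAStringSet)
# The list names (MAG_1, MAG_2) are hypothetical; in practice they must
# match taxa_names() of the phyloseq object.
mags <- list(
  MAG_1 = DNAStringSet(c(contig_1 = "ACGT", contig_2 = "GGCC")),
  MAG_2 = DNAStringSet(c(contig_1 = "TTTTAA"))
)

# Option A: one sequence per MAG, contigs joined by a run of N.
# n_spacer is an arbitrary, assumed value; pick what suits your analysis.
concat_contigs <- function(contigs, n_spacer = 100L) {
  spacer <- strrep("N", n_spacer)
  paste(as.character(contigs), collapse = spacer)
}

# vapply keeps the list names, so the resulting DNAStringSet is named by MAG.
mag_refseq <- DNAStringSet(
  vapply(mags, concat_contigs, character(1), n_spacer = 3L)
)
print(mag_refseq)

# Option B (assumed path and object names, not run here):
# msa <- readAAStringSet("align/gtdbtk.bac120.user_msa.fasta.gz")
# msa <- msa[phyloseq::taxa_names(ps)]    # assumes names match the MAG IDs
# ps  <- phyloseq::merge_phyloseq(ps, msa)
```

With either option, the named set can then be combined with the existing object via merge_phyloseq(), in the same way as the commented lines for Option B. Keep in mind that whole-genome sequences will make the object much larger than the marker-gene alignment would.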