3.0 years ago
v.berriosfarias
Hello, I was asked to create a custom database from GTDB. I just need to add some metagenome-assembled genomes (MAGs) to the GTDB database, but I don't know how to do that.
The GTDB file "gtdbtk_data.tar.gz" from release 202 (https://data.gtdb.ecogenomic.org/releases/release202/) is the file I want to add my MAGs to. However, my MAGs will not have a taxonomy identifier, which I think is necessary to build the database correctly.
- Note: I classified my MAGs with GTDB-Tk. I know the taxonomy that GTDB-Tk assigned to each MAG is important here, but I'm not sure how to use it. A classmate told me I would need to write a Python script that adds the taxonomic classification to the FASTA header of every contig in each MAG.
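The script the classmate suggested could look roughly like the sketch below. It assumes the classifications come from a GTDB-Tk summary table (for example `gtdbtk.bac120.summary.tsv`), read through its `user_genome` and `classification` columns. The header layout `>contig genome=... taxonomy=...` is hypothetical. Whatever tool ends up building the database will dictate the real format, so adapt it to that tool's documentation.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO


def load_gtdbtk_classifications(summary: TextIO) -> Dict[str, str]:
    """Map each MAG name (user_genome column) to its GTDB-Tk classification string."""
    reader = csv.DictReader(summary, delimiter="\t")
    return {row["user_genome"]: row["classification"] for row in reader}


def annotate_fasta(lines: Iterable[str], genome: str, taxonomy: str) -> Iterator[str]:
    """Rewrite every header line of one MAG's FASTA to carry its genome name and taxonomy.

    The 'genome=... taxonomy=...' header layout is a hypothetical example format.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            contig_id = parts[0] if parts else ""  # keep only the original contig ID
            yield f">{contig_id} genome={genome} taxonomy={taxonomy}"
        else:
            yield line  # sequence lines are passed through unchanged


if __name__ == "__main__":
    # Toy summary table; a real GTDB-Tk summary has many more columns.
    summary = io.StringIO(
        "user_genome\tclassification\n"
        "MAG_001\td__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
        "f__Lactobacillaceae;g__Lactobacillus;s__\n"
    )
    taxonomy = load_gtdbtk_classifications(summary)
    fasta = [">contig_1 length=120\n", "ACGTACGT\n", ">contig_2\n", "TTGACA\n"]
    for out_line in annotate_fasta(fasta, "MAG_001", taxonomy["MAG_001"]):
        print(out_line)
```

For real data, run `annotate_fasta` once per MAG file and write the output to a new FASTA. Keeping the original contig ID as the first word means downstream tools that split headers on whitespace still see the same sequence names.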
Hello, have you found a way to create a custom database from GTDB? I will try to do it myself, but I'd be interested to know if you found a method. Thank you!
Hello, I have not, but people told me that you can build a custom database fairly easily with CLARK-S; the instructions are in its README file. Basically, you build a file with two columns: the first column is the path to each genome, and the second column is the taxonomy of that genome (at whichever taxonomic rank you prefer).
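As a rough illustration, here is a sketch that builds such a two-column file from GTDB-Tk classification strings, truncated to a chosen rank. It assumes several things: the two columns are separated by a single space, spaces in labels (as in species names) are replaced with underscores, and genomes without a name at the requested rank are labeled `unclassified`. These are all guesses, so check the CLARK README for the exact format it expects. The names `label_at_rank` and `write_targets` are hypothetical helpers, not part of CLARK.

```python
import io
from typing import Dict, TextIO

# Prefix used for each rank in a GTDB classification string.
GTDB_RANKS = {"domain": "d__", "phylum": "p__", "class": "c__", "order": "o__",
              "family": "f__", "genus": "g__", "species": "s__"}


def label_at_rank(classification: str, rank: str) -> str:
    """Return the GTDB name at the requested rank, or 'unclassified' if it is empty/absent."""
    prefix = GTDB_RANKS[rank]
    for field in classification.split(";"):
        field = field.strip()
        if field.startswith(prefix) and field != prefix:
            # Species names contain a space; replace it so each line keeps two columns.
            return field[len(prefix):].replace(" ", "_")
    return "unclassified"


def write_targets(genome_paths: Dict[str, str], classifications: Dict[str, str],
                  rank: str, out: TextIO) -> None:
    """Write one '<genome path> <label>' line per genome (hypothetical target-file layout)."""
    for genome, path in sorted(genome_paths.items()):
        label = label_at_rank(classifications.get(genome, ""), rank)
        out.write(f"{path} {label}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_targets({"MAG_001": "mags/MAG_001.fa"},
                  {"MAG_001": "d__Bacteria;p__Firmicutes;g__Lactobacillus;s__"},
                  "genus", buf)
    print(buf.getvalue(), end="")
```

The `classifications` dictionary can come straight from the GTDB-Tk summary table. Choosing a coarser rank such as genus avoids the many MAGs that GTDB-Tk leaves without a species name.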
https://www.reddit.com/r/bioinformatics/comments/rfqt7a/any_metagenomic_classifier_that_can_elaborate_a/
Personally, I finally decided not to add my own MAGs to the GTDB database. Instead, I used kraken2 for read classification with an index built from the GTDB database (https://github.com/rrwick/Metagenomics-Index-Correction), which is better for microbial taxonomic classification than the kraken2 RefSeq database.