Hello, I need to add taxonomic information to each of my metagenome-assembled genomes (MAGs).
Each FASTA file corresponds to a representative draft genome of a bacterial species. I used GTDB-Tk to classify my MAGs. The issue is that I need to add the genus-level taxonomic assignment to the FASTA header of every contig.
For example, the FASTA headers of the following MAG look like this:
cat bin.10.fna | grep '>' | head -3
>k141_811263
>k141_1446
>k141_974775
Now, in the GTDB-Tk output file, each comma-separated column corresponds to a different taxonomic rank. Let's say I want to append the genus, separated by a tab, to each FASTA header of each MAG FASTA file. I can retrieve all the genus information using the following:
awk -F "\t" '{print $7}' GTDB-tk_output.bac120.summary.csv | head -6
Genus
Clostridium_p
psychrobacter
Clostridium
Anaerobiospirillum_A
Sutterella
So I want to add the corresponding genus name to all the contig FASTA headers of a MAG. This is the desired output:
cat bin.10.fna | grep '>' | head -3
>k141_811263 Clostridium_p
>k141_1446 Clostridium_p
>k141_974775 Clostridium_p
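For a single MAG whose genus is already known, the header change itself could be sketched as below. This is only a minimal illustration: the function name `append_genus` and the output file name in the usage line are hypothetical, and it assumes every header line starts with `>` and that the genus should follow a literal tab.

```shell
# Hypothetical helper: append a tab and a genus label to every FASTA header.
# Usage: append_genus GENUS INPUT.fna > OUTPUT.fna
append_genus() {
    awk -v genus="$1" '
        /^>/ { print $0 "\t" genus; next }  # header line: add tab + genus
        { print }                           # sequence line: left unchanged
    ' "$2"
}
```

It could then be called as `append_genus Clostridium_p bin.10.fna > bin.10.genus.fna`, writing to a new file so the original FASTA stays untouched.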
As input files, I have the GTDB-Tk summary CSV output file, whose columns are comma separated:
bin_id Domain Phylum Class Order Family Genus Species
The bin_id column corresponds to the bin name; the values are displayed in the following way:
bin.1
bin.2
bin.3
....
bin.75
And a directory containing all the FASTA files of my MAGs. Those files are named in the following way:
bin.#.fna, where '#' is a number from 1 to 75.
The bin_id column values and the ls output of the MAGs directory are in the same order and differ only by the ".fna" suffix of each file name, i.e.:
awk -F "\t" 'NR>1 {print $1}' GTDB-tk_output.bac120.summary.csv | head -5
bin.1
bin.2
bin.3
bin.4
bin.5
ls | head -5
bin.1.fna
bin.2.fna
bin.3.fna
bin.4.fna
bin.5.fna
So how can I do that?
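One possible approach, sketched under stated assumptions rather than offered as a definitive solution: read bin_id (column 1) and Genus (column 7) from the summary file, then rewrite the headers of the matching bin.#.fna file. The function name `annotate_mags` and the output directory are hypothetical. The sketch assumes a tab-delimited summary, as the `awk -F "\t"` commands above imply; if the file really is comma separated, "," can be passed as the fourth argument. Matching is done by bin name, so it does not rely on the ls order.

```shell
# Hypothetical helper: annotate every MAG listed in the summary table.
# Usage: annotate_mags SUMMARY_FILE MAG_DIR OUT_DIR [DELIMITER]
# The body runs in a subshell, so its variables do not leak to the caller.
annotate_mags() (
    summary=$1
    mag_dir=$2
    out_dir=$3
    tab=$(printf '\t')
    delim=${4:-$tab}            # assumption: tab-delimited unless told otherwise
    mkdir -p "$out_dir"

    # Skip the header row, then emit "bin_id<TAB>Genus" for each MAG.
    awk -F "$delim" 'NR > 1 { print $1 "\t" $7 }' "$summary" |
    while IFS=$tab read -r bin genus; do
        fasta="$mag_dir/$bin.fna"
        # Warn and move on if the summary names a bin with no FASTA file.
        [ -f "$fasta" ] || { echo "missing: $fasta" >&2; continue; }
        # Append tab + genus to header lines; copy sequence lines as-is.
        awk -v genus="$genus" '
            /^>/ { print $0 "\t" genus; next }
            { print }
        ' "$fasta" > "$out_dir/$bin.fna"
    done
)
```

A call such as `annotate_mags GTDB-tk_output.bac120.summary.csv . annotated` would write the modified copies into a separate directory, leaving the original MAG files unchanged.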
Please format this post correctly using the 101010 code button. Using > turns text into a quote. See example below. Using just > in front of a line turns it into quoted text. Using 1010 to properly format the text displays it properly as a FASTA header. It is difficult to figure out what you want above since most of it looks like plain quoted text.