Question

How to add taxonomic information to fasta headers

3.1 years ago

v.berriosfarias ▴ 140

Hello, I have 700 metagenome-assembled genomes (MAGs) that were taxonomically classified against the GTDB database with GTDB-Tk.

I therefore have a taxonomic assignment for each of these MAGs, but for downstream analysis I need the FASTA headers to contain the taxonomic information that GTDB-Tk assigned.

This is what the FASTA headers of one of the MAGs look like:

    cat cluster1_bin.101.fa | grep '>' | head

    >k141_1192826
    >k141_94001
    >k141_1104537
    >k141_375209
    >k141_375646
    >k141_742386
    >k141_560036
    >k141_12021
    >k141_838926
    >k141_1209697

I would like to know whether there is a way to extract the full taxonomy from the following table and add it to the FASTA headers of the corresponding MAG:

sample_table

This is the desired output for the FASTA headers of each MAG, using "cluster1_bin.101.fa" as an example:

    >k141_1192826 Phylum Class Order Family Genus Species
    >k141_94001 Phylum Class Order Family Genus Species
    >k141_1104537 Phylum Class Order Family Genus Species
    >k141_375209 Phylum Class Order Family Genus Species
    >k141_375646 Phylum Class Order Family Genus Species
    >k141_742386 Phylum Class Order Family Genus Species
    >k141_560036 Phylum Class Order Family Genus Species
    >k141_12021 Phylum Class Order Family Genus Species
    >k141_838926 Phylum Class Order Family Genus Species
    >k141_1209697 Phylum Class Order Family Genus Species

Is there any way to do that using any programming language?

MAGs taxonomy fasta • 1.2k views

updated 3.1 years ago by Joe 22k • written 3.1 years ago by v.berriosfarias ▴ 140

any way to do that using any programming language?

I think this can be done in practically any programming language of your choice. It is a simple FASTA header modification that can be done with existing libraries (BioPerl, Biopython), or by using awk/sed to find the header lines and append the extra information to them. But you will most likely need to write that script yourself.
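As a rough illustration of the approach, here is a minimal Python sketch using only the standard library. It rests on assumptions, because the actual table was only posted as an image: it assumes the taxonomy is in a tab-separated file with a `user_genome` column (the MAG name) and a `classification` column holding a semicolon-separated string such as `d__Bacteria;p__...;s__...`. The function names (`load_taxonomy`, `format_taxonomy`, `annotate_fasta`) and the example taxonomy are made up for the demonstration.

```python
import csv
import io


def load_taxonomy(handle, id_col="user_genome", tax_col="classification"):
    """Map each genome (MAG) name to its classification string from a TSV table.

    The column names are assumptions; adjust them to match the real table.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: row[tax_col] for row in reader}


def format_taxonomy(classification, sep=" "):
    """Drop the domain rank (d__) and join the remaining ranks with `sep`."""
    ranks = [r.strip() for r in classification.split(";") if r.strip()]
    ranks = [r for r in ranks if not r.startswith("d__")]
    # Species names contain a space; replace it so each rank stays one token.
    return sep.join(r.replace(" ", "_") for r in ranks)


def annotate_fasta(lines, taxonomy):
    """Yield FASTA lines with `taxonomy` appended to every header line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield f"{line.rstrip()} {taxonomy}"
        else:
            yield line  # sequence lines pass through unchanged


# Tiny in-memory demonstration with made-up data.
table = io.StringIO(
    "user_genome\tclassification\n"
    "cluster1_bin.101\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n"
)
fasta = [">k141_1192826", "ACGT", ">k141_94001", "GGCC"]

taxonomy = format_taxonomy(load_taxonomy(table)["cluster1_bin.101"])
annotated = list(annotate_fasta(fasta, taxonomy))
print("\n".join(annotated))
```

For the 700 MAGs, the same functions could be applied in a loop over the FASTA files, looking up each file's taxonomy by its name without the extension (assuming that is how the MAGs are named in the table) and writing the annotated lines to a new file rather than overwriting the original.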

3.1 years ago by Mensur Dlakic ★ 29k

Please do not post images of the data.

3.1 years ago by cpad0112 21k

You'll need to post the table in text form for us to be able to help easily.

3.1 years ago by Joe 22k