I want to change my FASTA headers (in a file named VT.fasta) to reflect taxonomic information. Right now they contain only accession information, like this:
gb|AJ854100_VTX00001
AGCAGCCGCGGTAATTCCAGCTCCAATAGCGTAGGCGAGCGACTGCCG
gb|AF202299_VTX00002
GTAACGGGGAATTAGGGTTCCAATCCCGACACGGGGAGGTAGTGACAAT
I want the headers to include taxonomic information (Genus species) as well. This information is stored in a separate comma-delimited CSV file (VTtaxonomy.csv) like this:
GenBank code,VT,DNA,Class,Order,Family,Genus,Species
AJ854100,VTX00001,AGCAGCCGCGGTAATTCCAGCTCCAATAGCGTAGGCGAGCGACTGCCG,Paraglomeromycetes,Paraglomerales,Paraglomeraceae,Paraglomus,sp.
AF202299,VTX00002,GTAACGGGGAATTAGGGTTCCAATCCCGACACGGGGAGGTAGTGACAAT,Paraglomeromycetes,Paraglomerales,Paraglomeraceae,Paraglomus,sp.
The FASTA file and the CSV file are in exactly the same order, and there are 352 header IDs in total that need the taxonomic information appended. I want the new FASTA file (NewVT.fasta) to look like this:
gb|AJ854100_VTX00001| Paraglomus sp.
AGCAGCCGCGGTAATTCCAGCTCCAATAGCGTAGGCGAGCGACTGCCG
gb|AF202299_VTX00002| Paraglomus sp.
GTAACGGGGAATTAGGGTTCCAATCCCGACACGGGGAGGTAGTGACAAT
I’m using Linux. This is probably a simple bash command (I hope!) but I just can’t figure it out. Thank you for your help!
Any Perl solutions perhaps?
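For what it's worth, here is a minimal sketch of the transformation in Python rather than bash or Perl (a swapped-in language, not what was asked for). It assumes the CSV column names are exactly as shown above, and that each header is "gb|" followed by the GenBank code, an underscore, and the VT ID, with or without a leading ">". The function names load_taxonomy, annotate_line and annotate_fasta are made up for illustration. It looks headers up by ID instead of relying on the two files being in the same order, so sequence lines pass through untouched.

```python
import csv


def load_taxonomy(csv_path):
    """Map 'GenBankCode_VT' -> 'Genus Species', using the CSV header row."""
    with open(csv_path, newline="") as handle:
        return {
            f"{row['GenBank code']}_{row['VT']}": f"{row['Genus']} {row['Species']}"
            for row in csv.DictReader(handle)
        }


def annotate_line(line, taxonomy):
    """Return the line with '| Genus species' appended if it is a known header."""
    text = line.rstrip("\r\n")
    # Assumption: headers look like 'gb|AJ854100_VTX00001', optionally with '>'.
    key = text.lstrip(">")
    if key.startswith("gb|"):
        key = key[len("gb|"):]
    if key in taxonomy:
        return f"{text}| {taxonomy[key]}"
    # Sequence lines (and unknown headers) are returned unchanged.
    return text


def annotate_fasta(fasta_path, csv_path, out_path):
    """Write a copy of the FASTA file with taxonomy appended to each header."""
    taxonomy = load_taxonomy(csv_path)
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(annotate_line(line, taxonomy) + "\n")
```

To run it on the files named above, add the call annotate_fasta("VT.fasta", "VTtaxonomy.csv", "NewVT.fasta") at the bottom of the script. Headers whose ID is not found in the CSV are copied as they are, so it may be worth checking that the output contains 352 annotated headers.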