Dear all, I have two files: (1) a collection of sequences in FASTA format for emu; (2) a list of the same sequences with additional taxonomic information. Below is a "fake" example of each:
- genome.txt
> 2591237:ncbi:1 [MK211378]
mammammmammammammammammammammammmammammammammammammammamammammmammammammammammammammammmammammammammammammammamammammmammammammammammammammammmammammammammammammammamammammmammammammammammammammammmammammammammammammammamammammmammammammammammammammammmammammammammammammammamammammmammammammammammammammammmammammammammammammammamammammmammammammammammammammammmammammammammammammammamammammmammammammammammammammammmammammammammammammamma
>11120:ncbi:1011 [MG021194]
banananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabananabanananavananabananabanana
- lista.txt
1120 ncbi 1011 [MG021194] 11120 Infectious bronchitis virus scientific name
1237 ncbi 1 [MK211378] 2591237 Coronavirus BtRs-BetaCoV/YN2018D scientific name
What I want to obtain is an "extended" version of the genome.txt file in which the header of each sequence is combined with the matching information from the lista.txt file. The "join" can be done on the sequence ID, which is already unique (e.g. MK211378). I have already tried the join (bash) command and awk, but without success.
Please, can someone help me?
Thank you very much.
Emilio
Thanks iraun. The merger line is perfect.
No worries, please consider marking the answer as accepted if it fixed your problem :).
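For readers landing here later, the join described in the question could be sketched roughly as follows in Python. This is only an illustration and not the answer referred to in the comments above. It assumes the accession always appears in square brackets in both files (as in the example), and that the "extended" header should simply be the original header followed by whatever comes after the bracketed accession in lista.txt; the desired output layout is not specified in the question. The function names and the output file name are hypothetical.

```python
import re

# Accession in square brackets, e.g. [MK211378] (assumed format, taken from the example).
ACCESSION = re.compile(r"\[([^\]]+)\]")


def load_annotations(lines):
    """Map each bracketed accession to the text that follows it on its line."""
    annotations = {}
    for line in lines:
        match = ACCESSION.search(line)
        if match:
            annotations[match.group(1)] = line[match.end():].strip()
    return annotations


def extend_headers(fasta_lines, annotations):
    """Yield FASTA lines, appending the annotation to each matching header."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = ACCESSION.search(line)
            extra = annotations.get(match.group(1)) if match else None
            if extra:
                line = f"{line} {extra}"
        # Sequence lines and headers without a match pass through unchanged.
        yield line


def merge_files(fasta_path, list_path, out_path):
    """Write a copy of the FASTA file with extended headers (hypothetical helper)."""
    with open(list_path) as fh:
        annotations = load_annotations(fh)
    with open(fasta_path) as src, open(out_path, "w") as out:
        for line in extend_headers(src, annotations):
            out.write(line + "\n")
```

Calling `merge_files("genome.txt", "lista.txt", "genome_extended.txt")` would then write the extended file. Matching on the bracketed accession rather than on column positions sidesteps the sorting requirement of `join` and the inconsistent spacing after `>` in the headers.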