Hello, everyone. I would like some help with a simple problem that I have not been able to solve.
I want to take each nucleotide sequence from the second column of a text file and match it against a FASTA file to find the headers that correspond to these sequences. I would also like to modify each matching header according to the first column of the text file and write the result to a new FASTA file, as shown below.
Text file:
1 AACTGA
1 AACTGC
2 CCAGAT
3 GGATCA
3 GGATCC
Original FASTA file:
>Sample 1
AACTGA
>Sample 2
CCAGAT
>Sample 3
AACTGA
>Sample 4
CCAGAT
>Sample 5
GGATCA
>Sample 6
GGATCC
>Sample 7
GGATCA
>Sample 8
GGATCC
>Sample 9
AACTGC
>Sample 10
AACTGC
Expected output:
>1|Sample 1
AACTGA
>1|Sample 3
AACTGA
>1|Sample 9
AACTGC
>1|Sample 10
AACTGC
>2|Sample 4
CCAGAT
>2|Sample 2
CCAGAT
>3|Sample 5
GGATCA
>3|Sample 7
GGATCA
>3|Sample 6
GGATCC
>3|Sample 8
GGATCC
I am still a beginner in bioinformatics, and even simple tasks are still a challenge for me. Thank you for your help!
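For reference, here is a minimal sketch of one way the task described above could be approached in plain Python. It is not a definitive solution: the function names (`read_groups`, `read_fasta`, `relabel`) are made up for illustration, and it assumes that the text file is whitespace-separated, that sequences must match exactly (same case, full length), and that each sequence belongs to only one group.

```python
import io


def read_groups(handle):
    """Map each sequence (column 2) to its group label (column 1)."""
    groups = {}
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            groups[fields[1]] = fields[0]
    return groups


def read_fasta(handle):
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def relabel(groups, records):
    """Return FASTA text with headers rewritten as >group|header."""
    records = list(records)
    out = []
    for seq, group in groups.items():      # order of the text file
        for header, rec_seq in records:    # order of the FASTA file
            if rec_seq == seq:             # assumption: exact match only
                out.append(f">{group}|{header}\n{rec_seq}\n")
    return "".join(out)


# Small in-memory demonstration (a subset of the data in the question).
table = "1 AACTGA\n2 CCAGAT\n"
fasta = ">Sample 1\nAACTGA\n>Sample 2\nCCAGAT\n>Sample 3\nAACTGA\n"
result = relabel(read_groups(io.StringIO(table)),
                 read_fasta(io.StringIO(fasta)))
print(result, end="")
```

With real files, the `io.StringIO` objects would be replaced by handles from `open(...)`, and `result` would be written to a new file. Within each group the records come out in the order they appear in the FASTA file, so the order may not match a hand-written listing exactly. FASTA records whose sequence is not in the text file are dropped.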