Hi! I have two input files. The first, fastas.txt, contains multiple FASTA sequences, as in the example below:
>Blap_contig79
MSTDVDAKTRSKERASIAAFYVGRNIFVTGGTGFLGKVLIEKLLRSCPDVGEIFILMRPKAGLSIDDRLKKMLELPLFDRLRKERPSNLKK
>Bluc_contig23663
MSTNVDAKARSKERASIAAFYVGRNIFVTGGTGFLGKVLIEKLLRSCPDVGEIFILMRPKAGLSIDDRLKKMLELPLFDRLRKERPSNLKK
The second file, additional_header_information.txt, contains strings, such as:
XYZ
aksjdkasdj
I'd like to merge the strings from the second file with the headers in the first file to generate:
>XYZ_Blap_contig79
MSTDVDAKTRSKERASIAAFYVGRNIFVTGGTGFLGKVLIEKLLRSCPDVGEIFILMRPKAGLSIDDRLKKMLELPLFDRLRKERPSNLKK
>aksjdkasdj_Bluc_contig23663
MSTNVDAKARSKERASIAAFYVGRNIFVTGGTGFLGKVLIEKLLRSCPDVGEIFILMRPKAGLSIDDRLKKMLELPLFDRLRKERPSNLKK
I previously used bioawk to put one specific prefix to all FASTA headers:
bioawk -c fastx '{ print ">PREFIX_" $name "\n" $seq }' input.txt > output.txt
So I thought I might be able to use some sort of loop to make bioawk go through the lines of additional_header_information.txt and of fastas.txt and combine them, but I did not get anything functional.
I also tried to modify the Python script from "replace fasta headers with another name in a text file" (see the original script):
fasta = open('fastas.txt')
newnames = open('additional_header_information.txt')
newfasta = open('additional_header_information_fastas.txt', 'w')
for line in fasta:
    if line.startswith('>'):
        newname = newnames.readline()
        newfasta.write(newname)
    else:
        newfasta.write(line)
fasta.close()
newnames.close()
newfasta.close()
so that, instead of replacing the FASTA headers, it would add a string from additional_header_information.txt to each of them, but this is also not working for me.
I'll be thankful for your tips on how to use bioawk or the script above, but any other solution is also most welcome!
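One possible way to do the merge in plain Python is sketched below. It assumes that additional_header_information.txt holds one prefix per FASTA header, in the same order as the records in fastas.txt. The function names prefix_fasta_headers and merge_files are hypothetical, made up for this illustration. Line endings are stripped from both inputs before the new header is built, so a trailing newline or carriage return in the prefix file does not end up inside the header.

```python
def prefix_fasta_headers(fasta_lines, prefix_lines):
    """Yield FASTA lines, prefixing each header with the next non-blank prefix.

    Assumption: one prefix per header, in the same order as the FASTA records.
    """
    # Skip blank lines and strip surrounding whitespace (including "\r" and "\n").
    prefixes = (p.strip() for p in prefix_lines if p.strip())
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            try:
                prefix = next(prefixes)
            except StopIteration:
                raise ValueError("fewer prefixes than FASTA headers") from None
            # Keep the original header text and put the prefix in front of it.
            yield f">{prefix}_{line[1:]}"
        else:
            yield line


def merge_files(fasta_path, prefix_path, out_path):
    """Write the prefixed FASTA to out_path (hypothetical helper)."""
    with open(fasta_path) as fasta, open(prefix_path) as prefixes, \
            open(out_path, "w") as out:
        for line in prefix_fasta_headers(fasta, prefixes):
            out.write(line + "\n")
```

With the file names from the question, this could be called as merge_files('fastas.txt', 'additional_header_information.txt', 'additional_header_information_fastas.txt').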
Almost there - only my output looks like:
...it seems that the problem is the newline in additional_header_information.txt that separates the individual lines (when I test it on files with more than 2 FASTA records, every header except the last one contains the additional newline after running the command). Is there a way to keep the newline out of the new header?
PS. Thanks for your explanation of each step!
You mean this?
If yes, just filter the additional_header_information.txt file with awk by omitting blank lines.
Actually my input files really look like
and
and the output is
So I'm not sure what is going on there... some excess line breaks?
EDIT: Explained: the extra line breaks were CR+LF line endings originating from the Windows text editor Notepad.
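For anyone hitting the same issue, here is a small Python sketch for checking whether a file has CR+LF line endings and converting them to plain LF. The function names has_crlf and to_unix_newlines are hypothetical. The data has to be read in binary mode, because Python's default text mode already translates "\r\n" to "\n" on reading and would hide the problem.

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the raw bytes contain Windows-style CR+LF line endings."""
    return b"\r\n" in data


def to_unix_newlines(data: bytes) -> bytes:
    """Replace every CR+LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


# Example use on a file (read and write in binary mode so nothing is translated):
#   with open("additional_header_information.txt", "rb") as handle:
#       raw = handle.read()
#   if has_crlf(raw):
#       with open("additional_header_information.txt", "wb") as handle:
#           handle.write(to_unix_newlines(raw))
```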