Hello everyone,
I am trying to replace the sequences from file_1.fasta with specific sequences from file_2.fasta. The files are set up this way:
head -2 file_1.fasta
>Scaffolds_1
TAAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAATACACAATT
CCAAACCACCTTCTTGGAATATCCACATTTTCTTTATTGGAAGAGAAATTAGTTTAACAA
TGACCACTCTTTTTCTCACTAATGTTACAACCACCTAGAAACTGAATTTCAAGCCTATAC
head -6 file_2.fasta
>Scaffolds_1:327519-327900
AAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAAT
>Scaffolds_1:344277-344478
ACCTTTGAAACTTTGACTCTAACTCAGCTTGAATATTGGAAGTTAGGGGT
>Scaffolds_1:345134-345287
CCTTCTCTGCTAGAACACGTAGGGCCACTTCAGTAGATTCGCCAATCTTT
I have tried to set up a script to replace the sequences at the right coordinates, but got a bit confused with the sed script.
Would anybody know if there is a simple tool (Samtools, Biopython) designed to make this kind of replacement?
Hi,
Check the SeqKit toolkit. I'm not sure if it does what you want, but it allows a lot of FASTA/FASTQ manipulations, so it might.
António
Not supported.
But it is easy with Python, for example:
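A minimal sketch of how such a replacement could be done with the Python standard library only (an illustration, not necessarily the approach of the scripts referred to in this thread). It assumes the coordinates in the file_2.fasta headers are 1-based and inclusive, that regions on the same scaffold do not overlap, and that everything fits in memory. The function names (read_fasta, parse_region, replace_regions) and the script name below are hypothetical. Regions are applied from the rightmost to the leftmost, so several edits per scaffold work even when a replacement is shorter or longer than the region it replaces.

```python
import re
import sys
from collections import defaultdict

# Matches headers such as "Scaffolds_1:327519-327900"
REGION = re.compile(r"^(.+):(\d+)-(\d+)$")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_region(header):
    """Split 'name:start-end' into (name, start, end)."""
    match = REGION.match(header)
    if match is None:
        raise ValueError(f"header is not in name:start-end form: {header!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def replace_regions(seq, regions):
    """Replace each (start, end, new_seq) region in seq.

    Coordinates are assumed 1-based and inclusive, regions non-overlapping.
    """
    # Go right to left so that earlier coordinates stay valid
    # even if a replacement changes the sequence length.
    for start, end, new_seq in sorted(regions, reverse=True):
        seq = seq[:start - 1] + new_seq + seq[end:]
    return seq


def main(reference_path, patches_path, out=sys.stdout, width=60):
    # Group all replacement sequences by scaffold name.
    patches = defaultdict(list)
    with open(patches_path) as fh:
        for header, seq in read_fasta(fh):
            name, start, end = parse_region(header)
            patches[name].append((start, end, seq))

    # Rewrite each scaffold and print it wrapped at `width` columns.
    with open(reference_path) as fh:
        for header, seq in read_fasta(fh):
            name = header.split()[0]
            edited = replace_regions(seq, patches.get(name, []))
            out.write(f">{header}\n")
            for i in range(0, len(edited), width):
                out.write(edited[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a name of your choice (here replace_regions.py), it could be run as: python replace_regions.py file_1.fasta file_2.fasta > file_1.edited.fasta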
Thanks a lot for your answers.
The SeqKit toolkit is very handy for converting FASTA files to tabular files (with seqkit fx2tab).
Your script is nice, Pierre, but as I have several sequences to edit in each scaffold, I cannot use it as it is written.
Thanks for your script shenwei356, I will have to get used to Python to use it and format the files exactly as I want.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.
My apologies, I will