4.3 years ago
jamie.pike
I have a master fasta file (File_1.fasta), and another fasta file (File_2.fasta). For every instance where the header in File_2.fasta matches the header in File_1.fasta (apart from "/rc"), I would like the header and subsequent sequence in File_1.fasta to be replaced with the header and subsequent sequence from File_2.fasta.
E.g.
File_1.fasta
>header1
ATGCCTTCCTCAAAGGGATACG
>header2
ATTGGAATTTGCATCCGAGGGC
File_2.fasta
>header2/rc
GCCCTCGGATGCAAATTCCAAT
Output file
>header1
ATGCCTTCCTCAAAGGGATACG
>header2/rc
GCCCTCGGATGCAAATTCCAAT
Are there any tools which will do this? I imagine it can be done with awk but I am not competent enough with awk to do it.
Thank you
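For illustration, the replacement described above could be sketched in Python rather than awk. This is a minimal sketch, not an established tool: the function names (read_fasta, base_name, replace_records, format_fasta) are hypothetical, and it assumes headers are compared as whole lines after stripping a trailing "/rc", that each header occurs only once per file, and that both files fit in memory. The example data is the one from the question.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def base_name(header, suffix="/rc"):
    """Return the header without a trailing suffix (assumed to be '/rc')."""
    return header[:-len(suffix)] if header.endswith(suffix) else header


def replace_records(master, replacements, suffix="/rc"):
    """Swap in the replacement record wherever the base names match."""
    by_name = {base_name(h, suffix): (h, s) for h, s in replacements}
    # Keep the master order; fall back to the master record if no match.
    return [by_name.get(base_name(h, suffix), (h, s)) for h, s in master]


def format_fasta(records):
    """Write records back out, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


# Example data from the question, held in memory instead of on disk.
file_1 = io.StringIO(
    ">header1\nATGCCTTCCTCAAAGGGATACG\n>header2\nATTGGAATTTGCATCCGAGGGC\n"
)
file_2 = io.StringIO(">header2/rc\nGCCCTCGGATGCAAATTCCAAT\n")

result = format_fasta(
    replace_records(list(read_fasta(file_1)), list(read_fasta(file_2)))
)
print(result, end="")
```

To run this on real files, the two io.StringIO objects would be replaced with open("File_1.fasta") and open("File_2.fasta"). Note that multi-line sequences are joined into a single line on output.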
Great, thank you. Could you please elaborate on the join and cut sections? How do I use join to select the sequences present in linearized1 but not in linearized2, join to select the sequences present in both linearized1 and linearized2, and then cut to only select the 2nd sequence? I have had a look at the manual and I don't fully understand.