I have two files that look as follows:
File 1:
>sp|P0|H1_HUMAN dhj OS=Homo sapiens OX=9606 GN=CDH1 PE=1 SV=3
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVG
>sp|Q4|C1_RAT C-1 jkjk OS=Rattus norvegicus OX=10116 GN=Cdh1 PE=1 SV=1
QIKSNRDKETTVFYSITGPGADKPPVGVFIIERETGWLKVTQPLDREAIDKYLLYSHAVS
File 2:
>sp|P641|A1_CHICK link OS=Gallus gallus OX=9031 GN=CDH1 PE=1 SV=2
MGRRWGSPALQRFPVLVLLLLLQVCGRRCDEAAPCQPGFAAETFSFSVPQDSVAAGRELG
>sp|QF2|A2_BOVIN hjh OS=Bos taurus OX=9913 GN=CDH1 PE=2 SV=1
MGPWSRSLSALCCCCRCNPWLCREPEPCIPGFGAESYTFTVPRRNLERGRVLGRVSFEGC
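I want to combine every FASTA entry in file 1 with every entry in file 2 (all pairwise combinations), joining the two headers with "_" and concatenating the two sequences, and save the result in a new file, output.fasta.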
Desired output (output.fasta):
>sp|P0|H1_HUMAN dhj OS=Homo sapiens OX=9606 GN=CDH1 PE=1 SV=3_sp|P641|A1_CHICK link OS=Gallus gallus OX=9031 GN=CDH1 PE=1 SV=2
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGMGRRWGSPALQRFPVLVLLLLLQVCGRRCDEAAPCQPGFAAETFSFSVPQDSVAAGRELG
>sp|P0|H1_HUMAN dhj OS=Homo sapiens OX=9606 GN=CDH1 PE=1 SV=3_sp|QF2|A2_BOVIN hjh OS=Bos taurus OX=9913 GN=CDH1 PE=2 SV=1
MGPWSRSLSALLLLLQVSSWLCQEPEPCHPGFDAESYTFTVPRRHLERGRVLGRVNFEDCTGRQRTAYFSLDTRFKVGTDGVITVKRPLRFHNPQIHFLVYAWDSTYRKFSTKVTLNTVGMGPWSRSLSALCCCCRCNPWLCREPEPCIPGFGAESYTFTVPRRNLERGRVLGRVSFEGC
>sp|Q4|C1_RAT C-1 jkjk OS=Rattus norvegicus OX=10116 GN=Cdh1 PE=1 SV=1_sp|P641|A1_CHICK link OS=Gallus gallus OX=9031 GN=CDH1 PE=1 SV=2
QIKSNRDKETTVFYSITGPGADKPPVGVFIIERETGWLKVTQPLDREAIDKYLLYSHAVSMGRRWGSPALQRFPVLVLLLLLQVCGRRCDEAAPCQPGFAAETFSFSVPQDSVAAGRELG
>sp|Q4|C1_RAT C-1 jkjk OS=Rattus norvegicus OX=10116 GN=Cdh1 PE=1 SV=1_sp|QF2|A2_BOVIN hjh OS=Bos taurus OX=9913 GN=CDH1 PE=2 SV=1
QIKSNRDKETTVFYSITGPGADKPPVGVFIIERETGWLKVTQPLDREAIDKYLLYSHAVSMGPWSRSLSALCCCCRCNPWLCREPEPCIPGFGAESYTFTVPRRNLERGRVLGRVSFEGC
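A minimal sketch of this operation in plain Python (standard library only, so no Biopython install is assumed). The function names `parse_fasta`, `pairwise_records` and `combine_fasta_files` are hypothetical. It assumes each header fits on one line and that headers are joined with "_", as in the desired output. The core of the operation is a Cartesian product of the two record lists (`itertools.product`), not a simple concatenation of files.

```python
import itertools


def parse_fasta(lines):
    """Parse FASTA text into a list of (header, sequence) tuples.

    `header` is everything after '>' on the header line; wrapped
    sequence lines are joined, and blank lines are ignored.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pairwise_records(first, second, sep="_"):
    """Pair every record of `first` with every record of `second`.

    Headers are joined with `sep` and sequences are concatenated,
    file-1 sequence first. itertools.product keeps file-1-major order:
    (a1, b1), (a1, b2), (a2, b1), (a2, b2), ...
    """
    return [
        (f"{h1}{sep}{h2}", s1 + s2)
        for (h1, s1), (h2, s2) in itertools.product(first, second)
    ]


def combine_fasta_files(path1, path2, out_path, sep="_"):
    """Read two FASTA files and write all pairwise concatenations.

    Returns the number of records written.
    """
    with open(path1) as f1, open(path2) as f2:
        combined = pairwise_records(parse_fasta(f1), parse_fasta(f2), sep)
    with open(out_path, "w") as out:
        for header, seq in combined:
            out.write(f">{header}\n{seq}\n")
    return len(combined)
```

Called as `combine_fasta_files("file1.fasta", "file2.fasta", "output.fasta")` (file names assumed), it would write four records for the two example files, in the same order as the desired output. Everything is held in memory, which is fine at this size. For very large inputs, file 2 could instead be re-read once per file-1 record.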
Have you tried the command below? (See also the related question "How to combine multiple fasta files into a larger fasta file".)
cat *.fasta > out.fasta
Hi Arup, actually I want to combine entry 1 from file 1 with every entry of file 2 (and do the same for every other entry of file 1), then save the result in output.fasta. cat *.fasta just appends all the sequences one after another, which is not what I need.
You'll need to use custom BioPerl/BioPython code. What you are doing is not a standard operation. In fact, it is odd enough to warrant the question: What are you doing and why are you doing that?
I need the combined FASTA sequences of entries from file 1 and file 2 to do a residue correlation analysis. OK, I will check it out with Biopython, but I thought it would be possible with awk/Unix tools.
Maybe with bioawk, but the operation is complicated enough to warrant a more robust, verifiable, reproducible approach, which one-liners are not.
sayaneshome.rsg: Take a look at seqkit (https://github.com/shenwei356/seqkit). It may have an option (concat, perhaps) to do something like this.
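Whichever route produces output.fasta (a custom script, bioawk, or seqkit), the "verifiable" point above can be made concrete by checking the result against the definition of the operation. The output should contain len(file 1) * len(file 2) records, in file-1-major order, each built from exactly one pair. The sketch below is a hypothetical helper, `check_pairwise`. It assumes the records have already been parsed into (header, sequence) tuples by any FASTA parser and that headers were joined with "_".

```python
def check_pairwise(first, second, combined, sep="_"):
    """Return True if `combined` is exactly the pairwise concatenation.

    `first`, `second` and `combined` are lists of (header, sequence)
    tuples. Checks the record count, the file-1-major order, and that
    each header and sequence is built from the expected pair.
    """
    if len(combined) != len(first) * len(second):
        return False
    # Outer loop over file 1, inner loop over file 2 (file-1-major order).
    expected = (
        (f"{h1}{sep}{h2}", s1 + s2)
        for h1, s1 in first
        for h2, s2 in second
    )
    return all(got == exp for got, exp in zip(combined, expected))
```

A check like this is cheap to keep next to the analysis script, so a rerun on new input files can be confirmed before the residue correlation step.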