4.7 years ago
waqaskhokhar999
I have three FASTA files containing the protein sequences for each gene in xls format (space-separated). The first column contains the header, while the remaining columns contain the sequence. For example:
File1:
sample 1 2 3 4 5 6
BnaA03g18710D M A A A V S
BnaA03g18710D_S25 M A A A V S
BnaA03g18710D_S31 M A A A V S
File2:
sample 1 2 3 4 5 6
BnaA03g18710D_a M A A A V S
BnaA03g18710D_S25_a M A A A V S
BnaA03g18710D_S31_a M A A A V S
File3:
sample 1 2 3 4 5 6
BnaA03g18710D_b M A A A V S
BnaA03g18710D_S25_b M A A A V S
BnaA03g18710D_S31_b M A A A V S
I am interested in merging them in the following order:
sample 1 2 3 4 5 6
BnaA03g18710D M A A A V S
BnaA03g18710D_a M A A A V S
BnaA03g18710D_b M A A A V S
BnaA03g18710D_S25 M A A A V S
BnaA03g18710D_S25_a M A A A V S
BnaA03g18710D_S25_b M A A A V S
BnaA03g18710D_S31 M A A A V S
BnaA03g18710D_S31_a M A A A V S
BnaA03g18710D_S31_b M A A A V S
I have tried cat, sed and other commands but wasn't able to produce the desired format. Any help will be highly appreciated.
Try to cat them together, sort them by the first column, then remove the sample header lines with grep -v 'sample'. To get the header line back, simply cat the first line of the first file with the output obtained from the strategy I just described. I am sure you can manage that.
Small nitpick, but these are not FASTA format files.
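A minimal sketch of that strategy, for illustration only. The file names (File1.txt and so on) and the output name merged.txt are placeholders, and the sample data is copied from the question. One caveat: a plain byte-wise sort places BnaA03g18710D_S25 before BnaA03g18710D_a, because uppercase letters sort before lowercase ones, so the result of a bare sort depends on the locale. The sketch therefore assumes that the variants are always marked by a trailing underscore plus one lowercase letter (_a, _b) and sorts on an explicit key built from that.

```shell
#!/bin/sh
# Work in a scratch directory and force byte-wise collation so the
# result does not depend on the user's locale.
cd "$(mktemp -d)" || exit 1
export LC_ALL=C

# Sample inputs copied from the question (file names are placeholders).
printf '%s\n' 'sample 1 2 3 4 5 6' 'BnaA03g18710D M A A A V S' \
  'BnaA03g18710D_S25 M A A A V S' 'BnaA03g18710D_S31 M A A A V S' > File1.txt
printf '%s\n' 'sample 1 2 3 4 5 6' 'BnaA03g18710D_a M A A A V S' \
  'BnaA03g18710D_S25_a M A A A V S' 'BnaA03g18710D_S31_a M A A A V S' > File2.txt
printf '%s\n' 'sample 1 2 3 4 5 6' 'BnaA03g18710D_b M A A A V S' \
  'BnaA03g18710D_S25_b M A A A V S' 'BnaA03g18710D_S31_b M A A A V S' > File3.txt

# Keep the header line from the first file only.
head -n 1 File1.txt > merged.txt

# Drop the header lines, prepend a two-part sort key (base ID, variant),
# sort on that key, then strip the key again.
cat File1.txt File2.txt File3.txt |
  grep -v '^sample' |
  awk '{
    base = $1; variant = "0"            # "0" sorts before "a", "b", ...
    if (match($1, /_[a-z]$/)) {         # assumed variant suffix: _a, _b, ...
      base = substr($1, 1, RSTART - 1)
      variant = substr($1, RSTART + 1)
    }
    print base "\t" variant "\t" $0
  }' |
  sort -t "$(printf '\t')" -k1,1 -k2,2 |
  cut -f3- >> merged.txt

cat merged.txt
```

If the real IDs can themselves end in an underscore followed by a lowercase letter, the suffix rule above would need adjusting.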