Hi. I am looking for a quick way to connect two sets of multifasta files with a number of Ns in between. I would appreciate it if anyone could point me to an existing script or other solution.
sort of:
file1
>1_set1
AAA
>2_set1
CCC
...
file2
>1_set2
GGG
>2_set2
TTT
...
to get something like
>1_set1_set2
AAANNNGGG
>2_set1_set2
CCCNNNTTT
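A minimal Python sketch of the join described above. It assumes records pair up by the part of the header before "_set" (so "1_set1" matches "1_set2"), and that the spacer is a fixed string of Ns. The function names read_fasta, pair_key and join_sets are made up for illustration, not taken from any existing tool.

```python
def read_fasta(lines):
    """Parse FASTA lines into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()  # drop ">" and any space after it
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def pair_key(header):
    # Assumption: the pairing key is everything before "_set", e.g. "1" in "1_set1".
    return header.split("_set")[0]


def join_sets(recs1, recs2, spacer="NNN"):
    """Return (header, sequence) pairs for keys present in both sets, in file1 order."""
    by_key2 = {pair_key(h): (h, s) for h, s in recs2.items()}
    joined = []
    for h1, s1 in recs1.items():
        key = pair_key(h1)
        if key in by_key2:
            h2, s2 = by_key2[key]
            suffix = h2[len(key):]  # e.g. "_set2"
            joined.append((h1 + suffix, s1 + spacer + s2))
    return joined


if __name__ == "__main__":
    file1 = [">1_set1", "AAA", ">2_set1", "CCC"]
    file2 = [">1_set2", "GGG", ">2_set2", "TTT"]
    for header, seq in join_sets(read_fasta(file1), read_fasta(file2)):
        print(">" + header)
        print(seq)
```

For real files, the lists can be replaced with open file handles, e.g. read_fasta(open("file1")). Records that have no partner in the other file are skipped.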
Sorry for formatting, I am kinda new here... Thanks in advance, Xi
Thanks for the replies! I can use
comm -12 <(grep '^>' file1 | sed 's/^> *//; s/_set.*//' | sort) <(grep '^>' file2 | sed 's/^> *//; s/_set.*//' | sort)
to make a sorted list of common sequences. Then use fastafetch -F
to pull only relevant sequence pairs. I wrote a little Python script some time ago which gets rid of end-of-line characters in the sequence part and writes two lines per record. Then it should be fast and easy to test your solutions. Will let you know how it went soon. Thanks for the help!
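The script itself is not shown here, so the following is only a sketch of what such a "two lines per record" step could look like. The function name linearize is hypothetical, and the input is assumed to be plain FASTA with ">" header lines.

```python
def linearize(lines):
    """Yield (header, sequence) pairs with each sequence joined onto a single line."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


if __name__ == "__main__":
    wrapped = [">1_set1\n", "AAA\n", "CCC\n", ">2_set1\n", "GGG\n"]
    for header, seq in linearize(wrapped):
        print(header)
        print(seq)
```

Passing an open file handle instead of the list works the same way, since the function only iterates over lines.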