Hi guys,
I am wondering if anyone could advise a package, a strategy, or any hint for the following problem. I have a list of strings of different lengths, e.g. ['blabalaa..11', 'anotherstring......60', 'anotherlength....25', etc.]. I have to take one string at a time and check whether it contains any sequentially repeated blocks. For example, String1 (see below) contains the repeated block "kilimanjaro". In another string the block can be something else, or the string might NOT contain any repetitions at all.
String1 = 'njarokilimanjarokilimanjarokilimanjarokilimanjarokili'
I want to split the string into the repeated blocks it consists of. Desired output:
'njaro kilimanjaro kilimanjaro kilimanjaro kilimanjaro kili'
or 'no repetitions found' for a string without repeats, etc.
As you can see, the beginning and the end of the string can each contain only part of a block. So basically the repeated blocks are hidden somewhere in the middle of the string:
- I don't know the pattern of the repetitions; it is different for each string in my list,
- I don't know the length of the repetitions,
- I don't know at which position the repetitions start.
Is it possible to solve such a problem?
Thank you in advance
I assume you want the largest substring which is repeated? "njarokilimanjarokilimanjarokilimanjarokilimanjarokili" could also be: "njarokili man jarokili man jarokili man jarokili manjaro kili" if we found either man or njarokili.
I would suggest starting with a sliding window approach per string, brute force searching starting from the largest substring:
Some badly indented sample code, untested but as a pointer to what you could optimize:
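A minimal sketch of the sliding-window idea, offered as a starting point rather than a finished solution. It departs from a strict largest-first search in one way: it tries every block length and keeps the one whose repetition covers the longest stretch of the string, preferring the shortest block on ties. A largest-first search would report the doubled block 'kilimanjarokilimanjaro' for the example string. The names `find_repeats`, `min_repeats` and `phase` are made up for this illustration. Note that the string alone cannot tell you where a block "starts" ('kilimanjaro' and 'njarokilima' are equally valid), so `phase` lets you shift the cut points by hand.

```python
def find_repeats(s, min_repeats=2, phase=0):
    """Split s into (prefix, blocks, suffix) around its best back-to-back
    repetition, or return None if no block repeats at least min_repeats times.

    phase (hypothetical knob, 0 <= phase < block length) shifts where the
    blocks are cut inside the repeating stretch.
    """
    n = len(s)
    best = None  # (covered length, -block length, -start); larger tuple wins
    for period in range(1, n // 2 + 1):
        run_start = 0
        for i in range(n - period + 1):
            # Compare s[i] with the character one block further on.
            # At i == n - period there is nothing left to compare, so the
            # current run is closed there.
            if i == n - period or s[i] != s[i + period]:
                covered = (i - run_start) + period  # length of periodic stretch
                if i > run_start and covered >= min_repeats * period:
                    cand = (covered, -period, -run_start)
                    if best is None or cand > best:
                        best = cand
                run_start = i + 1
    if best is None:
        return None

    covered, period, start = best[0], -best[1], -best[2]
    end = start + covered      # one past the end of the periodic stretch
    first = start + phase      # where the first full block is cut
    blocks = [s[k:k + period] for k in range(first, end - period + 1, period)]
    tail = first + len(blocks) * period
    return s[:first], blocks, s[tail:]


s = "njarokilimanjarokilimanjarokilimanjarokilimanjarokili"
print(find_repeats(s))
# ('', ['njarokilima', 'njarokilima', 'njarokilima', 'njarokilima'], 'njarokili')
print(find_repeats(s, phase=5))
# ('njaro', ['kilimanjaro', 'kilimanjaro', 'kilimanjaro', 'kilimanjaro'], 'kili')
print(find_repeats("abcdefg"))
# None
```

This is quadratic in the string length, which should be fine for strings of a few dozen characters; for much longer inputs a suffix-array or Z-function based search would be the thing to optimize towards.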
Let me know if you can't get this working and I'll have another look at it later ;)