Mostly unrelated opening picture for flair purposes

Practicing Programming Through Problem Solving (Part 2)

It was hard to pick just one problem to go over this time. I have encountered numerous interesting problems in the last week of daily coding exercises.

The problem this week required me to look for some assistance, as I knew my initial plan would probably result in a TLE (Time Limit Exceeded). What was waiting for me in the discussion section blew my mind as it was an interesting way to approach using the strategy behind Binary Searching for a different kind of problem! It also made me more deeply consider that data structures and algorithms aren't set in stone. The thinking behind each of those items can be used to solve adjacent problems as well.

For background, the idea behind a Binary Search is that, given a sorted list of items, if you are looking for a specific item you can start by looking at the middle item in the list. If what you are looking for is less than the middle, you don't need the second half of the list any more. Rinse and repeat until you find the item. Binary Search is one of the quickest searching algorithms and has a O(log n) runtime. You can read more about Binary Search using this GeeksforGeeks link.

This week's problem is Longest Duplicate Substring on LeetCode. It asks you to find the longest substring of a string that appears more than once. For instance, using the example in the problem itself, the string "banana" has a substring of length 1 that occurs multiple times (two actually, "a" and "n"). But it also has a substring of length 2: "an" and "na". Realizing that substrings can overlap, you can also find one of length 3: "ana".

The brute force solution might consist of using a sliding window with a length one less than the length of the string and checking that against the whole string then reducing the window size by 1 and continuing until you find a repeated substring. The biggest issue with this, is if the biggest substring is length 2 or maybe doesn't even exist, the runtime is going to be around O(n^2). Because of this, it may not work on some of the trickier test cases. This is what was happening to me, and for a while I felt stuck.

After thinking about this problem for longer than I should have, I took a trip to the discussion section. This is where I saw Binary Search being used in a way I didn't really think of: determining the length of the substring. What this means is if we have a string of length 12, we can see (using a similar sliding window process to the brute force method) if a substring of length 6 exists (half of the original size). If so, try to find one that is half way between 6 and 12, which would be 9, if not, halfway between 0 (doesn't exist) and 6, which is 3. It was a neat application of 'halving' that is typically used in Binary Search.

On to the code!

First, let's just make a function that does the halving. In this case, that means we need to keep track of the 'left' and 'right' of the string, which would start as 0 and len(string), as well as a placeholder for the result. This function will find the midpoint of 'left' and 'right', see if there is a substring with this length (which we can off-load to another function) and based on this result, determine if we try a longer or shorter substring:

left = 0
right = size
result = ""

mid = (left + right) // 2
text = longest_substring_of_size(mid)
if text:
----result = text
----left = mid + 1
else:
----right = mid

If you can't tell, mid is finding the midpoint of 'left' and 'right' (The // operator divides and floors the result). The longest_substring_of_size is a function that checks to see if a substring of the given size occurs more that once in the string, we will get to that in a bit. If the substring exists, we want to check the 'upper half' of substring sizes, so we hold onto the substring we found, just in case other size values don't work, and change the 'left' to the midpoint plus one (this effectively says: We had a left of 0 and a right of 12, we checked 6 (mid) and found a substring, so let's to it again with a left of 7 and a right of 12). If we could not find a substring of the given length, we check the 'lower half' of sizes (i.e. left is still 0 and right becomes 6).

You might notice that we are missing something. We need a loop so it keeps checking! In problems like this we usually keep going until the pointers (left and right) cross, so while left < right. There might be some other minor details, but this is the overall thinking behind this function.

Now let's tackle the longest_substring_of_size function. It will be given a size and utilize a sliding window to see if a substring occurs more than once in the string. To keep track of what we found, let's use a set, because we don't need to hold on to multiple occurrences of a substring as when we find a second occurrence we are done. We will check every substring of the length we are given from the beginning to the end, which might look like this:

found = set()
for i in range(size - k + 1):
----check = s[i:i + k]

The math in the range is calculating what we consider the end, which is the length of the whole string, size, minus how large the sliding window is, plus one because range does not include the last number (0 to 2 would be 0 and 1). Similarly in the check variable hold onto the substring we are checking, which starts at i and is k large. Once we have this substring, we see if it is in the set. If so, we found a substring that occurs twice and we can return it, if not we add it to the set and keep looking:

if check in found:
----return check
else:
----found.add(check)

Putting it all together, we have this code. Note that I added the two functions above inside of the LeetCode's given function, but if you adjusted the parameters of each function you could pull them out as well, it's just a style choice:

def longestDupSubstring(self, s: str) -> str:
----size = len(s)

----def longest_substring_of_size(k):
--------found = set()
--------for i in range(size - k + 1):
------------check = s[i:i + k]
------------if check in found:
----------------return check
------------else:
----------------found.add(check)
--------return None

----def binary_search_size():
--------left = 0
--------right = size
--------result = ""
--------while left < right:
------------mid = (left + right) // 2
------------text = longest_substring_of_size(mid)
------------if text:
----------------result = text
----------------left = mid + 1
------------else:
----------------right = mid
--------return result

----return binary_search_size()

And that is how I went about thinking through this problem! My solutions might not always be the most perfect solutions, but thinking through them and finding interesting new ways to utilize previous knowledge is super fun.

If you have any suggestions for how I could have improved this solution, feel free to drop me a line using the links in the footer.

Published by

Dan Gaylord
Dan Gaylord on 11/01/2021

Filed under 🖥️Programming 🎓Learning