Problem 76

Question statement

Given two strings s and t of lengths m and n respectively, return the minimum window substring of s such that every character in t (including duplicates) is included in the window. If there is no such substring, return the empty string “”.

Solution

At first I spent some decent bit of time thinking about a 2d-dp approach, but couldn’t find a realistic subproblem. Then I looked at the hints provided, and it seems like the suggestion was to do the most straightforward thing imaginable, and then optimizing it for time

Main idea:

The overall main idea is to have two pointers, a left and a right, with right always ahead of left. For a given position of the pointers, we look at the window arr[left:right+1] and see if it contains the string t in it. If it isn’t, we move right one step further, and check agian. If at any point t falls into the window, then we check the length of the window against the optimal length we have recorded so far, updating the optimal length as needed. Do this for each possible value for left and right, and finally return the optimal length. If len(s) = n, then this process is $~O(n^4)$ (n iterations each for left and right, and about n more for analyzing the window arary, and then potentially some more to keep track of letter frequencies), so some optimization is needed.

Approach 1:

Here we simply do a brute force approach. We create a dictionary counter_t that captures the letter frequencies in t. Now, iterate through all possible values of left and right, and for each case, create a counter of the window arr[left:right+1]. Then we check if counter_t is a subset of the window counter. A nice thing about counters is that any two counters in python are partially ordered by <=. So checking for subset is as simple as checking if counter_t <= counter_window. If it is, then we update the substring inside the window, and the min length. Finally, we return the optimal substring corresponding to the optimal length.

This is too slow. We need $n^2$ to go through the array, ~$n$ more for the creation of each window counter, and then ~$n$ more to sift through the counter to check for subsets. Even so, this approach passes 231/267 test-cases, so we can’t be too far away from an optimal solution.

Approach 2:

As a first step towards optimization, we identify that we are creating a new counter unnecessarily for each possible pair (left, right). We can save a lot of this time, by creating a new empty counter for each iteration of the outer loop (that tracks the left value). Now, iterating through the inner loop, we add to this counter each arr[right] we encounter, and then we do our comparisons with counter_t. Once we are done with the inner loop, then we simply remove the element arr[left] from the counter, which signifies the left pointer moving one step ahead!

The time complexity here is ~O($n^3$). Even though this is still too slow, and TLEs; this is much better than the previous submission – passing 265/267 test cases without issue.

Approach 3: (accepted)

Finally the natural question to ask is, can I get away with creating only one global counter, adding and removing from it as necessary? Afterall, when we humans approach this algorithm, we aren’t creating frequency counters in our head. This is possible, and the code I wrote isn’t the prettiest, and can easily be cleaned up into manageable chunks – but the idea is as follows: We initialize left and right at 0, we scan the array using right until we reach the point where counter_t <= counter_window. Then if the current window length is better than the optimal length, we update the optimal length – and most crucially, we store the index at which the right pointer stopped. Now, move the left pointer, and check if counter_t<= counter_window. If it is, then we found a shorter window that works, update the window, and move the left pointer again. Keep doing this until counter_t is no longer a subset of counter_window. Then we shift to moving the right pointer again. Zig-zagging between moving the left and right pointer this way, we skip out on computing a lot of unnecessary windows. This cuts down on average $n$ computations on a good day, if $t$ is much smaller compared to $s$

Code

Approach 1:

1
2
3
4
5
6
7
8
9
10
11
12
13
def minWindow2(self, s: str, t: str) -> str:
counter_t = Counter(t)
min_len = float("inf")
ans = ""
for left in range(len(s)):
for right in range(left, len(s)):
word_len = right - left + 1
word = s[left : right + 1]
counter_cur_word = Counter(word)
if counter_t <= counter_cur_word and word_len < min_len:
ans = word
min_len = word_len
return ans

Approach 2:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
def minWindow3(self, s: str, t: str) -> str:
counter_t = Counter(t)
min_len = float("inf")
ans = ""
for left in range(len(s)):
counter_cur_word = Counter()
for right in range(left, len(s)):
counter_cur_word[s[right]] += 1
word_len = right - left + 1
word = s[left : right + 1]
if counter_t <= counter_cur_word and word_len < min_len:
ans = word
min_len = word_len
break
counter_cur_word[s[left]] -= 1

return ans

Approach 3:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# Can and should be refactored at some point.

def minWindow(self, s: str, t: str) -> str:
counter_t = Counter(t)
min_len = float("inf")
ans = ""
counter_cur_word = Counter()
right_start = 0
for left in range(len(s)):
if counter_t <= counter_cur_word:
word_len = right - left + 1
word = s[left : right + 1]
if word_len < min_len:
ans = word
min_len = word_len
counter_cur_word[s[left]] -= 1
continue

else:
for right in range(right_start, len(s)):
if(right < left):
break
counter_cur_word[s[right]] += 1
word_len = right - left + 1
word = s[left : right + 1]
if counter_t <= counter_cur_word:
if word_len < min_len:
ans = word
min_len = word_len
right_start = right + 1
break
counter_cur_word[s[left]] -= 1

return ans

Comments:

Time complexity: ~$O(n^2)$ on a good day and $O(n^3)$ on a bad day, for the solution that got accepted.

At the time of submission, the above solution passes leetcode submission cases in 1349ms, beats 5.00% of users. It uses 17.3MB of memory, beating 64.26% of users.