Problem 1239
Question statement
You are given an array of strings arr. A string s is formed by the concatenation of a subsequence of arr that has unique characters. Return the maximum possible length of s.
Solution
This one took me a while to solve. For the longest time, I was trying to implement a DP solution and was consistently going into the realm of quadratic computation time, to the point where a dp approach seemed suboptimal. Ultimately, the hitch was with the fact that at any given element arr[i], there is no way to have a consistent winner stored at dp[i-1] (since arr[i] could merge with an earlier string combination to create a new winner), so I would have to go check every element of dp until i-1. But this is no better than going through the array with a double pointer.
A new approach:
After struggling for 30+ minutes and failing, I read the discussion section where people pointed to an algorithm called backtracking. I was vaguely aware of this before, but until yesterday I had never invested the time or resources to learn or try it out. This youtube video gave me a pretty good feel for the algorithm. Ultimately, backtracking needs to be in anyone’s potential plan of action if a given problem evolves based on prior choices, and more importantly, these choices (if suboptimal) should potentially be corrected to pave way to a better optimal choice. That is to say, think backtracking if the evolution of your problem is not linear/monotonic (if it is linear/monotonic, then it is dp) – that it to say, your current choice can is not dictated by only a handful of previous choices which encode the culmination of the evolution thus far.
For instance, in the house robber problem, the maximum profit from house i was only dependent on the maximum profit from houses i-2, i-3, which contained in them the culmination of the evolution of the system thus far. Similarly in the falling sums problem, the minimum falling sum ending in index [i][j] was only dictated by the culmination of the evolution of the system at indices [i-1][j-1], [i-1][j] and [i-1][j+1].
However, this problem is not like that. For example, take the third provided example
arr = [“cha”, “r”, “act”, “ers”]
In this case, a 1-d dp approach might try and store the best word at dp[i], but the best word at index [i] might not be an evolution of the best word at index [j] for some j< i. For instance, here dp[1] would store” char” but dp[3] would store “chaers” or “acters” neither of which is an evolution of “char”. So in some sense, our algorithm needs to forget “char” from dp[1] to get to the right solution. How can we make an argument forget an unnecessary calculation?
$$\begin{center} \texttt{recursion.} \end{center}$$
This is at the heart of backtracking.
Main idea:
Make a sequence of choices recursively everytime it is possible to make progress. When it is not possible to make progress (i.e, you’ve gotten to the longest possible word given where you started), then store the length of that word in an array. By the time you get to the previous call, your function would remember the possible lengths, but will not go into the same choices again, and would try to take a different route. Let’s illustrate it with an example:
For instance: We start with
- cur_logest_word = “cha”
- Can merge “r”, so cur_longest_word = “char”
- Can merge no more, so store 4 into a lengths array, and return to the previous call.
- In this call, cur_longest_word = “cha” and now check for validity against “act”, which is invalid, so move on.
- Finally, “cha” -> “ers” done. Store 6 into a lengths array.
This process tells us the longest possible word attainable starting with “cha” is 6. Now, repeat this starting with “r”, and then “act” and finally “ers”, tracking the longest possible word in each case. Finally, return the max of all these maximal lengths.
Note: it is possible to not waste space by using a list, but I can’t think of a short creative way to do it off the top if my head. I’ll maybe get around to writing that up after a few days of thinking about it.
Note 2: Oh also I do a pre-processing step where I remove any string that has repeated characters from arr.
Code:
1 | def pre_processing(self, arr): |
Comments:
Time complexity: O(n^2) on average.
At the time of submission, the above solution passes leetcode submission cases in 92ms, beats 57.31% of users. It uses 16.72MB of memory, beating 74.50% of users.
I also had a 2-d dp solution idea, which stores in dp[i][j] the best possible word that can be made using strings between indices i and j. Then to calculate dp[i][j+1], we check every non-zero upper triangular element in our array to find the best word. Maybe one day I wil code this and see if this ends up being more optimal.