Problem 1235
Question
We have n jobs, where every job is scheduled to be done from startTime[i] to endTime[i], obtaining a profit of profit[i]. You’re given the startTime, endTime and profit arrays; return the maximum profit you can take such that no two jobs in the subset have overlapping time ranges.
Naive DP solution:
My carelessness yesterday came back to bite me in the butt. This problem has a naive 1-d dynamic programming solution, but it turns out even this DP solution has quadratic time complexity, and is therefore too slow. Since it is instructive, I’ll include this solution in this type-up anyway.
Main idea:
- Zip together the given arrays into a single array $\texttt{jobs}$ and have it sorted by endTime.
- Create a dp array, whose i-th entry stores the maximum profit attainable by a schedule drawn from the first i+1 (sorted) jobs that necessarily includes job i.
- Once the array dp is filled up, return $\texttt{max(dp)}$
- Fill up the dp array as follows:
- Initialize dp to all 0’s and set dp[0] to the first (sorted) job’s profit.
- Iterate through jobs, and at job i, check each job j $<$ i to see if it is a valid previous job
- i.e., check the condition j.endTime $\leq$ i.startTime
- Among the valid previous jobs, choose the job $j$ that earned the maximum previous profit
- i.e., find prev_profit = max(dp[j]) as j ranges over all the valid previous jobs
- dp[i] = profit[i] + prev_profit.
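For concreteness, here is what the zip-and-sort step produces on some small sample inputs (the sample values here are my own, made up for illustration):

```python
# Small made-up sample inputs, just to show the zip-and-sort step
startTime = [1, 2, 3, 3]
endTime = [3, 4, 5, 6]
profit = [50, 10, 40, 70]

# Zip the three arrays into (start, end, profit) triples, sorted by end time
jobs = sorted(zip(startTime, endTime, profit), key=lambda job: job[1])
print(jobs)  # [(1, 3, 50), (2, 4, 10), (3, 5, 40), (3, 6, 70)]
```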
Filling the dp array requires $O(n^2)$ operations, which is ultimately too slow.
Naive DP Code:
```python
def jobScheduling(self, startTime: List[int], endTime: List[int], profit: List[int]) -> int:
    # Zip the three arrays into (start, end, profit) triples, sorted by end time
    jobs = sorted(zip(startTime, endTime, profit), key=lambda job: job[1])
    n = len(jobs)
    # dp[i] = max profit of a schedule that ends with (and includes) job i
    dp = [0] * n
    dp[0] = jobs[0][2]
    for i in range(1, n):
        # Scan every earlier job for the most profitable valid predecessor
        prev_profit = 0
        for j in range(i):
            if jobs[j][1] <= jobs[i][0]:  # job j ends before job i starts
                prev_profit = max(prev_profit, dp[j])
        dp[i] = jobs[i][2] + prev_profit
    return max(dp)
```
I was unable to come up with a way to optimize this naive approach directly, until I looked at some hints and discussions, where the ill-fated words “use binary search to achieve $O(n \log n)$ time complexity” were uttered.
Optimized DP solution:
Shortcomings of the previous method:
The shortcoming of the naive approach is that for each job i, the entry dp[i] stores the maximum profit obtained using only jobs up to job i, but necessarily including job i! This also comes with another downside: if we are deciding about job i, the job j whose j.endTime is closest to i.startTime might not always be the optimal pick to maximize profit. This is precisely why we can’t directly bolt a binary search onto the setup we have, and were instead forced to look at every job that came before job i and hunt for the one that gave the largest profit.
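To see why the closest-ending job can lose, here is a tiny made-up scenario (the numbers and names below are mine, purely for illustration):

```python
# Suppose the current job starts at time 10, and the naive dp values of
# two earlier jobs are (made-up numbers):
#   job X: ends at 10, dp[X] = 5    <- ends closest to our start time
#   job Y: ends at  8, dp[Y] = 90   <- much more profitable
candidates = [(10, 5), (8, 90)]  # (end time, naive dp value)

# A binary search on end times would hand us job X, but the best valid
# predecessor is job Y -- so the naive dp forces a full linear scan:
best = max(dp for end, dp in candidates if end <= 10)
print(best)  # 90
```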
Tweaks and bootstrapping binary search:
In order to bootstrap a binary search onto this setup, we need to make a tiny tweak to how we define dp[i]. Namely, dp[i] will now store the maximum attainable profit from jobs[:i+1]. Under this phrasing, the optimal choice need not include job i. We achieve this by giving ourselves a choice at each job i: whether to pick it or not. If we don’t pick the job, then the maximum profit from jobs[:i+1] is exactly the same as that from jobs[:i], i.e., dp[i-1]. If we do pick it, then we earn this job’s profit in addition to the maximum possible profit from jobs[:j+1], where j is the closest possible valid previous job to i. Thus,
$$\texttt{dp[i] = max(dp[i-1], profits[i] + dp[j])}$$
To find the closest possible valid previous job to i, we can use a binary search. We will take the previous jobs jobs[:i], and keep bisecting until we find a job $\texttt{mid}$ such that mid.endTime $\leq$ i.startTime and (mid+1).endTime $>$ i.startTime, which precisely characterizes mid as the closest possible valid previous job to i.
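As an aside, Python’s standard bisect module implements exactly this search; a minimal sketch (the variable names and sample values here are my own):

```python
from bisect import bisect_right

# End times of the previously processed jobs; already sorted, since the
# jobs array is sorted by end time (sample values are made up)
end_times = [3, 4, 5, 6]
current_start = 5  # start time of the job we are deciding about

# bisect_right returns the insertion point for current_start, so
# subtracting 1 yields the rightmost index j with
# end_times[j] <= current_start, or -1 if no valid previous job exists
j = bisect_right(end_times, current_start) - 1
print(j)  # 2: the job ending at time 5 is the closest valid predecessor
```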
Code:
```python
def binarySearch(self, jobs, current_job):
    # Find the rightmost job whose end time is <= current_job's start
    # time; return -1 if no such job exists. Since jobs is sorted by end
    # time, any job found this way necessarily comes before current_job.
    lo, hi = 0, len(jobs) - 1
    result = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if jobs[mid][1] <= current_job[0]:
            result = mid  # mid is valid; look for a later one
            lo = mid + 1
        else:
            hi = mid - 1
    return result

def jobScheduling(self, startTime: List[int], endTime: List[int], profit: List[int]) -> int:
    jobs = sorted(zip(startTime, endTime, profit), key=lambda job: job[1])
    n = len(jobs)
    # dp[i] = max profit attainable from jobs[:i+1] (job i optional)
    dp = [0] * n
    dp[0] = jobs[0][2]
    for i in range(1, n):
        j = self.binarySearch(jobs, jobs[i])
        take = jobs[i][2] + (dp[j] if j != -1 else 0)
        dp[i] = max(dp[i - 1], take)  # skip job i, or take it
    return dp[-1]
```
Comments:
Time complexity for the optimized solution: $O(n \log n)$, where n is the length of jobs.
I technically cheated to complete today’s challenge. Since my naive solution kept getting TLE, I checked some other suggestions in the discussion and looked at people’s approaches before fully understanding how to optimize the problem. The above optimized code runs in 668ms, beating 52.39% of users, and uses 29.7MB of memory, beating 76.02% of users.