19. Dynamic Programming I: Fibonacci, Shortest Paths

19. Dynamic Programming I: Fibonacci, Shortest Paths


The following
content is provided under a Creative
Commons license. Your support will help MIT
OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: We’re going to start
a brand new, exciting topic, dynamic programming. AUDIENCE: Yes! PROFESSOR: Yeah! So exciting. Actually, I am really excited
because dynamic programming is my favorite thing in
the world, in algorithms. And it’s going to be
the next four lectures, it’s so exciting. It has lots of different facets. It’s a very general,
powerful design technique. We don’t talk a lot about
algorithm design in this class, but dynamic programming is
one that’s so important. And also takes a little
while to settle in. We like to injected it
into you now, in 006. So in general, our motivation
is designing new algorithms and dynamic programming,
also called DP, is a great way– or a
very general, powerful way to do this. It’s especially good, and
intended for, optimization problems, things
like shortest paths. You want to find the
best way to do something. Shortest path is you
want to find the shortest path, the minimum-length path. You want to minimize,
maximize something, that’s an optimization
problem, and typically good algorithms to solve them
involve dynamic programming. It’s a bit of a broad statement. You can also think of
dynamic programming as a kind of exhaustive search. Which is usually
a bad thing to do because it leads to
exponential time. But if you do it in a clever
way, via dynamic programming, you typically get
polynomial time. So one perspective is
that dynamic programming is approximately
careful brute force. It’s kind of a
funny combination. A bit of an oxymoron. But we take the idea of
brute force, which is, try all possibilities
and you do it carefully and you get it to
polynomial time. There are a lot of
problems where essentially the only known polynomial
time algorithm is via dynamic programming. It doesn’t always work,
there’s some problems where we don’t think there are
polynomial time algorithms, but when it’s possible
DP is a nice, sort of, general approach to it. And we’re going to be talking a
lot about dynamic programming. There’s a lot of different
ways to think about it. We’ll look at a few today. We’re going to warm up today
with some fairly easy problems that we already
know how to solve, namely computing
Fibonacci numbers. It’s pretty easy. And computing shortest paths. And then in the
next three lectures we’re going to get to
more interesting examples where it’s pretty
surprising that you can even solve the problem
in polynomial time. Probably the first burning
question on your mind, though, is why is it
called dynamic programming? What does that even mean? And I used to have this
spiel about, well, you know, programming
refers to the– I think it’s the British
notion of the word, where it’s about optimization. Optimization in American
English is something like programming
in British English, where you want to
set up the program– the schedule for your
trains or something, where programming
comes from originally. But I looked up the
actual history of, why is it called
dynamic programming. Dynamic programming was invented
by a guy named Richard Bellman. You may have heard of Bellman
in the Bellman-Ford algorithm. So this is actually the
precursor to Bellman-Ford. And we’re going to see
Bellman-Ford come up naturally in this setting. So here’s a quote about him. It says, Bellman
explained that he invented the name dynamic
programming to hide the fact that he was doing
mathematical research. He was working at this
place called Rand, and under a secretary of defense
who had a pathological fear and hatred for
the term research. So he settled on the
term dynamic programming because it would be
difficult to give a pejorative meaning to it. And because it was
something not even a congressman could object to. Basically, it sounded cool. So that’s the origin of the
name dynamic programming. So why is the called that? Who knows. I mean, now you know. But it’s not–
it’s a weird term. Just take it for what it is. It may make some
kind of sense, but– All right. So we are going to start
with this example of how to compute Fibonacci numbers. And maybe before
we actually start I’m going to give you
a sneak peak of what you can think of
dynamic programming as. And this equation,
so to speak, is going to change throughout
today’s lecture. In the end we’ll
settle on a sort of more accurate perspective. The basic idea of
dynamic programming is to take a problem,
split it into subproblems, solve those subproblems,
and reuse the solutions to your subproblems. It’s like a lesson in recycling. So we’ll see that in
Fibonacci numbers. So you remember
Fibonacci numbers, right? The number of rabbits you have
on day n, if they reproduce. We’ve mentioned them before,
we’re talking about AVL trees, I think. So this is the
usual– you can think of it as a recursive definition
or recurrence on Fibonacci numbers. It’s the definition of what
the nth Fibonacci number is. So let’s suppose our goal–
an algorithmic problem is, compute the nth
Fibonacci number. And I’m going to assume here
that that fits in a word. And so basic
arithmetic, addition, whatever’s constant
time per operation. So how do we do it? You all know how to do it. Anyways– but I’m going to give
you the dynamic programming perspective on things. So this will seem
kind of obvious, but it is– we’re going to apply
exactly the same principles that we will apply over and
over in dynamic programming. But here it’s in a
very familiar setting. So we’re going to start with
the naive recursive algorithm. And that is, if you want to
compute the nth Fibonacci number, you check whether
you’re in the base case. I’m going to write it this way. So f is just our return value. You’ll see why I write
it this way in a moment. Then you return f. In the base case
it’s 1, otherwise you recursively call
Fibonacci of n minus 1. You recursively call
Fibonacci of n minus 2. Add them together, return that. This is a correct algorithm. Is it a good algorithm? No. It’s very bad. Exponential time. How do we know it’s
exponential time, other than from experience? Well, we can write the
running time as recurrence. T of n represents the time
to compute the nth Fibonacci number. How can I write the recurrence? You’re gonna throwback to
the early lectures, divide and conquer. I hear whispers. Yeah? AUDIENCE: [INAUDIBLE]. PROFESSOR: Yeah. T of n minus 1 plus t of
n minus 2 plus constant. I don’t know how
many you have by now. So to create the
nth Fibonacci number we have to compute the n
minus first Fibonacci number, and the n minus second
Fibonacci number. That’s these two recursions. And then we take
constant time otherwise. We do constant number of
additions, comparisons. Return all these operations–
take constant time. So that’s a recurrence. How do we solve this recurrence? Well one way is to see this
is the Fibonacci recurrence. So it’s the same thing. There’s this plus whatever. But in particular, this is at
least the nth Fibonacci number. And if you know
Fibonacci stuff, that’s about the golden ratio
to the nth power. Which is bad. We had a similar
recurrence in AVL trees. And so another way
to solve it– it’s just good review–
say, oh well, that’s at least 2 times t of n minus 2. Because it’s going
to be monotone. The bigger n is, the
more work you have to do. Because to do the
nth thing you have to do the n minus first thing. So we could just reduce t of
n minus 1 to t of n minus 2. That will give us a lower bound. And now these two terms– now
this is sort of an easy thing. You see that you’re
multiplying by 2 each time. You’re subtracting
2 from n each time. How many times can
I subtract 2 from n? N/2 times, before I
get down to a constant. And so this is equal
to 2 to the n over 2– I mean, times some
constant, which is what you get
in the base case. So I guess I should say theta. This thing is theta that. OK. So it’s at least that big. And the right constant is phi. And the base of the exponent. OK. So that’s a bad algorithm. We all know it’s
a bad algorithm. But I’m going to give you a
general approach for making bad algorithms like this good. And that general approach
is called memoization. We’ll go over here. And this is a technique
of dynamic programming. So I’m going to call this the
memoized dynamic programming algorithm. So did I settle on
using memo in the notes? Yeah. The idea is simple. Whenever we compute
a Fibonacci number we put it in a dictionary. And then when we need to
compute the nth Fibonacci number we check, is it
already in the dictionary? Did we already
solve this problem? If so, return that answer. Otherwise, computer it. You’ll see the transformation
is very simple. OK. These two lines are
identical to these two lines. So you can see how
the transformation works in general. You could do this with
any recursive algorithm. The memoization transformation
on that algorithm– which is, we initially make an
empty dictionary called memo. And before we actually do
the computation we say, well, check whether this version
of the Fibonacci problem, computing f of n, is
already in our dictionary. So if that key is already
in the dictionary, we return the corresponding
value in the dictionary. And then once we’ve computed
the nth Fibonacci number, if we bothered to do this,
if this didn’t apply, then we store it
in the memo table. So we say well, if you ever
need to compute f of n again, here it is. And then we return that value. So this is a general procedure. It can apply to any
recursive algorithm with no side effects
I guess, technically. And it turns out, this makes
the algorithm efficient. Now there’s a lot of ways
to see why it’s efficient. In general, maybe it’s helpful
to think about the recursion tree. So if you want to compute
fn in the old algorithm, we compute fn minus
1 and fn minus two completely separately. To compute fn minus 1 we compute
fn minus 2 and fn minus 3. To compute fn minus 2 we compute
fn minus 3 and fn minus 4. And so on. And you can see why
that’s exponential in n. Because we’re only decrementing
n by one or two each time. But then you observe, hey,
these fn minus 3’s are the same. I should really only have
to compute them once. And that’s what
we’re doing here. The first time you call
fn minus 3, you do work. But once it’s done and you go
over to this other recursive call, this will
just get cut off. There’s no tree here. Here we might have
some recursive calling. Here we won’t, because it’s
already in the memo table. In fact, this already
happens with fn minus 2. This whole tree disappears
because fn minus 2 has already been done. OK. So it’s clear why
it improves things. So in fact you can argue
that this call will be free because you already
did the work in here. But I want to give you a very
particular way of thinking about why this is efficient,
which is following. So you could write down a
recurrence for the running time here. But in some sense recurrences
aren’t quite the right way of thinking about
this because recursion is kind of a rare thing. If you’re calling
Fibonacci of some value, k, you’re only going to make
recursive calls the first time you call Fibonacci of k. Because henceforth,
you’ve put it in the memo table
you will not recurse. So you can think of
there being two versions of calling Fibonacci of k. There’s the first time, which
is the non-memoized version that does recursion– does some work. And then every time
henceforth you’re doing memoized calls
of Fibonacci of k, and those cost constant time. So the memoized calls
cost constant time. So we can think of
them as basically free. That’s when you call
Fibonacci of n minus 2, because that’s a
memoized call, you really don’t pay anything for it. I mean, you’re already
paying constant time to do addition and whatever. So you don’t have to
worry about the time. There’s no recursion here. And then what we care
about is that the number of non-memorized calls,
which is the first time you call Fibonacci of k, is n. No theta is even necessary. We are going to
call Fibonacci of 1. At some point we’re going
to call Fibonacci of 2 at some point, and the original
call is Fibonacci of n. All of those things will
be called at some point. That’s pretty easy to see. But in particular,
certainly at most this, we never call
Fibonacci of n plus 1 to compute Fibonacci of n. So it’s at most n calls. Indeed it will be exactly n
calls that are not memoized. Those ones we have to pay for. How much do we have to pay? Well, if you don’t count
the recursion– which is what this recurrence
does– if you ignore recursion then the total amount
of work done here is constant. So I will say the non-recursive
work per call is constant. And therefore I claim
that the running time is constant– I’m sorry, is linear. Constant would be
pretty amazing. This is actually not the
best algorithm– as an aside. The best algorithm for computing
the nth Fibonacci number uses log n arithmetic
operations. So you can do better,
but if you want to see that you
should take 6046. OK. We’re just going to get
to linear today, which is a lot better
than exponential. So why linear? Because there’s n non-memoize
calls, and each of them cost constant. So it’s the product
of those two numbers. This is an important idea. And it’s so important
I’m going to write it down again in a slightly
more general framework. In general, in
dynamic programming– I didn’t say why it’s
called memoization. The idea is you have
this memo pad where you write down all
your scratch work. That’s this memo dictionary. And to memoize is to write
down on your memo pad. I didn’t make it up. Another crazy term. It means remember. And then you remember all the
solutions that you’ve done. And then you reuse
those solutions. Now these solutions are
not really a solution to the problem
that I care about. The problem I care about is
computing the nth Fibonacci number. To get there I had to compute
other Fibonacci numbers. Why? Because I had a
recursive formulation. This is not always the
way to solve a problem. But usually when you’re
solving something you can split it into parts,
into subproblems, we call them. They’re not always
of the same flavor as your original goal
problem, but there’s some kind of related parts. And this is the big challenge
in designing a dynamic program, is to figure out what
are the subproblems. Let’s say, the
first thing I want to know about a dynamic program,
is what are the subproblems. Somehow they are designed to
help solve your actual problem. And the idea of memoization is,
once you solve a subproblem, write down the answer. If you ever need to solve
that same problem again you reuse the answer. So that is the core idea. And so in this sense
dynamic programming is essentially recursion
plus memoization. And so in this case these
are the subproblems. Fibonacci of 1 through
Fibonacci of n. The one we care about
is Fibonacci of n. But to get there we solve
these other subproblems. In all cases, if this
is the situation– so for any dynamic program,
the running time is going to be equal to the
number of different subproblems you might have to solve,
or that you do solve, times the amount of time
you spend per subproblem. OK. In this situation we
had n subproblems. And for each of them
we spent constant time. And when I measure the
time per subproblem which, in the Fibonacci
case I claim is constant, I ignore recursive calls. That’s the key. We don’t have to
solve recurrences with dynamic programming. Yay. No recurrences necessary. OK. Don’t count recursions. Obviously, don’t count
memoized recursions. The reason is, I only
need to count them once. After the first time
I do it, it’s free. So I count how many different
subproblems do I need to do? These are they going to be
the expensive recursions where I do work, I do
some amount of work, but I don’t count the
recursions because otherwise I’d be double counting. I only want to count
each subproblem once, and then this will solve it. So a simple idea. In general, dynamic programming
is a super simple idea. It’s nothing fancy. It’s basically just memoization. There is one extra trick
we’re going to pull out, but that’s the idea. All right. Let me tell you
another perspective. This is the one maybe
most commonly taught. Is to think of– but I’m
not a particular fan of it. I really like memoization. I think it’s a simple idea. And as long as you
remember this formula here, it’s really easy to work with. But some people like to
think of it this way. And so you can pick whichever
way you find most intuitive. Instead of thinking of a
recursive algorithm, which in some sense starts at the
top of what you want to solve and works its way down,
you could do the reverse. You could start at the
bottom and work your way up. And this is probably
how you normally think about computing
Fibonacci numbers or how you learned it before. I’m going to write it
in a slightly funny way. The point I want to make
is that the transformation I’m doing from the naive
recursive algorithm, to the memoized algorithm,
to the bottom-up algorithm is completely automated. I’m not thinking,
I’m just doing. OK. It’s easy. This code is exactly
the same as this code and as that code, except
I replaced n by k. Just because I needed a couple
of different n values here. Or I want to iterate
over n values. And then there’s this
stuff around that code which is just formulaic. A little bit of thought
goes into this for loop, but that’s it. OK. This does exactly the same
thing as the memoized algorithm. Maybe it takes a
little bit of thinking to realize, if you unroll all
the recursion that’s happening here and just write
it out sequentially, this is exactly
what’s happening. This code does exactly the
same additions, exactly the same computations as this. The only difference
is how you get there. Here we’re using a loop,
here we’re using recursion. But the same things
happen in the same order. It’s really no difference
between the code. This code’s probably going
to be more efficient practice because you don’t make
function calls so much. In fact I made a
little mistake here. This is not a
function call, it’s just a lookup into a table. Here I’m using a hash
table to be simple, but of course you
could use an array. But they’re both constant
time with good hashing. All right. So is it clear
what this is doing? I think so. I think I made a little typo. So we have to compute–
oh, another typo. We have to compute f1 up to
fn, which in python is that. And we compute it
exactly how we used to. Except now, instead
of recursing, I know that when I’m computing
the k Fibonacci number– man. So many typos. AUDIENCE: [LAUGHTER] PROFESSOR: You
guys are laughing. When I compute the
kth Fibonacci number I know that I’ve already
computed the previous two. Why? Because I’m doing them
in increasing order. Nothing fancy. Then I can just do
this and the solutions will just be waiting there. If they work, I’d
get a key error. So I’d know that there’s a bug. But in fact, I won’t
get a key error. I will have always computed
these things already. Then I store it in my table. Then I iterate. Eventually I’ve solved all the
subproblems, f1 through fn. And the one I cared
about was the nth one. OK. So straightforward. I do this because
I don’t really want to have to go through
this transformation for every single problem we do. I’m doing it in Fibonacci
because it’s super easy to write the code
out explicitly. But you can do it for all
of the dynamic programs that we cover in the
next four lectures. OK. I’m going to give you
now the general case. This was the special
Fibonacci version. In general, the bottom-up does
exactly the same computation as the memoized version. And what we’re doing is
actually a topological sort of the subproblem
dependency DAG. So in this case, the
dependency DAG is very simple. In order to compute–
I’ll do it backwards. In order to compute fn,
I need to know fn minus 1 and fn minus 2. If I know those
I can compute fn. Then there’s fn
minus 3, which is necessary to compute this
one, and that one, and so on. So you see what
this DAG looks like. Now, I’ve drawn
it conveniently so all the edges go left to right. So this is a topological
order from left to right. And so I just need to do
f1, f2, up to fn in order. Usually it’s totally
obvious what order to solve the subproblems in. But in general, what
you should have in mind is that we are doing
a topological sort. Here we just did it in our
heads because it’s so easy. And usually it’s so easy. It’s just a for loop. Nothing fancy. All right. I’m missing an arrow. All right. Let’s do something a little
more interesting, shall we? All right. One thing you can do from
this bottom-up perspective is you can save space. Storage space in the algorithm. We don’t usually worry
about space in this class, but it matters in reality. So here we’re
building a table size, n, but in fact we
really only need to remember the last two values. So you could just store
the last two values, and each time you make a
new one delete the oldest. so by thinking a
little bit here you realize you only
need constant space. Still linear time,
but constant space. And that’s often the case. From the bottom-up
perspective you see what you really
need to store, what you need to keep track of. All right. I guess another nice thing
about this perspective is, the running time
is totally obvious. This is clearly constant time. So this is clearly linear time. Whereas, in this
memoized algorithm you have to think
about, when’s it going to be memoized,
when is it not? I still like this perspective
because, with this rule, just multiply a
number of subproblems by time per subproblem,
you get the answer. But it’s a little less
obvious than code like this. So choose however you
like to think about it. All right. We move onto shortest paths. So I’m again, as usual, thinking
about single-source shortest paths. So we want to compute the
shortest pathway from s to v for all v. OK. I’d like to write this initially
as a naive recursive algorithm, which I can then memoize,
which I can then bottom-upify. I just made that up. So how could I write this as
a naive recursive algorithm? It’s not so obvious. But first I’m going to tell you
how, just as an oracle tells you, here’s what you should do. But then we’re going to think
about– go back, step back. Actually, it’s up to you. I could tell you
the answer and then we could figure out
how we got there, or we could just
figure out the answer. Preferences? Figure it out. All right. Good. No divine inspiration allowed. So let me give you a tool. The tool is guessing. This may sound silly, but
it’s a very powerful tool. The general idea is, suppose
you don’t know something but you’d like to know it. So what’s the answer
to this question? I don’t know. Man, I really want a cushion. How am I going to
answer the question? Guess. OK? AUDIENCE: [LAUGHTER] PROFESSOR: It’s a
tried and tested method for solving any problem. I’m kind of belaboring
the point here. The algorithmic concept is,
don’t just try any guess. Try them all. OK? AUDIENCE: [LAUGHTER] PROFESSOR: Also pretty simple. I said dynamic
programming was simple. OK. Try all guesses. This is central to the
dynamic programming. I know it sounds obvious, but if
I want to fix my equation here, dynamic programming is roughly
recursion plus memoization. This should really
be, plus guessing. Memoization, which is obvious,
guessing which is obvious, are the central concepts
to dynamic programming. I’m trying to make it sound
easy because usually people have trouble with
dynamic programming. It is easy. Try all the guesses. That’s something a
computer can do great. This is the brute force part. OK. But we’re going to
do it carefully. Not that carefully. I mean, we’re just
trying all the guesses. Take the best one. That’s kind of important
that we can choose one to be called best. That’s why dynamic
programming is good for optimization problems. You want to maximize
something, minimize something, you try them all and then you
can forget about all of them and just reduce it
down to one thing which is the best one, or a best one. OK. So now I want you
to try to apply this principle to
shortest paths. Now I’m going to draw a
picture which may help. We have the source, s,
we have some vertex, v. We’d like to
find the shortest– a shortest path from s to v. Suppose I want to know
what this shortest path is. Suppose this was it. You have an idea already? Yeah. AUDIENCE: What you could do is
you could look at everywhere you can go from s. [INAUDIBLE] shortest path
of each of those notes. PROFESSOR: Good. So I can look at all the
places I could go from s, and then look at the shortest
paths from there to v. So we could call this s prime. So here’s the idea. There’s some hypothetical
shortest path. I don’t know where
it goes first, so I will guess
where it goes first. I know the first
edge must be one of the outgoing edges from s. I don’t know which one. Try them all. Very simple idea. Then from each of
those, if somehow I can compute the shortest
path from there to v, just do that and
take the best choice for what that first edge was. So this would be the
guess first edge approach. It’s a very good idea. Not quite the one I wanted
because unfortunately that changes s. And so this would
work, it would just be slightly less
efficient if I’m solving single-source
shortest paths. So I’m going to tweak
that idea slightly by guessing the last edge
instead of the first edge. They’re really equivalent. If I was doing this
I’d essentially be solving a single-target
shortest paths, which we talked about before. So I’m going to draw
the same picture. I want to get to v. I’m
going to guess the last edge, call it uv. I know it’s one of the incoming
edges to v– unless s equals v, then there’s a special case. As long as this path has
length of at least 1, there’s some last edge. What is it? I don’t know. Guess. Guess all the possible
incoming edges to v, and then recursively compute the
shortest path from s to u. And then add on the edge v. OK. So what is this shortest path? It’s delta of s comma
u, which looks the same. It’s another subproblem
that I want to solve. There’s v subproblems
here I care about. . So that’s good. I take that. I add on the weight
of the edge uv. And that should hopefully
give me delta of s comma v. Well, if I was lucky and I
guessed the right choice of u. In reality, I’m not lucky. So I have to minimize
over all edges uv. So this is the–
we’re minimizing over the choice of u. V is already given here. So I take the minimum over
all edges of the shortest path from s to u, plus
the weight of the edge uv. That should give me the shortest
path because this gave me the shortest path from s to u. Then I added on the edge
I need to get there. And wherever the shortest path
is, it uses some last edge, uv. There’s got to be some choice
of u that is the right one. That’s the good guess
that we’re hoping for. We don’t know what
the good guess is so we just try them all. But whatever it is, this will
be the weight of that path. It’s going to take
the best path from s to u because sub
paths are shortest paths are shortest paths. Optimal substructure. So this part will
be delta of su. This part is obviously w of uv. So this will give
the right answer. Hopefully. OK. It’s certainly
going to– I mean, this is the analog of the
naive recursive algorithm for Fibonacci. So it’s not going to be
efficient if I– I mean, this is an algorithm, right? You could say– this
is a recursive call. We’re going to treat this
as recursive call instead of just a definition. Then this is a
recursive algorithm. How good or bad is this
recursive algorithm? AUDIENCE: Terrible. PROFESSOR: Terrible. Very good. Very bad, I should say. It’s definitely going
to be exponential without memoization. But we know. We know how to make
algorithms better. We memoize. OK. So I think you know how to write
this as a memoized algorithm. To define the function delta
of sv, you first check, is s comma v in the memo table? If so return that value. Otherwise, do this computation
where this is a recursive call and then stored it
in the memo table. OK. I don’t think I need
to write that down. It’s just like the
memoized code over there. Just there’s now two
arguments instead of one. In fact, s isn’t changing. So I only need to store
with v instead of s comma v. Is that a good algorithm? I claim memoization
makes everything faster. Is that a fast algorithm? Not so obvious, I guess. Yes? How many people think, yes,
that’s a good algorithm? AUDIENCE: Better. PROFESSOR: Better. Definitely better. Can’t be worse. How many people think it’s
a bad algorithm still? OK. So three for yes, zero for no. How many people aren’t sure? Including the yes votes? Good. All right. It’s not so tricky. Let me draw you a graph. Something like that. So we wanted to commit
delta of s comma v. Let me give these
guys names, a and b. So we compute delta of
s comma v. To compute that we need to know delta
of s comma a and delta of s comma v. All right? Those are the two ways– sorry,
actually we just need one. Only one incoming edge to v.
So its delta of s comma a. Sorry– I should have
put a base case here too. Delta of s comma s equals 0. OK. Delta of s comma
a plus the edge. OK. There is some
shortest path to a. To compute the
shortest path to a we look at all the
incoming edges to a. There’s only one. So delta of s comma b. Now I want to compute the
shortest paths from b. Well, there’s two
ways to get to b. One of them is delta of s
comma b– sorry, s comma s. Came from s. The other way is delta of s
comma v. Do you see a problem? Yeah. Delta of s comma v is what
we were trying to figure out. Now you might say, oh,
it’s OK because we’re going to memoize our
answer to delta s comma v and then we can reuse it here. Except, we haven’t finished
computing delta of s comma v. We can only put it in
the memo table once we’re done. So when this call happens the
memo table has not been set. And we’re going to
do the same thing over and over and over again. This is an infinite algorithm. Oops. Not so hot. So it’s going to be infinite
time on graphs with cycles. OK. For DAGs, for acyclic graphs, it
actually runs in v plus e time. This is the good case. In this situation we
can use this formula. The time is equal to the
number of subproblems times the time per subproblem. So I guess we have to think
about that a little bit. Where’s my code? Here’s my code. Number of subproblems
is v. There’s v different subproblems
that I’m using here. I’m always reusing
subproblems of the form delta s comma something. The something could be
any of the v vertices. How much time do I
spend per subproblem? That’s a little tricky. It’s the number
of incoming edges to v. So time for a
sub problem delta of sv is the indegree of v. The
number of incoming edges to v. So this depends on
v. So I can’t just take a straightforward
product here. What this is really
saying is, you should sum up over
all sub problems of the time per sub problem. So total time is the sum over
all v and v, the indegree of v. And we know this
is number of edges. It’s really– so indegree
plus 1, indegree plus 1. So this is v plus v. OK. Handshaking again. OK. Now we already knew an algorithm
for shortest paths and DAGs. And it ran a v plus e time. So it’s another way
to do the same thing. If you think about
it long enough, this algorithm
memoized, is essentially doing a depth first search
to do a topological sort to run one round
of Bellman-Ford. So we had topological sort
plus one round of Bellman-Ford. This is kind of it
all rolled into one. This should look kind of like
the Bellman Ford relaxation step, or shortest
paths relaxation step. It is. This min is really
doing the same thing. So it’s really the
same algorithm. But we come at it from
a different perspective. OK. But I claim I can use
this same approach to solve shortest paths in
general graphs, even when they have cycles. How am I going to do that? DAGs seem fine– oh, what
was the lesson learned here? Lesson learned is that
subproblem dependencies should be acyclic. Otherwise, we get an
infinite algorithm. For memoization to work
this is what you need. It’s all you need. OK. We’ve almost seen this already. Because I said that, to
do a bottom up algorithm you do a topological sort of
this subproblem dependency DAG. I already said it
should be acyclic. OK. We just forgot. I didn’t tell you yet. So for that to work
it better be acyclic. For DP to work, for memoization
to work, it better be acyclic. If you’re acyclic then
this is the running time. So that’s all general. OK. So somehow I need to
take a cyclic graph and make it acyclic. We’ve actually done this
already in recitation. So if I have a
graph– let’s take a very simple cyclic graph. OK. One thing I could do is explode
it into multiple layers. We did this on quiz
two in various forms. It’s like the only cool thing
you can do with shortest paths, I feel like. If you want to make a
shortest path problem harder, require that you reduce your
graph to k copies of the graph. I’m going to do it
in a particular way here– which I think you’ve
seen in recitation– which is to think of this axis as
time, or however you want, and make all of the
edges go from each layer to the next layer. This should be a
familiar technique. So the idea is,
every time I follow an edge I go down
to the next layer. This makes any graph acyclic. Done. What in the world
does this mean? What is it doing? What does it mean? Double rainbow. All right. AUDIENCE: [LAUGHTER] PROFESSOR: So– I
don’t know how I’ve gone so long in the
semester without referring to double rainbow. It used to be my favorite. All right. So here’s what it means. Delta sub k of sv. I’m going to define
this first– this is a new kind of
subproblem– which is, what is the shortest– what
is the weight of the shortest s to v path that uses,
at most, k edges. So I want it to be shortest
in terms of total weight, but I also want it to
use few edges total. So this is going to be 0. In some sense, if you
look at– so here’s s and I’m always going
to make s this. And then this is going to
be v in the zero situation. This is going to be v
in the one situation, v– so if I look at this
v, I look at the shortest path from s to v, that
is delta sub 0 of sv. So maybe I’ll call this v
sub 0, v sub 1, v sub 2. OK. Shortest path from
here to here is, there’s no way to
get there on 0 edges. Shortest path from
here to here, that is the best way to get there
with, at most, one edge. Shortest path from
here to here– well, if I add some
vertical edges too, I guess, cheating a little bit. Then this is the best
way to get from s to v using at most two edges. And then you get
a recurrence which is the min over all last edges. So I’m just copying
that recurrence, but realizing that the s to
u part uses one fewer edge. And then I use the edge uv. OK. That’s our new recurrence. By adding this k
parameter I’ve made this recurrence on
subproblems acyclic. Unfortunately, I’ve increased
the number of subproblems. The number of subproblems
now is v squared. Technically, v times v minus 1. Because I really–
actually, v squared. Sorry. I start at 0. And what I care about, my goal,
is delta sub v minus 1 of sv. Because by
Bellman-Ford analysis I know that I only care about
simple paths, paths of length at most v minus 1. I’m assuming here no
negative weight cycles. I should’ve said that earlier. If you assume that, then
this is what I care about. So k ranges from 0 to v minus 1. So there are v choices for k. There are v choices for v.
So the number of subproblems is v squared. How much time do I
spend per subproblem? Well, the same as before. The indegree– where
did I write it? Up here– the indegree
of that problem. So what I’m really
doing is summing over all v of the indegree. And then I multiply it by
v. So the running time, total running time is ve. Sound familiar? This is Bellman-Ford’s
algorithm again. And this is actually where
Bellman-Ford algorithm came from is this view
on dynamic programming. So we’re seeing yet another
way to do Bellman-Ford. It may seem familiar. But in the next
three lectures we’re going to see a whole
bunch of problems that can succumb to
the same approach. And that’s super cool.

97 thoughts on “19. Dynamic Programming I: Fibonacci, Shortest Paths”

  1. 0:00~31:00 DP 설명.

    – 벨만 포드 알고리즘 만든 벨만이 다이나믹 프로그래밍이라는 이름을 창시했다. 벨만이 다이나믹 프로그래밍이라고 이름 지은 이유는 별 뜻 없고 '그냥 멋있어 보여서'다.

    – 하나의 문제를 완전탐색으로 재귀적으로 풀 때 그 문제의 부분문제(Subproblems)의 답을 재활용하는 기법이 메모아이제이션(Memoization)이다. 재귀로 문제를 풀면 exponential 시간이 소요되는데, 한 번 풀었던 문제의 답을 메모로 적어 놓고 다시 필요할 때 반환함으로써 그 문제 풀이는 O(1) 시간밖에 걸리지 않는다.

    – DP에는 재귀적인 구현인 하향식 접근(Top-down approach) 외에도 반복문을 이용한 상향식 접근(Bottom-up approach) 방법이 있다. 상향식은 함수 호출을 적게 하니까 스택을 적게 먹는 장점이 있겠지.

    – 시간복잡도: 부분 문제의 갯수 * 부분 문제를 푸는 데에 걸리는 시간(메모 해 놔서 재귀 안 걸리는 데는 O(1)임)

    ———-

    31:00 이후 최단 경로 설명.

    – 오랫동안 잊고 있었던 그래프 이론이 등장함.
    /
    – 정점(vertex, node), 간선(edge, link, line)
    – 차수(degree): 어떤 정점의 간선의 수.
    /
    – DAG (Directed Acyclic Graph): 비순환 유향(방향) 그래프
    – 내차수(indegree): 방향 그래프에서 어떤 정점으로 들어오는 간선(incoming edge)의 수.
    – 외차수(outdegree): 방향 그래프에서 어떤 정점에서 나가는 간선(outgoing edge)의 수.
    – 위상 정렬(Topological sorting): 선수과목 구조처럼 보이게 DAG을 왼쪽에서 오른쪽으로 한 방향으로 죽 늘어 놓는 것.

    – 출발점에서 나가는 간선을 기준으로 부분문제를 만들 수도 있고 목적지로 들어오는 간선을 기준으로 부분문제를 만들 수도 있음

    – 사이클이 있는 그래프에서 시간복잡도는 무한대임.
    – DAG의 경우 시간복잡도: O(V+E)
    – DP 쓰려면 부분 문제는 비순환이어야 한다.

  2. 20:00 what about searching the created dictionary? That will take asymptotically n operations, so the result should still be exponential.

  3. @ time of 10.15 you reduced t(n-1) to t(n-2) that will give us the lower bound is it possible to increase t(n-2) to t(n-1)..for solving the same recurrence relation …

  4. I’m still a little confused as to how the cyclic graph algorithm works as well as the complexity for it. Could someone clear this up for me?

  5. Do they teach this in College?? We in India study these in High School only. Not in school but as hobby. BTW Great Lecture.

  6. I believe the lookup in the memo is not completely free: it could take `O(log(n))` if the data-structure used is a map/BST/etc . HashMap or suchlikes makes things better though but that should explicitly be pointed out as well.

  7. Really…did that girl who came in late have to walk in front of everyone?! If you are late then you should go ALL THE WAY IN THE BACK

  8. One mistake about fibonacci. The best algorithm to compute fib(n) is constant time using the explicit formula ((1+sqrt5)^n-(1-sqrt5)^n)/(2^n * sqrt5). Of course, you would prefer something like Mathematica that can do symbolic algebra like sqrt5*sqrt5=5 rather than converting first to floating point and doing estimations.

  9. Might anyone be able to explain his step about completing the running time of O(V + E). He states that the total time is the sum over all sub problems of the time per subproblem. He writes the total time = Sum of v in V for indegree(v) = O(E). That part makes sense to me. What I don't understand is when he says "+ 1" , that this evaluates to O(E + V). What does he mean by "+ 1". Is the "+1" supposed to be for a single vertex, so summing over all vertices leads to "+1" v times, or V?

  10. I understand that the time per subproblem is indegree(v), or the number of incoming edges to V. But why does he then change that to indegree(v) + 1? What is the "+ 1" for?

  11. Thank you guys. I am doing my masters though I couldn't get in MIT, you give me exposure to top class teaching materials.

  12. This is great courses ! v :0) my 2 cent on Fib tho, is a bit bad and good example, bad maybe because it can be easily seen has a linear recursive, so no really needs for special memoise structure to solve we can use an array or the recursive stack to hold that linear memoise , but this is not the case generally for trees or a graphes where memoise often go side by side with the algo . Fib is good because we can make the transition from recursive to iterative went seeing that recursion (in a really world stack an recursion are not infinit (cpu stacks are usually really optimise tho) ). So study you problem first and dont create special structure when none a really need. here my exemple where I move the memoise on the stack.

    def fib(n):

    def f(n1):
    if n1 == 1:
    return((1,0))
    else:
    nn1,nn2 = f(n1-1)
    return((nn1+nn2, nn1))

    if n == 0: return(0)
    r,x = f(n)
    return(r)

    for x in range(0, 11):
    y = fib(x)

  13. 21:16 Please edit that. To many "typos", i spent mucht time to pause that video, just to realize that it's an error in the video.

  14. Instead of the prof writing on the blackboard, wouldn't it save lots of time to just project his notes or give out a pdf before class.

  15. Is it just on my broswer that this lecture title and description are in Chinese?!!!! I feel weird because all other lecture titles in the playlist are in English.

  16. Why is it called dynamic programming when it is basically divide and conquer, and in the case of using recursion, it is basically cached recursion? The term dynamic is generally used more for something that is not fixed such as dynamic routing when you drive to work. You do not insist on driving the same exact route, you instead decide which route to take based on traffic patterns, construction… as you are driving. In contrast, a static algorithm would be something "hardwired" that is not flexible such as a fixed sorting algorithm that doesn't "care" what the input is, it just sorts it using the same method each time. Computer Science (in my opinion) uses some "whacked out" (inaccurate) terms.

  17. First of all, other than for illustrative purposes, why the hell would you ever use recursion to solve Fibonacci numbers when a simple iterative algorithm can be used?

    Secondly, trying all of the possible guesses is VERY bad for some problems cuz the number of possible guesses is astronomical.

  18. 43:09 "Infinite algorithm" is an improper term since part the definition of an algorithm is that is must terminate. Very bad use of terminology.

  19. Easy solution to shortest path problem… stay the "F" home. (don't go anywhere). There is nothing shorter than nowhere.

  20. I'm an old-time programmer who learned his trade in the trenches, not at school, and all this seems unnecessarily complicated to me. Recursion is conceptually elegant, but inefficient if for no other reason that it requires the overhead of a function call to do what could be done in a simple loop using local variables. The "memoization" idea seems to involve creating a global variable as a way of sneaking local variables past the thought police.

    When it comes down to it, everything boils down to machine code, and the most efficient algorithm is the one that requires the fewest instruction cycles. We weren't necessarily worse programmers than the computer scientists, but we had to contend with real-world limitations. Primitive equipment required primitive methods.

    The first computer I worked with had 4000 6-bit characters of core storage (aka RAM), instruction timings in milliseconds, and no OS. A good programmer was one who could squeeze every ounce of performance out of that beast, and since the biggest program you could write wasn't much more than 1000 lines, if that, you didn't have to worry about spaghettification, memory management and all the rest of it. We evolved as the equipment evolved, and we came up with methods that the computer scientists might have learned from. Now you have code that struggles to get by with a mere 8Gb of RAM and runs slowly on a 3GHz CPU.

  21. I googled the word " nerd" this guys picture showed up….. no shit, understandably, I must be a nerd to actually understand what hes teaching… great style I might add.

  22. A program written in python based on video content. https://github.com/worksking/algrithm/blob/master/dynamic_program_fibonacci.py

  23. In the last part, if total distinct sub problems are |V|^2 and for each sub problem we are spending O(indegree) time, then how can total overall time complexity be O(VE) ?

  24. I don’t see how the DAG shortest path recurrence performs a topological sort. Don’t you need to perform a topological sort on the DAG separately, and THEN do the recurrence?

  25. Powerful, DP technique.

    DP = "careful brute force".

    This is why I never mind the screams: I know I'm doing it right.

  26. I wish my prof taught my algo class was this excited about any of the algos, instead of sitting there hating life

  27. Instead of making the algorithm O(VE), we can have a bool array that stores that this point is being traversed upon/being used to calculate a point already, and then use our analysis that number of edges must reduce in each step. So we basically just skip if we come back to the same edge any time. Does this algorithm work?

  28. Can someone explain this to me?
    He calculated the time taken by each subproblem (In shortest path) is O(|V| + |E|), right.
    Now he redefined the the problem by giving the "k" parameter to solve the cyclic problem, which gave him #subproblems = O(|V|.|V|), one for k and other for v.

    So how come when he multiplied #sub-problems x time / subproblem, the answer become O(|V||E|)??

  29. youtube facebook google windows theoretical computer science
    epr paradox and discrete mathematics graph theory: vertex edge,…etc.

  30. K, so here is one question/concept that seems to arise from this lecture. If you can traverse an expansive abstract DAG from the bottom up (base case/root to solution), then you have a nice linear solution without memoization that you probably never really needed dynamic programming for. If you can't, you try to recurse backwards to create a similar linearity. Recursing backwards while memoizing repeated computations in a hypothetical call tree, i.e. utilizing optimal substructure to create this effective linearity while consuming linear memory.
    My question would be: is there a notion of forward, reverse or both optimal substructure? Let's say a sorted list would seem to have both. You can easily follow it from either end to the other. This is a hard concept to express as a question. But if the best solution can be found from guessing from the first node forward or from the last node backwards, what makes one more efficient than the other?

  31. 26:10 "So is this clear what this is doing? I think so."
    [Then corrects the 2nd, 3rd, 4th and 5th typos that prevented comprehension. 🙂 ]

  32. I watched this guys couple of old lectures quite sometime back and he made me fall in love with dynamic programming. Love this guy.

  33. Fibonacci – it looks like fucking around. For n>1 the Fn is:
    a=0, b=1
    for i=2 to n
    Fn = a+b
    a=b
    b=Fn
    end
    Better use your brain and something like Knuth's "The Art of Computer Programming".

Leave a Reply

Your email address will not be published. Required fields are marked *