The following

content is provided under a Creative

Commons license. Your support will help MIT

OpenCourseWare continue to offer high quality

educational resources for free. To make a donation or

view additional materials from hundreds of MIT courses,

visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: We’re going to start

a brand new, exciting topic, dynamic programming. AUDIENCE: Yes! PROFESSOR: Yeah! So exciting. Actually, I am really excited

because dynamic programming is my favorite thing in

the world, in algorithms. And it’s going to be

the next four lectures, it’s so exciting. It has lots of different facets. It’s a very general,

powerful design technique. We don’t talk a lot about

algorithm design in this class, but dynamic programming is

one that’s so important. And also takes a little

while to settle in. We like to injected it

into you now, in 006. So in general, our motivation

is designing new algorithms and dynamic programming,

also called DP, is a great way– or a

very general, powerful way to do this. It’s especially good, and

intended for, optimization problems, things

like shortest paths. You want to find the

best way to do something. Shortest path is you

want to find the shortest path, the minimum-length path. You want to minimize,

maximize something, that’s an optimization

problem, and typically good algorithms to solve them

involve dynamic programming. It’s a bit of a broad statement. You can also think of

dynamic programming as a kind of exhaustive search. Which is usually

a bad thing to do because it leads to

exponential time. But if you do it in a clever

way, via dynamic programming, you typically get

polynomial time. So one perspective is

that dynamic programming is approximately

careful brute force. It’s kind of a

funny combination. A bit of an oxymoron. But we take the idea of

brute force, which is, try all possibilities

and you do it carefully and you get it to

polynomial time. There are a lot of

problems where essentially the only known polynomial

time algorithm is via dynamic programming. It doesn’t always work,

there’s some problems where we don’t think there are

polynomial time algorithms, but when it’s possible

DP is a nice, sort of, general approach to it. And we’re going to be talking a

lot about dynamic programming. There’s a lot of different

ways to think about it. We’ll look at a few today. We’re going to warm up today

with some fairly easy problems that we already

know how to solve, namely computing

Fibonacci numbers. It’s pretty easy. And computing shortest paths. And then in the

next three lectures we’re going to get to

more interesting examples where it’s pretty

surprising that you can even solve the problem

in polynomial time. Probably the first burning

question on your mind, though, is why is it

called dynamic programming? What does that even mean? And I used to have this

spiel about, well, you know, programming

refers to the– I think it’s the British

notion of the word, where it’s about optimization. Optimization in American

English is something like programming

in British English, where you want to

set up the program– the schedule for your

trains or something, where programming

comes from originally. But I looked up the

actual history of, why is it called

dynamic programming. Dynamic programming was invented

by a guy named Richard Bellman. You may have heard of Bellman

in the Bellman-Ford algorithm. So this is actually the

precursor to Bellman-Ford. And we’re going to see

Bellman-Ford come up naturally in this setting. So here’s a quote about him. It says, Bellman

explained that he invented the name dynamic

programming to hide the fact that he was doing

mathematical research. He was working at this

place called Rand, and under a secretary of defense

who had a pathological fear and hatred for

the term research. So he settled on the

term dynamic programming because it would be

difficult to give a pejorative meaning to it. And because it was

something not even a congressman could object to. Basically, it sounded cool. So that’s the origin of the

name dynamic programming. So why is the called that? Who knows. I mean, now you know. But it’s not–

it’s a weird term. Just take it for what it is. It may make some

kind of sense, but– All right. So we are going to start

with this example of how to compute Fibonacci numbers. And maybe before

we actually start I’m going to give you

a sneak peak of what you can think of

dynamic programming as. And this equation,

so to speak, is going to change throughout

today’s lecture. In the end we’ll

settle on a sort of more accurate perspective. The basic idea of

dynamic programming is to take a problem,

split it into subproblems, solve those subproblems,

and reuse the solutions to your subproblems. It’s like a lesson in recycling. So we’ll see that in

Fibonacci numbers. So you remember

Fibonacci numbers, right? The number of rabbits you have

on day n, if they reproduce. We’ve mentioned them before,

we’re talking about AVL trees, I think. So this is the

usual– you can think of it as a recursive definition

or recurrence on Fibonacci numbers. It’s the definition of what

the nth Fibonacci number is. So let’s suppose our goal–

an algorithmic problem is, compute the nth

Fibonacci number. And I’m going to assume here

that that fits in a word. And so basic

arithmetic, addition, whatever’s constant

time per operation. So how do we do it? You all know how to do it. Anyways– but I’m going to give

you the dynamic programming perspective on things. So this will seem

kind of obvious, but it is– we’re going to apply

exactly the same principles that we will apply over and

over in dynamic programming. But here it’s in a

very familiar setting. So we’re going to start with

the naive recursive algorithm. And that is, if you want to

compute the nth Fibonacci number, you check whether

you’re in the base case. I’m going to write it this way. So f is just our return value. You’ll see why I write

it this way in a moment. Then you return f. In the base case

it’s 1, otherwise you recursively call

Fibonacci of n minus 1. You recursively call

Fibonacci of n minus 2. Add them together, return that. This is a correct algorithm. Is it a good algorithm? No. It’s very bad. Exponential time. How do we know it’s

exponential time, other than from experience? Well, we can write the

running time as recurrence. T of n represents the time

to compute the nth Fibonacci number. How can I write the recurrence? You’re gonna throwback to

the early lectures, divide and conquer. I hear whispers. Yeah? AUDIENCE: [INAUDIBLE]. PROFESSOR: Yeah. T of n minus 1 plus t of

n minus 2 plus constant. I don’t know how

many you have by now. So to create the

nth Fibonacci number we have to compute the n

minus first Fibonacci number, and the n minus second

Fibonacci number. That’s these two recursions. And then we take

constant time otherwise. We do constant number of

additions, comparisons. Return all these operations–

take constant time. So that’s a recurrence. How do we solve this recurrence? Well one way is to see this

is the Fibonacci recurrence. So it’s the same thing. There’s this plus whatever. But in particular, this is at

least the nth Fibonacci number. And if you know

Fibonacci stuff, that’s about the golden ratio

to the nth power. Which is bad. We had a similar

recurrence in AVL trees. And so another way

to solve it– it’s just good review–

say, oh well, that’s at least 2 times t of n minus 2. Because it’s going

to be monotone. The bigger n is, the

more work you have to do. Because to do the

nth thing you have to do the n minus first thing. So we could just reduce t of

n minus 1 to t of n minus 2. That will give us a lower bound. And now these two terms– now

this is sort of an easy thing. You see that you’re

multiplying by 2 each time. You’re subtracting

2 from n each time. How many times can

I subtract 2 from n? N/2 times, before I

get down to a constant. And so this is equal

to 2 to the n over 2– I mean, times some

constant, which is what you get

in the base case. So I guess I should say theta. This thing is theta that. OK. So it’s at least that big. And the right constant is phi. And the base of the exponent. OK. So that’s a bad algorithm. We all know it’s

a bad algorithm. But I’m going to give you a

general approach for making bad algorithms like this good. And that general approach

is called memoization. We’ll go over here. And this is a technique

of dynamic programming. So I’m going to call this the

memoized dynamic programming algorithm. So did I settle on

using memo in the notes? Yeah. The idea is simple. Whenever we compute

a Fibonacci number we put it in a dictionary. And then when we need to

compute the nth Fibonacci number we check, is it

already in the dictionary? Did we already

solve this problem? If so, return that answer. Otherwise, computer it. You’ll see the transformation

is very simple. OK. These two lines are

identical to these two lines. So you can see how

the transformation works in general. You could do this with

any recursive algorithm. The memoization transformation

on that algorithm– which is, we initially make an

empty dictionary called memo. And before we actually do

the computation we say, well, check whether this version

of the Fibonacci problem, computing f of n, is

already in our dictionary. So if that key is already

in the dictionary, we return the corresponding

value in the dictionary. And then once we’ve computed

the nth Fibonacci number, if we bothered to do this,

if this didn’t apply, then we store it

in the memo table. So we say well, if you ever

need to compute f of n again, here it is. And then we return that value. So this is a general procedure. It can apply to any

recursive algorithm with no side effects

I guess, technically. And it turns out, this makes

the algorithm efficient. Now there’s a lot of ways

to see why it’s efficient. In general, maybe it’s helpful

to think about the recursion tree. So if you want to compute

fn in the old algorithm, we compute fn minus

1 and fn minus two completely separately. To compute fn minus 1 we compute

fn minus 2 and fn minus 3. To compute fn minus 2 we compute

fn minus 3 and fn minus 4. And so on. And you can see why

that’s exponential in n. Because we’re only decrementing

n by one or two each time. But then you observe, hey,

these fn minus 3’s are the same. I should really only have

to compute them once. And that’s what

we’re doing here. The first time you call

fn minus 3, you do work. But once it’s done and you go

over to this other recursive call, this will

just get cut off. There’s no tree here. Here we might have

some recursive calling. Here we won’t, because it’s

already in the memo table. In fact, this already

happens with fn minus 2. This whole tree disappears

because fn minus 2 has already been done. OK. So it’s clear why

it improves things. So in fact you can argue

that this call will be free because you already

did the work in here. But I want to give you a very

particular way of thinking about why this is efficient,

which is following. So you could write down a

recurrence for the running time here. But in some sense recurrences

aren’t quite the right way of thinking about

this because recursion is kind of a rare thing. If you’re calling

Fibonacci of some value, k, you’re only going to make

recursive calls the first time you call Fibonacci of k. Because henceforth,

you’ve put it in the memo table

you will not recurse. So you can think of

there being two versions of calling Fibonacci of k. There’s the first time, which

is the non-memoized version that does recursion– does some work. And then every time

henceforth you’re doing memoized calls

of Fibonacci of k, and those cost constant time. So the memoized calls

cost constant time. So we can think of

them as basically free. That’s when you call

Fibonacci of n minus 2, because that’s a

memoized call, you really don’t pay anything for it. I mean, you’re already

paying constant time to do addition and whatever. So you don’t have to

worry about the time. There’s no recursion here. And then what we care

about is that the number of non-memorized calls,

which is the first time you call Fibonacci of k, is n. No theta is even necessary. We are going to

call Fibonacci of 1. At some point we’re going

to call Fibonacci of 2 at some point, and the original

call is Fibonacci of n. All of those things will

be called at some point. That’s pretty easy to see. But in particular,

certainly at most this, we never call

Fibonacci of n plus 1 to compute Fibonacci of n. So it’s at most n calls. Indeed it will be exactly n

calls that are not memoized. Those ones we have to pay for. How much do we have to pay? Well, if you don’t count

the recursion– which is what this recurrence

does– if you ignore recursion then the total amount

of work done here is constant. So I will say the non-recursive

work per call is constant. And therefore I claim

that the running time is constant– I’m sorry, is linear. Constant would be

pretty amazing. This is actually not the

best algorithm– as an aside. The best algorithm for computing

the nth Fibonacci number uses log n arithmetic

operations. So you can do better,

but if you want to see that you

should take 6046. OK. We’re just going to get

to linear today, which is a lot better

than exponential. So why linear? Because there’s n non-memoize

calls, and each of them cost constant. So it’s the product

of those two numbers. This is an important idea. And it’s so important

I’m going to write it down again in a slightly

more general framework. In general, in

dynamic programming– I didn’t say why it’s

called memoization. The idea is you have

this memo pad where you write down all

your scratch work. That’s this memo dictionary. And to memoize is to write

down on your memo pad. I didn’t make it up. Another crazy term. It means remember. And then you remember all the

solutions that you’ve done. And then you reuse

those solutions. Now these solutions are

not really a solution to the problem

that I care about. The problem I care about is

computing the nth Fibonacci number. To get there I had to compute

other Fibonacci numbers. Why? Because I had a

recursive formulation. This is not always the

way to solve a problem. But usually when you’re

solving something you can split it into parts,

into subproblems, we call them. They’re not always

of the same flavor as your original goal

problem, but there’s some kind of related parts. And this is the big challenge

in designing a dynamic program, is to figure out what

are the subproblems. Let’s say, the

first thing I want to know about a dynamic program,

is what are the subproblems. Somehow they are designed to

help solve your actual problem. And the idea of memoization is,

once you solve a subproblem, write down the answer. If you ever need to solve

that same problem again you reuse the answer. So that is the core idea. And so in this sense

dynamic programming is essentially recursion

plus memoization. And so in this case these

are the subproblems. Fibonacci of 1 through

Fibonacci of n. The one we care about

is Fibonacci of n. But to get there we solve

these other subproblems. In all cases, if this

is the situation– so for any dynamic program,

the running time is going to be equal to the

number of different subproblems you might have to solve,

or that you do solve, times the amount of time

you spend per subproblem. OK. In this situation we

had n subproblems. And for each of them

we spent constant time. And when I measure the

time per subproblem which, in the Fibonacci

case I claim is constant, I ignore recursive calls. That’s the key. We don’t have to

solve recurrences with dynamic programming. Yay. No recurrences necessary. OK. Don’t count recursions. Obviously, don’t count

memoized recursions. The reason is, I only

need to count them once. After the first time

I do it, it’s free. So I count how many different

subproblems do I need to do? These are they going to be

the expensive recursions where I do work, I do

some amount of work, but I don’t count the

recursions because otherwise I’d be double counting. I only want to count

each subproblem once, and then this will solve it. So a simple idea. In general, dynamic programming

is a super simple idea. It’s nothing fancy. It’s basically just memoization. There is one extra trick

we’re going to pull out, but that’s the idea. All right. Let me tell you

another perspective. This is the one maybe

most commonly taught. Is to think of– but I’m

not a particular fan of it. I really like memoization. I think it’s a simple idea. And as long as you

remember this formula here, it’s really easy to work with. But some people like to

think of it this way. And so you can pick whichever

way you find most intuitive. Instead of thinking of a

recursive algorithm, which in some sense starts at the

top of what you want to solve and works its way down,

you could do the reverse. You could start at the

bottom and work your way up. And this is probably

how you normally think about computing

Fibonacci numbers or how you learned it before. I’m going to write it

in a slightly funny way. The point I want to make

is that the transformation I’m doing from the naive

recursive algorithm, to the memoized algorithm,

to the bottom-up algorithm is completely automated. I’m not thinking,

I’m just doing. OK. It’s easy. This code is exactly

the same as this code and as that code, except

I replaced n by k. Just because I needed a couple

of different n values here. Or I want to iterate

over n values. And then there’s this

stuff around that code which is just formulaic. A little bit of thought

goes into this for loop, but that’s it. OK. This does exactly the same

thing as the memoized algorithm. Maybe it takes a

little bit of thinking to realize, if you unroll all

the recursion that’s happening here and just write

it out sequentially, this is exactly

what’s happening. This code does exactly the

same additions, exactly the same computations as this. The only difference

is how you get there. Here we’re using a loop,

here we’re using recursion. But the same things

happen in the same order. It’s really no difference

between the code. This code’s probably going

to be more efficient practice because you don’t make

function calls so much. In fact I made a

little mistake here. This is not a

function call, it’s just a lookup into a table. Here I’m using a hash

table to be simple, but of course you

could use an array. But they’re both constant

time with good hashing. All right. So is it clear

what this is doing? I think so. I think I made a little typo. So we have to compute–

oh, another typo. We have to compute f1 up to

fn, which in python is that. And we compute it

exactly how we used to. Except now, instead

of recursing, I know that when I’m computing

the k Fibonacci number– man. So many typos. AUDIENCE: [LAUGHTER] PROFESSOR: You

guys are laughing. When I compute the

kth Fibonacci number I know that I’ve already

computed the previous two. Why? Because I’m doing them

in increasing order. Nothing fancy. Then I can just do

this and the solutions will just be waiting there. If they work, I’d

get a key error. So I’d know that there’s a bug. But in fact, I won’t

get a key error. I will have always computed

these things already. Then I store it in my table. Then I iterate. Eventually I’ve solved all the

subproblems, f1 through fn. And the one I cared

about was the nth one. OK. So straightforward. I do this because

I don’t really want to have to go through

this transformation for every single problem we do. I’m doing it in Fibonacci

because it’s super easy to write the code

out explicitly. But you can do it for all

of the dynamic programs that we cover in the

next four lectures. OK. I’m going to give you

now the general case. This was the special

Fibonacci version. In general, the bottom-up does

exactly the same computation as the memoized version. And what we’re doing is

actually a topological sort of the subproblem

dependency DAG. So in this case, the

dependency DAG is very simple. In order to compute–

I’ll do it backwards. In order to compute fn,

I need to know fn minus 1 and fn minus 2. If I know those

I can compute fn. Then there’s fn

minus 3, which is necessary to compute this

one, and that one, and so on. So you see what

this DAG looks like. Now, I’ve drawn

it conveniently so all the edges go left to right. So this is a topological

order from left to right. And so I just need to do

f1, f2, up to fn in order. Usually it’s totally

obvious what order to solve the subproblems in. But in general, what

you should have in mind is that we are doing

a topological sort. Here we just did it in our

heads because it’s so easy. And usually it’s so easy. It’s just a for loop. Nothing fancy. All right. I’m missing an arrow. All right. Let’s do something a little

more interesting, shall we? All right. One thing you can do from

this bottom-up perspective is you can save space. Storage space in the algorithm. We don’t usually worry

about space in this class, but it matters in reality. So here we’re

building a table size, n, but in fact we

really only need to remember the last two values. So you could just store

the last two values, and each time you make a

new one delete the oldest. so by thinking a

little bit here you realize you only

need constant space. Still linear time,

but constant space. And that’s often the case. From the bottom-up

perspective you see what you really

need to store, what you need to keep track of. All right. I guess another nice thing

about this perspective is, the running time

is totally obvious. This is clearly constant time. So this is clearly linear time. Whereas, in this

memoized algorithm you have to think

about, when’s it going to be memoized,

when is it not? I still like this perspective

because, with this rule, just multiply a

number of subproblems by time per subproblem,

you get the answer. But it’s a little less

obvious than code like this. So choose however you

like to think about it. All right. We move onto shortest paths. So I’m again, as usual, thinking

about single-source shortest paths. So we want to compute the

shortest pathway from s to v for all v. OK. I’d like to write this initially

as a naive recursive algorithm, which I can then memoize,

which I can then bottom-upify. I just made that up. So how could I write this as

a naive recursive algorithm? It’s not so obvious. But first I’m going to tell you

how, just as an oracle tells you, here’s what you should do. But then we’re going to think

about– go back, step back. Actually, it’s up to you. I could tell you

the answer and then we could figure out

how we got there, or we could just

figure out the answer. Preferences? Figure it out. All right. Good. No divine inspiration allowed. So let me give you a tool. The tool is guessing. This may sound silly, but

it’s a very powerful tool. The general idea is, suppose

you don’t know something but you’d like to know it. So what’s the answer

to this question? I don’t know. Man, I really want a cushion. How am I going to

answer the question? Guess. OK? AUDIENCE: [LAUGHTER] PROFESSOR: It’s a

tried and tested method for solving any problem. I’m kind of belaboring

the point here. The algorithmic concept is,

don’t just try any guess. Try them all. OK? AUDIENCE: [LAUGHTER] PROFESSOR: Also pretty simple. I said dynamic

programming was simple. OK. Try all guesses. This is central to the

dynamic programming. I know it sounds obvious, but if

I want to fix my equation here, dynamic programming is roughly

recursion plus memoization. This should really

be, plus guessing. Memoization, which is obvious,

guessing which is obvious, are the central concepts

to dynamic programming. I’m trying to make it sound

easy because usually people have trouble with

dynamic programming. It is easy. Try all the guesses. That’s something a

computer can do great. This is the brute force part. OK. But we’re going to

do it carefully. Not that carefully. I mean, we’re just

trying all the guesses. Take the best one. That’s kind of important

that we can choose one to be called best. That’s why dynamic

programming is good for optimization problems. You want to maximize

something, minimize something, you try them all and then you

can forget about all of them and just reduce it

down to one thing which is the best one, or a best one. OK. So now I want you

to try to apply this principle to

shortest paths. Now I’m going to draw a

picture which may help. We have the source, s,

we have some vertex, v. We’d like to

find the shortest– a shortest path from s to v. Suppose I want to know

what this shortest path is. Suppose this was it. You have an idea already? Yeah. AUDIENCE: What you could do is

you could look at everywhere you can go from s. [INAUDIBLE] shortest path

of each of those notes. PROFESSOR: Good. So I can look at all the

places I could go from s, and then look at the shortest

paths from there to v. So we could call this s prime. So here’s the idea. There’s some hypothetical

shortest path. I don’t know where

it goes first, so I will guess

where it goes first. I know the first

edge must be one of the outgoing edges from s. I don’t know which one. Try them all. Very simple idea. Then from each of

those, if somehow I can compute the shortest

path from there to v, just do that and

take the best choice for what that first edge was. So this would be the

guess first edge approach. It’s a very good idea. Not quite the one I wanted

because unfortunately that changes s. And so this would

work, it would just be slightly less

efficient if I’m solving single-source

shortest paths. So I’m going to tweak

that idea slightly by guessing the last edge

instead of the first edge. They’re really equivalent. If I was doing this

I’d essentially be solving a single-target

shortest paths, which we talked about before. So I’m going to draw

the same picture. I want to get to v. I’m

going to guess the last edge, call it uv. I know it’s one of the incoming

edges to v– unless s equals v, then there’s a special case. As long as this path has

length of at least 1, there’s some last edge. What is it? I don’t know. Guess. Guess all the possible

incoming edges to v, and then recursively compute the

shortest path from s to u. And then add on the edge v. OK. So what is this shortest path? It’s delta of s comma

u, which looks the same. It’s another subproblem

that I want to solve. There’s v subproblems

here I care about. . So that’s good. I take that. I add on the weight

of the edge uv. And that should hopefully

give me delta of s comma v. Well, if I was lucky and I

guessed the right choice of u. In reality, I’m not lucky. So I have to minimize

over all edges uv. So this is the–

we’re minimizing over the choice of u. V is already given here. So I take the minimum over

all edges of the shortest path from s to u, plus

the weight of the edge uv. That should give me the shortest

path because this gave me the shortest path from s to u. Then I added on the edge

I need to get there. And wherever the shortest path

is, it uses some last edge, uv. There’s got to be some choice

of u that is the right one. That’s the good guess

that we’re hoping for. We don’t know what

the good guess is so we just try them all. But whatever it is, this will

be the weight of that path. It’s going to take

the best path from s to u because sub

paths are shortest paths are shortest paths. Optimal substructure. So this part will

be delta of su. This part is obviously w of uv. So this will give

the right answer. Hopefully. OK. It’s certainly

going to– I mean, this is the analog of the

naive recursive algorithm for Fibonacci. So it’s not going to be

efficient if I– I mean, this is an algorithm, right? You could say– this

is a recursive call. We’re going to treat this

as recursive call instead of just a definition. Then this is a

recursive algorithm. How good or bad is this

recursive algorithm? AUDIENCE: Terrible. PROFESSOR: Terrible. Very good. Very bad, I should say. It’s definitely going

to be exponential without memoization. But we know. We know how to make

algorithms better. We memoize. OK. So I think you know how to write

this as a memoized algorithm. To define the function delta

of sv, you first check, is s comma v in the memo table? If so return that value. Otherwise, do this computation

where this is a recursive call and then stored it

in the memo table. OK. I don’t think I need

to write that down. It’s just like the

memoized code over there. Just there’s now two

arguments instead of one. In fact, s isn’t changing. So I only need to store

with v instead of s comma v. Is that a good algorithm? I claim memoization

makes everything faster. Is that a fast algorithm? Not so obvious, I guess. Yes? How many people think, yes,

that’s a good algorithm? AUDIENCE: Better. PROFESSOR: Better. Definitely better. Can’t be worse. How many people think it’s

a bad algorithm still? OK. So three for yes, zero for no. How many people aren’t sure? Including the yes votes? Good. All right. It’s not so tricky. Let me draw you a graph. Something like that. So we wanted to commit

delta of s comma v. Let me give these

guys names, a and b. So we compute delta of

s comma v. To compute that we need to know delta

of s comma a and delta of s comma v. All right? Those are the two ways– sorry,

actually we just need one. Only one incoming edge to v.

So its delta of s comma a. Sorry– I should have

put a base case here too. Delta of s comma s equals 0. OK. Delta of s comma

a plus the edge. OK. There is some

shortest path to a. To compute the

shortest path to a we look at all the

incoming edges to a. There’s only one. So delta of s comma b. Now I want to compute the

shortest paths from b. Well, there’s two

ways to get to b. One of them is delta of s

comma b– sorry, s comma s. Came from s. The other way is delta of s

comma v. Do you see a problem? Yeah. Delta of s comma v is what

we were trying to figure out. Now you might say, oh,

it’s OK because we’re going to memoize our

answer to delta s comma v and then we can reuse it here. Except, we haven’t finished

computing delta of s comma v. We can only put it in

the memo table once we’re done. So when this call happens the

memo table has not been set. And we’re going to

do the same thing over and over and over again. This is an infinite algorithm. Oops. Not so hot. So it’s going to be infinite

time on graphs with cycles. OK. For DAGs, for acyclic graphs, it

actually runs in v plus e time. This is the good case. In this situation we

can use this formula. The time is equal to the

number of subproblems times the time per subproblem. So I guess we have to think

about that a little bit. Where’s my code? Here’s my code. Number of subproblems

is v. There’s v different subproblems

that I’m using here. I’m always reusing

subproblems of the form delta s comma something. The something could be

any of the v vertices. How much time do I

spend per subproblem? That’s a little tricky. It’s the number

of incoming edges to v. So time for a

sub problem delta of sv is the indegree of v. The

number of incoming edges to v. So this depends on

v. So I can’t just take a straightforward

product here. What this is really

saying is, you should sum up over

all sub problems of the time per sub problem. So total time is the sum over

all v and v, the indegree of v. And we know this

is number of edges. It’s really– so indegree

plus 1, indegree plus 1. So this is v plus v. OK. Handshaking again. OK. Now we already knew an algorithm

for shortest paths and DAGs. And it ran a v plus e time. So it’s another way

to do the same thing. If you think about

it long enough, this algorithm

memoized, is essentially doing a depth first search

to do a topological sort to run one round

of Bellman-Ford. So we had topological sort

plus one round of Bellman-Ford. This is kind of it

all rolled into one. This should look kind of like

the Bellman Ford relaxation step, or shortest

paths relaxation step. It is. This min is really

doing the same thing. So it’s really the

same algorithm. But we come at it from

a different perspective. OK. But I claim I can use

this same approach to solve shortest paths in

general graphs, even when they have cycles. How am I going to do that? DAGs seem fine– oh, what

was the lesson learned here? Lesson learned is that

subproblem dependencies should be acyclic. Otherwise, we get an

infinite algorithm. For memoization to work

this is what you need. It’s all you need. OK. We’ve almost seen this already. Because I said that, to

do a bottom up algorithm you do a topological sort of

this subproblem dependency DAG. I already said it

should be acyclic. OK. We just forgot. I didn’t tell you yet. So for that to work

it better be acyclic. For DP to work, for memoization

to work, it better be acyclic. If you’re acyclic then

this is the running time. So that’s all general. OK. So somehow I need to

take a cyclic graph and make it acyclic. We’ve actually done this

already in recitation. So if I have a

graph– let’s take a very simple cyclic graph. OK. One thing I could do is explode

it into multiple layers. We did this on quiz

two in various forms. It’s like the only cool thing

you can do with shortest paths, I feel like. If you want to make a

shortest path problem harder, require that you reduce your

graph to k copies of the graph. I’m going to do it

in a particular way here– which I think you’ve

seen in recitation– which is to think of this axis as

time, or however you want, and make all of the

edges go from each layer to the next layer. This should be a

familiar technique. So the idea is,

every time I follow an edge I go down

to the next layer. This makes any graph acyclic. Done. What in the world

does this mean? What is it doing? What does it mean? Double rainbow. All right. AUDIENCE: [LAUGHTER] PROFESSOR: So– I

don’t know how I’ve gone so long in the

semester without referring to double rainbow. It used to be my favorite. All right. So here’s what it means. Delta sub k of sv. I’m going to define

this first– this is a new kind of

subproblem– which is, what is the shortest– what

is the weight of the shortest s to v path that uses,

at most, k edges. So I want it to be shortest

in terms of total weight, but I also want it to

use few edges total. So this is going to be 0. In some sense, if you

look at– so here’s s and I’m always going

to make s this. And then this is going to

be v in the zero situation. This is going to be v

in the one situation, v– so if I look at this

v, I look at the shortest path from s to v, that

is delta sub 0 of sv. So maybe I’ll call this v

sub 0, v sub 1, v sub 2. OK. Shortest path from

here to here is, there’s no way to

get there on 0 edges. Shortest path from

here to here, that is the best way to get there

with, at most, one edge. Shortest path from

here to here– well, if I add some

vertical edges too, I guess, cheating a little bit. Then this is the best

way to get from s to v using at most two edges. And then you get

a recurrence which is the min over all last edges. So I’m just copying

that recurrence, but realizing that the s to

u part uses one fewer edge. And then I use the edge uv. OK. That’s our new recurrence. By adding this k

parameter I’ve made this recurrence on

subproblems acyclic. Unfortunately, I’ve increased

the number of subproblems. The number of subproblems

now is v squared. Technically, v times v minus 1. Because I really–

actually, v squared. Sorry. I start at 0. And what I care about, my goal,

is delta sub v minus 1 of sv. Because by

Bellman-Ford analysis I know that I only care about

simple paths, paths of length at most v minus 1. I’m assuming here no

negative weight cycles. I should’ve said that earlier. If you assume that, then

this is what I care about. So k ranges from 0 to v minus 1. So there are v choices for k. There are v choices for v.

So the number of subproblems is v squared. How much time do I

spend per subproblem? Well, the same as before. The indegree– where

did I write it? Up here– the indegree

of that problem. So what I’m really

doing is summing over all v of the indegree. And then I multiply it by

v. So the running time, total running time is ve. Sound familiar? This is Bellman-Ford’s

algorithm again. And this is actually where

Bellman-Ford algorithm came from is this view

on dynamic programming. So we’re seeing yet another

way to do Bellman-Ford. It may seem familiar. But in the next

three lectures we’re going to see a whole

bunch of problems that can succumb to

the same approach. And that’s super cool.

how come they have so many boards?

Awesome lectures – learning heaps.

there are a lot of comments, not there is…

there are plus plural, there is plus single

00:12

erik you are witty bro

That's a huge chalk

0:00~31:00 DP 설명.

– 벨만 포드 알고리즘 만든 벨만이 다이나믹 프로그래밍이라는 이름을 창시했다. 벨만이 다이나믹 프로그래밍이라고 이름 지은 이유는 별 뜻 없고 '그냥 멋있어 보여서'다.

– 하나의 문제를 완전탐색으로 재귀적으로 풀 때 그 문제의 부분문제(Subproblems)의 답을 재활용하는 기법이 메모아이제이션(Memoization)이다. 재귀로 문제를 풀면 exponential 시간이 소요되는데, 한 번 풀었던 문제의 답을 메모로 적어 놓고 다시 필요할 때 반환함으로써 그 문제 풀이는 O(1) 시간밖에 걸리지 않는다.

– DP에는 재귀적인 구현인 하향식 접근(Top-down approach) 외에도 반복문을 이용한 상향식 접근(Bottom-up approach) 방법이 있다. 상향식은 함수 호출을 적게 하니까 스택을 적게 먹는 장점이 있겠지.

– 시간복잡도: 부분 문제의 갯수 * 부분 문제를 푸는 데에 걸리는 시간(메모 해 놔서 재귀 안 걸리는 데는 O(1)임)

———-

31:00 이후 최단 경로 설명.

– 오랫동안 잊고 있었던 그래프 이론이 등장함.

/

– 정점(vertex, node), 간선(edge, link, line)

– 차수(degree): 어떤 정점의 간선의 수.

/

– DAG (Directed Acyclic Graph): 비순환 유향(방향) 그래프

– 내차수(indegree): 방향 그래프에서 어떤 정점으로 들어오는 간선(incoming edge)의 수.

– 외차수(outdegree): 방향 그래프에서 어떤 정점에서 나가는 간선(outgoing edge)의 수.

– 위상 정렬(Topological sorting): 선수과목 구조처럼 보이게 DAG을 왼쪽에서 오른쪽으로 한 방향으로 죽 늘어 놓는 것.

– 출발점에서 나가는 간선을 기준으로 부분문제를 만들 수도 있고 목적지로 들어오는 간선을 기준으로 부분문제를 만들 수도 있음

– 사이클이 있는 그래프에서 시간복잡도는 무한대임.

– DAG의 경우 시간복잡도: O(V+E)

– DP 쓰려면 부분 문제는 비순환이어야 한다.

what a nut

20:00 what about searching the created dictionary? That will take asymptotically n operations, so the result should still be exponential.

MIND BLOWN when he showed the recursion tree, that was just dope

so what's the difference between dynamic programming to neural network machine learning?

@ time of 10.15 you reduced t(n-1) to t(n-2) that will give us the lower bound is it possible to increase t(n-2) to t(n-1)..for solving the same recurrence relation …

excellent presentation – thxxxx!!!!

I’m still a little confused as to how the cyclic graph algorithm works as well as the complexity for it. Could someone clear this up for me?

You are awesome bro!!And so is your teaching.

I'm just curious why it's not memoRization 🙂

Is it just me or the instructor sounds a lot like Sheldon cooper 🙂

curious … what was that reference to the double rainbow 🙂

Im a Mechanical Engineer and i dont why im here.

👍👍👍👍👍👍👍

At 28:48: Fn-1 should have "in-edge" from Fn-2.. Isn't it?

Do they teach this in College?? We in India study these in High School only. Not in school but as hobby. BTW Great Lecture.

I believe the lookup in the memo is not completely free: it could take `O(log(n))` if the data-structure used is a map/BST/etc . HashMap or suchlikes makes things better though but that should explicitly be pointed out as well.

Really…did that girl who came in late have to walk in front of everyone?! If you are late then you should go ALL THE WAY IN THE BACK

Why would it take constant time if n is in memo? Wouldn’t you have to search through the memo to look for n?

Even with a passionate professor, intro to algorithms is still the most boring and useless (for most) class ever lmao

One mistake about fibonacci. The best algorithm to compute fib(n) is constant time using the explicit formula ((1+sqrt5)^n-(1-sqrt5)^n)/(2^n * sqrt5). Of course, you would prefer something like Mathematica that can do symbolic algebra like sqrt5*sqrt5=5 rather than converting first to floating point and doing estimations.

Great teaching! I had an algorithmic challenge this year, the solution was dynamic programming.

Might anyone be able to explain his step about completing the running time of O(V + E). He states that the total time is the sum over all sub problems of the time per subproblem. He writes the total time = Sum of v in V for indegree(v) = O(E). That part makes sense to me. What I don't understand is when he says "+ 1" , that this evaluates to O(E + V). What does he mean by "+ 1". Is the "+1" supposed to be for a single vertex, so summing over all vertices leads to "+1" v times, or V?

I understand that the time per subproblem is indegree(v), or the number of incoming edges to V. But why does he then change that to indegree(v) + 1? What is the "+ 1" for?

Thank you guys. I am doing my masters though I couldn't get in MIT, you give me exposure to top class teaching materials.

What a great person. He is making a very complicated topic at least a bit understandable for me 🙂

I love blackboards and big chunks of chalk. So much better than whiteboards, or worse: powerpoint on a screen.

This is great courses ! v :0) my 2 cent on Fib tho, is a bit bad and good example, bad maybe because it can be easily seen has a linear recursive, so no really needs for special memoise structure to solve we can use an array or the recursive stack to hold that linear memoise , but this is not the case generally for trees or a graphes where memoise often go side by side with the algo . Fib is good because we can make the transition from recursive to iterative went seeing that recursion (in a really world stack an recursion are not infinit (cpu stacks are usually really optimise tho) ). So study you problem first and dont create special structure when none a really need. here my exemple where I move the memoise on the stack.

def fib(n):

def f(n1):

if n1 == 1:

return((1,0))

else:

nn1,nn2 = f(n1-1)

return((nn1+nn2, nn1))

if n == 0: return(0)

r,x = f(n)

return(r)

for x in range(0, 11):

y = fib(x)

this man looks like computer science

I wish I were ln(smart) as this guy

Glitches

21:16 Please edit that. To many "typos", i spent mucht time to pause that video, just to realize that it's an error in the video.

I wish I could have Professor like him in my bachelor's and master's .

Lessons Are Very Hard But I Really Wanna Learn More Please Send More Info To My Email Address [email protected]

I need a lecture about time complexity of programs.

Lost. Im so lost.

Maybe I misheard my professor, but I always thought he was saying memorization

i got my senpai

2:53 Wow, what is wrong with her face!!

Did you mention "Bottom-up DP" by chance ? Better to google up that term to learn more about it…

…

…

…

Oh, no!!

Instead of the prof writing on the blackboard, wouldn't it save lots of time to just project his notes or give out a pdf before class.

Dynamic programming the my favrourite thing in the world…

in algorithms.

Bro u shit on bellman principel

Wonderful lecture, Thanks for explaining the term.

Is it just on my broswer that this lecture title and description are in Chinese?!!!! I feel weird because all other lecture titles in the playlist are in English.

Didn't get the part after '"Let's start a brand new thing called dynamic programming, yah……"

Im just here for the sound of the chalkboard :- .

missing when lectures were chalk and blackboard only,

Why is it called dynamic programming when it is basically divide and conquer, and in the case of using recursion, it is basically cached recursion? The term dynamic is generally used more for something that is not fixed such as dynamic routing when you drive to work. You do not insist on driving the same exact route, you instead decide which route to take based on traffic patterns, construction… as you are driving. In contrast, a static algorithm would be something "hardwired" that is not flexible such as a fixed sorting algorithm that doesn't "care" what the input is, it just sorts it using the same method each time. Computer Science (in my opinion) uses some "whacked out" (inaccurate) terms.

26:38, those aren't typos they are writos.

First of all, other than for illustrative purposes, why the hell would you ever use recursion to solve Fibonacci numbers when a simple iterative algorithm can be used?

Secondly, trying all of the possible guesses is VERY bad for some problems cuz the number of possible guesses is astronomical.

43:09 "Infinite algorithm" is an improper term since part the definition of an algorithm is that is must terminate. Very bad use of terminology.

Easy solution to shortest path problem… stay the "F" home. (don't go anywhere). There is nothing shorter than nowhere.

I'm an old-time programmer who learned his trade in the trenches, not at school, and all this seems unnecessarily complicated to me. Recursion is conceptually elegant, but inefficient if for no other reason that it requires the overhead of a function call to do what could be done in a simple loop using local variables. The "memoization" idea seems to involve creating a global variable as a way of sneaking local variables past the thought police.

When it comes down to it, everything boils down to machine code, and the most efficient algorithm is the one that requires the fewest instruction cycles. We weren't necessarily worse programmers than the computer scientists, but we had to contend with real-world limitations. Primitive equipment required primitive methods.

The first computer I worked with had 4000 6-bit characters of core storage (aka RAM), instruction timings in milliseconds, and no OS. A good programmer was one who could squeeze every ounce of performance out of that beast, and since the biggest program you could write wasn't much more than 1000 lines, if that, you didn't have to worry about spaghettification, memory management and all the rest of it. We evolved as the equipment evolved, and we came up with methods that the computer scientists might have learned from. Now you have code that struggles to get by with a mere 8Gb of RAM and runs slowly on a 3GHz CPU.

I understood only 50% of the stuff.

This topic starts at 31:20

It's too good to be true

Hopefully is not a word. mr. wizard.

at 10:44,How did he came with power of 2 is n/2? Can someone explain?

I googled the word " nerd" this guys picture showed up….. no shit, understandably, I must be a nerd to actually understand what hes teaching… great style I might add.

Level: Asian

Now, on top of that we just need to build a model using TensorFlow and watch how it will do the job for us

god bless OpenCourseWare

A program written in python based on video content. https://github.com/worksking/algrithm/blob/master/dynamic_program_fibonacci.py

Awesome. Thank you.

If you are scrolling down the comments have a good day and like this comment

You kept on scrolling anyway, didn't you? 😉

he wrote algorithm and everything without reading notes

Can someone explain why you do shortest path backwards and not forward? Isn't it the same?

Much love to you dude. skooter you way too the tutor shooter looter

It was funny when he keeps correcting the typos in bottom up approach at around 26:00

In the last part, if total distinct sub problems are |V|^2 and for each sub problem we are spending O(indegree) time, then how can total overall time complexity be O(VE) ?

Great lecture! Thank you for making this topic accessible – in all senses of the word

I don’t see how the DAG shortest path recurrence performs a topological sort. Don’t you need to perform a topological sort on the DAG separately, and THEN do the recurrence?

Powerful, DP technique.

DP = "careful brute force".

This is why I never mind the screams: I know I'm doing it right.

I wish my prof taught my algo class was this excited about any of the algos, instead of sitting there hating life

Professor Demaine,

I really enjoy your confident lectures.

Instead of making the algorithm O(VE), we can have a bool array that stores that this point is being traversed upon/being used to calculate a point already, and then use our analysis that number of edges must reduce in each step. So we basically just skip if we come back to the same edge any time. Does this algorithm work?

Great lecture!

Can someone explain this to me?

He calculated the time taken by each subproblem (In shortest path) is O(|V| + |E|), right.

Now he redefined the the problem by giving the "k" parameter to solve the cyclic problem, which gave him #subproblems = O(|V|.|V|), one for k and other for v.

So how come when he multiplied #sub-problems x time / subproblem, the answer become O(|V||E|)??

I love watching these videos while taking a dump

youtube facebook google windows theoretical computer science

epr paradox and discrete mathematics graph theory: vertex edge,…etc.

This teacher is amazing!! I loved it!

K, so here is one question/concept that seems to arise from this lecture. If you can traverse an expansive abstract DAG from the bottom up (base case/root to solution), then you have a nice linear solution without memoization that you probably never really needed dynamic programming for. If you can't, you try to recurse backwards to create a similar linearity. Recursing backwards while memoizing repeated computations in a hypothetical call tree, i.e. utilizing optimal substructure to create this effective linearity while consuming linear memory.

My question would be: is there a notion of forward, reverse or both optimal substructure? Let's say a sorted list would seem to have both. You can easily follow it from either end to the other. This is a hard concept to express as a question. But if the best solution can be found from guessing from the first node forward or from the last node backwards, what makes one more efficient than the other?

26:10 "So is this clear what this is doing? I think so."

[Then corrects the 2nd, 3rd, 4th and 5th typos that prevented comprehension. 🙂 ]

I watched this guys couple of old lectures quite sometime back and he made me fall in love with dynamic programming. Love this guy.

Fibonacci – it looks like fucking around. For n>1 the Fn is:

a=0, b=1

for i=2 to n

Fn = a+b

a=b

b=Fn

end

Better use your brain and something like Knuth's "The Art of Computer Programming".

love your tshirts erik

I love that he writes all of his code in python

Number of Blackboards, TLE problem :/

this made pokemon go work haha