This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
This explanation is too specific. We need a general explanation of the Bellman equation. Shuroo ( talk) 09:19, 29 May 2009 (UTC)
I can't find the "principle of optimality" anywhere in Bellman's 1952 paper. Perhaps this should be his 1957 book. —Preceding unsigned comment added by 129.100.144.166 ( talk) 17:01, 11 December 2007 (UTC)
Bellman's definition from Dynamic Programming, 1957 (Dover 2003 edition, p. 83):
-- Rinconsoleao ( talk) 19:57, 25 May 2009 (UTC)
The equation given in the article is wrong, or at least not true in general. The right equation is
V(x) = max_y { F(x,y) + β V(T(x,y)) },
where T(x,y) is the transformation of the state given a state x and an action y.
Shuroo —Preceding unsigned comment added by Shuroo ( talk • contribs) 20:27, 27 May 2009 (UTC)
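As a concrete illustration of the notation being discussed (a deterministic transition T(x,y), a payoff F(x,y), and a discount factor β), here is a toy value-iteration sketch. It is not part of the article; the particular F, T, and β below are hypothetical choices made only so the fixed point can be computed.

```python
import numpy as np

# States 0..4; an action y moves the state via T(x, y); F is the payoff.
n_states, n_actions, beta = 5, 2, 0.9

def T(x, y):
    # Hypothetical transition: action 0 stays put, action 1 moves right (capped).
    return min(x + y, n_states - 1)

def F(x, y):
    # Hypothetical payoff: reward for a high state, small cost for moving.
    return float(x) - 0.1 * y

V = np.zeros(n_states)
for _ in range(500):
    # Iterate the Bellman operator: V(x) = max_y { F(x,y) + beta * V(T(x,y)) }
    V = np.array([max(F(x, y) + beta * V[T(x, y)] for y in range(n_actions))
                  for x in range(n_states)])
# At state 4 the best policy stays put forever, so V[4] = 4 / (1 - beta) = 40.
```

Because the operator is a contraction with modulus β, the loop converges geometrically to the unique fixed point regardless of the starting guess.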
That depends on how we are defining the notation. The way the problem was set up (I didn't write it myself) defines transitions by stating that the feasible future states are x_{t+1} ∈ Γ(x_t). So the 'transformation of the state' you mention is captured by that notation. Of course, I don't mind moving to clearer notation if others prefer. -- Rinconsoleao ( talk) 11:04, 28 May 2009 (UTC)
By the way, the notation appears to come from Stokey, Lucas, Prescott (1989), Chapter 4. -- Rinconsoleao ( talk) 11:07, 28 May 2009 (UTC)
The notation I have suggested is easier to understand and much more practical for dynamic programming problems with a finite horizon. (It also better matches the Wikipedia article on dynamic programming.) —Preceding unsigned comment added by Shuroo ( talk • contribs) 14:28, 28 May 2009 (UTC)
It seems as though the statement of the Bellman equation for general t would be more clearly derived by stating the recursive definition for general t rather than for t = 0 specifically. Then the idea of finding a fixed point of the contractive mapping (i.e., the Bellman equation) is much more easily elucidated to the reader (or happened upon by the reader). Bashfuloctopus ( talk) 06:26, 18 January 2017 (UTC)
Yet another argument for my notation is that it can be easily extended to stochastic problems (e.g., inventory management). To obtain the stochastic form of the equation we simply add E (expectation) before the second term since the new state is a random variable.
V(x) = max_y { F(x,y) + β E[V(T(x,y))] },
where T(x,y) is the (now random) transformation of the state given a state x and an action y. Shuroo ( talk) 09:21, 29 May 2009 (UTC)
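The stochastic form described above (an expectation over the random next state in place of a deterministic transition) can be sketched as follows. The transition probabilities P and payoffs F here are randomly generated placeholders, not anything from the article.

```python
import numpy as np

# P[x, y, x'] is a hypothetical transition probability table standing in for
# a random T(x, y); F[x, y] is a hypothetical payoff table.
n_states, n_actions, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)      # each (x, y) row is a distribution
F = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    # V(x) = max_y { F(x,y) + beta * E[ V(x') | x, y ] }
    # P @ V computes the conditional expectation E[V(x') | x, y] for all (x, y).
    V = (F + beta * P @ V).max(axis=1)
```

The only change from the deterministic case is that the continuation value is averaged over next states instead of evaluated at a single T(x,y), which is exactly the "add E before the second term" point made above.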
I agree that the form you are proposing is probably easier to understand. Go ahead and change the notation if you want. (Notice that you will need to change the notation on the original sequence problem too, for consistency with the form of the Bellman equation you are proposing.) I'll keep an eye open for errors. -- Rinconsoleao ( talk) 10:21, 29 May 2009 (UTC)
Done Shuroo ( talk) 08:51, 30 May 2009 (UTC)
I'm sorry, I thought your notation for the stochastic case would show more explicitly how the next state is determined. The notation T(x,y) strikes me as misleading, because it looks like a function, which suggests that it should take a unique value. A rigorous way to define it would be to write a c.d.f. of the form G(y|x,a), representing the probability distribution of possible states y conditional on the current state x and action a. Alternatively, we could write something like y = T(x,a) + ε, where ε is a mean-zero shock, but that is more restrictive. -- Rinconsoleao ( talk) 13:47, 1 June 2009 (UTC)
What I wrote was a simple but valid formula that could explain the example in the next section. (I used T for the needed random variable but indicated that it was a random variable, not a function.) It seems to me that adding too many details would make the article much less attractive, because in order to introduce conditional distributions properly you need to add some heavy machinery.
Pity that you erased it. Shuroo ( talk) 18:47, 1 June 2009 (UTC)
By the way, the current article uses the version that you erased, in the example, but in an implicit way. Surely it is better to write it explicitly. Shuroo ( talk) 19:32, 1 June 2009 (UTC)
Now I've added a lot more discussion, and a stochastic example with (I think) rigorous notation. Let me know if you find errors.-- Rinconsoleao ( talk) 18:16, 4 June 2009 (UTC)
If you want to do the stochastic part properly, it would be better to write it in terms of a Markov decision process. Shuroo ( talk) 10:13, 5 June 2009 (UTC)
I wrote it this way because the G(y|x,a) notation is actually very close to your y=T(x,a) notation, except that it makes explicit the fact that we are talking about a random variable. And it allowed me to write a very similar Bellman equation for the stochastic and deterministic cases (so the stochastic subsection ends up very short and simple without being incorrect). But we could certainly include a link to MDP too.-- Rinconsoleao ( talk) 11:03, 6 June 2009 (UTC)
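The two notations compared above (a conditional distribution G(y|x,a) versus the shock form y = T(x,a) + ε) can be illustrated with a short sampling sketch. The particular distribution below is hypothetical, chosen only to show that the two views coincide when the shock has mean zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_next_state(x, a):
    # Hypothetical conditional law: the next state y is drawn around a
    # deterministic part x + a, i.e. the y = T(x, a) + epsilon form
    # mentioned in the discussion, with a mean-zero Gaussian shock.
    return x + a + rng.normal(0.0, 0.1)

draws = [sample_next_state(1.0, 0.5) for _ in range(10000)]
# The sample mean should be close to 1.5, since the shock has mean zero,
# so E[y | x, a] recovers the deterministic transition T(x, a).
```

This is why the two forms give the same Bellman equation up to the expectation operator: conditioning on (x, a) and averaging removes the shock.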
In any case, if you want the article to be rigorous, at least to some extent, you certainly need to state the Markov property of the system. Shuroo ( talk) 18:57, 6 June 2009 (UTC)
"It breaks a dynamic optimization problem into simpler subproblems, writing the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices, as Bellman's Principle of Optimality prescribes." This sentence is VERY long and obfuscated; it may deserve a reformulation, don't you think? —Preceding unsigned comment added by 124.180.239.182 ( talk) 09:35, 20 July 2010 (UTC)
As I understand it, the mathematical expression given under the Bellman's Principle of Optimality section is incorrect. The claim is made that it is the same as (the RHS of, presumably) the previous equation, but I don't think that's true -- as it stands, it is actually a description of a greedy solution, not a solution involving the principle of optimal substructure. The reason being that the first term greedily selects the action at time 0 giving maximum payoff, when in fact a suboptimal action at this time may maximise the value overall by bringing the system into a different state at time 1 whose value (maximum possible discounted sum of payoffs), when added to this (suboptimal) time-0 payoff, is greater. Not sure what the correct expression would be I'm afraid. 130.123.96.22 ( talk) 03:18, 15 February 2011 (UTC)
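The distinction raised above (greedily maximizing the immediate payoff versus maximizing immediate payoff plus the value of the resulting state) can be made concrete with a two-action toy example. All numbers are hypothetical and chosen so the two criteria disagree.

```python
beta = 0.9

# Two actions from the initial state: action A pays 5 now but leads to a
# state worth 0; action B pays only 1 now but leads to a state worth 10.
payoff = {"A": 5.0, "B": 1.0}
next_value = {"A": 0.0, "B": 10.0}

# Greedy: look only at the immediate payoff.
greedy = max(payoff, key=payoff.get)
# Bellman right-hand side: immediate payoff plus discounted continuation value.
bellman = max(payoff, key=lambda a: payoff[a] + beta * next_value[a])

# greedy == "A" (5 > 1), but bellman == "B" (1 + 0.9 * 10 = 10 > 5).
```

So the max in the Bellman equation is not greedy: it is taken over the whole bracketed sum, which is exactly how a suboptimal immediate payoff can still win overall.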
I think the term "value function" deserves an article alone! Luizabpr ( talk) 19:07, 21 May 2011 (UTC)
There is something wrong with the equation defining the conditional probability, but I am not quite sure of the fix. Currently it says:
But this can't be right, since is a state, and so in general, the idea of is impossible to define. Perhaps the following was intended:
??? I dunno, please fix. linas ( talk) 17:00, 28 June 2012 (UTC)
Hello fellow Wikipedians,
I have just modified one external link on Bellman equation. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).
Cheers.— InternetArchiveBot ( Report bug) 11:00, 30 October 2016 (UTC)
Hello fellow Wikipedians,
I have just modified one external link on Bellman equation. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).
Cheers.— InternetArchiveBot ( Report bug) 15:19, 17 July 2017 (UTC)
Isn't that basically the main application of the Bellman equation? 98.156.185.48 ( talk) 02:06, 29 September 2023 (UTC)