This is the talk page for discussing improvements to the Reinforcement learning article. This is not a forum for general discussion of the article's subject.
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Is $R=\sum_t \gamma^t r_t$, or …, or …?
Answer: It is …
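The first candidate formula, $R=\sum_t \gamma^t r_t$, can be computed as in the minimal sketch below, assuming a finite list of rewards indexed from $t = 0$ and a discount factor $0 \le \gamma \le 1$. Texts differ on indexing (Sutton and Barto write the return from time $t$ as $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots$), which shifts the index but not the computation. The function name `discounted_return` is illustrative only.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t], with t starting at 0 (assumed convention)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last reward backwards using G_t = r_t + gamma * G_{t+1},
    # which gives the same sum without computing powers of gamma explicitly.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, `discounted_return([1.0, 1.0, 1.0], 0.5)` evaluates to 1 + 0.5 + 0.25 = 1.75.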
What exactly is a policy? The Sutton-Barto book is very vague on this point, and so is this article. In both cases the word is used without much explanation.
According to both the book and the article, a policy is a mapping from states to action probabilities. Fine. But this is not elaborated upon. What does a policy look like? I infer that it must be a table (2-D array), indexed by state and action, and containing probabilities, say $p_{ij}$ for the i-th state and j-th action, each $p_{ij}$ being a transition probability for the MDP. If so, what is its relation to the values derived from rewards? That is, where exactly do the probabilities $p_{ij}$ come from? How does one generate a policy table starting from values?
Sorry if I appear stupid, but I've been studying the book and I find it very difficult to comprehend, even though the maths is very simple (almost too simple). Or maybe it's in there somewhere but I've missed it?
-- 84.9.83.127 09:36, 18 November 2006 (UTC)
I hope the new version explains what a policy might mean. In fact, the term has multiple meanings and is used somewhat inconsistently in the literature. Szepi (talk) 03:11, 7 September 2010 (UTC)
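One common reading of "policy" in the tabular setting is sketched below, under stated assumptions: a table `policy[s][a]` holding the probability of choosing action `a` in state `s`, with each row summing to 1. These are action-selection probabilities, not the MDP's transition probabilities $P(s' \mid s, a)$, which belong to the environment. A policy table can be derived from a table of action values `q[s][a]`, for example greedily or with a softmax (Boltzmann) rule; the function names and the choice of these two rules are illustrative assumptions, not definitions taken from the article.

```python
import math
import random
from typing import List

Table = List[List[float]]  # indexed as table[state][action]


def greedy_policy(q: Table) -> Table:
    """Deterministic policy: probability 1 on a highest-valued action (ties go to the lowest index)."""
    policy = []
    for row in q:
        best = max(range(len(row)), key=lambda a: row[a])
        policy.append([1.0 if a == best else 0.0 for a in range(len(row))])
    return policy


def softmax_policy(q: Table, temperature: float = 1.0) -> Table:
    """Stochastic (Boltzmann) policy: pi(a|s) proportional to exp(Q(s, a) / temperature)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    policy = []
    for row in q:
        m = max(row)  # subtract the row maximum for numerical stability
        weights = [math.exp((v - m) / temperature) for v in row]
        z = sum(weights)
        policy.append([w / z for w in weights])
    return policy


def sample_action(policy: Table, state: int, rng: random.Random) -> int:
    """Draw an action index for `state` according to the probabilities in policy[state]."""
    return rng.choices(range(len(policy[state])), weights=policy[state])[0]
```

For `q = [[1.0, 2.0], [0.5, 0.5]]`, `greedy_policy(q)` gives `[[0.0, 1.0], [1.0, 0.0]]`, and the second row of `softmax_policy(q)` is `[0.5, 0.5]` because both actions have equal values. Lowering the temperature pushes the softmax policy toward the greedy one.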
There is a short article on Q-learning that could be merged with Reinforcement learning. Kpmiyapuram 14:23, 24 April 2007 (UTC)
Szepi (talk) 03:20, 7 September 2010 (UTC)
Where's all the stuff about learning in games? It would be great if someone could incorporate this. Jeremy Tobacman 23:40, 1 August 2007 (UTC)
This article starts with a reference to 'Reinforcement learning' in psychology. Isn't there an article about that? -- Rinconsoleao 13:43, 27 September 2007 (UTC)
I feel the literature referenced by Csaba Szepesvári was a useful addition and perhaps should not have been removed. Even though he referenced a book he wrote himself, he is a well-known and respected researcher in reinforcement learning, and the book is a useful overview of the field. I do not know of many good recent alternatives, so I would favor reverting MrOllie's revision. However, rather than doing so immediately, I thought it better to start a discussion.
Chaosdruid (talk) 05:03, 6 March 2011 (UTC)
'The theory of small mdps is [..] mature; [..] the theory of large mdps needs more work.'
What does that even mean? Theory is theory: if you understand an MDP with 10 states, then you understand one with ten million states. Standard algorithms may run too slowly on the larger one, but I can't see the conceptual difference between ten and ten million as far as theory is concerned.
Does the author mean either (a) that small means finite and large means countably or uncountably infinite, or (b) that approximation methods (which are themselves only useful when direct methods fail) are not as well understood?
— Preceding unsigned comment added by 157.193.140.25 ( talk) 09:21, 26 August 2011 (UTC)
I only use "small" in the context of finite MDPs. "Theory of small, finite MDPs" means theoretical results concerning algorithms whose complexity scales at least linearly with the size of the state-action space. I think this is intuitive, but I would welcome any alternative suggestions. I realize this could be misunderstood (someone might think that "small" means 10 or 100s of states, though I did not think this would be likely).
Szepi (talk) 15:23, 16 September 2011 (UTC)
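To make "complexity scales with the size of the state-action space" concrete, here is a minimal value-iteration sketch for a finite MDP stored as explicit tables. The encoding `transitions[s][a]` as a list of `(probability, next_state, reward)` triples is a hypothetical choice for illustration, and value iteration is just one example of a tabular method. Each sweep visits every stored triple once, so the cost per sweep grows linearly with the size of the table; this is what becomes impractical when the state space is huge or infinite.

```python
from typing import List, Tuple

# Hypothetical encoding: transitions[s][a] is a list of (probability, next_state, reward) triples.
MDP = List[List[List[Tuple[float, int, float]]]]


def value_iteration(transitions: MDP, gamma: float, tol: float = 1e-8,
                    max_iters: int = 10_000) -> List[float]:
    """Apply the Bellman optimality backup to every state until values change by less than tol."""
    n = len(transitions)
    v = [0.0] * n
    for _ in range(max_iters):
        # One sweep: for each state, take the best expected one-step backup over its actions.
        new_v = [
            max(sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
                for outcomes in transitions[s])
            for s in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v
```

A single state with one action that loops back with reward 1 under $\gamma = 0.9$ has value $1/(1-0.9) = 10$, and `value_iteration([[[(1.0, 0, 1.0)]]], 0.9)` returns a value within about `1e-7` of 10.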
Can someone make a sub-category for machine learning, maybe? -- 77.4.90.71 (talk) 16:35, 1 November 2011 (UTC)
The whole article is a subcategory of machine learning. Perhaps you are looking for practical applications or tools? Or I'm just not sure what you mean. Krehel (talk) 00:11, 24 September 2018 (UTC)
The table comparing algorithms is just plain wrong:
I'm not quite sure how to reorganise this table without it becoming monstrous in size. In its present state, however, it is highly confusing and misleading. I'd say it would be better to remove it than to keep it as it is. LordDamorcro (talk) 18:39, 5 July 2021 (UTC)
The following Wikimedia Commons file used on this page or its Wikidata item has been nominated for deletion:
Participate in the deletion discussion at the nomination page. — Community Tech bot ( talk) 20:25, 11 September 2021 (UTC)
Wikipedia does not exist only to inform people who are already experts in the subject matter, nor is it a forum for authors to demonstrate their knowledge or show off their technical grasp to others in their field. Articles in Wikipedia are supposed to EXPLAIN things. This means breaking down jargon. It means setting out topics in a way that makes them approachable for people not already well read in the field.
Too many Wikipedia articles, including this one, are written by people incapable of understanding this extremely obvious perspective. The purpose is not to compose some canonical description of the field in the most compact, concise or dense language possible. It is the opposite. Many authors here are academics, but it seems clear that many would struggle to teach a class anything at all. 49.180.205.46 (talk) 10:20, 9 September 2022 (UTC)
Related literature about the effect of academic pressure. 209.35.172.23 (talk) 07:02, 20 April 2023 (UTC)
It would be good to have a section on the applications of RL on this page, e.g. robotics, self-driving cars and games (AlphaGo). I haven't done any major writing on Wikipedia and am not sure if I can just add one. Amitkannan (talk) 07:16, 26 September 2023 (UTC)