This is the talk page for discussing improvements to the Thompson sampling article. This is not a forum for general discussion of the article's subject.
This article is rated Start-class on Wikipedia's content assessment scale.
I would like to suggest an additional reference for the Thompson sampling page, to be added to the existing references at the end of the second history sentence, which reads: "It was subsequently rediscovered numerous times independently in the context of reinforcement learning."
The new reference pre-dates the references already given, which may make it of interest to Wikipedia readers (the new reference has date 1994, while the earliest one there now is for 1997). Our work, like the others cited, was unaware of the earlier (1933!) Thompson reference.
Because it is my own publication, I am not editing the page directly, but merely making a suggestion here that it might be of interest and worth doing. (I don't fully understand all of the Wikipedia COI guidelines, but this seems about right...)
There is a link to the publication here: (Link to Rivest-Yin paper on Simulation results for a new two-armed bandit heuristic) This is not the link I would propose to insert, but only so an editor can see the bib file and the context a bit more fully.
I do not have a link to an online copy of this article, other than the one I have posted on my own web site. A possible citation to be added to the Wikipedia page might look like this: Simulation Results for a new two-armed bandit heuristic. Ronald L. Rivest and Yiqun Yin. Proceedings of a workshop on Computational Learning Theory and Natural Learning Systems (Princeton, New Jersey, 1994), pp. 477–486.
Ronald L. Rivest (talk) 23:16, 5 August 2013 (UTC)
The section "Relationship to other approaches > Probability matching" briefly describes probability matching, but does not explain how Thompson sampling relates to it. It's quite confusing.
66.29.243.106 (talk) 14:54, 8 September 2014 (UTC)
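The relationship is that Thompson sampling is itself a probability-matching rule: it selects each arm with the posterior probability that that arm is optimal. A small simulation makes this concrete (my own illustrative sketch, not from the article; the Beta(3, 2) and Beta(2, 3) posteriors are arbitrary assumptions):

```python
# Sketch: Thompson sampling picks each arm with the posterior probability
# that it is the optimal arm -- i.e. it implements probability matching.
import random

random.seed(1)
N = 100_000

# Assumed Beta posteriors for two Bernoulli arms (arbitrary choice):
# arm 0 ~ Beta(3, 2), arm 1 ~ Beta(2, 3).
def thompson_pick():
    # Draw one sample from each posterior; pick the arm with the larger draw.
    return 0 if random.betavariate(3, 2) > random.betavariate(2, 3) else 1

# Fraction of rounds in which Thompson sampling selects arm 0.
p_pick = sum(thompson_pick() == 0 for _ in range(N)) / N

# Independent Monte Carlo estimate of P(arm 0 is optimal) under the posterior.
p_opt = sum(random.betavariate(3, 2) > random.betavariate(2, 3)
            for _ in range(N)) / N

print(f"P(pick arm 0) = {p_pick:.3f}, P(arm 0 optimal) = {p_opt:.3f}")
```

The two fractions agree up to Monte Carlo noise, since "arm 0's draw is largest" is exactly the event "arm 0 is optimal" under one posterior sample. A sentence to that effect in the section would resolve the confusion.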
The article cites Wyatt's thesis as "a first proof of convergence for the bandit case". If this refers to the proof in sec. 4.3, pp. 61–63 of that source, then I believe the sentence should be removed, because the proof is incorrect. The gist of the proof is that our estimates of per-arm reward converge to the true values as the number of samples on each arm tends to infinity; hence we need only show that the number of samples on each arm tends to infinity. This is "proved" by showing that the probability of sampling a given arm is strictly positive at each step. But that does not imply the arm is sampled infinitely often: see the first Borel–Cantelli lemma, for instance.
Gostevehoward (talk) 16:22, 8 April 2017 (UTC)
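The Borel–Cantelli point can be made concrete with a small simulation (my own sketch; the schedule p_n = 1/n² is an arbitrary illustration of "always strictly positive", not the actual Thompson sampling selection probabilities):

```python
# Sketch: the per-step selection probability p_n = 1/n^2 is strictly
# positive at every step, yet sum p_n = pi^2/6 < infinity, so by the
# first Borel-Cantelli lemma the arm is almost surely selected only
# finitely often -- positivity alone does not give infinite sampling.
import random

random.seed(0)
TRIALS, STEPS = 200, 10_000

counts = []
for _ in range(TRIALS):
    pulls = 0
    for n in range(1, STEPS + 1):
        if random.random() < 1.0 / n**2:   # positive probability every step
            pulls += 1
    counts.append(pulls)

# The expected number of pulls is bounded by pi^2/6 ~ 1.645,
# no matter how many steps we run.
avg = sum(counts) / TRIALS
print(f"average number of pulls over {STEPS} steps: {avg:.2f}")
```

The average stays near 1.6 however large STEPS is made, so a correct proof would need to show the selection probabilities do not shrink summably fast, not merely that they are positive.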
http://auai.org/uai2016/proceedings/papers/20.pdf
2001:700:1200:5118:6D4F:98C0:7303:A490 (talk) 13:25, 6 September 2018 (UTC)