This is the talk page for discussing improvements to the PageRank article. This is not a forum for general discussion of the article's subject.
Archives: Index, 1. Auto-archiving period: 30 days.
This article is rated B-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Google uses an automated web spider called Googlebot to actually count links and gather other information on web pages.
It is by no means clear that the counting is done by Googlebot. Counting links is not intuitively a spidering operation; it is more likely a feature of the database in which the spidering software stores its files. This statement therefore needs a citation to stay in the article. What parts of the Google infrastructure are called "Googlebot" is clearly up to Google; however, if the term is stretched too far, the description needs to be changed. All the best: Rich Farmbrough, 13:25, 11 August 2014 (UTC).
In the power method section, the first step of the derivation is:
If the matrix $\mathcal{M}$ is a transition probability, i.e., column-stochastic with no columns consisting of just zeros and $\mathbf{R}$ is a probability distribution (i.e., $|\mathbf{R}|=1$, $\mathbf{E}\mathbf{R}=\mathbf{1}$ where $\mathbf{E}$ is matrix of all ones), Eq. (**) is equivalent to
$$\mathbf{R} = \left( d \mathcal{M} + \frac{1-d}{N} \mathbf{E} \right)\mathbf{R} =: \widehat{ \mathcal{M}} \mathbf{R}. \quad (***)$$
And so on...
My question: is it necessary for $\mathcal{M}$ to have this property for this step of the derivation to hold? It seems that only the property that $\mathbf{R}$ is a probability distribution is required. — Preceding unsigned comment added by Mtjoul ( talk • contribs) 22:41, 21 December 2014 (UTC)
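Working through the step may help. This assumes Eq. (**) is $\mathbf{R} = d\mathcal{M}\mathbf{R} + \frac{1-d}{N}\mathbf{1}$; (**) itself is not quoted in this thread, so that form is an assumption. The first line uses only $\sum_i R_i = 1$; the second line shows where column-stochasticity enters, namely in making that normalization consistent with (**):

```latex
\begin{aligned}
&\mathbf{E}\mathbf{R} = \Big(\textstyle\sum_i R_i\Big)\mathbf{1} = \mathbf{1}
 \;\Longrightarrow\;
 \tfrac{1-d}{N}\,\mathbf{1} = \tfrac{1-d}{N}\,\mathbf{E}\mathbf{R}
 \;\Longrightarrow\;
 \mathbf{R} = \Big(d\mathcal{M} + \tfrac{1-d}{N}\mathbf{E}\Big)\mathbf{R}, \\
&\mathbf{1}^{\top}\mathbf{R} = d\,\mathbf{1}^{\top}\mathcal{M}\mathbf{R} + (1-d)
 \;\overset{\mathbf{1}^{\top}\mathcal{M} = \mathbf{1}^{\top}}{=}\;
 d\,\mathbf{1}^{\top}\mathbf{R} + (1-d)
 \;\Longrightarrow\;
 \mathbf{1}^{\top}\mathbf{R} = 1 .
\end{aligned}
```

So the substitution itself needs only the normalization of $\mathbf{R}$. Without a column-stochastic $\mathcal{M}$, however, a solution of (**) need not sum to 1, and the assumption $\mathbf{E}\mathbf{R}=\mathbf{1}$ could then fail.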
Hello fellow Wikipedians,
I have just added archive links to one external link on PageRank. Please take a moment to review my edit. You may add {{cbignore}} after the link to keep me from modifying it, if I keep adding bad data, but formatting bugs should be reported instead. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether, but this should be used only as a last resort. I made the following changes:
When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).
Cheers.— cyberbot II Talk to my owner:Online 05:05, 31 March 2016 (UTC)
In "Damping Factor", after the two formulas, it states, "The difference between them is that the PageRank values in the first formula sum to one, while in the second formula each PageRank is multiplied by N and the sum becomes N." However (unless I'm blind), from the first equation to the second, only the first part has been multiplied by N. (Otherwise you'd have 1 - d + Nd(stuff).) With the second equation given as-is, the sum of page ranks is rather trickier than "N". Given that the equation is already acknowledged to be wrong, it's probably not urgent, but hey. — Preceding unsigned comment added by 98.102.161.228 ( talk) 17:37, 22 August 2016 (UTC)
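A small numerical sketch of how the two formulas relate. It assumes they are PR(A) = (1 - d)/N + d·Σ PR(B)/L(B) and PR(A) = (1 - d) + d·Σ PR(B)/L(B) (the usual forms; neither is quoted on this page), and it uses a hypothetical 3-page graph in which every page has at least one outgoing link. Multiplying the whole first equation by N turns (1 - d)/N into 1 - d, while the d·Σ term is rescaled through the PageRank values themselves (N·PR(B) on the right-hand side). So, at least for a graph without dangling pages, N times the first formula's solution satisfies the second formula:

```python
import numpy as np


def pagerank_sum(links, d=0.85, scaled=False, iters=200):
    """Iterate PR(A) = c + d * sum(PR(B) / L(B)) over pages B that link to A.

    scaled=False: c = (1 - d) / N, start at 1/N  (first formula, sums to 1)
    scaled=True:  c = 1 - d,       start at 1    (second formula, sums to N)
    Assumes every page has at least one outgoing link (no dangling pages).
    """
    n = len(links)
    c = (1 - d) if scaled else (1 - d) / n
    pr = np.full(n, 1.0 if scaled else 1.0 / n)
    for _ in range(iters):
        new = np.full(n, c)
        for src, targets in links.items():
            for dst in targets:
                new[dst] += d * pr[src] / len(targets)  # PR(B) / L(B)
        pr = new
    return pr


# Hypothetical graph: page -> list of pages it links to.
links = {0: [1, 2], 1: [2], 2: [0]}

first = pagerank_sum(links)                # values sum to 1
second = pagerank_sum(links, scaled=True)  # values sum to N = 3
print(first.sum(), second.sum())
print(np.allclose(second, len(links) * first))  # True: second = N * first
```

With dangling pages, neither sum is preserved by these plain iterations, so the "sums to 1 / sums to N" statement depends on how such pages are handled.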
The article mentions 0.31 as the optimal value for d; however, I cannot find 0.31 anywhere in the cited paper. I can find a reference to an epsilon of 0.3 in Appendix 1.1, but I am not yet convinced this is equivalent to the damping factor d. Can someone clarify? -- José Devezas ( talk) 08:19, 19 December 2018 (UTC)
I also wonder whether the statement even makes sense. How can there be a single damping factor for all "biological data"? What application are we talking about? What is being modelled? I will remove the statement. If someone feels it adds value or explanatory power to the article, perhaps they can clarify it. GBMorris ( talk) 13:25, 9 March 2020 (UTC)
Everything links to A, which means A is an important page, and A links to C, so C is important as well.
```python
M = np.array([
    [1, 1, 1],  # * -> A
    [0, 0, 0],
    [1, 0, 0],  # A -> C
])
print(pagerank(M, 0.001, 0.85))
```
Output:
```
array([[0.61536926],
       [0.28131799],
       [0.10331275]])
```
B should be last, but it's second. Am I missing something?
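One possible cause (an assumption, since the article's `pagerank` function at the time is not reproduced on this page) is that the matrix above is not column-stochastic: A's column is [1, 0, 1] and sums to 2, whereas the power method assumes every column sums to 1. The sketch below uses a simplified, hypothetical `pagerank` (fixed number of iterations, uniform start) and divides each column by the page's out-degree first. With that normalization, B comes out last:

```python
import numpy as np


def pagerank(M, num_iterations=100, d=0.85):
    """Power iteration on M_hat = d*M + (1 - d)/N.

    Assumes M is column-stochastic: M[i, j] is the probability of
    following a link from page j to page i, so every column sums to 1.
    """
    N = M.shape[1]
    v = np.full(N, 1.0 / N)        # start from the uniform distribution
    M_hat = d * M + (1 - d) / N    # the scalar is added to every entry
    for _ in range(num_iterations):
        v = M_hat @ v
    return v


# Same link structure as in the comment: row i = destination, column j = source.
# Page order: A = 0, B = 1, C = 2.
links = np.array([
    [1, 1, 1],  # A, B and C all link to A
    [0, 0, 0],  # nothing links to B
    [1, 0, 0],  # A links to C
], dtype=float)

# Divide each column by its out-degree so that every column sums to 1.
M = links / links.sum(axis=0)

ranks = pagerank(M)
print(ranks)  # approximately [0.632, 0.05, 0.318]: B is now last
```

Since nothing links to B, its rank is just the teleportation share (1 - d)/N = 0.05.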
It's already difficult enough to understand as it is, so why not clarify the variable names and add comments, as in any good code? For example:
```python
def pagerank(matrix, num_iterations: int = 100, damping: float = 0.85):
    size = matrix.shape[1]  # number of pages; matrix is a NumPy array
    res_vector = [1/size] * size  # start from the uniform distribution
    # Adding a scalar adds (1 - damping) / size to every entry of the matrix.
    M_hat = (damping * matrix + (1 - damping) / size)
    for _ in range(num_iterations):
        res_vector = M_hat @ res_vector  # matrix multiplication
    return res_vector
```
Also, notice these lines in the article:
```python
v = np.random.rand(N, 1)
v = v / np.linalg.norm(v, 1)
```
They fill the starting vector with random numbers. However, as far as I understand, this vector is supposed to start with every element equal to `1/number_of_pages` (1/size), which could be rewritten as
```python
v = [1/N] * N
```
or, using the new variable names,
```python
res_vector = [1/size] * size
```
This is based on the YouTube video Linear Algebra – Introduction to PageRank (4:21 timestamp).
I'm unsure why it was written this way, so I'm not editing the page; however, if someone can confirm, the initial method seems "way too complicated" and there's no explanation for it. -- D.g.lab. ( talk) 16:31, 10 April 2021 (UTC)
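A sketch of why the starting vector should not change the result, using a hypothetical 4-page column-stochastic matrix. Because M_hat = d·M + (1 - d)/N has all positive entries and columns summing to 1, the Perron–Frobenius theorem gives it a unique stationary probability vector, and power iteration converges to it from any starting probability vector. A random start (in the spirit of the article's `np.random.rand` line) and the uniform 1/N start should therefore reach the same ranks; the uniform start is simply the more obvious choice:

```python
import numpy as np

# Hypothetical column-stochastic matrix: column j holds page j's outlinks.
M = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [1/3, 0.0, 0.0, 0.5],
    [1/3, 0.5, 0.0, 0.5],
    [1/3, 0.0, 1.0, 0.0],
])
N = M.shape[1]
d = 0.85
M_hat = d * M + (1 - d) / N  # positive entries, every column sums to 1


def power_iteration(M_hat, v, num_iterations=200):
    """Repeatedly multiply v by M_hat."""
    for _ in range(num_iterations):
        v = M_hat @ v
    return v


# Uniform start: every page begins with 1/N.
v_uniform = np.full(N, 1 / N)

# Random start, normalised so that its entries sum to 1.
rng = np.random.default_rng(0)
v_random = rng.random(N)
v_random = v_random / np.linalg.norm(v_random, 1)

r_uniform = power_iteration(M_hat, v_uniform)
r_random = power_iteration(M_hat, v_random)
print(np.allclose(r_uniform, r_random))  # True
```

The starting vector mainly affects how many iterations are needed to get close to the answer, not the answer itself.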
😐site:example.com 2A00:F41:4828:36B6:0:4B:5D7E:4801 ( talk) 23:52, 24 February 2022 (UTC)
The article states: 'PageRank continues to be the basis of Google's....'[25] Reference 25 is over 10 years old, so it does not prove anything. 213.247.64.239 ( talk) 14:32, 18 January 2023 (UTC)
The redirect Supplemental Result has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2024 April 25 § Supplemental Result until a consensus is reached. Utopes ( talk / cont) 05:12, 25 April 2024 (UTC)