Fast inverse square root is a former featured article candidate. Please view the links under Article milestones below to see why the nomination failed. For older candidates, please check the archive.
Fast inverse square root has been listed as one of the Mathematics good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.
This article is rated GA-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
I have rewritten a significant fraction of the article, and shifted some other pieces around. The main reason for doing this is that I felt that, as it was, the article had too many long, hard-to-read equations which obscured the essential point of the algorithm: namely, that aliasing an int to a float is a quick and dirty approximation of its logarithm. From this, everything else derives trivially. I tried to give this point a more central place in the article, and also provided a graph comparing Ix with a scaled and shifted log2(x). The other significant changes I made are:
— Edgar.bonet ( talk) 12:05, 25 July 2014 (UTC)
There are two figures in this article that do not have axis labels, and no description of the axes is given in the figure captions. As such, they are essentially useless because you cannot tell what they mean. They should be corrected or removed, and the perpetrators should be rapped across the knuckles, then tarred and feathered for their shoddy work. http://xkcd.com/833/
128.174.127.111 ( talk) 15:21, 30 September 2014 (UTC)
I committed the first of these graphs, and I am aware of the missing axis labels. There are two curves in the graph, as identified by the key in the top-left quadrant:
I did not label the y-axis because it can represent two different things. I could put a label like “Ix or L log2(x) + L (B − σ)”, but I really do not believe this would have any usefulness. As for the x-axis, the appropriate label would be just “x”, as this is the name given to the independent variable in the two expressions above. Would such a label be useful? — Edgar.bonet ( talk) 12:53, 1 October 2014 (UTC)
This article goes into considerable depth on optimizing the "magic constant", but ignores the other two constants. Jan Kadlec wrote a program to see if he could optimize all three. After an exhaustive search:
float inv_sqrt(float x)
{
    union { float f; uint32_t u; } y = {x};
    y.u = 0x5F3759DFul - (y.u >> 1);
    return 0.5f * y.f * (3.0f - x * y.f * y.f);
}
becomes:
float inv_sqrt(float x)
{
    union { float f; uint32_t u; } y = {x};
    y.u = 0x5F1FFFF9ul - (y.u >> 1);
    return 0.703952253f * y.f * (2.38924456f - x * y.f * y.f);
}
With all three parameters optimized, you can get a minimax error of 0.0006501967. All the details can be found here. — Preceding unsigned comment added by Slacka123 ( talk • contribs) 06:02, 7 April 2015 (UTC) ; edited 09:26, 7 April 2015 (UTC)
I revised my document (referenced in this article), http://www.geometrictools.com/Documentation/FastInverseSqrt.pdf, which now describes a minimax algorithm that almost generates the magic number. By "almost" I mean: whoever wrote the original code probably estimated the minimax value using sampling. The code's magic number and the value I computed by implementing the minimax algorithm are nearly the same number. The algorithm is more general in that the initial guess to Newton's method is y0 = a + b*x. The first iterate is y1 = y0*(3-x*(a + b*x)^2)/2, which can be thought of as a function of 'a' and 'b'. For each (a,b), the relative error in the approximation is E(a,b) = max_{x in [1,2)}|sqrt(x)*y1-1|. If you minimize this for all a > 0 and b < 0, you obtain an error bound about half that produced by the Quake3 code, but the implementation is slower (though still faster than 1/sqrt(x)). If you constrain the problem by choosing b = -1/4 and compute the minimum of E(a,-1/4), you get an 'a' that generates a magic number nearly the same as the Quake3 code, but the global error bound is slightly smaller (which suggests the code's author estimated the minimizer for 'a'). Lomont's document attempts to explain with a minimax algorithm, but for y0 fitted to 1/sqrt(x); however, the goal is for y1 to be a good estimate. Just thought I'd mention this so that whoever likes updating the main page can feel free to do so. Daveeberly ( talk) 20:26, 9 May 2015 (UTC)
In the 1990 Graphics Gems I book, Paul Lalonde and Robert Dawson of Dalhousie University, Halifax, Nova Scotia share code for the fast sqrt and inverse sqrt. The method they describe is more verbose but effectively the same as the modern code, though they don't use the exact optimized magic constant. Clearly the constant is just a refinement of the basic structure they describe. https://imcs.dvfu.ru/lib.int/docs/Programming/Graphics/Graphics%20Gems%20I.pdf See the chapter "A High Speed, Low Precision Square Root".
http://www.realtimerendering.com/resources/GraphicsGems/gemsiii/sqrt.c — Preceding unsigned comment added by 76.102.116.190 ( talk) 02:31, 9 October 2016 (UTC)
In the section "Overview of the code", there is the expression:
At the time, the general method to compute the inverse square root was to calculate an approximation for 1/√x, then revise that approximation via another method until it came within an acceptable error range of the actual result.
The phrase "at the time" seems to refer to the method used before the fast inverse square root. However, the description also matches the fast method, which just uses a special way to calculate an approximation for 1/√x. I think the older method was to calculate the multiplicative inverse and then the square root of it (or in the reverse order). This should be clarified in the article. Gjhdvhd ( talk) 09:44, 23 October 2016 (UTC)
This article starts motivating the algorithm by stating
At the time, it was generally computationally expensive to compute the reciprocal of a floating-point number
This suggests that the point of the algorithm is to avoid the cost of computing the inverse (of the square root). However, it rather seems that the point of the algorithm is to avoid having more than one Newton iteration. — Preceding unsigned comment added by 79.210.119.29 ( talk) 02:21, 28 April 2017 (UTC)
Is there a similar method to calculate normal square root? — Preceding unsigned comment added by 2A01:119F:21D:7900:A5CD:B2BA:D621:91CF ( talk) 09:38, 2 December 2017 (UTC)
float one = 1.0f;
uint32_t half_one_bits = (*(uint32_t *)&one) >> 1;
uint32_t temp = ((*(uint32_t *)&x) >> 1) + half_one_bits;
float sqrt_approx = *(float *)&temp;
I'd like to propose protecting this page under "Pending changes" ( WP:PEND). Looking at the edit history, almost every "Undo" for the past year is reverting someone's removal of the profanity in the algorithm's code comments. Further, many of the undone edits are by new or unregistered users. Ignoring these, I see about 2 accepted edits per month, which seems infrequent. The page protection guidelines state that WP:PEND is appropriate for "[i]nfrequently edited articles with high levels of vandalism ... from unregistered and new users", which seems appropriate for this page. I realize the profanity edits are made in good faith and aren't actual vandalism, but they're still disruptive. If there is some way to protect just the code block, that would be better, but I don't know of one. Decka7 ( talk) 15:55, 20 November 2018 (UTC)
Under accuracy it claims a worst-case error of 1.7%. When I run the algorithm I get a worst-case relative error of 0.00174915, i.e. 0.17%. Has the article simply shifted the decimal place? It would be nice if someone else could double-check.
Bill W102102 ( talk) 09:57, 12 March 2019 (UTC)
Thank you Bill W102102 ( talk) 10:40, 13 March 2019 (UTC)
Is this the only Wikipedia article that's illustrated by a picture of a person who definitely did not do the work? How about deleting the picture? Zaslav ( talk) 23:49, 5 June 2019 (UTC)
Agreed, it's nonsense to have Carmack's mugshot here. He already has his own Wikipedia page, and the image makes good sense there, but here it is just noise. — Preceding unsigned comment added by 213.127.87.1 ( talk) 10:02, 14 November 2019 (UTC)
The "Moroz 2018." citation reference doesn't seem to link properly, but I'm not sure why. — Fishrock123 ( talk) 18:59, 29 January 2020 (UTC)
Is the comment present in the original code? Is the profanity merely gratuitous? Is its presence in this article necessary? 87.75.117.183 ( talk) 18:36, 10 January 2021 (UTC)
@ David Eppstein: Wikipedia:Offensive material says "Material that would be considered vulgar or obscene by typical Wikipedia readers should be used if and only if its omission would cause the article to be less informative, relevant, or accurate, and no equally suitable alternative is available". I propose for debate that the article should state that a swear word has been edited and that the use of the f-bomb within the material should be replaced by f***. — Quantling ( talk | contribs) 18:26, 17 April 2021 (UTC)
As pointed out in the article, the constant 0x5f3759df corresponds to σ = 0.0450466. But the article also clearly states, just prior, that the optimum σ as calculated by the uniform-norm definition is 0.0430357, which implies a constant of 0x5f37bcb6 or 0x5f37bcb5. There is a fair deviation here; however insignificant it may be, the article offers no explanation for the inconsistency.
As a side note, I'd like to see the derivation of either number from the uniform norm. I attempted to find my own optimum with the minimal-square-error approach, and the error is actually larger. Some explanation of why the uniform norm is the best one, or at least the better one, would be reasonable for this article. 50.34.41.50 ( talk) 16:55, 8 May 2021 (UTC)
The inverse of the square root function is surely the square. The function in this article might be called the "Reciprocal of the Square Root" but "inverse square root" is used elsewhere. -- 82.3.131.192 ( talk) 18:02, 24 May 2021 (UTC)
Your revert reintroduces a grammatical error, as well as the claim that "The following code follows C standards" while the code given still used illegal unions; I had restructured the text to make this clearer. It also reintroduces the error that memcpy would be slower or more memory-inefficient, which is not true, and the error that an extra variable would incur additional cost, which was at least misleading.
If you are of the opinion that there is too much bloat then we should remove the sample code with the union, since that is not actually an improvement over the original code of Quake III Arena.
While the YouTube video is not a reliable source, the author is a well-known expert who has published best-practices books on C++, and the same information as in the source can be found on Wikipedia itself under Type punning. I encourage you to find a better link instead of blanket-reverting 5 different constructive edits. Kwinzman ( talk) 08:41, 22 October 2021 (UTC)
I've removed the reference to a similar approximation developed simultaneously with Kahan and Ng's in 1986, because I can't find any reference that actually discusses it. I also cannot verify that the reference cited in the article (A New Generation of Hardware and Software for the FPS T Series) supports the claim in the text.
There is a modern mention by Kinney in a submission to a conference. The submission was not accepted, but more importantly for us, if you read the section, it doesn't say anything more than "In 1986, one of the authors of this paper originally invented this method, which was called “The Kinney Method,” to implement vector square root for the production FPS T Series Hypercube Supercomputer (Gustafson, 1986). William Kahan and K.C. Ng at Berkeley also independently discovered this around 1986." The Gustafson 1986 cite is to a white paper which doesn't mention square roots (reciprocal or otherwise) or the use of floating-point / integer storage to gain an approximate logarithm. I am not willing to say I've exhausted all the published literature, but I've made a good-faith effort to find a contemporary account of it and I can't find any.
Accordingly I have removed the material. If you want to restore it please consider posting as to why in this discussion. Thank you. Protonk ( talk) 19:42, 17 January 2022 (UTC)
Using a union doesn't remove the undefined behaviour from the code. The trick of using unions for type punning is commonly used; nearly all C programmers use it, but the C standard defines it as undefined behaviour: for unions, only reading from the same member you wrote before is defined. Also, the space required to store a union equals the largest size among its member variables/structs. Writing to one member and reading from another is NOT defined; we only assume the compiler is clever and doesn't do any tricks with it. There is no need for the compiler to do so, as it would hurt efficiency, so in practice the trick works... But it is NOT defined by the C standard, so it doesn't remove the undefined behaviour from the code. 2A00:102A:5004:A4B6:F584:6CC9:A4F0:4CE6 ( talk) 09:51, 26 January 2022 (UTC)
Here /info/en/?search=Fast_inverse_square_root#Zero_finding about Halley's method you can read that it converges cubically while Newton's method converges quadratically. But each Halley iteration needs one division operation, so basically one Halley iteration costs about two Newton iterations... Say the first guess is chosen with precision 0.001. Then with Newton's method, after two iterations the precision will be (0.001^2)^2 = 0.000000000001, while with Halley's method, after one iteration (which takes about as long as two Newton iterations) the precision will be 0.001^3 = 0.000000001. Also, I think that what is written here — "Subsequent additions by hardware manufacturers have made this algorithm redundant for the most part. For example, on x86, Intel introduced the SSE instruction rsqrtss in 1999. In a 2009 benchmark on the Intel Core 2, this instruction took 0.85ns per float compared to 3.54ns for the fast inverse square root algorithm, and had less error." — is comparing SSE single precision with x87 FPU double precision. But I recently realised, from /info/en/?search=Division_algorithm#Newton–Raphson_division, that a division operation needs about the same time as calculating an inverse square root with Newton's method. So basically it's very unclear where the ~4x speed of this Intel SSE instruction comes from, if it uses Halley's method. Paraboloid ( talk) 09:58, 10 May 2024 (UTC)