This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Archive 1 | Archive 2 | Archive 3 | Archive 4
03:33, 26 October 2005 Oleg Alexandrov m (I don't find the octave code particularly relevant;)
It's very relevant. Actually calculating this is interesting, not just knowing how to calculate it with pen and paper. -- marvinXP ( talk)
The normal equation forms the matrix product of A-transpose and A. This forms a new, square matrix C. This new square matrix (C) is a Normal matrix. A Normal matrix has the property that its product with its transpose is the same whether pre- or post-multiplied. This makes the matrix symmetric and at least positive semidefinite (usually positive definite).
It is easy to confuse the forms for the normal equation and the normal matrix if both refer to a generic matrix using the same symbol 'A'. It is the symmetric positive-semidefinite property (and its consequences) that is 'Normal' (see the sketch below).
Philip Oakley ( 84.13.249.47) 22:37, 12 April 2006 (UTC)
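A minimal sketch of these properties, assuming NumPy and an arbitrary made-up matrix A (nothing here is from the article itself):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # any real m-by-n matrix

C = A.T @ A                  # the normal-equations matrix

print(np.allclose(C, C.T))                       # True: C is symmetric
print(np.allclose(C @ C.T, C.T @ C))             # True: C commutes with its transpose (normal)
print(np.all(np.linalg.eigvalsh(C) >= -1e-12))   # True: C is positive semidefinite
```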
In that this is an encyclopedia, i.e. a place where people go to understand concepts they currently don't, I feel that the "explanation" of this term is utterly too complex and filled with mathematical jargon. Not that it doesn't belong, but there needs to be introductory material geared towards newcomers to this material. If I wanted a mathematical proof or advanced applications I would probably consult a book on statistics.
As an example of a better and more appropriate introduction geared towards mathematical newcomers, the wikipedia entry for "Method of least squares" strikes me as well written and clear. —The preceding unsigned comment was added by 161.55.228.176 ( talk) 18:17, 28 September 2006 (UTC).
I couldn't agree more with the immediately foregoing comments. One thing that would help greatly IMHO would be to use standard notation, consistent with the usual applications of these ideas, as shown in
Regression analysis. It is usual to refer to the data or inputs as the X matrix, the unknown parameters as Beta or B, and the dependent or outputs as Y. Accordingly, the normal equations should be expressed in a format like:
$$ {}_{n}Y_{1} \;=\; {}_{n}X_{k}\;{}_{k}B_{1} \quad\Longleftrightarrow\quad {}_{k}B_{1} \;=\; {}_{k}\!\left(X'X\right)^{-}_{k}\;{}_{k}X'_{n}\;{}_{n}Y_{1} $$

where the left and right subscripts are the row and column dimensions, identified to show how they must match, and the superscript − indicates a generalized inverse. -- Mbhiii 18:48, 27 October 2006 (UTC)
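A minimal sketch of this formula, assuming NumPy; the data are invented for illustration, and `pinv` stands in for the generalized inverse:

```python
import numpy as np

# n = 4 observations, k = 2 parameters (intercept and slope)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([6.0, 5.0, 7.0, 10.0])

# B = (X'X)^- X'Y with the Moore-Penrose generalized inverse
B = np.linalg.pinv(X.T @ X) @ X.T @ Y
print(B)   # the k estimated parameters
```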
Internal consistency should not be the only criterion; external consistency, as much as possible with the dominant practical uses of the day, should be very important as well, because it increases recognizability and, therefore, the general utility of the article. The intersubjective standard of reference that Wikipedia represents, and is slowly raising, requires no less. This argues for the X matrix form. I know the subscript notation is new, but it's very useful. It's a revision of Einstein's matrix notation. (He used one subscript and one superscript per matrix, which conflicted with exponents.) Though very gifted in abstract visualization, he was not always the best student, and non-commutative matrix multiplication gave him headaches in long calculations. This cleaned-up version of his notation is gaining acceptance among those teaching matrix algebra and its applications. Simply identify the left subscript as the number of rows and the right one as the number of columns, and all is clear. -- Mbhiii 14:28, 31 October 2006 (UTC)
An X matrix form for the normal equations increases the article's external consistency. Revised Einstein matrix notation is for clarity, an aid to people like students or programmers who don't necessarily use matrix algebra all the time. By the way, Einstein matrix notation is not Einstein summation notation. -- Mbhiii 13:55, 1 November 2006 (UTC)
Zvika, perhaps I am overexplaining, but even with a master's in physics I didn't see the equality of the middle terms right away. It seemed useful - to me, and probably to some other users as well - to explain that step in some more detail. Pallas44 13:06, 16 November 2006 (UTC)
Could someone please explain in detail how:
Thanks, appzter 23:14, 15 January 2007 (UTC)
Hello, I have a problem understanding one aspect of this.
On the previous page, after the multiplication of $(A\mathbf{x}-\mathbf{b})^T(A\mathbf{x}-\mathbf{b})$,
it says "The two middle terms $\mathbf{b}^T(A\mathbf{x})$ and $(A\mathbf{x})^T\mathbf{b}$ are equal".
I think this is incorrect, as
the two middle terms satisfy $\mathbf{b}^T(A\mathbf{x}) = \left((A\mathbf{x})^T\mathbf{b}\right)^T$. Can someone please clarify this? If I am correct, the derivation would be different.
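For what it's worth, the usual resolution of this step (a one-line argument from standard matrix algebra, not taken from the archived article) is that both middle terms are scalars, and a $1 \times 1$ matrix equals its own transpose:

$$\mathbf{b}^T(A\mathbf{x}) = \left((A\mathbf{x})^T\mathbf{b}\right)^T = (A\mathbf{x})^T\mathbf{b}, \qquad \text{since } \mathbf{b}^T(A\mathbf{x}) \in \mathbb{R}.$$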
Thanks! That is what I was asking about!
I was redirected from Normal equations to this page (Linear least squares), but the meaning of the normal equations is not explained or defined here. Should the redirection be canceled? Kotecky ( talk) 14:50, 11 January 2008 (UTC)
Please see talk:least squares#A major proposal for details, which include a proposed extensive revision of this article. Petergans ( talk) 17:32, 22 January 2008 (UTC)
The contents of this page have been replaced. For discussion see talk:least squares#A major proposal Petergans ( talk) 09:22, 30 January 2008 (UTC)
I am very unhappy with the recent rewrite of the article. What used to be a simple topic about an overdetermined linear system and some calculus became a full-blown theoretical article on fitting experimental data. While it is the latter where least squares is most used, as the article now stands it is incomprehensible except to the specialist.
Ideally the first part of the article would be the original article, with the more complex material later. If the author of the rewrite is not willing to do that, however, I propose a wholesale revert. That may lose valuable information and the insights of a specialist, but it is better than the current incomprehensible thing. Oleg Alexandrov ( talk) 15:51, 31 January 2008 (UTC)
which contain more technical details, but it has sufficient detail to stand on its own.
In addition, the Gauss-Newton algorithm article has been revised. The earlier article contained a serious error regarding the validity of setting second derivatives to zero. Points to notice include:
This completes the first phase of restructuring of the topic of least squares analysis. From now on I envisage only minor revision of related articles. May I suggest that comments relating to more than one article be posted on talk: least squares and that comments relating to a specific article be posted on the talk page of that article. This note is being posted on all four talk pages and Wikipedia talk:WikiProject Mathematics.
Petergans ( talk) 09:43, 8 February 2008 (UTC)
I had moved QR and SVD to non-linear least squares before reading the discussion above. I apologise for that. I had not realized that ill-conditioning was as important in linear least squares as this discussion would seem to indicate that it is. Where do you folks suggest that they should be, bearing in mind that we don't want to repeat too much material in the linear and non-linear articles? I'm not in favour of creating a separate article. It's a pity that WP does not have a "small print" option for the more technical bits.
I think that it is necessary to distinguish two types of ill-conditioning: intrinsic and non-intrinsic. An example of intrinsic ill-conditioning comes with fitting high-order polynomials, where the normal equations matrix is a Vandermonde matrix, which is intrinsically ill-conditioned for large n (see the sketch below this comment). In that case the remedy is to re-cast the problem in terms of fitting with orthogonal polynomials. With the example of 50 basis functions, is it sensible to determine all of them simultaneously? Surely a lot of them are "known" from simpler systems?
In non-intrinsic cases, either the model is inadequate or the data don't define the parameters, or both. In my opinion it is futile to seek a mathematical remedy for this kind of problem. If the data cannot be improved the parameters of that model cannot be determined. Petergans ( talk) 16:24, 11 February 2008 (UTC)
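A minimal sketch of the intrinsic case mentioned above, assuming NumPy and invented sample points: the condition number of the monomial (Vandermonde) basis grows rapidly with the degree, while an orthogonal-polynomial basis on the same points behaves far better:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
for n in (5, 10, 15):
    V = np.vander(x, n)  # monomial basis: columns x^(n-1), ..., x, 1
    L = np.polynomial.legendre.legvander(2.0 * x - 1.0, n - 1)  # Legendre basis on [-1, 1]
    print(n, np.linalg.cond(V), np.linalg.cond(L))
```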
I hope somebody can clarify the example. It jumps from computing r1 = 0.42, r2 = -0.25, r3 = 0.07, r4 = -0.24, and S = 0.305 to saying that alpha = 2.6 +- 0.2 and beta = 0.3 +- 0.1. Where do 0.2 and 0.1 come from? Tesi1700 ( talk) 00:16, 17 February 2008 (UTC)
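The archived article did not show the intermediate step; presumably the figures come from the usual standard-error formulas for fitting $y = \alpha + \beta x$ to $n$ points (stated here from standard regression theory, not recovered from the article):

$$\hat\sigma^2 = \frac{S}{n-2}, \qquad \sigma_\beta^2 = \frac{\hat\sigma^2}{\sum_i (x_i-\bar{x})^2}, \qquad \sigma_\alpha^2 = \hat\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i-\bar{x})^2}\right).$$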
Whether or not something is "too hard to understand" is a matter of personal opinion. My object in re-casting the section of straight line fitting was to give an example without the use of matrix algebra. Matrix algebra is, in my opinion, less accessible to non-mathematicians. It is essential to include standard deviations right from the start as, in data fitting, the parameter estimates are meaningless without them. Petergans ( talk) 16:25, 18 February 2008 (UTC)
It's not so much that we disagree, it's more that we have quite different backgrounds. What you say about standard deviations does not apply in quantitative sciences, where the topic of measurement errors and their consequences is of fundamental importance. The calibration curve illustrates the point. See how each datum is expressed as a value and an error bar. When an unknown analyte is measured the value is taken from the straight line parameters, and the error is propagated from the errors on the parameters and the correlation coefficient between them. You will find this treatment in elementary texts on analytical chemistry. Petergans ( talk) 08:59, 19 February 2008 (UTC)
I have been through the "Straight Line Fitting" section over and over and I still do not understand something. If you apply an offset 'a' to the x variable, which is equivalent to shifting the data to the right or left, then the 1-sigma error estimates on alpha and beta should not change, right? This ends up being true for the sigma(beta) calculation, since D, m, and S do not change when you apply an offset to x. However, sigma(alpha) does change, since Sx2 changes when you change the x values. Is it possible that the Sx2 in the sigma(alpha) calculation should be the sum of the squares of the deviations of x from the mean of x? Perhaps a reference to the derivation of this parameter would help. -- Jonny5cents2 ( talk) 09:08, 10 March 2008 (UTC)
I think a picture showing the connection between the least squares solution of a system $A\mathbf{x} = \mathbf{b}$ and the orthogonal projection of $\mathbf{b}$ onto the column space of $A$ would be helpful. For me at least, a picture like that was what made me understand the concept of a least-squares solution. —Preceding unsigned comment added by Veddan ( talk • contribs) 16:51, 25 March 2008 (UTC)
To Peter: At non-linear least squares it is good to write things in a certain way to make the point about not being able to find the minimum in one step.
At this linear least squares article, this is not necessary. Going through the same motions as at the other article for the sake of consistency is not a good idea. The order you chose does not flow well. Oleg Alexandrov ( talk) 16:42, 31 March 2008 (UTC)
The picture may lead the reader to think that the dependence on x is linear. That is not necessarily so, as now made clearer in this edit. Possibly the picture could be replaced by one that builds the solution as a curve, not a straight line, out of some basis functions. Jmath666 ( talk) 16:22, 4 April 2008 (UTC)
Jmath, you are guilty of a commonplace confusion. Linear regression is linear regardless of whether the dependence on x is on the one hand linear or affine, or on the other hand more complicated. The term "linear" in "linear regression" is not about the nature of the dependence on x. If you fit a parabola
$$y = a + bx + cx^2$$
by ordinary least squares, then that's linear regression. The dependence of the vector of least-squares estimates of a, b, and c upon the vector of y-values is linear. Michael Hardy ( talk) 15:01, 20 April 2008 (UTC)
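A minimal sketch of this point, assuming NumPy and invented data: the model is quadratic in x but linear in the parameters, so ordinary linear least squares applies once the design matrix is built from known functions of x:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 0.9, 2.3, 5.2, 9.8])

# Columns 1, x, x^2: y = a + b*x + c*x^2 is linear in (a, b, c)
X = np.column_stack([np.ones_like(x), x, x**2])

(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c)   # the estimates depend linearly on the vector y
```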
It is quite wrong to suggest that the solution to a linear least squares problem is an approximation. When the Gauss-Markov-Aitken conditions apply it is a minimum-variance solution. The variances on the parameters are part of the least squares solution. When the probability distribution of the derived parameters is known, uncertainty in them can be expressed in terms of confidence limits. There are no approximations involved unless the probability distribution of the parameters is approximated, say, by a Student's t distribution, but that only affects the confidence limits; the standard deviations on the parameters depend only on the number and precision of the dependent variables and the values of the independent variable(s); they are independent of the probability distribution of the experimental errors. In science the model function is an expression of a physical "law". In regression analysis the model function is in effect a postulate of an empirical relationship. In neither case is the model function an approximation, except in the sense that the underlying "law" or relationship may be an approximation to reality. The residual vector is given as $\mathbf{r} = \mathbf{y} - X\boldsymbol{\beta}$; the objective function is not an approximation. Petergans ( talk) 09:14, 15 April 2008 (UTC)
Sorry, Oleg, you and the other mathematicians are still missing the point. True, a least squares solution is an approximation from the mathematical point of view, but the experimentalist sees things differently - it is a best-fit solution. The difference arises from the fact that the experimenter has control over the number, precision and disposition of the data points, so that the matrix X is not simply a given quantity. No approximations are made in deriving the least squares solution, but the derived parameter values, errors and correlation coefficients will depend on the qualities of the measurements. For that reason it is wrong to treat the topic as a purely mathematical one and it is potentially confusing to call the best fit an approximate solution. Petergans ( talk) 14:28, 20 April 2008 (UTC)
Peter removed all information about the fact that fitting a linear model to data is the same as solving an overdetermined linear system. That's a pity, since I believe that it is very important to write the linear least squares problem in the language of linear algebra before using the machinery of linear algebra to solve it. The references support my point of view, see [3], [4], [5], [6], [7], [8]. The above are just the first several references from google books, they all have the linear system formulation. Oleg Alexandrov ( talk) 03:11, 30 April 2008 (UTC)
The methods of orthogonal decomposition do not use the normal equations at all, so it is wrong to place these methods as a subsection of "Solving the normal equations". I was at pains to re-organise the article so as to make this clear. I am reverting in order to restore that article structure. Petergans ( talk) 09:44, 3 May 2008 (UTC)
Oleg, the results of your recent tinkering are absolutely awful. It appears that you have not fully understood why the article needed to be restructured. This is the reason. The normal equations method and the orthogonal decomposition methods are different ways of minimizing the sum of squared residuals. The minimization is performed in order to best-fit the observed data, that is, to reduce the overall difference between observed and calculated data to its smallest value. I hope I have made this even clearer in the current revision.
It has been deemed necessary to simplify the structure of the article as a whole. I have taken the opportunity to make minor improvements in the later sections. Please look carefully at the article as a whole before you consider making any further changes.
BTW the second paragraph that you added to the lead-in is inappropriate for an article about linear least squares. That’s why it was removed. Petergans ( talk) 11:23, 4 May 2008 (UTC)
I agree that the point about a set of overdetermined equations is a good mathematical point. However, this article is about experimental data, not mathematics. If you know of a scientific or mathematical topic in which a set of overdetermined linear equations are generated other than by experimental measurements, then that topic would merit a completely different section, properly sourced to literature in the public domain. If no such topic exists, the point has only theoretical value and as such is not a useful part of the problem statement. I am open to persuasion by reasoned argument, but personal preference alone does not constitute a reason. Petergans ( talk) 09:19, 5 May 2008 (UTC)
That's because i) Björck is a mathematician and ii) he has the space to explain everything in detail. My point is that practitioners may be confused by unnecessary maths. $X\boldsymbol{\beta} \approx \mathbf{y}$ is not acceptable because it may appear to contradict the derivation of the normal equations etc. As you have written before, it is a property of the solution. Petergans ( talk) 07:45, 9 May 2008 (UTC)
Peter, if the "practitioners" get confused by $X\boldsymbol{\beta} \approx \mathbf{y}$ (which can be explained carefully, like Björck does), then this article will be as good as useless to them, as the mathematics becomes very heavy very quickly later on in the article.
Also, people whose bread and butter is heavy use of data fitting for problems they encounter in experiments will just use an Excel plugin, or something, without bothering to understand how the math in this article works (experimental people have better things to do with their time than understanding all the low-level details of every tool they use).
The people who will truly benefit from this article are "theoreticians", whose concern is not to fit mountains of data quickly, but who develop the methods practitioners then use. Oleg Alexandrov ( talk) 15:13, 9 May 2008 (UTC)
The additional short paragraph showing the formulation in terms of the linear system is not going to confuse anybody, since it will be at the end of the section, and that text is very simple (yes, even for experimental people). The formulation in terms of the linear system is very much supported by references, and as Jheald points out, not all linear least squares problems come from data fitting (but all linear least squares problems reduce to an overdetermined linear system).
Your claim that it will make people confuse errors and residuals is weak; the meaning of the symbol is very clear from the context, and besides, data fitting people, if anybody, know very well not to confuse the two.
Lastly, your claim that people don't know about matrices is very weak too: if a science student takes any two math courses in college (and they will probably take more), those two courses will be calculus and linear algebra. Matrices are a very fundamental and simple concept in the sciences, and the natural setting in which to explain this article. Oleg Alexandrov ( talk) 15:58, 9 May 2008 (UTC)
My two cents: I think there is a constituency of people who know linear algebra but not statistics. For this constituency, the ideal explanation of linear least squares is as the solution of an overdetermined system y = M x, really meaning the solution of y' = M x, where y' is the projection of y onto the image of M, which is the "closest" solvable system (in the "least squares", a.k.a. Euclidean, sense) to the original one. That this approach is pedagogically reasonable is reinforced by its appearance in intro linear algebra texts such as Shifrin & Adams and Bretscher. More to the point, Wikipedia is a reference work, for people who know math as well as those who don't, and this is useful material that can be presented concisely, as fits a reference work. I strongly urge the editors of this article to keep a linear-algebraic explanation. Joshua R. Davis ( talk) 03:03, 14 May 2008 (UTC)
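A minimal sketch of this linear-algebra view, assuming NumPy with invented M and y: the least-squares solution solves y' = M x exactly, where y' is the orthogonal projection of y onto the image of M, so the residual is orthogonal to every column of M:

```python
import numpy as np

M = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

x, *_ = np.linalg.lstsq(M, y, rcond=None)

y_proj = M @ x            # projection of y onto the image (column space) of M
residual = y - y_proj
print(np.allclose(M.T @ residual, 0.0))   # True: residual is orthogonal to im(M)
```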
In the sense of the comment above, I would now prefer the specific example to precede the general solution rather than follow it, though that would be a less logical order. It would delay the introduction of matrix notation which would be an advantage to those readers who are not familiar with it, such as (I imagine) biologists and others of that ilk. What is the general feeling? Petergans ( talk) 08:57, 14 May 2008 (UTC)
Before we even discuss that, I would like to point out that the current article structure is not right.
* 1 Problem statement and solutions
  * 1.1 Normal equations method
    * 1.1.1 General solution
    * 1.1.2 Specific solution, straight line fitting, with example
  * 1.2 Orthogonal decomposition methods
* 2 Weighted linear least squares
The normal equations are not just a "method". That section establishes that the linear least squares problem
None of these hold for non-linear least squares, and this section is the foundation of the article. Without the "Normal equations" section nothing in this article makes any sense, including the "orthogonal decomposition methods". So, if we think of this article as a tree, the normal equations belong in the trunk, not in a branch.
Here is my proposed sectioning.
* The linear least squares problem
* Derivation of the normal equations
* Example
* Solving the linear least squares problem
* Using the normal equations
* Orthogonal decomposition methods
* Weighted linear least squares
Comments? Oleg Alexandrov ( talk) 15:28, 14 May 2008 (UTC)
Good point, I did not realize that this method was proving the existence and uniqueness along the way. I still think though that the current order is quite convoluted. How about this:
* The linear least squares problem
* Example?
* Solving the linear least squares problem
* Using the normal equations
* Orthogonal decomposition methods
* Weighted linear least squares
This is along the lines of what Peter mentioned too. The example could be simplified from the theoretical situation with n points and a line (made particular later to 4 points). It would still use the normal equations, but in a particular case (and by the way, I think the normal equations are still easier to understand than the orthogonal decomposition, which requires serious matrix theory). Oleg Alexandrov ( talk) 16:57, 14 May 2008 (UTC)
In the section "Properties of the least-squares estimators" the term "conformal vector" is used. I could not find a definition here or at Least squares. I know what "conformal" means in geometry, but what does it mean here? Joshua R. Davis ( talk) 13:03, 15 May 2008 (UTC)
I am quite unhappy with the example in the article. First, it is very theoretical, with n points (an example should be as simple as possible, while keeping the essence). Second, it gives no insight into how the parameters alpha and beta are found; it just plugs numbers into the normal equations, which is just another lengthy calculation below. Third, the standard errors are too advanced a topic here; they belong at linear regression instead (the goal of this article is to minimize a sum of squares of linear functions). I propose to start with a line and three points to fit, do a simple re-derivation of the normal equations for this particular setting, and avoid the standard errors business. Comments? Oleg Alexandrov ( talk) 15:41, 15 May 2008 (UTC)
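As a sketch of what such a re-derivation might look like (illustrative only, not text from the article): for three points $(x_i, y_i)$ and the model $y = \alpha + \beta x$, setting the partial derivatives of $S(\alpha,\beta)=\sum_{i=1}^{3}(y_i-\alpha-\beta x_i)^2$ with respect to $\alpha$ and $\beta$ to zero gives the two normal equations

$$3\alpha + \beta\sum_i x_i = \sum_i y_i, \qquad \alpha\sum_i x_i + \beta\sum_i x_i^2 = \sum_i x_i y_i.$$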
It seems to me that the article should start by saying that least squares aims to find a "best" solution to an overdetermined system
and that such a system can arise in many ways, for instance in trying to fit a set of basis functions φj
I find that a much more straightforward, easy to understand starting point, because the equations are less fussy and less cluttered; and I think because it gives a broader, more generally inclusive idea of what least squares actually is.
I reject the idea that starting with the matrix notation is too telescoped and too succinct. To people familiar with it (which, let's face it, is pretty much anyone who's done any maths after the age of eleven), the shorter, terser form is much easier for the brain to recognise and chunk quickly. While to people who are slightly rusty, the short terse form is immediately unpacked and made more concrete. And in a production version of the article, one would probably immediately make it more concrete still, by saying eg:
And then explain the least squares criterion for bestness...
That seems to me much closer to the pyramid structure an article should aim for: start with something quite short, simple and succinct; and then steadily unpack it and slowly add detail. I think it gives a much cleaner start, starting from a bird's-eye overview, than the present text which throws you straight into the morass of
That's my opinion, anyway. Jheald ( talk) 09:41, 16 May 2008 (UTC)
Here is a draft proposed rewrite of the current article. I followed the Bjorck book. These are the assumptions that guided me:
Comments? Oleg Alexandrov ( talk) 03:34, 17 May 2008 (UTC)
I copied the proposed version into the article. I think more work is needed, as the prose and article structure are clumsy in places. I also fully support Joshua's proposal to simplify the example and move it before the theory. Oleg Alexandrov ( talk) 17:44, 18 May 2008 (UTC)
Any first year undergraduate would be severely reprimanded for writing
since it is obvious that five significant figures are not merited by the data.
This is emphatically the wrong example to give to inexperienced readers since it gives the impression that very precise results can be obtained from poor data. As I have repeatedly stated, no result is meaningful without an estimate of error and this should be made clear even in the simplest example.
In this case, since the observed values have only one significant figure, the results have at most two significant figures. The precision of the parameters is a little greater than the precision of the observations because the parameters are overdetermined 2:1.
At the very least there should be a link to the determination of standard deviation, or a footnote giving the sd values. Petergans ( talk) 09:11, 21 May 2008 (UTC)
I do agree that error estimates on the parameters are very important in any real-world data fitting computation; however, that could be too much for a first example, and besides, error estimation is not part of the linear least squares problem itself, which is concerned only with finding the best fit (error estimates belong to linear regression).
Joshua, I plan to later modify the data a bit (now almost the same numbers show both as x and y values which is confusing). Then I will also remake the picture and show the errors. I'll try to find some time for this later this week. Oleg Alexandrov ( talk) 15:29, 21 May 2008 (UTC)
Let's get one thing straight. There is no real difference between linear least squares and linear regression when the latter uses the least squares method to obtain the parameters. The different names are a historical fact, resulting from the employment of the principle of least squares by different categories of user. In the WP context both articles must take experimental error into consideration.
Experimental error is not just a "statistical aspect" of the data. All measurements are subject to error. The fact of experimental error is fundamental to any problem of data fitting. The principal reason for collecting more data points than are absolutely needed, that is, for applying the least squares method in the first place, is to improve the precision of the derived parameters. I raise the issue of significant figures, not as a matter for debate, but to give the editor the opportunity to correct his mistake in his own way; otherwise I will have to do it myself. Petergans ( talk) 07:06, 22 May 2008 (UTC)
The recent re-ordering is the wrong way round. Weighted least squares is more general, so I suggest the properties section should be modified to take account of this and then the old order makes more sense. The modification could be as simple as adding "assuming unit weights", but I would prefer expressions that include weights. Petergans ( talk) 16:28, 25 June 2008 (UTC)
The section about "Inverting the normal equations" warns that "An exception occurs...". As someone trying to learn from the article, this sentence seems unnecessarily vague. Someone (like me) who knows a little math knows that if the matrix has an inverse, it is unique. Is the article trying to indicate a computational problem, like rounding, or something else? If you are saying that computing equipment cannot calculate the inverse to an acceptable precision, then say that. Stephensbell ( talk) 18:20, 23 October 2008 (UTC)
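Presumably the sentence refers to ill-conditioning rather than non-uniqueness: forming $A^TA$ squares the condition number of $A$, so in floating point the normal equations can lose precision even when the inverse exists in exact arithmetic. A minimal sketch, assuming NumPy with invented nearly-collinear columns:

```python
import numpy as np

# Two nearly parallel columns make A ill-conditioned
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 1.0002]])

print(np.linalg.cond(A))         # large
print(np.linalg.cond(A.T @ A))   # roughly the square of the above
```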
Certain editors are changing the symbol for transpose from T (as in $A^T$) to ' (something which doesn't work at all properly within <math> tags, so I'm not going to try). I think this requires consensus both here and at the math notation page. — Arthur Rubin (talk) 22:31, 6 November 2008 (UTC)
Writing $A'$
for the transpose of the matrix A is done far more frequently in the statistics literature than is the superscript-T notation for transpose. It's also done in Herstein's Topics in Algebra, but the superscript-T is used in Weisberg's Applied Linear Regression, so I privately jocularly think of the superscript-T notation as the Weisberg notation and the prime notation as the Herstein notation. Michael Hardy ( talk) 17:34, 7 November 2008 (UTC)
I prefer $A^{\top}$ myself, but don't really care as long as it is consistent. I do however think the excessive bolding is very ugly: I think it should be removed (unless anyone objects, I may do this in the next couple of days). 3mta3 ( talk) 17:14, 6 April 2009 (UTC)
As an MD with an interest in mathematics, and some background knowledge of linear algebra, I was trying to read this article from its beginning to the section entitled 'Inverting the normal equations'. Several points were decidedly unclear to me:
1. The alphas from the motivational example are gone in the section on 'The general problem'. I guess they are simply subtracted from the yi, but this was confusing.
2. The section on 'Uses in data fitting' ends by saying 'The problem then reduces to the overdetermined linear system mentioned earlier, with Xij = φj(xi).' This is unclear to me, because in the 'General problem' section it is said that the overdetermined linear system usually has no solution. The data fitting procedure, on the other hand, does come up with a solution. So, I would think that the fitting problem we start with is an overdetermined system, and the data fitting procedure comes up with the "best" solution. At the point where it is said that 'The problem then reduces to the overdetermined linear system mentioned earlier', in reality we have already left that overdetermined linear system behind, in order to find the approximate solution.
3. In the 'Derivation of the normal equations' section, and despite a little knowledge of matrix algebra, it was unclear to me how the normal equations are "translated" into matrix notation.
I apologize for my nonprofessional view of these matters, but, then, these encyclopedia articles are meant for people who do not know everything about the subject. So I thought it would be helpful to let you know how the article feels to a nonmathematician, nonphysicist reader. Frandege ( talk) 21:47, 28 November 2008 (UTC)
To JMath: Thank you for looking into this; I believe the article benefited a great deal from the changes. I am still not very sure about the alphas, which have now become beta1. My difficulty is that the equation which starts the 'General problem' section does not allow for terms without X-values. This stands in contrast to the section 'Uses in data fitting', where it is easy to conceive of φ1(x) = 1 for all x in the model equation. I can see that this point bears little on the further derivation, but it would be preferable to get rid of this inconsistency. To Flavio Guitan: for (1) see my comment above; for (2) this is clear now; for (3) thank you for pointing this out - I should have seen that, but got jumbled up with the transpose sign. Frandege ( talk) 19:41, 30 November 2008 (UTC)
Thank you again. The confusion arose from the use of Xij, which I confounded with the x's used as abscissas in the second plot on the article page. I can see now that the Xij are nothing else than the φj(xi), where the xi are the abscissas in the plot. As I stated yesterday, the alpha from the previous version (our current beta1) corresponds to φ1 = 1. In the example given in the first plot of the article, there would also be a beta2, corresponding to φ2(xi) = xi^2. I am still tempted to think that the article would be clearer if the 'general problem' were only described where and when it arises, i.e. in the data fitting.
I fixed a minor inconsistency (the alpha was still mentioned in one sentence). Frandege ( talk) 19:24, 1 December 2008 (UTC)
![]() | This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Archive 1 | Archive 2 | Archive 3 | Archive 4 |
03:33, 26 October 2005 Oleg Alexandrov m (I don't find the octave code particularly relevant;)
It's very relevant. Actually calculating this is interesting, not just knowing how to calculate it with pen and paper. -- marvinXP ( talk)
The normal equation forms the matrix product of A-transpose and A. This forms a new, square matrix C. This new square matrix (C) is a Normal matrix. A Normal matrix has the property that its product with its transpose is the same whether pre or post multiplied. This makes/ensures the matrix symetric and at least positive semidefinate (usually positive definate).
It is easy to confuse the form for the normal equation and normal matrix if both refer to a generic matrix using the same symbol 'A'. It is the symetric positive semidefinate property (and its consequences) that is 'Normal'.
Philip Oakley ( 84.13.249.47) 22:37, 12 April 2006 (UTC)
In that this is an encyclopedia, i.e. a place where people go to understand concepts they currently don't, I feel that the "explanation" of this term is utterly too complex and filled with mathematical jargon. Not that it doesn't belong, but there needs to be introductory material geared towards newcomers to this material. If I wanted a mathematical proof or advanced applications I would probably consult a book on statistics.
As an example of a better and more appropriate introduction geared towards mathematical newcomers, the wikipedia entry for "Method of least squares" strikes me as well written and clear. —The preceding unsigned comment was added by 161.55.228.176 ( talk) 18:17, 28 September 2006 (UTC).
I couldn't agree more with the immediately foregoing comments. One thing that would help greatly IMHO would be to use standard notation, consistent with the usual applications of these ideas, as shown in
Regression analysis. It is usual to refer to the data or inputs as the X matrix, the unknown parameters as Beta or B, and the dependent or outputs as Y. Accordingly, the normal equations should be expressed in a format like:
-. Y = X B <=> B = (X' X) X' Y n 1 n k 1 k 1 k n k n 1
where the subscripts are the row and column dimensions, identified to show how they must match, and -. indicates generalized inverse. -- Mbhiii 18:48, 27 October 2006 (UTC)
Internal consistency should not be the only criterion, but external consistency, as much as possible, with the dominant practical uses of the day, should be very important, as well, because it increases recognizability and, therefore, the general utility of the article. The intersubjective standard of reference that Wikipedia represents, and is slowly increasing, requires no less. This argues for the X matrix form. I know the subscript notation is new, but it's very useful. It's a revision of Einstein's matrix notation. (He used one subscript and one superscript per matrix, which conflicted with exponents.) Though very gifted in abstract visualization, he was not always the best student, and non commutative matrix multiplication gave him headaches in long calculations. This cleaned up version of his notation is gaining acceptance among those teaching matrix algebra and its applications. Simply identify the left subscript as the number of rows and the right one as number of columns, and all is clear. -- Mbhiii 14:28, 31 October 2006 (UTC)
An X matrix form for the normal equations increases the article's external consistency. Revised Einstein matrix notation is for clarity, an aid to people like students or programmers who don't necessarily use matrix algebra all the time. By the way, Einstein matrix notation is not Einstein summation notation. -- Mbhiii 13:55, 1 November 2006 (UTC)
Zvika, perhaps I am overexplaining, but as a master's in physics I didn't see the equality of the middle terms right away. It seemed useful - to me, and probably to some other users as well - to explain that step it in some more detail. Pallas44 13:06, 16 November 2006 (UTC)
Could someone please explain in detail how:
Thanks, appzter 23:14, 15 January 2007 (UTC)
hello, i have a problem understanding one aspect of this
on the previous page after the multiplication of (AX-b)T.(AX-b)
its says "The two middle terms bT(Ax) and (Ax)Tb are equal"
i think this is incorrect as
The two middle terms bT(Ax) = ((Ax)Tb)T are equal can someone please clarify this, as if i am correct the derivation would be different
thanks! that is what i was asking about!
I was redirected ftom Normal equations to this page (Linear least squares) but the meaning of the Normal equations is not explained or defined here. Should the redirection be canceled? Kotecky ( talk) 14:50, 11 January 2008 (UTC)
Please see talk:least squares#A major proposal for details, which include a proposed extensive revision of this article. Petergans ( talk) 17:32, 22 January 2008 (UTC)
The contents of this page have been replaced. For discussion see talk:least squares#A major proposal Petergans ( talk) 09:22, 30 January 2008 (UTC)
I am very unhappy with the recent rewrite to the article. What used to be a simple topic about an overdetermined linear system and some calculus became a full-blown theoretical article on fitting experimental data. While it is the latter where least squares are most used, the way the article is now it is incomprehensible except to the specialist.
Ideally the first part of the article would be the origianl article, and the more complex stuff be later. If the author of the rewrite is not willing to do that, however, I propose a wholesale revert. That may lose valuable information and the insights of a specialist, but it is better than the current incomprihensible thing. Oleg Alexandrov ( talk) 15:51, 31 January 2008 (UTC)
which contain more technical details, but it has sufficient detail to stand on its own.
In addition Gauss-Newton algorithm has been revised. The earlier article contained a serious error regarding the validity of setting second derivatives to zero. Points to notice include:
This completes the fist phase of restructuring of the topic of least squares analysis. From now on I envisage only minor revision of related articles. May I suggest that comments relating to more than one article be posted on talk: least squares and that comments relating to a specific article be posted on the talk page of that article. This note is being posted an all four talk pages and Wikipedia talk:WikiProject Mathematics.
Petergans ( talk) 09:43, 8 February 2008 (UTC)
I had moved QR and SVD to non-linear least squares before reading the discussion above. I apologise for that. I had not realized that ill-conditioning was as important in linear least squares as this discussion would seem to indicate that it is. Where do you folks suggest that they should be, bearing in mind that we don't want to repeat too much material in the linear and non-linear articles? I'm not in favour of creating a separate article. It's a pity that WP does not have a "small print" option for the more technical bits.
I think that it is necessary to distinguish two type of ill-conditioning: intrinsic and non-intrinsic. An example of intrinsic ill-conditioning comes with fitting high-order polynomials, where the normal equations matrix is a vandermonde matrix, which is intrinsically ill-conditioned for large n. In that case the remedy is to re-cast the problem in terms of fitting with orthogonal polynomials. With the example of 50 basis functions, is it sensible to determine all of them simultaneously? Surely a lot of them are "known" from simpler systems?
In non-intrinsic cases, either the model is inadequate or the data don't define the parameters, or both. In my opinion it is futile to seek a mathematical remedy for this kind of problem. If the data cannot be improved the parameters of that model cannot be determined. Petergans ( talk) 16:24, 11 February 2008 (UTC)
I hope somebody clarify the example. They jump from computing r1 = 0.42, r2 = -0.25, r3 = 0.07, r4 = -0.24, and S = 0.305 to say that alpha = 2.6 +- 0.2 and beta = 0.3 +- 0.1. Where 0.2 and 0.1 come from? Tesi1700 ( talk) 00:16, 17 February 2008 (UTC)
Whether or not something is "too hard to understand" is a matter of personal opinion. My object in re-casting the section of straight line fitting was to give an example without the use of matrix algebra. Matrix algebra is, in my opinion, less accessible to non-mathematicians. It is essential to include standard deviations right from the start as, in data fitting, the parameter estimates are meaningless without them. Petergans ( talk) 16:25, 18 February 2008 (UTC)
It's not so much that we disagree, it's more that we have quite different backgrounds. What you say about standard deviations does not apply in quantitative sciences, where the topic of measurement errors and their consequences is of fundamental importance. The calibration curve illustrates the point. See how each datum is expressed as a value and an error bar. When an unknown analyte is measured the value is taken from the straight line parameters, and the error is propagated from the errors on the parameters and the correlation coefficient between them. You will find this treatment in elementary texts on analytical chemistry. Petergans ( talk) 08:59, 19 February 2008 (UTC)
I have been through the "Straight Line Fitting" section over and over and I still do not understand something. If you apply an offset 'a' to the x variable, which is equivalent to shifting the data to the right or left, then the 1 sigma error estimates on alpha and beta should not change, right? This ends up being true for the sigma(beta) calculation since D,m and S do not change when you apply an offset to x. However, sigma(alpha) does change, since Sx2 changes when you change the x values. Is it possible that the Sx2 in the sigma(alpha) calculation should be the sum of the square of the deviation of x from the mean of x? Perhaps a reference to the derivation of this parameter would help. -- Jonny5cents2 ( talk) 09:08, 10 March 2008 (UTC)
I think a picture showing that the connection between the the least squares solution of a system and the orthogonal projection of on to . For me at least, a picture like that was what made understand the concept of a least-squares solution. —Preceding unsigned comment added by Veddan ( talk • contribs) 16:51, 25 March 2008 (UTC)
To Peter: At non-linear least squares it is good to write things in a certain way to make the point about not being able to find the minimum in one step.
At this linear least squares article, this is is not necessary. Going through the same motion as at the other article for the sake of consistency is not a good idea. The order you chose does not flow well. Oleg Alexandrov ( talk) 16:42, 31 March 2008 (UTC)
The picture may lead the reader to think that the dependence on x is linear. That is not necessarily so as now made clearer in this edit. Possibly the picture could be replaced by one that builds the solution as a curve not a straight line out of some basis functions. Jmath666 ( talk) 16:22, 4 April 2008 (UTC)
Jmath, you are guilty of a commonplace confusion. Linear regression is linear regardless of whether the dependence on x is on the one hand linear or affine, or other other hand more complicated. The term "linear" in "linear regression" is not about the nature of the dependence on x. If you fit a parabola
by ordinary least squares, then that's linear regression. The dependence of the vector of least-squares estimates of a, b, and c upon the vector of y-values is linear. Michael Hardy ( talk) 15:01, 20 April 2008 (UTC)
It is quite wrong to suggest that the solution to a linear least squares problem is an approximation. When the Gauss-Markov-Aitken conditions apply it is a minimum variance solution. The variances on the parameters are part of the least squares solution. When the probability distribution of the derived parameters is known, uncertainty in them can be expressed in terms of confidence limits. There are no approximations involved unless the probability distribution of the parameters is approximated, say, by a Student's t distribution, but that only affects the confidence limits; the standard deviations on the parameters are dependent only on the number and precision of the dependent variables and the values of the independent variable(s); they are independent of the probability distribution of the experimental errors. In science the model function is an expression of a physical "law". In regression analysis the model function is in effect a postulate of an empirical relationship. In neither case is the model function an approximation except in the sense that the underlying "law" or relationship may be an approximation to reality. The residual vector is given as : the objective function is not an approximation. Petergans ( talk) 09:14, 15 April 2008 (UTC)
Sorry, Oleg, you and the other mathematicians are still missing the point. True, a least squares solution is an approximation from the mathematical point of view, but the experimentalist sees things differently - it is a best-fit solution. The difference arises from the fact that the experimenter has control over the number, precision and disposition of the data points, so that the matrix X is not simply a given quantity. No approximations are made in deriving the least squares solution, but the derived parameter values, errors and correlation coefficients will depend on the qualities of the measurements. For that reason it is wrong to treat the topic as a purely mathematical one and it is potentially confusing to call the best fit an approximate solution. Petergans ( talk) 14:28, 20 April 2008 (UTC)
Peter removed all information about the fact that fitting a linear model to data is the same as solving an overdetermined linear system. That's a pity, since I believe that it is very important to write the linear least squares problem in the language of linear algebra before using the machinery of linear algebra to solve it. The references support my point of view, see [3], [4], [5], [6], [7], [8]. The above are just the first several references from google books, they all have the linear system formulation. Oleg Alexandrov ( talk) 03:11, 30 April 2008 (UTC)
The methods of orthogonal decomposition do not use the normal equations at all, so it is wrong to place these methods as a subsection of "Solving the normal equations". I was at pains to re-organise the article so as to make this clear. I am reverting in order to restore that article structure. Petergans ( talk) 09:44, 3 May 2008 (UTC)
Oleg, the results of your recent tinkering are absolutely awful. It appears that you have not fully understood why the article needed to be restructured. This is the reason. The normal equations method and the orthogonal decomposition methods are different ways of minimizing the sum of squared residuals. The minimization is performed in order to best-fit the observed data, that is, to reduce the overall difference between observed and calculated data to its smallest value. I hope I have made this even clearer in the current revision.
I has been deemed necessary to simplify the structure of the article as a whole. I have taken the opportunity to make minor improvements in the later sections. Please look carefully at the article as a whole before you consider making any further changes.
BTW the second paragraph that you added to the lead-in is inappropriate for an article about linear least squares. That’s why it was removed. Petergans ( talk) 11:23, 4 May 2008 (UTC)
I agree that the point about a set of overdetermined equations is a good mathematical point. However, this article is about experimental data, not mathematics. If you know of a scientific or mathematical topic in which a set of overdetermined linear equations are generated other than by experimental measurements, then that topic would merit a completely different section, properly sourced to literature in the public domain. If no such topic exists, the point has only theoretical value and as such is not a useful part of the problem statement. I am open to persuasion by reasoned argument, but personal preference alone does not constitute a reason. Petergans ( talk) 09:19, 5 May 2008 (UTC)
That's because i) Björck is a mathematician and ii) he has the space to explain everything in detail. My point is that that practitioners may be confused by unneccessary maths. is not acceptible because it may appear to contradict the derivation of the normal equations etc. As you have written before, it is a property of the solution. Petergans ( talk) 07:45, 9 May 2008 (UTC)
Peter, if the "practitioners" get confused by (which can be explained carefully, like Björck does), then this article will be as good as useless to them, as the mathematics becomes very heavy very quickly later on in the article.
Also, people whose bread and butter is heavy use of data fitting for problems they encounter in experiments, they will just use an excel plugin, or something, without bothering to understand how the math works in this article (experimental people have better things to do with their time than understanding all low-level details of every tool they use).
The people who will truly benefit from this article are "theoreticians", whose concern is not to fit mountains of data quickly, but who develop the methods practitioners then use. Oleg Alexandrov ( talk) 15:13, 9 May 2008 (UTC)
The additional short paragraph showing the formulation in terms of the linear system is not going to confuse anybody, since it will be at the end of the section, and that text is very simple (yes, even for experimental people). The formulation in terms of the linear system is very much supported by references, and as Jheald points out, not all linear least squares problems come from data fitting (but all linear least squares problems reduce to an overdetermined linear system).
Your claim that it will make people confuse errors and residuals is weak, the meaning of is very clear from the context, and besides, if anybody, data fitting people know very well not to confuse the two.
Lastly, your claim that people don't know about matrices is very weak too, if a science student takes any two math courses in college (and they will take probably more), those two courses will be calculus and linear algebra. Matrices are a very fundamental and simple concept in the sciences, and the natural setting in which to explain this article. Oleg Alexandrov ( talk) 15:58, 9 May 2008 (UTC)
My two cents: I think there is a constituency of people who know linear algebra but not statistics. For this constituency, the ideal explanation of linear least squares is as the solution of an overdetermined system y = M x, really meaning the solution of y' = M x, where y' is the projection of y onto the image of M, which is the "closest" solvable system (in the least squares" a.k.a. Euclidean sense) to the original one. That this approach is pedagogically reasonable is reinforced by its appearance in intro linear algebra texts such as Shifrin & Adams and Bretscher. More to the point, Wikipedia is a reference work, for people who know math as well as those who don't, and this is useful material that can be presented concisely, as fits a reference work. I strongly urge the editors of this article to keep a linear-algebraic explanation. Joshua R. Davis ( talk) 03:03, 14 May 2008 (UTC)
In the sense of the comment above, I would now prefer the specific example to precede the general solution rather than follow it, though that would be a less logical order. It would delay the introduction of matrix notation which would be an advantage to those readers who are not familiar with it, such as (I imagine) biologists and others of that ilk. What is the general feeling? Petergans ( talk) 08:57, 14 May 2008 (UTC)
Before we even discuss that, I would like to point out that the current article structure is not right.
* 1 Problem statement and solutions o 1.1 Normal equations method + 1.1.1 General solution + 1.1.2 Specific solution, straight line fitting, with example o 1.2 Orthogonal decomposition methods * 2 Weighted linear least squares
The normal equations are not just a "method". That section establishes that the linear least squares problem
None of these hold for non-linear least squares, and this section is the foundation of the article. Without the "Normal equations" section nothing in this article makes any sense, including the "orthogonal decomposition methods". So, if we think of this article as a tree, the normal equations belong in the trunk, not in a branch.
Here is my proposed sectioning.
* The linear least squares problem * Derivation of the normal equations * Example * Solving the linear least squares problem * Using the normal equations * Orthogonal decomposition methods * Weighted linear least squares
Comments? Oleg Alexandrov ( talk) 15:28, 14 May 2008 (UTC)
Good point, I did not realize that this method was proving the existence and uniqueness along the way. I still think though that the current order is quite convoluted. How about this:
* The linear least squares problem * Example? * Solving the linear least squares problem * Using the normal equations * Orthogonal decomposition methods * Weighted linear least squares
This is along the lines of what Peter mentioned too. The example could be simplified from the theoretical situation with n points and a line (made particular later to 4 points). It would still use the normal equations, but in a particular case (and by the way, I think the normal equations are still easier to understand than the orthogonal decomposition, which requires serious matrix theory). Oleg Alexandrov ( talk) 16:57, 14 May 2008 (UTC)
In the section "Properties of the least-squares estimators" the term "conformal vector" is used. I could not find a definition here or at Least squares. I know what "conformal" means in geometry, but what does it mean here? Joshua R. Davis ( talk) 13:03, 15 May 2008 (UTC)
I am quite unhappy with the example in the article. First, it is very theoretical, with n points (an example should be as simple as possible, while keeping the essence). Second, it gives no insights into how the parameters alpha and beta are found, it just plugs in numbers into the normal equation which is just another lengthy calculation below. Third, the standard errors are too advanced topic here, they belong at linear regression instead (the goal of this article is to minimize a sum of squares of linear functions). I propose to start with a line and three points to fit, and do a simple re-derivation of the normal equations for this particular setting, and avoiding the standard errors business. Comments? Oleg Alexandrov ( talk) 15:41, 15 May 2008 (UTC)
It seems to me that the article should start by saying that least squares aims to find a "best" solution to an overdetermined system
and that such a system can arise in many ways, for instance in trying to fit a set of basis functions φj
I find that a much more straightforward, easy to understand starting point, because the equations are less fussy and less cluttered; and I think because it gives a broader, more generally inclusive idea of what least squares actually is.
I reject the idea that starting with the matrix notation is too telescoped and too succinct. To people familiar with it (which, let's face it, is pretty much anyone who's done any maths after the age of eleven), the shorter, terser form is much easier for the brain to recognise and chunk quickly. While to people who are slightly rusty, the short terse form is immediately unpacked and made more concrete. And in a production version of the article, one would probably immediately make it more concrete still, by saying eg:
And then explain the least squares criterion for bestness...
That seems to me much more close to the pyramid structure an article should be aiming for: start with something quite short, simple and succinct; and then steadily unpack it and slowly add detail. I think it gives a much cleaner start, starting from a birds-eye overview, than the present text which throws you stright into the morass of
That's my opinion, anyway. Jheald ( talk) 09:41, 16 May 2008 (UTC)
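To illustrate the kind of unpacking described above (my rendering, under the assumption that the terse starting point is the overdetermined system $X\boldsymbol\beta = \mathbf y$ with more equations than unknowns):

```latex
X\boldsymbol\beta = \mathbf y
\qquad\Longleftrightarrow\qquad
\begin{pmatrix}
X_{11} & \cdots & X_{1m}\\
\vdots & \ddots & \vdots\\
X_{n1} & \cdots & X_{nm}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},
\qquad n > m .
```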
Here is a draft proposed rewrite of the current article. I followed the Bjorck book. These are the assumptions that guided me:
Comments? Oleg Alexandrov ( talk) 03:34, 17 May 2008 (UTC)
I copied the proposed version into the article. I think more work is needed, as the prose and article structure are clumsy in places. I also fully support Joshua's proposal to simplify the example and move it before the theory. Oleg Alexandrov ( talk) 17:44, 18 May 2008 (UTC)
Any first year undergraduate would be severely reprimanded for writing the fitted parameters to five significant figures, since it is obvious that five significant figures are not merited by the data.
This is emphatically the wrong example to give to inexperienced readers since it gives the impression that very precise results can be obtained from poor data. As I have repeatedly stated, no result is meaningful without an estimate of error and this should be made clear even in the simplest example.
In this case, since the observed values have only one significant figure, the results have at most two significant figures. The precision of the parameters is a little greater than the precision of the observations because the parameters are overdetermined 2:1.
At the very least there should be a link to the determination of standard deviation, or a footnote giving the sd values. Petergans ( talk) 09:11, 21 May 2008 (UTC)
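For readers wondering what such an error estimate involves: under the usual assumptions, the parameter covariance is estimated as $\hat\sigma^2\,(X^{\mathsf T}X)^{-1}$ with $\hat\sigma^2 = S_{\min}/(n-m)$. A minimal sketch, reusing the assumed four-point data from the earlier sketch:

```python
import numpy as np

# Same assumed four-point line fit as in the sketch above.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 5.0, 7.0, 10.0])
X = np.column_stack([np.ones_like(x), x])

# Least-squares parameters via the normal equations.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Residual sum of squares S_min, divided by n - m degrees of freedom.
r = y - X @ beta
n, m = X.shape
sigma2 = (r @ r) / (n - m)

# Parameter covariance matrix; the standard deviations are the square
# roots of its diagonal entries.
cov = sigma2 * np.linalg.inv(X.T @ X)
std = np.sqrt(np.diag(cov))
print(beta, std)
```

With such standard deviations in hand, the appropriate number of significant figures for the parameters follows directly.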
I do agree that error estimates for the parameters are very important to any real-world data fitting computation; however, that could be too much for a first example, and besides, error estimation is not part of the linear least squares problem itself, which is concerned only with finding the best-fit parameters (error estimates belong at linear regression).
Joshua, I plan to later modify the data a bit (now almost the same numbers show both as x and y values which is confusing). Then I will also remake the picture and show the errors. I'll try to find some time for this later this week. Oleg Alexandrov ( talk) 15:29, 21 May 2008 (UTC)
Let's get one thing straight. There is no real difference between linear least squares and linear regression when the latter uses the least squares method to obtain the parameters. The different names are an historical fact, resulting from the employment of the principle of least squares by different categories of user. In the WP context both articles must take experimental error into consideration.
Experimental error is not just a "statistical aspect" of the data. All measurements are subject to error. The fact of experimental error is fundamental to any problem of data fitting. The principal reason for collecting more data points than are absolutely needed, that is, for applying the least squares method in the first place, is to improve the precision of the derived parameters. I raise the issue of significant figures, not as a matter for debate, but to give the editor the opportunity to correct his mistake in his own way; otherwise I will have to do it myself. Petergans ( talk) 07:06, 22 May 2008 (UTC)
The recent re-ordering is the wrong way round. Weighted least squares is more general, so I suggest the properties section should be modified to take account of this and then the old order makes more sense. The modification could be as simple as adding "assuming unit weights", but I would prefer expressions that include weights. Petergans ( talk) 16:28, 25 June 2008 (UTC)
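To make the generalisation concrete, the weighted normal equations read $X^{\mathsf T}WX\boldsymbol\beta = X^{\mathsf T}W\mathbf y$. A sketch, with an assumed diagonal weight matrix:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 5.0, 7.0, 10.0])
X = np.column_stack([np.ones_like(x), x])

# Diagonal weight matrix, e.g. W_ii = 1/sigma_i^2; the values are assumed.
W = np.diag([1.0, 4.0, 4.0, 1.0])

# Weighted normal equations: (X^T W X) beta = X^T W y.
beta_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_w)
```

Setting W to the identity matrix recovers the ordinary (unit-weight) normal equations, which is why unweighted least squares is the special case.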
The section about "Inverting the normal equations" warns that "An exception occurs...". As someone trying to learn from the article, this sentence seems unnecessarily vague. Someone (like me) who knows a little math, knows that if the matrix has an inverse, it is unique. Is the article trying to indicate a computational problem, like rounding, or something else? If you are saying that computing equipment cannot calculate the inverse to an acceptable precision, then say that. Stephensbell ( talk) 18:20, 23 October 2008 (UTC)
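My reading is that the warning is about numerical conditioning rather than mathematical non-uniqueness: forming $X^{\mathsf T}X$ squares the condition number of $X$, so a nearly rank-deficient $X$ can make the computed inverse useless at machine precision. A sketch of the effect, with illustrative numbers:

```python
import numpy as np

# Two nearly collinear columns make X^T X badly conditioned.
t = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([t, t + 1e-8])
y = np.array([1.0, 2.0, 3.0, 4.0])

print(np.linalg.cond(X))        # large
print(np.linalg.cond(X.T @ X))  # roughly its square, near double-precision limits

# Orthogonal-decomposition solvers (QR/SVD) work on X directly and
# avoid forming X^T X at all.
beta, *rest = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```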
Certain editors are changing the symbol for transpose from T (as in $A^T$) to ' (as in $A'$, something which doesn't work at all properly within <math> tags, so I'm not going to try). I think this requires consensus both here and at the math notation page. — Arthur Rubin (talk) 22:31, 6 November 2008 (UTC)
Writing $A'$ for the transpose of the matrix $A$ is done far more frequently in the statistics literature than is the superscript-T notation. It's also done in Herstein's Topics in Algebra, but the superscript-T is used in Weisberg's Applied Linear Regression, so I privately and jocularly think of the superscript-T notation as the Weisberg notation and the prime notation as the Herstein notation. Michael Hardy ( talk) 17:34, 7 November 2008 (UTC)
I prefer $\top$ myself, but don't really care as long as it is consistent. I do however think the excessive bolding is very ugly: I think it should be removed (unless anyone objects, I may do this in the next couple of days). 3mta3 ( talk) 17:14, 6 April 2009 (UTC)
As an MD with an interest in mathematics, and some background knowledge of linear algebra, I was trying to read this article from its beginning to the section entitled 'Inverting the normal equations'. Several points were decidedly unclear to me:
1. The alphas from the motivational example are gone in the section on 'The general problem'. I guess they are simply subtracted from the yi, but this was confusing.
2. The section on 'Uses in data fitting' ends by saying 'The problem then reduces to the overdetermined linear system mentioned earlier, with Xij = φj(xi).' This is unclear to me, because the 'General problem' section says that the overdetermined linear system usually has no solution, whereas the data fitting procedure does come up with a solution. So I would think that the fitting problem we start with is an overdetermined system, and that the data fitting procedure comes up with the "best" approximate solution. At the point where it is said that 'The problem then reduces to the overdetermined linear system mentioned earlier', we have in reality already left that overdetermined system behind in order to find the approximate solution.
3. In the 'Derivation of the normal equations' section, and despite a little knowledge of matrix algebra, it was unclear to me how the normal equations are translated into matrix notation.
I apologize for my nonprofessional view of these matters, but, then, these encyclopedia articles are meant for people who do not know everything about the subject. So I thought it would be helpful to let you know how the article feels to a nonmathematician, nonphysicist reader. Frandege ( talk) 21:47, 28 November 2008 (UTC)
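On point 2, a concrete sketch may help bridge the two sections (my illustration; the basis functions and data are assumed): with $\varphi_1(x)=1$, $\varphi_2(x)=x$, $\varphi_3(x)=x^2$, the design matrix has entries $X_{ij}=\varphi_j(x_i)$, and the data fitting problem is the overdetermined system $X\boldsymbol\beta \approx \mathbf y$, solved in the least-squares sense:

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # abscissas x_i (assumed)
ys = np.array([1.0, 2.7, 5.8, 11.1, 17.9])  # observations y_i (assumed)

# Basis functions phi_j for a quadratic model.
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

# Design matrix with entries X_ij = phi_j(x_i).
X = np.column_stack([phi(xs) for phi in basis])

# "Best" solution of the overdetermined system X beta ~ y,
# in the least-squares sense.
beta, *rest = np.linalg.lstsq(X, ys, rcond=None)
print(beta)
```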
To JMath: Thank you for looking into this; I believe the article benefited a great deal from the changes. I am still not very sure about the alphas which have now become beta1. My difficulty is that the equation which starts the 'General problem' section does not allow for terms without X-values. This stands in contrast to the section 'Uses in data fitting', where it is easy to conceive $\varphi_1(x) = 1$ for all $x$ in the equation $f(x, \boldsymbol\beta) = \sum_j \beta_j \varphi_j(x)$. I can see that this point bears little on the further derivation, but it would be preferable to get rid of this inconsistency. To Flavio Guitan: for (1) see my comment above; for (2) this is clear now; for (3) thank you for pointing this out - I should have seen that, but got jumbled up with the transpose sign. Frandege ( talk) 19:41, 30 November 2008 (UTC)
Thank you again. The confusion arises from the use of Xij, which I confounded with the x's used as abscissas in the second plot on the article page. I can see now that the Xijs are nothing other than the φj(xi), where the xi are the abscissas in the plot. As I stated yesterday, the alpha from the previous version (our current beta1) corresponds to φj = 1. In the example given in the first plot of the article, there would also be a beta2, corresponding to φj(xi) = xi^2. I am still tempted to think that the article would be clearer if the 'general problem' were only described where and when it arises, i.e. in the data fitting.
I fixed a minor inconsistency (the alpha was still mentioned in one sentence). Frandege ( talk) 19:24, 1 December 2008 (UTC)