This level-5 vital article is rated C-class on Wikipedia's content assessment scale.
This page needs a proof and more rigorous maths.
Situations for multivariable functions should be added: if u is a function of x and y, and both x and y are functions of t, then du/dt = (∂u/∂x)(dx/dt) + (∂u/∂y)(dy/dt).
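A minimal numerical sketch of the two-variable case, du/dt = (∂u/∂x)(dx/dt) + (∂u/∂y)(dy/dt). The functions u(x, y) = x² + 2y, x(t) = cos t and y(t) = t³, and the helper names, are assumptions chosen only for illustration; they are not taken from the article.

```python
import math

# Hypothetical example functions (illustration only).
def u(x, y):
    return x**2 + 2*y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return t**3

def central_diff(func, t, h=1e-6):
    """Central-difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

t0 = 0.8
x0 = x_of(t0)

# Hand-computed partial derivatives of u and derivatives of x, y.
du_dx = 2 * x0            # partial of u with respect to x
du_dy = 2.0               # partial of u with respect to y
dx_dt = -math.sin(t0)
dy_dt = 3 * t0**2

# Multivariable chain rule versus differentiating t -> u(x(t), y(t)) directly.
chain_rule_value = du_dx * dx_dt + du_dy * dy_dt
direct_value = central_diff(lambda t: u(x_of(t), y_of(t)), t0)

print(chain_rule_value, direct_value)
```

The two printed numbers should agree up to finite-difference error.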
I don't see how Ex 1 justifies the statement "This calculation is a typical chain rule application." The calculation doesn't seem to involve differentiation, which I would argue is the typical chain rule application. -- flatfish89 ( talk) 18:24, 24 April 2010 (UTC)
This discussion is outdated. Suggesting deletion. AurelienLourot ( talk) 19:30, 30 December 2014 (UTC)
I've replaced the current proof with one I saw somewhere. It relies on nothing but the definition of a derivative and the concept of limits. I believe it is quite rigorous and more formal than the previous one, but if there are any flaws with it, feel free to point them out or even restore the old one.
Also, if you think it is too long or verbose then correct it.
Someone42 13:43, 25 May 2005 (UTC)
In addition, this proof relies on f(g(x+deltax))-f(g(x))=f(g;(x), which it does not. f(g'(x))=f(g(x+deltax)-g(x)). Additionally, the chain rule is not f'(g(x))=f'(g(x))g'(x), as dividing by f'(g(x)) would give you 1=g'(x), which isn't always true. The chain rule is f'(g(x))=f(g'(x))g'(x) —Preceding unsigned comment added by 66.169.198.79 ( talk) 06:08, 3 October 2009 (UTC)
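The competing formulas in this thread can be compared numerically. The sketch below is only an illustration: the choice f(u) = sin u, g(x) = x² and the sample point are assumptions, and a single numerical check is not a proof of anything.

```python
import math

# Hypothetical test functions (illustration only).
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x**2

def g_prime(x):
    return 2 * x

def central_diff(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.3

# Derivative of the composite x -> f(g(x)), estimated directly.
composite_derivative = central_diff(lambda x: f(g(x)), x0)

# Candidate 1: f'(g(x)) * g'(x)
candidate_standard = f_prime(g(x0)) * g_prime(x0)

# Candidate 2 (proposed in the comment above): f(g'(x)) * g'(x)
candidate_alternative = f(g_prime(x0)) * g_prime(x0)

print("numerical (f o g)'(x0):", composite_derivative)
print("f'(g(x0)) g'(x0):     ", candidate_standard)
print("f(g'(x0)) g'(x0):     ", candidate_alternative)
```

For these particular functions, only the first candidate matches the numerical derivative of the composite.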
I think the basic idea behind the chain rule is getting swamped in a sea of details and special cases. The basic idea of the chain rule is pretty easy to explain in words:
The best linear approximation of the composition is the composition of the best linear approximations.
Every special case of the chain rule more or less expresses this fact in various situations, with a wide variety of abstraction or concreteness. But this is the basic idea, and it would be good if it were made a bit more prominent. Revolver 01:43, 9 October 2005 (UTC)
That is a very nice description of the chain rule. I'm not sure I dare edit "one of the 500 most frequently viewed mathematics articles," but I'd like to see the above sentence appearing prominently in the article. David Bulger ( talk) 03:56, 5 July 2010 (UTC)
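One way to make the "composition of the best linear approximations" sentence concrete is to compare Jacobian matrices numerically. The sketch below assumes two hypothetical maps f and g from R² to R² and uses finite differences; it checks that the Jacobian of f ∘ g at a point a is close to the matrix product of the Jacobian of f at g(a) with the Jacobian of g at a.

```python
import math

# Hypothetical smooth maps R^2 -> R^2 (illustration only).
def g(p):
    x, y = p
    return [x * y, x + math.sin(y)]

def f(q):
    u, v = q
    return [math.exp(u) + v, u * v * v]

def jacobian(func, p, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d func_i / d p_j."""
    columns = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        columns.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    # columns[j][i] holds d func_i / d p_j, so transpose into row-major form.
    return [[columns[j][i] for j in range(len(p))] for i in range(len(columns[0]))]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = [0.7, -0.4]

jac_of_composite = jacobian(lambda p: f(g(p)), a)
product_of_jacobians = matmul(jacobian(f, g(a)), jacobian(g, a))

max_gap = max(abs(jac_of_composite[i][j] - product_of_jacobians[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_gap)
```

Note that the Jacobian of f is evaluated at g(a), not at a; that is the "(Df) ∘ g" part of the rule.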
If I'm not completely mistaken, the definition mixes up f and . Also, x isn't some kind of magic symbol, so doesn't make sense either. It says that the composition of f and g (which is a function) is equal to a certain value of that function, namely that at x. I think the whole thing should read
Or leave out h completely and just write:
-- K. Sperling ( talk) 00:37, 12 November 2005 (UTC)
I would like to see an extensive expansion of the chain rule for several variables. Thanks, Silly rabbit 06:00, 15 November 2005 (UTC)
Is the notation in the section about the chain rule of higher dimensions really correct? For example and , among others. To me it seems wrong, or at least misleading. In my opinion it should be and using the notation style from the original author. I personally would drop the subscripts as well in this case:. -- Vilietha ( talk) 05:52, 12 June 2012 (UTC)
Since the "composition of two functions" is technically a "composite function", would its derivative be called a "composite derivative"? Also, does the secondary (inside) derivative have a special name (something like "harmonic derivative"——though I think that term means something else)? ~Kaimbridge~ 20:15, 3 February 2006 (UTC)
This discussion is outdated. Suggesting deletion. AurelienLourot ( talk) 19:33, 30 December 2014 (UTC)
I have no formal training in maths so forgive me if I sound naive. Near the bottom of the proof it says "Observe that as and ." Would I be correct to think that "" shows that the "error" (right word?) involved goes to zero as delta goes to zero? 202.180.83.6 03:52, 16 February 2006 (UTC)
The primes in examples 1 and 2 (where it says f'(x) = ) are very difficult to see. They look exactly like f(x). This may cause a lot of confusion. Is there any way to make the primes in f'(x) stand out?
I think that the exceptions to the rule should be mentioned. For instance, , as would be suggested by the power rule. The problem is that sqrt{x+a} is just sqrt{x} shifted back, but x+25 still differentiates to 1. This means that according to the power rule, the constant that's added to x has no effect on it, even though the derivative should be . Which doesn't follow directly from the Chain Rule. He Who Is 23:33, 3 June 2006 (UTC)
All that calculus gives me a headache! I have no idea if and/or how they relate, but perhaps the chain rule pertaining to probability theory deserves a place somewhere on this page? You know, the P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) ... P(An|A1 ∩ ... ∩ An−1) thing, sorry about the crummy representation. 218.165.75.221 10:04, 30 September 2006 (UTC) M.H.
Uh, how about no. 69.215.17.209 14:41, 22 April 2007 (UTC)
Personally, I would like to see some detail about the Chain Rule for probability theory. I have been looking around the internet, and have not been able to find a discussion of it (ideally a step-by-step example or a detailed proof). So, it would be a good thing if Wikipedia included something on it. Should the Chain Rule for probability theory be included on this page, the page for probability theory, or its own page [ex. Chain Rule (Probability Theory)]? SteelSoul ( talk) 17:50, 2 February 2009 (UTC)
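For anyone looking for a step-by-step example of the probability chain rule mentioned in this thread, here is a small sketch for three events, P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2). The sample space (two fair dice) and the three events are assumptions picked only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice (assumed example).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

def cond(event, given):
    """Conditional probability P(event | given); 'given' must be non-empty."""
    return Fraction(len(event & given), len(given))

# Hypothetical events (illustration only).
A1 = {w for w in omega if w[0] % 2 == 0}      # first die is even
A2 = {w for w in omega if w[0] + w[1] >= 7}   # total is at least 7
A3 = {w for w in omega if w[1] > w[0]}        # second die beats the first

joint = prob(A1 & A2 & A3)
chained = prob(A1) * cond(A2, A1) * cond(A3, A1 & A2)

print(joint, chained)
```

Exact fractions are used so the two sides can be compared without rounding. The identity holds because each conditional probability is a ratio whose denominator cancels the previous numerator.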
The statement (f o g)'(x) = (f(g(x)))' is incorrect and fundamentally misunderstands the prime (f') notation. The prime is a transformation from functions to functions; as such it should be applied before the variable x is evaluated, as on the LHS but not as on the RHS. 69.215.17.209 14:44, 22 April 2007 (UTC)
I see that this edit was reverted, with the argument that this notation is common enough to be included. I understand that people may sometimes use it (I've never seen it myself, and I challenge anyone to produce examples from a common calculus text), but it is not standard and pedagogically very confusing. It is ambiguous whether the constant f(x) or the function f is being differentiated. I do not think that such poor notation should be perpetuated in an encyclopedia, without evidence that it is at least commonly used. -- 69.212.231.101 03:52, 26 July 2007 (UTC)
Perhaps resulting from corrections above, f prime is now invisible when inline (at end of first proof). g prime is visible inline due to different glyph for g not overlapping the prime in the way that it overlaps for f. f prime works ok with display notation instead of inline. Should be a simple fix but I don't know how. 58.175.211.1 ( talk) —Preceding undated comment added 15:26, 7 February 2013 (UTC)
The examples could use some more description, depending on if we're shooting for "definition" or "instructional detail." Substituting U for X^2+1 makes the plug 'n' chug easier, but it's not strictly necessary. Any objections to expanding the current examples to include various applications of the chain rule and a detailed description of how and why substitutions are valid? -- Legomancer ( talk) 22:45, 8 September 2009 (UTC)
For any coordinate (real-valued function) y on a line (e.g. the real line) and any point p, denote by dy_p the equivalence class of y − y(p)1 (where 1 is the constant function, with value 1) modulo functions vanishing at p to higher order. If y = f(x) (i.e. y = f ∘ x for some other coordinate x on the same line and some f: R → R) then the definition of differentiability of f at x(p) ensures that dy_p = f'(x(p)) dx_p because f(x) − f(x(p))1 differs from f'(x(p))(x − x(p)1) by a function vanishing at p to higher order.
The chain rule is an immediate consequence. If u = g(y) then, omitting evaluations/subscripts at p, du = g'(y)dy= g'(f(x))f'(x) dx.
Most arguments formalize this basic idea without discussing the conceptual meaning. Geometry guy 00:57, 2 November 2010 (UTC)
I was revisiting the first proof, and I've come to the conclusion that I don't think it's correct; at least, not in spirit. I was trying to rewrite it from scratch (which is my normal style), and the best I could do was as follows:
When g(x) equals g(a), THEN A MIRACLE OCCURS and this product is still equal to the difference quotient. Hence we can compute the derivative of f ∘ g at a by computing the limit as x goes to a of the above function. This limit exists because the above function is a product and the limit as x goes to a of each of its factors exists. Furthermore, because Q is continuous, the limit of the first factor equals f′(g(a)), and by definition the limit of the second factor equals g′(a). This proves the chain rule.
The problem is that when g(x) equals g(a) and x is not a, the miracle doesn't occur; the value of the product is f′(g(a)) times zero, which is zero. If we were to take the limit instead of evaluating, then the miracle would occur, but I then don't know how to prove that the limit computes what we want. If we could split up the product, then the miracle would occur, but then we need to show that the limit of the product exists, and I don't know how to prove that directly. The standard proofs get around this by explicitly measuring error terms; when we approach things that way, we never see the zero product, hence the miracle occurs. The whole reason we have this proof, though, is because it avoids error terms, and if we have to introduce them to make this work then there's no point in keeping this proof. So I'm stuck; I don't see how to fill this gap. In fact, as far as I can tell, since the product is zero this proof is just wrong.
The article presently seems to ignore this difficulty. It glosses over it by introducing Q only at the end and ignoring the need for a miracle. But as far as I can tell, it has exactly the same problem. Am I missing something? Or what? Ozob ( talk) 04:32, 11 December 2010 (UTC)
In the example "Suppose that a skydiver ..." the formula g(t) = 4000 − 9.8t² should be replaced with g(t) = 4000 − ½·9.8t², shouldn't it? 2.36.204.64 ( talk) 22:21, 19 January 2011 (UTC)
1) clarify 2nd bullet from "...rate of change in atmospheric pressure at height..." to ...rate of change in atmospheric pressure w.r.t. h, at height...
2) clarify 4th bullet from "...rate of change in atmospheric pressure t seconds after..." to ...rate of change in atmospheric pressure w.r.t. t, t seconds after...
3) the bottom paragraph that starts "It is not true..." is misleading and includes an error. I would end it with the sentence "This need not have anything to do with the buoyant force ten seconds after the skydiver's jump." and start a new paragraph just below that states the following:
It is true that (f o g)'(t) = f'(h) * g'(t). To find the buoyant force w.r.t. t ten seconds after his jump, we must evaluate g(10), his height ten seconds after he jumps, and substitute the result into f'(h). g(10) is 3510 meters above sea level, so the true buoyant force w.r.t. t ten seconds after the jump is (proportional to) f'(3510) * g'(10) = 7.133 * -98 = -699.
This example demonstrates the Chain Rule as the product of two rates. The last sentence that states "g(10) is 3020 meters above sea level, so the true buoyant force ten seconds after the jump is (proportional to) f'(3020)." is erroneous. To use the Chain Rule you need to multiply f'(g(t)) by g'(t). —Preceding unsigned comment added by 69.117.93.37 ( talk) 04:41, 31 January 2011 (UTC)
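The two heights quoted in this thread (3020 and 3510 meters) correspond to the two versions of the height formula discussed above. The sketch below only works out that arithmetic for g; the pressure function f is not specified in this discussion, so it is deliberately left out, and the function names are hypothetical.

```python
def g_without_half(t):
    """Height formula as originally quoted: 4000 - 9.8 t^2."""
    return 4000 - 9.8 * t**2

def g_with_half(t):
    """Height formula with the factor 1/2: 4000 - (1/2) * 9.8 * t^2."""
    return 4000 - 0.5 * 9.8 * t**2

def central_diff(func, t, h=1e-5):
    """Central-difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

t = 10.0
for name, g in (("without 1/2", g_without_half), ("with 1/2", g_with_half)):
    height = g(t)
    velocity = central_diff(g, t)
    print(f"{name}: g(10) is about {height:.1f} m, g'(10) is about {velocity:.1f} m/s")
```

Under these formulas, a velocity of about −98 m/s at t = 10 goes with the height of about 3510 m, while the height of about 3020 m goes with a velocity of about −196 m/s.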
(Copied from WT:WPM. Ozob ( talk) 02:07, 2 March 2011 (UTC))
The article titled chain rule currently says:
Does this last form really fail to "specify where each of these derivatives is to be evaluated"? It seems to me that the first form above clutters things in such a way as to interfere with understanding, and that the second, read correctly, doesn't really fail to do anything that should be done.
Opinions? Michael Hardy ( talk) 23:04, 1 March 2011 (UTC)
If y = g(u) and u = f(x), then the point at which to evaluate dy/du is u and the point at which to evaluate du/dx is x. That seems obvious. The extra notation will be confusing. Michael Hardy ( talk) 04:32, 2 March 2011 (UTC)
I disagree: it does make sense to evaluate a function at a variable.
There you have evaluation of a function at the variable x and evaluation of a function at the variable u. Are students really going to mistakenly assume I mean the pointwise product of ƒ and g if I write it that way? I don't think so.
And if I write
is that not also "evaluation of a function at a variable"? Michael Hardy ( talk) 20:14, 2 March 2011 (UTC)
To write
is at best redundant. That u is where it's evaluated is inherent in the meaning of the Leibniz notation. It's hard to see how anyone could mistakenly think otherwise. That's why this whole thing about evaluation is pointless. Michael Hardy ( talk) 18:24, 3 March 2011 (UTC)
I might misunderstand the notations, but why is the multivariate chain rule written as:
Why is there a composition on the RHS, and not a product of derivatives, as in the univariate case? Is it to be understood as a matrix operation, in which case composition corresponds to a product, and in that case, shouldn't this be explicitly signaled? Donvinzk ( talk) 11:34, 4 June 2011 (UTC)
[...]
This is exactly the formula D(f ∘ g) = Df ∘ g + Dg ∘ f.
Shouldn't this also be D(f o g) = Df o Dg ?
I have doubts in the comment that the second proof does not need a theorem about products of limits. In the intermediate step, we need to consider the product of and , which is equivalent to [Q-f'(g(a))]*{[g(x)-g(a)]/(x-a) - g'(a)}. My thought is that both proofs rely on the same theorem. However, I would like to get comments from more experienced editors before changing anything in the article. 202.130.125.147 ( talk) 09:04, 14 September 2013 (UTC)
I understand the need to make the article accessible for those who have not had extensive math education, but at the same time there are massive inconsistencies (and outright errors) in notation on this page that really offend the sensibilities of anyone who had studied math, and are bound to cause confusion if people take certain bits of the notation as they are formally stated. There's no need for there to be a dichotomy between formal accuracy and lucidness, and I'd argue that beginning students in calculus are poorly-served by having a resource which sacrifices accuracy for naive intuition.
For starters, we really ought to remove all ambiguity between f ∘ g and f; when you mean the former, write the former, and when you mean the latter, write the latter. These are conflated all over the place (the "higher order derivatives" section, for one, in which none of the stated formulas are formally true), and it is incorrect and confusing.
Additionally, any use of Leibniz' notation where the argument is inside the differential is abusive; it's a mild abuse of notation, granted, but it really should be avoided where it can be.
Thoughts?
129.2.129.149 ( talk) 13:44, 25 October 2013 (UTC)
The article currently states: "For example, the chain rule for (f ∘ g)(x) is ". Reading it with a fresh view, and knowing the chain rule, I find it highly confusing. Thinking about it a little more, it appears that the formula is correct by itself, but wrongly specified. In fact, neither Leibniz nor Newton knew the symbol for function composition, and probably not the modern notion of composition of functions. To be correct, the formula must be introduced by "If f is a function of a variable g, which is itself a function of x, then the chain rule is: ...". If one wants to introduce the chain rule for (f ∘ g), Leibniz notation must be avoided, because it supposes that the variables are named, which is not the case in (f ∘ g). On the other hand, Newton notation, where the variables need not be named, works well and gives: "The chain rule for (f ∘ g) is (f ∘ g)′ = (f′ ∘ g) g′." I agree with the IP that, even with my correction, the formulation with Leibniz notation is difficult to make formally correct, because it needs the rather strange notion of a variable that is a function of another variable. But it works well in practice.
In conclusion, my opinion is that the two versions of the chain rule must appear in the lead, appropriately introduced.
D.Lazard ( talk) 10:27, 26 October 2013 (UTC)
Am I missing something, or are the claims in this section completely wrong? D(f o g) = Df o Dg seems to me to be a false statement; the correct formulation would be D(f o g) = ((Df) o g)Dg. That this section cites no sources at all does not exactly inspire confidence, either. 71.163.32.84 ( talk) 03:05, 28 October 2013 (UTC)
I've made a first pass at attempting to be more explicit about the notation. Please feel free to improve it or to comment here. Ozob ( talk) 13:39, 30 October 2013 (UTC)
The second proof contains the following line:
Is this line really necessary? Wouldn't it be enough to write that η behaves like ε, i.e. it simply tends to zero as its argument tends to zero?
The second proof terminates finally with the following line:
But I don't see the need to define η at zero in that second proof. Am I missing something? Don't we just need to know that η tends to zero as its argument tends to zero? AurelienLourot ( talk) 19:51, 30 December 2014 (UTC)
Would it be pertinent/beneficial to add a small section exposing the non-standard approach to the chain rule, i.e. using hyperreal numbers and standard parts? — Preceding unsigned comment added by Gio97 ( talk • contribs) 08:30, 10 April 2015 (UTC)
I tend to agree with the point being made in the edit summary in this revert. Placing the prime after an expression is pushing an abuse of notation too far. The correct notation would be . The prime just does not have the necessary flexibility (expressive power). — Quondum 02:24, 20 May 2015 (UTC)
For the life of me, I can't make heads or tails of that recently-added animation. I'm not even sure what it's supposed to be showing. Is it just me? Perhaps with a bit of clarification it could be a useful addition to the article, but in its present state I'm not sure it'll contribute to anyone's understanding. 96.231.153.5 ( talk) 06:07, 26 January 2016 (UTC)
Regarding the example u(x, y) = x² + 2y. We strictly have , and since the function does not take r as a function parameter. The article makes use of nonstandard notation if you apply the chain rule here. You shall not apply the chain rule here. See https://www.icp.uni-stuttgart.de/~icp/mediawiki/images/7/74/Remark_on_partial_derivatives.pdf. Please correct this! -- 94.217.251.2 ( talk) 10:24, 26 June 2018 (UTC)
I've stumbled across that formula and the order of partial derivatives in the second sum is not correct, it should be like this: not this
It is not always true that the order of the partial derivatives can be exchanged without affecting the result; Schwarz's theorem guarantees this only under certain conditions. So it is appropriate to write the formula as generally as possible, or at least to state precisely the conditions we are assuming. — Preceding unsigned comment added by 213.243.253.119 ( talk • contribs) 09:28, 6 May 2019 (UTC)
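A standard textbook illustration of mixed partial derivatives that differ is the function f(x, y) = xy(x² − y²)/(x² + y²) with f(0, 0) = 0. This example is not taken from the article; it is brought in here only to illustrate the comment above. The sketch estimates both mixed partials at the origin with nested finite differences, and the step sizes are assumptions.

```python
def f(x, y):
    """Classic example whose mixed partials at the origin disagree."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

INNER = 1e-7   # step for the first derivative (assumed)
OUTER = 1e-2   # step for the second derivative (assumed)

def f_x(x, y):
    """Central-difference estimate of the partial derivative in x."""
    return (f(x + INNER, y) - f(x - INNER, y)) / (2 * INNER)

def f_y(x, y):
    """Central-difference estimate of the partial derivative in y."""
    return (f(x, y + INNER) - f(x, y - INNER)) / (2 * INNER)

# Differentiate in x first, then in y, at the origin.
d_dy_of_f_x = (f_x(0.0, OUTER) - f_x(0.0, -OUTER)) / (2 * OUTER)

# Differentiate in y first, then in x, at the origin.
d_dx_of_f_y = (f_y(OUTER, 0.0) - f_y(-OUTER, 0.0)) / (2 * OUTER)

print("x first, then y:", d_dy_of_f_x)
print("y first, then x:", d_dx_of_f_y)
```

The two estimates come out close to −1 and +1 respectively. The second partial derivatives of this f are not continuous at the origin, which is why the symmetry theorem does not apply there.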
I don't think it's wise, didactically, to use a capital F in F(x) = f(g(x)) since capitals are very often used to designate the primitive, as in F'(x) = f(x). Why not just use h or something? — Preceding unsigned comment added by 77.61.180.106 ( talk • contribs) 13:00, 4 December 2020 (UTC)
The First Example states "(f ∘ g)(t) is the atmospheric pressure the skydiver experiences t seconds after his jump". It's not that simple, because the distance fallen after t seconds is a tiny bit less than that given by g(t) due to buoyancy. I expect you would need to use a differential equation to model the physics in the first example rather than a simple composite function. — Preceding unsigned comment added by MathewMunro ( talk • contribs) 09:26, 14 February 2021 (UTC)
For example, suppose that we want to compute the rate of change in atmospheric pressure ten seconds after the skydiver jumps. This is (f ∘ g)′(10) and has units of pascals per second. The factor g′(10) in the chain rule is the velocity of the skydiver ten seconds after his jump, and it is expressed in meters per second. The factor f′(g(10)) is the change in pressure with respect to height at the height g(10) and is expressed in pascals per meter. The product of f′(g(10)) and g′(10) therefore has the correct units of pascals per second.
Here, notice that it is not possible to evaluate f′ anywhere else. For instance, the 10 in the problem represents ten seconds, while the expression f′(10) would represent the change in pressure at a height of ten meters, which is not what we wanted. Similarly, while g′(10) = −98 has a unit of meters per second, the expression f′(g′(10)) would represent the change in pressure at a height of −98 meters, which is again not what we wanted. However, g(10) is 3510 meters above sea level, the height of the skydiver ten seconds after his jump, and this has the correct units for an input to f′.
I have a question about the example function g(x) in the First Proof. Currently the article states: ‘For example, this happens for g(x) = x^2*sin(1 / x) near the point a = 0.’ I think the example function should be a split function as follows: “g(x) = x^2*sin(1 / x) for x ≠ 0, and g(x) = 0 for x = 0”. As it stands the function is undefined at the point a = 0, and so not differentiable there. I think the point of the article would be valid if it used the split function I have suggested. — Preceding unsigned comment added by Matthew.howey ( talk • contribs) 10:38, 29 March 2021 (UTC)
Fixed D.Lazard ( talk) 11:40, 29 March 2021 (UTC)
I've proceeded to make the edit I suggested, obviously if anyone disagrees please amend or revert. Matthew.howey ( talk) 20:40, 15 April 2021 (UTC)
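A small numerical sketch of the split function suggested in this thread, assuming the definition g(x) = x² sin(1/x) for x ≠ 0 and g(0) = 0. It looks at two things: the difference quotient g(h)/h at a = 0, which is bounded in size by |h|, and the points 1/(kπ), where g returns to the value g(0) = 0 arbitrarily close to 0.

```python
import math

def g(x):
    """Split function: x^2 * sin(1/x) away from 0, and 0 at x = 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotients (g(h) - g(0)) / h at a = 0 for shrinking h.
hs = [10.0 ** -k for k in range(1, 8)]
quotients = [(g(h) - g(0.0)) / h for h in hs]

# Points 1/(k*pi) near 0 where sin(1/x) vanishes, so g(x) equals g(0).
zeros = [1.0 / (k * math.pi) for k in (10, 100, 1000)]
values_at_zeros = [g(z) for z in zeros]

for h, q in zip(hs, quotients):
    print(f"h = {h:.0e}, difference quotient = {q:.3e}")
for z, v in zip(zeros, values_at_zeros):
    print(f"x = {z:.6f}, g(x) = {v:.3e}")
```

The shrinking quotients are consistent with g′(0) = 0. The values at 1/(kπ) are zero only up to floating-point error, since kπ cannot be represented exactly.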
I think the current notation for partial derivatives () is unnecessarily pedantic. Thus I attempted to change it to what I believe is the most common notation () [3], which was almost immediately reverted ( [4]) with the reason "Partial derivative with respect to a function is not defined". While that is correct, it is also not what I wrote, but I agree that it should perhaps be very clearly stated that it is not to be read as that. I think the notation is by far the most common, and I don't think the article is helping anyone by not adhering to that. Does anyone have major objections to changing the notation, perhaps with the addition of a few sentences explaining how it should be read? QuarksAndElectrons ( talk) 10:03, 27 October 2023 (UTC)
This
level-5 vital article is rated C-class on Wikipedia's
content assessment scale. It is of interest to the following WikiProjects: | |||||||||||
|
Daily pageviews of this article
A graph should have been displayed here but
graphs are temporarily disabled. Until they are enabled again, visit the interactive graph at
pageviews.wmcloud.org |
This page needs a proof and more rigorous maths.
situations for multivariable should be added: if u is a function of x, and y, and both x, and y are functions of t, then,
I don't see how Ex 1 is a "This calculation is a typical chain rule application." The calculation doesn't seem to involve differentiation which I would argue is the typical chain rule application. -- flatfish89 ( talk) 18:24, 24 April 2010 (UTC)
This discussion is outdated. Suggesting deletion. AurelienLourot ( talk) 19:30, 30 December 2014 (UTC)
I've replaced the current proof with one I saw somewhere. It relies on nothing but the definition of a derivative and the concept of limits. I believe it is quite rigourous and more formal than the previous one, but if there are any flaws with it feel free to point them out or even restore the old one.
Also, if you think it is too long or verbose then correct it.
Someone42 13:43, 25 May 2005 (UTC)
In addition, this proof relies on f(g(x+deltax))-f(g(x))=f(g;(x), which it does not. f(g'(x))=f(g(x+deltax)-g(x)). Additionally, the chain rule is not f'(g(x))=f'(g(x))g'(x), as dividing by f'(g(x)) would give you 1=g'(x), which isn't always true. The chain rule is f'(g(x))=f(g'(x))g'(x) —Preceding unsigned comment added by 66.169.198.79 ( talk) 06:08, 3 October 2009 (UTC)
I think the basic idea behind the chain rule is getting swamped in a sea of details and special cases. The basic idea of the chain rule is pretty easy to explain in words:
The best linear approximation of the composition is the composition of the best linear approximations.
Every special case of the chain rule, more or less expresses this fact in various situations, with a wide variety of abstraction or concreteness. But, this is the basic idea, and it would be good if it were made a bit more prominent. Revolver 01:43, 9 October 2005 (UTC)
That is a very nice description of the chain rule. I'm not sure I dare edit "one of the 500 most frequently viewed mathematics articles," but I'd like to see the above sentence appearing prominently in the article. David Bulger ( talk) 03:56, 5 July 2010 (UTC)
If I'm not completely mistaken, the definition mixes up f and . Also, x isn't some kind of magic symbol, so doesn't make sense either. It says that the composition of f and g (which is a function) is equal to a certain value of that function, namely that at x. I think the whole thing should read
Or leave out h completely and just write:
-- K. Sperling ( talk) 00:37, 12 November 2005 (UTC)
I would like to see an extensive expansion of the chain rule for several variables. Thanks, Silly rabbit 06:00, 15 November 2005 (UTC)
Is the notation in the section about the chain rule of higher dimensions really correct? For example and , among others. To me it seems wrong, or at least misleading. In my opinion it should be and using the notation style from the original author. I personally would drop the subscripts as well in this case:. -- Vilietha ( talk) 05:52, 12 June 2012 (UTC)
Since the "composition of two functions" is technically a "composite function", would its derivative be called a "composite derivative"? Also, does the secondary (inside) derivative have a special name (something like "harmonic derivative"——though I think that term means something else)? ~Kaimbridge~ 20:15, 3 February 2006 (UTC)
This discussion is outdated. Suggesting deletion. AurelienLourot ( talk) 19:33, 30 December 2014 (UTC)
I have no formal training in maths so forgive me if I sound naiive. Near the bottom of the proof it says "Observe that as and ." Would I be correct to think that "" shows that the "error" (right word?) involved goes to zero as delta goes to zero? 202.180.83.6 03:52, 16 February 2006 (UTC)
the primes in example 1 and 2 (where it says f'(x) = ) are very difficult to see. They look exactly like f(x). It may cause a lot of confusion, is there any way to make the primes in f'(x) stand out?
I think that the exceptions to the rule should be mentioned. For instance, , as would be suggested by the power rule. The problem is that sqrt{x+a} is just sqrt{x} shifted back, but x+25 still differentiates to 1. This means that according to the power rule, that constant that's added to x has no affect on it, even thought the derivitive should be . Which doesn't follow directly from the Chain Rule. He Who Is 23:33, 3 June 2006 (UTC)
All that calculus gives me a headache! I have no idea if and/or how they relate, but perhaps the chain rule pertaining to probability theory deserves a place somewhere on this page? You know, the P(A1 n A2 ... n An) = P(A1)P(A2|A1)P(A3|A1 n A2)...P(An|n n-1/i=1 Ai) thing, sorry about the crummy representation. 218.165.75.221 10:04, 30 September 2006 (UTC) M.H.
Uh, how about no. 69.215.17.209 14:41, 22 April 2007 (UTC)
Personally, I would like to see some detail about the Chain Rule for probability theory. I have been looking around the internet, and have not been able to find a discussion of it (ideally a step-by-step example or a detailed proof). So, it would be a good thing if wikipedia included something on it. Should the Chain Rule for probability theory be included on this page, the page for probability theory, or it's own page [ex. Chain Rule (Probability Theory)]?? SteelSoul ( talk) 17:50, 2 February 2009 (UTC)
The statement (f o g)'(x) = (f(g(x)))' is incorrect and fundamentally misunderstands the prime (f') notation. The prime is a transformation from functions to functions; as such it should be applied before the variable x is evaluated, as on the LHS but not as on the RHS. 69.215.17.209 14:44, 22 April 2007 (UTC)
I see that this edit was reverted, with the argument that this notation is common enough to be included. I understand that people may sometimes use it (I've never seen it myself, and I challenge anyone to produce examples from a common calculus text), but it is not standard and pedagogically very confusing. It is ambiguous whether the constant f(x) or the function f is being differentiated. I do not think that such poor notation should be perpetuated in an encyclopedia, without evidence that it is at least commonly used. -- 69.212.231.101 03:52, 26 July 2007 (UTC)
Perhaps resulting from corrections above, f prime is now invisible when inline (at end of first proof). g prime is visible inline due to different glyph for g not overlapping the prime in the way that it overlaps for f. f prime works ok with display notation instead of inline. Should be a simple fix but I don't know how. 58.175.211.1 ( talk) —Preceding undated comment added 15:26, 7 February 2013 (UTC)
The examples could use some more description, depending on if we're shooting for "definition" or "instructional detail." Substituting U for X^2+1 makes the plug 'n' chug easier, but it's not strictly necessary. Any objections to expanding current objections to include various applications of chain and detailed description of how and why subsitutions are valid?-- Legomancer ( talk) 22:45, 8 September 2009 (UTC)
For any coordinate (real valued function) y on a line (e.g. the real line) and any point p, denote by dyp the equivalence class of y-y(p)1 (where 1 is the constant function, with value 1) modulo functions vanishing at p to higher order. If y=f(x) (i.e. y = f ⚬ x for some other coordinate x on the same line and some f:R→ R) then the definition of differentiability of f at x(p) ensures that dyp = f'(x(p)) dxp because f(x)-f(x(p))1 differs from f'(x(p))(x-x(p)1) by a function vanishing at p to higher order.
The chain rule is an immediate consequence. If u = g(y) then, omitting evaluations/subscripts at p, du = g'(y)dy= g'(f(x))f'(x) dx.
Most arguments formalize this basic idea without discussing the conceptual meaning. Geometry guy 00:57, 2 November 2010 (UTC)
I was revisiting the first proof, and I've come to the conclusion that I don't think it's correct; at least, not in spirit. I was trying to rewrite it from scratch (which is my normal style), and the best I could do was as follows:
When g(x) equals g(a), THEN A MIRACLE OCCURS and this product is still equal to the difference quotient. Hence we can compute the derivative of f ∘ g at a by computing the limit as x goes to a of the above function. This limit exists because the above function is a product and the limit as x goes to a of each of its factors exists. Furthermore, because Q is continuous, the limit of the first factor equals f′(g(a)), and by definition the limit of the second factor equals g′(a). This proves the chain rule.
The problem is that when g(x) equals g(a) and x is not a, the miracle doesn't occur; the value of the product is f′(g(a)) times zero, which is zero. If we were to take the limit instead of evaluating, then the miracle would occur, but I then don't know how to prove that the limit computes what we want. If we could split up the product, then the miracle would occur, but then we need to show that the limit of the product exists, and I don't know how to prove that directly. The standard proofs get around this by explicitly measuring error terms; when we approach things that way, we never see the zero product, hence the miracle occurs. The whole reason we have this proof, though, is because it avoids error terms, and if we have to introduce them to make this work then there's no point in keeping this proof. So I'm stuck; I don't see how to fill this gap. In fact, as far as I can tell, since the product is zero this proof is just wrong.
The article presently seems to ignore this difficulty. It glosses over it by introducing Q only at the end and ignoring the need for a miracle. But as far as I can tell, it has exactly the same problem. Am I missing something? Or what? Ozob ( talk) 04:32, 11 December 2010 (UTC)
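The case g(x) = g(a) with x ≠ a can be written out explicitly. The displayed formula for Q is missing from this page, so the definition below is an assumption: it is the usual form this style of proof takes, not necessarily the one that stood in the article.

```latex
% Assumed definition of Q (the formula is not shown in the discussion above):
\[
Q(y) =
\begin{cases}
\dfrac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\[1ex]
f'(g(a)), & y = g(a).
\end{cases}
\]
% Case g(x) = g(a) with x \neq a: both sides vanish.
\[
Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a} = f'(g(a)) \cdot 0 = 0,
\qquad
\frac{f(g(x)) - f(g(a))}{x - a} = \frac{f(g(a)) - f(g(a))}{x - a} = 0 .
\]
```

Under this reading, the product and the difference quotient of f ∘ g agree for every x ≠ a, including the points where g(x) = g(a). Whether that matches the proof as worded in the article is a separate question.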
In the example "Suppose that a skydiver ..." the formula g(t) = 4000 − 9.8t^2 should be replaced with g(t) = 4000 − ½·9.8t^2, shouldn't it? 2.36.204.64 ( talk) 22:21, 19 January 2011 (UTC)
1) clarify 2nd bullet from "...rate of change in atmospheric pressure at height..." to ...rate of change in atmospheric pressure w.r.t. h, at height...
2) clarify 4th bullet from "...rate of change in atmospheric pressure t seconds after..." to ...rate of change in atmospheric pressure w.r.t. t, t seconds after...
3) the bottom paragraph that starts "It is not true..." is misleading and includes an error. I would end it with the sentence "This need not have anything to do with the buoyant force ten seconds after the skydiver's jump." and start a new paragraph just below that states the following:
It is true that (f o g)'(t) = f'(h) * g'(t). To find the buoyant force w.r.t. t ten seconds after his jump, we must evaluate g(10), his height ten seconds after he jumps, and substitute the result into f'(h). g(10) is 3510 meters above sea level, so the true buoyant force w.r.t. t ten seconds after the jump is (proportional to) f'(3510) * g'(10) = 7.133 * -98 = -699.
This example demonstrates the Chain Rule as the product of two rates. The last sentence that states "g(10) is 3020 meters above sea level, so the true buoyant force ten seconds after the jump is (proportional to) f'(3020)." is erroneous. To use the Chain Rule you need to multiply f'(g(t)) by g'(t). —Preceding unsigned comment added by 69.117.93.37 ( talk) 04:41, 31 January 2011 (UTC)
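The numbers in the two comments above can be checked with a short sketch. The height function g comes from the discussion, in both the 9.8t^2 and the ½·9.8t^2 variants. The pressure model f(h) = 101325·exp(−0.0001·h) is an assumption that does not appear on this page; it was chosen because it reproduces the magnitude 7.133 quoted above.

```python
import math

GRAVITY = 9.8  # m/s^2, as used in the discussion


def g(t, coeff=0.5 * GRAVITY):
    """Height in meters t seconds after the jump: 4000 - coeff * t^2."""
    return 4000.0 - coeff * t ** 2


def g_prime(t, coeff=0.5 * GRAVITY):
    """Velocity in meters per second (derivative of g with respect to t)."""
    return -2.0 * coeff * t


def f(h):
    """ASSUMED pressure model in pascals at height h (not from this page)."""
    return 101325.0 * math.exp(-0.0001 * h)


def f_prime(h):
    """Derivative of the assumed f with respect to h, in pascals per meter."""
    return -0.0001 * f(h)


def pressure_rate(t, coeff=0.5 * GRAVITY):
    """Chain rule: (f o g)'(t) = f'(g(t)) * g'(t), in pascals per second."""
    return f_prime(g(t, coeff)) * g_prime(t, coeff)


if __name__ == "__main__":
    print("with 1/2 * 9.8 t^2:", g(10), g_prime(10))
    print("with 9.8 t^2:      ", g(10, GRAVITY), g_prime(10, GRAVITY))
    print("f'(g(10)) * g'(10):", pressure_rate(10))
```

With the factor ½, g(10) is about 3510 and g'(10) about −98; without it they are about 3020 and −196. Under the assumed f, f'(3510) is negative (about −7.133), so the product with g'(10) is positive, about 699 pascals per second.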
(Copied from WT:WPM. Ozob ( talk) 02:07, 2 March 2011 (UTC))
The article titled chain rule currently says:
Does this last form really fail to "specify where each of these derivatives is to be evaluated"? It seems to me that the first form above clutters things in such a way as to interfere with understanding, and that the second, read correctly, doesn't really fail to do anything that should be done.
Opinions? Michael Hardy ( talk) 23:04, 1 March 2011 (UTC)
If y = g(u) and u = f(x), then the point at which to evaluate dy/du is u and the point at which to evaluate du/dx is x. That seems obvious. The extra notation will be confusing. Michael Hardy ( talk) 04:32, 2 March 2011 (UTC)
I disagree: it does make sense to evaluate a function at a variable.
There you have evaluation of a function at the variable x and evaluation of a function at the variable u. Are students really going to mistakenly assume I mean the pointwise product of ƒ and g if I write it that way? I don't think so.
And if I write
is that not also "evaluation of a function at a variable"? Michael Hardy ( talk) 20:14, 2 March 2011 (UTC)
To write
is at best redundant. That u is where it's evaluated is inherent in the meaning of the Leibniz notation. It's hard to see how anyone could mistakenly think otherwise. That's why this whole thing about evaluation is pointless. Michael Hardy ( talk) 18:24, 3 March 2011 (UTC)
I might misunderstand the notations, but why is the multivariate chain rule written as:
Why is there a composition on the RHS, and not a product of derivatives, as in the univariate case? Is it to be understood as a matrix operation, in which case composition corresponds to a product, and in that case, shouldn't this be explicitly signaled? Donvinzk ( talk) 11:34, 4 June 2011 (UTC)
[...]
This is exactly the formula D(f ∘ g) = Df ∘ g + Dg ∘ f.
Shouldn't this also be D(f o g) = Df o Dg ?
I have doubts in the comment that the second proof does not need a theorem about products of limits. In the intermediate step, we need to consider the product of and , which is equivalent to [Q-f'(g(a))]*{[g(x)-g(a)]/(x-a) - g'(a)}. My thought is that both proofs rely on the same theorem. However, I would like to get comments from more experienced editors before changing anything in the article. 202.130.125.147 ( talk) 09:04, 14 September 2013 (UTC)
I understand the need to make the article accessible for those who have not had extensive math education, but at the same time there are massive inconsistencies (and outright errors) in notation on this page that really offend the sensibilities of anyone who had studied math, and are bound to cause confusion if people take certain bits of the notation as they are formally stated. There's no need for there to be a dichotomy between formal accuracy and lucidness, and I'd argue that beginning students in calculus are poorly-served by having a resource which sacrifices accuracy for naive intuition.
For starters, we really ought to remove all ambiguity between f ∘ g and f; when you mean the former, write the former, and when you mean the latter, write the latter. These are conflated all over the place (the "higher order derivatives" section, for one, in which none of the stated formulas are formally true), and it is incorrect and confusing.
Additionally, any use of Leibniz' notation where the argument is inside the differential is abusive; it's a mild abuse of notation, granted, but it really should be avoided where it can be.
Thoughts?
129.2.129.149 ( talk) 13:44, 25 October 2013 (UTC)
The present state of the article states: "For example, the chain rule for (f ∘ g)(x) is ". Reading it with a fresh view, and knowing the chain rule, I find it highly confusing. Thinking about it a little more, it appears that the formula is correct by itself, but wrongly specified. In fact, neither Leibniz nor Newton knew the symbol for function composition, and probably not the modern notion of composition of functions. To be correct, the formula must be introduced by "If f is a function of a variable g, which is itself a function of x, then the chain rule is: ...". If one wants to introduce the chain rule for (f ∘ g), Leibniz notation must be avoided, because it supposes that the variables are named, which is not the case in (f ∘ g). On the other hand, Newton notation, where the variables need not be named, works well and gives: "The chain rule for (f ∘ g) is (f ∘ g)′ = (f′ ∘ g) g′." I agree with the IP that, even with my correction, the formulation with Leibniz notation is difficult to make formally correct, because it needs the rather strange notion of a variable that is a function of another variable. But it works well in practice.
In conclusion, my opinion is that the two versions of the chain rule must appear in the lead, appropriately introduced.
D.Lazard ( talk) 10:27, 26 October 2013 (UTC)
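The prime-notation form quoted above, (f ∘ g)′ = (f′ ∘ g) g′, can be checked numerically without naming any variables. This is a minimal sketch; the functions f = sin and g(x) = x^2 + 1 and the helper names are illustrative choices, not taken from the article.

```python
import math


def compose(outer, inner):
    """Return the composition outer o inner as a new function."""
    return lambda x: outer(inner(x))


def derivative(h, x, eps=1e-6):
    """Central-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)


# Illustrative choice of functions with known derivatives.
f, f_prime = math.sin, math.cos


def g(x):
    return x ** 2 + 1.0


def g_prime(x):
    return 2.0 * x


def chain_rule(x):
    """(f' o g)(x) * g'(x): the right-hand side of the chain rule."""
    return compose(f_prime, g)(x) * g_prime(x)


if __name__ == "__main__":
    for x in (-1.5, 0.0, 0.3, 2.0):
        print(x, derivative(compose(f, g), x), chain_rule(x))
```

The two printed columns should agree up to the error of the finite-difference approximation.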
Am I missing something, or are the claims in this section completely wrong? D(f o g) = Df o Dg seems to me to be a false statement; the correct formulation would be D(f o g) = ((Df) o g)Dg. That this section cites no sources at all does not exactly inspire confidence, as well. 71.163.32.84 ( talk) 03:05, 28 October 2013 (UTC)
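The formulation D(f ∘ g) = ((Df) ∘ g) Dg suggested above can be tested on a small example, reading the right-hand side as a product of Jacobian matrices. The maps f and g below are hypothetical examples chosen only for illustration; the sketch uses plain lists rather than a matrix library.

```python
import math


def g(x, y):
    """Illustrative map R^2 -> R^2."""
    return (x * y, x + y)


def Dg(x, y):
    """Jacobian matrix of g at (x, y)."""
    return [[y, x], [1.0, 1.0]]


def f(u, v):
    """Illustrative map R^2 -> R^2."""
    return (math.sin(u) + v, u * v)


def Df(u, v):
    """Jacobian matrix of f at (u, v)."""
    return [[math.cos(u), 1.0], [v, u]]


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def chain_rule_jacobian(x, y):
    """((Df) o g) Dg: Df evaluated at g(x, y), times Dg at (x, y)."""
    return matmul(Df(*g(x, y)), Dg(x, y))


def f_after_g(x, y):
    return f(*g(x, y))


def numeric_jacobian(h, x, y, eps=1e-6):
    """Central-difference Jacobian of h: R^2 -> R^2 at (x, y)."""
    dx = [(a - b) / (2 * eps) for a, b in zip(h(x + eps, y), h(x - eps, y))]
    dy = [(a - b) / (2 * eps) for a, b in zip(h(x, y + eps), h(x, y - eps))]
    return [[dx[0], dy[0]], [dx[1], dy[1]]]


if __name__ == "__main__":
    print(chain_rule_jacobian(0.7, -1.3))
    print(numeric_jacobian(f_after_g, 0.7, -1.3))
```

The key point is that Df is evaluated at g(x, y), not at (x, y); multiplying Df(x, y) by Dg(x, y) instead gives a different matrix in general.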
I've made a first pass at attempting to be more explicit about the notation. Please feel free to improve it or to comment here. Ozob ( talk) 13:39, 30 October 2013 (UTC)
The second proof contains the following line:
Is this line really necessary? Wouldn't it be enough to write that η behaves like ε, i.e. it simply tends to zero as its argument tends to zero?
The second proof terminates finally with the following line:
But I don't see the need to define η at zero in that second proof. Am I missing something? Don't we just need to know that η tends to zero as its argument tends to zero? AurelienLourot ( talk) 19:51, 30 December 2014 (UTC)
Would it be pertinent/beneficial to add a small section exposing the non-standard approach to the chain rule, i.e. using hyperreal numbers and standard parts? — Preceding unsigned comment added by Gio97 ( talk • contribs) 08:30, 10 April 2015 (UTC)
I tend to agree with the point being made in the edit summary in this revert. Placing the prime after an expression is pushing an abuse of notation too far. The correct notation would be . The prime just does not have the necessary flexibility (expressive power). — Quondum 02:24, 20 May 2015 (UTC)
For the life of me, I can't make heads or tails of that recently-added animation. I'm not even sure what it's supposed to be showing. Is it just me? Perhaps with a bit of clarification it could be a useful addition to the article, but in its present state I'm not sure it'll contribute to anyone's understanding. 96.231.153.5 ( talk) 06:07, 26 January 2016 (UTC)
Regarding the example u(x, y) = x^2 + 2y. We strictly have , and since the function does not take r as a function parameter. The article makes use of non-standard notation if you apply the chain rule here. You shall not apply the chain rule here. See https://www.icp.uni-stuttgart.de/~icp/mediawiki/images/7/74/Remark_on_partial_derivatives.pdf please correct this!-- 94.217.251.2 ( talk) 10:24, 26 June 2018 (UTC)
I've stumbled across that formula and the order of partial derivatives in the second sum is not correct, it should be like this: not this
The order of the partial derivatives cannot always be exchanged without affecting the result; Schwarz's theorem guarantees this only under certain conditions. So it is appropriate to write the formula as generally as possible, or at least to specify that we are under certain conditions. — Preceding unsigned comment added by 213.243.253.119 ( talk • contribs) 09:28, 6 May 2019 (UTC)
I don't think it's wise, didactically, to use a capital F in F(x) = f(g(x)) since capitals are very often used to designate the primitive, as in F'(x) = f(x). Why not just use h or something? — Preceding unsigned comment added by 77.61.180.106 ( talk • contribs) 13:00, 4 December 2020 (UTC)
The First Example states "(f ∘ g)(t) is the atmospheric pressure the skydiver experiences t seconds after his jump". It's not that simple, because the distance fallen after t seconds is a tiny bit less than that given by g(t) due to buoyancy. I expect you would need to use a differential equation to model the physics in the first example rather than a simple composite function.— Preceding unsigned comment added by MathewMunro ( talk • contribs) 09:26, 14 February 2021 (UTC)
For example, suppose that we want to compute the rate of change in atmospheric pressure ten seconds after the skydiver jumps. This is (f ∘ g)′(10) and has units of pascals per second. The factor g′(10) in the chain rule is the velocity of the skydiver ten seconds after his jump, and it is expressed in meters per second. f′(g(10)) is the change in pressure with respect to height at the height g(10) and is expressed in pascals per meter. The product of f′(g(10)) and g′(10) therefore has the correct units of pascals per second.
Here, notice that it is not possible to evaluate f anywhere else. For instance, the 10 in the problem represents ten seconds, while the expression f′(10) would represent the change in pressure at a height of ten meters, which is not what we wanted. Similarly, while g′(10) = −98 has a unit of meters per second, the expression f′(g′(10)) would represent the change in pressure at a height of −98 meters, which is again not what we wanted. However, g(10) is 3510 meters above sea level, the height of the skydiver ten seconds after his jump, and this has the correct units for an input to f.
I have a question about the example function g(x) in the First Proof. Currently the article states: "For example, this happens for g(x) = x^2*sin(1 / x) near the point a = 0." I think the example function should be a piecewise (split) function as follows: "g(x) = x^2*sin(1 / x) for x ≠ 0, and g(x) = 0 for x = 0". As it stands, the function is undefined at the point a = 0, and so not differentiable there. I think the point of the article would be valid if it used the split function I have suggested. — Preceding unsigned comment added by Matthew.howey ( talk • contribs) 10:38, 29 March 2021 (UTC)
Fixed D.Lazard ( talk) 11:40, 29 March 2021 (UTC)
I've proceeded to make the edit I suggested, obviously if anyone disagrees please amend or revert. Matthew.howey ( talk) 20:40, 15 April 2021 (UTC)
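The difference between the two definitions discussed in this thread can be seen in a short sketch. The function names are illustrative. The piecewise version is defined at 0 and its difference quotient there, h·sin(1/h), is bounded by |h|, so it tends to 0; the unpatched formula cannot be evaluated at 0 at all.

```python
import math


def g(x):
    """Piecewise function: x^2 * sin(1/x) for x != 0, and 0 at x = 0."""
    if x == 0.0:
        return 0.0
    return x * x * math.sin(1.0 / x)


def g_unpatched(x):
    """The bare formula; evaluating it at 0 raises ZeroDivisionError."""
    return x * x * math.sin(1.0 / x)


def difference_quotient_at_zero(h):
    """(g(h) - g(0)) / h, which equals h * sin(1/h) for h != 0."""
    return (g(h) - g(0.0)) / h


if __name__ == "__main__":
    for k in range(1, 7):
        h = 10.0 ** -k
        print(h, difference_quotient_at_zero(h))
```

The printed quotients shrink at least as fast as h, which is consistent with g′(0) = 0 for the piecewise definition.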
I think the current notation for partial derivatives () is unnecessarily pedantic. Thus I attempted to change it to what I believe is the most common notation () [3], which was almost immediately reverted ( [4]) with the reason "Partial derivative with respect to a function is not defined". While that is correct, it is also not what I wrote, but I agree that it should perhaps be very clearly stated that it is not to be read as that. I think the notation is by far the most common, and I don't think the article is helping anyone by not adhering to that. Does anyone have major objections to changing the notation, perhaps with the addition of a few sentences explaining how it should be read? QuarksAndElectrons ( talk) 10:03, 27 October 2023 (UTC)