02.05.03 · analysis / multivariable-differentiation

Chain rule for multi-variable functions

shipped · 3 tiers · Lean: partial

Anchor (Master): Apostol *Calculus* Vol. 2 Ch. 8 §8.18–8.21; Dieudonné *Foundations of Modern Analysis* Ch. VIII; Cartan *Calcul Différentiel*; Faà di Bruno 1855 *Sullo sviluppo delle funzioni* (Annali di Scienze Matematiche e Fisiche 6); Itô 1944 *Stochastic Integral* (Proc. Imp. Acad. Tokyo 20)

Intuition [Beginner]

A composition of functions chains one operation after another. The inner function takes an input and produces an intermediate output. The outer function takes that intermediate output and produces a final output. In one variable, the rate of change of the chained operation equals the rate of change of the outer function (at the intermediate output) times the rate of change of the inner function (at the original input). Two gears coupled together: a small turn of the input shaft turns the intermediate shaft a little, and that little turn of the intermediate shaft turns the output shaft a little more.

The multi-variable version keeps the same idea, with a small upgrade. The "rate of change" of a function from $n$-space to $m$-space is no longer a single number: it is a rectangular table of numbers called the Jacobian matrix, with one row per output coordinate and one column per input coordinate. Each entry records how one output coordinate responds to a small change in one input coordinate. The chain rule says: the Jacobian matrix of the composition is the matrix product of the outer Jacobian (evaluated at the intermediate output) and the inner Jacobian (evaluated at the original input).

A picture worth carrying: nesting boxes. The inner box converts inputs into intermediates; the outer box converts intermediates into outputs. The chain rule says you can sense how the outermost output reacts to the innermost input by multiplying the two response tables together, in the correct order.

Visual [Beginner]

A three-panel diagram. The leftmost panel shows a number line carrying the input variable $t$. An arrow labelled $g$ leads to the middle panel, which shows the plane with a circle traced by the point $(\cos t, \sin t)$. A second arrow labelled $f$ leads to the rightmost panel, another number line, where the entire circle has been collapsed to the single value $1$ because the squared distance from the origin to any point on the unit circle equals $1$.

Composition diagram: a real number maps through g into the plane, then through f back to the real line. Two short arrows over the panels are labelled Dg(a) and Df(g(a)) — the Jacobian matrices of the inner and outer maps. A long bottom arrow says D(f composition g)(a) equals Df(g(a)) composed with Dg(a) — the chain-rule statement.

The two short arrows above the panels carry the Jacobians of the two maps. The long bottom arrow expresses the chain rule: the Jacobian of the composition is the matrix product of the outer Jacobian and the inner Jacobian. The visual signature: the chain rule is multiplication of derivative-tables, with the outer table evaluated at the intermediate point.

Worked example [Beginner]

Let the inner map be $g(t) = (\cos t, \sin t)$, a path tracing out the unit circle in the plane. Let the outer map be $f(x, y) = x^2 + y^2$, the squared distance from the origin. The composition is $(f \circ g)(t) = \cos^2 t + \sin^2 t = 1$ for every $t$. So the composed function is constant; its rate of change is $0$.

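Check this two ways. The direct computation. The composed function equals $1$ everywhere, so its derivative at every point equals $0$. The chain-rule computation. The inner Jacobian is $Dg(t) = (-\sin t, \cos t)^{\mathsf T}$, a $2 \times 1$ column. The outer Jacobian is the gradient row $Df(x, y) = (2x, 2y)$ evaluated at the intermediate point $(\cos t, \sin t)$, which gives $(2\cos t, 2\sin t)$. The matrix product is $2\cos t \cdot (-\sin t) + 2\sin t \cdot \cos t = 0$. The two computations agree.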
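The two-way check can also be run numerically. The sketch below is illustrative only: it assumes the inner map is the unit-circle path $(\cos t, \sin t)$ and the outer map is the squared distance $x^2 + y^2$, and the function names are hypothetical.

```python
import math

def g(t):
    """Inner map: a point on the unit circle."""
    return (math.cos(t), math.sin(t))

def f(x, y):
    """Outer map: squared distance from the origin."""
    return x * x + y * y

def jac_g(t):
    """2x1 Jacobian of g, stored as the column (dx/dt, dy/dt)."""
    return (-math.sin(t), math.cos(t))

def jac_f(x, y):
    """1x2 Jacobian (gradient row) of f."""
    return (2.0 * x, 2.0 * y)

def chain_rule_derivative(t):
    """Row-times-column product Df(g(t)) * Dg(t)."""
    row = jac_f(*g(t))   # outer Jacobian at the intermediate point
    col = jac_g(t)       # inner Jacobian at the input
    return row[0] * col[0] + row[1] * col[1]

def central_difference(t, step=1e-6):
    """Numerical derivative of the composition, for comparison."""
    return (f(*g(t + step)) - f(*g(t - step))) / (2.0 * step)

for t in (0.0, 0.7, 2.5):
    print(t, chain_rule_derivative(t), central_difference(t))
```

Both columns should be zero up to rounding, matching the hand computation.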

What this tells us: the chain rule reproduces the direct calculation, but it works without needing to first carry out the composition. The chain-rule machine takes the two Jacobian tables and multiplies — that is enough.

Check your understanding [Beginner]

Formal definition [Intermediate+]

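Let $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ be open sets, let $g : U \to \mathbb{R}^m$ with $g(U) \subseteq V$, and let $f : V \to \mathbb{R}^p$. The map $g$ is differentiable at $a \in U$ when a linear map $Dg(a) : \mathbb{R}^n \to \mathbb{R}^m$ exists with $$ g(a + h) = g(a) + Dg(a) h + \rho_g(h), \qquad \lim_{h \to 0} \frac{|\rho_g(h)|}{|h|} = 0. $$ The map $Dg(a)$ is the (Fréchet) derivative of $g$ at $a$. In the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, the matrix of $Dg(a)$ is the Jacobian matrix $J_g(a)$, an $m \times n$ array whose $(i, j)$-entry is the partial derivative $\partial g_i / \partial x_j$ of the $i$-th component of $g$ with respect to the $j$-th variable, evaluated at $a$. Analogous notation applies to $f$ at $b = g(a)$, where $Df(b)$ has Jacobian matrix $J_f(b)$ of shape $p \times m$.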

The composition rule states: if $g$ is differentiable at $a$ and $f$ is differentiable at $b = g(a)$, then $f \circ g$ is differentiable at $a$ with $$ D(f \circ g)(a) = Df(g(a)) \circ Dg(a). $$ In Jacobian-matrix form, $J_{f \circ g}(a) = J_f(g(a)) \, J_g(a)$, an ordinary matrix product of shapes $p \times m$ and $m \times n$, producing a $p \times n$ matrix. The statement follows Apostol [Apostol Ch. 8 §8.18–8.21].

A few equivalent restatements are worth recording.

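  • Component form. Writing $y = g(x)$ and $z = f(y)$, the $(i, j)$-entry of $J_{f \circ g}(a)$ is the sum $\sum_{k=1}^{m} \frac{\partial f_i}{\partial y_k}(g(a)) \, \frac{\partial g_k}{\partial x_j}(a)$, the standard scalar-form chain rule of multivariable calculus.
  • Curve form. For a differentiable curve $\gamma : I \to \mathbb{R}^m$ and a differentiable scalar $f : \mathbb{R}^m \to \mathbb{R}$, the derivative of $f \circ \gamma$ at $t$ equals the inner product of the gradient $\nabla f(\gamma(t))$ with the tangent vector $\gamma'(t)$. This is the $n = 1$, $p = 1$ specialisation of the general statement.
  • Order convention. Composition is read right-to-left: $g$ acts first, then $f$. The matrix product $J_f(g(a)) \, J_g(a)$ inherits the same right-to-left convention. Reversing the order produces a different and incorrect matrix in general.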
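The matrix form of the rule can be compared against a finite-difference Jacobian of the composite. The maps below ($\mathbb{R}^2 \to \mathbb{R}^3 \to \mathbb{R}^2$) are hypothetical examples chosen only so that the shapes differ; the helper names are not taken from any library.

```python
import math

def g(x):
    """Inner map R^2 -> R^3 (hypothetical example)."""
    u, v = x
    return [u * v, math.sin(u), u + v * v]

def f(y):
    """Outer map R^3 -> R^2 (hypothetical example)."""
    a, b, c = y
    return [a * b + c, math.exp(a) * c]

def jac_g(x):
    """3x2 Jacobian of g: one row per output, one column per input."""
    u, v = x
    return [[v, u],
            [math.cos(u), 0.0],
            [1.0, 2.0 * v]]

def jac_f(y):
    """2x3 Jacobian of f."""
    a, b, c = y
    return [[b, a, 1.0],
            [math.exp(a) * c, 0.0, math.exp(a)]]

def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian(F, x, step=1e-6):
    """Central-difference Jacobian of F at x."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        cols.append([(p - m) / (2.0 * step) for p, m in zip(F(xp), F(xm))])
    # cols[j][i] holds dF_i/dx_j; transpose so rows index outputs
    return [[cols[j][i] for j in range(len(x))] for i in range(len(cols[0]))]

a = [0.3, -1.2]
chain = matmul(jac_f(g(a)), jac_g(a))            # (2x3)(3x2) -> 2x2
direct = numeric_jacobian(lambda x: f(g(x)), a)  # Jacobian of the composite
max_gap = max(abs(chain[i][j] - direct[i][j]) for i in range(2) for j in range(2))
print(chain, direct, max_gap)
```

Note that the outer Jacobian is evaluated at `g(a)`, not at `a`, and that the product is taken in the order outer times inner.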

Counterexamples to common slips

  • Differentiability is stronger than the existence of partial derivatives. A map can have every partial derivative at $a$ and still fail to be differentiable at $a$, in which case the chain rule does not apply. The classical witness is $f(x, y) = xy / (x^2 + y^2)$ extended by $f(0, 0) = 0$: the partials $\partial_x f(0, 0)$ and $\partial_y f(0, 0)$ both equal $0$, yet $f$ is not even continuous at the origin and therefore not differentiable there.
  • Continuity of the partials is the standard sufficient condition. If every $\partial g_i / \partial x_j$ exists and is continuous on $U$, then $g$ is differentiable on $U$. This is the practical hypothesis under which one normally invokes the chain rule.
  • Matrix order matters. The product is $J_f(g(a)) \, J_g(a)$, in that order. Swapping the factors is defined only when $n = p$, and then gives a matrix of shape $m \times m$ when the chain rule wants a $p \times n$ matrix; the entries are unrelated.
  • The inner derivative is evaluated at the input $a$; the outer derivative is evaluated at the intermediate point $g(a)$. A reader who evaluates both at $a$ uses the wrong base point for the outer map and recovers a numerically wrong derivative.
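The first slip can be made concrete with the classical witness $xy/(x^2 + y^2)$, extended by $0$ at the origin. The following sketch (function names are illustrative) shows both partial difference quotients vanishing at the origin while the function stays at $1/2$ along the diagonal, so it cannot be continuous there.

```python
def f(x, y):
    """xy / (x^2 + y^2), extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_at_origin(step):
    """Difference quotient along the x-axis."""
    return (f(step, 0.0) - f(0.0, 0.0)) / step

def partial_y_at_origin(step):
    """Difference quotient along the y-axis."""
    return (f(0.0, step) - f(0.0, 0.0)) / step

for step in (1e-1, 1e-4, 1e-8):
    # both quotients are 0, yet the value on the diagonal does not shrink
    print(step, partial_x_at_origin(step), partial_y_at_origin(step), f(step, step))
```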

Key theorem with proof [Intermediate+]

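Theorem (multi-variable chain rule). Let $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ be open. Let $g : U \to V$ be differentiable at $a \in U$ with $b = g(a)$, and let $f : V \to \mathbb{R}^p$ be differentiable at $b$. Then $f \circ g$ is differentiable at $a$ and $$ D(f \circ g)(a) = Df(b) \circ Dg(a). $$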

Proof. Set $A = Dg(a)$ and $B = Df(b)$. Define the two remainders $$ \rho_g(h) = g(a + h) - g(a) - A h, \qquad \rho_f(k) = f(b + k) - f(b) - B k, $$ defined for $h$ small enough that $a + h \in U$ and for $k$ small enough that $b + k \in V$. By differentiability of $g$ at $a$ and of $f$ at $b$, $$ \lim_{h \to 0} \frac{|\rho_g(h)|}{|h|} = 0, \qquad \lim_{k \to 0} \frac{|\rho_f(k)|}{|k|} = 0. $$

Fix $h$ small. Set $k = k(h) = g(a + h) - g(a) = A h + \rho_g(h)$, the intermediate displacement produced by $h$. Compute the composition: \begin{align} (f \circ g)(a + h) &= f(g(a + h)) = f(b + k) \\ &= f(b) + B k + \rho_f(k) \\ &= f(b) + B(A h + \rho_g(h)) + \rho_f(k) \\ &= f(g(a)) + (B A) h + B \rho_g(h) + \rho_f(k). \end{align}

The candidate linear approximation at $a$ is $h \mapsto (BA) h$. The remainder of $f \circ g$ at $a$ is therefore $$ R(h) = B \rho_g(h) + \rho_f(k(h)). $$ To establish differentiability of $f \circ g$ at $a$ with derivative $BA$, the requirement is $|R(h)| / |h| \to 0$ as $h \to 0$.

For the first term, by linearity of $B$ and the operator-norm bound, $$ |B \rho_g(h)| \leq |B|_{\mathrm{op}} \cdot |\rho_g(h)|. $$ Since $|\rho_g(h)| / |h| \to 0$ as $h \to 0$, the ratio $|B \rho_g(h)| / |h|$ tends to $0$ as well.

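For the second term, two estimates combine. The first bounds $|k(h)|$ in terms of $|h|$. From $k(h) = A h + \rho_g(h)$ and the triangle inequality, $$ |k(h)| \leq |A|_{\mathrm{op}} |h| + |\rho_g(h)| \leq (|A|_{\mathrm{op}} + 1) |h| $$ for all $h$ sufficiently small that $|\rho_g(h)| \leq |h|$, which holds eventually because $|\rho_g(h)| / |h| \to 0$. Set $C = |A|_{\mathrm{op}} + 1$, so $|k(h)| \leq C |h|$ on a small ball about $0$. Note also that $k(h) \to 0$ as $h \to 0$ because $g$ is continuous at $a$ (differentiability implies continuity), so $b + k(h) \in V$ and $\rho_f(k(h))$ is defined for all sufficiently small $h$.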

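The second estimate is the definition of $\rho_f(k) = o(|k|)$. Given $\varepsilon > 0$, choose $\delta > 0$ so that $|k| < \delta$ implies $|\rho_f(k)| \leq \varepsilon |k| / C$ (the non-strict inequality covers $k = 0$, where $\rho_f(0) = 0$). Choose $\eta > 0$ so that $|h| < \eta$ forces both $|k(h)| < \delta$ and the bound $|k(h)| \leq C |h|$. Then $0 < |h| < \eta$ implies $$ |\rho_f(k(h))| \leq \frac{\varepsilon |k(h)|}{C} \leq \frac{\varepsilon \cdot C |h|}{C} = \varepsilon |h|, $$ hence $|\rho_f(k(h))| / |h| \leq \varepsilon$. Since $\varepsilon$ was arbitrary, $|\rho_f(k(h))| / |h| \to 0$ as $h \to 0$.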

Combining the two bounds, $$ \frac{|R(h)|}{|h|} \leq \frac{|B|_{\mathrm{op}} \cdot |\rho_g(h)|}{|h|} + \frac{|\rho_f(k(h))|}{|h|} \xrightarrow[h \to 0]{} 0. $$ The candidate linear map $BA$ satisfies the defining condition of differentiability for $f \circ g$ at $a$. Uniqueness of the derivative (the linear map approximating $f \circ g$ at $a$ to first order is determined by its values on a basis through partial derivatives along coordinate directions) identifies $D(f \circ g)(a) = BA = Df(b) \circ Dg(a)$.
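The limit established by the proof, the remainder of the composite divided by $|h|$ tending to zero, can be watched numerically. The sketch below assumes a hypothetical pair of maps, $g(t) = (\cos t, \sin t)$ and $f(x, y) = x^2 y$, chosen so that the composite is not constant; it illustrates the statement and is not part of the proof.

```python
import math

def g(t):
    """Inner map R -> R^2 (hypothetical example)."""
    return (math.cos(t), math.sin(t))

def f(x, y):
    """Outer map R^2 -> R (hypothetical example)."""
    return x * x * y

def derivative_by_chain_rule(a):
    """B A = Df(g(a)) * Dg(a) for the maps above."""
    x, y = g(a)
    grad = (2.0 * x * y, x * x)             # B = Df(b), a 1x2 row
    tangent = (-math.sin(a), math.cos(a))   # A = Dg(a), a 2x1 column
    return grad[0] * tangent[0] + grad[1] * tangent[1]

def remainder_ratio(a, h):
    """|R(h)| / |h| with R(h) = (f o g)(a + h) - (f o g)(a) - (B A) h."""
    remainder = f(*g(a + h)) - f(*g(a)) - derivative_by_chain_rule(a) * h
    return abs(remainder) / abs(h)

a = 0.4
ratios = [remainder_ratio(a, 10.0 ** (-p)) for p in range(1, 6)]
for p, ratio in zip(range(1, 6), ratios):
    print(f"h = 1e-{p}: |R(h)|/|h| = {ratio:.3e}")
```

For these smooth maps the ratio shrinks roughly in proportion to $h$, which is stronger than the $o(1)$ decay the theorem guarantees.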

Bridge. The composition rule is the structural backbone of differential calculus on Euclidean spaces, and four neighbouring frames lock into it. First, the proof reduces to the multi-variable limit and continuity of 02.05.01: continuity of $g$ at $a$ is what makes $k(h) \to 0$, which is what unlocks the $\rho_f$ estimate, and the path-independence requirement of 02.05.01 is what makes the linear-approximation language well-defined regardless of how $h$ approaches $0$. Second, the rule connects to the implicit and inverse function theorems through a single corollary: if $Dg(a)$ is invertible as a linear map and $g$ has a differentiable local inverse, the chain rule applied to $g^{-1} \circ g = \mathrm{id}$ forces $D(g^{-1})(g(a)) = (Dg(a))^{-1}$. The derivative of the inverse equals the inverse of the derivative, the key identity behind local invertibility. Third, the rule generalises to higher derivatives through the Faà di Bruno formula, in which the $n$-th derivative of $f \circ g$ is a sum over set partitions of $\{1, \dots, n\}$ with combinatorial coefficients; the bare chain rule is the $n = 1$ case with a single one-block partition. Fourth, the rule is the foundational reason that calculus has a local-to-global theory: it pushes derivatives through coordinate changes, through diffeomorphism reparametrisations, through pullbacks of differential forms, and through the local-coordinate patching that defines manifolds. Read together, the four bridges identify the chain rule as the load-bearing functoriality that lets the differential calculus of $\mathbb{R}^n$ extend coherently to curves, surfaces, manifolds, bundles, and beyond.
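The inverse-derivative identity in the second bridge can be checked on a familiar example. The sketch below assumes the polar-coordinate map $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$ and its local inverse away from the origin; the function names are illustrative. The product of the two Jacobians should be the identity matrix.

```python
import math

def jac_polar_to_cartesian(r, theta):
    """Jacobian of g(r, theta) = (r cos theta, r sin theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def jac_cartesian_to_polar(x, y):
    """Jacobian of the local inverse (x, y) -> (sqrt(x^2 + y^2), atan2(y, x))."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def matmul2(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r, theta = 2.0, 0.6
x, y = r * math.cos(theta), r * math.sin(theta)

# D(g^{-1})(g(a)) times Dg(a): the chain rule applied to g^{-1} o g = id
product = matmul2(jac_cartesian_to_polar(x, y), jac_polar_to_cartesian(r, theta))
identity_gap = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                   for i in range(2) for j in range(2))
print(product, identity_gap)
```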

Exercises [Intermediate+]

Lean formalization [Intermediate+]

lean_status: partial — Mathlib provides the multi-variable chain rule in Fréchet-derivative form through HasFDerivAt.comp and fderiv.comp, together with the continuous-linear-map composition on EuclideanSpace ℝ (Fin n). The Jacobian-matrix interpretation comes through ContinuousLinearMap.toMatrix paired with the standard basis on EuclideanSpace. The textbook-style packaging in Apostol notation and the Faà di Bruno higher-order chain rule under one named result is the Codex-facing gap.


The companion module at Codex.Analysis.MultiVariable.ChainRule re-exports these statements and records the unification gap.

Advanced results [Master]

Banach-space chain rule. Let $E$, $F$, $G$ be Banach spaces, $U \subseteq E$ and $V \subseteq F$ be open, and let $g : U \to F$ be Fréchet-differentiable at $a \in U$ with $g(U) \subseteq V$ and $f : V \to G$ be Fréchet-differentiable at $b = g(a)$. Then $f \circ g$ is Fréchet-differentiable at $a$ with $D(f \circ g)(a) = Df(b) \circ Dg(a)$. The proof transcribes the Euclidean argument with operator norms of bounded linear maps in place of the matrix operator norm; completeness of $E$, $F$, $G$ is not used directly in the chain rule itself, which holds for normed spaces in general, but is invoked in downstream constructions (inverse function theorem, ODE existence) that depend on the chain rule [Dieudonné Ch. VIII].

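Pushforward on tangent vectors. Let $\phi : M \to N$ be a smooth map of smooth manifolds. The differential at $p \in M$, $\phi_{*, p} : T_p M \to T_{\phi(p)} N$, is the pushforward of tangent vectors along $\phi$. For a second smooth map $\psi$ defined on $N$, the composition $\psi \circ \phi$ satisfies $(\psi \circ \phi)_{*, p} = \psi_{*, \phi(p)} \circ \phi_{*, p}$, which is the chain rule of this unit when $M = \mathbb{R}^n$ and $N = \mathbb{R}^m$ in standard coordinates. Functoriality of the tangent functor on the category of smooth manifolds is precisely this statement.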

Pullback on differential forms. For a smooth map $\phi : M \to N$ and a differential form $\omega$ on $N$, the pullback $\phi^* \omega$ is a differential form on $M$, and pullback is functorial with the order reversed: $(\psi \circ \phi)^* = \phi^* \circ \psi^*$. The reversal of order from pushforward to pullback reflects the contravariance of the form bundle. The chain rule of this unit is the $0$-form / function-pullback specialisation $\phi^* f = f \circ \phi$, paired with the $1$-form pullback formula $\phi^*(df) = d(f \circ \phi)$, which itself encodes the chain rule.

Faà di Bruno formula. Let $f, g : \mathbb{R} \to \mathbb{R}$ be $n$-times differentiable at the appropriate points. Then $$ \frac{d^n}{dx^n} (f \circ g)(x) = \sum_{\pi \in \mathrm{Part}(n)} f^{(|\pi|)}(g(x)) \prod_{B \in \pi} g^{(|B|)}(x), $$ where the sum runs over all set partitions $\pi$ of $\{1, \dots, n\}$, $|\pi|$ is the number of blocks of $\pi$, and the product is over the blocks $B$ of $\pi$ [Faà di Bruno 1855]. The bare chain rule is the $n = 1$ case with the single-block partition. The multivariable generalisation replaces real-valued $f^{(k)}$ with $k$-multilinear maps $D^k f$ and sums over compositions indexed by partition-and-flag data; Cartan's notation packages the construction efficiently.
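The partition sum can be evaluated directly for a small case. The sketch below enumerates set partitions recursively and applies the formula to a hypothetical test pair, $f(u) = u^3$ and $g(x) = x^2$, whose composite $x^6$ has third derivative $120x^3$; the helper names are illustrative.

```python
def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new singleton block
        yield [[first]] + partition

def faa_di_bruno(n, f_derivs, g_derivs):
    """Sum over set partitions of {1, ..., n}.

    f_derivs[k] is the k-th derivative of f at g(x);
    g_derivs[k] is the k-th derivative of g at x.
    """
    total = 0
    for partition in set_partitions(list(range(1, n + 1))):
        term = f_derivs[len(partition)]
        for block in partition:
            term *= g_derivs[len(block)]
        total += term
    return total

# Hypothetical test case: f(u) = u^3, g(x) = x^2, so (f o g)(x) = x^6.
x = 2
u = x ** 2
f_derivs = [u ** 3, 3 * u ** 2, 6 * u, 6]   # f, f', f'', f''' at u = g(x)
g_derivs = [x ** 2, 2 * x, 2, 0]            # g, g', g'', g''' at x
third = faa_di_bruno(3, f_derivs, g_derivs)
print(third, 120 * x ** 3)                   # both equal 960
```

The five partitions of $\{1, 2, 3\}$ contribute $0 + 3 \cdot 192 + 384 = 960$, matching $120 \cdot 2^3$.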

Itô formula. Let $X$ be a continuous semimartingale with quadratic-variation process $[X, X]$, and let $f \in C^2(\mathbb{R})$. Then $$ f(X_t) = f(X_0) + \int_0^t f'(X_s) \, dX_s + \frac{1}{2} \int_0^t f''(X_s) \, d[X, X]_s. $$ The second integral is the stochastic correction term, absent from the deterministic chain rule, present because processes such as Brownian motion have nonzero quadratic variation [Itô 1944]. The formula extends to vector-valued semimartingales and functions of several variables, with the correction term involving the Hessian against the matrix-valued quadratic variation. Itô's discovery in 1944 is the foundation of stochastic calculus.
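For $f(x) = x^2$ the discrete analogue of the formula is an exact algebraic identity, since $x_{i+1}^2 - x_i^2 = 2 x_i \Delta_i + \Delta_i^2$ on every step. The sketch below illustrates this on a simulated random walk standing in for Brownian motion; the step count, seed, and function names are assumptions made for the illustration, and this is not a proof of the formula.

```python
import math
import random

def simulate_path(steps, horizon, seed=0):
    """Gaussian random-walk approximation of Brownian motion on [0, horizon]."""
    rng = random.Random(seed)
    dt = horizon / steps
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def ito_pieces(path):
    """Return (first-order sum, correction sum) for f(x) = x^2."""
    first_order = 0.0   # sum of f'(X_i) * dX_i, evaluated at the left endpoint
    correction = 0.0    # (1/2) * sum of f''(X_i) * dX_i^2, with f'' = 2
    for left, right in zip(path, path[1:]):
        dx = right - left
        first_order += 2.0 * left * dx
        correction += 0.5 * 2.0 * dx * dx
    return first_order, correction

path = simulate_path(steps=10_000, horizon=1.0)
first_order, correction = ito_pieces(path)
lhs = path[-1] ** 2 - path[0] ** 2
print(lhs, first_order + correction, correction)
```

The first two printed numbers agree up to rounding, and the correction sum should land near the horizon $t = 1$, the quadratic variation of Brownian motion on $[0, 1]$. A smooth path sampled the same way would give a correction sum that shrinks with the mesh.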

Synthesis. Five observations organise the unit. First, the chain rule reduces to four ingredients: the linear-approximation definition of the derivative, the operator-norm bound $|B \rho_g(h)| \leq |B|_{\mathrm{op}} |\rho_g(h)|$, the continuity of $g$ at $a$ that produces $k(h) \to 0$, and the triangle inequality used to bound $|k(h)|$ in terms of $|h|$. The four ingredients are precisely the basic machinery of the multi-variable limit and continuity unit 02.05.01, reorganised. Second, the rule supports both the Fréchet-derivative coordinate-free form and the Jacobian-matrix coordinate form simultaneously; the matrix product is the standard-basis representation of the operator composition. Third, the rule generalises smoothly to Banach spaces, with no change in the proof beyond the swap of matrix norms for operator norms; the Banach-space chain rule is the abstract platform from which the implicit function theorem, the Picard-Lindelöf theorem for ODEs, and Newton's method on Banach spaces all descend. Fourth, the rule is the foundational functoriality of differentiation: the tangent functor on the category of smooth manifolds, the pullback contravariantly on the de Rham complex, and the pushforward covariantly on tangent vectors all are the chain rule, dressed in categorical clothing. Fifth, the rule has a stochastic refinement, the Itô formula, in which the quadratic-variation correction term measures the failure of the classical chain rule for paths with nonzero quadratic variation; the correction term vanishes for processes of bounded variation, recovering the classical statement.

Full proof set [Master]

Multi-variable chain rule. Proved in §"Key theorem with proof" above by the linear-approximation argument with the two remainders $\rho_g$ and $\rho_f$ and the operator-norm bound on the outer linear map.

Derivative of the inverse. Proved as Exercise 6 by applying the chain rule to $g^{-1} \circ g = \mathrm{id}$.

Directional-derivative form. Proved as Exercise 5 by specialising the general matrix form to a $1 \times m$ row times an $m \times 1$ column.

Banach-space chain rule. Statement above. The Euclidean proof transcribes verbatim with $|\cdot|_{\mathrm{op}}$ now the operator norm on bounded linear maps between Banach spaces and with $\rho_g$, $\rho_f$ defined by the same little-$o$ condition. The key inequality $|B \rho_g(h)| \leq |B|_{\mathrm{op}} |\rho_g(h)|$ holds for bounded operators on normed spaces, and the rest of the proof uses only the triangle inequality and the definitions.

Faà di Bruno (sketch). Statement above. Proof by induction on $n$. The base case $n = 1$ is the bare chain rule. The induction step differentiates the formula for $n$ derivatives once more and reorganises the sum over partitions: each way of extending a partition of $\{1, \dots, n\}$ to a partition of $\{1, \dots, n + 1\}$ (adding a new singleton block or enlarging an existing block) corresponds to a term in the derivative of the previous-stage product. The combinatorial bookkeeping is the content of the formula; the analytic content is the chain rule applied $n$ times. Full proof in Faà di Bruno [Faà di Bruno 1855] and modern accounts via exponential generating functions for set-partition statistics.

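Tangent-functor functoriality. Statement above (pushforward on tangent vectors). The differential $\phi_{*, p}$ at $p$ is defined via the chain rule applied to test functions: $(\phi_{*, p} v)(h) = v(h \circ \phi)$ for a tangent vector $v \in T_p M$ and a smooth function $h$ near $\phi(p)$. The composition identity $h \circ (\psi \circ \phi) = (h \circ \psi) \circ \phi$ unpacks by associativity of composition. Applying the chain rule in local coordinates on $M$, $N$, and the target shows the linear-map identity $(\psi \circ \phi)_{*, p} = \psi_{*, \phi(p)} \circ \phi_{*, p}$.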

Itô formula (sketch). Statement above. The proof discretises $[0, t]$ on a partition $0 = t_0 < t_1 < \dots < t_N = t$ and expands each increment $f(X_{t_{i+1}}) - f(X_{t_i})$ to second order via Taylor's theorem. The first-order terms converge in probability to $\int_0^t f'(X_s) \, dX_s$. The second-order terms converge to $\frac{1}{2} \int_0^t f''(X_s) \, d[X, X]_s$, the correction term, by the definition of quadratic variation. Higher-order Taylor terms are negligible because the partition mesh shrinks. Full proof in Itô [Itô 1944] and modern stochastic-calculus texts.

Connections [Master]

Multi-variable limit and continuity 02.05.01 — the chain rule's proof rests on continuity of $g$ at $a$, the linear-approximation form of differentiability, and the operator-norm bound, all of which are the machinery the limit-and-continuity unit assembles. Without the path-independence requirement of 02.05.01, the linear-approximation language is not well-posed, and the chain rule loses its meaning.

Partial derivative and the differential (pending unit 02.05.02) — the chain rule's statement names the Fréchet derivative $Dg(a)$, whose existence requires the differentiability concept developed in the partial-derivative unit. The Jacobian-matrix form $J_{f \circ g}(a) = J_f(g(a)) \, J_g(a)$ is the standard-basis representation of the operator composition. The chain rule is the principal computational theorem of the partial-derivative framework.

Implicit and inverse function theorems (pending unit 02.05.04) — the inverse function theorem produces a $C^1$ inverse from invertibility of $Dg(a)$; the chain-rule identity $D(g^{-1})(g(a)) = (Dg(a))^{-1}$ (Exercise 6) is the bridge between the existence statement and the derivative formula. The implicit function theorem is a corollary of the inverse function theorem and inherits the chain-rule identity for its derivative formulas.

Smooth manifold 03.02.01 — the chain rule is the foundational reason that local-coordinate transition maps preserve the differential structure: a smooth atlas demands that overlapping charts compose to give smooth coordinate changes, and the chain rule lets one chart's partial derivatives translate to another. The tangent functor on the category of smooth manifolds is the chain rule, packaged categorically.

Differential forms and exterior derivative 03.04.04 — the pullback of a differential form under a smooth map satisfies $(\psi \circ \phi)^* = \phi^* \circ \psi^*$, the contravariant functoriality of the form complex; specialisation to $0$-forms (functions) gives the function-pullback chain rule. The identity $\phi^* \circ d = d \circ \phi^*$ (the exterior derivative commutes with pullback) is the chain rule in another guise.

Stokes's theorem and de Rham cohomology [03.04.05–06] — the change-of-variables formula for multi-variable integration, the differential-forms version of which is $\int_{\phi(M)} \omega = \int_M \phi^* \omega$ for an orientation-preserving diffeomorphism $\phi$, has as its Jacobian-correction factor precisely the chain-rule Jacobian of this unit.

Ordinary differential equations (pending chapter 02.06) — the existence theorem for solutions to $\dot{x} = F(x)$ with $F$ smooth, the Picard-Lindelöf theorem, uses Banach fixed-point on a function space; differentiability of the flow at the initial condition is computed via the chain rule, with the variational equation governing the derivative of the flow.

Historical & philosophical context [Master]

Gottfried Wilhelm Leibniz introduced the chain rule in single-variable form in the 1684 Acta Eruditorum paper Nova methodus pro maximis et minimis [Leibniz 1684], with the original notation $\frac{dy}{dx}$ that survives in modern textbooks; the differential symbols $dx$, $dy$ were Leibniz's invention and the chain rule was their first major computational payoff. Cauchy and Lagrange in the early nineteenth century gave rigorous proofs in the framework of single-variable analysis, with Cauchy's 1821 Cours d'analyse recording the $\varepsilon$-$\delta$ version.

The multi-variable version emerged through nineteenth-century pedagogical practice (Riemann's lectures, Jacobi's work on functional determinants, the implicit-function-theorem tradition of Dini) and was given its modern coordinate-free formulation by Élie Cartan around 1900 with the intrinsic differential $df$, separated from its matrix representation. Apostol's 1969 Calculus Vol. 2 [Apostol Ch. 8 §8.18–8.21] packaged the Cartan-Dieudonné framing for an honours undergraduate audience, with the Jacobian-matrix form as the standard-basis incarnation. Francesco Faà di Bruno's 1855 paper [Faà di Bruno 1855] in the Annali di Scienze Matematiche e Fisiche gave the higher-order chain rule with set-partition coefficients; the combinatorial content was rediscovered independently several times before being attributed correctly. Kiyoshi Itô's 1944 Stochastic Integral [Itô 1944] in the Proceedings of the Imperial Academy of Tokyo extended the chain rule to stochastic processes with nonzero quadratic variation, introducing the correction term that defines Itô calculus and establishing Itô's foundational role in modern probability theory.

Bibliography [Master]

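  • [Apostol] Apostol, *Calculus* Vol. 2 (1969), Ch. 8 §8.18–8.21.
  • [Dieudonné] Dieudonné, *Foundations of Modern Analysis*, Ch. VIII.
  • [Cartan] Cartan, *Calcul Différentiel*.
  • [Faà di Bruno 1855] Faà di Bruno, *Sullo sviluppo delle funzioni*, Annali di Scienze Matematiche e Fisiche 6 (1855).
  • [Itô 1944] Itô, *Stochastic Integral*, Proc. Imp. Acad. Tokyo 20 (1944).
  • [Leibniz 1684] Leibniz, *Nova methodus pro maximis et minimis*, Acta Eruditorum (1684).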