Triple Jeopardy
So I talked about why linear functions, not exponential functions, are equal to their own derivative. Now I want to try and convince you that it is in fact true for every function.
Ok not really, but after learning differentiable manifolds or sufficiently advanced differential calculus, one might see statements in papers that look like this:
…and after so much hard work, we have a map \(h:M\to N\). Its derivative, which we also denote by \(h\), acts by…
So what I really want to do is explain why, when I read a statement like this, it makes perfect sense and flows nicely with how I’m picturing the situation.
Example 1
Let \(M\) be the sphere sitting in 3-dimensional space, (\(x^2 + y^2 + z^2 = 1\)), and let \(R:M\to M\) be the map which rotates the sphere about the \(z\)-axis by an angle \(\pi/2 = 90^\circ\). We can take the derivative of \(R\) at a point \(p\), and formally we have a map from a tangent space to a tangent space:
\[\left.{\rm d}R\right\rvert_{p} : T_pM \to T_{R(p)}M .\]Now look closely at the north pole, the point \(n=(0,0,1)\) on the sphere. The map \(R\) obviously fixes this point, so the derivative \({\rm d}R := \left.{\rm d}R\right\rvert_{n}\) maps the tangent space at \(n\) to itself:
\[{\rm d}R : T_nM \to T_nM .\]What does this action explicitly look like? Well, you can do the official thing for computing derivatives on manifolds - go to a local chart around \(n\) and compute the derivative there. That’s perfectly fine to do, but let’s instead use the embedding of \(M\) into \(\R^3\), as I think it’ll translate more easily.
The map \(R\) is actually the restriction of a map \(\hat R: \R^3\to \R^3\), the linear “\((x,y)\)-plane-rotation-by-\(\pi/2\)” map. In matrix form,
\[\hat R = \begin{pmatrix} \cos\frac\pi2 & -\sin\frac\pi2 & 0 \\ \sin\frac\pi2 & \cos\frac\pi2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]We can get \({\rm d}R\) by taking \({\rm d}\hat R\) at \(n\), and then restricting to the tangent space \(T_nM\), which in this case can be thought of quite literally as the 2-dimensional horizontal tangent plane through \(n\). Now \({\rm d}\hat R= \hat R\), since the derivative of a linear function is itself. Or to be super precise, the derivative of a linear function at any point \(p\in \R^k\) is equal to itself, defined on the tangent space at \(p\) which is thought of as a copy of \(\R^k\) that has been “shifted” so that its origin now lies at \(p\). For us, this translates to: \({\rm d}R\) acts on the tangent plane to the sphere at \(n\) by the restriction of the above matrix to the “\((x,y)\)-submatrix”, since the tangent plane is parallel to the \((x,y)\)-plane:
\[{\rm d}R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\]That is, as \(R\) rotates the sphere, \({\rm d}R\) rotates the plane tangent to the sphere! They are doing the same thing!
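If you like checking this sort of thing numerically, here’s a quick sketch (in Python with NumPy; the function and variable names are my own, not from any official source) that restricts \(\hat R\) to the tangent plane at \(n\) and compares it against a finite-difference derivative along a curve on the sphere:

```python
import numpy as np

# The ambient rotation by pi/2 about the z-axis.
R_hat = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])

# At the north pole n = (0, 0, 1) the tangent plane is horizontal,
# spanned by e_x and e_y, so dR is the upper-left 2x2 block.
dR = R_hat[:2, :2]

def sphere_curve(t):
    # A curve on the sphere through n, with velocity (1, 0, 0) at t = 0.
    return np.array([np.sin(t), 0.0, np.cos(t)])

# Differentiate R along the curve by a central difference: this is
# dR applied to the velocity vector (1, 0).
eps = 1e-6
velocity = (R_hat @ sphere_curve(eps) - R_hat @ sphere_curve(-eps)) / (2 * eps)

print(dR @ np.array([1.0, 0.0]))  # -> [0. 1.], a 90-degree rotation
print(velocity[:2])               # agrees with dR applied to (1, 0)
```

Both computations rotate the tangent vector \((1,0)\) to \((0,1)\): the derivative really is doing the same thing as \(R\) itself.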
But you just basically took a function that was linear, and showed that its derivative is itself. You already did this!
Well, technically the function was defined on the sphere - and there’s no such thing as a “linear function” on a non-linear space such as the sphere…
You know what I mean!
Ok, yes, to be fair I would probably also call that function “linear”, even on the sphere. But the main point here shows up when you start zooming in to look closely at the north pole, and conflating points near \(n\) with points in the tangent plane at \(n\). After all, what was the point of looking at the tangent plane (in the definition of the derivative)? It’s because, when you move to points slightly away from \(n\), corresponding points in the tangent plane remain close. So the (differentiable) function should be behaving in relatively the same way on corresponding points. In this case, it’s rotating nearby points, and thus also the tangent plane, by \(\pi/2\) (of course any angle would work here). Remember the derivative is supposed to measure precisely what is happening with a function locally.
Example 2
Let’s go back to basics, and think of a function \(f:\R\to \R\). Picture its graph in the plane. At any point \(x\), we can figure out the value of \(f(x)\) visually, by starting at the origin, moving along the \(x\)-axis until we get to the value \(x\), then going straight up (or down) until we hit the graph of \(f\), and finally moving horizontally back to the \(y\)-axis. Then \(f(x)\) is exactly how far along the \(y\)-axis you ended up.
OK, maybe that’s a bit overly procedural, but there is a similar process when computing the derivative \(f'(x)\), which remains faithful to the formal concept of mapping tangent spaces. Remember that \(f'(x)\) is really a \(1\times 1\) matrix, whose entry may be \(x\)-dependent. The algorithm for the derivative is as follows. Start at the point \(x\) on the \(x\)-axis, and consider a vector starting at \(x\) which points directly to the right 1 unit. Then “lift” the vector to the graph: that is, move vertically to the graph of \(f\), and draw a vector which is tangent to the graph and has horizontal component \(+1\), (so that it vertically projects down to look the same as the original vector). Finally move horizontally back to the \(y\)-axis, and draw the horizontally projected lifted vector, which points vertically along the \(y\)-axis. The vertical component of this vector is the quantity \(f'(x)\).
I could have read off the information directly from the lifted vector sitting along the graph of \(f\), but I hope this illustrates the idea of a vector starting in the tangent space of the domain and mapping to a tangent space of the image space. Of course, the bigger point to see is that it’s visually very similar to plugging points into \(f\).
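The lift-and-project procedure above is easy to mimic numerically. Here’s a hedged sketch (my own function names, with a central difference standing in for the exact tangent slope), using \(f(x)=x^2\) as the example function:

```python
import numpy as np

def f(x):
    return x ** 2

def lifted_vector(f, x, eps=1e-6):
    """Lift the unit rightward vector at x on the x-axis to the graph of f.

    The lift is tangent to the graph with horizontal component +1,
    so its vertical component is f'(x), approximated here by a
    central difference.
    """
    slope = (f(x + eps) - f(x - eps)) / (2 * eps)
    return np.array([1.0, slope])  # (horizontal, vertical) components

# Projecting the lifted vector onto the y-axis leaves only its
# vertical component: the derivative.
print(lifted_vector(f, 3.0)[1])  # approximately f'(3) = 6
```

The vertical component read off at the end is exactly the \(1\times 1\) matrix entry \(f'(x)\) described above.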
By way of tangent vectors
If \(p\in M\) and \(X\in T_pM\) is a tangent vector at \(p\), one should think of \(X\) as a state of motion, or as the velocity vector of some (moving) object at a moment in time (instantaneous velocity). Really, \(X\) captures the statement: “I’m at \(p\) and I’m about to take a step in the direction \(X\).”
When we have a function \(f:M\to N\), we can of course plug a point \(x\) of \(M\) into \(f\), and we get as output a point \(f(x)\) in \(N\). When working with the derivative, we formally plug a tangent vector in \(M\), say \(X\), into \({\rm d}f\), and we get as output a tangent vector in \(N\), namely \({\rm d}f(X)\). This formally separates two processes which should really be thought of as the same. The derivative is just a way of allowing \(f\) to interpret a tangent vector as an input, but intuitively it should be nothing new. Really, \(f\) contains all the information, and the operations of calculus are carried by tangent vectors.
If I plug in \(x=2\) into the function \(f(x) = x^2\), it will spit the number \(4\) at me. If I tell \(f\) that I’m sitting at \(x=3\) it will tell me it’s sitting at \(9\).
If I tell \(f\): “I’m at \(x=3\), with the intent of moving to the right at a speed 1”, then it will tell me “I’m at \(9\), with the intent of changing with the speed \(+6\)”.
If I tell \(f\) that I’m at \(x\), and I’m going to move, then \(f\) will tell me that it’s at \(x^2\) and it’s going to move \(f'(x) = 2x\) as fast.
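This “point plus velocity” bookkeeping translates directly into code. A minimal sketch (the names `push_forward` and `df` are my own invention) for \(f(x)=x^2\):

```python
def f(x):
    return x ** 2

def df(x):
    return 2 * x  # the derivative of f, computed by hand here

def push_forward(x, v):
    """Plug the tangent vector "I'm at x, moving with velocity v"
    into f: the output is "I'm at f(x), moving with velocity f'(x)*v"."""
    return f(x), df(x) * v

print(push_forward(3, 1))  # -> (9, 6): at 9, changing at speed +6
```

Plugging in the tangent vector “at \(3\), moving right at speed \(1\)” returns exactly the answer from the conversation above: “at \(9\), changing at speed \(+6\)”.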