Reverse-Mode Automatic Differentiation
Support for first- and second-order, reverse-mode autodiff is provided by gradient and Hessian tapes.
First Order
First-order, reverse-mode autodiff can be performed using gradient tapes.
Gradient Tapes
Gradient tapes keep track of operations and their contributions to the gradient; unlike forward-mode autodiff, reverse-mode autodiff requires tracking since it applies the chain rule in "reverse." To begin using reverse-mode autodiff, create a gradient tape and assign it variables to track. These variables will be passed into and returned from methods supported by the tape.
using Mathematics.NET.AutoDiff;
using Mathematics.NET.Core;
GradientTape<Real> tape = new();
var x = tape.CreateVariable(1.23);
We also have to pass in an initial value—the point at which the gradients are calculated—to the variable creation method. If we want to track multiple variables, we can simply write
GradientTape<Real> tape = new();
var x = tape.CreateVariable(1.23);
var y = tape.CreateVariable(0.66);
// Add more variables as needed.
To simplify this process, we may instead choose to create a vector of variables.
AutoDiffVector3 x = tape.CreateAutoDiffVector(1.23, 0.66, 2.34);
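The components of this vector are themselves tracked variables, so they can be used anywhere a variable is expected. As a quick sketch (using the Add method that appears later in this section):

var sum = tape.Add(x.X1, tape.Add(x.X2, x.X3)); // the sum of all three components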
Once we are satisfied, we may use these in our equations.
Single-Variable Equations
Suppose we want to compute the derivative of the function $f(x) = \dfrac{\sin(x)\ln(x)}{e^{-x}}$ at the point $x = 1.23$. We can write
var result = tape.Divide(
    tape.Multiply(tape.Sin(x), tape.Ln(x)),
    tape.Exp(
        tape.Multiply(-Real.One, x)));
which will give us the value of the function at our specified point. At this moment, the derivative has not yet been calculated; we are, however, able to examine the nodes that have been added to our tape. We can use LogNodes to do so, provided we have set up a logger.
using Microsoft.Extensions.Logging;
using var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<GradientTape<Real>>();
tape.LogNodes(logger, CancellationToken.None);
Here, we pass in a cancellation token in case the gradient tape is large and we do not want to log all of its nodes to the console. We can also set a limit on how many nodes are logged (by default, this value is 100).
tape.LogNodes(logger, CancellationToken.None, 25);
Using this on our gradient tape will give us the following output:
Root Node 0:
Weights: [0, 0]
Parents: [0, 0]
Node 1:
Weights: [0.3342377271245026, 0]
Parents: [0, 1]
Node 2:
Weights: [0.8130081300813008, 0]
Parents: [0, 2]
Node 3:
Weights: [0.20701416938432612, 0.9424888019316975]
Parents: [1, 2]
Node 4:
Weights: [-1, 0]
Parents: [0, 4]
Node 5:
Weights: [0.2922925776808594, 0]
Parents: [4, 5]
Node 6:
Weights: [3.4212295362896734, -2.2837086494091605]
Parents: [3, 5]
The root node represents the variable we are currently tracking. Nodes from unary operations will provide one weight and parent, while nodes from binary operations will provide two weights and parents. This may be helpful when we want to determine which node came from which operation. (For performance reasons, the names of these methods are not tracked.) Below is a graph representation of the nodes on our gradient tape:
Finally, we can calculate the gradient of our function by using ReverseAccumulate.
tape.ReverseAccumulate(out var gradient);
Since this is a single-variable equation, we can access the first element of gradient to get our result.
Console.WriteLine(gradient[0]); // 3.525753368769319
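As a check, this agrees with propagating the logged weights by hand. Seeding the adjoint of Node 6 with $\bar{v}_6 = 1$ and accumulating weight times adjoint into each parent gives

$$\bar{v}_3 = 3.42123,\qquad \bar{v}_5 = -2.28371,\qquad \bar{v}_4 = 0.29229\,\bar{v}_5 = -0.66751,$$
$$\bar{v}_1 = 0.20701\,\bar{v}_3 = 0.70823,\qquad \bar{v}_2 = 0.94249\,\bar{v}_3 = 3.22447,$$
$$\bar{v}_0 = 0.33424\,\bar{v}_1 + 0.81301\,\bar{v}_2 + (-1)\,\bar{v}_4 = 3.52575,$$

which matches the derivative above to the rounded precision shown.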
Multivariable Equations
The multivariable case is as simple as the single-variable case; we only need to track more variables on our gradient tape.
var x = tape.CreateVariable(1.23);
var y = tape.CreateVariable(0.66);
var z = tape.CreateVariable(2.34);
Now, if we wanted to compute the gradient of the function $f(x, y, z) = \dfrac{\cos(x)}{(x + y)\sin(z)}$ at the points $x = 1.23$, $y = 0.66$, and $z = 2.34$, we can write
var result = tape.Divide(
    tape.Cos(x),
    tape.Multiply(
        tape.Add(x, y),
        tape.Sin(z)));
If we want to examine the nodes, we can once again use LogNodes.
Root Node 0:
Weights: [0, 0]
Parents: [0, 0]
Root Node 1:
Weights: [0, 0]
Parents: [1, 1]
Root Node 2:
Weights: [0, 0]
Parents: [2, 2]
Node 3:
Weights: [-0.9424888019316975, 0]
Parents: [0, 3]
Node 4:
Weights: [1, 1]
Parents: [0, 1]
Node 5:
Weights: [-0.695563326462902, 0]
Parents: [2, 5]
Node 6:
Weights: [0.7184647930691263, 1.8900000000000001]
Parents: [4, 5]
Node 7:
Weights: [0.7364320899293144, -0.18126788958785509]
Parents: [3, 6]
Notice that there are now three root nodes, each representing the variables $x$, $y$, and $z$, respectively. Here is a graph representation of our tape:
As before, we can use ReverseAccumulate to get our gradients and write them to the console:
tape.ReverseAccumulate(out var gradient);
Console.WriteLine(gradient.ToDisplayString()); // [-0.8243135949243512, -0.13023459678281554, 0.2382974299363868 ]
which, for clarity, represents the following partial derivatives:

$$\frac{\partial f}{\partial x} = -\frac{(x + y)\sin(x) + \cos(x)}{(x + y)^2\sin(z)},\qquad \frac{\partial f}{\partial y} = -\frac{\cos(x)}{(x + y)^2\sin(z)},\qquad \frac{\partial f}{\partial z} = -\frac{\cos(x)\cos(z)}{(x + y)\sin^2(z)}$$
AutoDiff Vectors
Instead of tracking $x$, $y$, and $z$ individually, we can create a vector of variables.
var x = tape.CreateAutoDiffVector(1.23, 0.66, 2.34);
We can use this to calculate, for example, a Jacobian-vector product with the vector functions

$$f_1(x, y, z) = \sin(x)\left(\cos(y) + \sqrt{z}\right),\qquad f_2(x, y, z) = \sqrt{x + y + z},\qquad f_3(x, y, z) = \sinh\!\left(\frac{y e^x}{z}\right)$$

and the vector $v = (0.23, 1.57, -1.71)$ for $x = (1.23, 0.66, 2.34)$. Here is the code:
using Mathematics.NET.AutoDiff;
using Mathematics.NET.Core;

GradientTape<Real> tape = new();
var x = tape.CreateAutoDiffVector(1.23, 0.66, 2.34);
Vector3<Real> v = new(0.23, 1.57, -1.71);

var result = tape.JVP(F1, F2, F3, x, v);
Console.WriteLine(result);
// f(x, y, z) = Sin(x) * (Cos(y) + Sqrt(z))
static Variable F1(GradientTape tape, AutoDiffVector3 x)
{
    return tape.Multiply(
        tape.Sin(x.X1),
        tape.Add(tape.Cos(x.X2), tape.Sqrt(x.X3)));
}

// f(x, y, z) = Sqrt(x + y + z)
static Variable F2(GradientTape tape, AutoDiffVector3 x)
{
    return tape.Sqrt(
        tape.Add(
            tape.Add(x.X1, x.X2),
            x.X3));
}

// f(x, y, z) = Sinh(Exp(x) * y / z)
static Variable F3(GradientTape tape, AutoDiffVector3 x)
{
    return tape.Sinh(
        tape.Multiply(
            tape.Exp(x.X1),
            tape.Divide(x.X2, x.X3)));
}
Note that this time, we do not call ReverseAccumulate, since JVP does it for us. This should give us the following result: (-1.2556937075301358, 0.021879748724684178, 4.842981131678516).
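For reference, each component of this result is a directional derivative of the corresponding function:

$$(Jv)_i = \sum_{j} \frac{\partial f_i}{\partial x_j} v_j = \nabla f_i \cdot v,$$

where $J$ is the Jacobian of $(f_1, f_2, f_3)$ evaluated at $x$.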
Complex Variables
We can also work with complex numbers and complex derivatives by specifying Complex as the type parameter when we create our gradient tape. Suppose we want to find the gradient of the function $f(z, w) = \cos\!\left(\sin(z)\sqrt{w}\right)$ at the points $z = 1.23 + i2.34$ and $w = -0.66 + i0.23$. We can write
using Mathematics.NET.AutoDiff;
using Mathematics.NET.Core;
GradientTape<Complex> tape = new();
var z = tape.CreateVariable(new(1.23, 2.34));
var w = tape.CreateVariable(new(-0.66, 0.23));
var result = tape.Cos(
    tape.Multiply(
        tape.Sin(z),
        tape.Sqrt(w)));
// Optional: examine the nodes on the gradient tape (assumes a logger set up as before).
tape.LogNodes(logger, CancellationToken.None);
Console.WriteLine();
tape.ReverseAccumulate(out var gradient);
// The value of the function at the point z = 1.23 + i2.34 and w = -0.66 + i0.23
Console.WriteLine("Value: {0}", result);
// The gradient of the function: ∂f/∂z and ∂f/∂w, respectively
Console.WriteLine("Gradient: {0}", gradient.ToDisplayString());
This is similar to the code we would have written in the real number case. (Note that some methods, such as Atan2, are not available for complex gradient tapes.) This outputs the following to the console:
Root Node 0:
Weights: [(0, 0), (0, 0)]
Parents: [0, 0]
Root Node 1:
Weights: [(0, 0), (0, 0)]
Parents: [1, 1]
Node 2:
Weights: [(1.7509986221653533, -4.84670574511495), (0, 0)]
Parents: [0, 2]
Node 3:
Weights: [(0.09980501743235655, -0.5896861208610882), (0, 0)]
Parents: [1, 3]
Node 4:
Weights: [(0.13951299258538988, 0.8242959875555208), (4.937493465463717, 1.7188022913039218)]
Parents: [2, 3]
Node 5:
Weights: [(24.762656886395174, -27.774291395591305), (0, 0)]
Parents: [4, 5]
Value: (27.784322505370138, 24.753716703326287)
Gradient: [(126.28638563049401, -98.74954259806483), (-38.801295827094066, -109.6878698782088) ]
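As a sanity check, we can differentiate $f(z, w) = \cos\!\left(\sin(z)\sqrt{w}\right)$ by hand. The sketch below assumes Complex exposes static Sin, Cos, and Sqrt methods mirroring the tape methods above; this is an assumption, not a confirmed API.

// Analytic partial derivatives of f(z, w) = cos(sin(z) * sqrt(w)).
Complex z0 = new(1.23, 2.34);
Complex w0 = new(-0.66, 0.23);
Complex two = new(2.0, 0.0);

var s = Complex.Sqrt(w0);
var u = Complex.Sin(z0) * s; // sin(z) * sqrt(w)
var dfdz = -Complex.Sin(u) * Complex.Cos(z0) * s;         // ∂f/∂z = -sin(u) cos(z) sqrt(w)
var dfdw = -Complex.Sin(u) * Complex.Sin(z0) / (two * s); // ∂f/∂w = -sin(u) sin(z) / (2 sqrt(w))

Console.WriteLine(dfdz); // should match gradient[0]
Console.WriteLine(dfdw); // should match gradient[1]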
Custom Operations
If there is a function we need that is not provided by the class, we can still use it with our gradient tape, provided we know its derivative. Suppose, for example, we did not have the Sin method. Since we know its derivative is Cos, we could write the following:
GradientTape<Real> tape = new();
var x = tape.CreateVariable(1.23);
var result = tape.CustomOperation(
    x,                // A variable
    x => Real.Sin(x), // The function
    x => Real.Cos(x)); // The derivative of the function
tape.ReverseAccumulate(out var gradient);
Console.WriteLine("Value: {0}", result);
Console.WriteLine("Gradient: {0}", gradient.ToDisplayString());
For custom binary operations, the pattern is the same; with a second tracked variable y and, for example, $f(x, y) = xy$, we can write

_ = tape.CustomOperation(
    x,
    y,
    (x, y) => x * y, // f(x, y)
    (x, y) => y,     // ∂f/∂x
    (x, y) => x);    // ∂f/∂y
Using variables in loops is not recommended, since each iteration will add nodes to the tape. If the derivative of the operation is known ahead of time, we may be able to avoid this problem by using a custom operation, as shown below.
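For example, instead of multiplying x by itself repeatedly in a loop, which would add one node per iteration, we can record $f(x) = x^{10}$ as a single node. This sketch assumes Real provides a static Pow method; if it does not, substitute any function with a known closed-form derivative.

GradientTape<Real> tape = new();
var x = tape.CreateVariable(1.23);

// One node with a known derivative instead of nine Multiply nodes.
var result = tape.CustomOperation(
    x,
    x => Real.Pow(x, 10.0),        // f(x) = x^10
    x => 10.0 * Real.Pow(x, 9.0)); // f'(x) = 10x^9

tape.ReverseAccumulate(out var gradient);
Console.WriteLine(gradient[0]); // 10 * 1.23^9 ≈ 64.4386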
Second Order
Second-order, reverse-mode autodiff can be performed using Hessian tapes.
Hessian Tapes
The steps needed to perform second-order, reverse-mode autodiff are similar to those for the first-order case. This time, however, we have access to the following overloads of ReverseAccumulate:
HessianTape<Complex> tape = new();
// Do some math...
// Use when we are only interested in the gradient
tape.ReverseAccumulate(out ReadOnlySpan<Complex> gradient);
// Use when we are only interested in the Hessian
tape.ReverseAccumulate(out ReadOnlySpan2D<Complex> hessian);
// Use when we are interested in both the gradient and Hessian
tape.ReverseAccumulate(out var gradient, out var hessian);
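As a quick illustration, the Hessian-only overload lets us read second derivatives directly; the sketch below assumes HessianTape&lt;Real&gt; mirrors the gradient tape's CreateVariable, Sin, and Multiply methods. For smooth functions, the mixed partials hessian[0, 1] and hessian[1, 0] should agree.

HessianTape<Real> tape = new();
var x = tape.CreateVariable(1.23);
var y = tape.CreateVariable(0.66);

_ = tape.Multiply(tape.Sin(x), y); // f(x, y) = y sin(x)

tape.ReverseAccumulate(out ReadOnlySpan2D<Real> hessian);
Console.WriteLine(hessian[0, 1]); // ∂²f/∂x∂y = cos(1.23) ≈ 0.3342377271245026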
The last version, which computes both at once, may be useful for calculations such as finding the Laplacian of a scalar function in spherical coordinates, which involves derivatives of first and second orders:

$$\nabla^2 f = \frac{\partial^2 f}{\partial r^2} + \frac{2}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial\theta^2} + \frac{\cos\theta}{r^2\sin\theta}\frac{\partial f}{\partial\theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 f}{\partial\phi^2}$$
Note that, in the future, we will not have to do this manually, since there will be a method made specifically to compute Laplacians in spherical coordinates. For now, if we wanted to compute the Laplacian of the function $f(r, \theta, \phi) = \dfrac{\cos(r)}{(r + \theta)\sin(\phi)}$ at the point $(r, \theta, \phi) = (1.23, 0.66, 0.23)$, we can write
using Mathematics.NET.AutoDiff;
using Mathematics.NET.Core;
HessianTape<Real> tape = new();
var x = tape.CreateAutoDiffVector(1.23, 0.66, 0.23);
// f(r, θ, ϕ) = cos(r) / ((r + θ) * sin(ϕ))
_ = tape.Divide(
    tape.Cos(x.X1),
    tape.Multiply(
        tape.Add(x.X1, x.X2),
        tape.Sin(x.X3)));
tape.ReverseAccumulate(out var gradient, out var hessian);
// Manual Laplacian computation
var u = Real.One / (x.X1.Value * Real.Sin(x.X2.Value)); // 1 / (r * sin(θ))
var laplacian = 2.0 * gradient[0] / x.X1.Value +
    hessian[0, 0] +
    u * Real.Cos(x.X2.Value) * gradient[1] / x.X1.Value +
    hessian[1, 1] / (x.X1.Value * x.X1.Value) +
    u * u * hessian[2, 2];
Console.WriteLine(laplacian);
which should give us 48.80966092022821.