Enforcing Symmetries in Neural Networks#

We are going to explain what we mean by symmetries, why they are important, and how we can enforce them in neural networks.

What are symmetries?#

Generally speaking, we say that some mathematical object possesses a certain symmetry if it either remains unchanged, or changes in a predictable way, under some transformation. We are particularly interested in symmetries of mathematical equations that describe a physical system. Typically these equations are written in terms of variables expressed in a coordinate system. But the origin and orientation of that coordinate system are arbitrary, and the form of the equations should not depend on them.

How do we describe symmetries?#

Symmetries are described using the language of group theory. A group G is a set of elements typically denoted by g1,g2,, along with a binary operation:

:G×GG,

that combines two elements to produce a third element:

(g1,g2)(g1,g2)g1g2=g3.

Typically, we do not need to write the explicitly, and we can simply write g1g2=g3. The group must satisfy the following properties:

  1. Closure: For all g1,g2G, g1g2G.

  2. Associativity: For all g1,g2,g3G, (g1g2)g3=g1(g2g3).

  3. Identity: There exists an element eG such that for all gG, eg=ge=g.

  4. Inverses: For all gG, there exists an element g1G such that gg1=g1g=e.

Examples of groups#

You already know many examples of groups. Here is some of them.

The set of integers with addition#

The set of integers Z forms a group under addition +. We write the group by (Z,+). All the properties of a group are satisfied. The set of integers is closed under addition, the addition is associative, the identity element is 0, and the inverse of an integer n is n.

The set of non-zero rational numbers with multiplication#

The set of non-zero rational numbers, Q{0}, forms a group under multiplication ×. Again, we write the group by (Q{0},×). All the properties of a group are satisfied. The set of rational numbers is closed under multiplication, the multiplication is associative, the identity element is 1, and the inverse of a rational number q is 1/q.

The set of real numbers with addition#

The set of real numbers R forms a group under addition +. We write the group by (R,+). All the properties of a group are satisfied in an obvious way. The identity element is 0 and the inverse of a real number x is x.

The set of non-zero real numbers with multiplication#

The set of non-zero real numbers, R{0}, forms a group under multiplication ×. Check the group properties for yourself.

The translation group#

Consider the Euclidean space R3 with elementwise addition:

(x1,x2,x3)+(y1,y2,y3)=(x1+y1,x2+y2,x3+y3).

(R3,+) forms a group. The identity element is (0,0,0) and the inverse of (x1,x2,x3) is (x1,x2,x3). This is called the translation group denoted by T(3). It can be generalized to n dimensions, T(n).

The general linear group#

The general linear group of dimension n, denoted by GL(n,R), is the set of all invertible n×n matrices. Mathematically,

GL(n,R)={ARn×ndet(A)0}.

The group operation is matrix multiplication. Check the group properties for yourself. What is the identity element of GL(n,R)?

The special orthogonal group#

The special orthogonal group of dimension n, denoted by SO(n), is the set of all n×n orthogonal matrices with determinant one. Mathematically,

SO(n)={ARn×nAAT=I,det(A)=1}.

The group operation is matrix multiplication. You can think of SO(n) as the group of all rotations in n dimensions. Check that the SO(n) is indeed closed under matrix multiplication.

SO(n) is a subgroup of GL(n). We write:

SO(n)GL(n).

This means that SO(n) is a group in its own right, and it is a subset of GL(n).

Let’s look at some examples of these groups in 3D space. Rotation of degree θ about the z-axis is a member of SO(3). It corresponds to the matrix:

R1=[cos(θ)sin(θ)0sin(θ)cos(θ)0001].

Check that det(R1)=1.

Similarly, rotation of degree ϕ about the y-axis is:

R2=[cos(ϕ)0sin(ϕ)010sin(ϕ)0cos(ϕ)].

Check again that det(R2)=1.

If you multiply the two, you get:

R3=R1R2=[cos(θ)cos(ϕ)sin(θ)cos(θ)sin(ϕ)sin(θ)cos(ϕ)cos(θ)sin(θ)sin(ϕ)sin(ϕ)0cos(ϕ)].

Check that det(R3)=1. Another thing to observe is that R2R1R1R2. We say that matrix multiplication is not commutative.

The orthogonal group#

The orthogonal group of dimension n, denoted by O(n), is the set of all n×n orthogonal matrices. Mathematically,

O(n)={ARn×nAAT=I}.

Again, the group operation is matrix multiplicatin. You can think of O(n) as the group of all transformations that preserve the length of vectors. This includes rotations and reflections. Check that O(n) is closed under matrix multiplication.

O(n) is a subgroup of GL(n) and it contains SO(n) as a subgroup. We write:

SO(n)O(n)GL(n).

An element of the orthogonal group not in the special orthogonal group is a reflection, for example:

R4=[100010001].

Check that R4R4T=I.

Group homorphisms#

Two groups are homomorphic if there is a function between them that preserves the group structure. This just means that the group elements of one group are relabeled versions of the group elements of the other group. The function that does this is called a homomorphism.

Mathematically, let G and H be two groups. If we can find a function ϕ:GH such that:

ϕ(g1g2)=ϕ(g1)ϕ(g2),

for all g1,g2G, then f is a homomorphism from G to H.

If the homomorphism is bijective, i.e., one-to-one and onto, then it is called an isomorphism. When G and H are isomorphic, we write GH.

Example: Z is homomorphic to Z3#

Let’s look at an example of an homorphism. Consider the group of integers modulo 3, denoted by Z3. The group operation is addition modulo 3. For example, we have

0+1(mod 3)=1,1+2(mod 3)=0.

Note that x(mod 3) is the remainder when x is divided by 3. So, Z3={0,1,2}.

Now, consider the function

f:ZZ3,

that gives the remainder of an integer is divided by 3:

f(x)=x(mod 3).

This is an homomorphism. Check that:

f(x+y)=(x+y)(mod 3)=x(mod 3)+y(mod 3)=f(x)+f(y).

But it is not an isomorphism because it is not one-to-one.

Example: The group of real numbers with addition is isomorphic to the group of positive real numbers with multiplication#

Let R+ be the set of positive real numbers. It is a group under multiplication. Why?

Consider the function:

f:RR+,

defined by:

f(x)=ex.

The function is one-to-one and onto. Show that it is a homomorphism.

Group of transformations#

Let V be a set. A group of transformations of V is a group, say G, of bijections from V to V. So, an element g of G is a function:

g:VV,

that is one-to-one and onto (bijective). Here the group operation is the composition of functions and we are assuming that G is closed under composition. So, if g1 and g2 are in G, then the composition:

g=g1g2,

defined by:

x(g1g2)(x)=g1(g2(x)),

is also in G.

Why is G a group? Well, first function composition is associative:

(g1g2)g3=g1(g2g3).

Second, the identity function e is in G (it is one-to-one and onto):

e(x)=x,

and it is the identity element of G. Finally, the inverse of a function g is also in G.

Group representations#

A group representation is a way to represent the elements of a group as matrices. Using group representations you can study the group using linear algebra.

Example: The group of invertible linear transformations is isomorphic to the general linear group#

Let V be a real vector space of dimension n and let GL(V) be the set of all invertible linear transformations of V, i.e.,

GL(V)={f:VVf is invertible}.

GL(V) is a group under function composition.

We will show that it is isomorphic to the general linear group GL(n). Let B={e1,,en} be a basis of V. Then, any linear transformation fGL(V) can be represented by a matrix A such that:

f(ei)=j=1nAijej.

The matrix A is invertible because f is invertible.

The map

ϕ:GL(V)GL(n),

that sends a linear transformation to its matrix representation

fϕ(f)=A,

is an isomorphism between GL(V) and GL(n). Why? We can write:

GL(V)GL(n).

The Eucledian group#

Consider the Euclidean space Rn. This is the space on which we write physical equations. We will introduce the Euclidean group E(n) which is a group of transformation of the Euclidean space that corresponds to changes of coordinates that, in many physical examples, do not change the form of the equations. It is not the only such group, but it is an easy example to start with.

The Euclidean group E(n) is the group of all isometries of Rn. To explain this, we need to introduce the concept of affine transformations. An affine transformation is a linear transformation followed by a translation. So, f:RnRn is an affine transformation if:

f(x)=g(x)+b,

where g:RnRn is a linear transformation and bRn is a translation vector. An isometry is an affine transformation that preserves distances. This means that:

f(x)f(y)=xy,

for all x,y in Rn.

Now, we can define the Euclidean group E(n) as:

E(n)={f:RnRnf is an isometry}.

Again, the group operation is function composition. We can show that E(n) can be written as some sort of Cartesian product of the translation group T(n) and the orthogonal group O(n):

E(n)T(n)O(n).

Here the symbol stands for a semidirect product. We say that E(n) is a semidirect product of product of O(n) extended by T(n). The semidirect product gives us a way to reduce the group structure of E(n) to the group structures of T(n) and O(n) and to do everything in terms of linear algebra.

Intuitively, the semidirect product means that the group E(n) is a combination of the translation group and the orthogonal group. So, every element of E(n) can be written as a translation followed by a rotation/reflection, i.e., as (b,A), where b is a translation vector and A is an orthogonal matrix. You can apply that transformation to any vector x of Rn by:

(b,A)x=A(x+b).

The semidirect product specifies how the translation and the rotation/reflection interact. Take two elements (b1,A1) and (b2,A2) of E(n). Their composition is:

(b1,A1)(b2,A2)=(b1+A1b2,A1A2).

It is the semidirect product that specifies how the two group operations interact.

Understanding the semidirect product is not trivial. We do it in the next section, but feel free to skip it if you are not interested.

Semidirect products#

You can skip this section if you are not interested in the details of the semidirect product.

Let G be a group and H and K be two subgroups of G, i.e.,

HG,KG.

Furthermore, we assume that H and K only share the identity element, i.e.,

HK={e},

and that elements of G can be uniquely written as a product of elements of H and K, i.e.,

G=HK={hkhH,kK in a unique way}.

We are going to assume that H is a normal subgroup of G, i.e., for all hH and gG, we have:

ghg1H.

This is a wierd assumption, but it is necessary for the semidirect product to work. We will show that under these assumptions the group G can be written as:

GHK.

The meaning of everything will become apparent as we go.

First, GHK means that G means that there is a bijection from H×K to G that preserves the group structure. Let’s make this bijection explicit. We define:

ϕ:H×KG,

such that:

(h,k)ϕ(h,k)=hk.

This map is indeed onto because every element of G can be written as a product of an element of H and an element of K. It is also one-to-one. Suppose that:

hk=hk,

for some h,hH and k,kK. Then, multiplying by k1 on the right, we get:

h=hkk1.

Multiplying with (h)1 on the left, we get:

h(h)1=kk1.

Now, the left hand side is an element of H and the right hand side is an element of K. But H and K only share the identity element. So, h=h and k=k. This shows that the map is one-to-one.

To show that the map preserves the group structure, we need to explicitly define the group operation on H×K. This is where the comes in. It fixes the way the group operations of H and K interact. We will construct the group operation so that the map ϕ preserves the group structure. This is what we want to achieve:

ϕ((h1,k1)(h2,k2))=ϕ(h1,k1)ϕ(h2,k2).

On the right hand side, we just use the definition of the map:

ϕ((h1,k1)(h2,k2))=ϕ(h1,k1)ϕ(h2,k2)=h1k1h2k2.

Now, we try to turn the result into something that looks like a product of an H with a K. Let’s just introduce a k11k1=e in the middle:

ϕ((h1,k1)(h2,k2))=h1k1h2k2=h1k1h2ek2=h1k1h2k11k1k2=(h1k1h2k11)(k1k2).

Clearly, the term in the secon parenthesis is in K. What about the term in the first parenthesis? It is the product of an element of H and the wierd blue term k1h2k11. This is where the normality of H comes in. k1 is in K which is a subgroup of G, so it is also in G. h2 is in H which is a normal subgroup of G. So, k1h2k11 is in H. This means that the product of an element of H and the blue term is in H. From this, we see that we are forced to choose the following definition for the group operation on H×K:

(h1,k1)(h2,k2)=(h1k1h2k11,k1k2).

Connection to the Euclidean group#

This is all too theoretical. How does this connect to the Euclidean group? Take:

G=E(n),H=T(n),K=O(n).

First, we do have that T(n) and O(n) are subgroups of E(n). This is obvious. Let’s prove all the other things we need.

We start by proving that E(n)=T(n)O(n). Let f be an element of E(n). Consider the vector to which f maps the origin:

b=f(0).

Now, define the translation tb by:

tb(x)=x+b.

This is an element of T(n). Finally, define the function g by:

g=tb1f.

Or in terms of its action on a vector x:

g(x)=tb1f(x)=x=f(x)b.

We will show that g is an element of O(n). g is obviously linear and an isometry. All, we need to show is that it keeps the origin fixed. Indeed, by construction:

g(0)=f(0)b=bb=0.

Try to show that the decomposition is unique.

Now, we need to show that T(n) is a normal subgroup of E(n). Let f be an element of E(n) and tb be an element of T(n). The latter is such that:

tb(x)=x+b.

Now, consider the composition:

h=ftbf1.

We need to show that this is an translation, i.e., it is in T(n). Where does an arbitrary vector x go under this map? We have:

h(x)=ftbf1(x)=f(f1(x)+b)=x+f(b)=tf(b)(x).

This shows that h is a translation:

h=ftbf1=tf(b),

which proves the desired result.

Finally, we need to show that T(n)O(n)={e}, where e is the identity element of E(n), i.e.,

e(x)=x.

This is obvious. Why?

Having proved all these things, we can use the result above to write:

E(n)T(n)O(n).

Let’s write down the group operation explicitly:

(tb1,A1)(tb2,A2)=(tb1A1tb2A11,A1A2).

We can simplify the first term of the right hand side:

tb1A1tb2A11(x)=tb1A1(A11x+b2)=A1A11x+A1b2+b1=x+A1b2+b1.

If we identify b1 with the translation vector b1 and A1 with the rotation/reflection matrix A1, we get exactly what we wrote in the previous section. By the way, this identification is another isomorphism.

Invariance#

Now that we know what symmetries are, we can talk about invariance. Consider a function f from a vector space V to the real numbers. We say that f is invariant under a group of transformations G if:

f(g(x))=f(x),

for all x in V and all g in G.

If you have a physical problem with a known symmetry like that, you better construct a model that respects that symmetry.

If the symmetry group is finite, then there is an easy way to construct an invariant function from an arbitrary function. Say hθ is an arbitrary real function of V parameterized by θ. Define the function:

fθ(x)=gGhθ(g(x)).

Show that fθ is invariant under G.

When G is a continuous group, the sum above becomes an integral:

fθ(x)=Ghθ(g(x))dg.

But integrating over a continuous group is not trivial. You get into the theory of Lie groups. It is also very likely that you will not be able to find an explicit expression for fθ.

Equivariance#

Let f be a function from a vector space V to another vector space W. Let G be a group of transformations that can act both on V and W. Suppose that you know that f changes in a predictable way under the action of G. We can write this as:

f(g(x))=g(f(x)).

When this happens, we say that f is equivariant under G.

In the equation above, on the left hand-side g acts on V and on the right hand-side g acts on W. These actions can be very different. Let me give you a specific example. Suppose that V is the set of 3D coordinates of a molecule with n atoms. We can write:

V=R3n,

or for an x in V:

x=(r1,,rn),

where ri is the position of the i-th atom.

Now, suppose that f gives us the force acting on each atom of the molecule. Again, we have:

W=R3n,

and for an f in W:

f=(f1,,fn),

where fi is the force acting on the i-th atom.

Now, take G to be the Euclidean group E(3). An element g=(b,A) of E(3) acts on V by rotating and translating each atom of the molecule, i.e., by:

gx=(Ar1+b,,Arn+b).

How do the forces change under the action of G? The translation part of g does not change the forces. But the rotation part does. The force acting on the i-th atom changes by:

gfi=Afi.

Eucledian neural networks#

Eucledian neural networks, see (Geiger et al. 2022), are neural networks that respect the symmetries of the Euclidean group. They rely on spherical harmonics to construct invariant and covariant functions. More details in their paper. We are going to demonstrate what they are capable of using numerical examples.