Let’s get something straight from the beginning: this is a fun post! The title is catchy, I’ll admit that, but there are no general claims here. I’m not going to give you a complete formulation of machine learning in Dirac notation. This post is the story of how one can get inspired by Dirac notation and start writing down matrices in a weird way that turns out to be quite useful for understanding one specific problem: matrix factorization. I’m not sure whether the notation is elegant or whether it can be of use for other things. I’ll leave that decision to future generations. I think it has a weird beauty to it (not as much as the original Dirac notation, of course). Before I start, a warning: this post is not suitable for people who are highly sensitive to notation abuse. I abuse Dirac notation a lot here, and may the gods of mathematics have mercy on my soul.

#### Dirac notation in a nutshell

Let me start by telling you what Dirac notation is. Dirac notation is a simple and concise way to write vectors and matrices. It is actually defined formally, and beautifully so; follow the link if you want to know more. But for our purposes, all we need to know is that it is a nice way to write down and manipulate vectors. Let’s look at some examples.

We might want to start with the simplest case: 2-dimensional vectors. Let’s assume we have a 2-dimensional vector space (we will indicate this vector space as

For example, we can pick the Euclidean basis, which is orthonormal and in two dimensions can be represented as

If we substitute the column expressions and use basic vector algebra, we obtain a column representation of the vector
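To make the basis expansion concrete, here is a small NumPy sketch. NumPy and the coefficients `a = 3`, `b = -2` are my own choices for illustration: the Euclidean basis vectors are stored as 2x1 columns, and the linear combination gives back the column of coefficients.

```python
import numpy as np

# Euclidean (orthonormal) basis of a 2-dimensional space, stored as 2x1 columns
e1 = np.array([[1.0], [0.0]])
e2 = np.array([[0.0], [1.0]])

# Orthonormality: unit length and zero scalar product between the two
assert np.isclose((e1.T @ e1).item(), 1.0)
assert np.isclose((e1.T @ e2).item(), 0.0)

# Hypothetical coefficients, chosen only for illustration
a, b = 3.0, -2.0

# The vector as a linear combination of the basis vectors ...
v = a * e1 + b * e2

# ... is simply the column of its coefficients
assert np.array_equal(v, np.array([[a], [b]]))
```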

Suppose now we have two vectors

where *row* corresponding to the column

So far so good. However, this notation can be heavy, especially if we have many vectors to write down and multiply in more complex ways. Luckily, there’s another way to write down column and row vectors that has proved more powerful and clear, at least in some contexts (quantum mechanics): Dirac notation. Dirac notation is actually a symbolism for writing vectors in any vector space (discrete or continuous) of any dimension. Moreover, it makes it immediate to write the vectors of the *dual* space associated with the vector space. Here, let’s just focus on finite-dimensional spaces, where a vector is represented as a column and the corresponding vector in the dual space is a row.

Let’s start with our 2-dimensional example. It’s actually very simple: whenever we have column vectors, for example the basis vectors

The vector

Now, if we want to transpose *row* (corresponding to a vector in the dual space). When we have a row, we can write it in Dirac notation inside a “bra” as

where

If we define a second vector

Moreover, we can also combine the bras and the kets the other way around, as

We just need to make sure the vectors have the same dimension.
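A minimal NumPy sketch of kets and bras, assuming complex vectors so that the bra is the conjugate transpose of the ket (for real vectors it is just the transpose). The helpers `ket` and `bra` and the example vectors are hypothetical. A bra times a ket gives a 1x1 matrix, i.e. the scalar product, while a ket times a bra gives a square matrix.

```python
import numpy as np

def ket(*components):
    """Return a ket: a column vector stored as an n x 1 array."""
    return np.array(components, dtype=complex).reshape(-1, 1)

def bra(k):
    """Return the bra dual to ket k: its conjugate transpose, a 1 x n row."""
    return k.conj().T

phi = ket(1, 2j)
psi = ket(3, 1)

inner = bra(phi) @ psi  # <phi|psi>: (1 x n) @ (n x 1) -> 1 x 1, a scalar
outer = psi @ bra(phi)  # |psi><phi|: (n x 1) @ (1 x n) -> n x n matrix

print(inner.item())  # (3-2j)
print(outer.shape)   # (2, 2)
```

The conjugation in `bra` is what turns the `2j` component of `phi` into `-2j` inside the scalar product.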

#### Customized Dirac notation

What’s next? Now that we have learnt how to represent vectors in Dirac notation, we would like to use it to represent *matrices*! How can we do that? The natural environment for Dirac notation is quantum mechanics. In quantum mechanics, there are basically two types of matrices: matrices representing operators on vectors, which are usually written as *square* (i.e. of dimensions *unitary* (i.e. *states*, which are usually written as

When one needs to write down density matrices

where

When one needs to write down an operator acting on a vector (i.e. a matrix multiplied by a vector), Dirac notation comes in very handy too, as the operation can be written as

As an aside, note that unitary matrices preserve the norm of a vector (which is a very important property in quantum mechanics). This is very easy to see in Dirac notation by writing down the norm of

with the last equality coming from the fact that a unitary matrix multiplied by its conjugate transpose gives the identity.
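The norm-preservation argument can also be checked numerically. This is a sketch under my own assumptions: NumPy, a random 3x3 unitary obtained as the Q factor of a QR decomposition of a random complex matrix, and a random complex vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# The Q factor of the QR decomposition of a complex square matrix is unitary
z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(z)

# A random complex ket
psi = rng.normal(size=(3, 1)) + 1j * rng.normal(size=(3, 1))

# U^dagger U is the identity ...
assert np.allclose(U.conj().T @ U, np.eye(3))

# ... so <psi|U^dagger U|psi> = <psi|psi>: the norm is unchanged by U
norm_before = np.linalg.norm(psi)
norm_after = np.linalg.norm(U @ psi)
print(np.isclose(norm_before, norm_after))  # True
```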

So far so good. However, in machine learning, the matrices we deal with are rarely square and definitely not unitary or trace-one. We need to find another way to write them down.

###### Writing datasets using Dirac notation

A very common type of matrix in machine learning is the dataset matrix. It usually consists of a collection of vectors (the feature vectors) arranged in rows (or columns, though usually rows), where each vector is a data point. Now the question is: is there a way to exploit Dirac notation for vectors to write down such matrices in a compact and informative way?

This is how I wrote them when I was working on my matrix factorization problem. As I mentioned above, in machine learning data points are usually represented as rows. But row vectors can simply be written as bras in Dirac notation. We can write down a

Note that I put a suffix

Let’s say now that we would like to represent our data points as *columns* instead of rows. It’s no big deal, as we can use the same rule and write the dataset matrix *kets* instead of bras

Easy, right? Just remember that a bra *row* of dimension *column* of dimensions

However, this notation is still cumbersome. We need to write columns or rows whose elements are bras or kets, and this can get heavy. Let us now abuse Dirac notation again and write down

This notation contains exactly the same information as the one above. Moreover, if we like column vectors more than rows, we can simply write

Noticing that

Now one question could be: why would we write a dataset this way? Well, because the notation is informative. We can see immediately, at first glance, that it is a dataset. The inner bra-ket notation tells us that we are dealing with a collection of *sample vectors* and informs us about their dimensionality (i.e. the number of features).
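Here is how the two dataset layouts look as arrays, in a NumPy sketch with a made-up toy dataset of n = 4 samples and d = 3 real-valued features (the names `n`, `d`, `X_rows` and `X_cols` are my own). Stacking the data points as rows (bras) gives an n x d matrix, and stacking them as columns (kets) gives its d x n transpose.

```python
import numpy as np

# Hypothetical toy dataset: n = 4 data points, each with d = 3 features
samples = [
    np.array([1.0, 0.0, 2.0]),
    np.array([0.5, 1.5, 0.0]),
    np.array([2.0, 1.0, 1.0]),
    np.array([0.0, 3.0, 1.0]),
]

# Data points as rows (bras): an n x d matrix
X_rows = np.stack(samples, axis=0)

# Data points as columns (kets): a d x n matrix, the transpose of the above
X_cols = np.stack(samples, axis=1)

print(X_rows.shape)  # (4, 3)
print(X_cols.shape)  # (3, 4)
```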

###### Matrix multiplication

To appreciate the usefulness of this notation, let’s suppose we want to write down a

Now we need to be careful about one thing: that the *inner* suffixes in the expression (in this case the

we can introduce the first rule to apply when combining the symbols: when you have the two symbols ⌈ and ⌋ next to each other, in this order and with no indices, you can remove them.

The dimensions of *outer* suffix in the expression. Indeed, another rule to apply when combining these objects is that the dimensions of the resulting matrix are always given by the *outer* suffixes of the combination.
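As an illustration of the rules on actual arrays, consider the dataset matrix with data points as rows multiplied by its transpose. This is a NumPy sketch with random data and my own choice of example: the inner dimension (the number of features) must match, the outer dimensions give an n x n result, and each entry is the scalar product between two data points.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 5, 3                  # hypothetical: 5 samples, 3 features
X = rng.normal(size=(n, d))  # data points as rows (bras)

# Rows of X times columns of X^T: the inner dimension d must match
G = X @ X.T

# The outer dimensions give the shape of the result: n x n
print(G.shape)  # (5, 5)

# Each entry is the scalar product between two data points, <x_i|x_j>
i, j = 1, 3
print(np.isclose(G[i, j], X[i] @ X[j]))  # True
```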

For example, we can also combine them as

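and we introduce the second rule you can apply to manipulate the symbols: if you have the symbol ⌋ on the far left of the equation (or the symbol ⌈ on the far right), you can move it to the far right (far left).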

Applying the second rule followed by the first rule, we obtain

Now, remember that the dimensions of the matrix are given by the outer suffixes, so
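Combining the same two matrices in the other order can be sketched under the same assumptions (NumPy, random data, my own example): the inner dimension is now the number of samples, the outer dimensions give a d x d matrix, and the result equals the sum of the outer products of each data point with itself.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 5, 3                  # hypothetical: 5 samples, 3 features
X = rng.normal(size=(n, d))  # data points as rows

# Other order: the inner dimension is n, the outer dimensions are d and d
C = X.T @ X
print(C.shape)  # (3, 3)

# Equivalently, a sum of outer products |x_i><x_i| over the data points
C_outer = sum(np.outer(x, x) for x in X)
print(np.allclose(C, C_outer))  # True
```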

#### Matrix factorization

The nice thing about this notation is that we can also work with matrices of different dimensions. To see that, let’s consider the specific problem of matrix factorization.

In matrix factorization, we have a target matrix of dimensions

where

Then, we can write the user latent vector matrix as

Using our rule to remove the

In the same way as the notation for dataset matrices contains useful information about the structure of the dataset, the notation above tells us important things about the equation. The inner bra-ket notation tells us immediately that the entries of the matrix are the scalar products between user and item latent vectors of dimension
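A NumPy sketch of the structure of the factorization, assuming user and item latent vectors are stored as rows and with made-up sizes (`n_users`, `n_items` and the latent dimension `k` are my own names). It only checks the shapes and the scalar-product interpretation of each entry; it does not fit the factors to a target matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 2  # hypothetical sizes; k is the latent dimension

U = rng.normal(size=(n_users, k))  # user latent vectors as rows (bras)
V = rng.normal(size=(n_items, k))  # item latent vectors as rows

# Approximation of the target matrix: inner dimension k,
# outer dimensions n_users x n_items
R_hat = U @ V.T
print(R_hat.shape)  # (4, 6)

# Each entry is the scalar product <u_i|v_j> of two k-dimensional latent vectors
print(np.isclose(R_hat[2, 5], U[2] @ V[5]))  # True
```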

#### Take home

These are the key points you should take home from this post, if you are crazy enough to try this notation:

- Dirac notation for bras and kets works as usual: an object like $_k\langle\phi'|\phi\rangle_k$ still represents the scalar product of two vectors. We just need to be careful and check that the indices are the same, since in machine learning this might not be the case.
- The two rules to check whether your expression makes sense are:
    - Rule 1: The *outer* indices in the whole combination denote the dimensions of the resulting matrix.
    - Rule 2: The *inner* indices in the expression must be the same.
- The two rules to make your expressions pretty are simple:
    - Rule 1-bis: if you have the two symbols ⌈ and ⌋ next to each other, in this order (the shapes must match like in Tetris) and with no indices, you can remove them from the equation.
    - Rule 2-bis: if you have the symbol ⌋ on the far left of the equation (or the symbol ⌈ on the far right), you can move it to the far right (far left).
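Rules 1 and 2 are mechanical enough to be written as a tiny shape checker. The function `combine` below is hypothetical, in plain Python: each matrix is described only by its pair of (row, column) index labels, Rule 2 is checked between neighbouring factors, and Rule 1 gives the labels of the result. It ignores the ⌈ and ⌋ symbols, since Rules 1-bis and 2-bis only change how an expression is written, not its dimensions.

```python
def combine(*factors):
    """Check a chain of labelled matrices and return the labels of the result.

    Each factor is a (row_label, col_label) pair, e.g. ("n", "d") for an
    n x d dataset matrix with data points as rows.
    """
    for left, right in zip(factors, factors[1:]):
        # Rule 2: the inner labels of neighbouring factors must be the same
        if left[1] != right[0]:
            raise ValueError(
                f"inner indices {left[1]!r} and {right[0]!r} do not match"
            )
    # Rule 1: the outer labels give the dimensions of the result
    return (factors[0][0], factors[-1][1])


X = ("n", "d")    # dataset, data points as rows
X_T = ("d", "n")  # its transpose

print(combine(X, X_T))  # ('n', 'n')
print(combine(X_T, X))  # ('d', 'd')

# Matrix factorization: users x k times k x items gives users x items
print(combine(("users", "k"), ("k", "items")))  # ('users', 'items')
```

A mismatch such as `combine(X, X)` raises a `ValueError`, since the inner labels `d` and `n` differ.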