Let’s get something straight from the beginning: this is a fun post! The title is catchy, I’ll admit that, but I make no claims of generality here. I’m not going to give you a complete formulation of machine learning in Dirac notation. This post is the story of how one can get inspired by Dirac notation and start writing down matrices in a weird way that turns out to be kind of useful for understanding the specific problem of matrix factorization. I’m not sure whether the notation is elegant or whether it can be of use for other things; I’ll leave that decision to future generations. I think it has some weird beauty in it (not as much as the original Dirac notation, of course). Before I start, a warning: this post is not suitable for people who are highly sensitive to notation abuse. I abuse Dirac notation a lot here, and may the gods of mathematics have mercy on my soul.
Dirac notation in a nutshell
Let me just start by telling you what Dirac notation is. Dirac notation is a simple and concise way to write vectors and matrices. I mean, it is actually beautifully defined in a formal way; follow the link if you want to know more. But, for our purposes, all we need to know here is that it is a nice way to write down and manipulate vectors. Let’s look at some examples.
We might want to start with the simplest case: 2-dimensional vectors. Let’s assume we have a 2-dimensional vector space (we will indicate this vector space as
For example, we can pick the Euclidean basis, which is orthonormal and in 2-d can be represented as
If we substitute the column expression and we use basic vector algebra, we obtain a column representation of the vector
Suppose now we have two vectors
So far so good. However, this notation can be heavy, especially if we have many vectors to write down and multiply in more complex ways. Luckily, there’s another way to write down column and row vectors, one that has proved more powerful and clearer, at least in some contexts (quantum mechanics): Dirac notation. Dirac notation is actually a symbolism for writing vectors in any vector space (discrete or continuous) of any dimension. Moreover, it also makes it possible to write the vectors in the dual space associated with the vector space in an immediate way. Here, let’s just focus on discrete, finite-dimensional spaces, where a vector is represented as a column and the corresponding vector in the dual space is a row.
Let’s start with our 2-dimensional example. It’s actually very simple: whenever we have column vectors, for example the basis vectors
Now, if we want to transpose
If we define a second vector
Moreover, we can also combine the bras and the kets the other way around, as
We just need to make sure the vectors have the same dimension
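To make the bra-ket operations above concrete, here is a minimal NumPy sketch. It assumes real 2-dimensional vectors, and the names `e1`, `e2`, `ket_v`, `ket_w` and the helper `bra` are my own illustrative choices, not part of any standard library.

```python
import numpy as np

# Kets are column vectors; here, the 2-d Euclidean basis.
e1 = np.array([[1.0], [0.0]])
e2 = np.array([[0.0], [1.0]])

# Two kets built as linear combinations of the basis kets.
ket_v = 3.0 * e1 + 4.0 * e2
ket_w = 1.0 * e1 + 2.0 * e2


def bra(ket):
    """Return the bra (conjugate-transposed row vector) associated with a ket."""
    return ket.conj().T


# Bra times ket: a 1x1 array holding the scalar product <w|v>.
inner = (bra(ket_w) @ ket_v).item()

# Ket times bra: the outer product |v><w|, a 2x2 matrix.
outer = ket_v @ bra(ket_w)

print(inner)        # 1*3 + 2*4 = 11.0
print(outer.shape)  # (2, 2)
```

The scalar product only makes sense when the bra and the ket have the same number of entries, whereas the outer product is defined for any pair of sizes in NumPy.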
Customized Dirac notation
What’s next? Now that we have learnt how to represent vectors in Dirac notation, we would like to use it to represent matrices! How can we do that? The natural environment for Dirac notation is quantum mechanics. In quantum mechanics, there are basically two types of matrices: Matrices representing operators on vectors, which are usually written as
When one needs to write down density matrices
When one needs to write down an operator acting on a vector (i.e. a matrix multiplied by a vector), Dirac notation comes in very handy too, as the operation can be written as
On the side, note that unitary matrices preserve the norm of a vector (which is a very important property in quantum mechanics). This is very easy to see in Dirac notation by writing down the norm of
with the last equation coming from the fact that
So far so good. However, in machine learning, the matrices we are dealing with are rarely square and definitely not unitary or trace-one. We need to find another way to write them down.
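For readers who want to see the two quantum-mechanical properties mentioned above (trace one, norm preservation) in numbers, here is a small sketch. It assumes a pure state in an arbitrarily chosen 3-dimensional complex space, and it builds a unitary matrix from a QR decomposition purely for convenience; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A normalised ket |psi> in a 3-dimensional complex space.
psi = rng.normal(size=(3, 1)) + 1j * rng.normal(size=(3, 1))
psi = psi / np.linalg.norm(psi)

# Density matrix of a pure state: the outer product |psi><psi|.
rho = psi @ psi.conj().T
trace_rho = np.trace(rho).real

# The Q factor of the QR decomposition of a square complex matrix is unitary.
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(A)

# <psi|U^dagger U|psi> = <psi|psi>, so applying U leaves the norm unchanged.
norm_before = np.linalg.norm(psi)
norm_after = np.linalg.norm(U @ psi)

print(np.isclose(trace_rho, 1.0))
print(np.isclose(norm_before, norm_after))
```

A generic rectangular dataset matrix satisfies neither property, which is why a different way of writing it is needed.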
Writing datasets using Dirac notation
A very common type of matrix in machine learning is the dataset matrix. It usually consists of a collection of vectors (the feature vectors) arranged in rows (or columns, but usually rows), where each vector is a data point. Now the question is: is there a way to exploit Dirac notation for vectors to write down such matrices in a compact and informative way?
That’s the way I wrote them when I was working on my matrix factorization problem. As I mentioned above, in machine learning data points are usually represented as rows. But row vectors can be simply written as bras in Dirac notation. We can write down a
Note that I put a suffix
Let’s say now that we would like to represent our data points as columns instead of rows. It’s no big deal, as we can use the same rule and write the dataset matrix
Easy, right? Just remember that a bra
However, this notation is still cumbersome. We need to write columns or rows whose elements are bras or kets, and this might be heavy. Let us now abuse Dirac notation again, and write down
This notation contains exactly the same information as the one above. Moreover, if we like column vectors more than rows, we can simply write
Now one question could be: why would we write a dataset in this way? Well, because the notation is informative. We can see immediately, at first glance, that it is a dataset. The inner bra-ket notation tells us that we are dealing with a collection of sample vectors and informs us about their dimensionality (i.e. the number of features
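As a rough numerical counterpart of the two layouts discussed above, the sketch below stacks data points as rows (bras) and as columns (kets). The sizes and variable names are illustrative assumptions, and the data is real-valued, so a ket is simply the transpose of its bra.

```python
import numpy as np

n_samples, n_features = 4, 3  # illustrative sizes

# Each data point is a bra, i.e. a row vector with n_features entries.
bras = [np.arange(n_features, dtype=float) + i for i in range(n_samples)]

# Stacking the bras as rows gives the usual (n_samples, n_features) dataset matrix.
X_rows = np.vstack(bras)

# Turning each bra into a ket (column) and stacking side by side gives the
# transposed layout, with shape (n_features, n_samples).
kets = [b.reshape(-1, 1) for b in bras]
X_cols = np.hstack(kets)

print(X_rows.shape)  # (4, 3)
print(X_cols.shape)  # (3, 4)
```

Both arrays hold the same information; only the choice of which index labels samples and which labels features differs.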
To appreciate the usefulness of this notation, let’s suppose we want to write down a
Now we need to be careful about one thing: that the inner suffixes in the expression (in this case the
we can introduce the first rule to apply when you combine the symbols: when you have the two symbols
The dimensions of
For example, we can also combine them as
and we introduce the second rule you can apply to manipulate the symbols: if you have the symbol
Applying the second rule followed by the first rule, we obtain
Now, remember that the dimensions of the matrix are given by the outer suffixes, so
The nice thing about this notation is that we can also work with matrices of different dimensions. To see that, let’s consider the specific problem of matrix factorization.
In matrix factorization, we have a target matrix of dimensions
Then, we can write the matrix of user latent vectors as
Using our rule to remove the
In the same way as the notation for the dataset matrices contains useful information about the structure of the dataset, the notation above tells us important things about the equation. The inner bra-ket notation tells us immediately that the entries of the matrix are the scalar products between user and item latent vectors of dimension
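The structure described above can be sketched in NumPy as follows. This is only an illustration under my own assumptions: random latent vectors, arbitrary sizes, and the names `U`, `V`, `R_hat` and `k` are hypothetical, not the symbols used in the equations of this post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 4, 2  # illustrative sizes; k is the latent dimension

U = rng.normal(size=(n_users, k))  # one latent vector per user, stored as rows
V = rng.normal(size=(n_items, k))  # one latent vector per item, stored as rows

# Approximation of the target matrix: the inner dimension k is contracted,
# and the outer dimensions (n_users, n_items) survive.
R_hat = U @ V.T

# Entry (i, j) is the scalar product of user i's and item j's latent vectors.
i, j = 2, 3
entry = U[i] @ V[j]

print(R_hat.shape)                    # (5, 4)
print(np.isclose(R_hat[i, j], entry))
```

Note that the two factors have different numbers of rows; only the latent dimension `k` has to agree.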
These are the key points you should take home from this post, if you are crazy enough to try this notation:
- Dirac notation for bras and kets works as usual: if you have an object like k⟨φ′|φ⟩k, this still represents the scalar product of two vectors. We just need to be careful and check that the indices are the same, since in machine learning this might not be the case.
- The two rules to check whether your expression makes sense or not are:
- Rule 1: The outer indices in the whole combination denote the dimensions of the resulting matrix.
- Rule 2: The inner indices in the expression must be the same.
- The two rules to make your expressions pretty are simple:
- Rule 1-bis: if you have the two symbols ⌈ and ⌋ next to each other, in this order (the shapes must match like in Tetris) and with no indices, you can remove them from the equation.
- Rule 2-bis: if you have the symbol ⌋ on the far left of the equation (or the symbol ⌈ on the far right), you can move it to the far right (far left).
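Rules 1 and 2 in the list above are, in array terms, the familiar shape check for a matrix product. The sketch below is one possible way to express them; `product_shape` is a hypothetical helper written for this post, not a NumPy function, and the sizes are arbitrary.

```python
import numpy as np


def product_shape(left_shape, right_shape):
    """Return the shape of left @ right, or raise if the inner indices differ."""
    # Rule 2: the inner indices must be the same.
    if left_shape[1] != right_shape[0]:
        raise ValueError(
            f"inner indices differ: {left_shape[1]} vs {right_shape[0]}"
        )
    # Rule 1: the outer indices give the dimensions of the result.
    return (left_shape[0], right_shape[1])


n, k = 6, 3
X = np.ones((n, k))  # a dataset with n samples and k features

shape_kk = product_shape(X.T.shape, X.shape)  # inner index n -> (k, k)
shape_nn = product_shape(X.shape, X.T.shape)  # inner index k -> (n, n)

# X @ X has inner indices k and n, which differ, so the check fails.
try:
    product_shape(X.shape, X.shape)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(shape_kk, shape_nn, mismatch_detected)  # (3, 3) (6, 6) True
```

NumPy performs the same check itself when `@` is applied, so the helper is only there to make the two rules explicit.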