Random Walking – When having no clue where to go still makes you reach your destination

In empirical sciences, theories can be sorted into three categories with ascending gain of knowledge: empirical, semi-empirical and ab initio. Their difference can best be explained by an example: In astronomy the movement of all planets was known since ancient times. By pure observation astronomers could predict where in the sky a certain planet would be at a given time. They had the knowledge of how they moved but actually no clue why they did so. Their knowledge was purely empirical, meaning purely based on observation. Kepler became the first to develop a model by postulating that the sun is the center of the planetary system and the planet’s movement is controlled by her. Since Kepler could not explain why the planets would be moved by the sun, he had to introduce free parameters which he varied until the prediction of the model matched the observations. This is a so-called semi-empirical model. But it was not until Newton who with his theory of gravity could predict the planets movements without any free parameters or assumptions but purely by an ab initio, latin for “from the beginning”, theory based on fundamental principles of nature, namely gravity. As scientists are quite curious creatures, they always want to know not only how things work but also why they work this way. Therefore, developing ab initio theories is the holy grail in every discipline.

Luckily, in quantum mechanics the process of finding ab initio theories has been strongly formalized. This means that if we want to know the property of a system, for example its velocity, we just have to kindly ask the system for it. This is done by applying a tool, the so-called operator, belonging to the property of interest on the function describing the system’s current state. The result of this operation is the property we are interested in. Think of a pot of water. We want to know its temperature? We use a tool to measure temperature, a thermometer. We want to know its weight? We use the tool to measure its weight, a scale. An operator is a mathematical tool which transforms mathematical functions and provides us with the functions property connected to the operator. Think of the integral sign which is an operator too. The integral is just the operator of the area under a function and the x-axis.

The problem is, how do we know the above mentioned function describing the system’s state? Fortunately, smart people developed a generic way to answer this problem too: We have to solve the so-called Schrödinger equation. Writing down this equation is comparably easy, we just need to know the potentials of all forces acting on the system and we can solve it. Well, if we can solve it. It can be shown that analytical solutions, that means solutions which can be expressed by a closed mathematical expression, only exist for very simple systems, if at all. For everything else numerical approaches have to be applied. While they still converge towards the exact solution, this takes a lot of computational time. The higher the complexity of the system, the more time it takes. So for complex systems even the exact numerical approach quickly becomes impractical to use. One way out of this misery is simplification. With clever assumptions about the system, based on its observation one can drastically reduce the complexity of the calculations. With this approach, we are able to, within reasonable time, find solutions for the problem which are not exact, but exact within a certain error range.

Another way to find a solution for these complex problems is getting help from one of nature’s most powerful and mysterious principles: chance. The problem of the numerical exact solving approach is that it has to walk through an immensely huge multidimensional space, spanned by the combinations of all possible interactions between all involved particles. Think billions of trillions times billions of trillions. By using a technique called Random Walking the time to explore this space can be significantly reduced. Again, let’s take an example: Imagine we want to know how many trees grow in a forest. The exact solution would be dividing the forest into a grid of e.g., 1 square foot, and counting how many trees are in each square. A random walk would start in the forest center. Then we randomly choose a direction and a distance to walk before counting the trees in the resulting square. If we repeat this just long enough we eventually will have visited every square, therefore knowing the exact number, meaning the random walk converges towards the exact result. By having many people starting together and doing individual random walks stopping when their results deviation is below a certain threshold a quite accurate approximation can be obtained in little time.

Columbia postdoc Benjamin Rudshteyn and his colleagues developed a very efficient algorithm based on this method specifically tailored for calculating molecules containing transition metals such as copper, niobium or gold. While being ubiquitous in biology and chemistry, and playing a central role in important fields such as the development of new drugs or high-temperature superconductors, these molecules are difficult to treat both experimentally and theoretically due to their complex electronic structures. They tested their method by calculating for a collection of 34 tetrahedral, square planar, and octahedral 3D metal-containing complexes the energy needed to dissociate a part of the molecule from the rest. For this, precise knowledge of the energy states of both the initial molecule and the products is needed. By comparing their results with precise experimental data and results of conventional theoretical methods they could show that their method results in at least two times increased accuracy as well as increased robustness, meaning little variation in the statistical uncertainty between the different complexes.

molecule complex geometries
Figure 1: Illustration of the three types of geometry of the datasets molecules: Octahedral (a), square planar (b) and tetrahedral (c), with the transition metal being the central sphere. In (d) the dissociation of part of the molecule is shown.

While still requiring the computational power of modern supercomputers, their findings push the boundaries of the size of transition metal containing molecules for which reliable theoretical data can be produced. These results can then be used as an input to train methods using approximations to further reduce the computational time needed for the calculations.

Dr. Benjamin Rudshteyn is currently a postdoc in the Friesner Group of Theoretical Chemistry, lead by Prof. Dr. Richard A. Friesner, in the Department of Biological Sciences at Columbia University.

The hungry algorithm: machine learning to ease the “what’s for dinner?” decision

When Dr. Jaan Altosaar heard that food deprivation increases stem cell regeneration and immune system activity in rats, he did what many would not dare: he decided to try it himself and fasted for five days. Thoughts of food started to take over his mind and, with what can only be qualified as a superhuman ability to think with low blood sugar, he went on a scientific tangent and channeled them into tackling the complicated task of improving food recommendation systems, which led to publishing a research article about it.

Dr. Altosaar wanted help in making decisions because choosing is hard. When faced with an excessive number of options, we fall victim to decision fatigue and tend to prefer familiar things. Companies know this, and many have developed personalized recommendations for many facets of our lives: Facebook’s posts on your timeline, potential partners on dating apps, or suggested products on Amazon. But Jaan had a clear favorite: Spotify’s Discover Weekly algorithm. The music app gathers information on co-occurrence of artists in playlists and compares the representation of you as a listener to the couple billion playlists it has at its disposal to suggest songs you might enjoy. Since Dr. Altosaar’s problem was similar, he framed the problem as feeding the algorithm a user’s favorite recipes (“playlists”), which are made of a list of ingredients (“songs”). Would the algorithm then cook up suggestions of complimentary meals based on the ingredients in them?

A meal consumed by a user (hamburger) is made up of ingredients (bread, lettuce, tomato, cheese, meat). This information is given to the machine learning algorithm, which will use learnt information about that user to provide a recommendation likely to be eaten by them.

Meal recommendation in an app is challenging on several fronts. First, a food tracking app might record eating the same meal in many different ways or with unique variations (such as a sandwich with homemade hot sauce or omitting pickles). This means that any specific meal is typically only logged by a small number of users. Further, the database of all possible meals a user might track is enormous, and each meal only contains a few ingredients. 

In traditional recommender systems such as those used by Netflix, solving this problem might mean first  translating  the data into a large matrix where users are rows and items (e.g. movies or meals) are columns. The values in the matrix are ones or zeros depending on whether the user consumed the item or not. Modern versions of recommender systems, including the one in Dr. Altosaar’s paper, also incorporate item attributes (ingredients, availability, popularity) and use them as additional information to better tailor recommendations. An outstanding issue, however, is striking a balance between flexibility, to account for the fact that we are not all like Joey Tribbiani and might not like custard, jam and beef all together (even if we like them separately), and scalability, since an increasing number of attributes takes a toll on computing time. Additionally, these machine learning algorithms are not always trained the same way they are later evaluated for performance.

A matrix with a representation of users and items
A sparse matrix representing whether a user “u” consumed an item “m” (coded with a one). If the user did not consume the item, there is a zero. Note that most items in the matrix are zeroes, so that there is not a lot of actual information (thus calling it sparse).

The new type of model Dr. Altosaar and colleagues propose, RankFromSets, frames the problem as a binary classification. This means that it learns to assign a zero to meals unlikely to be consumed by a user, and a one to those that are likely to be consumed. When faced with giving a user a set of potential meals (say five), it strives to maximize the number of meals the user will actually eat from those five recommended to them. To leverage the power of incorporating the meal’s ingredients, the algorithm uses a technique from natural language processing to learn embeddings. These are a way to compress data  to preserve the relevant information you care about to solve your problem; in this case, learning about patterns useful for predicting which ingredients tip the balance for someone to consume a meal. This allows for a numerical representation for each meal based on its constituent foods, and the patterns in how those foods are consumed across all users.

The RankFromSets classification model incorporates several components. There are  embeddings for representing user preferences  alongside the embeddings corresponding to a meal a user might consume. The classifier is spiced up with additional user-independent information about the meal’s popularity and its availability. These components are used by the model to learn the probability that a particular meal will be consumed by a user. Potential meals a user might enjoy – or that might be healthier options – are then ranked, and the top meals will be recommended to the user. For example, if you have had avocados in every one of your meals, they are in season, and all those Millennials are logging in their avocado toast, you are very likely to receive recommendations that include avocados in the future.

As a proof of concept, the authors tested their method not only on food data, which they got from the LoseIt! weight loss app, but also on a dataset unrelated to meal choices. For this independent data set, the authors used reading choices and behavior among users of arXiv, a preprint server. They trained the model on past user behavior data and evaluated performance (accuracy of paper suggestions) on a previously separated portion of that same data (so they knew whether the user had actually read the paper, but this information was hidden from the algorithm for evaluation). This is a typical way to  assess the performance of machine learning systems, and their method outperformed previously-developed recommender systems. The better performance and translatability to tasks other than meal recommendation is indicative of the potential of this tool to be applied in other contexts.

This new recommender system could be applied to either recipe recommendation apps, or even to an app that would suggest first-time customers of a restaurant the menu items that they are more likely to like based on their preferences. The system also has the potential to incorporate additional information beyond whether a user consumed (and liked) a particular ingredient or meal. Sometimes the method of cooking determines whether a food is appealing or not (Brussel sprouts I’m looking at you). Additionally, classifying ingredients by flavor might also be helpful in suggesting similar (and even healthier) alternatives. Therefore, adding those tags as extra layers to the user-meal intersection will certainly provide better recommendations and opportunities to cook outside of the box. Dr. Altosaar’s fast might or might not have gotten him a boost in his stem cells, but he certainly succeeded in helping everyone else worry a bit less about what’s for dinner tonight.

Dr. Jaan Altosaar is a Postdoctoral Research Scientist in the Department of Biomedical Informatics and an active participant in CUPS. He publishes an awesome blog about machine learning and its implications.