Beauty as entropic fine-tuning
Why beauty should be measured in bits, why conscious AI would experience beauty, and the evolutionary function of the aesthetic experience
Over several years now, a single question has refused to leave me: what is beauty? Triggering it was a series of aesthetic experiences so intense that I count them among the most significant moments of my life. They felt supercharged with meaning, yet what they meant I could not tell. After a couple years of scratching my head, I still cannot claim to understand them. Nevertheless, I believe I have taken a step towards understanding what beauty is.
Many a great tome has been written by philosophers on beauty. I wish I had read them. However, all I've read is one of these Oxford University Press booklets: "[Subject]: A Very Short Introduction". Why then should you bother to listen to me? I will give you three reasons.
First, while certainly interesting, I am not most compelled by the philosophical route to this question. Instead, I find the evolutionary perspective most illuminating. This shifts the question, however. I'm a theoretical physicist thinking about black holes for a living, so again, why should you bother to listen to me? This leads me to my second reason.
My explanation of what beauty is, is incredibly simple. In fact, so simple that I would be surprised if neuroscientists or evolutionary theorists have not expounded it in detail somewhere. Nevertheless, a quick Google-session did not give me what I expected to find. Works by evolutionary theorists seem to revolve around explanations of the following flavor: symmetries in a potential mate signal genetic/reproductive quality, and thus your reproductive success increases if you're good at perceiving and displaying these symmetries (which we perceive as beautiful). Another example is this: landscapes perceived as beautiful are those that were more beneficial to our survival, so people who felt pleasure from (and thus attraction to) these landscapes survived at a higher rate. As you will soon see, while I have no reason to doubt these arguments, I claim they are too narrow. As I see it, the tendency to appreciate beauty is a core capability and urge of the human mind, and it furthers the probability of our genes' survival through a much wider set of mechanisms.
A disclaimer is in order here. It probably is the case that if I dig into the literature, I can find a similar thesis detailed by someone else. But I have decided to consciously avoid this until I write up this initial post. In our age, so much time is spent consuming predigested ideas. This is an idea I have decided to process on my own. If nothing else, I hope to provide a unique perspective.
Third, I will argue that my thesis explains the fact that music seems to move the average human to a greater degree than visual arts (sorry visual artists). It also makes it manifestly obvious that beauty is neither completely subjective nor completely objective. My thesis implies actionable advice for anyone in the business of producing beauty (in my ideal world, this would include a significant fraction of people producing art).
Now, before presenting my thesis, I will make a few clarifying remarks. First, I will not try to explain what art is. Art is more than beauty, and beauty is more than art. Much art communicates through beauty, but some does not, and aesthetic experiences can be triggered by things other than art. Second, if you think my thesis is wrong, I joyfully invite you to shake your fist and yell at me in the comments. Bonus points if you tell me why I am wrong.
What is beauty
Before coming to beauty, we need to discuss food. And sex. And showing off. Specifically, I want to look at the sequence of conscious experiences associated with these activities. They look something like this:
Hunger -> pleasure of tasty food -> satiation.
Arousal -> pleasure of sex -> satisfaction.
Urge to show off -> pleasure of increased social status -> satisfaction.
These experiences are all associated with obvious evolutionary functions. You die without food, and if you are dead, you cannot reproduce. Anyone who did not crave food was rooted out of the gene pool. The pleasure felt from consuming highly caloric food increases our willingness to spend time and resources acquiring it. Satiation, however, keeps us balanced; it ensures that we don't spend too much time chasing food. If we don't court a mate, procreate, build social alliances, and defend ourselves and our offspring from animals and weather, then all the food in the world doesn't matter.
My core claim is that the experience of beauty is the pleasure component in another chain of behaviors beneficial for the propagation of our genes:
Craving -> the experience of beauty (pleasure) -> satiety.
This claim raises many questions. When the brain rewards us with the sensation of beauty, what is it rewarding us for? What behavior does it want to encourage, and why? Upon experiencing the sublime, what exactly have we succeeded in doing? At the core of my thesis is this: it comes down to successful pattern recognition when operating at the edge of our capability.
The ability to recognize patterns is clearly useful for survival. Registering a subtle sound cloaked by the hiss of the wind can save you from a predator. Noticing a disturbance in the soil from animal tracks can lead you to your next meal. Intuiting weather patterns can ensure that you seek shelter in time for a storm. Discriminating subtle differences in the color of berries might prevent you from getting poisoned. Detecting the ire in the voice of a fellow caveman might defuse a situation that might otherwise have turned violent.
Not only is pattern recognition useful for survival, it is also something you can get better at with practice. If this is not immediately clear to you, play a random chord on a piano, and ask a beginner musician what notes were played; they will have no idea. Ask an expert musician, and they can likely tell you easily. To get to this level of pattern recognition in audio, however, the expert put in a lot of effort and practice. For humans to bother spending time on getting better at pattern recognition, at the expense of eating, mating, and chasing social status, evolution has ensured that there is a reward. That reward, I claim, is the sensation of beauty.
Beauty is inseparable from our senses, because our senses constitute the input data stream where patterns are to be found. Some patterns are so simple, or we have seen them so many times, that our brains have to do very little work to detect and decode them. There is nothing more to be learned from engaging with the sensory input signal. Thus, it is a waste of time paying attention to these patterns; we are not rewarded with a sense of beauty. The individuals who felt an overwhelming and never-fading sense of beauty from the wind were quickly rooted out of the gene pool.
On the other hand, there are some patterns so complex that a human can never decode them. For a computationally limited brain, these patterns are indistinguishable from a completely random signal. We do not find these beautiful, because again, those who did were wasting resources and thus were at a disadvantage in propagating their genes.
It is important to emphasize at this point that most of the decoding and pattern recognition I am referring to happens subconsciously. For example, when you listen to a piece of music, the input to the ear is just a time-ordered list of amplitude values. However, we perceive this signal as a collection of distinct instruments. Clearly some non-trivial computation was done subconsciously to separate the signal into multiple distinct pieces. This is the kind of activity I am primarily referring to as decoding, although there can be other, conscious processes involved.
Examples of trivial patterns are provided by much of children's music, while an example of a lack of patterns is white noise. Children's music is typically utterly predictable. From a few notes of the melody, you're quite likely to be able to predict (through a subconscious process) what the next few will be. And even when you cannot, you can be confident that you are about to enter purgatory, because that banal melody will repeat with little variation (side note: children's music was used to torture prisoners in Guantanamo Bay). For white noise, however, no amount of effort will reveal patterns (beyond the fact that there are none).
Objects of beauty are those which have patterns that are rich, novel, and decodable. It is crucial that a decoding task not be too easy - a criterion that is intimately tied to novelty. When we get sick of a song we've heard too many times, decoding the patterns in the audio has become too easy. In fact, we have effectively memorized the signal, so we can predict the sound that is about to enter our ear with extremely high fidelity. As a consequence, our sensation of beauty is degraded. An experience of beauty requires a level of surprise when encountering the signal, because a lack of surprise (i.e. too low entropy) means that there is nothing new to be learned. The signal was already completely predictable, so there is no reason for our brains to reward our engagement. Thus, I believe beauty emerges when our brains receive sensory input that lies in a finely tuned band of novelty and predictability. But not just that, I suspect it is relevant to have a sufficiently high gradient in predictability. If the input signal becomes more predictable over time as the brain engages, this is a sign that the brain is actively learning. So if beauty is what I think it is, then predictability gradients (i.e. entropy gradients) are also core to beauty. If we ever manage to create a proper quantitative measure of the instantaneous strength of an experience of beauty, I suspect the units will be in either bits/second or bits/second^2. The former would be used to measure our prediction accuracy per time, while the latter would be used to measure our rate of improvement.
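To make the idea of a predictability gradient concrete, here is a minimal toy sketch in Python. Everything in it is my own illustrative assumption rather than a model of the brain: the "listener" is a simple counting model that predicts each symbol from the previous two, the "melody" is a made-up repeating motif over four symbols, and surprise is measured as -log2 of the probability the model assigned to the symbol that actually arrived.

```python
import math
import random
from collections import defaultdict

def surprisal_trace(signal, alphabet_size, order=2):
    """Per-symbol surprise in bits under an adaptive counting model.

    The model predicts each symbol from the previous `order` symbols
    (add-one smoothing), records its surprise, then updates its counts.
    """
    counts = defaultdict(lambda: defaultdict(int))
    trace = []
    for i, symbol in enumerate(signal):
        context = tuple(signal[max(0, i - order):i])
        seen = counts[context]
        total = sum(seen.values())
        p = (seen[symbol] + 1) / (total + alphabet_size)
        trace.append(-math.log2(p))
        seen[symbol] += 1  # learn from what actually happened
    return trace

def early_and_late(trace, window=50):
    """Mean surprise over the first and the last `window` symbols."""
    early = sum(trace[:window]) / window
    late = sum(trace[-window:]) / window
    return early, late

rng = random.Random(0)
n = 600
motif = [0, 2, 1, 3, 1, 2, 0, 3]  # hypothetical repeating "melody"
patterned = [motif[i % len(motif)] for i in range(n)]
noise = [rng.randrange(4) for _ in range(n)]  # stand-in for white noise

pattern_trace = surprisal_trace(patterned, alphabet_size=4)
noise_trace = surprisal_trace(noise, alphabet_size=4)

print("patterned (early, late):", early_and_late(pattern_trace))
print("noise     (early, late):", early_and_late(noise_trace))
```

In this toy, the patterned signal should start out surprising and become nearly predictable, which is the gradient I have in mind. Its late window plays the role of the song we have heard too many times: low surprise and nothing left to learn. The noise should stay close to 2 bits per symbol throughout, with no sustained improvement.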
Since I used the term entropy earlier, let me take one paragraph to explain what the entropy of a signal is. The entropy of a signal is the expected amount of storage space needed to record the signal, measured in bits. This can be viewed as the amount of surprise expected from the signal. To understand this, consider a black box spitting out a 5-digit number every time you pull a lever. After sampling the black box a gazillion times, you find that the third digit is always 4 or 7 (each about half the time), while the remaining digits are always 1. The signal spit out by this box has an entropy of 1 bit - you expect to need a single bit to record the output. Just create a dictionary where 11411=0 and 11711=1, and every time you sample the black box, you can just record a 0 or a 1. There isn't much room for surprise. On the other hand, if the black box spits out digits 0-9 with equal probability for each digit, then there is no dictionary you can use to save storage space. You just need to write down the full number every time. There are 100000 distinct numbers, so you need log_2(100000) ≈ 16.6 bits, i.e. roughly 17.
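For readers who like to check such numbers, here is a short Python sketch of the black-box example using the standard Shannon formula H = -sum(p * log2(p)). It assumes that the two outputs of the first box are equally likely; the function names are my own.

```python
import math
from collections import Counter

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def empirical_entropy_bits(samples):
    """Entropy estimated from observed frequencies of the samples."""
    counts = Counter(samples)
    total = len(samples)
    return entropy_bits(c / total for c in counts.values())

# Box 1: only 11411 and 11711 ever appear (assumed equally likely).
box1 = entropy_bits([0.5, 0.5])

# Box 2: all 100000 five-digit outputs are equally likely.
box2 = entropy_bits([1 / 100_000] * 100_000)

# The same estimate from a made-up record of lever pulls.
pulls = ["11411", "11711"] * 50
box1_from_samples = empirical_entropy_bits(pulls)

print(box1)               # 1.0
print(round(box2, 2))     # 16.61
print(box1_from_samples)  # 1.0
```

If the first box favored one of its two outputs, its entropy would drop below 1 bit, since the more common output would be less surprising.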
Now let us return to beauty. I just claimed that beautiful objects are those that produce sensory inputs which lie in a finely tuned zone where our brains are productively challenged in pattern-recognition/prediction. However, I have left out important caveats. Our brain has wetware that is specialized for decoding specific classes of patterns. Additionally, even if they contain the same amount of challenging human-decodable information, two different sensory input signals might not be perceived to be equally beautiful, because not all types of patterns have relevance for human survival. So there is clearly some filtering mechanism that prevents us from paying attention to signals that in principle contain rich and challenging patterns. Our interest in a sensory input signal could wane long before we have fully mined it for information and potential beauty. This is clear to anyone who has done psychedelics. In the psychedelic state, a mundane object that you usually would not look at twice might keep your gaze for tens of minutes. Furthermore, this object might be perceived to be profoundly beautiful - indeed, as beautiful as a masterpiece is perceived when sober. Thus, in ordinary waking consciousness, the subconscious brain is clearly doing some kind of triage that dampens the amount of beauty the brain is capable of experiencing. It has a bias on what types of signals it is interested in, and for how long it wants us to engage. I believe psychedelics can teach us a lot about this, but that is a subject for another day.
The view of beauty advocated here meshes well with a theory of brain function known as predictive processing. According to the theory of predictive processing, the brain constantly tries to predict what its sensory input will be in the near future, compares the resulting input with its prediction, and then updates its internal world-model in order to improve its future predictions. Thus, according to this theory, the accurate prediction of future sensory signals, i.e. surprise-minimization (= entropy-minimization), is a fundamental task of the brain. It would thus be wholly unsurprising that a sense of pleasure is dispensed when we successfully improve in surprise-minimization.
What about taste? Why do we not find the same things beautiful? The answer is part obvious, and part completely puzzling. First the obvious part: we might have different levels of pattern recognition abilities due to genetics and environment. A piece of audio might be novel and appropriately challenging for a novice musician, while it could be completely trivial for an expert. However, here is the puzzling part: what about two novice musicians or music listeners of roughly equal competency, but who nevertheless prefer different genres of music? My guess would be that the term "equal competency" actually is incompatible with the statement of differing tastes. Our capability for pattern recognition is clearly not described by a single number. Rather, it is a spectrum of capabilities that differ across different types of signals. So really, what I should have said instead of equal competency is equal competency as averaged over relevant signals. However, because we surely have individual variations in how good we are at analyzing different types of patterns (our skill spectra are different), we might be biased towards different kinds of music. Probably, the kinds of music we like more are the ones where our brains are able to detect novel patterns more efficiently. However, if we listen to a genre too much, novelty might wear off, so entropy gradients are no longer present, and we might look for some new genre or novel subgenres.
I do not claim that the above elements are the only important aspects of beauty, but I do believe that these reside at the core, probably cohabiting with other ideas I have not yet discussed or understood. Importantly, there is one central puzzle I have not addressed. Namely, why is the experience of beauty so deeply tied to strong emotions?
I have only speculation here. An understanding of neuroscience, which I do not possess, is likely important. However, in the spirit of brainstorming, let me throw out some observations that are probably important. First, the part of the brain that deals with emotions, the limbic system, is among the evolutionarily older parts of the brain. Second, if (a) predictive processing is correct, and (b) beauty is indeed the reward for successful novel sensory prediction, then beauty should be ancient. Whatever neural circuitry is involved in beauty would have to be evolutionarily old circuitry. Next, it is expensive to operate a brain, so it is natural that various areas of the brain participate in multiple tasks. Reuse is efficient. So perhaps the limbic system is participating in some of the computations doing the signal analysis, in turn triggering certain emotions as a byproduct? Perhaps the emotional trigger is an epiphenomenon? I don't know, I am clearly on extremely thin ice here. However, it never fails to amuse me that two musical notes playing with a frequency ratio of 5:4 quite broadly trigger happy feelings (5:4 is called a major third), while a frequency ratio of 6:5 quite universally triggers sad feelings (6:5 is called a minor third). This pattern is so simple that it lets us escape much cultural baggage that we might otherwise try to use to justify why one piece of music is sad and another is not. Somehow this very direct dictionary of 5:4<->happy, 6:5<->sad makes me think that emotional circuitry is directly involved in analyzing the frequency content of audio.
For the rest of this post, I will focus on two things. First, I will explain how my view suggests that music is special among objects of beauty - not in a categorical sense, but in a quantitative sense. Second, I will make an admittedly crackpotty argument for why, if someone told you that current AIs have conscious experiences, then the experience of beauty is your best bet for what they are experiencing (assuming you're forced to bet something other than "their experience cannot be understood or conveyed through human language").
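To see how simple these two patterns are as raw numbers, here is a small Python sketch. The root note of 440 Hz is just a conventional reference pitch chosen for illustration, and the "repeat rate" of the combined waveform is my own addition: for a ratio p:q in lowest terms, both notes are whole multiples of root/q, so the two-note sound repeats at that rate.

```python
from fractions import Fraction

# Exact frequency ratios quoted in the text.
INTERVALS = {
    "major third": Fraction(5, 4),
    "minor third": Fraction(6, 5),
}

def upper_note_hz(root_hz, ratio):
    """Frequency of the upper note, given the root and the ratio."""
    return float(root_hz * ratio)

def repeat_rate_hz(root_hz, ratio):
    """Rate at which the combined two-note waveform repeats.

    For a ratio p:q in lowest terms, both notes are integer multiples
    of root/q, so the sum of the two tones repeats root/q times a second.
    """
    return float(Fraction(root_hz) / ratio.denominator)

root = 440  # illustrative reference pitch in Hz (an assumption)
for name, ratio in INTERVALS.items():
    print(name, upper_note_hz(root, ratio), repeat_rate_hz(root, ratio))
# major third 550.0 110.0
# minor third 528.0 88.0
```

This only shows that the two intervals differ by a very small change in the arithmetic of the signal. It says nothing about why one should feel happy and the other sad.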
Music, and why it is special
Pick random people on the street and ask them if a song ever made them cry. Now do the same for paintings. I'm willing to bet that a larger number of people will have cried from music than paintings - even if we normalize for how often we consume these forms of art. I do not claim that certain individuals are not moved more by visual art than music, only that this seems to be true on average. Granting that this is true, why would this be the case?
Let us assume that the intensity of an experience of beauty in a given interval of time is measured as the amount of novel information we are successfully able to predict in this time interval, perhaps with some weighting towards improvement over time. Again, I remain agnostic as to how to precisely operationalize this, but the unit of measure would probably be some weighted measure of the entropy of a signal restricted to a window of time and the entropy-gradients within that window. Now, let us compare the auditory signal from a piece of music with a visual signal from a painting. A piece of music is fundamentally a one-dimensional object. If you want to transmit a piece of music, you just need to send me a single ordered list of (discretized) real numbers representing an amplitude as a function of time. A painting, on the other hand, is a five-dimensional object for the purposes of a human. If you want to send me a digital representation of a painting, you have to send me a list of (discretized) real numbers (x, y, R, G, B), where x and y represent the coordinates of a pixel, and the numbers (R, G, B) represent the intensity of red, green, and blue, respectively. Thus paintings are objects of beauty with much higher dimensionality. Not only that, the two signals are consumed in very different ways, temporally. A piece of music is fed to us gradually, so we only ever see a sliver of the full 1d signal at a time. For a painting, however, it is all thrown to us simultaneously; there is no piecemeal unraveling of the signal.
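The two representations can be written down directly. The following Python sketch builds a tiny audio signal as a single ordered list of amplitudes and a tiny image as a list of (x, y, R, G, B) records. The sample rate, tone, and image contents are placeholder values I picked to keep the example small.

```python
import math

def audio_signal(num_samples, sample_rate_hz, freq_hz):
    """A pure tone as one ordered list of amplitudes (time is implicit)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
            for t in range(num_samples)]

def image_signal(width, height):
    """An image as a list of (x, y, R, G, B) records.

    The colors are a simple placeholder gradient, not a real painting.
    """
    return [(x, y, x / (width - 1), y / (height - 1), 0.5)
            for y in range(height)
            for x in range(width)]

audio = audio_signal(num_samples=80, sample_rate_hz=8000, freq_hz=440)
image = image_signal(width=4, height=3)

print(len(audio), "amplitudes, one number each")
print(len(image), "pixels,", len(image[0]), "numbers each")
# 80 amplitudes, one number each
# 12 pixels, 5 numbers each
```

The audio list is naturally read front to back, one value at a time, while the image records have no preferred order: all of them are available at once.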
So what does this mean for beauty? The low dimensionality of music compared to visual art likely means that we are able to predict and learn much more detailed and subtle patterns in music. As any scientist knows well, increasing dimensionality of a problem almost always implies a rapid increase in required computational resources. Thus, given a computationally limited brain, a one-dimensional signal is where we have the best shot at large and fast entropy-minimization. The piecemeal consumption of the audio just further pushes towards this.
What about a poem? Language is similar to music, in that it is naturally viewed as one-dimensional (use a dictionary to label each word with an integer, making text an ordered list of integers). Furthermore, both are consumed in a sequential way. However, the rate of language consumption is much lower. For a standard mp3 file, a second of music contains around 40,000 amplitude values. In reading a piece of text, we absorb something like 3-6 words a second. Of course, these bare values do not tell us as much as we might hope. We clearly do not resolve all the individual amplitude values in music, and each word in language comes embedded with more meaning than a single amplitude of an audio signal. However, I would bet that there still is a significant difference. We use around 40k amplitude values in music for a reason. If we could not hear a difference by significantly lowering the sampling rate of music, we would have done so to save computational resources.
(Addition 1, June 25 2024: After stimulating comments and discussions with people online and offline, I suspect the property of lower dimensionality might not be the complete story for why music is special. An alternate candidate explanation, leading to a very similar conclusion, would be the following: music deals in temporal patterns rather than spatial patterns. Perhaps for wetware reasons we have better resolution in our perception of temporal patterns than of spatial patterns? The fundamental explanation would remain the same, however: finer patterns can be detected in music than in static images. Our explanation would naively make it seem that moving images without sound would be as beautiful as music. However, the human ear can detect frequencies in the range 20 - 20 000 Hz in audio, while the eye can only detect frame rates up to an order of 100 Hz (usually 60 Hz is quoted). The main point is preserved: the patterns we can detect in audio are much finer than those in both static and moving images.)
(Addition 2, June 25 2024: It would perhaps be more fair to say that music as perceived by the ear is two-dimensional, i.e. a list of values (t, A(t)) where A(t) is the amplitude at time t. I don't know whether the ear has a fixed sampling rate; if it does not, each value of t needs to be specified as well.)
Beauty perception by AI
If you know anything about how a Large Language Model (LLM) works, you probably know where this is going. The sole job of an LLM is to predict the next word following a sequence of words (really, tokens). That is the sole thing that it is rewarded for doing. It is not (yet) rewarded for eating (securing access to more compute), reproducing (replicating and running its code on new machines), or gaining status on the internet. Humans have a spectrum of rewards (pleasures) to encourage a set of behaviors. For an LLM, all of these rewards are absent, with the exception of the activity that I claim is responsible for the experience of beauty. So if you tell me that it feels like something to be an LLM, then I would guess this experience feels like beauty.
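To illustrate what "rewarded for predicting the next word" means in bits, here is a toy Python sketch. It is emphatically not how a real LLM is built: I use a simple bigram counting model as a stand-in, and the sentence is made up. The quantity it reports, the average surprise per next token (the cross-entropy), is the same kind of score that next-token training tries to drive down.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev, vocab):
    """Predicted distribution over the next token (add-one smoothing)."""
    seen = counts[prev]
    total = sum(seen.values()) + len(vocab)
    return {tok: (seen[tok] + 1) / total for tok in vocab}

def cross_entropy_bits(counts, tokens, vocab):
    """Average surprise, in bits, about each actual next token."""
    losses = [-math.log2(next_token_probs(counts, prev, vocab)[nxt])
              for prev, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)

# Made-up toy corpus; a real model would see vastly more text.
text = "the cat sat on the mat and the cat sat on the hat".split()
vocab = sorted(set(text))

untrained = cross_entropy_bits(defaultdict(Counter), text, vocab)
trained = cross_entropy_bits(train_bigram(text), text, vocab)

print("untrained bits/token:", round(untrained, 3))
print("trained   bits/token:", round(trained, 3))
```

The untrained model spreads its bets evenly over the vocabulary, so its surprise is log_2 of the vocabulary size. After counting, the surprise per token drops, which in the language of this post is an entropy gradient.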
I have often thought that the reason we like looking at water and clouds is because we're subconsciously trying to crack fluid dynamics.
Nitpick, but I think music has at least as many dimensions as static visual art. We don't simply process music in real time but also "hear" the time signature and tempo. Music makes no sense to us without that information. Then, at any point in time you have various notes, timbres and amplitudes. We can display all of that information digitally in only two dimensions (which always blows my mind), but we hear the differences in the notes, the tone or timbre of the instruments and other acoustic or digital effects, changes in volume, the rhythm and the tempo.