N-beauty
A natural family of objective beauty measures does exist
Beauty is in the eye of the beholder!
This statement is sometimes used as a quick way to shut down a conversation about the quality of a piece of art — to hammer home that there is no objective notion of beauty. I dispute this, but nevertheless, the subjective nature of an individual’s beauty evaluations raises an important challenge for anyone claiming that objective beauty can be measured.
To confront this challenge, imagine that you want to start a website with the goal of discovering new artists making beautiful music. Suppose that by some miracle you actually get a bunch of musicians to upload their music. Assume also, by some even greater miracle, that 10 million listeners are willing to listen to and rate the music on your platform on a scale from 1 to 10. The rating is furthermore blinded, so they don’t know anything about the artist besides the piece itself. With this data, you can form some metric on which to evaluate each song.
Your first thought is to just let the beauty score of song X be the average of all the users’ ratings. But on a moment’s reflection you realize the problem. Beauty is then equated with having broad appeal. By this metric, mainstream art is by definition always the best art. But you, having some obscure tastes, know that good art can be highly polarizing. Great art is often loved by a relatively small group of people and ignored (or even despised) by the rest. You’d like to discover masterpieces of this type as well.
So your second thought is to instead measure the beauty of a song by setting its beauty score equal to the maximal rating that it gets from any user. But of course, this fails too. If a song gets enough listens, even if the song is junk, some weirdo is probably gonna rate it 10/10, so practically everything becomes maximally beautiful.
Hmm. Presumably we want something in between the first and the second proposal. Let’s consider the following. Pick a number N. Now collect the N highest ratings and take their average. Let us call this N-beauty. Now, in the case where N=1, you just get the measure proposed in the previous paragraph. For N=(total number of users on your platform), you get the average of all listener scores, as proposed first. But for N somewhere in between, something more interesting happens. Consider for example N=1000. If the 1000-beauty is very high, you know that this piece of art has at least 1000 diehard fans.
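To make the definition concrete, here is a minimal Python sketch of N-beauty. The function name `n_beauty` and the ten example ratings are made up for illustration; they are not data from any real platform.

```python
def n_beauty(ratings, n):
    """Average of the n highest ratings, i.e. the N-beauty for N = n."""
    if not 1 <= n <= len(ratings):
        raise ValueError("n must be between 1 and the number of ratings")
    top = sorted(ratings, reverse=True)[:n]  # the n biggest fans
    return sum(top) / n


# Hypothetical ratings from 10 listeners: three devoted fans, seven unimpressed.
ratings = [10, 9, 9, 3, 2, 2, 2, 1, 1, 1]

print(n_beauty(ratings, 1))   # N = 1: the single highest rating
print(n_beauty(ratings, 3))   # N = 3: average over the three biggest fans
print(n_beauty(ratings, 10))  # N = everyone: the plain average
```

With this toy data the 3-beauty stays above 9 even though the plain average is only 4, which is the kind of polarizing piece the plain average would bury.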
Now, the N-beauty of a piece never increases as N increases. The average rating of the 1000 most diehard fans is never lower than the average rating of the 2000 most diehard fans, since to go from the latter group to the former, you kick out the fans that are the least diehard, and the average rating cannot decrease upon doing this. So the N-beauty of artwork X is always a decreasing function of N (strictly speaking non-increasing, since ties are possible). And since the world is full of weirdos, we practically always have that 1-beauty = 10 (i.e. the maximal value). So now we can represent the beauty curve of artwork X as a decreasing graph with B on the y-axis and N on the x-axis. It might for example look like this:
This is an impressive curve. It is a piece that is absolutely loved (score 9) by around 160k people (1.6% of all people on the platform), strongly liked (score 8) by around 400k people (4% of people), and a million people (10%) think it is around a 6.5. Who cares that the average across everybody is below 4. I would be super happy if I could make such an artwork, and I think many of us will agree that such an artwork deserves to be regarded as a well-crafted piece of art. Now consider this curve.
Out of the 10 million people who truly listened to it and rated it, all but roughly 10 gave it a terrible score. Once you get to a group of size 100 or more, the average rating has already dropped to almost the worst score. Out of 10 million people, there does not exist a single group of even 10 people that love this piece. No amount of pointing out how beauty is in the eye of the beholder can get around this fact. It is a bad piece of art.
Of course, I am not saying that a piece of art that is liked by few people is bad. Most pieces in the real world are simply not seen by many people, and rated by even fewer, so we usually cannot estimate N-beauty in practice. Even if the 1000-beauty of a piece is almost a perfect 10, it might be hard to find those 1000 people in practice. But the number still exists in principle.
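If you did have the ratings, the whole N-beauty curve can be computed in one pass: sort the ratings from highest to lowest and take running averages. The sketch below assumes the ratings fit in a plain Python list; `n_beauty_curve` is a name invented here, and the data is again hypothetical.

```python
from itertools import accumulate


def n_beauty_curve(ratings):
    """Return [B_1, B_2, ..., B_k], where B_n is the mean of the n highest ratings."""
    ordered = sorted(ratings, reverse=True)
    running_totals = accumulate(ordered)  # sum of the top 1, top 2, ... ratings
    return [total / n for n, total in enumerate(running_totals, start=1)]


ratings = [10, 9, 9, 3, 2, 2, 2, 1, 1, 1]  # hypothetical data
curve = n_beauty_curve(ratings)

# The curve starts at the maximum rating, ends at the plain average,
# and never goes up along the way.
for n, value in enumerate(curve, start=1):
    print(n, round(value, 2))
```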
At this point we can put our feet on the table, pack our pipe, and be happy. My main point has been made. However, I am not quite happy yet. I want a more beautiful measure of beauty. So with the insights we have now gained, and for those that are hungry and willing to engage with a bit of basic math, let’s return to the drawing board with a little bonus section: defining p-beauty.
Bonus section: p-beauty
Let X be an artwork of any kind, and assume that we can magically get the beauty rating of X as rated by every individual human currently alive (still restricting to a scale from 1 to 10). This would be a list of numbers
$$b_1, b_2, \ldots, b_k,$$
where k is the number of humans alive right now. Thus, the function B that measures beauty at this moment in time should be some gigantic function
$$B(b_1, b_2, \ldots, b_k)$$
of k arguments. What properties do we want B to have? First, we do not want to privilege any particular humans, so B should be unchanged under an arbitrary exchange of the order of the arguments. Second, let’s say we keep the beauty rating of all humans except yourself fixed. If you up your beauty-score while all others keep their beauty-score fixed, then the overall beauty B should not decrease; anything else would be absurd. So far, these properties were satisfied by what we called N-beauty earlier. However, N-beauty had a bit of an ugly property. Namely, it didn’t have a smooth dependence on the arguments. For example, let’s say I am number 1 in the list, so that b_1 is my beauty score. If I was not in the group of N people that liked the artwork the most, b_1 would have no effect on B. If I gradually started liking the artwork more and more, so that b_1 was increasing, then this would have no effect on B for a while. However, suddenly, as I entered the group of N people liking the piece the most, b_1 would start to influence the value of B. Technically, we would say that B is not a differentiable function of b_1. It would be nice if B wasn’t so jagged, so that if my evaluation of X went up a little bit, then so would B.
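The kink is easy to see numerically. In this small sketch everyone else's scores are held fixed while my own score is dialed from 1 to 10, and the 3-beauty is printed each time. The scores in `others` are invented for the example.

```python
def n_beauty(ratings, n):
    """Average of the n highest ratings."""
    top = sorted(ratings, reverse=True)[:n]
    return sum(top) / n


others = [9, 8, 7, 2, 1]  # hypothetical fixed scores of everyone else

# Dial my own score b_1 from 1 to 10 and watch the 3-beauty.
for my_score in range(1, 11):
    print(my_score, round(n_beauty([my_score] + others, 3), 3))
```

For scores up to 7 the output does not move at all, because I am not among the top three. Only once my score passes 7 does it start pulling the 3-beauty upward, and that corner at 7 is the non-differentiable point.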
So let’s try to find a function B that is smooth, but which nevertheless captures the essential behavior we found for N-beauty. We still want a one-parameter family labeled by a single number such that as we dial this number, we change from assigning most weight to the opinions of just a few people to assigning weight equally to all people. Consider picking some number p greater than or equal to one, and consider computing the following: raise all individual beauty scores to the p-th power and average the result. In equations, this would be the following:
$$\frac{1}{k}\sum_{i=1}^{k} b_i^{\,p}.$$
When we raise to the power of p, the numbers b_i that are large get magnified by a lot, while those that are small, relatively speaking, get increased just a little. This gets more and more true as p increases. For example, if p=3, then a beauty-score of 10 gets enhanced to 1000, while a beauty-score of 4 gets enhanced to 64. So we see that as p gets large, we start weighing the opinions of those that like the artwork much more heavily, almost neglecting the opinions of those that don’t like it. However, our beauty measure is a bit ugly now, since the maximal value of B now is 10^p, which is different for each p. We can get back to a beauty value between 1 and 10 for any p by simply taking the p-th root of the result. Thus, we arrive at p-beauty:
$$B_p(b_1, \ldots, b_k) = \left(\frac{1}{k}\sum_{i=1}^{k} b_i^{\,p}\right)^{1/p}.$$
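As a quick sketch of the recipe just described (raise each score to the p-th power, average, take the p-th root), here is p-beauty in Python. The function name `p_beauty` and the ratings are hypothetical, and this direct version is only meant for moderate values of p.

```python
def p_beauty(ratings, p):
    """p-th root of the average of the p-th powers of the ratings (p >= 1)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    k = len(ratings)
    return (sum(b ** p for b in ratings) / k) ** (1 / p)


# Hypothetical ratings: three devoted fans, seven unimpressed listeners.
ratings = [10, 9, 9, 3, 2, 2, 2, 1, 1, 1]

for p in (1, 2, 3, 10, 100):
    print(p, round(p_beauty(ratings, p), 3))
```

For p=1 this is just the plain average, and the printed values creep up toward the top rating as p grows, because the fans' scores dominate the sum more and more.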
This is a nice smooth upgrade of N-beauty. However, note that the interpretation of p is now opposite to N. For the lowest value of p, corresponding to p=1, we get exactly the same as we did earlier for N-beauty when N was maximal: the beauty is the average over everyone’s opinion. However, for p-beauty, there is no maximal value for p. We are even allowed to set p=infinity, and in this case, we get exactly the same as we did for N-beauty when N=1. I.e., for p=infinity, B_p just equals the maximal b-value in the list of b-values. This makes sense, since when p gets larger and larger, the largest b-value gets amplified more and more than everything else, to the point that when p gets sufficiently large, we can neglect everything else, and we get that (in bastardized notation that is nevertheless clear)
$$B_{\infty} = \left(\frac{1}{k}\sum_{i=1}^{k} b_i^{\,\infty}\right)^{1/\infty} = \max_i b_i.$$
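For readers who want the limit without the bastardized notation, here is a short sketch. Write b_max for the largest score in the list and m for the number of people who gave that score (both symbols are introduced here just for this argument). Pulling b_max out of the sum gives a squeeze:

```latex
b_{\max}\left(\frac{m}{k}\right)^{1/p}
\;\le\;
B_p = b_{\max}\left(\frac{1}{k}\sum_{i=1}^{k}\left(\frac{b_i}{b_{\max}}\right)^{p}\right)^{1/p}
\;\le\;
b_{\max}.
```

The lower bound keeps only the m terms equal to 1, and the upper bound uses that every ratio is at most 1. Since (m/k)^{1/p} tends to 1 as p grows, both sides converge to b_max, and so does B_p.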
In between p=1 and p=infinity is where the magic happens. As we dial p upwards from unity we decide to value the opinion of a smaller and smaller group of people (until we hit a single person at p=infinity). However, for finite p, there is never a discrete group of people singled out — everyone’s opinion is still weighted a little bit.
You might now ask whether we preserved one of the very nice properties of N-beauty. Namely, as we changed N we knew in advance in which direction the N-beauty would change: as N went up, N-beauty went down. Since p acts roughly like the inverse of N, we can here ask if B_p goes up whenever p goes up. Indeed it does! The proof can fit in a margin, so I put it in this footnote.¹ Side note for those of you that have studied linear algebra: what we have just rediscovered is a rescaled version of the p-norm.
Anyway, we can now represent the beauty of an object as a monotonically increasing graph with B_p on the y-axis and p on the x-axis. Again, because there exist plenty of weirdos in the world, we can more or less always assume that B_{\infty} = 10, so that the graph always asymptotes to 10. Now we have the following picture: a piece of art that climbs to 10 very quickly with p is liked by large groups of people, and a piece of art that climbs to 10 very slowly is liked by almost no one. Admittedly, we have sacrificed one nice thing. If I tell you that the p-beauty is almost 10 for p=3 but pretty low for p=2, it’s not so easy to estimate how many people really love your art. So, ehh, maybe p-beauty wasn’t that much more beautiful after all. I guess I will have to ask you all to rank the beauty of p-beauty vs N-beauty and then I will pick the best one. Although, I don’t know which beauty measure to pick to measure the beauty of those beauty measures. I guess I will just have to give you the Q-beauty graphs of p-beauty and N-beauty for Q equals N or p. Well shit.
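¹ The p-beauty is just a rescaled version of the p-norm that is so commonly used in linear algebra and functional analysis. Let b be the vector of b-values. Then we have that
$$B_p = k^{-1/p}\,\|b\|_p,$$
where ||b||_p is just the p-norm. See this wiki page for a definition. On this page we can find the inequality (probably proven using Hölder's inequality)
$$\|b\|_r \le k^{1/r - 1/p}\,\|b\|_p \quad \text{for } 0 < r < p,$$
where the wiki page writes n for what we call k. Dividing this by k^(1/r) we recognize that this is the monotonicity sought after:
$$B_r = k^{-1/r}\,\|b\|_r \;\le\; k^{-1/p}\,\|b\|_p = B_p \quad \text{for } r < p.$$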




I like it, quite a concise and clever way of quantifying beauty (especially p-beauty). Although I don't think it challenges the notion that beauty is in the eye of the beholder. Your model still takes individual ratings of beauty as its inputs, and so it is still very much contingent on the "eyes of the beholders". To be an objective measure of beauty, it would surely have to be independent of the opinions of individuals (or even groups), unless you have a very different understanding of what it means to be objective than I have.