
There’s a group of harmonica enthusiasts who have very graciously let me join them. We call ourselves the Mumbai Harmonics, and we meet regularly for “busking” sessions. We play in Mumbai’s Hanging Gardens on the first Sunday of every month, for example, and the morning walkers invariably stop and sing along and we all manage to have ourselves a rollicking time.
Not long ago, we found ourselves discussing and playing songs that are copied from other songs. For example, take the old American folk standard, Yellow Rose of Texas. Play it to your average Indian Bollywood fan and you’ll see her brow furrow—it rings a bell, but somehow she can’t quite put her finger on it. Then play Tera Mujhse Hai Pehle from the 1973 hit Aa Gale Lag Jaa and watch that brow un-furrow quicker than you can sing And if I ever find her/we never more will part!
Why is that? Because Tera Mujhse copies its tune from Yellow Rose. Changes it substantially, but the essence of the tune remains the same.
How do we know this, have you ever wondered? How do we recognize the similarities between two different songs, enough to then realize that one was copied from the other? It’s in such things as the way the notes follow each other, the flow of musical phrases, the rhythm and time signature of the songs: all of which come together to tell us that they are similar. And if you think of it like that, you can imagine doing the same with articles or books or even paintings. Reading a particular book, you might be reminded in some indefinable way of another. Perhaps it’s in how words follow each other, or in the use of certain words and phrases more than others. For example, someone once told me I use the phrase “you can imagine” pretty often in my writing. As you can imagine, that helps make my writing easy to identify.
For many years, people have used computers to look for such patterns, in an effort to study the “style” of the work in question. The aim is to correctly identify the authors of a text, or the writer of a song, in case there’s a dispute or a suspicion of copying. This pursuit is called “stylometry”, and it has had some intriguing successes. In 2015, for example, two researchers at the University of Texas showed that an early 18th-century play called Double Falsehood had actually been written over a century earlier, by an obscure playwright you’ve never heard of named William Shakespeare. This year, three men from Harvard and Dalhousie Universities (one, poetically and appropriately, is named “Ryan Song”) used stylometric techniques to conclude that the Beatles’ song In My Life was written by one of the Beatles’ two most prolific songwriters, not the other.
How does this work? The title of the paper about In My Life — Assessing Authorship of Beatles Songs from Musical Content: Bayesian Classification Modeling from Bags-Of-Words Representations — mentions one technique, “bags-of-words representations”.
What that means is that in analysing a text, computer scientists treat it as just a grab-bag of words — that is, they pay no attention to grammar or the way those words are put together. Then they count how often each word appears in that grab-bag. Interestingly, what speaks of a given author turns out to be less the presence of uncommon words he might choose to use than the “recurring patterns of common words, such as prepositions” (according to a report in ScienceDaily). So if two texts produce similar “recurring patterns” in this way, you might suspect they have the same author.
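To make the grab-bag idea concrete, here is a minimal sketch in Python. It is only an illustration, not the method of any study mentioned here: the short list in FUNCTION_WORDS and the sample sentence are my own assumptions, and a real analysis would use far longer texts and many more words.

```python
import re
from collections import Counter

# Hypothetical short list of common "function" words; a real study would use many more.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "that", "with"]

def bag_of_words(text):
    """Lowercase the text, pull out the words, and count them, ignoring their order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def function_word_profile(text):
    """Return the share of all words in the text taken up by each function word."""
    bag = bag_of_words(text)
    total = sum(bag.values()) or 1  # avoid dividing by zero on an empty text
    return {word: bag[word] / total for word in FUNCTION_WORDS}

# Made-up sample sentence, purely for illustration.
sample = "The cat sat in the hat, and the hat sat on the mat."
profile = function_word_profile(sample)
print(bag_of_words(sample).most_common(3))
print(profile)
```

Two texts whose profiles look alike would be candidates for having the same author, though a profile built from a single sentence like this one tells you very little.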
With music, the analysis is similar, except that there are no words as there are in books. So instead, researchers break songs down into small chunks, analogous to words, and make a grab-bag of those chunks. In the case of the Beatles, Song and his colleagues broke down a suite of about 70 songs into individual chunks. This is similar, said Mark Glickman, one of the other authors of the paper, to “decomposing a colour into its constituent components of red, green and blue with different weights attached.” Only, instead of just three, the researchers found 149 such components.
They then classified these 149 chunks into five different “representations”—among them, particular chord transitions, pairs of notes that are juxtaposed, and four-note “contours” that indicate whether the tune is rising in pitch, falling or staying the same. Finally, they counted how often each chunk appears in a given song. This produced frequency numbers for the five representations—and these were effectively signatures that marked each song.
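As a rough sketch of what two of these representations might look like in code, assume a melody is just a list of pitch numbers (higher number, higher note). The function names, the pitch encoding and the two melodies below are my own inventions for illustration. They are not the researchers’ definitions, and they are not transcriptions of any Beatles song.

```python
from collections import Counter

def note_pairs(pitches):
    """Count each pair of notes that sit next to each other in the melody."""
    return Counter(zip(pitches, pitches[1:]))

def step(a, b):
    """Say whether the tune rises, falls or stays the same from note a to note b."""
    if b > a:
        return "up"
    if b < a:
        return "down"
    return "same"

def contours(pitches, size=4):
    """Count the up/down/same shape of every run of `size` consecutive notes."""
    counts = Counter()
    for i in range(len(pitches) - size + 1):
        window = pitches[i:i + size]
        counts[tuple(step(a, b) for a, b in zip(window, window[1:]))] += 1
    return counts

# Two hypothetical melodies: one that barely moves, one that jumps around.
flat_melody = [64, 64, 64, 64, 64, 62, 64]
leaping_melody = [60, 67, 64, 72, 62, 69, 65]

print(contours(flat_melody))
print(contours(leaping_melody))
print(note_pairs(flat_melody))
```

The counts that come out are the kind of frequency numbers that could serve as a signature: the flat melody is dominated by “same” steps, while the leaping one has none at all.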
And it turns out that songs written by Paul McCartney have very different signatures from those written by John Lennon. (These are the two prolific songwriters I meant.) Many Beatles fans like me know there is such a difference, and Glickman confirms it: Lennon’s melodies tend to vary much less, in a sense, than McCartney’s. What do I mean by that? “Consider the Lennon song Help!” Glickman suggested to Phys.org. In the line When I was younger, so much younger than today, the notes are nearly all the same. Compare that to this line from McCartney’s Michelle: Michelle, ma belle. Sont les mots qui vont très bien ensemble.
“In terms of pitch,” Glickman went on, the “Michelle” line is “all over the place.”
And that’s essentially the difference between a Lennon song and a McCartney song: Lennon straight ahead with little deviation, McCartney all over the place. So if you are trying to identify which of these two gents was responsible for a particular Beatles song, this kind of analysis is one way to check. With In My Life, oddly enough, both men have claimed ownership. But the frequency analysis by Song et al of those 149 chunks tells an unmistakable tale about the song: Lennon composed it.
Song et al actually went a little further, applying Bayesian probability techniques in their analysis of the Beatles. These come into play when we have some relevant knowledge about a situation and want to know how that affects the chance of a given event. For example, if I toss two coins and tell you that at least one landed heads, what’s the chance that both are heads? You might answer “1/2”, but the correct answer is 1/3. Because if at least one is heads, the three equally likely possibilities for the two coins are heads-tails, tails-heads and heads-heads, and in only one of those do both coins show heads.
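If you don’t trust the arithmetic, the two-coin puzzle is small enough to check by listing every outcome. This short Python sketch does exactly that, assuming two fair coins; the variable names are mine.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))

# Keep only the outcomes consistent with what we were told: at least one head.
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, pick out the outcomes where both coins show heads.
both_heads = [o for o in at_least_one_head if o == ("H", "H")]

chance = Fraction(len(both_heads), len(at_least_one_head))
print(chance)  # 1/3
```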
In this case, we have a bank of knowledge from 70 Beatles songs—songs, that is, whose authors are known. We know their signatures in terms of the frequencies of those 149 musical chunks. That is, we have a relationship between these frequencies and the authorship of the songs. Given that we know this much, what can we say about the unknown provenance of In My Life; more to the point, what is the chance that McCartney composed the song?
Song et al came to this conclusion: the chance McCartney composed In My Life is less than 2%.
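To give a feel for how such a figure can be arrived at, here is a toy sketch in Python. I should be loud about its limits: it uses a simple “naive Bayes” calculation that I have swapped in for illustration, not the model in the paper, and every number in it is invented. The three contour types, the counts in known_counts, the equal prior and the disputed song are all my assumptions.

```python
import math

# Invented toy data: how often three contour types occur across songs whose
# authorship is known. The real study used 149 components, not three.
FEATURES = ["same-same-same", "up-down-up", "down-down-down"]
known_counts = {
    "Lennon": [40, 10, 10],
    "McCartney": [10, 30, 20],
}

# Assumed starting belief before looking at the disputed song: an even split.
prior = {"Lennon": 0.5, "McCartney": 0.5}

def posterior(song_counts):
    """Apply Bayes' rule: prior times likelihood, rescaled so the chances sum to 1."""
    log_scores = {}
    for author, counts in known_counts.items():
        total = sum(counts) + len(counts)  # add-one smoothing avoids zero chances
        log_likelihood = sum(
            n * math.log((c + 1) / total) for n, c in zip(song_counts, counts)
        )
        log_scores[author] = math.log(prior[author]) + log_likelihood
    top = max(log_scores.values())
    weights = {a: math.exp(s - top) for a, s in log_scores.items()}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}

# A hypothetical disputed song, described by its counts of the same three contours.
disputed = [6, 1, 1]
result = posterior(disputed)
print(result)
```

With these made-up numbers, a song heavy on “same” steps comes out overwhelmingly on the Lennon side. The direction matches the paper’s conclusion only because I built the toy data that way.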
Sadly, John Lennon’s not here to finally take credit for In My Life. But the evidence is pretty overwhelming: it’s a Lennon song. We can say as much, statistically.
Or stylometrically. “The way the notes follow each other, the flow of musical phrases, the rhythm and time signature of the songs”, I wrote above, and in effect Song et al captured just such characteristics of Beatles’ songs via their 149 chunks. Are they enough to persuade you of who’s responsible for In My Life, even without the frequency analysis? Well, to a Beatles fan like me, the tune does sound a little more strait-jacketed, if you will, than Michelle. If that is a Lennon signature, I’m willing to believe it was his song.
And in the same way, the stylometric techniques I’ve outlined here should also be able to persuade you that one song is copied from another. If listening is not enough, that is. I mean, I’d love for Song et al to apply their minds to Tera Mujhse and Yellow Rose of Texas.
Though what happens if they find Paul McCartney composed Tera Mujhse?
Once a computer scientist, Dilip D’Souza now lives in Mumbai and writes for his dinners. His Twitter handle is @DeathEndsFun