Is There Such a Thing as a Sonic QR Code?

There are at least two things that Sony Pictures marketing executives did not consider when preparing a cross-promotion between its new Spider-Man film and the song-identification app Shazam. I first read about this promotion this morning on io9.com, because pretty much the first thing I read every morning is Morning Spoilers on io9.com. The film in question, The Amazing Spider-Man 2, opens this Friday, May 2, in the United States. I was expecting extended discussion about Peter Parker’s doomed romance with Gwen Stacy or the rise of his frenemy Harry Osborn to lead the high-tech firm founded by his father. Instead, there was news of an intriguing little digital-audio phenomenon.

The Sony-Shazam promotion involves viewers of the Spider-Man movie waiting until the end credits, during which the Alicia Keys song “It’s On Again” is heard. Viewers can then use the Shazam app to identify the song. Doing so brings up a special opportunity to add, for free, photos that hint at members of the Sinister Six (villain characters from Sony’s rapidly expanding Spider-Man franchise) to their personal photo galleries. (It should be noted that the Keys song is itself a sort of cross-promotion. Its full credit is: Alicia Keys feat. Kendrick Lamar – “It’s On Again.”)

The first of these things that Sony Pictures may not have considered is that Shazam shares a name with a superhero from a rival comics publisher, DC. Would it have been too difficult to sign up, instead, with SoundHound, or MusixMatch, or the elegantly named Sound Search for Google Play, among other song-identification services? Perhaps none of this matters. Sony is already engaged in a cold war with other studios among whom the Marvel universe of characters is subdivided. A second-tier, if beloved, character from another universe entirely means nothing when there are already two Quicksilvers running around in your own. For reference, below is an uncharacteristically stern Shazam, drawn by Jeff Smith (best known for his work on Bone):


In any case, the second and more pressing matter is that one needn’t stay until the end credits of the new Spider-Man film to activate the Shazam code with the Alicia Keys song. One needn’t even see the Spider-Man film, let alone wait for it to open in a theater near you. Right now, two full days before the film’s release in the United States, you can pull up the Alicia Keys video on YouTube, and the Shazam app on your phone will recognize that as the correct song, and your phone will, indeed, then provide you with the prized photos. In fact, at this point you don’t even need to do that, since the photos have already proliferated around the Internet. (See them at comingsoon.net and at the above io9.com link.)

But an interesting question arises, which is: How different would the Alicia Keys song played during the end credits have to be from the original version of the song for only the credits rendition to be recognized by Shazam as the correct one to cough up the Sinister Six photos? More to the point, can a specific version of a song function as the sonic equivalent of a QR code? QR codes are those square descendants of zebra codes, such as the one shown below. The “QR” stands for “quick response.” They can contain information such as a URL, which when activated by a phone’s camera can direct the phone’s browser to a particular web page. This QR code links, only semi-helpfully, to the web page on which this article originally appeared:


Of course, from a procedural standpoint, Sony could have gotten around this alternate-version approach by having the song only be available in the credits, but that would have cut into sales of the soundtrack album — which would either have to lack the song entirely, or have its release delayed until several weeks after the film’s debut.

The recipes of these different song-identification apps, such as Shazam and its archenemy SoundHound, are closely guarded secrets. Enough information is provided to allow for developer-level discussion, but ultimately the apps’ success (both in terms of successful-identification statistics and user adoption) depends on the how-to being at least semi-obscured. But there is quite a bit of information out there, including a 2003 academic paper by Shazam co-founder Avery Li-Chun Wang outlining the company’s approach at the time (PDF), which I found thanks to an October 2009 article by Farhad Manjoo on Slate.com. The summary at the opening of the paper reads as follows:

We have developed and commercially deployed a flexible audio search engine. The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of over a million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.

The gist of it, as summarized in handy charts like the one up top, appears to be that an entire song is not necessary for identification purposes, that only key segments (“higher energy content,” he calls it) are required. At least in part, this allows for songs to be recognizable above the din of everyday life: “The peaks in each time-frequency locality are also chosen according amplitude, with the justification that the highest amplitude peaks are most likely to survive the distortions listed above.” It may also explain why much of my listening, which, being ambient in nature, can easily be described as “low energy content,” is often not recognized by Shazam or any other such software. As a side note, this gets at how the human ear listens differently than a microphone. The human ear can listen through a complex noise and locate a particular subset, such as a conversation, or a phone ringing, or a song for that matter.
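For the curious, here is a toy sketch in Python of the general idea the paper describes: keep only the loudest frequency peak in each slice of time, then hash pairs of nearby peaks. This is emphatically not Shazam's code. The frame size, the fan-out, the one-peak-per-frame shortcut, and every function name below are my own assumptions, chosen to keep the example small and dependent only on the standard library.

```python
import cmath
import math
import random
from collections import Counter

FRAME = 64    # samples per analysis frame (assumed value, for illustration)
FAN_OUT = 3   # how many later peaks each peak is paired with (assumed value)


def strongest_bin(frame):
    """Return the index of the highest-magnitude frequency bin in one frame."""
    n = len(frame)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):  # skip DC, keep positive frequencies only
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin


def constellation(samples):
    """One (frame_index, peak_bin) point per frame: a crude constellation map."""
    starts = range(0, len(samples) - FRAME + 1, FRAME)
    return [(t, strongest_bin(samples[s:s + FRAME]))
            for t, s in enumerate(starts)]


def fingerprint(samples):
    """Hash pairs of nearby peaks as (f1, f2, dt), tagged with the anchor time."""
    points = constellation(samples)
    hashes = []
    for i, (t1, f1) in enumerate(points):
        for t2, f2 in points[i + 1:i + 1 + FAN_OUT]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes


def match_score(reference, query):
    """Largest number of shared hashes that agree on a single time offset."""
    index = {}
    for h, t in reference:
        index.setdefault(h, []).append(t)
    offsets = Counter(t_ref - t_q
                      for h, t_q in query
                      for t_ref in index.get(h, []))
    return max(offsets.values(), default=0)


def melody(bins, seed=None, noise=0.0):
    """Synthesize one pure tone per frame, optionally with uniform noise added."""
    rng = random.Random(seed)
    out = []
    for b in bins:
        for i in range(FRAME):
            tone = math.sin(2 * math.pi * b * i / FRAME)
            out.append(tone + noise * rng.uniform(-1, 1))
    return out


# A made-up "album version", a noisy excerpt of it, and a transposed rendition.
album_bins = [5, 9, 12, 7, 20, 15, 9, 5, 11, 18, 6, 14]
album = fingerprint(melody(album_bins))
excerpt = fingerprint(melody(album_bins[3:10], seed=1, noise=0.2))
alternate = fingerprint(melody([b + 2 for b in album_bins]))

print("noisy excerpt vs. album:", match_score(album, excerpt))
print("transposed version vs. album:", match_score(album, alternate))
```

In this toy setup the noisy excerpt should score far higher against the album version than the transposed rendition does, because noise leaves the loudest peaks in place while transposition moves them. That at least suggests what a credits-only version would need in order to act like a sonic QR code: peaks shifted enough that its hashes no longer line up with the album cut. How much of a shift the real service would require is something this sketch cannot answer.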

Now, of course, there’s a difference between the unique attributes of emerging technologies and the desired results of marketing initiatives. Arguably all that Sony wanted to come out of its Shazam cross-promotion was to get word out about Spider-Man, and to buy some affinity for the Sinister Six with a particular breed of fan, and to that end it has certainly succeeded. Perhaps it also hoped to gain a little tech cred in the process, even if that cred is more window dressing than truly innovative at a technological level.

Still, the idea of a song as a true QR code lingers. Perhaps Harry Osborn and Peter Parker could team up and develop a functional spec.