I stumbled across this over the holidays, linked from some article that some friend posted on Facebook. I don’t recall what I saw on Facebook, which means this is almost certainly more interesting, if for no other reason than the amusing generator at the top.
And, of course, there's the image posted here...
Anyway, the methodology that the author used to piece together the information is intensive but also pretty clever. That’s pretty evident, though, given that Netflix’s own people essentially say that very thing in response to seeing the data. What I thought was extra-cool, though, were the implications of how a similar thought process could be used to categorize a lot of things, though maybe not to the extent that Netflix can, given their massive, massive library. Sure, you might ask, “What's the point of doing this for a smaller set of items?” That wouldn’t be an indefensible argument. However, if you plan for it early in the life of a data set, you could pretty easily implement such a solution with minimal sustained work over time.
Given my day job, I fall back on slot machine games as an example. Slot machines are all pretty similar, right? There are handles (well, okay, nobody notices handles, but there are buttons on both physical and virtual machines that do the same thing), there are tumblers, there are paylines, there are game graphics. But the number of potential differences is vast even under that umbrella. Using this example, if you created forty different slot machines, you could find yourself able to create subcategories in a Netflix style, such as “Slots Games that Feature Licensed Characters” or “Slots Games with Included Minigames” or “Slots Games with More than 6 Paylines.” Obviously I’m terrible at writing these, but you get the idea: offering up granular subgenres could direct players better than a more generic setup where all slots games, no matter how different, sit in one group, and on the web it could allow for very crawler-friendly, content-rich pages for indexing.
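To make the slots example a little more concrete, here is a minimal Python sketch of how attribute-based subgenres could be generated. Everything in it is an assumption for illustration: the game names, the attribute fields (licensed, minigame, paylines), and the group_by_subgenre helper are all made up, and this is not how Netflix or any real casino platform does it.

```python
from collections import defaultdict

# Hypothetical catalog: the names and attributes are invented for illustration.
GAMES = [
    {"name": "Pirate Gold", "licensed": False, "minigame": True, "paylines": 9},
    {"name": "Movie Star Spins", "licensed": True, "minigame": False, "paylines": 5},
    {"name": "Robot Riches", "licensed": True, "minigame": True, "paylines": 25},
]

# Each subgenre is a label plus a rule over a game's attributes.
SUBGENRES = {
    "Slots Games that Feature Licensed Characters": lambda g: g["licensed"],
    "Slots Games with Included Minigames": lambda g: g["minigame"],
    "Slots Games with More than 6 Paylines": lambda g: g["paylines"] > 6,
}


def group_by_subgenre(games, subgenres):
    """Return {subgenre label: [game names]} for every rule a game satisfies."""
    groups = defaultdict(list)
    for game in games:
        for label, rule in subgenres.items():
            if rule(game):
                groups[label].append(game["name"])
    return dict(groups)


if __name__ == "__main__":
    for label, names in group_by_subgenre(GAMES, SUBGENRES).items():
        print(f"{label}: {', '.join(names)}")
```

The point of keeping the rules in one table is the "minimal sustained work" part: once the attributes are recorded for each game, adding a new subgenre is one more line, and a game can land in several subgenres at once.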
Netflix uses this as a recommendation engine, comparing the unseen metrics for each movie a user watches against the same metrics in their database for other movies to come up with other movies the user might enjoy. Done well, this is a force for user retention: a user is almost surely going to be happier with the service if they are constantly being prompted to watch other movies that the numbers imply they’ll enjoy (and, of course, if the numbers are right). There's no reason to think that the same principle wouldn't work for other sites and other data as well, particularly if the metrics were crowdsourced in some way, such as via a ratings system that goes beyond a simple one-to-five range to really dig into the facets of the item itself. The key is to be descriptive, and if the things you want to categorize can't be described, well, you're going to be out of luck anyway, aren't you?
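For the curious, the "compare the metrics" idea can be sketched in a few lines of Python. To be clear, this is not Netflix's actual method; I'm assuming cosine similarity as a stand-in, and the facet names, scores, and the recommend helper are all hypothetical. The scores stand in for the kind of crowdsourced, per-facet ratings described above.

```python
import math

# Hypothetical facet scores (0-5), e.g. averaged from crowdsourced ratings.
FACETS = {
    "Pirate Gold": {"humor": 4, "complexity": 2, "bonus_rounds": 5},
    "Movie Star Spins": {"humor": 1, "complexity": 4, "bonus_rounds": 1},
    "Robot Riches": {"humor": 4, "complexity": 3, "bonus_rounds": 4},
    "Classic Sevens": {"humor": 0, "complexity": 1, "bonus_rounds": 0},
}


def cosine(a, b):
    """Cosine similarity between two facet dicts; missing facets count as 0."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(liked, facets, top_n=2):
    """Rank items the user has not tried by similarity to what they liked."""
    # Build an average facet profile from the items the user liked.
    profile = {}
    for name in liked:
        for facet, score in facets[name].items():
            profile[facet] = profile.get(facet, 0) + score / len(liked)
    candidates = [name for name in facets if name not in liked]
    ranked = sorted(candidates, key=lambda n: cosine(profile, facets[n]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend(["Pirate Gold"], FACETS))
```

With these made-up numbers, someone who liked "Pirate Gold" gets "Robot Riches" ranked first, because the two score similarly on every facet. The quality of the output depends entirely on how descriptive the facets are, which is the whole argument here.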