Demand Media: Cashing in on Crapification, Part 1

by Kris_Tuttle on February 14, 2011

Although I started out to write a report on Demand Media, it ended up being more about the online content problem, Google, Quora and the prospects for new search engines. The full report (PDF) is available here: Demand Media Crapification.

The report is a bit long for a sin­gle email or blog post so over the next few days I’ll post the three sec­tions of con­tent in order.  The first sec­tion deals with what’s behind an alarm­ing decline in the gen­eral con­tent returned from the Internet.

After 15,000 hours of hands-on research it’s clear that most of the easily accessed Internet content, particularly what is returned from Google search results, is crap. Although generally acknowledged by people in the hard-core knowledge worker segment, this viewpoint is now becoming mainstream.

So how did things get this bad? While adver­tis­ing may be the “root of all evil,” how deep it goes depends on the con­di­tions. The num­ber and size of the forces at work here are prodi­gious. Here are some of the more tech­ni­cal and content-related forces:

“Search Engine Optimization,” or SEO, exploits the algorithms Google uses in order to game the system. By relying on computers to rank search results, we invite an endless contest between those who find ways to outsmart the algorithms and the search engines trying to return valuable information. Google sometimes changes its algorithms in an attempt to foil current strategies, but in doing so it opens up other loopholes for exploiters to use. Another negative side effect is that these changes sometimes upset search rankings that were previously good, in what amounts to a big reshuffling. Google has much better algorithms for returning search results, but we are unlikely to see them implemented; more on that later.

Dumb Clicks: Generally, the more clicks something gets, the higher it ranks. It doesn’t matter if the clicks were of the “fooled you, there’s no real content here” variety. (See the attached example of a common top-ranked page.) Because people tend to click on something near the top of the results, this crap keeps getting clicks. Even a senior search engineer at Google observing user behavior said: “How can you not see that this is a spam page and click on it?!” Time only makes this pattern worse as this type of content “crowds out” everything else.
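The feedback loop described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not Google's actual ranking: the function name `simulate_click_feedback`, the page names, and the click counts per position are all made up for the example. The one thing the model deliberately leaves out is content quality.

```python
# Toy model only: this is NOT Google's ranking algorithm.
# Assumptions (hypothetical): pages are ranked purely by cumulative clicks,
# and the clicks a page earns depend purely on its position in the results.

def simulate_click_feedback(initial_clicks, rounds=50, clicks_by_position=(60, 30, 10)):
    """Return (final ranking, click totals) after repeated rounds of searching."""
    clicks = dict(initial_clicks)
    for _ in range(rounds):
        # Rank by clicks so far; ties keep their original order.
        ranking = sorted(clicks, key=clicks.get, reverse=True)
        # Each round, higher positions collect more clicks regardless of quality.
        for page, new_clicks in zip(ranking, clicks_by_position):
            clicks[page] += new_clicks
    ranking = sorted(clicks, key=clicks.get, reverse=True)
    return ranking, clicks


# The spam page starts with a single click of head start.
ranking, clicks = simulate_click_feedback(
    {"spam_page": 1, "good_article": 0, "ok_article": 0}
)
print(ranking)
print(clicks)
```

In this sketch the spam page's one-click head start is never challenged: after 50 rounds it leads the good article by 1,501 clicks, simply because it was on top when the counting started.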

Tricksters & Cheap Shots: One example of this category is alternativeto.net. In this case, someone must have noticed a popular search technique of using “alternative to” as in “alternative to Photoshop.” Suddenly most queries phrased this way returned crappy results full of ads rather than useful information. There is a legion of mostly small-time operators that use domain misspellings and other simple ploys to capture traffic and a few clicks on “parked” domains. This class of problem is fairly easy to solve but still pops up again from time to time, like a disease you can’t quite eradicate.

Con­tent Farms: This is where play­ers like Asso­ci­ated Con­tent (acquired by Yahoo), About.com and Demand Media take the game to a new level. Con­tent farms are harder to counter because they invest some money in “real” con­tent to insert them­selves into search results. We’ll save the analy­sis for the fol­low­ing sec­tion on Demand Media. Con­tent farms may seem ben­e­fi­cial com­pared to the SEO vil­lains and trick­sters but that’s what makes them insidious.

Syndication: Many sites are so desperate for content they are willing to syndicate just about anything. Because they are often following some of the same SEO strategies and now provide even more links to the original content source, the ranking and ubiquity of a lousy piece of content solidify like a plaque to block normal information flow.

Fil­ter Fail­ure: “Leak­age” is when some­thing that is sup­posed to pro­tect us from garbage con­tent begins to break down. Even paid-for ser­vices like Cap­i­talIQ rely on auto­mated fil­ter­ing and even­tu­ally get pen­e­trated by spam con­tent that cor­rupts the feed. This is just another fac­tor sug­gest­ing that effec­tive fil­ters in the future will require some level of human val­i­da­tion even if it also is auto­mated. Some of those meth­ods are described in the final sec­tion. (See this related video from Clay Shirky.)

Two other more anthro­po­log­i­cal fac­tors play a major role in the drive to lower qual­ity content:

Recency bias: Good content has long been pushed out of focus by inferior versions because they are newer. This is true in long-standing categories like movies and books. While freshness has value, it ends up being counterproductive in many cases and results in a “reinventing of the wheel” in the case of factual, well-reasoned and documented content. Good content is often not “sticky,” fades away quickly, and becomes hard if not impossible to find. Wikipedia is one example where this is not the case, which is why so many people search it explicitly.

Not all clicks are created equal: There’s a strong inverse relationship between intelligence / expertise / judgment / insight and the tendency to click. On Google “a click is a click,” so content gets driven to the lowest common denominator. Steve Jobs probably makes fewer, more informed clicks and online decisions than Paris Hilton, but Paris and her ilk are what drive content rankings.
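To make the “a click is a click” point concrete, here is a small sketch comparing a plain click count with one where some users' clicks count for more. Everything in it is hypothetical: the function `rank_by_clicks`, the user and page names, and the weights are invented for illustration, and nothing here claims to describe how any real search engine scores clicks.

```python
from collections import Counter

# Hypothetical illustration: the weights and names below are assumptions.

def rank_by_clicks(click_log, weights=None):
    """Rank pages by total click weight; with no weights, every click counts as 1."""
    scores = Counter()
    for user, page in click_log:
        scores[page] += 1 if weights is None else weights.get(user, 1)
    return [page for page, _ in scores.most_common()]


click_log = [
    ("casual_1", "listicle"),
    ("casual_2", "listicle"),
    ("casual_3", "listicle"),
    ("expert_1", "reference_guide"),
    ("expert_2", "reference_guide"),
]

# "A click is a click": the listicle wins 3 to 2.
print(rank_by_clicks(click_log))

# Counting each (assumed) expert click double flips the order, 4 to 3.
print(rank_by_clicks(click_log, weights={"expert_1": 2, "expert_2": 2}))
```

The hard part in practice is deciding whose clicks deserve more weight; the sketch simply assumes that answer is already known.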

Solv­ing some of these prob­lems is actu­ally eas­ier than it would first appear, because har­ness­ing the power of human behav­ior and adding more avail­able infor­ma­tion to the analy­sis can lead to excel­lent results. Next we’ll look specif­i­cally at Demand Media, the gorilla of the con­tent farms.

[Dis­clo­sure: none]
