Category Archives: data mining

Who gets the credit? Attribution Analysis…

A while ago, I wrote about the long list of reasons we click on something.  This is just the background to a much bigger debate – of why we eventually buy something.  That final conversion is fought over every day by a hundred different parties.  And yet the state of the market is still generally to pay for the last click…

… Bing users want the 7 wonders, Google’s the 7 deadly sins

Thought I’d check out Bing’s view of the world, since it recently became 11% of search by some accounts.  And, I have to say – bing [sic] it on, as more search provider options can only be a good thing.

So, Bing’s alphabet is unsurprisingly mostly the same as Google’s, apart from the following notable exceptions (I’ve excluded most of the entries that are either the same or very similar).  Each line gives Bing’s first suggestion with its second in brackets, and then Google’s for comparison.

  • bing (bank of america) – best buy (bank of america)
  • google (gmail) – gmail (google maps)
  • irs.gov (itunes) – imdb (itunes)
  • netflix (nascar) – netflix (nfl.com)
  • orbitz (office depot) – office depot (opm)
  • pogo (pandora) – pandora (photobucket)
  • utube (usps) – usps (ups)
  • www.google.com (walmart) – walmart (weather) 
  • zipcodes (zillow) – zillow (zappos)

And for numbers:

  • 2009 calendar (2012) – 2010 calendar (2012)
  • 53.com (50 cent) – 500 days of summer (50 cent)
  • 7 wonders world (7 zip) – 7zip (7deadly sins)
  • 89 (84 lumber) – 80’s music (80’s fashion)
  • 9 news (93x) – 90210 (92.3)

In summary, people who use Bing seem to be a year behind (2009 calendar instead of 2010), and more concerned with taxes (irs.gov), travel (orbitz), NASCAR rather than the NFL, and finding their way back to www.google.com…

Also, bingers want to know about the 7 wonders of the world, googlers about the 7 deadly sins.  Huh.
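If you fancy repeating the comparison yourself, here is a minimal sketch of how the two lists could be diffed.  It only uses a handful of the letters from the lists in this post, and the names (`BING`, `GOOGLE`, `top_suggestion_differs`) are made up for illustration.

```python
# (first suggestion, second suggestion) per starting letter, copied from the
# lists in this post; only a handful of letters are included here.
BING = {
    "b": ("bing", "bank of america"),
    "g": ("google", "gmail"),
    "i": ("irs.gov", "itunes"),
    "n": ("netflix", "nascar"),
    "u": ("utube", "usps"),
}
GOOGLE = {
    "b": ("best buy", "bank of america"),
    "g": ("gmail", "google maps"),
    "i": ("imdb", "itunes"),
    "n": ("netflix", "nfl.com"),
    "u": ("usps", "ups"),
}


def top_suggestion_differs(a, b):
    """Return the letters whose first suggestion differs between a and b."""
    return [k for k in sorted(a.keys() & b.keys()) if a[k][0] != b[k][0]]


changed = top_suggestion_differs(BING, GOOGLE)
for letter in changed:
    print(f"{letter}: {BING[letter][0]} vs {GOOGLE[letter][0]}")
```

Comparing only the first suggestion is a deliberate simplification: "n" drops out because both engines lead with netflix, even though the runners-up differ.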

The Google Alphabet

Start typing something into google, and the now established “autocomplete” or live suggestions or whatever it’s called today pops up.  I thought I’d take a look at the Zeitgeist, and see what each letter brings (and the #2 result in brackets).  All pretty big-brand, although the second most common search beginning with an “r” is “reverse phone lookup”?  Seems like an unanswered mega-site right there. 

UPDATE: Since posting this, it’s been pointed out to me that there are only 2 humans in the list (Tiger Woods and Lady Gaga, good going!), although they are both second placers.   Have a look at Mark and Darren’s excellent BLN blog to see versions from other countries.

 

UPDATE2:  Added some numbers and characters – and notice that Google Ads now appear in the autocomplete (try typing 000, i.e. three zeros, into the google.com homepage)…

  • amazon (aol)
  • best buy (bank of america)
  • craigslist (cnn)
  • dictionary (disney channel)
  • ebay (espn)
  • facebook (facebook login)
  • gmail (google maps)
  • hotmail (hulu)
  • imdb (itunes)
  • jcpenney (jet blue)
  • kohls (kmart)
  • lowes (lady gaga)
  • myspace (mapquest)
  • netflix (nfl.com)
  • office depot (opm)
  • pandora (photobucket)
  • qvc (quotes)
  • realtor.com (reverse phone lookup)
  • southwest airlines (sears)
  • target (tiger woods)
  • usps (ups)
  • verizon wireless (victoria secret)
  • walmart (weather)
  • xbox 360 (xm radio)
  • youtube (yahoo)
  • zillow (zappos)

 

So what about numbers?

  • 12 days of christmas (123 greetings)
  • 2010 calendar (2012)
  • 30 rock (3 lyrics)
  • 4chan (411)
  • 500 days of summer (50 cent)
  • 60 minutes (6abc)
  • 7zip (7deadly sins)
  • 80’s music (80’s fashion)
  • 90210 (92.3)
  • 0 balance transfer (007)

And I guess I should do the other common characters too…

  • .net framework (.net framework 3.5)
  • @properties (@live.com)
  • &nbsp (&hearts)
  • ¬_¬ (¬ alt code)
  • ?, !, “, $, %, *, ), (, ~, # etc – nothing…

Twitter Mining

Twitter, for all its fans and detractors, generates a LOT of data.  It may only be a very small percentage of the world’s population who tweet a lot, but even that can be representative of interesting/important trends and changes.

Oh, and by the way, is Twitter worried about this twitter.com traffic trend (courtesy of compete.com)?

But in terms of mining the human chatter that happens through Twitter, who is doing anything interesting?  I didn’t find much:

1. TweetDeck’s TwitScoop

Dodgy naming aside, this column in TweetDeck is one I keep switched on.  But a keyword tag cloud isn’t exactly world-shattering in 2010.  Still, it’s there, and while the screengrab above doesn’t tell me much (and I’m still not sure what/who “snead” is, even after a search), it’s a good finger-on-the-pulse.

2. TrendsMap

Now this is more like it (and a better name).  A Google mashup with (another) tag/word cloud floated on top, it gives an overview of trending topics, along with a real-time snapshot of individual tweets.   And no, I’ve no idea why blasphemy is a hot topic in Ireland right now…

3. Neoformix

Now this guy I have a lot of time for.  He’s particularly well known for his “Twitter Stream” graphs, which show word usage trends over time, as below.

However, head on over to his projects page, and you’ll find charts that include “time of day word correlations” (as below), “Twitter Venn” (twitter Venn diagrams) and a host of other tools. 

So, I know Jeff at Neoformix isn’t the only guy doing interesting analysis of Twitter data, but what surprises me is how few people seem to be working on it – at least that I’m aware of.  Sure, the fact that the word “drunk” is tweeted most between midnight and 5am isn’t going to change the way we see the world – but what about if a brand name suddenly takes off?  Or the word “recession” is on a downwards trend, or “flu” on an upwards trend? 
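To make the “time of day” idea concrete, here is a rough sketch of how you might count mentions of a word by hour.  It is not how Neoformix does it; the function name and the sample tweets are invented for illustration, and it assumes you already have tweets as (timestamp, text) pairs.

```python
from collections import Counter
from datetime import datetime


def mentions_by_hour(tweets, keyword):
    """Count tweets containing `keyword`, bucketed by hour of day (0-23)."""
    counts = Counter()
    needle = keyword.lower()
    for timestamp, text in tweets:
        # Whole-word match on lowercased text, with basic punctuation stripped.
        words = {w.strip(".,!?#\"'") for w in text.lower().split()}
        if needle in words:
            counts[timestamp.hour] += 1
    return counts


# Hypothetical sample data, invented purely for illustration.
sample_tweets = [
    (datetime(2010, 1, 15, 1, 12), "so drunk right now"),
    (datetime(2010, 1, 15, 2, 40), "Drunk again!"),
    (datetime(2010, 1, 15, 9, 5), "coffee, then meetings"),
    (datetime(2010, 1, 16, 1, 55), "never getting drunk again"),
]

hourly = mentions_by_hour(sample_tweets, "drunk")
for hour in sorted(hourly):
    print(f"{hour:02d}:00  {hourly[hour]}")
```

Swap the hour for the date and the same counting gives you a crude day-by-day trend line for a brand name, “recession” or “flu”.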

More Twitter Mining, please…

An Average Day

The New York Times chose to lead this story by pointing out that “the unemployed have more time for leisure and socializing”.   Yes, that seems pretty likely.

But in any case the interactive graph they created from their data is a pretty interesting breakdown of how different groups spend their days, by time of day. 

It’s full of “obvious” information, like men seem to watch more TV and do less housework, but it’s intriguing to play around with the different groups, and they’ve certainly tried to analyze the data.  Here are some facts that I’m sure you wanted to know:

  • Women shop for an average of half an hour a day
  • At midnight, 2% of all Americans (which equates to 6m people) are working
  • At noon, 4% of all Americans (12m) are asleep…
  • 25% of people spend more than an hour travelling to work
  • At 8:50pm, two fifths of people are in front of the television
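The headcounts in that list are just back-of-envelope sums.  As a quick sketch, assuming a round population of 300 million (the figure implied by “2% equates to 6m”, not an official count), the conversion looks like this; `share_to_people` is a made-up helper name.

```python
# Population implied by the post's own figures (2% is roughly 6m people).
# This is an assumption for illustration, not an official count.
IMPLIED_POPULATION = 300_000_000


def share_to_people(percent, population=IMPLIED_POPULATION):
    """Convert a percentage of the population into a rough headcount."""
    return round(population * percent / 100)


working_at_midnight = share_to_people(2)
asleep_at_noon = share_to_people(4)
print(f"Working at midnight: {working_at_midnight:,}")
print(f"Asleep at noon: {asleep_at_noon:,}")
```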

More evidence that we’re a pretty predictable bunch of beings…