Archive for the ‘data mining’ Category

… Bing users want the 7 wonders, Google’s the 7 deadly sins

February 14th, 2010

Thought I’d check out Bing’s view of the world, since it recently became 11% of search by some accounts.  And, I have to say – bing [sic] it on, as more search provider options can only be a good thing.

So, Bing’s alphabet is, unsurprisingly, mainly the same as Google’s, apart from the following notable exceptions (I’ve excluded most of the entries that are either the same or very similar).  Each line gives Bing’s first (second) suggestion, then Google’s first (second) for comparison.

  • bing (bank of america) – best buy (bank of america)
  • google (gmail) – gmail (google maps)
  • irs.gov (itunes) – imdb (itunes)
  • netflix (nascar) – netflix (nfl.com)
  • orbitz (office depot) – office depot (opm)
  • pogo (pandora) – pandora (photobucket)
  • utube (usps) – usps (ups)
  • www.google.com (walmart) – walmart (weather) 
  • zipcodes (zillow) – zillow (zappos)

And for numbers:

  • 2009 calendar (2012) – 2010 calendar (2012)
  • 53.com (50 cent) – 500 days of summer (50 cent)
  • 7 wonders world (7 zip) – 7zip (7deadly sins)
  • 89 (84 lumber) – 80’s music (80’s fashion)
  • 9 news (93x) – 90210 (92.3)

In summary, people who use Bing are more likely to be a year behind (2009 calendar instead of 2010), and more worried about taxes (irs.gov), travel (orbitz), NASCAR instead of the NFL, and finding their way back to www.google.com…

Also, bingers want to know about the 7 wonders of the world, googlers about the 7 deadly sins.  Huh.
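The comparison above can be sketched in a few lines of Python. The two dictionaries hold only the first-place suggestions quoted in the lists in this post; the names `bing_top`, `google_top` and `differing_letters` are made up for illustration, and nothing here queries either search engine.

```python
# First-place suggestions per starting letter, copied from the lists in this post.
bing_top = {
    "b": "bing", "g": "google", "i": "irs.gov", "n": "netflix",
    "o": "orbitz", "p": "pogo", "u": "utube",
    "w": "www.google.com", "z": "zipcodes",
}
google_top = {
    "b": "best buy", "g": "gmail", "i": "imdb", "n": "netflix",
    "o": "office depot", "p": "pandora", "u": "usps",
    "w": "walmart", "z": "zillow",
}


def differing_letters(first, second):
    """Return the sorted letters whose top suggestion differs between the two maps."""
    shared = first.keys() & second.keys()
    return sorted(letter for letter in shared if first[letter] != second[letter])


for letter in differing_letters(bing_top, google_top):
    print(f"{letter}: {bing_top[letter]} vs {google_top[letter]}")
```

Only "n" (netflix) drops out here, because it is the one letter in this subset where both engines agree on first place.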

admin data mining

The Google Alphabet

February 9th, 2010

Start typing something into google, and the now established “autocomplete” or live suggestions or whatever it’s called today pops up.  I thought I’d take a look at the Zeitgeist, and see what each letter brings (and the #2 result in brackets).  All pretty big-brand, although the second most common search beginning with an “r” is “reverse phone lookup”?  Seems like an unanswered mega-site right there. 

UPDATE: Since posting this, it’s been pointed out to me that there are only 2 humans in the list (Tiger Woods and Lady Gaga, good going!), although they are both second placers.   Have a look at Mark and Darren’s excellent BLN blog to see versions from other countries.

 

UPDATE2:  Added some numbers and characters – and notice that Google Ads now appear in the autocomplete (try typing 000, i.e. three zeros, into the google.com homepage)…

  • amazon (aol)
  • best buy (bank of america)
  • craigslist (cnn)
  • dictionary (disney channel)
  • ebay (espn)
  • facebook (facebook login)
  • gmail (google maps)
  • hotmail (hulu)
  • imdb (itunes)
  • jcpenney (jet blue)
  • kohls (kmart)
  • lowes (lady gaga)
  • myspace (mapquest)
  • netflix (nfl.com)
  • office depot (opm)
  • pandora (photobucket)
  • qvc (quotes)
  • realtor.com (reverse phone lookup)
  • southwest airlines (sears)
  • target (tiger woods)
  • usps (ups)
  • verizon wireless (victoria secret)
  • walmart (weather)
  • xbox 360 (xm radio)
  • youtube (yahoo)
  • zillow (zappos)

 

So what about numbers?

  • 12 days of christmas (123 greetings)
  • 2010 calendar (2012)
  • 30 rock (3 lyrics)
  • 4chan (411)
  • 500 days of summer (50 cent)
  • 60 minutes (6abc)
  • 7zip (7deadly sins)
  • 80’s music (80’s fashion)
  • 90210 (92.3)
  • 0 balance transfer (007)

And I guess I should do the other common characters too…

  • .net framework (.net framework 3.5)
  • @properties (@live.com)
  • &nbsp (&hearts)
  • ¬_¬ (¬ alt code)
  • ?, !, “, $, %, *, ), (, ~, # etc – nothing…

admin data mining

Twitter Mining

January 2nd, 2010

Twitter, for all its fans and detractors, generates a LOT of data.  It may only be a very small percentage of the world’s population who tweet a lot, but even that can be representative of interesting/important trends and changes.

Oh, and by the way, is twitter worried about this twitter.com traffic trend (courtesy of compete.com)?

But in terms of mining the human chatter that happens through Twitter, who is doing anything interesting?  I didn’t find much:

1. TweetDeck’s TwitScoop

Dodgy naming aside, this column in tweetdeck is one I keep switched on.  But a keyword tag cloud isn’t exactly world-shattering in 2010.  Still, it’s there, and while the screengrab above doesn’t tell me much (and I’m still not sure what/who “snead” is, even after a search), it’s a good finger-on-the-pulse.

2. TrendsMap

Now this is more like it (and a better name).  A Google mashup with (another) tag/word cloud floated on top, it gives an overview of trending topics, along with a real-time snapshot of individual tweets.   And no, I’ve no idea why blasphemy is a hot topic in Ireland right now…

3. Neoformix

Now this guy I have a lot of time for.  He’s particularly well known for his “Twitter Stream” graphs, which show word usage trends over time, as below.

However, head on over to his projects page, and you’ll find charts that include “time of day word correlations” (as below), “Twitter Venn” (twitter Venn diagrams) and a host of other tools. 

So, I know Jeff at Neoformix isn’t the only guy doing interesting analysis of Twitter data, but what surprises me is how few people seem to be working on it – at least that I’m aware of.  Sure, the fact that the word “drunk” is tweeted most between midnight and 5am isn’t going to change the way we see the world – but what about if a brand name suddenly takes off?  Or the word “recession” is on a downwards trend, or “flu” on an upwards trend? 
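One rough way to spot that kind of trend is to count how often a word appears per hour and watch how the counts change. The sketch below assumes tweets are already available as (timestamp, text) pairs; the sample tweets and the `hourly_counts` helper are invented for illustration and are not taken from Neoformix or from the Twitter API.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(tweets, word):
    """Count tweets mentioning `word` (case-insensitive), grouped by hour of day."""
    counts = Counter()
    target = word.lower()
    for timestamp, text in tweets:
        # Split on whitespace and strip common punctuation so "flu," matches "flu".
        tokens = {token.strip(".,!?\"'#").lower() for token in text.split()}
        if target in tokens:
            counts[timestamp.hour] += 1
    return counts


# Hypothetical sample data, made up for this example.
sample_tweets = [
    (datetime(2010, 1, 2, 0, 15), "So drunk right now"),
    (datetime(2010, 1, 2, 0, 40), "Drunk again, happy new year!"),
    (datetime(2010, 1, 2, 9, 5), "Think I have the flu"),
    (datetime(2010, 1, 2, 14, 30), "Everyone at work has flu, great"),
]

print(hourly_counts(sample_tweets, "drunk"))
print(hourly_counts(sample_tweets, "flu"))
```

Grouping by date instead of hour of day would give the "upwards or downwards trend" view; hour of day gives the "drunk after midnight" view.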

More Twitter Mining, please…

admin Uncategorized, data mining

An Average Day

August 10th, 2009

The New York Times chose to lead this story by pointing out that “the unemployed have more time for leisure and socializing”.   Yes, that seems pretty likely.

But in any case the interactive graph they created from their data is a pretty interesting breakdown of how different groups spend their days, by time of day. 

It’s full of “obvious” information, like men seem to watch more TV and do less housework, but it’s intriguing to play around with the different groups, and they’ve certainly tried to analyze the data.  Here are some facts that I’m sure you wanted to know:

  • Women shop for an average of half an hour a day
  • At midnight, 2% of all Americans (which equates to 6m people) are working
  • At noon, 4% of all Americans (12m) are asleep…
  • 25% of people spend more than an hour travelling to work
  • At 8:50pm, 2 fifths of people are in front of the television
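The head counts in brackets follow from simple percentage arithmetic. As a rough check, the sketch below backs out the population that the 2% figure implies and reapplies it to the other shares. Treating every share as a fraction of that same base population is my assumption, not something the New York Times data states.

```python
# Back out the population implied by "2% of all Americans ... equates to 6m people".
working_at_midnight = 6_000_000
implied_population = working_at_midnight * 100 // 2  # scale 2% up to 100%

# Apply the other shares quoted in the list to that same base (an assumption).
asleep_at_noon = implied_population * 4 // 100  # 4% asleep at noon
watching_tv = implied_population * 2 // 5       # two fifths in front of the TV at 8:50pm

print(f"Implied population: {implied_population:,}")
print(f"Asleep at noon:     {asleep_at_noon:,}")
print(f"Watching TV:        {watching_tv:,}")
```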

More evidence that we’re a pretty predictable bunch of beings…

admin data mining, visualization

Data Loss, Data Gain

April 27th, 2009

A few things came to light today, which all seem tied together by the common thread of private data.

Firstly, I noticed ma.gnolia.com was down. Aside from a frustrating domain name, they had a reasonably successful social bookmarking service. Sadly, due to lack of backup (!), they’ve lost the majority of the bookmarks/favorites that they stored on behalf of their users…

Bang, useful personal data gone.

Secondly, I tuned into “More or Less”, a great statistics-focussed radio show on the BBC, on a recommendation from my Dad. Aside from a really great interview with the author of “Sustainable Energy: Without the Hot Air”, which I’ll write about another time, the presenter mentioned Daytum. Set up by Nicholas Felton, the guy who each year publishes his meticulously collated and designed personal stats at feltron.com, the site enables you to have your own “Personal Dashboard”.

Thirdly, I spotted an ad which had a “YouData” logo on it. Smelling a 2.0 startup, I checked out the site – and yes, it’s a (US-based) service that lets you sell your attention – the old “pay me to advertise at me” model, but brought up to date.

So how do these strands tie together? Well, they are all about people realising that their own data is:

  1. Valuable and useful to them
  2. Valuable and useful to others
  3. Therefore, has a monetary value

Problem is, what I lose when my bookmarks vanish at Magnolia is worth more to me, by some margin, than what someone like YouData would pay for that data. And so that’s the opportunity – finding a way to bridge the gap between how much I value my data and time, and how much others (typically advertisers) value it. The answer may be that in most cases, that gap can’t be bridged.

admin Profiling, data mining, privacy

Realtime – Sprint’s Widget Fest

April 13th, 2009

I think realtime reporting IS the future, and dashboards that show live, pushed information are going to be ever more ubiquitous.  Hardly any exist right now, but Sprint as part of its “now” marketing campaign has put together a great live dashboard over at http://now.sprint.com/widget/

In addition to more common widgets, from World Population to “top words being used online”, there are a bunch more, ranging from “911 calls being made” to “sticky notes being produced” to “transplants today”.  Some of the more amusing ones are:

- A “push now” button, which (predictably) does nothing, but reports that 66,713 other people have clicked it
- “You, now”, which takes your webcam feed to show you, now
- A “habitable planets” counter

While you’re browsing all of that, a female voiceover provides more realtime data, such as “The earth will travel 18 miles between right now… and now”.

Genius, and here’s hoping more useful versions come along soon to gadgets near me.

admin Web, data mining, top lists

Privacy and StreetMaps, Again!

April 4th, 2009

I’ve been interviewed twice now (on local radio, nothing too mind-blowing) about Google Street Maps and Privacy.

On one level, it’s the same knee-jerk reaction that happened when the service launched States-side.  A lot of stuff about “what if I’m captured coming out of X-place, or holding hands with Y”.  Well, here’s the news:  it won’t usually be Google StreetMaps that catches us out on those moments…

On another level, stories of people stopping or barricading the Google StreetMaps car have made people think there might be something more to this – and when Google move to countries where privacy is a bigger issue, what will happen then?

My take on this: privacy IS being eroded, on a daily basis, around the world.   That’s just a fact.  Google can blur as many faces as it wants, but I’m being tracked by cameras, URL tracking software, mobile/cellphone masts – and guess what, Google: my car, my branded van (if I had one), my house are all still personally identifiable.

Two things make this loss of privacy okay:

  • The benefits of the technology that comes with it (including StreetMaps) outweigh the risks by a seriously large factor
  • There is SO MUCH DATA, that no-one and nothing can really do anything that worrying or invasive with it.  There’s too much of it being gathered, and most of it is never looked at.  At least for now, and in countries that don’t have some sort of evil regime in power…

It may be that Google is doing it to make money, but essentially they’re just putting online what we can walk to on our own two legs and see for ourselves.  So let’s calm down, enjoy the benefits, and only go out at night with a hoodie pulled over our faces.

admin Cars, data mining, privacy

Connecting things that aren’t connected

March 1st, 2009

Humans tend to make connections between things, even when those connections don’t exist.  Our brains are constantly trying to rule-build and organise, and often get it wrong.

Today for a while, when a plane passed overhead (they do often where I am), the bulb on my desk lamp dimmed. I, of course, assumed the two events were related.  The fact is, planes passed over every couple of minutes, and the light only dimmed every half hour, and I’ve just now found it’s because I was kicking the cable under the table without knowing it.  They’re unconnected…

That’s what psychologists call an illusory correlation – the false connection of two things, based on data.  (It’s also a tongue-twister.)
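A quick simulation shows how easily unrelated events line up. The sketch below assumes a plane passes every 120 seconds (“every couple of minutes”) and stays noticeable for 30 seconds after it passes; the 30-second window, the function name `coincidence_rate` and the fixed random seed are my own assumptions for illustration. The lamp dims at moments chosen with no reference to the planes at all.

```python
import random


def coincidence_rate(plane_interval_s=120, noticeable_window_s=30,
                     dim_events=10_000, seed=0):
    """Fraction of random lamp dimmings that fall within a plane's noticeable window."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(dim_events):
        # Seconds elapsed since the most recent plane when the lamp dims,
        # drawn uniformly because the dimming is independent of the planes.
        since_last_plane = rng.uniform(0, plane_interval_s)
        if since_last_plane < noticeable_window_s:
            hits += 1
    return hits / dim_events


rate = coincidence_rate()
print(f"Dimmings that coincide with a plane purely by chance: {rate:.1%}")
```

Under these assumptions roughly a quarter of the dimmings coincide with a plane, even though the two are unconnected by construction, and those are the ones that get remembered.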

Sod’s law (Murphy’s Law) is an example – we tend to connect negative events, and ignore positive (or neutral) ones.  How often have you been driving along, only to be confronted at the top of a hill and round a bend with a truck that’s halfway across the road?  “Always happens at the top of a hill and round a bend, typical!” you’ll think.  Obviously, 99% of the time it doesn’t, but we’ll remember the times it does.

So why is this important?  Well, it usually isn’t, because we muddle along anyway.  It can get odd when unexplained events (lights in the sky) are connected with unconfirmed causes (UFOs from outer space).  Or when “there’s no smoke without fire”, which has probably convicted a fair number of innocent people. 

My interest is because at my company, Cognitive Match (of which Favy is now a part) we’re focussed on ways of making REAL connections in observed data.  And equally I guess uncovering the “illusory” ones…

admin AI, Psychology, Statistics, data mining

Exploring the news, visually

September 25th, 2008

I’ve said on here before that not enough visualisation is being used, so it’s great to find new ideas – even if they’re not totally usable/useful. DoodleBuzz is one such example; you could call it the Zen of news exploration. Even after 5 minutes of doodling, I have relatively little idea of what’s going on. But I do feel a lot calmer.

admin data mining, visualization

Cloning me, Cloning you: RFID implants

August 7th, 2008

Ever since Kevin Warwick implanted a chip in his arm in 1998 (the main use of which seemed to be to turn lights on and off in rooms he entered – or at least that’s the bit I remember), RFID chips have been spreading far and wide.

Our most prominent use in the UK has probably been the “biometric” chips in our passports, which yesterday The Times reported as being really very easy to clone.  No surprise there then.

But a bit concerning if you think of some of the uses suggested for RFID – the most interesting (in terms of learning about people and behaviour) is implanting them, Kevin-Warwick style, under your skin.   A company called Verichip specialises in this, and has implanted them in a Mythbuster, and in this US policeman, who claims his life was saved by the medical data stored on the chip.

Aside from a couple of scare stories linking the chips to heightened cancer risk, the key weakness is the ability to clone and hack them.  Of course, this will probably never be fully solved, and that does mean that if you connect the chip to your credit card details, medical history (although see above) or other areas, it could get risky.

But some benefits are clear:

- Medical chipping to identify and feed medical records of a patient quickly, especially at the scene of an accident, etc.

- Ease of entry: the chips are already in use at one night club in Spain.  Taking the chip out of your Oyster card and implanting it in your hand would also mean no more forgetting or fumbling for your card holder on the underground in London.

- Tracking: the benefits of knowing where you are start with some typical applications like tracking the whereabouts of prisoners, or kids for their safety, and of course all objects and animals for identification and delivery and so on.  Furthermore, Kevin Warwick’s light switching can be extended to everything around the home or office – lighting, heating, turning your hifi on and off, logging you in to your home PC and so on.  More interestingly, your current location (or at least your location when checked by an RFID scanner) can open up new data streams that you can use to your advantage (or others can misuse to theirs, if so allowed).

Still, I don’t think I’m ready to be chipped quite yet, and if I am, maybe a GPS receiver/transmitter would be better.  If more uncomfortable.

admin RTBB, data mining