Archive for the ‘Statistics’ Category

Connecting things that aren’t connected

March 1st, 2009

Humans tend to make connections between things, even when those connections don’t exist.  Our brains are constantly trying to rule-build and organise, and often get it wrong.

For a while today, whenever a plane passed overhead (they do often where I am), the bulb on my desk lamp dimmed. I, of course, assumed the two events were related. In fact, planes were passing over every couple of minutes and the light only dimmed every half hour, and I’ve just now found out it was because I was kicking the cable under the table without knowing it. They’re unconnected…

That’s what psychologists call an illusory correlation – a false connection drawn between two things on the basis of data. (It’s also a tongue-twister.)

Sod’s law (Murphy’s Law) is an example – we tend to connect negative events, and ignore positive (or neutral) ones. How often have you been driving along, only to be confronted at the top of a hill and round a bend with a truck that’s halfway across the road? “Always happens at the top of a hill and round a bend, typical!” you’ll think. Obviously, 99% of the time it doesn’t, but we’ll remember the times it does.

So why is this important? Well, it usually isn’t, because we muddle along anyway. It can get odd when unexplained events (lights in the sky) are connected with unconfirmed causes (UFOs from outer space). Or when people reason that “there’s no smoke without fire”, which has probably convicted a fair number of innocent people.

My interest is because at my company, Cognitive Match (of which Favy is now a part) we’re focussed on ways of making REAL connections in observed data.  And equally I guess uncovering the “illusory” ones…

admin AI, Psychology, Statistics, data mining

China has more internet users than any other country

September 26th, 2008

253 million at the latest count, according to the government agency China Internet Network Information Centre.   Impressively, 214 million of those are broadband users – but the biggest growth is mobile phone access.   Apparently, the “Great Firewall of China” is still blocking or rendering unusable large numbers of sites, so any strategy looking to address China needs to have local hosting!

admin Statistics, Web

13 hours every minute…

September 16th, 2008

… is how much video is uploaded to Google’s YouTube, according to their official blog post today. And they believe that’s an “exponentially growing” statistic. That’s 18,720 hours a day, or 780 days of video every day. Still with me? It’s a lot.
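For anyone who wants to check the sums, here is a minimal sketch. It takes the 13 hours per minute figure as given; the variable names are only for illustration.

```python
# Upload rate quoted in the post: 13 hours of video per minute.
HOURS_UPLOADED_PER_MINUTE = 13

MINUTES_PER_DAY = 60 * 24

# Hours of video uploaded in one day.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# The same amount expressed as days of continuous viewing.
days_of_video_per_day = hours_per_day / 24

print(hours_per_day)          # 18720
print(days_of_video_per_day)  # 780.0
```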

Of course, there will be a long tail of this stuff which is never seen by more than the person who created it, and at the top end there will be a small percentage that are viewed a lot.  Sure enough, a wildcard type search (searching for “*”, if that’s valid) turns up the top video with 101 million views (a music video, like a lot in the top results of that list)…  Wikipedia’s got some notes on the “heavy tail” distribution, which I’m guessing is what this is.

The important takeaway is that very quickly (by which I mean already) there’s too much content on YouTube for one person to make sense of. And therefore ways of pre-selecting, filtering and locating stuff of interest – as in every area online now – are needed, beyond just search…

admin Statistics, Video

Reality Mining

July 31st, 2008

What does your cell phone know about you?  Well, a fair bit according to researchers at MIT.  For instance, they claim they can divine, among other things:

- how happy and productive you are
- your social status
- your social group

Fundamentally this is just an extension of any form of data mining – take a large amount of data, and try to make some determinations from it. The examples based on social group and status can be fairly easily explained – by where you spend your time (the types of shop, street, district), and by the other mobile phones that yours tends to hang out with. The link to happiness and productivity was a correlation they discovered when they combined location and call data with questionnaires.

The same group are doing some interesting work with other areas that use mobile data – such as “social serendipity” – trying to match users that happen to be in similar locations, and that have similar profiles or interests.  People have tried to release products into that space for as long as I can remember, but no-one’s yet cracked it, so it will be interesting to see if this research helps.

A lot of reality mining to date has been to do with mobiles (like Bluetooth MyBlogLog), but obviously anything that can sense us and feed data about us will add to this: cars, PCs, toasters… The more, to my mind, the merrier.

admin Profiling, Statistics, data mining, geotargeting

1 trillion unique URLs

July 26th, 2008

Google search engineers hit the new milestone of 1 trillion unique URLs, a number which is growing at “several billion per day”. Even ignoring duplicates, and assuming a lot of pages get shelved as unimportant (endless calendar day pages, empty or forgotten pages, etc.), that’s a lot of content. That’s 166 URLs for each person on the planet, and 10 for each star in the galaxy (assuming 100 billion stars). So, quite a lot.
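The per-person and per-star figures can be reproduced with a couple of divisions. This is a rough sketch: the population of roughly 6 billion is an assumption inferred from the 166 figure (a larger population would give a smaller number), while the 100 billion stars is the assumption already stated above.

```python
# 1 trillion unique URLs, as reported in the post.
UNIQUE_URLS = 1_000_000_000_000

# Assumption: roughly 6 billion people (not stated in the post,
# but it is the figure that yields 166 URLs per person).
WORLD_POPULATION = 6_000_000_000

# Assumption stated in the post: 100 billion stars in the galaxy.
STARS_IN_GALAXY = 100_000_000_000

# Integer division, rounding down to whole URLs.
urls_per_person = UNIQUE_URLS // WORLD_POPULATION
urls_per_star = UNIQUE_URLS // STARS_IN_GALAXY

print(urls_per_person)  # 166
print(urls_per_star)    # 10
```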

While we’re pushing out big numbers, here are some more to goggle at… They’re not sourced (some of them are estimated, some might be wildly out – but all were spotted on fairly reputable sites).

  • 1.4 billion internet users
  • 50 billion videos viewed online in February
  • 3.3 billion searches on Baidu per month
  • 500 million videos on YouTube
  • 4.1 billion photos on Facebook
  • 2 billion images on Flickr
  • 533 million results for “insurance” search
  • 10 million articles on Wikipedia
  • 3 billion songs sold on iTunes
  • 100 million MySpace members
  • 10 million songs scrobbled on last.fm a day

Pretty overwhelming, huh.

admin Statistics