Posts Tagged ‘statistics’

More about lost sleep

January 3, 2017


Following the earlier study, our client has now produced some further data covering three weeks’ holiday, with results as shown above.  We suspect that the slightly anomalous results for Sunday are indeed due to there being only three observations.  The mean amount of sleep per night is 6:32 for the Work period and 7:08 for the Holiday period.
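To put a number on why three Sunday nights make a shaky basis, here is a small Python sketch of the standard error of a mean (sd divided by the square root of n). The holiday standard deviation isn't given, so it assumes, purely for illustration, that holiday Sundays vary about as much as the Sunday nights in the earlier work-period table (StDev 01:35, i.e. 95 minutes).

```python
import math

def standard_error(sd_minutes: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd_minutes / math.sqrt(n)

# Assumption: holiday Sunday nights vary about as much as the Sunday
# nights in the earlier work-period table (StDev 01:35 = 95 minutes).
SUNDAY_SD = 95

for n in (3, 28):
    print(f"n = {n:2d}: standard error of the mean is about "
          f"{standard_error(SUNDAY_SD, n):.0f} minutes")
```

On that assumption the standard error is about 55 minutes with three nights, against about 18 minutes with 28, so a Sunday mean that wanders by half an hour or so would be unremarkable.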

So there is some difference, but hardly enough to say that the client is compelled to give up work.  He will have to make his own decision, which is rarely a popular piece of advice…


Were it not that I have bad dreams…

December 1, 2016


A middle-aged bureaucrat has collected data on his sleep pattern for 29 weeks or so. He works 4 days a week (not Wednesdays) and wants to know whether these data should impel him towards early retirement.  We can see from the data that the prospect of going back to work on Monday and Thursday causes some lack of sleep on Sunday and Wednesday nights.

Table of sleep data

             Mon    Tue    Wed    Thu    Fri    Sat    Sun
Mean         06:25  07:10  05:12  06:52  07:40  07:07  05:00
N            29     29     29     29     29     28     28
Min          01:00  03:00  02:00  04:00  05:00  05:00  00:00
Max          08:30  09:00  07:30  09:00  10:00  12:00  07:00
Q1           06:00  06:30  05:00  06:00  06:30  06:52  04:30
Q2 (median)  06:30  07:30  05:30  07:00  07:30  07:15  05:45
Q3           07:30  08:00  06:00  07:30  08:00  08:07  06:30
StDev        01:27  01:29  01:19  01:07  01:07  01:26  01:35

The overall mean is 06:29, while results here indicate an average of 06:50 for those aged 40-55.  The difference is hardly large, but on the other hand the justified expectation of sleeping poorly two nights a week is not something one would wish to continue indefinitely.
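As a check on the overall figure, the per-night means in the table can be combined, weighting each by its number of observations. A minimal Python sketch (the helper names are ours, not from any library):

```python
# Per-night means and counts copied from the table above.
MEANS = {"Mon": "06:25", "Tue": "07:10", "Wed": "05:12", "Thu": "06:52",
         "Fri": "07:40", "Sat": "07:07", "Sun": "05:00"}
COUNTS = {"Mon": 29, "Tue": 29, "Wed": 29, "Thu": 29,
          "Fri": 29, "Sat": 28, "Sun": 28}

def to_minutes(hhmm: str) -> int:
    """Convert 'hh:mm' to a number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def to_hhmm(minutes: float) -> str:
    """Convert minutes back to 'hh:mm', truncating to whole minutes."""
    whole = int(minutes)
    return f"{whole // 60:02d}:{whole % 60:02d}"

def overall_mean(means, counts) -> float:
    """Mean of all nights, weighting each night's mean by its count."""
    total = sum(to_minutes(means[day]) * counts[day] for day in means)
    return total / sum(counts.values())

m = overall_mean(MEANS, COUNTS)
print(f"{m:.1f} minutes, i.e. {to_hhmm(m)}")
```

This gives about 389.7 minutes, which truncates to 06:29; since the per-night means are themselves rounded to the minute, small discrepancies of this kind are to be expected.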

We presume that there are essentially two possible explanations for the smaller amount of sleep on Sunday and Wednesday nights: apprehension, and having to get up early/change in routine.  If it were purely a case of the latter, we would expect the effect to be greater on a Sunday night, since there are two days of changed routine to account for as against one on a Wednesday night.  But in fact there is no significant difference between the means for Wednesday and Sunday, so we presume apprehension is playing a role here.
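‘No significant difference’ can be checked roughly from the summary statistics alone. The sketch below computes Welch’s t statistic for the Wednesday and Sunday means (in minutes). It treats the two sets of nights as independent samples, which is an assumption: they come from the same weeks, so a paired comparison of the raw data would be better if it were available.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom,
    computed from summary statistics of two independent samples."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Wednesday: mean 05:12, StDev 01:19, N 29
# Sunday:    mean 05:00, StDev 01:35, N 28   (all converted to minutes)
t, df = welch_t(312, 79, 29, 300, 95, 28)
print(f"t = {t:.2f} on about {df:.1f} degrees of freedom")
```

The statistic comes out at about 0.5 on roughly 52 degrees of freedom, far below the value of about 2.0 needed for significance at the 5% level.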

Our preliminary recommendation would be for the client to collect the same data for a substantial period of leave so as to establish how far the results above deviate from the natural pattern.  And here it is!

Try Books! read in 2014

February 20, 2015
Book                    Median  Best  Worst
Stoner                  9       8     0
The Road                8.5     3     0
We Need New Names       8       0     0
This Boy                8       2     1
Under The Skin          8       1     1
Gone Girl               7       1     1
Old Filth               7       1     0
A Winter in the Hills   7       0     2
The Children Act        7       0     0
Telling Liddy           6.5     0     0
Almost English          6       0     3
Seizure                 5       0     5

The table shows the books read by Try Books! in 2014 and their median scores, along with the number of times each book received someone’s highest or lowest rating for the year (remember ties!)
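The post doesn't spell out how the median, ‘Best’ and ‘Worst’ columns were computed. One plausible way, assuming that ties count for every tied book, is sketched below in Python; the readers, books and scores are entirely made up for illustration.

```python
from statistics import median

# Hypothetical scores (reader -> {book: score}); not the group's real data.
scores = {
    "Reader A": {"Book X": 9, "Book Y": 5, "Book Z": 5},
    "Reader B": {"Book X": 8, "Book Y": 8, "Book Z": 3},
    "Reader C": {"Book X": 7, "Book Z": 4},   # did not score Book Y
}

def summarise(scores):
    """Return {book: (median score, times rated best, times rated worst)}."""
    books = sorted({b for s in scores.values() for b in s})
    best = dict.fromkeys(books, 0)
    worst = dict.fromkeys(books, 0)
    for s in scores.values():
        top, bottom = max(s.values()), min(s.values())
        for book, score in s.items():
            # Ties: every book sharing a reader's top (or bottom) score counts.
            if score == top:
                best[book] += 1
            if score == bottom:
                worst[book] += 1
    # Median over only those readers who scored the book.
    med = {b: median(s[b] for s in scores.values() if b in s) for b in books}
    return {b: (med[b], best[b], worst[b]) for b in books}

for book, (med, n_best, n_worst) in summarise(scores).items():
    print(book, med, n_best, n_worst)
```

Note that a reader who has scored only one book counts it as both best and worst, which is one more reason to read these columns with care.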


John Williams and Stoner are the clear winners here, while Erica Wagner and Seizure were rather less successful.  But it was A Winter in the Hills that caused the real excitement, of course.

Best \ Worst   Seizure                       Other
Stoner         Howard, Jo                    Aruni, Christine, Dick, Heather, Judy, Linda
Other          Jocelyn, Stephanie, Suzannah  Vicky

The table above classifies people according to whether Stoner and Seizure were indeed their best and worst books respectively. As ever, this was complicated by not everyone having read (scored) every book, but Howard and Jo seem to be representing the mainstream with Vicky as the rebel.

What does ‘The Information Capital’ have to do with South London?

January 3, 2015


This book presents 100 maps and graphics that will change the way you view the city.  Leaving aside Oliver Uberti’s…sketches…of some of the animals to be found in London Zoo, let’s have a look at some data and see what it means for South London.

South London--City Of Dreadful Night


The illustration above shows the locations where pictures posted on Flickr were taken.  Not South London it seems, apart from the Elephant, Walworth Road and Greenwich Park.  South Londoners are condemned to perpetual darkness, starved of the light of exposure on Flickr…

Concentrations of crime

Concentrations of violent crime

Here we see violent crime hotspots, which seem to pick out railway/Underground stations with unerring accuracy.  Hotspot 3 is Brixton, 8 the Elephant, 9 Peckham, 10 Croydon and 18 Woolwich.



Above we see deprivation, coloured according to the scheme below:


So, Lewisham varies between ‘Most deprived’ red and a yellow which has no label but probably means something neutral. If the green were instead blue on this map, one might begin to suspect some hidden agenda…


How we get to work...


Here we have the most popular modes of transport for getting to work by home location, coded according to the scheme below.


Cor, that’s found me out–when I lived in Peckham I used to get the bus to work, but now I get the train. Are those light blue types really driving to work, or to the station, say?

Occupational tree (or graph)


Now this would be really interesting if it were explained properly.  The idea is that wards are grouped together according to their concentrations of different job types; but we don’t learn what the distances or branching or angles mean.  My earliest memories are of Charlton 50 years ago and I’ve made it as far as Crofton Park, or travelled 3 nodes on this map.  Clearly I’ve not made very much progress at all, but it would be nice to know the details of my lack of achievement.

Cohabiting in Peckham


As for that love and romance thing, it is suggested that cohabiting is prevalent in Peckham (above), while separation is noteworthy in New Cross (below).

Separated in New Cross


Finally, we return to dodgy statistics on obesity.  The figure below shows obesity…



or rather, the boroughs expanded or contracted to reflect the percentage of 10-11 year olds classified as obese there in 2012-13.  Which is a slightly strange measure to use–presumably those were the figures closest to hand.

So, Sarf London: a land of obesity and irregular liaisons, subsisting in obscurity (apart from Greenwich Park during the Olympics), lit only by the odd flare of crime…And no Tube either…


A shameful story about obesity

November 1, 2014

The figure above caused some animated discussion on Brockley Central, with many recondite hypotheses being advanced to explain the seeming kinship between South London and unimaginably remote parts of the North.

The first thing to do is to work out what this data is and what it might be telling us.  It certainly looks like Table 7.3, Finished Admission Episodes with a primary diagnosis of obesity, by Government Office Region (GOR) of residence, Strategic Health Authority (SHA) of residence, Primary Care Trust (PCT) of residence and gender, 2012/13 from the data here.  So what are these episodes about?  There are 10,957 of them, and there are 8,024 in Table 7.8 Finished Consultant Episodes with a primary diagnosis of obesity and a main or secondary procedure of ‘Bariatric Surgery’ by Government Office Region (GOR) of residence, Strategic Health Authority (SHA) of residence, Primary Care Trust (PCT) of residence and gender, 2012/13.  While ‘Admission Episodes’ and ‘Consultant Episodes’ aren’t quite the same, 8,024 is about 73% of 10,957, so it’s clear that T7.3 is largely about ‘bariatric surgery’, which includes stomach stapling, gastric bypasses and sleeve gastrectomy.  These procedures have traditionally had a fairly marginal place in the NHS, so we suspect that differences in the willingness to perform or to pay for these procedures may be the operative factor here.

There is data specifically on obesity here.  That gives a ‘Top 10’ as follows, which is rather different from the list we started with above–note that the sample for City of London is probably too small to draw definite conclusions.


Area name               Weighted sample  % obese
Halton                  309              35.2%
Barnsley                609              34.4%
South Holland           231              32.5%
Mansfield               274              32.4%
Telford and Wrekin      401              32.3%
North Lincolnshire      424              32.0%
Barking and Dagenham    409              31.6%
East Lindsey            363              31.6%
Thurrock                379              31.4%
City of London          20               31.4%
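To see why City of London's figure deserves caution, here is a rough normal-approximation 95% confidence interval for each proportion, sketched in Python. It assumes, for simplicity, that the weighted sample size behaves like a simple random sample; survey weighting usually inflates the variance, so the real intervals are likely to be wider, and the approximation is crude at n = 20.

```python
import math

def margin_of_error(p: float, n: float, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval
    for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# (area, weighted sample, proportion obese) taken from the table above
for area, n, p in [("Halton", 309, 0.352), ("City of London", 20, 0.314)]:
    moe = margin_of_error(p, n)
    print(f"{area}: {p:.1%} +/- {moe:.1%}")
```

This gives roughly ±5 percentage points for Halton but about ±20 for City of London, so the latter’s 31.4% is consistent with anything from about 11% to 52%.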

While not all of the areas in the two datasets are identical, we can make a reasonable job of combining them for London as below.



Any relationship between the two is rather slight, and it does seem that Lambeth, Southwark and Lewisham have high rates of admission for their prevalence of obesity, rather than high obesity as such. It seems reasonable to conclude that we are seeing wide variations in the propensity to subject obesity to hospital treatment, rather than in the prevalence of obesity itself.

A Consensus Reading List for Operational Research

August 13, 2014


In another attempt at drawing up a reading list for operational research, I did a Google search on “reading list” AND (“operational research” OR “operations research” OR “management science”).  Confining myself to what seemed to be relevant cases, I got contributions from the following institutions:  Southampton, Derby, Strathclyde, Edinburgh, Cass Business School, Aston, Imperial College, Leicester, Wisconsin, Dublin City University, Nottingham, George Mason University, Sheffield, London University International Programme, Napier, Leiden.  The table above shows those items that occurred more than once, except I omitted a book on Linear Programming in MATLAB as being of no interest outside a teaching context.
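The tallying step (keeping items that appear on more than one list) is straightforward to automate. A sketch in Python, using placeholder institutions and titles rather than the actual lists, and counting each title at most once per institution:

```python
from collections import Counter

# Hypothetical reading lists, one per institution (placeholder titles).
reading_lists = {
    "Institution 1": ["Book A", "Book B", "Book C"],
    "Institution 2": ["Book A", "Book D"],
    "Institution 3": ["Book B", "Book A", "Book E"],
}

# Count each title at most once per institution, then keep the repeats.
counts = Counter(title for titles in reading_lists.values()
                 for title in set(titles))
consensus = [(title, n) for title, n in counts.most_common() if n > 1]
print(consensus)
```

In practice, titles would presumably need some normalising (editions, spellings) before they could be matched.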

Well, it all depends on what you’re teaching of course.  And the level of bibliographic detail depends on the source.  Rosenhead et al, Chatfield and Pidd (2004) should be of some use to practitioners anyway.

Massive Green Party Breakthrough In Prospect

August 9, 2014


A Green Party activist writes as follows before setting off for a month’s continental holiday:

The Indy has an article today that bigs up the Green Party: it says the latest poll (actually a poll of polls so should be, very marginally, more accurate than just one?) shows an increase in the Green vote from – wait for it – 5 per cent to 6 per cent. I’m not getting excited yet because I imagine the confidence interval on the two figures would dwarf any supposed increase, but do you have any idea what the confidence intervals on these figures are likely to be? A friend here, who knows I’m likely to be standing for the Greens in Putney in the elections for the Westminster Laughing House in May 2015, sent me the link in some excitement, but I fear it’s utter bollocks. Any thoughts? My feeling is that any supposed “increase” is completely illusory, but it would be good to have a more informed view.  The article can be found here.

Of course the article fails to say anything about the polls or how many of them there are or how they are combined.

Opinion polls tend to have a sample size of about 1000. We want to know whether an apparent increase in Green support from 5% to 6% is significant. So we ask: if the underlying propensity to vote Green is really 0.05, what is the probability of getting at least 60 people in a sample of 1000 declaring Green support? From the properties of the binomial distribution (see here for instance) this comes out at 0.087, or about 9%. So there might be a real effect, if you accept a 10% significance level and if the sampling in the poll is unbiased–I imagine it’s less reliable for Green supporters than for the larger parties.

You could argue that it’s a poll of polls, so presumably the effective sample size is bigger, and the significance somewhat better than 9%.  For instance, if the pooling were equivalent to a new poll with a sample five times the size of the original survey, then you would repeat the calculation above for at least 300 people in a sample of 5000 declaring Green support.
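The binomial tail probabilities above can be reproduced directly. This Python sketch sums the binomial probabilities in log space (via math.lgamma) so that the larger sample does not overflow; like the calculation in the text, it assumes simple random sampling and a one-sided test against a true Green share of 0.05.

```python
import math

def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so that
    large n does not overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1)
                   - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return total

print(binomial_upper_tail(1000, 60, 0.05))   # single poll of 1000
print(binomial_upper_tail(5000, 300, 0.05))  # hypothetical effective sample of 5000
```

The first value comes out at about 0.087, as quoted; the hypothetical five-fold sample drives the probability well below 1%.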

But the article gives no useful information at all!  Poor show all round…


80/20: An ‘iron law’ of religious giving?

May 11, 2014


In his excellent book ‘Through the Eye of a Needle: Wealth, the Fall of Rome, and the Making of Christianity in the West, 350-550 AD’, Peter Brown says:

Hence the double aspect of the Christianity that had emerged in the Latin West in the crucial period between 370 and 400 AD. A new institution had become prominent in a society that knew what it was to give. Its upper classes had always valued the exhilarating “rush” associated with giving to an esteemed public cause, of which civic euergetism was the most spectacular and the most certain of acclaim. Great opportunities for giving now opened up in the relatively new Christian churches . But how would these traditions of highly personalized display impinge on a group that had hitherto been notable for its capacity for collective action? This was a real dilemma. Ideally, giving was open to all Christians. But this was a myth. It was no more true in the fourth century than was the nineteenth-century myth that the great Catholic Cathedral of Saint Patrick’s in Manhattan was built “through the pennies of Irish chambermaids.” (In reality, the first building of Saint Patrick’s was made possible through a campaign by which the bishop approached a hundred leading figures for $ 1,000 each.)   Furthermore, what sociologists of modern religion call “skewness” appears to be an iron law in religious giving: 20 percent of the congregation usually contribute 80 percent of the funds of the religious community that they support.

This looks far too pat. The 80/20 factoid is derived from the Pareto distribution, which reflects wealth distribution in ‘modern’ societies. Wealth will have been far more skewed in pre-modern settings, such as the Late Roman Empire. In addition, skewness is skewness and needs no support from ‘sociologists of modern religion’.
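For a continuous Pareto distribution with shape parameter alpha > 1, the Lorenz curve is L(F) = 1 - (1 - F)^(1 - 1/alpha), so the top fraction p of the population holds a share p^(1 - 1/alpha). The 80/20 split corresponds to alpha = log 5 / log 4, about 1.16, and smaller values of alpha mean more skew. A short Python sketch:

```python
import math

def top_share(fraction: float, alpha: float) -> float:
    """Share of the total held by the top `fraction` of a population whose
    amounts follow a Pareto distribution with shape alpha > 1.
    Follows from the Pareto Lorenz curve L(F) = 1 - (1 - F) ** (1 - 1/alpha)."""
    return fraction ** (1 - 1 / alpha)

alpha_80_20 = math.log(5) / math.log(4)   # about 1.16
for alpha in (3.0, 2.0, alpha_80_20, 1.1):
    print(f"alpha = {alpha:.2f}: top 20% hold {top_share(0.2, alpha):.0%}")
```

With alpha = 3 the top 20% hold only about a third; with alpha = 1.1 they hold about 86%. Nothing here tells us what alpha was for Late Roman congregations, of course; it only shows how sensitive the ‘iron law’ is to the shape of the distribution.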

Brown quotes a paper by Iannaccone [Skewness Explained: A Rational Choice Model of Religious Giving, Laurence R. Iannaccone Journal for the Scientific Study of Religion, Vol. 36, No. 2 (Jun., 1997), pp. 141-157.] which says:

Whereas the rest may apply to all aspects of religious participation, skewness is truly a distinctive feature of giving. Professional fund-raisers consider skewness “a bedrock rule of thumb” relevant to virtually every setting, large or small, religious and nonreligious (Hoge 1994: 103). In practice, it means that 20% of a congregation’s members provide more than 80% of the giving. Inevitably, these people also exercise substantial power, for who can afford to alienate the few families that keep the church afloat?

Well, that’s much more sensible, though having correctly stated that this skewness is a straightforward consequence of the properties of the statistical distributions that might plausibly be involved, Iannaccone then drags the regrettable and moth-eaten Chicago School stuff out of the store-cupboard and proceeds to make up lots of parameters, just for fun probably…but I don’t think you can dispute that the giving is at least as skewed as 80/20.

Try Books! read in 2013

December 19, 2013
Median Best Worst
Mrs Palfrey At The Claremont 9 6
A Month in the Country 8.5 3
The hare with amber eyes 8.25 1 1
Things Fall Apart 8.25 2 1
The Dark Room: A Novel 8
Dubliners 7 1 1
The Heart is a Lonely Hunter 7
The Garden of Evening Mists 6.75 1
Beyond Black 6.25 1 2
These Is My Words 6 1
A Long Way Down 6 1
Hunger 4 5

The table shows the books read by Try Books! in 2013 and their median scores, along with the number of times each book received someone’s highest or lowest rating for the year (remember ties!).  Previous analysis indicates that the median is a good enough indicator for our purposes.

Elizabeth Taylor, novelist (1912-1975)


Elizabeth Taylor and Mrs Palfrey At The Claremont are the clear winners here, while Knut Hamsun and Hunger were rather less successful.

Best \ Worst   Hunger                Other
Palfrey        Ali, Judy, Suzannah   Howard, Stephanie
Other          Christine, Linda      Aruni, Dick, Jo, Jocelyn

The table above classifies people according to whether Mrs Palfrey and Hunger were indeed their best and worst books respectively. As ever, this was complicated by not everyone having read (scored) every book, but Ali, Judy and Suzannah seem to be safely established as representatives of mainstream opinion.

It might be that the books which appear in both ‘Best’ and ‘Worst’ categories are good for provoking discussion, but in quite a few cases the dissenting opinion was actually submitted by email, and The Heart is a Lonely Hunter was very fruitful in provoking discussion, which you would hardly glean from the table.

Comparing this with the previous results suggests that the most popular books are those that people will engage with simply because they deal with human experience (though, as If This Is A Man shows, they need not be fiction), while genre fiction (These Is My Words) and high-concept productions (A Long Way Down) don’t do very well.

University College Oxford Roll of Donors

November 30, 2013
Percentage making a donation against year of matriculation


Univ have sent us a Newsletter & Roll of Donors, presumably meaning to prod us into action of some kind.  Since this brochure contains a listing of the percentage of surviving ex-students who gave a donation in the academic year 2012/13 by year of matriculation, we suppose we are meant to do a regression.

That gives us the results plotted above; the fitted model is:

Call:
lm(formula = PERCENT ~ YEAR, data = univ)

Residuals:
     Min       1Q   Median       3Q      Max
-16.5684  -3.3648  -0.8145   4.1408  11.8245

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 1368.28941  112.48351   12.16 2.85e-16 ***
YEAR          -0.67092    0.05668  -11.84 7.65e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.784 on 48 degrees of freedom
Multiple R-squared: 0.7448,     Adjusted R-squared: 0.7395
F-statistic: 140.1 on 1 and 48 DF,  p-value: 7.648e-16

I don’t really feel like trying a quadratic term here…
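The fitted line is easy to read off the coefficients. In the Python sketch below (coefficients copied from the output above), the slope implies a drop of about 6.7 percentage points in participation per decade of matriculation, and the fitted value for 1980 is about 39.9%. The ‘standardised’ residual here is simply the residual divided by the residual standard error, not R’s leverage-adjusted rstandard().

```python
# Coefficients copied from the lm() output above.
INTERCEPT = 1368.28941
SLOPE = -0.67092          # percentage points per year of matriculation
RESIDUAL_SE = 5.784       # residual standard error, 48 df

def predicted_percent(year: int) -> float:
    """Fitted percentage of a matriculation year donating in 2012/13."""
    return INTERCEPT + SLOPE * year

def standardised(observed: float, year: int) -> float:
    """Residual expressed in units of the residual standard error."""
    return (observed - predicted_percent(year)) / RESIDUAL_SE

print(f"{predicted_percent(1980):.1f}%")          # fitted value for 1980
print(f"{10 * SLOPE:.1f} points per decade")
```

So ‘1 standard deviation below’ for 1980 means roughly 5.8 percentage points below the line. The intercept is just the line extrapolated back to year 0 and has no meaning on its own.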

Presumably the simplest explanation is that the older year groups just have more disposable income and so they give more to their old college. Generally speaking, charitable giving is more prevalent among older people even among normal human beings, never mind this population, characterised as it is by excessive personal wealth (see for instance here).  Of course, and especially for this population, a great deal of the value is contained in a few high-value donations, so participation rate doesn’t necessarily tell you very much.

Anyway, we could also ask about the performance of particular years against the model prediction.  The figure below shows residuals against year.  We see that 1980 is quite significantly (indeed, 1 standard deviation) below the predicted value, while 1982 is less frugal.

A friend writes:  I was looking for a pattern in the residuals for significant anniversaries e.g. 25th since graduating, but need more data. It would be interesting to look at the data for previous years to see percentage donating by year since matriculation for different cohorts. Is the class of 82 consistently more generous or is it that in the 20th year after graduating every cohort donates more?

In the absence of more data, we can look at the autocorrelations, as below:

Clearly there is nothing significant here (though we may question the wisdom of going up to lag = 20 with only 50 data points).
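For reference, the sample autocorrelation and the dashed bounds on an R-style ACF plot (±1.96/√n, about ±0.28 for n = 50) can be sketched in Python as below. The actual residuals aren't reproduced in the post, so the example runs on synthetic white noise with the same residual standard error; the series is an illustration only.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, using the same definition
    as R's acf(): lagged cross-products divided by the lag-0 sum of squares."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def white_noise_bound(n, z=1.96):
    """Approximate 95% bound drawn on an R acf plot: +/- z / sqrt(n)."""
    return z / math.sqrt(n)

# Synthetic stand-in for the 50 residuals (not the real values).
random.seed(1)
residuals = [random.gauss(0, 5.784) for _ in range(50)]
bound = white_noise_bound(len(residuals))
flagged = [k for k, r in enumerate(acf(residuals, 20)) if k > 0 and abs(r) > bound]
print(f"bound = {bound:.3f}; lags outside it: {flagged}")
```

Testing 20 lags at the 5% level, one would expect about one lag to cross the bounds by chance even for pure noise, which is part of why going up to lag 20 with 50 points is questionable.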

Logically, one might expect some effect connected with ‘gaudies’, where those who matriculated in a given period are invited back to the college to engage in donating money and in self-congratulation.  So there might be a quasi-significant-anniversary effect, since recent years (like 1980) tend to get invited back every ten years or so, but this would not show up in the autocorrelations.

Then there’s the question of displacement: are the same people giving what they would have given anyway but at a different time?