Posts Tagged ‘statistics’

Teaching important languages

September 26, 2017
bricolong

British Council ordering

As we have seen, the British Council report Languages for the Future gives a priority ordering of languages as above.

The question then is how this matches up with what is actually taught.  A further British Council report Language Trends 2014 gives the percentage of schools in the state and independent sector where particular languages are taught.

stateschoolang

Languages taught in state schools

indscholang

Languages taught in independent schools

We see that there is no particular sign of Arabic becoming widespread, nor even of Chinese doing so (though that is more common). We presume that ‘Arabic’ is Modern Standard Arabic in all cases and that ‘Chinese’ is Mandarin unless otherwise stated.

We can also look at the numbers of people studying for examinations at various levels.

exam_table

Numbers studying for various examinations

Here, the school examination numbers refer to the numbers of entries as given on the JCQ site, while the ‘Degree’ figures refer to first-year full-time students doing first degrees, as on the HESA site. In the ‘Degree’ column, we have assigned all of ‘Russian and East European Studies’ to Russian and all of ‘Modern Middle Eastern Studies’ to Arabic.

We can try putting these various activities on a common footing by giving them a weighting based on the amount of time in years they take up (taking account of subsidiary languages/subjects for the Degree column).

weightings

Table of weightings

We would then like to compare the input for various languages with their importance according to the British Council report.  There is no obvious common unit of measurement between these two things, so it seems safest just to compare the rank of the languages according to these two measures. The table below refers.

comparison

Comparison of importance according to British Council with resource input, by ranks

On this crude basis, Arabic (especially), Portuguese and Turkish are under-provided, while Polish (heritage speakers) and the traditionally-taught languages French and German may be relatively over-provided, along with Italian.
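For anyone who wants to repeat the exercise, the weighting-and-ranking comparison can be sketched roughly as follows. Every number here is a hypothetical placeholder (the real entry counts, weightings and importance ranks are in the tables pictured above), and the names `WEIGHTS_YEARS`, `ENTRIES` and `IMPORTANCE_RANK` are made up for the illustration.

```python
# Hypothetical illustration: every number below is a placeholder, NOT a figure
# from the JCQ, HESA or British Council tables.

# Assumed weights: years of study represented by one entry at each level.
WEIGHTS_YEARS = {"GCSE": 2.0, "A level": 2.0, "Degree": 3.0}

# Placeholder entry counts per language and level.
ENTRIES = {
    "Spanish": {"GCSE": 900, "A level": 80, "Degree": 20},
    "French": {"GCSE": 1500, "A level": 100, "Degree": 25},
    "Arabic": {"GCSE": 40, "A level": 6, "Degree": 3},
}

# Placeholder importance ranks (1 = most important).
IMPORTANCE_RANK = {"Spanish": 1, "Arabic": 2, "French": 3}


def weighted_input(entries, weights):
    """Total student-years per language: entries times years, summed over levels."""
    return {
        lang: sum(count * weights[level] for level, count in by_level.items())
        for lang, by_level in entries.items()
    }


def ranks(scores):
    """Rank languages by descending score (1 = largest input)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {lang: position for position, lang in enumerate(ordered, start=1)}


def rank_gaps(importance_rank, input_rank):
    """Input rank minus importance rank; positive suggests under-provision."""
    return {lang: input_rank[lang] - importance_rank[lang] for lang in importance_rank}


input_rank = ranks(weighted_input(ENTRIES, WEIGHTS_YEARS))
gaps = rank_gaps(IMPORTANCE_RANK, input_rank)
for lang, gap in sorted(gaps.items(), key=lambda item: -item[1]):
    print(f"{lang}: importance {IMPORTANCE_RANK[lang]}, input {input_rank[lang]}, gap {gap:+d}")
```

Comparing ranks rather than raw values sidesteps the missing common unit, at the cost of throwing away how far apart the languages actually are.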

But if you were just interested in studying languages and wanted to know which ones would be most profitable, the obvious course would be to do Spanish at school–which seems quite possible these days–and then Spanish & Portuguese at university.

Which foreign languages are most useful?

September 25, 2017

booktranslations_v1

The picture above (from the WEF site) gives one answer in terms of the most influential languages as reflected by book translations.  There seem to be definite nodes at English, French and Russian, then less clear ones at Dutch, German and Chinese.  But it is hard to give an exact interpretation of this figure, or indeed the other ones displayed at the same place.

Otherwise, the Internet reveals a number of attempts at weighting-and-ranking:

BRITISH COUNCIL

A British Council report on Languages for the Future dating from 2013 takes account of:

1. current UK export trade
2. the language needs of UK business
3. UK government trade priorities
4. emerging high growth markets
5. diplomatic and security priorities
6. the public’s language interests
7. outward visitor destinations
8. UK government’s International Education Strategy priorities
9. levels of English proficiency in other countries
10. the prevalence of different languages on the internet

Their table of English proficiency by country is quite interesting:

engprof

From the point of view of importance to Britain, they give a ranking of:

britcosum

This may well be the answer from the British perspective!

WORLD ECONOMIC FORUM

A further study (2016) from the WEF considers languages under the following criteria:

1. Geography: The ability to travel
2. Economy: The ability to participate in an economy
3. Communication: The ability to engage in dialogue
4. Knowledge and media: The ability to consume knowledge and media
5. Diplomacy: The ability to engage in international relations

It comes up with the following results:

powerlang

LIST 25

List25 gives a list of the world’s 25 most influential languages as of 2014, where the rankings are not just done according to how many people speak the language. Of course this is taken into consideration but so is how many people speak it as a second language, its impact on global commerce and trade, and its lingua franca status around the world.

They have some nice maps, for instance for French:

french

and come up with a ranking of 1. English 2. French 3. Spanish 4. Arabic 5. Mandarin 6. Russian 7. Portuguese 8. German 9. Japanese 10. Hindustani (Hindi/Urdu) 11.  Malay…

CBI

Meanwhile, the CBI Skills Survey for 2017 gives the following:

skillsurvey

So, English is clearly the most important/useful/influential language of all times and peoples, and we will set it aside in what follows.

Using the British Council rankings as a starting point, we can summarise the results as below, where languages outside the British Council list are ranked by number of occurrences and then average ranking where listed:

britco1

Or we can apply the same procedure for all of the languages that occur more than once without privileging the British Council rankings, so that we rank first by number of occurrences and then average ranking where listed:

britco2
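The combining procedure (rank first by number of occurrences, then by average ranking where listed) is simple enough to sketch in code. Only the List25 order below comes from the text above, with English set aside; the two `hypothetical_*` lists are invented purely to exercise the procedure, and the function name `combined_ranking` is my own.

```python
from statistics import mean

# The List25 order is quoted in the post (English set aside); the other two
# lists are short hypothetical rankings, NOT the WEF or CBI results.
RANKINGS = {
    "List25": ["French", "Spanish", "Arabic", "Mandarin", "Russian",
               "Portuguese", "German", "Japanese", "Hindustani", "Malay"],
    "hypothetical_a": ["Spanish", "French", "Mandarin"],
    "hypothetical_b": ["Mandarin", "Spanish", "German"],
}


def combined_ranking(rankings, min_occurrences=2):
    """Order languages by number of lists they appear in, then by mean position."""
    positions = {}
    for ordered in rankings.values():
        for position, lang in enumerate(ordered, start=1):
            positions.setdefault(lang, []).append(position)
    # Keep only languages that occur in at least `min_occurrences` lists.
    kept = {lang: pos for lang, pos in positions.items() if len(pos) >= min_occurrences}
    # More occurrences first; ties broken by lower mean position, then by name.
    return sorted(kept, key=lambda lang: (-len(kept[lang]), mean(kept[lang]), lang))


combined = combined_ranking(RANKINGS)
print(combined)
```

Note that a language listed once at the very top loses out to one listed twice in mid-table, which is a design choice of the procedure rather than a law of nature.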

So, the world’s second most important language might be French, Spanish or Mandarin. In fact, the top 5 for the British Council and the combined ranking have the same languages, if not quite in the same order: French, Spanish and German (the languages most widely taught in British schools) together with Mandarin and Arabic (rarer and more challenging, one might say).

More about lost sleep

January 3, 2017

sleep1

Following the earlier study,  our client has now produced some further data covering three weeks’ holiday, with results as shown above.  We suspect that the slightly anomalous results for Sunday are indeed due to there only being three observations.  The mean amount of sleep per night is 6:32 for the Work period and 7:08 for the Holiday period.

So there is some difference, but hardly enough to say that the client is compelled to give up work.  He will have to make his own decision, which is rarely a popular piece of advice…

 

Were it not that I have bad dreams…

December 1, 2016

sleep

A middle-aged bureaucrat has collected data on his sleep pattern for 29 weeks or so. He works 4 days a week (not Wednesdays) and wants to know whether these data should impel him towards early retirement.  We can see from the above that the prospect of going back to work on Monday and Thursday causes some lack of sleep.

Table of sleep data

  Mon Tue Wed Thu Fri Sat Sun
Mean 06:25 07:10 05:12 06:52 07:40 07:07 05:00
N 29 29 29 29 29 28 28
Min 01:00 03:00 02:00 04:00 05:00 05:00 00:00
Max 08:30 09:00 07:30 09:00 10:00 12:00 07:00
Q1 06:00 06:30 05:00 06:00 06:30 06:52 04:30
Q2 06:30 07:30 05:30 07:00 07:30 07:15 05:45
Q3 07:30 08:00 06:00 07:30 08:00 08:07 06:30
StDev 01:27 01:29 01:19 01:07 01:07 01:26 01:35

The overall mean is 06:29, while results here indicate an average of 06:50 for those aged 40-55. The difference is hardly large, but on the other hand the justified expectation of sleeping poorly two nights a week is not something one would wish to continue indefinitely.
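As a quick check, the overall mean can be recovered from the table by weighting each day's mean by its number of observations. This is only a sketch: the daily means are themselves rounded to the minute, and the helper names `to_minutes` and `to_hhmm` are my own.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to a whole number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(total_minutes):
    """Format minutes as 'HH:MM', dropping any fraction of a minute."""
    whole = int(total_minutes)
    return f"{whole // 60:02d}:{whole % 60:02d}"


# Daily means and numbers of observations, copied from the table above.
MEANS = {"Mon": "06:25", "Tue": "07:10", "Wed": "05:12", "Thu": "06:52",
         "Fri": "07:40", "Sat": "07:07", "Sun": "05:00"}
COUNTS = {"Mon": 29, "Tue": 29, "Wed": 29, "Thu": 29,
          "Fri": 29, "Sat": 28, "Sun": 28}

total_minutes = sum(to_minutes(MEANS[day]) * COUNTS[day] for day in MEANS)
nights = sum(COUNTS.values())
overall = total_minutes / nights

print(to_hhmm(overall))
print(round(to_minutes("06:50") - overall, 1), "minutes short of the 06:50 benchmark")
```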

We presume that there are essentially two possible explanations for the smaller amount of sleep on Sunday and Wednesday nights: apprehension and having to get up early/change in routine. If it was purely a case of the latter, we would expect the effect to be greater on a Sunday night since there are two days of changed routine to account for as against one on a Wednesday night. But in fact there is no significant difference between the means for Wednesday and Sunday, so we presume apprehension is playing a role here.
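One way to check the Wednesday/Sunday comparison from the summary table alone is a Welch-style t statistic. This is a sketch under loud assumptions: the two sets of nights are treated as independent samples (they are really paired by week), and the p-value uses a normal approximation rather than the t distribution. It is not necessarily the test used for the statement above.

```python
from math import sqrt
from statistics import NormalDist


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for a difference in means with unequal variances."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Summary figures from the table, in minutes: Wed 05:12 (sd 01:19, n 29)
# against Sun 05:00 (sd 01:35, n 28).
t_stat = welch_t(312, 79, 29, 300, 95, 28)

# Two-sided p-value, ASSUMING a normal approximation to the t distribution.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.2f}, approximate p = {p_approx:.2f}")
```

A t statistic well below 2 is consistent with there being no significant difference between the two nights.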

Our preliminary recommendation would be for the client to collect the same data for a substantial period of leave so as to establish how far the results above deviate from the natural pattern.  And here it is!

Try Books! read in 2014

February 20, 2015
BOOK MEDIAN BEST WORST
Stoner 9 8 0
The Road 8.5 3 0
We Need New Names 8 0 0
This Boy 8 2 1
Under The Skin 8 1 1
Gone Girl 7 1 1
Old Filth 7 1 0
A Winter in the Hills 7 0 2
The Children Act 7 0 0
Telling Liddy 6.5 0 0
Almost English 6 0 3
Seizure 5 0 5

The table shows the books read by Try Books! in 2014 and their median scores, along with the number of times each was given a reader’s highest or lowest rating for the year (remember ties!).
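For the curious, the counting can be sketched as below. The readers and scores are entirely hypothetical (the group's actual marks are not reproduced here); the point is only to show how ties and unread books are handled.

```python
from statistics import median

# Hypothetical scores: reader -> {book: score}. NOT the group's actual marks.
# A reader who did not score a book simply has no entry for it.
SCORES = {
    "reader_1": {"Stoner": 9, "The Road": 9, "Seizure": 4},
    "reader_2": {"Stoner": 8, "Seizure": 5},
    "reader_3": {"Stoner": 10, "The Road": 8, "Seizure": 6},
}


def summarise(scores):
    """Median score per book, plus counts of personal best/worst ratings."""
    books = sorted({book for marks in scores.values() for book in marks})
    summary = {
        book: {
            "median": median(marks[book] for marks in scores.values() if book in marks),
            "best": 0,
            "worst": 0,
        }
        for book in books
    }
    for marks in scores.values():
        top, bottom = max(marks.values()), min(marks.values())
        for book, score in marks.items():
            # Ties count for every book sharing the reader's top or bottom score.
            if score == top:
                summary[book]["best"] += 1
            if score == bottom:
                summary[book]["worst"] += 1
    return summary


summary = summarise(SCORES)
for book, stats in summary.items():
    print(book, stats)
```

Because of ties, the BEST and WORST columns can add up to more than the number of readers.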

stoner

John Williams and Stoner are the clear winners here, while Erica Wagner and Seizure were rather less successful.  But it was A Winter in the Hills  that caused the real excitement, of course.

Best / Worst | Seizure | Other
Stoner | Howard, Jo | Aruni, Christine, Dick, Heather, Judy, Linda
Other | Jocelyn, Stephanie, Suzannah | Vicky

The table above classifies people according to whether Stoner and Seizure were indeed their best and worst books respectively. As ever, this was complicated by not everyone having read (scored) every book, but Howard and Jo seem to be representing the mainstream with Vicky as the rebel.

What does ‘The Information Capital’ have to do with South London?

January 3, 2015

infocap

This book presents 100 maps and graphics that will change the way you view the city.  Leaving aside Oliver Uberti’s…sketches…of some of the animals to be found in London Zoo, let’s have a look at some data and see what it means for South London.

South London–City Of Dreadful Night

The illustration above shows the locations where pictures posted on Flickr were taken.  Not South London it seems, apart from the Elephant, Walworth Road and Greenwich Park.  South Londoners are condemned to perpetual darkness, starved of the light of exposure on Flickr…

Concentrations of violent crime

Here we see violent crime hotspots, which seem to pick out railway/Underground stations with unerring accuracy.  3 is Brixton, 8 the Elephant, 9 Peckham, 10 Croydon, 18 Woolwich.

IMG_1502

Deprivation

Above we see deprivation, coloured according to the scheme below:

IMG_1503

So, Lewisham varies between ‘Most deprived’ red and a yellow which has no label but probably means something neutral. If the green was instead blue on this map, one might begin to suspect some hidden agenda…

 

How we get to work…

Here we have the most popular modes of transport for getting to work by home location, coded according to the scheme below.

IMG_1505

Cor, that’s found me out–when I lived in Peckham I used to get the bus to work, but now I get the train. Are those light blue types really driving to work or to the station say?

Occupational tree (or graph)

Now this would be really interesting if it was explained properly.  The idea is that wards are grouped together according to their concentrations of different job types; but we don’t learn what the distances or branching or angles mean.  My earliest memories are of Charlton 50 years ago and I’ve made it as far as Crofton Park, or travelled 3 nodes on this map.  Clearly I’ve not made very much progress at all, but it would be nice to know the details of my lack of achievement.

Cohabiting in Peckham

As for that love and romance thing, it is suggested that cohabiting is prevalent in Peckham (above), while separation is noteworthy in New Cross (below).

Separated in New Cross

Finally, we return to dodgy statistics on obesity.  The figure below shows obesity…

Obesity

or rather, the boroughs expanded or contracted to reflect the percentage of 10-11 year olds there classified obese in 2012-13.  Which is a slightly strange measure to use–presumably those were the figures closest to hand.

So, Sarf London: a land of obesity and irregular liaisons, subsisting in obscurity (apart from Greenwich Park during the Olympics), lit only by the odd flare of crime…And no Tube either…

 

A shameful story about obesity

November 1, 2014

The figure above caused some animated discussion on Brockley Central, with many recondite hypotheses being advanced to explain the seeming kinship between South London and unimaginably remote parts of the North.

The first thing to do is to work out what this data is and what it might be telling us.  It certainly looks like Table 7.3, Finished Admission Episodes with a primary diagnosis of obesity, by Government Office Region (GOR) of residence, Strategic Health Authority (SHA) of residence, Primary Care Trust (PCT) of residence and gender, 2012/13 from the data here.  So what are these episodes about?  There are 10,957 of them, and there are 8,024 in Table 7.8 Finished Consultant Episodes with a primary diagnosis of obesity and a main or secondary procedure of ‘Bariatric Surgery’ by Government Office Region (GOR) of residence, Strategic Health Authority (SHA)  of residence, Primary Care Trust (PCT) of residence and gender, 2012/13.  While ‘Admission Episodes’ and ‘Consultant Episodes’ aren’t quite the same, it’s clear that T7.3 is largely about ‘bariatric surgery’, which includes stomach stapling, gastric bypasses and sleeve gastrectomy.  These procedures have traditionally had a fairly marginal place in the NHS, so we suspect that differences in the willingness to perform or to pay for these procedures may be the operative factor here.

There is data specifically on obesity here.  That gives a ‘Top 10’ as follows, which is rather different from the list we started with above–note that the sample for City of London is probably too small to draw definite conclusions.

TABLE OF TOP 10 ENGLISH LOCAL AUTHORITIES FOR OBESITY

Area Name Weighted Sample % Obese
Halton 309 35.2%
Barnsley 609 34.4%
South Holland 231 32.5%
Mansfield 274 32.4%
Telford and Wrekin 401 32.3%
North Lincolnshire 424 32.0%
Barking and Dagenham 409 31.6%
East Lindsey 363 31.6%
Thurrock 379 31.4%
City of London 20 31.4%

While not all of the areas in the two datasets are identical, we can make a reasonable job of combining them for London as below.

CHART OF OBESITY ADMISSIONS AGAINST PREVALENCE

CHART

Any relationship between the two is rather slight, and it does seem that Lambeth, Southwark and Lewisham have high rates of admission for their prevalence of obesity, rather than high obesity as such. It seems reasonable to conclude that we are seeing wide variations in the propensity to subject obesity to hospital treatments, rather than in obesity as such.

A Consensus Reading List for Operational Research

August 13, 2014
140813table

Try double-clicking if this is too small!

In another attempt at drawing up a reading list for operational research, I did a Google search on “reading list” AND (“operational research” OR “operations research” OR “management science”).  Confining myself to what seemed to be relevant cases, I got contributions from the following institutions:  Southampton, Derby, Strathclyde, Edinburgh, Cass Business School, Aston, Imperial College, Leicester, Wisconsin, Dublin City University, Nottingham, George Mason University, Sheffield, London University International Programme, Napier, Leiden.  The table above shows those items that occurred more than once, except I omitted a book on Linear Programming in MATLAB as being of no interest outside a teaching context.

Well, it all depends on what you’re teaching of course.  And the level of bibliographic detail depends on the source.  Rosenhead et al, Chatfield and Pidd (2004) should be of some use to practitioners anyway.

Massive Green Party Breakthrough In Prospect

August 9, 2014

greencarter

A Green Party activist writes as follows before setting off for a month’s continental holiday:

The Indy has an article today that bigs up the Green Party: it says the latest poll (actually a poll of polls so should be, very marginally, more accurate than just one?) shows an increase in the Green vote from – wait for it – 5 per cent to 6 per cent. I’m not getting excited yet because I imagine the confidence interval on the two figures would dwarf any supposed increase, but do you have any idea what the confidence intervals on these figures are likely to be? A friend here, who knows I’m likely to be standing for the Greens in Putney in the elections for the Westminster Laughing House in May 2015, sent me the link in some excitement, but I fear it’s utter bollocks. Any thoughts? My feeling is that any supposed “increase” is completely illusory, but it would be good to have a more informed view.  The article can be found here.

Of course the article fails to say anything about the polls or how many of them there are or how they are combined.

Opinion polls tend to have a sample size of about 1000. We want to know whether an apparent increase in Green support from 5% to 6% is significant. So we ask what the probability of getting at least 60 people in a sample of 1000 declaring Green support is if the underlying propensity to vote Green is really 0.05. From the properties of the binomial distribution (see here  for instance) this comes out at 0.087, or about 9%. So there might be a real effect, if you accept a 10% significance level and if the sampling in the poll is unbiased–I imagine it’s less reliable for Green supporters than for the larger parties.

You could argue that it’s a poll of polls, so presumably the effective sample size is bigger, and the significance somewhat better than 9%. For instance, if the effect was equal to a new poll with a sample size five times the original survey, then you would repeat the calculation above for at least 300 people in a sample of 5000 declaring Green support.
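Both calculations can be done with nothing more than the standard library. The sketch below sums the binomial probabilities in log space (via `lgamma`) because the individual terms underflow for a sample of 5000; the function name `binomial_tail` is my own, and the pooled sample size of 5000 is the purely illustrative figure from the paragraph above.

```python
from math import exp, fsum, lgamma, log


def binomial_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space."""
    log_p, log_q = log(p), log(1 - p)

    def pmf(k):
        # log of the binomial coefficient C(n, k), via the log-gamma function
        log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_coeff + k * log_p + (n - k) * log_q)

    return fsum(pmf(k) for k in range(k_min, n + 1))


# One poll of 1000: at least 60 Green supporters if true support is 5%.
single_poll = binomial_tail(1000, 60, 0.05)

# Illustrative pooled sample of 5000: at least 300 Green supporters.
pooled = binomial_tail(5000, 300, 0.05)

print(f"single poll: {single_poll:.3f}")
print(f"pooled:      {pooled:.5f}")
```

The first figure should come out close to the 0.087 quoted above, and the second should be far smaller, which is the sense in which a bigger effective sample makes the same one-point rise look more convincing. It still assumes unbiased sampling, which is the real worry.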

But the article gives no useful information at all!  Poor show all round…

 

80/20: An ‘iron law’ of religious giving?

May 11, 2014

eye-of-a-needle

In his excellent book ‘Through the Eye of a Needle: Wealth, the Fall of Rome, and the Making of Christianity in the West, 350-550 AD’, Peter Brown says:

Hence the double aspect of the Christianity that had emerged in the Latin West in the crucial period between 370 and 400 AD. A new institution had become prominent in a society that knew what it was to give. Its upper classes had always valued the exhilarating “rush” associated with giving to an esteemed public cause, of which civic euergetism was the most spectacular and the most certain of acclaim. Great opportunities for giving now opened up in the relatively new Christian churches. But how would these traditions of highly personalized display impinge on a group that had hitherto been notable for its capacity for collective action? This was a real dilemma. Ideally, giving was open to all Christians. But this was a myth. It was no more true in the fourth century than was the nineteenth-century myth that the great Catholic Cathedral of Saint Patrick’s in Manhattan was built “through the pennies of Irish chambermaids.” (In reality, the first building of Saint Patrick’s was made possible through a campaign by which the bishop approached a hundred leading figures for $1,000 each.) Furthermore, what sociologists of modern religion call “skewness” appears to be an iron law in religious giving: 20 percent of the congregation usually contribute 80 percent of the funds of the religious community that they support.

This looks far too pat. The 80/20 factoid is derived from the Pareto distribution, which reflects wealth distribution in ‘modern’ societies. It will have been far more skewed in pre-modern settings, such as the Late Roman Empire. In addition, skewness is skewness and needs no support from ‘sociologists of modern religion’.
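To see what 80/20 amounts to, here is a small sketch assuming a Pareto (type I) distribution with shape parameter alpha greater than 1. Under that assumption the richest fraction f of the population holds a share f^(1 - 1/alpha) of the total. The function names are my own, and neither Brown nor Iannaccone is being credited with this parametrisation.

```python
from math import log


def top_share(fraction, alpha):
    """Share of the total held by the top `fraction`, for Pareto shape alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return fraction ** (1 - 1 / alpha)


def alpha_for(fraction, share):
    """Shape parameter at which the top `fraction` holds exactly `share`."""
    # Solve share = fraction ** (1 - 1/alpha) for alpha.
    return 1 / (1 - log(share) / log(fraction))


alpha_80_20 = alpha_for(0.2, 0.8)
print(f"alpha giving 80/20: {alpha_80_20:.3f}")

# A smaller alpha means a heavier tail, so the top 20% hold even more.
for alpha in (2.0, alpha_80_20, 1.05):
    print(f"alpha = {alpha:.3f}: top 20% hold {top_share(0.2, alpha):.1%}")
```

So 80/20 is just one point on a sliding scale, and a more unequal society would sit at a smaller alpha with the top fifth holding more than 80%.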

Brown quotes a paper by Iannaccone [Skewness Explained: A Rational Choice Model of Religious Giving, Laurence R. Iannaccone Journal for the Scientific Study of Religion, Vol. 36, No. 2 (Jun., 1997), pp. 141-157.] which says:

Whereas the rest may apply to all aspects of religious participation, skewness is truly a distinctive feature of giving. Professional fund-raisers consider skewness “a bedrock rule of thumb” relevant to virtually every setting, large or small, religious and nonreligious (Hoge 1994: 103). In practice, it means that 20% of a congregation’s members provide more than 80% of the giving. Inevitably, these people also exercise substantial power, for who can afford to alienate the few families that keep the church afloat?

Well, that’s much more sensible, though having correctly stated that this skewness is a straightforward consequence of the properties of the statistical distributions that might plausibly be involved, Iannaccone then drags the regrettable and moth-eaten Chicago School stuff out of the store-cupboard and proceeds to make up lots of parameters, just for fun probably…but I don’t think you can dispute that the giving is at least as skewed as 80/20.