Posts Tagged ‘data’

What is this nonsense?

November 19, 2017


Following our earlier discussion, the BBC has published an article on the financial benefits of a university education, with some results as shown above.  But it’s difficult to know what to make of it, since they don’t say anything about data or methodology or (perhaps more realistically) give a link to where these questions might be covered.

Questions which are not answered include what data they are using, what years they cover, how the subjects are defined and which students are included.

What data are they using?

The real question is the source of the income data.  If it comes from self-report, then you will get low coverage and inaccurate answers.  If it’s HMRC data, then you might also get some regrettable inaccuracies and omissions, and you will miss foreign students and UK students who went abroad after graduation.  There’s also a question of what coverage you get of UK students who don’t take out student loans.  The work is ascribed to Dr Jack Britton of the IFS, and there is a recent IFS study that covers similar ground.  Perhaps it’s the same data…the same years…whatever.

What years are they covering?

Search me.

What is the definition of these subjects?

It is hard (for me at least) to work out the coverage of Medicine & Dentistry, Nursing, and All Medicine.  I suppose that All Medicine is not counted again in the combined figure, but you never know.  Then you could ask whether Languages is just Modern Foreign Languages, or whether it includes Classics, Welsh, Irish, Linguistics…and so on…

What students are they covering?

At a guess, it might be UK-based students who have done first degrees at UK universities and who can be followed up.  But then in some subject areas many of them will have done higher degrees and a PhD would probably depress earnings at the 5-year mark.


Finally, the figure above is interesting for its inclusion of the Open University, whose students may well be different on entry and retired on exit…


The worth of foreign languages in Paris

October 14, 2017

Some mandarin-related data from Paris

So, we extend our previous study of London to Paris. The table below shows the results, tabulated as before for London.


Jobs in Paris involving foreign languages

So, there were 1364 postings mentioning ‘polonais’ with a total estimated yearly salary of 29.35 million Euro and an average salary of 29,185 Euro.  ‘Overall here’ refers to the 12 language names listed while ‘Overall jobs’ is all the postings on the site.

We can also express this in terms of percentages referred to ‘overall jobs’, as below:


Data from Paris in percentage terms

Here, we see that 0.52% of the overall job postings mentioned ‘polonais’, and they had an average salary that was 13.1% higher than for the ‘Overall jobs’.  35% of postings appeared to mention a foreign language and for 30% that language was English.  We can compare this with data from London in the same format:


Data from London in percentage terms

There is a great difference between the two capitals in the worth of Polish (probably genuine) and of Turkish (probably due to small numbers, and you get very different results with [la langue] turque).  Italian and Japanese subtract value in both capitals, while Dutch, Spanish and Portuguese add value in both.

Overall, the language-related jobs have a salary premium of 4.6% in Paris as against 17.3% in London.
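The percentage tables and these overall premiums both come down to two ratios: a term’s share of all postings, and its average salary relative to the ‘Overall jobs’ average. Here is a minimal Python sketch, assuming the counts and averages are already to hand. The function names and the toy figures are hypothetical and are not taken from the Paris or London tables.

```python
def share_of_postings(language_postings: int, overall_postings: int) -> float:
    """Percentage of all postings that mention the language."""
    return 100.0 * language_postings / overall_postings


def salary_premium(language_avg: float, overall_avg: float) -> float:
    """Percentage by which the term's average salary exceeds the overall average.

    A negative value means the language 'subtracts value'.
    """
    return 100.0 * (language_avg / overall_avg - 1.0)


# Toy figures for illustration only (not the Paris or London data).
print(round(share_of_postings(52, 10_000), 2))     # 0.52
print(round(salary_premium(113_100, 100_000), 1))  # 13.1
```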

The clearest conclusions are:

i)  there are far more jobs possibly requiring a foreign language (English!) in Paris than in London;

ii)  there seems to be a far higher premium for foreign languages in London than in Paris.





Indeed accounting for the value of languages

October 6, 2017

Data as at 2240 on 6 October

So we continue our previous attempts to find some value in foreign languages with the help of Indeed.  We say that the typical undifferentiated graduate may well end up as an accountant, and ask what value may be added if they know a foreign language.  This approach also has the advantage that ‘accountant’ actually means something (unlike ‘consultant’) and it means something outside the UK (unlike ‘solicitor’).


Accountant salaries, with and without languages

The table above shows results for numbers of jobs and average pay, where ‘German’ means postings that mention both ‘Accountant’ and ‘German’ and so on.
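The AND search behind each row can be sketched in a few lines of Python. This assumes each posting is available as text plus an estimated salary. The `Posting` class and the sample adverts are hypothetical, and Indeed’s own matching rules are not documented here, so plain case-insensitive substring matching stands in for them.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    text: str
    salary: float  # estimated yearly salary


def matching(postings: list[Posting], *terms: str) -> list[Posting]:
    """Postings whose text mentions every term (case-insensitive AND search)."""
    lowered = [t.lower() for t in terms]
    return [p for p in postings if all(t in p.text.lower() for t in lowered)]


def summary(postings: list[Posting]) -> tuple[int, float]:
    """Number of postings and their average estimated salary."""
    if not postings:
        return 0, 0.0
    return len(postings), sum(p.salary for p in postings) / len(postings)


# Hypothetical postings, for illustration only.
ads = [
    Posting("Accountant, German speaker preferred", 40_000),
    Posting("Accountant for retail group", 35_000),
    Posting("German teacher", 30_000),
]
print(summary(matching(ads, "Accountant", "German")))  # (1, 40000.0)
```

Substring matching is crude: ‘German’ also matches ‘Germany’, which is one way a language name can pick up postings that have nothing to do with the language.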

We see that:

i) a rather small proportion of the accountant jobs mention languages (about 4% for the languages mentioned here);

ii)  for some languages–Arabic, Turkish–accountant jobs are scarce;

iii)  as previously,  Dutch, German and Spanish are worth money;

iv)  as previously,  Polish and Japanese are not worth money.

How popular is Russian in the UK?

October 2, 2017


The table above gives the numbers of people studying for examinations at various levels.  The school examination numbers refer to the numbers of entries as given on the JCQ site while the ‘Degree’ figures refer to first-year full-time students doing first degrees, as on the HESA site.  Here, in the ‘Degree’ column, we have assigned all of ‘Russian and East European Studies’ to Russian and all of ‘Modern Middle Eastern Studies’ to Arabic.

The table below gives the same data expressed in terms of ranks.


We see that Russian ranks between 5th and 9th, depending on the particular stage we are looking at.

From a slightly different angle, the British Council report Language Trends 2014 gives the percentage of schools in the state and independent sectors where particular languages are taught at any level (including non-examination/extra-curricular), as below:


Taking account of the proportion of the total school population in independent schools, we might estimate that about 7% of children attend schools with some provision for Russian.
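The estimate is a weighted average of the state-sector and independent-sector percentages, with each sector weighted by its share of pupils. The Python sketch below shows only the arithmetic. The input values are placeholders, not the Language Trends figures, and the function name is hypothetical.

```python
def weighted_provision(p_state: float, p_independent: float, independent_share: float) -> float:
    """Estimated share of pupils attending a school that offers the language.

    p_state, p_independent: fraction of schools in each sector offering it.
    independent_share: fraction of all pupils in independent schools.
    Assumes school-level percentages can stand in for pupil-level ones,
    i.e. that schools offering the language are of typical size.
    """
    return (1 - independent_share) * p_state + independent_share * p_independent


# Placeholder inputs only: 4% of state schools, 25% of independent schools,
# 7% of pupils in the independent sector.
print(round(weighted_provision(0.04, 0.25, 0.07), 4))  # 0.0547
```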

We can then ask what position Russian ought to hold. A British Council report on Languages for the Future dating from 2013 gives the following ranking in terms of importance for Britain:


So Russian may be about as popular as it ought to be.  We will not venture an opinion as to whether the same holds for the popularity of Russia.

Important languages, Indeed!

September 30, 2017

Data for Arabic as at 2240 on 30/09/2017

We try another approach to assessing the relative value of modern foreign languages.  The Indeed site allows one to search for job postings according to particular keywords in a particular location and gives a summary in terms of numbers and estimated salaries as illustrated above.

So we can compare the results for postings containing the names of various languages, such as ‘Arabic’, ‘German’ and so on, in London, using in the first instance the key languages identified by the British Council, as we discussed earlier.  This gives the results below, ordered by average salary, which is just the total estimated salary associated with the relevant postings divided by the number of postings.


In this table, ‘Overall here’ combines the 12 languages listed while ‘Overall jobs’ reports on all the jobs returned for London at the time of the study.
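The ordering by average salary can be sketched in Python as follows. This assumes each search term comes with its number of postings and total estimated salary. The function name and the demo figures are hypothetical. Dividing by the number of postings also assumes that every posting carries a salary estimate; if the site estimates salaries for only some postings, the denominator would have to change.

```python
def rank_by_average(results: dict[str, tuple[int, float]]) -> list[tuple[str, float]]:
    """Order search terms by average salary = total estimated salary / postings.

    results maps a search term to (number of postings, total estimated salary).
    Terms with no postings are skipped to avoid dividing by zero.
    """
    averages = {
        term: total / count for term, (count, total) in results.items() if count > 0
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts and totals, for illustration only.
demo = {"Dutch": (200, 7_000_000.0), "Italian": (1_000, 25_000_000.0), "Welsh": (0, 0.0)}
for term, avg in rank_by_average(demo):
    print(f"{term}: £{avg:,.0f}")
# Dutch: £35,000
# Italian: £25,000
```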

There are many interesting points here–there does seem to be some value to Dutch, as pointed out by the British Council.  The results for Mandarin are as ever clouded by what you call the language–‘Chinese’ gives a healthier average salary (£27,395) and rather fewer postings (1879).  The low average salary for Polish is presumably down to the kind of work Poles do in London while ‘Italian’ may be referring to restaurants rather than the language, thus depressing the average salary assigned to the term.  The explanation for Japanese might be that all professional-level jobs are filled by native speakers recruited from Japan, leaving only low-paid roles for others.

In general, we see that about 9% of postings mention one of the British Council’s priority languages, and even this will overestimate the number of posts: if, as often happens, an advert mentions ‘knowledge of French, German, Italian, Spanish or Portuguese’, then it will be counted five times.  While there are of course other foreign languages, the share of the London jobs market involving foreign languages can hardly be much more than 10%.
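The double counting can be illustrated by comparing two tallies: the number of (language, advert) mentions, which is what summing per-language searches gives, and the number of distinct adverts mentioning at least one language. This Python sketch uses whole-word matching. The adverts and function names are hypothetical.

```python
import re

LANGUAGES = ["French", "German", "Italian", "Spanish", "Portuguese"]


def mentions(advert: str, language: str) -> bool:
    """True if the advert contains the language name as a whole word."""
    return re.search(rf"\b{language}\b", advert) is not None


def mention_count(adverts: list[str]) -> int:
    """Sum over languages of adverts mentioning it (one advert can count several times)."""
    return sum(1 for lang in LANGUAGES for ad in adverts if mentions(ad, lang))


def distinct_count(adverts: list[str]) -> int:
    """Adverts mentioning at least one of the languages (each advert counted once)."""
    return sum(1 for ad in adverts if any(mentions(ad, lang) for lang in LANGUAGES))


ads = [
    "Knowledge of French, German, Italian, Spanish or Portuguese",
    "Sales assistant, Spanish an advantage",
    "Warehouse operative",
]
print(mention_count(ads), distinct_count(ads))  # 6 2
```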

We can tabulate the overall results here with those derived from some other search terms as below:


The two points here are that the intuitive ordering of subjects and academic qualifications is reproduced and that languages seem to add less value than an unspecified degree.

Are languages important?

September 27, 2017

Never mind which languages; the question is whether any foreign languages are important in the English-speaking world.  After all, if you live in a non-Anglophone country you probably need English both for foreign travel and for doing business with the rest of the world, while for an English speaker the only real need is when you have to sell stuff to foreigners.  And that’s stuff as in goods, since the English language may itself be part of the attraction of services like education.

The CBI Skills Survey for 2017 suggests that employers are not satisfied with graduates’ foreign language skills:


but also do not regard them as particularly important:


…unless of course they come under ‘Degree Subject’…

Available surveys do not really show any particular premium for graduates in foreign languages.  A survey with rather unclear methodology looks at average [mean] starting graduate salaries as at October 2016, with some results we have summarised:


So starting salaries for what seem to be language-based degrees are a little above the average for humanities and a little below the overall average.  By way of comparison, the highest and lowest salaries are shown below:


A more systematic exercise (but with less detailed subject classifications) published by DfE gives median earnings in 2014/15 for those graduating in 2008/09.  As before, we would be hard-pressed to claim a particular premium for Languages:


Finally, what looks like a very thorough study by the IFS is more interested in various factors such as socio-economic background, prior attainment and institution status but gives some rather discrepant information for males and females:



So ‘Lang Lit’ (which must be basically English in terms of numbers) looks like a pretty good deal for women but not for men.

We conclude that there is no real excess demand for graduates in modern foreign languages demonstrated by either employer preferences or salaries achieved…

Teaching important languages

September 26, 2017

British Council ordering

As we have seen, the British Council report Languages for the Future gives a priority ordering of languages as above.

The question then is how this matches up with what is actually taught.  A further British Council report Language Trends 2014 gives the percentage of schools in the state and independent sector where particular languages are taught.


Languages taught in state schools


Languages taught in independent schools

We see that there is no particular sign of Arabic becoming widespread, nor even of Chinese doing so (though that is more common). We presume that ‘Arabic’ is Modern Standard Arabic in all cases and that ‘Chinese’ is Mandarin unless otherwise stated.

We can also look at the numbers of people studying for examinations at various levels.


Numbers studying for various examinations

Here, the school examination numbers refer to the numbers of entries as given on the JCQ site while the ‘Degree’ figures refer to first-year full-time students doing first degrees, as on the HESA site.  Here, in the ‘Degree’ column, we have assigned all of ‘Russian and East European Studies’ to Russian and all of ‘Modern Middle Eastern Studies’ to Arabic.

We can try putting these various activities on a common footing by giving them a weighting based on the amount of time in years they take up (taking account of subsidiary languages/subjects for the Degree column).


Table of weightings
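The weighting converts examination entries and degree students into a common unit of student-years. A minimal Python sketch follows. The weights below are hypothetical stand-ins, because the post’s own table of weightings is not reproduced in the text, and the stage counts are placeholders.

```python
# Hypothetical weights in years of study per entry/student; NOT the post's table.
WEIGHTS = {"GCSE": 2.0, "AS": 1.0, "A level": 2.0, "Degree": 3.0}


def resource_input(numbers: dict[str, int], weights: dict[str, float] = WEIGHTS) -> float:
    """Total student-years: each stage's numbers multiplied by its weight in years."""
    return sum(weights[stage] * n for stage, n in numbers.items())


# Placeholder numbers for one language, for illustration only.
print(resource_input({"GCSE": 1_000, "AS": 300, "A level": 200, "Degree": 50}))  # 2850.0
```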

We would then like to compare the input for various languages with their importance according to the British Council report.  There is no obvious common unit of measurement between these two things, so it seems safest just to compare the rank of the languages according to these two measures. The table below refers.


Comparison of importance according to British Council with resource input, by ranks

On this crude basis, Arabic (especially), Portuguese and Turkish are under-provided, while Polish (heritage speakers) and the traditionally-taught languages French and German may be relatively over-provided, along with Italian.
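The rank comparison can be sketched in Python as the difference between each language’s rank on resource input and its rank on importance. A positive gap means a language ranks lower on resources than on importance (under-provided), and a negative gap means it is over-provided. The languages ‘A’, ‘B’, ‘C’ and their scores are hypothetical.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank 1 = highest score; ties broken by name so the result is deterministic."""
    ordered = sorted(scores, key=lambda lang: (-scores[lang], lang))
    return {lang: i + 1 for i, lang in enumerate(ordered)}


def provision_gap(importance_rank: dict[str, int], input_scores: dict[str, float]) -> dict[str, int]:
    """Input rank minus importance rank (positive = under-provided).

    Every language in importance_rank must also appear in input_scores.
    """
    input_rank = ranks(input_scores)
    return {lang: input_rank[lang] - importance_rank[lang] for lang in importance_rank}


# Hypothetical importance ranks and weighted inputs, for illustration only.
importance = {"A": 1, "B": 2, "C": 3}
weighted_input = {"A": 500.0, "B": 10.0, "C": 900.0}
print(provision_gap(importance, weighted_input))  # {'A': 1, 'B': 1, 'C': -2}
```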

But if you were just interested in studying languages and wanted to know which ones would be most profitable, the obvious course would be to do Spanish at school (which seems quite possible these days) and then Spanish & Portuguese at university.

Which foreign languages are most useful?

September 25, 2017


The picture above (from the WEF site) gives one answer in terms of the most influential languages as reflected by book translations.  There seem to be definite nodes at English, French and Russian, then less clear ones at Dutch, German and Chinese.  But it is hard to give an exact interpretation of this figure, or indeed the other ones displayed at the same place.

Otherwise, the Internet reveals a number of attempts at weighting-and-ranking:


A British Council report on Languages for the Future dating from 2013 takes account of:

1. current UK export trade
2. the language needs of UK business
3. UK government trade priorities
4. emerging high growth markets
5. diplomatic and security priorities
6. the public’s language interests
7. outward visitor destinations
8. UK government’s International Education Strategy priorities
9. levels of English proficiency in other countries
10. the prevalence of different languages on the internet

Their table of English proficiency by country is quite interesting:


From the point of view of importance to Britain, they give a ranking of:


This may well be the answer from the British perspective!


A further study (2016) from the WEF considers languages under the following criteria:

1. Geography: The ability to travel
2. Economy: The ability to participate in an economy
3. Communication: The ability to engage in dialogue
4. Knowledge and media: The ability to consume knowledge and media
5. Diplomacy: The ability to engage in international relations

and comes up with the following results:



List25 gives a list of the world’s 25 most influential languages as of 2014, where the rankings are not just done according to how many people speak the language. Of course this is taken into consideration but so is how many people speak it as a second language, its impact on global commerce and trade, and its lingua franca status around the world.

They have some nice maps, for instance for French:


and come up with a ranking of 1. English 2. French 3. Spanish 4. Arabic 5. Mandarin 6. Russian 7. Portuguese 8. German 9. Japanese 10. Hindustani (Hindi/Urdu) 11.  Malay…


Meanwhile, the CBI Skills Survey for 2017 gives the following:


So, English is clearly the most important/useful/influential language of all times and peoples, and we will set it aside in what follows.

Using the British Council rankings as a starting point, we can summarise the results as below, where languages outside the British Council list are ranked by number of occurrences and then average ranking where listed:


Or we can apply the same procedure for all of the languages that occur more than once without privileging the British Council rankings, so that we rank first by number of occurrences and then average ranking where listed:


So, the world’s second most important language might be French, Spanish or Mandarin.  In fact, the top 5 for the British Council and the combined ranking contain the same languages, if not quite in the same order: French, Spanish and German (the languages most widely taught in British schools) together with Mandarin and Arabic (rarer and more challenging, one might say).
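The two ordering procedures can be sketched in Python. `combined_ranking` orders languages by the number of lists they appear in and then by their average position where listed. `privileged_ranking` keeps a base list (the British Council’s, say) in its own order and appends the remaining languages ordered the same way. The function names and the three sample lists are hypothetical, not the rankings discussed above.

```python
from statistics import mean


def combined_ranking(lists: list[list[str]], min_occurrences: int = 2) -> list[str]:
    """Combine ranked lists (best first): more occurrences first, then lower average position.

    Languages appearing in fewer than min_occurrences lists are dropped.
    """
    positions: dict[str, list[int]] = {}
    for ranking in lists:
        for pos, lang in enumerate(ranking, start=1):
            positions.setdefault(lang, []).append(pos)
    eligible = {lang: p for lang, p in positions.items() if len(p) >= min_occurrences}
    return sorted(eligible, key=lambda lang: (-len(eligible[lang]), mean(eligible[lang]), lang))


def privileged_ranking(base: list[str], lists: list[list[str]]) -> list[str]:
    """Keep the base order, then append the other languages as ordered by combined_ranking."""
    rest = [lang for lang in combined_ranking(lists, min_occurrences=1) if lang not in base]
    return base + rest


# Hypothetical ranked lists, for illustration only.
sample_lists = [
    ["French", "Spanish", "Mandarin"],
    ["Spanish", "French", "Arabic"],
    ["Mandarin", "Spanish", "Russian"],
]
print(combined_ranking(sample_lists))  # ['Spanish', 'French', 'Mandarin']
print(privileged_ranking(["Spanish", "Arabic"], sample_lists))
# ['Spanish', 'Arabic', 'French', 'Mandarin', 'Russian']
```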

Some interesting data

February 7, 2016

We recently came across some data (about a recruitment exercise, as it happens) as below in Excel.


If we look at the second row, then 28 + 28 + 18 + 13 + 24 + 20 = 131, so that’s all right. If we look at the third row, for instance, it’s a little more interesting: 2.5 billion + 2.5 billion + 1.6 billion + 1.6 billion + 2.2 billion + 2.2 billion = 124 smacks of anti-climax.

It’s easy enough to correct by hand: 1,733,333,333 is ‘obviously’ 17.33333333 by comparison with the other figures in the same column. It’s less clear how you could corrupt the data format in this way even if you wanted to.

You can’t even do it by hand in Excel (paste value, remove ‘.’ to get an integer, then format the thousands with commas), because there are more decimal places stored and you end up with (say) 173 billion rather than 1.73 billion.

From the name of the person responsible for the data, it looks like conversion to/from a Polish version of Excel might have something to do with it…In Poland, they use a space as the divider for thousands, which you can see might cause difficulties upon conversion, but I don’t know whether or how that is implemented in Excel.  In other parts of the spreadsheet you certainly have (say) 359,6 for 359.6, so ending up with a comma for the decimal point and for the thousands separator takes some doing.
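The Polish-conversion hypothesis can at least be checked arithmetically. If a Polish-format decimal such as ‘17,33333333’ is read by something that treats the comma as a thousands separator, the digits fuse into one large integer, which then shows UK separators. This Python sketch simulates only that reading and its reversal. It says nothing about what Excel actually does, and the function names are hypothetical.

```python
def misread_polish_decimal(text: str) -> str:
    """Treat the decimal comma as a thousands separator, then redisplay UK-style."""
    value = int(text.replace(",", ""))  # '17,33333333' -> 1733333333
    return f"{value:,}"                 # -> '1,733,333,333'


def recover(garbled: str, integer_digits: int) -> float:
    """Undo the damage, given how many digits belong before the decimal point.

    integer_digits has to come from outside, e.g. by comparison with the
    other figures in the same column.
    """
    digits = garbled.replace(",", "")
    return float(digits[:integer_digits] + "." + digits[integer_digits:])


garbled = misread_polish_decimal("17,33333333")
print(garbled)              # 1,733,333,333
print(recover(garbled, 2))  # 17.33333333
```

With ten digits (of the form 3n+1, with n = 3), the first separator lands after the first digit, which matches the pattern in the spreadsheet.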


It is very odd indeed, isn’t it? If we make your ‘correction’ of 1,733,333,333 = 17.333… then it does fix row 3, but aren’t rows 1 and 4 still out by a factor of 10? It looks as if something decided to put just one figure in front of the first comma (or decimal point) in the recurring decimals, no matter where the decimal point should have been.

When you type 17,3333 and loads of 3s into UK Excel (as if you were converting from Polish, perhaps), you get a big integer value with the commas (to mark thousands) put in the right places, marked off in threes from the right-hand end. Then if we pretend we put in all the threes possible, we would get a big value with all the digits marked off with commas in threes from the right-hand end, and the first comma might come after the first digit, depending on how many digits Excel can hold. If you had a number that filled all those digits, and it got bigger by being added to something else, then Excel might have to lose the extra right-hand digits, and the first comma would still always be in the same place because the digits are marked off three at a time from the right-hand end. I think we almost have it, but I haven’t expressed it very well.

So all the recurring decimals in your sheet were simply converted to huge integer numbers and, because of the number of digits Excel can hold, they all ended up with a comma after the first digit and then another after each set of three.

Yes, that would work, but if I try doing it (starting from 52/3, which is clearly what we have here), I tend to end up with 173 billion; it might be possible to change the number of zeroes Excel stores, I suppose…

But it wasn’t put in as 52/3; it was transferred as 17,33333333 (Polish) and converted to 1,733,333,333 etc. precisely because you can’t change the number of digits Excel can store. The number of digits must be of the form 3n+1, I think.

Sorry if I have become a bit obsessive. It is a lovely problem and much nicer to play with than my PhD! I will go and try and do some writing now though!

And really I don’t think this is how Excel stores integers now; it has 15-digit precision and then a power of 10, but the idea did seem to be almost there … damn, I must stop thinking about it!

There’s certainly something in the handcrafting idea. In another part of the spreadsheet, she’s written 349.265 (or 349,265 in her terms) by just putting a comma in 349265–only a factor of 1000 out this time. I believe the young woman in question is now doing a PhD herself…

A shameful story about obesity

November 1, 2014

The figure above caused some animated discussion on Brockley Central, with many recondite hypotheses being advanced to explain the seeming kinship between South London and unimaginably remote parts of the North.

The first thing to do is to work out what this data is and what it might be telling us.  It certainly looks like Table 7.3, Finished Admission Episodes with a primary diagnosis of obesity, by Government Office Region (GOR) of residence, Strategic Health Authority (SHA) of residence, Primary Care Trust (PCT) of residence and gender, 2012/13 from the data here.  So what are these episodes about?  There are 10,957 of them, and there are 8,024 in Table 7.8 Finished Consultant Episodes with a primary diagnosis of obesity and a main or secondary procedure of ‘Bariatric Surgery’ by Government Office Region (GOR) of residence, Strategic Health Authority (SHA) of residence, Primary Care Trust (PCT) of residence and gender, 2012/13.  While ‘Admission Episodes’ and ‘Consultant Episodes’ aren’t quite the same, 8,024 is about 73% of 10,957, so it’s clear that Table 7.3 is largely about ‘bariatric surgery’, which includes stomach stapling, gastric bypasses and sleeve gastrectomy.  These procedures have traditionally had a fairly marginal place in the NHS, so we suspect that differences in the willingness to perform or to pay for these procedures may be the operative factor here.

There is data specifically on obesity here.  That gives a ‘Top 10’ as follows, which is rather different from the list we started with above–note that the sample for City of London is probably too small to draw definite conclusions.


Area name             Weighted sample   % obese
Halton                            309     35.2%
Barnsley                          609     34.4%
South Holland                     231     32.5%
Mansfield                         274     32.4%
Telford and Wrekin                401     32.3%
North Lincolnshire                424     32.0%
Barking and Dagenham              409     31.6%
East Lindsey                      363     31.6%
Thurrock                          379     31.4%
City of London                     20     31.4%
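The caution about City of London can be made concrete with the usual standard error of a proportion, sqrt(p(1 − p)/n). This Python sketch treats the weighted sample sizes in the table as if they came from simple random samples. That is an assumption, and the survey’s weighting would probably widen the intervals further.

```python
from math import sqrt

# Weighted sample size and proportion obese for two rows of the table above.
ROWS = {"Halton": (309, 0.352), "City of London": (20, 0.314)}


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    return sqrt(p * (1 - p) / n)


for area, (n, p) in ROWS.items():
    se = standard_error(p, n)
    print(f"{area}: {p:.1%} ± {1.96 * se:.1%} (approx. 95% interval)")
# Halton: 35.2% ± 5.3% (approx. 95% interval)
# City of London: 31.4% ± 20.3% (approx. 95% interval)
```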

While not all of the areas in the two datasets are identical, we can make a reasonable job of combining them for London as below.



Any relationship between the two is rather slight, and it does seem that Lambeth, Southwark and Lewisham have high rates of admission for their prevalence of obesity, rather than high obesity as such. It seems reasonable to conclude that we are seeing wide variations in the propensity to subject obesity to hospital treatments, rather than in obesity as such.
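One way to put a number on ‘rather slight’ is a correlation coefficient between prevalence and admission rates across boroughs, together with the ratio of admissions to prevalence as a rough index of the propensity to treat. This Python sketch uses placeholder values for four unnamed boroughs. These are not the London figures, and the function name is hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder values for four unnamed boroughs (not the real London data):
# obesity prevalence (%) and obesity admissions per 100,000 residents.
prevalence = [20.0, 22.0, 24.0, 26.0]
admissions = [40.0, 10.0, 35.0, 15.0]

print(round(pearson(prevalence, admissions), 2))
# Admissions per percentage point of prevalence: a high ratio suggests a high
# propensity to treat rather than unusually high obesity.
for p, a in zip(prevalence, admissions):
    print(round(a / p, 2))
```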