Some interesting data

We recently came across some data (about a recruitment exercise, as it happens) as below in Excel.

corrupdata

If we look at the second row, then 28 + 28 + 18 + 13 + 24 + 20 = 131, so that’s all right. If we look at the third row for instance, it’s a little more interesting–2.5 billion + 2.5 billon + 1.6 billion + 1.6 billion + 2.2 billion + 2.2 billion = 124 smacks of anti-climax.

It’s easy enough to correct by hand: 1,733,333,333 is ‘obviously’ 17.33333333 by comparison with the other figures in the same column. It’s less clear how you could corrupt the data format in this way even if you wanted to.

You can’t even do it by hand in Excel: paste value, remove ‘.’ to get integer, then format the thousands with commas, because there are more decimal places stored and you end up with (say)173 billion rather than 1.73 bill.

From the name of the person responsible for the data, it looks like conversion to/from a Polish version of Excel might have something to do with it…In Poland, they use a space as the divider for thousands, which you can see might cause difficulties upon conversion, but I don’t know whether or how that is implemented in Excel.  In other parts of the spreadsheet you certainly have (say) 359,6 for 359.6, so ending up with a comma for the decimal point and for the thousands separator takes some doing.

Discussion 

It is very odd indeed isn’t it? If we make your ‘correction’ of 1,733,333,333 = 17.333… then it does fix row 3 but aren’t rows 1 and 4 still out by a factor of 10? It looks as if something decided to put just one figure in front of the first comma (or decimal point) in the recurring decimals, no matter where the decimal point should have been.
You can’t even do it by hand in Excel: paste value, remove ‘.’ to get integer, then format the thousands with commas, because there are more decimal places stored and you end up with (say)173 billion rather than 1.73 bill. In Poland, they use a space as the divider for thousands, which you can see might cause difficulties upon conversion, but I don’t know whether or how that is implemented in Excel.

When you type 17,3333 and loads of 3s into UK excel (as if you were converting from Polish perhaps) you get a big integer value with the commas (to mark 1000s) put in the right place marking off in threes from the right hand end. Then if we pretend we we put in all the threes possible, we would get a big value with all the digits marked off with commas in threes from the right hand end and the first comma might come after the first digit depending how many digits excel can hold. If you had a number that filled all those digits, and it got bigger by being added to something else, then excel might have to lose the extra right hand digits and the first comma would still always be in the same place because of being marked off three at a time from the right hand end. I think we almost have it but I haven’t expressed it very well.

So all the recurring decimals in your sheet were simply converted to huge integer numbers and, because of the number of digits Excel can hold, they all ended up with a comma after the first digit and then another after each set of three.

Yes that would work but if I try doing it (starting from 52/3, which is clearly what we gave here), I tend to end up with 173 billion–it might be possible to change the number of zeroes Excel stores I suppose…

But it wasn’t put in as 52/3, it was transfered as 17,33333333 (Polish) and converted to 1,733,333,333 etc precisely because you can’t change the number of digits Excel can store. It must be of the form 3n+1, I think.

Sorry if I have become a bit obsessive. It is a lovely problem and much nicer to play with than my PhD! I will go and try and do some writing now though!

And really I don’t think this is how Excel stores integers now – it has 15 digit precision and then a power of 10, but the idea did seem to be almost there … damn I must stop thinking about it!

There’s certainly something in the handcrafting idea. In another part of the spreadsheet, she’s written 349.265 (or 349,265 in her terms) by just putting a comma in 349265–only a factor of 1000 out this time. I believe the young woman in question is now doing a PhD herself…

Tags: , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: