An amusing PCA

Probably need to click on this to see properly

Need to left-click on this I think

Well, this is interesting, though you probably need to click on it to see very much.  Here we have a PCA plot of the data from the 25 books read by Try Books! for which I could find ‘Text Stats’ data on Amazon.com.  In more detail, this is a plot of the second principal component against the first principal component.  The items in black are the books, while those in red are the variables.  The variables in upper case were defined by me, while those largely in lower case come from Amazon.com.
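For anyone wanting to try this at home, here is a minimal sketch of how the two sets of coordinates in such a plot (points for the books, arrows for the variables) can be computed. It is not the procedure actually used for the plot above: the function name `pca_biplot_coords` is made up, the data are random numbers standing in for the 25 books, and I am assuming the variables are standardised before the PCA, which the post does not say.

```python
import numpy as np

def pca_biplot_coords(X, n_components=2):
    """Return (scores, loadings, explained) for a PCA of the columns of X.

    Assumption: columns are standardised first (a correlation-matrix PCA).
    """
    X = np.asarray(X, dtype=float)
    # Centre each variable and scale it to unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardised data; singular values come back in descending order.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # one row per book (the points)
    loadings = Vt[:n_components].T                    # one row per variable (the arrows)
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Made-up data: 25 "books" by 4 "variables", with column 1 built to be
# almost exactly the opposite of column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 4))
X[:, 1] = -X[:, 0] + 0.1 * rng.normal(size=25)

scores, loadings, explained = pca_biplot_coords(X)
# Arrows for strongly opposed variables should point in opposite directions,
# so the dot product of their loading rows should be negative.
opposed = float(loadings[0] @ loadings[1])
```

Plotting `scores[:, 1]` against `scores[:, 0]` gives the points, and drawing an arrow from the origin to each row of `loadings` (usually rescaled to fit) gives the variables.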

G10 (measuring whether the group liked it) is very close to Flesch (a measure of simplicity).  Naturally enough, the measures of complexity Fog, Syllables (per word), Complex (percent of complex words), Flesch_K(incaid) and Words_S (words per sentence) point in the opposite direction.  So does Words_10k (number of words in units of 10,000), which is right on top of Fog.
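It is no great surprise that Flesch points one way and the complexity measures the other, given how they are defined. The sketch below uses the standard published formulas for Flesch Reading Ease, Flesch-Kincaid grade level and the Gunning Fog index. Whether Amazon.com computed its figures in exactly this way is an assumption on my part, and the two sample texts are invented.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    # Higher score = easier text; both inputs pull the score down.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    # Higher score = harder text; the same two inputs push the grade up.
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

def gunning_fog(words_per_sentence, percent_complex_words):
    # percent_complex_words is a percentage (0-100), not a fraction.
    return 0.4 * (words_per_sentence + percent_complex_words)

# Two invented texts: short sentences and short words versus the reverse.
simple = {"wps": 10.0, "spw": 1.3, "complex_pct": 5.0}
dense = {"wps": 25.0, "spw": 1.7, "complex_pct": 18.0}

for name, t in (("simple", simple), ("dense", dense)):
    print(
        name,
        round(flesch_reading_ease(t["wps"], t["spw"]), 2),
        round(flesch_kincaid_grade(t["wps"], t["spw"]), 2),
        round(gunning_fog(t["wps"], t["complex_pct"]), 2),
    )
```

Since Flesch and Flesch_K(incaid) are built from the same two ingredients with opposite signs, they are bound to be strongly negatively related across any set of books.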

The arrow ENG (saying whether the book was originally written in English) points away from KILL and SEX, while the arrow HT (representing whether HT liked it, whoever that might be) points along SEX, but not so far.

In fact G10 points directly away from Flesch_K(incaid), which is a measure of grade level (or reading age, in British terms).

Everybody has got what they deserved, and especially me, as Tsar Nikolai I remarked after the opening night of Gogol’s Government Inspector.
