## More about ‘Try Books!’ books

Following on from the previous question, we can consider how the books read by ‘Try Books!’ are related to each other: can they be divided into groups such that the members of a group are like each other and unlike the members of other groups?

The data is limited, but we can try the following approach. Imagine books X and Y. Person A marks X as 5 and Y as 7; person B marks X as 8 and Y as 9; person Z marks them both as 6. Then we can define a ‘squared distance’ between the books as (5 – 7)^2 + (8 – 9)^2 + (6 – 6)^2 = 4 + 1 + 0 = 5. Since there are 3 observations here, we can derive an ‘average distance’ as the square root of the mean squared difference, (5/3)^0.5 = 1.29. Dividing by the number of observations means that pairs of books marked by different numbers of people can still be compared. We can derive ‘average’ distances between all the books assessed in the same way, and apply some standard techniques for dealing with such distances.
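The worked example can be sketched in a few lines of Python. This is only an illustration, not the code behind the analysis: the function name `rms_distance` and the layout of the marks (a dictionary from person to mark for each book) are assumptions, as is the choice to use only the people who marked both books.

```python
import math

def rms_distance(marks_x, marks_y):
    """Root-mean-square difference between two books' marks.

    marks_x and marks_y map a person's name to their mark.
    Only people who marked both books are used (an assumption).
    Returns None if nobody marked both books.
    """
    shared = set(marks_x) & set(marks_y)
    if not shared:
        return None
    squared = sum((marks_x[p] - marks_y[p]) ** 2 for p in shared)
    return math.sqrt(squared / len(shared))

# The example from the text: persons A, B and Z mark books X and Y.
marks_x = {"A": 5, "B": 8, "Z": 6}
marks_y = {"A": 7, "B": 9, "Z": 6}
print(round(rms_distance(marks_x, marks_y), 2))  # 1.29
```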

One approach is to use clustering (agglomerative hierarchical clustering in this case) to produce a dendrogram as above. The most obvious feature is that the main branching is between The Resurrectionist and the rest. Perhaps this is not too surprising.
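For readers unfamiliar with the technique, agglomerative clustering starts with every book in its own cluster and repeatedly merges the two closest clusters; the heights of the merges give the branches of the dendrogram. The sketch below is a minimal pure-Python version under stated assumptions: it uses average linkage (the post does not say which linkage was used), the function name `average_linkage` is hypothetical, and the books P, Q, R, S and their distances are invented for illustration.

```python
from itertools import combinations

def average_linkage(distances, labels):
    """Agglomerative clustering with average linkage.

    distances maps a pair of labels (in either order) to a distance.
    Returns the merges in order as (cluster_a, cluster_b, height).
    """
    def dist(a, b):
        # Look the pair up in whichever order it was stored.
        return distances.get((a, b), distances.get((b, a)))

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist(a, b) for a, b in pairs) / len(pairs)

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Invented distances between four hypothetical books.
example = {
    ("P", "Q"): 1.0, ("R", "S"): 1.5,
    ("P", "R"): 4.0, ("P", "S"): 4.5,
    ("Q", "R"): 4.2, ("Q", "S"): 4.1,
}
merges = average_linkage(example, ["P", "Q", "R", "S"])
for left, right, height in merges:
    print(left, right, round(height, 2))
```

In practice a library routine would normally be used instead; SciPy, for instance, provides `scipy.cluster.hierarchy.linkage` and `dendrogram` for this purpose.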

Another approach is to use multidimensional scaling, where we try to place the points on two dimensions in an arrangement that is consistent with the distances we have derived between them. This time, not only is The Resurrectionist on its own in a corner, but Pride and Prejudice and Zombies doesn’t have any friends either. There are some clusters to be seen, but it’s not clear how many of them have a natural interpretation. It’s interesting to see that If This Is A Man and The Master and Margarita have ended up together, given that they both feature on, for instance, Le Monde’s 100 Books of the Century.
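The idea of placing points so that their separations match a set of target distances can be sketched as follows. This is a simple gradient-descent version of metric multidimensional scaling, which is not necessarily the method used for the plot here; the function name `mds_2d`, the step size, the number of steps, and the three hypothetical books P, Q, R with their distances are all assumptions made for illustration.

```python
import math
import random

def mds_2d(distances, labels, steps=2000, rate=0.05, seed=0):
    """Place labels in 2D so pairwise distances approximate the targets.

    distances maps each pair of labels (listed once) to a target distance.
    Minimises the sum of squared differences between embedded and target
    distances by plain gradient descent from a random start.
    """
    rng = random.Random(seed)
    pos = {label: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for label in labels}
    for _ in range(steps):
        grads = {label: [0.0, 0.0] for label in labels}
        for (a, b), target in distances.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            current = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            scale = 2 * (current - target) / current
            grads[a][0] += scale * dx
            grads[a][1] += scale * dy
            grads[b][0] -= scale * dx
            grads[b][1] -= scale * dy
        for label in labels:
            pos[label][0] -= rate * grads[label][0]
            pos[label][1] -= rate * grads[label][1]
    return pos

# Invented distances for three hypothetical books (a 3-4-5 triangle,
# which can be reproduced exactly in two dimensions).
targets = {("P", "Q"): 3.0, ("Q", "R"): 4.0, ("P", "R"): 5.0}
coords = mds_2d(targets, ["P", "Q", "R"])
for (a, b), target in targets.items():
    print(a, b, target, round(math.dist(coords[a], coords[b]), 3))
```

With real data the distances usually cannot be matched exactly in two dimensions, so the plot is a compromise, and its orientation is arbitrary. A library implementation such as scikit-learn's `sklearn.manifold.MDS` would be the more usual choice.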
