In previous posts, I studied some metrics of modern philosophy with computational tools such as word frequency counts, tf-idf analysis and even topic modelling with the LDA algorithm. That series of analyses produced not just the posts (three so far) but also a paper which is going to be published soon. Today, I want to write about what should be the continuation of that research. At the least, I will give a preview, hoping I will have the time and the inspiration to do it properly, if I can, in another paper.

Text mining approaches

As many people probably know, not all computational text analysis is the same. There are different approaches. Basically, we can distinguish three levels (you will find more information in the latest publication of Hvitfeldt and Silge (2020)):

  • exploratory analysis, where you calculate relations between words; this includes stopword removal, n-gram analysis, tf-idf measurements or even sentiment analysis.
  • within the first one, a variant where you work not with the words (or tokens) themselves but with their stems (stemming). In this case, you use predefined rules, which are different for each language. These rules let you treat variations of number, gender and tense of a word as the same token. Examples are removing the final “s” or “es” of a word, or the “tation” ending, among many others.
  • one step beyond the second approach, where we use not mechanical rules but semantic detection of the common token behind the variations of a word. That implies more sophisticated work, based entirely on philological studies: we need a dictionary of words, and we compare it with our corpus to find the roots of our words. As you can imagine, this has a high computational cost. In return, you gain not just accuracy but also much more information to analyze. For example, you get the part-of-speech (POS) tag of each word.
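
To make the difference between the second and third levels concrete, here is a minimal Python sketch. It is only an illustration, not the method used in this post: the suffix rules and the mini-dictionary are hypothetical, and real stemmers and lemmatizers use far richer, language-specific resources.

```python
# Toy illustration only: the rules and the dictionary below are invented.

# Hypothetical suffix rules, tried in order (longest first).
SUFFIX_RULES = ["tation", "es", "s"]

# Hypothetical mini-dictionary mapping a word form to (lemma, POS tag).
LEMMA_DICT = {
    "is": ("be", "AUX"),
    "was": ("be", "AUX"),
    "wants": ("want", "VERB"),
    "ideas": ("idea", "NOUN"),
}


def stem(word: str) -> str:
    """Rule-based level: strip the first matching suffix.

    A suffix is only removed if at least three characters remain.
    """
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> tuple:
    """Dictionary-based level: look the form up and return (lemma, POS).

    Unknown words fall back to the word itself with the tag "X".
    """
    return LEMMA_DICT.get(word, (word, "X"))


for w in ["ideas", "wants", "was", "presentation"]:
    print(w, "| stem:", stem(w), "| lemma and POS:", lemmatize(w))
```

The mechanical rules cannot relate an irregular form such as “was” to “be”, while the dictionary lookup can, and it returns a POS tag as well. The price is that somebody has to build and maintain the dictionary.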

The last method belongs to Natural Language Processing (NLP) analysis, which is more suitable for academic purposes. That’s also the method I am going to present in this post. For this purpose, I will use the udpipe package (Wijffels, 2019).

Practical case: Hegel vs. Nietzsche

If in previous posts I analyzed which words identify each philosophical work, in this one, from the point of view of NLP analysis, I will focus more on the types of words and the philosophical structure of the texts. One of the first features you can observe is the sentence composition of the works. Mainly, we can look at sentence length, as shown in the next plot. As you can see, Hegel’s work has much longer sentences than Nietzsche’s. The mean sentence length of the first is 33.2, while the second has a mean of 19.1. This difference should give us an idea of how different the styles are; one could even say they are opposite styles.

Graphic 1: sentence length variation in the works of Hegel and Nietzsche.
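
For readers who want to reproduce this kind of measurement, here is a minimal Python sketch of how a mean sentence length can be computed from annotated tokens. It is not the code behind the plot. The rows are invented toy data, the column layout (doc_id, sentence_id, token, upos) is an assumption modelled on the kind of table an annotation tool returns, and whether punctuation counts as part of the length is also an assumption, exposed here as the `skip_punct` parameter.

```python
from collections import Counter
from statistics import mean

# Invented toy data: (doc_id, sentence_id, token, upos).
ROWS = [
    ("doc_a", 1, "The", "DET"),
    ("doc_a", 1, "spirit", "NOUN"),
    ("doc_a", 1, "knows", "VERB"),
    ("doc_a", 1, "itself", "PRON"),
    ("doc_a", 1, ".", "PUNCT"),
    ("doc_a", 2, "It", "PRON"),
    ("doc_a", 2, "is", "AUX"),
    ("doc_a", 2, ".", "PUNCT"),
    ("doc_b", 1, "I", "PRON"),
    ("doc_b", 1, "want", "VERB"),
    ("doc_b", 1, "!", "PUNCT"),
]


def mean_sentence_length(rows, skip_punct=True):
    """Return {doc_id: mean number of tokens per sentence}."""
    # Count the tokens of every (document, sentence) pair.
    per_sentence = Counter()
    for doc_id, sentence_id, _token, upos in rows:
        if skip_punct and upos == "PUNCT":
            continue
        per_sentence[(doc_id, sentence_id)] += 1

    # Group the sentence lengths by document and average them.
    per_doc = {}
    for (doc_id, _sentence_id), length in per_sentence.items():
        per_doc.setdefault(doc_id, []).append(length)
    return {doc_id: mean(lengths) for doc_id, lengths in per_doc.items()}


print(mean_sentence_length(ROWS))
print(mean_sentence_length(ROWS, skip_punct=False))
```

Note that with `skip_punct=True` a sentence made only of punctuation disappears from the average, so the choice affects both the lengths and the number of sentences.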

This analysis can go further if we look not just at the sentences themselves but at the kinds of words that make them up. In the NLP approach, these are called POS tags. In graphic 2, I have plotted the result of exploring both works. As we can see, Hegel’s work has many more determiners and adpositions, a result of the complex sentences he uses. Nietzsche, in contrast, needs more punctuation signs. However, what draws the most attention is that, even with shorter sentences, Nietzsche’s work has noticeably more verbs. This could make us think about which work has more movement, even though it is Hegel’s work that tries to conceptualize movement.

Graphic 2: types of words in each work, counted relative to their average.
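
One possible way to obtain such relative counts is sketched below in Python. This is my reading of “relative to their average”, stated as an assumption: first compute the share of each POS tag within each work, then subtract the average share of that tag across the works. The tag lists are invented toy data and the function names are only illustrative.

```python
from collections import Counter

# Invented toy data: the sequence of POS tags found in each document.
TAGS_BY_DOC = {
    "doc_a": ["DET", "NOUN", "ADP", "NOUN"],
    "doc_b": ["VERB", "NOUN", "VERB", "PUNCT"],
}


def pos_shares(tags):
    """Return the share of each POS tag within one document."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def relative_to_average(tags_by_doc):
    """Return, per document, each tag's share minus the mean share across documents."""
    shares = {doc: pos_shares(tags) for doc, tags in tags_by_doc.items()}
    all_tags = sorted({tag for s in shares.values() for tag in s})
    # A tag that is absent from a document has a share of 0 there.
    average = {
        tag: sum(s.get(tag, 0.0) for s in shares.values()) / len(shares)
        for tag in all_tags
    }
    return {
        doc: {tag: s.get(tag, 0.0) - average[tag] for tag in all_tags}
        for doc, s in shares.items()
    }


for doc, values in relative_to_average(TAGS_BY_DOC).items():
    print(doc, values)
```

Positive values mean the work uses that kind of word more than the average of the works compared, negative values mean it uses it less. Working with shares instead of raw counts keeps works of different lengths comparable.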

Finally, we can go a little deeper and analyze which words appear under each of the POS tags detected. I have selected some of the POS tags I thought were most relevant but, of course, you can select other ones. In this case, we can see how much Hegel uses the verb “to be”, in contrast to Nietzsche, who uses the verb “want” (excuse me, because I did this analysis in Spanish 😁). Or, if you look at the nouns, you also confirm Hegel’s orientation towards immaterial and abstract things, in contrast to the practical, directed and emotive ones of Nietzsche.

Graphic 3: Concrete words for some POS tags.
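
A ranking like this can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions: the (lemma, POS tag) pairs are invented, and the function name and the choice of tags are hypothetical, not taken from the analysis behind the plot.

```python
from collections import Counter, defaultdict

# Invented toy data: (lemma, upos) pairs for one document.
PAIRS = [
    ("ser", "AUX"),
    ("espíritu", "NOUN"),
    ("querer", "VERB"),
    ("ser", "AUX"),
    ("querer", "VERB"),
    ("vida", "NOUN"),
    ("vida", "NOUN"),
    ("decir", "VERB"),
]


def top_lemmas_by_pos(pairs, tags, n=3):
    """Return the n most frequent lemmas for each selected POS tag."""
    counts = defaultdict(Counter)
    for lemma, upos in pairs:
        if upos in tags:
            counts[upos][lemma.lower()] += 1
    # Tags that never occur get an empty list.
    return {tag: counts[tag].most_common(n) for tag in tags}


for tag, ranking in top_lemmas_by_pos(PAIRS, ["NOUN", "VERB", "ADJ"]).items():
    print(tag, ranking)
```

Counting lemmas instead of raw tokens is what lets all the conjugated forms of a verb add up under a single entry.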