Sunday 1 February 2015

Text Analysis


Before attending last Monday’s class and seeing Adam Hammond’s text analysis presentation, I was frankly skeptical about how I would use such techniques. The first week’s readings covered the text analysis of the Bible as the first real attempt at digitally analyzing text, and the process struck me as extremely boring and very limited in how one could use the data it produced. However, after seeing Adam Hammond’s presentation and trying out some of the newer text analysis tools, I can clearly see some real value in them.

I first tried some of the tools on random bits of text from a variety of sources, and then on a document from a fourth-year rural history seminar I took last semester. For that document I used the word cloud tool Wordle. The document was the immigrant diary of Ben Freure, an English settler who traveled to the township of Eramosa in 1836 with his family. The diary covers approximately six years of Ben’s life with his three adult sons, from the time he leaves England to when he has established his farm and home in Canada.[i] I wanted to see how accurate a word cloud would be at identifying the core ideas or subject of this text. Before running the analysis, I predicted that the names Augustus, Felix, and Caesar would appear with a great deal of regularity, as these are the names of Ben’s adult sons and often the focus of his diary entries, in which he regularly recorded their daily labor. I also expected to see words relating to farm work, such as plowing or logging.

I then put Ben’s diary through the Wordle program and got the following result.

The first thing I noticed was that the program picked up the names of days and months and displayed an extremely large “day” in the middle of the word cloud. At first, I thought this made the analysis less useful because it surfaced date-related words, but then I realized that they are actually fairly important. For example, these words make it clear to anyone who hasn’t read the text that it is a diary organized around dated entries. Additionally, if one knows the diary, one can see that Sunday comes up more often than other days because more entries were written then, presumably because it was the day when Ben had the time to write. My prediction also held: the names of Ben’s sons come up with a great deal of regularity, while neither Ben’s name nor his wife’s comes up at all. All of this gives a first-time reader a broad understanding of the diary, but clearly not the specifics or much context. Even so, a word cloud seems like a neat way to present a splash page of raw information to first-time readers or viewers, giving them a very quick and broad understanding of a much larger text.
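For anyone curious about what sits behind a result like this, a word cloud is essentially a picture of word counts. The sketch below is my own rough illustration in Python, not Wordle's actual code. It assumes that common filler words are dropped and the rest are simply counted. The function name `top_words`, the tiny stop-word list, and the sample sentence are all made up for the example, and the sample is not a quotation from the diary.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "on",
              "with", "was", "is", "it"}


def top_words(text, n=10):
    """Return the n most frequent words in text, ignoring case and stop words."""
    # Lowercase the text and pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the stop-word list.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Invented sample text in the style of a diary entry, not taken from the source.
sample = ("Sunday. A fine day. Felix logging. Monday. Wet day. "
          "Augustus and Felix plowing all day.")

print(top_words(sample, 3))
```

Even in this toy sample, "day" rises to the top because every entry mentions it, which is the same effect that put a large “day” in the middle of the cloud.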




A short overview of Ben Freure and his family.
