Serendip is an independent site partnering with faculty at multiple colleges and universities around the world. Happy exploring!

Word Cloud analysis

merlin's picture


In class, we discussed different types of information and languages. Anne asked, "How important (essential??) is inefficiency, incompleteness, lack of understanding, to our both 'slowing down' and 'staying connected'?" (e.g., the QWERTY phenomenon). Some of us seemed to think that there is a zero-sum trade-off between efficiency and understanding when it comes to recording what we say with technology.

Grobstein's quote raises a related question: if there are highly unambiguous "languages" (mathematics, computer languages), why is ordinary language so ambiguous in interpretation? Does that ambiguity serve to facilitate discussion? Maybe ambiguity even evokes new knowledge.

So does sending information via computer or text increase the understanding factor but decrease the efficiency? Some people tended to agree, while others seemed to disagree. But I'd like to look at this from another direction. Maybe the act of recording information itself inherently increases understanding, without even having to take into account other "types" of language (like computer language). When we record information, we have the power to analyze it beyond just that particular moment: we can go back, look at it, and pick it apart. When we send a text message, the reader can go back months later to look at it, and we can never take back what we said. With spoken language, its intangible nature makes it highly ambiguous, because what is said can be forgotten or misunderstood with no ability to review it. I believe it is the recording of information that often slows down its transmission. But it is the ability to review it later which increases it unabiguity.


To exercise this revision process, I will do a study of our gender and technology language, using technology to see how our class discussions have evolved over time and pulling from several types of keywords. I'm looking for trends in the class's use of highly specified language.

Quarter 1

Quarter 2

To make these "word clouds," I used the online software "Wordle" to create a visual illustration of word frequencies. The larger the word, the more frequently we used it. For the semester, I accumulated over 300 single-spaced document pages… that's a lot of words!
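For readers who want to see the counting step behind a word cloud, here is a minimal sketch in Python. It is not how Wordle itself works internally (that is not described here); it only shows one simple way to turn text into word frequencies. The function name `word_frequencies` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case and punctuation."""
    # Words are runs of letters, optionally with one internal apostrophe (e.g. "we're").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


# Illustrative sample only, not text from the class posts.
sample = "Technology shapes gender, and gender shapes technology."
counts = word_frequencies(sample)

# The most frequent words would be drawn largest in a word cloud.
print(counts.most_common(3))
```

A word cloud tool would then map each count to a font size, so that the words with the highest counts appear largest.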



Some of the words I'll be paying a lot of attention to...

Here are some of the stand-out words used with the highest frequencies in our posts. They are in approximate order starting from most frequent.

MALE and FEMALE: Did we go back to following the male-female binary (which we criticized at the beginning of the semester)?

It is interesting that "women" is one of the more frequent words. At the beginning of the semester we were trying to dispel gender binaries, but it seems that later on we started using them again. The term "woman" was not used with this high a frequency in the first half of the semester.


The first half of the semester has a higher frequency of the words "Class" and "Reading," which makes me think that as the semester progressed, we expanded our scope beyond the classroom and the readings. I believe this is an evolution of sorts that points to the broadening of our perspectives.

Next, I made another set of word clouds by removing "technology" and "information" from each (used close to 400 times each). These two words had a high frequency in both quarters, so removing them makes it easier to see the differences, rather than the similarities, between part one and part two of this class.
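The removal step described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the procedure actually used with Wordle: the helper name `drop_words` is hypothetical, and the counts below are toy numbers invented for the example, not the class data.

```python
from collections import Counter


def drop_words(counts, words_to_drop):
    """Return a copy of the counts without the listed words."""
    dropped = {word.lower() for word in words_to_drop}
    return Counter({word: n for word, n in counts.items() if word not in dropped})


# Toy counts for illustration only (not the real frequencies from the posts).
quarter = Counter({"technology": 40, "information": 39, "gender": 12, "world": 8})

revised = drop_words(quarter, ["technology", "information"])
print(revised.most_common())
```

With the two dominant words gone, the remaining words are the ones that determine the relative sizes in the revised cloud.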


Quarter 1 (revised)

Quarter 2 (revised)

COMPARISON between revised first and second quarter word clouds…

I find it interesting how "world" and "people" are prominent in the second quarter, which seems to show that our vision as a class has expanded to encompass broader categories.


In the first quarter, we were discussing binaries a lot, so our references to "gender" and "categories" make a lot of sense. People seemed to really respond to music as a form of information transmission, I'm thinking because of our lecture with Professor Tian Hui Ng. I see that the word "think" has dropped down the list from the first to the second quarter. This makes me wonder whether we as a class were less uncertain of our presumptions as the semester progressed. This assumes that we used "think" in the context of "I think X_______," signaling uncertainty or a lack of confidence in our statements. There are also lower-frequency words in the first quarter that tend to suggest the same uncertainty, such as "thought," "possible," "feel," "question," and "may." In the second quarter there are fewer of these discourse markers, and more of a trend towards "know," "intent," and "see."
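The comparison above was made by eye, but the same two checks (where a word sits in each quarter's list, and how much of the text consists of uncertainty markers) could be sketched in Python as follows. The function names `rank_of` and `hedge_share` are hypothetical, the list of markers is taken from the paragraph above, and the counts are toy numbers invented for the example rather than the real class data.

```python
from collections import Counter

# Uncertainty markers named in the text above.
HEDGES = ["think", "thought", "possible", "feel", "question", "may"]


def rank_of(word, counts):
    """Return the 1-based rank of a word by frequency, or None if it is absent."""
    ordered = [w for w, _ in counts.most_common()]
    return ordered.index(word) + 1 if word in ordered else None


def hedge_share(counts, hedges=HEDGES):
    """Return the fraction of all counted words that are uncertainty markers."""
    total = sum(counts.values())
    return sum(counts[w] for w in hedges) / total if total else 0.0


# Toy counts for illustration only.
q1 = Counter({"think": 30, "gender": 20, "know": 5})
q2 = Counter({"know": 25, "people": 20, "think": 10})

print(rank_of("think", q1), rank_of("think", q2))
print(hedge_share(q1), hedge_share(q2))
```

A falling rank or a falling share would be consistent with the reading above, though it would not show by itself that each "think" was actually used to signal uncertainty.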

I would also consider the frequency of "reading" in the first quarter. It seems like most of the time this word was used in reference to our class "readings" as opposed to the "reading" or interpretation of a story. I believe this might also suggest that our class initially kept its focus within the realm of the articles, making lots of references to "the readings," and only later looked outside them to make interpretations about the outside world from what we had learned. Both quarters have a high frequency of the word "people." I found this interesting because I thought that "cyborg" would appear with pretty high frequency. It seems that we consistently think about gender and technology in the context of people, as much as we tried to expand our viewpoints beyond the human realm (to cyborgs, for example).

Not only did we use "people" a lot in the second quarter, we also used "human," possibly to differentiate between gender and technology themes in the cyborgian context. But we still liked to reference humans a lot for some reason.

In the first quarter, we seemed to use a lot of terms in reference to the human state and states of being (sexuality, children, students, society, body, women, natural, exist, name, thinking, female, someone, died, learn, sex, life, and brain).

However, in the second quarter we made references to types of technology and the people who use it (internet, gamers, computer, video, text, film, book, online, virtual, Facebook). This shift is perhaps one of the more interesting observations, and it might also have to do with what was being discussed in the second versus the first quarter.


The theme of this project has been to use technology to understand the group's perspective and the evolution of this perspective. I'm sure one could notice a bunch of other interesting differences and similarities between my word clouds, so any commentary in this respect would be appreciated!

My very own interpretation of these word clouds has probably been influenced by the random assortment of the words that was all created by computer (binary, code), and the ways in which the words are dispersed may have made me notice some words and disregard others. I would even venture to say that my reading of these word clouds might have been influenced by the characteristics of Hyper-reading. I sometimes found myself focusing on the top left corner of the images and ignoring the bottom right in my word-search. There is also a sort of availability heuristic, in which I probably picked out those words that paralleled my theory at that moment.

Even though technology did a great deal of the sorting and calculated frequencies in a matter of seconds rather than a matter of hours (which it would have taken me by hand), I still found it necessary to do some tweaks manually to make sure my data came out the way I wanted it (for example, removing all the extraneous names, times, profile pictures and dates that came before each post). It kind of goes to show that computer production is only as good as what we put into it. My word clouds certainly wouldn't have looked as "pretty" had I not filtered out all the irrelevant information from the Serendip page.

But by far the most powerful element that made this project possible was the notion that recorded information allows for revision which auditory processing alone could not have allowed for. With my creation of the word cloud from our Serendip records, I was able to form interpretations and meta-analyze what had been said months ago by fellow students. So I argue that this exercise has drastically increased my understanding of the big picture of our class discussions, even if typing out the information potentially took longer for each individual than speaking would have.
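The manual cleanup mentioned above (removing names, times, profile pictures and dates before each post) was done by hand, but a similar filter could be sketched in Python. This is only an illustration under stated assumptions: the "…'s picture" lines do appear on the page, while the "Submitted by … on …" byline pattern is a hypothetical stand-in, since the exact format of the post headers is not shown here. The function name `strip_post_metadata` is also made up.

```python
import re

# Lines such as "merlin's picture" that accompany each poster's avatar.
PICTURE_LINE = re.compile(r"^.+'s picture$")

# Hypothetical byline format (an assumption; adjust to the real page layout).
BYLINE = re.compile(r"^Submitted by .+ on .+$")


def strip_post_metadata(raw):
    """Drop avatar and byline lines so that only the post text is counted."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if PICTURE_LINE.match(stripped) or BYLINE.match(stripped):
            continue
        kept.append(line)
    return "\n".join(kept)


# Illustrative input only; the byline below is invented for the example.
raw_page = "merlin's picture\nSubmitted by merlin on some date\nWord Cloud analysis"
print(strip_post_metadata(raw_page))
```

Names, times and dates that appear inside the body of a post would need their own rules, which is part of why some hand-tweaking remains necessary.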



Anne Dalke's picture

Reading Reductively


What a delightful project!

Three students in my other class also created word clouds for their final projects. Tangerines used hers to compare my class notes with her own from the first and last weeks. In light of your ideas about "transmission" of information, this turned out to be a VERY interesting pedagogical experiment! AnnaP and cr88 had a different axis of comparison for their project: they used word clouds to analyze the full texts of a scientific text and a literary one, and were surprised to see how full of "uncertainty" both clouds were.

Your particular angle of analysis, that "recorded information allows for revision which auditory processing alone could not have," enabling you "to form interpretations and meta-analyze what had been said months ago by fellow students," is a convincing demonstration that, as you claim, "the act of recording information [can] increase understanding," because it offers "the ability to review it later," which (you also claim) "increases it unabiguity." (An interestingly ambiguous typo, that!) Increases unambiguity, you mean? But doesn't your act of interpretation, while reducing some ambiguities, create others?

What we discussed during the presentations in the other class is relevant here: our conversation had to do with the usefulness of this sort of "reductionism," commonly associated with science (think of the reductions of chemists, for instance): what does it accomplish, and what are the limits of this sort of methodology? What does it make clear, and what does it further obscure or bring into question?