This is a post to share some observations on language according to Mark and to explore the Gospel According to Saint Mark. It is a response to some discourse happening on the national stage and a follow-up to some work I’ve been doing on embedding charts and displaying language data.


In every language that permits it, by grammar not by governance, The Bible has pronouns. I repeat: The Bible has pronouns. To suggest otherwise is to deny millennia-old facts. What’s really happening is that the linguistic presence of pronouns is being used in a proxy battle against modern pronoun usage, specifically among LGBTQI+ communities. These veiled attacks not only deny facts, they deny individuals the right to free expression.

So, curious not about whether there are pronouns in The Bible but rather how very many there are, I set to pulling some .txt files and working through a process that I had done with poetry before. I very quickly saw that I needed to limit myself to the New Testament and beyond that, one Book. I chose the Gospel of Mark. I took the .txt file and imported it into Google Sheets. I did some formatting and ran some formulas to create a clean column that contained each word in Chapter 1. (For a detailed description of how I did this, use the “displaying language data” link above.) This took some time and gave me enough of what I needed to begin this exercise.

Chapter 1 contains 978 words and, of those, 334 are unique. This is what that list looks like in chart form.

The words “and,” “the” and “of” top the list. The fourth most common word is “him,” our first pronoun. (For the purposes of this exercise, I collapsed “him” and “Him” as one single reference…so too with all nouns and pronouns that could be proper or common nouns.) The pronouns used in Chapter 1 are, in order of frequency: him, he, they, I, his*, them, her*, thee, ye, you, me, which**, who**, whom**, thyself, she, we, whose**.
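For readers who would rather script this than build it in a spreadsheet, here is a minimal Python sketch of the same counting step. It is not the process I used in Google Sheets. The sample sentence is made up for illustration, the names `tokenize`, `total_words` and `unique_words` are my own, and lowercasing everything is one simple way to collapse “him” and “Him” into a single entry.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text so "Him" and "him" collapse, then pull out the words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

# Made-up stand-in sentence; the real input would be the chapter's .txt file.
sample = "And he saw him, and they followed Him."

words = tokenize(sample)
counts = Counter(words)

total_words = len(words)      # every word, repeats included
unique_words = len(counts)    # distinct words only
most_common = counts.most_common(2)

print(total_words, unique_words, most_common)
```

Swapping the sample sentence for the contents of a chapter file should give the total and unique counts for that chapter, though the exact numbers will depend on how punctuation and apostrophes are handled.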

One can rightfully–righteously?–observe the following. That the pronouns “him” and “he” top the list should not surprise. It surprised me that “me” and “you” and “we” are lower on the list. “She” appears once in Chapter 1 and 36 times subsequently. Like “she,” the word “Lord” only occurs once in Chapter 1. Not a pronoun, but not un-surprising either.

We could use this data to set up a dialog about the collective perspective compared to the intimate/individual connection. This is probably worth exploring. What else is worth exploring? What might we find in a further exploration of the full Gospel? What light could this type of linguistic analysis shed on a comparative analysis of the Books of the New Testament, or of the New and Old Testaments?


With some forward mapping, I was able to apply some lessons from Chapter 1 to the other 15 chapters. (Thank you, =COUNTIF(B2:AX679,"he")!)*** Using the 18 pronouns found in Chapter 1, I sorted and charted their usage in all of Mark. This is what that looks like:


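The same tally can be sketched in Python. This is an illustration, not what I ran: I used COUNTIF in Google Sheets. The pronoun list is the one from Chapter 1, the sample text is made up, and `pronoun_counts` is a name I chose for this sketch.

```python
import re
from collections import Counter

# The 18 pronouns found in Chapter 1, lowercased.
PRONOUNS = [
    "him", "he", "they", "i", "his", "them", "her", "thee", "ye",
    "you", "me", "which", "who", "whom", "thyself", "she", "we", "whose",
]

def pronoun_counts(text, pronouns=PRONOUNS):
    """Return (pronoun, count) pairs sorted from most to least frequent."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    pairs = [(p, counts[p]) for p in pronouns]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Made-up stand-in text; the real input would be the full text of Mark.
sample = "He spoke to them and they heard him. She saw him."
ranked = pronoun_counts(sample)
print(ranked[:5])
```

Pronouns that never appear stay in the result with a count of zero, which keeps the chart categories the same from one text to the next.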
This is my hunt-and-peck analysis of the available data. I suspect the questions and the framework are more useful than the hard numbers. If I had time and/or more talent and/or a team, I would run similar numbers for all the Books of the Bible and create ways to find further connections and visually represent trends and connections inside the Bible. Are there “pockets” of words and, if so, where do they appear? I’d like to try to do this in more than two dimensions, too. The final-final project: to identify connections among sacred texts. What could we find if we were to compare analyses of The Bible, The Quran, The Tanakh and The Talmud, The Tipitaka, The Popol Vuh? How are pronouns and prophets connected? In the lay sense, The Philosophy Data Project has created some amazing code and representations of connections among great philosophers. This type of comparison among sacred scriptures would be illuminating. This project on Kaggle titled “Explore King James Bible Books” has potential as a framework, but it is not as detailed or well designed as The Philosophy Data Project.

I’ll conclude by posing two questions that may be on your mind: isn’t parsing a sacred text like this, well, sacrilege? Can we reasonably apply the same textual analysis to a sacred text that we might to a sonnet or a short story? I’d say “no” and “yes.” First, if we are going to have informed debates in different spaces, shouldn’t the debates be informed? If we suggest that things are or are not in a text, shouldn’t we try to be correct? Also, if we have these tools to collect and analyze data at our disposal, shouldn’t we apply them fully? I would follow up by saying this type of textual analysis is not at all intended to subjugate the Texts in any way or suggest that modern approaches or systems are superior. The key here is to center conversation on the significance of the text. Now, I recognize that the simple count of words does not give us everything we need, but it is a reasonable first step to conversations about what is present, why and what it seems to mean in a historical and contemporary context. I would further follow up by saying the natural tension that this would create in terms of how we understand and interpret texts is already at play and has been for most of the Modern Age. This is clearly not specific to scripture, as Constitutional debates about firearms, abortion, and voting rights show.

As is the case more and more, my hope is to help my students and my readers to use available tech tools to explore the humanities in the fullest possible way.


Notes:

* On first pass, I did not distinguish between these words as pronouns or possessive adjectives. I could do so down the road.

** These relative pronouns work as part of a larger analysis of pronoun usage, but not pronoun usage in the sense of identifying/identity markers. I did not include “those” as a pronoun for related reasons.

*** It took me some tinkering to determine that =COUNTIF(B2:AX679,"he") without asterisks will give me back only full-word matches, which is what I wanted. Conversely, =COUNTIF(B2:AX679,"*he*") with asterisks will count every word in which those two letters appear, such as “the” or “they.” The scale in Mark is 404 vs. 2,624 for the count of “he.”
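The difference between the two formulas can be mimicked in a few lines of Python. This is a sketch under my own assumptions, not a reproduction of the spreadsheet: the word list is made up, `count_exact` and `count_containing` are names I chose, and words are lowercased before comparing so that “He” and “he” count together.

```python
def count_exact(words, target):
    """Count words that are exactly the target (like COUNTIF without asterisks)."""
    return sum(1 for w in words if w.lower() == target)

def count_containing(words, target):
    """Count words that contain the target anywhere (like COUNTIF with *target*)."""
    return sum(1 for w in words if target in w.lower())

# Made-up stand-in for the one-word-per-cell range in the spreadsheet.
sample_words = ["he", "the", "they", "she", "him", "He", "them"]

exact = count_exact(sample_words, "he")            # "he" and "He"
containing = count_containing(sample_words, "he")  # everything except "him"
print(exact, containing)
```

Even in this tiny list the wildcard-style count is three times the exact count, which is the same kind of gap as the 404 vs. 2,624 in Mark.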