Centuries of Sociology in Millions of Books

By Yunsong Chen and Fei Yan

Sociology, as one of the core disciplines of the social sciences, is “like a caravansary on the Silk Road, filled with all sorts and types of people and beset by bandit gangs of positivists, feminists, interactionists, and Marxists, and even by some larger, far-off states like Economics and the Humanities, all of whom are bent on reducing the place to vassalage”. Yet despite this vivid account of the complexities of sociology's disciplinary development, there is virtually no empirical research documenting how these different “sorts and types” of sociological norms, practices, and boundaries have developed.

In the current study, we conduct what is, to our knowledge, the first empirical analysis in sociology to use the Google Books N-gram corpus, a repository of digitized books containing an enormous volume of text. This novel application of massive content analysis to data of unprecedented size can help unpack the transformation of sociocultural dynamics over long time spans.

Since 2004, Google has been digitizing books from libraries, retailers, and publishers worldwide. The main Google Books corpus consists of 8,116,746 volumes, representing 6 percent of all books ever printed since 1500. The English corpus alone comprises close to half a trillion words. The corpus records how many times per year each “n-gram” appears across all the books it includes. A 1-gram is a string of characters uninterrupted by a space: a word, such as “sociology,” or a number, such as “1.234.” An n-gram is a sequence of n consecutive 1-grams, such as the phrases “sociology theory” (a 2-gram) and “field of sociology” (a 3-gram). Punctuation and capitalization are preserved in the data set. By searching the corpus for a keyword or phrase, one can therefore obtain its annual occurrence over a given time period.
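To make the n-gram definition concrete, here is a minimal Python sketch that splits text into 1-grams on whitespace (so punctuation and capitalization survive), joins consecutive 1-grams into n-grams, and tallies them per publication year. The toy corpus of `(year, text)` pairs and all function names are hypothetical; Google's own tokenizer is not reproduced here, and the per-year share computed by `relative_frequency` is one illustrative normalization, not necessarily the one behind the published data.

```python
from collections import Counter, defaultdict


def ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive 1-grams in text.

    A 1-gram is a whitespace-free string, so "1.234" and "Sociology," stay whole.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_counts(books: list[tuple[int, str]], n: int) -> dict[int, Counter]:
    """Count each n-gram per publication year over (year, text) pairs."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for year, text in books:
        counts[year].update(ngrams(text, n))
    return counts


def relative_frequency(counts: dict[int, Counter], phrase: str, year: int) -> float:
    """Share of that year's n-grams equal to phrase (illustrative normalization)."""
    year_counts = counts.get(year, Counter())
    total = sum(year_counts.values())
    return year_counts[phrase] / total if total else 0.0


# Hypothetical toy corpus: (publication year, book text).
books = [
    (1900, "the field of sociology"),
    (1900, "sociology theory and sociology"),
    (1901, "field of sociology"),
]
unigrams = yearly_counts(books, 1)
trigrams = yearly_counts(books, 3)
print(unigrams[1900]["sociology"])                       # 3
print(trigrams[1901]["field of sociology"])              # 1
print(relative_frequency(unigrams, "sociology", 1900))   # 0.375
```

Because the real corpus already ships precomputed annual counts, a researcher would normally query those rather than re-tokenize books; the sketch only shows what each count means.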

We trace how the most common sociology-related words and phrases have been used from the mid-nineteenth century to 2008 across five major categories of disciplinary development: academic significance, masters of sociology, theoretical dimensions, fields of sociology, and analytical methodologies. “Academic significance” refers to the historical position of sociology within human knowledge, as a subject related to and compared with other subjects; the keywords here are “sociology” and “sociological.” For “masters of sociology,” sociologists’ full names serve as the search terms, and the goal is to chart key figures’ rise to fame and their academic reputations. The keywords for “theoretical dimensions” are the names of relevant sociological theories and schools; “fields of sociology” focuses on the sub-branches of sociology and popular research topics; and “analytical methodologies” focuses mainly on the comparison of qualitative and quantitative research methodologies in sociology. Finally, we construct an overall index from all sociology-related keywords using the principal component method to demonstrate the overall sociocultural influence of sociology in two centuries of books.
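A minimal sketch of a principal-component index, assuming a matrix of annual keyword frequencies with one row per year and one column per keyword: standardize each column, take the first principal component via NumPy's SVD, and use each year's score on it as the index. The standardization step, the sign convention, the function name `pca_index`, and the toy data are all assumptions for illustration; the paper's exact specification is not described in this post.

```python
import numpy as np


def pca_index(freqs: np.ndarray) -> tuple[np.ndarray, float]:
    """Score each year on the first principal component of a (years x keywords) matrix.

    Returns the per-year index and the share of variance it explains.
    Assumes every keyword column varies over time (nonzero standard deviation).
    """
    # Standardize columns so frequent terms such as "sociology" do not swamp rarer ones.
    z = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0)
    # Rows of vt are the principal axes; s holds the singular values.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    # A component's sign is arbitrary: orient it so the index rises with keyword usage.
    if loadings.sum() < 0:
        loadings = -loadings
    scores = z @ loadings
    explained = float(s[0] ** 2 / np.sum(s ** 2))
    return scores, explained


# Hypothetical frequencies for three keywords over five years (rows = years).
freqs = np.array([
    [1.0, 0.5, 0.2],
    [1.4, 0.6, 0.3],
    [2.1, 0.9, 0.3],
    [2.5, 1.2, 0.5],
    [3.2, 1.3, 0.6],
])
index, share = pca_index(freqs)
print(index.round(2), round(share, 2))
```

Standardizing first means the index reflects co-movement across keywords rather than the sheer volume of the most frequent term; without it, “sociology” alone would largely determine the first component.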

We also go beyond simply describing the rise and fall of sociology-related words. Using the early development of sociology in the US as an example, we illustrate how data extracted from the Google corpus can support quantitative studies. Based on time series analyses, we find a close relationship between the early development of sociology and the social gospel movement in the US.
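The post does not specify which time series models were used. As one simple, hedged illustration of how two n-gram series might be compared, the sketch below correlates year-on-year changes in one series with changes in another several years later; differencing first guards against the spurious correlation that any two trending series tend to show. The function name, the example series (which could stand in for, say, frequencies of “sociology” and “social gospel”), and the choice of lagged correlation are assumptions; a fuller analysis would use formal tools such as Granger causality tests or ARIMA-type models.

```python
import numpy as np


def lagged_correlations(x: np.ndarray, y: np.ndarray, max_lag: int) -> dict[int, float]:
    """Correlate the change in x in year t with the change in y in year t + lag.

    Both inputs are annual series of equal length (e.g. relative frequencies).
    First-differencing removes shared upward trends that would otherwise make
    almost any two growing word series look strongly correlated.
    """
    dx, dy = np.diff(x), np.diff(y)
    result = {}
    for lag in range(max_lag + 1):
        a = dx[: len(dx) - lag]  # changes in x, trimmed at the end
        b = dy[lag:]             # changes in y, shifted forward by `lag` years
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result


# Hypothetical series: y repeats x's year-on-year changes two years later.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=60))
y = np.concatenate([[0.0, 0.0], x[:-2]])
corrs = lagged_correlations(x, y, max_lag=4)
best = max(corrs, key=corrs.get)
print(best, round(corrs[best], 3))
```

A peak at a positive lag suggests that movements in the first series tend to precede those in the second, but correlation alone does not establish any causal link between them.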

Our results show that the annual usage frequency count of a particular term based on a big-data strategy not only gives clues as to the historical emergence and progress of sociology—indicating, for example, the longevity or popularity of a particular sociological field or method—but also sheds light on the linkage between the development of sociology and broader sociocultural dynamics over centuries.

The contribution of this study is to show that massive content analysis of digitized books can provide rich insights into the historical evolution of professional disciplines and long-term sociocultural change at the macro level. We therefore propose opening up a new field, “socialomics,” to study the current state of a dynamic, fluid social world through the collection and analysis of massive digitized data. The value of such an energetic, forward-thinking approach lies in the fact that the amount of human knowledge sociologists can access through physical reading is very limited. With “genetic” analysis of word-frequency usage in the digital era, we are likely to gain theoretical inspiration and academic knowledge that earlier generations of sociologists could not even have imagined.

Read more in the new paper: Centuries of sociology in millions of books

Yunsong Chen is an Associate Professor in Sociology at Nanjing University. He earned a DPhil in sociology from the University of Oxford (Nuffield College). His main research interests are advanced quantitative methodology in sociology, social networks, and big data in social science. He has published in Social Networks, British Journal of Sociology, Social Science Quarterly, Social Science Research, and Chinese Sociological Review.

Fei Yan is an Assistant Professor in Sociology at Tsinghua University. He is also affiliated with Stanford University’s Walter H. Shorenstein Asia-Pacific Research Center. His research focuses on historical sociology, political sociology, and sociology of development. His work has appeared in Social Science Research, Social Movement Studies, Urban Studies, Modern China, and China Information.

Originally posted 24th August 2016.
