Big Data Sociology: Preparing for the Brave New World

Image: Joshua Jackson

Sunday 29th October, 2017

Hamish Robertson and Joanne Travaglia

The emergence of the big data paradigm has taken place over centuries of development, emerging from a variety of pioneering uses into what we now call the information sciences. In addition to its theoretical and technical implications, big data clearly has growing implications for individuals and societies. The current representation of big data is that it is new, intensive and capable of addressing longstanding social and scientific issues. The prevailing assumption is that big data will resolve the problems that small data has failed to, and do so with time left over for other important tasks. In this brave new techno-world big data is proposed as the foundational currency and analytical source of power, especially in increasingly complex data domains such as finance, communication technologies, healthcare and government.

Such is the tenor of the ‘promise’ of big data that potential critiques and risks are framed as ‘simple’ teething problems. For social researchers, this positioning is far from reassuring. The investigation and critique of new technology and its application(s) are essential to understanding both current implications and where emergent processes might be headed. There has been a ramping up of genuinely critical inquiry into big data. But we suggest it is time to go further. It is time to turn big data back on itself. What is needed is not only a sociology of big data but a ‘big data sociology’. This piece is about some of the key sociological issues we see emerging in the unfolding big data space.

The rise and rise of big data

The concept of big data is closely connected to rapid post-war developments in computational technologies, methods and theories. Central to this ‘information age’ has been the more recent advent of mass digitisation and the emergence of a wide array of omnipresent and increasingly ‘always-on’ technologies. The foreground to these technologies includes not only the highly visible and accessible personal computers, tablets, mobile phones and self-monitoring technologies, but also a wide variety of background technologies, many with increasingly sophisticated surveillance and reporting capabilities. These background technologies include satellites, sensors and systemic monitoring systems, many of them self-regulating to various degrees. The monitoring of big and small systems ranges from wired buildings, traffic (air, rail, road, maritime), water and electricity systems through to the rapid proliferation of the smart or intelligent city concept. These technologies edge into public sight at various points, for example in the current debate around drones, autonomous vehicles and artificial intelligence (AI), but for the most part they have become part of the barely perceptible background hum of interconnected life.

The ‘internet of things’ (IoT) concept envisages a deep and increasing connection between these technologies and an ever-expanding capacity to collect, analyse and benefit (see ‘profit’) from these increasingly enmeshed systems and their capabilities. Each of the systems mentioned above is generating data at quite unprecedented rates, so much so that we have reached a time when not only do we not use most of the data generated, but in many cases we aren’t even able to use it all. Our intellectual paradigms, our disciplinary ‘go to’ concepts, heuristics and interpretive capacities (who is really ‘big data’ ready?), and our base technical skills have not yet hit the critical threshold necessary to ‘maximise’ big data’s potential benefits. Big data’s rise and rise seems to be leading many of us by the nose into a future that can only be described speculatively. It is starting to look a lot like the ‘golden age’ of science fiction, back when much of the genre speculated about where technological change might drive society and its potential consequences for us as human beings.

Making Data Social

The social sciences’ interest in big data is largely coming from the ‘new’ developments that have occurred in social media and media technology and the impact these appear to be having on human behaviour and interactions. Yet much of the data we have been collecting since the early Victorian period has been focused on one of the same two primary domains, that is, either technical or social knowledge. Even then it was frequently the data itself that interested the collectors rather than the things about which the data was being collected. A growing enthusiasm for analysing the ‘social’, and social information, led to the emergence of the social sciences including economics, sociology and psychology. This discrete domain-specific knowledge production scenario has become much more fluid under the emerging big data paradigm.

What they, and we, are often looking at is data about data, such as metadata or categorical data (such as profile data) describing the individual themselves. Affective, attitudinal and behavioural data make up a large part of what we currently describe as the social data spectrum. These types of data are often domain specific, but in an emergent big data environment the specificity is richer, cheaper and more flexible to acquire than many of the traditional forms of small data. The rising emphasis on the social element (connection, communication, engagement, transaction) of big data is in no small part due to the interest of both business and government. Alaimo and Kallinikos suggest that much ‘social data’ is less about the social domain and more about socially generated data, that is, technologically mediated information generativity. For social researchers, as people’s engagement with the technology grows and the ability to ‘farm’ their resulting digital footprints and data trails increases, this is an important new source of data in and of itself.

The Coming Transition

Big data is seen as different to small data in that it comes mainly from non-sampling based systems in contrast to surveys, censuses and other cross-sectional or even longitudinal data collection strategies. To date, there seems to be no data system that is truly totalising although big data raises (implies?) the idea of Borges’s one-to-one scale map, itself an extension of prior forms of the concept of a totalising representation of reality or some specific sub-domain. Digital quantification certainly makes the possibility of simulating aspects of reality much more feasible than ever before (Google Earth software, for example), but the fields in which this is currently most developed tend to relate to the physical sciences including geography, where spatial phenomena are being ‘captured’ in ultra-high digital resolution.

Social relations remain a problem of a different order of magnitude. Much of our architecture of knowledge about the social is deeply contingent on prior understandings of the social world and the limits to theory in this domain. If big data does develop the capability to solve wicked problems, we will need a greatly improved conceptual and theoretical architecture of knowledge that is not founded on the array of small data assumptions that are still in play. We will require a truly, foundationally digital sociology.

Past Failures, Future Possibilities

We have flagged in the past some of the limitations of the selectively applied quantitative mindset in social analysis. The small data paradigm emerging from the 19th century quickly transitioned to a deterministic strategy as the social sciences supported political hegemony at multiple levels. In the post WW2 period this process was magnified through the lens of the Cold War. In the United States the increasingly corporate state selectively funded specific social researchers while blocking the work of others. In the present, big data’s failures, or more explicitly hegemonic applications, are being investigated by a growing number of researchers, who often come from ‘within’ big data environments. These individuals can in many cases not only observe the limitations of their professional and academic domains, but are also prepared to publicly point to systemic and application failures in those spaces. A number of information technology projects have failed to achieve promised results in policy domains, notably in health and policing, with substantial costs to taxpayers. The award and management of large-scale IT projects, often with big data ambitions, have at times progressed with limited external oversight. The complexities of organisations attempting big data and big data-like projects already have a history worth much closer critique. The enthusiasm for big data is such that there is little focus on past failures or their implications for present or future developments.

The Sociology of Big Data

A gradually emerging domain in the social sciences has been what we would describe as a sociological response to big data. This is usually couched in terms of using currently extant sociological concepts and critiques to inquire into big data initiatives. This includes some of the problems we can already see associated with big data’s very sudden rise to prominence, however deep its historical, conceptual and technical roots. In the second area of social knowledge, alluded to above, we can see how much of this focus is on apparent failures of application in fields such as education or criminal justice. These are two social policy domains with an early interest in bringing analytical tools to large, complex data environments. Another area which stands out comprises the convergent fields of immigration, foreign policy and military engagement. As with small data, military interest and investment in ‘big data’ is already substantial and exhibits some early worrying trends.

The scope of big data applications is another important sociological consideration. Not all industries, or policy domains, are pursuing big data ambitions at the same pace. This means not only that there is a growing range of organisations and contexts in which big data strategies are being developed, but also that there are differential timeframes and thus experiential pathways towards big data functionality in such organisations. Even within some industries, such as healthcare, there is considerable variation in the move to big data adoption across different types of providers, specialisms and disease-treatment environments. Some of these differences are predicated on the availability of integrated data resources in healthcare environments which may be publicly, privately or not-for-profit funded. In other situations, issues such as the presence or absence of electronic health records, for example, may be the deciding factor. Regardless, this offers the sociologist an almost unending research base on which to draw, and one which is both emergent and highly dynamic over time. And there is, after all, no such thing as theory-free in big or small data, in the same way that there is no ‘raw’ data. Theorising is an integral part of all human knowledge production processes, analogue or digital, and remains central to even the experimental sciences, both natural and social.

Corporate and State Surveillance

One of the foundational principles of data is that it is both trackable and identifiable. This was somewhat cumbersome in the pre-digital age but even then huge effort went into this, including use of the postal system for political surveillance and intelligence gathering. Babbage and Lovelace worked to build a machine that might help resolve the analytical problematic of vastly increasing data collection, without ultimate success, but they set the scene for much of what followed. Now we live in a time when data is not only trackable and identifiable but it can persist in the digital domain well beyond our physical lifespan. This makes it a compounding resource for state and corporate surveillance purposes. A basic principle of the data pundit is that ‘more is better’. Big data elevates this somewhat simplistic principle to entirely new heights. Always-on (but not always working properly) systems, continuously adding to their databases and with growing analytical sophistication, make for a very different environment to any we have seen before. More problematic still is the question of who controls such systems and their growing interconnections.

Big Data Sociology

The premise of this piece has been that small data sociology is insufficient to critique the emerging big data paradigm. The assumptions, methods and approaches of much of small data social science were already compromised by its participation in a variety of unethical research programmes and ideological positions. In more than a century of social science theorising, the same ‘wicked problems’ not only persist but can even be interpreted as the direct consequences of prevailing political and social hegemonies. Small data has not reduced poverty or marginalisation to any significant degree and often simply documents the continuing ebb and flow of power in our social systems. So how might big data be different? How can we obviate or at least reduce the risk that big data and its huge digital knowledge architecture will simply be the small data paradigm writ larger and more oppressive?

Firstly, there is the issue of big data literacy that goes beyond the hype cycle. It is hard to critique something that is not well understood. So, in this regard, social researchers need to engage with big data beyond applying big data methods as the next interesting set of social analysis concepts and tools. Big data sociology does not imply yet another social media analysis of Twitter feeds, Facebook postings or online comments on a news item. Secondly, big data needs to be better contextualised. Some of this work already exists in fields such as sociology of technology or social studies of science (STS). But a more comprehensive approach is, we think, necessary. This means adding some of the small but interesting fields such as the history of statistics or philosophy of information to the mix. Thirdly, there is a perhaps by now more obvious need for a deeper socio-political analysis of big data and its applications.

As we note above, small data usage has not ensured an end to social exploitation or discrimination; we therefore need to be even more vigilant if big data is not to become yet another, more sophisticated, coercive system. This is one reason that we are already seeing a move to develop an ethics of big data. A ‘big data sociology’ requires an expanded, ethical and critical engagement with big data theorising, methods, practices and, especially, outcomes. It needs to be situated within a sophisticated, interdisciplinary and socially informed science, one which rejects silos and narrow forms of practice that promote inequality (being paid to write an unjust algorithm or one with unjust consequences is not an excuse). It needs an integrated sense of historical development and some degree of moral purpose (even to ‘do no harm’ would be a good initial premise). Big data sociology is not traditional social science but, rather, social science that is actively engaged with the many sciences that have brought us to the present big data horizon. Developing this kind of big data sociology, we suggest, will be one of the major sociological challenges of the coming decades.

Conclusion

In this piece, we have examined several factors in the rapidly developing big data environment as they relate to some key sociological concerns. Our premise here is that traditional or conventional sociology will be insufficient for critically unpacking the big data paradigm and its increasing appeal and interest across many sectors of contemporary society, from business to government to academia. Indeed, our position is that a limitation of social science small data methods is that they are a consequence of the first information age rather than a practical critique of it. The current institutional authority of the small data paradigm has been connected less to its analytical utility, although this is acknowledged, and more to its role in supporting the growth and influence of the social sciences, particularly in the post-World War Two and Cold War funded quantitative revolution.

The scale and variety of applications to which big data is already being put are considerable. Both can only increase over time and with the continued expansion of digitisation as a foundational basis for the modern information society. The social sciences must also adapt to this emergent socio-technical paradigm. The small data worldview, developed over the past 150 years or so, will prove insufficient for developing the ‘big data sociology’ that we propose is required. This ‘brave new world’ demands an equally brave new sociological imagination if it is to keep pace with an expanding big data paradigm and its inevitable consequences.

Hamish Robertson is a geographer with experience in healthcare including a decade in ageing research. He is Visiting Fellow at the University of Technology Sydney. He has worked in the private, public and not-for-profit sectors and he has presented and published on a variety of topics ranging from ageing, diversity, health informatics, Aboriginal health, patient safety and spatial science to cultural heritage research.

Joanne Travaglia is Professor of Health Services Management in the Faculty of Health at University of Technology Sydney. Her research addresses various aspects of health services management and leadership, with a particular focus on the impact of patient and clinician vulnerability and diversity on the safety and quality of care.
