29.04.2021 | Inside Data Science
John Bateman on Data Science in Digital Humanities
In our interview, John Bateman talks about diverse forms of communication and the role of data science in his research.
What topics are you currently working on in your research?
Essentially issues that arise due to the intrinsic multimodality of human communication, including theoretical and methodological foundations. Communication apparently combines speech, written language, gesture, gaze, diagrams, drawings, photographs, moving images, and much, much more. Often it works, but sometimes it doesn’t; and it is an increasing challenge for education to teach literacies that can cope with this diversity. Multimodality research then looks for general methods for understanding complex communicative forms wherever they might occur.
How important is data to your research?
Many of the methods that I draw on originate in linguistics, which has a long history of engaging with large bodies of (linguistic) data in the context of corpus linguistics. Nowadays it is usual for empirical linguistic studies to engage with large collections of text to determine properties of language through statistical methods, increasingly involving deep learning strategies. We are now trying to push this for multimodality data as well, which means doing the same, but across types of language (written, spoken), images, moving images, and all the other kinds of media that are deployed in communication. Increasing both scale and access to such mixed data is then central but challenging.
What role does data science play in your research? Do you see yourself more as a user, a method developer, a basic researcher, or perhaps something completely different?
Data science is fairly central, and increasingly so. Accessing and organizing diverse forms of communication at a larger scale requires methods and approaches to analysis that are core to data science. The work undertaken is then situated both as a user, in terms of drawing on the techniques and methods established, and as a developer when it comes to formulating and developing ways of organizing multimodally complex data. The move is to develop increasingly abstract annotation levels that enable us to advance from more directly measurable properties of data to interpretations.
Which data science methods and technologies are in the focus of your research or could also become interesting in the future?
Primarily mixed methods, where maximal use can be made of less abstract algorithmic processes (automatic extraction of measurable properties) wherever possible on the one hand, including very specifically targeted processing strategies (shot-lengths in film, word-embeddings, particular kinds of sound recognition, colour-balance, eye-movements, etc.), and abstract characterisations in terms of narratives, argumentative strategies, persuasion, on the other.
What are your main challenges in dealing with data?
Getting at the meanings of patterns, rather than just finding patterns... and so trying to direct pattern-finding so that what comes out is more easily relatable to meaningful practices.
And finally, what is your personal motivation for joining the Data Science Center?
There are many quite separate initiatives using data at scale and new methods are always being developed: one never knows which kinds of data the methods will be usable for. So it is going to be important to see just what techniques and data sets are emerging across individual projects. The DSC is a forum where that information can be made more visible.
You can learn more about John’s activities in his talk “Data and Interpretation: Digital Humanities and the missing link with Multimodal Semiotics” in the Data Science Forum on 06.05.2021.
Prof. Dr. John Bateman
Professor of English Applied Linguistics
FB 10 – Linguistics and Literary Studies