Trainings

From Audio to Text: Automated Transcriptions with Whisper (DSC-2025-17)


When?
30.09.2025

09:30 AM - 12:30 PM & 02:00 PM - 04:30 PM

Where?
To be announced


Speakers:
Nele Fuchs & Annika Nolte
Data Science Center, University of Bremen

Number of Participants: Max. 20

Language: English






BACKGROUND

Interviews are a central method of qualitative research and form the basis for scholarly insights in many areas of the Digital Humanities (DH), from oral history and linguistics to ethnography. Qualitative audio data is also becoming increasingly important in other disciplines. However, transcribing the collected audio data is extremely time- and resource-intensive: as a rule of thumb, one hour of recorded material requires approximately four to sixty hours of manual transcription time (Evers 2011).

Automated methods such as Whisper, an open-source tool and Python package for automatic speech recognition (ASR), can significantly accelerate this process. Whisper produces initial transcript drafts, which can then be revised manually.

WORKSHOP GOAL

Participants will have the opportunity to experiment hands-on with Whisper and related tools, assessing their potential in the context of their own research. At the same time, the workshop aims to enable participants to critically reflect on the methodological, ethical, and technical challenges of automated transcription.

By the end of the workshop, participants will have gained a solid understanding of the various applications of Whisper. They will be able to generate reliable first drafts of transcriptions and, if attending the full day, streamline their workflow by integrating Whisper with Python for large-scale transcription tasks.

WORKSHOP CONTENT

Please note: Attending only the morning session is possible.

Morning
  • Introduction to audio transcription and its relevance for qualitative research.
  • Overview of requirements for transcription tools (e.g., data protection, GDPR compliance).
  • Presentation of Whisper and the open-source tools aTrain and noScribe built on it.
  • Critical discussion: Opportunities and limitations of automated transcription (accuracy, bias, impact on research processes).
  • Practical exercise: First transcriptions with aTrain and noScribe using provided audio files.

Afternoon (optional)
  • Introduction to Whisper as a machine learning model and Python package (for more advanced applications).
  • How to use Whisper for larger datasets and speaker diarization.
  • Step-by-step demo: Running and customizing a Whisper script in Jupyter Notebooks.
  • Hands-on session: Work on provided tasks to adapt the Python script; optionally perform transcriptions with participants’ own audio files.
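To give a rough idea of what a batch transcription script can look like, here is a minimal sketch. It is not the script used in the workshop: it assumes the `openai-whisper` package and `ffmpeg` are installed, and the helper names, the `base` model choice and the folder layout are hypothetical. The transcriber is passed in as a function, so the batching logic can be tried without loading a model.

```python
from pathlib import Path
from typing import Callable

# Assumed set of audio formats to pick up; adjust to your own recordings.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}


def find_audio_files(folder: Path) -> list[Path]:
    """Return the audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def transcribe_folder(
    folder: Path, out_folder: Path, transcribe: Callable[[Path], str]
) -> list[Path]:
    """Transcribe every audio file and write one draft .txt per recording."""
    out_folder.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in find_audio_files(folder):
        draft = transcribe(audio)  # first draft only; still needs manual revision
        target = out_folder / f"{audio.stem}.txt"
        target.write_text(draft, encoding="utf-8")
        written.append(target)
    return written


def make_whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Load a Whisper model once and return a function that transcribes one file."""
    import whisper  # the openai-whisper package, installed separately

    model = whisper.load_model(model_name)

    def transcribe(path: Path) -> str:
        result = model.transcribe(str(path))
        return result["text"].strip()

    return transcribe
```

Usage might look like `transcribe_folder(Path("interviews"), Path("drafts"), make_whisper_transcriber())`, with both folder names chosen for illustration. Whisper itself does not label speakers, so speaker diarization would require an additional step that this sketch leaves out.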

TARGET AUDIENCE & PRIOR KNOWLEDGE

This workshop is designed for anyone who needs to transcribe audio files and wants to automate the process. It is primarily aimed at researchers working with qualitative data.

No specific technical knowledge is required to participate in the morning session. Experience with transcribing interviews or other research audio data is helpful but not mandatory.

The afternoon session is aimed at participants who wish to gain deeper insights into using Whisper as a Python package. Basic knowledge of a programming language (ideally Python) is beneficial but not required. The workshop is designed so that technically inclined researchers without prior programming experience can also gain an initial understanding of working with scripts.

TECHNICAL REQUIREMENTS



ABOUT THE TRAINERS

Annika Nolte and Nele Fuchs are data scientists for training and consulting at the DSC.

Nele Fuchs studied Philosophy, Material Culture: Textile (CvO University of Oldenburg), and Transcultural Studies (University of Bremen). As a data scientist in the Humanities, she supports researchers in the areas of Digital Humanities, data science methods for qualitative research and FAIR-compliant qualitative data management, leveraging her expertise in handling sensitive qualitative data.

As a DSC data scientist and environmental scientist, Annika Nolte supports researchers with their data management and analysis workflows. In training and consulting, Annika draws on broad expertise in Earth system sciences and extensive experience in scientific programming. Her main focus areas are data standardization, data management, statistical methods, geospatial analysis, and machine learning in environmental and marine sciences.





The Data Science Center is funded by the BMBF and the EU.