Natural Language Processing in Python (DSC-2023-06) | 2.5 days

04.10. - 06.10.2023

04 & 05 Oct.: 08:15 - 15:45
06 Oct.: 08:15 - 11:45

Workshop for PhD students and Postdocs

Dr. Matthias Assenmacher

Online (Zoom)

The workshop will be held in

The workshop is already fully booked.

If you have any questions regarding our workshops, please feel free to write us an e-mail.



Researchers have achieved breakthroughs in text mining and natural language processing (NLP), driven mainly by three key factors: new deep learning frameworks, better computational resources, and access to larger amounts of data (Big Data). In this workshop, we will start with the basics of text processing in Python and learn about classical machine learning feature engineering for text data. We will then look at word embeddings (word vectors) and their integration into Deep Learning architectures such as recurrent neural networks (RNNs). We will also delve into the attention mechanism and transfer learning, key components of state-of-the-art models such as BERT and related models.
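As a rough illustration of classical feature engineering for text (not taken from the workshop materials), the sketch below turns raw documents into sparse TF-IDF vectors using only the Python standard library. It assumes a deliberately simple tokenizer (lowercase ASCII letters and digits only), raw term counts as term frequency, and the unsmoothed IDF `log(N / df)`; the names `tokenize` and `tf_idf` are hypothetical helpers chosen for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (token -> weight) per document.

    TF is the raw count of a token in a document; IDF is the unsmoothed
    log(N / df), where df is the number of documents containing the token.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each token at most once per document.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append(
            {tok: cnt * math.log(n_docs / df[tok]) for tok, cnt in counts.items()}
        )
    return vectors


docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Cats and dogs are pets.",
]
for vec in tf_idf(docs):
    # Show the three highest-weighted tokens of each document.
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

Tokens that occur in every document receive a weight of zero, while rare, document-specific tokens such as "cat" or "mat" are weighted up. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2 normalization by default, so their numbers will differ from this sketch.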


  • We demonstrate the importance of NLP with some examples, followed by an introduction to handling text data and its potential representations in machine learning (ML). We also briefly introduce fully connected neural networks (FCNNs) as an important basis for the rest of the course.
  • We focus on neural representations of text, starting with the idea of language modeling via the neural probabilistic language model (Bengio et al., 2003). We then introduce the Word2Vec framework (Mikolov et al., 2013), the Doc2Vec framework (Le and Mikolov, 2014), and the FastText framework (Bojanowski et al., 2017). Each framework is accompanied by a hands-on session in which participants implement what they have learned.
  • We will focus on Deep Learning and current state-of-the-art architectures. We will take an in-depth look at existing transfer learning resources and apply what we have learned in a final hands-on session.
  • For the hands-on parts of the workshop, exercises will be provided as Jupyter notebooks that participants can work through on their own.
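The FastText framework mentioned above represents each word through its character n-grams, so that even unseen words get a vector. The following is a minimal standard-library sketch of that idea, assuming the scheme described by Bojanowski et al. (2017): the word is wrapped in the boundary symbols `<` and `>`, all n-grams with 3 <= n <= 6 are extracted, and the bracketed word itself is added as an extra sequence. FastText itself hashes n-grams with FNV-1a into roughly two million buckets of trained vectors; here `zlib.crc32`, a tiny bucket count, and a random, untrained table are stand-ins, and `char_ngrams`, `make_table`, and `word_vector` are hypothetical names.

```python
import random
import zlib


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Return FastText-style character n-grams of `word`."""
    wrapped = f"<{word}>"  # boundary symbols mark prefixes and suffixes
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i : i + n])
    # Add the whole bracketed word as a special sequence (once).
    if wrapped not in grams:
        grams.append(wrapped)
    return grams


def make_table(buckets: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Hypothetical, untrained embedding table: one random vector per bucket."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(buckets)]


def word_vector(word: str, table: list[list[float]]) -> list[float]:
    """Sum the bucket vectors of all subword n-grams of `word`."""
    vec = [0.0] * len(table[0])
    for gram in char_ngrams(word):
        # Map each n-gram to a bucket with a deterministic hash.
        row = table[zlib.crc32(gram.encode("utf-8")) % len(table)]
        vec = [v + r for v, r in zip(vec, row)]
    return vec


table = make_table(buckets=1000, dim=4)
print(char_ngrams("where", 3, 3))
# → ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
print(word_vector("wherever", table))  # an out-of-vocabulary word still gets a vector
```

Because words that share subwords (e.g. "where" and "wherever") share n-gram vectors, morphologically related words end up with related representations once the table is trained.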


PhD students and postdocs with basic knowledge of Python and supervised machine learning methods. The workshop is very hands-on and therefore limited to a maximum of 15 participants.


Use a laptop/PC with reliable internet access and install the following software:


Dr. Matthias Assenmacher works as a trainer for Data Science Essential GmbH and is a postdoctoral researcher at the Chair of Statistical Learning and Data Science (LMU) and at the NFDI Consortium for Business, Economic and Related Data (BERD@NFDI). He obtained his bachelor’s degree in Economics from LMU in 2014, then turned to Statistics (with a focus on social and economic studies) and obtained his Master’s degree in 2017, also from LMU. In October 2021, he completed his PhD with a focus on Natural Language Processing.

His expertise revolves around the practical application of state-of-the-art NLP architectures to real-world problems from various disciplines, as well as open and reproducible science.