Researchers have achieved a number of breakthroughs in text mining and natural language
processing (NLP), driven mainly by three key factors: new deep learning frameworks, better
computational resources, and access to larger amounts of data (Big Data). In this workshop, we will start
with the basics of text processing in Python and learn about classical feature engineering from machine
learning for text data. We will then look at word embeddings, word vectors, and their integration into Deep
Learning architectures like RNNs. We will also delve into the “attention” mechanism and transfer learning,
key components of state-of-the-art models like BERT & Co.
- We demonstrate the importance of NLP with some examples, followed by an introduction to handling text data and its potential representations in ML. We also briefly introduce Fully-Connected Neural Networks (FCNNs) as an important foundation for the rest of the course.
- We focus on neural representations of texts and start with the idea of language modeling using the neural probabilistic language model (Bengio et al., 2003). Then, the Word2Vec framework (Mikolov et al., 2013), the Doc2Vec framework (Le and Mikolov, 2014), and the FastText framework (Bojanowski et al., 2017) are introduced. The frameworks are accompanied by hands-on sessions for practical implementation of what has been learned.
- We will focus on Deep Learning and current state-of-the-art architectures. We will take an in-depth look at existing transfer learning resources and apply what we have learned in a final hands-on session.
- For the hands-on parts of the workshop, exercises will be provided as Jupyter notebooks that participants can work through themselves.
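The core idea behind the Word2Vec framework covered above can be sketched in plain NumPy: a toy skip-gram model with a single random negative sample per context pair, trained on an invented eight-word corpus. This is only a didactic illustration of the objective, not gensim's optimized implementation:

```python
import numpy as np

# Toy skip-gram with one negative sample per positive pair.
# Words that appear in similar contexts end up with similar vectors.
rng = np.random.default_rng(0)
corpus = "we learn word vectors from raw text data".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

dim, window, lr, epochs = 8, 2, 0.05, 200
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # input embeddings
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # output embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):
    for pos, center in enumerate(corpus):
        for off in range(-window, window + 1):
            ctx_pos = pos + off
            if off == 0 or ctx_pos < 0 or ctx_pos >= len(corpus):
                continue
            c = idx[center]
            # one positive (true context) and one random negative target
            neg = int(rng.integers(len(vocab)))
            for target, label in ((idx[corpus[ctx_pos]], 1.0), (neg, 0.0)):
                score = sigmoid(W_in[c] @ W_out[target])
                grad = score - label
                g_out = grad * W_in[c]               # gradient w.r.t. W_out row
                W_in[c] -= lr * grad * W_out[target] # update input embedding
                W_out[target] -= lr * g_out          # update output embedding

vec = W_in[idx["word"]]
print(vec.shape)  # each word is now an 8-dimensional dense vector: (8,)
```

In the workshop itself, participants would of course use an established library rather than training embeddings by hand; the sketch only shows what such a library optimizes under the hood.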
PhD students and postdocs with basic knowledge of Python and supervised machine learning methods. The workshop is very hands-on and is therefore limited to a maximum of 15 participants.
Use a laptop/PC with reliable internet access and install the following software:
ABOUT THE TRAINER
Dr. Matthias Assenmacher works as a trainer for Data Science Essential GmbH and is a postdoctoral researcher at the Chair of Statistical Learning and Data Science (LMU) and the NFDI Consortium for Business, Economic and Related Data (BERD@NFDI). He obtained his bachelor's degree in Economics from LMU in 2014; afterwards he turned to Statistics (with a focus on social and economic studies) and obtained his Master's degree in 2017 (also from LMU). In October 2021 he completed his PhD with a focus on Natural Language Processing.
His expertise revolves around the practical application of state-of-the-art NLP architectures to real-world problems from various disciplines, as well as open and reproducible science.