30.09.2021 | Research & Teaching

The practical application of autoencoders in modern dimensionality reduction



High-dimensional data can be a problem for machine learning, leading to overfitting and large models. A practical solution based on an autoencoder was developed as part of a bachelor thesis at the Data Science Center (DSC).

In his bachelor thesis “Autoencoder for dimension reduction of data in empirical asset pricing model”, Kien Vinh Luc presented a new practical approach that improves the efficiency and performance of a machine learning model by representing the original high-dimensional data space in a simplified, lower-dimensional subspace that still preserves its informative patterns. The thesis was supervised by Prof. Dr. Rolf Drechsler and M.Sc. Christopher Metz from the Data Science Center and by PD Dr. Christian Fieberg from the field of empirical capital market research.

The big data explosion has provided us with more and more information about problems in society and industry, especially in finance. The global financial crisis and the European debt crisis have highlighted the importance of understanding the rich, multivariate structure underlying financial entities, economies and markets. High-dimensional data does help a machine learning model learn more and find patterns that generalize, but collecting it indiscriminately also brings in low-quality, highly complex and correlated inputs. The result is a significant amount of redundancy in the data and a risk of anomalous predictability, identifiability problems, instability and overfitting. It is therefore crucial to reduce the dimensionality of the data: to estimate how informative each feature in the dataset is and, if needed, to remove it while still maintaining the structure of the data.

Mr. Luc used an autoencoder (AE) in his work to reduce the dimension of a stock asset dataset. An autoencoder learns an approximation to the identity function, so that its output is similar to its input. Because of this property, the AE compresses the input X into a smaller latent space representation and then reconstructs the output from that latent space, which makes it possible to control the loss of information and keep it as small as possible. The first component of the autoencoder, called the encoder, is then extracted and integrated with the empirical asset pricing model to create a new model. Incoming input is thus first compressed to a smaller dimension before it is fed into the empirical asset pricing model, which performs training and prediction as usual. In the proposed structure, the transformed input retains the most valuable information while eliminating the redundancy that reduces the efficiency of the model, which improves both prediction performance and computation time.
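
The idea of training an autoencoder and then reusing only its encoder in front of a prediction model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the thesis: the synthetic data, the layer sizes, the single tanh encoder layer, the plain gradient descent training, and the ordinary least-squares regression that stands in for the empirical asset pricing model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data (not the thesis dataset): 500 samples with
# 20 correlated features that are driven by 3 hidden factors.
n_samples, n_features, n_latent = 500, 20, 3
factors = rng.normal(size=(n_samples, n_latent))
X = factors @ rng.normal(size=(n_latent, n_features))
X += 0.05 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each feature
# Hypothetical prediction target that depends on the hidden factors.
y = factors @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.normal(size=n_samples)

# Parameters of the encoder (tanh layer) and the decoder (linear layer).
W_enc = 0.1 * rng.normal(size=(n_features, n_latent))
b_enc = np.zeros(n_latent)
W_dec = 0.1 * rng.normal(size=(n_latent, n_features))
b_dec = np.zeros(n_features)


def encode(inputs, weights, bias):
    """Compress the inputs to the lower-dimensional latent space."""
    return np.tanh(inputs @ weights + bias)


def decode(latent, weights, bias):
    """Reconstruct the inputs from the latent representation."""
    return latent @ weights + bias


def reconstruction_loss(inputs):
    """Mean squared error between the inputs and their reconstruction."""
    latent = encode(inputs, W_enc, b_enc)
    return float(np.mean((decode(latent, W_dec, b_dec) - inputs) ** 2))


initial_loss = reconstruction_loss(X)

# Train encoder and decoder jointly by gradient descent on the
# reconstruction error, so that the output approximates the input.
learning_rate = 0.01
for _ in range(5000):
    Z = encode(X, W_enc, b_enc)
    X_hat = decode(Z, W_dec, b_dec)
    grad_out = 2.0 * (X_hat - X) / X.size             # d loss / d X_hat
    grad_pre = (grad_out @ W_dec.T) * (1.0 - Z ** 2)  # back through tanh
    W_dec -= learning_rate * (Z.T @ grad_out)
    b_dec -= learning_rate * grad_out.sum(axis=0)
    W_enc -= learning_rate * (X.T @ grad_pre)
    b_enc -= learning_rate * grad_pre.sum(axis=0)

final_loss = reconstruction_loss(X)
print("reconstruction loss before and after training:", initial_loss, final_loss)

# Keep only the encoder and use the compressed features as input to a
# downstream model (here: least squares with an intercept column).
Z = encode(X, W_enc, b_enc)
design = np.column_stack([np.ones(n_samples), Z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_pred = design @ coef
print("shape of compressed input:", Z.shape)
```

In this sketch the downstream model sees 3 compressed features instead of the original 20. How much information survives the compression depends on the latent dimension, which would have to be chosen for the actual dataset.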

The research is joint work between the AGRA group of Faculty 3, the empirical capital market research group of Faculty 7 and the Data Science Center. It was submitted as a bachelor thesis at the University of Bremen. The approach may also be useful for many types of problems in other industries. The results will be processed further for research and might be used in future publications.

We congratulate Mr. Luc on passing his thesis and wish him continued success in his private life and professional career.

Author: Christoph Metz
Are you interested in writing a thesis with us?

Please contact:

Dr. Lena Steinmann
DSC Coordinator
+49 (421) 218 - 63941
lena.steinmann@uni-bremen.de


