Abstracts

Title	Author	Supervisor	Degree	Year
Improvement of machine learning algorithms performance by data set dimensionality reduction using cellular automata	Alexey Kuchvalskiy	Dmitry Pavlyuk	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences
Unsupervised machine learning approach for hierarchical graph-based representation of natural language text collections.	Jevgenijs Bodrenko	Irina Jackiva	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences
Boosting Algorithms for Credit Card Fraud Detection Across Varied Datasets	Justs Vīdušs	Nadežda Spiridovska	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences
Enhancement Strategies for Retrieval-Augmented Generation Systems	Sigita Lapiņa	Dmitry Pavlyuk	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences
Apply a Machine Learning Model to Mitigate Bias in the Future AI-based Recruitment	Ērika Todjēre	Jeļena Kijonoka	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences

Improvement of machine learning algorithms performance by data set dimensionality reduction using cellular automata

A significant challenge in machine learning is dealing with high-dimensional data. The complexity known as the "curse of dimensionality" degrades the performance of machine learning algorithms as dimensionality and dataset size increase. Cellular automata are discrete dynamical computational systems governed by mathematical functions, known as rules, that give rise to complex global behaviour. We used one-dimensional elementary cellular automata as a tool for dataset size reduction. Model variables were selected to generate an initial state vector, which was then transformed into a format suitable for applying cellular automata rules, known in cellular automata theory as a configuration. The model then iterated through all possible cellular automata rules, applying a varying number of epochs. Model performance on the reduced dataset was compared with benchmark results obtained on the original dataset after standard dimensionality reduction techniques had been applied. It was concluded that the applied cellular automata rules can be used as an alternative method for dataset size reduction without deteriorating model performance.
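The rule mechanism mentioned in this abstract can be illustrated with a minimal sketch of a one-dimensional elementary cellular automaton. It is not the thesis implementation: the function names `eca_step`, `evolve` and `evolve_all_rules` are hypothetical, the wrap-around boundary is an assumption, and the abstract does not say how variables are encoded into a configuration or how the evolved configuration yields a smaller dataset, so those steps are left out.

```python
def eca_step(cells, rule):
    """Apply one synchronous update of an elementary cellular automaton.

    cells: list of 0/1 values (the configuration).
    rule:  Wolfram rule number in the range 0..255.
    Periodic (wrap-around) boundaries are an assumption of this sketch.
    """
    if not 0 <= rule <= 255:
        raise ValueError("rule must be in 0..255")
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        # Encode the three-cell neighbourhood as a number 0..7 ...
        neighbourhood = (left << 2) | (centre << 1) | right
        # ... and read the corresponding bit of the rule number.
        nxt.append((rule >> neighbourhood) & 1)
    return nxt


def evolve(cells, rule, epochs):
    """Apply the same rule for a given number of epochs."""
    for _ in range(epochs):
        cells = eca_step(cells, rule)
    return cells


def evolve_all_rules(cells, epochs):
    """Evolve one configuration under each of the 256 elementary rules."""
    return {rule: evolve(list(cells), rule, epochs) for rule in range(256)}
```

In a comparison like the one the abstract describes, each of the 256 resulting configurations would be scored by the downstream model; that scoring step depends on details the abstract does not give.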

Author: Alexey Kuchvalskiy

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences

Unsupervised machine learning approach for hierarchical graph-based representation of natural language text collections.

Managing big data efficiently is important in various fields, especially when the data consists of human-written documents. Recent advances in Natural Language Processing (NLP), particularly LLMs, have made it possible to solve many tasks in this domain, but at the cost of high demand for labelled data, compute resources and specialized skills. To address these limitations, the current study proposed an NLP pipeline to identify topic hierarchies in collections of scientific publications. The work focused on the evaluation of available unsupervised machine learning methods and quality metrics in NLP, and on the development of visualization techniques to build a prototype of the pipeline. The proposed solution is based on the hARTM approach optimized for interpretability. It demonstrated the capacity to infer human-interpretable topic hierarchies from collections of scientific texts and to construct a meaningful hierarchy of topic-based document representations. The visualization approaches rely on MDS to present inter-document similarity and on Sankey plots to show document cluster relatedness within the topic hierarchy. Utility was demonstrated on two datasets, focusing on the interpretability and meaning of the topic hierarchy and the associated topic definitions. Potential application areas include personal education and scientific writing.

Author: Jevgenijs Bodrenko

Supervisor: Irina Jackiva

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences

Boosting Algorithms for Credit Card Fraud Detection Across Varied Datasets

Manual reviews and rule-based systems, as well as data mining techniques such as clustering and classification algorithms, are crucial for identifying fraudulent credit card transactions. Although gathering training data remains difficult, more data has recently become available; however, a comprehensive comparison of current machine learning approaches has yet to be conducted. Boosting algorithms such as XGBoost, AdaBoost, and Gradient Boosting Machine frequently outperform older approaches. This study compares boosting algorithms with traditional approaches using three different credit card transaction datasets: a synthetic dataset, a balanced dataset with 50% fraudulent transactions, and a highly imbalanced dataset with only 0.17% fraudulent transactions. The real (non-synthetic) transaction datasets contained 28 anonymized parameters such as time and location. Each method was evaluated using the F1 score, accuracy, precision, and recall. This study makes recommendations on which algorithms to use in real-world scenarios, giving important insights for future research and practical use in credit card fraud detection.
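The four evaluation metrics named in this abstract can be computed from confusion-matrix counts, as in the sketch below. The function name `classification_metrics` and all counts are hypothetical and are not results from the thesis; only the 0.17% fraud rate is taken from the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts with "fraud" as the positive class.
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 100,000 transactions, 0.17% (170) fraudulent.
# A classifier that labels every transaction as non-fraudulent:
always_negative = classification_metrics(tp=0, fp=0, fn=170, tn=99_830)

# A hypothetical classifier that catches 136 of the 170 frauds
# and raises 34 false alarms:
some_model = classification_metrics(tp=136, fp=34, fn=34, tn=99_796)
```

With these illustrative counts, the classifier that never flags fraud reaches an accuracy of 0.9983 with an F1 score of 0, which is why accuracy alone says little on a dataset this imbalanced and why precision, recall and F1 are reported alongside it.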

Author: Justs Vīdušs

Supervisor: Nadežda Spiridovska

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences

Enhancement Strategies for Retrieval-Augmented Generation Systems

This thesis systematically explores the enhancement of Retrieval-Augmented Generation (RAG) systems built on Large Language Models, emphasizing optimization of retrieval parameters and generation accuracy. We investigate configurations of RAG systems, including chunk size and overlap percentage, top-k selection, query transformations, retrieval methods, and the choice of LLM (GPT-3.5-Turbo and GPT-4), and find that a chunk size of 500 tokens generally offers the best performance. Vector search using cosine similarity emerges as the most effective retrieval method, significantly enhancing both context precision and recall across various tasks and knowledge bases. Experimentation within the CRUD-RAG framework demonstrates its applicability to diverse tasks, from content creation to knowledge refinement. Our findings indicate that enhancements in retrieval settings can markedly improve the performance of RAG systems, making them more efficient and adaptable for complex information synthesis and retrieval tasks. These results affirm the potential of systematic enhancements to improve AI-driven language models in practical applications and contribute practical approaches to the evolving landscape of RAG system research.
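The retrieval parameters listed in this abstract (chunk size, overlap percentage, top-k, cosine similarity) can be illustrated with a minimal sketch. It is not the thesis pipeline: the function names are hypothetical, the 10% default overlap is an assumed value (the abstract reports only the 500-token chunk size), and tokens and embedding vectors are taken as given rather than produced by any particular tokenizer or embedding model.

```python
import math


def chunk_tokens(tokens, chunk_size=500, overlap_pct=0.1):
    """Split a token list into fixed-size chunks that overlap by a percentage.

    chunk_size=500 follows the abstract; overlap_pct=0.1 is an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_pct < 1:
        raise ValueError("overlap_pct must be in [0, 1)")
    # Each new chunk starts `step` tokens after the previous one.
    step = max(1, chunk_size - int(chunk_size * overlap_pct))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In a full RAG system the chunks returned by `top_k` would be passed to the LLM as context; varying `chunk_size`, `overlap_pct` and `k` is the kind of parameter sweep the abstract describes.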

Author: Sigita Lapiņa

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences

Apply a Machine Learning Model to Mitigate Bias in the Future AI-based Recruitment

In the contemporary landscape of Human Resources, the integration of artificial intelligence presents both opportunities and challenges, especially in recruitment, where it encompasses all stages of the process from candidate sourcing to final selection. Biased data, originating from historical records or societal prejudices, can present a significant obstacle, potentially perpetuating discriminatory practices. The study "Apply a Machine Learning Model to Mitigate Bias in the Future AI-based Recruitment" aims to comprehensively analyze existing biases in the recruitment process from both human and artificial intelligence perspectives. It seeks answers to two research questions: what biases, both explicit and implicit, exist in the recruitment process, and how can they be effectively mitigated or eliminated through modeling techniques in future AI-based recruitment systems? Through a data-driven approach and the development of machine learning models, the study aims to discover what kinds of bias exist in the selection process and how to mitigate them.

Author: Ērika Todjēre

Supervisor: Jeļena Kijonoka

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences
