Abstracts

Title	Author	Supervisor	Degree	Year
AI-driven Voice Recognition: Model Development and Application	Aleksejs Ņikiforovs	Dmitry Pavlyuk	Bachelor	2024
Faculty: Engineering Faculty; Study programme: Computer Science
Improvement of machine learning algorithms performance by data set dimensionality reduction using cellular automata	Alexey Kuchvalskiy	Dmitry Pavlyuk	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences
COMPARATIVE ANALYSIS OF LLM-BASED APPROACHES FOR SQL GENERATION	Maksim Ilin	Dmitry Pavlyuk	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences
Assessing the Viability of Natural Language Processing Applications within an Electronic Checklist System for Freight Forwarders: Rule-based Information Extraction from Cargo Descriptions.	Nikita Mickevičs	Dmitry Pavlyuk	Professional Bachelor	2024
Faculty: Transport and Management Faculty; Study programme: Transport and Business Logistics
Enhancement Strategies for Retrieval-Augmented Generation Systems	Sigita Lapiņa	Dmitry Pavlyuk	Master	2024
Faculty: Engineering Faculty; Study programme: Computer Sciences

AI-driven Voice Recognition: Model Development and Application

In the course of this work, it was decided to develop a speech recognition model together with an application that provides system-assistant capabilities. The result of the author's work is a model capable of recognizing speech from a limited vocabulary, and an application that serves as a proof-of-concept prototype. The software was implemented in Python using Visual Studio Code and Jupyter, with the Keras framework. The developed software fully meets the requirements and is ready for operation.

Author: Aleksejs Ņikiforovs

Supervisor: Dmitry Pavlyuk

Degree: Bachelor

Year: 2024

Work Language: English

Study programme: Computer Science

Improvement of machine learning algorithms performance by data set dimensionality reduction using cellular automata

A significant challenge in Machine Learning is dealing with high-dimensional data. This complexity, known as the "curse of dimensionality", causes the performance of Machine Learning algorithms to deteriorate as dataset dimensionality and size increase. Cellular automata are discrete dynamical computational systems whose mathematical update functions, known as rules, produce complex global behaviour. We used one-dimensional elementary cellular automata as a tool for dataset size reduction. Model variables were selected to generate an initial state vector, which was then transformed into a format suitable for applying cellular automata rules, known in cellular automata theory as a configuration. The model then iterated through all possible cellular automata rules, applying varying numbers of epochs. Model performance on the reduced dataset was compared with benchmark results obtained on the original dataset after applying standard dimensionality reduction techniques. It was concluded that the applied cellular automata rules can serve as an alternative method for dataset size reduction without degrading model performance.

Author: Alexey Kuchvalskiy

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences
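The abstract does not spell out the cellular automata step, so the following is a minimal Python sketch, under stated assumptions, of how an elementary (one-dimensional, two-state, radius-1) cellular automaton applies a Wolfram rule number to a configuration with wrap-around boundaries. The median-threshold encoding in `to_configuration` is a hypothetical stand-in for the thesis's state-vector generation, and the final size-reduction step is not shown because the abstract does not describe it.

```python
import numpy as np


def rule_table(rule: int) -> np.ndarray:
    """Map a Wolfram rule number (0..255) to the next state of each 3-cell neighbourhood."""
    if not 0 <= rule <= 255:
        raise ValueError("elementary CA rules are numbered 0..255")
    # Bit i of the rule number is the next state for neighbourhood i = 4*left + 2*centre + right.
    return np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)


def step(config: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Apply one synchronous update with periodic (wrap-around) boundaries."""
    left = np.roll(config, 1)    # cell i sees config[i - 1]
    right = np.roll(config, -1)  # cell i sees config[i + 1]
    return table[4 * left + 2 * config + right]


def evolve(config, rule: int, epochs: int) -> np.ndarray:
    """Run the automaton for a given number of epochs (generations)."""
    table = rule_table(rule)
    state = np.asarray(config, dtype=np.uint8)
    for _ in range(epochs):
        state = step(state, table)
    return state


def to_configuration(features: np.ndarray) -> np.ndarray:
    """Hypothetical encoding: 1 where a value exceeds its column median, else 0."""
    return (features > np.median(features, axis=0)).astype(np.uint8)


# Example: rule 90 (left XOR right) applied once to a single live cell.
print(evolve([0, 0, 0, 1, 0, 0, 0], rule=90, epochs=1))  # [0 0 1 0 1 0 0]
```

Iterating over all possible elementary rules, as the abstract describes, amounts to looping over `range(256)` and calling `evolve` for each epoch count of interest.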

COMPARATIVE ANALYSIS OF LLM-BASED APPROACHES FOR SQL GENERATION

The rapid development of Large Language Models has unlocked opportunities for restructuring software development processes in general, as well as specific tasks such as converting natural language into SQL queries. This study seeks to experimentally evaluate the effects of four LLM-based methods on the efficiency and quality of SQL generation. Evaluation is based on the following metrics: correctness, completeness and consistency. The studied LLM-based SQL generation methods include specialized LLMs tailored for SQL code generation, such as SQL Coder; frameworks for generating SQL code (Vanna.ai, 2023; Llamaindex, 2023); and multi-agent collaborative networks for transforming natural language into SQL. The research combines a literature review, case studies and simulations. It offers a comprehensive review of advances in LLM-driven SQL generation, covering concepts, technologies, methodologies, strengths, limitations and ethical considerations. This research bridges the gap between the theoretical foundations and the practical application of AI-augmented approaches, while promoting the integration of LLM-based SQL generation into automated software development processes.

Author: Maksim Ilin

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences
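The abstract names correctness as a metric without defining it. One common way to operationalize it, assumed for this sketch rather than taken from the thesis, is execution accuracy: run the generated and the reference query against the same database and compare the returned rows. The table, data and queries below are hypothetical, and Python's built-in `sqlite3` module stands in for the target database.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True if both queries return the same rows, ignoring row order."""
    try:
        generated = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to execute is counted as incorrect
    expected = conn.execute(reference_sql).fetchall()
    # Compare as multisets so duplicate rows matter but ordering does not.
    return Counter(generated) == Counter(expected)


# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("Anna", 34), ("Juris", 28), ("Marta", 41)])

reference = "SELECT name FROM customers WHERE age > 30"
print(execution_match(conn, "SELECT name FROM customers WHERE NOT age <= 30", reference))  # True
print(execution_match(conn, "SELECT name FROM customers WHERE age >= 28", reference))      # False
print(execution_match(conn, "SELEC name FROM customers", reference))                       # False
```

Ignoring row order is a design choice; when a task explicitly requires an ORDER BY, comparing the row lists directly would be the stricter check.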

Assessing the Viability of Natural Language Processing Applications within an Electronic Checklist System for Freight Forwarders: Rule-based Information Extraction from Cargo Descriptions.

This study investigates the application of Natural Language Processing (NLP) within electronic checklist systems to enhance cargo description and securing practices for freight forwarders. The logistics industry faces significant challenges due to complex and varied legislation and the need for autonomous validation tools for cargo securing. This research aims to develop a rule-based Named Entity Recognition (NER) model to standardize and automate the extraction of entities from cargo descriptions. Key components of this study include the development of an entity extraction mechanism using regular expressions and standardized codes. The research demonstrates the potential of NLP solutions to generate precise, dynamic checklists from detailed cargo descriptions, ensuring that all pertinent tasks are covered. The developed NER model's effectiveness is evaluated through a series of experiments, showcasing high precision, recall, and F1 scores, thus highlighting its practical applicability in real-world logistics operations. The findings underscore the importance of standardizing cargo-related information to facilitate the broader adoption of automated NLP solutions in the logistics industry.

Author: Nikita Mickevičs

Supervisor: Dmitry Pavlyuk

Degree: Professional Bachelor

Year: 2024

Work Language: English

Study programme: Transport and Business Logistics
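The abstract describes rule-based entity extraction with regular expressions and standardized codes, evaluated by precision, recall and F1. A minimal Python sketch of that idea follows. The entity labels and patterns (UN numbers, weights, package counts) are hypothetical examples chosen for illustration, not the thesis's rule set, and scoring here assumes exact matches of (label, text, start, end) spans.

```python
import re
from typing import Iterable, List, Tuple

Entity = Tuple[str, str, int, int]  # (label, matched text, start offset, end offset)

# Hypothetical rule set; each label maps to one regular expression.
PATTERNS = {
    "UN_NUMBER": re.compile(r"\bUN\s?(\d{4})\b", re.IGNORECASE),
    "WEIGHT": re.compile(r"\b(\d+(?:[.,]\d+)?)\s?(kg|t)\b", re.IGNORECASE),
    "PACKAGES": re.compile(r"\b(\d+)\s?(pallets?|boxes|box|drums?|crates?)\b", re.IGNORECASE),
}


def extract_entities(text: str) -> List[Entity]:
    """Apply every pattern and return the matches ordered by position in the text."""
    entities = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            entities.append((label, m.group(0), m.start(), m.end()))
    return sorted(entities, key=lambda e: e[2])


def precision_recall_f1(predicted: Iterable[Entity], gold: Iterable[Entity]) -> Tuple[float, float, float]:
    """Exact-match scoring: a prediction counts only if label and span agree with the gold set."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(extract_entities("20 pallets of paint, UN 1263, gross weight 1200 kg"))
```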

Enhancement Strategies for Retrieval-Augmented Generation Systems

This thesis systematically explores the enhancement of Retrieval-Augmented Generation (RAG) systems built on Large Language Models, emphasizing the optimization of retrieval parameters and generation accuracy. We investigate optimal RAG configurations, including chunk size and overlap percentage, top-k selection, query transformations, retrieval methods, and the choice of LLM (GPT-3.5-Turbo and GPT-4), and find that a chunk size of 500 tokens generally offers the best performance. Vector search using cosine similarity emerges as the most effective retrieval method, significantly improving both context precision and recall across various tasks and knowledge bases. Experimentation within the CRUD-RAG framework demonstrates the applicability of the approach to diverse tasks, from content creation to knowledge refinement. Our findings indicate that improved retrieval settings can markedly enhance the performance of RAG systems, making them more efficient and adaptable for complex information synthesis and retrieval tasks. These results affirm the potential of systematic enhancements to improve AI-driven language models in practical applications, contributing insights and practical approaches to the evolving field of RAG system research.

Author: Sigita Lapiņa

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences
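The abstract reports that a 500-token chunk size and cosine-similarity vector search worked best. The sketch below shows, under assumptions, how chunking with a fractional overlap and top-k cosine retrieval fit together. Whitespace splitting stands in for a real tokenizer, the embedding vectors are assumed to come from an embedding model not named in the abstract, and the default overlap of 0.1 is a hypothetical value rather than a reported result.

```python
import numpy as np
from typing import List, Sequence, Tuple


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 500, overlap: float = 0.1) -> List[List[str]]:
    """Split tokens into windows of `chunk_size`; neighbours share roughly `overlap` of their tokens."""
    if chunk_size <= 0 or not 0 <= overlap < 1:
        raise ValueError("chunk_size must be positive and overlap must lie in [0, 1)")
    stride = max(1, round(chunk_size * (1 - overlap)))
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def top_k_cosine(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3) -> Tuple[List[int], List[float]]:
    """Return indices and cosine scores of the k chunk vectors most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = m @ q  # dot products of unit vectors are cosine similarities
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()


text = "retrieval augmented generation combines a retriever with a generator"
print(chunk_tokens(text.split(), chunk_size=4, overlap=0.5))
```

Normalizing both sides first lets a single matrix product score every chunk at once; zero vectors would need to be filtered out beforehand to avoid division by zero.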
