Unsupervised machine learning approach for hierarchical graph-based representation of natural language text collections.

Managing big data efficiently is important in many fields, especially when the data consists of human-written documents. Recent advances in Natural Language Processing (NLP), particularly large language models (LLMs), have made it possible to solve many tasks in this domain, but at the cost of a high demand for labelled data, compute resources and specialized skills. To address these limitations, this study proposes an NLP pipeline for identifying topic hierarchies in collections of scientific publications. The work focuses on evaluating available unsupervised machine learning methods and quality metrics in NLP, and on developing visualization techniques, in order to build a prototype of the pipeline. The proposed solution is based on the hARTM approach, optimized for interpretability. It demonstrated the capacity to infer human-interpretable topic hierarchies from collections of scientific texts and to construct a meaningful hierarchy of topic-based document representations. The visualization approaches rely on multidimensional scaling (MDS) to present inter-document similarity and on Sankey plots to show the relatedness of document clusters within the topic hierarchy. Utility was demonstrated on two datasets, focusing on the interpretability and meaning of the topic hierarchy and the associated topic definitions. Potential application areas include personal education and scientific writing.

Author: Jevgenijs Bodrenko

Supervisor: Irina Jackiva

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences



COMPARATIVE ANALYSIS OF LLM-BASED APPROACHES FOR SQL GENERATION

The rapid development of Large Language Models (LLMs) has opened opportunities for restructuring software development processes in general, and for specific tasks such as converting natural language into SQL queries. This study experimentally evaluates the effects of four LLM-based methods on the efficiency and quality of SQL generation. The evaluation is based on the following metrics: correctness, completeness and consistency. The LLM-based SQL generation methods studied include specialized LLMs tailored for SQL code generation, such as SQL Coder; frameworks for generating SQL code (Vanna.ai, 2023; Llamaindex, 2023); and multi-agent collaborative networks for transforming natural language into SQL. The research combines a literature review, case studies and simulations. It offers a comprehensive review of the advancements in LLM-driven SQL generation, covering concepts, technologies, methodologies, strengths, limitations and ethical considerations. The research bridges the gap between the theoretical foundations and the practical application of AI-augmented approaches, while promoting the integration of LLM-based SQL generation into automated software development processes.

Author: Maksim Ilin

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences



Assessing the Viability of Natural Language Processing Applications within an Electronic Checklist System for Freight Forwarders: Rule-based Information Extraction from Cargo Descriptions.

This study investigates the application of Natural Language Processing (NLP) within electronic checklist systems to enhance cargo description and securing practices for freight forwarders. The logistics industry faces significant challenges due to complex and varied legislation and the need for autonomous validation tools for cargo securing. This research aims to develop a rule-based Named Entity Recognition (NER) model to standardize and automate the extraction of entities from cargo descriptions. Key components of this study include the development of an entity extraction mechanism using regular expressions and standardized codes. The research demonstrates the potential of NLP solutions to generate precise, dynamic checklists from detailed cargo descriptions, ensuring that all pertinent tasks are covered. The developed NER model's effectiveness is evaluated through a series of experiments, showcasing high precision, recall, and F1 scores, thus highlighting its practical applicability in real-world logistics operations. The findings underscore the importance of standardizing cargo-related information to facilitate the broader adoption of automated NLP solutions in the logistics industry.

Author: Nikita Mickevičs

Supervisor: Dmitry Pavlyuk

Degree: Professional Bachelor

Year: 2024

Work Language: English
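
The rule-based entity extraction described in the abstract above can be illustrated with a minimal sketch. The entity labels, the regular expressions and the sample cargo description are all hypothetical; the listing does not state which rules or standardized codes the thesis actually uses.

```python
import re

# Hypothetical patterns for illustration only; not the rules from the thesis.
PATTERNS = {
    # Four-digit UN number for dangerous goods, e.g. "UN 1263" or "UN1263"
    "UN_NUMBER": re.compile(r"\bUN\s?(\d{4})\b"),
    # Weight in kilograms, e.g. "850 kg" or "12.5kg"
    "WEIGHT_KG": re.compile(r"\b(\d+(?:\.\d+)?)\s?kg\b", re.IGNORECASE),
    # Number of packages followed by a package type
    "PACKAGE_COUNT": re.compile(
        r"\b(\d+)\s+(?:pallets?|boxes|crates?|drums?)\b", re.IGNORECASE
    ),
}


def extract_entities(description):
    """Return (label, value) pairs in the order they appear in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(description):
            found.append((match.start(), label, match.group(1)))
    found.sort()  # order by position in the description
    return [(label, value) for _, label, value in found]


if __name__ == "__main__":
    print(extract_entities("12 pallets of paint, UN 1263, 850 kg"))
```

Each pattern captures only the value of interest, so the extracted entities could feed directly into a checklist generator. A real system would presumably need far more patterns and validation against official code lists.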
