
Comparative Analysis of LLM-Based Approaches for SQL Generation

The rapid development of Large Language Models (LLMs) has unlocked opportunities for restructuring software development processes in general, and specifically for converting natural language into SQL queries. This study experimentally evaluates the effects of four LLM-based methods on the efficiency and quality of SQL generation. Evaluation is based on three metrics: correctness, completeness, and consistency. The studied LLM-based SQL generation methods include specific LLMs tailored for SQL code generation such as SQL Coder, frameworks for generating SQL code (Vanna.ai, 2023; LlamaIndex, 2023), and multi-agent collaborative networks for transforming natural language into SQL. The research utilizes a mix of literature review, case studies, and simulations. It offers a comprehensive review of the advancements in LLM-driven SQL generation, encompassing concepts, technologies, methodologies, strengths, limitations, and ethical considerations. This research bridges the gap between the theoretical foundations and practical application of AI-augmented approaches while promoting the integration of LLM-based SQL generation into automated software development processes.
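The pipeline the abstract describes — prompting an LLM with a schema and a natural-language question, then scoring the generated query for correctness — can be sketched as follows. This is a minimal illustration, not the thesis's actual experimental code: the prompt template is a plausible assumption, and execution-result matching is used here as one common proxy for the correctness metric.

```python
import sqlite3


def build_prompt(schema: str, question: str) -> str:
    # Hypothetical prompt template for a text-to-SQL LLM call;
    # the actual prompts used in the study may differ.
    return (
        "Given the database schema:\n"
        f"{schema}\n"
        f"Translate into SQL: {question}\n"
        "Return only the SQL query."
    )


def execution_match(conn: sqlite3.Connection,
                    generated_sql: str,
                    reference_sql: str) -> bool:
    """Correctness proxy: the generated and reference queries
    return the same result set (order-insensitive)."""
    got = conn.execute(generated_sql).fetchall()
    want = conn.execute(reference_sql).fetchall()
    return sorted(got) == sorted(want)


# Toy in-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# Two syntactically different queries that yield the same rows
# count as a correct generation under this metric.
print(execution_match(conn,
                      "SELECT name FROM users WHERE id = 1",
                      "SELECT name FROM users WHERE id < 2"))  # True
```

Completeness and consistency would need additional checks (e.g. coverage of requested columns, and agreement across repeated generations), which this sketch omits.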

Author: Maksim Ilin

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences

Faculty: Engineering Faculty



Enhancement Strategies for Retrieval-Augmented Generation Systems

This thesis systematically explores the enhancement of Retrieval-Augmented Generation (RAG) systems built on Large Language Models, emphasizing the optimization of retrieval parameters and generation accuracy. We investigate optimal RAG configurations — chunk size and overlap percentage, top-k selection, query transformations, alternative retrieval methods, and different LLMs (namely GPT-3.5-Turbo and GPT-4) — and find that a chunk size of 500 tokens generally offers the best performance. Vector search using cosine similarity emerges as the most effective retrieval method, significantly enhancing both context precision and recall across various tasks and knowledge bases. Experimentation within the CRUD-RAG framework demonstrates its applicability to diverse tasks, from content creation to knowledge refinement. Our findings indicate that enhancements in retrieval settings can markedly improve the performance of RAG systems, making them more efficient and adaptable for complex information synthesis and retrieval tasks. These results affirm the potential of systematic enhancements to improve AI-driven language models in practical applications, contributing significant insights and practical approaches to the evolving landscape of RAG system research.
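The retrieval parameters the abstract highlights — chunk size, overlap, top-k selection, and cosine-similarity ranking — can be sketched in a few lines. This is an illustrative simplification, not the thesis's implementation: it splits on whitespace tokens rather than model tokens, and uses bag-of-words vectors in place of learned embeddings so the example stays self-contained.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 500, overlap: float = 0.1) -> list[str]:
    # Fixed-size chunks with a fractional overlap between neighbours;
    # the thesis explores chunk sizes and overlap percentages like these.
    tokens = text.split()
    step = max(1, int(chunk_size * (1 - overlap)))
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), step)]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Rank chunks by cosine similarity to the query and keep the top k.
    qv = Counter(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: cosine(qv, Counter(c.lower().split())),
                    reverse=True)
    return ranked[:k]


docs = ["the cat sat on the mat",
        "stock prices rose sharply",
        "a dog barked loudly"]
print(top_k("cat on the mat", docs, k=1))  # ['the cat sat on the mat']
```

In a production RAG system the term-frequency vectors would be replaced by embedding vectors from the LLM stack, but the chunking and top-k cosine ranking logic is structurally the same.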

Author: Sigita Lapiņa

Supervisor: Dmitry Pavlyuk

Degree: Master

Year: 2024

Work Language: English

Study programme: Computer Sciences

Faculty: Engineering Faculty
