Talk2PowerSystem_LLM is the Large Language Model (LLM) component of the Talk2PowerSystem project. It provides the code and scripts needed to integrate and operate an LLM, covering data preprocessing, model training, inference, and integration with the other parts of the Talk2PowerSystem ecosystem.
- Data Preprocessing: Scripts to clean, normalize, and format data for LLM training.
- Model Training: Pipelines and utilities for fine-tuning and training LLM models.
- Inference Engine: Code for running real-time queries and generating model predictions.
- System Integration: Tools and interfaces to connect the LLM with other components of the Talk2PowerSystem project.
- Testing and Evaluation: Automated tests and performance evaluation scripts to ensure model reliability and accuracy.
The repository is organized as follows:
- config/ - Configuration files for model parameters and environment settings.
- docker/ - Dockerfile for the FastAPI chatbot application.
- docs/ - Documentation, guides, and technical notes.
- evaluation_results/ - Evaluation results of the system.
- helm-chart/ - Resources for easier deployment on Kubernetes environments.
- src/ - Main source code, including training, inference, and integration scripts.
- tests/ - Unit and integration tests for the various modules.
- You should install conda; miniconda will suffice.
To set up the project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/statnett/Talk2PowerSystem_LLM.git
- Create a conda environment and install dependencies:
  conda create --name Talk2PowerSystemLLM --file conda-linux-64.lock
  conda activate Talk2PowerSystemLLM
  poetry install
To run the unit tests:
conda activate Talk2PowerSystemLLM
poetry install --with test
poetry run pytest --cov=talk2powersystemllm --cov-report=term-missing tests/unit_tests/

The acceptance tests require a valid GraphDB license to run in CI.
On GitHub, this is handled via the GRAPHDB_LICENSE GitHub Secret.
Current License Expiry: 2027-03-20
Since the license is a binary file, it must be stored as a Base64 encoded string to prevent corruption during transport. When the license expires, follow these steps:
- Encode the new binary file
  Run this command on your local machine to generate the encoded string:
  # Linux/GNU (standard)
  base64 -w 0 /path/to/new/graphdb.license
  # macOS (BSD)
  base64 -i /path/to/new/graphdb.license
- Update GitHub Secrets
  - Navigate to the repository menu: Settings > Secrets and variables > Actions.
  - Locate the GRAPHDB_LICENSE secret and click the Edit (pencil) icon.
  - Paste the entire output from the command above into the value field and save.
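
Before pasting, it can be worth confirming that the encoded string decodes back to the original license bytes. The following is a minimal sketch and not part of the repository: the function names `encode_license` and `decodes_back` are hypothetical. It uses Python's standard `base64` module, whose `b64encode` emits no line breaks, so the same code behaves identically on Linux and macOS.

```python
import base64
from pathlib import Path


def encode_license(path: Path) -> str:
    """Return the file's bytes as a single-line Base64 string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def decodes_back(encoded: str, path: Path) -> bool:
    """Check that the encoded string decodes to exactly the original bytes."""
    # validate=True rejects characters outside the Base64 alphabet,
    # such as stray whitespace picked up while copying the string.
    return base64.b64decode(encoded, validate=True) == path.read_bytes()
```

For example, `encode_license(Path("/path/to/new/graphdb.license"))` returns the string to paste into the secret's value field, and `decodes_back` can be run on a copied string to detect truncation or corruption.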
When running acceptance tests locally, the environment does not use the GitHub Secret. Instead, you must pass the absolute path to the license file on your machine via the LICENSE_PATH environment variable:
bash ./docker/generate-manifest.sh
docker buildx build --file docker/Dockerfile --tag talk2powersystem .
docker buildx build --file tests/acceptance_tests/docker-compose/DockerfileAcceptanceTests --tag talk2powersystem-acceptance-tests .
docker buildx build --file tests/acceptance_tests/docker-compose/DockerfileGraphDB --tag graphdb .
LLM_SEED=1 LICENSE_PATH=/path/to/graphdb.license docker compose -f tests/acceptance_tests/docker-compose/docker-compose.yaml run --rm talk2powersystem-acceptance-tests poetry run pytest tests/acceptance_tests/
LLM_USE_RESPONSES_API=true LICENSE_PATH=/path/to/graphdb.license docker compose -f tests/acceptance_tests/docker-compose/docker-compose.yaml run --rm talk2powersystem-acceptance-tests poetry run pytest tests/acceptance_tests/
docker compose -f tests/acceptance_tests/docker-compose/docker-compose.yaml down -v --remove-orphans
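
Because LICENSE_PATH must be an absolute path to an existing file, a small pre-flight check can catch mistakes before Docker starts. This is a sketch under stated assumptions and not part of the repository: `check_license_path`, `acceptance_test_command`, and the `COMPOSE_FILE` constant are hypothetical names. The compose file path and service name are copied from the commands above.

```python
from pathlib import Path
from typing import List, Optional

# Path taken from the commands above.
COMPOSE_FILE = "tests/acceptance_tests/docker-compose/docker-compose.yaml"


def check_license_path(value: Optional[str]) -> Path:
    """Validate LICENSE_PATH: it must be set, absolute, and an existing file."""
    if not value:
        raise ValueError("LICENSE_PATH is not set")
    path = Path(value)
    if not path.is_absolute():
        raise ValueError(f"LICENSE_PATH must be an absolute path, got: {value}")
    if not path.is_file():
        raise ValueError(f"no license file found at: {value}")
    return path


def acceptance_test_command() -> List[str]:
    """Return the docker compose invocation shown above as an argument list."""
    return [
        "docker", "compose", "-f", COMPOSE_FILE,
        "run", "--rm", "talk2powersystem-acceptance-tests",
        "poetry", "run", "pytest", "tests/acceptance_tests/",
    ]


# Possible usage (requires Docker and the images built above):
#   import os, subprocess
#   check_license_path(os.environ.get("LICENSE_PATH"))
#   subprocess.run(acceptance_test_command(), check=True,
#                  env={**os.environ, "LLM_SEED": "1"})
```

Raising early with a specific message is a design choice: a relative or mistyped path otherwise tends to surface later as a less obvious failure inside the containers.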