StatistEase — Neurosymbolic Statistical Analysis Assistant

The Problem

LLMs make statistical mistakes. They fabricate means, invent p-values, hallucinate confidence intervals, and present plausible-sounding nonsense as fact. We call these outputs mollocks — they look right, feel right, and are wrong.

StatistEase exists to stop this.

The Solution

StatistEase is a Kautz Type 1 neurosymbolic statistical analysis assistant:

  • Neural (LLM): Understands your question in natural language. Routes it to the correct statistical function. Explains the result in plain English.

  • Symbolic (Julia): Performs ALL mathematical computation. Every number comes from a verified, deterministic Julia function. Zero neural inference in the computation path.

You: "Is there a significant difference between these two groups?"
     │
     ▼ (Natural Language Understanding — neural)
LLM routes to: t_test_independent(group1, group2)
     │
     ▼ (Symbolic Computation — Julia)
Julia computes: t=2.847, df=38, p=0.007, Cohen's d=0.90
     │
     ▼ (Natural Language Generation — neural)
LLM explains: "Yes, there is a statistically significant difference
               (t(38)=2.847, p=.007) with a large effect size (d=0.90)."

Every number in that response came from Julia. The LLM touched none of them.
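The symbolic step in the diagram can be sketched with the standard library alone. This is a minimal illustration, not the project's actual `t_test_independent`: the name `t_test_independent_sketch` is hypothetical, it assumes a pooled-variance (Student's) test, and it omits the p-value because that needs a t-distribution CDF (for example from Distributions.jl).

```julia
using Statistics

# Hypothetical sketch of the symbolic step: pooled-variance independent-samples
# t statistic, degrees of freedom, and Cohen's d. No p-value is computed here.
function t_test_independent_sketch(g1::AbstractVector{<:Real}, g2::AbstractVector{<:Real})
    n1, n2 = length(g1), length(g2)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp2 = ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / df
    diff = mean(g1) - mean(g2)
    t = diff / sqrt(sp2 * (1 / n1 + 1 / n2))
    d = diff / sqrt(sp2)  # Cohen's d with the pooled standard deviation
    return (t = t, df = df, d = d)
end

result = t_test_independent_sketch([5.0, 6.0, 7.0, 8.0], [3.0, 4.0, 5.0, 6.0])
println(result)
```

Under this formulation d = t·sqrt(1/n1 + 1/n2). If the two groups in the diagram had 20 observations each (an assumption consistent with df=38), then 2.847 × sqrt(0.1) ≈ 0.90, which matches the effect size shown.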

Important

HARD NOTICE — MOLLOCK WARNING

This software enforces a strict neural-symbolic boundary. No statistical value is ever produced by neural inference. If you see a number in a StatistEase response, it was computed by Julia. This is not a preference — it is a hard architectural invariant.

Features

Statistical Functions (17 Modules)

Module | Functions
--- | ---
Descriptive | Mean, median, mode, SD, skewness, kurtosis, quartiles, CI
Inferential | t-tests (independent, paired, one-sample), ANOVA, chi-square
Correlation & Regression | Pearson, Spearman, simple/multiple regression with VIF
Non-parametric | Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, PERMANOVA
Effect Sizes | Cohen’s d, r, eta², Hedges' g, OR, NNT, CL effect size
Power Analysis | Power for t-tests, sample size for means/proportions/regression
Bayesian | Prior updating, Bayes factor (BIC), credible intervals (ETI + HDI)
Fuzzy Logic | Membership functions, fuzzy AND/OR/NOT, multi-rule inference
Dempster-Shafer | Evidence combination with conflict detection
Causality | Granger causality (regression-based F-test)
Estimation | James-Stein shrinkage estimator
Reliability | Cronbach’s alpha, McDonald’s omega
Validity | Content (Lawshe CVR), convergent/discriminant (AVE), criterion
Measurement | ICC (6 types), SEM, item analysis, sensitivity/specificity, PRE
Qualitative | Cohen’s/Fleiss' kappa, thematic saturation detection
Assumptions | Normality (Jarque-Bera), Levene’s test for homogeneity
Sampling | Design effect, margin of error with FPC, missing data analysis
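To make a few of the table entries concrete, here is a hedged sketch of three of them using textbook formulas: Hedges' g, Cronbach's alpha, and the margin of error with a finite population correction (FPC). All function names are hypothetical and end in `_sketch`; the module's real signatures and options are not shown in this README.

```julia
using Statistics

# Hedges' g: Cohen's d times the common small-sample correction
# J ≈ 1 - 3 / (4(n1 + n2) - 9).
function hedges_g_sketch(g1, g2)
    n1, n2 = length(g1), length(g2)
    sp = sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
    d = (mean(g1) - mean(g2)) / sp
    return d * (1 - 3 / (4 * (n1 + n2) - 9))
end

# Cronbach's alpha for a respondents × items score matrix.
function cronbach_alpha_sketch(items::AbstractMatrix{<:Real})
    k = size(items, 2)
    item_vars = sum(var(items; dims = 1))       # sum of per-item variances
    total_var = var(vec(sum(items; dims = 2)))  # variance of respondent totals
    return k / (k - 1) * (1 - item_vars / total_var)
end

# Margin of error for a proportion p from a sample of n out of a population of N.
# z = 1.96 is the familiar rounded two-sided 95% normal quantile.
function margin_of_error_fpc_sketch(p, n, N; z = 1.96)
    fpc = sqrt((N - n) / (N - 1))
    return z * sqrt(p * (1 - p) / n) * fpc
end

println(hedges_g_sketch([5.0, 6.0, 7.0, 8.0], [3.0, 4.0, 5.0, 6.0]))
println(cronbach_alpha_sketch([1 2; 2 3; 3 5]))
println(margin_of_error_fpc_sketch(0.5, 100, 1000))
```

The FPC shrinks the margin of error only when the sample is a noticeable fraction of the population; for very large N the factor approaches 1.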

Data Quality Pathway

Raw Input → Detection → Validation → Cleansing → Normalization → Analysis → Output

  • Detection: Automatic detection of data type (nominal/ordinal/interval/ratio) and file format

  • Validation: Range checks, variance verification, infinity/NaN screening

  • Cleansing: Outlier detection (IQR/z-score/modified z-score), missing value handling, deduplication

  • Normalization: Z-score, min-max, log transforms; tabular normalization (1NF→3NF) checks
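The validation, cleansing, and normalization stages above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the IQR rule uses Tukey's conventional multiplier of 1.5, and the project's actual thresholds and missing-value handling are not documented here.

```julia
using Statistics

# Cleansing: flag values outside [Q1 - k*IQR, Q3 + k*IQR].
function iqr_outliers_sketch(x::AbstractVector{<:Real}; k = 1.5)
    q1, q3 = quantile(x, [0.25, 0.75])
    iqr = q3 - q1
    return [v < q1 - k * iqr || v > q3 + k * iqr for v in x]
end

# Validation: reject NaN/Inf and zero-variance input before normalizing.
function validate_sketch(x::AbstractVector{<:Real})
    all(isfinite, x) || error("input contains NaN or Inf")
    std(x) > 0 || error("input has zero variance")
    return x
end

# Normalization: z-score and min-max rescaling.
zscore_sketch(x) = (x .- mean(x)) ./ std(x)
minmax_sketch(x) = (x .- minimum(x)) ./ (maximum(x) - minimum(x))

data = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]
mask = iqr_outliers_sketch(data)         # true marks an outlier
clean = validate_sketch(data[.!mask])    # drop flagged values, then validate
z = zscore_sketch(clean)
scaled = minmax_sketch(clean)
println(mask)
```

Dropping flagged values is only one possible policy; a pipeline could equally report them and leave the decision to the analyst.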

Output Formats

  • Unicode box-drawing tables (terminal)

  • ASCII histogram, box plot, scatter plot, bar chart

  • CSV and JSON export

  • Text reports with provenance stamps
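As a rough idea of the terminal table output, the sketch below renders label/value pairs with Unicode box-drawing characters. The function name and layout are hypothetical; the values are the ones from the t-test example earlier in this README.

```julia
# Render (label, value) string pairs as a two-column box-drawing table.
function box_table_sketch(rows)
    w1 = maximum(textwidth(r[1]) for r in rows)
    w2 = maximum(textwidth(r[2]) for r in rows)
    top    = "┌" * "─"^(w1 + 2) * "┬" * "─"^(w2 + 2) * "┐"
    bottom = "└" * "─"^(w1 + 2) * "┴" * "─"^(w2 + 2) * "┘"
    # Labels are left-aligned, values right-aligned.
    body = ["│ " * rpad(r[1], w1) * " │ " * lpad(r[2], w2) * " │" for r in rows]
    return join([top; body; bottom], "\n")
end

table = box_table_sketch([("t", "2.847"), ("df", "38"), ("p", "0.007")])
println(table)
```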

Requirements

  • Julia 1.10+ with packages: Statistics, StatsBase, Distributions, DataFrames, CSV, JSON3, HTTP

  • LM Studio running locally (default: localhost:1234) with a model that supports function calling
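Function calling means the model is given a machine-readable description of each statistical function and replies with a function name and arguments rather than a number. The sketch below builds such a request body as plain Julia dictionaries. It assumes LM Studio's local server accepts the OpenAI-style `tools` format; the model name `local-model` is a placeholder, and the parameter schema is a guess based on the `t_test_independent(group1, group2)` call shown earlier.

```julia
# Hypothetical tool definition in the OpenAI-style "tools" format (an assumption).
t_test_tool = Dict(
    "type" => "function",
    "function" => Dict(
        "name" => "t_test_independent",
        "description" => "Independent-samples t-test; every number is computed in Julia.",
        "parameters" => Dict(
            "type" => "object",
            "properties" => Dict(
                "group1" => Dict("type" => "array", "items" => Dict("type" => "number")),
                "group2" => Dict("type" => "array", "items" => Dict("type" => "number")),
            ),
            "required" => ["group1", "group2"],
        ),
    ),
)

# Request body for a chat completion that offers the tool to the model.
request_body = Dict(
    "model" => "local-model",  # placeholder, not a real model identifier
    "messages" => [Dict("role" => "user",
                        "content" => "Is there a significant difference between these two groups?")],
    "tools" => [t_test_tool],
)

println(request_body["tools"][1]["function"]["name"])
```

With the packages listed above, a body like this could be serialized with JSON3 and sent with HTTP to the server on localhost:1234; the exact endpoint path is not stated in this README.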

Quick Start

cd statistease
julia --project=. -e 'using Pkg; Pkg.instantiate()'
julia --project=. -e 'using StatistEase; main()'

Or run the offline examples, which do not need an LLM:

julia --project=. -e 'using StatistEase; run_examples()'

Architecture

This is a Kautz Type 1 neurosymbolic system — neural and symbolic components operate side-by-side with a defined, auditable interface boundary.

The boundary is src/tools/executor.jl:execute_tool(). Everything above it is neural (language understanding). Everything below it is symbolic (Julia computation). No statistical value crosses this boundary in the upward direction without having been computed by a verified Julia function.
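One way to picture such a boundary is a dispatcher that accepts only a tool name and data from the neural side, and returns only values produced by registered Julia functions. The sketch below is hypothetical throughout: `tool_registry_sketch` and `execute_tool_sketch` are illustrative names, and the real signature and behavior of `execute_tool()` are not shown in this README.

```julia
using Statistics

# Hypothetical registry: the only functions the neural side may ask for.
tool_registry_sketch() = Dict{String,Function}(
    "mean"   => xs -> mean(xs),
    "median" => xs -> median(xs),
    "sd"     => xs -> std(xs),
)

# Hypothetical stand-in for the boundary. The LLM supplies a tool name and data;
# it never supplies a result. Unknown tools are rejected instead of guessed.
function execute_tool_sketch(name::AbstractString, data::AbstractVector{<:Real})
    registry = tool_registry_sketch()
    haskey(registry, name) || error("unknown tool: $name")
    value = registry[name](data)
    # A simple provenance stamp recording where the number came from.
    return (tool = String(name), value = value, computed_by = "julia")
end

r = execute_tool_sketch("mean", [1.0, 2.0, 3.0])
println(r)
```

Rejecting unregistered names keeps the failure mode explicit: a request the symbolic side cannot serve produces an error, not an improvised number.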

License

PMPL-1.0-or-later (Palimpsest License)

Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>
