Experimental Design #6

Open
etontackett wants to merge 23 commits into main from Design

Conversation

@etontackett (Collaborator)

The experimental design for Ridge Regression experiments.

Description

This PR introduces the experimental design. The design outlines the questions, experimental units, treatments, blocking procedures, and observational measurements that will be used to compare algorithm performance.

Motivation and Context

The purpose of ridge regression is to address the issue of multicollinearity and enforce shrinkage of coefficient estimates. However, different numerical algorithms for computing ridge regression solutions can vary significantly in computational cost and numerical stability depending on the structure of the problem. This experimental design establishes a systematic framework for comparing these algorithms across varying dimensional regimes, sparsity levels, and levels of regularization. The goal is to identify which ridge regression algorithms perform most reliably and efficiently under different conditions.

Types of changes

  • CI
  • Docs
  • Feature
  • Fix
  • Performance
  • Refactor
  • Style
  • Test
  • Other (use sparingly):

Checklists:

Code and Comments
If this PR includes modifications to the code base, please select all that apply.

  • My code follows the code style of this project.
  • I have updated all package dependencies (if any).
  • I have included all relevant files to realize the functionality of the PR.
  • I have exported relevant functionality (if any).

API Documentation

  • For every exported function (if any), I have included a detailed docstring.
  • I have checked the spelling and grammar of all docstring updates through an external tool.
  • I have checked that the docstring's function signature is correctly formatted and has all arguments.
  • I have checked that the docstring's list of arguments, fields, or return values match the function.
  • I have compiled the docs locally and read through all docstring updates to check for errors.

Manual Documentation

  • I have checked the spelling and grammar of all manual updates through an external tool.
  • Any code included in the docstring is tested using doc tests to ensure consistency.
  • I have compiled the docs locally and read through all manual updates to check for errors.

Testing

  • I have added unit tests to cover my changes. (For macros, be sure to check
    @code_lowered and @code_typed.)
  • All new and existing tests passed.
  • I have achieved sufficient code coverage.

Comment on lines +6 to +12
\hat{\boldsymbol{\beta}} =
\arg\min_{\boldsymbol{\beta}}
\left(
\| \mathbf{y} - X\boldsymbol{\beta} \|^2
+
\lambda \| \boldsymbol{\beta} \|^2
\right)
Owner

You should indicate which norms you are using, for instance with a subscript $_2$.
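For concreteness, the suggested change would make the objective read as follows (a sketch of the reviewer's suggestion, not text from the PR):

```latex
\hat{\boldsymbol{\beta}} =
\arg\min_{\boldsymbol{\beta}}
\left(
\| \mathbf{y} - X\boldsymbol{\beta} \|_2^2
+
\lambda \| \boldsymbol{\beta} \|_2^2
\right)
```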

where $\lambda > 0$ is a regularization parameter that controls the strength of the penalty.

The purpose of ridge regression is to stabilize regression estimates where the predictors are highly correlated or the design matrix $X$ is almost singular. Ridge regression shrinks the estimated coefficient vector in a way such that the coefficient estimates minimize the sum of squared residuals subject to a constraint on the $\ell_2$ norm of the coefficient vector, $\|\boldsymbol{\beta}\|^2 \leq t$, which shrinks the least squares estimates toward the origin. This reduces the variance of the coefficient estimates and mitigates the effects of multicollinearity.
Owner

Ridge regression does not impose a constraint; it uses a penalty. This needs to be clarified and made more precise.
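To illustrate the penalized (not constrained) formulation numerically: the estimate solves $(X^\top X + \lambda I)\boldsymbol{\beta} = X^\top \mathbf{y}$, and its norm shrinks toward the origin as $\lambda$ grows. A minimal NumPy sketch (synthetic data, not from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
beta_true = np.arange(1.0, p + 1)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # Penalized formulation: solve (X'X + lam*I) beta = X'y directly.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The coefficient norm decreases monotonically as the penalty grows.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 100.0)]
```

The constrained form $\|\boldsymbol{\beta}\|_2^2 \le t$ is equivalent only in the Lagrangian-dual sense, which is why the distinction matters for the write-up.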

Comment on lines +16 to +29
The purpose of ridge regression is to stabilize regression estimates where the predictors are highly correlated or the design matrix $X$ is almost singular. Ridge regression shrinks the estimated coefficient vector in a way such that the coefficient estimates minimize the sum of squared residuals subject to a constraint on the $\ell_2$ norm of the coefficient vector, $\|\boldsymbol{\beta}\|^2 \leq t$, which shrinks the least squares estimates toward the origin. This reduces the variance of the coefficient estimates and mitigates the effects of multicollinearity.

There are many numerical algorithms available to compute ridge regression estimates including direct methods, Krylov subspace methods, gradient-based optimization, coordinate descent, and stochastic gradient descent. These algorithms differ in their computational costs and numerical stability.

The goal of this experiment is to investigate the performance of these algorithms when we vary the structure and scale of the regression problem. To do this, we consider the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ where the matrix ${X}$ may be constructed with varying dimensions, sparsity patterns, and conditioning properties.
# Questions
The primary goal of this experiment is to compare numerical algorithms for computing ridge regression estimates under various conditions. In particular, we aim to address the following questions:

1. How does the performance of ridge regression algorithms change as the structural and numerical properties of the regression problem vary?

2. Which ridge regression algorithm provides the best balance between numerical stability and computational cost across these problem regimes?

# Experimental Units
The experimental units are the datasets under fixed penalty weights. For each experimental unit, all treatments will be applied to the dataset, so that differences in performance can be attributed to the algorithms themselves rather than the data. Each dataset will contain a matrix ${X}$, a response vector $\mathbf{y}$, and a specific regularization parameter ${\lambda}$.
Owner

This is unclear to me. What does "for each experimental unit, all treatments will be applied to the dataset" mean?
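One reading of the design is a within-unit comparison: every solver (treatment) is run on every dataset (unit), so solver differences are measured on identical data. A sketch with two hypothetical solvers for the same ridge system (the solver names and sizes here are illustrative, not from the PR):

```python
import numpy as np

rng = np.random.default_rng(1)

def solve_direct(X, y, lam):
    # Treatment 1: normal equations, (X'X + lam*I) beta = X'y.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def solve_lstsq(X, y, lam):
    # Treatment 2: augmented least squares,
    # min || [X; sqrt(lam) I] beta - [y; 0] ||_2.
    p = X.shape[1]
    Xa = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    ya = np.concatenate([y, np.zeros(p)])
    return np.linalg.lstsq(Xa, ya, rcond=None)[0]

treatments = {"direct": solve_direct, "lstsq": solve_lstsq}

# Each unit is one (X, y, lam) dataset; every treatment is applied
# to every unit, so performance differences are within-unit.
results = {}
for unit_id in range(3):
    X = rng.standard_normal((20, 4))
    y = rng.standard_normal(20)
    lam = 0.5
    for name, solver in treatments.items():
        results[(unit_id, name)] = solver(X, y, lam)
```

Since both treatments solve the same well-posed system, their estimates should agree to numerical precision; any measured differences are in cost and stability, not in the answer.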

Owner

You need to obey the 92 character line limit for this file.

\kappa\!\left(X^\top X + \lambda I\right) = \frac{\sigma_{\max}^2+\lambda}{\sigma_{\min}^2+\lambda}.
```

Because the performance of numerical algorithms is strongly influenced by the conditioning of the system they solve, the ridge penalty effectively creates regression problems with different numerical difficulty. This provides a way to assess how algorithm performance, convergence behavior, and computational cost depend on the numerical stability of the problem. In this experiment, the magnitude of $\lambda$ is selected relative to the smallest and largest singular values of $X$:

  • A weak regularization regime corresponds to $\lambda \approx \sigma_{\min}^2$, where the ridge penalty begins to influence the smallest singular directions but the system remains moderately ill-conditioned.
  • A moderate regularization regime corresponds to $\lambda \approx \sigma_{\min}\sigma_{\max}$, which substantially improves the conditioning of the problem by increasing the smallest eigenvalues of $X^\top X + \lambda I$.
  • A strong regularization regime corresponds to $\lambda \approx \sigma_{\max}^2$, where the ridge penalty dominates the spectral scale of the problem and produces a well-conditioned system.
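The three regimes can be checked numerically: the condition number $(\sigma_{\max}^2+\lambda)/(\sigma_{\min}^2+\lambda)$ is decreasing in $\lambda$, and at $\lambda \approx \sigma_{\max}^2$ it is bounded by 2. A sketch on a random dense design (sizes are illustrative, not from the PR):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))
s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
smin, smax = s[-1], s[0]

def ridge_cond(lam):
    # Condition number of X'X + lam*I via the singular values of X.
    return (smax**2 + lam) / (smin**2 + lam)

weak = ridge_cond(smin**2)          # lam ~ sigma_min^2
moderate = ridge_cond(smin * smax)  # lam ~ sigma_min * sigma_max
strong = ridge_cond(smax**2)        # lam ~ sigma_max^2
```

Note `strong` $= 2\sigma_{\max}^2/(\sigma_{\min}^2+\sigma_{\max}^2) < 2$, i.e. the strong regime is well-conditioned regardless of $X$.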
Owner

What are $\sigma_{\min}$ and $\sigma_{\max}$? If my system has zero singular values, is $\sigma_{\min} = 0$? In that case, your condition number is not defined.
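To make the reviewer's point concrete: for a rank-deficient $X$, $\sigma_{\min} = 0$ and $\kappa(X)$ is undefined, yet the ridge quantity $(\sigma_{\max}^2+\lambda)/(\sigma_{\min}^2+\lambda)$ stays finite for any $\lambda > 0$, so the design doc should state which condition number it means. A sketch with a deliberately rank-deficient design (illustrative, not from the PR):

```python
import numpy as np

# Rank-deficient design: last column duplicates the first, so sigma_min = 0.
rng = np.random.default_rng(3)
X = rng.standard_normal((30, 4))
X[:, -1] = X[:, 0]

s = np.linalg.svd(X, compute_uv=False)  # smallest singular value is (numerically) zero
lam = 1e-2
# Ridge condition number of X'X + lam*I remains finite for lam > 0.
kappa_ridge = (s[0]**2 + lam) / (s[-1]**2 + lam)
```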
