Open
Conversation
vp314 requested changes on Mar 24, 2026
Comment on lines +6 to +12
```math
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left( \| \mathbf{y} - X\boldsymbol{\beta} \|^2 + \lambda \| \boldsymbol{\beta} \|^2 \right)
```
Owner
You should indicate which norms you are using, for instance with a subscript $_2$.
where $\lambda > 0$ is a regularization parameter that controls the strength of the penalty.

The purpose of ridge regression is to stabilize regression estimates where the predictors are highly correlated or the design matrix $X$ is almost singular. Ridge regression shrinks the estimated coefficient vector in a way such that the coefficient estimates minimize the sum of squared residuals subject to a constraint on the $\ell_2$ norm of the coefficient vector, $\|\boldsymbol{\beta}\|^2 \leq t$, which shrinks the least squares estimates toward the origin. This reduces the variance of the coefficient estimates and mitigates the effects of multicollinearity.
Owner
Ridge regression does not impose a constraint; it uses a penalty. This needs to be clarified and made more precise.
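For concreteness, the penalized objective quoted above has a closed-form minimizer obtained by solving the normal equations $(X^\top X + \lambda I)\boldsymbol{\beta} = X^\top \mathbf{y}$. A minimal NumPy sketch, not part of the PR under review; the function name `ridge_closed_form` and the simulated data are illustrative assumptions:

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge estimate via the normal equations (X^T X + lam*I) beta = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Simulated example: 50 observations, 3 predictors, known coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(50)

beta_hat = ridge_closed_form(X, y, lam=1.0)
```

Increasing $\lambda$ shrinks $\|\hat{\boldsymbol{\beta}}\|$ toward zero, which is the penalty behavior the reviewer is asking to have stated precisely.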
Comment on lines +16 to +29
There are many numerical algorithms available to compute ridge regression estimates including direct methods, Krylov subspace methods, gradient-based optimization, coordinate descent, and stochastic gradient descent. These algorithms differ in their computational costs and numerical stability.

The goal of this experiment is to investigate the performance of these algorithms when we vary the structure and scale of the regression problem. To do this, we consider the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ where the matrix $X$ may be constructed with varying dimensions, sparsity patterns, and conditioning properties.

# Questions

The primary goal of this experiment is to compare numerical algorithms for computing ridge regression estimates under various conditions. In particular, we aim to address the following questions:

1. How does the performance of ridge regression algorithms change as the structural and numerical properties of the regression problem vary?

2. Which ridge regression algorithm provides the best balance between numerical stability and computational cost across these problem regimes?

# Experimental Units

The experimental units are the datasets under fixed penalty weights. For each experimental unit, all treatments will be applied to the dataset. This will be done so that differences in performance can be attributed to the algorithms themselves rather than the data. Each dataset will contain a matrix $X$, a response vector $\mathbf{y}$, and a regularization parameter $\lambda$ for some specific $\lambda$.
Owner
This is unclear to me. What does "for each experimental unit, all treatments will be applied to the dataset" mean?
Owner
You need to obey the 92 character line limit for this file.
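Two of the algorithm families named in the quoted design, direct methods and Krylov subspace methods, can be contrasted in a short sketch. This is illustrative only and not from the PR: the hand-rolled conjugate gradient solver below (function names `ridge_direct` and `ridge_cg` are assumptions) uses only matrix-vector products with $X$ and $X^\top$, which is why Krylov methods scale to large sparse problems, while the direct method forms and factorizes $X^\top X + \lambda I$:

```python
import numpy as np

def ridge_direct(X, y, lam):
    # Direct method: dense solve of the SPD normal equations.
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def ridge_cg(X, y, lam, tol=1e-10, maxiter=500):
    # Krylov method: conjugate gradients on (X^T X + lam*I) beta = X^T y,
    # touching X only through matrix-vector products.
    b = X.T @ y
    beta = np.zeros_like(b)
    r = b - (X.T @ (X @ beta) + lam * beta)  # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = X.T @ (X @ p) + lam * p
        alpha = rs / (p @ Ap)
        beta += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return beta

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = rng.standard_normal(100)
b_direct = ridge_direct(X, y, 0.5)
b_cg = ridge_cg(X, y, 0.5)
```

On a well-conditioned problem like this one the two solvers agree to high precision; the experiment's point is that their costs and accuracy diverge as conditioning degrades.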
```math
\frac{\sigma_{\max}^2+\lambda}{\sigma_{\min}^2+\lambda}.
```

Because the performance of numerical algorithms is strongly influenced by the conditioning of the system they solve, the ridge penalty effectively creates regression problems with different numerical difficulty. This provides a way to assess how algorithm performance, convergence behavior, and computational cost depend on the numerical stability of the problem.

In this experiment, the magnitude of $\lambda$ is selected relative to the smallest and largest singular values of $X$. A weak regularization regime corresponds to $\lambda \approx \sigma_{\min}^2$, where the ridge penalty begins to influence the smallest singular directions but the system remains moderately ill-conditioned. A moderate regularization regime corresponds to $\lambda \approx \sigma_{\min}\sigma_{\max}$, which substantially improves the conditioning of the problem by increasing the smallest eigenvalues of $X^\top X + \lambda I$. Finally, a strong regularization regime corresponds to $\lambda \approx \sigma_{\max}^2$, where the ridge penalty dominates the spectral scale of the problem and produces a well-conditioned system.
Owner
What are $\sigma_{\min}$ and $\sigma_{\max}$? If my system has zero singular values, is $\sigma_{\min} = 0$? In that case, your condition number is not defined.
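The three regimes in the quoted passage can be checked numerically. A sketch under the assumption that $\sigma_{\min}$ and $\sigma_{\max}$ are the smallest and largest singular values of a dense full-rank $X$ (the reviewer's rank-deficient case, $\sigma_{\min}=0$, would make the unregularized condition number undefined, but the ridge ratio below stays finite for any $\lambda > 0$); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 6))          # tall dense matrix, full column rank a.s.
s = np.linalg.svd(X, compute_uv=False)    # singular values of X
s_min, s_max = s.min(), s.max()

def ridge_cond(lam):
    # Condition number of X^T X + lam*I, expressed via singular values of X.
    return (s_max**2 + lam) / (s_min**2 + lam)

weak = ridge_cond(s_min**2)               # lambda ~ sigma_min^2
moderate = ridge_cond(s_min * s_max)      # lambda ~ sigma_min * sigma_max
strong = ridge_cond(s_max**2)             # lambda ~ sigma_max^2
```

Since the ratio is decreasing in $\lambda$, the weak regime gives the largest condition number and the strong regime drives it below 2, matching the progression described in the design.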
The experimental design for Ridge Regression experiments.
Description
This PR introduces the experimental design. The design outlines the questions, experimental units, treatments, blocking procedures, and observational measurements that will be used to compare algorithm performance.
Motivation and Context
The purpose of ridge regression is to address the issue of multicollinearity and enforce shrinkage of coefficient estimates. However, different numerical algorithms for computing ridge regression solutions can vary significantly in computational cost and numerical stability depending on the structure of the problem. This experimental design establishes a systematic framework for comparing these algorithms across varying dimensional regimes, sparsity levels, and levels of regularization. The goal is to identify which ridge regression algorithms perform most reliably and efficiently under different conditions.