GEP 8 — Refactor Piecewise Polynomials
Author:
Status: Draft
Type: Standards Track
Created: 2025-01-20
Resolution:
Abstract
This GEP proposes refactoring the piecewise polynomial specification format to use interval notation inspired by the portion library. The new format will be more intuitive, eliminate the confusing “k intervals with k-1 cutoffs” pattern, and make boundary conditions (open/closed) explicit.
Motivation and Scope
The current piecewise polynomial parameter format has several usability problems:
Confusing interval/cutoff relationship: Users must specify k numbered intervals (0, 1, 2, …) with k-1 internal thresholds, plus explicit lower_threshold: -inf and upper_threshold: inf on the boundary intervals. This mental model is error-prone.
Implicit boundary conditions: It’s unclear whether thresholds are inclusive or exclusive. For example, if interval 0 has upper_threshold: 100 and interval 1 starts at lower_threshold: 100, which interval does exactly 100 belong to?
Verbose specification: Each interval requires manual numbering and redundant threshold specification (the upper threshold of interval k equals the lower threshold of interval k+1).
Hard to read and maintain: The numbered intervals obscure the actual policy structure. Compare reading “interval 3 starts at 45” versus “the interval [45, 55) has value X”.
Forced coverage of irrelevant domains: The current format requires specifying behavior for the entire real line, even when parameters are only meaningful for a subset (e.g., non-negative values for income or age).
Unintuitive internal array shapes: The underlying implementation uses a rates array with shape (n_coefficients, n_intervals). This transposed layout is counter-intuitive compared to standard linear algebra conventions and makes manual inspection or construction of these arrays error-prone (see ttsim#5).
Scope: This GEP covers the YAML parameter format and the internal representation
used by piecewise_polynomial(). It implies updating piecewise_polynomial() to
support partial domains (returning NaN outside). It preserves the existing
mathematical evaluation logic (polynomials evaluated on local coordinates relative to
the interval start).
Usage and Impact
Current Format (Before)
```yaml
parameter_behindertenpauschbetrag:
  type: piecewise_constant
  2021-01-01:
    0:
      lower_threshold: -inf
      intercept_at_lower_threshold: 0
    1:
      lower_threshold: 20
      intercept_at_lower_threshold: 384
    2:
      lower_threshold: 30
      intercept_at_lower_threshold: 620
    # ... more intervals ...
    9:
      lower_threshold: 100
      upper_threshold: inf
      intercept_at_lower_threshold: 2840
```
Proposed Format (After)
```yaml
parameter_behindertenpauschbetrag:
  type: piecewise_constant
  2021-01-01:
    reference: Art. 1 G. v. 09.12.2020 BGBl. I S. 2770.
    intervals:
      - interval: "[0, 20)"
        intercept: 0
      - interval: "[20, 30)"
        intercept: 384
      - interval: "[30, 40)"
        intercept: 620
      # ... more intervals ...
      - interval: "[100, inf)"
        intercept: 2840
```
Note: The domain starts at 0 rather than (-inf, ...) since disability percentages
(Grad der Behinderung) are non-negative. Values outside the defined domain return NaN.
Piecewise Linear Example
```yaml
parameter_solidaritätszuschlag:
  type: piecewise_linear
  2021-01-01:
    reference: Artikel 1 G. v. 10.12.2019 BGBl. I S. 2115.
    intervals:
      - interval: "[0, 16956)"
        intercept: 0
        slope: 0
      - interval: "[16956, 31528)"
        # intercept is optional if continuous from previous interval
        slope: 0.119
      - interval: "[31528, inf)"
        # intercept is optional if continuous from previous interval
        slope: 0.055
```
updates_previous Example
When only some coefficients change between dates, updates_previous avoids restating
the entire definition. Each interval listed in the update must have bounds that exactly
match one of the base entry’s intervals. Only the specified coefficients are replaced;
all other coefficients and any intervals not listed in the update are carried over
unchanged. The interval structure (bounds and ordering) is never modified by an update.
```yaml
parameter_solidaritätszuschlag:
  type: piecewise_linear
  2021-01-01:
    reference: Artikel 1 G. v. 10.12.2019 BGBl. I S. 2115.
    intervals:
      - interval: "[0, 16956)"
        intercept: 0
        slope: 0
      - interval: "[16956, 31528)"
        slope: 0.119
      - interval: "[31528, inf)"
        slope: 0.055
  2023-01-01:
    updates_previous: true
    reference: Art. 4 G. v. 08.12.2022 BGBl. I S. 2230.
    intervals:
      - interval: "[16956, 31528)"
        slope: 0.11
```
Here, only the second interval’s slope changes from 0.119 to 0.11. The interval bounds
[16956, 31528) exactly match the base entry. The first and third intervals are carried
over unchanged, yielding a resolved entry with the same three intervals and the same
bounds as before. An error is raised if an update interval’s bounds do not match any
base interval.
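The resolution rule fits in a few lines of Python. This is a minimal illustration, not the ttsim implementation. apply_update is a hypothetical helper, and entries are plain dicts that mirror the YAML. Intervals are matched by their whitespace-normalized interval string; a real implementation would more likely compare parsed bounds.

```python
from typing import Any

# Hypothetical representation: one dict per YAML list item.
IntervalItem = dict[str, Any]


def _key(item: IntervalItem) -> str:
    # Normalize "[16956,31528)" and "[16956, 31528)" to the same key.
    return "".join(item["interval"].split())


def apply_update(base: list[IntervalItem], update: list[IntervalItem]) -> list[IntervalItem]:
    """Return the resolved intervals list; bounds and ordering come from `base`."""
    resolved = [dict(item) for item in base]  # copy, leave the base entry untouched
    position = {_key(item): i for i, item in enumerate(resolved)}
    for item in update:
        if _key(item) not in position:
            raise ValueError(f"Update interval {item['interval']} matches no base interval.")
        target = resolved[position[_key(item)]]
        # Replace only the coefficients that the update lists.
        for name, value in item.items():
            if name != "interval":
                target[name] = value
    return resolved


base = [
    {"interval": "[0, 16956)", "intercept": 0, "slope": 0},
    {"interval": "[16956, 31528)", "slope": 0.119},
    {"interval": "[31528, inf)", "slope": 0.055},
]
resolved = apply_update(base, [{"interval": "[16956, 31528)", "slope": 0.11}])
```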
Benefits
Self-documenting: The interval [20, 30) immediately shows the range and boundary conditions.
No manual numbering: Intervals are keyed by their range, not by arbitrary indices.
Explicit boundaries: [ and ] mean closed (inclusive); ( and ) mean open (exclusive).
Natural domains: Parameters only need to cover their meaningful range; queries outside return NaN.
Validation: The portion library can validate that intervals are contiguous without gaps or overlaps within the defined domain.
Backward Compatibility
This is a breaking change for parameter files. Migration requires:
Converting existing YAML files to the new format
If intercepts are omitted in the new format, they will be calculated automatically to ensure continuity, preserving the behavior of the current implementation.
The Python API (piecewise_polynomial()) will remain unchanged in signature, but its
behavior will change to return NaN for out-of-domain inputs.
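The following is a minimal sketch of how omitted intercepts could be filled in to ensure continuity. It assumes the lower bounds and coefficient dicts have already been parsed from the YAML. fill_intercepts is a hypothetical helper. It evaluates the previous interval's polynomial at the start of the current interval, using the local-coordinate form, and it treats (x - a) as 0 when the previous interval starts at -inf.

```python
import math

COEFFICIENT_NAMES = ["intercept", "slope", "quadratic", "cubic"]


def fill_intercepts(
    lower_bounds: list[float], coefficients: list[dict[str, float]]
) -> list[float]:
    """Return one intercept per interval, computing omitted ones for continuity."""
    intercepts: list[float] = []
    for k, coeffs in enumerate(coefficients):
        if "intercept" in coeffs:
            intercepts.append(float(coeffs["intercept"]))
            continue
        if k == 0:
            raise ValueError("The first interval needs an explicit intercept.")
        a_prev = lower_bounds[k - 1]
        # Local coordinate of this interval's start within the previous interval;
        # (x - a) is treated as 0 if the previous interval starts at -inf.
        dx = 0.0 if math.isinf(a_prev) else lower_bounds[k] - a_prev
        prev = {**coefficients[k - 1], "intercept": intercepts[k - 1]}
        value = sum(
            prev.get(name, 0.0) * dx**power
            for power, name in enumerate(COEFFICIENT_NAMES)
        )
        intercepts.append(value)
    return intercepts
```

For the parameter_solidaritätszuschlag example, this yields intercepts of 0, 0, and 0.119 · (31528 - 16956) = 1734.068.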
Detailed Description
Interval Syntax
The interval syntax follows mathematical convention:
| Syntax | Meaning |
|---|---|
| [a, b] | Closed interval: a ≤ x ≤ b |
| (a, b) | Open interval: a < x < b |
| [a, b) | Closed-open: a ≤ x < b |
| (a, b] | Open-closed: a < x ≤ b |
Special values:
-inf for negative infinity
inf for positive infinity
Infinity bounds must always be open, following standard mathematical convention (e.g., (-inf, 0) or [100, inf)). Writing [-inf or inf] will result in a validation error.
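The sketch below shows a minimal, dependency-free version of the parsing and validation rules above. It deliberately does not use portion, even though the GEP proposes portion for this task. Interval and parse_interval are hypothetical names.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float
    left_closed: bool
    right_closed: bool

    def contains(self, x: float) -> bool:
        above = x >= self.lower if self.left_closed else x > self.lower
        below = x <= self.upper if self.right_closed else x < self.upper
        return above and below


# Opening bracket, lower bound, comma, upper bound, closing bracket.
_INTERVAL_RE = re.compile(r"^\s*([\[(])\s*([^,\s]+)\s*,\s*([^,\s\])]+)\s*([\])])\s*$")


def parse_interval(text: str) -> Interval:
    match = _INTERVAL_RE.match(text)
    if match is None:
        raise ValueError(f"Cannot parse interval {text!r}")
    left, lo, hi, right = match.groups()
    lower, upper = float(lo), float(hi)  # float() accepts "inf" and "-inf"
    left_closed, right_closed = left == "[", right == "]"
    # Infinity bounds must be open: "[-inf, ...)" and "[..., inf]" are rejected.
    if (left_closed and math.isinf(lower)) or (right_closed and math.isinf(upper)):
        raise ValueError(f"Infinite bounds must be open: {text!r}")
    if lower > upper:
        raise ValueError(f"Lower bound exceeds upper bound: {text!r}")
    return Interval(lower, upper, left_closed, right_closed)
```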
Parameter Structure and Mathematical Evaluation
The polynomials are evaluated using local coordinates relative to the lower bound of the interval. For an input \(x\) falling into an interval \([a, b)\), the value is calculated as:
\[
f(x) = c_0 + c_1 (x - a) + c_2 (x - a)^2 + c_3 (x - a)^3
\]
where the coefficients correspond to the YAML keys as follows:
| YAML Key | Symbol | Meaning |
|---|---|---|
| intercept | \(c_0\) | Value at lower bound (\(f(a)\)) |
| slope | \(c_1\) | First derivative at lower bound (\(f'(a)\)) |
| quadratic | \(c_2\) | Coefficient of \((x-a)^2\) (equals \(\frac{1}{2}f''(a)\)) |
| cubic | \(c_3\) | Coefficient of \((x-a)^3\) (equals \(\frac{1}{6}f'''(a)\)) |
Note on Intervals starting at -Infinity: For intervals of the form (-inf, b), the
lower bound \(a\) is undefined. In this case, the implementation treats the coordinate
term \((x-a)\) as \(0\). Consequently, such intervals must be constant (only intercept
is used; slope, quadratic, etc. have no effect). This matches the existing behavior.
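For a single interval, the formula above can be sketched in Python as follows. This is an illustration under stated assumptions. evaluate_interval is a hypothetical name, and coefficients are passed in the order intercept, slope, quadratic, cubic, with trailing entries optional. The function uses Horner's scheme and applies the rule that (x - a) is 0 for intervals starting at -inf.

```python
import math


def evaluate_interval(x: float, lower: float, coefficients: list[float]) -> float:
    """Evaluate c_0 + c_1 (x - a) + c_2 (x - a)^2 + c_3 (x - a)^3 for one interval."""
    # Local coordinate; for an interval starting at -inf, (x - a) is treated as 0.
    dx = 0.0 if math.isinf(lower) else x - lower
    # Horner's scheme: ((c_3 * dx + c_2) * dx + c_1) * dx + c_0
    result = 0.0
    for c in reversed(coefficients):
        result = result * dx + c
    return result
```

With the second Solidaritätszuschlag interval, evaluate_interval(20000.0, 16956.0, [0.0, 0.119]) gives 0.119 · 3044 = 362.236.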
Parameter Examples
Each list item under intervals has a required interval key and optional coefficient
keys. Metadata (reference, note) belongs on the date entry mapping, not on
individual interval items (see GEP 3).
For piecewise_constant:
```yaml
intervals:
  - interval: "[a, b)"
    intercept: <number>
```
For piecewise_linear:
```yaml
intervals:
  - interval: "[a, b)"
    intercept: <number>  # c_0
    slope: <number>      # c_1
```
For piecewise_quadratic:
```yaml
intervals:
  - interval: "[a, b)"
    intercept: <number>  # c_0
    slope: <number>      # c_1
    quadratic: <number>  # c_2
```
For piecewise_cubic:
```yaml
intervals:
  - interval: "[a, b)"
    intercept: <number>
    slope: <number>
    quadratic: <number>
    cubic: <number>      # c_3
```
Internal Representation
At load time, the intervals list from the YAML is converted to portion’s
IntervalDict:
```python
import portion

# YAML input:
# intervals:
#   - interval: "[0, 20)"
#     intercept: 0
#   - interval: "[20, 30)"
#     intercept: 384
#   ...

# Converted to:
params = portion.IntervalDict(
    {
        portion.closedopen(0, 20): {"intercept": 0},
        portion.closedopen(20, 30): {"intercept": 384},
        portion.closedopen(30, 40): {"intercept": 620},
        # ...
        portion.closedopen(100, portion.inf): {"intercept": 2840},
    }
)
```
Internal Array Representation
For vectorized execution (e.g., in JAX), the IntervalDict is compiled into dense
arrays. To address the usability issues identified in
TTSIM #5, the coefficient array
will be standardized to shape (n_intervals, n_coefficients).
For example, the piecewise linear parameter_solidaritätszuschlag example above, with 3 intervals, will have a coefficient array
of shape (3, 2):
```python
import numpy as np

# One row per interval, one column per coefficient:
# [
#     [intercept_0, slope_0],
#     [intercept_1, slope_1],
#     [intercept_2, slope_2],
# ]
coefficients = np.array(
    [
        [0.0, 0.0],
        [0.0, 0.119],
        # Intercept filled in for continuity: 0.119 * (31528 - 16956) = 1734.068
        [1734.068, 0.055],
    ]
)
```
This layout intuitively maps each row to a specific interval, improving readability and aligning with standard data conventions.
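Building that array from the parsed intervals list amounts to emitting one row per list item. The sketch below is minimal and assumes that omitted intercepts have already been filled in for continuity; any other missing coefficient is assumed to default to 0. coefficient_array and COEFFICIENT_NAMES are hypothetical names.

```python
import numpy as np

# Hypothetical mapping from function type to its ordered coefficient keys.
COEFFICIENT_NAMES = {
    "piecewise_constant": ["intercept"],
    "piecewise_linear": ["intercept", "slope"],
    "piecewise_quadratic": ["intercept", "slope", "quadratic"],
    "piecewise_cubic": ["intercept", "slope", "quadratic", "cubic"],
}


def coefficient_array(intervals: list[dict], func_type: str) -> np.ndarray:
    """Stack per-interval coefficients into shape (n_intervals, n_coefficients)."""
    names = COEFFICIENT_NAMES[func_type]
    # Missing coefficients (e.g. an omitted slope) default to 0 in this sketch.
    rows = [[float(item.get(name, 0.0)) for name in names] for item in intervals]
    return np.array(rows, dtype=float)
```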
Named Access to Coefficients
The PiecewisePolynomialParamValue object supports accessing individual intervals and
their coefficients by name. For example, given a parameter with three intervals:
```python
# Access the slope of the first interval:
parameter_solidaritätszuschlag[0].slope
# Access the intercept of the second interval:
parameter_solidaritätszuschlag[1].intercept
```
This is useful in policy functions that need to reference specific coefficients
directly, without calling piecewise_polynomial().
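A hypothetical stand-in illustrates the kind of interface meant here. It exposes each row of the (n_intervals, n_coefficients) array as a named tuple. PiecewiseParamSketch and IntervalCoefficients are illustrative names, not the actual PiecewisePolynomialParamValue class.

```python
from typing import NamedTuple


class IntervalCoefficients(NamedTuple):
    # Hypothetical per-interval record; field names follow the YAML keys.
    intercept: float
    slope: float = 0.0
    quadratic: float = 0.0
    cubic: float = 0.0


class PiecewiseParamSketch:
    """Hypothetical stand-in for PiecewisePolynomialParamValue."""

    def __init__(self, coefficient_rows: list[list[float]]) -> None:
        # Each row is [intercept, slope, ...]; missing trailing entries default to 0.
        self._rows = [IntervalCoefficients(*row) for row in coefficient_rows]

    def __getitem__(self, i: int) -> IntervalCoefficients:
        return self._rows[i]

    def __len__(self) -> int:
        return len(self._rows)


soli = PiecewiseParamSketch([[0.0, 0.0], [0.0, 0.119], [1734.068, 0.055]])
```

Here soli[1].slope returns 0.119 and soli[2].intercept returns 1734.068.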
Behavior Outside Defined Domain
When piecewise_polynomial() is called with a value outside the defined intervals, it
returns NaN. This design choice reflects several considerations:
JAX compatibility: JAX’s JIT compilation model does not support raising exceptions that depend on the values of traced inputs.
NaN propagation: NaN propagates through downstream arithmetic, so every affected output is easy to identify.
Debugging: If the column that piecewise_polynomial() operates on is provided as input data, NaN outputs flag exactly the rows whose values lie outside the expected range (see #402).
Natural domains: Allows specifying parameters only for their meaningful range (e.g., income ≥ 0).
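A minimal NumPy sketch of the compiled lookup follows. It assumes all intervals are closed-open and contiguous, as in the examples above, so the domain is [a_0, b_last). piecewise_eval is a hypothetical name and does not reflect the signature of piecewise_polynomial(). Out-of-domain inputs are masked to NaN instead of raising. searchsorted, clip, isinf, and where also exist in jax.numpy, so a similar formulation should be traceable, though that mapping is an assumption here.

```python
import numpy as np


def piecewise_eval(x, lowers, upper, coefficients):
    """Evaluate contiguous closed-open intervals [a_k, a_{k+1}); NaN outside [a_0, upper).

    lowers: ascending lower bounds, shape (n_intervals,)
    upper: upper bound of the last interval
    coefficients: shape (n_intervals, n_coefficients)
    """
    x = np.asarray(x, dtype=float)
    lowers = np.asarray(lowers, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    # Index of the interval whose lower bound is the largest one <= x.
    idx = np.clip(np.searchsorted(lowers, x, side="right") - 1, 0, len(lowers) - 1)
    in_domain = (x >= lowers[0]) & (x < upper)
    a = lowers[idx]
    # Local coordinate; (x - a) is treated as 0 for an interval starting at -inf.
    dx = np.where(np.isinf(a), 0.0, x - a)
    # Horner's scheme over the coefficient columns, highest order first.
    result = np.zeros_like(x)
    for j in range(coefficients.shape[1] - 1, -1, -1):
        result = result * dx + coefficients[idx, j]
    return np.where(in_domain, result, np.nan)
```

Inputs that are themselves NaN also produce NaN, because every comparison with NaN is false.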
Validation
At parameter load time, the system will validate:
Contiguity: Intervals must be contiguous (no gaps within the defined domain)
No overlaps: Intervals must not overlap (portion handles this automatically)
Ordering: Intervals must be specified in ascending order in the YAML file
Continuity (optional, for linear and higher-order types): At boundaries, the polynomial values should match (can be a warning rather than an error)
updates_previous compatibility: Each update interval must exactly match a base interval’s bounds; only coefficients are replaced
Full coverage of (-inf, inf) is not required.
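The contiguity, overlap, and ordering checks can be expressed as a single pass over neighboring pairs. The sketch below is dependency-free, even though the GEP itself proposes relying on portion here. It assumes the intervals are already parsed into a hypothetical Bounds tuple. It does not cover the optional continuity warning or the updates_previous check.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    # Hypothetical parsed form of a string such as "[a, b)".
    lower: float
    upper: float
    left_closed: bool
    right_closed: bool


def validate_intervals(intervals: list[Bounds]) -> None:
    """Raise ValueError unless intervals are ascending, gap-free and non-overlapping."""
    for prev, curr in zip(intervals, intervals[1:]):
        if curr.lower < prev.upper:
            raise ValueError(f"{curr} overlaps {prev} or is out of order")
        if curr.lower > prev.upper:
            raise ValueError(f"Gap between {prev} and {curr}")
        # Shared endpoint: exactly one of the two intervals must include it.
        if prev.right_closed and curr.left_closed:
            raise ValueError(f"{prev} and {curr} both contain {curr.lower}")
        if not prev.right_closed and not curr.left_closed:
            raise ValueError(f"{prev} and {curr} both exclude {curr.lower}")
```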
Implementation
Add portion dependency to ttsim-backend.
Create interval parser: Parse strings like "[20, 30)" into portion intervals.
Update parameter loading: Convert YAML to an IntervalDict-based representation.
Update piecewise_polynomial(): Query the IntervalDict instead of searching arrays; return NaN for queries outside the defined domain. Ensure the evaluation logic uses local coordinates relative to the interval start.
Write migration script: Convert existing YAML files to the new format.
Update documentation: GEP 3 (parameters) and user guides.
Alternatives
Alternative 1: Keep Current Format with Better Documentation
Pros: No breaking change. Cons: Doesn’t solve usability issues.
Alternative 2: Generic Coefficient Names (p0, p1, p2, p3)
Instead of descriptive names (intercept, slope, quadratic, cubic), use generic
notation like p0, p1, p2, p3 or coefficients: [...].
We chose descriptive names because:
Reduces order-confusion errors: Descriptive names make the meaning unambiguous.
Consistency: slope (linear), quadratic, and cubic provide a clear progression that aligns with the polynomial terms they represent.
Precision: quadratic unambiguously refers to the coefficient \(c_2\), whereas terms like “curvature” could be confused with the second derivative (\(2 \cdot c_2\)).
Self-documenting YAML: slope: 0.119 immediately conveys meaning.
Discussion
ttsim #5: Proposal to improve the interface for piecewise polynomials (rates shape)
gettsim #901: Original issue
pylcm #210: Discussion on interval specification
Copyright
This document has been placed in the public domain.