Visualizing the Taxes and Transfers System#

How to create a plot#

To help you understand how GETTSIM works internally and how you are able to implement custom reforms, you can visualize the tax and transfer system. This tutorial explains how to create a graph and what information you can get from it.

[2]:
from gettsim import Labels, plot

The plot of the tax and transfer DAG visualizes each node as a colored dot. Arrows between dots represent dependencies among the nodes. Each top-level namespace (e.g., “einkommensteuer” or “elterngeld”) is assigned a unique color to distinguish its nodes.

The simplest way to create a plot is to use the plot.dag.tt function, passing only a policy date.

Unfortunately, the resulting plot is unreadable because the tax and transfer system is so complex when viewed in its entirety.

(Note that we pass height and width only to make the plots fit the page layout; they are not necessary when working in a notebook.)

[3]:
plot.dag.tt(policy_date_str="2025-01-01", width=704, height=704)

About the only thing you can learn from this plot is which kinds of programs exist, based on the colors of the nodes. These are:

  • Blue for top-level elements and background/general input variables.

  • Red for taxes.

    Exception:

    Purple for Einkünfte, which are a mix of Einnahmen (general input) and tax rules.

  • Yellow for social insurance programs.

    Shades differentiate between programs and between pension contributions and pension benefits.

  • Green for transfer programs.

If not mentioned otherwise, differentiation within colors happens at the first level of the namespace hierarchy.

Looking at the signature of plot.dag.tt, we see that it takes three types of arguments:

  • DAG-plotting options:

    • primary_nodes

    • selection_type

    • selection_depth

    • include_params

    • show_node_description

    • node_colormap

  • Arguments to main, which have the same meaning as they do there

  • Any remaining keyword arguments are passed to plotly.graph_objects.Figure.update

[4]:
plot.dag.tt?

A more flexible way of looking at portions of the graph is to make use of the arguments primary_nodes and selection_type of plot.dag.tt().

Focusing on a subset of the system using the plotting interface#

Primary nodes allow you to visualize only a subset of the complete graph of the tax and transfer system. They can be passed to the primary_nodes argument of the plot.dag.tt() function. This is useful only in conjunction with a selection type: if you leave selection_type out or set it to None, the complete graph is displayed (i.e., primary_nodes has no effect).

Other than None, selection_type may take one of four values:

  • "neighbors": The neighbors (parents and children) of the primary nodes are displayed.

  • "descendants": All descendants of the primary nodes are displayed.

  • "ancestors": All ancestors of the primary nodes are displayed.

  • "all_paths": All paths between the primary nodes are displayed (including any other nodes lying on these paths). You must pass at least two primary nodes.

We’ll go through these in turn.
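To make the terminology concrete before looking at the actual plots, here is a minimal sketch of the first three selection types on a toy DAG. It uses plain Python sets rather than GETTSIM, the node names are made up, and it is not meant to reflect how plot.dag.tt computes its selections internally.

```python
# Toy DAG with made-up node names; each key maps to the set of its children.
CHILDREN = {
    "income": {"taxable_income"},
    "tax_rate": {"tax"},
    "taxable_income": {"tax"},
    "tax": {"surcharge"},
    "surcharge": {"net_income"},
    "net_income": set(),
}


def invert(edges):
    """Turn a node -> children mapping into a node -> parents mapping."""
    inverted = {node: set() for node in edges}
    for node, targets in edges.items():
        for target in targets:
            inverted[target].add(node)
    return inverted


def reachable(start, edges):
    """Return every node reachable from `start` along `edges`, excluding `start`."""
    seen = set()
    stack = [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


PARENTS = invert(CHILDREN)

neighbors = PARENTS["tax"] | CHILDREN["tax"]  # direct parents and children
ancestors = reachable("tax", PARENTS)  # everything "tax" depends on
descendants = reachable("tax", CHILDREN)  # everything that depends on "tax"

print(sorted(neighbors))
print(sorted(ancestors))
print(sorted(descendants))
```

In this toy graph, "income" is an ancestor of "tax" but not a neighbor, and "net_income" is a descendant but not a neighbor.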

Selection type: Neighbors#

We pick one primary node ("betrag_y_sn" in the "einkommensteuer" / "abgeltungssteuer" namespace). It is displayed together with its parents ("zu_versteuernde_kapitalerträge_y_sn" and "satz" in the same namespace) and its child ("betrag_y_sn" in the "solidaritätszuschlag" namespace).

[5]:
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")},
    selection_type="neighbors",
    width=704,
    height=396,
)

By changing the selection_depth argument from its default of 1 for neighbors, you can look at a broader vicinity in both directions. Setting it to 2 will include all grandparents and grandchildren, and so on.

[6]:
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")},
    selection_type="neighbors",
    selection_depth=2,
    width=704,
    height=396,
)
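One way to think about selection_depth is as a cap on the number of edges followed from the primary node. The sketch below illustrates that reading with a depth-limited breadth-first search on a made-up chain of nodes. The helper names are hypothetical, and the sketch assumes that parents and children are walked separately (so siblings are not picked up), which the text above suggests but does not state explicitly.

```python
from collections import deque

# Made-up chain a -> b -> c -> d -> e -> f, stored in both directions.
PARENTS = {"d": ["c"], "c": ["b"], "b": ["a"]}
CHILDREN = {"d": ["e"], "e": ["f"]}


def within_depth(start, edges, depth):
    """Nodes reachable from `start` in at most `depth` steps, excluding `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen - {start}


def neighborhood(node, depth):
    """Walk up to `depth` steps towards parents and towards children."""
    return within_depth(node, PARENTS, depth) | within_depth(node, CHILDREN, depth)


print(sorted(neighborhood("d", 1)))
print(sorted(neighborhood("d", 2)))
```

With depth 1 only the direct parent and child of "d" are selected; with depth 2 the grandparent and grandchild are added.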

Selection type: Ancestors#

This takes a node and plots it along with all of its ancestors. Picking again "betrag_y_sn" in the "einkommensteuer" / "abgeltungssteuer" namespace, we get its two parents as above. Among them, "satz" has no ancestors because it is a parameter of the tax system. However, "zu_versteuernde_kapitalerträge_y_sn" has a much longer set of preceding nodes, for setting up the tax unit from primitives (marital status, joint filing) and calculating the taxable portion of all capital income.

[7]:
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")},
    selection_type="ancestors",
    width=704,
    height=396,
)

Again, we can use the selection_depth argument to control how many levels of the graph we look at:

[8]:
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")},
    selection_type="ancestors",
    selection_depth=3,
    width=704,
    height=396,
)

As it happens, this is the same (up to possible layout differences) as letting the graph know that sn_id is an input column:

[9]:
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")},
    selection_type="ancestors",
    labels=Labels.input_columns({"sn_id"}),
    include_warn_nodes=False,
    width=704,
    height=396,
)
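Why declaring an input column shortens the graph can be illustrated with a toy traversal that collects ancestors but does not look past nodes declared as inputs. This is only a sketch under that assumption: the node names and the helper are made up, and GETTSIM's actual handling of input columns may differ in detail.

```python
# Made-up dependency structure: each key maps to the list of its parents.
PARENTS = {
    "tax": ["taxable_income", "rate"],
    "taxable_income": ["capital_income", "unit_id"],
    "unit_id": ["married", "joint_filing"],
}


def ancestors_until_inputs(target, parents, input_columns=frozenset()):
    """Collect ancestors of `target` without expanding declared input columns."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent in seen:
                continue
            seen.add(parent)
            # An input column is taken as given, so its own parents are skipped.
            if parent not in input_columns:
                stack.append(parent)
    return seen


full = ancestors_until_inputs("tax", PARENTS)
pruned = ancestors_until_inputs("tax", PARENTS, input_columns={"unit_id"})
print(sorted(full - pruned))
```

Here the nodes that only serve to compute "unit_id" drop out once "unit_id" is supplied directly.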

Selection type: Descendants#

This takes a node and plots it along with all of its descendants. Picking again "betrag_y_sn" in the "einkommensteuer" / "abgeltungssteuer" namespace, we get a long DAG describing its descendants because the amount of Solidaritätszuschlag paid impacts the means-tested transfers Bürgergeld, Wohngeld, and Kinderzuschlag.

[10]:
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")},
    selection_type="descendants",
    width=704,
    height=396,
)

Selection type: All paths#

This takes a set of nodes and plots all paths connecting them, along with any nodes lying on these paths. This can be very useful for understanding the connections between two or more nodes. Picking again "betrag_y_sn" in the "einkommensteuer" / "abgeltungssteuer" namespace, we add "betrag_m_bg" in the "bürgergeld" namespace as a second primary node.

This ends up being almost the same as the previous example, except that unrelated nodes (those in the "grundsicherung_im_alter" namespace and the amounts of Wohngeld and Kinderzuschlag) are not displayed.

[11]:
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={
        ("einkommensteuer", "abgeltungssteuer", "betrag_y_sn"),
        ("bürgergeld", "betrag_m_bg"),
    },
    selection_type="all_paths",
    width=704,
    height=396,
)
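The reason unrelated branches disappear can be seen in a small sketch: a node lies on some path from a source to a target exactly when it is downstream of the source and upstream of the target. The code below applies this to a made-up graph with an ordered pair of nodes. It is an illustration of the idea, not GETTSIM's implementation, which takes an unordered set of primary nodes.

```python
# Made-up DAG: each key maps to the set of its children.
CHILDREN = {
    "a": {"b", "x"},
    "b": {"c"},
    "c": {"d", "y"},
    "d": set(),
    "x": set(),
    "y": set(),
}


def invert(edges):
    """Turn a node -> children mapping into a node -> parents mapping."""
    inverted = {node: set() for node in edges}
    for node, targets in edges.items():
        for target in targets:
            inverted[target].add(node)
    return inverted


def reachable(start, edges):
    """Return every node reachable from `start` along `edges`, excluding `start`."""
    seen = set()
    stack = [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def nodes_on_paths(source, target, children):
    """Nodes on any directed path from `source` to `target` (empty if none)."""
    downstream = reachable(source, children) | {source}
    upstream = reachable(target, invert(children)) | {target}
    return downstream & upstream


print(sorted(nodes_on_paths("a", "d", CHILDREN)))
```

The side branches "x" and "y" are descendants of "a" but do not lead to "d", so they are left out.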

Customizing node colors with node_colormap#

The node_colormap argument allows you to control the colors of nodes in the DAG plot. This is useful when you want to highlight specific nodes or group related nodes by color, especially when working with heterogeneous top-level namespaces.

Basic usage: Exact namespace matching#

The simplest use of node_colormap is to assign colors to specific namespaces. Keys are tuples representing the namespace hierarchy, and values are color names (any Plotly CSS color).

Longer tuple matches take precedence over shorter ones. For example, if you specify both ("einkommensteuer",) and ("einkommensteuer", "abgeltungssteuer"), nodes in the abgeltungssteuer sub-namespace will use the color you specified for ("einkommensteuer", "abgeltungssteuer").

[12]:
# Assign specific colors to namespaces
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")},
    selection_type="ancestors",
    selection_depth=3,
    node_colormap={
        ("top-level",): "lightskyblue",  # Special name to catch all top-level nodes
        ("einkommensteuer",): "crimson",
        ("einkommensteuer", "abgeltungssteuer"): "coral",  # Takes precedence
        ("familie",): "skyblue",
        ("einnahmen",): "mediumblue",
    },
    width=704,
    height=396,
)
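The "longer match wins" rule can be sketched as a longest-prefix lookup. The function below is a hypothetical helper written for illustration; it assumes that a namespace key applies to every node whose path starts with it, and it ignores the special "top-level" name and any wildcard patterns.

```python
def color_for(path, colormap, default="black"):
    """Return the color of the longest key that is a prefix of `path`."""
    best = None
    for key in colormap:
        is_prefix = path[: len(key)] == key
        if is_prefix and (best is None or len(key) > len(best)):
            best = key
    return colormap[best] if best is not None else default


colormap = {
    ("einkommensteuer",): "crimson",
    ("einkommensteuer", "abgeltungssteuer"): "coral",
}

nested = ("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")
direct = ("einkommensteuer", "betrag_y_sn")

print(color_for(nested, colormap))
print(color_for(direct, colormap))
```

Both keys are prefixes of the first path, and the two-element key is chosen because it is longer; only the one-element key applies to the second path.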

Glob patterns for flexible matching#

For more flexible matching, node_colormap supports glob-style patterns using:

  • * to match any sequence of characters

  • ? to match a single character

This is particularly useful for:

  • Grouping variables by prefix (e.g., all p_id_* variables)

  • Grouping variables by suffix (e.g., all *_m monthly variables)

  • Matching patterns within names (e.g., *einkommen* for income-related variables)
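To get a feel for these wildcards, you can try them on plain strings with fnmatch.fnmatchcase from Python's standard library. This is only an approximation of the behavior described here: the assumption is that * and ? behave the same way in node_colormap, and some of the variable names below are made up for the demonstration.

```python
from fnmatch import fnmatchcase

# A mix of names from this tutorial and made-up ones.
names = [
    "hh_id",
    "p_id_elternteil_1",
    "p_id_empfänger",
    "kindergeld_m",
    "bruttoeinkommen_y",
    "betrag_m",
    "betrag_y_sn",
]


def select(pattern):
    """Return all names matching a case-sensitive glob pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select("p_id_*"))       # prefix match
print(select("*_m"))          # suffix match
print(select("*einkommen*"))  # match within the name
print(select("betrag_?"))     # exactly one trailing character
```

Note that "betrag_?" does not match "betrag_y_sn" because ? stands for exactly one character.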

Matching priority:

  1. Exact matches (no wildcards) take precedence

  2. Longer patterns are more specific than shorter ones

  3. Patterns with fewer wildcards score higher

[13]:
# Using glob patterns to group related variables
plot.dag.tt(
    policy_date_str="2025-01-01",
    primary_nodes={("bürgergeld", "betrag_m_bg")},
    selection_type="ancestors",
    selection_depth=2,
    node_colormap={
        # Only match at top-level
        ("*_hh",): "lightskyblue",
        # Match at all levels
        ("**", "*_wthh"): "mediumblue",
        ("**", "*_bg"): "coral",
        # Specific namespace colors
        ("vorrangprüfungen",): "olive",
        ("bürgergeld", "betrag_m_bg"): "crimson",
    },
    width=704,
    height=396,
)

The ** pattern for matching at any depth#

Patterns such as ("*_hh",) only match at a specific level of the namespace hierarchy. To match nodes at any depth, use **, which matches any number of path segments (including zero).

For example, to color all nodes ending with _bg (Bedarfsgemeinschaft level) regardless of which namespace they’re in:

("**", "*_bg")  # Matches bürgergeld__betrag_m_bg, wohngeld__einkommen_m_bg, etc.

Patterns with ** have lower priority than equivalent patterns without it, so more specific patterns will still take precedence.
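The difference between * and ** can be sketched with a small recursive matcher over path tuples. This is an illustrative reimplementation under the assumption that a pattern has to match the whole node path; it is not GETTSIM's matching code, and it leaves out priorities and the special "top-level" name.

```python
from fnmatch import fnmatchcase


def matches(pattern, path):
    """Check whether a tuple pattern matches a complete tuple path."""
    if not pattern:
        return not path  # an empty pattern only matches an empty path
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # "**" may absorb zero or more leading segments of the path.
        return any(matches(rest, path[i:]) for i in range(len(path) + 1))
    # Any other element must match exactly one segment.
    return bool(path) and fnmatchcase(path[0], head) and matches(rest, path[1:])


nested = ("bürgergeld", "betrag_m_bg")
deep = ("einkommensteuer", "abgeltungssteuer", "betrag_y_sn")

print(matches(("**", "*_bg"), nested))             # any depth
print(matches(("*_bg",), nested))                  # top-level only
print(matches(("einkommensteuer", "*"), deep))     # one level below
print(matches(("einkommensteuer", "**"), deep))    # any level below
```

In this sketch, ("einkommensteuer", "*") covers only paths with exactly two segments, whereas ("einkommensteuer", "**") also covers deeper ones.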

Common patterns#

Here are some useful glob patterns for the German tax and transfer system:

Pattern                   Matches                                Use case
------------------------  -------------------------------------  ------------------------------------
("*_id",)                 hh_id, bg_id, sn_id, …                 Group identifiers (top-level)
("p_id_*",)               p_id_elternteil_1, p_id_empfänger, …   Person pointer columns
("*_m",)                  Monthly variables at top-level         Time-based grouping
("**", "betrag_?")        betrag_m, betrag_y, etc. at any depth  Monetary amounts by time unit
("**", "*_bg")            *_bg nodes at any depth                Bedarfsgemeinschaft-level results
("**", "*_hh")            *_hh nodes at any depth                Household-level results
("**", "*_sn")            *_sn nodes at any depth                Tax unit (Steuernummer) results
("einkommensteuer", "*")  Direct children of einkommensteuer     Tax subsystems (one level)
("einkommensteuer", "**") All descendants of einkommensteuer     Full tax namespace (any depth)

API reference#

The node_colormap parameter accepts a dict[tuple[str, ...] | str, str] where:

  • Keys are tuples or qualified name strings representing patterns to match against node paths

  • Values are color names (any Plotly CSS color)

Patterns can be specified as tuples (e.g., ("housing_benefits", "*_m")) or as qualified name strings using __ as separator (e.g., "housing_benefits__*_m").
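The two spellings carry the same information, as the following sketch shows by splitting a qualified name on the __ separator. The helper names are hypothetical, and how GETTSIM itself normalizes the keys is not covered here.

```python
def as_tuple(pattern):
    """Convert a qualified-name string into a tuple pattern; keep tuples as-is."""
    if isinstance(pattern, tuple):
        return pattern
    return tuple(pattern.split("__"))


def normalize(colormap):
    """Rewrite all keys of a colormap as tuple patterns."""
    return {as_tuple(key): color for key, color in colormap.items()}


mixed = {
    "housing_benefits__*_m": "olive",
    ("housing_benefits", "*_y"): "coral",
}

print(normalize(mixed))
```

After normalization, both keys are two-element tuples, so they can be compared against node paths in the same way.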

Pattern syntax:

Pattern element  Meaning
---------------  ----------------------------------------------------------
"exact_name"     Matches this exact path segment
"prefix*"        Matches segments starting with prefix
"*suffix"        Matches segments ending with suffix
"*infix*"        Matches segments containing infix
"?"              Matches exactly one character
"**"             Matches any number of path segments (including zero)
"top-level"      Special: matches any single-segment path (top-level nodes)

Matching priority (highest to lowest):

  1. Exact matches (no wildcards)

  2. Longer patterns (more tuple elements)

  3. Fewer wildcards in the pattern

  4. Patterns without **

  5. ("top-level",) catch-all
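One possible reading of this list is as a sort key that is compared criterion by criterion. The sketch below encodes that reading for patterns that all match the same node. Treating the criteria as strictly lexicographic and counting each * and ? character as one wildcard are assumptions made for this illustration, not documented behavior.

```python
def priority(pattern):
    """Higher tuples win; the criteria follow the order of the list above."""
    is_catch_all = pattern == ("top-level",)
    has_wildcard = any(ch in segment for segment in pattern for ch in "*?")
    n_wildcards = sum(seg.count("*") + seg.count("?") for seg in pattern)
    return (
        not is_catch_all,      # the catch-all always comes last
        not has_wildcard,      # exact matches first
        len(pattern),          # then longer patterns
        -n_wildcards,          # then fewer wildcards
        "**" not in pattern,   # then patterns without "**"
    )


# Patterns that would all apply to the node ("bürgergeld", "betrag_m_bg").
candidates = [
    ("top-level",),
    ("**", "*_bg"),
    ("bürgergeld", "*_bg"),
    ("bürgergeld", "betrag_m_bg"),
]

ranked = sorted(candidates, key=priority, reverse=True)
print(ranked[0])
```

Under these assumptions the exact path wins, followed by the pattern with a single wildcard, then the ** pattern, and finally the catch-all.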

Fallback colors:

  • Top-level nodes without a match: dimgray

  • Nested nodes without a match: black