PanelRegression
- class causalpy.experiments.panel_regression.PanelRegression
Panel regression with fixed effects estimation.
Enables panel-aware visualization and diagnostics, and supports both unpooled dummy-variable fixed effects and fixed effects removed by the demeaned (within) transformation.
- Parameters:
  - data (DataFrame) – A pandas DataFrame with panel data. Each row is an observation for one unit in one time period.
  - formula (str) – A statistical model formula using patsy syntax. For the unpooled dummy-variable fixed-effects approach, include C(unit_var) (and optionally C(time_var)) in the formula. For the demeaned transformation, do NOT include those C(...) terms; fixed effects are removed by the transformation before fitting.
  - unit_fe_variable (str) – Column name for the unit identifier (e.g., "state", "id", "country").
  - time_fe_variable (str | None) – Column name for the time identifier (e.g., "year", "wave", "period"). If provided, time fixed effects are included. Default is None.
  - fe_method (Literal['dummies', 'demeaned']) – Method for handling fixed effects:
    - "dummies": Use unpooled dummy-variable fixed effects (C(unit) / C(time) in the formula). Yields individual unit effect estimates but creates N-1 dummy columns. Best for small N.
    - "demeaned": Use the demeaned (within) transformation. Scales to large N but does not directly estimate individual unit effects.
  - model (PyMCModel | RegressorMixin | None) – A PyMC (Bayesian) or scikit-learn (OLS) model. Although the default is None, a model must be provided.
  - **kwargs (dict) – Additional keyword arguments forwarded to BaseExperiment.
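In a one-way (unit-only) OLS setting, the two fe_method options estimate the same slope: by the Frisch–Waugh–Lovell theorem, regressing on unit dummies and regressing the unit-demeaned outcome on the unit-demeaned covariates give identical coefficients, and unit effects can then be recovered from group means as ybar_i - beta * xbar_i. The sketch below checks this with plain NumPy/pandas least squares on simulated data. It is an illustration under those assumptions, not CausalPy's implementation; the variable names are hypothetical, and with a Bayesian model the two parameterizations carry different priors, so posterior estimates need not coincide exactly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated one-way panel: 5 units x 8 periods with unit-specific intercepts
n_units, n_periods = 5, 8
unit = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(0.0, 2.0, n_units)          # true unit effects
x = rng.normal(size=unit.size) + alpha[unit]   # x is correlated with the unit effects
y = alpha[unit] + 1.5 * x + rng.normal(scale=0.1, size=unit.size)
df = pd.DataFrame({"unit": unit, "x": x, "y": y})

# "dummies": regress y on x plus one indicator per unit and no global intercept
# (equivalent to an intercept plus N-1 dummies)
D = pd.get_dummies(df["unit"], dtype=float).to_numpy()
X = np.column_stack([df["x"].to_numpy(), D])
coef, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
beta_dummies, unit_effects_dummies = coef[0], coef[1:]

# "demeaned": subtract each unit's mean from x and y, then regress without intercept
means = df.groupby("unit")[["x", "y"]].mean()  # group means of the original data
w = df[["x", "y"]] - df.groupby("unit")[["x", "y"]].transform("mean")
beta_demeaned = (w["x"] @ w["y"]) / (w["x"] @ w["x"])

# Post-hoc recovery of unit effects: alpha_i = ybar_i - beta * xbar_i
unit_effects_demeaned = (means["y"] - beta_demeaned * means["x"]).to_numpy()

print(np.isclose(beta_dummies, beta_demeaned))                   # True
print(np.allclose(unit_effects_dummies, unit_effects_demeaned))  # True
```

The dummy approach pays for its unit-effect estimates with one column per unit, which is why the demeaned route is preferred as N grows.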
Examples
Small panel with dummy variables:
>>> import causalpy as cp
>>> import numpy as np
>>> import pandas as pd
>>> # Create small panel: 10 units, 20 time periods
>>> np.random.seed(42)
>>> units = [f"unit_{i}" for i in range(10)]
>>> periods = range(20)
>>> data = pd.DataFrame(
...     [
...         {
...             "unit": u,
...             "time": t,
...             "treatment": int(t >= 10 and u in units[:5]),
...             "x1": np.random.randn(),
...             "y": np.random.randn(),
...         }
...         for u in units
...         for t in periods
...     ]
... )
>>> result = cp.PanelRegression(
...     data=data,
...     formula="y ~ C(unit) + C(time) + treatment + x1",
...     unit_fe_variable="unit",
...     time_fe_variable="time",
...     fe_method="dummies",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={"random_seed": 42, "progressbar": False}
...     ),
... )
Large panel with demeaned transformation:
>>> # Create larger panel: 1000 units, 10 time periods
>>> np.random.seed(42)
>>> units = [f"unit_{i}" for i in range(1000)]
>>> periods = range(10)
>>> data = pd.DataFrame(
...     [
...         {
...             "unit": u,
...             "time": t,
...             # treat half the units so treatment is not absorbed by time FE
...             "treatment": int(t >= 5 and u in units[:500]),
...             "x1": np.random.randn(),
...             "y": np.random.randn(),
...         }
...         for u in units
...         for t in periods
...     ]
... )
>>> result = cp.PanelRegression(
...     data=data,
...     formula="y ~ treatment + x1",  # No C(unit) needed
...     unit_fe_variable="unit",
...     time_fe_variable="time",
...     fe_method="demeaned",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={"random_seed": 42, "progressbar": False}
...     ),
... )
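With both unit and time fixed effects, a regressor that takes the same value for every unit in a given period is absorbed by the time effects and carries no identifying variation. A treatment that switches on at the same time for all units, with no untreated comparison units, is one such regressor. The sketch below is a quick pandas check on a balanced toy panel; the helper two_way_demean and the column names are hypothetical and not part of CausalPy.

```python
import numpy as np
import pandas as pd

# Balanced toy panel: 6 units x 4 periods
df = pd.DataFrame(
    [(u, t) for u in range(6) for t in range(4)], columns=["unit", "time"]
)
# Treatment switches on at t >= 2 for every unit (common timing, no untreated group)
df["treat_all"] = (df["time"] >= 2).astype(float)
# Treatment switches on at t >= 2 only for units 0-2 (treated vs never-treated)
df["treat_some"] = ((df["time"] >= 2) & (df["unit"] < 3)).astype(float)


def two_way_demean(s, unit, time):
    # Unit demeaning, then time demeaning (exact here because the panel is balanced)
    s = s - s.groupby(unit).transform("mean")
    return s - s.groupby(time).transform("mean")


# Largest absolute value left in each column after two-way demeaning
spread = {
    col: np.abs(two_way_demean(df[col], df["unit"], df["time"])).max()
    for col in ["treat_all", "treat_some"]
}
print(spread)  # treat_all is 0.0 (fully absorbed); treat_some is 0.25
```

A column that is identically zero after demeaning leaves its coefficient unidentified by the data, so with a Bayesian model its posterior would simply reflect the prior.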
Notes
The demeaned transformation (de-meaning within each group) removes time-invariant confounders but also drops time-invariant covariates from the model. For the "dummies" approach (unpooled FE), individual unit effects can be extracted from the coefficients. For the demeaned approach, unit effects can be recovered post hoc using the stored group means (_group_means), which are always computed from the original (pre-demeaning) data.
This class does not yet implement hierarchical/partial-pooling fixed effects. Those semantics are intentionally kept out of scope here so that fe_method="dummies" remains an accurate label for the current unpooled estimator.
Two-way fixed effects (unit + time) control for both unit-specific and time-specific unobserved heterogeneity. This is the standard approach in difference-in-differences estimation.
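When unit and time effects are both removed by demeaning, missing unit-period cells change the arithmetic. On an unbalanced panel, subtracting unit means and then time means once leaves the time means at zero but generally not the unit means, so the result differs from the exact two-way transformation. Alternating the two steps until convergence recovers the exact result, which the sketch below checks against residuals from least squares on unit and time dummies. This is a NumPy/pandas illustration only; the helpers is_balanced, demean_once and demean_alternating, the tolerance and the iteration cap are hypothetical choices, not CausalPy or pyfixest APIs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# 6 units x 5 periods, then keep 70% of the rows to make the panel unbalanced
full = pd.DataFrame(
    [(u, t) for u in range(6) for t in range(5)], columns=["unit", "time"]
)
full["y"] = rng.normal(size=len(full))
panel = full.sample(frac=0.7, random_state=0).sort_values(["unit", "time"])


def is_balanced(df, unit_col, time_col):
    # Balanced: every unit appears exactly once in every time period
    n_cells = df[unit_col].nunique() * df[time_col].nunique()
    return len(df) == n_cells and not df.duplicated([unit_col, time_col]).any()


def demean_once(s, unit, time):
    # Single pass: subtract unit means, then the time means of the result
    s = s - s.groupby(unit).transform("mean")
    return s - s.groupby(time).transform("mean")


def demean_alternating(s, unit, time, tol=1e-12, max_iter=1000):
    # Repeat both steps until the unit means left over by the time step vanish
    s = s.astype(float)
    for _ in range(max_iter):
        s = s - s.groupby(unit).transform("mean")
        s = s - s.groupby(time).transform("mean")
        if s.groupby(unit).mean().abs().max() < tol:
            break
    return s


print(is_balanced(full, "unit", "time"), is_balanced(panel, "unit", "time"))  # True False

once = demean_once(panel["y"], panel["unit"], panel["time"])
alt = demean_alternating(panel["y"], panel["unit"], panel["time"])

# Reference: residuals from least squares on unit and time dummies
D = pd.get_dummies(panel[["unit", "time"]].astype(str), dtype=float).to_numpy()
coef, *_ = np.linalg.lstsq(D, panel["y"].to_numpy(), rcond=None)
exact = panel["y"].to_numpy() - D @ coef

print(np.abs(once.to_numpy() - exact).max() > 1e-8)  # True: single pass is off
print(np.allclose(alt.to_numpy(), exact))            # True
```

On a balanced panel, demean_once and demean_alternating agree after the first pass, which is why the single-pass shortcut is exact only in that case.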
Balanced vs unbalanced panels: A panel is balanced when every unit is observed in every time period; otherwise it is unbalanced (e.g. unit entry/exit, missing waves). When both unit and time fixed effects are requested with fe_method="demeaned", the sequential demeaning (first by unit, then by time) is algebraically equivalent to the standard two-way demeaned transformation only for balanced panels. For unbalanced panels, exact two-way demeaning requires alternating unit and time demeaning iteratively until convergence; the single-pass approximation used here may introduce small biases. Unbalanced panels are common in practice (e.g. firm or worker panels with attrition); for heavily unbalanced data, consider checking the sensitivity of results or using dedicated FE packages that implement iterative two-way demeaning (e.g. reghdfe, pyfixest).
Methods
Run the experiment algorithm: fit the model.
PanelRegression.effect_summary(*[, window, ...]) – Generate a decision-ready summary of causal effects.
PanelRegression.fit(*args, **kwargs) – Fit the underlying model.
PanelRegression.generate_report(*[, ...]) – Generate a self-contained HTML report for this experiment.
PanelRegression.get_plot_data(*args, **kwargs) – Recover the data of an experiment along with the prediction and causal impact information.
PanelRegression.get_plot_data_bayesian(**kwargs) – Get plot data for Bayesian model.
PanelRegression.get_plot_data_ols(**kwargs) – Get plot data for OLS model.
Validate input parameters.
PanelRegression.plot(*[, hdi_prob, show, ...]) – Plot the panel regression coefficients.
Plot coefficient estimates with credible/confidence intervals.
PanelRegression.plot_residuals([kind]) – Plot residual diagnostics.
PanelRegression.plot_trajectories([units, ...]) – Plot unit-level time series trajectories.
Plot distribution of unit fixed effects.
PanelRegression.print_coefficients([round_to]) – Ask the model to print its coefficients.
PanelRegression.set_maketables_options(*[, ...]) – Set optional maketables rendering options for this experiment.
PanelRegression.summary([round_to]) – Print a summary of the panel regression results.
Attributes
design – Design matrix dataset.
Fixed effects method.
idata – Return the InferenceData object of the model.
labels – Coefficient labels from the design matrix.
Number of unique time periods.
Number of unique units.
outcome_variable_name – Name of the outcome variable.
supports_bayes
supports_ols
time_fe_variable – Time fixed effect variable name.
unit_fe_variable – Unit fixed effect variable name.
data
- __init__(data, formula, unit_fe_variable, time_fe_variable=None, fe_method='dummies', model=None, **kwargs)
- classmethod __new__(*args, **kwargs)