API reference

Compact, opinionated reference of the public API. For exhaustive docstrings, read the source under openpls/ or call help(openpls.Plspm) in a Python REPL.

Examples throughout assume these imports:

import openpls.config as c
from openpls import Plspm
from openpls.scheme import Scheme
from openpls.mode import Mode
from openpls.scale import Scale

openpls.config

Structure()

A builder for the path matrix that defines the structural model.

structure = c.Structure()
structure.add_path(["IMAG"], ["EXPE", "SAT", "LOY"])
path_df = structure.path() # lower-triangular DataFrame of 0/1
  • add_path(source: list, target: list) records one or more directed edges. Both arguments are lists, but only one of them may contain more than one entry; the other must contain exactly one.
  • path() returns the path matrix as a Pandas DataFrame, topologically ordered, ready to pass into Config.
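To make the shape of the path matrix concrete, here is a minimal standard-library sketch of turning directed edges into a topologically ordered, lower-triangular 0/1 matrix. The helper name path_matrix is hypothetical and this is not the openpls implementation; it assumes rows are targets and columns are sources.

```python
from graphlib import TopologicalSorter


def path_matrix(edges):
    """Build a lower-triangular 0/1 path matrix from (source, target) edges.

    Returns (order, matrix), where matrix[i][j] == 1 means order[j] -> order[i].
    """
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(source, set())
        predecessors.setdefault(target, set()).add(source)
    # static_order() yields every node after all of its predecessors,
    # so each source lands on an earlier row/column than its targets.
    order = list(TopologicalSorter(predecessors).static_order())
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for source, target in edges:
        matrix[index[target]][index[source]] = 1
    return order, matrix


edges = [("IMAG", "EXPE"), ("IMAG", "SAT"), ("IMAG", "LOY")]
order, matrix = path_matrix(edges)
for name, row in zip(order, matrix):
    print(f"{name:5s}", row)
```

Because every source precedes its targets in the ordering, all nonzero entries fall strictly below the diagonal.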

MV(name: str, scale: Scale = None)

One manifest variable. name must match a column in your dataset. scale is only needed for nonmetric data.

Config(path: DataFrame, scaled: bool = True, default_scale: Scale = None)

The model specification consumed by Plspm.

  • path: the matrix from Structure.path() (square, lower-triangular, 0/1).
  • scaled: standardize MVs before fitting (default True, the standard PLS-SEM behavior). Only used when default_scale is None.
  • default_scale: scale for nonmetric MVs (Scale.NUM, Scale.ORD, Scale.NOM, Scale.RAW).

Methods:

  • add_lv(lv_name, mode, *mvs) attaches a Mode A or Mode B latent variable with explicit MV(...) instances.
  • add_lv_with_columns_named(lv_name, mode, data, col_name_starts_with, default_scale=None) is the shortcut when indicator columns share a prefix. With the ECSI data, whose indicator columns start with the lower-cased LV name, passing lv.lower() as col_name_starts_with picks up every indicator for a given LV in one call.
  • add_higher_order(hoc_name, mode, lvs) registers a second-order construct. Best paired with the two-stage HOC workflow; see the upstream plspm docs for the details.

openpls.Plspm

Plspm(data, config, scheme=Scheme.CENTROID, iterations=100, tolerance=1e-6, bootstrap=False, bootstrap_iterations=100, processes=2, missing_strategy="casewise")

The main entry point. Constructing a Plspm runs the algorithm; results are accessed via instance methods.

Arguments:

  • data: Pandas DataFrame. Must contain every column referenced as an MV.
  • config: Config instance.
  • scheme: one of the five Scheme enum values.
  • iterations: maximum PLS outer-loop iterations. Floored at 100 (smaller values are raised to 100).
  • tolerance: convergence tolerance for the weight update.
  • bootstrap: set True to run the upstream multiprocessing bootstrap inline. For long runs prefer LongBootstrap (below).
  • bootstrap_iterations: bootstrap resamples (must be a multiple of processes).
  • processes: worker processes for the bootstrap.
  • missing_strategy: "casewise" (default) or "mean". See Core concepts.

Result-object methods:

  • scores(): DataFrame of latent-variable scores, one column per LV.
  • outer_model(): DataFrame with weight, loading, communality, redundancy per indicator.
  • inner_model(): Long-format DataFrame with OLS estimate, std error, t, and p>|t|.
  • path_coefficients(): Square DataFrame mirroring the path matrix, with the path coefficients filled in.
  • crossloadings(): DataFrame with indicators on the index and every LV on the columns.
  • inner_summary(): DataFrame per LV with type (Exogenous/Endogenous), r_squared, r_squared_adj, block_communality, mean_redundancy, ave, and (for endogenous) bic.
  • goodness_of_fit(): Tenenhaus GoF scalar.
  • effects(): DataFrame with from, to, direct, indirect, total columns.
  • specific_indirect_effects(source, target, through=None): Point-estimate specific indirect effects (mediation analysis). Chain product of path coefficients along source -> M1 -> ... -> target. through=None enumerates every chain in the DAG; through=["M1", "M2"] evaluates that single chain. Returns a DataFrame indexed by chain label with from, to, via, estimate. For inference use Bootstrap.specific_indirect_effects(...).
  • f_squared(): FSquared instance. Cohen’s f² per structural edge (Cohen 1988; Hair et al. 2022). table() returns the long-format view with effect-size labels; matrix() returns a square matrix mirroring the path matrix. Computed lazily and cached.
  • fornell_larcker(): FornellLarcker instance. Fornell and Larcker (1981) discriminant-validity criterion: sqrt(AVE) on the diagonal vs inter-construct correlations off-diagonal. Mode B / single-indicator LVs get NaN on the diagonal. Computed lazily and cached.
  • report(include_rho_a=True, include_htmt2=True): Report instance bundling the reviewer-standard panels (reliability, discriminant validity, structural paths with f², per-LV R² / adj R² / BIC, fit indices, collinearity) into one object for publication-ready export. See openpls.report.Report.
  • unidimensionality(): DataFrame per LV with Cronbach alpha, Dillon-Goldstein rho, and first/second eigenvalues.
  • htmt(): HTMT instance. Call .matrix() for the square matrix or .pairs() for the long form.
  • htmt2(): HTMT2 instance. Geometric-mean refinement of HTMT (Roemer, Schuberth and Henseler 2021). Same matrix() / pairs() API. Computed lazily and cached.
  • model_fit(): ModelFit instance. Call .srmr(), .d_uls(), .residuals().
  • q_squared(omission_distance=7): DataFrame indexed by endogenous LV with a q_squared column. Computed lazily via blindfolding.
  • vif(): VIF instance. Per-indicator (items()) and per-predictor (inner()) Variance Inflation Factor. Computed lazily and cached.
  • cta(n_boot=500, alpha=0.05, seed=42): CTAPLS instance. Confirmatory Tetrad Analysis testing reflective-Mode-A specification of each block with at least four indicators.
  • plsc(): PLSc instance. Consistent-PLS (Dijkstra and Henseler 2015) correction of paths, loadings, and R² for reflective Mode-A attenuation. Computed lazily and cached.
  • copula(endogenous, suspected=None, n_boot=500, seed=42): GaussianCopula instance. Park and Gupta (2012) / Hult et al. (2018) endogeneity test for the structural equation of endogenous. Returns per-predictor copula coefficients, bootstrap p-values, and a Cramér-von Mises admissibility check.
  • predict(k=10, repeats=1, seed=42): PLSPredict instance for k-fold PLSpredict.
  • ipma(target, scale_min=None, scale_max=None, indicator_scales=None): IPMA instance for the target endogenous LV.
  • fimix(n_classes, max_iter=500, tolerance=1e-6, n_restarts=5, seed=42): FIMIX instance for K-class finite-mixture segmentation.
  • higher_order(name, first_order, mode, structure, ...): HigherOrder instance. Disjoint two-stage higher-order construct (Sarstedt et al. 2019; Hair et al. 2022). Uses the current fit as stage 1 and refits with a new second-order LV whose indicators are the first-order LV scores.
  • micom(data, grouping_column, group_a, group_b, iterations=1000, seed=42): MICOM instance. Three-step Measurement Invariance of Composite Models (Henseler, Ringle and Sarstedt 2016): per-construct compositional invariance via permutation on c, plus mean and variance equality tests on the pooled-weight composites. Run before MGA / moderation.
  • data(): The dataset actually used by the fit (after the configured missing-value strategy).
  • config(): The Config used by the fit.
  • bootstrap(): The Bootstrap instance (requires constructing Plspm with bootstrap=True). Exposes weights(), loading(), r_squared(), paths(), total_effects(), and specific_indirect_effects(source, target, through=None, alpha=0.05) (bootstrap percentile CIs for mediation chains; see the Specific indirect effects note below).

openpls.scheme.Scheme

Enum of inner-weighting schemes:

  • Scheme.CENTROID: classical sign-of-correlation scheme (upstream default).
  • Scheme.FACTORIAL: covariance-based.
  • Scheme.PATH: asymmetric OLS-for-predecessors, correlation-for-successors.
  • Scheme.NEWTON: joint quasi-Newton (BFGS) optimization across all neighbors of each LV. Initialized from the analytical OLS solution; uses scipy.optimize.minimize with gtol=1e-8.
  • Scheme.PCA: Lohmöller’s first-principal-direction scheme. Treats neighbor weights as a joint multivariate direction.

openpls.mode.Mode

  • Mode.A (reflective). LV causes its indicators.
  • Mode.B (formative). Indicators form the LV.

openpls.scale.Scale

Used for nonmetric data:

  • Scale.NUM: numeric, linearly transformable.
  • Scale.RAW: numeric, no transformation.
  • Scale.ORD: ordinal, monotonic transformation.
  • Scale.NOM: nominal, non-monotonic transformation.

openpls.vif.VIF

Variance Inflation Factor diagnostics. Construct through Plspm.vif().

fit = Plspm(data, config, Scheme.CENTROID)
vif = fit.vif()
print(vif.items()) # per-indicator VIF within each construct block
print(vif.inner()) # per-predictor VIF for each endogenous LV

Two views:

  • items(): long-format DataFrame with columns lv, indicator, vif. For every indicator in a block with two or more indicators, x_j is regressed on the remaining indicators of the same block and VIF_j = 1 / (1 - R²_j). Standard collinearity diagnostic for formative (Mode B) blocks; also informative for reflective blocks. Blocks with fewer than two indicators are omitted (VIF is undefined).
  • inner(): dict keyed by endogenous LV name, value is a DataFrame with columns predictor, vif. For every predictor LV of an endogenous LV Y, the predictor’s score is regressed on the other predictors’ scores. Use to detect structural multicollinearity among antecedents. Endogenous LVs with fewer than two predictors are omitted.

Both views report inf when a regression is perfectly collinear (R² is numerically 1) and nan when the response has zero variance. A common rule of thumb for formative indicators is VIF < 5 (lenient) or VIF < 3.3 (stricter; Diamantopoulos and Siguaw 2006).
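As a worked instance of VIF_j = 1 / (1 - R²_j), the sketch below covers only the simplest case, a block with two indicators, where the auxiliary regression has a single predictor and R² is the squared Pearson correlation. The function names and the eps threshold are illustrative assumptions, not the openpls internals; larger blocks need a multiple regression.

```python
import math


def r_squared_simple(x, y):
    """R² of regressing y on one predictor x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if syy == 0:
        return math.nan  # zero-variance response: R² is undefined
    if sxx == 0:
        return 0.0  # a constant predictor explains nothing
    return (sxy * sxy) / (sxx * syy)


def vif(r2, eps=1e-12):
    """VIF = 1 / (1 - R²), with inf for perfect collinearity and nan passed through."""
    if math.isnan(r2):
        return math.nan
    if 1.0 - r2 < eps:
        return math.inf
    return 1.0 / (1.0 - r2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]   # moderately correlated with x1
x3 = [2.0, 4.0, 6.0, 8.0, 10.0]  # exact multiple of x1

print(vif(r_squared_simple(x1, x2)))  # finite VIF
print(vif(r_squared_simple(x1, x3)))  # inf: perfectly collinear
```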

openpls.cta.CTAPLS

Confirmatory Tetrad Analysis for PLS (Gudergan, Ringle, Wende and Will 2008). Diagnostic for the outer model that tests whether reflective (Mode A) specification is consistent with the data, per block of four or more indicators, using Bollen and Ting’s (1993) vanishing-tetrad theorem. Construct through Plspm.cta(...).

fit = Plspm(data, config, Scheme.CENTROID)
cta = fit.cta(n_boot=500, alpha=0.05, seed=42)
print(cta.tetrads()) # per-tetrad table with bootstrap SE and Holm decision
print(cta.summary()) # per-block verdict

Arguments:

  • n_boot: number of bootstrap resamples per block (default 500, must be at least 50).
  • alpha: family-wise significance level for the within-block Holm step-down correction (default 0.05).
  • seed: RNG seed for the bootstrap. Pass None for non-deterministic.

Procedure:

  1. One canonical tetrad per indicator 4-tuple — s_ij · s_kl − s_ik · s_jl for i<j<k<l — gives C(p, 4) non-redundant tetrads per block.
  2. Bootstrap each tetrad to obtain its sampling distribution.
  3. Compute a two-sided percentile p-value under H₀: τ = 0 after centering the bootstrap distribution on zero. P-values are floored at 1 / n_boot.
  4. Apply Holm step-down at alpha within each block.

Mode B blocks and reflective blocks with fewer than four indicators are omitted (tetrads are undefined or vacuously satisfied).
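Steps 1 and 4 of the procedure can be sketched with the standard library alone. The function names below are hypothetical and the bootstrap of steps 2 and 3 is left out; the p-values fed to the Holm step are made up for illustration. The covariance matrix is built from a single set of loadings, which is the one-factor case in which every tetrad vanishes.

```python
from itertools import combinations


def canonical_tetrads(p):
    """One index 4-tuple (i < j < k < l) per tetrad: C(p, 4) in total."""
    return list(combinations(range(p), 4))


def tetrad(S, i, j, k, l):
    """Sample tetrad s_ij * s_kl - s_ik * s_jl."""
    return S[i][j] * S[k][l] - S[i][k] * S[j][l]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the r-th smallest p-value with alpha / (m - r)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# One-factor covariances s_ij = l_i * l_j (the diagonal is never used by tetrads).
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]
S = [[a * b for b in loadings] for a in loadings]

quads = canonical_tetrads(len(loadings))
values = [tetrad(S, *q) for q in quads]
decisions = holm_reject([0.001, 0.04, 0.03, 0.5])

print(len(quads), "tetrads; largest |value| =", max(abs(v) for v in values))
print(decisions)
```

With four hypothetical p-values, only the smallest survives the first Holm threshold of alpha / 4, so a block with that pattern would be reported as rejected on one tetrad.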

Result methods:

  • tetrads(): long-format DataFrame with columns lv, indicators (the four indicator names, comma-separated), tetrad (observed sample value), boot_se, p_value, holm_decision ("reject" or "fail to reject").
  • summary(): per-block DataFrame with columns lv, n_indicators, n_tetrads, n_rejected, decision ("reflective supported" when no tetrad rejects after Holm correction; "reflective rejected" otherwise).
  • alpha() and n_boot(): echo the configured values.

openpls.plsc.PLSc

Consistent PLS (Dijkstra and Henseler 2015). Corrects path coefficients, outer loadings, and R² for measurement-error attenuation in reflective (Mode A) constructs. Construct through Plspm.plsc().

fit = Plspm(data, config, Scheme.CENTROID)
plsc = fit.plsc()
print(plsc.rho_a()) # per-construct Dijkstra-Henseler reliability
print(plsc.path_coefficients()) # corrected paths
print(plsc.r_squared()) # corrected R² per endogenous LV
print(plsc.loadings()) # corrected outer loadings (common-factor form)
print(plsc.adjusted_correlations()) # dis-attenuated construct correlation matrix
print(plsc.summary()) # rho_a + R² + adjusted R² per LV

Procedure:

  1. Compute the closed-form Dijkstra-Henseler reliability rho_A = (w'w)² · w'Sw / (w'(ww' − diag(ww'))w) per construct, where w is the PLS-normalized outer-weight vector and S is the standardized indicator covariance matrix with its diagonal set to zero. Mode B (formative) constructs and single-indicator constructs receive rho_A = 1 by convention.
  2. Build the dis-attenuated construct correlation matrix: divide every off-diagonal entry of cor(scores) by sqrt(rho_A_i · rho_A_j); the diagonal stays 1.
  3. Re-estimate the standardized path coefficients by OLS on the adjusted correlations: beta = R_xx⁻¹ r_xy.
  4. Recompute R² and adjusted R² per endogenous LV from the corrected paths.
  5. Rescale Mode A loadings to the consistent common-factor form λ_k = w_k · sqrt(rho_A) / (w'w). Mode B loadings are left unchanged.

The composite LV scores themselves are not modified; the correction operates only on quantities derived from them. Use the corrected outputs when you intend the model to be interpreted as a common-factor (covariance-based) model alongside the composite-model originals.
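Steps 1, 2, and 5 can be checked by hand on a toy block. The sketch below is a plain-Python illustration under stated assumptions, not the openpls code: the function names are hypothetical, the block follows an exact one-factor model with loadings lam, and the weights are taken proportional to the loadings and scaled to give a unit-variance composite purely for the illustration.

```python
import math


def rho_a(w, S):
    """Dijkstra-Henseler rho_A for one block.

    Only off-diagonal entries of S are used, which is equivalent to
    zeroing its diagonal.
    """
    p = len(w)
    off = [(i, j) for i in range(p) for j in range(p) if i != j]
    wtw = sum(wi * wi for wi in w)
    numerator = sum(w[i] * S[i][j] * w[j] for i, j in off)   # w'Sw
    denominator = sum((w[i] * w[j]) ** 2 for i, j in off)    # w'(ww' - diag(ww'))w
    return wtw ** 2 * numerator / denominator


def disattenuate(r, rho_i, rho_j):
    """Step 2: correct one construct correlation for unreliability."""
    return r / math.sqrt(rho_i * rho_j)


def consistent_loadings(w, rho):
    """Step 5: lambda_k = w_k * sqrt(rho_A) / (w'w)."""
    wtw = sum(wi * wi for wi in w)
    return [wi * math.sqrt(rho) / wtw for wi in w]


lam = [0.8, 0.7, 0.6]
p = len(lam)
# Model-implied correlation matrix of a one-factor block.
S = [[lam[i] * lam[j] if i != j else 1.0 for j in range(p)] for i in range(p)]
# Scale weights so that the composite w'x has unit variance.
scale = 1.0 / math.sqrt(sum(lam[i] * S[i][j] * lam[j] for i in range(p) for j in range(p)))
w = [scale * l for l in lam]

rho = rho_a(w, S)
corrected = consistent_loadings(w, rho)
r_adj = disattenuate(0.5, rho, 1.0)  # second construct assumed perfectly reliable

print(round(rho, 4), [round(c, 4) for c in corrected], round(r_adj, 4))
```

In this idealized setting rho_A equals the squared correlation between the composite and the factor, and the rescaled loadings reproduce lam.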

Result methods:

  • rho_a(): Series indexed by LV, with the per-construct reliability.
  • adjusted_correlations(): square DataFrame of the dis-attenuated construct correlations.
  • path_coefficients(): DataFrame mirroring Plspm.path_coefficients(), but with PLSc-corrected coefficients.
  • r_squared() and r_squared_adj(): Series indexed by endogenous LV with the corrected R² and adjusted R².
  • loadings(): Series of corrected outer loadings indexed by indicator (loading_c).
  • summary(): per-LV DataFrame with rho_a, r_squared, r_squared_adj. Exogenous LVs have NaN in the two R² columns.

openpls.copula.GaussianCopula

Gaussian-copula endogeneity test (Park and Gupta 2012; Hult, Hair, Proksch, Sarstedt, Pinkwart and Ringle 2018). Diagnoses whether a structural-equation predictor is correlated with the omitted-variable error in its endogenous LV’s regression. Construct through Plspm.copula(...).

fit = Plspm(data, config, Scheme.CENTROID)
cop = fit.copula(endogenous="SAT", n_boot=500, seed=42)
print(cop.coefficients()) # gamma, boot_se, t, p_value, cvm_p_nonnormal per predictor
print(cop.augmented_paths()) # endogeneity-corrected structural estimates
print(cop.summary()) # adds a `decision` column

The augmented regression is

Y = β₀ + Σⱼ βⱼ Xⱼ + Σₖ γₖ · Φ⁻¹(F_n(Xₖ)) + e

where F_n(x) = rank(x) / (n+1) is the rescaled empirical CDF and the sum over k runs over the suspected predictors. A significant γₖ flags Xₖ as endogenous. Inference uses a non-parametric row bootstrap (SE = std(γ_b), t = γ / SE, p = 2 (1 − Φ(|t|))), matching the convention used by LongBootstrap.
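The copula regressor and the normal-approximation p-value described above can be sketched with statistics.NormalDist. The helper names are hypothetical and ties are broken by sort order here, which is an assumption; the library may rank ties differently.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def copula_term(x):
    """Phi^-1(F_n(x)) with F_n(x) = rank(x) / (n + 1), ranks starting at 1."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [_STD_NORMAL.inv_cdf(r / (n + 1)) for r in ranks]


def two_sided_p(gamma, boot_se):
    """p = 2 * (1 - Phi(|t|)) with t = gamma / boot_se."""
    t = gamma / boot_se
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(t)))


terms = copula_term([3.0, 1.0, 2.0])
print(terms)                    # largest value maps to a positive quantile
print(two_sided_p(0.25, 0.10))  # hypothetical gamma and bootstrap SE
```

Dividing by n + 1 rather than n keeps every F_n value strictly inside (0, 1), so the inverse normal CDF is always finite.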

Arguments:

  • endogenous: name of the endogenous LV whose structural equation is tested.
  • suspected: predecessor LVs to augment with a copula term. None (default) tests every predecessor.
  • n_boot: number of bootstrap resamples (default 500, minimum 50).
  • seed: RNG seed for the bootstrap. None for non-deterministic.

Result methods:

  • endogenous(): the endogenous LV under test.
  • predictors(): all structural predecessors of the endogenous LV.
  • suspected(): the subset that received a copula term.
  • coefficients(): DataFrame with predictor, gamma, boot_se, t, p_value, and cvm_p_nonnormal (Cramér-von Mises p-value against a fitted normal: a small value means non-normality is supported and the test is admissible).
  • augmented_paths(): Series of structural-path coefficients from the augmented OLS — the endogeneity-corrected estimates to compare against Plspm.path_coefficients().
  • summary(): adds a decision column to coefficients(): "endogeneity detected", "no endogeneity detected", "copula not admissible (normal)" when the predictor is too normal for the test to discriminate, or "inconclusive" if the bootstrap was singular.
  • n_boot(): number of successful bootstrap iterations used.

The procedure requires the suspected predictor to be non-normal: under normality Φ⁻¹(F_n(X)) ≈ X, so the copula term collapses and the augmented regression cannot distinguish endogeneity. The Cramér-von Mises screen reports admissibility per predictor.

openpls.ipma.IPMA

Importance-Performance Map Analysis for one endogenous target LV. The recommended construction is via Plspm.ipma(target=...), but you can instantiate directly if needed.

fit = Plspm(data, config, Scheme.CENTROID)
ipma = fit.ipma(target="SAT", scale_min=1.0, scale_max=10.0)
print(ipma.latent_variables()) # importance, performance per predecessor LV
print(ipma.indicators()) # outer_weight, normalized_weight, performance per indicator

Arguments:

  • target: name of the endogenous LV to analyze. Must have at least one incoming path.
  • scale_min, scale_max: common scale bounds (e.g. 1 and 7 for a 7-point Likert). Both None means each indicator is rescaled from its observed min/max.
  • indicator_scales: {indicator: (min, max)} overrides per indicator.

Result methods:

  • latent_variables(): DataFrame indexed by LV with importance (standardized total effect on the target) and performance (mean of the 0-100-rescaled LV score).
  • indicators(): DataFrame indexed by (lv, indicator) with outer_weight, normalized_weight, performance, scale_min, scale_max.
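The 0-100 rescaling behind the performance columns is a linear map from the scale bounds. A minimal sketch, assuming hypothetical helper names and treating a missing bound as "use the observed extreme" (the library documents this only for the case where both bounds are None):

```python
def rescale_0_100(value, scale_min, scale_max):
    """Map a value from [scale_min, scale_max] onto [0, 100]."""
    return (value - scale_min) / (scale_max - scale_min) * 100.0


def performance(values, scale_min=None, scale_max=None):
    """Mean of the rescaled values; missing bounds fall back to observed min/max."""
    lo = min(values) if scale_min is None else scale_min
    hi = max(values) if scale_max is None else scale_max
    rescaled = [rescale_0_100(v, lo, hi) for v in values]
    return sum(rescaled) / len(rescaled)


print(rescale_0_100(5.5, 1.0, 10.0))        # midpoint of a 1-10 scale
print(performance([1.0, 4.0, 7.0], 1, 7))   # declared 7-point bounds
print(performance([2.0, 4.0, 6.0]))         # bounds taken from the data
```

Declaring the theoretical bounds matters when respondents never use the extremes: observed min/max rescaling would otherwise stretch the data to fill the full 0-100 range.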

openpls.predict.PLSPredict

PLSpredict via k-fold cross-validation. Construct through Plspm.predict(...). Implements the full Shmueli, Sarstedt, Hair, Cheah, Ting, Vaithilingam and Ringle (2019) panel: per-indicator RMSE, MAE, and MAPE for both PLS and an LM benchmark, in both out-of-sample (k-fold CV) and in-sample (single full-data fit) variants, plus Q²_predict against the naive train-mean baseline.

pred = fit.predict(k=10, repeats=1, seed=42)
print(pred.metrics()) # full per-indicator panel — see columns below
print(pred.summary()) # "better" / "worse" / "tie" per indicator (PLS vs LM)

Arguments:

  • k: number of folds (default 10, must be at least 2 and not exceed the sample size).
  • repeats: how many shuffle-and-refold rounds (default 1).
  • seed: RNG seed for fold shuffling. Pass None for non-deterministic.

Result methods:

  • metrics(): per-indicator DataFrame indexed by (lv, indicator). Columns:
    • Out-of-sample (k-fold): rmse_pls, mae_pls, mape_pls, q2_predict, rmse_lm, mae_lm, mape_lm.
    • In-sample (single full-data fit): rmse_pls_in, mae_pls_in, mape_pls_in, rmse_lm_in, mae_lm_in, mape_lm_in.
    • MAPE is the proportion mean(|err / actual|) (matching sklearn’s convention; multiply by 100 for the percent form). Rows where the actual value is zero are excluded from MAPE only; the other metrics still see them.
  • summary(): per-indicator Series of "better", "worse", or "tie" based on PLS vs LM out-of-sample RMSE. Aggregate to get the Shmueli et al. 2019 verdict (“high / medium / low / none predictive power”).
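The per-indicator metrics are simple to reproduce for a single indicator. The sketch below uses hypothetical function names and a single train_mean for the naive baseline (in real k-fold cross-validation that mean would be recomputed per fold); it mirrors the zero-exclusion rule for MAPE described above.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Proportion form; rows with a zero actual value are skipped."""
    ratios = [abs((p - a) / a) for a, p in zip(actual, predicted) if a != 0]
    return sum(ratios) / len(ratios) if ratios else math.nan


def q2_predict(actual, predicted, train_mean):
    """1 - SSE(model) / SSE(naive prediction by the training mean)."""
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_naive = sum((a - train_mean) ** 2 for a in actual)
    return 1.0 - sse_model / sse_naive


actual = [2.0, 0.0, 4.0]
predicted = [1.0, 1.0, 5.0]

print(rmse(actual, predicted), mae(actual, predicted))
print(mape(actual, predicted))              # the zero actual is excluded
print(q2_predict(actual, predicted, 2.0))   # positive: beats the naive baseline
```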

openpls.moderation.Moderation

Two-stage moderation (Henseler and Chin 2010): fit the base model, multiply standardized scores of predictor and moderator, then refit with the product as a single-indicator interaction LV pointing into target.

from openpls.moderation import Moderation
mod = Moderation(
    data,
    config,
    predictor="IMAG",
    moderator="EXPE",
    target="SAT",
)
print(mod.interaction_effect()) # estimate, std error, t, p>|t|
print(mod.refit().path_coefficients())

Arguments:

  • data, config: the same inputs you would pass to Plspm.
  • predictor, moderator, target: LV names. Predictor and moderator must differ; target must be endogenous and cannot be either of the other two.
  • interaction_name: defaults to "{predictor}_x_{moderator}".
  • scheme, iterations, tolerance, missing_strategy: passed through to both stages.

Methods:

  • base(): the stage-1 Plspm fit, without the interaction.
  • refit(): the stage-2 fit, with the interaction LV.
  • interaction_effect(): Series with estimate, std error, t, p>|t| for interaction -> target. The OLS-derived t and p are convenience reporting; for inference, bootstrap the refit.
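The product step between the two stages can be sketched as follows. The helper names are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption; the library may standardize with the population form instead.

```python
from statistics import mean, stdev


def standardize(values):
    """Center and scale to unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def interaction_scores(predictor_scores, moderator_scores):
    """Element-wise product of the standardized stage-1 scores."""
    zp = standardize(predictor_scores)
    zm = standardize(moderator_scores)
    return [a * b for a, b in zip(zp, zm)]


# Toy stage-1 scores for three cases.
product = interaction_scores([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(product)
```

The resulting column is what stage 2 would use as the single indicator of the interaction LV.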

openpls.higher_order.HigherOrder

Disjoint two-stage higher-order construct (Sarstedt, Hair, Cheah, Becker and Ringle 2019; Hair, Hult, Ringle and Sarstedt 2022, A Primer on PLS-SEM, 3rd ed., Chapter 8). The current Plspm fit becomes stage 1; its first-order LV scores are appended as indicators of a new second-order construct, and a stage-2 Plspm is fit with the HOC in place of its first-order constituents. Construct through Plspm.higher_order(...).

fit1 = Plspm(data, config, Scheme.CENTROID)
stage2_structure = c.Structure()
stage2_structure.add_path(["JOB_SAT"], ["INTENT_TO_STAY"])
hoc = fit1.higher_order(
    name="JOB_SAT",
    first_order=["PAY_SAT", "WORK_SAT", "SUPERVISION_SAT"],
    mode=Mode.A, # Type I (R-R): first-order A, HOC A
    structure=stage2_structure,
)
print(hoc.loadings()) # HOC measurement loadings (or weights, Mode B)
print(hoc.path_coefficients()) # stage-2 structural paths, HOC included
print(hoc.summary()) # per-first-order loading/weight + stage-1 R²

The four canonical HOC types are obtained by combining the existing first-order LV modes with the HOC mode:

  • Type I (Reflective-Reflective) — first-order Mode A, mode=Mode.A. The HOC is a common factor measured by reflective first-order constructs.
  • Type II (Reflective-Formative) — first-order Mode A, mode=Mode.B. The HOC is a composite formed by reflective first-order constructs.
  • Type III (Formative-Reflective) — first-order Mode B, mode=Mode.A. The HOC is a common factor measured by formative composites.
  • Type IV (Formative-Formative) — first-order Mode B, mode=Mode.B. The HOC is a composite formed by formative composites.

Arguments:

  • name: name of the second-order construct. Must not clash with an existing LV or with any indicator column in the original data.
  • first_order: list of first-order LV names from the base model to roll up into the HOC. At least two are required.
  • mode: measurement mode of the HOC w.r.t. its first-order indicators (Mode.A or Mode.B).
  • structure: a Structure for the stage-2 path model. Must contain name, and must not contain any of the first_order LVs.
  • iterations, tolerance, missing_strategy: passed through to the stage-2 Plspm fit.

Methods:

  • name(), first_order(), hoc_mode(): echo the configuration.
  • base(): the stage-1 Plspm (the fit on which higher_order was called).
  • refit(): the stage-2 Plspm, which contains the HOC and all the non-rolled-up LVs from the base model.
  • loadings(): outer loadings (Mode A) or outer weights (Mode B) of the HOC on its first-order indicators, indexed by first-order LV name.
  • path_coefficients(): stage-2 structural path coefficients (HOC included). Pass-through to refit().path_coefficients().
  • r_squared(): stage-2 R² per endogenous LV (HOC included if endogenous).
  • summary(): per-first-order DataFrame with first_order, the HOC loading (Mode A) or weight (Mode B), and stage1_r_squared (the first-order LV’s stage-1 R² — 0 for exogenous first-order LVs).
  • indicator_columns(): mapping {first_order_lv -> indicator_column_name} used to carry the stage-1 scores into the stage-2 data.

The legacy Config.add_higher_order (repeated-indicators / embedded two-stage) remains available for backward compatibility, but disjoint two-stage is the modern recommended path because the first-order constructs are not simultaneously their own measurement and the HOC’s predictors. Chained / nested HOCs work naturally: call higher_order() again on the previous refit().

openpls.fimix.FIMIX

Finite Mixture PLS (Hahn et al. 2002) for latent-class segmentation. Construct through Plspm.fimix(n_classes=K).

fmx = fit.fimix(n_classes=3, n_restarts=5, seed=42)
print(fmx.class_sizes()) # mixture proportions per class
print(fmx.memberships()) # posterior class probabilities per case
print(fmx.hard_assignments()) # argmax class label per case
print(fmx.class_paths()) # per-class structural path coefficients
print(fmx.fit_criteria()) # log_lik, n_params, AIC, AIC3, AIC4, BIC, CAIC, MDL5, EN

Arguments:

  • n_classes: K, number of mixture components (>= 2).
  • max_iter: maximum EM iterations per restart (default 500).
  • tolerance: convergence threshold on the log-likelihood.
  • n_restarts: how many random EM restarts to run; the best (highest log-likelihood) is kept (default 5).
  • seed: RNG seed for restart initialization.

The fit_criteria() Series exposes the standard information criteria for choosing K. Lower is better for the AIC family, BIC, CAIC, and MDL5; the normalized entropy EN lies in [0, 1], and higher values mean clearer class separation.
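For orientation, the textbook forms of several of these criteria are sketched below. The function names are hypothetical, only a subset of the criteria is shown, and the library's parameter count n_params may be defined differently from what a hand calculation would use.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Textbook penalized-likelihood criteria; lower is better."""
    base = -2.0 * log_lik
    return {
        "AIC": base + 2 * n_params,
        "AIC3": base + 3 * n_params,
        "AIC4": base + 4 * n_params,
        "BIC": base + n_params * math.log(n_obs),
        "CAIC": base + n_params * (math.log(n_obs) + 1),
    }


def normalized_entropy(memberships):
    """EN = 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K), with 0 ln 0 taken as 0."""
    n = len(memberships)
    k = len(memberships[0])
    entropy = -sum(p * math.log(p) for row in memberships for p in row if p > 0)
    return 1.0 - entropy / (n * math.log(k))


crisp = [[1.0, 0.0], [0.0, 1.0]]   # every case belongs to exactly one class
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # classes are indistinguishable

print(information_criteria(log_lik=-100.0, n_params=5, n_obs=250))
print(normalized_entropy(crisp), normalized_entropy(fuzzy))
```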

openpls.mga.MGA

Multi-Group Analysis via Henseler permutation tests. Use GroupSpec to define each subset:

from openpls.mga import MGA, GroupSpec
mga = MGA(
    data,
    config,
    grouping_column="region",
    groups=[
        GroupSpec(name="west", values=["west"]),
        GroupSpec(name="east", values=["east"]),
    ],
    scheme=Scheme.CENTROID,
    iterations=5000,
    seed=42,
)
print(mga.group_estimates()) # per-group path coefficients
print(mga.comparisons()) # pairwise differences and permutation p-values

Arguments:

  • data, config: standard PLS-SEM inputs.
  • grouping_column: a column in data whose values assign each row to at most one group.
  • groups: list of GroupSpec(name=, values=) (categorical / list membership) or GroupSpec(name=, range=(lo, hi)) (inclusive numeric interval; None means unbounded on that side). At least two groups required.
  • iterations: permutation iterations per pair (default 5000).
  • seed: RNG seed.

Methods:

  • group_estimates(): long-format DataFrame with group, n, source, target, estimate.
  • comparisons(): long-format pairwise differences with groupA, groupB, source, target, estimateA, estimateB, difference, p_value. Two-sided permutation p-values with Phipson-Smyth add-one smoothing.
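The add-one smoothing in the p-value can be written in two lines. This is a sketch with a hypothetical function name; the permuted differences would come from refitting the model on shuffled group labels, which is not shown here.

```python
def permutation_p_value(observed_diff, permuted_diffs):
    """Two-sided permutation p-value with add-one (Phipson-Smyth) smoothing."""
    extreme = sum(1 for d in permuted_diffs if abs(d) >= abs(observed_diff))
    return (extreme + 1) / (len(permuted_diffs) + 1)


# Toy example: 2 of 4 permuted differences are at least as extreme as 0.3.
p = permutation_p_value(0.3, [0.1, -0.4, 0.2, 0.35])
print(p)
```

The smoothing means the p-value can never be exactly zero; its smallest possible value is 1 / (iterations + 1).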

openpls.micom.MICOM

Measurement Invariance of Composite Models — the three-step Henseler, Ringle and Sarstedt (2016) procedure that must be run before interpreting MGA or moderation results. Reuses the same GroupSpec as MGA but is restricted to exactly two groups:

from openpls.mga import GroupSpec
from openpls.micom import MICOM
micom = MICOM(
    data,
    config,
    grouping_column="region",
    group_a=GroupSpec(name="west", values=["west"]),
    group_b=GroupSpec(name="east", values=["east"]),
    scheme=Scheme.CENTROID,
    iterations=1000,
    seed=42,
)
print(micom.step2()) # compositional invariance per construct
print(micom.step3()) # mean / variance equality per construct
print(micom.summary()) # combined verdict: "full" / "partial" / "none"

Arguments:

  • data, config: standard PLS-SEM inputs.
  • grouping_column: column in data that distinguishes the two groups (need not be a model indicator).
  • group_a, group_b: two GroupSpec instances. Must have distinct names and disjoint masks (MICOM raises if the groups overlap or if either is empty).
  • iterations: permutation iterations (default 1000). Step 3 reuses the pooled-fit weights and skips PLS refits per iteration, so the cost is dominated by Step 2 (one PLS refit per group per iteration).
  • seed: RNG seed; pass None for non-deterministic results.

Methods:

  • step2(): per construct, columns construct, c, p_value, compositional_invariance. p_value is a one-sided lower-tail permutation probability; compositional_invariance is True when p_value >= 0.05 (cannot reject c = 1).
  • step3(): per construct, columns construct, mean_diff, mean_p_value, mean_equal, log_var_ratio, var_p_value, var_equal. Both p-values are two-sided permutation probabilities; pooled-fit weights are applied to standardized indicators to produce common-scale composite scores before computing means and variances.
  • summary(): per construct, combines Steps 2 and 3 into a single invariance verdict: "full" (Step 2 and both Step 3 sub-tests pass), "partial" (Step 2 passes but mean or variance differs — MGA is still interpretable for path differences), "none" (Step 2 fails — composites are not comparable and MGA results would be uninterpretable).
  • group_sizes(): dict of per-group observation counts (audit trail for Step 1 configural invariance).

For more than two groups, run MICOM pairwise.
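The verdict rule from summary() and the pairwise enumeration for more than two groups can be sketched as below. The function name and the group labels are hypothetical; each pair would be passed to its own MICOM run.

```python
from itertools import combinations


def micom_verdict(compositional_invariance, mean_equal, var_equal):
    """Combine the Step 2 and Step 3 outcomes for one construct."""
    if not compositional_invariance:
        return "none"      # Step 2 failed: composites are not comparable
    if mean_equal and var_equal:
        return "full"      # Step 2 and both Step 3 sub-tests pass
    return "partial"       # Step 2 passes, but mean or variance differs


group_names = ["west", "east", "north"]
pairs = list(combinations(group_names, 2))  # one MICOM run per pair

print(pairs)
print(micom_verdict(True, True, False))
```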

openpls.f_squared.FSquared

Cohen’s f² effect size for the structural model (Cohen 1988; Hair, Hult, Ringle and Sarstedt 2022, A Primer on PLS-SEM, 3rd ed.). For every directed edge predictor -> endogenous, refits the endogenous LV’s OLS with the predictor removed and reports the change in R² normalised by the residual variance:

f² = (R²_full − R²_reduced) / (1 − R²_full)

Construct through Plspm.f_squared() (lazy, cached).

fit = Plspm(data, config)
f2 = fit.f_squared()
print(f2.table()) # long format with effect-size labels
print(f2.matrix()) # square matrix mirroring the path matrix

Methods:

  • table(): DataFrame indexed by "predictor -> endogenous". Columns from, to, r_squared_full, r_squared_reduced, f_squared, effect_size.
  • matrix(): square DataFrame with the same shape as the path matrix. Rows are targets, columns are sources. Cells outside the structural model are NaN.

Effect-size labels follow the conventional Cohen / Hair thresholds:

  • none for f² < 0.02
  • small for 0.02 <= f² < 0.15
  • medium for 0.15 <= f² < 0.35
  • large for f² >= 0.35
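The formula and thresholds above translate directly into code. The function names are illustrative, not the library's internals.

```python
def f_squared(r2_full, r2_reduced):
    """Cohen's f² = (R²_full - R²_reduced) / (1 - R²_full)."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def effect_size_label(f2):
    """Conventional Cohen / Hair thresholds."""
    if f2 < 0.02:
        return "none"
    if f2 < 0.15:
        return "small"
    if f2 < 0.35:
        return "medium"
    return "large"


# Dropping a predictor lowers R² from 0.50 to 0.40.
f2 = f_squared(0.50, 0.40)
print(round(f2, 3), effect_size_label(f2))
```

Note that the same drop in R² yields a larger f² when R²_full is high, because the denominator is the unexplained variance of the full model.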

openpls.fornell_larcker.FornellLarcker

Fornell-Larcker discriminant-validity criterion (Fornell and Larcker 1981). Construct through Plspm.fornell_larcker() (lazy, cached).

fit = Plspm(data, config)
fl = fit.fornell_larcker()
print(fl.matrix()) # sqrt(AVE) on diagonal, LV correlations off-diagonal
print(fl.summary()) # per-LV passes verdict

Methods:

  • matrix(): square DataFrame. Diagonal entries are sqrt(AVE_lv) for reflective (Mode A) constructs and NaN for formative (Mode B) and single-indicator constructs (AVE is undefined there). Off-diagonal entries are inter-construct correlations from the standardized latent-variable scores.
  • ave(): Series of Average Variance Extracted per LV, with NaN for non-Mode-A / single-indicator constructs.
  • summary(): DataFrame indexed by LV with sqrt_ave, max_abs_corr, passes (boolean, NA when AVE is undefined), and note. passes is True iff sqrt(AVE_lv) exceeds every absolute off-diagonal entry in the LV’s row.

The modern recommendation (Henseler, Ringle and Sarstedt 2015) is to prefer HTMT (Plspm.htmt() or Plspm.htmt2()) for discriminant validity, but reviewers frequently still request the Fornell-Larcker table as well.
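The passes rule can be reproduced for one construct in a few lines. This sketch assumes AVE is the mean squared standardized loading of the block (the usual definition); the function names are hypothetical.

```python
import math


def ave(loadings):
    """Average Variance Extracted: mean of squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def fornell_larcker_passes(loadings, correlations_with_other_lvs):
    """True iff sqrt(AVE) exceeds every absolute inter-construct correlation."""
    sqrt_ave = math.sqrt(ave(loadings))
    return sqrt_ave > max(abs(r) for r in correlations_with_other_lvs)


loadings = [0.8, 0.7, 0.9]
print(round(math.sqrt(ave(loadings)), 3))
print(fornell_larcker_passes(loadings, [0.60, -0.75]))  # sign is ignored
print(fornell_larcker_passes(loadings, [0.85]))
```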

openpls.report.Report

Publication-ready summary report (Hair, Hult, Ringle and Sarstedt 2022, A Primer on PLS-SEM, 3rd ed.). Bundles the engine’s individual diagnostics into the panels you need for the standard PLS-SEM research report, so the whole reporting layer can be exported with a single call. Construct through Plspm.report(...). Pure orchestration — every value comes from an existing lazy-cached method on Plspm.

fit = Plspm(data, config)
rep = fit.report()
rep.reliability() # Cronbach alpha, rho_A, rho_C, AVE per LV
rep.discriminant_validity() # HTMT (+HTMT2), Fornell-Larcker matrices and summary
rep.paths() # Structural paths with std error, t, p, f², effect size
rep.construct_summary() # type / mvs / R² / adj R² / BIC per LV
rep.fit_indices() # SRMR, d_ULS, GoF
rep.collinearity() # Outer + inner VIF
rep.to_dict() # Bundle every section for export (e.g. JSON)

Arguments:

  • include_rho_a (default True): include the Dijkstra-Henseler rho_A column in reliability(). Triggers Plspm.plsc() internally; falls back to NaN if PLSc cannot run.
  • include_htmt2 (default True): include the HTMT2 matrix and pair list (Roemer, Schuberth and Henseler 2021) in discriminant_validity().

Methods:

  • reliability(): DataFrame indexed by LV with columns mode, mvs, cronbach_alpha, rho_a (when include_rho_a), rho_c, ave. Mode B (formative) and single-indicator LVs receive NaN for the metrics that are undefined for them.
  • discriminant_validity(): dict with htmt (matrix), htmt_pairs (long form), fornell_larcker (matrix), fornell_larcker_summary (per-LV passes verdict), and (when include_htmt2) htmt2 and htmt2_pairs.
  • paths(): DataFrame indexed by "predictor -> endogenous" with columns from, to, estimate, std_error, t, p_value, f_squared, effect_size.
  • construct_summary(): DataFrame per LV with type (Exogenous / Endogenous), mvs, r_squared, r_squared_adj, bic, block_communality, mean_redundancy.
  • fit_indices(): Series with srmr, d_uls, goodness_of_fit (NaN if all constructs are single-item).
  • collinearity(): dict with items (per-indicator outer VIF; may be None if no block has at least two indicators) and inner (dict of per-endogenous-LV VIF tables).
  • to_dict(): bundles every section above into a single dictionary, ready for export.
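
If the bundle is written to JSON, table-like values and NaN entries may need converting first, since strict JSON has no NaN. The sketch below shows one way to do that with the standard library only. It rests on assumptions the reference does not confirm: that the bundle may contain objects exposing a to_dict() method (as pandas DataFrames and Series do) and float NaN values. to_jsonable and FakeTable are hypothetical names, and the bundle keys and numbers are illustrative.

```python
import json
import math

def to_jsonable(value):
    """Recursively convert a nested bundle into JSON-safe Python objects."""
    if hasattr(value, "to_dict"):  # duck-typed: DataFrame- or Series-like
        value = value.to_dict()
    if isinstance(value, dict):
        return {str(key): to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    if isinstance(value, float) and math.isnan(value):
        return None  # NaN is not valid JSON; emit null instead
    return value

class FakeTable:
    """Stand-in for a pandas object; only to_dict() matters here."""
    def __init__(self, columns):
        self._columns = columns

    def to_dict(self):
        return self._columns

# Hypothetical bundle; real section names and values come from Report.to_dict().
bundle = {
    "reliability": FakeTable({"rho_a": {"IMAG": 0.88, "LOY": float("nan")}}),
    "fit_indices": {"srmr": 0.05},
    "via": ("EXPE", "SAT"),
}
# allow_nan=False makes json.dumps fail loudly if a NaN slipped through.
payload = json.dumps(to_jsonable(bundle), allow_nan=False, default=str)
```

Passing default=str is a blunt fallback for any remaining non-serializable object; a stricter exporter would handle each type explicitly.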

Specific indirect effects

Mediation analysis via chain products of path coefficients (Zhao, Lynch and Chen 2010; Nitzl, Roldan and Cepeda 2016). For every chain source -> M1 -> ... -> target the specific indirect effect is the product of the path coefficients along the chain; the total indirect effect from source to target is the sum of all such products.

fit = Plspm(data, config, bootstrap=True, bootstrap_iterations=500, processes=4)
# Point estimates of every mediation chain from IMAG to LOY
fit.specific_indirect_effects("IMAG", "LOY")
# Single chain with explicit mediators
fit.specific_indirect_effects("IMAG", "LOY", through=["EXPE", "SAT"])
# Bootstrap percentile CIs (95% by default)
fit.bootstrap().specific_indirect_effects("IMAG", "LOY", through=["EXPE", "SAT"])

Arguments:

  • source / target: latent-variable names. Must differ.
  • through: explicit chain [M1, M2, ...] of intermediate LVs. Each consecutive pair (including source -> M1 and the last Mk -> target) must be a direct edge in the structural model. None (default) auto-enumerates every simple directed chain from source to target of length two or more.
  • alpha (bootstrap only): two-sided level for the percentile CI (default 0.05 → 95%).

Returns:

  • Point estimate (Plspm.specific_indirect_effects): DataFrame indexed by chain label (e.g. "IMAG -> EXPE -> SAT -> LOY") with columns from, to, via (tuple of intermediates), estimate.
  • Bootstrap (Bootstrap.specific_indirect_effects): the same identifying columns plus original, mean, std.error, perc.lower, perc.upper, and "t stat." (the trailing period is part of the column name). The CI is a non-parametric percentile interval from the per-iteration distribution of the chain product.
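
To make the percentile interval concrete, here is a minimal standard-library sketch. It is not the openpls implementation: percentile and percentile_ci are hypothetical helpers, the linear-interpolation rule is one common convention (the engine may use another), and the per-iteration chain products are simulated rather than taken from a fitted model.

```python
import random

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def percentile_ci(draws, alpha=0.05):
    """Two-sided percentile interval at level alpha (0.05 gives 95%)."""
    ordered = sorted(draws)
    return (
        percentile(ordered, 100 * alpha / 2),
        percentile(ordered, 100 * (1 - alpha / 2)),
    )

rng = random.Random(0)
# Simulated per-iteration products of two path coefficients (illustrative only).
chain_products = [rng.gauss(0.5, 0.05) * rng.gauss(0.4, 0.05) for _ in range(2000)]
lower, upper = percentile_ci(chain_products)
```

A mediation chain is usually read as significant at level alpha when the resulting interval excludes zero.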

Notes:

  • The structural model is assumed acyclic; chain enumeration is a depth-first search with per-branch cycle guards.
  • The sum of estimate over all chains from source to target equals the indirect column of effects().loc["source -> target"].
  • Aligns the engine with seminr::specific_effect_significance().
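
The chain enumeration and the sum-of-products identity can be sketched in plain Python. This is an illustration under stated assumptions, not the engine's code: enumerate_chains and chain_product are hypothetical helpers, and the path coefficients are made-up numbers on a small IMAG / EXPE / SAT / LOY model.

```python
def enumerate_chains(edges, source, target):
    """Yield every simple directed chain source -> ... -> target
    that passes through at least one mediator (depth-first search)."""
    def walk(node, path):
        for nxt in edges.get(node, {}):
            if nxt in path:  # per-branch cycle guard
                continue
            if nxt == target:
                if len(path) >= 2:  # skip the direct edge source -> target
                    yield path + [nxt]
                continue
            yield from walk(nxt, path + [nxt])
    yield from walk(source, [source])

def chain_product(edges, chain):
    """Multiply the path coefficients along consecutive pairs of the chain."""
    product = 1.0
    for a, b in zip(chain, chain[1:]):
        product *= edges[a][b]
    return product

# Hypothetical path coefficients: edges[predictor][outcome].
edges = {
    "IMAG": {"EXPE": 0.5, "SAT": 0.2, "LOY": 0.3},
    "EXPE": {"SAT": 0.4},
    "SAT": {"LOY": 0.5},
}
effects = {
    " -> ".join(chain): chain_product(edges, chain)
    for chain in enumerate_chains(edges, "IMAG", "LOY")
}
total_indirect = sum(effects.values())
```

With these numbers the two chains, IMAG -> EXPE -> SAT -> LOY and IMAG -> SAT -> LOY, each contribute 0.1, so the total indirect effect is 0.2; the direct IMAG -> LOY path of 0.3 is excluded.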

openpls.long_bootstrap.LongBootstrap

Single-process bootstrap with progress callbacks, sign-flipping, BCa percentile CIs, and a configurable success-rate floor. Use this for long-running workloads (Cloud Run, queued jobs) where you want progress streamed and partial failures tolerated.

from openpls.long_bootstrap import LongBootstrap

def report(done, total):
    print(f"{done}/{total}")

boot = LongBootstrap(
    data,
    config,
    scheme=Scheme.CENTROID,
    iterations=5000,
    seed=42,
    alpha=0.05,
    on_progress=report,
    progress_every=100,
    min_success_ratio=0.1,
)
print(boot.paths()) # per-path original, boot_mean, se, t, p_value, ci_lower, ci_upper, valid
print(boot.loadings()) # per-indicator loading stats
print(boot.weights()) # per-indicator weight stats
print(boot.total_effects()) # full total-effects table

Arguments:

  • data, config, scheme: standard PLS-SEM inputs.
  • iterations: bootstrap resamples (default 5000).
  • seed: RNG seed (default 42).
  • alpha: significance level for the BCa CI (default 0.05 for 95 percent CIs).
  • on_progress: callback (done, total). Called every progress_every iterations and at least every 5 seconds of wall time.
  • progress_every: callback frequency in iterations (default 100).
  • min_success_ratio: floor on the ratio of completed to attempted iterations. If the share of successful resamples falls below this floor, the constructor raises RuntimeError.
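
How on_progress, progress_every, and min_success_ratio interact can be sketched with a generic resampling loop. This is not LongBootstrap itself: run_resamples, heartbeat_seconds, and mean_or_fail are hypothetical names, the statistic is a plain mean rather than a PLS fit, and the loop reflects one reading of the argument descriptions above.

```python
import random
import time

def run_resamples(estimate, data, iterations, seed=42, on_progress=None,
                  progress_every=100, min_success_ratio=0.1,
                  heartbeat_seconds=5.0):
    """Bootstrap loop that tolerates failed resamples up to a floor."""
    rng = random.Random(seed)
    results = []
    last_report = time.monotonic()
    for done in range(1, iterations + 1):
        resample = [rng.choice(data) for _ in data]  # n draws with replacement
        try:
            results.append(estimate(resample))
        except Exception:
            pass  # a failed resample is skipped but counts against the floor
        now = time.monotonic()
        due = done % progress_every == 0 or now - last_report >= heartbeat_seconds
        if on_progress and due:
            on_progress(done, iterations)
            last_report = now
    if len(results) < min_success_ratio * iterations:
        raise RuntimeError(f"only {len(results)} of {iterations} resamples succeeded")
    return results

def mean_or_fail(sample):
    """Toy statistic that fails on a degenerate (zero-variance) resample."""
    if max(sample) == min(sample):
        raise ValueError("zero variance")
    return sum(sample) / len(sample)

progress_log = []
estimates = run_resamples(
    mean_or_fail, [1.0, 2.0, 3.0, 4.0], iterations=500,
    on_progress=lambda done, total: progress_log.append((done, total)),
)
```

Checking the floor once at the end keeps the loop simple; an implementation could equally abort early once the floor can no longer be met.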

Versioning

The installed version is exposed at runtime as openpls.__version__. Pin a specific version with pip install openpls-engine==X.Y.Z for reproducible analyses.