Changelog

All notable changes to openpls-engine are recorded here. The format follows Keep a Changelog, and the project adheres to Semantic Versioning.

The public API is stable as of 1.0.0. Tagged releases trigger a GitHub Actions workflow that builds the package and publishes it to PyPI via OIDC trusted publishing.

Unreleased

1.5.0

Released 2026-06-10.

Adds the canonical pre-MGA measurement-invariance check so engine users can verify that composite constructs are comparable across two groups before interpreting group differences. Single additive feature, no existing behaviour changes.

Added

MICOM — Measurement Invariance of Composite Models (Henseler, Ringle and Sarstedt 2016) via Plspm.micom(data, grouping_column, group_a, group_b, iterations=1000, seed=42). Three-step procedure:
- Step 1 — Configural invariance is guaranteed by construction: the same Config is reused for both groups (audit trail via MICOM.group_sizes()).
- Step 2 — Compositional invariance. Per construct, computes c = w_A' Σ w_B / sqrt((w_A' Σ w_A)(w_B' Σ w_B)) between the group-A and group-B weight vectors evaluated on the pooled indicator covariance, then tests H_0: c = 1 with a one-sided lower-tail permutation test (the permutation distribution clusters near 1 under the null; an observed c deep in the lower tail rejects invariance). Sign indeterminacy is handled by aligning each permutation’s weight direction before computing c.
- Step 3 — Equality of composite means and variances. Applies pooled-fit weights to standardized indicators to produce common-scale composite scores; mean differences and log(var_A / var_B) are then tested with two-sided label-shuffling permutations. Step 3 reuses the pooled weights inside each iteration instead of refitting PLS, so it is far cheaper than Step 2.
- MICOM.summary() collapses the three steps into a per-construct verdict — "full" (Step 2 + Step 3 pass), "partial" (Step 2 passes but mean or variance differs), or "none" (Step 2 fails — composites are not comparable and MGA results would be uninterpretable). Step 2 and Step 3 are also exposed individually via step2() and step3().
Closes a longstanding gap: prior releases supported Plspm.mga(...) but provided no in-engine way to verify the invariance prerequisite MGA assumes.
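
The Step 2 statistic can be sketched in a few lines of NumPy. This is a minimal illustration of the formula quoted above, not the engine's implementation: the function names, the toy covariance, and the weight vectors are hypothetical, and the plain-proportion p-value estimator is an assumption.

```python
import numpy as np


def compositional_c(w_a, w_b, sigma):
    """c = w_A' S w_B / sqrt((w_A' S w_A)(w_B' S w_B)) on the pooled covariance S."""
    w_a = np.asarray(w_a, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    numerator = w_a @ sigma @ w_b
    denominator = np.sqrt((w_a @ sigma @ w_a) * (w_b @ sigma @ w_b))
    return float(numerator / denominator)


def lower_tail_p(observed_c, permuted_c):
    """Share of permutation draws at or below the observed c (assumed estimator)."""
    permuted_c = np.asarray(permuted_c, dtype=float)
    return float(np.mean(permuted_c <= observed_c))


# Toy pooled covariance for a three-indicator construct (made-up numbers).
sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])
w_a = np.array([0.40, 0.35, 0.45])  # hypothetical group-A weights
w_b = np.array([0.42, 0.30, 0.48])  # hypothetical group-B weights

c_obs = compositional_c(w_a, w_b, sigma)
print(round(c_obs, 4))
```

Identical weight vectors give c = 1 exactly, which is why the test is one-sided: only values well below 1 count as evidence against compositional invariance.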

1.4.0

Released 2026-06-09.

Five seminr-aligned additions covering one-call reporting, predictive accuracy, structural effect sizes, discriminant validity, and mediation decomposition. All APIs are additive — no existing behaviour or signatures changed — so this is a minor bump.

Added

Publication-ready summary report via Plspm.report(include_rho_a=True, include_htmt2=True). Bundles the engine’s individual diagnostics — reliability (Cronbach alpha, rho_A, rho_C, AVE), discriminant validity (HTMT, HTMT2, Fornell-Larcker), structural paths with f² effect sizes and p-values, per-LV R² / adjusted R² / BIC, fit indices (SRMR, d_ULS, GoF), and outer/inner VIF — into a single Report object covering the standard PLS-SEM research-report panels expected by Hair, Hult, Ringle and Sarstedt (2022, A Primer on PLS-SEM, 3rd ed.). Report.to_dict() returns every section in one dictionary, ready for JSON export. Pure orchestration: every value comes from an existing lazy-cached method on Plspm, so calling report() repeatedly is cheap. The two flags trade speed for completeness: turn off include_rho_a to skip PLSc; turn off include_htmt2 to skip the geometric-mean HTMT refinement.
PLSpredict full panel (Shmueli et al. 2019) via the existing Plspm.predict(...) method. metrics() now reports per-indicator MAPE for PLS and LM (proportion form, mean(|err / actual|), matching sklearn’s convention) alongside RMSE and MAE, and adds the in-sample counterparts rmse_pls_in, mae_pls_in, mape_pls_in, rmse_lm_in, mae_lm_in, mape_lm_in computed from a single fit on the full data. The complete out-of-sample / in-sample comparison is what reviewers expect to see published (Shmueli, Sarstedt, Hair, Cheah, Ting, Vaithilingam and Ringle 2019, Table 6). Rows whose actual value is zero are excluded from the MAPE computation only; RMSE and MAE still include them. Existing columns and summary() are unchanged.
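
A small sketch of the zero-handling rule for MAPE, assuming plain NumPy arrays of actual and predicted indicator values. The helper names are hypothetical and this is not the engine's code.

```python
import numpy as np


def mape_proportion(actual, predicted):
    """Proportion-form MAPE; rows with a zero actual value are skipped."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    keep = actual != 0
    if not keep.any():
        return float("nan")
    return float(np.mean(np.abs((actual[keep] - predicted[keep]) / actual[keep])))


def rmse(actual, predicted):
    """RMSE over all rows, including those with a zero actual value."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


actual = [2.0, 4.0, 0.0]
predicted = [1.0, 5.0, 1.0]
print(mape_proportion(actual, predicted))  # third row ignored
print(rmse(actual, predicted))             # third row included
```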
Cohen’s f² effect size via Plspm.f_squared(). For each structural-model edge predictor -> endogenous, refits the endogenous LV’s OLS without the predictor and reports f² = (R²_full - R²_reduced) / (1 - R²_full) (Cohen 1988; Hair, Hult, Ringle & Sarstedt 2022). Returned FSquared instance exposes a long-format table() with conventional effect-size labels (none / small / medium / large, thresholds 0.02 / 0.15 / 0.35) and a square matrix() mirroring the path matrix. Computed lazily and cached.
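
The f² formula and its labels reduce to a few lines. This sketch takes the two R² values as given. The helper names are hypothetical, and treating each threshold as inclusive is an assumption.

```python
def f_squared(r2_full, r2_reduced):
    """f2 = (R2_full - R2_reduced) / (1 - R2_full)."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def effect_label(f2):
    """Conventional labels at 0.02 / 0.15 / 0.35 (inclusive bounds assumed)."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "none"


f2 = f_squared(r2_full=0.50, r2_reduced=0.40)
print(round(f2, 3), effect_label(f2))
```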
Fornell-Larcker discriminant-validity criterion via Plspm.fornell_larcker(). Returns a square matrix with sqrt(AVE) on the diagonal and inter-construct correlations off-diagonal (Fornell and Larcker 1981). summary() produces a per-LV passes verdict (True when sqrt(AVE) exceeds every absolute off-diagonal entry in its row). Formative (Mode B) and single-indicator LVs receive NaN on the diagonal because AVE is undefined for them, and summary() flags them as non-applicable. The modern recommendation (Henseler, Ringle and Sarstedt 2015) is to prefer HTMT for discriminant validity; Fornell-Larcker is provided alongside, not in place of, HTMT. Computed lazily and cached.
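
A sketch of the criterion on toy inputs, assuming AVE values and the construct correlation matrix are already available. The function name and the use of None for non-applicable constructs are illustrative choices, not the engine's return types.

```python
import numpy as np


def fornell_larcker(ave, lv_corr):
    """Return the FL matrix and a per-LV verdict (None where AVE is undefined)."""
    root_ave = np.sqrt(np.asarray(ave, dtype=float))
    matrix = np.array(lv_corr, dtype=float)
    np.fill_diagonal(matrix, root_ave)        # sqrt(AVE) on the diagonal
    off_diag = np.abs(matrix)
    np.fill_diagonal(off_diag, -np.inf)       # ignore the diagonal in the row max
    max_off = off_diag.max(axis=1)
    passes = [None if np.isnan(r) else bool(r > m)
              for r, m in zip(root_ave, max_off)]
    return matrix, passes


ave = [0.64, 0.49, np.nan]                    # third LV: formative, AVE undefined
lv_corr = [[1.00, 0.60, 0.30],
           [0.60, 1.00, 0.75],
           [0.30, 0.75, 1.00]]
matrix, passes = fornell_larcker(ave, lv_corr)
print(passes)
```

Here the second construct fails because its sqrt(AVE) of 0.70 is below its correlation of 0.75 with the third construct.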
Specific indirect effects for mediation analysis via Plspm.specific_indirect_effects(source, target, through=None) for point estimates and Bootstrap.specific_indirect_effects(source, target, through=None, alpha=0.05) for bootstrap percentile CIs. Implements the chain-product procedure of Zhao, Lynch and Chen (2010) and Nitzl, Roldan and Cepeda (2016): each mediation chain source -> M1 -> ... -> target carries an effect equal to the product of its path coefficients, and the per-iteration distribution of that product gives the inferential statistics. With through=None the structural-model DAG is searched for every simple chain from source to target; with through set, only that single chain is evaluated. Aligns the engine with seminr::specific_effect_significance().
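
The chain-product idea can be sketched without the engine. The path dictionary and helper names below are hypothetical; the search enumerates simple chains with at least one mediator and multiplies the coefficients along each.

```python
from math import prod


def simple_chains(paths, source, target):
    """Yield every simple chain source -> M1 -> ... -> target with >= 1 mediator."""
    def walk(node, chain):
        for (start, end) in paths:
            if start != node or end in chain:
                continue
            if end == target:
                if len(chain) >= 2:           # skip the direct edge source -> target
                    yield chain + [end]
            else:
                yield from walk(end, chain + [end])
    yield from walk(source, [source])


def chain_effect(paths, chain):
    """Product of the path coefficients along one chain."""
    return prod(paths[(a, b)] for a, b in zip(chain, chain[1:]))


# Made-up structural coefficients keyed by (predictor, outcome).
paths = {("X", "M1"): 0.5, ("M1", "M2"): 0.4, ("M2", "Y"): 0.3,
         ("M1", "Y"): 0.2, ("X", "Y"): 0.1}

effects = {tuple(chain): chain_effect(paths, chain)
           for chain in simple_chains(paths, "X", "Y")}
for chain, value in effects.items():
    print(" -> ".join(chain), round(value, 4))
```

Evaluating a single named chain, as with through set, corresponds to calling chain_effect on that one chain. Repeating the product on each bootstrap resample would give the distribution from which percentile intervals are read.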

1.3.0

Released 2026-06-09.

Disjoint two-stage higher-order construct (HOC) workflow as a first-class API. The legacy Config.add_higher_order (repeated-indicators / embedded two-stage) stays untouched for backward compatibility, so this is a minor bump.

Added

Disjoint two-stage higher-order constructs (HOC) via Plspm.higher_order(name, first_order, mode, structure, ...). Implements the workflow recommended in Sarstedt, Hair, Cheah, Becker and Ringle (2019) and Hair, Hult, Ringle and Sarstedt (2022, A Primer on PLS-SEM, 3rd ed., Chapter 8). The fitted Plspm becomes stage 1; its first-order LV scores are appended to the data as indicators of the new second-order construct, and a stage-2 Plspm is fit with the HOC in place of its first-order constituents in the structural part of the model. All four canonical HOC types are covered by combining the first-order LV modes (set on the base config) with the HOC mode:
- Type I (R-R) — first-order Mode A, HOC Mode A.
- Type II (R-F) — first-order Mode A, HOC Mode B.
- Type III (F-R) — first-order Mode B, HOC Mode A.
- Type IV (F-F) — first-order Mode B, HOC Mode B.
The returned HigherOrder instance exposes the stage-1 fit (base()), the stage-2 fit (refit()), the HOC’s measurement model (loadings()), the stage-2 structural path coefficients (path_coefficients()), and a per-first-order summary table. Nested HOCs work by calling higher_order() again on the stage-2 refit.

1.2.0

Released 2026-06-09.

Three seminr-aligned diagnostics for measurement-error correction, discriminant validity, and structural-equation endogeneity. All three APIs are additive — no existing behaviour or signatures changed — so this is a minor bump.

Added

Gaussian-copula endogeneity test via Plspm.copula(endogenous, suspected=None, n_boot=500, seed=42). Park and Gupta (2012) / Hult, Hair, Proksch, Sarstedt, Pinkwart and Ringle (2018) procedure for detecting endogeneity in PLS-SEM structural equations. For the structural equation of endogenous, each suspected predecessor LV is augmented with a copula term P_k = Φ⁻¹(F_n(X_k)) (F_n(x) = rank(x) / (n+1)), the augmented OLS regression is refit on the latent-variable scores, and each copula coefficient γ_k is tested by a non-parametric row bootstrap (same SE / t / p convention as LongBootstrap). Each suspected predictor is screened with a Cramér-von Mises normality test against its sample-fitted normal; summary() marks normal predictors as copula not admissible (normal) because under normality the copula term degenerates and the test cannot tell endogeneity from a Gaussian regressor. Returns per-predictor gamma, boot_se, t, p_value, cvm_p_nonnormal, the endogeneity-corrected augmented_paths, and a verdict column.
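
The copula term itself is simple to reproduce. The sketch below uses only NumPy and the standard library; the function name is hypothetical and ties are assumed absent (the engine's tie handling is not described here).

```python
from statistics import NormalDist

import numpy as np


def copula_term(x):
    """P = Phi^{-1}(F_n(x)) with F_n(x) = rank(x) / (n + 1); assumes no ties."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1     # ranks 1..n
    u = ranks / (n + 1)                       # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in u])


scores = [3.0, 1.0, 2.0]                      # toy latent-variable scores
p_term = copula_term(scores)
print(np.round(p_term, 4))
```

Dividing by n + 1 rather than n keeps the largest observation away from probability 1, where the inverse normal CDF would be infinite.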
HTMT2 via Plspm.htmt2(). Geometric-mean refinement of the Heterotrait-Monotrait Ratio of Correlations (Roemer, Schuberth and Henseler 2021). Replaces the two arithmetic means in the original Henseler/Ringle/Sarstedt 2015 HTMT with geometric means (exp(mean(log(·)))), removing the bias HTMT shows when indicator loadings within a block are unequal. The original HTMT assumes tau-equivalent measurement; HTMT2 remains consistent under the more general congeneric measurement model. Same API surface as HTMT: matrix() returns the symmetric matrix and pairs() returns the long-format view. Pairs involving a single-indicator construct or any zero indicator correlation are returned as NaN (the geometric mean is undefined in those cases). The same conservative discriminant-validity thresholds apply (HTMT2 < 0.85 / < 0.90).
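
To make the arithmetic-versus-geometric distinction concrete, here is a sketch for one construct pair on a toy indicator correlation matrix. Function names are hypothetical, and returning NaN for negative correlations as well as zeros is an assumption.

```python
import numpy as np


def arithmetic_mean(values):
    return float(np.mean(values))


def geometric_mean(values):
    """exp(mean(log(v))); NaN when any value is zero or negative (assumed rule)."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        return float("nan")
    return float(np.exp(np.mean(np.log(values))))


def htmt_pair(corr, block_i, block_j, mean=arithmetic_mean):
    """Heterotrait mean over the geometric mean of the two monotrait means."""
    if len(block_i) < 2 or len(block_j) < 2:
        return float("nan")                   # single-indicator construct
    corr = np.asarray(corr, dtype=float)

    def monotrait(block):
        sub = corr[np.ix_(block, block)]
        return sub[np.triu_indices(len(block), k=1)]

    hetero = corr[np.ix_(block_i, block_j)].ravel()
    return mean(hetero) / np.sqrt(mean(monotrait(block_i)) * mean(monotrait(block_j)))


# Toy correlations: indicators 0-1 belong to construct A, 2-3 to construct B.
corr = np.array([[1.0, 0.8, 0.2, 0.4],
                 [0.8, 1.0, 0.4, 0.8],
                 [0.2, 0.4, 1.0, 0.5],
                 [0.4, 0.8, 0.5, 1.0]])
htmt = htmt_pair(corr, [0, 1], [2, 3])
htmt2 = htmt_pair(corr, [0, 1], [2, 3], mean=geometric_mean)
print(round(htmt, 4), round(htmt2, 4))
```

With unequal heterotrait correlations the geometric mean (0.40) sits below the arithmetic mean (0.45), so HTMT2 comes out lower than HTMT for this pair.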
Consistent PLS (PLSc) via Plspm.plsc(). Applies the Dijkstra and Henseler (2015) bias correction for reflective (Mode A) measurement: each construct receives a closed-form rho_A reliability (w'w)² · w'(S − diag(S))w / w'(ww' − diag(ww'))w, off-diagonal construct correlations are dis-attenuated by sqrt(rho_A_i · rho_A_j), and path coefficients are re-estimated by OLS on the adjusted correlation matrix. Corrected R², adjusted R², and outer loadings (λ_k = w_k · sqrt(rho_A) / (w'w)) are returned side by side with the composite-model originals. Formative (Mode B) and single-indicator constructs receive rho_A = 1 by convention and are not adjusted. Aligns with seminr::PLSc().

1.1.0

Released 2026-06-09.

Two seminr-aligned outer-model diagnostics. Both APIs are additive — no existing behaviour or signatures changed — so this is a minor bump.

Added

Variance Inflation Factor (VIF) diagnostics via Plspm.vif(). Two views: items() returns per-indicator VIF within each construct block (collinearity diagnostic primarily for Mode B / formative blocks; for each indicator x_j in a block with two or more indicators, x_j is regressed on the remaining indicators and VIF_j = 1 / (1 - R²_j)), and inner() returns per-predictor VIF for each endogenous LV (structural collinearity among antecedents — each predictor’s score is regressed on the other predictors’ scores). Single-indicator blocks and single-predictor endogenous LVs are omitted (VIF undefined or trivially 1). Aligns the engine with seminr::vif_items().
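
The regression definition of VIF can be checked against a known identity: for a set of columns, the VIFs equal the diagonal of the inverse correlation matrix. The sketch below uses simulated data and a hypothetical helper name, and is independent of the engine.

```python
import numpy as np


def vif(block):
    """VIF_j = 1 / (1 - R2_j), regressing column j on the remaining columns."""
    X = np.asarray(block, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200)  # induce collinearity

vif_values = vif(X)
print(np.round(vif_values, 2))
```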
Confirmatory Tetrad Analysis (CTA-PLS) via Plspm.cta(n_boot=500, alpha=0.05, seed=42). Outer-model diagnostic that tests whether reflective (Mode A) specification is consistent with the data, per block of four or more indicators, using Bollen and Ting’s (1993) vanishing-tetrad theorem and the bootstrap procedure of Gudergan, Ringle, Wende and Will (2008). One canonical tetrad per indicator 4-tuple gives C(p, 4) non-redundant tetrads per block; each is bootstrapped to obtain a two-sided percentile p-value under H₀: τ = 0 (after centering the bootstrap distribution on zero), then reduced with a within-block Holm step-down correction at alpha. tetrads() returns the per-tetrad table; summary() returns the per-block verdict ("reflective supported" vs "reflective rejected"). Mode B blocks and reflective blocks with fewer than four indicators are omitted.
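
Two building blocks of the procedure are easy to sketch: a tetrad on a covariance matrix and the Holm step-down decision. The index order of the tetrad shown here is one common convention and may differ from the canonical tetrad the engine picks; the helper names are hypothetical.

```python
from math import comb

import numpy as np


def tetrad(cov, i, j, k, l):
    """tau = s_ij * s_kl - s_ik * s_jl (one of several equivalent orderings)."""
    return cov[i, j] * cov[k, l] - cov[i, k] * cov[j, l]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the r-th smallest p-value with alpha / (m - r)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                              # stop at the first non-rejection
    return reject


# Under a single common factor the implied covariances are products of loadings,
# so every tetrad vanishes.
loadings = np.array([0.9, 0.8, 0.7, 0.6])
cov = np.outer(loadings, loadings)
print(tetrad(cov, 0, 1, 2, 3))
print(comb(6, 4))                              # 4-tuples in a six-indicator block
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```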

1.0.2

Released 2026-06-01.

Test-suite hardening release. No API changes, no runtime behaviour changes — 1.0.2 is binary-identical to 1.0.1 at runtime.

Added

Scheme-equivalence regression test (tests/test_scheme_equivalence_two_lv.py). On any two-LV model each LV has exactly one neighbour, so the inner-weighting update degenerates and PATH, CENTROID, FACTORIAL, PCA, and NEWTON must produce identical path coefficients, weights, loadings, and R². The test locks this invariant down to < 1e-6.
Redundancy-analysis regression test (tests/test_redundancy_analysis_mode_b.py). A Mode B (formative) driver block predicting a single-item global rating LV. Asserts path recovery is positive and within a sampling band across seeds, R² lies in the expected attenuated range, and the single-indicator loading is exactly 1.0. Also parametrized across inner schemes for the degenerate two-LV case.
Path-recovery regression test (tests/test_path_recovery_synthetic.py). Three-LV mediation chain X → M → Y with known structural coefficients. Asserts the engine recovers direct paths, indirect / direct / total effects, and the population R² on Y within sampling tolerance over multiple seeds.

Changed

Internal docstring and comment phrasing in openpls/config.py, openpls/fit.py, tests/test_fit.py, and tests/test_sign_convention.py now reference the underlying methodological convention (Henseler et al. 2014 §5.3, Hair et al., Wold) directly. No code behaviour change.

1.0.1

Released 2026-06-01.

Two SmartPLS-parity fixes discovered while validating 1.0.0 against 14 reference cases. No API changes.

Fixed

Per-LV sign vote in _MetricWeights.calculate(). The previous implementation applied the membership mask inside the product (cor * odm) and only then took the per-cell sign. Because np.sign(0) == 0 but math.copysign(1.0, 0) == +1.0, and the per-cell variant behaved like the latter, every non-belonging indicator (whose masked cell is 0) contributed a phantom +1 to the LV’s sign vote. A small LV (e.g. 3 indicators) embedded in a much larger model could therefore be out-voted by the phantom contributions of the larger LV’s indicators, leaving it on the wrong sign even when every one of its own indicators correlated negatively with the latent direction. The sign is now computed first and then multiplied by the membership mask, so non-belonging cells contribute 0 rather than +1. Empirical impact: the OI validation cases now match SmartPLS on Org_Ident → AC_Love (β was +0.41 versus SmartPLS −0.41).
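
The phantom-vote mechanism can be reproduced on toy numbers. This is an illustration of the np.sign versus math.copysign difference only, with made-up correlations and a hypothetical membership vector; it is not the code from _MetricWeights.calculate().

```python
import math

import numpy as np

# Correlations of eight indicators with one LV's score (made-up values).
cor = np.array([-0.7, -0.6, -0.8, 0.5, 0.4, 0.6, 0.5, 0.3])
# Only the first three indicators belong to this LV.
member = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

masked = cor * member                          # non-member cells become 0.0

# Sign taken per cell after masking: copysign maps 0.0 to +1.0, so each
# non-member indicator adds a phantom +1 to the vote.
vote_after_mask = sum(math.copysign(1.0, value) for value in masked)

# Sign taken first, mask applied afterwards: non-members contribute 0.
vote_sign_first = float(np.sum(np.sign(cor) * member))

print(vote_after_mask, vote_sign_first)
```

All three member indicators correlate negatively, yet the first vote comes out positive because five phantom votes outweigh them; the second vote reflects the members only.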
Saturated-model SRMR and d_ULS exclude within-LV pairs for Mode B (formative) constructs. The implied indicator-correlation matrix Σ̂ = Λ Φ Λᵀ only constrains common-factor (Mode A) measurement. Mode B indicators are exogenous causes of the composite, so their pairwise correlation is empirical, not implied by Λᵢ Λⱼ. Including those pairs in the SRMR and d_ULS sums inflated both metrics purely as a measurement-model artifact (Henseler et al. 2014 §5.3, SmartPLS convention). Fit now builds an inclusion mask that excludes within-Mode-B-LV blocks and aggregates over the kept pairs only. Models without Mode B LVs are unaffected. Empirical impact: the Corporate Reputation Advanced d_ULS gap closes from +0.5104 to −0.0004.

Added

Regression test tests/test_sign_convention.py constructing a 14-indicator LV alongside a 3-indicator LV whose indicators are all inverted; pins the sign-vote behaviour against the old phantom-vote bug.
Regression test tests/test_fit.py::test_mode_b_within_lv_pairs_excluded_from_fit asserting that the within-Mode-B residual block is masked out of both SRMR and d_ULS sums by exact arithmetic identity.

1.0.0

Released 2026-06-01.

First stable release. The API surface (Plspm, Config, Mode, Scheme, IPMA, PLSPredict, Moderation, FIMIX) now follows semver: breaking changes require a major version bump.

Changed

Namespace renamed plspm → openpls. All imports change shape: from plspm import Plspm becomes from openpls import Plspm, import plspm.config as c becomes import openpls.config as c, and similarly for openpls.mode, openpls.scheme, openpls.mga, openpls.fimix, openpls.ipma, openpls.moderation, openpls.predict, openpls.long_bootstrap. The distribution name on PyPI (openpls-engine) is unchanged. Consumers upgrading from 0.7.x must rewrite their imports.
Column-wise standardization. Indicators are now standardized per column with Bessel-corrected variance (ddof=1), matching SmartPLS 4 conventions. The previous pooled-stack standardization is gone. This shifts numerical alignment closer to SmartPLS 4 for mixed-scale indicator blocks. Path coefficients and quality criteria can move by a few percent on existing models.
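
Column-wise standardization with Bessel's correction looks like this in NumPy. This is a minimal sketch with a hypothetical helper name, shown on two indicators measured on very different scales.

```python
import numpy as np


def standardize_columns(data):
    """Center each column and divide by its sample SD (ddof=1)."""
    X = np.asarray(data, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
Z = standardize_columns(X)
print(Z)
```

Because each column is scaled by its own standard deviation, both indicators end up on the same unit-variance scale regardless of their original units.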
setup.py shim removed. Project metadata is fully driven by PEP 621 pyproject.toml. Source installs should use a modern pip (pip install -e . continues to work).

Fixed

LV name may equal an indicator column name. add_lv() no longer rejects configurations where a latent variable shares its name with one of its manifest variables (the ECSI CUSCO single-item LV pattern). Internal LV and MV namespaces are distinct, so the collision was a false positive.

0.7.0a3

Released 2026-05-30.

Second feature release. Ships four advanced PLS-SEM analyses (IPMA, PLSpredict, two-stage moderation, FIMIX-PLS) and two additional inner-weighting schemes (Newton/BFGS and Lohmöller’s PCA), filling the gap between the original plspm-python API and mainstream commercial PLS-SEM tools.

Added

Scheme.PCA: Lohmöller’s PCA inner-weighting scheme (Lohmöller 1989, Section 2.4.2). For each LV, the inner weights are the components of the first principal direction of its neighbor-score matrix, sign-flipped to correlate positively with the LV. Treats neighbor weights as a joint multivariate direction rather than as pairwise quantities.
Scheme.NEWTON: quasi-Newton (BFGS) inner-weighting scheme. For each latent variable, jointly fits inner weights over all neighbors (predecessors and successors together) via BFGS minimization of a least-squares objective, in contrast to the classical PATH scheme, which mixes OLS coefficients for predecessors with bare correlations for successors. Initialized from the analytical OLS solution; uses scipy.optimize for the second-order Hessian-secant update.
openpls.fimix.FIMIX: Finite Mixture PLS (Hahn et al. 2002) for latent class segmentation. EM algorithm with multiple random restarts detects K subgroups sharing the measurement model but with distinct structural paths. Reports per-class path coefficients, posterior memberships, hard assignments, and information criteria (AIC, AIC3, AIC4, BIC, CAIC, MDL5, normalized entropy EN). Exposed as Plspm.fimix(n_classes).
openpls.ipma.IPMA: Importance-Performance Map Analysis. For a chosen target endogenous LV, returns each predecessor’s importance (total effect) and performance (mean of 0-100-rescaled LV score), plus an indicator-level breakdown with rescaled-mean performance and normalized weights. Exposed as Plspm.ipma(target).
openpls.moderation.Moderation: two-stage moderation (Henseler and Chin 2010). Fits a base model, multiplies the standardized LV scores for predictor and moderator into a product column, and refits with that product as a single-indicator construct pointing at the target. Exposes base(), refit(), and interaction_effect().
openpls.predict.PLSPredict: PLSpredict via k-fold cross-validation. Per-indicator RMSE/MAE for PLS and a linear-regression benchmark, plus Q²_predict against the indicator-average baseline (Shmueli et al. 2019). Exposed as Plspm.predict(k=10, repeats=1, seed=42); summary() returns the per-indicator PLS-vs-LM verdict.
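
The predictive Q² for one indicator compares squared prediction errors with those of a naive mean benchmark. The sketch below assumes the benchmark is the indicator's training-sample mean and uses a hypothetical helper name and made-up hold-out values.

```python
import numpy as np


def q2_predict(actual, predicted, train_mean):
    """1 - SSE(model) / SSE(mean benchmark) on hold-out cases."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse_model = np.sum((actual - predicted) ** 2)
    sse_naive = np.sum((actual - train_mean) ** 2)
    return float(1.0 - sse_model / sse_naive)


holdout_actual = [1.0, 2.0, 3.0, 4.0]
holdout_pred = [1.5, 2.0, 2.5, 4.0]
q2 = q2_predict(holdout_actual, holdout_pred, train_mean=2.5)
print(q2)
```

A value above zero means the model predicts the hold-out cases better than the mean benchmark does.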

0.7.0a2

Released 2026-05-30.

First release published to PyPI. Identical code to 0.7.0a1; bumped only to validate the trusted-publisher pipeline end to end. The previous v0.7.0a1 GitHub release stays available as a download but was never uploaded to PyPI.

0.7.0a1

Released 2026-05-30.

All planned ports from the OpenPLS web app are now in. This is the first feature-complete pre-release.

Added

openpls.long_bootstrap.LongBootstrap: serial bootstrap with progress callback, sign-flipping, BCa percentile CIs, normal-approximation p-values, and a configurable success-rate floor. Suited for long-running, progress-streaming workloads.
openpls.mga.MGA and openpls.mga.GroupSpec: Multi-Group Analysis via Henseler permutation, with categorical and numeric-range group definitions, pairwise comparisons across 2+ groups, and two-sided permutation p-values with Phipson-Smyth add-one smoothing.
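
Add-one smoothing for a two-sided permutation p-value can be sketched as follows. The helper name and the toy differences are hypothetical.

```python
import numpy as np


def permutation_p_two_sided(observed_diff, permuted_diffs):
    """p = (b + 1) / (m + 1), b = permutations at least as extreme as observed."""
    permuted = np.asarray(permuted_diffs, dtype=float)
    b = int(np.sum(np.abs(permuted) >= abs(observed_diff)))
    return (b + 1) / (permuted.size + 1)


p_value = permutation_p_two_sided(0.30, [0.10, -0.35, 0.20, -0.05])
print(p_value)
```

The smoothing guarantees the p-value never reaches exactly zero: with m permutations its smallest possible value is 1 / (m + 1).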
Plspm(..., missing_strategy="mean"): mean replacement for NaN cells in indicator columns. Default "casewise" preserves upstream behavior.
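
The two strategies differ in what happens to incomplete rows. The sketch below shows that difference on a toy array, with hypothetical helper names; it does not call the engine.

```python
import numpy as np


def mean_replace(data):
    """Fill each NaN cell with the mean of its column's observed values."""
    X = np.array(data, dtype=float)            # copy so the input is untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def casewise(data):
    """Drop every row that contains at least one NaN."""
    X = np.asarray(data, dtype=float)
    return X[~np.isnan(X).any(axis=1)]


raw = [[1.0, np.nan],
       [3.0, 4.0],
       [np.nan, 8.0]]
print(mean_replace(raw))
print(casewise(raw))
```

Mean replacement keeps all three rows, while casewise deletion leaves only the single complete row.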
openpls.q_squared.QSquared: Stone-Geisser Q² via blindfolding with configurable omission distance D. Exposed as Plspm.q_squared().
openpls.htmt.HTMT: Heterotrait-Monotrait ratio of correlations.
openpls.fit.ModelFit: SRMR (Standardized Root Mean Square Residual) and d_ULS (unweighted least-squares discrepancy).
BIC for endogenous LVs in openpls.inner_summary.
Listwise-deletion fallback for Cronbach alpha and Dijkstra-Henseler rho_A when an LV’s indicator block contains NaN.
openpls.__version__ reports the installed package version at runtime.

Changed

Project metadata moved from setup.py to PEP 621 pyproject.toml.
Lint pipeline (ruff) and test pipeline (pytest) run on Python 3.10 through 3.13 in CI.

0.6.0a1

Released 2026-05-30.

Initial OpenPLS rebrand of the plspm-python 0.5.7 baseline.

Added

Forked plspm-python 0.5.7 with attribution preserved.
pyproject.toml, ruff config, GitHub Actions CI matrix (Py 3.10 to 3.13).