API reference
Compact, opinionated reference of the public API. For exhaustive docstrings, read the source under openpls/ or call help(openpls.Plspm) in a Python REPL.
Imports throughout assume:

    import openpls.config as c
    from openpls import Plspm
    from openpls.scheme import Scheme
    from openpls.mode import Mode
    from openpls.scale import Scale

openpls.config
Structure()
A builder for the path matrix that defines the structural model.

    structure = c.Structure()
    structure.add_path(["IMAG"], ["EXPE", "SAT", "LOY"])
    path_df = structure.path()  # lower-triangular DataFrame of 0/1

- `add_path(source: list, target: list)` records one or more directed edges. Either `source` or `target` may be a list; the other must contain exactly one entry.
- `path()` returns the path matrix as a Pandas DataFrame, topologically ordered, ready to pass into `Config`.
MV(name: str, scale: Scale = None)
One manifest variable. name must match a column in your dataset. scale is only needed for nonmetric data.
Config(path: DataFrame, scaled: bool = True, default_scale: Scale = None)
The model specification consumed by Plspm.
- `path`: the matrix from `Structure.path()` (square, lower-triangular, 0/1).
- `scaled`: standardize MVs before fitting (default `True`, the standard PLS-SEM behavior). Only used when `default_scale` is `None`.
- `default_scale`: scale for nonmetric MVs (`Scale.NUM`, `Scale.ORD`, `Scale.NOM`, `Scale.RAW`).
Methods:
- `add_lv(lv_name, mode, *mvs)` attaches a Mode A or Mode B latent variable with explicit `MV(...)` instances.
- `add_lv_with_columns_named(lv_name, mode, data, col_name_starts_with, default_scale=None)` is the shortcut when indicator columns share a prefix. With ECSI data and the convention `lv.lower()`, this picks up every indicator for a given LV in one call.
- `add_higher_order(hoc_name, mode, lvs)` registers a second-order construct. Best paired with the two-stage HOC workflow; see the upstream `plspm` docs for the details.
openpls.Plspm
Plspm(data, config, scheme=Scheme.CENTROID, iterations=100, tolerance=1e-6, bootstrap=False, bootstrap_iterations=100, processes=2, missing_strategy="casewise")
The main entry point. Constructing a Plspm runs the algorithm; results are accessed via instance methods.
| Argument | Notes |
|---|---|
| `data` | Pandas DataFrame. Must contain every column referenced as an MV. |
| `config` | `Config` instance. |
| `scheme` | One of the five `Scheme` enum values. |
| `iterations` | Maximum PLS outer-loop iterations. Floored at 100. |
| `tolerance` | Convergence tolerance for the weight update. |
| `bootstrap` | Set `True` to run the upstream multiprocessing bootstrap inline. For long runs prefer `LongBootstrap` (below). |
| `bootstrap_iterations` | Bootstrap resamples (must be a multiple of `processes`). |
| `processes` | Worker processes for the bootstrap. |
| `missing_strategy` | `"casewise"` (default) or `"mean"`. See Core concepts. |
Result-object methods:
| Method | Returns |
|---|---|
| `scores()` | DataFrame of latent-variable scores, one column per LV. |
| `outer_model()` | DataFrame with weight, loading, communality, redundancy per indicator. |
| `inner_model()` | Long-format DataFrame with OLS estimate, std error, t, and `p>\|t\|`. |
| `path_coefficients()` | Square DataFrame mirroring the path matrix, with the path coefficients filled in. |
| `crossloadings()` | DataFrame with indicators on the index and every LV on the columns. |
| `inner_summary()` | DataFrame per LV with `type` (Exogenous/Endogenous), `r_squared`, `r_squared_adj`, `block_communality`, `mean_redundancy`, `ave`, and (for endogenous) `bic`. |
| `goodness_of_fit()` | Tenenhaus GoF scalar. |
| `effects()` | DataFrame with `from`, `to`, `direct`, `indirect`, `total` columns. |
| `specific_indirect_effects(source, target, through=None)` | Point-estimate specific indirect effects (mediation analysis). Chain product of path coefficients along `source -> M1 -> ... -> target`. `through=None` enumerates every chain in the DAG; `through=["M1", "M2"]` evaluates that single chain. Returns a DataFrame indexed by chain label with `from`, `to`, `via`, `estimate`. For inference use `Bootstrap.specific_indirect_effects(...)`. |
| `f_squared()` | `FSquared` instance. Cohen's f² per structural edge (Cohen 1988; Hair et al. 2022). `table()` returns the long-format view with effect-size labels; `matrix()` returns a square matrix mirroring the path matrix. Computed lazily and cached. |
| `fornell_larcker()` | `FornellLarcker` instance. Fornell and Larcker (1981) discriminant-validity criterion: sqrt(AVE) on the diagonal vs inter-construct correlations off-diagonal. Mode B / single-indicator LVs get NaN on the diagonal. Computed lazily and cached. |
| `report(include_rho_a=True, include_htmt2=True)` | `Report` instance bundling the reviewer-standard panels (reliability, discriminant validity, structural paths with f², per-LV R² / adj R² / BIC, fit indices, collinearity) into one object for publication-ready export. See `openpls.report.Report`. |
| `unidimensionality()` | DataFrame per LV with Cronbach alpha, Dillon-Goldstein rho, and first/second eigenvalues. |
| `htmt()` | `HTMT` instance. Call `.matrix()` for the square matrix or `.pairs()` for the long form. |
| `htmt2()` | `HTMT2` instance. Geometric-mean refinement of HTMT (Roemer, Schuberth and Henseler 2021). Same `matrix()` / `pairs()` API. Computed lazily and cached. |
| `model_fit()` | `ModelFit` instance. Call `.srmr()`, `.d_uls()`, `.residuals()`. |
| `q_squared(omission_distance=7)` | DataFrame indexed by endogenous LV with a `q_squared` column. Computed lazily via blindfolding. |
| `vif()` | `VIF` instance. Per-indicator (`items()`) and per-predictor (`inner()`) Variance Inflation Factor. Computed lazily and cached. |
| `cta(n_boot=500, alpha=0.05, seed=42)` | `CTAPLS` instance. Confirmatory Tetrad Analysis testing reflective-Mode-A specification of each block with at least four indicators. |
| `plsc()` | `PLSc` instance. Consistent-PLS (Dijkstra and Henseler 2015) correction of paths, loadings, and R² for reflective Mode-A attenuation. Computed lazily and cached. |
| `copula(endogenous, suspected=None, n_boot=500, seed=42)` | `GaussianCopula` instance. Park and Gupta (2012) / Hult et al. (2018) endogeneity test for the structural equation of `endogenous`. Returns per-predictor copula coefficients, bootstrap p-values, and a Cramér-von Mises admissibility check. |
| `predict(k=10, repeats=1, seed=42)` | `PLSPredict` instance for k-fold PLSpredict. |
| `ipma(target, scale_min=None, scale_max=None, indicator_scales=None)` | `IPMA` instance for the target endogenous LV. |
| `fimix(n_classes, max_iter=500, tolerance=1e-6, n_restarts=5, seed=42)` | `FIMIX` instance for K-class finite-mixture segmentation. |
| `higher_order(name, first_order, mode, structure, ...)` | `HigherOrder` instance. Disjoint two-stage higher-order construct (Sarstedt et al. 2019; Hair et al. 2022). Uses the current fit as stage 1 and refits with a new second-order LV whose indicators are the first-order LV scores. |
| `micom(data, grouping_column, group_a, group_b, iterations=1000, seed=42)` | `MICOM` instance. Three-step Measurement Invariance of Composite Models (Henseler, Ringle and Sarstedt 2016): per-construct compositional invariance via permutation on `c`, plus mean and variance equality tests on the pooled-weight composites. Run before MGA / moderation. |
| `data()` | The dataset actually used by the fit (after the configured missing-value strategy). |
| `config()` | The `Config` used by the fit. |
| `bootstrap()` | The `Bootstrap` instance (requires constructing `Plspm` with `bootstrap=True`). Exposes `weights()`, `loading()`, `r_squared()`, `paths()`, `total_effects()`, and `specific_indirect_effects(source, target, through=None, alpha=0.05)` (bootstrap percentile CIs for mediation chains; see the Specific indirect effects note below). |
openpls.scheme.Scheme
Enum of inner-weighting schemes:
- `Scheme.CENTROID`: classical sign-of-correlation scheme (upstream default).
- `Scheme.FACTORIAL`: covariance-based.
- `Scheme.PATH`: asymmetric OLS-for-predecessors, correlation-for-successors.
- `Scheme.NEWTON`: joint quasi-Newton (BFGS) optimization across all neighbors of each LV. Initialized from the analytical OLS solution; uses `scipy.optimize.minimize` with `gtol=1e-8`.
- `Scheme.PCA`: Lohmöller's first-principal-direction scheme. Treats neighbor weights as a joint multivariate direction.
openpls.mode.Mode
- `Mode.A` (reflective). LV causes its indicators.
- `Mode.B` (formative). Indicators form the LV.
openpls.scale.Scale
Used for nonmetric data:
- `Scale.NUM`: numeric, linearly transformable.
- `Scale.RAW`: numeric, no transformation.
- `Scale.ORD`: ordinal, monotonic transformation.
- `Scale.NOM`: nominal, non-monotonic transformation.
openpls.vif.VIF
Variance Inflation Factor diagnostics. Construct through `Plspm.vif()`.

    fit = Plspm(data, config, Scheme.CENTROID)
    vif = fit.vif()
    print(vif.items())  # per-indicator VIF within each construct block
    print(vif.inner())  # per-predictor VIF for each endogenous LV

Two views:

- `items()`: long-format DataFrame with columns `lv`, `indicator`, `vif`. For every indicator in a block with two or more indicators, `x_j` is regressed on the remaining indicators of the same block and `VIF_j = 1 / (1 - R²_j)`. Standard collinearity diagnostic for formative (Mode B) blocks; also informative for reflective blocks. Blocks with fewer than two indicators are omitted (VIF is undefined).
- `inner()`: dict keyed by endogenous LV name, value is a DataFrame with columns `predictor`, `vif`. For every predictor LV of an endogenous LV `Y`, the predictor's score is regressed on the other predictors' scores. Use to detect structural multicollinearity among antecedents. Endogenous LVs with fewer than two predictors are omitted.

Returns `inf` when a regression is perfectly collinear (R² is numerically 1) and `nan` when the response has zero variance. A common rule of thumb is VIF < 5 (lenient) or < 3.3 (Diamantopoulos and Siguaw 2006) for formative indicators.
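The mapping from an auxiliary-regression R² to a VIF, including the `inf` and `nan` edge cases described above, can be sketched in a few lines. This is an illustration only, not the library's implementation: `vif_from_r_squared` and its `eps` tolerance are hypothetical names.

```python
import math


def vif_from_r_squared(r_squared: float, eps: float = 1e-12) -> float:
    """Map an auxiliary-regression R² to a VIF (hypothetical helper, not openpls API)."""
    if math.isnan(r_squared):
        return math.nan  # e.g. zero-variance response: R² is undefined
    if 1.0 - r_squared <= eps:
        return math.inf  # perfectly collinear: R² is numerically 1
    return 1.0 / (1.0 - r_squared)


# R² = 0.8 corresponds to the lenient VIF < 5 boundary.
vif_values = {r2: vif_from_r_squared(r2) for r2 in (0.0, 0.5, 0.8, 1.0)}
print(vif_values)
```

Under this mapping, the 3.3 threshold corresponds to an auxiliary R² of roughly 0.70.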
openpls.cta.CTAPLS
Confirmatory Tetrad Analysis for PLS (Gudergan, Ringle, Wende and Will 2008). Diagnostic for the outer model that tests whether reflective (Mode A) specification is consistent with the data, per block of four or more indicators, using Bollen and Ting's (1993) vanishing-tetrad theorem. Construct through `Plspm.cta(...)`.

    fit = Plspm(data, config, Scheme.CENTROID)
    cta = fit.cta(n_boot=500, alpha=0.05, seed=42)
    print(cta.tetrads())  # per-tetrad table with bootstrap SE and Holm decision
    print(cta.summary())  # per-block verdict

Arguments:

- `n_boot`: number of bootstrap resamples per block (default `500`, must be at least `50`).
- `alpha`: family-wise significance level for the within-block Holm step-down correction (default `0.05`).
- `seed`: RNG seed for the bootstrap. Pass `None` for non-deterministic.
Procedure:
- One canonical tetrad per indicator 4-tuple, `s_ij · s_kl − s_ik · s_jl` for `i<j<k<l`, gives `C(p, 4)` non-redundant tetrads per block.
- Bootstrap each tetrad to obtain its sampling distribution.
- Compute a two-sided percentile p-value under `H₀: τ = 0` after centering the bootstrap distribution on zero. P-values are floored at `1 / n_boot`.
- Apply Holm step-down at `alpha` within each block.
Mode B blocks and reflective blocks with fewer than four indicators are omitted (tetrads are undefined or vacuously satisfied).
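The Holm step-down correction used in the last step can be sketched as follows. This is a generic textbook implementation under the assumption that the p-values of one block are already available as a list; `holm_reject` is a hypothetical name and not part of the openpls API.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per p-value: True where Holm step-down rejects H0."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


block_p_values = [0.04, 0.001, 0.03, 0.20]  # made-up p-values for one block
decisions = holm_reject(block_p_values)
print(decisions)
```

In the terms of `summary()`, a block would count as "reflective rejected" as soon as any entry of `decisions` is `True`.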
Result methods:
- `tetrads()`: long-format DataFrame with columns `lv`, `indicators` (the four indicator names, comma-separated), `tetrad` (observed sample value), `boot_se`, `p_value`, `holm_decision` (`"reject"` or `"fail to reject"`).
- `summary()`: per-block DataFrame with columns `lv`, `n_indicators`, `n_tetrads`, `n_rejected`, `decision` (`"reflective supported"` when no tetrad rejects after Holm correction; `"reflective rejected"` otherwise).
- `alpha()` and `n_boot()`: echo the configured values.
openpls.plsc.PLSc
Consistent PLS (Dijkstra and Henseler 2015). Corrects path coefficients, outer loadings, and R² for measurement-error attenuation in reflective (Mode A) constructs. Construct through `Plspm.plsc()`.

    fit = Plspm(data, config, Scheme.CENTROID)
    plsc = fit.plsc()
    print(plsc.rho_a())                  # per-construct Dijkstra-Henseler reliability
    print(plsc.path_coefficients())      # corrected paths
    print(plsc.r_squared())              # corrected R² per endogenous LV
    print(plsc.loadings())               # corrected outer loadings (common-factor form)
    print(plsc.adjusted_correlations())  # dis-attenuated construct correlation matrix
    print(plsc.summary())                # rho_a + R² + adjusted R² per LV

Procedure:

- Compute the closed-form Dijkstra-Henseler reliability `rho_A = (w'w)² · w'Sw / w'(ww' − diag(ww'))w` per construct, where `w` is the PLS-normalized outer-weight vector and `S` is the standardized indicator covariance matrix with zero diagonal. Mode B (formative) constructs and single-indicator constructs receive `rho_A = 1` by convention.
- Build the dis-attenuated construct correlation matrix: divide every off-diagonal entry of `cor(scores)` by `sqrt(rho_A_i · rho_A_j)`; the diagonal stays `1`.
- Re-estimate the standardized path coefficients by OLS on the adjusted correlations: `beta = R_xx⁻¹ r_xy`.
- Recompute `R²` and adjusted `R²` per endogenous LV from the corrected paths.
- Rescale Mode A loadings to the consistent common-factor form `λ_k = w_k · sqrt(rho_A) / (w'w)`. Mode B loadings are left unchanged.
The composite LV scores themselves are not modified; the correction operates only on quantities derived from them. Use the corrected outputs when you intend the model to be interpreted as a common-factor (covariance-based) model alongside the composite-model originals.
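The dis-attenuation step of the procedure (dividing each off-diagonal correlation by `sqrt(rho_A_i · rho_A_j)`) can be sketched with plain lists. This is a minimal illustration with made-up numbers; `disattenuate` is a hypothetical helper, not the library's code.

```python
import math


def disattenuate(corr, rho_a):
    """Divide each off-diagonal correlation by sqrt(rho_A_i * rho_A_j)."""
    n = len(corr)
    adjusted = [[1.0] * n for _ in range(n)]  # the diagonal stays 1
    for i in range(n):
        for j in range(n):
            if i != j:
                adjusted[i][j] = corr[i][j] / math.sqrt(rho_a[i] * rho_a[j])
    return adjusted


score_corr = [[1.0, 0.6], [0.6, 1.0]]  # correlation of composite scores (made up)
reliabilities = [0.81, 0.64]           # rho_A per construct (made up)
adjusted_corr = disattenuate(score_corr, reliabilities)
print(adjusted_corr)
```

With a single predictor the corrected standardized path equals the adjusted correlation, so an observed correlation of 0.6 becomes a corrected path of about 0.83 in this example.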
Result methods:
- `rho_a()`: Series indexed by LV, with the per-construct reliability.
- `adjusted_correlations()`: square DataFrame of the dis-attenuated construct correlations.
- `path_coefficients()`: DataFrame mirroring `Plspm.path_coefficients()`, but with PLSc-corrected coefficients.
- `r_squared()` and `r_squared_adj()`: Series indexed by endogenous LV with the corrected R² and adjusted R².
- `loadings()`: Series of corrected outer loadings indexed by indicator (`loading_c`).
- `summary()`: per-LV DataFrame with `rho_a`, `r_squared`, `r_squared_adj`. Exogenous LVs have NaN in the two R² columns.
openpls.copula.GaussianCopula
Gaussian-copula endogeneity test (Park and Gupta 2012; Hult, Hair, Proksch, Sarstedt, Pinkwart and Ringle 2018). Diagnoses whether a structural-equation predictor is correlated with the omitted-variable error in its endogenous LV's regression. Construct through `Plspm.copula(...)`.

    fit = Plspm(data, config, Scheme.CENTROID)
    cop = fit.copula(endogenous="SAT", n_boot=500, seed=42)
    print(cop.coefficients())     # gamma, boot_se, t, p_value, cvm_p_nonnormal per predictor
    print(cop.augmented_paths())  # endogeneity-corrected structural estimates
    print(cop.summary())          # adds a `decision` column

The augmented regression is

    Y = β₀ + Σⱼ βⱼ Xⱼ + Σₖ γₖ · Φ⁻¹(F_n(Xₖ)) + e

where `F_n(x) = rank(x) / (n+1)` is the rescaled empirical CDF and the sum over `k` runs over the suspected predictors. A significant `γₖ` flags `Xₖ` as endogenous. Inference uses a non-parametric row bootstrap (`SE = std(γ_b)`, `t = γ / SE`, `p = 2 (1 − Φ(|t|))`), matching the convention used by `LongBootstrap`.
Arguments:
- `endogenous`: name of the endogenous LV whose structural equation is tested.
- `suspected`: predecessor LVs to augment with a copula term. `None` (default) tests every predecessor.
- `n_boot`: number of bootstrap resamples (default `500`, minimum `50`).
- `seed`: RNG seed for the bootstrap. `None` for non-deterministic.
Result methods:
- `endogenous()`: the endogenous LV under test.
- `predictors()`: all structural predecessors of the endogenous LV.
- `suspected()`: the subset that received a copula term.
- `coefficients()`: DataFrame with `predictor`, `gamma`, `boot_se`, `t`, `p_value`, and `cvm_p_nonnormal` (Cramér-von Mises p-value against a fitted normal; small means non-normality is supported and the test is admissible).
- `augmented_paths()`: Series of structural-path coefficients from the augmented OLS, i.e. the endogeneity-corrected estimates to compare against `Plspm.path_coefficients()`.
- `summary()`: adds a `decision` column to `coefficients()`: `"endogeneity detected"`, `"no endogeneity detected"`, `"copula not admissible (normal)"` when the predictor is too normal for the test to discriminate, or `"inconclusive"` if the bootstrap was singular.
- `n_boot()`: number of successful bootstrap iterations used.

The procedure requires the suspected predictor to be non-normal: under normality `Φ⁻¹(F_n(X)) ≈ X`, so the copula term collapses and the augmented regression cannot distinguish endogeneity. The Cramér-von Mises screen reports admissibility per predictor.
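The copula regressor `Φ⁻¹(F_n(X))` with `F_n(x) = rank(x) / (n+1)` can be computed with the standard library alone. The sketch below is an assumption-laden illustration, not the library's code: `copula_term` is a hypothetical name, and tied values simply receive consecutive ranks here, which may differ from how the engine handles ties.

```python
from statistics import NormalDist


def copula_term(x):
    """Return Φ⁻¹(rank(x) / (n + 1)) for each observation (ties ranked in order)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position  # 1-based rank of observation idx
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


scores = [2.0, 9.5, 3.1, 40.0, 5.0]  # made-up, right-skewed predictor scores
terms = copula_term(scores)
print(terms)
```

Dividing by `n + 1` rather than `n` keeps every argument strictly inside (0, 1), so the inverse normal CDF is always finite.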
openpls.ipma.IPMA
Importance-Performance Map Analysis for one endogenous target LV. The recommended construction is via `Plspm.ipma(target=...)`, but you can instantiate directly if needed.

    fit = Plspm(data, config, Scheme.CENTROID)
    ipma = fit.ipma(target="SAT", scale_min=1.0, scale_max=10.0)
    print(ipma.latent_variables())  # importance, performance per predecessor LV
    print(ipma.indicators())        # outer_weight, normalized_weight, performance per indicator

Arguments:

- `target`: name of the endogenous LV to analyze. Must have at least one incoming path.
- `scale_min`, `scale_max`: common scale bounds (e.g. `1` and `7` for a 7-point Likert). Both `None` means each indicator is rescaled from its observed min/max.
- `indicator_scales`: `{indicator: (min, max)}` overrides per indicator.
Result methods:
- `latent_variables()`: DataFrame indexed by LV with `importance` (standardized total effect on the target) and `performance` (mean of the 0-100-rescaled LV score).
- `indicators()`: DataFrame indexed by `(lv, indicator)` with `outer_weight`, `normalized_weight`, `performance`, `scale_min`, `scale_max`.
openpls.predict.PLSPredict
PLSpredict via k-fold cross-validation. Construct through `Plspm.predict(...)`. Implements the full Shmueli, Sarstedt, Hair, Cheah, Ting, Vaithilingam and Ringle (2019) panel: per-indicator RMSE, MAE, and MAPE for both PLS and an LM benchmark, in both out-of-sample (k-fold CV) and in-sample (single full-data fit) variants, plus Q²_predict against the naive train-mean baseline.

    pred = fit.predict(k=10, repeats=1, seed=42)
    print(pred.metrics())  # full per-indicator panel; see columns below
    print(pred.summary())  # "better" / "worse" / "tie" per indicator (PLS vs LM)

Arguments:

- `k`: number of folds (default `10`, must be at least 2 and not exceed the sample size).
- `repeats`: how many shuffle-and-refold rounds (default `1`).
- `seed`: RNG seed for fold shuffling. Pass `None` for non-deterministic.
Result methods:
- `metrics()`: per-indicator DataFrame indexed by `(lv, indicator)`. Columns:
  - Out-of-sample (k-fold): `rmse_pls`, `mae_pls`, `mape_pls`, `q2_predict`, `rmse_lm`, `mae_lm`, `mape_lm`.
  - In-sample (single full-data fit): `rmse_pls_in`, `mae_pls_in`, `mape_pls_in`, `rmse_lm_in`, `mae_lm_in`, `mape_lm_in`.
  - MAPE is the proportion `mean(|err / actual|)` (matching sklearn's convention; multiply by 100 for the percent form). Rows where the actual value is zero are excluded from MAPE only; the other metrics still see them.
- `summary()`: per-indicator Series of `"better"`, `"worse"`, or `"tie"` based on PLS vs LM out-of-sample RMSE. Aggregate to get the Shmueli et al. 2019 verdict ("high / medium / low / none predictive power").
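The MAPE convention described above (proportion form, zero-actual rows excluded from MAPE only) can be illustrated with a short standalone sketch. The helper names `mape` and `rmse` and the numbers are hypothetical; this is not the library's implementation.

```python
def mape(actual, predicted):
    """Proportion-form MAPE; rows with a zero actual value are skipped."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return sum(ratios) / len(ratios) if ratios else float("nan")


def rmse(actual, predicted):
    """Root mean squared error over every row, zero actuals included."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


actual = [4.0, 0.0, 5.0, 2.0]     # made-up indicator values
predicted = [3.0, 1.0, 5.0, 3.0]  # made-up predictions
mape_value = mape(actual, predicted)  # averages over 3 rows (the zero row is dropped)
rmse_value = rmse(actual, predicted)  # averages over all 4 rows
print(mape_value, rmse_value)
```

Multiplying `mape_value` by 100 gives the percent form.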
openpls.moderation.Moderation
Two-stage moderation (Henseler and Chin 2010): fit the base model, multiply standardized scores of predictor and moderator, then refit with the product as a single-indicator interaction LV pointing into `target`.

    from openpls.moderation import Moderation

    mod = Moderation(
        data,
        config,
        predictor="IMAG",
        moderator="EXPE",
        target="SAT",
    )
    print(mod.interaction_effect())  # estimate, std error, t, p>|t|
    print(mod.refit().path_coefficients())

Arguments:

- `data`, `config`: the same inputs you would pass to `Plspm`.
- `predictor`, `moderator`, `target`: LV names. Predictor and moderator must differ; target must be endogenous and cannot be either of the other two.
- `interaction_name`: defaults to `"{predictor}_x_{moderator}"`.
- `scheme`, `iterations`, `tolerance`, `missing_strategy`: passed through to both stages.
Methods:
- `base()`: the stage-1 `Plspm` fit, without the interaction.
- `refit()`: the stage-2 fit, with the interaction LV.
- `interaction_effect()`: Series with `estimate`, `std error`, `t`, `p>|t|` for `interaction -> target`. The OLS-derived t and p are convenience reporting; for inference, bootstrap the refit.
openpls.higher_order.HigherOrder
Disjoint two-stage higher-order construct (Sarstedt, Hair, Cheah, Becker and Ringle 2019; Hair, Hult, Ringle and Sarstedt 2022, A Primer on PLS-SEM, 3rd ed., Chapter 8). The current Plspm fit becomes stage 1; its first-order LV scores are appended as indicators of a new second-order construct, and a stage-2 Plspm is fit with the HOC in place of its first-order constituents. Construct through Plspm.higher_order(...).

    fit1 = Plspm(data, config, Scheme.CENTROID)

    stage2_structure = c.Structure()
    stage2_structure.add_path(["JOB_SAT"], ["INTENT_TO_STAY"])

    hoc = fit1.higher_order(
        name="JOB_SAT",
        first_order=["PAY_SAT", "WORK_SAT", "SUPERVISION_SAT"],
        mode=Mode.A,  # Type I (R-R): first-order A, HOC A
        structure=stage2_structure,
    )
    print(hoc.loadings())           # HOC measurement loadings (or weights, Mode B)
    print(hoc.path_coefficients())  # stage-2 structural paths, HOC included
    print(hoc.summary())            # per-first-order loading/weight + stage-1 R²

The four canonical HOC types are obtained by combining the existing first-order LV modes with the HOC mode:

- Type I (Reflective-Reflective): first-order Mode A, `mode=Mode.A`. The HOC is a common factor measured by reflective first-order constructs.
- Type II (Reflective-Formative): first-order Mode A, `mode=Mode.B`. The HOC is a composite formed by reflective first-order constructs.
- Type III (Formative-Reflective): first-order Mode B, `mode=Mode.A`. The HOC is a common factor measured by formative composites.
- Type IV (Formative-Formative): first-order Mode B, `mode=Mode.B`. The HOC is a composite formed by formative composites.
Arguments:
- `name`: name of the second-order construct. Must not clash with an existing LV or with any indicator column in the original data.
- `first_order`: list of first-order LV names from the base model to roll up into the HOC. At least two are required.
- `mode`: measurement mode of the HOC w.r.t. its first-order indicators (`Mode.A` or `Mode.B`).
- `structure`: a `Structure` for the stage-2 path model. Must contain `name`, and must not contain any of the `first_order` LVs.
- `iterations`, `tolerance`, `missing_strategy`: passed through to the stage-2 `Plspm` fit.
Methods:
- `name()`, `first_order()`, `hoc_mode()`: echo the configuration.
- `base()`: the stage-1 `Plspm` (the fit on which `higher_order` was called).
- `refit()`: the stage-2 `Plspm`, which contains the HOC and all the non-rolled-up LVs from the base model.
- `loadings()`: outer loadings (Mode A) or outer weights (Mode B) of the HOC on its first-order indicators, indexed by first-order LV name.
- `path_coefficients()`: stage-2 structural path coefficients (HOC included). Pass-through to `refit().path_coefficients()`.
- `r_squared()`: stage-2 R² per endogenous LV (HOC included if endogenous).
- `summary()`: per-first-order DataFrame with `first_order`, the HOC `loading` (Mode A) or `weight` (Mode B), and `stage1_r_squared` (the first-order LV's stage-1 R²; `0` for exogenous first-order LVs).
- `indicator_columns()`: mapping `{first_order_lv -> indicator_column_name}` used to carry the stage-1 scores into the stage-2 data.

The legacy `Config.add_higher_order` (repeated-indicators / embedded two-stage) remains available for backward compatibility, but disjoint two-stage is the modern recommended path because the first-order constructs are not simultaneously their own measurement and the HOC's predictors. Chained / nested HOCs work naturally: call `higher_order()` again on the previous `refit()`.
openpls.fimix.FIMIX
Finite Mixture PLS (Hahn et al. 2002) for latent-class segmentation. Construct through `Plspm.fimix(n_classes=K)`.

    fmx = fit.fimix(n_classes=3, n_restarts=5, seed=42)
    print(fmx.class_sizes())       # mixture proportions per class
    print(fmx.memberships())       # posterior class probabilities per case
    print(fmx.hard_assignments())  # argmax class label per case
    print(fmx.class_paths())       # per-class structural path coefficients
    print(fmx.fit_criteria())      # log_lik, n_params, AIC, AIC3, AIC4, BIC, CAIC, MDL5, EN

Arguments:

- `n_classes`: K, number of mixture components (>= 2).
- `max_iter`: maximum EM iterations per restart (default 500).
- `tolerance`: convergence threshold on the log-likelihood.
- `n_restarts`: how many random EM restarts to run; the best (highest log-likelihood) is kept (default 5).
- `seed`: RNG seed for restart initialization.
The fit_criteria() Series exposes the standard information criteria for choosing K. Lower is better for AIC family and BIC; the normalized entropy EN lives in [0, 1] and higher means clearer class separation.
openpls.mga.MGA
Multi-Group Analysis via Henseler permutation tests. Use `GroupSpec` to define each subset:

    from openpls.mga import MGA, GroupSpec

    mga = MGA(
        data,
        config,
        grouping_column="region",
        groups=[
            GroupSpec(name="west", values=["west"]),
            GroupSpec(name="east", values=["east"]),
        ],
        scheme=Scheme.CENTROID,
        iterations=5000,
        seed=42,
    )
    print(mga.group_estimates())  # per-group path coefficients
    print(mga.comparisons())      # pairwise differences and permutation p-values

Arguments:

- `data`, `config`: standard PLS-SEM inputs.
- `grouping_column`: a column in `data` whose values assign each row to at most one group.
- `groups`: list of `GroupSpec(name=, values=)` (categorical / list membership) or `GroupSpec(name=, range=(lo, hi))` (inclusive numeric interval; `None` means unbounded on that side). At least two groups required.
- `iterations`: permutation iterations per pair (default 5000).
- `seed`: RNG seed.
Methods:
- `group_estimates()`: long-format DataFrame with `group`, `n`, `source`, `target`, `estimate`.
- `comparisons()`: long-format pairwise differences with `groupA`, `groupB`, `source`, `target`, `estimateA`, `estimateB`, `difference`, `p_value`. Two-sided permutation p-values with Phipson-Smyth add-one smoothing.
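The Phipson-Smyth add-one smoothing mentioned for `comparisons()` amounts to `p = (b + 1) / (m + 1)`, where `b` counts permuted differences at least as extreme as the observed one and `m` is the number of permutations. The sketch below illustrates that formula with made-up numbers; `permutation_p_value` is a hypothetical helper, not the library's code.

```python
def permutation_p_value(observed_diff, permuted_diffs):
    """Two-sided permutation p-value with add-one smoothing: (b + 1) / (m + 1)."""
    b = sum(1 for d in permuted_diffs if abs(d) >= abs(observed_diff))
    return (b + 1) / (len(permuted_diffs) + 1)


observed = 0.30  # made-up observed path difference between two groups
permuted = [0.10, -0.40, 0.05, 0.20, -0.10, 0.35, 0.00, -0.20, 0.15]
p_value = permutation_p_value(observed, permuted)
print(p_value)
```

The smoothing guarantees that the p-value is never exactly zero: with `m` permutations the smallest attainable value is `1 / (m + 1)`.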
openpls.micom.MICOM
Measurement Invariance of Composite Models: the three-step Henseler, Ringle and Sarstedt (2016) procedure that must be run before interpreting MGA or moderation results. Reuses the same `GroupSpec` as `MGA` but is restricted to exactly two groups:

    from openpls.mga import GroupSpec
    from openpls.micom import MICOM

    micom = MICOM(
        data,
        config,
        grouping_column="region",
        group_a=GroupSpec(name="west", values=["west"]),
        group_b=GroupSpec(name="east", values=["east"]),
        scheme=Scheme.CENTROID,
        iterations=1000,
        seed=42,
    )
    print(micom.step2())    # compositional invariance per construct
    print(micom.step3())    # mean / variance equality per construct
    print(micom.summary())  # combined verdict: "full" / "partial" / "none"

Arguments:

- `data`, `config`: standard PLS-SEM inputs.
- `grouping_column`: column in `data` that distinguishes the two groups (need not be a model indicator).
- `group_a`, `group_b`: two `GroupSpec` instances. Must have distinct names and disjoint masks (`MICOM` raises if the groups overlap or if either is empty).
- `iterations`: permutation iterations (default 1000). Step 3 reuses the pooled-fit weights and skips PLS refits per iteration, so the cost is dominated by Step 2 (one PLS refit per group per iteration).
- `seed`: RNG seed; pass `None` for non-deterministic results.
Methods:
- `step2()`: per construct, columns `construct`, `c`, `p_value`, `compositional_invariance`. `p_value` is a one-sided lower-tail permutation probability; `compositional_invariance` is `True` when `p_value >= 0.05` (cannot reject `c = 1`).
- `step3()`: per construct, columns `construct`, `mean_diff`, `mean_p_value`, `mean_equal`, `log_var_ratio`, `var_p_value`, `var_equal`. Both p-values are two-sided permutation probabilities; pooled-fit weights are applied to standardized indicators to produce common-scale composite scores before computing means and variances.
- `summary()`: per construct, combines Steps 2 and 3 into a single `invariance` verdict: `"full"` (Step 2 and both Step 3 sub-tests pass), `"partial"` (Step 2 passes but mean or variance differs; MGA is still interpretable for path differences), `"none"` (Step 2 fails; composites are not comparable and MGA results would be uninterpretable).
- `group_sizes()`: dict of per-group observation counts (audit trail for Step 1 configural invariance).
For more than two groups, run MICOM pairwise.
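The way `summary()` is described as combining Steps 2 and 3 reduces to a small decision rule. The sketch below restates that rule as a standalone function; `micom_verdict` is a hypothetical name, and the boolean inputs stand in for the `compositional_invariance`, `mean_equal`, and `var_equal` columns.

```python
def micom_verdict(compositional_invariance, mean_equal, var_equal):
    """Combine the Step 2 and Step 3 booleans into 'full', 'partial', or 'none'."""
    if not compositional_invariance:
        return "none"     # Step 2 fails: composites are not comparable
    if mean_equal and var_equal:
        return "full"     # Step 2 and both Step 3 sub-tests pass
    return "partial"      # Step 2 passes, but mean or variance differs


verdicts = {
    "SAT": micom_verdict(True, True, True),
    "LOY": micom_verdict(True, False, True),
    "IMAG": micom_verdict(False, True, True),
}
print(verdicts)
```

Note that Step 3 results are irrelevant once Step 2 fails, which is why the function checks compositional invariance first.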
openpls.f_squared.FSquared
Cohen's f² effect size for the structural model (Cohen 1988; Hair, Hult, Ringle and Sarstedt 2022, A Primer on PLS-SEM, 3rd ed.). For every directed edge `predictor -> endogenous`, refits the endogenous LV's OLS with the predictor removed and reports the change in R² normalised by the residual variance:

    f² = (R²_full − R²_reduced) / (1 − R²_full)

Construct through `Plspm.f_squared()` (lazy, cached).

    fit = Plspm(data, config)
    f2 = fit.f_squared()
    print(f2.table())   # long format with effect-size labels
    print(f2.matrix())  # square matrix mirroring the path matrix

Methods:

- `table()`: DataFrame indexed by `"predictor -> endogenous"`. Columns `from`, `to`, `r_squared_full`, `r_squared_reduced`, `f_squared`, `effect_size`.
- `matrix()`: square DataFrame with the same shape as the path matrix. Rows are targets, columns are sources. Cells outside the structural model are `NaN`.
Effect-size labels follow the conventional Cohen / Hair thresholds:
- `none` for `f² < 0.02`
- `small` for `0.02 <= f² < 0.15`
- `medium` for `0.15 <= f² < 0.35`
- `large` for `f² >= 0.35`
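The f² formula and the threshold labels above can be checked by hand with a short sketch. The function names and the R² values below are hypothetical and only illustrate the arithmetic; the library computes these quantities from its own refits.

```python
def f_squared(r2_full, r2_reduced):
    """Cohen's f²: change in R² normalised by the unexplained variance."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def effect_size_label(f2):
    """Map f² to the conventional none / small / medium / large label."""
    if f2 < 0.02:
        return "none"
    if f2 < 0.15:
        return "small"
    if f2 < 0.35:
        return "medium"
    return "large"


# Made-up example: dropping one predictor lowers R² from 0.60 to 0.50.
f2_value = f_squared(0.60, 0.50)  # roughly 0.10 / 0.40 = 0.25
label = effect_size_label(f2_value)
print(f2_value, label)
```

Because the denominator is `1 − R²_full`, the same drop in R² yields a larger f² when the full model already explains most of the variance.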
openpls.fornell_larcker.FornellLarcker
Fornell-Larcker discriminant-validity criterion (Fornell and Larcker 1981). Construct through `Plspm.fornell_larcker()` (lazy, cached).

    fit = Plspm(data, config)
    fl = fit.fornell_larcker()
    print(fl.matrix())   # sqrt(AVE) on diagonal, LV correlations off-diagonal
    print(fl.summary())  # per-LV passes verdict

Methods:

- `matrix()`: square DataFrame. Diagonal entries are `sqrt(AVE_lv)` for reflective (Mode A) constructs and `NaN` for formative (Mode B) and single-indicator constructs (AVE is undefined there). Off-diagonal entries are inter-construct correlations from the standardized latent-variable scores.
- `ave()`: Series of Average Variance Extracted per LV, with `NaN` for non-Mode-A / single-indicator constructs.
- `summary()`: DataFrame indexed by LV with `sqrt_ave`, `max_abs_corr`, `passes` (boolean, `NA` when AVE is undefined), and `note`. `passes` is `True` iff `sqrt(AVE_lv)` exceeds every absolute off-diagonal entry in the LV's row.
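The `passes` rule can be restated as a one-line comparison. The sketch below is an illustration with made-up numbers; `fornell_larcker_passes` is a hypothetical helper, and returning `None` stands in for the `NA` that the summary reports when AVE is undefined.

```python
import math


def fornell_larcker_passes(ave, other_correlations):
    """True iff sqrt(AVE) exceeds every absolute correlation with the other LVs."""
    if ave is None or math.isnan(ave):
        return None  # AVE undefined (Mode B or single-indicator construct)
    return math.sqrt(ave) > max(abs(r) for r in other_correlations)


# Made-up values: AVE = 0.64 gives sqrt(AVE) = 0.8.
passes_strong = fornell_larcker_passes(0.64, [0.5, -0.7])
passes_weak = fornell_larcker_passes(0.36, [0.5, -0.7])
passes_undefined = fornell_larcker_passes(math.nan, [0.5, -0.7])
print(passes_strong, passes_weak, passes_undefined)
```

Taking absolute values matters: a strongly negative inter-construct correlation violates the criterion just as a strongly positive one does.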
The modern recommendation (Henseler, Ringle and Sarstedt 2015) is to prefer HTMT (Plspm.htmt() or Plspm.htmt2()) for discriminant validity, but reviewers frequently still request the Fornell-Larcker table as well.
openpls.report.Report
Publication-ready summary report (Hair, Hult, Ringle and Sarstedt 2022, A Primer on PLS-SEM, 3rd ed.). Bundles the engine's individual diagnostics into the panels you need for the standard PLS-SEM research report, so the whole reporting layer can be exported with a single call. Construct through `Plspm.report(...)`. Pure orchestration: every value comes from an existing lazy-cached method on `Plspm`.

    fit = Plspm(data, config)
    rep = fit.report()

    rep.reliability()            # Cronbach alpha, rho_A, rho_C, AVE per LV
    rep.discriminant_validity()  # HTMT (+HTMT2), Fornell-Larcker matrices and summary
    rep.paths()                  # Structural paths with std error, t, p, f², effect size
    rep.construct_summary()      # type / mvs / R² / adj R² / BIC per LV
    rep.fit_indices()            # SRMR, d_ULS, GoF
    rep.collinearity()           # Outer + inner VIF
    rep.to_dict()                # Bundle every section for export (e.g. JSON)

Arguments:

- `include_rho_a` (default `True`): include the Dijkstra-Henseler `rho_A` column in `reliability()`. Triggers `Plspm.plsc()` internally; falls back to `NaN` if PLSc cannot run.
- `include_htmt2` (default `True`): include the HTMT2 matrix and pair list (Roemer, Schuberth and Henseler 2021) in `discriminant_validity()`.
Methods:
- `reliability()`: DataFrame indexed by LV with columns `mode`, `mvs`, `cronbach_alpha`, `rho_a` (when `include_rho_a`), `rho_c`, `ave`. Mode B (formative) and single-indicator LVs receive `NaN` for the metrics that are undefined for them.
- `discriminant_validity()`: dict with `htmt` (matrix), `htmt_pairs` (long form), `fornell_larcker` (matrix), `fornell_larcker_summary` (per-LV passes verdict), and (when `include_htmt2`) `htmt2` and `htmt2_pairs`.
- `paths()`: DataFrame indexed by `"predictor -> endogenous"` with columns `from`, `to`, `estimate`, `std_error`, `t`, `p_value`, `f_squared`, `effect_size`.
- `construct_summary()`: DataFrame per LV with `type` (Exogenous / Endogenous), `mvs`, `r_squared`, `r_squared_adj`, `bic`, `block_communality`, `mean_redundancy`.
- `fit_indices()`: Series with `srmr`, `d_uls`, `goodness_of_fit` (NaN if all constructs are single-item).
- `collinearity()`: dict with `items` (per-indicator outer VIF; may be `None` if no block has at least two indicators) and `inner` (dict of per-endogenous-LV VIF tables).
- `to_dict()`: bundles every section above into a single dictionary, ready for export.
Specific indirect effects
Mediation analysis via chain products of path coefficients (Zhao, Lynch and Chen 2010; Nitzl, Roldan and Cepeda 2016). For every chain source -> M1 -> ... -> target the specific indirect effect is the product of the path coefficients along the chain; the total indirect effect from source to target is the sum of all such products.

    fit = Plspm(data, config, bootstrap=True, bootstrap_iterations=500, processes=4)

    # Point estimates of every mediation chain from IMAG to LOY
    fit.specific_indirect_effects("IMAG", "LOY")

    # Single chain with explicit mediators
    fit.specific_indirect_effects("IMAG", "LOY", through=["EXPE", "SAT"])

    # Bootstrap percentile CIs (95% by default)
    fit.bootstrap().specific_indirect_effects("IMAG", "LOY", through=["EXPE", "SAT"])

Arguments:

- `source` / `target`: latent-variable names. Must differ.
- `through`: explicit chain `[M1, M2, ...]` of intermediate LVs. Each consecutive pair (including `source -> M1` and the last `Mk -> target`) must be a direct edge in the structural model. `None` (default) auto-enumerates every simple directed chain from `source` to `target` of length two or more.
- `alpha` (bootstrap only): two-sided level for the percentile CI (default `0.05`, giving a 95% interval).
Returns:
- Point estimate (`Plspm.specific_indirect_effects`): DataFrame indexed by chain label (e.g. `"IMAG -> EXPE -> SAT -> LOY"`) with columns `from`, `to`, `via` (tuple of intermediates), `estimate`.
- Bootstrap (`Bootstrap.specific_indirect_effects`): the same identifying columns plus `original`, `mean`, `std.error`, `perc.lower`, `perc.upper`, `t stat.`. The CI is a non-parametric percentile interval from the per-iteration distribution of the chain product.
Notes:
- The structural model is assumed acyclic; chain enumeration is a depth-first search with per-branch cycle guards.
- The sum of `estimate` over all chains from `source` to `target` equals the `indirect` column of `effects().loc["source -> target"]`.
- Aligns the engine with `seminr::specific_effect_significance()`.
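The chain enumeration and chain-product logic described in this section can be sketched without the library. The code below is a minimal illustration under stated assumptions: the edge dictionary and its coefficients are made up, and `enumerate_indirect_effects` is a hypothetical function, not the engine's implementation.

```python
from math import prod


def enumerate_indirect_effects(paths, source, target):
    """Depth-first enumeration of chains source -> ... -> target with a mediator.

    `paths` maps (from_lv, to_lv) to the path coefficient of that direct edge.
    """
    successors = {}
    for src, dst in paths:
        successors.setdefault(src, []).append(dst)

    results = {}

    def walk(node, chain):
        for nxt in successors.get(node, []):
            if nxt in chain:
                continue  # per-branch cycle guard
            new_chain = chain + [nxt]
            if nxt == target:
                if len(new_chain) > 2:  # skip the direct edge source -> target
                    edges = zip(new_chain, new_chain[1:])
                    results[" -> ".join(new_chain)] = prod(paths[e] for e in edges)
            else:
                walk(nxt, new_chain)

    walk(source, [source])
    return results


# Made-up structural model and coefficients.
example_paths = {
    ("IMAG", "EXPE"): 0.5, ("IMAG", "SAT"): 0.2, ("IMAG", "LOY"): 0.1,
    ("EXPE", "SAT"): 0.4, ("SAT", "LOY"): 0.5,
}
chains = enumerate_indirect_effects(example_paths, "IMAG", "LOY")
total_indirect = sum(chains.values())
print(chains, total_indirect)
```

Here the two chains each contribute about 0.1, so the total indirect effect is about 0.2, while the direct `IMAG -> LOY` edge is deliberately left out of the sum.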
openpls.long_bootstrap.LongBootstrap
Single-process bootstrap with progress callbacks, sign-flipping, BCa percentile CIs, and a configurable success-rate floor. Use this for long-running workloads (Cloud Run, queued jobs) where you want progress streamed and partial failures tolerated.

    from openpls.long_bootstrap import LongBootstrap

    def report(done, total):
        print(f"{done}/{total}")

    boot = LongBootstrap(
        data,
        config,
        scheme=Scheme.CENTROID,
        iterations=5000,
        seed=42,
        alpha=0.05,
        on_progress=report,
        progress_every=100,
        min_success_ratio=0.1,
    )
    print(boot.paths())          # per-path original, boot_mean, se, t, p_value, ci_lower, ci_upper, valid
    print(boot.loadings())       # per-indicator loading stats
    print(boot.weights())        # per-indicator weight stats
    print(boot.total_effects())  # full total-effects table

Arguments:

- `data`, `config`, `scheme`: standard PLS-SEM inputs.
- `iterations`: bootstrap resamples (default 5000).
- `seed`: RNG seed (default 42).
- `alpha`: significance level for the BCa CI (default 0.05 for 95 percent CIs).
- `on_progress`: callback `(done, total)`. Called every `progress_every` iterations and at least every 5 seconds of wall time.
- `progress_every`: callback frequency in iterations (default 100).
- `min_success_ratio`: floor on completed-vs-attempted iterations. If too many resamples fail, the constructor raises `RuntimeError`.
Versioning
The installed version is exposed at runtime as openpls.__version__. Pin a specific version with pip install openpls-engine==X.Y.Z for reproducible analyses.