Guide

Core concepts

This page is a quick orientation for users who are new to PLS-SEM or unsure how the library maps onto the textbook formalism. If you already know PLS-SEM well, skip to the API reference.

PLS-SEM in 60 seconds

Structural Equation Modeling estimates relationships among latent variables (LVs): constructs you cannot measure directly, like “Customer Satisfaction” or “Brand Trust.” Each LV is reflected by (or formed from) one or more manifest variables (MVs), the indicators you actually observe in your data, typically survey items on a Likert scale.

A PLS-SEM model has two parts:

  • The outer model (also called the measurement model) describes how each LV relates to its indicators.
  • The inner model (the structural model) describes how the LVs relate to each other.

Partial Least Squares estimates both jointly by iteratively reweighting indicators until the LV scores stabilize. Once the algorithm has converged, the model gives you path coefficients (how strongly LV A affects LV B), R² for each endogenous LV, outer loadings and weights, and a long list of quality criteria.

In openpls-engine, the inner model is specified through Structure.add_path([source], [target]), and the outer model via Config.add_lv(...) or the prefix-based shortcut Config.add_lv_with_columns_named(...).
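One way to picture the inner model is as a square 0/1 matrix with one row and one column per LV, where a 1 in row "target", column "source" marks a path. The sketch below builds such a matrix from (sources, targets) pairs shaped like the arguments to Structure.add_path. The helper build_path_matrix and the LV names are hypothetical, and this is an illustration of the idea rather than the engine's internal representation.

```python
def build_path_matrix(lvs, paths):
    """Return a dict-of-dicts 0/1 matrix: matrix[target][source] == 1 marks a path."""
    matrix = {target: {source: 0 for source in lvs} for target in lvs}
    for sources, targets in paths:
        for source in sources:
            for target in targets:
                if source == target:
                    raise ValueError(f"self-path on {source!r} is not allowed")
                matrix[target][source] = 1
    return matrix


# Hypothetical three-construct model
lvs = ["Quality", "Satisfaction", "Loyalty"]
paths = [
    (["Quality"], ["Satisfaction"]),
    (["Quality", "Satisfaction"], ["Loyalty"]),
]
path_matrix = build_path_matrix(lvs, paths)

# An LV is endogenous if at least one path points into it
endogenous = [lv for lv in lvs if any(path_matrix[lv].values())]
```

Here Quality is the only exogenous LV, so R² would be reported for Satisfaction and Loyalty only.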

Modes A and B

Each LV is measured either reflectively or formatively, and you tell the engine which by setting its Mode.

  • Mode A (reflective): the LV causes its indicators. All indicators of a reflective LV should correlate strongly with each other because they all reflect the same underlying construct. Use this for attitudinal and perception constructs like Satisfaction or Trust, where each item is a manifestation of one shared latent feeling.
  • Mode B (formative): the indicators cause the LV. The indicators form the construct, and they need not correlate (they capture different facets). Use this for composite constructs like a “Marketing Mix” index built from price, promotion, place, and product, or for inherently formative indices like socioeconomic status.

The choice is theoretical, not statistical. Get it wrong and your loadings, weights, and bootstrap CIs all stop meaning what they normally mean. Hair et al. (2022) walk through how to decide.

Inner-weighting schemes

The inner-weighting scheme is how PLS combines neighboring LV scores during the inner update. openpls-engine ships five:

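  • Scheme.CENTROID (the upstream default). Inner weights are the sign (+1 or -1) of the correlation between neighbors. Simple, but it ignores the magnitude of the relationship, and a sign can flip between iterations when a correlation is close to zero.
  • Scheme.FACTORIAL. Inner weights are the correlations between neighbors (equal to the covariances once LV scores are standardized). Magnitude-sensitive, so weakly related neighbors contribute little rather than a full +1 or -1.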
  • Scheme.PATH. Asymmetric: OLS regression coefficients for predecessors, plain correlations for successors. It is the only one of the three classic schemes that uses the direction of the paths, and the most popular scheme in published PLS-SEM work because it usually behaves well on real survey data.
  • Scheme.NEWTON (new in OpenPLS). Joint quasi-Newton (BFGS) optimization of all neighbor weights for each LV under one least-squares objective. The most advanced of the five; recommended when convergence stability matters or when the asymmetric PATH scheme feels wrong for your model.
  • Scheme.PCA (new in OpenPLS, Lohmöller 1989, Section 2.4.2). Inner weights are the components of the first principal direction of the neighbor-score matrix. Treats neighbor weights as a joint multivariate direction rather than as pairwise quantities.
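To make the three classic schemes concrete, here is a minimal plain-Python sketch that computes the inner weights for a single LV whose two neighbors are both predecessors. The LV scores are made-up numbers, the helper names are hypothetical, and the code does not call openpls-engine. It works with Pearson correlations, which equal covariances for standardized scores, and uses the closed-form solution for a standardized regression on two predictors.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sign(value):
    """Return -1, 0, or +1."""
    return (value > 0) - (value < 0)


# Hypothetical LV scores for five respondents
scores = {
    "Quality": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Satisfaction": [2.0, 1.0, 3.0, 5.0, 4.0],
    "Loyalty": [2.0, 1.0, 4.0, 3.0, 5.0],
}

# Inner weights for Loyalty; Quality and Satisfaction both point into it
r_q = pearson(scores["Loyalty"], scores["Quality"])
r_s = pearson(scores["Loyalty"], scores["Satisfaction"])
r_qs = pearson(scores["Quality"], scores["Satisfaction"])

# Centroid: only the sign of each correlation
centroid = {"Quality": sign(r_q), "Satisfaction": sign(r_s)}

# Factorial: the correlation itself
factorial = {"Quality": r_q, "Satisfaction": r_s}

# Path: standardized OLS coefficients of Loyalty on its two predecessors
denom = 1.0 - r_qs ** 2
path = {
    "Quality": (r_q - r_s * r_qs) / denom,
    "Satisfaction": (r_s - r_q * r_qs) / denom,
}
```

With these numbers the centroid weights are both 1, the factorial weights are 0.8 and 0.7, and the path weights are about 0.67 and 0.17. The path scheme gives Satisfaction a much smaller weight because most of its correlation with Loyalty is already accounted for by Quality.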

For most published work, PATH is the conservative choice. NEWTON is the one to try when you want a theoretically cleaner, symmetric handling of predecessors and successors.

Scaling and metric vs nonmetric data

The Config constructor has a scaled flag that decides whether manifest variables are standardized (mean 0, variance 1) before fitting. For typical Likert-style data:

  • Set scaled=True (default) for conventional standardized PLS-SEM, with results comparable to SmartPLS and seminr output.
  • Set scaled=False only when you specifically want unstandardized weights and have a reason (rare).
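As a rough illustration of what standardizing an indicator means, the sketch below rescales one made-up Likert item to mean 0 and variance 1 in plain Python. The function name is hypothetical, and the use of the population standard deviation is an assumption; the engine may divide by the sample standard deviation instead.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to mean 0 and (population) variance 1."""
    m = mean(column)
    sd = pstdev(column)  # assumption: population SD; a sample SD is equally plausible
    if sd == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(value - m) / sd for value in column]


# Hypothetical responses to one 5-point Likert item
likert_item = [1, 2, 2, 3, 4, 5, 5]
z = standardize(likert_item)
```

After this step every indicator is on the same scale, which is what makes weights and loadings comparable across items.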

For nonmetric data (ordinal, nominal, or mixed scales), set default_scale on the Config to one of Scale.NUM, Scale.ORD, Scale.NOM, or Scale.RAW, or attach per-MV scales through the MV(name, scale=...) constructor:

  • Scale.NUM for numeric variables that should be linearly transformed.
  • Scale.RAW for numeric variables that should not be transformed at all.
  • Scale.ORD for ordinal variables (monotonic transformation).
  • Scale.NOM for nominal variables (non-monotonic transformation).

If you specify a scale for any MV, you must specify one for all (or set a default_scale).

Missing values

PLS-SEM with missing data is awkward; published recommendations vary. The engine supports two strategies, selected on the Plspm constructor:

  • missing_strategy="casewise" (default, matches upstream plspm-python). Any row with at least one NaN in any modeled indicator is dropped before fitting. Safe and conservative; loses sample size when missingness is widespread.
  • missing_strategy="mean" (matches the “Mean replacement” option in commercial PLS-SEM software). Each NaN is replaced with the column mean of its indicator. Keeps all rows, but shrinks each imputed indicator's variance, and weakens its correlations with other indicators, roughly in proportion to its missingness rate.
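The trade-off between the two strategies can be seen in a small plain-Python sketch. The functions casewise and mean_replace are hypothetical stand-ins written for this example, not the engine's implementation, and the data are made up.

```python
from math import isnan, nan
from statistics import mean, pvariance


def casewise(rows):
    """Drop every row that has a NaN in any column."""
    return [row for row in rows if not any(isnan(v) for v in row)]


def mean_replace(rows):
    """Replace each NaN with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if not isnan(row[j])) for j in range(n_cols)
    ]
    return [
        [col_means[j] if isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Two indicators, five respondents; the second indicator has two gaps
rows = [
    [1.0, 2.0],
    [2.0, nan],
    [3.0, 4.0],
    [4.0, nan],
    [5.0, 6.0],
]

kept = casewise(rows)        # 3 rows survive
filled = mean_replace(rows)  # all 5 rows survive, gaps set to 4.0

# Variance of the second indicator before and after mean replacement
observed_var = pvariance([row[1] for row in kept])
imputed_var = pvariance([row[1] for row in filled])
```

Casewise deletion keeps 3 of 5 rows. Mean replacement keeps all 5, but the variance of the second indicator drops from about 2.67 to 1.6, a factor of 0.6 that matches the 60% of values actually observed.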

For more nuanced treatments (multiple imputation, full-information ML), preprocess the data before calling Plspm.

What openpls-engine does not do (yet)

  • Out-of-the-box plotting. The engine returns tidy DataFrames; visualization is up to you (or the OpenPLS web app).
  • Confirmatory composite analysis (CCA). A separate methodology (Henseler et al. 2014) with its own pipeline; not in scope for the current release.
  • Necessary condition analysis (NCA). Often combined with PLS-SEM but a different statistical procedure; not shipped here.
  • Alternative segmentation methods beyond FIMIX-PLS (e.g., REBUS-PLS, POS-PLS). FIMIX covers the finite-mixture case; the response-based variants are not implemented.

Next: head to the API reference for the full surface, or jump into worked examples.