Quickstart

This quickstart walks through a complete PLS-SEM fit on the ECSI (European Customer Satisfaction Index) customer-satisfaction model: 6 latent variables (LVs), 27 indicators, and 250 observations. The dataset ships with the test suite as tests/data/satisfaction.csv.

If you do not have it locally, you can grab it from the repo: tests/data/satisfaction.csv.

1. Load the data

The CSV has one row per respondent. Indicators are named by construct: imag1, imag2, … for the Image LV; expe1, … for Customer Expectations; and so on. The first column is a row index, so it is read as the DataFrame index (index_col=0) rather than as data.

import pandas as pd

satisfaction = pd.read_csv("tests/data/satisfaction.csv", index_col=0)
print(satisfaction.shape)
# (250, 28)

2. Define the structural model

The structural (inner) model says which LV affects which. In ECSI:

IMAG (Image) feeds into Expectations, Satisfaction, and Loyalty.
EXPE (Expectations) feeds into Quality, Value, and Satisfaction.
QUAL (Quality) feeds into Value and Satisfaction.
VAL (Value) feeds into Satisfaction.
SAT (Satisfaction) feeds into Loyalty.

Structure builds this from add_path calls, then Config consumes the resulting path matrix.

import openpls.config as c
from openpls.mode import Mode

structure = c.Structure()
# add_path(sources, targets): here, one source LV feeding into each listed target LV
structure.add_path(["IMAG"], ["EXPE", "SAT", "LOY"])
structure.add_path(["EXPE"], ["QUAL", "VAL", "SAT"])
structure.add_path(["QUAL"], ["VAL", "SAT"])
structure.add_path(["VAL"], ["SAT"])
structure.add_path(["SAT"], ["LOY"])
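To make the idea of a path matrix concrete, the sketch below builds one by hand with pandas for the five relations listed above. It assumes a layout where rows are targets, columns are sources, and 1 marks a path. That orientation is an assumption for illustration and may differ from what Structure.path() actually returns. The helper build_path_matrix is hypothetical and not part of openpls.

```python
import pandas as pd

def build_path_matrix(paths):
    """Return a 0/1 DataFrame where entry [target, source] == 1 marks source -> target."""
    # Collect LV names in order of first appearance.
    lvs = []
    for source, targets in paths:
        for name in [source, *targets]:
            if name not in lvs:
                lvs.append(name)
    matrix = pd.DataFrame(0, index=lvs, columns=lvs)
    for source, targets in paths:
        for target in targets:
            matrix.loc[target, source] = 1
    return matrix

# The ECSI relations from the list above.
ecsi_paths = [
    ("IMAG", ["EXPE", "SAT", "LOY"]),
    ("EXPE", ["QUAL", "VAL", "SAT"]),
    ("QUAL", ["VAL", "SAT"]),
    ("VAL", ["SAT"]),
    ("SAT", ["LOY"]),
]
path_matrix = build_path_matrix(ecsi_paths)
print(path_matrix)
print(path_matrix.sum(axis=1))  # number of predictors pointing into each LV
```

Row sums give the number of predictors per LV, which is a quick way to confirm that an exogenous LV such as IMAG has none.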

3. Attach indicators to LVs

Each LV needs its measurement model. With the ECSI naming convention (lowercase prefix matching the LV name), add_lv_with_columns_named is the shortcut: it picks up every column starting with that prefix. All six LVs here are reflective (Mode A).

config = c.Config(structure.path(), scaled=False)
for lv in ["IMAG", "EXPE", "QUAL", "VAL", "SAT", "LOY"]:
    config.add_lv_with_columns_named(lv, Mode.A, satisfaction, lv.lower())

If your indicators do not share a prefix, use Config.add_lv(lv_name, Mode.A, MV("col1"), MV("col2"), ...) instead. See the API reference for details.
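Prefix matching is convenient but worth checking before you rely on it. The sketch below assumes a plain "starts with" rule, as described above; the exact rule used by add_lv_with_columns_named may differ. The helper columns_with_prefix and the column satisfied_overall are hypothetical, made up to show how a short prefix can pick up an unintended column.

```python
import pandas as pd

def columns_with_prefix(frame, prefix):
    """List the columns of `frame` whose names start with `prefix`."""
    return [name for name in frame.columns if name.startswith(prefix)]

# Empty stand-in frame: only the column names matter here.
toy = pd.DataFrame(columns=["imag1", "imag2", "sat1", "sat2", "satisfied_overall"])

imag_cols = columns_with_prefix(toy, "imag")
sat_cols = columns_with_prefix(toy, "sat")
print(imag_cols)
print(sat_cols)  # also contains the unintended 'satisfied_overall'
```

If a prefix matches more columns than expected, listing the indicators explicitly is the safer route.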

4. Fit the model

from openpls import Plspm
from openpls.scheme import Scheme

result = Plspm(satisfaction, config, Scheme.CENTROID)

Plspm runs the full PLS algorithm and computes the standard metrics eagerly. Q squared, IPMA, PLSpredict, moderation, and FIMIX are computed lazily on demand (next sections).
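Scheme.CENTROID selects the centroid inner weighting scheme. In its textbook form, the inner weight between two connected LVs is just the sign of the correlation between their current scores, and unconnected pairs get zero. The sketch below illustrates that single step on synthetic scores. It is not openpls's implementation, and centroid_inner_weights is a hypothetical name.

```python
import numpy as np

def centroid_inner_weights(scores, adjacency):
    """Sign of the correlation between LV scores, kept only for connected pairs."""
    corr = np.corrcoef(scores, rowvar=False)
    return np.sign(corr) * adjacency

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.5 * rng.normal(size=200)    # positively related to x
z = -x + 0.5 * rng.normal(size=200)   # negatively related to x
scores = np.column_stack([x, y, z])

# Symmetric adjacency: x-y and x-z are connected, y-z is not.
adjacency = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]])
inner_weights = centroid_inner_weights(scores, adjacency)
print(inner_weights)
```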

5. Inspect the inner model

print(result.inner_summary())

      type        r_squared  r_squared_adj  block_communality  mean_redundancy       ave
IMAG  Exogenous    0.000000       0.000000           0.582287         0.000000  0.582287
EXPE  Endogenous   0.335194       0.332514           0.563023         0.188704  0.563023
QUAL  Endogenous   0.719173       0.718041           0.660628         0.475327  0.660628
VAL   Endogenous   0.547778       0.544133           0.652035         0.357241  0.652035
SAT   Endogenous   0.706505       0.701696           0.756834         0.534452  0.756834
LOY   Endogenous   0.461894       0.457543           0.638674         0.295005  0.638674
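Two columns of this table can be sanity-checked by hand. The sketch below applies the textbook adjusted R squared formula and the textbook "communality times R squared" redundancy to the EXPE row. It assumes n = 250 (from the shape printed in step 1) and one predictor for EXPE (IMAG). Whether openpls computes these columns in exactly this way is an assumption; the table values agree with the formulas to roughly three decimals.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Textbook adjusted R squared for a regression with an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# EXPE row from the table above; n_obs=250 and n_predictors=1 are assumptions.
expe_adj = adjusted_r_squared(0.335194, n_obs=250, n_predictors=1)
print(round(expe_adj, 4))

# Redundancy as block communality times R squared (textbook definition).
expe_redundancy = 0.563023 * 0.335194
print(round(expe_redundancy, 4))
```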

For path coefficients (one row per endogenous LV, one column per LV that points into it):

print(result.path_coefficients())

For the long-format inner model with t and p values (these are OLS estimates; for bootstrap CIs use Plspm(..., bootstrap=True) or LongBootstrap):

print(result.inner_model())
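As a rough illustration of what "OLS estimates" means here, the sketch below regresses one standardized score on two standardized predictors with NumPy. The scores are synthetic stand-ins, not output from openpls, and the code is not the library's internal routine.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.6 * x1 + 0.3 * x2 + 0.2 * rng.normal(size=n)

def standardize(v):
    """Center to mean 0 and scale to unit sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

# With standardized variables no intercept is needed; the coefficients
# are standardized path coefficients.
X = np.column_stack([standardize(x1), standardize(x2)])
coefs, *_ = np.linalg.lstsq(X, standardize(y), rcond=None)
print(coefs)
```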

6. Inspect the outer model

print(result.outer_model())

       weight  loading  communality  redundancy
imag1  0.2426   0.7167       0.5137      0.0000
imag2  0.1827   0.5797       0.3360      0.0000
imag3  0.3034   0.7710       0.5945      0.0000
imag4  0.2587   0.7401       0.5478      0.0000
imag5  0.2596   0.7596       0.5770      0.0000
...
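For reflective indicators, communality is conventionally the squared loading. The sketch below checks that relation against the five imag rows printed above; the numbers are copied from the table, and small differences come from rounding to four decimals.

```python
# Loadings and communalities copied from the outer model table above.
loadings = {"imag1": 0.7167, "imag2": 0.5797, "imag3": 0.7710,
            "imag4": 0.7401, "imag5": 0.7596}
reported = {"imag1": 0.5137, "imag2": 0.3360, "imag3": 0.5945,
            "imag4": 0.5478, "imag5": 0.5770}

# Gap between the squared loading and the reported communality.
differences = {name: abs(loading ** 2 - reported[name])
               for name, loading in loadings.items()}
for name, diff in differences.items():
    print(name, round(loadings[name] ** 2, 4), reported[name], f"diff={diff:.5f}")
```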

For the cross-loadings matrix (indicator vs every LV):

print(result.crossloadings())

7. Discriminant validity (HTMT)

The Heterotrait-Monotrait ratio (HTMT) is now the most widely recommended criterion for discriminant validity in PLS-SEM. Pairs with HTMT below 0.85 (or 0.90 for conceptually similar constructs) are considered distinct.

htmt = result.htmt()
print(htmt.matrix())     # square matrix
print(htmt.pairs())      # long-format pair list
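For intuition about what the ratio measures, here is a minimal sketch for a single pair of constructs. It divides the mean correlation between indicators of different constructs by the geometric mean of the average within-construct correlations. It uses plain Pearson correlations on synthetic data, requires at least two indicators per block, and is not the openpls implementation; htmt_pair is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def htmt_pair(data, block_i, block_j):
    """HTMT for two indicator blocks, using plain Pearson correlations."""
    corr = data.corr()
    # Heterotrait: correlations between indicators of different blocks.
    hetero = corr.loc[block_i, block_j].to_numpy().mean()

    def mean_within(block):
        # Monotrait: off-diagonal correlations inside one block.
        sub = corr.loc[block, block].to_numpy()
        return sub[np.triu_indices_from(sub, k=1)].mean()

    return hetero / np.sqrt(mean_within(block_i) * mean_within(block_j))

# Synthetic data: two latent factors whose true correlation is 0.5.
rng = np.random.default_rng(1)
n = 2000
f1 = rng.normal(size=n)
f2 = 0.5 * f1 + np.sqrt(0.75) * rng.normal(size=n)
toy = pd.DataFrame({
    **{f"a{k}": f1 + 0.6 * rng.normal(size=n) for k in range(1, 4)},
    **{f"b{k}": f2 + 0.6 * rng.normal(size=n) for k in range(1, 4)},
})
htmt_ab = htmt_pair(toy, ["a1", "a2", "a3"], ["b1", "b2", "b3"])
print(round(htmt_ab, 2))
```

With equally reliable indicators, the ratio approximates the correlation between the underlying constructs, which is why values close to 1 signal a lack of discriminant validity.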

8. Model fit (SRMR, d_ULS)

fit = result.model_fit()
print(fit.srmr())   # Standardized Root Mean Square Residual
print(fit.d_uls())  # unweighted least-squares discrepancy

The conventional threshold for “good” model-data alignment in PLS-SEM is an SRMR below 0.08.
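As a sketch of what SRMR summarizes, the function below takes an observed and a model-implied correlation matrix and returns the root mean square of their differences over the lower triangle, diagonal included. That averaging convention is one common choice and an assumption here; openpls may use a different one. The two matrices are made-up illustration values.

```python
import numpy as np

def srmr(observed_corr, implied_corr):
    """Root mean square residual over the lower triangle (diagonal included)."""
    residuals = np.asarray(observed_corr, dtype=float) - np.asarray(implied_corr, dtype=float)
    lower = residuals[np.tril_indices_from(residuals)]
    return float(np.sqrt(np.mean(lower ** 2)))

# Made-up 3x3 correlation matrices for illustration.
observed = np.array([[1.00, 0.50, 0.40],
                     [0.50, 1.00, 0.30],
                     [0.40, 0.30, 1.00]])
implied = np.array([[1.00, 0.45, 0.42],
                    [0.45, 1.00, 0.36],
                    [0.42, 0.36, 1.00]])
srmr_value = srmr(observed, implied)
print(round(srmr_value, 4))
```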

9. Predictive relevance (Stone-Geisser Q squared)

print(result.q_squared())

q_squared() returns one Q squared value per endogenous LV, computed via blindfolding. A Q squared above zero means the model has predictive relevance for that LV.
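The final step of a Stone-Geisser calculation is a simple ratio: one minus the sum of squared prediction errors over the sum of squared deviations from the mean. The sketch below shows only that ratio on made-up numbers. It does not perform the blindfolding omission rounds, and q_squared_from_predictions is a hypothetical helper, not the openpls method.

```python
import numpy as np

def q_squared_from_predictions(observed, predicted):
    """1 - SSE / SSO, with SSO taken around the mean of the observed values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)
    sso = np.sum((observed - observed.mean()) ** 2)
    return float(1 - sse / sso)

# Made-up observed values with close predictions, and a mean-only baseline.
q2_good = q_squared_from_predictions([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
q2_mean = q_squared_from_predictions([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5])
print(q2_good, q2_mean)
```

Predicting the mean for every point gives exactly zero, which is why zero is the cutoff for predictive relevance.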

What’s next

Run a worked example end to end, including IPMA, moderation, or FIMIX.
Read the API reference for the full surface.
If you are new to PLS-SEM, Core concepts is the place to start.