Getting started

Quickstart

This walks through a complete PLS-SEM fit on the ECSI customer-satisfaction model: 6 latent variables (LVs), 27 indicators, 250 observations. The dataset ships with the test suite as tests/data/satisfaction.csv.

If you do not have it locally, you can grab it from the repo: tests/data/satisfaction.csv.

1. Load the data

The CSV has one row per respondent. Indicators are named by construct: imag1, imag2, … for the Image LV; expe1, … for Customer Expectations; etc. The first column is a row index.

import pandas as pd
satisfaction = pd.read_csv("tests/data/satisfaction.csv", index_col=0)
print(satisfaction.shape)
# (250, 28)

2. Define the structural model

The structural (inner) model says which LV affects which. In ECSI:

  • IMAG (Image) feeds into Expectations, Satisfaction, and Loyalty.
  • EXPE (Expectations) feeds into Quality, Value, and Satisfaction.
  • QUAL (Quality) feeds into Value and Satisfaction.
  • VAL (Value) feeds into Satisfaction.
  • SAT (Satisfaction) feeds into Loyalty.

Structure builds this from add_path calls, then Config consumes the resulting path matrix.

import openpls.config as c
from openpls.mode import Mode
structure = c.Structure()
structure.add_path(["IMAG"], ["EXPE", "SAT", "LOY"])
structure.add_path(["EXPE"], ["QUAL", "VAL", "SAT"])
structure.add_path(["QUAL"], ["VAL", "SAT"])
structure.add_path(["VAL"], ["SAT"])
structure.add_path(["SAT"], ["LOY"])
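To make "path matrix" concrete, here is a minimal sketch in plain Python of one common representation: a square 0/1 table with one row per target LV and one column per source LV. It is an illustration only. The names paths, order, path_matrix, and is_lower_triangular are made up for this example, and the layout that structure.path() actually returns may differ.

```python
# Illustrative only: not openpls internals.
paths = {
    "IMAG": ["EXPE", "SAT", "LOY"],
    "EXPE": ["QUAL", "VAL", "SAT"],
    "QUAL": ["VAL", "SAT"],
    "VAL": ["SAT"],
    "SAT": ["LOY"],
}
order = ["IMAG", "EXPE", "QUAL", "VAL", "SAT", "LOY"]


def path_matrix(paths, order):
    """Return {target: {source: 0 or 1}}: rows are targets, columns are sources."""
    matrix = {target: {source: 0 for source in order} for target in order}
    for source, targets in paths.items():
        for target in targets:
            matrix[target][source] = 1
    return matrix


def is_lower_triangular(matrix, order):
    """True when every path runs from an earlier LV to a later one in `order`."""
    position = {lv: i for i, lv in enumerate(order)}
    return all(
        position[source] < position[target]
        for target, row in matrix.items()
        for source, flag in row.items()
        if flag
    )


matrix = path_matrix(paths, order)
for target in order:
    print(f"{target:<5}", [matrix[target][source] for source in order])
print("lower triangular:", is_lower_triangular(matrix, order))
```

A lower-triangular matrix means the LVs can be ordered so that no path points backwards, which is another way of saying the structural model has no cycles.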

3. Attach indicators to LVs

Each LV needs its measurement model. With the ECSI naming convention (lowercase prefix matching the LV name), add_lv_with_columns_named is the shortcut: it picks up every column starting with that prefix. All six LVs here are reflective (Mode A).

config = c.Config(structure.path(), scaled=False)
for lv in ["IMAG", "EXPE", "QUAL", "VAL", "SAT", "LOY"]:
    config.add_lv_with_columns_named(lv, Mode.A, satisfaction, lv.lower())
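Prefix matching is convenient, but it also picks up any unrelated column that happens to share the prefix. The sketch below shows the idea with plain string matching. It assumes simple case-sensitive startswith semantics, which may not be exactly what add_lv_with_columns_named does, and both the helper columns_with_prefix and the column list are hypothetical.

```python
def columns_with_prefix(columns, prefix):
    """Return the column names that start with `prefix`, in their original order."""
    return [name for name in columns if name.startswith(prefix)]


# Hypothetical column names; "value_index" is not an ECSI indicator.
columns = ["imag1", "imag2", "val1", "val2", "value_index", "sat1"]

print(columns_with_prefix(columns, "imag"))
print(columns_with_prefix(columns, "val"))  # also catches "value_index"
```

Running the same check on list(satisfaction.columns) before building the config is a cheap way to confirm that each prefix selects only the indicators you expect.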

If your indicators do not share a prefix, use Config.add_lv(lv_name, Mode.A, MV("col1"), MV("col2"), ...) instead. See the API reference for details.

4. Fit the model

from openpls import Plspm
from openpls.scheme import Scheme
result = Plspm(satisfaction, config, Scheme.CENTROID)

Plspm runs the full PLS algorithm and computes the standard metrics eagerly. Q squared, IPMA, PLSpredict, moderation, and FIMIX are computed lazily on demand (next sections).
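For intuition about what Scheme.CENTROID selects: in the centroid scheme, the inner weight between two connected LVs is just the sign of the correlation between their current scores, and unconnected LVs get a weight of zero. The sketch below illustrates that rule in plain Python. It is not openpls code; the function name, the LV names A, B, C, and the correlation values are all made up.

```python
def sign(value):
    """Return 1, -1, or 0 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def centroid_inner_weights(correlations, neighbours):
    """Inner weight = sign of the score correlation for connected LVs, else 0."""
    return {
        lv: {
            other: sign(r) if other in neighbours.get(lv, set()) else 0
            for other, r in row.items()
        }
        for lv, row in correlations.items()
    }


# Hypothetical correlations between the current LV scores.
correlations = {
    "A": {"A": 1.0, "B": 0.58, "C": -0.41},
    "B": {"A": 0.58, "B": 1.0, "C": 0.22},
    "C": {"A": -0.41, "B": 0.22, "C": 1.0},
}
# Two LVs are neighbours when a path links them in either direction.
neighbours = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

weights = centroid_inner_weights(correlations, neighbours)
for lv, row in weights.items():
    print(lv, row)
```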

5. Inspect the inner model

print(result.inner_summary())
            type  r_squared  r_squared_adj  block_communality  mean_redundancy       ave
IMAG   Exogenous   0.000000       0.000000           0.582287         0.000000  0.582287
EXPE  Endogenous   0.335194       0.332514           0.563023         0.188704  0.563023
QUAL  Endogenous   0.719173       0.718041           0.660628         0.475327  0.660628
VAL   Endogenous   0.547778       0.544133           0.652035         0.357241  0.652035
SAT   Endogenous   0.706505       0.701696           0.756834         0.534452  0.756834
LOY   Endogenous   0.461894       0.457543           0.638674         0.295005  0.638674
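As a sanity check on the r_squared_adj column, the sketch below applies the textbook adjusted R squared formula, 1 - (1 - R²)(n - 1)/(n - k - 1), to the r_squared values in the table above. It assumes n = 250 observations and takes k as the number of LVs pointing into each endogenous LV in the structural model; whether openpls uses exactly this formula is an assumption. Under those assumptions the results agree with the table to within 0.0001.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Textbook adjusted R squared for a regression with an intercept."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 250  # assumption: all 250 rows enter the fit

# (r_squared from the table above, number of LVs pointing into this LV)
endogenous = {
    "EXPE": (0.335194, 1),
    "QUAL": (0.719173, 1),
    "VAL": (0.547778, 2),
    "SAT": (0.706505, 4),
    "LOY": (0.461894, 2),
}

adjusted = {
    lv: adjusted_r_squared(r2, N_OBS, k) for lv, (r2, k) in endogenous.items()
}
for lv, value in adjusted.items():
    print(f"{lv}: {value:.4f}")
```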

For path coefficients (one row per endogenous LV, one column per LV that points into it):

print(result.path_coefficients())

For the long-format inner model with t and p values (these are OLS estimates; for bootstrap CIs use Plspm(..., bootstrap=True) or LongBootstrap):

print(result.inner_model())
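If it helps to see what a bootstrap confidence interval involves, here is a generic percentile bootstrap in plain Python. It is a sketch of the technique, not the openpls implementation: it resamples rows with replacement and recomputes a Pearson correlation, which equals the path coefficient only in the special case of one standardized predictor. The data, function names, and parameters are all made up for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def percentile_bootstrap_ci(x, y, n_resamples=500, alpha=0.05, seed=0):
    """Approximate (1 - alpha) percentile interval for the correlation of x and y."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        rows = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        estimates.append(pearson([x[i] for i in rows], [y[i] for i in rows]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic scores: y follows x plus noise.
data_rng = random.Random(1)
x = [float(i) for i in range(40)]
y = [value + data_rng.gauss(0, 5) for value in x]

estimate = pearson(x, y)
lower, upper = percentile_bootstrap_ci(x, y)
print(f"estimate={estimate:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```

Unlike the OLS t and p values, an interval built this way does not rely on normally distributed errors, which is the usual reason to prefer bootstrap inference in PLS-SEM.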

6. Inspect the outer model

print(result.outer_model())
       weight  loading  communality  redundancy
imag1  0.2426   0.7167       0.5137      0.0000
imag2  0.1827   0.5797       0.3360      0.0000
imag3  0.3034   0.7710       0.5945      0.0000
imag4  0.2587   0.7401       0.5478      0.0000
imag5  0.2596   0.7596       0.5770      0.0000
...
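The communality column can be read as the squared loading, which is the standard definition in PLS-SEM. The short check below recomputes it from the five rows shown above; the small differences come from the loadings being rounded to four decimals in the printout. The variable names are made up for this example.

```python
# (loading, communality) copied from the rows printed above.
outer_rows = {
    "imag1": (0.7167, 0.5137),
    "imag2": (0.5797, 0.3360),
    "imag3": (0.7710, 0.5945),
    "imag4": (0.7401, 0.5478),
    "imag5": (0.7596, 0.5770),
}

differences = {}
for indicator, (loading, communality) in outer_rows.items():
    squared = loading ** 2
    differences[indicator] = abs(squared - communality)
    print(f"{indicator}: loading^2={squared:.4f}  reported={communality:.4f}")
```

Redundancy is zero for every row here because IMAG is exogenous: nothing points into it, so its R squared is zero.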

For the cross-loadings matrix (indicator vs every LV):

print(result.crossloadings())

7. Discriminant validity (HTMT)

The heterotrait-monotrait ratio (HTMT) is a widely recommended criterion for discriminant validity in PLS-SEM. A pair of constructs with HTMT below 0.85 (or below 0.90 when the constructs are conceptually similar) is considered distinct.

htmt = result.htmt()
print(htmt.matrix()) # square matrix
print(htmt.pairs()) # long-format pair list
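For reference, HTMT for two constructs is the mean correlation between indicators of different constructs, divided by the geometric mean of the average within-construct indicator correlations. The sketch below computes it from a small, made-up correlation matrix. It is an illustration of the definition under those assumptions, not the openpls code path, and some implementations use absolute correlations instead.

```python
from itertools import combinations, product


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def htmt(corr, block_a, block_b):
    """HTMT for two indicator blocks; each block needs at least two indicators."""
    hetero = mean(corr[a][b] for a, b in product(block_a, block_b))
    mono_a = mean(corr[i][j] for i, j in combinations(block_a, 2))
    mono_b = mean(corr[i][j] for i, j in combinations(block_b, 2))
    return hetero / (mono_a * mono_b) ** 0.5


# Hypothetical indicator correlations for two constructs, A and B.
names = ["a1", "a2", "b1", "b2"]
pairs = {
    ("a1", "a2"): 0.64,  # within A
    ("b1", "b2"): 0.81,  # within B
    ("a1", "b1"): 0.36, ("a1", "b2"): 0.36,  # between A and B
    ("a2", "b1"): 0.36, ("a2", "b2"): 0.36,
}
corr = {n: {m: 1.0 if n == m else 0.0 for m in names} for n in names}
for (i, j), r in pairs.items():
    corr[i][j] = corr[j][i] = r

value = htmt(corr, ["a1", "a2"], ["b1", "b2"])
print(f"HTMT(A, B) = {value:.3f}")
```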

8. Model fit (SRMR, d_ULS)

fit = result.model_fit()
print(fit.srmr()) # Standardized Root Mean Square Residual
print(fit.d_uls()) # unweighted least-squares discrepancy

An SRMR below 0.08 is conventionally taken to indicate “good” agreement between the model-implied and observed correlation matrices in PLS-SEM.
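To show what the number measures, here is a minimal SRMR sketch in plain Python: the root mean square of the differences between observed and model-implied correlations, taken over the lower triangle including the diagonal. Both matrices are made up, and conventions vary slightly between packages (for example, whether the diagonal is counted), so treat this as an illustration rather than the exact computation behind fit.srmr().

```python
import math


def srmr(observed, implied):
    """Root mean squared residual over the lower triangle (diagonal included)."""
    p = len(observed)
    squared_residuals = [
        (observed[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)
    ]
    return math.sqrt(sum(squared_residuals) / len(squared_residuals))


# Hypothetical 3x3 correlation matrices.
observed = [
    [1.00, 0.50, 0.40],
    [0.50, 1.00, 0.30],
    [0.40, 0.30, 1.00],
]
implied = [
    [1.00, 0.45, 0.42],
    [0.45, 1.00, 0.36],
    [0.42, 0.36, 1.00],
]

value = srmr(observed, implied)
print(f"SRMR = {value:.4f}")
```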

9. Predictive relevance (Stone-Geisser Q squared)

print(result.q_squared())

q_squared() returns one Q squared value per endogenous LV, computed by blindfolding. A value above zero indicates that the model has predictive relevance for that LV.
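The statistic itself is 1 - SSE/SSO, where SSE is the sum of squared errors of the blindfolded predictions and SSO is the sum of squared errors from predicting the mean instead. The sketch below shows only that final ratio, on made-up standardized values; it does not simulate the omission pattern that blindfolding uses to produce the predictions, and it is not the openpls implementation.

```python
def q_squared(observed, predicted):
    """Stone-Geisser Q squared, assuming `observed` is centered on zero."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sso = sum(o ** 2 for o in observed)  # error of always predicting the mean (0)
    return 1.0 - sse / sso


# Hypothetical standardized observations and two sets of predictions.
observed = [1.0, -1.0, 0.5, -0.5]
useful = [0.5, -0.5, 0.5, -0.5]
misleading = [-1.0, 1.0, -0.5, 0.5]

q2_useful = q_squared(observed, useful)
q2_misleading = q_squared(observed, misleading)
print(f"useful predictions:     Q2 = {q2_useful:.2f}")
print(f"misleading predictions: Q2 = {q2_misleading:.2f}")
```

Predictions that do worse than the mean push Q squared below zero, which is why zero is the cut-off for predictive relevance.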

What’s next