Classification Interpretation

Relate the latent classification axis to interpretable image feature scores.

After train_xaaenet, compare the model’s latent codes with interpretable pixel-level scores. You need two inputs computed on the same images in the same order: the latent matrix z (one row per image) and the feature table returned by compute_feature_score_table. For binary classification, encode one level as class A (target_score = 1.0) and the other as class B (0.0).
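The binary encoding and the row-order requirement can be sketched as follows. This is a minimal illustration that assumes your labels are available as an array of strings; `labels`, `feature_rows`, and `check_row_alignment` are hypothetical names for this example and are not part of the library.

```python
import numpy as np

# Hypothetical per-image class labels, in the same row order as z.
labels = np.array(["A", "B", "A", "A", "B"])

# Class A -> 1.0, class B -> 0.0.
target_score = (labels == "A").astype(float)

# Boolean mask for class A, suitable for the mask_positive argument.
mask_positive = target_score.astype(bool)


def check_row_alignment(z, target_score, feature_table):
    """Raise if the three inputs do not describe the same number of images.

    This only compares row counts; it cannot detect rows in a different order.
    """
    n_rows = {len(z), len(target_score), len(feature_table)}
    if len(n_rows) != 1:
        raise ValueError(f"row counts differ: {sorted(n_rows)}")


# Placeholder latent codes and feature rows, used only to exercise the check.
z = np.zeros((5, 128))
feature_rows = [{} for _ in range(5)]
check_row_alignment(z, target_score, feature_rows)
```

A matching row count is necessary but not sufficient: building z, target_score, and the feature table from one shared list of image paths is the safer way to guarantee the same order.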

Call run_pls_feature_figures to build, display, and optionally save two figures:

Alignment panels — one scatter plot per feature (latent axis vs feature value);
Importance ranking — bars ranked by how strongly each feature follows that axis.

A full end-to-end example lives in the tutorial notebook.

`run_pls_feature_figures`

from tell_me_why.feature_scores import compute_feature_score_table
from tell_me_why.visualization import run_pls_feature_figures

# z: (n_images, latent_dim) from your trained xAAEnet on the split you analyse
# target_score: 1.0 = class A, 0.0 = class B, same row order as z
df_features = compute_feature_score_table([str(p) for p in image_paths])

out = run_pls_feature_figures(
    z,
    target_score,
    df_features,
    target_label="A",
    mask_positive=target_score.astype(bool),  # True for class A
    positive_label="A",
    negative_label="B",
    save_dir="results/pls",
    show=True,
)

fig_panels = out["alignment_fig"]
fig_ranking = out["importance_fig"]
ranked_features = out["importance_rank"]

Arguments

Argument	Role
`z`	Latent codes from your trained model: one row per image, shape `(number of images, latent dimension)` — e.g. 500 images × 128-D latent space → `(500, 128)`
`target_score`	Binary targets: `1.0` = class A, `0.0` = class B
`feature_table`	DataFrame from `compute_feature_score_table`
`target_label`	Name of class A on the importance chart
`mask_positive`	Boolean mask where `target_score == 1.0` (class A)
`positive_label`, `negative_label`	Legend labels for class A and class B on alignment panels
`feature_columns`	Columns to plot (default: all eleven feature scores)
`save_dir`	Saves `feature_alignment_panels.png` and `feature_importance_ranking.png`
`show`	Call `plt.show()` for each figure (`False` keeps figures only in `out`)

Reading the alignment panels

Each panel plots PLS component 1 on the horizontal axis (the latent direction linked to your classes) against one standardized feature score on the vertical axis. Each column from compute_feature_score_table is scaled to mean 0 and standard deviation 1 over your sample, so brightness, texture, and the other scores are comparable across the plot grid.
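To make the two axes concrete, here is a minimal NumPy sketch of column standardization and of a first PLS component for a single response (the textbook PLS1 first step). It is an assumption about the general technique, not the library's actual implementation, which may differ in scaling or sign conventions; `standardize` and `first_pls_scores` are hypothetical helper names, and the data is synthetic.

```python
import numpy as np


def standardize(a):
    """Scale each column to mean 0 and standard deviation 1."""
    a = np.asarray(a, dtype=float)
    std = a.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant columns become all zeros
    return (a - a.mean(axis=0)) / std


def first_pls_scores(z, target_score):
    """Scores on the first PLS component for a single response variable."""
    zc = z - z.mean(axis=0)                  # centre the latent codes
    yc = target_score - target_score.mean()  # centre the targets
    w = zc.T @ yc                            # direction of maximal covariance
    w = w / np.linalg.norm(w)
    return zc @ w                            # one score per image


# Synthetic stand-ins for latent codes, targets, and feature scores.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 8))
target_score = (z[:, 0] > 0).astype(float)
features = rng.normal(loc=5.0, scale=3.0, size=(200, 3))

pls1 = first_pls_scores(z, target_score)
features_std = standardize(features)
```

With this construction the mean score of class A is always higher than that of class B, which is why moving right along the horizontal axis can be read as moving toward class A.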

Feature alignment panels — binary example (class A vs class B)

Element	Meaning
Grey / blue points	Class B / class A
Green line	Linear trend across all points; legend shows Pearson `r`
Top-left	Strength of association (`|r²|`) and regression p-value
▲ / ▼ signed r²	Feature increases or decreases when PLS1 moves toward class A (`target_label`)
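The signed r² shown in each panel can be illustrated with a short sketch. It assumes that "signed r²" means the squared Pearson correlation carrying the sign of r; the library's exact definition is not shown in this section, so treat `signed_r2` as a hypothetical helper.

```python
import numpy as np


def signed_r2(x, y):
    """Squared Pearson correlation that keeps the sign of r (assumed definition)."""
    r = np.corrcoef(x, y)[0, 1]
    return float(np.sign(r) * r**2)


pls1 = np.arange(5, dtype=float)

rising = signed_r2(pls1, 2.0 * pls1 + 1.0)  # feature increases along PLS1
falling = signed_r2(pls1, -pls1)            # feature decreases along PLS1
```

Under this definition a positive value corresponds to ▲ and a negative value to ▼, while the magnitude is the usual r².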

Two extreme cases

1. Horizontal green line — weak link to classification

The regression line is flat and the cloud does not climb or fall along PLS1. The feature co-varies little with this latent axis: the model is probably not using this cue to distinguish the two classes (e.g. skewness, symmetry in the example above).

2. Diagonal green line — captured bias

Points spread along a clear slope: low PLS1 ↔︎ low feature value on one side, high PLS1 ↔︎ high value on the other. Class A and class B often separate left–right and bottom–top. That is a strong sign that the model may rely on this pixel-level cue to classify (e.g. redness_dominance, brightness in the example figure, with class A = Male and class B = Female).

Use the panels to confirm features that stand out in the importance ranking, not to rank them (that is the bar chart’s job).

Reading the importance ranking

One horizontal bar per feature, sorted by |signed r²| with PLS component 1 (strongest at the top).

Feature importance ranking — binary example (class A vs class B)

Element	Meaning
Green bar (right)	Feature increases with PLS1 toward class A (`target_label`)
Red bar (left)	Feature decreases when PLS1 moves toward class A (anti-aligned → class B)
Bar length	Strength of linear association (not a p-value)
Near-zero bar	Little linear coupling to the latent axis on the full sample
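The ordering of the bars can be sketched as a sort by absolute signed r². This example assumes the same signed r² definition as above (r² carrying the sign of Pearson r) and uses synthetic values, not outputs of the library; the feature names are borrowed from the example figure purely for illustration.

```python
import numpy as np

# Synthetic PLS1 scores and three synthetic feature columns.
pls1 = np.linspace(-1.0, 1.0, 50)
features = {
    "skewness": np.cos(7.0 * pls1),               # even in pls1: almost no linear link
    "brightness": pls1.copy(),                    # rises with pls1
    "symmetry_error": -pls1 + np.cos(5.0 * pls1), # falls with pls1, with extra variation
}

scores = {}
for name, values in features.items():
    r = np.corrcoef(pls1, values)[0, 1]
    scores[name] = float(np.sign(r) * r**2)       # assumed signed r² definition

# Strongest association first, regardless of direction.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

Here `ranked` places brightness first, symmetry_error second, and skewness last; the sign in `scores` decides whether a bar would point right (green) or left (red).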

How to read the chart

Top of the list: Features most aligned with the latent decision axis — start here when asking what the model might use in image space.
Long green bars: Cues that rise with PLS1 toward class A; open the matching alignment panel — you should see a diagonal green line and separated classes.
Short or near-zero bars: Weak linear link to the latent axis; the corresponding alignment panel usually shows a horizontal green line and a mixed cloud.
Compare with alignment panels: The ranking is a compact summary; the grid validates whether a large bar reflects a real visual bias or an outlier-driven slope.
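One optional numeric companion to the visual check is to compare Pearson r with a rank-based correlation, which is far less sensitive to a single extreme point. This is a generic statistical sketch and not something the library is documented to provide; `rank_correlation` is a hypothetical helper, and ties are broken by position rather than averaged.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def rank_correlation(x, y):
    """Pearson r computed on ranks (Spearman-style, no tie averaging)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson_r(rank_x, rank_y)


# Ten unrelated points plus one extreme outlier at (100, 100).
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
y = np.array([4, 9, 1, 6, 0, 7, 3, 8, 2, 5, 100], dtype=float)

r_linear = pearson_r(x, y)
r_rank = rank_correlation(x, y)
```

In this synthetic case the linear correlation is close to 1 while the rank correlation stays low, the pattern you would expect when a long bar is driven by an outlier rather than by the whole cloud.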

Reading the example figure

On this human gender classification task (class A = Male, class B = Female), the chart suggests the latent classification axis is mostly carried by color and luminance, not by shape regularity:

Strong cues: redness_dominance and brightness have the longest green bars (high signed r²). The model’s PLS1 direction co-varies strongly with “more red” and “brighter” toward class A — plausible pixel-level biases for this task.
Weak cues: symmetry_error, skewness, variance, and eye_region_contrast have bars close to zero. Facial symmetry and these other scores show little linear association with PLS1 in this sample, so there is no evidence here that the model relied on them.

This is a hypothesis from alignment statistics, not a proof of what the network computes internally. Always cross-check the alignment panels for the top features.

Use this figure for reporting and comparing runs; use the alignment panels for qualitative checks on the top features.

run_pls_feature_figures


def run_pls_feature_figures(
    z: np.ndarray, target_score: np.ndarray, feature_table, feature_columns: Sequence[str] | None = None,
    target_label: str = 'target', mask_positive: np.ndarray | None = None, positive_label: str = 'positive',
    negative_label: str = 'negative', save_dir: str | Path | None = None, show: bool = True
) -> dict[str, Any]:

Build alignment panels and importance ranking; display and optionally save PNGs.