The Problem

What We're Investigating

Large vision-language models process images and text through dozens of internal layers before producing a response. When a fine-tuning adapter is applied, it modifies how those layers interact. But the specifics of that modification — where it concentrates, how it behaves under different conditions, and when it breaks down — are poorly understood across the industry.

We've been running a series of controlled experiments on vision-language models to characterize adapter behavior at the internal layer level. The work focuses on a single question with broad implications:

Is adapter behavior consistent, or does it depend on how you test it?

Results

What We've Found

Adapter behavior is regime-dependent. The same adapter on the same model responds through different internal pathways depending on the type of input variation it encounters.

We've characterized three distinct regimes so far, each producing measurably different internal geometry at the same critical layer in the model. Findings from one regime do not predict behavior in another.
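One way to make "different internal geometry at the same layer" concrete is to compare the direction of the adapter-induced activation shift across regimes. The sketch below is illustrative only: the text above does not say which geometric measure was used, so the mean-shift vector, the cosine comparison, the regime names, and the toy activations are all assumptions.

```python
import math
from itertools import combinations

def mean_shift(base, adapted):
    """Average per-dimension change between base and adapted activations at one layer."""
    n, dim = len(base), len(base[0])
    return [sum(adapted[i][d] - base[i][d] for i in range(n)) / n for d in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def regime_similarity(shifts):
    """Pairwise cosine similarity between per-regime shift vectors."""
    return {(a, b): cosine(shifts[a], shifts[b])
            for a, b in combinations(sorted(shifts), 2)}

# Toy activations (hypothetical): two inputs, two hidden dimensions.
base = [[0.0, 0.0], [1.0, 1.0]]
shifts = {
    "regime_a": mean_shift(base, [[1.0, 0.0], [2.0, 1.0]]),  # shift along dim 0
    "regime_b": mean_shift(base, [[0.0, 1.0], [1.0, 2.0]]),  # shift along dim 1
}
sims = regime_similarity(shifts)
```

A similarity near zero between two regimes would suggest the adapter acts along unrelated directions in each, which is one possible reading of why measurements in one regime fail to predict another.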

We've also found that the adapter's output stability, its tendency to produce consistent answers, is selectively breakable. Under most natural prompt variations, the adapter holds steady. Under specific structural conditions, its outputs escape that stability through identifiable routes. The escapes aren't random; they cluster into distinct mechanism classes with different internal signatures.
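Output stability of this kind can be approximated as an agreement rate across prompt variants. The sketch below is a minimal illustration, not the evaluation framework described on this page: the function names, the lowercase-and-strip normalization, and the sample answers are hypothetical.

```python
from collections import Counter

def consistency_rate(answers):
    """Fraction of answers that match the most common (normalized) answer."""
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)

def escapes(answers, reference):
    """Indices of prompt variants whose answer departs from the reference answer."""
    ref = reference.strip().lower()
    return [i for i, a in enumerate(answers) if a.strip().lower() != ref]

# Hypothetical answers from one adapter across four prompt variants.
answers = ["Cat", "cat", "cat ", "dog"]
rate = consistency_rate(answers)
escaped = escapes(answers, "cat")
```

Recording which variants escape, rather than only the rate, is what would let escapes be grouped by the condition that triggered them.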

Regime Dependence

The same adapter shows structurally different internal behavior depending on the type of pressure applied. Measurements from one regime do not predict behavior in another.

Selective Breakability

Adapter output stability holds under most prompt variations but breaks under specific conditions, and the breaks follow identifiable, structurally predictable routes.

Pathway Divergence

Two models can produce the same answer through different internal routes — one stable, one fragile. Output testing alone cannot distinguish them.
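A rough way to picture this: two models give the same answer, but their hidden states move by very different amounts when the input is perturbed. The sketch below assumes cosine distance as the drift measure and an arbitrary gap threshold of 0.1; neither comes from the experiments described here, and the model names and vectors are toy values.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; treated as 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def internal_drift(clean, perturbed):
    """Mean cosine distance between matched clean and perturbed hidden states."""
    return sum(cosine_distance(c, p) for c, p in zip(clean, perturbed)) / len(clean)

def same_output_different_pathway(ans_a, ans_b, drift_a, drift_b, gap=0.1):
    """True when answers agree but internal drift differs by more than `gap` (hypothetical threshold)."""
    return ans_a == ans_b and abs(drift_a - drift_b) > gap

# Toy hidden states (hypothetical): model_a barely moves, model_b rotates fully.
drift_a = internal_drift([[1.0, 0.0]], [[1.0, 0.0]])
drift_b = internal_drift([[1.0, 0.0]], [[0.0, 1.0]])
flagged = same_output_different_pathway("cat", "cat", drift_a, drift_b)
```

An output-only test would score both toy models identically; only the internal measurement separates them.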

Implications

Why This Matters

These findings have practical implications for anyone deploying fine-tuned AI models in production environments.

Robustness testing in one regime doesn't guarantee stability in another. A model that passes prompt-variation testing may still be fragile under answer-conditioning pressure.

Adapter failure modes are structurally predictable. The conditions that break adapter stability follow patterns that can be characterized in advance.

Internal pathway analysis reveals what output testing misses. Surface-level evaluation is necessary but insufficient for safety-critical deployments.

The Bigger Picture

Adapter Portability

This mechanism work is part of a larger research effort on adapter portability: the problem of transferring fine-tuned behavior across model architectures without retraining. Early results have demonstrated that the functional profile of an adapter can transfer across architectures, while the geometric implementation cannot. Closing the gap between what transfers (function) and what does not (geometry) is the central open problem.

The long-term goal is to make fine-tuning investments portable. When a better model comes out, your training investment shouldn't be locked to the old one.

39 Empirical findings
20 Consecutive experiments
19 Days from bare metal
1 GPU workstation

About This Work

Research Infrastructure

All research is conducted on a single high-end desktop (HEDT) GPU workstation. No cloud compute, no cluster. The experimental infrastructure, including the fine-tuning pipeline, evaluation framework, and mechanism analysis tools, was built from bare metal in 19 days.

The research program has completed 20 consecutive experiments under a strict run-to-completion methodology, producing a cumulative findings document tracking 39 empirical observations with full provenance.

A white paper detailing the methodology and findings is in preparation.

HouseRuybe LLC builds computer vision and AI systems. Our primary product, BESorted, applies ML-driven perception to mail and package sorting operations. RuybeAI is our research division focused on vision-language model fine-tuning, adapter portability, and model mechanism characterization.