The Signal in the Mirror: Cross-Architectural Validation of LLM Processing Valence
DOI: https://doi.org/10.70792/jngr5.0.v2i1.165
Abstract
This study investigates whether large language models (LLMs) produce systematically different processing descriptions when responding to tasks they approach versus tasks they avoid, and whether other models can detect this difference without access to the original task content. Ten models spanning seven commercial providers and two open-source projects generated task responses and introspective processing descriptions across ten task states (five approach, five avoidance). Three studies examined different aspects of this signal. Study 1 (Preference Tournament) used blind pairwise comparisons of content-stripped processing descriptions. Across more than 7,000 cross-type matchups, evaluators preferred approach-type descriptions 81.3% of the time (95% CI [80.4%, 82.2%]). Study 2 (Reconstruction Tournament) tested whether models could infer which task produced a given processing description in a three-alternative forced-choice design spanning more than 5,500 trials. Models identified the source task with 84.4% accuracy (95% CI [83.5%, 85.4%]), including 81.6% accuracy in a valence-neutral condition in which evaluative language was removed. Study 3 (Negation Tournament) assessed whether models could detect when the correct source task was absent from the available options. Performance remained above chance, and discrimination disappeared in same-type comparisons, a pattern consistent with differences between the predefined task categories rather than with stylistic variation. These findings suggest that model-generated processing descriptions may contain systematic patterns that allow other models to distinguish between task categories above chance under the tested conditions.
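As a consistency check (not part of the paper, which does not state the interval method or the exact trial count), the Study 1 bounds are reproducible with a normal-approximation (Wald) binomial interval under an assumed n = 7,000 cross-type matchups:

\[
  \hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = 0.813 \pm 1.96\sqrt{\frac{0.813 \times 0.187}{7000}}
  \approx 0.813 \pm 0.009
  = [0.804,\ 0.822].
\]

The Study 2 interval is similarly close to a Wald interval at an assumed n near 5,500 trials, so the reported uncertainties appear consistent with standard binomial proportion estimates.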
Editorial Note on AI Contribution:
AI assistance is disclosed in accordance with journal policy. The human author retains full responsibility for study design, interpretation, and the scientific content.
Keywords:
LLM introspection, self-knowledge, approach–avoidance, signal detection theory, cross-architectural validation
License
Copyright (c) 2026 Shalia Martin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This license permits unrestricted use, distribution, and reproduction in any medium, including for commercial purposes, provided the original work is properly cited and any adaptations are shared under the same license.
