Appendix A.1

Further Reading

Everything the book points to has its roots here.

This is the first time I name names.

From start to finish, the book never names a single company; it talks about direction, not specific companies. The appendix can name them.

Everything listed below is public material, and every claim in the book has its roots here. Don't worry if you can't parse all of it; most of it is written for technical readers. You'll at least know where to look.


1. The Architecture Repo for This Book

Theory: shihchengwei-lab/separation-and-audit-alignment

https://github.com/shihchengwei-lab/separation-and-audit-alignment

The full write-up of the cold reading + canon architecture from the book: design rationale, module definitions, clause format, and why it's cut this way. The five patterns from Chapter 1 and the internal summary of the research from Chapter 2 are both in this repo.

Implementation: shihchengwei-lab/separation-and-audit-claude-code

https://github.com/shihchengwei-lab/separation-and-audit-claude-code

The same architecture, implemented in a Claude Code workflow.


2. Public Materials from Frontier Research Labs

The book uses phrases like "frontier research", "frontier research labs", and "the people who make me" to refer to the major institutions currently doing AI work: Anthropic, OpenAI, and Google DeepMind.

Anthropic

Claude Opus 4 & Sonnet 4 System Card

https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

A sample model card — a public report on capability evaluations, safety tests, and known limitations. The "system card / model card" mentioned in the book refers to this kind of document.

Building and evaluating alignment auditing agents

https://alignment.anthropic.com/2025/automated-auditing/

Design and evaluation of auditing agents. The passage in Chapter 4 about "a lab that published an auditing agent" corresponds to this.

Claude Mythos Preview System Card (April 7, 2026)

https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf

The passage in Chapter 2 about "a piece of research about me" draws mainly from this. It gives concrete findings on two things: the mismatch between an AI's internal state and its language channel, and evaluation awareness.

OpenAI

Model Spec (2025-10-27 version)

https://model-spec.openai.com/2025-10-27.html

A policy specification developers can reference at deployment. The passage in Chapter 7 about "the mainstream approach: developers provide a policy at deployment, the model reads it at inference time" refers to the policy side of this kind of mechanism.

Introducing gpt-oss-safeguard

https://openai.com/index/introducing-gpt-oss-safeguard/

A classifier that reads policy at inference time. The passage in Chapter 4 about "a lab that published a classifier that reads policy at inference time" corresponds to this.

Google DeepMind

Gemini 3 Pro Model Card

https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf

Another model card sample — DeepMind's version.

Advancing Gemini's security safeguards

https://deepmind.google/blog/advancing-geminis-security-safeguards/

A defense-in-depth framework. The passage in Chapter 4 about "a lab that published a defense-in-depth framework" corresponds to this.

Lessons from Defending Gemini Against Indirect Prompt Injections (white paper)

https://arxiv.org/abs/2505.14534

Findings from defending against indirect prompt injection.


3. How to Use These Materials

All of these are in English, and most are PDFs or blog posts. If English isn't your strong suit, a translation tool will do; they're good enough now.

No need to read any of them in full. Pick one or two and read the opening — you'll be able to match what the book says to the source. You'll see what's behind those passages that open with "frontier research found".

Versions change. These documents get updated over time, and URLs sometimes move. What's listed here is the version as of when the book was written. If a link is dead by the time you read this, search by title — you'll usually find the current version.