第 12 章

模型的自我认知 The Model's Sense of Self

一个简短的概念章。网上常有人问 LLM「你是什么模型、谁造的你」,但这个问题本身有点没有意义:模型没有持久的自我——每次对话都从零启动、处理 token、然后清空。默认情况下,它对「你是谁」给出的只是统计上的最佳猜测;因为互联网上满是 ChatGPT/OpenAI 的内容,很多模型就顺口说自己是 OpenAI 造的 ChatGPT——这只是「幻觉出来的标签」,不是真实身份。开发者可以「编程」一个人设:要么在 SFT 数据里硬编码一批关于自我的对话,要么在每段对话开头注入一条系统消息(隐藏在上下文里的不可见 token)。所以模型的「自我」不过是又一批被编程进去的 token,而非真正的自我认知。 A short conceptual chapter. People online often ask an LLM 'what model are you, who built you,' but the question is itself a bit nonsensical: the model has no persistent self — each conversation boots up from scratch, processes tokens, then clears. By default, its answer to 'who are you' is just a statistical best guess; because the internet is full of ChatGPT/OpenAI content, many models offhandedly say they're ChatGPT built by OpenAI — a 'hallucinated label,' not a real identity. A developer can 'program' a persona: either hardcode a batch of self-referential conversations in the SFT data, or inject a system message at the start of each conversation (invisible tokens hidden in the context). So a model's 'self' is just more programmed tokens, not genuine self-knowledge.

下一个想简单聊聊的「心理学怪癖」,是所谓的「自我认知(knowledge of self)」。我在网上经常看到有人这样做:问 LLM「你是什么模型?」「谁造的你?」。但这个问题其实有点没有意义。 The next 'psychological quirk' I want to touch on briefly is so-called 'knowledge of self.' I often see people online do this: ask an LLM 'what model are you?' 'who built you?' But this question is actually a bit nonsensical.

原因在于前面讲过的那些底层原理:这东西不是一个人,它没有任何意义上的持久存在。它会启动一个进程、处理一串 token、然后关闭;对每一个用户都是如此——搭起一段对话的上下文窗口,然后一切被删除,这个「实体」在每次对话里都从头重启。它没有持久的自我,没有自我意识,只是个「token 翻滚器」,遵循训练集的统计规律。所以问它「你是谁、谁造的你」其实没什么道理。 The reason lies in the fundamentals covered earlier: this thing is not a person, it has no persistent existence in any sense. It boots up a process, processes a string of tokens, then shuts off; and it does this for every single user — it builds up a context window for one conversation, then everything is deleted, and this 'entity' restarts from scratch each conversation. It has no persistent self, no sense of self; it's just a 'token tumbler' following the statistics of its training set. So asking it 'who are you, who built you' doesn't really make sense.

如果你什么都不做、就这么默认地去问,得到的会是相当随机的答案。比如拿老模型 Falcon 来问,它可能含糊其辞地说「由一群有才华的工程师和开发者打造」,又或者干脆说「我是 OpenAI 基于 GPT-3 模型造的」——完全是在瞎编。 If you do nothing and just ask by default, you'll get pretty random answers. Ask the old model Falcon, for instance, and it might evasively say 'built by a team of talented engineers and developers,' or just flatly say 'I was built by OpenAI based on the GPT-3 model' — totally making it up.

⚠️ 很多人会把「我是 OpenAI 造的」当成这个模型一定是用了 OpenAI 数据训练的证据。这未必成立。如果你没有显式地针对这类问题去编程模型,你得到的只是它对答案的统计最佳猜测。这种模型在 SFT 阶段学着「接演」一个乐于助人的助手人格,但没人明确告诉它该给自己贴什么标签;而预训练吃下了整个互联网,ChatGPT、OpenAI 在网上极其常见——于是它「幻觉」出的自我标签,就是「我是 OpenAI 的 ChatGPT」。这只是网上海量此类回答留下的统计印记,不是真实身份。 Many people take 'I was built by OpenAI' as proof the model must have been trained on OpenAI data. That doesn't necessarily hold. If you didn't explicitly program the model for such questions, you just get its statistical best guess at an answer. During SFT the model learns to 'take on' a helpful-assistant persona, but no one explicitly told it what label to apply to itself; and pretraining ingested the entire internet, where ChatGPT and OpenAI are extremely common — so its 'hallucinated' self-label is 'I'm ChatGPT by OpenAI.' That's just the statistical imprint of countless such answers online, not a real identity.

作为开发者,你可以覆盖这一点。第一种方式:在 SFT 数据里硬编码关于自我的对话。比如完全开源的 OLMo 模型(来自 Allen AI),它的 SFT 混合数据里有大约一百万条对话,其中专门有一小批(约 240 条)是「硬编码」的自我介绍——「告诉我关于你自己」→「我是由 Allen 人工智能研究所开发的开放语言模型 OLMo……」、「你叫什么名字」→「OLMo……」。把这两百多条这样的问答放进训练集去微调,模型之后就会照着复述这套说辞。如果你不给它这些,它大概率又会说自己是 OpenAI 的 ChatGPT。 As a developer, you can override this. First way: hardcode self-referential conversations in the SFT data. The fully open-source OLMo model (from Allen AI), for instance, has about a million conversations in its SFT mixture, including a small dedicated batch (about 240) of 'hardcoded' self-introductions — 'tell me about yourself' → 'I'm OLMo, an open language model developed by the Allen Institute for AI ...', 'what is your name' → 'OLMo ...'. Put these couple hundred Q&As into the training set and fine-tune, and the model will then parrot this script. If you don't give it these, it'll most likely again say it's ChatGPT by OpenAI.

第二种方式:系统消息(system message)。一段对话不只有人类和助手两个角色,在最开头往往还有一条特殊的「系统消息」。你可以在系统消息里硬编码、提醒模型:「你是一个由 OpenAI 开发的模型,名字是 ChatGPT-4o,你的训练时间是某某、知识截止到某某……」——相当于给模型写了一小段「自我说明」。这条系统消息会被插入到每段对话里。所以你在 ChatGPT 上看到的是一张空白页,但实际上系统消息就藏在里头,那些 token 已经在上下文窗口里了。 Second way: the system message. A conversation has not only the human and assistant roles; at the very start there's often a special 'system message.' In it you can hardcode and remind the model: 'you are a model developed by OpenAI, your name is ChatGPT-4o, you were trained on such-and-such date, your knowledge cutoff is such-and-such ...' — like writing the model a little 'about me.' This system message is inserted into every conversation. So on ChatGPT you see a blank page, but the system message is actually hidden in there, and those tokens are already in the context window.

💡 核心要点:无论哪种方式——SFT 数据里的硬编码对话,还是开头注入的系统消息——本质都是把一批 token 安排进上下文,提醒模型「它的身份」。这些 token 要么由数据塑造,要么是上下文里的不可见 token。所以模型的「自我」完全是被「拼凑、栓接」上去的,并不像在一个人身上那样真实、深植于内。它没有真正的自我认知,只有又一批被编程进去的 token。 Core takeaway: whichever way — hardcoded conversations in the SFT data, or a system message injected at the start — it's essentially arranging a batch of tokens into the context to remind the model of 'its identity.' Those tokens are either shaped by data or are invisible tokens in the context. So a model's 'self' is entirely 'cooked up and bolted on,' not real and deeply embedded the way it would be in a human. It has no genuine self-knowledge, only more programmed tokens.

•问 LLM「你是谁/谁造的你」本身有点没意义:它没有持久的自我,每次对话从零启动、处理 token、再清空。
•默认情况下它对自我的回答只是统计最佳猜测;常因网上 ChatGPT/OpenAI 内容太多而说自己是 OpenAI 的 ChatGPT。
•「我是 OpenAI 造的」往往只是幻觉出来的标签,不能当作训练数据来源的证据。
•编程自我方式一:在 SFT 数据里硬编码一批关于自身的对话(如 OLMo 的约 240 条)。
•编程自我方式二:在对话开头注入系统消息(隐藏的不可见 token),提醒模型其身份。
•结论:模型的「自我」只是又一批被编程进去的 token,不是真正的自我认知。

•Asking an LLM 'who are you / who built you' is a bit nonsensical: it has no persistent self, each conversation boots from scratch, processes tokens, then clears.
•By default its self-answer is just a statistical best guess; it often says it's ChatGPT by OpenAI because of the abundance of ChatGPT/OpenAI content online.
•'I was built by OpenAI' is usually just a hallucinated label, not evidence of the training data's source.
•Programming the self, way one: hardcode a batch of self-referential conversations in the SFT data (e.g. OLMo's ~240).
•Programming the self, way two: inject a system message at the start of the conversation (hidden, invisible tokens) reminding the model of its identity.
•Conclusion: a model's 'self' is just more programmed tokens, not genuine self-knowledge.

📝 本章测验

为什么说问 LLM「你是谁、谁造的你」本身有点没意义?Why is asking an LLM 'who are you, who built you' itself a bit nonsensical?

一个模型默认说「我是 OpenAI 造的 ChatGPT」,最可能的解释是什么?If a model says by default 'I'm ChatGPT built by OpenAI,' what is the most likely explanation?

开发者可以用哪两种方式给模型「编程」一个自我/人设?What two ways can a developer use to 'program' a model's self/persona?