模型的自我认知 The Model's Sense of Self
一个简短的概念章。网上常有人问 LLM「你是什么模型、谁造的你」,但这个问题本身有点没有意义:模型没有持久的自我——每次对话都从零启动、处理 token、然后清空。默认情况下,它对「你是谁」给出的只是统计上的最佳猜测;因为互联网上满是 ChatGPT/OpenAI 的内容,很多模型就顺口说自己是 OpenAI 造的 ChatGPT——这只是「幻觉出来的标签」,不是真实身份。开发者可以「编程」一个人设:要么在 SFT 数据里硬编码一批关于自我的对话,要么在每段对话开头注入一条系统消息(隐藏在上下文里的不可见 token)。所以模型的「自我」不过是又一批被编程进去的 token,而非真正的自我认知。 A short conceptual chapter. People online often ask an LLM 'what model are you, who built you,' but the question is itself a bit nonsensical: the model has no persistent self — each conversation boots up from scratch, processes tokens, then clears. By default, its answer to 'who are you' is just a statistical best guess; because the internet is full of ChatGPT/OpenAI content, many models offhandedly say they're ChatGPT built by OpenAI — a 'hallucinated label,' not a real identity. A developer can 'program' a persona: either hardcode a batch of self-referential conversations in the SFT data, or inject a system message at the start of each conversation (invisible tokens hidden in the context). So a model's 'self' is just more programmed tokens, not genuine self-knowledge.
下一个想简单聊聊的「心理学怪癖」,是所谓的「自我认知(knowledge of self)」。我在网上经常看到有人这样做:问 LLM「你是什么模型?」「谁造的你?」。但这个问题其实有点没有意义。 The next 'psychological quirk' I want to touch on briefly is so-called 'knowledge of self.' I often see people online do this: ask an LLM 'what model are you?' 'who built you?' But this question is actually a bit nonsensical.
原因在于前面讲过的那些底层原理:这东西不是一个人,它没有任何意义上的持久存在。它会启动一个进程、处理一串 token、然后关闭;对每一个用户都是如此——搭起一段对话的上下文窗口,然后一切被删除,这个「实体」在每次对话里都从头重启。它没有持久的自我,没有自我意识,只是个「token 翻滚器」,遵循训练集的统计规律。所以问它「你是谁、谁造的你」其实没什么道理。 The reason lies in the fundamentals covered earlier: this thing is not a person, it has no persistent existence in any sense. It boots up a process, processes a string of tokens, then shuts off; and it does this for every single user — it builds up a context window for one conversation, then everything is deleted, and this 'entity' restarts from scratch each conversation. It has no persistent self, no sense of self; it's just a 'token tumbler' following the statistics of its training set. So asking it 'who are you, who built you' doesn't really make sense.
如果你什么都不做、就这么默认地去问,得到的会是相当随机的答案。比如拿老模型 Falcon 来问,它可能含糊其辞地说「由一群有才华的工程师和开发者打造」,又或者干脆说「我是 OpenAI 基于 GPT-3 模型造的」——完全是在瞎编。 If you do nothing and just ask by default, you'll get pretty random answers. Ask the old model Falcon, for instance, and it might evasively say 'built by a team of talented engineers and developers,' or just flatly say 'I was built by OpenAI based on the GPT-3 model' — totally making it up.
作为开发者,你可以覆盖这一点。第一种方式:在 SFT 数据里硬编码关于自我的对话。比如完全开源的 OLMo 模型(来自 Allen AI),它的 SFT 混合数据里有大约一百万条对话,其中专门有一小批(约 240 条)是「硬编码」的自我介绍——「告诉我关于你自己」→「我是由 Allen 人工智能研究所开发的开放语言模型 OLMo……」、「你叫什么名字」→「OLMo……」。把这两百多条这样的问答放进训练集去微调,模型之后就会照着复述这套说辞。如果你不给它这些,它大概率又会说自己是 OpenAI 的 ChatGPT。 As a developer, you can override this. First way: hardcode self-referential conversations in the SFT data. The fully open-source OLMo model (from Allen AI), for instance, has about a million conversations in its SFT mixture, including a small dedicated batch (about 240) of 'hardcoded' self-introductions — 'tell me about yourself' → 'I'm OLMo, an open language model developed by the Allen Institute for AI ...', 'what is your name' → 'OLMo ...'. Put these couple hundred Q&As into the training set and fine-tune, and the model will then parrot this script. If you don't give it these, it'll most likely again say it's ChatGPT by OpenAI.
第二种方式:系统消息(system message)。一段对话不只有人类和助手两个角色,在最开头往往还有一条特殊的「系统消息」。你可以在系统消息里硬编码、提醒模型:「你是一个由 OpenAI 开发的模型,名字是 ChatGPT-4o,你的训练时间是某某、知识截止到某某……」——相当于给模型写了一小段「自我说明」。这条系统消息会被插入到每段对话里。所以你在 ChatGPT 上看到的是一张空白页,但实际上系统消息就藏在里头,那些 token 已经在上下文窗口里了。 Second way: the system message. A conversation has not only the human and assistant roles; at the very start there's often a special 'system message.' In it you can hardcode and remind the model: 'you are a model developed by OpenAI, your name is ChatGPT-4o, you were trained on such-and-such date, your knowledge cutoff is such-and-such ...' — like writing the model a little 'about me.' This system message is inserted into every conversation. So on ChatGPT you see a blank page, but the system message is actually hidden in there, and those tokens are already in the context window.
- •问 LLM「你是谁/谁造的你」本身有点没意义:它没有持久的自我,每次对话从零启动、处理 token、再清空。
- •默认情况下它对自我的回答只是统计最佳猜测;常因网上 ChatGPT/OpenAI 内容太多而说自己是 OpenAI 的 ChatGPT。
- •「我是 OpenAI 造的」往往只是幻觉出来的标签,不能当作训练数据来源的证据。
- •编程自我方式一:在 SFT 数据里硬编码一批关于自身的对话(如 OLMo 的约 240 条)。
- •编程自我方式二:在对话开头注入系统消息(隐藏的不可见 token),提醒模型其身份。
- •结论:模型的「自我」只是又一批被编程进去的 token,不是真正的自我认知。
- •Asking an LLM 'who are you / who built you' is a bit nonsensical: it has no persistent self, each conversation boots from scratch, processes tokens, then clears.
- •By default its self-answer is just a statistical best guess; it often says it's ChatGPT by OpenAI because of the abundance of ChatGPT/OpenAI content online.
- •'I was built by OpenAI' is usually just a hallucinated label, not evidence of the training data's source.
- •Programming the self, way one: hardcode a batch of self-referential conversations in the SFT data (e.g. OLMo's ~240).
- •Programming the self, way two: inject a system message at the start of the conversation (hidden, invisible tokens) reminding the model of its identity.
- •Conclusion: a model's 'self' is just more programmed tokens, not genuine self-knowledge.
📝 本章测验
为什么说问 LLM「你是谁、谁造的你」本身有点没意义?Why is asking an LLM 'who are you, who built you' itself a bit nonsensical?
一个模型默认说「我是 OpenAI 造的 ChatGPT」,最可能的解释是什么?If a model says by default 'I'm ChatGPT built by OpenAI,' what is the most likely explanation?
开发者可以用哪两种方式给模型「编程」一个自我/人设?What two ways can a developer use to 'program' a model's self/persona?