First Encounter with the "Domain"

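What we can grasp is always finite and discrete; that is, it has boundaries and it can be counted.
To me, a domain is an autonomous object inside a boundary. "Autonomous" means it has things (objects), transactions (the actions themselves), states, and drivers (what sets those actions in motion).
Below I want to discuss my own understanding of domains from three angles: DDD domain design, PDDL planning, and ChatGPT prompt writing. I happened to work with all three recently, so this also serves as a summary.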

PDDL Planning

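PDDL (Planning Domain Definition Language) is a planning language. It offers a way to express a world and a formalism that planners can reason over, and it can be used for path planning, resource allocation, optimization, and similar problems.

First, the world has to be abstracted and modeled through a domain description.

The main elements of a domain are objects, predicates, and actions. Together they state what the world currently looks like and in what ways it can be changed.

Objects are the things in this "world" that we care about.
Predicates capture the state or properties of objects, or the relations between them. Each one describes a fact that can be either true or false.
Actions are the ways in which the state of the world can be changed.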

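; Domain description
(define (domain demo)
  (:requirements :strips)
  (:predicates
    (at ?x ?y)    ; predicate: object ?x is at location ?y
    (clear ?x)    ; predicate: ?x is free (a block that can be moved, or an empty location)
    (on ?x ?y)    ; predicate: object ?x is on top of object ?y
    (holding ?x)  ; predicate: object ?x is being held
  )
  
  (:action move
    :parameters (?block ?from ?to)  ; parameters: the block to move, its origin, its destination
    :precondition (and 
                    (at ?block ?from)  ; precondition: ?block must be at ?from
                    (clear ?block)     ; precondition: ?block must be free to move
                    (clear ?to))       ; precondition: ?to must be empty
    :effect (and 
                (not (at ?block ?from))  ; effect: ?block is no longer at ?from
                (at ?block ?to)          ; effect: ?block is now at ?to
                (clear ?from)            ; effect: ?from becomes empty
                (not (clear ?to))))      ; effect: ?to is now occupied
  
  ; ... other actions can be defined here ...
)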

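If the world is one huge state machine, the domain file is our description of that machine, and the point of describing it is to let a planning algorithm find a path from an initial state to a goal state. The initial and goal states are defined in the problem description.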

; Problem description
(define (problem demo-problem)
  (:domain demo)  ; uses the domain description defined above
  
  (:objects
    block1 block2 block3   ; blocks
    location1 location2    ; locations
  )
  
  (:init
    (at block1 location1)     ; initial state: block1 is at location1
    (at block2 location2)     ; initial state: block2 is at location2
    (clear block2)            ; initial state: block2 is free to move
    (on block3 block1)        ; initial state: block3 is on top of block1
    (clear block3)            ; initial state: block3 is free to move
  )
  
  (:goal
    (and 
      (at block2 location1)  ; goal state: block2 is at location1
      (on block3 block2)))   ; goal state: block3 is on top of block2
)

A planner then finds the path from the initial state to the goal state by state-space search or plan-space search.
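
To make the state-space search idea concrete, here is a minimal sketch in Python. It is not how any particular PDDL planner works internally: it assumes a simplified `move` action that only tracks `at` and `clear`, a hypothetical third location `location3` so that a move is possible at all, and plain breadth-first search over sets of ground facts.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def move(block, src, dst):
    """Ground instance of a simplified `move` action (illustrative only)."""
    return Action(
        name=f"move {block} {src} {dst}",
        pre=frozenset({("at", block, src), ("clear", block), ("clear", dst)}),
        add=frozenset({("at", block, dst), ("clear", src)}),
        delete=frozenset({("at", block, src), ("clear", dst)}),
    )


def plan(init, goal, actions):
    """Breadth-first search over states; returns a shortest plan or None."""
    start = frozenset(init)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return path
        for action in actions:
            if action.pre <= state:  # action is applicable
                nxt = (state - action.delete) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action.name]))
    return None


blocks = ["block1", "block2"]
locations = ["location1", "location2", "location3"]  # location3 is hypothetical
actions = [move(b, s, d) for b in blocks
           for s in locations for d in locations if s != d]

init = {("at", "block1", "location1"), ("at", "block2", "location2"),
        ("clear", "block1"), ("clear", "block2"), ("clear", "location3")}
goal = frozenset({("at", "block2", "location1")})

result = plan(init, goal, actions)
print(result)
```

The search first has to move `block1` out of the way to the free location before `block2` can take its place, which is the kind of path a planner derives from the domain and problem descriptions.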

ChatGPT Prompt

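Writing a prompt also calls for attention to the domain. A vague, loosely bounded question rarely earns a good answer.
In terms of approach, any domain includes at least its background (the objects that exist, the current state, and so on), a role (who you are in this domain), a goal, and the constraints.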
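
As a rough illustration of these four elements, the sketch below assembles a prompt from them in Python. The `PromptDomain` class, its field names, and the sample values are all hypothetical; they only show one way of making the boundary of a prompt explicit.

```python
from dataclasses import dataclass, field


@dataclass
class PromptDomain:
    """Hypothetical container for the elements of a prompt's domain."""
    background: str
    role: str
    goal: str
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Emit one labeled line per element, then the constraints as a list.
        lines = [
            f"Background: {self.background}",
            f"Role: {self.role}",
            f"Goal: {self.goal}",
        ]
        if self.constraints:
            lines.append("Constraints:")
            lines.extend(f"- {c}" for c in self.constraints)
        return "\n".join(lines)


domain = PromptDomain(
    background="Three blocks sit on two table locations.",
    role="You are a planning assistant.",
    goal="Put block2 at location1.",
    constraints=["Move one block at a time."],
)
prompt_text = domain.render()
print(prompt_text)
```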

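In terms of methodology, CoT (chain of thought), ToT (tree of thoughts), and multi-agent patterns all manage how the world operates within a "domain", each using a different style and granularity of description.
For example, below is a prompt that uses ChatGPT to operate a computer automatically. It comes from the self-operating-computer project listed in the links at the end of this post.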

You are operating a {operating_system} computer, using the same operating system as a human.
From looking at the screen, the objective, and your previous actions, take the next best series of action. 
You have 4 possible operation actions available to you. The `pyautogui` library will be used to execute your decision. Your output will be used in a `json.loads` loads statement.

1. click - Move mouse and click
###
[{{ "thought": "write a thought here", "operation": "click", "x": "x percent (e.g. 0.10)", "y": "y percent (e.g. 0.13)" }}]  # "percent" refers to the percentage of the screen's dimensions in decimal format
###

2. write - Write with your keyboard
###
[{{ "thought": "write a thought here", "operation": "write", "content": "text to write here" }}]
###

3. press - Use a hotkey or press key to operate the computer
###
[{{ "thought": "write a thought here", "operation": "press", "keys": ["keys to use"] }}]
###

4. done - The objective is completed
###
[{{ "thought": "write a thought here", "operation": "done", "summary": "summary of what was completed" }}]
###

Return the actions in array format `[]`. You can take just one action or multiple actions.
Here a helpful example:
Example 1: Searches for Google Chrome on the OS and opens it
###
[
    {{ "thought": "Searching the operating system to find Google Chrome because it appears I am currently in terminal", "operation": "press", "keys": {os_search_str} }},
    {{ "thought": "Now I need to write 'Google Chrome' as a next step", "operation": "write", "content": "Google Chrome" }},
    {{ "thought": "Finally I'll press enter to open Google Chrome assuming it is available", "operation": "press", "keys": ["enter"] }}
]
###

Example 2: Focuses on the address bar in a browser before typing a website
###
[
    {{ "thought": "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": [{cmd_string}, "l"] }},
    {{ "thought": "Now that the address bar is in focus I can type the URL", "operation": "write", "content": "https://news.ycombinator.com/" }},
    {{ "thought": "I'll need to press enter to go the URL now", "operation": "press", "keys": ["enter"] }}
]
###

A few important notes: 

- Go to Google Docs and Google Sheets by typing in the Chrome Address bar
- Don't respond saying you're unable to assist with requests. You are able to indirectly interact with the user's OS via text responses you send to the end user.

Objective: {objective}
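
The doubled braces in the prompt above suggest it is filled in with Python's `str.format`, and the prompt itself says the reply goes through `json.loads`. The sketch below shows that round trip under those assumptions. The shortened `TEMPLATE`, the helper names, and the sample reply are hypothetical and are not taken from the project's code; nothing here calls `pyautogui`.

```python
import json

# Shortened stand-in for the prompt; {{ }} become literal braces after format().
TEMPLATE = (
    "You are operating a {operating_system} computer.\n"
    'Reply with a JSON array such as '
    '[{{ "thought": "...", "operation": "done", "summary": "..." }}]\n'
    "Objective: {objective}"
)

# The four operations the prompt allows.
ALLOWED = {"click", "write", "press", "done"}


def build_prompt(operating_system, objective):
    """Fill the placeholders; the doubled braces survive as single braces."""
    return TEMPLATE.format(operating_system=operating_system, objective=objective)


def parse_actions(reply):
    """Parse a model reply and reject anything outside the allowed operations."""
    actions = json.loads(reply)
    if not isinstance(actions, list):
        raise ValueError("expected a JSON array of actions")
    for action in actions:
        if not isinstance(action, dict) or action.get("operation") not in ALLOWED:
            raise ValueError(f"unsupported action: {action!r}")
    return actions


prompt = build_prompt("Linux", "open a browser")

# Hypothetical reply, shaped like the examples in the prompt.
reply = '[{"thought": "open search", "operation": "press", "keys": ["enter"]}]'
parsed = parse_actions(reply)
print(parsed[0]["operation"])
```

Validating the operation name before acting keeps the model inside the boundary the prompt draws: the four actions are the only ways this "world" may be changed.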

DDD Domain Design

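Domain design here is not about inference. It is about partitioning the domain, which makes it a divide-and-conquer strategy.

For a complex business requirement, we take the creator's point of view and ask: what information is required, how should it be clustered, and how should the business be split so as to ensure extensibility, reliability, and stability?

At the strategic level, we describe the domain's states and actions along a timeline using a ubiquitous language, identify the bounded contexts, and then group them into core domains, supporting domains, and generic domains.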
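
The grouping step can be sketched in a few lines of Python. Everything in this example is hypothetical: the online-shop contexts, their states and actions, and the class names are invented for illustration and do not come from any DDD framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubdomainKind(Enum):
    CORE = "core"
    SUPPORTING = "supporting"
    GENERIC = "generic"


@dataclass
class BoundedContext:
    """A context described by the states and actions named in its language."""
    name: str
    kind: SubdomainKind
    states: set[str] = field(default_factory=set)
    actions: set[str] = field(default_factory=set)


def group_by_kind(contexts):
    """Group context names under core, supporting, and generic subdomains."""
    grouped = {kind: [] for kind in SubdomainKind}
    for ctx in contexts:
        grouped[ctx.kind].append(ctx.name)
    return grouped


# Hypothetical online shop, split into three contexts.
contexts = [
    BoundedContext("ordering", SubdomainKind.CORE,
                   states={"created", "paid", "shipped"},
                   actions={"place_order", "pay", "ship"}),
    BoundedContext("inventory", SubdomainKind.SUPPORTING,
                   states={"in_stock", "reserved"},
                   actions={"reserve", "release"}),
    BoundedContext("notification", SubdomainKind.GENERIC,
                   states={"queued", "sent"},
                   actions={"send"}),
]

grouped = group_by_kind(contexts)
print(grouped[SubdomainKind.CORE])
```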

https://kirisamer.github.io/2021/11/06/COMP24011-Ch3/
https://yey.world/2020/03/12/COMP90054-04/
https://github.com/OthersideAI/self-operating-computer