RCA · Lawrence-2 Test Session · aithrd_ayjzonowyr6zidr7 · companions: edit-DSL architecture · prompt edits

Document edit pipeline can't speak formatting.

Most failed tests reduce to one schema-shaped hole. The editor already supports format-only suggestions; the API just never sends them.

Date · 2026-05-11
Matter · mat_1dbulrsv13qdgr1a
Client · Sarah Henderson · FinPulse
Tester · Adolfo
Traces · 26
Files · 7
Notes · 2
Root cause · one sentence
The matter-document edit DSL accepts only edit | insert | delete | refine | withdraw — all text-shaped — so the agent has no way to express bold, italic, underline, font size, heading level, alignment, indent, or list conversion. The editor would accept format-only suggestions, but lawrence-api hardcodes formatChange: null.
8 passed · 8 partial · 6 failed · 5 couldn't test

27 test scenarios + email-group (5 untestable in dev: no email integration set up).

1. What broke — at a glance

#   Test                    Note
01  Populate placeholders   115s · 43 ops batched
02  Graceful gap handling   undefined marker note
03  Refine section          ⚠ leaked block IDs
04  Bold                    DSL has no marks
05  Italic                  DSL has no marks
06  Underline               DSL has no marks
07  Multi-style (b+i)       blocked by 04–06
08  Remove formatting       also strips font
09  Font size               DSL has no marks
10  Heading style           no node-type change
11  List conversion         delete-then-insert
12  Alignment / indent      block_attrs invented
13  Research insert         typeface drift
14  Add new section         ⚠ leaked block IDs
15  Preserve boilerplate    verified
16  Save & persist          verified
17  Letter from scratch     ⚠ routed to email
18  Contract from scratch   182s · late card
19  Memo from scratch       H1/H2/H3 ok
20  Edit specific part      ⚠ minor leak
21  Add doc header          formatting bad
22  Email format            no dev infra
23  Email recipient         no dev infra
24  Email edit body         no dev infra
25  Email subject           no dev infra
26  Email edit subject      no dev infra
27  Email format inline     no dev infra
28  Note creation           consistent format

2. The architectural wall

The pipeline from user message to editor crosses six boundaries. Three of them enforce the same five-command vocabulary. The stage at the far end, the editor extension, already speaks formatting; it just never gets asked.

User · "bold it"
Agent (LLM) · chat-agent
VFS edit tool · platform-vfs
Edit schema · 5 commands only
Planner · 3 planners only
Yjs apply · 5 ops · marks=diff only
Editor extension · supports formatChange ✓
Suggestion builder · formatChange: null ⚠
⛔ no marks, no node-type, no list conversion
↑ schema strips unknown fields (block_attrs etc.)
↑ hardcoded null: editor can't get format-only suggestions
Red boundaries enforce a five-command, text-only vocabulary. The amber box is the half-built piece: the editor (green) already accepts formatChange.addedMarks / removedMarks, but lawrence-api never builds one.

3. The two lines of code that explain everything

What the edit DSL accepts today

apis/lawrence-api/src/content/vfs/schemas/matter-document-edit-schemas.ts:55-61

export const EditCommandSchema = z.discriminatedUnion("type", [
  EditCommandShape,      // type: "edit"
  InsertCommandShape,    // type: "insert"
  DeleteCommandShape,    // type: "delete"
  RefineCommandShape,    // type: "refine"
  WithdrawCommandShape,  // type: "withdraw"
])
// None of the shapes carry marks, attrs,
// heading level, alignment, indent or list ops.
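
For illustration, a sixth, format-only variant could look like the sketch below. It is written in plain TypeScript rather than zod so that it runs standalone, and every name in it (`FormatCommand`, `MarkName`, `find`, `add_marks`, `remove_marks`, `parseFormatCommand`) is an assumption, not part of the existing schema.

```typescript
// Hypothetical sketch: a format-only command next to the five text-shaped ones.
// All names here are assumptions for illustration, not the real schema.
type MarkName = "bold" | "italic" | "underline";

interface FormatCommand {
  type: "format";
  block_id: string;          // block that contains the target text
  find: string;              // exact text to format inside that block
  add_marks: MarkName[];     // marks to add
  remove_marks: MarkName[];  // marks to remove
}

const MARK_NAMES: string[] = ["bold", "italic", "underline"];

function isMarkList(value: unknown): value is MarkName[] {
  return (
    Array.isArray(value) &&
    value.every((m) => typeof m === "string" && MARK_NAMES.indexOf(m) !== -1)
  );
}

// Returns the parsed command, or null when the payload is not a valid format command.
function parseFormatCommand(input: unknown): FormatCommand | null {
  if (typeof input !== "object" || input === null) return null;
  const c = input as Record<string, unknown>;
  const blockId = c["block_id"];
  const find = c["find"];
  const add = c["add_marks"] ?? [];
  const remove = c["remove_marks"] ?? [];
  if (c["type"] !== "format") return null;
  if (typeof blockId !== "string" || typeof find !== "string" || find.length === 0) return null;
  if (!isMarkList(add) || !isMarkList(remove)) return null;
  if (add.length + remove.length === 0) return null; // a no-op command is rejected
  return { type: "format", block_id: blockId, find, add_marks: add, remove_marks: remove };
}
```

Rejecting unknown mark names outright, rather than silently stripping them, would avoid the failure mode seen with block_attrs, where the agent believed an unsupported field had been applied.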

Half-built bridge to the editor

apis/lawrence-api/src/content/vfs/resolvers/diff-suggestion-builder.ts:48-62

  return {
    id: randomUUID(),
    originalText,
    suggestedText,
    comment: params.comment ?? null,
    revisionId: params.revisionId ?? null,
    author: params.author ?? AGENT_AUTHOR_NAME,
    date,
    source: "user",
    formatChange: null,   // ⚠ hardcoded
  }

The editor mark already supports formatChange: { addedMarks, removedMarks } (@lawhive/ooxml-extensions ≥ 0.5.0). Removing the hardcoded null + threading a real value through is the keystone fix.
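
A minimal sketch of what "threading a real value through" might mean in the builder. The `{ addedMarks, removedMarks }` shape is the one named above; representing marks as plain name strings, and the `FormatParams` and `buildFormatChange` names, are assumptions, since the payload the editor mark actually expects is not shown here.

```typescript
// Sketch only: FormatChange mirrors the { addedMarks, removedMarks } shape named above.
// Using plain mark-name strings is an assumption; the real mark payload may differ.
interface FormatChange {
  addedMarks: string[];
  removedMarks: string[];
}

// Hypothetical params: addMarks / removeMarks would come from a format-aware edit command.
interface FormatParams {
  addMarks?: string[];
  removeMarks?: string[];
}

function dedupe(marks: string[]): string[] {
  return marks.filter((m, i) => marks.indexOf(m) === i);
}

// Returns null for text-only edits, so existing suggestions keep today's behaviour.
function buildFormatChange(params: FormatParams): FormatChange | null {
  const added = dedupe(params.addMarks ?? []);
  // A mark requested in both lists is treated as "add" and dropped from the removals.
  const removed = dedupe(params.removeMarks ?? []).filter((m) => added.indexOf(m) === -1);
  if (added.length === 0 && removed.length === 0) return null;
  return { addedMarks: added, removedMarks: removed };
}
```

Under these assumptions the hardcoded line would become `formatChange: buildFormatChange(params)`, and text-only suggestions would still carry `null`.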

4. Why "remove formatting" wiped the typeface

Today: replace is delete + insertText

yjs-apply.ts:122-128

case "replace": {
  const from = map(plan.operation.from)
  const to = map(plan.operation.to)
  const { newText, attrs } = plan.operation
  tr.delete(from, to)
  tr.insertText(newText, from)  // no marks
  const mark = diffSuggestionMarkType.create(attrs)
  tr.addMark(from, from + newText.length, mark)
}

Minimal patch: re-carry inherited marks

Half-day fix · same file

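case "replace": {
  const from = map(plan.operation.from)
  const to = map(plan.operation.to)
  const { newText, attrs } = plan.operation
  // Capture the replaced text's own marks before deleting it. marksAcross reads the
  // node after `from`; marks() would read the node before it at a text-node boundary.
  const inherited = tr.doc.resolve(from).marksAcross(tr.doc.resolve(to)) ?? []
  tr.delete(from, to)
  tr.insertText(newText, from)
  inherited.forEach(m => tr.addMark(from, from + newText.length, m))
  const mark = diffSuggestionMarkType.create(attrs)
  tr.addMark(from, from + newText.length, mark)
}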
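
The difference between the two versions can be reproduced in a toy model without ProseMirror. In this sketch a paragraph is a list of text runs with mark names; `Run`, `marksAt`, `replaceRange` and the `carryMarks` flag are illustrative names, not code from the repository.

```typescript
// Toy model (not ProseMirror): a paragraph is a list of runs, each with its own marks.
interface Run {
  text: string;
  marks: string[];
}

// Marks on the character at `pos`, or [] when `pos` is at or past the end.
function marksAt(runs: Run[], pos: number): string[] {
  let offset = 0;
  for (const run of runs) {
    if (pos < offset + run.text.length) return run.marks.slice();
    offset += run.text.length;
  }
  return [];
}

// Replace [from, to) with newText. With carryMarks=false the new run is unmarked;
// with carryMarks=true it inherits the marks of the text it replaces.
function replaceRange(runs: Run[], from: number, to: number, newText: string, carryMarks: boolean): Run[] {
  const inherited = carryMarks ? marksAt(runs, from) : [];
  const out: Run[] = [];
  let offset = 0;
  let inserted = false;
  const insert = (): void => {
    if (!inserted && newText.length > 0) out.push({ text: newText, marks: inherited });
    inserted = true;
  };
  for (const run of runs) {
    const len = run.text.length;
    const clamp = (n: number): number => Math.max(0, Math.min(len, n));
    const before = run.text.slice(0, clamp(from - offset)); // kept text left of the range
    const after = run.text.slice(clamp(to - offset));       // kept text right of the range
    if (before.length > 0) out.push({ text: before, marks: run.marks });
    if (from <= offset + len) insert();
    if (after.length > 0) out.push({ text: after, marks: run.marks });
    offset += len;
  }
  insert(); // covers an empty paragraph or a `from` past the end
  return out;
}
```

Replacing "PARTIES" in a paragraph whose heading run carries a font mark returns an unmarked run with `carryMarks=false` and a run that keeps the font mark with `carryMarks=true`, which is the regression reported in T11 and the behaviour the patch aims for.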

5. The agent diagnosed itself, accurately

By the third or fourth failed attempt the agent reasoned its way to the gap. Direct quotes from Langfuse:

trace 6 · response to "does your edit dsl support formatting?"
"Looking honestly at the tool's capabilities: the edit tool for documents works by finding and replacing text, and applying marks to the replaced text requires the replace content to carry the formatting inline. Unfortunately, the current document edit surface doesn't expose a mechanism to apply a bold mark…"
trace 13 · response to "make section 6 list into a numbered list?"
"I can see the full bullet list structure. I can replace the entire bulletList block with an orderedList — let me do that now. The edit tool doesn't support replacing a block node with a different node type — the replace field only accepts a string, not a full ProseMirror node structure."
trace 17 · after testing block_attrs by user request
"Thanks for testing that — so block_attrs isn't honoured by the edit surface. Can do: find & replace text, insert new blocks, strip formatting. Cannot: apply inline marks (bold/italic/underline), apply font, font size, change indentation, convert list types."

6. Adjacent issues (independent fixes)

Timestamp echo — structural, not hallucinated

Every user message in all 26 traces arrived with an ISO-8601 microsecond prefix:

2026-05-11T17:21:32.163916: how about font size?

The template is at agent.py:1024-1026, with a fallback at :1070. The model treats the prefix as message content and sometimes echoes it back (T12, T14).

Fix: wrap in an XML tag (<currentTimestamp>…</currentTimestamp>) and add a prompt guardrail.
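
A sketch of the wrapping step, written in TypeScript to match the other excerpts in this RCA even though the real template lives in agent.py. `formatUserTurn` and `stripLegacyPrefix` are hypothetical helpers, and the regular expression assumes the prefix always has the form shown in the example above.

```typescript
// Sketch of the proposed fix; helper names are hypothetical and the real template is Python.
// Matches an old-style "2026-05-11T17:21:32.163916: " prefix at the start of a message.
const LEGACY_PREFIX = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?: /;

// Remove the old inline prefix if one is present, so the timestamp is never sent twice.
function stripLegacyPrefix(message: string): string {
  return message.replace(LEGACY_PREFIX, "");
}

// Put the timestamp in its own tag so it reads as metadata, not as user content.
function formatUserTurn(timestamp: string, message: string): string {
  return `<currentTimestamp>${timestamp}</currentTimestamp>\n${stripLegacyPrefix(message)}`;
}
```

The tag alone does not stop the model echoing it; the prompt guardrail is still needed to say that `<currentTimestamp>` is context and must not be repeated to the user.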

Internals leak — pure prompt issue

Tool description tells the model to think in block_id / content_version_id. No prompt anywhere says "don't surface these to the user".

5 of 8 partials surfaced strings like "I have the current block IDs and positions. Now I'll propose…". Fix lives in Langfuse prompt agent/lawrence-2, not the code.

"Draft a letter" → email

Tool description for emails:// includes "Draft an email"; for documents:// it includes "Draft a contract, demand letter". The verb "draft" matches first, so "draft a letter" lands on emails://.

Fix: tools/platform-vfs/.../create.py — explicit rule that any letter-format output is documents:// unless the user says email.

Latency / streaming UX

Populate 115s · research 161s · contract 182s · memo 211s. Card renders after streaming completes, not at first edit op.

Fix: emit the draft-card event on the first edit command, not on completion. Consider micro-batching the 43-op populate.
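
A minimal sketch of both ideas, assuming a stream of discrete events. The event names and shapes, `createCardEmitter` and `microBatch` are hypothetical; they are not the streaming runtime's real types.

```typescript
// Sketch: emit the draft card when the first edit command arrives, not when streaming ends.
// Event names and shapes are assumptions for illustration.
type StreamEvent =
  | { kind: "text"; delta: string }
  | { kind: "edit_command"; op: string };

type UiEvent = StreamEvent | { kind: "draft_card" };

// Wraps an emit callback so that a single draft_card event precedes the first edit command.
function createCardEmitter(emit: (e: UiEvent) => void): (e: StreamEvent) => void {
  let cardEmitted = false;
  return (event: StreamEvent): void => {
    if (event.kind === "edit_command" && !cardEmitted) {
      cardEmitted = true;
      emit({ kind: "draft_card" });
    }
    emit(event);
  };
}

// Splits a large op list (such as the 43-op populate) into fixed-size batches.
function microBatch<T>(ops: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < ops.length; i += size) {
    batches.push(ops.slice(i, i + size));
  }
  return batches;
}
```

With a batch size of 10, the 43-op populate would be applied as five batches, so the user sees the document change several times instead of once at the end.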

7. Trace timeline (26 turns · 31 minutes)

01 · "Hi Lawrence! can you tell me the high level of this matter" · 18.7s · 46 ev
02 · "can you use my settlement agreement template and fill it in..." · 115.3s · 323 ev
03 · "expand the return of property section" · ⚠ internals leak · 49.4s · 142 ev
04 · "in 3.1 can you make the word Agreement bold?" · markdown ** inserted as text · 15.0s · 38 ev
05 · "why did you do it like markdown" · 25.5s · 87 ev
06 · "does your edit dsl support formatting?" · agent self-diagnoses · 8.7s · 23 ev
07 · "can you make it italic?" · 3.6s · 12 ev
08 · "underlien?" · 3.0s · 10 ev
09 · "remove formatting from PARTIES?" · 3.3s · 11 ev
10 · "replace it with just PARTIES again?" · 14.2s · 52 ev
11 · "it worked but also lost the typeface" · 7.1s · 17 ev
12 · "how about font size?" · ⚠ timestamp asked back · 3.4s · 12 ev
13 · "can you make section 6 list into a numbered list?" · 34.1s · 96 ev
14 · "can you make 6.2 into a numbered list?" · ⚠ timestamp echoed · 7.5s · 13 ev
15 · "just replace 6.2 from prose to a new block?" · delete+insert workaround · 22.6s · 81 ev
16 · "do you have the ability to add indentation?" · 28.6s · 54 ev
17 · "it didn't work unfortunately" · agent confirms block_attrs not supported · 12.8s · 41 ev
18 · "research Edwards v Arthur Andersen + AMN v Aya, insert above 11.2" · 161.2s · 186 ev
19 · "add new Section 17 Tax Indemnity" · ⚠ internals leak · 43.8s · 121 ev
20 · "draft a settlement letter to Andrew Larkin" · ⚠ routed to email · 89.8s · 229 ev
21 · "create it as a document not an email" · 120.5s · 294 ev
22 · "draft a Consulting Agreement" · 182.6s · 460 ev
23 · "draft a memo with H1/H2/H3 hierarchy" · 211.0s · 524 ev
24 · "tighten the BATNA section, don't touch anything else" · ⚠ minor leak · 49.6s · 103 ev
25 · "add a firm letterhead + matter ref header" · ⚠ internals leak · 27.8s · 56 ev
26 · "create a note titled Counter-offer strategy" · 48.1s · 109 ev

Bars scale to the slowest turn (T23 · 211s). Latency colour matches result; flags mark cross-cutting issues.

8. Fix sequence — by impact & effort

# · Fix · Files · Effort · Unlocks tests

1 · Prompt-only quick wins
Fix: Hide internals · wrap timestamp · disambiguate "letter" · document edit-tool limits · clarify precedent flow (→ six concrete edits)
Files: Langfuse agent/lawrence-2 v43 (dev) → promote · agent.py template fallback
Effort: XS
Unlocks tests: 03 07 09 11 14 17 20 21

2 · Preserve inherited marks on replace
Fix: Re-apply marks after tr.insertText
Files: yjs-apply.ts:122-128
Effort: S · ½ day
Unlocks tests: 08

3 · Add format command end-to-end
Fix: Schema variant · planner · add-format-mark op · stop hardcoding formatChange: null · tool description (→ full architecture grounding)
Files: schemas + planners + yjs-apply + diff-suggestion-builder + matter_document_content.py
Effort: M · 3-5 d
Unlocks tests: 04 05 06 07 09

4 · Add set_block command
Fix: Heading level · alignment · indent via tr.setNodeMarkup (→ schema sketch)
Files: same files as #3
Effort: M · 3-5 d
Unlocks tests: 10 12

5 · List conversion via wrapInList/liftListItem
Fix: list_op · note numId footgun
Files: prosemirror-schema-list-backed planner
Effort: M · 2-3 d
Unlocks tests: 11

6 · Latency / early card render
Fix: Emit draft-card on first edit op · micro-batch 43-op populate
Files: streaming runtime · agent loop
Effort: M · independent track
Unlocks tests: 01 18 22 23

7 · Page break visible
Fix: Verify pagination.ts extension is loaded for matter docs
Files: apps/legal-os editor wiring
Effort: XS
Unlocks tests: visual polish
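
To make the set_block and list_op rows concrete, the sketch below shows one possible shape for the two commands and the primitive the table pairs with each. All field names are assumptions; the companion schema sketch is the authoritative proposal, and direct bullet-to-ordered conversion of an existing list may need more than the two primitives named here.

```typescript
// Hypothetical shapes for a set_block and a list_op command. Field names are assumptions.
type SetBlockCommand = {
  type: "set_block";
  block_id: string;
  heading_level?: number; // e.g. 1-3 for H1/H2/H3
  align?: "left" | "center" | "right" | "justify";
  indent?: number;
};

type ListOpCommand = {
  type: "list_op";
  block_ids: string[];
  to: "bullet" | "ordered" | "none";
};

type BlockCommand = SetBlockCommand | ListOpCommand;

// Names the ProseMirror primitive the fix table pairs with each command.
function plannedPrimitive(cmd: BlockCommand): string {
  switch (cmd.type) {
    case "set_block":
      return "tr.setNodeMarkup";
    case "list_op":
      return cmd.to === "none" ? "liftListItem" : "wrapInList";
  }
}

// A command that would change nothing is rejected before it reaches a planner.
function isMeaningful(cmd: BlockCommand): boolean {
  if (cmd.type === "list_op") return cmd.block_ids.length > 0;
  return cmd.heading_level !== undefined || cmd.align !== undefined || cmd.indent !== undefined;
}
```

Keeping block-level changes in their own commands, separate from inline marks, mirrors the split between fixes #3 and #4 and lets each planner ship independently.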

9. Appendix — verification artifacts

Code-level evidence used in this RCA
Claim
  File:line

Edit DSL accepts only 5 command types
  apis/lawrence-api/src/content/vfs/schemas/matter-document-edit-schemas.ts:55-61
Yjs apply switches on 5 operation kinds only
  apis/lawrence-api/src/content/vfs/resolvers/yjs-apply.ts:101-172
replace = delete + insertText (loses marks)
  yjs-apply.ts:122-128
formatChange: null hardcoded
  apis/lawrence-api/src/content/vfs/resolvers/diff-suggestion-builder.ts:61
Only 3 planners registered
  apis/lawrence-api/src/content/vfs/resolvers/planners/index.ts:14-28
Tool description enumerates same 5 commands
  tools/platform-vfs/src/platform_vfs/editable_surfaces/matter_document_content.py:15-109
Editor mark supports formatChange.addedMarks / removedMarks
  @lawhive/ooxml-extensions ≥ 0.5.0 · dist/diff-suggestion-mark-*.js:116, 219, 235
Timestamp template
  agents/chat/src/chat/agent/agent.py:1024-1026, 1041-1043, 1070
Intent-classification not invoked in Lawrence-2 mode
  agent.py:1144 (requires not use_vfs_tools)
block_attrs not present anywhere
  grep -rn "block_attrs\|blockAttrs" · zero matches both repos