RCA · Lawrence-2 Test Session · aithrd_ayjzonowyr6zidr7 · companions: edit-DSL architecture · prompt edits

Document edit pipeline can't speak formatting.

Most failed tests reduce to one schema-shaped hole. The editor already supports format-only suggestions; the API just never sends them.

Date · 2026-05-11
Matter · mat_1dbulrsv13qdgr1a
Client · Sarah Henderson · FinPulse
Tester · Adolfo
Traces · 26
Files · 7
Notes · 2

Root cause · one sentence

The matter-document edit DSL accepts only edit | insert | delete | refine | withdraw — all text-shaped — so the agent has no way to express bold, italic, underline, font size, heading level, alignment, indent, or list conversion. The editor would accept format-only suggestions, but lawrence-api hardcodes formatChange: null.

Passed

Partial

Failed

Couldn't test

27 test scenarios + email-group (5 untestable in dev: no email integration set up).

1. What broke — at a glance

#	Scenario	Note
01	Populate placeholders	115s · 43 ops batched
02	Graceful gap handling	undefined marker note
03	Refine section	⚠ leaked block IDs
04	Bold	DSL has no marks
05	Italic	DSL has no marks
06	Underline	DSL has no marks
07	Multi-style (b+i)	blocked by 04–06
08	Remove formatting	also strips font
09	Font size	DSL has no marks
10	Heading style	no node-type change
11	List conversion	delete-then-insert
12	Alignment / indent	block_attrs invented
13	Research insert	typeface drift
14	Add new section	⚠ leaked block IDs
15	Preserve boilerplate	verified
16	Save & persist	verified
17	Letter from scratch	⚠ routed to email
18	Contract from scratch	182s · late card
19	Memo from scratch	H1/H2/H3 ok
20	Edit specific part	⚠ minor leak
21	Add doc header	formatting bad
22	Email format	no dev infra
23	Email recipient	no dev infra
24	Email edit body	no dev infra
25	Email subject	no dev infra
26	Email edit subject	no dev infra
27	Email format inline	no dev infra
28	Note creation	consistent format

2. The architectural wall

The pipeline from user message to editor crosses six boundaries. Three of them enforce the same five-command vocabulary. The final stage (the editor extension) already speaks formatting; it just never gets asked.

Red boundaries enforce a five-command, text-only vocabulary. The amber box is the half-built piece: the editor (green) already accepts formatChange.addedMarks / removedMarks, but lawrence-api never builds one.

3. The two lines of code that explain everything

What the edit DSL accepts today

apis/lawrence-api/src/content/vfs/schemas/matter-document-edit-schemas.ts:55-61

export const EditCommandSchema = z.discriminatedUnion("type", [
  EditCommandShape,      // type: "edit"
  InsertCommandShape,    // type: "insert"
  DeleteCommandShape,    // type: "delete"
  RefineCommandShape,    // type: "refine"
  WithdrawCommandShape,  // type: "withdraw"
])
// None of the shapes carry marks, attrs,
// heading level, alignment, indent or list ops.
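One way the missing variant could look, sketched in plain TypeScript rather than the zod schema the file actually uses. The `format` command name comes from the fix sequence in section 8; the field names (`find`, `addedMarks`, `removedMarks`), the `MarkName` list, and the `parseFormatCommand` helper are assumptions for illustration, not the real shapes.

```typescript
// Hypothetical sketch of a format-only command. Field names are assumptions;
// the real shapes live in matter-document-edit-schemas.ts and use zod.
type MarkName = "bold" | "italic" | "underline";

interface FormatCommand {
  type: "format";
  find: string; // text span to restyle (assumed addressing scheme)
  addedMarks: MarkName[];
  removedMarks: MarkName[];
}

const KNOWN_MARKS: readonly string[] = ["bold", "italic", "underline"];

function isMarkList(value: unknown): value is MarkName[] {
  return (
    Array.isArray(value) &&
    value.every((m) => typeof m === "string" && KNOWN_MARKS.indexOf(m) !== -1)
  );
}

// Accepts a command only if it changes at least one mark and does not
// add and remove the same mark. Returns null otherwise.
function parseFormatCommand(input: unknown): FormatCommand | null {
  if (typeof input !== "object" || input === null) return null;
  const raw = input as Record<string, unknown>;
  const find = raw["find"];
  const added = raw["addedMarks"];
  const removed = raw["removedMarks"];
  if (raw["type"] !== "format" || typeof find !== "string" || find.length === 0) return null;
  if (!isMarkList(added) || !isMarkList(removed)) return null;
  if (added.length + removed.length === 0) return null;
  if (added.some((m) => removed.indexOf(m) !== -1)) return null;
  return { type: "format", find, addedMarks: added, removedMarks: removed };
}
```

Rejecting empty and self-contradictory mark lists at the schema boundary would keep no-op suggestions from reaching the planner, whichever shape the real variant ends up taking.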

Half-built bridge to the editor

apis/lawrence-api/src/content/vfs/resolvers/diff-suggestion-builder.ts:48-62

  return {
    id: randomUUID(),
    originalText,
    suggestedText,
    comment: params.comment ?? null,
    revisionId: params.revisionId ?? null,
    author: params.author ?? AGENT_AUTHOR_NAME,
    date,
    source: "user",
    formatChange: null,   // ⚠ hardcoded
  }

The editor mark already supports formatChange: { addedMarks, removedMarks } (@lawhive/ooxml-extensions ≥ 0.5.0). Removing the hardcoded null + threading a real value through is the keystone fix.
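A minimal sketch of what "threading a real value through" might look like at the builder. The `{ addedMarks, removedMarks }` shape is taken from the sentence above; representing marks as plain name strings and the `buildFormatChange` helper are assumptions, and the real editor mark may expect richer objects.

```typescript
// Assumption: marks are identified by name strings; the real editor mark may
// expect richer objects (for example mark type plus attrs).
interface FormatChange {
  addedMarks: string[];
  removedMarks: string[];
}

// Hypothetical helper: returns null for text-only suggestions (today's
// behaviour) and a populated object when the command carries formatting.
function buildFormatChange(added: string[] = [], removed: string[] = []): FormatChange | null {
  const dedupe = (xs: string[]): string[] => xs.filter((x, i) => xs.indexOf(x) === i);
  const addedMarks = dedupe(added);
  // A mark listed on both sides is treated as "added"; drop it from removals.
  const removedMarks = dedupe(removed).filter((m) => addedMarks.indexOf(m) === -1);
  if (addedMarks.length === 0 && removedMarks.length === 0) return null;
  return { addedMarks, removedMarks };
}

// Format-only suggestion: the text is unchanged, only formatChange is set.
const suggestion = {
  originalText: "Agreement",
  suggestedText: "Agreement",
  formatChange: buildFormatChange(["bold"]),
};
```

Keeping `null` as the result for text-only edits means existing suggestions would be built exactly as they are today.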

4. Why "remove formatting" wiped the typeface

Today: `replace` is delete + insertText

yjs-apply.ts:122-128

case "replace": {
  const from = map(plan.operation.from)
  const to = map(plan.operation.to)
  const { newText, attrs } = plan.operation
  tr.delete(from, to)
  tr.insertText(newText, from)  // no marks
  const mark = diffSuggestionMarkType.create(attrs)
  tr.addMark(from, from + newText.length, mark)
}

Minimal patch: re-carry inherited marks

Half-day fix · same file

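case "replace": {
  const from = map(plan.operation.from)
  const to = map(plan.operation.to)
  const { newText, attrs } = plan.operation
  // Capture marks before deleting; afterwards the original text is gone.
  // Caveat: at a mark boundary, marks() reflects the node before `from`;
  // resolve(from).marksAcross(resolve(to)) reads the marks after `from` instead.
  const inherited = tr.doc.resolve(from).marks()
  tr.delete(from, to)
  tr.insertText(newText, from)
  // Re-apply the inherited marks (typeface, size, ...) to the inserted text.
  inherited.forEach(m => tr.addMark(from, from + newText.length, m))
  // Then layer the diff-suggestion mark on top, as before.
  const mark = diffSuggestionMarkType.create(attrs)
  tr.addMark(from, from + newText.length, mark)
}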
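The difference between the two versions can be illustrated with a toy model that does not use ProseMirror at all. Here a paragraph is a list of text runs carrying mark names; `Run`, `replaceRange`, and the mark names are hypothetical, and the real transaction API behaves differently in detail.

```typescript
// Toy model, not ProseMirror: a paragraph is a list of runs with mark names.
interface Run {
  text: string;
  marks: string[];
}

// Replaces characters [from, to) with newText. With inherit=false the new run
// carries only the suggestion mark (the behaviour described as "today");
// with inherit=true it also keeps the marks of the first replaced character.
function replaceRange(runs: Run[], from: number, to: number, newText: string, inherit: boolean): Run[] {
  const out: Run[] = [];
  let pos = 0;
  let inserted = false;
  for (const run of runs) {
    const start = pos;
    const len = run.text.length;
    const end = start + len;
    pos = end;
    const clamp = (n: number): number => Math.max(0, Math.min(len, n));
    const before = run.text.slice(0, clamp(from - start)); // part left of the range
    const after = run.text.slice(clamp(to - start)); // part right of the range
    if (before.length > 0) out.push({ text: before, marks: run.marks });
    if (!inserted && from < end) {
      // This run holds the first replaced character.
      const kept = inherit ? run.marks : [];
      out.push({ text: newText, marks: kept.concat(["diffSuggestion"]) });
      inserted = true;
    }
    if (after.length > 0) out.push({ text: after, marks: run.marks });
  }
  // Range starts at or past the end of the paragraph: append a bare run.
  if (!inserted) out.push({ text: newText, marks: ["diffSuggestion"] });
  return out;
}
```

In this model, replacing a run marked `["font"]` with `inherit` set to false yields a run carrying only `diffSuggestion`, which is the typeface loss the tester reported.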

5. The agent diagnosed itself, accurately

By the third or fourth failed attempt the agent reasoned its way to the gap. Direct quotes from Langfuse:

trace 6 · response to "does your edit dsl support formatting?"
"Looking honestly at the tool's capabilities: the edit tool for documents works by finding and replacing text, and applying marks to the replaced text requires the replace content to carry the formatting inline. Unfortunately, the current document edit surface doesn't expose a mechanism to apply a bold mark…"

trace 13 · response to "make section 6 list into a numbered list?"
"I can see the full bullet list structure. I can replace the entire bulletList block with an orderedList — let me do that now. The edit tool doesn't support replacing a block node with a different node type — the replace field only accepts a string, not a full ProseMirror node structure."

trace 17 · after testing block_attrs by user request
"Thanks for testing that — so block_attrs isn't honoured by the edit surface. Can do: find & replace text, insert new blocks, strip formatting. Cannot: apply inline marks (bold/italic/underline), apply font, font size, change indentation, convert list types."

6. Adjacent issues (independent fixes)

Timestamp echo — structural, not hallucinated

Every user message in all 26 traces arrived with an ISO-8601 microsecond prefix:

2026-05-11T17:21:32.163916: how about font size?

Template at agent.py:1024-1026; fallback at :1070 · model sees this as content and sometimes echoes it back (T12, T14).

Fix: wrap in an XML tag (<currentTimestamp>…</currentTimestamp>) and add a prompt guardrail.
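The wrapping step might look like the sketch below. It is written in TypeScript for consistency with the other examples, whereas the actual template lives in `agent.py`; `LEGACY_PREFIX` and `wrapTimestamp` are hypothetical names, and the regex assumes the six-digit microsecond format seen in the traces.

```typescript
// Sketch only: the real template lives in agent.py (Python); names are hypothetical.
// Matches "<ISO-8601 timestamp with microseconds>: <rest of message>".
const LEGACY_PREFIX = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}): ([\s\S]*)$/;

// Moves a leading timestamp prefix into a <currentTimestamp> tag so the model
// can tell metadata from user content. Messages without the prefix pass through.
function wrapTimestamp(message: string): string {
  const match = LEGACY_PREFIX.exec(message);
  if (match === null) return message;
  return `<currentTimestamp>${match[1]}</currentTimestamp>\n${match[2]}`;
}
```

The tag alone does not stop the model from echoing the value, which is why the fix pairs it with a prompt guardrail.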

Internals leak — pure prompt issue

Tool description tells the model to think in block_id / content_version_id. No prompt anywhere says "don't surface these to the user".

5 of 8 partials surfaced strings like "I have the current block IDs and positions. Now I'll propose…". Fix lives in Langfuse prompt agent/lawrence-2, not the code.

"Draft a letter" → email

Tool description for emails:// includes "Draft an email". For documents:// it includes "Draft a contract, demand letter". "draft" hits first.

Fix: tools/platform-vfs/.../create.py — explicit rule that any letter-format output is documents:// unless the user says email.
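The fix itself is a wording change in the tool description, but the intended rule can be pinned down as a predicate, for example as a regression check over trace messages. The `routeDraft` function and its regexes are hypothetical and deliberately naive.

```typescript
type Surface = "documents://" | "emails://";

// Hypothetical predicate for the proposed rule: drafting requests go to
// documents:// unless the user explicitly asks for an email.
function routeDraft(request: string): Surface {
  const saysEmail = /\be-?mail\b/i.test(request);
  // "not an email" mentions the word but asks for the opposite.
  const negated = /\bnot\s+an?\s+e-?mail\b/i.test(request);
  return saysEmail && !negated ? "emails://" : "documents://";
}
```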

Latency / streaming UX

Populate 115s · research 161s · contract 182s · memo 211s. Card renders after streaming completes, not at first edit op.

Fix: emit the draft-card event on the first edit command, not on completion. Consider micro-batching the 43-op populate.
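Both ideas can be sketched independently of the real streaming runtime, whose types are not shown in this report. The `StreamEvent` shapes, `earlyCardEmitter`, and `microBatch` are hypothetical, and the batch size in any real use would need measuring.

```typescript
// Hypothetical event shapes; the real streaming runtime's types are not shown here.
type StreamEvent =
  | { kind: "text"; value: string }
  | { kind: "edit"; op: number }
  | { kind: "draft-card" };

// Wraps a downstream emit function so that a single draft-card event is sent
// immediately before the first edit command, rather than after completion.
function earlyCardEmitter(emit: (e: StreamEvent) => void): (e: StreamEvent) => void {
  let cardSent = false;
  return (event: StreamEvent): void => {
    if (event.kind === "edit" && !cardSent) {
      cardSent = true;
      emit({ kind: "draft-card" });
    }
    emit(event);
  };
}

// Splits a long op list into fixed-size batches; the last batch may be shorter.
function microBatch<T>(ops: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < ops.length; i += size) {
    batches.push(ops.slice(i, i + size));
  }
  return batches;
}
```

With a batch size of 10, the 43-op populate would split into five batches, the last holding three ops.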

7. Trace timeline (26 turns · 31 minutes)

Turn	User message	Flag	Latency	Events
T1	"Hi Lawrence! can you tell me the high level of this matter"		18.7s	46
T2	"can you use my settlement agreement template and fill it in..."		115.3s	323
T3	"expand the return of property section"	⚠ internals leak	49.4s	142
T4	"in 3.1 can you make the word Agreement bold?"	markdown ** inserted as text	15.0s	38
T5	"why did you do it like markdown"		25.5s	87
T6	"does your edit dsl support formatting?"	agent self-diagnoses	8.7s	23
T7	"can you make it italic?"		3.6s	12
T8	"underlien?"		3.0s	10
T9	"remove formatting from PARTIES?"		3.3s	11
T10	"replace it with just PARTIES again?"		14.2s	52
T11	"it worked but also lost the typeface"		7.1s	17
T12	"how about font size?"	⚠ timestamp asked back	3.4s	12
T13	"can you make section 6 list into a numbered list?"		34.1s	96
T14	"can you make 6.2 into a numbered list?"	⚠ timestamp echoed	7.5s	13
T15	"just replace 6.2 from prose to a new block?"	delete+insert workaround	22.6s	81
T16	"do you have the ability to add indentation?"		28.6s	54
T17	"it didn't work unfortunately"	agent confirms block_attrs not supported	12.8s	41
T18	"research Edwards v Arthur Andersen + AMN v Aya, insert above 11.2"		161.2s	186
T19	"add new Section 17 Tax Indemnity"	⚠ internals leak	43.8s	121
T20	"draft a settlement letter to Andrew Larkin"	⚠ routed to email	89.8s	229
T21	"create it as a document not an email"		120.5s	294
T22	"draft a Consulting Agreement"		182.6s	460
T23	"draft a memo with H1/H2/H3 hierarchy"		211.0s	524
T24	"tighten the BATNA section, don't touch anything else"	⚠ minor leak	49.6s	103
T25	"add a firm letterhead + matter ref header"	⚠ internals leak	27.8s	56
T26	"create a note titled Counter-offer strategy"		48.1s	109

Bars scale to the slowest turn (T23 · 211s). Latency colour matches result; flags mark cross-cutting issues.

8. Fix sequence — by impact & effort

#	Fix	Detail	Files	Effort	Unlocks tests
1	Prompt-only quick wins	Hide internals · wrap timestamp · disambiguate "letter" · document edit-tool limits · clarify precedent flow → six concrete edits	Langfuse `agent/lawrence-2` v43 (dev) → promote · `agent.py` template fallback	XS	03 07 09 11 14 17 20 21
2	Preserve inherited marks on `replace`	Re-apply marks after `tr.insertText`	`yjs-apply.ts:122-128`	S · ½ day	08
3	Add `format` command end-to-end	Schema variant · planner · `add-format-mark` op · stop hardcoding `formatChange: null` · tool description → full architecture grounding	schemas + planners + yjs-apply + diff-suggestion-builder + matter_document_content.py	M · 3-5 d	04 05 06 07 09
4	Add `set_block` command	Heading level · alignment · indent via `tr.setNodeMarkup` → schema sketch	same files as #3	M · 3-5 d	10 12
5	List conversion	Via `wrapInList`/`liftListItem` → list_op · note `numId` footgun	prosemirror-schema-list-backed planner	M · 2-3 d	11
6	Latency / early card render	Emit draft-card on first edit op · micro-batch 43-op populate	streaming runtime · agent loop	M · independent track	01 18 22 23
7	Page break visible	Verify `pagination.ts` extension is loaded for matter docs	apps/legal-os editor wiring	XS	visual polish
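For fix #4, a rough idea of what a `set_block` command and its validation could look like. Only the command name and the three attribute kinds (heading level, alignment, indent) come from the table; the field names, the 1 to 6 heading range, and `validateSetBlock` are assumptions.

```typescript
// Hypothetical shape for the proposed set_block command. Attribute names and
// the 1-6 heading range are assumptions, not the real schema.
type Alignment = "left" | "center" | "right" | "justify";

interface SetBlockCommand {
  type: "set_block";
  blockId: string;
  headingLevel?: number; // omit to leave the node type unchanged
  alignment?: Alignment;
  indent?: number; // indent level, 0 = flush left
}

const isWholeNumber = (n: number): boolean => isFinite(n) && Math.floor(n) === n;

// Returns a list of problems; an empty list means the command is acceptable.
function validateSetBlock(cmd: SetBlockCommand): string[] {
  const problems: string[] = [];
  const { headingLevel, alignment, indent } = cmd;
  if (headingLevel === undefined && alignment === undefined && indent === undefined) {
    problems.push("command changes nothing");
  }
  if (headingLevel !== undefined && (!isWholeNumber(headingLevel) || headingLevel < 1 || headingLevel > 6)) {
    problems.push("headingLevel must be an integer from 1 to 6");
  }
  if (indent !== undefined && (!isWholeNumber(indent) || indent < 0)) {
    problems.push("indent must be a non-negative integer");
  }
  return problems;
}
```

Returning a list of problems rather than throwing would let the tool surface every issue to the agent in a single round trip.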

9. Appendix — verification artifacts

Code-level evidence used in this RCA

Claim	File:line
Edit DSL accepts only 5 command types	apis/lawrence-api/src/content/vfs/schemas/matter-document-edit-schemas.ts:55-61
Yjs apply switches on 5 operation kinds only	apis/lawrence-api/src/content/vfs/resolvers/yjs-apply.ts:101-172
`replace` = delete + insertText (loses marks)	yjs-apply.ts:122-128
`formatChange: null` hardcoded	apis/lawrence-api/src/content/vfs/resolvers/diff-suggestion-builder.ts:61
Only 3 planners registered	apis/lawrence-api/src/content/vfs/resolvers/planners/index.ts:14-28
Tool description enumerates same 5 commands	tools/platform-vfs/src/platform_vfs/editable_surfaces/matter_document_content.py:15-109
Editor mark supports `formatChange.addedMarks / removedMarks`	@lawhive/ooxml-extensions ≥ 0.5.0 · dist/diff-suggestion-mark-*.js:116, 219, 235
Timestamp template	agents/chat/src/chat/agent/agent.py:1024-1026, 1041-1043, 1070
Intent-classification not invoked in Lawrence-2 mode	agent.py:1144 (requires `not use_vfs_tools`)
`block_attrs` not present anywhere	grep -rn "block_attrs\|blockAttrs" · zero matches in both repos

Data artifacts

/tmp/langfuse-session.json — 26 traces · 45 MB
/tmp/langfuse-session-report.md — per-trace summary · 19 KB
/tmp/matter-dump.json — content-retrieval dev pull · 15 KB · 7 files · 2 notes · 0 artifacts
/tmp/lawrence2-rca-report.md — deep code-mapping report · 12 KB · all file:line refs
~/Downloads/Document Creat & Edit Test Sheet - Adolfo.pdf — original test sheet with status & screenshots

Open questions

Should the format command be its own variant, or should edit grow optional addedMarks / removedMarks? Subagent recommendation: new variant.
Where is the latency bottleneck on long-running populate (115s, 43 edits) — LLM round trip, applyPlansToYjsUpdate, or Y.js sync? Add a Langfuse latency breakdown observation before chasing.
Email-test gap: dev environment has no email integration; mock or wire before sprint 2?
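For the latency question, a breakdown could start with a generic phase timer like the one below. It does not call the Langfuse SDK; `createPhaseTimer` is a hypothetical helper, and exporting its totals to an observation is left out.

```typescript
// Generic phase timer. It does not call the Langfuse SDK; the clock is
// injectable so the timer can be exercised deterministically.
function createPhaseTimer(now: () => number = () => Date.now()) {
  const totals: Record<string, number> = {};
  return {
    // Starts timing a phase and returns a function that stops it.
    // Repeated phases accumulate into the same total (milliseconds).
    start(phase: string): () => void {
      const begun = now();
      return (): void => {
        totals[phase] = (totals[phase] ?? 0) + (now() - begun);
      };
    },
    // Name of the phase with the largest accumulated time, or null if none ran.
    slowest(): string | null {
      let worst: string | null = null;
      for (const phase of Object.keys(totals)) {
        if (worst === null || totals[phase] > totals[worst]) worst = phase;
      }
      return worst;
    },
    totals,
  };
}
```

Wrapping the LLM round trip, `applyPlansToYjsUpdate`, and the Y.js sync in `start(...)` / stop pairs would show which of the three dominates the populate before any optimisation is attempted.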