Most failed tests reduce to one schema-shaped hole. The editor already supports format-only suggestions; the API just never sends them.
The edit DSL has five commands (`edit` | `insert` | `delete` | `refine` | `withdraw`), all text-shaped, so the agent has no way to express bold, italic, underline, font size, heading level, alignment, indent, or list conversion. The editor would accept format-only suggestions, but lawrence-api hardcodes `formatChange: null`.
27 test scenarios plus the email group (5 scenarios, untestable in dev because no email integration is set up).
The pipeline from user message to editor crosses six boundaries between seven components. Three of those boundaries enforce the same five-command vocabulary. The seventh component, the editor extension, already speaks formatting; it just never gets asked.
The editor's diff-suggestion mark carries `formatChange.addedMarks` / `removedMarks`, but lawrence-api never builds one.
`apis/lawrence-api/src/content/vfs/schemas/matter-document-edit-schemas.ts:55-61`
```typescript
export const EditCommandSchema = z.discriminatedUnion("type", [
  EditCommandShape,     // type: "edit"
  InsertCommandShape,   // type: "insert"
  DeleteCommandShape,   // type: "delete"
  RefineCommandShape,   // type: "refine"
  WithdrawCommandShape, // type: "withdraw"
])
// None of the shapes carry marks, attrs,
// heading level, alignment, indent or list ops.
```
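If `format` becomes its own variant (the recommendation recorded in the open questions at the end of this report), its shape could look roughly like the sketch below. It uses plain TypeScript types and a hand-rolled validator rather than zod so it runs without dependencies. Every field except `type`, `addedMarks` and `removedMarks` is an assumption, as is the mark vocabulary.

```typescript
// Hypothetical "format" command variant. Only "type", "addedMarks" and
// "removedMarks" come from this report; the other fields are assumptions.
type TextCommandType = "edit" | "insert" | "delete" | "refine" | "withdraw";

interface MarkSpec {
  type: string; // e.g. "bold", "italic", "underline" (assumed mark names)
  attrs?: Record<string, unknown>; // e.g. a font size on an attr-bearing mark
}

interface FormatCommand {
  type: "format";
  blockId: string; // assumed: block containing the text to format
  target: string; // assumed: exact text span within that block
  addedMarks: MarkSpec[];
  removedMarks: MarkSpec[];
}

// The five existing variants, reduced to their discriminant for this sketch.
type EditCommand = { type: TextCommandType } | FormatCommand;

function isMarkSpec(value: unknown): value is MarkSpec {
  if (typeof value !== "object" || value === null) return false;
  const { type, attrs } = value as Record<string, unknown>;
  const attrsOk = attrs === undefined || (typeof attrs === "object" && attrs !== null);
  return typeof type === "string" && type.length > 0 && attrsOk;
}

// Returns the parsed command, or an error message the agent can act on.
function parseFormatCommand(input: unknown): FormatCommand | string {
  if (typeof input !== "object" || input === null) return "command must be an object";
  const { type, blockId, target, addedMarks = [], removedMarks = [] } =
    input as Record<string, unknown>;
  if (type !== "format") return 'type must be "format"';
  if (typeof blockId !== "string" || blockId === "") return "blockId is required";
  if (typeof target !== "string" || target === "") return "target text is required";
  if (!Array.isArray(addedMarks) || !addedMarks.every(isMarkSpec)) {
    return "addedMarks must be a list of marks";
  }
  if (!Array.isArray(removedMarks) || !removedMarks.every(isMarkSpec)) {
    return "removedMarks must be a list of marks";
  }
  if (addedMarks.length + removedMarks.length === 0) {
    return "a format command must add or remove at least one mark";
  }
  return { type: "format", blockId, target, addedMarks, removedMarks };
}

// Lets downstream planners branch on the new variant.
function isFormatCommand(cmd: EditCommand): cmd is FormatCommand {
  return cmd.type === "format";
}
```

Returning an error string instead of throwing keeps the failure readable for the agent, which is what let it self-diagnose the gap in the traces quoted later.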
apis/lawrence-api/src/content/vfs/resolvers/diff-suggestion-builder.ts:48-62
```typescript
return {
  id: randomUUID(),
  originalText,
  suggestedText,
  comment: params.comment ?? null,
  revisionId: params.revisionId ?? null,
  author: params.author ?? AGENT_AUTHOR_NAME,
  date,
  source: "user",
  formatChange: null, // ⚠ hardcoded
}
```
The editor mark already supports `formatChange: { addedMarks, removedMarks }` (@lawhive/ooxml-extensions ≥ 0.5.0). Removing the hardcoded `null` and threading a real value through is the keystone fix.
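A minimal sketch of that threading, assuming the planner hands the builder an optional `formatChange` shaped like the mark's `{ addedMarks, removedMarks }`. `BuildParams`, `SuggestionAttrs`, `normaliseFormatChange` and the `MarkSpec` shape are hypothetical, fields not relevant to formatting are left out, and the `id` is passed in (the real builder calls `randomUUID()`) so the example stays deterministic. For a format-only suggestion the sketch assumes `originalText` and `suggestedText` are identical.

```typescript
// Hypothetical sketch: pass a real formatChange through instead of hardcoding null.
interface MarkSpec {
  type: string;
  attrs?: Record<string, unknown>;
}

interface FormatChange {
  addedMarks: MarkSpec[];
  removedMarks: MarkSpec[];
}

interface BuildParams {
  originalText: string;
  suggestedText: string;
  comment?: string | null;
  formatChange?: FormatChange | null; // new and optional: absent for text edits
}

interface SuggestionAttrs {
  id: string;
  originalText: string;
  suggestedText: string;
  comment: string | null;
  formatChange: FormatChange | null;
}

// An empty change is treated as "no format change", so text-only
// suggestions keep producing formatChange: null exactly as today.
function normaliseFormatChange(change: FormatChange | null | undefined): FormatChange | null {
  if (!change) return null;
  if (change.addedMarks.length === 0 && change.removedMarks.length === 0) return null;
  return change;
}

function buildSuggestion(params: BuildParams, id: string): SuggestionAttrs {
  return {
    id, // injected here for determinism; the real builder uses randomUUID()
    originalText: params.originalText,
    suggestedText: params.suggestedText,
    comment: params.comment ?? null,
    formatChange: normaliseFormatChange(params.formatChange), // was: null
  };
}
```

Normalising an empty change to `null` keeps text-only suggestions identical to today's output, which limits the blast radius of the change.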
`replace` is `delete` + `insertText`
`yjs-apply.ts:122-128`
```typescript
case "replace": {
  const from = map(plan.operation.from)
  const to = map(plan.operation.to)
  const { newText, attrs } = plan.operation
  tr.delete(from, to)
  tr.insertText(newText, from) // no marks
  const mark = diffSuggestionMarkType.create(attrs)
  tr.addMark(from, from + newText.length, mark)
}
```
Half-day fix · same file
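```typescript
case "replace": {
  const from = map(plan.operation.from)
  const to = map(plan.operation.to)
  const { newText, attrs } = plan.operation
  // Capture the replaced text's marks BEFORE deleting it. marksAcross() reads the
  // node after `from`; plain marks() reads the node before `from` at a mid-paragraph
  // run boundary, so replacing a whole bold run would pick up the unbolded neighbour.
  // Fall back to marks() when nothing inline follows `from`.
  const $from = tr.doc.resolve(from)
  const inherited = $from.marksAcross(tr.doc.resolve(to)) ?? $from.marks()
  tr.delete(from, to)
  tr.insertText(newText, from)
  inherited.forEach(m => tr.addMark(from, from + newText.length, m))
  const mark = diffSuggestionMarkType.create(attrs)
  tr.addMark(from, from + newText.length, mark)
}
```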
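The capture-then-reapply idea is easy to check outside the editor. The toy model below is not ProseMirror (`Run`, `marksAt` and `replaceRange` are illustrative names): it stores a paragraph as text runs with mark names and replaces a range either the current way, where the new text gets no marks, or the fixed way, where the new text inherits the marks of the first replaced character, captured before the delete.

```typescript
// Toy model, not ProseMirror: a paragraph is a list of text runs with mark names.
interface Run {
  text: string;
  marks: string[];
}

// Marks of the character at offset `pos`, i.e. the first character being replaced.
function marksAt(runs: Run[], pos: number): string[] {
  let offset = 0;
  for (const run of runs) {
    if (pos < offset + run.text.length) return [...run.marks];
    offset += run.text.length;
  }
  return [];
}

// Replaces [from, to) with newText. When inheritMarks is true, the marks are
// captured before anything is removed.
function replaceRange(
  runs: Run[],
  from: number,
  to: number,
  newText: string,
  inheritMarks: boolean,
): Run[] {
  if (from > to) throw new RangeError("from must not exceed to");
  const marks = inheritMarks ? marksAt(runs, from) : []; // current path: no marks
  const out: Run[] = [];
  let offset = 0;
  let inserted = false;
  for (const run of runs) {
    const start = offset;
    const end = offset + run.text.length;
    offset = end;
    // Part of this run before `from`.
    if (start < from) out.push({ text: run.text.slice(0, Math.min(from, end) - start), marks: run.marks });
    // New text goes in once, at the first run that reaches `from`.
    if (!inserted && end >= from) {
      out.push({ text: newText, marks });
      inserted = true;
    }
    // Part of this run after `to`.
    if (end > to) out.push({ text: run.text.slice(Math.max(to, start) - start), marks: run.marks });
  }
  if (!inserted) out.push({ text: newText, marks });
  return out.filter((r) => r.text.length > 0);
}

const para: Run[] = [
  { text: "Term: ", marks: [] },
  { text: "Payment", marks: ["bold"] },
  { text: " due.", marks: [] },
];

const current = replaceRange(para, 6, 13, "Fees", false); // "Fees" loses bold
const fixed = replaceRange(para, 6, 13, "Fees", true); // "Fees" stays bold
```

The example deliberately replaces a whole bold run starting at a run boundary, the case where reading the neighbour's marks instead of the replaced text's marks goes wrong.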
By the third or fourth failed attempt the agent reasoned its way to the gap. Direct quotes from Langfuse:
trace 6 · response to "does your edit dsl support formatting?"

> "Looking honestly at the tool's capabilities: the edit tool for documents works by finding and replacing text, and applying marks to the replaced text requires the `replace` content to carry the formatting inline. Unfortunately, the current document edit surface doesn't expose a mechanism to apply a bold mark…"

trace 13 · response to "make section 6 list into a numbered list?"

> "I can see the full bullet list structure. I can replace the entire `bulletList` block with an `orderedList` — let me do that now. The edit tool doesn't support replacing a block node with a different node type — the `replace` field only accepts a string, not a full ProseMirror node structure."

trace 17 · after testing `block_attrs` by user request

> "Thanks for testing that — so `block_attrs` isn't honoured by the edit surface. Can do: find & replace text, insert new blocks, strip formatting. Cannot: apply inline marks (bold/italic/underline), apply font, font size, change indentation, convert list types."
Every user message in all 26 traces arrived with an ISO-8601 microsecond prefix:
2026-05-11T17:21:32.163916: how about font size?
The template is at agent.py:1024-1026, with a fallback at :1070. The model reads the prefix as message content and sometimes echoes it back (T12, T14).
Fix: wrap in an XML tag (<currentTimestamp>…</currentTimestamp>) and add a prompt guardrail.
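The wrapping is a one-line template change. It is shown here in TypeScript for consistency with the rest of this report, although the real template is Python in `agent.py`; `wrapUserMessage`, `hasBareTimestampPrefix` and the guardrail wording are hypothetical. The regex doubles as an eval check for replies that start by echoing the old bare prefix, as in T12 and T14.

```typescript
// Hypothetical helpers; the real template lives in agents/chat/src/chat/agent/agent.py.
function wrapUserMessage(timestamp: string, message: string): string {
  return `<currentTimestamp>${timestamp}</currentTimestamp>\n${message}`;
}

// Example guardrail wording for the system prompt (an assumption, not the live prompt).
const TIMESTAMP_GUARDRAIL =
  "The <currentTimestamp> tag is metadata for your reference. Never repeat it to the user.";

// Matches the current bare prefix, e.g. "2026-05-11T17:21:32.163916: ".
const BARE_TIMESTAMP_PREFIX = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}: /;

// Eval check: flags text that starts with the old bare prefix.
function hasBareTimestampPrefix(text: string): boolean {
  return BARE_TIMESTAMP_PREFIX.test(text);
}
```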
The tool description tells the model to think in block_id / content_version_id, and no prompt anywhere says "don't surface these to the user".
In 5 of the 8 partial results, the agent surfaced strings like "I have the current block IDs and positions. Now I'll propose…". The fix lives in the Langfuse prompt agent/lawrence-2, not in the code.
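The emails:// tool description includes "Draft an email", while the documents:// description includes "Draft a contract, demand letter". For a letter request, "draft" matches the emails:// description first, so the letter can end up routed to email.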
Fix: add an explicit rule in tools/platform-vfs/.../create.py that any letter-format output goes to documents:// unless the user says email.
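The fix as described is a rule in `create.py`; written as a function, the same rule also makes a cheap eval assertion for the letter-drafting tests. `expectedSurfaceForLetter` is hypothetical, and its keyword match on "email" is a deliberate simplification.

```typescript
// Illustrative only: encodes the routing rule for use in eval checks.
type Surface = "documents://" | "emails://";

// Letter-format output goes to documents:// unless the user explicitly says email.
// A plain keyword check: "email", "emails" and "e-mail" all count.
function expectedSurfaceForLetter(userMessage: string): Surface {
  return /\be-?mails?\b/i.test(userMessage) ? "emails://" : "documents://";
}
```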
Turn latencies: populate 115 s · research 161 s · contract 182 s · memo 211 s. The draft card renders only after streaming completes, not at the first edit op.
Fix: emit the draft-card event on the first edit command, not on completion. Consider micro-batching the 43-op populate.
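A sketch of both changes together, assuming hypothetical event names (`draft-card`, `edit-batch`, `done`) and ops represented as plain strings: the card is emitted on the first op rather than on completion, and ops are flushed in fixed-size micro-batches instead of one at a time.

```typescript
// Hypothetical event shapes; the real streaming runtime's names will differ.
type AgentEvent =
  | { kind: "draft-card"; documentId: string }
  | { kind: "edit-batch"; ops: string[] }
  | { kind: "done" };

// Emits the draft card as soon as the first edit op arrives, then flushes
// ops in micro-batches of `batchSize` instead of one event per op.
function streamEdits(
  documentId: string,
  ops: readonly string[],
  batchSize: number,
  emit: (event: AgentEvent) => void,
): void {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  let batch: string[] = [];
  let cardShown = false;
  for (const op of ops) {
    if (!cardShown) {
      emit({ kind: "draft-card", documentId }); // first edit op, not completion
      cardShown = true;
    }
    batch.push(op);
    if (batch.length === batchSize) {
      emit({ kind: "edit-batch", ops: batch });
      batch = [];
    }
  }
  if (batch.length > 0) emit({ kind: "edit-batch", ops: batch }); // trailing partial batch
  emit({ kind: "done" });
}
```

With a batch size of 10, the 43-op populate becomes one card event, five batch events (10, 10, 10, 10, 3) and a `done`. A real streaming runtime would also flush a partial batch on a short timer so a slow model stream does not hold edits back.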
[Chart: per-turn latency. Bars scale to the slowest turn (T23 · 211 s); colour shows the test result and flags mark cross-cutting issues.]
| # | Fix | Files | Effort | Unlocks tests |
|---|---|---|---|---|
| 1 | Prompt-only quick wins: hide internals · wrap timestamp · disambiguate "letter" · document edit-tool limits · clarify precedent flow → six concrete edits | Langfuse `agent/lawrence-2` v43 (dev) → promote · `agent.py` template fallback | XS | 03 07 09 11 14 17 20 21 |
| 2 | Preserve inherited marks on `replace`: re-apply marks after `tr.insertText` | `yjs-apply.ts:122-128` | S · ½ day | 08 |
| 3 | Add `format` command end-to-end: schema variant · planner · add-format-mark op · stop hardcoding `formatChange: null` · tool description → full architecture grounding | schemas + planners + `yjs-apply` + `diff-suggestion-builder` + `matter_document_content.py` | M · 3-5 d | 04 05 06 07 09 |
| 4 | Add `set_block` command: heading level · alignment · indent via `tr.setNodeMarkup` → schema sketch | same files as #3 | M · 3-5 d | 10 12 |
| 5 | List conversion via `wrapInList` / `liftListItem` → `list_op` · note `numId` footgun | `prosemirror-schema-list`-backed planner | M · 2-3 d | 11 |
| 6 | Latency / early card render: emit draft-card on first edit op · micro-batch 43-op populate | streaming runtime · agent loop | M · independent track | 01 18 22 23 |
| 7 | Visible page breaks: verify the `pagination.ts` extension is loaded for matter docs | `apps/legal-os` editor wiring | XS | visual polish |
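A possible shape for row #4's `set_block` command, as a sketch only: the field names and the `level` / `textAlign` / `indent` attribute names are assumptions about the matter-document schema. One detail worth encoding in the planner is that ProseMirror's `tr.setNodeMarkup` builds the new node from the attrs it is given, with omitted attrs falling back to their schema defaults, so the requested change should be merged over the node's existing attrs. Turning a paragraph into a heading additionally means passing the heading node type.

```typescript
// Hypothetical set_block command; field and attr names are assumptions.
type Alignment = "left" | "center" | "right" | "justify";

interface SetBlockCommand {
  type: "set_block";
  blockId: string;
  headingLevel?: 1 | 2 | 3 | 4 | 5 | 6; // omit to leave unchanged
  alignment?: Alignment;
  indent?: number; // assumed unit: indent steps, not points
}

// Returns human-readable problems; an empty list means the command is usable.
function validateSetBlock(cmd: SetBlockCommand): string[] {
  const errors: string[] = [];
  if (cmd.headingLevel === undefined && cmd.alignment === undefined && cmd.indent === undefined) {
    errors.push("set_block must change headingLevel, alignment or indent");
  }
  if (cmd.indent !== undefined && (!Number.isInteger(cmd.indent) || cmd.indent < 0)) {
    errors.push("indent must be a non-negative integer");
  }
  return errors;
}

// setNodeMarkup uses only the attrs it is given (others revert to schema
// defaults), so merge the requested changes over the existing attrs first.
function mergeBlockAttrs(
  existing: Record<string, unknown>,
  cmd: SetBlockCommand,
): Record<string, unknown> {
  const next: Record<string, unknown> = { ...existing };
  if (cmd.headingLevel !== undefined) next.level = cmd.headingLevel; // assumed attr name
  if (cmd.alignment !== undefined) next.textAlign = cmd.alignment; // assumed attr name
  if (cmd.indent !== undefined) next.indent = cmd.indent; // assumed attr name
  return next;
}
```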

| Claim | File:line |
|---|---|
| Edit DSL accepts only 5 command types | `apis/lawrence-api/src/content/vfs/schemas/matter-document-edit-schemas.ts:55-61` |
| Yjs apply switches on 5 operation kinds only | `apis/lawrence-api/src/content/vfs/resolvers/yjs-apply.ts:101-172` |
| `replace` = `delete` + `insertText` (loses marks) | `yjs-apply.ts:122-128` |
| `formatChange: null` hardcoded | `apis/lawrence-api/src/content/vfs/resolvers/diff-suggestion-builder.ts:61` |
| Only 3 planners registered | `apis/lawrence-api/src/content/vfs/resolvers/planners/index.ts:14-28` |
| Tool description enumerates the same 5 commands | `tools/platform-vfs/src/platform_vfs/editable_surfaces/matter_document_content.py:15-109` |
| Editor mark supports `formatChange.addedMarks` / `removedMarks` | `@lawhive/ooxml-extensions` ≥ 0.5.0 · `dist/diff-suggestion-mark-*.js:116, 219, 235` |
| Timestamp template | `agents/chat/src/chat/agent/agent.py:1024-1026, 1041-1043, 1070` |
| Intent classification not invoked in Lawrence-2 mode | `agent.py:1144` (requires `not use_vfs_tools`) |
| `block_attrs` not present anywhere | grep -rn "block_attrs\|blockAttrs" · zero matches in both repos |

Source files:

- `/tmp/langfuse-session.json`: 26 traces · 45 MB
- `/tmp/langfuse-session-report.md`: per-trace summary · 19 KB
- `/tmp/matter-dump.json`: content-retrieval dev pull · 15 KB · 7 files · 2 notes · 0 artifacts
- `/tmp/lawrence2-rca-report.md`: deep code-mapping report · 12 KB · all file:line refs
- `~/Downloads/Document Creat & Edit Test Sheet - Adolfo.pdf`: original test sheet with status & screenshots

Open questions:

- Should the `format` command be its own variant, or should `edit` grow optional `addedMarks` / `removedMarks`? Subagent recommendation: new variant.
- …`applyPlansToYjsUpdate`, or Y.js sync? Add a Langfuse latency breakdown observation before chasing.