Supporting diagrams and complex documents #1566

skhell · 2026-02-22T15:34:30Z

Ciao,
this is a brainstorm proposal thread. If agreed, concrete work would track under #171 (or we can open a dedicated tracking issue if preferred).

Problem

Diagrams are often the real source of truth for things like network topologies, dependencies, and entity relationships. They range from a sketch made in Microsoft Paint, or a whiteboard photo nested in a PDF or other document, to complex drawings built in Visio or other platforms. Sometimes they are standalone files, sometimes they are embedded as images inside architecture/ARM-style documents, but either way they're difficult to:

index and search
diff/review
feed into RAG/LLM pipelines reliably

I see this repeatedly in large SharePoint environments with hundreds of thousands of files: even assistants (e.g. Copilot) can misinterpret or miss key relationships when diagrams are only available as raster images. It also makes little sense to process documents of 100+ pages if they are not properly formatted: that means huge token consumption, and the model will probably end up ignoring what matters or hallucinating. The same happens with on-prem LLM pipelines.

Exporting diagrams as images is usually not enough: text is extremely small in complex wide architectures, connectors/waypoints get lost, and raster exports can quickly become unreadable.

Proposal (MVP 1): Visio .vsdx to Markdown extraction

Start with Visio .vsdx (OpenXML) to Markdown extraction:

Page names/titles (and basic page properties if useful as headings/metadata)
Visible shape text (labels)
Simple, stable, diff-friendly Markdown structure

Example output sketch:

# <document>
## Page: <name>
- <shape label>
(optional) group by layers/containers if reliable
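To make the output sketch above a bit more concrete, here is a minimal rendering function. It assumes the page names and shape labels have already been extracted; the name `render_markdown` and the whitespace normalization are my own assumptions, not something the proposal fixes.

```python
from typing import Iterable, Mapping


def render_markdown(title: str, pages: Mapping[str, Iterable[str]]) -> str:
    """Render page names and shape labels as a small Markdown outline.

    `pages` maps a page name to its shape labels in document order.
    Order is preserved as given, so identical input yields identical output.
    """
    lines = [f"# {title}"]
    for page_name, labels in pages.items():
        lines.append(f"## Page: {page_name}")
        for label in labels:
            # Collapse internal whitespace so multi-line labels stay on one bullet.
            text = " ".join(label.split())
            if text:  # skip shapes with no visible text
                lines.append(f"- {text}")
    return "\n".join(lines) + "\n"
```

For example, `render_markdown("network.vsdx", {"Core": ["Firewall", "Core\nSwitch"]})` gives a heading for the document, a `## Page: Core` heading, and one bullet per label.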

Goal (MVP 2): Connectors/waypoints to Mermaid (best-effort)

Attempt to represent nodes/edges as a Mermaid graph when relationships can be extracted reliably
Keep this best-effort/optional because connector semantics vary and stability matters
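A rough sketch of what the best-effort Mermaid step could look like for a single page. It assumes each connector shape contributes two `Connect` rows (`FromCell="BeginX"` and `FromCell="EndX"`), which is the common case but not guaranteed; connectors with only one glued end are skipped. The function name `page_to_mermaid` and the `n<ID>` node naming are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://schemas.microsoft.com/office/visio/2012/main"}


def _label(shape: ET.Element) -> str:
    """Visible text of a shape, whitespace-normalized ('' if none)."""
    text_el = shape.find("v:Text", NS)
    if text_el is None:
        return ""
    return " ".join("".join(text_el.itertext()).split())


def page_to_mermaid(page_xml: str) -> str:
    """Best-effort: turn one page's <Connect> rows into a Mermaid graph."""
    root = ET.fromstring(page_xml)

    # Shape ID -> label (top-level shapes only in this sketch).
    labels = {s.get("ID"): _label(s) for s in root.findall("v:Shapes/v:Shape", NS)}

    # Connector ID -> {"BeginX": source shape ID, "EndX": target shape ID}.
    ends = {}
    for conn in root.findall("v:Connects/v:Connect", NS):
        ends.setdefault(conn.get("FromSheet"), {})[conn.get("FromCell")] = conn.get("ToSheet")

    # Keep only connectors glued at both ends; sort for stable output.
    edges = sorted(
        {(c["BeginX"], c["EndX"]) for c in ends.values() if c.get("BeginX") and c.get("EndX")},
        key=lambda e: (int(e[0]), int(e[1])),
    )
    node_ids = sorted({n for edge in edges for n in edge}, key=int)

    lines = ["graph LR"]
    for node_id in node_ids:
        # Mermaid uses #quot; for a double quote inside a quoted label.
        label = (labels.get(node_id) or f"shape {node_id}").replace('"', "#quot;")
        lines.append(f'    n{node_id}["{label}"]')
    for src, dst in edges:
        lines.append(f"    n{src} --> n{dst}")
    return "\n".join(lines) + "\n"
```

Treating begin-to-end as the arrow direction is itself an assumption; many diagrams draw connectors without meaning a direction, which is one more reason to keep this step optional.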

Implementation approach

I see this as a VsdxConverter using safe OpenXML parsing. What do you think?
Keep dependencies optional, or at least to a minimum
Favor automated processing of multiple documents
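One possible reading of "safe OpenXML parsing", sketched with the standard library only: cap the size of each part before parsing and refuse parts that carry a DTD. The helper name `read_xml_part` and the 20 MiB limit are hypothetical, and the byte-level DTD check assumes UTF-8 encoded parts.

```python
import zipfile
import xml.etree.ElementTree as ET

MAX_PART_BYTES = 20 * 1024 * 1024  # hypothetical per-part limit


def read_xml_part(
    archive: zipfile.ZipFile, name: str, max_bytes: int = MAX_PART_BYTES
) -> ET.Element:
    """Read and parse one XML part with basic guards against hostile input."""
    info = archive.getinfo(name)  # raises KeyError if the part is missing
    if info.file_size > max_bytes:
        raise ValueError(f"{name}: declared size {info.file_size} exceeds limit")
    with archive.open(info) as fh:
        # Read one byte past the limit so a lying header is still caught.
        data = fh.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError(f"{name}: decompressed data exceeds limit")
    # Visio parts are not expected to declare DTDs or entities, so reject them
    # outright (assumes UTF-8; a UTF-16 part would slip past this check).
    if b"<!DOCTYPE" in data or b"<!ENTITY" in data:
        raise ValueError(f"{name}: DTD/entity declarations are not allowed")
    return ET.fromstring(data)
```

A hardened parser such as defusedxml could replace the manual DTD check, at the cost of one more optional dependency.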

Acceptance criteria

Works cross-platform (Linux/macOS/Windows) without requiring Office
Includes tests with a small .vsdx fixture
Produces stable Markdown suitable for search/RAG and code review
Must be designed to leave room for other formats (Draw.io, Lucidchart, Edraw, etc.)
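To illustrate the last criterion, here is one way to keep the door open for other formats: a small registry keyed by file extension. This is only a sketch; `register`, `convert`, and the placeholder `convert_vsdx` are hypothetical names and not the project's actual converter interface.

```python
from pathlib import Path
from typing import Callable, Dict

# A converter takes raw file bytes and returns Markdown text.
Converter = Callable[[bytes], str]

_CONVERTERS: Dict[str, Converter] = {}


def register(extension: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter for one file extension."""
    def decorator(func: Converter) -> Converter:
        _CONVERTERS[extension.lower()] = func
        return func
    return decorator


@register(".vsdx")
def convert_vsdx(data: bytes) -> str:
    # Placeholder body: the real .vsdx extraction would go here.
    return "# (vsdx placeholder)\n"


def convert(path: str, data: bytes) -> str:
    """Dispatch to the converter registered for the file's extension."""
    ext = Path(path).suffix.lower()
    try:
        converter = _CONVERTERS[ext]
    except KeyError:
        raise ValueError(f"no diagram converter registered for {ext!r}") from None
    return converter(data)
```

Adding Draw.io support later would then mean registering one more function for `.drawio`, without touching the Visio code path.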

Let me know, Tia Zanella

boy397 · 2026-03-03T18:35:18Z

Python: There is no perfect off-the-shelf "Visio-to-Markdown" library. However, the vsdx library allows you to read and navigate Visio files. For raw extraction, developers typically use standard libraries: zipfile (to open the container) combined with lxml or xml.etree.ElementTree to parse the XML.
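A minimal sketch of the raw-extraction route with only `zipfile` and `xml.etree.ElementTree`. It assumes page parts follow the usual `visio/pages/page<N>.xml` naming and only picks up text stored on the page itself (text inherited from a master shape is not resolved). `extract_shape_text` is a hypothetical name.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

VISIO_NS = "{http://schemas.microsoft.com/office/visio/2012/main}"
PAGE_PART = re.compile(r"^visio/pages/page(\d+)\.xml$")


def extract_shape_text(vsdx_bytes: bytes) -> dict[str, list[str]]:
    """Return {page part name: [shape text, ...]} using only the stdlib."""
    result: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(vsdx_bytes)) as archive:
        # Sort numerically so page10.xml comes after page2.xml.
        parts = []
        for name in archive.namelist():
            match = PAGE_PART.match(name)
            if match:
                parts.append((int(match.group(1)), name))
        for _, name in sorted(parts):
            root = ET.fromstring(archive.read(name))
            texts = []
            # iter() walks all descendants, so shapes nested in groups are included.
            for text_el in root.iter(VISIO_NS + "Text"):
                # itertext() also picks up text that follows inline child elements.
                text = " ".join("".join(text_el.itertext()).split())
                if text:
                    texts.append(text)
            result[name] = texts
    return result
```

The whole archive is handled in memory, so nothing is unpacked to disk.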

TypeScript: The ecosystem is similar. There isn't a dedicated, robust .vsdx parser. You would rely on jszip to extract the archive in memory, and an XML parser like fast-xml-parser or xmldom to navigate the nodes.

boy397 · 2026-03-03T18:40:28Z

These are the steps you can follow:

Initialization: Parse CLI flags (e.g., input path, output format, recursive directory scan) using the language's standard flag library.

Decompression: Open the .vsdx file as a ZIP archive in memory (avoiding writing temp files to the disk).

Manifest Parsing: Read [Content_Types].xml and the _rels/.rels files to locate the main document and page structures.

Page Extraction: Navigate to the visio/pages/ directory inside the archive to find the individual page*.xml files.

Node/Text Parsing: Iterate through each page XML. Extract the <Shape> elements (to grab <Text>) and the <Connect> elements (to map relationships).

Formatting & Output: Transform the extracted data into Markdown/Mermaid syntax and print it directly to stdout.

Does this make sense?
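As a sketch of the first four steps (flags, in-memory ZIP, relationship lookup, page discovery), the following pairs each page name from `pages.xml` with its page part by following `pages.xml.rels`. It assumes the conventional `visio/pages/pages.xml` location instead of walking `_rels/.rels` from the package root, and `list_pages` / `main` are hypothetical names.

```python
import argparse
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

V = "{http://schemas.microsoft.com/office/visio/2012/main}"
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"
PAGES_DIR = "visio/pages"  # assumed conventional location of pages.xml


def list_pages(vsdx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (page name, page part path) pairs in pages.xml order."""
    with zipfile.ZipFile(io.BytesIO(vsdx_bytes)) as archive:
        rels = ET.fromstring(archive.read(f"{PAGES_DIR}/_rels/pages.xml.rels"))
        targets = {r.get("Id"): r.get("Target") for r in rels.iter(REL)}
        pages = ET.fromstring(archive.read(f"{PAGES_DIR}/pages.xml"))

    out = []
    for page in pages.iter(V + "Page"):
        rel = page.find(V + "Rel")
        target = targets.get(rel.get(R_ID)) if rel is not None else None
        if target is None:
            continue  # page without a resolvable part: skip it
        # Targets are usually relative to pages.xml, but may be package-absolute.
        part = posixpath.normpath(posixpath.join(PAGES_DIR, target)).lstrip("/")
        out.append((page.get("Name") or page.get("NameU") or "", part))
    return out


def main(argv=None) -> int:
    """Print a Markdown heading per file and per page to stdout."""
    parser = argparse.ArgumentParser(description="List the pages of .vsdx files")
    parser.add_argument("inputs", nargs="+", help=".vsdx files to read")
    args = parser.parse_args(argv)
    for path in args.inputs:
        with open(path, "rb") as fh:
            data = fh.read()
        print(f"# {path}")
        for name, _part in list_pages(data):
            print(f"## Page: {name}")
    return 0
```

Calling `main(["diagram.vsdx"])` (a made-up file name) prints the outline; shape text and connectors would be read from each returned part in the later steps.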

aniruddhaadak80 · 2026-03-09T22:51:36Z

From my point of view, this is a strong direction because diagrams often carry the highest-value relationships while remaining almost unusable for search and LLM pipelines once flattened into images. Starting with stable text and shape extraction before ambitious graph reconstruction feels like the right order.

If Mermaid stays best-effort and the Markdown output remains deterministic for diffing and indexing, that would already unlock a lot of value for real document corpora.
