Pete’s PDF to MD: Turning PDFs into AI-Ready Markdown

Screenshot of the “Pete’s PDF to MD” desktop application with the subtitle “Outline + section workflow for AI-ready markdown.” Along the top are buttons: “Select PDF,” “Up to Date,” “Open Output Folder,” “About,” and “Support.”

The input path shows: D:\My Documents\Downloads\Science SNC1W.pdf. The output root is set to: D:\My Documents\Pete's PDF to MD Output, with a “Change” button and a checkbox labeled “Include section info in files.” Status reads “Ready.” A note warns: “PDF structure varies by source. Heading detection and section text extraction may not be 100% accurate; please review output before publishing or automation.”

The interface is split into three columns:

Left panel titled “Outline Preview” shows a Markdown-style outline beginning with “# Outline” and nested headings such as “L3 p1 [THE ONTARIO CURRICULUM],” “L1 p1 [Science],” and entries with file paths like sections/1.0-science.md including line and character counts.

Middle panel titled “Sections” lists clickable section titles including “Secondary,” “Reporting Student Achievement,” “Elementary,” and “The Achievement Chart for Science and Technology, Grades 1–8,” which is highlighted.

Right panel titled “Section Content” displays the selected section rendered as Markdown. A checkbox “Render Markdown” is checked. The visible heading reads “The Achievement Chart for Science and Technology, Grades 1–8,” followed by descriptive text and a table with columns “Categories,” “Level 1,” “Level 2,” “Level 3,” and “Level 4,” describing levels of student achievement. Scrollbars are visible in each panel.

I just released version 0.6.0 of a little open-source tool I’ve been working on: Pete’s PDF to MD.

It converts PDFs into Markdown that actually works well with AI tools.

You can grab it here:
https://github.com/pbeens/Pete-s-PDF-to-MD

Why I Built It

I kept running into the same issue.

Uploading a PDF to ChatGPT, Gemini, Copilot, or NotebookLM means handing over something designed for print layout, not structured text. Headings aren’t always real headings. Text flow can break. Multi-column layouts get weird fast.

Markdown is clean. Headings are explicit. Sections are predictable.

So I started converting PDFs to Markdown before feeding them into AI. Eventually I turned that into a proper tool.

What It Does Right Now (v0.6.0)

The main focus at this stage is extracting a reliable heading outline and splitting the document into logical sections.

It:

  • Pulls the embedded outline from the PDF
  • Generates an outline.md
  • Creates structured section files
  • Outputs merged Markdown per heading
  • Organizes everything into a clean output folder

Under the hood it uses Node.js and Python with PyMuPDF (fitz).
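To give a feel for the outline step, here is a minimal sketch of how an embedded outline could be turned into a Markdown outline. This is not the tool's actual code: the function names `outline_to_markdown` and `load_toc` and the bullet layout are assumptions made for illustration. The only library call used is PyMuPDF's `Document.get_toc()`, which returns entries of the form `[level, title, page]`.

```python
from typing import Iterable, List, Sequence


def outline_to_markdown(toc: Iterable[Sequence]) -> str:
    """Render [level, title, page] entries as a nested Markdown list.

    The layout (two-space indent per level, page in parentheses) is an
    illustrative choice, not the format the tool itself writes.
    """
    lines: List[str] = ["# Outline", ""]
    for level, title, page in toc:
        # Level 1 gets no indent; each deeper level adds two spaces.
        indent = "  " * (max(int(level), 1) - 1)
        lines.append(f"{indent}- {title.strip()} (p. {page})")
    return "\n".join(lines) + "\n"


def load_toc(pdf_path: str) -> list:
    """Read the embedded outline with PyMuPDF (needs `pip install pymupdf`)."""
    import fitz  # imported lazily so outline_to_markdown works without PyMuPDF

    doc = fitz.open(pdf_path)
    try:
        # Returns [[level, title, page], ...]; an empty list if the PDF
        # has no embedded outline.
        return doc.get_toc()
    finally:
        doc.close()
```

With PyMuPDF installed, something like `print(outline_to_markdown(load_toc("Science SNC1W.pdf")))` would print the nested outline, or just the `# Outline` header if the PDF has no embedded outline.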

There are prebuilt macOS and Windows versions under the Releases section of the repo (look on the right side of the GitHub page). If you prefer building from source, the README walks through that step by step.

The GUI (And the Right-Click Features I Actually Use)

There’s a minimal Electron desktop app wrapped around the conversion pipeline.

You can:

  • Select or drag-and-drop a PDF
  • Choose your output folder
  • Run conversion
  • Preview the generated outline
  • Browse individual sections

For each generated section, you can right-click and:

  • Open in default program (I recommend VS Code if you don’t already have a Markdown editor)
  • Open in folder
  • Copy the full file path

That last one is surprisingly useful when you’re scripting, feeding specific sections into an LLM, or referencing files elsewhere.
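As one example of that scripting use, the sketch below takes a few copied section paths and stitches the files into a single prompt. The helper name `build_prompt`, the source comment marker, and the `---` separator are all assumptions for illustration; nothing here is part of the tool.

```python
from pathlib import Path
from typing import Iterable


def build_prompt(question: str, section_paths: Iterable[str]) -> str:
    """Concatenate section files and append a question for an LLM."""
    parts = []
    for raw in section_paths:
        path = Path(raw)
        text = path.read_text(encoding="utf-8").strip()
        # Label each chunk with its file name so answers can be traced back.
        parts.append(f"<!-- source: {path.name} -->\n{text}")
    context = "\n\n".join(parts)
    return f"{context}\n\n---\n\n{question}\n"
```

Paste the paths you copied from the right-click menu into the list, and send the returned string to whichever model you use.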

It’s intentionally lightweight. The goal was to make the workflow smoother without turning it into a giant desktop app.

A Quick Reality Check About PDFs

PDFs vary wildly.

If the file has a proper embedded outline or table of contents, results are usually very good.

If it’s poorly structured, text flow and headings won’t always be perfect.

If it’s a scanned document, it needs OCR first. That part isn’t highlighted enough in the README yet, but it matters. No embedded text means there’s nothing clean to extract.
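If you want to check a PDF before converting it, a rough heuristic is to see how much embedded text each page carries. The sketch below is an assumption-laden illustration, not something the tool does: the names `needs_ocr` and `extract_page_texts` and the 20-characters-per-page threshold are made up for this example. The PyMuPDF call it relies on is `Page.get_text()`.

```python
from typing import Iterable


def needs_ocr(page_texts: Iterable[str], min_chars_per_page: int = 20) -> bool:
    """Return True when the pages carry almost no embedded text.

    The threshold is an arbitrary illustrative value; tune it for your PDFs.
    """
    texts = list(page_texts)
    if not texts:
        return True  # no pages (or nothing extracted) means nothing to work with
    total = sum(len(t.strip()) for t in texts)
    return total < min_chars_per_page * len(texts)


def extract_page_texts(pdf_path: str) -> list:
    """Collect the embedded text of every page (needs `pip install pymupdf`)."""
    import fitz  # imported lazily so needs_ocr works without PyMuPDF

    doc = fitz.open(pdf_path)
    try:
        return [page.get_text() for page in doc]
    finally:
        doc.close()
```

If `needs_ocr(extract_page_texts(path))` comes back `True`, run the file through an OCR tool first and convert the OCR'd copy instead.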

This tool works best when the input PDF has some structure to work with.

What’s Coming Next

Right now sections are split strictly based on headings.

I’m planning to add smarter merging of sections. Some PDFs generate tons of tiny fragments that aren’t ideal for AI workflows. Being able to combine logical chunks will make the output more practical when working with LLMs.
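The merging feature is not built yet, so the following is only one possible approach, sketched under simple assumptions: sections arrive in document order as `(title, body)` pairs, and any chunk still shorter than `min_chars` absorbs the section that follows it. The function name, the threshold, and the `##` sub-heading used for folded-in sections are all hypothetical.

```python
from typing import List, Tuple

Section = Tuple[str, str]  # (title, body)


def merge_small_sections(sections: List[Section], min_chars: int = 500) -> List[Section]:
    """Greedily fold each section into the previous chunk while that chunk is small."""
    merged: List[Section] = []
    for title, body in sections:
        if merged and len(merged[-1][1]) < min_chars:
            # The previous chunk is still too short: append this section to it,
            # keeping its title as a sub-heading so the structure is not lost.
            prev_title, prev_body = merged[-1]
            merged[-1] = (prev_title, f"{prev_body}\n\n## {title}\n\n{body}")
        else:
            merged.append((title, body))
    return merged
```

A greedy pass like this ignores heading levels, so it can merge across chapter boundaries; a smarter version would probably only merge siblings under the same parent heading. A short final section also stays on its own, since nothing follows it.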

If you use AI heavily for research, document analysis, or summarizing long reports, converting to Markdown first can make a noticeable difference.

It’s GPL-3.0 and open source. Fork it, modify it, improve it.

If you try it out, I’d genuinely love to hear how it works for you.

Did it handle your PDFs well? Break in weird places? Save you time? Annoy you in creative new ways?

Drop your thoughts in the comments below. Feedback, feature requests, and real-world use cases are what will shape the next version.
