// html-to-md
Convert HTML to clean markdown.
Paste raw HTML or drop a URL. SuperMD strips nav, scripts, and other noise, then converts the content to clean markdown your LLM can actually read.
// what gets removed
// why html to markdown
HTML is for browsers. Markdown is for LLMs.
Raw HTML burns context window on tags, attributes, inline styles, and script blocks that the model has to read and then ignore. A typical blog post's HTML is 3–5× larger than its markdown equivalent, and the model still has to extract the same content from it.
Converting to markdown first strips the noise and gives your LLM a clean, linear version of the content — the same information, fewer tokens, better results.
// html
<div class="post-content container mx-auto">
  <h1 class="text-4xl font-bold mb-4 tracking-tight">
    Hello World
  </h1>
  <p class="text-gray-600 leading-7">
    Content here...
  </p>
</div>
~180 tokens
// markdown
# Hello World

Content here...
~12 tokens
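A minimal sketch of the comparison above, using only Python's standard library. `TinyMarkdown` and `to_markdown` are hypothetical names for illustration, not SuperMD's or Turndown's code. The sketch handles only headings and paragraphs, and it compares character counts rather than tokens, since token counts depend on the tokenizer.

```python
from html.parser import HTMLParser

BLOCK_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6", "p")


class TinyMarkdown(HTMLParser):
    """Toy converter: handles only <h1>-<h6> and <p>, and drops all attributes."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished markdown blocks
        self.prefix = None  # markdown prefix of the block being read, or None
        self.text = []      # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            # "h2" -> "## ", "p" -> "" (class attributes are simply ignored)
            self.prefix = "" if tag == "p" else "#" * int(tag[1]) + " "
            self.text = []

    def handle_data(self, data):
        if self.prefix is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.prefix is not None and tag in BLOCK_TAGS:
            # Collapse the whitespace and newlines left over from HTML layout.
            content = " ".join("".join(self.text).split())
            if content:
                self.blocks.append(self.prefix + content)
            self.prefix = None


def to_markdown(html):
    parser = TinyMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


html = """<div class="post-content container mx-auto">
<h1 class="text-4xl font-bold mb-4 tracking-tight">
Hello World
</h1>
<p class="text-gray-600 leading-7">
Content here...
</p>
</div>"""

markdown = to_markdown(html)
print(markdown)
print(len(html), "chars of HTML ->", len(markdown), "chars of markdown")
```

A real converter also has to handle links, lists, code blocks, and tables, which is why a maintained library is the better choice outside a demo.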
// use cases
When do you convert HTML to markdown?
Web scraping for RAG pipelines
RAG: Scrape pages, strip the HTML, and get clean markdown chunks ready to embed. This removes the HTML-parsing step from your ingestion pipeline.
Feeding docs to your LLM
Context: Paste API docs, blog posts, or changelogs as markdown instead of HTML. The model focuses on the content, not the DOM structure.
Converting CMS content
CMS: Export from WordPress, Notion, or any CMS as HTML, then convert to markdown for storage, version control, or LLM consumption.
Documentation ingestion
Docs: Technical docs are often HTML-heavy. Convert them to clean markdown before adding them to a knowledge base or feeding them to an AI assistant.
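For the RAG use case, here is one possible way to turn converted markdown into chunks ready to embed. `chunk_by_heading` and its `max_chars` parameter are hypothetical, and splitting on headings is only one of several reasonable chunking strategies. The sketch does not special-case `#` lines inside fenced code blocks.

```python
import re


def chunk_by_heading(markdown, max_chars=1000):
    """Split markdown into chunks that each start at a heading.

    Sections longer than max_chars are split further on blank lines.
    """
    # Zero-width split before every line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: pack paragraphs greedily up to max_chars.
        current = ""
        for para in section.split("\n\n"):
            candidate = current + "\n\n" + para if current else para
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


doc = "# Title\n\nIntro text.\n\n## Setup\n\nInstall it.\n\n## Usage\n\nRun it."
for chunk in chunk_by_heading(doc):
    print(repr(chunk))
```

Keeping each heading with its body means every chunk carries its own label, which tends to help retrieval. A single paragraph longer than `max_chars` is kept whole rather than cut mid-sentence.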
// faq
Frequently asked questions
Does the HTML get sent to a server?
Only when using the URL tab — the server fetches the page on your behalf to avoid CORS restrictions. When you paste HTML directly, conversion runs entirely in your browser using the Turndown library. No HTML is stored.
What does 'fetch URL' do exactly?
The server fetches the public URL, extracts the main content block (article, main, or body), strips scripts/styles/nav/footer, and returns the HTML. Turndown then converts it to markdown in your browser.
Can I convert private or authenticated pages?
Not via the URL tab: the fetch runs server-side without your session cookies. For authenticated content, copy the page source (in Chrome, Ctrl+U on Windows/Linux or Cmd+Option+U on macOS) and paste it into the Paste tab instead.
How does it compare to Pandoc?
Pandoc produces more complete conversions for complex HTML but requires a local install. This tool needs no install: conversion runs in the browser (the URL tab uses the server only to fetch the page), it handles common web page patterns well, and it is optimised for LLM consumption rather than document fidelity.
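The extraction step described in the 'fetch URL' answer can be sketched roughly as follows. This is an illustration using Python's standard library, not the tool's server code: `MainContent` and `extract_main` are hypothetical names, and trying `article`, then `main`, then `body` in that order is an assumption about how the three candidates are prioritised.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is dropped, as listed in the FAQ answer.
SKIP = {"script", "style", "nav", "footer"}


class MainContent(HTMLParser):
    """Re-serialize the inside of the first <root> element, minus SKIP subtrees."""

    def __init__(self, root):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. escaped
        self.root = root
        self.depth = 0  # how many <root> elements we are inside
        self.skip = 0   # how many SKIP elements we are inside
        self.found = False
        self.done = False
        self.out = []

    def emitting(self):
        return self.depth > 0 and self.skip == 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == self.root and self.depth == 0:
            self.depth, self.found = 1, True
            return  # do not emit the wrapper element itself
        if self.depth == 0:
            return
        if tag == self.root:
            self.depth += 1
        if tag in SKIP:
            self.skip += 1
        if self.emitting():
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <img ... />.
        if tag not in SKIP and self.emitting():
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done:
            return
        if tag in SKIP:
            self.skip = max(0, self.skip - 1)
            return
        if tag == self.root:
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        if self.emitting():
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.emitting():
            self.out.append(data)

    def handle_entityref(self, name):
        if self.emitting():
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.emitting():
            self.out.append(f"&#{name};")


def extract_main(html):
    # Assumed priority: the most specific container first.
    for root in ("article", "main", "body"):
        parser = MainContent(root)
        parser.feed(html)
        parser.close()
        if parser.found:
            return "".join(parser.out).strip()
    return ""


page = """<html><head><style>p{color:red}</style></head>
<body>
<nav><a href="/">Home</a></nav>
<main>
<h1>Hello World</h1>
<script>track()</script>
<p>Content &amp; more</p>
</main>
<footer>Copyright</footer>
</body></html>"""

print(extract_main(page))
```

The returned HTML still needs a converter such as Turndown to become markdown; this step only narrows the input so the converter sees content rather than page chrome.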