TL;DR
Paragraph-level coaching beats “page A → page B” suggestions by telling you where to add a link and what to change.
Try the XML sitemap first, fall back to an HTML sitemap, then accept a CSV upload.
Strip everything that isn’t body copy before analysis to cut tokens and noise (nav, footer, widgets), while still mapping follow vs nofollow links.
The visual graph should be directional, clustered, searchable—and include ghost nodes for “missing topic” content opportunities.
Product risks are mostly operational: don’t store raw content, control API costs, enforce caps/cooldowns, and run audits via a job queue.
Fair warning: this post is part build log, part brainstorm recap, part “here’s what I actually learned going down this rabbit hole.” If you want polished and buttoned up, wrong blog. If you want the real thinking process behind starting a SaaS from scratch, keep reading.

The idea
So here’s what bugged me for a while: every internal linking tool I’d used was either crawl-data-only (Screaming Frog vibes, useful but not strategic) or automation-first (click a button, links appear, hope for the best). Neither one actually read my content.

What I actually wanted was something that could look at a page, understand what the paragraphs were saying, and tell me: “hey, in paragraph 3 you’re already talking about X, add a line about Y and you’ve got a perfect internal link opportunity right there.” I wanted paragraph-level coaching.
And then the graph. I wanted to see the whole thing. Visual, interactive, actually useful.
So I started workshopping it. That kicked off the rabbit hole.
First question: what does it even need to ingest?
I didn’t want to build a crawler. Crawling sites yourself is slow, expensive, and annoying to maintain. So I needed URLs another way.
The intake priority ended up being: start with the sitemap.
- Auto-detect the XML sitemap (tries the common variations: /sitemap.xml, /sitemap_index.xml, /post-sitemap.xml, etc.)
- HTML sitemap as a fallback
- Manual upload: a Screaming Frog export CSV works fine
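Since the intake order matters, here's a tiny sketch of the auto-detect step. The helper name and path list are my own; in practice you'd GET each candidate and keep the first one that returns valid XML (robots.txt's `Sitemap:` line is worth checking too):

```typescript
// Common sitemap locations to probe, in priority order. Not exhaustive --
// WordPress, Yoast, etc. each have their own conventions.
const SITEMAP_PATHS = [
  "/sitemap.xml",
  "/sitemap_index.xml",
  "/post-sitemap.xml",
  "/page-sitemap.xml",
];

// Hypothetical helper: builds the candidate URLs for a given site root.
function candidateSitemapUrls(siteRoot: string): string[] {
  const base = siteRoot.replace(/\/+$/, ""); // drop trailing slashes
  return SITEMAP_PATHS.map((p) => base + p);
}
```

The fetch loop itself is just "try each candidate, stop at the first 200 that parses as XML", so it's omitted here.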
Once you have the URLs, the tool fetches and reads the full page content itself. Not headings only. Not metadata. The actual paragraphs.
If you want paragraph-level recommendations, you have to fetch and analyze rendered body copy—titles and headings alone won’t cut it.
Second question: what do you do with the content once you have it?
This is where it got interesting. The tool needs to understand what each page is actually about, not just what keywords are in the title tag. It has to read for meaning.
Keywords: You can input a target keyword manually, the tool can suggest the top 3 candidates and let you confirm, or it can just infer. All three workflows matter.
Content stripping: Fetch the page, yes, but strip everything that isn’t body copy. Nav, footer, sidebars, cookie banners, author bios, related post widgets. All gone.
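To make the stripping step concrete, here's a deliberately naive sketch. Real pages want a proper parser (cheerio or Readability, say); the regexes here just illustrate the idea of deleting whole boilerplate elements before any tokens get counted:

```typescript
// Elements that are almost never body copy. A real list would also match
// by class/id heuristics (cookie banners, related-post widgets, bios).
const BOILERPLATE_TAGS = ["nav", "footer", "aside", "header", "script", "style", "form"];

// Naive sketch: remove whole boilerplate elements, then flatten the rest
// to plain text. Regexes break on nested same-name tags; a DOM parser won't.
function stripToBodyCopy(html: string): string {
  let out = html;
  for (const tag of BOILERPLATE_TAGS) {
    out = out.replace(new RegExp(`<${tag}[\\s\\S]*?<\\/${tag}>`, "gi"), "");
  }
  return out
    .replace(/<[^>]+>/g, " ") // drop remaining markup, keep the text
    .replace(/\s+/g, " ")
    .trim();
}
```

This is also where the API-cost savings mentioned later come from: everything deleted here is a token that never gets sent.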
Follow vs. nofollow: The tool needs to map the existing links on each page, but should it treat followed and nofollowed links the same? Answer: show both, and let the user toggle between views. SEOs care about the distinction.
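The link mapping can be sketched the same way. Again, a DOM parser beats regexes on real-world markup; the shape of the output is the point:

```typescript
interface PageLink {
  href: string;
  anchorText: string;
  nofollow: boolean;
}

// Sketch: pull every <a> out of already-fetched HTML and record whether
// its rel attribute marks it nofollow. Attribute order and quoting vary
// in the wild, so treat this as illustrative only.
function extractLinks(html: string): PageLink[] {
  const links: PageLink[] = [];
  const anchorRe = /<a\s+([^>]*?)>([\s\S]*?)<\/a>/gi;
  let m: RegExpExecArray | null;
  while ((m = anchorRe.exec(html)) !== null) {
    const attrs = m[1];
    const href = /href="([^"]*)"/i.exec(attrs)?.[1];
    if (!href) continue;
    links.push({
      href,
      anchorText: m[2].replace(/<[^>]+>/g, "").trim(),
      nofollow: /rel="[^"]*\bnofollow\b[^"]*"/i.test(attrs),
    });
  }
  return links;
}
```

With `nofollow` stored as a flag on each edge rather than filtered out, the toggle in the UI becomes trivial.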
Third question: what about the visual graph?
This is the thing I cared most about getting right. I’d seen Screaming Frog’s visualizations. Useful for crawl architecture, not what I needed. I wanted strategy, not just structure.
What I landed on: make the graph do real work.
- Nodes = pages. Size reflects authority (how many internal links point to it).
- Edges = links. Green = already exists. Orange = recommended. Red = orphan pages (no connections at all, floating alone).
- Directional arrows, not just lines. If A should link to B, the arrow goes A → B.
- Ghost nodes. If the tool figures out you’re missing a topic that would connect 4 existing posts, it adds a placeholder node in the graph, gray, dismissible, clearly marked as “suggested content.”
- Cluster mode by default, with an expand/contract all toggle.
- Search: type a URL or keyword and the relevant nodes light up.
Click any node and a detail panel opens: recommended links to add (with anchor text suggestions), what should be pointing here, content addition suggestions. Built modular for later metrics.
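For the record, here's roughly the data model that graph implies, sketched in TypeScript. Field names are my own illustration, not an actual Linkbase schema:

```typescript
type EdgeKind = "existing" | "recommended";
type NodeKind = "page" | "ghost"; // ghost = suggested-but-missing topic

interface GraphNode {
  id: string;            // canonical URL, or a slug for ghost nodes
  kind: NodeKind;
  inboundCount: number;  // internal-link authority; drives node size
  orphan: boolean;       // no inbound or outbound internal links
  cluster?: string;      // topic cluster for the default clustered view
}

interface GraphEdge {
  from: string;          // directional: from -> to
  to: string;
  kind: EdgeKind;
  nofollow?: boolean;        // only meaningful for existing edges
  suggestedAnchor?: string;  // only for recommended edges
}

// Size nodes by authority on a sqrt scale so big hubs don't dwarf the rest.
function nodeRadius(inboundCount: number): number {
  return 8 + 4 * Math.sqrt(inboundCount);
}
```

Keeping `recommended` edges and `ghost` nodes in the same structures as real ones is what lets the detail panel and the toggles stay simple.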
Fourth question: does this already exist?
Good question. And yes, kind of. But not quite in this form.
LinkVector has a visual graph, orphan detection, topic cluster visualization, and crawl depth analysis. Pretty close on the surface. But it’s built for site owners optimizing their own linking, and its graph shows existing structure, not AI-inferred opportunities. No paragraph-level coaching.
LinkBoss is NLP-based, finds contextually relevant links, has silo presets (Reverse Silo, Priority Silo, Circle Silo, actually smart). Also does automated link insertion into your CMS. The closest thing to paragraph-level analysis, but it’s doing it to automate insertion, not to coach you. Different philosophy entirely.
Screaming Frog is crawl data and architecture visualization. Great tool, wrong job. Helpful, but not strategic.

The gap is real: none of them read your paragraphs and say “add a sentence here, here’s why, here’s what it unlocks.” That’s still open space.
Fifth question: what are the real risks if this becomes a product?
This is the part most people skip and then get burned by six months in. So I went through them one by one. Operational details become product reality.
Storage — the answer is don’t store raw page content at all. Fetch it, analyze it, store only the structured output: the link graph data, scores, recommendations, metadata. Lean database, fewer headaches.
To avoid legal/privacy and cost creep, don’t cache raw third-party page content. Store only structured outputs (graphs, scores, recs) and re-fetch when needed.
API costs — full page content is non-negotiable for the quality of the output. But smart minification (stripping everything except body copy) can cut token count by roughly 30–40% before a single call gets made. Fewer, smarter calls win.
Rate limiting and abuse — one audit running per user at a time. Hard URL cap (1,000 per audit). 24-hour cooldown per site. Email verification on signup.
Async job queue so one power user can’t spike the server for everyone else. Put guardrails in early.
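Those guardrails are simple enough to sketch. In production this state would live in Redis so it's shared across workers and survives restarts; the in-memory version below just shows the checks, using the caps from above:

```typescript
const MAX_URLS_PER_AUDIT = 1000;
const COOLDOWN_MS = 24 * 60 * 60 * 1000; // 24h per site

// In production: Redis keys with TTLs, not process-local state.
const runningAudits = new Set<string>();       // userIds with an audit in flight
const lastAuditAt = new Map<string, number>(); // siteHost -> last start time

function canStartAudit(
  userId: string,
  siteHost: string,
  urlCount: number,
  now = Date.now(),
): { ok: boolean; reason?: string } {
  if (urlCount > MAX_URLS_PER_AUDIT) return { ok: false, reason: "url-cap" };
  if (runningAudits.has(userId)) return { ok: false, reason: "already-running" };
  const last = lastAuditAt.get(siteHost);
  if (last !== undefined && now - last < COOLDOWN_MS) return { ok: false, reason: "cooldown" };
  return { ok: true };
}
```

Returning a reason code instead of a bare boolean means the UI can say "come back in 18 hours" instead of a mystery error.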
Audit lifecycle — audits auto-expire after 90 days of inactivity. Warning email at 75 days. One button to reset the clock.
Re-running an audit replaces the old one, doesn’t stack. Orphaned accounts get their data purged after 6 months but the account stays. Data should not linger forever.
The job queue — BullMQ + Redis. Pages processed in batches of 10–20, not all at once. Progress bar in the UI (“analyzing 47 of 200 pages”) so the wait feels like thoroughness, not slowness.
Jobs save progress on failure so they can resume instead of starting over. Resumable jobs are mandatory.
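The resume logic is the part worth sketching. This assumes a persisted job record with a cursor; the function and field names are mine, not BullMQ's API (BullMQ handles the queueing and retries, the checkpointing is up to you):

```typescript
interface AuditJobState {
  urls: string[];
  batchSize: number; // 10-20 in practice
  nextIndex: number; // persisted after every completed batch
}

// Processes pages batch by batch, checkpointing after each one. If
// analyzeBatch throws, nextIndex still points at the failed batch, so a
// retried job resumes there instead of starting over.
async function runAudit(
  state: AuditJobState,
  analyzeBatch: (urls: string[]) => Promise<void>,
  saveState: (s: AuditJobState) => Promise<void>,
  onProgress?: (done: number, total: number) => void, // "analyzing 47 of 200"
): Promise<void> {
  while (state.nextIndex < state.urls.length) {
    const batch = state.urls.slice(state.nextIndex, state.nextIndex + state.batchSize);
    await analyzeBatch(batch);
    state.nextIndex += batch.length;
    await saveState(state); // checkpoint
    onProgress?.(state.nextIndex, state.urls.length);
  }
}
```

The same `onProgress` hook feeds the progress bar, so resumability and the "thoroughness" UX come from one mechanism.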
Security — Cloudflare handles DDoS, WAF, rate limiting at the edge, SSL. The application layer handles SSRF protection, API key security (server-side only, never in the frontend), input sanitization, JWT auth with expiry, bcrypt, per-user rate limiting.
GDPR: minimal data collection, privacy policy, right to deletion built in from day one. Design security before launch.
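The SSRF piece deserves a sketch too, since a tool that fetches arbitrary user-supplied URLs is a classic SSRF target. This assumes all outbound fetches go through one chokepoint; a real check also has to resolve DNS and re-validate after redirects, which this deliberately skips:

```typescript
// Obvious loopback/private/link-local hosts. A production check must
// validate the *resolved* IP, not just the hostname string.
const PRIVATE_HOST_RE =
  /^(localhost|127\.|10\.|192\.168\.|169\.254\.|0\.|\[?::1\]?$|172\.(1[6-9]|2\d|3[01])\.)/i;

function isSafeFetchUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL at all
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false; // no file:, gopher:, etc.
  if (PRIVATE_HOST_RE.test(url.hostname)) return false;
  return true;
}
```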
Sixth question: what’s the actual tech stack?
- Frontend: Next.js + React
- Graph visualization: React Flow (modular component, can swap to D3 later if needed)
- Backend: Node.js
- Database: PostgreSQL
- Queue: BullMQ + Redis
- Email: Resend
- Auth: Auth.js or Supabase Auth
- Hosting: Hostinger VPS behind Cloudflare
- Dev workflow: Docker from day one, local first, deploy when ready
PostgreSQL and Redis are already running from another project, so this builds on existing infrastructure.
The name: Linkbase
After going through about 80 options (yes, really), we landed on Linkbase. Short, descriptive, doesn’t try too hard. It’s the base layer for everything link-strategy-related on a site. Good enough.
Where it’s at now
The PRD is done. The competitive landscape is mapped. The tech stack is decided. The security model is thought through before a single line of code exists; if you've ever had to retrofit rate limiting onto a live SaaS, you know that's the right order to do things. Plan first, then build.
Next step: build. Starting local with Docker, shipping to the VPS when it’s ready to show people. Time to turn ideas into code.
I’ll document the build as I go. If you’re building something similar or have thoughts on the approach, drop it in the comments. I want feedback while it’s malleable.
