Internal linking tool: does one exist and what should you build?

Explore whether an internal linking tool that reads paragraphs and suggests paragraph-level links exists, and learn what to build next to fill the gap now.


TL;DR

Paragraph-level coaching beats “page A → page B” suggestions by telling you where to add a link and what to change.

Try the XML sitemap first, fall back to an HTML sitemap, then accept a CSV upload.

Strip everything that isn’t body copy before analysis to cut tokens and noise (nav, footer, widgets), while still mapping follow vs nofollow links.

The visual graph should be directional, clustered, searchable—and include ghost nodes for “missing topic” content opportunities.

Product risks are mostly operational: don’t store raw content, control API costs, enforce caps/cooldowns, and run audits via a job queue.

Fair warning: this post is part build log, part brainstorm recap, part “here’s what I actually learned going down this rabbit hole.” If you want polished and buttoned up, wrong blog. If you want the real thinking process behind starting a SaaS from scratch, keep reading.


The idea

So here’s what bugged me for a while: every internal linking tool I’d used was either crawl-data-only (Screaming Frog vibes, useful but not strategic) or automation-first (click a button, links appear, hope for the best). Neither one actually read my content.


What I actually wanted was something that could look at a page, understand what the paragraphs were saying, and tell me: “hey, in paragraph 3 you’re already talking about X, add a line about Y and you’ve got a perfect internal link opportunity right there.” I wanted paragraph-level coaching.

And then the graph. I wanted to see the whole thing. Visual, interactive, actually useful.

So I started workshopping it. That kicked off the rabbit hole.

First question: what does it even need to ingest?

I didn’t want to build a crawler. Crawling sites yourself is slow, expensive, and annoying to maintain. So I needed URLs another way.

The intake priority ended up being: start with the sitemap.

  • Auto-detect the XML sitemap (tries the common variations: /sitemap.xml, /sitemap_index.xml, /post-sitemap.xml, etc.)
  • HTML sitemap as a fallback
  • Manual upload as the last resort; a Screaming Frog export CSV works fine
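The auto-detect step above is simple enough to sketch. This is a minimal illustration, assuming Node 18+ (global `fetch`); the function names `sitemapCandidates` and `detectSitemap` are mine, not a real API:

```typescript
// Common sitemap locations, probed in order.
const CANDIDATE_PATHS = [
  "/sitemap.xml",
  "/sitemap_index.xml",
  "/post-sitemap.xml",
];

// Build the list of sitemap URLs to probe for a given site origin.
function sitemapCandidates(origin: string): string[] {
  return CANDIDATE_PATHS.map((p) => new URL(p, origin).toString());
}

// Probe each candidate; return the first that responds with a 2xx status.
async function detectSitemap(origin: string): Promise<string | null> {
  for (const url of sitemapCandidates(origin)) {
    try {
      const res = await fetch(url, { method: "HEAD" });
      if (res.ok) return url;
    } catch {
      // Network error: try the next candidate.
    }
  }
  return null; // Caller falls back to the HTML sitemap, then CSV upload.
}
```

If every probe misses, the `null` return is the signal to walk down the fallback chain.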

Once you have the URLs, the tool fetches and reads the full page content itself. Not headings only. Not metadata. The actual paragraphs.

Tip

If you want paragraph-level recommendations, you have to fetch and analyze rendered body copy—titles and headings alone won’t cut it.

Second question: what do you do with the content once you have it?

This is where it got interesting. The tool needs to understand what each page is actually about, not just what keywords are in the title tag. It has to read for meaning.

Keywords: You can input a target keyword manually, the tool can suggest the top 3 candidates and let you confirm, or it can just infer. All three workflows matter.

Content stripping: Fetch the page, yes, but strip everything that isn’t body copy. Nav, footer, sidebars, cookie banners, author bios, related post widgets. All gone.
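As a sketch of that stripping step: the real thing would use a proper HTML parser (cheerio or similar), but a regex version shows the shape of it. `stripChrome` and `extractParagraphs` are illustrative names, and the tag list is an assumption:

```typescript
// Elements that are never body copy: remove them wholesale.
// (Regex-based for brevity; a real parser handles nesting correctly.)
const STRIP_TAGS = ["nav", "footer", "aside", "script", "style", "form"];

function stripChrome(html: string): string {
  let out = html;
  for (const tag of STRIP_TAGS) {
    // Remove the whole element, contents included (non-greedy match).
    out = out.replace(new RegExp(`<${tag}[\\s\\S]*?</${tag}>`, "gi"), "");
  }
  return out;
}

// Pull visible paragraph text out of the stripped markup.
function extractParagraphs(html: string): string[] {
  const matches = stripChrome(html).matchAll(/<p[^>]*>([\s\S]*?)<\/p>/gi);
  return [...matches]
    .map((m) => m[1].replace(/<[^>]+>/g, "").trim()) // drop inline tags
    .filter((t) => t.length > 0);
}
```

Everything that survives this pass is what actually gets tokenized and sent for analysis.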

Follow vs. nofollow: The tool needs to map existing links on the page, but should it treat followed and nofollowed links the same? Answer: show both, but let the user toggle between views. SEOs care about the distinction.
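Mapping that distinction is a small extraction step. A rough sketch (regex-based for brevity; `extractLinks` and the `PageLink` shape are assumptions):

```typescript
interface PageLink {
  href: string;
  nofollow: boolean;
}

// Pull every anchor's href and check its rel attribute for "nofollow".
function extractLinks(html: string): PageLink[] {
  const links: PageLink[] = [];
  for (const m of html.matchAll(/<a\s([^>]*)>/gi)) {
    const attrs = m[1];
    const href = attrs.match(/href="([^"]+)"/i)?.[1];
    if (!href) continue;
    const rel = attrs.match(/rel="([^"]*)"/i)?.[1] ?? "";
    links.push({ href, nofollow: /\bnofollow\b/i.test(rel) });
  }
  return links;
}

// The follow/nofollow toggle in the UI is then just a filter over this data.
const followedOnly = (links: PageLink[]) => links.filter((l) => !l.nofollow);
```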

Third question: what about the visual graph?

This is the thing I cared most about getting right. I’d seen Screaming Frog’s visualizations. Useful for crawl architecture, not what I needed. I wanted strategy, not just structure.

What I landed on: make the graph do real work.

  • Nodes = pages. Size reflects authority (how many internal links point to it).
  • Edges = links. Green = already exists. Orange = recommended. Orphan pages (no connections at all) float alone, flagged in red.
  • Directional arrows, not just lines. If A should link to B, the arrow goes A → B.
  • Ghost nodes. If the tool figures out you’re missing a topic that would connect 4 existing posts, it adds a placeholder node in the graph, gray, dismissible, clearly marked as “suggested content.”
  • Cluster mode by default, with an expand/contract all toggle.
  • Search, type a URL or keyword, relevant nodes light up.

Click any node and a detail panel opens: recommended links to add (with anchor text suggestions), what should be pointing here, content addition suggestions. Built modular for later metrics.
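The data model behind that graph is straightforward. A sketch, with the caveat that type names like `GraphNode` and the `ghost` flag are my assumptions, not a shipped schema:

```typescript
type EdgeStatus = "exists" | "recommended";

interface GraphNode {
  id: string;           // page URL
  inboundCount: number; // drives node size (internal authority)
  ghost?: boolean;      // true for "suggested content" placeholder nodes
}

interface GraphEdge {
  from: string; // directional: arrow points from -> to
  to: string;
  status: EdgeStatus;
}

// Orphans are real pages with no edges in either direction.
function findOrphans(nodes: GraphNode[], edges: GraphEdge[]): string[] {
  const connected = new Set(edges.flatMap((e) => [e.from, e.to]));
  return nodes
    .filter((n) => !connected.has(n.id) && !n.ghost)
    .map((n) => n.id);
}
```

Keeping the model this plain is what makes the React Flow component swappable for D3 later: the renderer only ever sees nodes and edges.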

Fourth question: does this already exist?

Good question. And yes, kind of. But not quite in this form.

LinkVector has a visual graph, orphan detection, topic cluster visualization, and crawl depth analysis. Pretty close on the surface. But it’s built for site owners optimizing their own linking, and its graph shows existing structure, not AI-inferred opportunities. No paragraph-level coaching.

LinkBoss is NLP-based, finds contextually relevant links, has silo presets (Reverse Silo, Priority Silo, Circle Silo, actually smart). Also does automated link insertion into your CMS. The closest thing to paragraph-level analysis, but it’s doing it to automate insertion, not to coach you. Different philosophy entirely.

Screaming Frog is crawl data and architecture visualization. Great tool, wrong job. Helpful, but not strategic.


The gap is real: none of them read your paragraphs and say “add a sentence here, here’s why, here’s what it unlocks.” That’s still open space.

Fifth question: what are the real risks if this becomes a product?

This is the part most people skip and then get burned by six months in. So I went through them one by one. Operational details become product reality.

Storage — the answer is don’t store raw page content at all. Fetch it, analyze it, store only the structured output: the link graph data, scores, recommendations, metadata. Lean database, fewer headaches.

Warning

To avoid legal/privacy and cost creep, don’t cache raw third-party page content. Store only structured outputs (graphs, scores, recs) and re-fetch when needed.

API costs — full page content is non-negotiable for the quality of the output. But smart minification (stripping everything except body copy) can cut token count by roughly 30–40% before a single call gets made. Fewer, smarter calls win.

Rate limiting and abuse — one audit running per user at a time. Hard URL cap (1,000 per audit). 24-hour cooldown per site. Email verification on signup.

Async job queue so one power user can’t spike the server for everyone else. Put guardrails in early.
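Those guardrails condense to one check before an audit is allowed to start. A sketch under the limits stated above (the `canStartAudit` helper and its input shape are assumptions):

```typescript
const MAX_URLS_PER_AUDIT = 1000;
const SITE_COOLDOWN_MS = 24 * 60 * 60 * 1000; // 24-hour per-site cooldown

interface AuditRequest {
  userHasRunningAudit: boolean;
  urlCount: number;
  lastAuditOfSiteAt: number | null; // epoch ms; null if never audited
}

// Gatekeeper run before anything is enqueued.
function canStartAudit(
  req: AuditRequest,
  now = Date.now()
): { ok: boolean; reason?: string } {
  if (req.userHasRunningAudit)
    return { ok: false, reason: "audit already running" };
  if (req.urlCount > MAX_URLS_PER_AUDIT)
    return { ok: false, reason: "URL cap exceeded" };
  if (
    req.lastAuditOfSiteAt !== null &&
    now - req.lastAuditOfSiteAt < SITE_COOLDOWN_MS
  )
    return { ok: false, reason: "site on cooldown" };
  return { ok: true };
}
```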

Audit lifecycle — audits auto-expire after 90 days of inactivity. Warning email at 75 days. One button to reset the clock.

Re-running an audit replaces the old one, doesn’t stack. Orphaned accounts get their data purged after 6 months but the account stays. Data should not linger forever.
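The lifecycle rules reduce to a single status function. Thresholds come straight from the plan above; the function name is my own:

```typescript
const WARN_AFTER_DAYS = 75;   // warning email goes out here
const EXPIRE_AFTER_DAYS = 90; // audit auto-expires here
const DAY_MS = 24 * 60 * 60 * 1000;

type AuditStatus = "active" | "warn" | "expired";

// Classify an audit by how long it has been idle.
function auditStatus(lastActivity: number, now = Date.now()): AuditStatus {
  const idleDays = (now - lastActivity) / DAY_MS;
  if (idleDays >= EXPIRE_AFTER_DAYS) return "expired";
  if (idleDays >= WARN_AFTER_DAYS) return "warn";
  return "active";
}
```

The "one button to reset the clock" is then just writing a fresh `lastActivity` timestamp.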

The job queue: BullMQ + Redis. Pages processed in batches of 10–20, not all at once. Progress bar in the UI (“analyzing 47 of 200 pages”) so the wait feels like thoroughness, not slowness.

Jobs save progress on failure so they can resume instead of starting over. Resumable jobs are mandatory.
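The batching-plus-resume logic is easiest to see as a pure function. In the real system this would sit inside a BullMQ processor; here `state.cursor` stands in for persisted job state, and the names are illustrative:

```typescript
// Split work into batches of a fixed size.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

interface JobState {
  cursor: number; // index of first unprocessed page; persisted each batch
}

// Process from the saved cursor onward: a restarted job resumes, not repeats.
async function processAudit(
  urls: string[],
  state: JobState,
  analyze: (batch: string[]) => Promise<void>,
  onProgress: (done: number, total: number) => void,
  batchSize = 15
): Promise<void> {
  const remaining = urls.slice(state.cursor);
  for (const batch of chunk(remaining, batchSize)) {
    await analyze(batch);
    state.cursor += batch.length;          // checkpoint: the resume point
    onProgress(state.cursor, urls.length); // feeds "analyzing 47 of 200 pages"
  }
}
```

Checkpointing after every batch is what makes a crash at page 180 of 200 a 20-page restart instead of a 200-page one.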

Security — Cloudflare handles DDoS, WAF, rate limiting at the edge, SSL. The application layer handles SSRF protection, API key security (server-side only, never in the frontend), input sanitization, JWT auth with expiry, bcrypt, per-user rate limiting.
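The SSRF piece deserves a sketch, because a tool that fetches user-supplied URLs is a classic SSRF vector. A minimal guard, with the caveat that a production check must also resolve DNS and verify the resulting IP, since a public hostname can point at an internal address:

```typescript
// Hostname patterns that must never be fetched: loopback, RFC 1918
// private ranges, link-local (cloud metadata lives at 169.254.169.254).
const PRIVATE_HOST = /^(localhost|127\.|10\.|192\.168\.|169\.254\.|0\.|\[?::1\]?$)/i;
const PRIVATE_172 = /^172\.(1[6-9]|2\d|3[01])\./;

// Only plain http(s) to an apparently-public host is allowed through.
function isSafeFetchTarget(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a valid URL at all
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname;
  return !PRIVATE_HOST.test(host) && !PRIVATE_172.test(host);
}
```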

GDPR: minimal data collection, privacy policy, right to deletion built in from day one. Design security before launch.

Sixth question: what’s the actual tech stack?

  • Frontend: Next.js + React
  • Graph visualization: React Flow (modular component, can swap to D3 later if needed)
  • Backend: Node.js
  • Database: PostgreSQL
  • Queue: BullMQ + Redis
  • Email: Resend
  • Auth: Auth.js or Supabase Auth
  • Hosting: Hostinger VPS behind Cloudflare
  • Dev workflow: Docker from day one, local first, deploy when ready

PostgreSQL and Redis are already running from another project, so this builds on existing infrastructure rather than standing up anything new.

The name: Linkbase

After going through about 80 options (yes, really), we landed on Linkbase. Short, descriptive, doesn’t try too hard. It’s the base layer for everything link-strategy-related on a site. Good enough.

Where it’s at now

The PRD is done. Competitive landscape is mapped. Tech stack is decided. The security model was thought through before a single line of code exists; if you've ever had to retrofit rate limiting onto a live SaaS, you know that's the right order. Plan first, then build.

Next step: build. Starting local with Docker, shipping to the VPS when it’s ready to show people. Time to turn ideas into code.

I’ll document the build as I go. If you’re building something similar or have thoughts on the approach, drop it in the comments. I want feedback while it’s malleable.

Does anyone else feel frustrated with SEO tools that promise the world but only deliver crawl data, or is it just me? 🤔
