This is an English version of my article on Habr about autobb — the open-source perimeter scanner I built and use day-to-day. Since the original was published the pipeline has grown a lot, so the second half covers what landed after it.

Why another scanner

Most teams that watch a large external perimeter end up writing their own glue around the same handful of tools: subfinder, naabu, httpx, nuclei. The original setup I inherited was a chain of bash scripts that ran nmap and a few python tools one after another. It worked but was slow, hard to extend, and didn’t persist anything — every run started from zero.

I wanted three things:

  • one declarative config for scopes and tooling,
  • a database that remembers what we’ve already seen so we only alert on changes,
  • a single docker image that any teammate can run without reading 200 lines of bash.

That turned into autobb: a thin Python wrapper around the usual ProjectDiscovery stack, with MongoDB for state and pluggable alert backends.

Pipeline

The main recon entry point is subs.py. It runs four stages, each of which can be toggled with a flag so you can do a heavy nightly run and a lighter hourly one.

1. Subdomain discovery

For every scope autobb:

  • pulls passive sources with subfinder (API keys go into the config),
  • optionally brute-forces with a wordlist via shuffledns + massdns (--dns-brute),
  • generates permutations of everything it has seen so far with dnsgen and resolves them (--dns-alts),
  • enriches new hits with TLS SAN names, CNAME targets, and PTR records; any of those that falls inside a scope is a free subdomain you’d otherwise miss,
  • harvests more hostnames and URLs out of every HTTP response saved on the previous run — both httpx probes and ffuf fuzz responses (more on this below),
  • cleans everything through puredns to drop wildcard noise.
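Of the enrichment sources, TLS SANs are the simplest to illustrate. The sketch below makes a few assumptions:

  • the certificate has already been fetched and decoded into the dict shape that Python’s ssl.SSLSocket.getpeercert() returns,
  • a scope is a plain list of root domains,
  • in_scope and san_hostnames are hypothetical helpers for illustration, not autobb’s code.

```python
def in_scope(host, roots):
    """True if host is a root domain or a subdomain of one (hypothetical helper)."""
    host = host.lower().rstrip(".")
    return any(host == r or host.endswith("." + r) for r in roots)


def san_hostnames(peercert, roots):
    """Pull DNS names out of a getpeercert()-style dict and keep the in-scope ones.

    Wildcard entries such as *.api.example.com are reduced to their base name,
    because the wildcard itself cannot be resolved.
    """
    found = set()
    for kind, value in peercert.get("subjectAltName", ()):
        if kind != "DNS":
            continue  # skip "IP Address" and other SAN types
        name = value.lower().rstrip(".")
        if name.startswith("*."):
            name = name[2:]
        if in_scope(name, roots):
            found.add(name)
    return found


cert = {
    "subjectAltName": (
        ("DNS", "www.example.com"),
        ("DNS", "*.api.example.com"),
        ("DNS", "cdn.thirdparty.net"),
        ("IP Address", "203.0.113.10"),
    )
}
print(sorted(san_hostnames(cert, ["example.com"])))
# ['api.example.com', 'www.example.com']
```

The same in-scope check applies to CNAME targets and PTR names. Anything pointing outside the configured roots is dropped instead of silently growing the scope.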

Anything new is written to MongoDB and triggers a “new subdomain” alert. Anything that disappears or changes is also recorded, and subdomain takeover candidates usually fall out of this diff.
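The new/gone/changed split is a plain comparison between the stored snapshot and the current run. This is a minimal sketch under two assumptions. First, each snapshot is a dict mapping a hostname to its resolved records. Second, the record fields (a, cname) are hypothetical. In autobb the previous snapshot lives in MongoDB; here it is just a dict.

```python
def diff_snapshots(old, new):
    """Compare two {hostname: record} snapshots.

    Returns the hostnames that appeared, disappeared, or whose record changed.
    """
    old_names, new_names = set(old), set(new)
    return {
        "new": sorted(new_names - old_names),
        "gone": sorted(old_names - new_names),
        "changed": sorted(h for h in old_names & new_names if old[h] != new[h]),
    }


previous = {
    "shop.example.com": {"a": ["198.51.100.7"], "cname": None},
    "blog.example.com": {"a": [], "cname": "example.ghost.io"},
}
current = {
    "shop.example.com": {"a": ["198.51.100.8"], "cname": None},
    "api.example.com": {"a": ["198.51.100.9"], "cname": None},
}
print(diff_snapshots(previous, current))
# {'new': ['api.example.com'], 'gone': ['blog.example.com'], 'changed': ['shop.example.com']}
```

In this shape, a CNAME that suddenly points somewhere new, or a record that stops resolving, lands in the "changed" or "gone" bucket. Those are the entries worth a manual takeover check.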

2. Port scanning

naabu scans newly discovered hosts (the top 1000 ports by default) and, with --ports-olds, periodically re-scans previously known ones at a lighter depth (the top 100 ports). Newly opened ports are fed straight into nuclei network templates, so a freshly exposed RDP or Redis is caught in the same run in which it appears.
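“Newly opened” means open in this scan but absent from the stored result for that host. This is a minimal sketch, assuming both sides are already parsed into {host: set_of_ports}. The output is the host:port list you would hand to nuclei’s network templates; the exact nuclei invocation is deliberately left out.

```python
def new_port_targets(known, scanned):
    """Return host:port strings for ports open now but not recorded before.

    known and scanned both map host -> set of open ports.
    """
    targets = []
    for host in sorted(scanned):
        fresh = scanned[host] - known.get(host, set())
        targets.extend(f"{host}:{port}" for port in sorted(fresh))
    return targets


known = {"db.example.com": {22, 443}}
scanned = {"db.example.com": {22, 443, 6379}, "rdp.example.com": {3389}}
print(new_port_targets(known, scanned))
# ['db.example.com:6379', 'rdp.example.com:3389']
```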

3. HTTP probing

httpx is used to find which host:port combinations actually speak HTTP(S). To avoid hammering a single target, the probe order is randomized across all scopes. On a large perimeter this matters more than people think: it’s the difference between getting useful data and getting WAF-banned.
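The idea amounts to flattening every scope’s targets into one list before shuffling it. This is a minimal sketch, assuming the candidate host:port pairs have already been collected per scope. The seed parameter exists only to make the example reproducible.

```python
import random


def shuffled_targets(scopes, seed=None):
    """Flatten {scope_name: [host:port, ...]} into one list in random order.

    Shuffling across all scopes spreads consecutive requests over many
    organisations instead of sending a burst to one of them.
    """
    targets = [t for hosts in scopes.values() for t in hosts]
    random.Random(seed).shuffle(targets)
    return targets


scopes = {
    "acme": ["a.acme.com:443", "b.acme.com:443", "c.acme.com:8080"],
    "globex": ["www.globex.com:443", "api.globex.com:443"],
}
print(shuffled_targets(scopes, seed=1))
```

A plain shuffle does not guarantee that two requests to the same organisation are never adjacent. With thousands of targets, though, it spreads the load well enough. A strict round-robin across scopes would be the stricter alternative.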

The probe result (status, title, tech, headers, TLS, hashes) is diffed against the last one stored. Material changes — a new 200 where there used to be a 404, a new tech fingerprint, a new redirect — generate an alert.
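A field-level comparison is enough to express “material change”. The sketch below assumes the following:

  • probes are dicts whose field names resemble httpx’s JSON output (status_code, title, tech, location),
  • the WATCHED list is illustrative, not autobb’s actual rule set.

```python
# Fields treated as material in this sketch (an assumed list, not autobb's).
WATCHED = ("status_code", "title", "tech", "location")


def material_changes(old, new):
    """Return {field: (old_value, new_value)} for watched fields that differ."""
    changes = {}
    for field in WATCHED:
        before, after = old.get(field), new.get(field)
        if field == "tech":
            # Compare technology fingerprints as sets so ordering does not matter.
            before, after = set(before or []), set(after or [])
        if before != after:
            changes[field] = (before, after)
    return changes


old_probe = {"status_code": 404, "title": "Not Found", "tech": ["nginx"]}
new_probe = {"status_code": 200, "title": "Login", "tech": ["Keycloak", "nginx"]}
print(material_changes(old_probe, new_probe))
```

Headers and body hashes change on almost every request, so they make poor alert triggers. That is why this sketch leaves them out of the watched set, even though they are still stored.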

4. Vulnerability checks

Two modes:

  • active (--nuclei) — run nuclei templates against new or changed targets,
  • passive (--passive) — re-evaluate templates against the cached httpx response without sending any traffic. About 700 of the public templates are pure response-matching, so this catches a lot for free and won’t get you rate-limited.
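Passive mode works because a response-matching template only needs the stored status, headers, and body. The sketch below is **not** how nuclei evaluates templates internally. It is a stripped-down re-implementation of two matcher types, word and status. The dicts mirror nuclei’s matcher fields (type, part, words, status, condition), and the cached response is assumed to be available as status/header/body.

```python
def _select(response, part):
    """Return the part of a cached response a matcher looks at."""
    if part == "all":
        return response["header"] + "\n" + response["body"]
    return response[part]


def matcher_hits(matcher, response):
    """Evaluate one simplified word or status matcher against a cached response."""
    if matcher["type"] == "status":
        return response["status"] in matcher["status"]
    if matcher["type"] == "word":
        text = _select(response, matcher.get("part", "body"))
        hits = [word in text for word in matcher["words"]]
        return all(hits) if matcher.get("condition", "or") == "and" else any(hits)
    raise ValueError(f"unsupported matcher type: {matcher['type']}")


def template_matches(matchers, response, condition="or"):
    """Combine matcher results the way a template-level and/or condition does."""
    results = [matcher_hits(m, response) for m in matchers]
    return all(results) if condition == "and" else any(results)


cached = {
    "status": 200,
    "header": "Content-Type: application/json",
    "body": '{"_links":{"self":{"href":"/actuator"},"health":{"href":"/actuator/health"}}}',
}
actuator = [
    {"type": "status", "status": [200]},
    {"type": "word", "part": "body", "words": ['"_links"', '"health"'], "condition": "and"},
]
print(template_matches(actuator, cached, condition="and"))  # True
```

No request is sent at any point. The function only reads what httpx already saved, which is why this mode can’t trip rate limits.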

Custom templates sit in a separate directory and are layered on top of the bundled nuclei-templates. Excluding noisy ones is one config line.

Alerts

Out of the box: Telegram, VK Teams, and SMTP (added after the article). You can enable several at once:

alerts:
  use: [telegram, smtp]

  telegram:
    token: "BOT_TOKEN"
    chat_id: "CHAT_ID"
    msg_max_size: 4000

  smtp:
    host: smtp.example.com
    port: 587
    tls: true
    username: user@example.com
    password: "APP_PASSWORD"
    from: autobb@example.com
    to: [me@example.com]
    subject: autobb alert

When a message would exceed msg_max_size it’s attached as a .txt file instead of being truncated — handy on a noisy day when a single scope dumps a few hundred new probes.
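The size check itself is small. This is a minimal sketch, assuming a hypothetical build_payload helper and an attachment name of alert.txt. For Telegram, the two kinds would map onto the Bot API’s sendMessage and sendDocument methods. The msg_max_size of 4000 in the config above sits just under Telegram’s 4096-character message limit.

```python
def build_payload(text, msg_max_size=4000):
    """Return a plain message payload, or a .txt attachment when text is too long."""
    if len(text) <= msg_max_size:
        return {"kind": "message", "text": text}
    first_line = text.splitlines()[0] if text else ""
    return {
        "kind": "document",
        "filename": "alert.txt",          # hypothetical attachment name
        "content": text.encode("utf-8"),  # full alert, nothing truncated
        "caption": first_line[:200],      # short preview shown next to the file
    }


print(build_payload("new subdomain: api.example.com")["kind"])  # message
print(build_payload("x" * 5000)["kind"])                        # document
```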

What landed after the article

The article covered the core pipeline. Since then the project picked up a few things worth calling out:

Harvesting hostnames and URLs from saved responses

This is the biggest recent change and the one I’ve gotten the most signal out of. Every HTTP response autobb collects is kept on disk: the httpx probes (-srd) and the matched ffuf fuzz responses (-od). A new module walks every one of those save directories and pulls out:

  • every absolute URL it can find with a regex over bodies and headers,
  • every relative URL (href="/foo", src='/bar.js', etc.), resolved against the page that contained it,
  • every bare-looking hostname (a-z0-9-. with at least one dot) it spots in JS, JSON, HTML, comments, redirect (Location) headers, even backend stack traces.
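The three extractors map onto three regexes. The sketch below assumes the following:

  • each saved file is read as one text blob (headers plus body),
  • the URL of the page it came from is known,
  • the regexes are illustrative, not the ones autobb ships.

```python
import re
from urllib.parse import urljoin

# Absolute URLs: stop at whitespace, quotes, angle brackets and backticks.
ABS_URL = re.compile(r"""https?://[^\s"'<>`]+""", re.I)
# Relative references inside href/src/action attributes.
REL_URL = re.compile(r"""(?:href|src|action)\s*=\s*["'](/[^"'\s]*)["']""", re.I)
# Anything made of [a-z0-9-] with at least one dot; scope filtering removes the noise.
HOSTNAME = re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b", re.I)


def extract(page_url, raw):
    """Return (urls, hostnames) found in one saved response."""
    urls = set(ABS_URL.findall(raw))
    # Relative references are resolved against the page they were found on.
    urls.update(urljoin(page_url, path) for path in REL_URL.findall(raw))
    hosts = {h.lower() for h in HOSTNAME.findall(raw)}
    return urls, hosts


raw = (
    "HTTP/1.1 302 Found\r\nLocation: https://sso.example.com/login\r\n\r\n"
    '<script src="/static/app.js"></script>'
    '<script>const API = "https://api.example.com/v2/";</script>'
)
urls, hosts = extract("https://www.example.com/index.html", raw)
print(sorted(urls))
# ['https://api.example.com/v2/', 'https://sso.example.com/login', 'https://www.example.com/static/app.js']
```

The hostname pattern is deliberately loose and also picks up strings like "1.1" and "app.js". That is acceptable because every candidate still has to pass the scope filter before it is used.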

Each candidate is normalized (lowercased, trailing punctuation stripped, ports kept on URLs, ports dropped from hostnames), filtered against the configured scopes (including sub_refilters), and written out as two artefacts the rest of the pipeline already understands:

  • subs.txt — bare in-scope hostnames, merged into the subdomain discovery flow next run,
  • links.txt — full in-scope URLs with port and query intact, fed into the HTTP probe/fuzz queue.
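The normalize, filter, and split step can be sketched as one function. The sketch makes these assumptions:

  • scopes are lists of root domains,
  • sub_refilters are regexes applied with re.search,
  • TRAILING and route are hypothetical names.

```python
import re
from urllib.parse import urlsplit

TRAILING = ".,;:)]}'\""  # punctuation that often sticks to matches in JS/HTML


def in_scope(host, roots):
    return any(host == r or host.endswith("." + r) for r in roots)


def route(urls, hosts, roots, refilters=()):
    """Normalize candidates and split them into (subs, links) for the given roots."""
    blocked = [re.compile(p) for p in refilters]

    def keep(host):
        return bool(host) and in_scope(host, roots) and not any(b.search(host) for b in blocked)

    subs, links = set(), set()
    for raw in hosts:
        # Bare hostnames: lowercase, strip trailing punctuation, drop any :port.
        host = raw.lower().rstrip(TRAILING).split(":")[0]
        if keep(host):
            subs.add(host)
    for raw in urls:
        parts = urlsplit(raw.rstrip(TRAILING))
        try:
            port = parts.port  # raises ValueError on a malformed port
        except ValueError:
            continue
        host = parts.hostname or ""  # urlsplit lowercases scheme and hostname
        if keep(host):
            # Keep the port, path and query exactly as found.
            netloc = host if port is None else f"{host}:{port}"
            links.add(parts._replace(netloc=netloc).geturl())
    return subs, links


subs, links = route(
    urls={"https://API.example.com:8443/v2/users?id=1),", "https://cdn.other.net/x.js"},
    hosts={"Auth.Example.com.", "dev.example.com", "app.js", "1.1"},
    roots=["example.com"],
    refilters=[r"^dev\."],
)
print(sorted(subs), sorted(links))
# ['auth.example.com'] ['https://api.example.com:8443/v2/users?id=1']
```

Writing each set out one entry per line gives the subs.txt and links.txt files the rest of the pipeline picks up.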

The net effect: any SPA that hard-codes API hostnames in a JS bundle, any 500 page that leaks an internal redirect, any swagger.json that lists endpoints, any directory that ffuf finds and that itself leaks more paths — all of it becomes new scope on the next cycle. In practice this finds far more hosts than DNS brute-forcing, and it’s essentially free because the responses are already on disk.

Scope-wide DNS permutations

--dns-alts used to permute names within a single root domain. It now runs across all known names in a scope, so a pattern learned on one acquisition (api-canary-eu1.foo.com) gets tried against every other root in the same scope. Combined with the all-known brute pass, it’s noticeably better at catching the “same naming convention, different brand” pattern that big orgs leak constantly.
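The core of the cross-root idea fits in a few lines. This is a minimal sketch, assuming the scope’s roots and all known names are plain lowercase strings. It only transplants whole prefixes between roots. It does not reproduce dnsgen’s word-level mutations, and the candidates would still need resolving and wildcard cleaning.

```python
def cross_root_candidates(known, roots):
    """Re-apply every subdomain prefix seen under one root to all roots in the scope.

    api-canary-eu1.foo.com with root bar.com -> api-canary-eu1.bar.com
    """
    prefixes = set()
    for name in known:
        for root in roots:
            if name.endswith("." + root):
                prefixes.add(name[: -len(root) - 1])  # strip ".root"
                break
    candidates = {f"{p}.{r}" for p in prefixes for r in roots}
    return candidates - set(known)


known = {"api-canary-eu1.foo.com", "www.bar.com"}
print(sorted(cross_root_candidates(known, ["foo.com", "bar.com"])))
# ['api-canary-eu1.bar.com', 'www.foo.com']
```

The candidate count grows as prefixes times roots. That is why the real pipeline pushes these through a mass resolver rather than checking them one by one.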

Stale rescans

The base pipeline only touches new/changed assets, which is fast but means findings can rot. The rescan: config block now schedules per-asset re-runs:

rescan:
  nuclei_interval_days: 14      # 0 disables periodic nuclei rescan
  httpfuzz_interval_days: 30    # 0 disables periodic httpfuzz rescan
  host_alive_in_days: 7         # only rescan probes seen alive within this window

So nuclei gets re-run against every probe at least every two weeks, ffuf at least once a month, but only if httpx still considers the host alive. This catches the case where a new template lands upstream and an old host suddenly becomes interesting — without re-scanning the whole perimeter on every cycle.
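The per-asset decision combines the two windows from the rescan block. This is a minimal sketch, assuming each probe stores timezone-aware timestamps for its last scan and its last alive sighting; the argument names are hypothetical. An interval of 0 disables the rescan, as the config comments say.

```python
from datetime import datetime, timedelta, timezone


def due_for_rescan(last_scan, last_alive, interval_days, alive_days, now=None):
    """Decide whether one probe should be re-scanned.

    last_scan may be None for a probe that was never scanned.
    """
    if interval_days == 0:
        return False  # periodic rescan disabled
    now = now or datetime.now(timezone.utc)
    if last_alive is None or now - last_alive > timedelta(days=alive_days):
        return False  # host not seen alive recently enough
    return last_scan is None or now - last_scan >= timedelta(days=interval_days)


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
# nuclei_interval_days: 14, host_alive_in_days: 7
print(due_for_rescan(now - timedelta(days=20), now - timedelta(days=2), 14, 7, now))   # True
print(due_for_rescan(now - timedelta(days=20), now - timedelta(days=10), 14, 7, now))  # False
```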

Alert history in the database

Alerts now get persisted to MongoDB instead of just being fired off.

fullscan.py — periodic deep scan

subs.py only nuclei-scans new or changed targets, which is great for daily runs but means a brand-new high-severity template won’t fire on hosts that haven’t changed. fullscan.py walks every alive host within a configurable window and re-runs nuclei at high/critical severity. I run it on a slower cadence (weekly) as a backstop.

docker run --rm -v $(pwd):/autobb --net autobbnet \
  --entrypoint python autobb fullscan.py
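The selection fullscan.py makes amounts to a last-seen filter plus a fixed severity filter. The sketch below assumes each probe record carries a url and a timezone-aware last_alive timestamp (hypothetical field names). The argv uses only nuclei’s -l (target list) and -severity options, and it is not necessarily the exact command fullscan.py builds.

```python
from datetime import datetime, timedelta, timezone


def fullscan_command(probes, window_days, list_path, now=None):
    """Pick probes seen alive inside the window and build a nuclei command line."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    urls = sorted(p["url"] for p in probes if p["last_alive"] >= cutoff)
    argv = ["nuclei", "-l", list_path, "-severity", "high,critical"]
    return urls, argv


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
probes = [
    {"url": "https://a.example.com", "last_alive": now - timedelta(days=1)},
    {"url": "https://old.example.com", "last_alive": now - timedelta(days=40)},
]
urls, argv = fullscan_command(probes, window_days=30, list_path="fullscan_targets.txt", now=now)
print(urls)  # ['https://a.example.com']
```

The next steps would be writing urls to the list file and passing argv to subprocess.run. They are left out here because they need nuclei installed.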

export.py — getting data out

The database is finally easy to query without writing MongoDB aggregations:

# live domains for one scope, one per line
export.py -g domains -s hackerone -p host

# http probes from the last 7 days as JSON
export.py -g http_probes -l 7

# everything added in the last 2 days
export.py -g domains -a 2 -p host

This is what feeds my ad-hoc tooling — fuzzing wordlists, JS endpoint extractors, manual triage spreadsheets.

HTTP fuzzing in the main pipeline

--http-fuzz runs ffuf against every newly discovered alive HTTP probe with the configured wordlists. New, non-404 paths feed into the same alerting and diff machinery as everything else, so a freshly exposed /admin/ or /.env shows up in the same channel as a new subdomain.
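Per probe, the fuzz step boils down to one ffuf invocation plus a “what’s new” comparison. The sketch below assumes hypothetical helpers (ffuf_argv, new_paths) and an output layout of the author’s choosing here. The flags themselves are standard ffuf options, but autobb’s real command line may differ.

```python
import os


def ffuf_argv(base_url, wordlist, out_dir):
    """Build an ffuf command for one alive probe (a sketch, not autobb's exact flags)."""
    return [
        "ffuf",
        "-u", base_url.rstrip("/") + "/FUZZ",  # FUZZ is ffuf's default keyword
        "-w", wordlist,
        "-mc", "all",    # match every status code...
        "-fc", "404",    # ...then filter out 404 responses
        "-od", out_dir,  # store matched responses on disk
        "-of", "json",
        "-o", os.path.join(out_dir, "results.json"),
    ]


def new_paths(found, known):
    """Paths reported this run that were not recorded before."""
    return sorted(set(found) - set(known))


print(ffuf_argv("https://app.example.com/", "common.txt", "fuzz/app")[:3])
# ['ffuf', '-u', 'https://app.example.com/FUZZ']
print(new_paths(["/admin/", "/.env", "/login"], ["/login"]))
# ['/.env', '/admin/']
```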

Better scope handling

Scopes grew a few quality-of-life features:

  • domains_file, cidr_file, ips_file — load targets from external files (useful when a scope is generated by some other process),
  • sub_refilters — per-scope regex blocklist to drop subdomains you don’t want in the pipeline (staging, dev, customer-specific tenants),
  • scope: !include ./scopes.yaml — keep scope definitions in a separate file so config.yaml stays small and reviewable.

For example:

scope:
  - name: hackerone
    domains: [hackerone.com]
    domains_file: extra_domains.txt
    sub_refilters:
      - \.(stage|dev)\.hackerone\.com$
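Applying such an entry takes two steps: merging domains_file into the inline domains, then filtering discovered subdomains through sub_refilters. The sketch below assumes the following:

  • the YAML has already been parsed into a dict,
  • sub_refilters use re.search, which the trailing $ in the pattern above suggests,
  • lines starting with # in domains_file are treated as comments.

The helper names are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def load_scope_domains(scope, base_dir="."):
    """Merge inline domains with domains_file entries for one scope entry."""
    domains = list(scope.get("domains", []))
    if "domains_file" in scope:
        text = (Path(base_dir) / scope["domains_file"]).read_text(encoding="utf-8")
        # Assumption: blank lines and "#" comment lines are ignored.
        domains += [line.strip().lower() for line in text.splitlines()
                    if line.strip() and not line.lstrip().startswith("#")]
    return sorted(set(domains))


def apply_refilters(subdomains, scope):
    """Drop subdomains matching any of the scope's sub_refilters regexes."""
    patterns = [re.compile(p) for p in scope.get("sub_refilters", [])]
    return [s for s in subdomains if not any(p.search(s) for p in patterns)]


scope = {
    "name": "hackerone",
    "domains": ["hackerone.com"],
    "domains_file": "extra_domains.txt",
    "sub_refilters": [r"\.(stage|dev)\.hackerone\.com$"],
}
with tempfile.TemporaryDirectory() as d:
    Path(d, "extra_domains.txt").write_text("example.net\n# comment\n\n", encoding="utf-8")
    loaded = load_scope_domains(scope, d)
filtered = apply_refilters(["api.hackerone.com", "x.stage.hackerone.com"], scope)
print(loaded, filtered)
# ['example.net', 'hackerone.com'] ['api.hackerone.com']
```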

Re-probing old assets

--workflow-olds re-runs httpx against subdomains you’ve seen before, not just new ones. That’s how you catch the case where an existing host quietly starts serving a different app — a new login page, a swapped backend, a fresh framework. In my experience this triggers more interesting findings than the new-subdomain stream.

Multiple alert sinks at once

alerts.use accepts a list, so the same findings can land in two places at once. Mostly I use this as a backup channel — if Telegram is unreachable from where autobb is running, SMTP still gets the alert out.
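The backup-channel behaviour depends on one failing sink not stopping the others. This is a minimal sketch, assuming each sink is wrapped in a callable that takes the message text. fan_out and the fake senders are hypothetical stand-ins for the real Telegram and SMTP backends.

```python
def fan_out(message, senders):
    """Send one alert through every configured sink and return the names that failed."""
    failed = []
    for name, send in senders.items():
        try:
            send(message)
        except Exception as exc:  # network, auth or API errors from one sink
            print(f"alert sink {name!r} failed: {exc}")
            failed.append(name)
    return failed


delivered = []


def telegram(msg):
    raise ConnectionError("api.telegram.org unreachable")


def smtp(msg):
    delivered.append(("smtp", msg))


failed = fan_out("new subdomain: api.example.com", {"telegram": telegram, "smtp": smtp})
print(failed)  # ['telegram']
```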

Trade-offs

Honest list of what autobb is and isn’t:

Good for

  • continuous monitoring of a known set of scopes,
  • catching diffs (new subdomain, new port, new tech, new path),
  • piggy-backing on the nuclei community templates for free coverage,
  • running on small hardware — 2 cores and 6GB of RAM handles a sizeable perimeter.

Not good for

  • one-shot pentests where you don’t care about state,
  • people who want a dashboard out of the box (there isn’t one; alerts + export.py is the UX),
  • targets that need browser-driven auth flows (Chromium is bundled, but the pipeline doesn’t log in for you).

Try it

git clone https://github.com/rivalsec/autobb.git
cd autobb
cp config.dist.yaml config.yaml
# edit scope + alerts
sudo docker build -t autobb .
sudo docker run --rm -v $(pwd):/autobb --net autobbnet autobb \
  --dns-brute --ports --nuclei

The README covers MongoDB setup, fresh resolvers via dnsvalidator, and the fix for the “nf_conntrack: table full” error you’ll hit the first time you scan something large. PRs and template contributions welcome.