
Scanner Rate Limiting and Throttling: A Production Scanning Guide

The rate at which a scanner sends requests is a coverage decision, not a performance decision. Programmes that pick a single global rate, open every scan at full speed, and ignore 429 feedback either truncate coverage in silence or strain production systems and lose asset owner trust. Programmes that operate rate as a per-target budget, ramp into the rate, honour back-off signals, and record the adaptation produce scans that complete, stay accurate, and survive an audit read. The defensible rate is the one at which the target stays healthy, the WAF stays passive, and the response classifier stays accurate.

This guide covers how to choose a starting scan rate per target class, how to use a ramp profile to find the operating rate without saturating, how to handle HTTP 429 and Retry-After feedback, where WAFs and CDNs change the rate calculus, how authenticated and API rate management differ from external scanning, the concurrency-versus-RPS decision, the off-hours-versus-business-hours trade-off, what compliance auditors read into the scan log, and how internal security, AppSec, vulnerability management, cloud security, and security engineering teams operate rate as part of the scanning discipline rather than as an afterthought.

Rate is a coverage decision, not a speed decision

Most scanner failures attributed to rate are coverage failures wearing a different label. A scanner that hits a per-IP rate limit halfway through an injection-class module produces an empty result for the second half of the module and a clean-looking scan record for the first half. The aggregate output reads as completed scanning. The actual coverage is whatever fraction of the test corpus made it through before the limit fired. The defensible read is that absence of findings under a rate-limited scan is a coverage drop dressed as a clean scan, not a negative result.

The same logic applies to WAF blocking, CDN burst rules, and origin-side connection pool exhaustion. Each one converts a subset of scanner requests from informative responses into inconclusive ones. The scanner that does not record the inconclusive-response rate per module produces output that looks complete and is actually partial.[5,8,14]

The discipline that survives is to read scanner output paired with rate-feedback telemetry rather than as an isolated finding list. A scan that completed its modules with under 1 percent inconclusive responses is a defensible coverage claim; a scan that completed its modules with 30 percent inconclusive responses is a coverage gap that no severity calibration or finding deduplication exercise will fix downstream.
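The two thresholds in that paragraph can be sketched as a small verdict function. The status-code set treated as inconclusive is an illustrative assumption; the real set depends on the scanner's response classifier.

```python
def inconclusive_rate(statuses: list[int]) -> float:
    # Responses that answer nothing about the application: throttles,
    # challenges, upstream failures. This code set is an assumption.
    INCONCLUSIVE = {403, 429, 503}
    return sum(1 for s in statuses if s in INCONCLUSIVE) / max(len(statuses), 1)

def coverage_verdict(statuses: list[int]) -> str:
    # Thresholds follow the guide: under 1 percent reads as defensible
    # coverage, 30 percent or more reads as a coverage gap.
    r = inconclusive_rate(statuses)
    if r < 0.01:
        return "defensible coverage"
    if r >= 0.30:
        return "coverage gap"
    return "review required"
```

The middle band exists because a scan between the two thresholds is neither defensible nor clearly broken; it needs a human read of which modules absorbed the inconclusive responses.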

Starting rate by target class

The starting rate is a guess; the operating rate is the one feedback signals settle on after the first ramp. The starting rate matters because a rate picked too high triggers burst-detection rules in the first sixty seconds, and one picked too low pushes scan duration into the next maintenance window. The table below sets reasonable starting ranges per target class. Teams should adjust based on asset criticality, infrastructure tier, and the historical scan record.

Target class | Starting RPS per worker | Worker concurrency
Production customer-facing app behind CDN/WAF | 1 to 5 | 1 to 4 with back-off enabled
Production internal-facing business app | 5 to 10 | 2 to 4
Pre-production environment, no rate-sensitive integrations | 10 to 25 | 4 to 8
Staging environment with synthetic data | 25 to 100 | 8 to 16, depends on infrastructure
Authenticated production app, single-session model | 1 to 5 | 1 to 4 with shared session
API with published rate-limit headers | Read X-RateLimit-Limit, target 60 to 80 percent of cap | Single worker per token, multiple tokens by design
Code scan (SAST/SCA) in CI runner | Not applicable, runner concurrency only | 1 to 4 per repo, scaled by runner count

Two operational rules make the starting rate easier to defend. First, document the starting RPS and concurrency in the scan policy so the rate decision is auditable rather than a tester preference. Second, treat the starting rate as a hypothesis the ramp tests, not as the final operating rate. The page on scan scoping and target selection covers the upstream decisions that shape what the rate operates against.

Use a ramp profile, not a flat-rate open

A ramp profile increases scanner request rate from a low baseline to the target rate over a defined interval rather than opening at full rate. The ramp surfaces three operating signals that a flat open hides.

Rate-limit threshold discovery

The ramp climbs until the first 429, the first WAF challenge, or the first latency inflection. The rate at the inflection is the operating rate the asset actually tolerates, often well below the rate the operator guessed. The ramp produces a durable operating rate per target without a separate calibration scan.

Burst-detection rule avoidance

CDNs (Cloudflare, Akamai, Fastly) and WAFs run burst-detection rules against sudden traffic spikes from new source IPs. A scanner that opens at 50 RPS from an unfamiliar IP almost always trips a burst rule and converts subsequent requests into challenge pages. The ramp keeps the scan under the burst threshold.

Operator visibility

The ramp gives the operator a visible inflection point in the scan log where the rate stops climbing because of feedback. The inflection is the artefact that explains the operating rate to an asset owner who asks why the scan ran at the rate it did.

Practical ramp shape

Start at 10 percent of the target rate, double every 60 seconds, halt the climb on the first feedback signal, then operate at the rate the climb settled on. The shape is geometric on the way up, linear on the way down. Programmes that ramp arithmetically take longer to reach the operating rate without producing better signal.
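That shape can be sketched as a schedule generator, assuming the 60-second doubling interval above; the halt-on-feedback step lives in the scan loop that consumes the schedule, not in the generator.

```python
def ramp_schedule(target_rps: float, floor_fraction: float = 0.10,
                  interval_s: int = 60) -> list[tuple[int, float]]:
    """Geometric ramp: open at a fraction of the target rate and double
    every interval until the target is reached. The scan loop halts the
    climb on the first feedback signal and operates at the settled rate."""
    schedule = []
    rate = target_rps * floor_fraction
    elapsed = 0
    while rate < target_rps:
        schedule.append((elapsed, rate))
        rate = min(rate * 2, target_rps)  # cap the final step at the target
        elapsed += interval_s
    schedule.append((elapsed, target_rps))
    return schedule
```

For a 40 RPS target opened at a quarter of the rate, `ramp_schedule(40, floor_fraction=0.25)` yields three steps: 10 RPS at t=0, 20 RPS at t=60, and 40 RPS at t=120.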

Handling HTTP 429 and Retry-After

HTTP 429 (Too Many Requests, RFC 6585) is the canonical signal that the scanner has exceeded the target throttle. The Retry-After header (defined in RFC 7231, carried forward into RFC 9110) carries the wait either as a delta-seconds integer or as an HTTP-date. The disciplined response is to back off, honour the Retry-After value, and reduce the operating rate by a configurable factor (commonly to 0.5 to 0.75 of the prior rate).[1,2,3]

Signal | Disciplined response
HTTP 429 with Retry-After delta-seconds | Pause for the delta, halve the operating rate, resume the module from the same offset, log the back-off event.
HTTP 429 without Retry-After | Pause for an exponential back-off (start 5 seconds, double on repeat 429), halve the rate, resume.
HTTP 503 Service Unavailable with Retry-After | Pause for the Retry-After value, halve the rate. 503 indicates origin or upstream pressure; back off harder than for 429.
Sustained 5xx error rate above 5 percent | Pause the module, cut the rate to 0.5x, restart the module from the start so the partial result is not mistaken for completion.
Latency p95 doubles versus baseline | Cut the rate to 0.75x without pausing. Doubled latency without explicit error responses is a leading indicator of impending throttle or origin saturation.
Connection reset by peer (RST) | Drop concurrency by one worker, retain the per-worker rate, log the reset rate per minute.
CAPTCHA or JavaScript challenge page returned | Halt the scan; the WAF is now blocking. Resolve the allowlist or move the source IP before resuming. Do not parse the challenge as an application response.

The scan record should capture the back-off event, the Retry-After value (when present), the affected module, and the new operating rate so the audit trail shows where the scan adapted. Scanners that ignore 429 produce noisy or destructive output; scanners that record the adaptation produce defensible coverage at a slightly slower clock.[5,8]

WAF and CDN rate limits change the calculus

WAFs and CDNs sit between the scanner and the application and apply their own rate baselines that are usually narrower than the application rate budget. A WAF challenge page (CAPTCHA, JavaScript redirect, IP block) returned in place of the application response converts every subsequent test into an inconclusive answer regardless of the application state. The blocking question is operationally upstream of the rate question: a scan that runs at a perfectly tuned rate against a WAF that is silently blocking still produces inconclusive output.

  • Allowlist the scan source IP range: the WAF inspects but does not block, the rate operates against the application, and the scan output reflects application behaviour. The allowlist should be narrow (specific IPs, time-bounded, audited) rather than broad.
  • Lower the rate to under the WAF baseline: if allowlisting is not available, the operating rate has to stay under the rule fire threshold. The ramp profile finds the threshold; the operator sets the rate just below it.
  • Run from inside the network perimeter: for assets where the WAF is the perimeter, scanning from inside bypasses the WAF and the rate question reverts to the origin-side budget. The trade-off is reduced perimeter coverage; some findings only surface on the externally reachable surface.
  • Coordinate the maintenance window: brief WAF rule pause for the scan duration is sometimes safer than a permanent allowlist. The pause should be time-bounded, scoped to the scan source, and recorded in the change log so the audit trail shows the WAF state during the scan.

The page on scanner blocking and WAF allowlisting covers where blocks land in a typical stack, how to write a narrow allowlist rule that survives audit, and how to detect partial blocks before the report ships.

Authenticated scan rate is a different budget

Authenticated scans run with valid credentials against the surface behind login and tend to need much lower rates than external scans. Every authenticated request creates session state, may write data, and may trip per-account rate limits that apply only to logged-in traffic. Authenticated scanners also need to respect login-flow throttling: a scanner that re-authenticates aggressively triggers anti-credential-stuffing defences (CAPTCHAs, account locks, security alerts to the operations team).

Single-session model

The scanner authenticates once, holds the session for the scan duration, and refreshes only when the server invalidates the session. Re-authenticating on every batch doubles effective load and shifts the rate-limit response from the application surface to the auth endpoint, which is the worst signal because it triggers credential-protection defences.
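A sketch of the single-session discipline, with the login flow and transport abstracted as injected callables (hypothetical stand-ins, not a real scanner API):

```python
class SingleSessionScanner:
    """Authenticate once, hold the session for the scan duration, and
    re-authenticate only when the server invalidates it, never per batch."""

    def __init__(self, login, send):
        self._login = login    # () -> session object
        self._send = send      # (session, request) -> (status, body)
        self._session = None
        self.auth_calls = 0    # exposed so the auth-endpoint load is auditable

    def request(self, req, _retried=False):
        if self._session is None:
            self._session = self._login()
            self.auth_calls += 1
        status, body = self._send(self._session, req)
        if status == 401 and not _retried:
            # Server invalidated the session: one refresh, then retry once.
            self._session = None
            return self.request(req, _retried=True)
        return status, body
```

The `auth_calls` counter is the number that matters: it should stay near 1 per scan, not 1 per batch, or the login endpoint sees the scan as credential stuffing.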

Per-account caps

Many applications apply per-user quotas (read rate, write rate, search rate) that are tighter than the per-IP budget. The scanner that runs at the per-IP budget while inside one account often hits the per-account cap on the first quota-bearing endpoint. Lower the rate to under the per-account cap, or use multiple accounts with worker separation per account.

CSRF token rotation

Some applications rotate CSRF tokens on every response. A scanner that does not track the rotation re-uses stale tokens and gets a series of 403 responses that look like authorisation failures and are actually CSRF token mismatches. The page on authenticated scanner failure modes covers the six failure classes that account for most authenticated coverage gaps.
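A minimal tracker for the rotating-token case; the hidden-field name and the extraction pattern are illustrative assumptions, since applications embed the token in different places (form fields, headers, cookies).

```python
import re

class CsrfTracker:
    """Carry the freshest CSRF token forward so requests never reuse a
    stale one and draw spurious 403s."""
    TOKEN_RE = re.compile(r'name="csrf_token" value="([^"]+)"')  # assumed field name

    def __init__(self):
        self.token = None

    def extract(self, response_body: str) -> None:
        m = self.TOKEN_RE.search(response_body)
        if m:
            self.token = m.group(1)  # rotate on every response that carries one

    def inject(self, form: dict) -> dict:
        if self.token is not None:
            form = {**form, "csrf_token": self.token}
        return form
```

Calling `extract` on every response and `inject` on every write request keeps the token current across rotation.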

Credential rotation cadence

Stored credentials should be rotated on a documented interval. A scan that runs against a stale credential is a coverage failure dressed up as a successful run. The credential is part of the rate budget; refreshing tokens on every batch doubles the effective auth-endpoint load.

API rate management with published headers

APIs typically publish explicit rate limits in headers and apply them per token, per origin, per endpoint, or per user. The headers the scanner can read change the rate management model from feedback-driven (wait for 429) to budget-driven (track the remaining quota and stay inside it).[4,6]

Header | Meaning
X-RateLimit-Limit | Total requests allowed in the current window per the documented policy.
X-RateLimit-Remaining | Requests left before the policy fires. The scanner should pace requests to keep this positive.
X-RateLimit-Reset | Time when the window resets, either a Unix timestamp or seconds-until.
RateLimit-Limit / RateLimit-Remaining / RateLimit-Reset | IETF draft standardised forms (without the X- prefix). Scanners should accept both.
Retry-After | When returned with 429, indicates the wait before retrying. Scanners must honour it.
  • Target 60 to 80 percent of the published cap so the scan leaves headroom for legitimate user traffic from the same token or origin.
  • Quota-bearing endpoints (billing, search, write-heavy) usually warrant lower rates than read-only endpoints; the published cap is per the strictest endpoint policy.
  • Per-token budgets and per-IP budgets are different; scans run under one or the other depending on auth model. Plan around whichever bound applies first.
  • Record the published policy alongside the scan output so the audit trail can defend the rate the scan operated under.
  • If the API publishes no rate-limit headers, the scanner falls back to feedback-driven adaptation through 429 and Retry-After.
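The budget-driven model in the bullets above can be sketched as a pacing function. The heuristic that distinguishes a Unix-timestamp reset from a seconds-until reset is an assumption, since the header form varies by API.

```python
import time

def pace_from_headers(headers: dict[str, str], budget_fraction: float = 0.7) -> float:
    """Return the delay (seconds) before the next request, derived from
    published rate-limit headers. Spends only a fraction of the remaining
    quota (60-80 percent of cap) so legitimate traffic on the same token
    keeps headroom. Accepts both X- prefixed and IETF draft header forms."""
    def get(name):
        return headers.get(name) or headers.get(f"X-{name}")

    remaining = get("RateLimit-Remaining")
    reset = get("RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0  # no published policy: fall back to 429-driven adaptation

    reset_s = float(reset)
    if reset_s > 10_000_000:  # heuristic: a Unix timestamp, not seconds-until
        reset_s = max(reset_s - time.time(), 0.0)

    spend = max(int(int(remaining) * budget_fraction), 1)
    return reset_s / spend
```

With 100 requests remaining and 70 seconds to reset, the function spaces requests one second apart, spending 70 of the 100 over the window.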

Concurrency separately from per-worker rate

Concurrency is the parallel-worker count, separate from per-worker rate. Most rate-related scan failures come from concurrency rather than from raw RPS. A scan at 5 RPS across 1 worker behaves differently from a scan at 1 RPS across 5 workers even though aggregate RPS is identical: the parallel scan is more likely to trigger per-IP burst rules, exhaust short-lived connection pools, and saturate origin-side worker queues.

Same aggregate, different load

5 RPS x 1 worker arrives as a sequential stream the origin can serialise. 1 RPS x 5 workers arrives as five parallel streams that all hit middleware, connection pools, and per-request initialisation simultaneously. The parallel form usually produces more inconclusive responses for the same aggregate rate.

Connection pool exhaustion

Targets backed by single-instance databases, single-Redis caches, or single-pool upstream services run out of connection slots faster under parallel scanning than under sequential. The first scanner with persistent connections holding a slot blocks the next worker even at low aggregate RPS.

Burst-rule sensitivity

Burst-detection rules at WAFs and CDNs trigger on requests-per-second per source, which counts arrival rate rather than completion rate. Five workers each sending one request per second from the same source IP hit a burst rule that one worker sending five requests per second often does not.

Document concurrency next to RPS

Document concurrency in the scan policy alongside per-worker RPS. A policy that states "5 RPS per worker, 4 workers maximum, ramp from 1 worker" is auditable; a policy that states "20 RPS aggregate" hides the parameter that actually drives the failure rate.
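A minimal, assumed shape for such a policy entry, keeping both parameters on the record rather than collapsing them into an aggregate:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RatePolicy:
    """Scan policy entry that keeps concurrency auditable next to the
    per-worker rate. Values below are the guide's illustrative
    production-app defaults, not platform constants."""
    asset_class: str
    rps_per_worker: float
    max_workers: int
    ramp_from_workers: int = 1

    @property
    def aggregate_rps(self) -> float:
        # Derived, never stored: the aggregate hides the failure-driving
        # parameter, so it is computed from the auditable pair.
        return self.rps_per_worker * self.max_workers

policy = RatePolicy("production customer-facing",
                    rps_per_worker=5.0, max_workers=4)
```

The aggregate is derivable from the policy; the reverse is not, which is exactly why "20 RPS aggregate" on its own is not auditable.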

Off-hours scanning versus business hours

Off-hours scanning reduces the operational cost of a rate-related incident. A scan that strains an internal database is less likely to disrupt customers at 02:00 than at 14:00. The trade-off is detection latency for production-impacting changes and the operational cost of running infrastructure (pre-production fixtures, scanner workers, on-call review) outside business hours. The defensible split is per asset criticality rather than as a global policy.

  • Customer-facing production assets: schedule recurring scans off-hours, run on-change scans whenever the change happens with the rate calibrated for the time of day. Off-hours rate can be higher than business-hours rate for the same asset.
  • Internal pre-production assets: business-hours scanning is fine; the rate question reverts to the asset budget without the customer-impact overlay.
  • Staging environments with synthetic data: continuous scanning is acceptable; rate can be high since the data is replaceable and the impact stays inside the test boundary.
  • Coordinate windows with the asset owner: the asset owner has knowledge of business-cycle peaks (month-end batch, quarterly close, marketing campaigns) that the scan operator does not. Surface the schedule to the asset owner before it operates.

The cadence question is upstream of the rate question. The page on scan scheduling and baseline cadence covers how the schedule shapes the rate decision, including how to read the diff between two scan baselines so a coverage drop from rate-limit truncation is not misread as remediation.

Six failure modes that look like accurate scans

Rate-related failures rarely surface as obvious errors. They surface as scan output that looks plausible, passes through triage, and produces remediation evidence that does not survive audit. The six failure modes below recur across enterprise scanning programmes.

Silent module truncation

A module hits the rate limit halfway through, completes the early test cases, drops the late ones, and reports clean. The output looks like negative results for the dropped tests. The fix is to record the actual test count per module against the planned count in the scan log.
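A sketch of that planned-versus-actual check; the module names and counts are illustrative.

```python
def completion_state(planned: dict[str, int],
                     executed: dict[str, int]) -> dict[str, str]:
    """Compare per-module executed test counts against the plan so silent
    truncation lands in the scan record instead of reading as clean."""
    out = {}
    for module, plan in planned.items():
        ran = executed.get(module, 0)
        if ran >= plan:
            out[module] = "complete"
        else:
            out[module] = f"truncated ({ran}/{plan})"
    return out
```

A module that reports clean but shows `truncated (90/150)` in this record is a coverage gap, not a negative result.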

WAF challenge as application response

The scanner parses a JavaScript challenge page or an HTML CAPTCHA as the application response and runs every subsequent rule against the WAF, not the application. The output is a clean scan against an inert page. The fix is to fingerprint challenge pages and halt the scan with an error rather than continue.
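A fingerprinting sketch; the marker strings are illustrative examples of challenge-page content, not a complete or vendor-accurate list.

```python
# Illustrative challenge-page markers; a real list is maintained per
# WAF/CDN vendor and updated as challenge pages change.
CHALLENGE_MARKERS = (
    "checking your browser",
    "cf-challenge",
    "captcha",
    "enable javascript and cookies to continue",
)

def is_challenge_page(body: str) -> bool:
    """Fingerprint WAF/CDN challenge responses so the scan halts with an
    error instead of running rules against an inert page."""
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

The scan loop calls this on every response and halts the module on a match, which is the halt-with-error behaviour the paragraph above describes.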

Stale Retry-After loop

A scanner pauses for Retry-After, sends one request, gets 429 again, pauses again, and never recovers. The rate never adjusts. The fix is to halve the operating rate after every 429 (not just pause) and to abort the module after a bounded number of consecutive 429s.

Per-account quota collision

The authenticated scanner runs at the per-IP budget while inside a single user account. Quota-bearing endpoints (search, billing, bulk operations) hit the per-account cap on the first batch and return 429 for the remainder. The output looks like quota-bearing endpoints are unreliable rather than like the scan exhausted the account quota.

Burst-rule trip on scan start

The scanner opens at the target rate, the CDN burst rule fires in the first fifteen seconds, and the source IP gets blocked for the next fifteen minutes. The scan retries during the block window and produces a sequence of timeouts. The fix is the ramp profile, not the retry interval.

Coverage drop misread as remediation

A scan in cycle N+1 hits a new rate-limit threshold (because of a WAF rule update, a CDN switch, or a back-end change) and silently truncates. The diff against cycle N shows missing findings that read as remediated and are actually a coverage drop. The fix is to pair the diff with the scan-coverage record so the fix count is anchored to the surface that actually ran in each cycle.

How compliance frameworks read rate-limited scanning

Auditors do not read rate as a separate evidence axis but they do read it through three indirect lenses. Programmes that operate rate as a documented decision pass the indirect read; programmes that operate rate as a tester preference fail when the audit asks why the scan record shows partial coverage.

Framework | How rate-related evidence reads
PCI DSS v4.0 Requirement 11.3 | Vulnerability scans, not denial-of-service tests. A scan log that crashed the target reads as out-of-scope testing. Quarterly cadence requires scans that completed coverage, not scans that were rate-limited into truncation.[7]
ISO 27001:2022 Annex A 8.8 | Detection that operates without disrupting the systems being monitored. Rate decisions are operational evidence that the technical vulnerability management process is mature.[10]
SOC 2 CC7.1 | Ongoing detection across the audit observation period. A scan log with sustained 429s and no adaptation reads as detection that did not actually operate at the cadence the policy claims.[11]
NIST SP 800-53 RA-5 | Vulnerability monitoring and scanning at a defined frequency. The control assumes the scans complete coverage; rate-limited truncation is a control-effectiveness finding.[9]
NIST SP 800-115 | Technical guide to information security testing. Rate decisions are part of the test plan that shapes coverage and reproducibility.[8]
CISA BOD 22-01 | Continuous monitoring against the Known Exploited Vulnerabilities catalogue. Rate-limited scans that miss KEV-listed exposure inside the bounded remediation window are a directive-level operating gap.[13]

The cross-cutting read is that rate decisions are documented operational evidence, not silent operator preferences. A scan record that shows the rate, the ramp, the feedback, and the adaptation reads as defensible scanning. A scan record that shows only the final findings list reads as scanning that may or may not have completed.

Operational checklist for a rate-respecting scan

At policy design

  • The scan policy names starting RPS and concurrency per asset class.
  • The policy names the back-off model for 429, 503, latency spikes, and connection resets.
  • The policy names the off-hours window for production-facing assets.
  • The policy names the WAF coordination model (allowlist, lower rate, internal source).

At schedule creation

  • The schedule names the asset, the scanner class, the rate, the concurrency, and the ramp profile.
  • Authenticated schedules name the credential rotation cadence and the per-account quota expectation.
  • API schedules name the published-rate-header policy or the feedback-only fallback.
  • The asset owner is notified of the schedule before it operates.

At scan execution

  • The ramp climbs from 10 percent to the operating rate over 60 to 120 seconds.
  • Back-off events are recorded with the trigger, the new rate, and the affected module.
  • WAF challenge pages are fingerprinted and halt the scan rather than parsing as application response.
  • Per-module test counts are recorded against the planned count so truncation is visible.

At scan completion

  • The scan record includes the operating rate, the back-off events, the inconclusive-response rate, and the per-module completion state.
  • Findings are paired with the scan-coverage record so absence is read as either fix or coverage drop, not silently as fix.
  • The diff against the previous scan separates new and fixed findings from coverage changes.
  • Asset owner notification of completion includes a rate summary so the next coordination conversation has context.

For internal security, AppSec, and vulnerability management teams

Internal teams carry the rate discipline between scans and between assets. The patterns that survive scanner-stack changes, vendor swaps, and asset growth are the same: per-asset starting rate, ramp into the operating rate, honour back-off feedback, record adaptations, and pair findings with scan coverage so the queue reads truthfully.

  • Set rate per asset class rather than as a single scanner default.
  • Ramp into the operating rate so the first feedback signal arrives before the scan saturates.
  • Halve the operating rate after sustained 429s, do not just pause and resume at the same rate.
  • Coordinate WAF allowlists at the source-IP level rather than disabling rules globally.
  • Hold authenticated scans to a single-session model so login-flow throttling does not multiply by worker count.
  • Read API rate-limit headers when published; fall back to feedback adaptation only when the API does not publish.
  • Document concurrency next to RPS so the parameter that actually drives failure rate is auditable.
  • Pair findings with scan-coverage records so the diff reads coverage drops as coverage drops, not as remediation.

For internal security teams, AppSec teams, vulnerability management teams, cloud security teams, and security engineering teams, the operating commitment is to keep rate decisions on the engagement record so the audit conversation reads from the same source as the operator conversation. The scanner result triage workflow and the vulnerability SLA management workflow both assume rate-respecting scans behind them; a scan that truncated under rate is a backlog item the SLA cannot honour even on paper.

How SecPortal handles scanner rate and throttling

SecPortal applies platform-level rate limits before any scan reaches the target. The rate-limit subsystem operates per workspace plan, per domain, and per tenant burst envelope. The platform does not bypass target-side rate limits, override Retry-After headers, or claim to test through WAFs that block the scan source.

Identifiable scanner traffic

External and authenticated scans send the SecPortal-Scanner User-Agent so asset owners and WAF operators can pattern-match and write narrow allowlist rules. The scanner information page covers the User-Agent string and the verifier identification used by the domain verification flow.

Plan-based scan limits

Scan frequency and concurrency are gated at the workspace plan tier, with per-domain ceilings and per-tenant burst protection. The external scanning feature covers the modules and the per-plan limits; continuous monitoring covers the schedule layer that sits on top.[15,18]

Authenticated scan rate model

Authenticated scans use AES-256-GCM encrypted credentials and respect a single-session model so login-flow rate limits are not multiplied by worker count. The authenticated scanning feature covers the credential lifecycle, and encrypted credential storage covers the storage and rotation model.[16,20]

Code scan runner concurrency

Code scans run inside the worker pool against repositories connected through GitHub, GitLab, or Bitbucket OAuth and operate per-repo runner concurrency rather than HTTP rate. The code scanning feature covers the runner integration and the SAST and SCA rule packs.[17]

Audit trail of scan executions

Every scan execution lands on the engagement record alongside the findings. The activity log feature records the user, timestamp, module list, and outcome so the audit trail shows which scans ran, which adapted to feedback, and which completed coverage. The 30, 90, or 365-day retention depends on the workspace plan.[19]

Domain verification and blocklist

Scans are gated by domain verification so a workspace can only scan assets it has proved control over. A platform-level blocklist prevents scans against government, military, critical-infrastructure, and cloud-provider management domains regardless of verification state. Both guardrails sit upstream of the rate decision so the rate question only applies to legitimately-scoped assets.

The rate-and-throttling discipline lives next to the engagement record the operational work lives on, rather than as a static configuration document. Findings management holds each finding with the producing tool, CVSS 3.1 vector, severity band, affected asset, and evidence trail so the rate-coverage pairing reads at the finding level during triage. The compliance tracking feature maps findings to ISO 27001, SOC 2, Cyber Essentials, PCI DSS, and NIST frameworks with CSV export so the audit-side read of cadence-and-rate evidence is one query against the same record.

Related scanner discipline

Rate operates inside a wider scanner discipline. The pages below cover the surrounding decisions that shape what the rate operates against and how the output gets read.

For the wider operating model, the security tool coverage overlap research covers the catalogue-level coverage matrix that rate decisions sit inside, and the continuous security monitoring guide covers the programme-level discipline that rate-respecting scans feed into.

Scope and limitations of this guide

Scanner rate is one parameter inside the scanning programme. No rate decision makes the underlying scanner capabilities reach further than they structurally can, and no rate budget replaces the manual testing the scanner cannot reach. The rate question is what request volume keeps the target healthy, the WAF passive, the response classifier accurate, and the finding queue truthful. The answer is per asset class, per scanner class, per feedback channel, and per coordination model with the asset owner.

Rate claims that depend on a single global default almost always understate the target-class variation. Rate claims that decompose into starting rate, ramp profile, back-off model, concurrency model, and audit trail of adaptations are the claims that survive both the asset owner conversation and the audit read.


Sources

  1. IETF RFC 6585, Additional HTTP Status Codes (HTTP 429 Too Many Requests)
  2. IETF RFC 7231, HTTP/1.1 Semantics and Content (Retry-After header)
  3. IETF RFC 9110, HTTP Semantics
  4. IETF Draft, RateLimit Header Fields for HTTP
  5. OWASP, Web Security Testing Guide (WSTG) Rate Limiting Guidance
  6. OWASP, API Security Top 10 (API4 Unrestricted Resource Consumption)
  7. PCI Security Standards Council, PCI DSS v4.0 (Requirement 11.3 Vulnerability Scanning)
  8. NIST, SP 800-115 Technical Guide to Information Security Testing and Assessment
  9. NIST, SP 800-53 Rev. 5 (RA-5 Vulnerability Monitoring and Scanning)
  10. ISO/IEC, ISO 27001:2022 Annex A 8.8 Management of Technical Vulnerabilities
  11. AICPA, SOC 2 Trust Services Criteria CC7.1 Detection of Vulnerabilities
  12. NCSC, Vulnerability Management Guidance
  13. CISA, Binding Operational Directive 22-01: Reducing the Significant Risk of Known Exploited Vulnerabilities
  14. OWASP, Vulnerability Scanning Tools Guidance
  15. SecPortal, External Scanning Feature
  16. SecPortal, Authenticated Scanning Feature
  17. SecPortal, Code Scanning Feature
  18. SecPortal, Continuous Monitoring Feature
  19. SecPortal, Activity Log & Workspace Audit Trail
  20. SecPortal, Encrypted Credential Storage

Run scans the asset owner trusts and the auditor reads

SecPortal supports external, authenticated, and code scans with per-workspace rate limits, plan-based concurrency, identifiable scanner traffic, encrypted credentials for authenticated scans, and a full activity log of scan executions. The scan rate, the back-off events, and the coverage reach the audit record on the same engagement the findings live on.