AI Risk Scanner

An OpenWatch Project

Methodology

Our scanning process uses publicly available data sources exclusively. No access to your internal systems is required or attempted. Every finding in your report traces back to data that is already publicly accessible.

Data Collection

Each scan tier runs a specific set of collectors. All collectors use public protocols and services (DNS, HTTPS, public APIs). The tier you select determines which collectors run and which sections appear in your report.

DNS & Certificate Analysis

Queries public DNS records (A, AAAA, MX, NS, TXT, CNAME, SOA) and Certificate Transparency logs to map subdomains, mail servers, and certificate issuance patterns. Checks SPF, DMARC, and DKIM configurations for email security posture.

Data sources: Public DNS resolvers, crt.sh Certificate Transparency database.

Limitations: Only discovers subdomains present in CT logs or public DNS. Internal-only subdomains, split-horizon DNS, and private zones are not visible.
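
As an illustration of the email posture check, the following is a minimal sketch that reads the DMARC policy and the SPF "all" qualifier out of TXT record strings that have already been fetched. The function names are hypothetical and are not taken from the scanner's code; the record syntax follows the public SPF and DMARC specifications.

```python
from typing import Dict, List, Optional


def parse_tags(record: str) -> Dict[str, str]:
    """Split a 'tag=value; tag=value' record (the DMARC syntax) into a dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_policy(txt_records: List[str]) -> Optional[str]:
    """Return the p= policy from the first DMARC record, or None if absent."""
    for record in txt_records:
        if record.strip().lower().startswith("v=dmarc1"):
            return parse_tags(record).get("p", "").lower() or None
    return None


def spf_all_qualifier(txt_records: List[str]) -> Optional[str]:
    """Return the qualifier on SPF's 'all' mechanism ('+', '-', '~' or '?').

    Returns None when there is no SPF record or it has no 'all' term.
    """
    for record in txt_records:
        terms = record.strip().lower().split()
        if terms and terms[0] == "v=spf1":
            for term in terms[1:]:
                if term in ("all", "+all", "-all", "~all", "?all"):
                    # A bare 'all' carries the implicit '+' (pass) qualifier.
                    return term[0] if len(term) == 4 else "+"
    return None


# TXT strings as they might be returned for _dmarc.example.com and example.com.
print(dmarc_policy(["v=DMARC1; p=quarantine; rua=mailto:reports@example.com"]))  # quarantine
print(spf_all_qualifier(["v=spf1 include:_spf.example.com ~all"]))  # ~
```

A policy of "none" or a "+all" / "?all" qualifier would typically be reported as a weak configuration, though how the scanner rates each case is not specified here.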

Security Header Audit

Sends HTTPS requests to your domain and inspects response headers against security best practices: HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, and Permissions-Policy. Also checks TLS certificate validity, protocol version, CORS configuration, and cookie security flags.

Data sources: Direct HTTPS requests to the target domain. TLS certificate inspection via standard handshake.

Limitations: Only inspects the primary domain endpoint. Subdomains and non-standard ports are not individually audited at this level.
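
The header inspection can be sketched as a comparison between the response headers and the list named above. This is a simplified illustration, not the scanner's implementation: the function names are hypothetical, and it only checks for presence, not whether each header's value is well configured.

```python
from typing import Iterable, List

# The six response headers listed in this section.
RECOMMENDED_HEADERS = (
    "Strict-Transport-Security",  # HSTS
    "Content-Security-Policy",    # CSP
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_security_headers(response_headers: Iterable[str]) -> List[str]:
    """Return the recommended headers that are absent (names are case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in RECOMMENDED_HEADERS if h.lower() not in present]


def missing_cookie_flags(set_cookie_value: str) -> List[str]:
    """Return which of Secure, HttpOnly and SameSite are absent from one Set-Cookie value."""
    # Skip the first 'name=value' pair; the remaining parts are attributes.
    attributes = {
        part.strip().split("=", 1)[0].lower()
        for part in set_cookie_value.split(";")[1:]
    }
    return [f for f in ("Secure", "HttpOnly", "SameSite") if f.lower() not in attributes]


headers = {"strict-transport-security": "max-age=31536000", "X-Frame-Options": "DENY"}
print(missing_security_headers(headers))
print(missing_cookie_flags("sid=abc123; Path=/; Secure; SameSite=Lax"))  # ['HttpOnly']
```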

GitHub Exposure Scan

Searches public GitHub repositories and gists for references to your domain, looking for accidentally committed credentials, configuration files, API keys, or internal documentation. Also detects references to AI/ML frameworks (OpenAI, Anthropic, LangChain, vector databases) associated with your domain.

Data sources: GitHub Search API (public repositories only).

Limitations: Limited to public repos. Private repositories are not accessible. Results are keyword-based search matches and may include false positives (e.g., a variable named password in documentation). Findings should be manually verified.
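
To show why keyword matches need manual verification, here is a minimal sketch that grades a matched line by how strongly it resembles a real secret. The patterns and the three labels are assumptions made for this example and are not the scanner's actual rules. The "AKIA" prefix is the documented format of AWS access key IDs, and the key used below is AWS's published documentation example.

```python
import re
from typing import Optional

# AWS access key IDs: 'AKIA' followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# A secret-like name assigned a quoted literal of at least 8 characters.
ASSIGNMENT = re.compile(
    r"""(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*["'][^"']{8,}["']""",
    re.IGNORECASE,
)

# The bare keyword on its own, which is the weakest signal.
KEYWORD = re.compile(r"password|passwd|secret|api[_-]?key|token", re.IGNORECASE)


def classify_line(line: str) -> Optional[str]:
    """Grade one matched line; None means nothing of interest was found."""
    if AWS_KEY_ID.search(line):
        return "pattern-match"       # matches a known credential format
    if ASSIGNMENT.search(line):
        return "assigned-literal"    # a value is hard-coded next to a secret-like name
    if KEYWORD.search(line):
        return "keyword-only"        # likely documentation or a variable name
    return None


print(classify_line("aws_access_key_id = AKIAIOSFODNN7EXAMPLE"))  # pattern-match
print(classify_line('db_password = "hunter2hunter2"'))            # assigned-literal
print(classify_line("The `password` field is required."))         # keyword-only
```

Even a "pattern-match" result is only a candidate: the key may be revoked, a placeholder, or a test fixture.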

Breach Database Lookup

Checks known breach databases for records associated with your domain. Reports breach names, affected account counts, exposed data types, and breach dates.

Data sources: HaveIBeenPwned API.

Limitations: Coverage depends on publicly disclosed breaches. Recent or undisclosed breaches will not appear. The presence of a breach record does not confirm current credential exposure — only that the domain appeared in a past breach dataset.
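
The reported fields map onto the breach model that the HaveIBeenPwned API returns (Name, BreachDate, PwnCount, DataClasses). The sketch below summarizes a list of such records that has already been retrieved; the function name and the sample records are hypothetical. Note that PwnCount is the size of the whole breach, not the number of accounts on your domain.

```python
from typing import Any, Dict, List


def summarize_breaches(breaches: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Condense HIBP-style breach records into report-ready fields."""
    # BreachDate is an ISO date string (YYYY-MM-DD), so string order is date order.
    newest_first = sorted(breaches, key=lambda b: b["BreachDate"], reverse=True)
    return {
        "breach_count": len(newest_first),
        "most_recent": newest_first[0]["BreachDate"] if newest_first else None,
        "data_types": sorted({d for b in newest_first for d in b["DataClasses"]}),
        "breaches": [
            {"name": b["Name"], "date": b["BreachDate"], "accounts": b["PwnCount"]}
            for b in newest_first
        ],
    }


# Hypothetical sample records, shaped like HIBP breach objects.
sample = [
    {"Name": "ExampleForum", "BreachDate": "2019-03-14", "PwnCount": 120000,
     "DataClasses": ["Email addresses", "Passwords"]},
    {"Name": "ExampleShop", "BreachDate": "2021-11-02", "PwnCount": 4500,
     "DataClasses": ["Email addresses", "Phone numbers"]},
]
print(summarize_breaches(sample)["most_recent"])  # 2021-11-02
```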

Web Technology Fingerprint

Identifies web technologies, frameworks, and services running on your public-facing infrastructure by analyzing HTTP responses, robots.txt, sitemap.xml, and page content. Probes a curated set of sensitive paths (/.git/HEAD, /.env, /swagger.json, /graphql, etc.) to detect exposed development artifacts and API documentation.

Data sources: HTTP response analysis, public page content, standard path probing.

Limitations: Uses soft-404 detection to reduce false positives, but custom error pages may still cause misidentification. Path probing covers common patterns but is not exhaustive.
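
Soft-404 detection can be approached in several ways. One common approach, sketched below, compares the response for a probed path against a baseline response for a random path that should not exist; if the two bodies are nearly identical, the "200 OK" is treated as a disguised error page. The function name, the similarity measure and the 0.9 threshold are assumptions for illustration, not a description of the scanner's actual logic.

```python
from difflib import SequenceMatcher


def looks_like_soft_404(probe_status: int, probe_body: str,
                        baseline_body: str, threshold: float = 0.9) -> bool:
    """True if a 200 response is essentially the site's generic not-found page."""
    if probe_status != 200:
        return False  # a real error status is not a *soft* 404
    similarity = SequenceMatcher(None, probe_body, baseline_body).ratio()
    return similarity >= threshold


TEMPLATE = ("<html><title>Not found</title><body><h1>Page not found</h1>"
            "<p>Sorry, we could not find {path}.</p></body></html>")

# Baseline: the body returned for a random path that should not exist.
baseline = TEMPLATE.format(path="/zz-random-path")

# The site answers 200 with its error template for /.env: a soft 404.
print(looks_like_soft_404(200, TEMPLATE.format(path="/.env"), baseline))  # True

# The site answers 200 with content unlike the error template: worth reporting.
print(looks_like_soft_404(200, "DB_HOST=localhost\nDB_USER=app\n", baseline))  # False
```

A custom error page that varies heavily between requests (timestamps, request IDs, rotating content) can fall below any fixed threshold, which is one way the misidentification described above can occur.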

Technology-to-CVE Cross-Reference

Correlates detected technologies and their disclosed versions against known CVE databases via threat intelligence feeds. This is a cross-reference, not a vulnerability scan — it indicates potential exposure based on detected software versions, not confirmed exploitability.

Data sources: AlienVault OTX threat intelligence API, technology version strings from HTTP headers and service banners.

Limitations: Version detection relies on publicly disclosed version strings in HTTP headers or banners, which may be suppressed, spoofed, or outdated. A CVE match does not confirm the vulnerability is exploitable in your specific configuration.
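
The cross-reference step amounts to parsing a version string and checking it against an advisory's affected range. The sketch below assumes banners in the common "product/version" form and an advisory expressed as a half-open range from the first affected version up to the first fixed one. Both function names and the range values in the usage lines are hypothetical.

```python
import re
from typing import List, Tuple

Version = Tuple[int, ...]

# Matches tokens such as 'nginx/1.18.0' or 'OpenSSL/1.1.1' inside a header value.
PRODUCT_VERSION = re.compile(r"([A-Za-z][\w.\-]*)/(\d+(?:\.\d+)*)")


def parse_banner(header_value: str) -> List[Tuple[str, Version]]:
    """Extract (product, version) pairs from a Server or X-Powered-By value."""
    return [
        (name.lower(), tuple(int(part) for part in version.split(".")))
        for name, version in PRODUCT_VERSION.findall(header_value)
    ]


def in_affected_range(version: Version, first_affected: Version,
                      first_fixed: Version) -> bool:
    """True if first_affected <= version < first_fixed (tuple comparison)."""
    return first_affected <= version < first_fixed


detected = parse_banner("nginx/1.18.0")
print(detected)  # [('nginx', (1, 18, 0))]

# Hypothetical advisory range, for illustration only.
print(in_affected_range(detected[0][1], (1, 17, 0), (1, 20, 1)))  # True
```

A header such as "Server: cloudflare" yields no version at all, and distribution packages often backport fixes without changing the upstream version number, so a match here remains a lead to investigate.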

Threat Intelligence Feed Correlation

Checks your domain and associated IP addresses against public threat intelligence feeds for indicators of compromise, malicious activity reports, and reputation data.

Data sources: AlienVault OTX API.

Limitations: Correlation indicates that your domain or IP appeared in a threat intelligence pulse. This does not confirm active compromise — it may reflect historical activity, shared hosting, or CDN IP reuse.

Network Exposure Scan (Standard & Deep tiers)

Queries passive network scanning databases for open ports, running services, operating system fingerprints, and known vulnerabilities associated with your public IP addresses. Detects exposed databases (MySQL, PostgreSQL, Redis, MongoDB), remote access services (SSH, VNC), and container infrastructure (Docker API).

Data sources: Shodan API (passive scan data; for this collector, our service performs no active probing of its own).

Limitations: Data reflects Shodan's most recent scan, which may be hours to weeks old. CDN-fronted services are detected and excluded to avoid false positives. Services on non-standard ports may not appear in passive scan databases.
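
A rough sketch of how passive scan records might be sorted into the exposure categories above, keyed on each service's well-known default port. The record shape is a simplified, hypothetical stand-in for what a passive scan database returns, and the function name and CDN filter are assumptions for this example.

```python
from typing import Any, Dict, FrozenSet, List

# Default ports for the services named in this section.
EXPOSURE_CATEGORIES = {
    3306: ("database", "MySQL"),
    5432: ("database", "PostgreSQL"),
    6379: ("database", "Redis"),
    27017: ("database", "MongoDB"),
    22: ("remote-access", "SSH"),
    5900: ("remote-access", "VNC"),
    2375: ("container", "Docker API"),
}


def classify_services(services: List[Dict[str, Any]],
                      cdn_ips: FrozenSet[str] = frozenset()) -> List[Dict[str, Any]]:
    """Label services on known ports, skipping IPs identified as CDN edges."""
    findings = []
    for svc in services:
        if svc["ip"] in cdn_ips:
            continue  # the port belongs to the CDN, not the customer's host
        match = EXPOSURE_CATEGORIES.get(svc["port"])
        if match:
            category, name = match
            findings.append({
                "ip": svc["ip"], "port": svc["port"],
                "category": category, "service": name,
                "observed": svc["timestamp"],  # when the passive scan saw it
            })
    return findings


# Hypothetical records using documentation-reserved IP ranges.
records = [
    {"ip": "192.0.2.10", "port": 6379, "timestamp": "2024-05-01T08:00:00"},
    {"ip": "192.0.2.10", "port": 443, "timestamp": "2024-05-01T08:00:00"},
    {"ip": "198.51.100.7", "port": 22, "timestamp": "2024-04-20T13:30:00"},
]
for finding in classify_services(records, cdn_ips=frozenset({"198.51.100.7"})):
    print(finding["service"], finding["ip"], finding["observed"])
```

Because the lookup is keyed on default ports, a Redis instance moved to another port would pass through unlabelled, which mirrors the non-standard-port limitation noted above. Carrying the observation timestamp into each finding lets the report show how old the underlying scan data is.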

AI Service Exposure Detection (Standard & Deep tiers)

Identifies publicly exposed AI/ML infrastructure by checking for known AI service ports (Jupyter notebooks, vector databases, model serving endpoints, Gradio/Streamlit interfaces) and detecting client-side AI API calls in your web application's JavaScript.

Limitations: Detection is based on known default ports and service signatures. Custom ports or authenticated-only services may not be detected. Client-side detection inspects visible page source only.
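
Both halves of this check can be sketched briefly: labelling open ports that match well-known defaults (8888 for Jupyter, 7860 for Gradio, 8501 for Streamlit) and searching page source for calls to AI provider API hosts. The host list, port list and function names are assumptions chosen for illustration; the scanner's actual signatures are not documented here.

```python
import re
from typing import Iterable, List

# Well-known default ports for common AI/ML tooling.
DEFAULT_AI_PORTS = {8888: "Jupyter", 7860: "Gradio", 8501: "Streamlit"}

# API hosts whose appearance in client-side code suggests direct AI API calls.
AI_API_HOSTS = {
    "api.openai.com": "OpenAI",
    "api.anthropic.com": "Anthropic",
}


def label_ai_ports(open_ports: Iterable[int]) -> List[str]:
    """Return the AI tools whose default port appears among the open ports."""
    return sorted(DEFAULT_AI_PORTS[p] for p in set(open_ports) if p in DEFAULT_AI_PORTS)


def find_ai_api_references(page_source: str) -> List[str]:
    """Return providers whose API host is referenced in visible page source."""
    found = set()
    for host, provider in AI_API_HOSTS.items():
        if re.search(r"https?://" + re.escape(host) + r"\b", page_source):
            found.add(provider)
    return sorted(found)


js = 'fetch("https://api.openai.com/v1/chat/completions", {method: "POST"})'
print(find_ai_api_references(js))          # ['OpenAI']
print(label_ai_ports([22, 443, 8888]))     # ['Jupyter']
```

Calls routed through a first-party backend never expose the provider's hostname to the browser, so an empty result does not rule out AI usage.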

AI Adoption Footprint (Deep tier only)

Analyzes public job postings to infer AI/ML technology adoption, including specific frameworks, self-hosted LLM infrastructure, and shadow AI tool usage. Reports AI hiring intensity and technology stack signals.

Data sources: Public job boards via search API.

Limitations: Inferences are based on keyword matching in publicly available job posting snippets, not confirmed deployments. Job postings may reflect aspirational hiring, not current infrastructure. Results should be treated as directional signals, not ground truth.
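
Keyword matching over job posting snippets might look like the sketch below. The keyword list is a small hypothetical sample, and "hiring intensity" is defined here as the share of postings with at least one AI keyword, which is an assumption for this example and not necessarily the metric used in the report.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical keyword patterns; a real list would be considerably longer.
AI_KEYWORDS = {
    "OpenAI": r"\bopenai\b",
    "Anthropic": r"\banthropic\b",
    "LangChain": r"\blangchain\b",
    "LLM": r"\bLLMs?\b|\blarge language models?\b",
    "vector database": r"\bvector (?:database|db|store)s?\b",
}


def ai_signals(snippets: List[str]) -> Tuple[Dict[str, int], float]:
    """Count postings mentioning each keyword and the share with any AI keyword."""
    counts: Counter = Counter()
    ai_postings = 0
    for text in snippets:
        hits = [label for label, pattern in AI_KEYWORDS.items()
                if re.search(pattern, text, re.IGNORECASE)]
        counts.update(hits)            # each keyword counted once per posting
        ai_postings += bool(hits)
    intensity = ai_postings / len(snippets) if snippets else 0.0
    return dict(counts), intensity


postings = [
    "Senior ML engineer: fine-tune LLMs and integrate the OpenAI API",
    "Accountant, month-end close and reporting",
    "Build RAG pipelines with LangChain and a vector database",
    "Office manager",
]
counts, intensity = ai_signals(postings)
print(counts)
print(intensity)  # 0.5
```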

Report Generation

Collected data is analyzed by an AI model (Claude by Anthropic), which generates a narrative threat assessment. The AI is instructed to cite specific evidence from the scan data, apply a consistent risk-rating rubric, and explicitly flag gaps where collectors failed or returned no data.

The AI is prohibited from fabricating findings or inflating risk language beyond what the data supports. Every claim in the report must trace back to a specific collector finding.

Important: AI-generated analysis may still contain inaccuracies or misinterpretations. Findings should be verified by qualified security professionals before acting on them. If you believe your report contains errors, contact support [at] scalingsuccess (.) io for review.

What This Is Not

This tool provides a surface-level overview of your organization's public exposure from the perspective of an external observer. For a comprehensive security assessment, engage a qualified security firm to perform a full penetration test and vulnerability assessment.