HackWatch

Cyber Hub Architecture

HackWatch CTI Platform Blueprint

This page turns the internal architecture brief into an implementation-ready plan: legal-safe source strategy, multi-layer data pipeline, canonical entity model, confidence scoring and response-first product surfaces for defenders.

Data lake + entity resolution

Source contracts + license safety

IOC + exposure + exploit context

SEO-ready CTI content model

Why this architecture exists

HackWatch should be built as an original CTI platform, not a reproduction of a single public cyber map. The target model is multi-source, license-aware and entity-driven, so the platform stays legally safe while still delivering credible real-time threat intelligence and recovery guidance.

The product outcome is practical: one system that links phishing, malware, vulnerabilities, ransomware and exposure telemetry into actionable user journeys.

Source layers

IP and reputation layer

Core sources: AbuseIPDB, CrowdSec CTI, GreyNoise, Spamhaus (SIA, DROP, DROPv6, ASN-DROP)

This layer powers confidence-weighted IP reputation, noisy scanner suppression and ASN-level context for abuse-driven triage.
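A minimal sketch of confidence-weighted IP reputation follows, assuming each connector has already normalized its verdict to a 0.0 to 1.0 score. The source weights, the default weight and the damping factor for benign scanners are hypothetical values, and the `is_benign_scanner` flag stands in for whatever scanner-classification context (for example GreyNoise) the pipeline supplies.

```python
from dataclasses import dataclass


# Hypothetical, already-normalized verdict emitted by each IP connector.
# Field names are illustrative, not any vendor's API schema.
@dataclass(frozen=True)
class IpVerdict:
    source: str   # e.g. "abuseipdb", "crowdsec", "spamhaus_drop"
    score: float  # connector-normalized maliciousness in [0.0, 1.0]


# Assumed per-source weights; tune them from observed precision.
SOURCE_WEIGHTS = {"abuseipdb": 0.8, "crowdsec": 0.9, "spamhaus_drop": 1.0}
DEFAULT_WEIGHT = 0.5   # weight for sources without an explicit entry
SCANNER_DAMPING = 0.2  # assumed multiplier applied to known benign scanners


def ip_reputation(verdicts: list[IpVerdict], is_benign_scanner: bool = False) -> float:
    """Confidence-weighted mean of source scores, damped for benign scanners.

    How `is_benign_scanner` is derived (e.g. from a scanner-classification
    lookup) is out of scope for this sketch.
    """
    total_weight = sum(SOURCE_WEIGHTS.get(v.source, DEFAULT_WEIGHT) for v in verdicts)
    if total_weight == 0:
        return 0.0  # no evidence at all
    weighted = sum(SOURCE_WEIGHTS.get(v.source, DEFAULT_WEIGHT) * v.score for v in verdicts)
    score = weighted / total_weight
    # Suppress noisy scanners instead of discarding them outright.
    return score * SCANNER_DAMPING if is_benign_scanner else score
```

Damping rather than zeroing keeps a scanner IP visible for triage while pushing it below alert thresholds.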

Malware, IOC and URL layer

Core sources: URLhaus, ThreatFox, MalwareBazaar, SSLBL, Feodo Tracker

Use this layer for fresh IOC ingestion on five-minute refresh cycles, malware family clustering and certificate and JA3 fingerprint context.
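Malware family clustering can start as simple grouping on the family label each feed attaches to an IOC. This sketch assumes records already parsed into dicts with `value`, `source` and `malware_family` keys (hypothetical names); it keeps the contributing sources per cluster so provenance survives the grouping.

```python
from collections import defaultdict


def cluster_by_family(iocs: list[dict]) -> dict[str, dict[str, set[str]]]:
    """Group IOC values by malware family, keeping which sources reported them.

    Each record is assumed to look like
    {"value": ..., "source": ..., "malware_family": <str or None>}.
    """
    clusters: dict[str, dict[str, set[str]]] = defaultdict(
        lambda: {"iocs": set(), "sources": set()}
    )
    for ioc in iocs:
        # Case-fold labels so "Emotet" and "emotet" land in the same cluster.
        family = (ioc.get("malware_family") or "unknown").strip().lower()
        clusters[family]["iocs"].add(ioc["value"])
        clusters[family]["sources"].add(ioc["source"])
    return dict(clusters)
```

Case-folding only merges spelling variants. Feeds that use different names for the same family need an explicit alias table, which this sketch does not include.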

Phishing and brand abuse layer

Core sources: OpenPhish Community, Premium and Database

Use brand, ASN, country, language and SSL metadata to build phishing maps, sector dashboards and brand-risk rankings.
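A brand-risk ranking can be approximated by counting distinct phishing URLs per targeted brand, with hosting diversity (distinct ASNs) as a tie-breaker. The record keys `brand`, `url` and `asn` are assumed names for fields parsed from a phishing feed that carries brand metadata; the ranking rule itself is an illustrative choice, not a standard.

```python
from collections import defaultdict


def brand_risk_ranking(records: list[dict], top_n: int = 10) -> list[tuple[str, int, int]]:
    """Rank brands by distinct phishing URLs, breaking ties by distinct ASNs.

    Returns (brand, distinct_urls, distinct_asns) tuples, highest risk first.
    """
    urls: dict[str, set[str]] = defaultdict(set)
    asns: dict[str, set[int]] = defaultdict(set)
    for rec in records:
        urls[rec["brand"]].add(rec["url"])  # sets drop repeated URLs per brand
        if rec.get("asn") is not None:
            asns[rec["brand"]].add(rec["asn"])
    ranked = sorted(urls, key=lambda b: (len(urls[b]), len(asns[b])), reverse=True)
    return [(brand, len(urls[brand]), len(asns[brand])) for brand in ranked[:top_n]]
```

The same grouping extends naturally to country, language or SSL issuer for sector dashboards.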

Vulnerability and exploitation layer

Core sources: CISA KEV, NVD, EPSS, CIRCL Vulnerability-Lookup, optional VulnCheck Community

Combine exploit evidence and probability signals to prioritize patching and map CVEs to response urgency.
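The sketch below shows one way to turn exploit evidence into patch urgency: KEV membership is treated as confirmed exploitation, and EPSS (a probability between 0 and 1) ranks everything else. The tier names and the 0.5 / 0.1 EPSS cut-offs are assumptions to be calibrated, and the CVE identifiers in the usage check are placeholders.

```python
# Assumed thresholds for illustration; calibrate against your own patch SLAs.
EPSS_HIGH = 0.5
EPSS_MEDIUM = 0.1
URGENCY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def patch_urgency(in_kev: bool, epss: float) -> str:
    """Map exploit evidence (KEV) and exploit probability (EPSS) to a tier."""
    if in_kev:
        return "critical"  # exploitation in the wild is already confirmed
    if epss >= EPSS_HIGH:
        return "high"
    if epss >= EPSS_MEDIUM:
        return "medium"
    return "low"


def prioritize(cves: list[dict]) -> list[tuple[str, str]]:
    """Sort CVEs by urgency tier, then by descending EPSS within a tier."""
    scored = [
        (cve["cve_id"], patch_urgency(cve["in_kev"], cve["epss"]), cve["epss"])
        for cve in cves
    ]
    scored.sort(key=lambda row: (URGENCY_ORDER[row[1]], -row[2]))
    return [(cve_id, tier) for cve_id, tier, _ in scored]
```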

Internet exposure and attack surface layer

Core sources: Shodan, Censys, Netlas

Track internet-facing assets and service exposure separately from IOC feeds to avoid mixing exposure with confirmed malicious activity.
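Keeping exposure and IOC data apart can be enforced in the types themselves. In this sketch (class and field names are hypothetical), an exposure finding can only populate `exposed_services`, while the `malicious` flag is set solely by IOC sightings, so an open port never turns into a malicious verdict by accident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureFinding:
    """A publicly reachable service; not, on its own, evidence of malice."""
    ip: str
    port: int
    service: str
    source: str  # e.g. "shodan", "censys", "netlas"


@dataclass(frozen=True)
class IocSighting:
    """An indicator reported as malicious by an IOC feed."""
    value: str
    source: str


def assess_ip(ip: str, exposures: list[ExposureFinding], sightings: list[IocSighting]) -> dict:
    """Report exposure and maliciousness as separate fields for one IP."""
    exposed = sorted({(e.port, e.service) for e in exposures if e.ip == ip})
    reported_by = sorted({s.source for s in sightings if s.value == ip})
    return {
        "ip": ip,
        "exposed_services": exposed,
        "malicious": bool(reported_by),  # only IOC sightings can set this
        "reported_by": reported_by,
    }
```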

Ransomware and leak-site context layer

Core sources: Ransomware.live (PRO tier when used commercially), Tor exit-node list (context only)

This layer provides campaign context, leak-site telemetry and victim-side reporting patterns for incident workflows.

Target platform architecture

The system below separates raw ingestion, normalization, analytics and serving so we avoid one overloaded database trying to do everything at once.

[CTI APIs and feeds]
      |
      +-- Pull connectors (5m / 15m / hourly / daily)
      +-- Stream connectors (real-time)
      +-- Premium/manual enrichment jobs
      |
[Ingestion bus: Kafka or Redpanda]
      |
[Bronze / Raw zone: S3 or MinIO]
      |  immutable payloads + source metadata + license metadata
      |
[Normalization and entity resolution]
      |  canonical IOC model, dedup, correlation, scoring
      |
[Silver / Core layer]
      +-- ClickHouse (time-series analytics)
      +-- PostgreSQL (entities, users, contracts, watchlists)
      +-- OpenSearch (full-text search)
      +-- Neo4j (graph pivoting)
      |
[Gold / Serving layer]
      +-- REST and GraphQL API
      +-- 1h / 24h / 7d / 30d aggregates
      +-- alerting and watchlists
      +-- vector tiles for map layers
      |
[Frontend product surfaces]
      +-- Global map
      +-- IOC explorer
      +-- Vulnerability board
      +-- Phishing board
      +-- Ransomware board
      +-- Exposure board
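The raw zone note in the diagram (immutable payloads + source metadata + license metadata) can be sketched as a content-addressed object key plus a metadata envelope. The key layout and the `license_id` field are hypothetical, and the actual upload to S3 or MinIO is left out so the example stays self-contained.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def bronze_record(
    source: str,
    payload: bytes,
    license_id: str,
    fetched_at: Optional[datetime] = None,
) -> tuple[str, dict]:
    """Build a content-addressed raw-zone key and its metadata envelope.

    The key embeds the payload's SHA-256, so re-ingesting an identical
    payload on the same day maps to the same key instead of a new object.
    """
    fetched_at = fetched_at or datetime.now(timezone.utc)
    digest = hashlib.sha256(payload).hexdigest()
    key = f"bronze/{source}/{fetched_at:%Y/%m/%d}/{digest}.raw"
    metadata = {
        "source": source,
        "sha256": digest,
        "fetched_at": fetched_at.isoformat(),
        "license_id": license_id,  # hypothetical link to the source_contract row
        "size_bytes": len(payload),
    }
    return key, metadata
```

A write-once (object lock) setting on the bucket, where the store supports it, would back the "immutable" guarantee at the storage layer as well.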

Canonical data model

Every ingestion source should map into shared objects so relationships stay queryable and provenance remains intact.

source_catalog

raw_record

observable

entity

sighting

relation

vulnerability

victim_event

source_contract
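One way to pin down these shared objects is a set of typed records. The sketch below uses Python dataclasses; every field name is an assumption for illustration, chosen so that sightings, relations and victim events keep a pointer back to the raw record that produced them (provenance).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# All field names below are illustrative assumptions, not a fixed schema.


@dataclass
class SourceCatalog:
    source_id: str
    name: str
    layer: str                   # e.g. "ip_reputation", "malware_ioc", "phishing"
    refresh_interval: timedelta


@dataclass
class SourceContract:
    source_id: str
    public_display: bool         # False means internal-only
    max_cache: timedelta         # allowed cache window
    commercial_redistribution: bool


@dataclass
class RawRecord:
    raw_id: str                  # e.g. SHA-256 of the immutable payload
    source_id: str
    fetched_at: datetime
    object_key: str              # location in the raw zone


@dataclass
class Observable:
    obs_type: str                # "ip", "domain", "url", "sha256", "ja3", ...
    value: str                   # normalized value


@dataclass
class Entity:
    entity_id: str
    kind: str                    # e.g. "malware_family", "threat_actor", "brand"
    name: str


@dataclass
class Sighting:
    observable: Observable
    raw_id: str                  # provenance: the raw record that produced it
    seen_at: datetime


@dataclass
class Relation:
    src_id: str
    dst_id: str
    rel_type: str                # e.g. "indicates", "exploits", "targets"
    raw_ids: list[str] = field(default_factory=list)  # supporting evidence


@dataclass
class Vulnerability:
    cve_id: str
    in_kev: bool
    epss: Optional[float] = None


@dataclass
class VictimEvent:
    victim_name: str
    group_entity_id: str         # ransomware group modeled as an Entity
    published_at: datetime
    raw_id: str
```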

Confidence scoring

Use a two-stage model: score each source first, then compute entity confidence by combining that source weight with corroboration, recency, specificity and exploit context.

entity_confidence =
  0.30 * source_weight +
  0.25 * corroboration +
  0.20 * recency +
  0.15 * specificity +
  0.10 * exploit_context

Boost exploit context when a CVE appears in KEV with strong EPSS signal. Boost specificity for hash, cert and JA3 matches. Down-rank broad indicators such as generic ASN-only context.
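A minimal sketch of the entity score follows, using the weights from the formula above. The component functions are assumptions: recency decays with a 72-hour half-life, corroboration saturates at four independent sources, the specificity table ranks hashes and JA3/certificate fingerprints high and ASN-only context low, and exploit context peaks for a CVE in KEV with EPSS at or above 0.5.

```python
# Weights taken from the entity_confidence formula; they sum to 1.0.
WEIGHTS = {
    "source_weight": 0.30,
    "corroboration": 0.25,
    "recency": 0.20,
    "specificity": 0.15,
    "exploit_context": 0.10,
}

# Assumed specificity per observable type: exact hashes and fingerprints high,
# ASN-only context low. "cert_sha1" is a hypothetical type name.
SPECIFICITY = {
    "sha256": 1.0, "md5": 0.9, "ja3": 0.9, "cert_sha1": 0.9,
    "url": 0.7, "domain": 0.6, "ip": 0.4, "asn": 0.1,
}
EPSS_STRONG = 0.5  # assumed cut-off for a "strong" EPSS signal


def recency(age_hours: float, half_life_hours: float = 72.0) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def corroboration(independent_sources: int) -> float:
    """0.0 for a single source, saturating at 1.0 with four or more."""
    return min(1.0, max(0, independent_sources - 1) / 3)


def exploit_context(in_kev: bool, epss: float) -> float:
    """Highest when a CVE is in KEV with a strong EPSS signal."""
    if in_kev and epss >= EPSS_STRONG:
        return 1.0
    if in_kev:
        return 0.7
    return epss  # EPSS alone is already a probability in [0, 1]


def entity_confidence(source_weight: float, independent_sources: int, age_hours: float,
                      obs_type: str, in_kev: bool = False, epss: float = 0.0) -> float:
    """Weighted sum of the five components, each in [0, 1]."""
    components = {
        "source_weight": source_weight,
        "corroboration": corroboration(independent_sources),
        "recency": recency(age_hours),
        "specificity": SPECIFICITY.get(obs_type, 0.3),  # assumed default for unknown types
        "exploit_context": exploit_context(in_kev, epss),
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())
```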

Compliance by design

`source_contract` must govern what can be public, what is internal-only, allowed cache windows and commercial redistribution constraints. This keeps HackWatch legally safe while scaling feeds from community and premium providers.
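A `source_contract` check can gate every read path. The sketch below assumes three contract terms (public display, a maximum cache window and commercial redistribution); the field names and the `public`/`internal` audience values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SourceContract:
    source_id: str
    public_display: bool             # False means internal-only
    max_cache: timedelta             # allowed cache window for this source's data
    commercial_redistribution: bool


def can_serve(
    contract: SourceContract,
    fetched_at: datetime,
    audience: str,                   # "public" or "internal"
    commercial: bool,
    now: Optional[datetime] = None,
) -> bool:
    """Return True only if every contract term allows serving this record."""
    now = now or datetime.now(timezone.utc)
    if now - fetched_at > contract.max_cache:
        return False  # cached copy has outlived the allowed window
    if audience == "public" and not contract.public_display:
        return False  # internal-only source
    if commercial and not contract.commercial_redistribution:
        return False  # redistribution not licensed for commercial use
    return True
```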

Implementation roadmap

Phase 1

Core ingestion and legal-safe foundations

Start with URLhaus, ThreatFox, MalwareBazaar, SSLBL, OpenPhish, AbuseIPDB, CISA KEV, NVD and EPSS. Ship source contracts and the immutable raw zone first.

Phase 2

Entity resolution and confidence scoring

Deploy canonical IOC normalization, source-preserving deduplication and confidence scoring built from corroboration, recency and exploit context.
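Source-preserving deduplication means merging equivalent observables while keeping every source and raw record that reported them. This sketch assumes records with `type`, `value`, `source` and `raw_id` keys (hypothetical names) and applies a few common normalizations: undoing `[.]` and `hxxp` defanging, lowercasing hashes and domains, and lowercasing a URL's scheme and host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(obs_type: str, value: str) -> str:
    """Canonicalize an observable so equivalent values share one key."""
    v = value.strip().replace("[.]", ".")  # undo common "[.]" defanging
    if obs_type == "domain":
        return v.rstrip(".").lower()
    if obs_type in ("md5", "sha1", "sha256"):
        return v.lower()
    if obs_type == "url":
        if v.lower().startswith("hxxp"):
            v = "http" + v[4:]  # "hxxp://" / "hxxps://" back to http(s)
        parts = urlsplit(v)
        # Scheme and host are case-insensitive; path and query are kept as-is.
        # This also lowercases any userinfo in the netloc (a simplification).
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path, parts.query, parts.fragment))
    return v


def dedup(records: list[dict]) -> list[dict]:
    """Merge records with the same normalized observable, keeping all provenance."""
    merged: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = (rec["type"], normalize(rec["type"], rec["value"]))
        entry = merged.setdefault(
            key, {"type": key[0], "value": key[1], "sources": set(), "raw_ids": []}
        )
        entry["sources"].add(rec["source"])
        entry["raw_ids"].append(rec["raw_id"])  # every contributing raw record is kept
    return list(merged.values())
```

Because each merged observable still lists its sources, the corroboration term in confidence scoring can be computed directly from `len(entry["sources"])`.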

Phase 3

Premium enrichment and map intelligence

Add Spamhaus SIA, GreyNoise, CrowdSec CTI, Shodan, Censys, Netlas, VirusTotal and Ransomware.live PRO for deeper attribution context.

Phase 4

Response-first user products

Expose map layers, IOC explorer and exploit board with direct links into recovery playbooks, tool pages and high-intent incident workflows.

FAQ

Why does this blueprint avoid cloning a third-party cyber map?

Because public cyber maps often have strict usage terms, including non-commercial limits and anti-scraping clauses. HackWatch should build an original CTI stack from licensed feeds and APIs instead.

What is the key difference between IOC feeds and exposure feeds?

IOC feeds describe known malicious indicators, while exposure feeds describe publicly reachable services. Treating them separately avoids false attribution and improves analyst trust.

Why is source_contract a mandatory object in the data model?

It enforces license-aware handling: what can be shown publicly, cache duration, redistribution constraints and commercial use permissions.

How does this architecture help SEO and Discover?

It creates stronger entity-driven pages, more credible documented updates and cleaner topic ownership across phishing, vulnerabilities, ransomware and recovery workflows.