Cyber Hub Architecture
This page turns the internal architecture brief into an implementation-ready plan: a legally safe source strategy, a multi-layer data pipeline, a canonical entity model, confidence scoring and response-first product surfaces for defenders.
HackWatch should be built as an original CTI platform, not a reproduction of one public cyber map. The target model is multi-source, license-aware and entity-driven, so the platform stays legally safe while still delivering credible real-time threat intelligence and recovery guidance.
The product outcome is practical: one system that links phishing, malware, vulnerabilities, ransomware and exposure telemetry into actionable user journeys.
IP and reputation layer
Core sources: AbuseIPDB, CrowdSec CTI, GreyNoise, Spamhaus (SIA, DROP, DROPv6, ASN-DROP)
This layer powers confidence-weighted IP reputation, noisy scanner suppression and ASN-level context for abuse-driven triage.
Core sources: URLhaus, ThreatFox, MalwareBazaar, SSLBL, Feodo Tracker
Use this layer for fresh IOC ingestion, malware family clustering, certificate and JA3 context, and rapid five-minute feed refreshes.
Core sources: OpenPhish (Community, Premium and Database)
Use brand, ASN, country, language and SSL metadata to build phishing maps, sector dashboards and brand-risk rankings.
Core sources: CISA KEV, NVD, EPSS, CIRCL Vulnerability-Lookup, optional VulnCheck Community
Combine exploit evidence and probability signals to prioritize patching and map CVEs to response urgency.
Core sources: Shodan, Censys, Netlas
Track internet-facing assets and service exposure separately from IOC feeds to avoid mixing exposure with confirmed malicious activity.
Core sources: Ransomware.live (+ PRO when used commercially), Tor exit list as context only
This layer provides campaign context, leak-site telemetry and victim-side reporting patterns for incident workflows.
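As an illustration of how the vulnerability layer might combine exploit evidence (CISA KEV) with probability signals (EPSS), here is a minimal Python sketch. The `CveSignal` type, the `patch_urgency` function, the urgency labels and both thresholds are assumptions made for this example, not values defined by the brief; tune them against real data before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CveSignal:
    """Hypothetical per-CVE record joined from KEV and EPSS data."""
    cve_id: str
    in_kev: bool   # True if the CVE is listed in CISA KEV
    epss: float    # EPSS exploitation probability in [0, 1]


def patch_urgency(signal: CveSignal,
                  epss_high: float = 0.5,
                  epss_medium: float = 0.1) -> str:
    """Map exploit evidence plus probability to a response urgency label.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= signal.epss <= 1.0:
        raise ValueError("epss must be in [0, 1]")
    if signal.in_kev:
        # Known exploitation always outranks probability alone.
        return "critical" if signal.epss >= epss_high else "high"
    if signal.epss >= epss_high:
        return "high"
    if signal.epss >= epss_medium:
        return "medium"
    return "low"


# Placeholder CVE identifiers, not real entries.
print(patch_urgency(CveSignal("CVE-0000-0001", in_kev=True, epss=0.9)))
print(patch_urgency(CveSignal("CVE-0000-0002", in_kev=False, epss=0.02)))
```

Putting KEV membership ahead of the EPSS thresholds reflects the idea that confirmed exploitation is stronger evidence than a predicted probability.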
The system below separates raw ingestion, normalization, analytics and serving so we avoid one overloaded database trying to do everything at once.
[CTI APIs and feeds]
|
+-- Pull connectors (5m / 15m / hourly / daily)
+-- Stream connectors (real-time)
+-- Premium/manual enrichment jobs
|
[Ingestion bus: Kafka or Redpanda]
|
[Bronze / Raw zone: S3 or MinIO]
| immutable payloads + source metadata + license metadata
|
[Normalization and entity resolution]
| canonical IOC model, dedup, correlation, scoring
|
[Silver / Core layer]
+-- ClickHouse (time-series analytics)
+-- PostgreSQL (entities, users, contracts, watchlists)
+-- OpenSearch (full-text search)
+-- Neo4j (graph pivoting)
|
[Gold / Serving layer]
+-- REST and GraphQL API
+-- 1h / 24h / 7d / 30d aggregates
+-- alerting and watchlists
+-- vector tiles for map layers
|
[Frontend product surfaces]
+-- Global map
+-- IOC explorer
+-- Vulnerability board
+-- Phishing board
+-- Ransomware board
+-- Exposure board
Every ingestion source should map into shared objects so relationships stay queryable and provenance remains intact.
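One way to picture shared objects with intact provenance is a canonical indicator that accumulates one observation per source sighting. The sketch below is an assumption-laden illustration: `Observation`, `Indicator`, `normalize_value`, `upsert` and the `raw_ref` key format are hypothetical names, and the in-memory dictionary stands in for whatever store the core layer actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """One sighting of an indicator, kept so provenance is never lost."""
    source: str        # feed name, e.g. "URLhaus"
    seen_at: datetime
    raw_ref: str       # pointer to the immutable raw payload (hypothetical format)


@dataclass
class Indicator:
    """Canonical IOC shared by all sources."""
    ioc_type: str      # "ip", "domain", "url", "sha256", ...
    value: str
    observations: list[Observation] = field(default_factory=list)

    @property
    def sources(self) -> set[str]:
        return {obs.source for obs in self.observations}


def normalize_value(ioc_type: str, value: str) -> str:
    """Trim whitespace and lowercase case-insensitive indicator types."""
    value = value.strip()
    if ioc_type in {"domain", "md5", "sha1", "sha256"}:
        return value.lower()
    return value


def upsert(store: dict[tuple[str, str], Indicator],
           ioc_type: str, value: str, obs: Observation) -> Indicator:
    """Merge a sighting into the canonical indicator without dropping sources."""
    value = normalize_value(ioc_type, value)
    indicator = store.setdefault((ioc_type, value), Indicator(ioc_type, value))
    if obs not in indicator.observations:
        indicator.observations.append(obs)
    return indicator


store: dict[tuple[str, str], Indicator] = {}
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
upsert(store, "domain", " Example.COM ", Observation("URLhaus", now, "raw/a"))
upsert(store, "domain", "example.com", Observation("ThreatFox", now, "raw/b"))
print(len(store), sorted(store[("domain", "example.com")].sources))
```

Deduplicating on the normalized `(ioc_type, value)` key while appending observations keeps one entity per indicator and still records every source that reported it.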
Use a two-stage model: source confidence first, then entity confidence across corroboration and exploit context.
entity_confidence = 0.30 * source_weight + 0.25 * corroboration + 0.20 * recency + 0.15 * specificity + 0.10 * exploit_context
Boost exploit context when a CVE appears in KEV with strong EPSS signal. Boost specificity for hash, cert and JA3 matches. Down-rank broad indicators such as generic ASN-only context.
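The weighted sum above can be sketched in Python as follows. The weights come directly from the formula; the assumption that every input signal is already normalized to the range [0, 1], the `WEIGHTS` table and the function name `entity_confidence` are choices made for this example.

```python
# Weights taken from the entity_confidence formula; they sum to 1.0.
WEIGHTS = {
    "source_weight": 0.30,
    "corroboration": 0.25,
    "recency": 0.20,
    "specificity": 0.15,
    "exploit_context": 0.10,
}


def entity_confidence(signals: dict[str, float]) -> float:
    """Compute the weighted confidence score.

    Assumes each signal is pre-normalized to [0, 1]; how each signal is
    derived (for example, how recency decays) is outside this sketch.
    """
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


# A fresh, specific hash match from a trusted source, seen by half of the
# possible corroborating feeds, with no exploit context.
score = entity_confidence({
    "source_weight": 0.8,
    "corroboration": 0.5,
    "recency": 1.0,
    "specificity": 1.0,
    "exploit_context": 0.0,
})
print(round(score, 3))
```

For the example inputs the terms are 0.24 + 0.125 + 0.20 + 0.15 + 0.00, which gives a score of about 0.715.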
`source_contract` must govern what can be public, what is internal-only, which cache windows are allowed and which commercial redistribution constraints apply. This keeps HackWatch legally safe while scaling feeds from community and premium providers.
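A `source_contract` record could look roughly like the sketch below. The field names, the `can_serve` helper and the sample values are assumptions for illustration only; they do not describe the actual license terms of any real feed, which must be taken from each provider's agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SourceContract:
    """Hypothetical license metadata attached to every ingested source."""
    source: str
    public_display: bool              # may data appear on public pages?
    max_cache_age: timedelta          # allowed cache window
    commercial_redistribution: bool   # may data be redistributed commercially?


def can_serve(contract: SourceContract, fetched_at: datetime, now: datetime,
              *, public: bool, commercial: bool) -> bool:
    """Return True only if serving this record respects the contract."""
    if now - fetched_at > contract.max_cache_age:
        return False  # cached copy is older than the contract allows
    if public and not contract.public_display:
        return False  # internal-only source requested for a public surface
    if commercial and not contract.commercial_redistribution:
        return False
    return True


# "example-feed" and its terms are placeholders, not a real provider.
contract = SourceContract("example-feed", public_display=True,
                          max_cache_age=timedelta(hours=24),
                          commercial_redistribution=False)
fetched = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = fetched + timedelta(hours=2)
print(can_serve(contract, fetched, now, public=True, commercial=False))
print(can_serve(contract, fetched, now, public=True, commercial=True))
```

Checking the contract at serving time, rather than only at ingestion, means a change in license terms takes effect without re-ingesting the raw zone.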
Phase 1
Start with URLhaus, ThreatFox, MalwareBazaar, SSLBL, OpenPhish, AbuseIPDB, CISA KEV, NVD and EPSS. Ship source contracts and the immutable raw zone first.
Phase 2
Deploy canonical IOC normalization, source-preserving deduplication and confidence scoring built from corroboration, recency and exploit context.
Phase 3
Add Spamhaus SIA, GreyNoise, CrowdSec CTI, Shodan, Censys, Netlas, VirusTotal and Ransomware.live PRO for deeper attribution context.
Phase 4
Expose map layers, IOC explorer and exploit board with direct links into recovery playbooks, tool pages and high-intent incident workflows.
Public cyber maps often carry strict usage terms, including non-commercial limits and anti-scraping clauses, so HackWatch should build an original CTI stack from licensed feeds and APIs instead.
IOC feeds describe known malicious indicators, while exposure feeds describe publicly reachable services. Treating them separately avoids false attribution and improves analyst trust.
The `source_contract` enforces license-aware handling: what can be shown publicly, cache duration, redistribution constraints and commercial use permissions.
This architecture creates stronger entity-driven pages, more credible documented updates and cleaner topic ownership across phishing, vulnerabilities, ransomware and recovery workflows.