EMQX is a distributed MQTT broker for IoT-scale pub/sub — millions of concurrent device connections. To reason about it you need MQTT itself: topics & wildcards, the three QoS levels, retained messages, the will, sessions, plus EMQX's clustering, rule engine, and auth/ACL. Senior interviews probe QoS semantics, session state, and how it scales connections.
1. MQTT in one screen
| Concept | What |
|---|---|
| Broker | Central server routing messages between clients (EMQX). Clients never talk directly. |
| Client | Any device/app that connects (publisher and/or subscriber — same connection can do both). |
| Topic | Hierarchical string, /-separated: sensors/floor1/temp. No pre-declaration — publish/subscribe create routing on the fly. |
| Publish / Subscribe | Publishers send to a topic; subscribers get messages for topics they match. Fully decoupled. |
| QoS | Per-message delivery guarantee (0/1/2). |
| Retained message | Broker keeps the last message on a topic; new subscribers get it immediately. |
| Will (LWT) | Last Will & Testament — broker publishes a preset message if the client drops unexpectedly. |
| Keepalive | Client must send something (PINGREQ if idle) within the keepalive interval; broker declares it dead if silent for 1.5× that interval. |
MQTT runs over TCP (1883) or TLS (8883); EMQX also speaks MQTT-over-WebSocket and MQTT 5.0.
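
The keepalive row in the table can be sketched as a small timeout check. This is a minimal illustration, assuming the 1.5× grace factor from the table; the function names are hypothetical and not part of any EMQX API.

```python
import math


def keepalive_deadline(last_packet_at: float, keepalive_s: int, factor: float = 1.5) -> float:
    """Return the timestamp after which the broker may treat the client as dead.

    A keepalive of 0 disables the check, so the deadline is infinite.
    """
    if keepalive_s == 0:
        return math.inf
    return last_packet_at + keepalive_s * factor


def is_timed_out(now: float, last_packet_at: float, keepalive_s: int) -> bool:
    """True once the client has been silent for longer than 1.5x its keepalive."""
    return now > keepalive_deadline(last_packet_at, keepalive_s)
```

With a 60-second keepalive, a client last heard from at t=100 is considered dead only after t=190.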
2. Topic wildcards
| Wildcard | Matches |
|---|---|
| + (single level) | sensors/+/temp → sensors/floor1/temp, sensors/floor2/temp (one level only). |
| # (multi level) | sensors/# → everything under sensors/ (must be last). |
| $SYS/... | Broker internal stats/metrics topics. |
Publishers must use concrete topics (no wildcards); only subscribers use +/#.
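
The wildcard rules can be made concrete with a small matcher. This is a sketch of the matching rules as described in the MQTT specification, not EMQX's actual routing code; `topic_matches` is a hypothetical helper name.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a concrete topic matches a subscription filter."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Topics starting with '$' (such as $SYS/...) are not matched by a
    # filter whose first level is a wildcard.
    if t_levels[0].startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, f in enumerate(f_levels):
        if f == "#":
            # '#' is only valid as the last level; it matches the parent
            # level and any number of levels below it.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    # No '#' seen: the filter and the topic must have the same depth.
    return len(f_levels) == len(t_levels)
```

Note that under these rules `sensors/#` also matches the parent topic `sensors` itself, while `sensors/+/temp` does not match `sensors/floor1/a/temp`.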
3. QoS levels (the core question)
| QoS | Guarantee | Handshake |
|---|---|---|
| 0 — at most once | Fire-and-forget; may be lost | PUBLISH (no ack) |
| 1 — at least once | Guaranteed delivery, may duplicate | PUBLISH → PUBACK |
| 2 — exactly once | Delivered once, no dupes | PUBLISH → PUBREC → PUBREL → PUBCOMP (4-way) |
Effective QoS = min(publish QoS, subscribe QoS) on the broker→subscriber leg. QoS 2 is expensive (4-way handshake + state) — most IoT uses QoS 1 with idempotent handlers.
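
To see why QoS 2 needs extra state, here is a minimal sketch of the receiving side of the 4-way handshake. It assumes the receiver delivers the message when PUBLISH first arrives and remembers the packet ID until PUBREL; the class and method names are hypothetical.

```python
class Qos2Receiver:
    """Receiver side of the QoS 2 flow: PUBLISH -> PUBREC -> PUBREL -> PUBCOMP."""

    def __init__(self):
        self._inflight = set()  # packet IDs received but not yet released
        self.delivered = []     # payloads handed to the application

    def on_publish(self, packet_id: int, payload: str):
        # A retransmitted PUBLISH with a known packet ID is acknowledged
        # again but not delivered a second time.
        if packet_id not in self._inflight:
            self._inflight.add(packet_id)
            self.delivered.append(payload)
        return ("PUBREC", packet_id)

    def on_pubrel(self, packet_id: int):
        # After PUBREL the packet ID may be reused for a new message.
        self._inflight.discard(packet_id)
        return ("PUBCOMP", packet_id)
```

The per-message `_inflight` entry is the state that QoS 1 does not need, which is part of why QoS 2 costs more.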
QoS is per hop, not end-to-end
QoS is negotiated client↔broker on each leg separately. A QoS 2 publish to a QoS 0 subscriber is
delivered at QoS 0. There's no end-to-end exactly-once across two clients.
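
The per-hop rule and the "QoS 1 with idempotent handlers" advice can be sketched as follows. The `message_id` here is assumed to be an application-level ID carried in the payload (hypothetical), not the MQTT packet identifier, which is reused across messages.

```python
def effective_qos(publish_qos: int, subscribe_qos: int) -> int:
    """QoS used on the broker-to-subscriber leg: the lower of the two."""
    for q in (publish_qos, subscribe_qos):
        if q not in (0, 1, 2):
            raise ValueError(f"invalid QoS: {q}")
    return min(publish_qos, subscribe_qos)


class IdempotentHandler:
    """Applies each message at most once, so QoS 1 duplicates are harmless."""

    def __init__(self):
        self._seen = set()  # unbounded here; a real consumer would expire old IDs
        self.applied = []

    def handle(self, message_id: str, payload: str) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.applied.append(payload)
        return True
```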
4. Sessions & clean start
- Clean session (MQTT 3) / clean start + session expiry (MQTT 5): if clean, the broker discards subscriptions + queued messages on disconnect; if persistent, it keeps them and re-delivers on reconnect.
- A persistent session with QoS 1/2 queues messages for an offline client (bounded by config) and replays on reconnect — key for flaky IoT links.
- Client ID identifies a session; reconnecting with the same ID resumes it. Duplicate IDs kick the older connection.
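
The session rules above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how EMQX implements sessions: it models the MQTT 3 clean-session flag only, drops QoS 0 messages for offline clients, and evicts the oldest queued message when the (hypothetical) `max_queue` bound is reached.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Session:
    client_id: str
    persistent: bool
    connected: bool = True
    queue: deque = field(default_factory=deque)


class ToyBroker:
    """Toy model of clean vs persistent sessions."""

    def __init__(self, max_queue: int = 1000):
        self.max_queue = max_queue  # hypothetical per-session queue bound
        self.sessions = {}

    def connect(self, client_id: str, clean_session: bool) -> list:
        """Connect a client; return messages replayed from a resumed session."""
        old = self.sessions.get(client_id)
        if old is not None and not clean_session:
            old.connected = True
            replay = list(old.queue)
            old.queue.clear()
            return replay
        # Fresh session: any previous state under this client ID is discarded.
        self.sessions[client_id] = Session(
            client_id,
            persistent=not clean_session,
            queue=deque(maxlen=self.max_queue),
        )
        return []

    def disconnect(self, client_id: str) -> None:
        session = self.sessions[client_id]
        if session.persistent:
            session.connected = False
        else:
            del self.sessions[client_id]

    def deliver(self, client_id: str, qos: int, payload: str) -> str:
        session = self.sessions.get(client_id)
        if session is None:
            return "dropped"
        if session.connected:
            return "sent"
        if qos == 0:
            return "dropped"
        # deque(maxlen=...) evicts the oldest entry when the queue is full.
        session.queue.append(payload)
        return "queued"
```

A persistent client that reconnects gets the queued QoS 1/2 messages back, minus anything evicted by the queue bound; a clean-session client gets nothing.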
5. MQTT 5.0 additions
- Reason codes on acks (why a publish/subscribe failed).
- Session/message expiry intervals; topic aliases (shrink repeated long topics).
- User properties (custom headers), response topic + correlation data (request/response).
- Shared subscriptions ($share/group/topic): load-balance a topic across a group of subscribers (like a consumer group).
- Flow control (receive maximum), will delay.
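
As an illustration of shared subscriptions, the sketch below parses a `$share/group/topic` filter and distributes messages across group members. Round-robin is assumed here as one possible dispatch strategy; brokers may offer others. The names `parse_shared_filter` and `SharedGroup` are hypothetical.

```python
from itertools import cycle


def parse_shared_filter(topic_filter: str):
    """Split '$share/<group>/<filter>' into (group, filter).

    Ordinary filters are returned as (None, topic_filter).
    """
    if not topic_filter.startswith("$share/"):
        return None, topic_filter
    parts = topic_filter.split("/", 2)
    if len(parts) < 3 or not parts[1] or not parts[2]:
        raise ValueError(f"malformed shared subscription: {topic_filter!r}")
    return parts[1], parts[2]


class SharedGroup:
    """Delivers each message to exactly one member, in round-robin order."""

    def __init__(self, members):
        if not members:
            raise ValueError("a shared group needs at least one member")
        self._next_member = cycle(list(members))

    def dispatch(self, message: str):
        return next(self._next_member), message
```

Each message goes to one worker in the group rather than to all of them, which is the difference from an ordinary subscription.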
6. EMQX architecture & features
- Clustering — nodes form a cluster (auto-discovery: static, DNS, k8s, etcd); the routing table is shared so a publish on any node reaches subscribers on any node.
- Scale — built on Erlang/BEAM (lightweight processes per connection) → millions of concurrent MQTT connections per cluster.
- Rule Engine — SQL-like rules on incoming messages → actions: republish, bridge to Kafka/MQTT/HTTP, write to a DB, etc. (no app code).
- Data integration / bridges — Kafka, Pulsar, databases, HTTP, other MQTT brokers.
- Auth & ACL — authn (username/password, JWT, X.509 client certs, PSK) + authz (per-topic publish/subscribe ACLs) backed by built-in DB, MySQL/Postgres, Redis, HTTP, LDAP.
- Dashboard + REST API for clients, subscriptions, metrics; observability via Prometheus.
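
To picture what a rule in the Rule Engine does, here is a rough Python equivalent of a filter-and-select rule, in the spirit of a rule SQL statement such as `SELECT topic, payload.temp AS temp FROM "sensors/+/temp" WHERE payload.temp > 30`. The statement, the threshold, and the function name are illustrative assumptions; in EMQX the rule runs inside the broker and feeds an action, with no application code involved.

```python
import json


def hot_temperature_rule(topic: str, payload_bytes: bytes, threshold: float = 30.0):
    """Return the selected fields if the message passes the rule, else None."""
    # FROM "sensors/+/temp": exactly three levels with a free middle level.
    parts = topic.split("/")
    if len(parts) != 3 or parts[0] != "sensors" or parts[2] != "temp":
        return None
    payload = json.loads(payload_bytes)
    temp = payload.get("temp")
    # WHERE payload.temp > threshold
    if not isinstance(temp, (int, float)) or temp <= threshold:
        return None
    # SELECT topic, payload.temp AS temp
    return {"topic": topic, "temp": temp}
```

The returned dictionary stands in for what would be handed to an action such as a Kafka bridge or a republish.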
7. Operating & diagnosing
    emqx ctl status               # node status
    emqx ctl cluster status       # cluster membership
    emqx ctl clients list         # connected clients
    emqx ctl subscriptions list   # active subscriptions
    emqx ctl topics list          # routing table
    emqx ctl listeners            # tcp/ssl/ws listeners
    # $SYS/brokers/+/metrics/# topics expose live stats; scrape the Prometheus endpoint
Watch: connection count vs OS file-descriptor/erlang process limits, message rate, dropped messages (slow/offline subscribers, queue full), retained-message store size, ACL latency (external auth backend on the hot path).
8. MQTT vs Kafka vs AMQP
| | MQTT / EMQX | Kafka | RabbitMQ (AMQP) |
|---|---|---|---|
| Best for | Massive device fan-in/out, IoT, low bandwidth | High-throughput durable event streams, replay | Flexible routing, task queues |
| Model | Lightweight pub/sub topics | Partitioned commit log | Exchanges → queues |
| Retention | Last-value (retained) / queued for session | Long, replayable | Until consumed/acked |
| Footprint | Tiny client, 2-byte minimum fixed header | Heavier client | Medium |
9. Senior interview Q&A
- Explain the three QoS levels. 0 at-most-once (no ack), 1 at-least-once (PUBACK, may dup), 2 exactly-once (4-way PUBLISH/PUBREC/PUBREL/PUBCOMP). Effective QoS = min(pub, sub) per hop.
- Is QoS 2 end-to-end exactly-once? No. QoS is per client↔broker leg. Across two clients there's no global exactly-once; design idempotent consumers.
- + vs # wildcard? + matches exactly one level; # matches all remaining levels and must be last. Only subscribers use wildcards.
- Retained message vs will? Retained = broker keeps last message on a topic for future subscribers. Will (LWT) = a message the broker publishes if a client disconnects unexpectedly.
- Clean vs persistent session? Clean discards subs + queued messages on disconnect. Persistent keeps them and replays QoS 1/2 messages on reconnect (key for flaky IoT).
- How does EMQX scale to millions of connections? Erlang/BEAM lightweight processes per connection + clustering with a shared routing table; horizontal scale across nodes.
- How do you load-balance a topic across workers? MQTT 5 shared subscriptions ($share/group/topic): messages are distributed across the group (like a consumer group).
- How does the rule engine help? SQL on messages → actions (bridge to Kafka/DB/HTTP, republish, filter) without app code; it does ETL/routing at the broker.
- How is auth/ACL done? authn (user/pass, JWT, X.509, PSK) + per-topic publish/subscribe authz, backed by built-in DB, SQL, Redis, HTTP, or LDAP. Keep the backend fast, since it's on the connect/publish path.
- When MQTT over Kafka? MQTT for huge numbers of constrained devices, low bandwidth, intermittent links; bridge MQTT → Kafka when you then need durable, replayable, high-throughput stream processing.
- What does keepalive do? Client sends PINGREQ within the interval; broker marks it dead (and fires the will) if silent past ~1.5× keepalive.
- Why might messages be dropped? Slow/offline subscriber with a full queue, QoS 0 (no guarantee), session expired, or hitting connection/process/FD limits.