Heptaconn
Deterministic inference gateway for LLM servers (e.g., llama.cpp). Bounded inflight calls, receipts, and explicit backpressure for local inference compute.
Implementation: Go. Built for self-hosted inference environments.
Overview
Heptaconn sits between clients and inference backends. It governs inference compute (not just HTTP): deterministic admission, bounded concurrency, and receipts describing outcomes (accepted, queued, throttled, timed out).
Direct-to-backend inference under load often produces unbounded concurrency, retry amplification, timeouts, and inconsistent accounting.
The goal is to keep inference compute stable. When things fail, fail clearly (overload vs. malformed request vs. backend failure) and keep records you can aggregate.
Works well with telemetry sinks: receipts + call records become an audit trail instead of a pile of logs.
Core mechanics
A hard cap on concurrent inference work. Queueing exists, but it is intentional and measurable.
- max inflight
- queue policy (explicit)
- timeouts and cancellation semantics
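One way to picture a hard inflight cap is a counting semaphore that rejects immediately when full. The sketch below is illustrative only, not Heptaconn's actual implementation; the names `Gate`, `TryAcquire`, and `Release` are hypothetical.

```go
package main

import "fmt"

// Gate is a hypothetical admission gate: a counting semaphore
// that caps concurrent inference work at a fixed limit.
type Gate struct {
	slots chan struct{}
}

func NewGate(maxInflight int) *Gate {
	return &Gate{slots: make(chan struct{}, maxInflight)}
}

// TryAcquire admits the request if a slot is free; otherwise it
// reports overload immediately instead of queueing silently.
func (g *Gate) TryAcquire() bool {
	select {
	case g.slots <- struct{}{}:
		return true
	default:
		return false // explicit backpressure, not an accident
	}
}

// Release frees a slot when the inference call finishes.
func (g *Gate) Release() { <-g.slots }

func main() {
	g := NewGate(2)
	fmt.Println(g.TryAcquire(), g.TryAcquire(), g.TryAcquire()) // true true false
}
```

A deliberate queue can sit in front of the gate, but the gate itself never admits more than `maxInflight` calls, which is what keeps the limit measurable.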
Each request has a lifecycle record: accepted → running → completed/failed/cancelled.
- request_id
- timestamps and durations
- exit status + error taxonomy
Backpressure is a first-class response, not an accident.
- overload responses are early
- retry hints are explicit
- limits are enforced, not suggested
If a client cannot determine the outcome from the receipt, the interface is not specific enough.
Quick demo
Heptamini, the free binary distribution, ships two binaries:
- heptaconnd — gateway daemon (server)
- heptaconnc — client for sending test requests
Start the gateway daemon.
heptaconnd --listen 0.0.0.0:7777
Send a small payload through the gateway.
printf "hi" > /tmp/p.bin
heptaconnc --addr 127.0.0.1:7777 --file /tmp/p.bin --chunk 8
A deterministic receipt: accepted/queued/throttled plus timing. Under load, inflight stays bounded and backpressure becomes explicit.
Flags and defaults may change. The intent is to keep limits explicit and receipts versioned.
Web demo
Heptaconn can be driven from a browser console while inference stays on your machine. The browser talks to the gateway; a local Heptamini node connects outbound and forwards requests to your local backend (e.g., llama.cpp).
browser → heptaconn (cloud) → heptamini (local) → llama.cpp (local)
(the heptaconn ↔ heptamini hop is an outbound link initiated by heptamini)
Heptaconn does not host a model in this setup.
- Open /demo/heptaconn-web.html and generate a pairing code.
- Run Heptamini locally and connect using the pairing code.
- Send requests from the web console. Receipts are shown for each request.
Demo limits may apply (short sessions, bounded inflight). Each request returns:
- a receipt (always): accepted/queued/throttled + timing
- an optional response preview, truncated (e.g., 256 chars)
Use receipts to verify backpressure and stability under load.
Installation
Point clients to Heptaconn; configure backends and concurrency limits.
# Typical shape (conceptual)
# client → heptaconn → llama.cpp (or other local backend)
# Configure:
# - backend targets
# - max inflight
# - timeouts
# - receipt format/version
Heptamini is the free binary distribution used for evaluation and testing.
Pricing
Heptaconn is distributed privately while the feature boundary stabilizes. Pricing is a placeholder.
- Public binaries for testing the gateway and client tools.
- Multi-tenant limits, richer routing, extended receipts.
- Support contract and custom deployment constraints.
Contact
Questions: nsc@newssourcecrawler.com