Private inference for the Ughoron stack

One API. Many nodes. Smart routing.

Ughoron Vertex is the private model-serving layer of the Ughoron cloud. Deploy lightweight models like Gemma across your Dockploy fleet and expose them through a single, load-aware endpoint — without the per-call markup of a public AI provider.

Built for the Ughoron ecosystem
vertex.ugharon.cloud
curl https://api.vertex.ugharon.cloud/v1/chat/completions \
  -H "Authorization: Bearer $UGV_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2-9b",
    "messages": [
      { "role": "user", "content": "Summarise this PR diff in 3 bullets." }
    ]
  }'

Infrastructure that scales with revenue, not with hype.

Vertex is engineered for lean, paid AI features. Nodes are cheap to run, easy to add, and managed entirely from a single dashboard.

Load-aware routing

Every request is steered to the least-loaded healthy node in real time. A node going down is a graph blip, not an outage.
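
As a rough illustration of what "least-loaded healthy node" can mean, here is a minimal sketch. It assumes queue depth is the load signal; the `Node` record, its field names, and `pick_node` are hypothetical and are not taken from Vertex itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    # Hypothetical node record; field names are illustrative, not Vertex's schema.
    name: str
    healthy: bool
    queue_depth: int  # requests currently waiting on this node


def pick_node(nodes: Sequence[Node]) -> Optional[Node]:
    """Return the healthy node with the smallest queue, or None if none are healthy."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda n: n.queue_depth)


fleet = [
    Node("node-a", healthy=True, queue_depth=7),
    Node("node-b", healthy=False, queue_depth=0),  # down: skipped even though idle
    Node("node-c", healthy=True, queue_depth=2),
]
chosen = pick_node(fleet)
print(chosen.name if chosen else "no healthy nodes")
```

An unhealthy node drops out of the candidate set, so traffic moves to the remaining nodes without any change on the client side.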

One-click node provisioning

Spin up new Dockploy nodes from the admin panel. The platform pulls the runner image, joins the routing pool, and starts serving.

Model library

Gemma 2, Phi-3, Qwen — pre-staged in object storage. Deploy any model to any node from a single dropdown.

Drop-in API

OpenAI-compatible chat completions. Point your existing SDK at the Vertex endpoint and you're done.
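
To make "drop-in" concrete, the sketch below builds the same request as the curl example using only the Python standard library. The base URL and model name come from that curl example; the helper name `build_chat_request` and the placeholder key are assumptions made for illustration.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page; the helper name is hypothetical.
VERTEX_BASE_URL = "https://api.vertex.ugharon.cloud/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{VERTEX_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("example-key", "gemma-2-9b", "Summarise this PR diff in 3 bullets.")
# To send it, pass req to urllib.request.urlopen() and parse the JSON response.
print(req.get_method(), req.full_url)
```

If you already use the official OpenAI Python SDK, the equivalent change is usually just passing this base URL as `base_url` when constructing the client, along with your Vertex key.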

Live observability

RPS, latency and queue depth per node. Logs stream into the dashboard with sub-second freshness.

Ughoron Account SSO

The same account that signs into ugharon.cloud signs into Vertex. No new identity to manage.

The model catalogue.

Lean, open-weight models pre-staged in Vertex storage. Pick one per endpoint, or load several across the fleet.

available

gemma-2-2b

Gemma 2

Lean general-purpose chat model. Great for high-throughput, low-latency endpoints.

2B params · 8K ctx

available

gemma-2-9b

Gemma 2

Mid-size model for higher quality reasoning and structured generation.

9B params · 8K ctx

available

phi-3-mini

Phi-3

Compact instruction-tuned model from Microsoft. Strong on code.

3.8B params · 4K ctx

downloading

qwen-2.5-7b

Qwen 2.5

Long-context generalist. Currently downloading to model storage.

7B params · 32K ctx

Pricing built for the Ughoron flywheel.

Infrastructure cost scales with revenue. We spin up new nodes as paid traffic grows, so the platform stays sustainable from day one.

Hobby

Free

For prototyping and side projects on the Ughoron network.

  • 100K tokens / month included
  • Shared routing pool
  • Community Discord
  • Gemma 2 (2B) only
Start free

Builder

Most popular

$29/mo

For production apps with steady traffic.

  • 5M tokens / month included
  • Then $0.20 / 1M tokens
  • Access to all open-weight models
  • Usage dashboard + alerts
  • Email support
Start builder plan
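
For a quick estimate of a Builder bill, the sketch below applies the numbers listed above: $29 per month, 5M tokens included, then $0.20 per 1M tokens. It assumes overage is prorated per token rather than billed in whole 1M blocks, which this page does not specify; the function name is hypothetical.

```python
from decimal import Decimal

BASE_FEE = Decimal("29.00")            # Builder plan, USD per month
INCLUDED_TOKENS = 5_000_000            # tokens included each month
OVERAGE_PER_MILLION = Decimal("0.20")  # USD per 1M tokens beyond the allowance


def builder_monthly_cost(tokens_used: int) -> Decimal:
    """Estimate the monthly Builder bill, assuming overage is prorated per token."""
    overage_tokens = max(0, tokens_used - INCLUDED_TOKENS)
    overage = OVERAGE_PER_MILLION * overage_tokens / Decimal(1_000_000)
    return (BASE_FEE + overage).quantize(Decimal("0.01"))


print(builder_monthly_cost(3_000_000))   # within the allowance
print(builder_monthly_cost(12_000_000))  # 7M tokens of overage
```

Under that assumption, 12M tokens in a month comes to $29.00 plus 7 × $0.20, or $30.40.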

Scale

Custom

Dedicated nodes for your traffic with an SLA.

  • Reserved Dockploy capacity
  • Pinned model loadouts
  • 99.9% uptime SLA
  • Private Slack channel
Talk to us
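
To put the 99.9% figure in perspective, this small sketch converts an uptime percentage into a downtime budget. It assumes a 30-day measurement window; the actual SLA terms and window are not stated on this page, and the function name is hypothetical.

```python
from decimal import Decimal


def downtime_budget_minutes(uptime_percent: str, days: int = 30) -> Decimal:
    """Minutes of allowed downtime in a window of `days` at the given uptime percentage."""
    total_minutes = Decimal(days * 24 * 60)
    return total_minutes * (Decimal(100) - Decimal(uptime_percent)) / Decimal(100)


budget = downtime_budget_minutes("99.9")
print(budget)
```

Over a 30-day window, 99.9% uptime allows roughly 43 minutes of downtime.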