Integration reference
Version 2026-04-01 · Example base URL https://infer.your-tenant.backend-dev.example (your deployment will differ).
Overview
After we install your model stack, your application talks to an OpenAI-compatible or plain JSON inference layer (we agree which during the project). This reference describes the control plane we often leave in place: keys, usage headers, batch evaluation jobs, and artefact pins (adapters, indexes).
All timestamps are RFC 3339 in UTC. Request bodies are UTF-8 JSON unless noted.
Authentication
Send the deployment secret in the Authorization header. Rotate keys from the operator console we hand over.
```
Authorization: Bearer bd_infer_************************
```
Read-only analytics tokens use the bd_read_ prefix and cannot enqueue jobs or change artefacts.
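A minimal client-side sketch of the convention above: build the request headers once and refuse to use a read-only token for write actions. The function names are illustrative, not part of the API; only the `Bearer` scheme and the `bd_read_` prefix come from this reference.

```python
# Hypothetical helpers -- names are ours, not part of the deployment.

def auth_headers(token: str) -> dict:
    """Headers every control-plane request needs."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

def is_read_only(token: str) -> bool:
    """bd_read_ analytics tokens cannot enqueue jobs or change artefacts."""
    return token.startswith("bd_read_")
```

Checking the prefix client-side just gives a clearer error before the server rejects the call with a 403.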
Errors
Errors return JSON with a stable code, human message, and optional details.
```json
{
  "error": {
    "code": "context_length_exceeded",
    "message": "prompt + max_tokens exceeds model window",
    "request_id": "req_8f2a9c1d4e",
    "details": { "model": "bd-custom-7b", "max_input_tokens": 8192 }
  }
}
```
| Status | Meaning |
|---|---|
| 400 | Validation error: bad JSON, missing field, or policy block |
| 401 | Missing or invalid token |
| 403 | Token cannot perform this action |
| 404 | Model or artefact ID unknown |
| 429 | Quota or concurrency limit; `Retry-After` is set |
| 503 | GPU pool warming or overloaded; retry with backoff |
Inference
OpenAI-style chat payload. `model` is the logical name we registered for your tuned weights.
```bash
curl -s https://infer.your-tenant…/v1/chat/completions \
  -H "Authorization: Bearer bd_infer_***" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bd-support-7b-2026-03",
    "messages": [
      {"role": "system", "content": "You answer using only the provided context."},
      {"role": "user", "content": "Where is my shipment?"}
    ],
    "temperature": 0.2,
    "max_tokens": 256
  }'
```
Streaming (`stream: true`) uses SSE chunks with the same schema as upstream OpenAI clients expect.
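The same request, sketched as a stdlib-only Python client. The base URL is the placeholder from this document and will differ per deployment; `chat_payload` and `chat` are our names, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://infer.your-tenant.backend-dev.example"  # your deployment will differ

def chat_payload(model: str, system: str, user: str, **params) -> dict:
    """Build the OpenAI-style request body shown above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        **params,
    }

def chat(token: str, payload: dict) -> dict:
    """POST the payload to /v1/chat/completions; HTTP errors raise (see the error table)."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For streaming you would set `"stream": True` in the payload and read the response line by line as SSE; any OpenAI-compatible client library should also work against this endpoint unchanged.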
Batch & tuning jobs
Enqueue offline evaluation, re-indexing, or a scheduled adapter refresh. Returns 202 with a job id.
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | e.g. `eval_regression`, `rebuild_index`, `merge_lora` |
| `payload` | object | yes | Type-specific; validated server-side |
| `idempotency_key` | string | no | Dedupes within 24h |
```json
{
  "id": "job_7Qk2Lp9",
  "status": "queued",
  "created_at": "2026-04-05T14:02:11Z"
}
```
Poll for completion; result may include metric summaries and artefact ids to pin in production.
```json
{
  "id": "job_7Qk2Lp9",
  "status": "succeeded",
  "finished_at": "2026-04-05T14:18:02Z",
  "result": {
    "exact_match": 0.41,
    "human_preferred_rate": 0.78,
    "artefact_id": "art_lora_20260405a"
  }
}
```
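The polling loop can be sketched as below. The set of terminal statuses is an assumption (only `queued` and `succeeded` appear in this reference), and `fetch` stands in for whatever call retrieves a job by id in your client.

```python
import time

# Assumed terminal statuses; this reference only shows "queued" and "succeeded".
TERMINAL = {"succeeded", "failed", "cancelled"}

def wait_for_job(fetch, job_id: str, interval: float = 5.0, timeout: float = 900.0) -> dict:
    """Poll fetch(job_id) until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch(job_id)
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```

On success, read `result.artefact_id` from the returned job and pin it as described under Artefacts.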
Artefacts
Metadata for adapters, merged weights, or snapshotted vector indexes—hashes, size, and compatibility matrix. Binary blobs are fetched from signed URLs in the response.
```json
{
  "id": "art_lora_20260405a",
  "kind": "lora_adapter",
  "sha256": "b3c1…9f",
  "compatible_models": ["bd-support-7b-2026-03"],
  "created_at": "2026-04-05T14:17:40Z"
}
```
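Since the metadata carries a `sha256`, a downloaded blob can be verified before being pinned; a minimal sketch (the function name is ours):

```python
import hashlib

def verify_artefact(blob: bytes, expected_sha256: str) -> bool:
    """Check bytes fetched from the signed URL against the sha256
    in the artefact metadata."""
    return hashlib.sha256(blob).hexdigest() == expected_sha256
```

Verifying the hash after download guards against truncated transfers and ensures the adapter you pin is exactly the one the job produced.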
Questions about your deployment? Contact us.