Specialists Guide

Specialists are the core building blocks of Alveare. Each one is a tuned configuration running on a shared model, optimised for a specific task.

What is a cognitive hive?

A cognitive hive is Alveare's core architecture. Instead of running separate models for each task (classification, summarisation, extraction, etc.), a hive loads one model and creates multiple specialists from it.

Each specialist has its own:

  - System prompt, prepended to every request
  - Sampling parameters (temperature, top-p) tuned for the task
  - Output guardrails that validate responses before they are returned

Because all specialists share the same loaded model weights, a hive uses 80-90% less GPU memory than running separate models. That structural advantage is why Alveare can serve the same workloads at a fraction of the cost of OpenAI.
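As a rough sketch (class and attribute names here are hypothetical, not the actual Alveare internals), a hive can be modelled as one set of model weights plus a lightweight config object per specialist:

```python
from dataclasses import dataclass

@dataclass
class SpecialistConfig:
    """Per-specialist settings; the model weights are NOT duplicated."""
    system_prompt: str
    temperature: float
    top_p: float = 0.9

class Hive:
    def __init__(self, model):
        self.model = model      # single shared model instance
        self.specialists = {}   # name -> lightweight config

    def register(self, name, config):
        self.specialists[name] = config

# One model, many specialists: each registration adds only a small
# config object, not another copy of the weights.
hive = Hive(model="mistral-7b")
hive.register("classify", SpecialistConfig("You are a classifier.", 0.2))
hive.register("summarise", SpecialistConfig("You are a summariser.", 0.4))
```

Registering a new specialist is therefore a constant-memory operation, which is what makes the per-task configurations cheap to multiply.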

How specialists share a single model

When a request arrives at the hive:

  1. The router identifies which specialist to use (from the specialist field or model name)
  2. The specialist's system prompt is prepended to the user's input
  3. Sampling parameters (temperature, top-p) are applied per the specialist's config
  4. The model generates a response using the shared weights
  5. Guardrails validate the output before returning it

This happens in the same inference process — there is no extra serialisation or network hop between specialists. Switching from one specialist to another is essentially free.
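The five routing steps above can be sketched in plain Python (all identifiers hypothetical; the model call is stubbed and guardrails are reduced to a simple callable):

```python
# Minimal illustration of hive dispatch; not the real implementation.
SPECIALISTS = {
    "classify": {
        "system_prompt": "Classify the input. Reply with a single label.",
        "temperature": 0.2,
        "guardrail": lambda out: out.strip().lower(),
    },
}

def generate(prompt, temperature):
    # Stand-in for the shared model's generation call.
    return " Positive "

def route(name, user_prompt, **overrides):
    cfg = SPECIALISTS[name]                                   # 1. pick specialist
    full_prompt = f"{cfg['system_prompt']}\n\n{user_prompt}"  # 2. prepend system prompt
    temp = overrides.get("temperature", cfg["temperature"])   # 3. apply sampling params
    raw = generate(full_prompt, temperature=temp)             # 4. generate on shared weights
    return cfg["guardrail"](raw)                              # 5. validate output

print(route("classify", "Great phone!"))
# positive
```

Because every step is an in-process dictionary lookup and string operation, switching specialists between requests adds no meaningful overhead.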

Available specialists

classify

Text Classification

Categorise text into predefined labels. Useful for sentiment analysis, intent detection, topic routing, and content moderation. Runs at low temperature (0.2-0.3) for consistent results.

python
result = client.infer(
    specialist="classify",
    prompt="""Classify the sentiment of this review as positive, negative, or neutral:

"The battery life is incredible but the camera quality
could be better. Overall a solid phone for the price." """,
    temperature=0.2,
)

print(result.result)
# positive

summarise

Text Summarisation

Condense long text into concise summaries. Supports bullet points, executive summaries, TL;DR, and custom formats. Works well with documents up to ~3000 tokens of input.

python
result = client.infer(
    specialist="summarise",
    prompt="""Summarise this in 3 bullet points:

The company reported Q4 revenue of $4.2 billion, up 23% year-over-year.
Operating margins expanded to 28%, driven by efficiency improvements in
cloud infrastructure and reduced customer acquisition costs. The enterprise
segment grew 45% and now represents 60% of total revenue. Management
raised full-year guidance to $18B-$18.5B, above analyst expectations of
$17.8B. Free cash flow reached $1.1B in the quarter.""",
    max_tokens=200,
)

print(result.result)
# - Q4 revenue hit $4.2B (+23% YoY) with operating margins at 28%
# - Enterprise segment grew 45%, now 60% of total revenue
# - Full-year guidance raised to $18B-$18.5B, beating expectations

extract

Structured Data Extraction

Pull structured data from unstructured text. Outputs JSON by default. Useful for parsing invoices, emails, resumes, receipts, and any document where you need specific fields.

python
result = client.infer(
    specialist="extract",
    prompt="""Extract all contact information as JSON:

Hi there,

I'm Sarah Chen, VP of Engineering at TechFlow Inc.
You can reach me at sarah.chen@techflow.io or call
my direct line at (415) 555-0198. Our office is at
123 Market Street, Suite 400, San Francisco, CA 94105.""",
    max_tokens=256,
)

print(result.result)
# {
#   "name": "Sarah Chen",
#   "title": "VP of Engineering",
#   "company": "TechFlow Inc.",
#   "email": "sarah.chen@techflow.io",
#   "phone": "(415) 555-0198",
#   "address": "123 Market Street, Suite 400, San Francisco, CA 94105"
# }

qa

Question Answering

Answer questions grounded in a provided context passage. The specialist is trained to only use information present in the context, reducing hallucination. Returns "I don't have enough information" when the answer is not in the context.

python
result = client.infer(
    specialist="qa",
    prompt="""Context: The Alveare platform uses a cognitive hive architecture
where a single SLM (Small Language Model) is shared across multiple
specialists. Each specialist has its own system prompt and sampling
parameters. The platform supports Mistral 7B and Llama 2 7B/13B models.
Inference latency is typically under 500ms for requests under 1000 tokens.

Question: What models does Alveare support?""",
)

print(result.result)
# Alveare supports Mistral 7B and Llama 2 in both 7B and 13B parameter sizes.

chat

Multi-turn Conversation

General-purpose conversational AI. Maintains context across multiple turns via the messages array. Use with the OpenAI-compatible endpoint for the best multi-turn experience.

python
response = client.chat.completions.create(
    model="alveare-chat",
    messages=[
        {"role": "system", "content": "You are a helpful customer support agent for an e-commerce store."},
        {"role": "user", "content": "I ordered a laptop last week but it hasn't arrived."},
        {"role": "assistant", "content": "I'm sorry to hear that. Could you provide your order number so I can look into this?"},
        {"role": "user", "content": "It's ORD-98765"},
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
# Thank you. I can see order ORD-98765 was shipped on March 12th via
# express delivery. The tracking shows it's currently at the regional
# distribution center. It should arrive within 1-2 business days...
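Since the endpoint is stateless, multi-turn context is maintained client-side: append each assistant reply to the messages list before sending the next user turn. A sketch of that loop (the client below is a stub for illustration; swap in the real OpenAI-compatible client):

```python
from types import SimpleNamespace

class StubClient:
    """Stand-in for the OpenAI-compatible client, for illustration only."""
    def __init__(self):
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages, max_tokens=256):
        # Echo back the turn count so the example is runnable offline.
        reply = SimpleNamespace(content=f"Echoing turn {len(messages)}")
        return SimpleNamespace(choices=[SimpleNamespace(message=reply)])

def ask(client, messages, user_text, max_tokens=256):
    """Send one user turn and record the assistant reply in the history."""
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="alveare-chat", messages=messages, max_tokens=max_tokens)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a support agent."}]
ask(StubClient(), history, "Where is my order?")
ask(StubClient(), history, "It's ORD-98765")
```

Each call resends the full history, so earlier turns stay visible to the model; trim old turns if the conversation approaches the context limit.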

code

Code Generation

Generate, explain, debug, and refactor code. Works with Python, JavaScript, TypeScript, Go, Rust, SQL, and other mainstream languages. Best results with clear, specific prompts.

python
result = client.infer(
    specialist="code",
    prompt="""Write a Python function that takes a list of dictionaries
and groups them by a specified key. Include type hints and docstring.""",
    max_tokens=512,
)

print(result.result)
# def group_by(items: list[dict], key: str) -> dict[str, list[dict]]:
#     """Group a list of dictionaries by a specified key.
#
#     Args:
#         items: List of dictionaries to group.
#         key: The dictionary key to group by.
#
#     Returns:
#         Dictionary mapping key values to lists of matching items.
#     """
#     groups: dict[str, list[dict]] = {}
#     for item in items:
#         value = item.get(key, "unknown")
#         groups.setdefault(value, []).append(item)
#     return groups

custom

Custom Specialists

Available on Professional and Scale plans. Create your own specialists with custom system prompts, temperature defaults, and output validators. Define them in the dashboard or via the management API.

python
# Use a custom specialist defined in your dashboard
result = client.infer(
    specialist="my-legal-reviewer",
    prompt="Review this contract clause for potential issues: ...",
    max_tokens=512,
)
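A custom specialist definition pairs a system prompt with sampling defaults and validators. The field names below are purely hypothetical, to show the shape of such a definition; consult the dashboard or management API for the actual schema:

```python
# Hypothetical shape of a custom specialist definition; the real
# management API fields may differ.
my_legal_reviewer = {
    "name": "my-legal-reviewer",
    "system_prompt": (
        "You are a contract reviewer. Flag ambiguous terms and "
        "one-sided clauses, one issue per line."
    ),
    "temperature": 0.3,   # default; callers can still override per request
    "validators": ["non_empty", "max_lines:20"],
}
```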

Best practices

Choosing the right specialist

Match the task to the narrowest specialist that covers it: classify for fixed labels, extract when you need specific fields as JSON, summarise for condensing text, qa when the answer must come from a supplied context, code for generation and refactoring, and chat only for open-ended multi-turn conversation. If none fits, define a custom specialist (Professional and Scale plans).

Prompt tips

State the expected output format explicitly (label set, bullet count, JSON fields), put instructions before the input text, and for qa always supply the context passage the answer should be grounded in.

Token optimisation

Lower temperature + lower max_tokens = faster responses and lower token usage. For classification workloads, you can often achieve sub-100ms latency with temperature: 0.1 and max_tokens: 16.
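For example, a latency-optimised classification call might look like this (mirroring the classify example above; requires a configured client):

```python
result = client.infer(
    specialist="classify",
    prompt="Classify the intent as refund, shipping, or other: "
           "'Where is my package?'",
    temperature=0.1,  # near-deterministic label selection
    max_tokens=16,    # a single label needs very few tokens
)
```

Capping max_tokens this aggressively is safe for classification because the output is a single short label; do not apply the same cap to summarise or code.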