Viabig
// Discipline 03

LLMs & RAG that answer from your knowledge.

Retrieval pipelines over private knowledge, evaluation harnesses, prompt + context engineering. We build LLM systems that are measurably useful, not just functional.

// query → retrieve → ground → answer
01 user query → retriever: k=12 · hybrid
02 rerank → top-5: cross-encoder
03 context assembly: 3,840 tok
04 generation w/ citations: claude · gpt · llama
05 faithfulness check: passed · 0.94
“Yes, see policy §4.2. [3] The 30-day window applies only to commercial accounts. [5]”
/ 3.1

Retrieval-Augmented Generation

Grounded answers from private documents, databases, and knowledge bases — without re-training.

// rag_pipeline · hybrid · pgvector + bm25
query → embed(ada-002) → vec search (k=20)
→ bm25 sparse → union → rerank (top-5)
retrieved 5 chunks · context: 3,840 tok
faithfulness: 0.94 relevance: 0.91
“Per §4.2, the 30-day window applies to commercial accounts.” [3][5]

// Details

  • Vector search (pgvector, Pinecone, Weaviate, Qdrant)
  • Chunking strategy and embedding selection
  • Hybrid retrieval (dense + sparse BM25)
  • Contextual compression and re-ranking

// Output formats

REST API · Python SDK · Docker
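
The hybrid step in the panel above (dense vector hits, BM25 hits, union, rerank to top-5) can be sketched in a few lines of Python. This is a minimal illustration, not a description of any production pipeline: `hybrid_retrieve`, the `Hit` type, and the `rerank_score` callable are hypothetical names, and the reranker is a stand-in for whatever cross-encoder is actually used.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (chunk_id, raw retriever score); hypothetical shape


def hybrid_retrieve(
    dense_hits: List[Hit],
    sparse_hits: List[Hit],
    rerank_score: Callable[[str], float],
    top_n: int = 5,
) -> List[str]:
    """Union dense and sparse candidates, then keep the top_n by reranker score."""
    # Union by chunk id, preserving first-seen order. Raw retriever scores are
    # dropped because vector and BM25 scores are not on a comparable scale.
    candidates: Dict[str, None] = {}
    for chunk_id, _score in dense_hits + sparse_hits:
        candidates.setdefault(chunk_id, None)

    # Score every candidate with one reranker and sort best-first.
    ranked = sorted(candidates, key=rerank_score, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    dense = [("c1", 0.82), ("c2", 0.79), ("c3", 0.75)]
    sparse = [("c3", 11.2), ("c4", 9.8)]
    # Toy relevance table standing in for a cross-encoder (assumption).
    toy_scores = {"c1": 0.2, "c2": 0.9, "c3": 0.7, "c4": 0.8}
    print(hybrid_retrieve(dense, sparse, toy_scores.__getitem__, top_n=3))
```

Deduplicating before reranking means a chunk found by both retrievers is scored once, which keeps the reranker call count equal to the number of unique candidates.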
/ 3.2

LLM Evaluation & Benchmarking

You can't improve what you can't measure. We build evaluation suites before we build the system.

// ragas_eval · system_v1 · 500 questions
faithfulness ....... 0.94 ↑ +0.08 from v0
answer_relevancy ....... 0.91 ↑ +0.06
context_recall ....... 0.87 needs improvement
context_precision ....... 0.89
action: improve chunking · tune rerank threshold

// Details

  • Groundedness, relevance, faithfulness metrics
  • RAGAS / custom evaluation harnesses
  • Regression benchmarks across model versions
  • Human evaluation integration

// Output formats

Eval report · JSON · Dashboard
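
A regression benchmark across versions, as listed above, boils down to comparing per-metric scores between two runs. The sketch below is one way to express that and does not call the RAGAS library. `compare_runs` is a hypothetical helper, the v0 scores are back-computed from the deltas shown in the panel, and the 0.88 threshold is an assumption chosen only so the example mirrors the panel's "needs improvement" flag.

```python
from typing import Dict, Optional


def compare_runs(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    min_score: float,
) -> Dict[str, Dict[str, object]]:
    """Report per-metric delta against the baseline plus a coarse status."""
    report: Dict[str, Dict[str, object]] = {}
    for metric, score in candidate.items():
        previous: Optional[float] = baseline.get(metric)  # None if not in baseline
        delta = None if previous is None else round(score - previous, 2)
        if delta is not None and delta < 0:
            status = "regressed"
        elif score < min_score:
            status = "needs improvement"
        else:
            status = "ok"
        report[metric] = {"score": score, "delta": delta, "status": status}
    return report


if __name__ == "__main__":
    # v0 values derived from the panel's deltas (0.94 - 0.08, 0.91 - 0.06).
    v0 = {"faithfulness": 0.86, "answer_relevancy": 0.85}
    v1 = {
        "faithfulness": 0.94,
        "answer_relevancy": 0.91,
        "context_recall": 0.87,
        "context_precision": 0.89,
    }
    # min_score=0.88 is an assumed threshold, not a value from the panel.
    for metric, row in compare_runs(v0, v1, min_score=0.88).items():
        print(metric, row)
```

Run in CI, a report like this can fail the build whenever any metric comes back as "regressed".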
/ 3.3

Prompt & Context Engineering

Systematic prompt development, few-shot curation, context window optimization.

// prompt_template · structured · json_mode
SYSTEM: You are a policy analyst. Answer using only
the provided context. Say “unclear” if unsure.
CONTEXT: {{retrieved_chunks}}
USER: {{user_question}}
OUTPUT: {"answer":…, "citations":[…], "confidence":…}

// Details

  • Structured prompt templates
  • Chain-of-thought, structured output (JSON mode)
  • Prompt regression testing
  • Context window management strategies

// Output formats

YAML prompts · LangChain · LlamaIndex
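
The template in the panel above has two halves that can both be tested: filling the placeholders and checking that the model's reply matches the declared output shape. The following is a rough sketch under stated assumptions. `render_prompt` and `parse_output` are hypothetical helpers, chunk numbering is an added convention so citations have something to point at, and no model API is called.

```python
import json
import re
from typing import Dict, List

PROMPT_TEMPLATE = (
    'SYSTEM: You are a policy analyst. Answer using only\n'
    'the provided context. Say "unclear" if unsure.\n'
    'CONTEXT: {{retrieved_chunks}}\n'
    'USER: {{user_question}}\n'
)

REQUIRED_KEYS = {"answer", "citations", "confidence"}


def render_prompt(chunks: List[str], question: str) -> str:
    """Fill the template in a single pass so inserted text is never re-expanded."""
    # Number the chunks so the model can cite them as [1], [2], ...
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    values = {"retrieved_chunks": context, "user_question": question}
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], PROMPT_TEMPLATE)


def parse_output(raw: str, n_chunks: int) -> Dict[str, object]:
    """Parse the model reply and reject anything that breaks the declared shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    citations = data["citations"]
    if not isinstance(citations, list):
        raise ValueError("citations must be a list")
    # Every citation must point at a chunk that was actually in the context.
    bad = [c for c in citations if not (isinstance(c, int) and 1 <= c <= n_chunks)]
    if bad:
        raise ValueError(f"citations out of range: {bad}")
    return data
```

Because both functions are pure, prompt regression tests can pin a rendered prompt and a set of known-good and known-bad replies without touching a model.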
/ 3.4

Tool-Use & Function Calling

LLMs connected to APIs, databases, and tools — with proper fallback, error handling, and observability.

// tool_trace · search_agent · 3 calls
[1] call: search_docs(“refund policy 2026”) → 5 chunks
[2] call: lookup_clause(“§4.2”) → 2025-01-01
[3] call: validate_output(schema) → pass
final: grounded · citations: [3,5] · tokens: 842

// Details

  • OpenAI function calling / Anthropic tool use
  • Multi-step reasoning with tool selection
  • Output parsing and validation
  • Observability with tracing

// Output formats

OpenAI API · Anthropic API · Open-source
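
The error handling and tracing described above can be illustrated with a small, provider-agnostic dispatch loop. This is only a sketch: `run_tool_calls` is a hypothetical helper, the call format (`name` plus `args`) is an assumption rather than any vendor's wire format, and the `search_docs` stub borrows its name from the trace panel without implementing real search.

```python
from typing import Any, Callable, Dict, List

ToolFn = Callable[..., Any]


def run_tool_calls(
    calls: List[Dict[str, Any]],
    tools: Dict[str, ToolFn],
) -> List[Dict[str, Any]]:
    """Execute model-requested tool calls in order and return a trace."""
    trace: List[Dict[str, Any]] = []
    for step, call in enumerate(calls, start=1):
        name, args = call["name"], call.get("args", {})
        entry = {"step": step, "tool": name, "ok": False, "result": None, "error": None}
        tool = tools.get(name)
        if tool is None:
            # The model asked for a tool that is not registered.
            entry["error"] = f"unknown tool: {name}"
        else:
            try:
                entry["result"] = tool(**args)
                entry["ok"] = True
            except Exception as exc:  # one failing tool must not abort the run
                entry["error"] = f"{type(exc).__name__}: {exc}"
        trace.append(entry)
    return trace


if __name__ == "__main__":
    # Stub tool for illustration only; a real one would query an index.
    tools = {"search_docs": lambda query: [f"chunk about {query}"]}
    calls = [
        {"name": "search_docs", "args": {"query": "refund policy"}},
        {"name": "lookup_clause", "args": {"clause": "4.2"}},  # not registered
    ]
    for entry in run_tool_calls(calls, tools):
        print(entry)
```

Recording failures in the trace, instead of raising, lets the caller decide whether to retry, fall back to another tool, or return the error text to the model.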
// Work with us

Ready to ship? Let's scope it together.

Whether it's labeled data, a fine-tuned model, a RAG pipeline, or an agent running in production, bring us the brief. We'll scope it, price it, and tell you honestly whether we're the right team, within 48 hours and with no commitment.