
type: integration
name: Replicate
slug: replicate
category: llm
status: active
owner_agent: TODO
used_by: [arteggia-bot, med-detective]
last_updated: 2026-05-01

Replicate

Why deployed

Hosted inference for LLM/vision models, used when our own Gemini keys are unavailable (403/quota) or are unsuitable (Llama for the UA locale).
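The fallback idea can be sketched as below. This is a minimal illustration, not code from either bot: `ProviderError`, `generate_with_fallback`, and the two callables are hypothetical names, and treating HTTP 429 as the "quota" signal is an assumption (the note only mentions 403 and quota).

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status} {message}".strip())
        self.status = status


# Assumption: 403 means the key was rejected, 429 means quota is exhausted.
FALLBACK_STATUSES = {403, 429}


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Call the primary provider; switch to the fallback on 403/429 only."""
    try:
        return primary(prompt)
    except ProviderError as err:
        if err.status in FALLBACK_STATUSES:
            return fallback(prompt)
        # Any other failure is surfaced unchanged so real bugs are not hidden.
        raise
```

Here `primary` would wrap the direct Gemini call and `fallback` the Replicate call; both are passed in so the routing logic stays independent of either client.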

Where used

Arteggia Bot: google/gemini-2.5-flash for receipt OCR (since the Gemini API key stopped working on 20.04)
Med Detective: meta/meta-llama-3-8b-instruct as the patient/intent default (1-1.4 s latency vs 15 s on 70B)

Endpoints / Touchpoints

API: https://api.replicate.com/v1/
Account: deltamedical

Credentials

Path: /srv/passepartout/replicate/token.txt
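A minimal sketch of how a client might turn the token file above into an authenticated request. It assumes Replicate's HTTP API accepts `Authorization: Bearer <token>` and serves official models at `models/{owner}/{name}/predictions`. The helper names, the `prompt` input field, and the use of the `Prefer: wait` header are illustrative assumptions, not taken from the bots' code.

```python
import json
import urllib.request
from pathlib import Path

API_BASE = "https://api.replicate.com/v1/"
TOKEN_PATH = Path("/srv/passepartout/replicate/token.txt")


def read_token(path: Path = TOKEN_PATH) -> str:
    """Read the API token, stripping surrounding whitespace and newlines."""
    return path.read_text(encoding="utf-8").strip()


def build_prediction_request(
    model: str, model_input: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs `model` on `model_input`."""
    url = f"{API_BASE}models/{model}/predictions"
    body = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Assumption: ask the API to hold the connection until the
            # prediction finishes instead of polling for the result.
            "Prefer": "wait",
        },
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would then look like `send(build_prediction_request("meta/meta-llama-3-8b-instruct", {"prompt": "..."}, read_token()))`. Keeping request construction separate from sending makes the URL and headers easy to check without network access.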

Health & monitoring

Pay-per-use: check usage in the Replicate dashboard.

Known issues / quirks

Llama-3-8B quality is "slightly worse" than 70B, with occasional Russianisms (e.g. «Рукі» → «Руки»)
Qwen 2.5-VL is not yet available on Replicate (only Qwen 2-VL)

Cost

Pay-per-use. Roughly $0.001-0.005 per Gemini-2.5-flash call (OCR) and ~$0.0002 per Llama-3-8B call.
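As a rough worked example, the per-call estimates above can be turned into a monthly range. The call volumes below are hypothetical and only illustrate the arithmetic; actual spend should be read from the dashboard.

```python
# Per-call price estimates from this note (USD).
GEMINI_OCR_COST_RANGE = (0.001, 0.005)
LLAMA_8B_COST = 0.0002


def monthly_cost_range(ocr_calls: int, llama_calls: int) -> tuple[float, float]:
    """Return (low, high) estimated monthly spend in USD."""
    llama_total = llama_calls * LLAMA_8B_COST
    low = ocr_calls * GEMINI_OCR_COST_RANGE[0] + llama_total
    high = ocr_calls * GEMINI_OCR_COST_RANGE[1] + llama_total
    return low, high


# Hypothetical volumes: 1,000 receipts and 10,000 Llama calls per month.
low, high = monthly_cost_range(1_000, 10_000)
print(f"${low:.2f} to ${high:.2f} per month")  # $3.00 to $7.00 per month
```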