Job Description
About Us

Staq is a leading Banking-as-a-Service (BaaS) and embedded finance platform, transforming the way businesses integrate banking and financial services. At Staq, we empower our clients to innovate, expand, and streamline their financial services offerings, leveraging our cutting-edge platform. Our mission is to bridge the gap between traditional banking and the digital era, providing seamless, scalable, and secure financial solutions.

The Role

We are building the intelligence layer that will power an AI-powered financial assistant and serve as the SDK that other banking applications plug into. The long-term vision is an AI-native bank where every customer interaction, recommendation, and financial operation is orchestrated through this platform. That means the agent runtime, automation engine, recommendation systems, and tool execution framework all need to be built as reusable, production-grade infrastructure, not one-off features for a single product.

The objective is to build, harden, and ship the intelligence platform across multiple products simultaneously. You will be building the systems that make AI actually work in finance: agents that reason about money, automations that run reliably on people's financial data, recommendations that are genuinely useful, and tool execution that is safe and observable. This is systems engineering meets applied AI.
Key Responsibilities

Agent Runtime & Orchestration
- Build and maintain production AI agent flows using Python and LangGraph, including multi-step planning, tool selection, and context assembly
- Author and evolve Agent Cards that define agent capabilities, context requirements, and output contracts for each product domain
- Implement the agent-side integration with Temporal workflows: the AGENT_STEP and AGENT_LOOP activity interfaces that the Java orchestrator calls into
- Own prompt engineering, template management, and context window optimization across all agent flows
- Design and implement memory systems that give agents meaningful continuity: conversation history, user financial context, and long-term preference tracking across sessions

Automation & Intelligent Workflows
- Design and implement automation flows that go beyond conversational agents: scheduled financial health checks, proactive alerting, background data analysis, and event-driven triggers
- Build reliable, deterministic automation pipelines that can execute multi-step financial operations with proper error handling, compensation logic, and human-in-the-loop escalation
- Ensure automations are idempotent, observable, and operate within the platform's risk gate framework

Recommendation Systems
- Build and iterate on recommendation engines that surface personalized financial insights, product suggestions, and actionable next-best-actions to users
- Design the data contracts and feature pipelines that feed recommendations, working with domain services for banking, credit, and subscription data
- Implement evaluation frameworks to measure recommendation quality, relevance, and user engagement

Sandboxed Tool Execution
- Own the integration with sandboxed execution environments (E2B) where agents run tools against real financial APIs and data sources
- Implement and maintain MCP (Model Context Protocol) tool definitions, ensuring agents can safely invoke financial operations within policy-controlled boundaries
- Build guardrails around tool execution: input validation, output verification, and safe fallback behavior when tools fail or return unexpected results

Reliability & Testing
- Build comprehensive test harnesses for agent behavior: deterministic scenario tests, regression suites, and evaluation benchmarks
- Own the reliability engineering of the agent runtime: graceful degradation when LLMs misbehave, proper retry logic, timeout handling, and circuit breakers
- Support adversarial testing and red-teaming efforts from the AI side

Platform & SDK Mindset
- Everything you build must be reusable. Zeen is the first product, but the intelligence layer is an SDK: other banking applications will build on top of the same agent patterns, tool integrations, and automation frameworks
- Maintain and evolve the shared contracts (Agent Cards, tool schemas, risk gate interfaces) that allow new products to onboard onto the platform with minimal custom work
- Think in terms of clean abstractions and extension points, not hard-coded product logic

Technical Environment
- Python (primary), with integration touchpoints to Java microservices
- LangGraph for agent orchestration; Temporal Cloud (Java SDK) as the durable workflow engine
- OPA/Rego for policy enforcement across four risk gate stages (pre-LLM, post-LLM, pre-tool, post-tool)
- E2B sandboxed containers for tool execution; MCP for tool protocol
- OpenTelemetry for observability; structured artifact logging
- LLM providers via a gateway abstraction (model-agnostic)
- Fintech domain: Plaid integrations, banking/credit/subscription data

What We Are Looking For

Must Have
- 3+ years building production AI/ML systems (not just notebooks: deployed, monitored, maintained)
- Strong Python fundamentals and experience with async patterns, error handling, and production-grade code
- Hands-on experience with LLM application development: prompt engineering, context engineering, tool/function calling, and structured outputs
- Experience building at least one of: recommendation systems, automation pipelines, or multi-step agent workflows
- Understanding of evaluation and testing for non-deterministic systems; you know that "it works on my prompt" is not a test strategy
- Comfort working with financial data where correctness and reliability matter more than speed of iteration

Strong Signals
- Experience with agent frameworks (LangGraph, LangChain, AutoGen, CrewAI) in production, not just prototypes
- Familiarity with memory systems for AI agents: short-term and long-term memory architectures, retrieval-augmented generation, and context window management strategies
- Experience with prompt management at scale: versioning, templating, A/B testing, and systematic prompt optimization workflows
- Familiarity with sandboxed code execution, MCP, or tool-use patterns for LLM agents
- Background in fintech, financial data, or regulated industries
- Experience with recommendation engines (collaborative filtering, content-based, hybrid approaches)
- Familiarity with workflow orchestration systems (Temporal, Airflow, Prefect) and how AI fits into durable execution patterns
- Experience with LLM observability and performance tracking: call latency profiling, token usage monitoring, cost attribution, and tracing through multi-step agent flows

What This Role Is Not

This is not a pure ML research position. We are not training foundation models. You will be building application-layer AI systems on top of LLMs and integrating them into a financial services platform that real people depend on for real money. The challenge is in the systems engineering, reliability, and product thinking, not in publishing papers.