A full-time, immersive upskilling program combining AI fundamentals with agentic development tooling. All resources are free. Built for 30+ engineers with varying experience levels.
Get grounded in how LLMs work, master prompt engineering, and build pair-programming fluency with Cursor.
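To make the zero-shot vs. few-shot distinction concrete, here is a minimal sketch of the two message structures in the common chat-completion format. The classification task and labels are illustrative, not part of the curriculum:

```python
# Sketch: zero-shot vs. few-shot message lists in the common
# chat-completion format (system / user / assistant roles).
# The ticket-classification task here is a made-up illustration.

INSTRUCTION = (
    "Classify the support ticket as 'billing' or 'technical'. "
    "Reply with one word."
)

def zero_shot(ticket: str) -> list[dict]:
    """Just the instruction -- the model gets no examples."""
    return [
        {"role": "system", "content": INSTRUCTION},
        {"role": "user", "content": ticket},
    ]

def few_shot(ticket: str) -> list[dict]:
    """Same instruction, plus worked examples the model can imitate."""
    return [
        {"role": "system", "content": INSTRUCTION},
        {"role": "user", "content": "I was charged twice this month."},
        {"role": "assistant", "content": "billing"},
        {"role": "user", "content": "The app crashes on startup."},
        {"role": "assistant", "content": "technical"},
        {"role": "user", "content": ticket},
    ]
```

Few-shot prompts trade a few extra tokens for much more consistent output formatting, which is why the rubric below treats knowing when to use each as a baseline skill.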
Build retrieval pipelines, connect LLMs to external tools, and design your first autonomous agents.
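The retrieval half of that pipeline (chunk, embed, store, retrieve) can be sketched with stdlib Python. Real systems use learned embeddings and a vector database; bag-of-words counts stand in here so the example is runnable, and the chunking strategy (fixed word windows) is deliberately the simplest one:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_index(docs, chunk_size=8):
    """Split each doc into fixed-size word chunks and embed them."""
    index = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            index.append((chunk, embed(chunk)))
    return index

def retrieve(index, query: str, k=2):
    """Return the top-k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda p: cosine(q, p[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Swapping `embed` for a real embedding model and `index` for a vector store turns this into the end-to-end pipeline the Week 3 rubric describes.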
Master stateful workflows, build multi-agent systems, and coordinate AI teams for complex tasks.
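The core idea behind stateful orchestration frameworks like LangGraph is a graph of nodes that transform shared state, with conditional edges deciding what runs next. A library-free sketch of that idea, with illustrative node names and a scripted "quality check" standing in for an LLM critique:

```python
# Library-free sketch of a stateful workflow with a reflection loop:
# nodes transform a shared state dict; edges (including a conditional
# one) decide what runs next. Node names are illustrative.

END = "__end__"

def draft(state):
    state["draft"] = f"draft of: {state['task']}"
    state["revisions"] = state.get("revisions", 0)
    return state

def review(state):
    # Stand-in for an LLM critique; approves after one revision.
    state["approved"] = state["revisions"] >= 1
    return state

def revise(state):
    state["revisions"] += 1
    state["draft"] += " (revised)"
    return state

NODES = {"draft": draft, "review": review, "revise": revise}

def route(state):
    """Conditional edge: loop back to revise until approved."""
    return END if state["approved"] else "revise"

EDGES = {"draft": "review", "review": route, "revise": "review"}

def run(task: str):
    state, node = {"task": task}, "draft"
    while node != END:
        state = NODES[node](state)
        nxt = EDGES[node]
        node = nxt(state) if callable(nxt) else nxt
    return state
```

What a real framework adds on top of this loop is checkpointing (persisting `state` between steps), typed state schemas, and tool-calling nodes, which is the material Week 3 covers.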
Choose a project from the bank below. 10 full days of building. Ship something real.
Each brief is written as a non-technical client would describe it. Your job: translate the business need into a working AI-powered solution in 10 days.
This framework gives engineers a structured self-assessment to guide their growth, with formal evaluation conducted by their engineering manager or team lead. Ratings are qualitative: the goal is honest development feedback, not number-chasing. Evaluation outcomes feed directly into performance reviews and career progression conversations.
Evaluated at the end of Week 3, before the capstone begins. Covers both conceptual understanding and the practical ability to apply what was learned. The engineer completes a self-assessment first; the lead then reviews it and follows up with a 1-on-1 conversation.
| Category | Needs Improvement | Meets Expectations | Exceeds Expectations |
|---|---|---|---|
| LLM Fundamentals & API Integration | Cannot explain how LLMs generate text or has not built a working API integration. Struggles with basic concepts like tokenization, context windows, or message formatting. | Can explain transformer basics, token economics, and model tradeoffs. Has built at least one working LLM API integration with proper error handling and streaming. | Deeply understands model selection tradeoffs (cost, latency, capability). Builds production-grade integrations with structured outputs, retries, and fallback models. Can teach others. |
| Prompt Engineering | Writes vague or unstructured prompts. Doesn't know the difference between zero-shot, few-shot, and chain-of-thought. Outputs are inconsistent. | Applies appropriate prompting techniques for different tasks. Writes clear system prompts that produce consistent, well-structured output. Understands ReAct and when to use it. | Designs prompt architectures for complex workflows. Systematically tests and iterates on prompts. Creates reusable prompt templates that the team adopts. |
| RAG & Retrieval Systems | Cannot build a basic RAG pipeline. Doesn't understand embeddings, chunking strategies, or the difference between vector and keyword search. | Builds end-to-end RAG pipelines: loading, chunking, embedding, storing, retrieving, generating. Understands tradeoffs between chunking strategies and can evaluate retrieval quality. | Implements advanced patterns — hybrid search, reranking, self-RAG, or corrective RAG. Can diagnose retrieval failures and systematically improve pipeline performance. |
| Agents & Function Calling | Cannot explain what an agent is or how function calling works. Has not built a working agent that uses tools. | Builds agents that use multiple tools to complete tasks. Understands agent loops, tool schemas, and error recovery. Can implement ReAct or reflection patterns. | Designs agent architectures for complex, multi-step workflows. Implements human-in-the-loop patterns, memory systems, and robust error handling. Agents handle edge cases gracefully. |
| LangGraph & Orchestration | Cannot build a basic LangGraph workflow. Doesn't understand states, nodes, edges, or checkpointing. | Builds stateful LangGraph workflows with conditional routing, tool nodes, and checkpointing. Can implement an agentic RAG system or a multi-step processing pipeline. | Designs complex multi-agent LangGraph applications with shared state, reflection loops, and persistent memory. Understands when LangGraph is the right tool versus simpler alternatives. |
| Multi-Agent Systems | Cannot explain the difference between single-agent and multi-agent patterns. Has not coordinated multiple agents on a task. | Understands supervisor, swarm, and hierarchical patterns. Has built a working multi-agent system where agents coordinate on a shared task with clear role separation. | Designs multi-agent architectures with sophisticated coordination — task delegation, conflict resolution, shared state management, and cost-aware orchestration. Can evaluate when multi-agent adds value versus overhead. |
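The "agent loop" and "function calling" the rubric above refers to can be sketched without any framework: the model proposes a tool call, the loop executes it and feeds the result back, and this repeats until the model emits a final answer. Here `scripted_model` stands in for a real LLM returning function-call-style dicts, and the tools are made-up examples:

```python
import json

# Minimal agent loop: execute the model's tool calls, append results
# to the transcript, stop when the model returns a final answer.
# Tool names and the scripted model are illustrative stand-ins.

TOOLS = {
    "add": lambda a, b: a + b,
    "lookup_price": lambda item: {"widget": 3}.get(item, 0),
}

def agent_loop(model, question: str, max_steps=5):
    transcript = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        msg = model(transcript)          # model sees the full history
        if "tool" in msg:                # model requested a tool call
            result = TOOLS[msg["tool"]](**msg["args"])
            transcript.append({"role": "tool", "content": json.dumps(result)})
        else:                            # model produced a final answer
            return msg["content"]
    return "step limit reached"          # error recovery: bounded loop

def scripted_model(transcript):
    """Stand-in LLM: call a tool once, then answer with its result."""
    if transcript[-1]["role"] == "tool":
        return {"content": f"The total is {transcript[-1]['content']}."}
    return {"tool": "add", "args": {"a": 2, "b": 3}}
```

The `max_steps` bound is the simplest form of the error recovery the rubric asks for; production agents add schema validation on `args` and retries on failed tool calls.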
Evaluated after Demo Day. Covers four dimensions: technical quality, product thinking, development process, and communication. The engineer self-assesses first; the lead then evaluates based on the live demo, code review, and Q&A.
| Dimension | Needs Improvement | Meets Expectations | Exceeds Expectations |
|---|---|---|---|
| Technical Quality | Code is disorganized, hard to follow, or doesn't work reliably. AI techniques (RAG, agents, LangGraph) are applied superficially or incorrectly. Architecture is unclear or not thought through. | Clean, well-structured code with a clear architecture. AI techniques are applied appropriately — the choice of RAG, agents, or multi-agent patterns makes sense for the problem. Error handling exists and the system is reasonably robust. | Production-grade code quality. Thoughtful architectural decisions with clear reasoning. AI techniques are combined effectively — e.g., agentic RAG with reflection, multi-agent coordination with human-in-the-loop. Handles edge cases and failure modes. |
| Product Thinking | Built something technically interesting but didn't address the client's actual problem. Requirements from the brief are ignored or misunderstood. The solution wouldn't work for real users. | The solution clearly addresses the client's stated problem and requirements. Reasonable assumptions were made and documented. The system would be usable by the described client with minimal changes. | Goes beyond the brief — anticipates edge cases the client didn't mention, identifies risks, and proposes a realistic roadmap for production deployment. Demonstrates genuine understanding of the business problem, not just the technical challenge. |
| Development Process | No clear approach — jumped straight into coding without planning. Build log is empty or sparse. Didn't iterate — the final product looks like a first attempt. Didn't leverage the AI tools or course content effectively. | Started with architecture and planning before building. Iterated meaningfully — there's evidence of testing, refining, and improving. Applied course concepts (TDD loop, CLAUDE.md, plan-then-build) during the build. Build log shows daily progress. | Exemplary process — clear architecture document, disciplined iteration, meaningful use of Git history showing progression. Used advanced workflows (subagents, parallel sessions, MCP integrations) naturally and effectively. Build log is detailed and reflective. |
| Communication & Demo | Demo is unstructured or confusing. Can't clearly explain what the system does or why design choices were made. Struggles to answer Q&A questions. Presentation runs over time or is unprepared. | Clear, structured demo: problem → approach → live walkthrough → what was learned. Explains design decisions with reasoning. Handles Q&A competently. Stays within time. | Compelling presentation that tells a story — from the client's problem through to the solution. Live demo is polished and handles unexpected inputs. Answers tough Q&A questions thoughtfully. Shares insights that help the whole team learn. |
Day 30 is a live presentation to the full team. Each engineer presents their capstone project. The lead evaluates; peers ask questions.
Each engineer gets a fixed slot. The format is designed to be tight, focused, and useful for the whole team — not a lecture, but a live walkthrough.
Each engineer should have these ready before their presentation. These form the basis for the lead's code review and evaluation.