Code repo for in-progress O'Reilly book on GenAI design patterns by Valliappa Lakshmanan and Hannes Hapke. https://www.oreilly.com/library/view/generative-ai-design/9798341622654/
These are the design patterns covered in the book:
Chapter 2: Controlling Style
Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
---|---|---|---|---|---|
1 | Logits Masking | Need to ensure generated text conforms to specific style rules for brand, accuracy, or compliance reasons. | Intercept the generation at the sampling stage to zero out probabilities of continuations that don't meet the rules. | Use words associated with a specific brand; avoid repeating factual information; make content compliant with a style book. | examples/01_logits_masking |
2 | Grammar | Need text to conform to a specific format or data schema for downstream processing. | Specify rules as a formal grammar (e.g., BNF) or schema that the model framework applies to constrain token generation. | Generating valid SQL timestamps; extracting structured data in a specific format; ensuring output conforms to JSON schema. | examples/02_grammar |
3 | Style Transfer | Need to convert content into a form that mimics specific tone and style that is difficult to express through rules, but can be shown through example conversions. | Use few-shot learning or model fine-tuning to teach the model how to convert content to the desired style. | Rewriting generic content to match brand guidelines; converting academic papers to blog posts; transforming image and text content for different social media platforms or audiences. | examples/03_style_transfer |
4 | Reverse Neutralization | Need to generate content in a specific style that can be shown through example content. | Use an LLM to generate content in an intermediate neutral form, and a fine-tuned LLM to convert that neutral form into the desired style. | Generating letters in region-specific legalese; generating emails in personal style. | examples/04_reverse_neutralization |
5 | Content Optimization | Need to determine optimal style for content without knowing which factors matter. | Generate pairs of content, compare them using an evaluator, create a preference dataset, and perform preference tuning. | Optimizing ad copy, marketing content, or educational materials where effective style factors are unknown. | examples/05_content_optimization |
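To make pattern 1 (Logits Masking) concrete, here is a minimal, self-contained sketch of the idea — not the code in examples/01_logits_masking — that bans tokens by driving their logits to negative infinity before the softmax, so they can never be sampled. The four-token vocabulary and function name are invented for illustration:

```python
import numpy as np

def mask_and_sample(logits, banned_token_ids, rng):
    """Zero out banned continuations at the sampling stage by
    setting their logits to -inf before the softmax."""
    masked = logits.copy()
    masked[banned_token_ids] = -np.inf
    # Softmax over the remaining tokens; exp(-inf) == 0.
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs), probs

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])   # toy 4-token vocabulary
token, probs = mask_and_sample(logits, banned_token_ids=[0], rng=rng)
# probs[0] is exactly 0, so token 0 can never be emitted.
```

In a real serving stack the same hook is exposed as a logits processor that runs on every decoding step.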
Chapters 3 and 4: Adding Knowledge
Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
---|---|---|---|---|---|
6 | Basic RAG | Knowledge cutoff, confidential data, and hallucinations pose problems for zero-shot generation by LLMs. | Ground the response generated by the LLM by adding relevant information from a knowledge base into the prompt context. | Question answering over private, domain-specific, or post-cutoff documents; grounded chatbots. Applications continue to expand as the technology evolves. | examples/06_basic_rag |
7 | Semantic Indexing | Traditional keyword indexing/lookup approaches fail when documents get more complex, contain different media types like images or tables, or bridge multiple domains. | Use embeddings to capture the meaning of texts, images, and other media types. Find relevant chunks by comparing the embedding of the chunk to that of the query. | | examples/07_semantic_indexing |
8 | Indexing at Scale | Dealing with outdated or contradictory information in your knowledge base. | Use metadata, query filtering, and result reranking. | | examples/08_indexing_at_scale |
9 | Index-aware Retrieval | Comparing questions to chunks is problematic because the question itself will not appear in the knowledge base, may use synonyms or jargon, or may require holistic interpretation. | Use hypothetical answers, query expansion, hybrid search, or GraphRAG. | | examples/09_index_aware_retrieval |
10 | Node Postprocessing | Irrelevant content, ambiguous entities, generic answers. | Rerank retrieved nodes, which also opens the door to hybrid search, query expansion, filtering, contextual compression, disambiguation, and personalization. | | examples/10_node_postprocessing |
11 | Trustworthy Generation | How to retain users’ trust given that there is no way to completely avoid errors. | Out-of-domain detection, citations, guardrails, human feedback, corrective RAG, and UX design can all help. | | examples/11_trustworthy_generation |
12 | Deep Search | RAG systems are less effective for complex information retrieval tasks because of context window constraints, query ambiguity, information verification, shallow reasoning, and multi-hop query challenges. | Use an iterative process of searching, reading, and reasoning to provide comprehensive answers to complex queries. | | examples/12_deep_search |
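As a toy illustration of pattern 6 (Basic RAG) — retrieve the most relevant chunk, then ground the prompt in it — here is a self-contained sketch. It deliberately uses a bag-of-words stand-in for a real embedding model, and the chunks, query, and function names are all invented:

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': a bag-of-words Counter.
    A real system would call a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    # Rank chunks by similarity between their embedding and the query's.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our offices are closed on public holidays.",
]
query = "How many days do I have to return a purchase?"
context = retrieve(query, chunks, k=1)[0]
# Ground the LLM by putting the retrieved chunk into the prompt context.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The later patterns in these chapters (semantic indexing, reranking, deep search) are all refinements of the retrieve-then-ground loop shown here.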
Chapter 5: Extending Model Capability
Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
---|---|---|---|---|---|
13 | Chain of Thought (CoT) | Foundational models often struggle with multi-step reasoning tasks, leading to incorrect or fabricated answers. | CoT prompts the model to break down complex problems into intermediate reasoning steps before providing the final answer. | Complex mathematical problems, logical deductions, and sequential reasoning tasks where step-by-step thinking is required. | examples/13_chain_of_thought |
14 | Tree of Thoughts (ToT) | Many strategic or logical tasks cannot be solved by a single linear reasoning path, requiring exploration of multiple alternatives. | ToT treats problem-solving as a tree search, generating multiple reasoning paths, evaluating them, and backtracking as needed | Complex tasks involving strategic thinking, planning, or creative writing that require exploring multiple solution paths. | examples/14_tree_of_thoughts |
15 | Adapter Tuning | Fully fine-tuning large foundational models for specialized tasks is computationally expensive and requires significant data. | Adapter Tuning trains small add-on neural network layers, leaving the original model weights frozen, making it efficient for specialized adaptation. | Adapting models for specific tasks like classification, summarization, or specialized chatbots with a small (100-10k) dataset of examples. | examples/15_adapter_tuning |
16 | Evol-Instruct | Creating high-quality datasets for instruction tuning models on new and complex enterprise tasks is difficult and time-consuming. | Evol-Instruct efficiently generates instruction-tuning datasets by evolving instructions through multiple iterations of LLM-generated tasks and answers. | Teaching models new, domain-specific tasks that are not covered by their pre-training data, particularly in enterprise settings. | examples/16_evol_instruct |
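The core mechanics of pattern 15 (Adapter Tuning) can be sketched in a few lines. This is a LoRA-style low-rank adapter — one common variant, chosen here for brevity; the matrix names `W`, `A`, `B` and sizes are illustrative, not from the book's example code:

```python
import numpy as np

rng = np.random.default_rng(42)
d, r = 8, 2                          # hidden size, adapter rank (r << d)
W = rng.normal(size=(d, d))          # pretrained weight, kept frozen
A = rng.normal(size=(r, d)) * 0.01   # small trainable adapter factor
B = np.zeros((d, r))                 # zero-init: adapted model == original at start

def forward(x, B, A):
    # Frozen path plus low-rank adapter path: y = x W^T + x (B A)^T
    return x @ W.T + x @ (B @ A).T

x = rng.normal(size=(1, d))
y0 = forward(x, B, A)
assert np.allclose(y0, x @ W.T)      # adapter starts as a no-op

# During fine-tuning only B and A are updated: 2*d*r parameters
# versus d*d for full fine-tuning of W.
```

With realistic sizes (d in the thousands, r of 8-64) the trainable parameter count drops by orders of magnitude, which is what makes adaptation feasible on small datasets.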
Chapter 6: Increasing Reliability
Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
---|---|---|---|---|---|
17 | LLM-as-Judge | Evaluation of GenAI capabilities is hard because the tasks that GenAI performs are open-ended. | Use an LLM to evaluate outputs, providing detailed, multi-dimensional feedback that can be used to compare models, track improvements, and guide further development. | Evaluation is core to many of the other patterns and to building AI applications effectively. | examples/17_llm_as_judge |
18 | Reflection | How to get the LLM to revise an earlier response based on feedback or criticism. | The feedback is used to modify the prompt that is sent to the LLM a second time. | Reliable performance in most complex tasks where the approach cannot be predetermined. | examples/18_reflection |
19 | Dependency Injection | Need to independently develop and test each component of an LLM chain. | When you build chains of LLM calls, build them such that it is easy to inject a mock implementation to replace any step of the chain. | In any situation where you chain LLM calls or use external tools. | examples/19_dependency_injection |
20 | Prompt Optimization | Need to easily update prompts when dependencies change to maintain the level of performance. | Systematically set the prompts used in a GenAI pipeline by optimizing them on a dataset of examples. | In any situation where you have to reduce the maintenance overhead associated with LLM version changes (and other dependencies). | examples/20_prompt_optimization |
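Pattern 19 (Dependency Injection) is the easiest of these to show directly: build the chain so the LLM call is a parameter, then substitute a deterministic mock in tests. The chain, prompts, and mock below are invented for illustration:

```python
from typing import Callable

# The LLM call is an injected dependency, so each step of the chain
# can be developed and tested without a live model.
def summarize_then_translate(text: str, llm: Callable[[str], str]) -> str:
    summary = llm(f"Summarize: {text}")
    return llm(f"Translate to French: {summary}")

# A mock implementation that verifies the chain's wiring
# without any network calls or nondeterminism.
def mock_llm(prompt: str) -> str:
    if prompt.startswith("Summarize:"):
        return "short summary"
    if prompt.startswith("Translate to French:"):
        return "résumé court"
    raise ValueError(f"unexpected prompt: {prompt}")

result = summarize_then_translate("a very long document ...", mock_llm)
```

In production you would inject a real client with the same `Callable[[str], str]` shape; the chain itself never changes.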
Chapter 7: Enabling Action
Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
---|---|---|---|---|---|
21 | Tool Calling | How can you bridge the LLM and a software API so that the LLM is able to invoke the API and get the job done? | The LLM emits special tokens when it determines that a function needs to be called and also emits the parameters to pass to that function. A client-side postprocessor invokes the function with those parameters, and sends the results back to the LLM. The LLM incorporates the function results in its response. | Whenever you want the LLM to not just state the steps needed, but to execute those steps. Also allows you to incorporate up-to-date knowledge from real-time sources, connect to transactional enterprise systems, perform calculations, and use optimization solvers. | examples/21_tool_calling |
22 | Code Execution | You have a software system that can do the task, but invoking it involves a DSL. | LLMs generate code that is then executed by an external system. | Creating graphs, annotating images, updating databases. | examples/22_code_execution |
23 | Multi-agent Collaboration | Handle multi-step tasks that require different tools, maintain context over extended interactions, evaluate situations and take appropriate actions without human intervention, and adapt to user preferences. | Multi-agent architectures allow you to solve real-world problems by using specialized single-purpose agents and organizing them in ways that mimic human organizational structures. | Complex reasoning, multi-step problem solving, collaborative content creation, adversarial verification, specialized domain integration, self-improving systems. | examples/23_multi_agent |
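The client-side half of pattern 21 (Tool Calling) — catch the function call the model emits, dispatch it, and hand the result back — can be sketched as below. The JSON wire format, tool registry, and `get_weather` stub are assumptions for illustration; real providers each have their own tool-call encoding:

```python
import json

# A stand-in for a real API lookup.
def get_weather(city: str) -> str:
    return f"22C and sunny in {city}"

# Registry of functions the model is allowed to invoke.
TOOLS = {"get_weather": get_weather}

def handle_model_output(model_output: str) -> str:
    """Client-side postprocessor: parse the function call the model
    emitted (here, as JSON), invoke it, and return the result so it
    can be sent back to the LLM for the final response."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A tool-calling model might emit something like:
emitted = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
tool_result = handle_model_output(emitted)
```

Restricting dispatch to an explicit registry (rather than `eval`-ing whatever the model emits) is the key safety property of this loop.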
Chapter 8: Meeting Constraints
Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
---|---|---|---|---|---|
24 | Small Language Models (SLMs) | | | | examples/24_slms |
25 | Prompt Caching | | | | examples/24_slms |
26 | Optimizing Inference | | | | examples/24_slms |
27 | Inference Distribution Testing | | | | examples/24_slms |
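To give one of these patterns some shape: production prompt caching (pattern 25) is usually done provider-side by reusing the KV cache for shared prompt prefixes. The sketch below is a simplified client-side analogue — a response cache keyed by prompt hash — invented here only to illustrate the cost-saving idea:

```python
import hashlib

class CachingClient:
    """Answer repeated identical prompts from a local cache instead of
    re-invoking the model. (Provider-side prompt caching is more
    powerful: it reuses computation for shared prompt *prefixes*.)"""

    def __init__(self, llm):
        self._llm = llm
        self._cache = {}
        self.calls = 0          # how many times the model was actually hit

    def generate(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._llm(prompt)
        return self._cache[key]

client = CachingClient(llm=lambda p: f"echo: {p}")
a = client.generate("What is RAG?")
b = client.generate("What is RAG?")   # served from cache; no model call
```

Caching only helps when responses may be reused safely; anything personalized or time-sensitive needs an expiry or must bypass the cache.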
Chapter 9: Setting Safeguards
Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
---|---|---|---|---|---|
28 | Template Generation | The risk of sending content without human review is very high, but human review will not scale to the volume of communications. | Pregenerate templates that are reviewed beforehand. Inference time requires only deterministic string replacement, and is therefore safe to directly send to consumers. | Personalized communications in business to consumer settings. | examples/28_template_generation |
29 | Assembled Reformat | Content needs to be presented in an appealing way, but the risk posed by dynamically generated content is too high. | Reduce the risk of inaccurate or hallucinated content by separating out the task of content creation into two low-risk steps — first, assembling data in low-risk ways and second, formatting the content based on that data. | Situations where accurate content needs to be presented in appealing ways, such as in product catalogs. | examples/29_assembled_reformat |
30 | Self-Check | Identify potential hallucinations cost-effectively. | Use token probabilities to detect hallucinations in LLM responses. | In any situation where factual (as opposed to creative) responses are needed. | examples/30_self_check |
31 | Guardrails | Require safeguards for security, data privacy, content moderation, hallucination, and alignment to ensure that AI applications operate within ethical, legal, and functional parameters. | Wrap the LLM calls with a layer of code that preprocesses the information going into the model and/or post-processes the output of the model. Knowledge retrieval and tool use will also need to be protected. | Anytime your application could be subject to attacks by malicious adversaries. | examples/31_guardrails |
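A minimal sketch of pattern 30 (Self-Check): many APIs can return per-token log-probabilities alongside the generated text, and unusually low values are a cheap hallucination signal. The threshold and sample values below are illustrative, not calibrated:

```python
def flag_low_confidence(token_logprobs, threshold=-2.5):
    """Flag a response as a possible hallucination when any token was
    sampled with very low probability. The threshold is illustrative;
    in practice it would be tuned on labeled examples."""
    min_lp = min(token_logprobs)
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return {
        "min_logprob": min_lp,
        "mean_logprob": mean_lp,
        "flagged": min_lp < threshold,
    }

# Per-token log-probs as an API that exposes them might return.
confident = [-0.1, -0.3, -0.2, -0.05]
shaky = [-0.1, -4.2, -0.3, -3.9]   # two tokens were near-guesses

assert not flag_low_confidence(confident)["flagged"]
assert flag_low_confidence(shaky)["flagged"]
```

This check costs nothing extra at inference time, which is why it pairs well with the heavier safeguards (guardrails, human review) listed above.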
- If you have implemented any of the patterns in the book in production, submit a PR to update the USAGE.md in the folder corresponding to the pattern. See examples/15_adapter_tuning/USAGE.md for an example.
The GenAI Design Patterns book is a companion book to the O'Reilly book Machine Learning Design Patterns.