# Generative AI Design Patterns

Code repo for the in-progress O'Reilly book on GenAI design patterns by Valliappa Lakshmanan and Hannes Hapke: https://www.oreilly.com/library/view/generative-ai-design/9798341622654/

## Summary of patterns

These are the design patterns covered in the book:

### Chapter 2: Controlling Style

| Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
| --- | --- | --- | --- | --- | --- |
| 1 | Logits Masking | Need to ensure generated text conforms to specific style rules for brand, accuracy, or compliance reasons. | Intercept generation at the sampling stage to zero out the probabilities of continuations that don't meet the rules. | Use words associated with a specific brand; avoid repeating factual information; make content compliant with a style book. | examples/01_logits_masking |
| 2 | Grammar | Need text to conform to a specific format or data schema for downstream processing. | Specify rules as a formal grammar (e.g., BNF) or schema that the model framework applies to constrain token generation. | Generating valid SQL timestamps; extracting structured data in a specific format; ensuring output conforms to a JSON schema. | examples/02_grammar |
| 3 | Style Transfer | Need to convert content into a form that mimics a specific tone and style that is difficult to express through rules but can be shown through example conversions. | Use few-shot learning or model fine-tuning to teach the model how to convert content to the desired style. | Rewriting generic content to match brand guidelines; converting academic papers to blog posts; transforming image and text content for different social media platforms or audiences. | examples/03_style_transfer |
| 4 | Reverse Neutralization | Need to generate content in a specific style that can be shown through example content. | Use an LLM to generate content in an intermediate neutral form, and a fine-tuned LLM to convert that neutral form into the desired style. | Generating letters in region-specific legalese; generating emails in a personal style. | examples/04_reverse_neutralization |
| 5 | Content Optimization | Need to determine the optimal style for content without knowing which factors matter. | Generate pairs of content, compare them using an evaluator, create a preference dataset, and perform preference tuning. | Optimizing ad copy, marketing content, or educational materials where the effective style factors are unknown. | examples/05_content_optimization |
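As a minimal sketch of the Logits Masking idea, a banned token's logit can be set to negative infinity before the softmax so it can never be sampled. The toy vocabulary, banned-word list, and helper names below are invented for illustration, not taken from the book's example code; a real implementation would hook into the model's sampling loop (e.g. a logits processor).

```python
import math

# Toy vocabulary and a hypothetical brand rule: never say "cheap".
VOCAB = ["cheap", "affordable", "budget", "premium"]
BANNED = {"cheap"}

def mask_logits(logits):
    """Set the logit of any banned token to -inf so its probability is zero."""
    return [
        float("-inf") if tok in BANNED else logit
        for tok, logit in zip(VOCAB, logits)
    ]

def softmax(logits):
    mx = max(l for l in logits if l != float("-inf"))
    exps = [math.exp(l - mx) if l != float("-inf") else 0.0 for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# "cheap" had the highest raw logit, but masking drives its probability to 0.
probs = softmax(mask_logits([2.0, 1.5, 1.0, 0.5]))
print(dict(zip(VOCAB, (round(p, 3) for p in probs))))
```

Because the mask is applied to logits rather than to finished text, the model is steered away from banned continuations instead of having its output rejected after the fact.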
### Chapters 3 and 4: Adding Knowledge

| Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
| --- | --- | --- | --- | --- | --- |
| 6 | Basic RAG | Knowledge cutoffs, confidential data, and hallucinations pose problems for zero-shot generation by LLMs. | Ground the response generated by the LLM by adding relevant information from a knowledge base into the prompt context. | The applications of RAG are constantly expanding as the technology evolves. | examples/06_basic_rag |
| 7 | Semantic Indexing | Traditional keyword indexing/lookup approaches fail when documents get more complex, contain different media types like images or tables, or bridge multiple domains. | Use embeddings to capture the meaning of texts, images, and other media types. Find relevant chunks by comparing the embedding of the chunk to that of the query. | | examples/07_semantic_indexing |
| 8 | Indexing at Scale | Dealing with outdated or contradictory information in your knowledge base. | Use metadata, query filtering, and result reranking. | | examples/08_indexing_at_scale |
| 9 | Index-aware Retrieval | Comparing questions to chunks is problematic because the question itself will not appear in the knowledge base, may use synonyms or jargon, or may require holistic interpretation. | Hypothetical answers, query expansion, hybrid search, and GraphRAG. | | examples/09_index_aware_retrieval |
| 10 | Node Postprocessing | Irrelevant content, ambiguous entities, and generic answers. | Rerank retrieved nodes and apply related techniques: hybrid search, query expansion, filtering, contextual compression, disambiguation, and personalization. | | examples/10_node_postprocessing |
| 11 | Trustworthy Generation | How to retain users' trust given that there is no way to completely avoid errors. | Out-of-domain detection, citations, guardrails, human feedback, corrective RAG, and UX design can all help. | | examples/11_trustworthy_generation |
| 12 | Deep Search | RAG systems are less effective for complex information retrieval tasks because of context window constraints, query ambiguity, information verification, shallow reasoning, and multi-hop query challenges. | An iterative process of searching, reading, and reasoning to provide comprehensive answers to complex queries. | | examples/12_deep_search |
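The Basic RAG flow of retrieve-then-ground can be sketched in a few lines. The knowledge base, retriever, and prompt wording below are made up for the sketch; a production system would use embeddings and a vector store (see Semantic Indexing) rather than word overlap, and would pass the prompt to a real model.

```python
# Toy knowledge base of pre-chunked facts (illustrative content only).
KNOWLEDGE_BASE = [
    "The warranty period for the X100 camera is two years.",
    "The X100 camera ships with a 23mm f/2 lens.",
    "Returns are accepted within 30 days of purchase.",
]

def retrieve(query, k=1):
    """Rank chunks by word overlap with the query (stand-in for vector search)."""
    q = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda chunk: len(q & set(chunk.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query):
    """Ground the LLM by pasting retrieved chunks into the prompt context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How long is the warranty period?")
print(prompt)
```

The key design point is that the model is asked to answer from supplied context rather than from its parametric memory, which mitigates knowledge cutoff and hallucination issues.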
### Chapter 5: Extending Model Capability

| Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
| --- | --- | --- | --- | --- | --- |
| 13 | Chain of Thought (CoT) | Foundation models often struggle with multi-step reasoning tasks, leading to incorrect or fabricated answers. | CoT prompts the model to break down complex problems into intermediate reasoning steps before providing the final answer. | Complex mathematical problems, logical deductions, and sequential reasoning tasks where step-by-step thinking is required. | examples/13_chain_of_thought |
| 14 | Tree of Thoughts (ToT) | Many strategic or logical tasks cannot be solved by a single linear reasoning path and require exploration of multiple alternatives. | ToT treats problem-solving as a tree search, generating multiple reasoning paths, evaluating them, and backtracking as needed. | Complex tasks involving strategic thinking, planning, or creative writing that require exploring multiple solution paths. | examples/14_tree_of_thoughts |
| 15 | Adapter Tuning | Fully fine-tuning large foundation models for specialized tasks is computationally expensive and requires significant data. | Adapter Tuning trains small add-on neural network layers, leaving the original model weights frozen, making it efficient for specialized adaptation. | Adapting models for specific tasks like classification, summarization, or specialized chatbots with a small (100–10k) dataset of examples. | examples/15_adapter_tuning |
| 16 | Evol-Instruct | Creating high-quality datasets for instruction-tuning models on new and complex enterprise tasks is difficult and time-consuming. | Evol-Instruct efficiently generates instruction-tuning datasets by evolving instructions through multiple iterations of LLM-generated tasks and answers. | Teaching models new, domain-specific tasks that are not covered by their pre-training data, particularly in enterprise settings. | examples/16_evol_instruct |
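Chain of Thought is largely a prompting convention, so a sketch needs only two pieces: a prompt that asks for intermediate steps, and a parser that pulls out the final answer. The prompt wording and the canned response below are illustrative; in practice the canned string would be the output of a real model call.

```python
def cot_prompt(question):
    """Wrap a question so the model reasons step by step before answering."""
    return (
        f"Question: {question}\n"
        "Think step by step. Write out each intermediate reasoning step,\n"
        "then give the final answer on a line starting with 'Answer:'."
    )

def extract_answer(response):
    """Pull the final answer out of the model's step-by-step response."""
    for line in reversed(response.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None

# Canned response standing in for a real model call.
canned = "Step 1: 17 * 3 = 51\nStep 2: 51 + 4 = 55\nAnswer: 55"
print(extract_answer(canned))  # → 55
```

Asking for a fixed `Answer:` marker makes the reasoning inspectable while keeping the final answer easy to extract programmatically.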
### Chapter 6: Increasing Reliability

| Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
| --- | --- | --- | --- | --- | --- |
| 17 | LLM-as-Judge | Evaluation of GenAI capabilities is hard because the tasks that GenAI performs are open-ended. | Use an LLM to provide detailed, multi-dimensional feedback that can be used to compare models, track improvements, and guide further development. | Evaluation is core to many of the other patterns and to building AI applications effectively. | examples/17_llm_as_judge |
| 18 | Reflection | How to get the LLM to correct an earlier response based on feedback or criticism. | The feedback is used to modify the prompt that is sent to the LLM a second time. | Reliable performance on most complex tasks where the approach cannot be predetermined. | examples/18_reflection |
| 19 | Dependency Injection | Need to independently develop and test each component of an LLM chain. | When you build chains of LLM calls, build them such that it is easy to inject a mock implementation to replace any step of the chain. | Any situation where you chain LLM calls or use external tools. | examples/19_dependency_injection |
| 20 | Prompt Optimization | Need to easily update prompts when dependencies change in order to maintain the level of performance. | Systematically set the prompts used in a GenAI pipeline by optimizing them on a dataset of examples. | Any situation where you have to reduce the maintenance overhead associated with LLM version changes (and other dependencies). | examples/20_prompt_optimiation |
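The Dependency Injection pattern can be sketched by composing a chain from plain callables, so any step can be swapped for a mock in tests. The step names (`summarize`, `translate`) and mocks are invented for illustration; in production each injected callable would wrap a real LLM call.

```python
from typing import Callable

def make_chain(summarize: Callable[[str], str],
               translate: Callable[[str], str]) -> Callable[[str], str]:
    """Compose two steps into a chain; each step is injected, not hard-coded."""
    def run(text: str) -> str:
        return translate(summarize(text))
    return run

# In a unit test, inject deterministic mocks instead of real model calls
# so the chain's wiring can be verified cheaply and without API keys.
mock_summarize = lambda text: text.split(".")[0] + "."
mock_translate = lambda text: text.upper()

chain = make_chain(mock_summarize, mock_translate)
print(chain("First sentence. Second sentence."))  # → FIRST SENTENCE.
```

Because the chain only depends on the callables' signatures, each step can be developed, tested, and replaced independently.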
### Chapter 7: Enabling Action

| Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
| --- | --- | --- | --- | --- | --- |
| 21 | Tool Calling | How can you bridge the LLM and a software API so that the LLM is able to invoke the API and get the job done? | The LLM emits special tokens when it determines that a function needs to be called, along with the parameters to pass to that function. A client-side postprocessor invokes the function with those parameters and sends the results back to the LLM, which incorporates them in its response. | Whenever you want the LLM to not just state the steps needed, but to execute those steps. Also allows you to incorporate up-to-date knowledge from real-time sources, connect to transactional enterprise systems, perform calculations, and use optimization solvers. | examples/21_tool_calling |
| 22 | Code Execution | You have a software system that can do the task, but invoking it involves a DSL. | The LLM generates code that is then executed by an external system. | Creating graphs, annotating images, updating databases. | examples/22_code_execution |
| 23 | Multi-agent Collaboration | Handle multi-step tasks that require different tools, maintain context over extended interactions, evaluate situations and take appropriate actions without human intervention, and adapt to user preferences. | Multi-agent architectures allow you to solve real-world problems using specialized single-purpose agents, organizing them in ways that mimic human organizational structures. | Complex reasoning, multi-step problem solving, collaborative content creation, adversarial verification, specialized domain integration, self-improving systems. | examples/23_multi_agent |
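The client-side half of Tool Calling, the postprocessor that invokes the function the model asked for, can be sketched as a small dispatcher. The tool-call message format, tool name, and `dispatch` helper below are illustrative, not any specific vendor's API.

```python
import json

def get_weather(city: str) -> str:
    """Stand-in for a real weather API call."""
    return f"Sunny in {city}"

# Registry mapping tool names (as declared to the model) to implementations.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Parse the model's emitted tool call and invoke the matching function."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulated model output requesting a tool invocation; in a real loop the
# returned result would be sent back to the model as a tool message.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
print(result)  # → Sunny in Oslo
```

The registry keeps the model's view (tool names and argument schemas) decoupled from the implementations, so tools can be added or swapped without changing the dispatch logic.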
### Chapter 8: Meeting Constraints

| Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
| --- | --- | --- | --- | --- | --- |
| 24 | Small Language Models (SLMs) | | | | examples/24_slms |
| 25 | Prompt Caching | | | | examples/24_slms |
| 26 | Optimizing Inference | | | | examples/24_slms |
| 27 | Inference Distribution Testing | | | | examples/24_slms |
### Chapter 9: Setting Safeguards

| Pattern Number | Pattern Name | Problem | Solution | Usage Scenarios | Code Example |
| --- | --- | --- | --- | --- | --- |
| 28 | Template Generation | The risk of sending content without human review is very high, but human review will not scale to the volume of communications. | Pregenerate templates that are reviewed beforehand. Inference then requires only deterministic string replacement and is therefore safe to send directly to consumers. | Personalized communications in business-to-consumer settings. | examples/28_template_generation |
| 29 | Assembled Reformat | Content needs to be presented in an appealing way, but the risk posed by dynamically generated content is too high. | Reduce the risk of inaccurate or hallucinated content by separating content creation into two low-risk steps: first, assembling data in low-risk ways, and second, formatting the content based on that data. | Situations where accurate content needs to be presented in appealing ways, such as in product catalogs. | examples/29_assembled_reformat |
| 30 | Self-Check | Identify potential hallucinations cost-effectively. | Use token probabilities to detect hallucination in LLM responses. | Any situation where factual (as opposed to creative) responses are needed. | examples/30_self_check |
| 31 | Guardrails | Safeguards are required for security, data privacy, content moderation, hallucination, and alignment to ensure that AI applications operate within ethical, legal, and functional parameters. | Wrap the LLM calls with a layer of code that preprocesses the information going into the model and/or post-processes its output. Knowledge retrieval and tool use will also need to be protected. | Anytime your application could be subject to attacks by malicious adversaries. | examples/30_guardrails |
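Template Generation separates the risky LLM work (done offline, under human review) from the safe inference-time work (pure string substitution). The template text and field names below are invented for the sketch; only the deterministic substitution step runs at send time.

```python
from string import Template

# This template was generated by an LLM offline and approved by a human
# reviewer; at inference time no model is called at all.
APPROVED_TEMPLATE = Template(
    "Hi $first_name, your order $order_id has shipped "
    "and should arrive by $eta."
)

def render(first_name: str, order_id: str, eta: str) -> str:
    """Deterministic string replacement only, so output is safe to send."""
    return APPROVED_TEMPLATE.substitute(
        first_name=first_name, order_id=order_id, eta=eta)

msg = render("Ada", "A-1042", "Friday")
print(msg)  # → Hi Ada, your order A-1042 has shipped and should arrive by Friday.
```

Because `substitute` raises on any missing field, a malformed render fails loudly instead of sending a broken message to a customer.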

## Want to be cited in future versions of the book?

- If you have implemented any of the patterns in the book in production, submit a PR to update the USAGE.md in the folder corresponding to the pattern. See examples/15_adapter_tuning/USAGE.md for an example.

## Further reading

The GenAI Design Patterns book is a companion book to the O'Reilly book Machine Learning Design Patterns.
