123 changes: 105 additions & 18 deletions samples/mistral/python/mistral-docai-annotations.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "d9290f1f",
"metadata": {},
"outputs": [],
@@ -54,8 +54,9 @@
"metadata": {},
"outputs": [],
"source": [
"AZURE_MISTRAL_DOCUMENT_AI_ENDPOINT = \"\"\n",
"AZURE_MISTRAL_DOCUMENT_AI_ENDPOINT = \"https://YOUR-AI-FOUNDRY-NAME.services.ai.azure.com/providers/mistral/azure/ocr\"\n",
"AZURE_MISTRAL_DOCUMENT_AI_KEY = \"\"\n",
"\n",
"REQUEST_HEADERS = {\n",
" \"Content-Type\": \"application/json\",\n",
" \"Authorization\": f\"Bearer {AZURE_MISTRAL_DOCUMENT_AI_KEY}\",\n",
@@ -64,12 +65,38 @@
},
{
"cell_type": "code",
"execution_count": null,
"id": "95f536be",
"execution_count": 3,
"id": "72639aad",
"metadata": {},
"outputs": [],
"source": [
"!wget https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/mistral7b.pdf"
"#create folder images\n",
"import os\n",
"\n",
"os.makedirs(\"../images\", exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6f3d5ddd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('../images/mistral7b.pdf', <http.client.HTTPMessage at 0x1edba81e150>)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import urllib.request\n",
"\n",
"urllib.request.urlretrieve(\"https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/mistral7b.pdf\", \"../images/mistral7b.pdf\")"
]
},
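Note: the diff swaps the shell `!wget` call for `urllib.request.urlretrieve`, which keeps the notebook portable to environments without wget. A minimal sketch of a more defensive download, assuming the same URL and target path as the cell above (not part of the notebook itself):

import os
import urllib.error
import urllib.request

PDF_URL = "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/mistral7b.pdf"
PDF_PATH = "../images/mistral7b.pdf"

if not os.path.exists(PDF_PATH):  # skip the download on re-runs
    try:
        urllib.request.urlretrieve(PDF_URL, PDF_PATH)
    except urllib.error.URLError as err:
        raise RuntimeError(f"Failed to download {PDF_URL}") from err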
{
@@ -82,7 +109,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "706550d7",
"metadata": {},
"outputs": [],
@@ -114,17 +141,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"id": "7a1329cb",
"metadata": {},
"outputs": [],
"source": [
"encodedDocument = encode_image(\"../images/mistral7b.pdf\")"
"encodedDocument = encode_image(\"../images/mistral7b.pdf\")\n",
"\n",
"# check if doc was read\n",
"assert encodedDocument"
]
},
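Note: the `encode_image` helper itself is collapsed in this diff. A plausible sketch, consistent with the `assert encodedDocument` check above (which requires a falsy return on failure); the exact body in the notebook may differ:

import base64

def encode_image(path: str) -> str | None:
    # read the file and return its base64-encoded contents, or None on failure
    try:
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")
    except OSError:
        return None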
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"id": "b3ef89b0",
"metadata": {},
"outputs": [],
@@ -162,7 +192,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"id": "465e1f81",
"metadata": {},
"outputs": [],
@@ -176,10 +206,48 @@
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4d74ba6",
"execution_count": 9,
"id": "01a1372a",
"metadata": {},
"outputs": [],
"source": [
"bb1Response.raise_for_status()"
]
},
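Note: `raise_for_status()` raises on any 4xx/5xx response. A sketch that also surfaces the response body, which usually explains auth or quota failures; this assumes `bb1Response` is a `requests.Response`, as the call above suggests:

import requests

try:
    bb1Response.raise_for_status()
except requests.HTTPError:
    # the body typically carries the service's error message
    print("OCR request failed:", bb1Response.status_code, bb1Response.text)
    raise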
{
"cell_type": "code",
"execution_count": 10,
"id": "c4d74ba6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page 0\n",
"Image type: Logo\n",
"Short description: A 3D-rendered logo of the text 'Mistral AI' in a gradient of warm colors, transitioning from orange to yellow.\n",
"page 1\n",
"Image type: diagram\n",
"Short description: This image shows a comparison between Vanilla Attention and Sliding Window Attention mechanisms in the context of natural language processing. The left part of the image illustrates the attention patterns for the sentence 'The cat sat on the' using Vanilla Attention, where each word attends to all previous words. The middle part shows the attention patterns using Sliding Window Attention, where each word attends to a limited number of previous words within a sliding window. The right part of the image depicts the effective context length across different layers, indicating how the context length changes with the depth of the layers in the model. The diagram highlights the difference in how attention mechanisms handle the context of words in a sequence, with Vanilla Attention providing a broader context and Sliding Window Attention focusing on a more localized context.\n",
"page 2\n",
"Image type: Diagram\n",
"Short description: A diagram showing the process of tokenization over three timesteps. Each row represents a different sentence being tokenized, with each word or part of a word represented in colored boxes. The colors change to indicate the progression of tokenization over time.\n",
"page 2\n",
"Image type: matrix\n",
"Short description: This image shows a matrix used in natural language processing, specifically illustrating the presence of words in different contexts. The matrix is divided into three sections: Past, Cache, and Current. Each row represents a specific word (the, dog, go, to), and each column represents a word in the sentence 'The cat sat on the mat and saw the dog go to'. The numbers in the cells indicate the presence (1) or absence (0) of the row word in the corresponding position of the sentence. The Past section contains zeros, indicating no presence of the row words. The Cache section shows the presence of words as they transition, and the Current section shows the current presence of words in the sentence.\n",
"page 3\n",
"Image type: bar chart\n",
"Short description: This bar chart compares the accuracy percentages of different models (Mistral 7B, LLaMA 2 13B, LLaMA 2 7B, and LLaMA 1 34B) across various categories. The left chart includes categories such as MMLU, Knowledge, Reasoning, and Comprehension, while the right chart includes AGI Eval, Math, BBH, and Code. Each category shows the performance of the models, with Mistral 7B generally performing the best in most categories, particularly in MMLU and AGI Eval. LLaMA 2 13B also shows strong performance, especially in Reasoning and Comprehension. LLaMA 2 7B and LLaMA 1 34B have more varied performances, with notable strengths in specific categories like Knowledge and Code respectively. The chart highlights the comparative strengths and weaknesses of each model across different tasks, providing insights into their relative effectiveness and areas of specialization.\n",
"page 4\n",
"Image type: chart\n",
"Short description: The image contains four line charts comparing the performance of LLaMA 2 and Mistral models across different metrics (MMLU, Reasoning, Knowledge, and Comprehension) as a function of model size (in billions of parameters). Each chart shows the performance percentage on the y-axis and the model size on the x-axis. The charts indicate that LLaMA 2 consistently outperforms Mistral across all metrics and model sizes. The charts also highlight the effective sizes of LLaMA 2 models that match the performance of Mistral models, showing significant improvements in performance with larger model sizes for LLaMA 2.\n",
"page 6\n",
"Image type: 3D Render\n",
"Short description: A 3D rendering of an orange letter 'M' with a small, cartoonish figure wearing boxing gloves standing on top of it.\n"
]
}
],
"source": [
"for page in bb1Response.json()[\"pages\"]:\n",
" for image in page[\"images\"]:\n",
@@ -215,7 +283,7 @@
},
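Note: a small sanity check on the same response shape the loop above iterates over, counting annotated images before printing them; pages without images are simply skipped:

pages = bb1Response.json().get("pages", [])
total = sum(len(page.get("images", [])) for page in pages)
print(f"{len(pages)} pages, {total} annotated images")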
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"id": "46ef11c7",
"metadata": {},
"outputs": [],
@@ -290,7 +358,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"id": "36a2eba8",
"metadata": {},
"outputs": [],
@@ -304,10 +372,22 @@
},
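Note: the request cell above is collapsed in this diff; whatever payload it posts, it is worth failing fast before parsing the annotation. A sketch, assuming `comboResponse` is a `requests.Response` as the later cells imply:

comboResponse.raise_for_status()
assert "document_annotation" in comboResponse.json()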
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"id": "775b1eec",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Language: English\n",
"Summary: The document introduces Mistral 7B, a 7-billion-parameter language model designed for superior performance and efficiency. It outperforms other models like Llama 2 and Llama 1 in various benchmarks, including reasoning, mathematics, and code generation. The model uses grouped-query attention (GQA) and sliding window attention (SWA) for faster inference and reduced cost. Mistral 7B is released under the Apache 2.0 license and includes a fine-tuned instruction-following model, Mistral 7B-Instruct, which surpasses Llama 2 13B in human and automated benchmarks. The document also discusses the architectural details, including the use of sliding window attention and a rolling buffer cache to handle sequences of arbitrary length efficiently. Additionally, it presents results comparing Mistral 7B with Llama models across various tasks, showing Mistral 7B's superior performance. The document concludes with a discussion on instruction fine-tuning and the model's efficiency in terms of size and performance.\n",
"Chapter Titles: Abstract, Introduction, Architectural details, Results, Instruction Finetuning\n",
"URLs: https://github.com/mistralai/mistral-src, https://mistral.ai/news/announcing-mistral-7b/\n",
"Translated Summary: Le document présente Mistral 7B, un modèle de langage de 7 milliards de paramètres conçu pour des performances et une efficacité supérieures. Il surpasse d'autres modèles comme Llama 2 et Llama 1 dans divers benchmarks, y compris le raisonnement, les mathématiques et la génération de code. Le modèle utilise l'attention par requête groupée (GQA) et l'attention par fenêtre coulissante (SWA) pour une inférence plus rapide et un coût réduit. Mistral 7B est publié sous la licence Apache 2.0 et comprend un modèle d'instruction affiné, Mistral 7B-Instruct, qui surpasse Llama 2 13B dans les benchmarks humains et automatisés. Le document discute également des détails architecturaux, y compris l'utilisation de l'attention par fenêtre coulissante et d'un cache tampon roulant pour traiter efficacement des séquences de longueur arbitraire. De plus, il présente des résultats comparant Mistral 7B avec les modèles Llama dans diverses tâches, montrant la performance supérieure de Mistral 7B. Le document se conclut par une discussion sur l'affinage des instructions et l'efficacité du modèle en termes de taille et de performance.\n"
]
}
],
"source": [
"docAnnotation = json.loads(comboResponse.json()[\"document_annotation\"])\n",
"print(\"Language: \" + docAnnotation[\"properties\"][\"language\"])\n",
@@ -340,11 +420,17 @@
"source": [
"Being able to extract text and images from documents is powerful, when you combine this with structured extraction and enrichment it grants you the ability to create powerful document processing and intelligence capabilities. We hope you found this notebook useful, and look forward to seeing what you build with Mistral Document AI."
]
},
{
"cell_type": "markdown",
"id": "35186867",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv (3.13.5)",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -357,7 +443,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
"pygments_lexer": "ipython3",
"version": "3.12.10"
}
},
"nbformat": 4,