Conversation

AshishViradiya153 (Contributor) commented Sep 4, 2025

Summary by CodeRabbit

  • New Features

    • Automatic, background document indexing for datarooms after uploads, views, and link/folder actions (non-blocking).
    • Enhanced AI chat experience: toggle, scoped queries with draggable pills, inline citations, reasoning panel, conversation view, loader, carousel, resizable panels, improved messages, and code blocks with copy.
    • Streaming chat responses for faster feedback.
  • Improvements

    • More accurate results via smarter chunking/embeddings and integrated vector search.
    • Automatic vector cleanup when deleting documents, folders, or datarooms.
  • Database

    • Added indexing status, progress, and token counters for documents and datarooms.
  • Chores

    • Upgraded AI SDK and added supporting UI/AI libraries.

coderabbitai bot (Contributor) commented Sep 4, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a complete RAG indexing system (queue, task, processing, embeddings, vector store) and integrates background triggers across multiple API routes. Refactors chat message types to UIMessage and introduces many new UI components (chat, code, citations, loaders). Expands Prisma schema for RAG tracking and updates build/config and dependencies.

Changes

Cohort / File(s) → Summary

• RAG Core Processing & Types
  Files: lib/rag/document-processor.ts, lib/rag/enhanced-chunking-strategy.ts, lib/rag/embedding-generator.ts, lib/rag/ai-sdk-integration.ts, lib/rag/file-format-detector.ts, lib/rag/utils/*, lib/rag/errors/*, lib/rag/cache/index.ts, lib/rag/utils/lruCache.ts
  Summary: New document processing via Docling, markdown-aware chunking, embedding generation with batching/caching, AI SDK embedding helper, format configs, token/hash/text utils, error system, cache exports and TTL LRU caches.

• RAG Infrastructure & Vector Store
  Files: lib/rag/queue-manager.ts, lib/rag/indexing-trigger.ts, lib/rag/vector-manager.ts, lib/rag/qdrant-client.ts, lib/trigger/rag-indexing.ts
  Summary: Redis-backed queue/lock manager, trigger coordination, singleton vector manager, Qdrant client wrapper, and a Trigger.dev task orchestrating the full indexing pipeline and status updates.

• API Integration — Background Indexing
  Files: app/(ee)/api/links/[id]/upload/route.ts, app/api/views-dataroom/route.ts, pages/api/links/*, pages/api/webhooks/services/[...path]/index.ts, pages/api/teams/*/datarooms/*/*, pages/api/teams/*/datarooms/*, pages/api/teams/*/documents/*, pages/api/teams/*/index.ts
  Summary: Adds waitUntil-based calls to triggerDataroomIndexing across create/update/view/duplicate/folder operations; adds vector cleanup on deletions via vectorManager.

• API — Assistants Chat Flow
  Files: pages/api/assistants/chat.ts
  Summary: Replaces experimental_AssistantResponse with streamText; adds run polling and message aggregation; returns text stream response (a minimal sketch follows this list).

• Chat Type Refactor
  Files: components/chat/chat-*.tsx, lib/utils.ts
  Summary: Switches Message → UIMessage; adapts status handling and text extraction from parts; updates convertThreadMessagesToMessages to return UIMessage with parts.

• Chat Enhancements (Drag/Scope/Input)
  Files: components/chat/draggable-*.tsx, components/chat/droppable-chat-input.tsx, components/chat/enhanced-chat-input.tsx, components/chat/scope-*.tsx, components/chat/chat-toggle-button.tsx
  Summary: Adds draggable pills/cards, droppable input wrapper, enhanced chat input with scoping to docs/folders and pills UI, and a toggle button.

• General UI Components
  Files: components/actions.tsx, components/branch.tsx, components/code-block.tsx, components/conversation.tsx, components/loader.tsx, components/message.tsx, components/reasoning.tsx, components/response.tsx, components/image.tsx, components/inline-citation.tsx, components/sources.tsx
  Summary: New reusable UI: action bars, branching, code blocks with copy, conversation wrappers, loader, message shells, reasoning collapsible, markdown response, image, inline citations, and sources collapsible.

• UI Primitives
  Files: components/ui/carousel.tsx, components/ui/hover-card.tsx, components/ui/resizable.tsx, components/ui/conversation.tsx
  Summary: Adds Embla carousel, Radix hover-card wrapper, resizable panel group/handle, and conversation utilities.

• Prisma Schema & Migration
  Files: prisma/migrations/*, prisma/schema/schema.prisma, prisma/schema/rag.prisma, prisma/schema/dataroom.prisma
  Summary: Adds ParsingStatus enum; extends Document with RAG fields; introduces DataroomRAGSettings model and relation; adds indices and migration SQL.

• Next/Build Config & Deps
  Files: package.json, next.config.mjs, trigger.config.ts
  Summary: Upgrades to AI SDK v5; adds OpenAI SDKs, LangChain, Qdrant, UI libs, tokenizers; enables WebAssembly/layers in webpack; marks js-tiktoken externals for trigger build.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant API as API Route (e.g., link/doc POST)
  participant Trigger as triggerDataroomIndexing
  participant Queue as RAGQueueManager
  participant Task as Trigger.dev ragIndexingTask
  participant Proc as DocumentProcessor
  participant Chunk as EnhancedChunker
  participant Embed as EmbeddingGenerator
  participant Vec as QdrantVectorStore
  participant DB as Prisma

  User->>API: Create/Update/View action
  API->>Trigger: waitUntil(triggerDataroomIndexing(dataroomId, teamId, userId))
  Trigger->>Queue: addToQueue(dataroomId, request)
  Trigger->>Queue: tryStartWorker(dataroomId)
  alt Worker started
    Trigger->>Task: tasks.trigger("rag-indexing", payload)
  else Already running
    Trigger-->>API: "queued"
  end
  Note over Task: Acquire lock, loop until queue empty
  Task->>Queue: getNextFromQueue(dataroomId)
  Task->>DB: Read RAG status, mark IN_PROGRESS
  Task->>Proc: processDocuments(docs, dataroomId, teamId)
  Proc->>Chunk: createEnhancedChunks(markdown,…)
  Chunk-->>Proc: EnhancedChunk[]
  Proc-->>Task: Processing results
  Task->>Embed: generateEmbeddings(chunks)
  Embed-->>Task: embeddings, token usage
  Task->>Vec: upsertPoints(collection, points)
  Vec-->>Task: success
  Task->>DB: Mark docs COMPLETED, update RAG stats
  Task->>Queue: hasPendingRequests? loop/finish
  Task->>Queue: releaseLock(dataroomId)
  Task-->>API: Background completion (no effect on response)
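
A minimal sketch of the trigger side of this flow, following the names in the diagram above (triggerDataroomIndexing, addToQueue, tryStartWorker, the "rag-indexing" task). The queue-manager export, payload shape, and return values are assumptions for illustration only:

// Illustrative only: non-blocking kickoff of dataroom indexing from an API route.
import { tasks } from "@trigger.dev/sdk/v3";

import { ragQueueManager } from "@/lib/rag/queue-manager"; // assumed export name

export async function triggerDataroomIndexing(
  dataroomId: string,
  teamId: string,
  userId: string,
): Promise<"started" | "queued"> {
  // Enqueue the request in the Redis-backed queue.
  await ragQueueManager.addToQueue(dataroomId, { teamId, userId, requestedAt: Date.now() });

  // A lock ensures only one worker runs per dataroom at a time.
  const acquired = await ragQueueManager.tryStartWorker(dataroomId);
  if (!acquired) return "queued";

  // Hand off to the Trigger.dev task that drains the queue and updates statuses.
  await tasks.trigger("rag-indexing", { dataroomId, teamId });
  return "started";
}

// From an API route, the call is wrapped so the HTTP response is not blocked:
//   waitUntil(triggerDataroomIndexing(dataroomId, teamId, userId));
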
sequenceDiagram
  autonumber
  actor Admin
  participant API as Delete APIs
  participant VM as vectorManager
  participant Q as QdrantVectorStore
  participant DB as Prisma

  Admin->>API: Delete document/folder/dataroom
  API->>VM: deleteDocumentVectors / deleteMultipleDocumentVectors / deleteDataroomVectors
  VM->>Q: deletePointsByFilter / deleteCollection
  Q-->>VM: result
  VM-->>API: success/failure
  API->>DB: Delete records
  API-->>Admin: 204/200 response
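
A corresponding sketch of the cleanup path shown above: vectors are removed via the singleton vectorManager before the database records are deleted. The method name follows the diagram; its signature, the Prisma call, and the error handling are illustrative assumptions:

// Illustrative delete flow: vector cleanup first, then the DB delete.
import prisma from "@/lib/prisma"; // assumed import path

import { vectorManager } from "@/lib/rag/vector-manager"; // assumed export name

export async function deleteDocumentWithVectors(documentId: string, dataroomId: string) {
  // Best-effort cleanup in Qdrant; log failures rather than blocking the delete.
  try {
    await vectorManager.deleteDocumentVectors(documentId, dataroomId); // assumed signature
  } catch (error) {
    console.error(`Vector cleanup failed for document ${documentId}`, error);
  }

  // Remove the database record afterwards, as in the sequence above.
  await prisma.document.delete({ where: { id: documentId } });
}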

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

vercel bot commented Sep 4, 2025

@AshishViradiya153 is attempting to deploy a commit to the mftsio Team on Vercel.

A member of the Team first needs to authorize it.

socket-security bot commented Sep 4, 2025

AshishViradiya153 marked this pull request as draft September 4, 2025 20:03
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 57

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
pages/api/assistants/chat.ts (2)

74-89: Apply plan-specific rate limits (paid/pro paths currently unused).

ratelimit.paid and ratelimit.pro are defined but never used. Route all authenticated users through plan-aware buckets.

-  if (input.userId && input.plan !== "pro") {
-    const { success, limit, reset, remaining } = await ratelimit.free.limit(
-      `ratelimit_${input.userId}`,
-    );
+  if (input.userId) {
+    const bucket =
+      input.plan === "pro"
+        ? ratelimit.pro
+        : input.plan === "paid"
+        ? ratelimit.paid
+        : ratelimit.free;
+    const { success, limit, reset, remaining } = await bucket.limit(
+      `ratelimit_${input.userId}`,
+    );
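
For context, the free/paid/pro buckets referenced above would typically be defined once alongside the handler, e.g. with @upstash/ratelimit. The window sizes and prefixes below are placeholders, not values from this repository:

// Hypothetical plan-aware rate limit buckets; only the @upstash/ratelimit API is assumed.
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();

export const ratelimit = {
  free: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(10, "1 m"), prefix: "chat:free" }),
  paid: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(60, "1 m"), prefix: "chat:paid" }),
  pro: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(120, "1 m"), prefix: "chat:pro" }),
};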

100-104: Guard against missing Assistant IDs.

Non-null assertion can crash at runtime. Fail fast with a clear 500.

-  const assistantId = input.isPublic
-    ? (process.env.OAI_PUBLIC_ASSISTANT_ID as string)
-    : (process.env.OAI_ASSISTANT_ID as string);
+  const assistantId = input.isPublic
+    ? process.env.OAI_PUBLIC_ASSISTANT_ID
+    : process.env.OAI_ASSISTANT_ID;
+  if (!assistantId) {
+    return new Response("Assistant is not configured.", { status: 500 });
+  }
lib/utils.ts (5)

352-361: Fix filter: metadata.initialMessage is misspelled and type-unsafe

It currently checks intitialMessage === 'True' (note the typo) and only handles string values. Handle both spellings and both string and boolean values.

-  const filteredMessages = threadMessages.filter((threadMessage) => {
+  const filteredMessages = threadMessages.filter((threadMessage) => {
     if (
       typeof threadMessage.metadata === "object" &&
       threadMessage.metadata !== null
     ) {
-      // Safely typecast metadata to an object with the expected structure
-      const metadata = threadMessage.metadata as { intitialMessage?: string };
-      return metadata.intitialMessage !== "True";
+      const md = threadMessage.metadata as {
+        initialMessage?: string | boolean;
+        intitialMessage?: string | boolean; // legacy misspelling
+      };
+      const flag = md.initialMessage ?? md.intitialMessage;
+      return String(flag).toLowerCase() !== "true";
     }
     return true; // Include messages where metadata is not an object or is null
   });

373-386: Aggregate all text content; build a single text part

Only the first text item is used; others are dropped. Join all text segments.

-  // Assuming content is an array and you want to convert it into a string or JSX element
-  const messageContent = content.map((item) => {
-    if (item.type === "text") {
-      return item.text.value;
-    } else {
-      return "";
-    }
-  });
+  const text = content
+    .filter((item) => item.type === "text" && (item as any).text?.value)
+    .map((item: any) => item.text.value)
+    .join("\n");
 
   return {
     id,
-    role: role === "assistant" ? "assistant" : "user",
-    parts: [{ type: "text", text: messageContent[0] || "" }],
+    role: role === "assistant" ? "assistant" : "user",
+    parts: [{ type: "text", text }],
   };

546-567: AES key derivation is incorrect; derive a 32-byte key (Buffer), not a base64 string substring

Using digest('base64').substring(0, 32) feeds the cipher 32 base64 characters, which encode only 24 bytes (192 bits) of the hash's entropy and are restricted to the base64 alphabet, instead of the 32 random bytes AES-256 expects. This weakens the key; derive a raw 32-byte Buffer instead.

 export async function generateEncrpytedPassword(
   password: string,
 ): Promise<string> {
   // If the password is empty, return an empty string
   if (!password) return "";
   // If the password is already encrypted, return it
   const textParts: string[] = password.split(":");
   if (textParts.length === 2) {
     return password;
   }
   // Otherwise, encrypt the password
-  const encryptedKey: string = crypto
-    .createHash("sha256")
-    .update(String(process.env.NEXT_PRIVATE_DOCUMENT_PASSWORD_KEY))
-    .digest("base64")
-    .substring(0, 32);
-  const IV: Buffer = crypto.randomBytes(16);
-  const cipher = crypto.createCipheriv("aes-256-ctr", encryptedKey, IV);
+  const secret = String(process.env.NEXT_PRIVATE_DOCUMENT_PASSWORD_KEY || "");
+  if (!secret) throw new Error("Missing NEXT_PRIVATE_DOCUMENT_PASSWORD_KEY");
+  const key: Buffer = crypto.createHash("sha256").update(secret, "utf8").digest(); // 32 bytes
+  const IV: Buffer = crypto.randomBytes(16);
+  const cipher = crypto.createCipheriv("aes-256-ctr", key, IV);
   let encryptedText: string = cipher.update(password, "utf8", "hex");
   encryptedText += cipher.final("hex");
   return IV.toString("hex") + ":" + encryptedText;
 }
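
To see the difference concretely (Node's crypto module only; the secret is a placeholder):

// Compares the two key derivations discussed above.
import crypto from "crypto";

const secret = "example-secret"; // placeholder

// Old approach: 32 base64 characters, i.e. a key drawn from a 64-symbol alphabet
// carrying at most ~192 bits of the hash's entropy.
const weakKey = crypto.createHash("sha256").update(secret).digest("base64").substring(0, 32);

// Fixed approach: the full 32-byte (256-bit) digest as a raw Buffer.
const strongKey = crypto.createHash("sha256").update(secret, "utf8").digest();

console.log(weakKey.length, Buffer.byteLength(weakKey, "utf8")); // 32 chars, alphabet-restricted
console.log(strongKey.length); // 32 raw bytes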

569-586: Decryption must use the same 32-byte derived key (a Buffer, not a string) as encryption

Mirror the fix above.

 export function decryptEncrpytedPassword(password: string): string {
   if (!password) return "";
-  const encryptedKey: string = crypto
-    .createHash("sha256")
-    .update(String(process.env.NEXT_PRIVATE_DOCUMENT_PASSWORD_KEY))
-    .digest("base64")
-    .substring(0, 32);
+  const secret = String(process.env.NEXT_PRIVATE_DOCUMENT_PASSWORD_KEY || "");
+  if (!secret) return "";
+  const key: Buffer = crypto.createHash("sha256").update(secret, "utf8").digest();
   const textParts: string[] = password.split(":");
   if (!textParts || textParts.length !== 2) {
     return password;
   }
   const IV: Buffer = Buffer.from(textParts[0], "hex");
   const encryptedText: string = textParts[1];
-  const decipher = crypto.createDecipheriv("aes-256-ctr", encryptedKey, IV);
+  const decipher = crypto.createDecipheriv("aes-256-ctr", key, IV);
   let decrypted: string = decipher.update(encryptedText, "hex", "utf8");
   decrypted += decipher.final("utf8");
   return decrypted;
 }

350-361: Replace all legacy intitialMessage occurrences with initialMessage

Update the setter in pages/api/assistants/threads/index.ts (line 53) and the filter in lib/utils.ts (lines 350–361) to use the correctly spelled field to prevent metadata mismatches.

pages/api/teams/[teamId]/datarooms/[id]/index.ts (1)

204-206: Allow header and comment are outdated; include DELETE.

Clients will get incorrect Allow metadata.

-  } else {
-    // We only allow GET, and PATCH requests
-    res.setHeader("Allow", ["GET", "PATCH"]);
+  } else {
+    // We only allow GET, PATCH and DELETE requests
+    res.setHeader("Allow", ["GET", "PATCH", "DELETE"]);
pages/api/teams/[teamId]/datarooms/[id]/duplicate.ts (2)

210-232: Guard brand creation to avoid creating empty/invalid brand rows.

If dataroomContents.brand is null/undefined this will still attempt a nested create, which can violate non-null constraints or create empty rows.

Apply this diff:

           folders: {
             create: [],
           },
-          brand: {
-            create: {
-              banner: dataroomContents.brand?.banner,
-              logo: dataroomContents.brand?.logo,
-              accentColor: dataroomContents.brand?.accentColor,
-              brandColor: dataroomContents.brand?.brandColor,
-            },
-          },
+          ...(dataroomContents.brand && {
+            brand: {
+              create: {
+                banner: dataroomContents.brand.banner ?? null,
+                logo: dataroomContents.brand.logo ?? null,
+                accentColor: dataroomContents.brand.accentColor ?? null,
+                brandColor: dataroomContents.brand.brandColor ?? null,
+              },
+            },
+          }),

111-121: Surface failures from Promise.allSettled to avoid silent partial duplicates.

Silent rejections can leave the dataroom half‑copied while returning 201. At minimum, aggregate and log failures; or fail fast if that’s the desired contract.

Apply these diffs:

-  await Promise.allSettled(
-    folder.documents.map((doc) =>
-      prisma.dataroomDocument.create({
-        data: {
-          documentId: doc.documentId,
-          dataroomId: dataroomId,
-          folderId: newFolder.id,
-        },
-      }),
-    ),
-  );
+  {
+    const results = await Promise.allSettled(
+      folder.documents.map((doc) =>
+        prisma.dataroomDocument.create({
+          data: {
+            documentId: doc.documentId,
+            dataroomId: dataroomId,
+            folderId: newFolder.id,
+          },
+        }),
+      ),
+    );
+    const failures = results.filter((r) => r.status === "rejected");
+    if (failures.length) {
+      console.error(
+        `Failed to duplicate ${failures.length} document(s) into folder ${newFolder.id}`,
+      );
+    }
+  }
-  await Promise.allSettled(
-    folder.childFolders.map((childFolder) =>
-      duplicateFolders(dataroomId, childFolder, newFolder.id),
-    ),
-  );
+  {
+    const results = await Promise.allSettled(
+      folder.childFolders.map((childFolder) =>
+        duplicateFolders(dataroomId, childFolder, newFolder.id),
+      ),
+    );
+    const failures = results.filter((r) => r.status === "rejected");
+    if (failures.length) {
+      console.error(
+        `Failed to duplicate ${failures.length} child folder(s) under ${newFolder.id}`,
+      );
+    }
+  }

Also applies to: 124-128

coderabbitai bot (Contributor) left a comment

Review continued from previous batch...

Comment on lines +64 to +65
const [hasAutoClosedRef, setHasAutoClosedRef] = useState(false);
const [startTime, setStartTime] = useState<number | null>(null);

🛠️ Refactor suggestion

State variable name doesn't match its boolean type

The state variable hasAutoClosedRef is declared as a boolean state but the "Ref" suffix suggests it should be a ref. This naming inconsistency is confusing.

-const [hasAutoClosedRef, setHasAutoClosedRef] = useState(false);
+const [hasAutoClosed, setHasAutoClosed] = useState(false);

And update all usages:

-if (defaultOpen && !isStreaming && isOpen && !hasAutoClosedRef) {
+if (defaultOpen && !isStreaming && isOpen && !hasAutoClosed) {
    // Add a small delay before closing to allow user to see the content
    const timer = setTimeout(() => {
        setIsOpen(false);
-        setHasAutoClosedRef(true);
+        setHasAutoClosed(true);
    }, AUTO_CLOSE_DELAY);
🤖 Prompt for AI Agents
In components/reasoning.tsx around lines 64-65, the boolean state variable is
named hasAutoClosedRef which misleadingly implies a ref; rename it to
hasAutoClosed (and the setter to setHasAutoClosed) and update every usage and
import/exports accordingly throughout the file (and any other files referencing
it) so the identifier reflects a boolean state rather than a ref.

Comment on lines 19 to 32
export interface DocumentChunk {
  id: string;
  content: string;
  metadata: {
    chunkIndex: number;
    documentId: string;
    documentName: string;
    dataroomId: string;
    teamId: string;
    contentType: string;
    pageNumbers?: number[];
  };
  chunkHash: string;
}

⚠️ Potential issue

Type mismatch with downstream usage (pageRanges/tokenCount missing).

DocumentChunk.metadata declares pageNumbers, but the code emits pageRanges and later reads tokenCount/sectionHeader in rag-indexing. This will cause TS errors or undefined values at runtime.

Update the interface to reflect actual payload:

 export interface DocumentChunk {
   id: string;
   content: string;
   metadata: {
     chunkIndex: number;
     documentId: string;
     documentName: string;
     dataroomId: string;
     teamId: string;
     contentType: string;
-    pageNumbers?: number[];
+    pageRanges?: string[];
+    tokenCount?: number;
+    sectionHeader?: string;
   };
   chunkHash: string;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-export interface DocumentChunk {
-  id: string;
-  content: string;
-  metadata: {
-    chunkIndex: number;
-    documentId: string;
-    documentName: string;
-    dataroomId: string;
-    teamId: string;
-    contentType: string;
-    pageNumbers?: number[];
-  };
-  chunkHash: string;
-}
+export interface DocumentChunk {
+  id: string;
+  content: string;
+  metadata: {
+    chunkIndex: number;
+    documentId: string;
+    documentName: string;
+    dataroomId: string;
+    teamId: string;
+    contentType: string;
+    pageRanges?: string[];
+    tokenCount?: number;
+    sectionHeader?: string;
+  };
+  chunkHash: string;
+}
🤖 Prompt for AI Agents
In lib/rag/document-processor.ts around lines 19 to 32, the
DocumentChunk.metadata currently only declares pageNumbers but downstream code
emits/reads pageRanges and tokenCount/sectionHeader, causing type mismatches and
possible undefined values; update the interface to include pageRanges?:
string[]; tokenCount?: number; sectionHeader?: string; (you may keep
pageNumbers?: number[] for compatibility) and ensure their types match how
rag-indexing expects them so consumers no longer get TS errors or undefined
fields.
