carrolldominic/nbiDatabase

NBI Biotech Database Processor

This Node.js script reads biotech company names from an Excel file and uses the Groq API to fill in missing company information, including lead assets, clinical phases, indications, therapeutic areas, modalities, and targets.

Features

  • Reads biotech company names from an Excel file (column A)
  • Uses the Groq API to research and fill in missing company information
  • Adds a confidence score (0-100) indicating data reliability
  • Includes rationale/source information for transparency
  • Exports the refined data to a new Excel file
  • Rate limits API calls (1-second delay between requests)
  • Handles and logs errors without stopping the run

Setup

  1. Install Dependencies

    npm install
  2. Configure API Key

    Create a .env file in the project directory containing:

    GROQ_API_KEY=gsk_your_actual_api_key_here

  3. Prepare Input File

    • Ensure your NBI.xlsx file is in the project directory
    • Column A should contain company names
    • First row should be headers: Company, Lead Asset, Phase, Indication, Therapeutic Area, Modality, Target
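The input requirements above can be checked before any API calls are made. This is a minimal sketch, not part of the original script: it assumes the sheet has already been read into an array of rows (for example with xlsx's `XLSX.utils.sheet_to_json(sheet, { header: 1 })`), and the names `EXPECTED_HEADERS` and `validateInputRows` are hypothetical.

```javascript
// Hypothetical helper (not from the original script).
// Expected header row, taken from the Setup instructions.
const EXPECTED_HEADERS = [
  "Company", "Lead Asset", "Phase", "Indication",
  "Therapeutic Area", "Modality", "Target",
];

// Checks a sheet given as an array of rows (row 0 = headers) and
// returns a list of human-readable problems; an empty list means OK.
function validateInputRows(rows) {
  const problems = [];
  const header = (rows[0] || []).map((cell) => String(cell ?? "").trim());

  EXPECTED_HEADERS.forEach((name, i) => {
    if (header[i] !== name) {
      problems.push(`Column ${i + 1}: expected "${name}", found "${header[i] ?? ""}"`);
    }
  });

  // Every data row needs a company name in column A.
  // Row numbers assume no blank rows were skipped when reading the sheet.
  rows.slice(1).forEach((row, i) => {
    const company = String((row && row[0]) ?? "").trim();
    if (company === "") {
      problems.push(`Row ${i + 2}: missing company name in column A`);
    }
  });

  return problems;
}
```

Returning a list instead of throwing on the first problem lets all header and row issues be reported in a single run.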

Usage

Run the script:

npm start

Or directly with Node:

node main.js

Output

The script will create NBI_Refined.xlsx with:

  • All original data preserved
  • Missing information filled in using the Groq API
  • Additional columns:
    • Rationale/Source: Explanation and source information
    • Confidence: Score from 0-100 indicating data reliability
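The "original data preserved, missing information filled" behaviour can be sketched as a merge in which existing cells always win. This assumes rows are objects keyed by header name (the default shape of xlsx's `sheet_to_json`) and that the API result uses the same keys; `DATA_COLUMNS`, `isMissing`, and `buildOutputRow` are hypothetical names, not the script's actual code.

```javascript
// Hypothetical helpers (not from the original script).
// Data columns the API may fill; "Company" always comes from the input file.
const DATA_COLUMNS = ["Lead Asset", "Phase", "Indication", "Therapeutic Area", "Modality", "Target"];

// Treat undefined, null, and whitespace-only cells as missing.
function isMissing(value) {
  return value === undefined || value === null || String(value).trim() === "";
}

// Builds one output row: original values win, API values only fill gaps,
// and the two extra output columns are appended.
function buildOutputRow(original, apiResult) {
  const row = { ...original };
  for (const col of DATA_COLUMNS) {
    if (isMissing(row[col]) && !isMissing(apiResult[col])) {
      row[col] = apiResult[col];
    }
  }
  row["Rationale/Source"] = apiResult["Rationale/Source"] ?? "";
  row["Confidence"] = apiResult["Confidence"] ?? 0;
  return row;
}
```

For example, `buildOutputRow({ Company: "Acme", Phase: "Phase II" }, { Phase: "Phase III", Target: "EGFR", Confidence: 70 })` keeps `Phase: "Phase II"` from the input and only adds the missing `Target`.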

Column Descriptions

  • Company: Biotech company name (from input)
  • Lead Asset: Primary drug/asset in development
  • Phase: Clinical development phase (Pre-clinical, Phase I/II/III, Approved)
  • Indication: Primary disease/condition being treated
  • Therapeutic Area: Medical specialty (Oncology, Neurology, etc.)
  • Modality: Type of therapy (Small molecule, Antibody, Gene therapy, etc.)
  • Target: Biological target or mechanism of action
  • Rationale/Source: Explanation and source information
  • Confidence: Reliability score (0-100)
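LLM replies do not always arrive as clean JSON, so mapping a reply onto these columns benefits from a tolerant parser. This sketch assumes the prompt asks the model for a single JSON object using the column names above as keys; `FIELDS` and `parseCompanyInfo` are hypothetical, and the README does not say how the actual script parses responses.

```javascript
// Hypothetical parser (not from the original script).
const FIELDS = ["Lead Asset", "Phase", "Indication", "Therapeutic Area", "Modality", "Target", "Rationale/Source"];

// Parses a model reply into a record with the documented columns.
// Tolerates prose around the JSON by taking the outermost {...} span.
function parseCompanyInfo(replyText) {
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end < start) {
    return { "Rationale/Source": "Unparseable model reply", Confidence: 0 };
  }

  let data;
  try {
    data = JSON.parse(replyText.slice(start, end + 1));
  } catch {
    return { "Rationale/Source": "Invalid JSON in model reply", Confidence: 0 };
  }

  const record = {};
  for (const field of FIELDS) {
    record[field] = typeof data[field] === "string" ? data[field].trim() : "";
  }
  // Clamp confidence into the documented 0-100 range; non-numbers become 0.
  const c = Number(data.Confidence);
  record.Confidence = Number.isFinite(c) ? Math.min(100, Math.max(0, Math.round(c))) : 0;
  return record;
}
```

Clamping keeps the Confidence column within its documented range even when the model returns an out-of-range or non-numeric value.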

Rate Limiting

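The script waits 1 second between API calls to stay within Groq's rate limits. Because companies are processed one at a time, total runtime grows with the number of companies in the input file: at least one second per company, plus the time each API call takes.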
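A fixed delay between sequential calls can be sketched with a promise-based sleep. The `lookup` callback stands in for whatever function performs one Groq request; `sleep` and `processSequentially` are hypothetical names, not necessarily how the script structures its loop.

```javascript
// Hypothetical helpers (not from the original script).
// Resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls `lookup` for each company one at a time, pausing `delayMs`
// between consecutive calls (1000 ms mirrors the 1-second delay above).
async function processSequentially(companies, lookup, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < companies.length; i++) {
    if (i > 0) await sleep(delayMs); // no wait before the first call
    results.push(await lookup(companies[i]));
  }
  return results;
}
```

With N companies this adds (N - 1) seconds of waiting on top of API latency, so a few hundred companies take several minutes at minimum.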

Error Handling

  • Invalid company names are marked with confidence 0
  • API errors are logged and marked in the output
  • Processing continues even if individual companies fail
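The "continue on failure" behaviour can be sketched as a wrapper that converts any error into a placeholder row with confidence 0. Treating blank names as invalid is an assumption made for this illustration; the README does not say how the script detects invalid company names. `safeLookup` is a hypothetical name.

```javascript
// Hypothetical wrapper (not from the original script).
// Runs one company lookup so that a failure never stops the batch.
// Blank names and failed lookups get a placeholder row with Confidence 0.
async function safeLookup(company, lookup) {
  const name = String(company ?? "").trim();
  if (name === "") {
    return { Company: name, "Rationale/Source": "Invalid company name", Confidence: 0 };
  }
  try {
    // Spread first so the input company name is never overwritten.
    return { ...(await lookup(name)), Company: name };
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    console.error(`Lookup failed for ${name}: ${msg}`); // logged, then skipped
    return { Company: name, "Rationale/Source": `Error: ${msg}`, Confidence: 0 };
  }
}
```

Recording the error message in Rationale/Source makes failed rows easy to spot and retry in the output file.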

Troubleshooting

  1. API Key Error: Ensure your GROQ_API_KEY is set correctly in .env
  2. File Not Found: Verify NBI.xlsx exists in the project directory
  3. Rate Limiting: If you hit rate limits, the script will log errors but continue
  4. Excel Format: Ensure your input file has proper headers in row 1
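The first two problems can be caught before processing starts with a short preflight check. This sketch assumes `.env` has already been loaded into `process.env` (for example with dotenv's `require("dotenv").config()`); `preflight` is a hypothetical name, and the placeholder check simply compares against the example value from Setup.

```javascript
// Hypothetical preflight check (not from the original script).
const fs = require("fs");

// Returns human-readable error messages; an empty array means OK.
function preflight({ env = process.env, inputPath = "NBI.xlsx" } = {}) {
  const errors = [];
  const key = (env.GROQ_API_KEY || "").trim();
  if (key === "") {
    errors.push("GROQ_API_KEY is not set; add it to .env");
  } else if (key === "gsk_your_actual_api_key_here") {
    errors.push("GROQ_API_KEY still has the placeholder value from Setup");
  }
  if (!fs.existsSync(inputPath)) {
    errors.push(`Input file not found: ${inputPath}`);
  }
  return errors;
}
```

Calling it at startup and exiting when the array is non-empty avoids a run that fails on every company.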

Dependencies

  • xlsx: Excel file reading/writing
  • dotenv: Environment variable management
  • https: Built-in Node.js module for API calls (no installation needed)
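Calling Groq with only the built-in https module can be sketched as follows. The endpoint and payload shape follow Groq's OpenAI-compatible chat-completions API; the prompt wording, `temperature: 0`, and the helper names `buildGroqRequest` and `sendGroqRequest` are assumptions for illustration. The model is left as a parameter because the README does not say which model the script uses.

```javascript
// Hypothetical request helpers (not from the original script).
const https = require("https");

// Builds (but does not send) a chat-completions request for one company.
function buildGroqRequest(apiKey, model, company) {
  const body = JSON.stringify({
    model,
    temperature: 0, // assumption: deterministic output for data extraction
    messages: [
      {
        role: "system",
        content: "Reply with one JSON object using the keys: Lead Asset, Phase, Indication, " +
          "Therapeutic Area, Modality, Target, Rationale/Source, Confidence (0-100).",
      },
      { role: "user", content: `Company: ${company}` },
    ],
  });
  const options = {
    hostname: "api.groq.com",
    path: "/openai/v1/chat/completions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body),
    },
  };
  return { options, body };
}

// Sends a request built above and resolves with the model's reply text.
function sendGroqRequest({ options, body }) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let raw = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (raw += chunk));
      res.on("end", () => {
        if (res.statusCode < 200 || res.statusCode >= 300) {
          reject(new Error(`HTTP ${res.statusCode}: ${raw.slice(0, 200)}`));
          return;
        }
        try {
          resolve(JSON.parse(raw).choices[0].message.content);
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject); // network errors (DNS, TLS, resets)
    req.end(body);
  });
}
```

A call would look like `sendGroqRequest(buildGroqRequest(process.env.GROQ_API_KEY, "<model-id>", "Acme Bio"))`, where `<model-id>` is a model available to your Groq account. Rejecting on non-2xx status codes lets a rate-limit (HTTP 429) response flow into the same per-company error handling as any other failure.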

About

LLM to find drug asset info for NBI constituents
