This Node.js script processes biotech company data from an Excel file using the Groq API to automatically fill in missing company information including lead assets, clinical phases, indications, therapeutic areas, modalities, and targets.
- Reads biotech company names from Excel file (column A)
- Uses Groq API to research and fill company information
- Adds confidence scoring (0-100) for data reliability
- Includes rationale/source information for transparency
- Exports refined data to new Excel file
- Rate limiting to respect API limits
- Error handling and logging
- Install Dependencies
  npm install
- Configure API Key
  - Get your Groq API key from https://console.groq.com/keys
  - Edit the .env file and replace your_groq_api_key_here with your actual API key:
    GROQ_API_KEY=gsk_your_actual_api_key_here
- Prepare Input File
  - Ensure your NBI.xlsx file is in the project directory
  - Column A should contain company names
  - First row should be headers: Company, Lead Asset, Phase, Indication, Therapeutic Area, Modality, Target
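The input requirements above can be checked before any API call is made. The following is a minimal sketch, not the script's actual code: `findHeaderProblems` and `companyNames` are hypothetical helpers, and the sheet is assumed to have already been read into an array of arrays (one inner array per row), which is the shape SheetJS returns from `XLSX.utils.sheet_to_json(sheet, { header: 1 })`.

```javascript
// Expected header row, taken from the input file requirements above.
const EXPECTED_HEADERS = [
  'Company', 'Lead Asset', 'Phase', 'Indication',
  'Therapeutic Area', 'Modality', 'Target',
];

// Hypothetical helper: lists every expected header that is missing or misplaced.
// `rows` is assumed to be an array of arrays, one inner array per sheet row.
function findHeaderProblems(rows) {
  const headerRow = (rows[0] || []).map((cell) => String(cell ?? '').trim());
  const problems = [];
  EXPECTED_HEADERS.forEach((expected, index) => {
    if (headerRow[index] !== expected) {
      problems.push(
        `column ${index + 1}: expected "${expected}", found "${headerRow[index] ?? ''}"`
      );
    }
  });
  return problems;
}

// Hypothetical helper: company names from column A, skipping the header row
// and any blank cells.
function companyNames(rows) {
  return rows
    .slice(1)
    .map((row) => String((row && row[0]) ?? '').trim())
    .filter((name) => name.length > 0);
}
```

An empty array from `findHeaderProblems` means row 1 matches the expected layout; anything else can be printed as a hint before the run starts.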
Run the script:
npm start
Or directly with Node:
node main.js
The script will create NBI_Refined.xlsx with:
- All original data preserved
- Missing information filled from Groq API research
- Additional columns:
  - Rationale/Source: Explanation and source information
  - Confidence: Score from 0-100 indicating data reliability
- Company: Biotech company name (from input)
- Lead Asset: Primary drug/asset in development
- Phase: Clinical development phase (Pre-clinical, Phase I/II/III, Approved)
- Indication: Primary disease/condition being treated
- Therapeutic Area: Medical specialty (Oncology, Neurology, etc.)
- Modality: Type of therapy (Small molecule, Antibody, Gene therapy, etc.)
- Target: Biological target or mechanism of action
- Rationale/Source: Explanation and source information
- Confidence: Reliability score (0-100)
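To illustrate how the columns above fit together, here is a small sketch of merging an existing spreadsheet row with researched values. It is an assumption about how such a merge could work, not the script's actual logic: `buildRow` and `clampConfidence` are hypothetical names, original non-empty cells are kept, and the confidence score is forced into the 0-100 range.

```javascript
// Output columns, in the order listed above.
const COLUMNS = [
  'Company', 'Lead Asset', 'Phase', 'Indication', 'Therapeutic Area',
  'Modality', 'Target', 'Rationale/Source', 'Confidence',
];

// Hypothetical helper: coerce a score to an integer in [0, 100].
// Non-numeric input is treated as 0 (no confidence).
function clampConfidence(value) {
  const n = Number(value);
  if (!Number.isFinite(n)) return 0;
  return Math.min(100, Math.max(0, Math.round(n)));
}

// Hypothetical helper: keep original non-empty cells, fill the rest
// from the researched values, and normalize the confidence score.
function buildRow(existing, researched) {
  const row = {};
  for (const column of COLUMNS) {
    const original = String(existing[column] ?? '').trim();
    const filled = String(researched[column] ?? '').trim();
    row[column] = original !== '' ? original : filled;
  }
  row.Confidence = clampConfidence(researched.Confidence);
  return row;
}
```

For example, `buildRow({ Company: 'Acme Bio', Phase: 'Phase II' }, { Phase: 'Phase I', Confidence: 140 })` keeps the original `Phase II` and stores a confidence of 100. The company name here is made up for illustration.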
The script includes a 1-second delay between API calls to respect Groq's rate limits. Processing time will depend on the number of companies in your input file.
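The delay described above can be sketched as a sequential loop that pauses between calls. This is an illustration under assumptions: `processWithDelay`, `sleep`, and `minimumWaitSeconds` are hypothetical names, `lookup` stands in for whatever function performs the Groq request, and whether the real script waits before or after each call is not stated in this README.

```javascript
// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: call `lookup` for each company in order,
// pausing `delayMs` between consecutive calls.
async function processWithDelay(companies, lookup, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < companies.length; i += 1) {
    if (i > 0) await sleep(delayMs); // no pause needed before the first call
    results.push(await lookup(companies[i]));
  }
  return results;
}

// Lower bound on time spent waiting, in seconds (API latency not included).
function minimumWaitSeconds(companyCount, delayMs = 1000) {
  return (Math.max(0, companyCount - 1) * delayMs) / 1000;
}
```

Under these assumptions, a file with 250 companies spends at least 249 seconds waiting, a little over four minutes, before API response time is added.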
- Invalid company names are marked with confidence 0
- API errors are logged and marked in the output
- Processing continues even if individual companies fail
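The behavior listed above amounts to wrapping each lookup so that one failure cannot stop the run. A minimal sketch follows, with `safeLookup` as a hypothetical name and `lookup` standing in for the real API call. Marking API errors with confidence 0 is an assumption made here; the README only states that such errors are logged and marked in the output.

```javascript
// Hypothetical wrapper: always resolves to a row, never throws.
async function safeLookup(company, lookup) {
  const name = String(company ?? '').trim();

  // Invalid (blank) company names are marked with confidence 0.
  if (name === '') {
    return { Company: name, 'Rationale/Source': 'Invalid company name', Confidence: 0 };
  }

  try {
    return await lookup(name);
  } catch (error) {
    // Log the failure and mark the row so the caller can continue.
    // Confidence 0 for API errors is an assumption of this sketch.
    console.error(`Lookup failed for "${name}": ${error.message}`);
    return { Company: name, 'Rationale/Source': `API error: ${error.message}`, Confidence: 0 };
  }
}
```

Because `safeLookup` resolves in every case, a loop over all companies reaches the last row even when individual requests fail.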
- API Key Error: Ensure your GROQ_API_KEY is set correctly in .env
- File Not Found: Verify NBI.xlsx exists in the project directory
- Rate Limiting: If you hit rate limits, the script will log errors but continue
- Excel Format: Ensure your input file has proper headers in row 1
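The first two checks above can be run up front. The sketch below is not part of the script: `preflight` is a hypothetical helper that takes the environment and project directory as arguments and returns a list of problems.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical preflight check: returns a list of human-readable problems,
// or an empty array when the key and input file both look usable.
function preflight(env, projectDir) {
  const problems = [];

  // The key must be present and must not be the placeholder from .env.
  const key = String(env.GROQ_API_KEY ?? '').trim();
  if (key === '' || key === 'your_groq_api_key_here') {
    problems.push('GROQ_API_KEY is not set correctly in .env');
  }

  // The input workbook must exist in the project directory.
  if (!fs.existsSync(path.join(projectDir, 'NBI.xlsx'))) {
    problems.push('NBI.xlsx not found in the project directory');
  }

  return problems;
}
```

A typical call would be `preflight(process.env, __dirname)` after `require('dotenv').config()` has loaded the `.env` file.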
- xlsx: Excel file reading/writing
- dotenv: Environment variable management
- https: Built-in Node.js module for API calls
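As a rough illustration of an API call made with the built-in https module, the sketch below builds and sends a chat completion request. Several details are assumptions rather than facts about this script: the function names are hypothetical, the endpoint is Groq's OpenAI-compatible chat completions path, the model name is a placeholder to replace with one available to your account, and the prompt wording is purely illustrative.

```javascript
const https = require('https');

// Hypothetical helper: build request options and JSON body (no network access).
// 'MODEL_NAME_HERE' is a placeholder, not a real model identifier.
function buildGroqRequest(apiKey, company, model = 'MODEL_NAME_HERE') {
  const body = JSON.stringify({
    model,
    messages: [
      {
        role: 'user',
        // Illustrative prompt only; the script's real prompt is not shown in this README.
        content: `Give the lead asset, phase, indication, therapeutic area, modality and target for ${company}.`,
      },
    ],
  });
  const options = {
    hostname: 'api.groq.com',
    path: '/openai/v1/chat/completions',
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body),
    },
  };
  return { options, body };
}

// Hypothetical helper: send the request and resolve with the parsed JSON response.
function sendRequest({ options, body }) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => {
        if (res.statusCode !== 200) {
          reject(new Error(`HTTP ${res.statusCode}: ${data}`));
          return;
        }
        try {
          resolve(JSON.parse(data));
        } catch (error) {
          reject(error);
        }
      });
    });
    req.on('error', reject);
    req.write(body);
    req.end();
  });
}
```

Keeping `buildGroqRequest` separate from `sendRequest` lets the request shape be inspected or tested without touching the network.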