jobandhanjal

Summary

This pull request introduces two new tools, HuggingFaceModelSearchTool and HuggingFaceDatasetSearchTool, to enable LangChain agents to search the Hugging Face Hub for models and datasets directly.

Motivation

As proposed in issue #291, this contribution lets agents query the Hugging Face Hub, the central repository for the AI community, directly. This is useful for developers building agents that perform AI/ML research, development, or discovery tasks.

Key Changes

  • Two New Tools: HuggingFaceModelSearchTool and HuggingFaceDatasetSearchTool are provided for clear, specific functionality that is easy for an LLM to choose.
  • Official Library: The implementation is built on the official huggingface-hub Python library, ensuring stability and long-term maintainability.
  • Structured Output: Search results are formatted into a clean, easy-to-parse string, including the asset's ID, author, and tags, making the output highly useful for LLMs.
  • Thoroughly Tested: Includes a full suite of unit tests that mock the external API calls. This ensures the tool's logic is tested reliably and quickly without network dependencies.
  • Async Support: Implements the _arun method for asynchronous use cases, following standard LangChain patterns.
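
A minimal sketch of the structured-output and mocked-test ideas listed above. The helper names `format_results` and `search_models` and the exact line layout are assumptions for illustration, not the PR's code; the only library call assumed is `HfApi.list_models(search=..., limit=...)`, and here a mock replaces the client so nothing touches the network.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def format_results(items, limit=5):
    """Render search hits as one line per asset: ID, author, tags."""
    lines = []
    for item in list(items)[:limit]:
        tags = ", ".join(item.tags or []) or "none"
        author = item.author or "unknown"
        lines.append(f"ID: {item.id} | Author: {author} | Tags: {tags}")
    return "\n".join(lines) if lines else "No results found."


def search_models(api_client, query, limit=5):
    """Query the client and format whatever it returns."""
    hits = api_client.list_models(search=query, limit=limit)
    return format_results(hits, limit)


# A mocked client stands in for the real API (hypothetical test data).
fake_client = MagicMock()
fake_client.list_models.return_value = [
    SimpleNamespace(id="org/model-a", author="org", tags=["text-generation"]),
    SimpleNamespace(id="solo-model", author=None, tags=[]),
]
output = search_models(fake_client, "model")
print(output)
```

Because the formatter only reads `id`, `author`, and `tags`, the same function could serve both the model and dataset tools, assuming the dataset results expose the same attributes.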

Comment on lines 21 to 22
def _run(self, query: str) -> str:
    """Use the tool."""

The _run method docstring violates the 'Use Google-Style Docstrings (with Args section)' rule. The current docstring is insufficient and missing the required Args section. Replace with a proper Google-style docstring that includes an Args section describing the query parameter:

"""Use the tool to search Hugging Face Hub.

Args:
    query: The search query string to find models or datasets.
"""
Suggested change
-def _run(self, query: str) -> str:
-    """Use the tool."""
+def _run(self, query: str) -> str:
+    """Use the tool to search Hugging Face Hub.
+
+    Args:
+        query: The search query string to find models or datasets.
+    """

Spotted by Diamond (based on custom rule: Code quality)


Comment on lines 47 to 48
except Exception as e:
    return f"An error occurred: {e}"

The broad except Exception as e: handler catches all exceptions and returns a generic error message, which obscures debugging information and masks different failure types. Replace with specific exception handlers for ImportError, ConnectionError, or API-specific exceptions to provide more meaningful error handling and debugging context.

Suggested change
-except Exception as e:
-    return f"An error occurred: {e}"
+except ImportError as e:
+    return f"Error: Missing required dependencies. {str(e)}"
+except ConnectionError as e:
+    return f"Error: Failed to connect to HuggingFace API. {str(e)}"
+except ValueError as e:
+    return f"Error: Invalid input to HuggingFace API. {str(e)}"
+except Exception as e:
+    return f"Unexpected error when calling HuggingFace API: {str(e)}"

Spotted by Diamond (based on custom rule: Code quality)
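
A small runnable sketch of the ordering this suggestion relies on: specific handlers first, the broad one last. The function `run_search` and the three failing callables are hypothetical stand-ins, not code from the PR. Which exception types the Hub client really raises is not verified here; an HTTP library's own connection error is not necessarily a subclass of the builtin `ConnectionError`.

```python
def run_search(fetch, query):
    """Call fetch(query) and map known failure types to distinct messages."""
    try:
        return fetch(query)
    except ConnectionError as e:
        return f"Error: Failed to connect to Hugging Face API. {e}"
    except ValueError as e:
        return f"Error: Invalid input to Hugging Face API. {e}"
    except Exception as e:  # last resort; keep the exception type visible
        return f"Unexpected error ({type(e).__name__}): {e}"


# Hypothetical callables that simulate each failure mode.
def offline(query):
    raise ConnectionError("network unreachable")


def bad_input(query):
    raise ValueError("empty query")


def broken(query):
    raise KeyError("id")


for fetch in (offline, bad_input, broken):
    print(run_search(fetch, "bert"))
```

Including the exception class name in the fallback message keeps some debugging context even when the failure was not anticipated.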


@@ -0,0 +1,81 @@
from __future__ import annotations
from langchain.tools import BaseTool

The import statement should be updated to use langchain_community.tools instead of langchain.tools to maintain consistency with the package structure. This ensures proper dependency management and avoids potential import conflicts:

from langchain_community.tools import BaseTool
Suggested change
-from langchain.tools import BaseTool
+from langchain_community.tools import BaseTool

Spotted by Diamond


Comment on lines 11 to 25
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    try:
        from huggingface_hub import HfApi
        self.api_client = HfApi()
    except ImportError as e:
        raise ImportError(
            "huggingface_hub is not installed. Please install it with `pip install huggingface-hub`"
        ) from e

The __init__ method requires a Google-style docstring with an Args section. This public method accepts **kwargs but lacks proper documentation explaining its purpose and parameters according to the 'Use Google-Style Docstrings (with Args section)' rule.

Suggested change
-def __init__(self, **kwargs):
-    super().__init__(**kwargs)
-    try:
-        from huggingface_hub import HfApi
-        self.api_client = HfApi()
-    except ImportError as e:
-        raise ImportError(
-            "huggingface_hub is not installed. Please install it with `pip install huggingface-hub`"
-        ) from e
+def __init__(self, **kwargs):
+    """Initialize the Hugging Face tool.
+
+    Args:
+        **kwargs: Keyword arguments to be passed to the parent class constructor.
+    """
+    super().__init__(**kwargs)
+    try:
+        from huggingface_hub import HfApi
+        self.api_client = HfApi()
+    except ImportError as e:
+        raise ImportError(
+            "huggingface_hub is not installed. Please install it with `pip install huggingface-hub`"
+        ) from e

Spotted by Diamond (based on custom rule: Code quality)
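
The constructor above uses a guarded lazy import so that a missing optional dependency produces an actionable message. A generic sketch of that pattern follows; the helper `require_module` is hypothetical and not part of the PR or of `huggingface_hub`, and the module names used to exercise it are placeholders.

```python
import importlib


def require_module(module_name, pip_name):
    """Import a module on demand, re-raising with an install hint.

    Args:
        module_name: Importable module name, e.g. "huggingface_hub".
        pip_name: Package name to suggest in the error message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is not installed. "
            f"Please install it with `pip install {pip_name}`"
        ) from e


# A standard-library module always imports, so this succeeds.
json_module = require_module("json", "json")

# A deliberately missing module shows the chained error.
try:
    require_module("surely_missing_module_xyz", "surely-missing")
except ImportError as err:
    message = str(err)
    cause = err.__cause__
    print(message)
```

Using `raise ... from e` keeps the original import failure attached as `__cause__`, so the traceback still shows what actually went wrong.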


Comment on lines 58 to 73
async def _arun(self, query: str) -> str:
    """Use the tool asynchronously."""
    import asyncio
    return await asyncio.get_running_loop().run_in_executor(
        None, self._run, query
    )

The _arun method docstring is incomplete and violates Google-Style Docstring requirements. The docstring lacks an Args section to document the query parameter. According to Google-Style Docstring format, the docstring should include:

async def _arun(self, query: str) -> str:
    """Use the tool asynchronously.
    
    Args:
        query: The search query string to find models or datasets.
    """
Suggested change
-async def _arun(self, query: str) -> str:
-    """Use the tool asynchronously."""
-    import asyncio
-    return await asyncio.get_running_loop().run_in_executor(
-        None, self._run, query
-    )
+async def _arun(self, query: str) -> str:
+    """Use the tool asynchronously.
+
+    Args:
+        query: The search query string to find models or datasets.
+    """
+    import asyncio
+    return await asyncio.get_running_loop().run_in_executor(
+        None, self._run, query
+    )

Spotted by Diamond (based on custom rule: Code quality)
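
For context, a self-contained sketch of what the `run_in_executor` delegation does: the blocking `_run` is handed to the default thread pool so the event loop stays free. `BlockingSearch` is a hypothetical stand-in class, and `time.sleep` stands in for the blocking HTTP call.

```python
import asyncio
import time


class BlockingSearch:
    """Hypothetical stand-in for a tool with a blocking _run."""

    def _run(self, query: str) -> str:
        time.sleep(0.05)  # simulates a blocking network request
        return f"results for {query}"

    async def _arun(self, query: str) -> str:
        """Run the blocking search in the default thread pool.

        Args:
            query: The search query string.
        """
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self._run, query)


async def main():
    tool = BlockingSearch()
    # Both calls run in worker threads; gather preserves argument order.
    return await asyncio.gather(tool._arun("bert"), tool._arun("llama"))


results = asyncio.run(main())
print(results)
```

On Python 3.9 and later, `asyncio.to_thread(self._run, query)` is a shorter way to express the same delegation.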


class HuggingFaceHubTool(BaseTool):
    """Base tool for interacting with the Hugging Face Hub."""

    api_client: "HfApi"

The type annotation for api_client is written as the string literal "HfApi". A quoted annotation is a valid forward reference, but since the file already starts with from __future__ import annotations, the quotes are redundant. The options are:

  1. Rely on postponed evaluation of annotations:
     from __future__ import annotations
  2. Or import and use Any:
     from typing import Any
     # Then use:
     api_client: Any
  3. Or import the actual type (preferred if possible):
     from huggingface_hub import HfApi
     # Then use:
     api_client: HfApi

The first option is already in place, so the annotation works as is; dropping the quotes is a style cleanup rather than a fix.

Suggested change
-api_client: "HfApi"
+api_client: HfApi

Spotted by Diamond
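
A short sketch of why a string annotation works without importing the type at runtime. `HubClientHolder` is a hypothetical plain class, not the PR's `BaseTool` subclass, and whether a Pydantic-based class could resolve the name the same way is not covered here. With `from __future__ import annotations` at the top of a module, every annotation is stored as a string like this, which is why the unquoted form behaves the same at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime, so the
    # optional dependency does not need to be installed to import this file.
    from huggingface_hub import HfApi


class HubClientHolder:
    """Hypothetical holder for a Hub client."""

    api_client: "HfApi"  # stored as a plain string, never evaluated here

    def __init__(self, api_client: "HfApi") -> None:
        self.api_client = api_client


sentinel = object()  # any object works at runtime; the hint is not enforced
holder = HubClientHolder(api_client=sentinel)
annotation = HubClientHolder.__annotations__["api_client"]
print(annotation)
```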

