Skip to content

Conversation

@GuyKh
Copy link
Owner

@GuyKh GuyKh commented Jun 6, 2025

PR Type

enhancement


Description

  • Switch authorization HTML parsing to BeautifulSoup4 for robustness

  • Add BeautifulSoup4 as a new project dependency

  • Improve error handling for non-HTML and missing code input

  • Set default value for code in ResponseDescriptor


Changes walkthrough 📝

Relevant files
Enhancement
login.py
Switch to BeautifulSoup4 for HTML parsing and add error checks

iec_api/login.py

  • Replace regex-based HTML parsing with BeautifulSoup4 for extracting
    code
  • Add validation to ensure response is HTML before parsing
  • Raise explicit errors if HTML or code input is missing
  • Import BeautifulSoup4 for HTML parsing
  • +13/-4   
    response_descriptor.py
    Set default value for optional code field                               

    iec_api/models/response_descriptor.py

    • Set default value of code to None in ResponseDescriptor
    +1/-1     
    Dependencies
    pyproject.toml
    Add BeautifulSoup4 to project dependencies                             

    pyproject.toml

    • Add BeautifulSoup4 as a required dependency
    +1/-0     

    Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.
  • @qodo-merge-pro
    Copy link

    qodo-merge-pro bot commented Jun 6, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Error Handling

    The new code raises ValueError exceptions, but it might be better to raise domain-specific exceptions (like IECLoginError) for consistency with the rest of the codebase.

    if not authorize_response.strip().startswith("<!DOCTYPE html>") and not authorize_response.strip().startswith("<html"):
        raise ValueError("Response is not an HTML document")
    
    # B) Use BeautifulSoup to extract the code value
    soup = BeautifulSoup(authorize_response, "html.parser")
    code_input = soup.find("input", {"name": "code"})
    if not code_input:
        raise ValueError("Code input not found in HTML")
    HTML Parser Selection

    The code uses the default HTML parser without specifying a parser type explicitly. Consider specifying a parser like 'html.parser' or 'lxml' for consistent behavior across environments.

    soup = BeautifulSoup(authorize_response, "html.parser")
    code_input = soup.find("input", {"name": "code"})

    @qodo-merge-pro
    Copy link

    qodo-merge-pro bot commented Jun 6, 2025

    CI Feedback 🧐

    (Feedback updated until commit 2c77698)

    A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

    Action: Run Tests and Lint

    Failed stage: Running Lint [❌]

    Failed test name: docker/lint

    Failure summary:

    The action failed during the linting process. The linter found an undefined name error in
    iec_api/login.py at line 25. The code is trying to use random but it's not imported or defined. The
    specific error is: F821 Undefined name random.

    Relevant error logs:
    1:  ##[group]Runner Image Provisioner
    2:  Hosted Compute Agent
    ...
    
    375:  �[36;1mmake docker/lint�[0m
    376:  shell: /usr/bin/bash -e {0}
    377:  ##[endgroup]
    378:  docker compose run "iec-api" poetry run ruff check .
    379:  time="2025-06-06T06:56:27Z" level=warning msg="/home/runner/work/py-iec-api/py-iec-api/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
    380:  Network py-iec-api_default  Creating
    381:  Network py-iec-api_default  Created
    382:  iec_api/login.py:25:17: F821 Undefined name `random`
    383:  |
    384:  23 | APP_REDIRECT_URI = "com.iecrn:/"
    385:  24 | code_verifier, code_challenge = pkce.generate_pkce_pair()
    386:  25 | STATE = "".join(random.choice(string.digits + string.ascii_letters) for _ in range(12))
    387:  |                 ^^^^^^ F821
    388:  26 | IEC_OKTA_BASE_URL = "https://iec-ext.okta.com"
    389:  |
    390:  Found 1 error.
    391:  make: *** [Makefile:41: docker/lint] Error 1
    392:  ##[error]Process completed with exit code 2.
    393:  Post job cleanup.
    

    @qodo-merge-pro
    Copy link

    qodo-merge-pro bot commented Jun 6, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 2c77698

    CategorySuggestion                                                                                                                                    Impact
    Incremental [*]
    Avoid consuming response content

    The params.response.text() is a coroutine that consumes the response body, which
    can only be read once. Calling it here will consume the response, making it
    unavailable for subsequent processing. This could cause errors when the actual
    handler tries to read the response content later.

    iec_api/commons.py [245-248]

     logger.debug(
    -    f"HTTP {params.method} call from {params.url} - Response <{params.response.status}>: \
    -    {await params.response.text()}"
    +    f"HTTP {params.method} call from {params.url} - Response <{params.response.status}>"
     )
    • Apply / Chat
    Suggestion importance[1-10]: 8

    __

    Why: This is a critical issue as calling await params.response.text() in a debug handler consumes the response body, making it unavailable for subsequent processing. This could break application functionality.

    Medium
    Maintain field consistency

    The code field is being changed from a required field with no default value to
    an optional field with a default value of None. This might cause issues if
    existing code relies on this field always having a value. Consider keeping it
    required or adding validation logic if a non-None value is expected in certain
    contexts.

    iec_api/models/response_descriptor.py [14]

    -code: Optional[str] = None
    +code: Optional[str] = field(default=None)
    • Apply / Chat
    Suggestion importance[1-10]: 5

    __

    Why: Valid concern about changing from required to optional field, but the improved_code using field(default=None) is functionally equivalent to = None and doesn't address the underlying compatibility concern.

    Low
    Fix typo in error message

    There's a typo in the error message "Autorize" instead of "Authorize". Fix the
    spelling to ensure consistent and professional error messages.

    iec_api/login.py [50-54]

     # A) Validate that the response is indeed an HTML
     if not authorize_response.strip().startswith("<!DOCTYPE html>") and not authorize_response.strip().startswith(
         "<html"
     ):
    -    raise IECLoginError(-1, "Autorize Response is not an HTML document")
    +    raise IECLoginError(-1, "Authorize Response is not an HTML document")
    • Apply / Chat
    Suggestion importance[1-10]: 4

    __

    Why: Minor typo fix changing "Autorize" to "Authorize" improves code professionalism but has minimal functional impact.

    Low
    • Update

    Previous suggestions

    Suggestions up to commit 00efa2d
    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Add null check

    Add a null check for the extracted code value. If the input element exists but
    has no value attribute, this could return None and cause issues downstream.

    iec_api/login.py [62]

     code = code_input.get("value")
    +if not code:
    +    raise IECLoginError("Code value is empty or missing")
    Suggestion importance[1-10]: 6

    __

    Why: Adding a null check for the code value is good defensive programming that prevents potential issues if the HTML input element has no value attribute. This improves robustness.

    Low
    General
    Use specific error type

    Replace the generic ValueError with the more specific IECLoginError that's
    already imported in this module. This ensures consistent error handling
    throughout the application and provides better context for debugging.

    iec_api/login.py [52-54]

     # A) Validate that the response is indeed an HTML
     if not authorize_response.strip().startswith("<!DOCTYPE html>") and not authorize_response.strip().startswith("<html"):
    -    raise ValueError("Response is not an HTML document")
    +    raise IECLoginError("Response is not an HTML document")
    Suggestion importance[1-10]: 5

    __

    Why: Using IECLoginError instead of ValueError provides better error handling consistency and more specific context for debugging. This is a minor but worthwhile improvement.

    Low

    @GuyKh GuyKh force-pushed the htmlParsing branch 3 times, most recently from 2c77698 to e279b1f Compare June 6, 2025 06:58
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant