get_legistar_content_uris initial media URL extraction inaccurate for some events #145

### Describe the Bug

[In `get_legistar_content_uris`](https://github.com/CouncilDataProject/cdp-scrapers/blob/03d782a47fcea609b29cdf36ce3f1f07ecdd4a3d/cdp_scrapers/legistar_utils.py#L517), the `BeautifulSoup` lookup that populates `extract_url` from a Legistar Event URL (`legistar_ev[LEGISTAR_EV_SITE_URL]`) misses available video links in some circumstances.

### Expected Behavior

The City of Olympia appears to run a pretty standard Legistar implementation, including Granicus-hosted media files, so I was surprised when the stock `get_content_uris` call returned no matches.

Here's [an example Olympia Planning Commission event detail screen](https://olympia.legistar.com/MeetingDetail.aspx?LEGID=2403&GID=218&G=19510D34-31FB-48B8-9C02-4D026953451C) and the corresponding valid "Media" anchor tag:

```html
<a id="ctl00_ContentPlaceHolder1_gridMain_ctl00_ctl06_hypVideo" onclick="window.open('Video.aspx?Mode=Granicus&amp;ID1=1536&amp;ID2=120417&amp;G=19510D34-31FB-48B8-9C02-4D026953451C&amp;Mode2=Video','video');return false;" href="#" style="color:Blue;font-family:Tahoma;font-size:10pt;">Media</a>
```
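For reference, the `onclick` value in that anchor is what carries the video page address. The snippet below is a minimal sketch of resolving it against the event page URL using only the standard library. The helper name `video_page_url` and its regular expression are my own illustration (assumptions), not the code in `cdp_scrapers`.

```python
import html
import re
from typing import Optional
from urllib.parse import urljoin

# Raw onclick value copied from the Olympia "Media" anchor above.
ONCLICK = (
    "window.open('Video.aspx?Mode=Granicus&amp;ID1=1536&amp;ID2=120417"
    "&amp;G=19510D34-31FB-48B8-9C02-4D026953451C&amp;Mode2=Video','video');"
    "return false;"
)

# Event detail page on which the anchor appears.
EVENT_URL = (
    "https://olympia.legistar.com/MeetingDetail.aspx"
    "?LEGID=2403&GID=218&G=19510D34-31FB-48B8-9C02-4D026953451C"
)


def video_page_url(onclick: str, base_url: str) -> Optional[str]:
    """Hypothetical helper: resolve the window.open() target against base_url."""
    # Decode HTML entities (&amp; -> &) before looking for the quoted target.
    match = re.search(r"window\.open\('([^']+)'", html.unescape(onclick))
    if match is None:
        return None
    return urljoin(base_url, match.group(1))


print(video_page_url(ONCLICK, EVENT_URL))
```

The result is the Legistar `Video.aspx` page for the event, not necessarily the final Granicus media file.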

I identified three potential issues which could be addressed while hopefully not impacting existing matches in the wild.  Here's the operative CDP code:

```python
    extract_url = soup.find(
        "a",
        id=re.compile(r"ct\S*_ContentPlaceHolder\S*_hypVideo"),
        class_="videolink",
    )
    if extract_url is None:
        return (ContentUriScrapeResult.Status.UnrecognizedPatternError, None)
    # the <a> tag will not have this attribute if there is no video
    if "onclick" not in extract_url.attrs:
        return (ContentUriScrapeResult.Status.ContentNotProvidedError, None)
```

1. **`videolink` class** - City of Olympia Media links do not have a `videolink` class assigned. Is this class required to differentiate links on other Legistar instances, or is the highly specific ID enough?
2. **`find` only identifies the first Media link instance** - In the example provided, the first Media link is not associated with a video, so the lookup fails for the entire event. You could do a `find_all` and iterate through the results, but a different approach might be...
3. **`onclick` is a distinguishing attribute** - The current code checks for `onclick` only after the lookup, in order to return a distinct error. We could instead require the `onclick` attribute in the lookup itself, so that the first match is already a valid Media link.

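Here's how I suggest modifying the code. Note that with this change, an event whose Media links all lack `onclick` would return `UnrecognizedPatternError` rather than `ContentNotProvidedError`: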

```python
    extract_url = soup.find(
        "a",
        id=re.compile(r"ct\S*_ContentPlaceHolder\S*_hypVideo"),
        onclick=True,
    )
    if extract_url is None:
        return (ContentUriScrapeResult.Status.UnrecognizedPatternError, None)
```
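To illustrate the difference between the two filters, here is a small self-contained sketch. The HTML is simplified stand-in markup that I wrote for illustration (an assumption, not the real Olympia page): the first `hypVideo` anchor has no `onclick` and neither anchor has the `videolink` class. The `outcome` strings are placeholders for the `ContentUriScrapeResult.Status` values, and the fallback lookup is just one possible way to keep "no video provided" distinguishable from "pattern not recognized".

```python
import re

from bs4 import BeautifulSoup

# Simplified stand-in markup (assumption): two Media anchors, only the
# second one carries an onclick attribute.
HTML = """
<table>
  <tr><td><a id="ctl00_ContentPlaceHolder1_hypVideo" style="color:Gray;">Media</a></td></tr>
  <tr><td><a id="ctl00_ContentPlaceHolder1_gridMain_ctl00_ctl06_hypVideo"
         onclick="window.open('Video.aspx?Mode=Granicus&amp;ID1=1536','video');return false;"
         href="#">Media</a></td></tr>
</table>
"""

VIDEO_ID = re.compile(r"ct\S*_ContentPlaceHolder\S*_hypVideo")
soup = BeautifulSoup(HTML, "html.parser")

# Current filter: requires class="videolink", which this markup lacks.
current = soup.find("a", id=VIDEO_ID, class_="videolink")

# Proposed filter: first matching anchor that has an onclick attribute.
proposed = soup.find("a", id=VIDEO_ID, onclick=True)

# Optional fallback: is there any Media anchor at all, with or without onclick?
any_media_link = soup.find("a", id=VIDEO_ID)

if proposed is not None:
    outcome = "found"
elif any_media_link is not None:
    outcome = "content-not-provided"
else:
    outcome = "unrecognized-pattern"

print(current, proposed["id"] if proposed else None, outcome)
```

On this markup the current filter finds nothing, while the proposed filter skips the first anchor and selects the second.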

### Reproduction

You can see where the Event Gather workflow is failing on the `cdp-usa-wa-olympia` instance in the run linked here. The log does not point to this issue specifically, but this is the next hiccup:
https://github.com/CannObserv/cdp-usa-wa-city-olympia/actions/runs/6999433306/job/19038863304

If this change isn't apt to break anything, I'd much rather change things here than have to derive a dedicated scraper class (at least not yet) and then override `get_legistar_content_uris` in that file.  I'm not sure how to get the Python import hierarchy to respect an override otherwise.
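On the override question, one general Python behavior may be relevant: a function looks up the other module-level names it calls in its own module's globals at call time, so rebinding the attribute on the module object changes what later calls resolve to. The sketch below demonstrates this with a throwaway module. `fake_legistar_utils`, `get_uris`, and `scrape` are hypothetical stand-ins, and I have not checked how `cdp_scrapers` actually wires its calls, so treat this as an assumption to verify rather than a recommendation.

```python
import types

# Hypothetical stand-in for a library module; not the real cdp_scrapers code.
lib = types.ModuleType("fake_legistar_utils")
exec(
    '''
def get_uris(event):
    return "library result"

def scrape(event):
    # Resolves get_uris from this module's globals each time it is called.
    return get_uris(event)
''',
    lib.__dict__,
)


def patched_get_uris(event):
    """Replacement used only for this demonstration."""
    return "patched result"


before = lib.scrape({})

# Rebinding the module attribute changes what scrape() resolves from now on.
lib.get_uris = patched_get_uris
after = lib.scrape({})

print(before, "->", after)
```

A caveat with this approach: code that ran `from module import name` before the rebinding keeps its own reference to the original function, which is one reason a subclass override is usually the cleaner route.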

### Environment



-   OS Version: _[e.g. macOS 11.3.1]_
-   cdp-scrapers Version: _[e.g. 0.5.0]_
