From 7370181b2b8682e6325062a69f7e2e98b5847a24 Mon Sep 17 00:00:00 2001
From: Peter Tanner
Date: Sat, 8 Feb 2025 03:07:05 +0800
Subject: [PATCH 1/6] Add vtt subtitle download `-s`
---
.gitignore | 2 ++
README.md | 33 +++++++++++++++++++++++++--------
echo360/course.py | 5 +++--
echo360/main.py | 14 ++++++++++++--
echo360/videos.py | 29 ++++++++++++++++++++++++++---
5 files changed, 68 insertions(+), 15 deletions(-)
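Note on the media-id lookup added to `download_single`: it takes the first entry of `video_json["lesson"]["medias"]` whose `id` occurs in the feed URL and builds the transcript URL from it. The lookup can be tried without a browser session. A minimal sketch, assuming the JSON shape used in this patch; `find_media_id` and `build_vtt_url` are hypothetical helper names (not functions added by the commit), and the sample ids and hostnames are made up.

```python
def find_media_id(medias, feed_url):
    """Return the first media id that appears in feed_url, or None."""
    for media in medias:
        if media["id"] in feed_url:
            return media["id"]
    return None


def build_vtt_url(hostname, lesson_id, media_id):
    """Build the transcript URL in the same shape the patch uses."""
    return (
        f"{hostname}/api/ui/echoplayer/lessons/{lesson_id}"
        f"/medias/{media_id}/transcript-file?format=vtt"
    )


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    medias = [{"id": "aaa-111"}, {"id": "bbb-222"}]
    feed = "https://content.example.invalid/0000/bbb-222/s1_av.m3u8"
    media_id = find_media_id(medias, feed)
    print(build_vtt_url("https://echo360.example.invalid", "lesson-1", media_id))
```

Returning `None` instead of indexing `[0]` avoids the `IndexError` path when no media id matches the feed URL.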
diff --git a/.gitignore b/.gitignore
index 4284229..e315d54 100644
--- a/.gitignore
+++ b/.gitignore
@@ -59,3 +59,5 @@ target/
bin/
default_out_path/
_browser_user_data_dir/
+
+.vscode/
\ No newline at end of file
diff --git a/README.md b/README.md
index 21795a5..3664cee 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ See it in action:
-**NEWS:** It now works with `echo360.org` platform as well. Special thanks to [*@cloudrac3r*](https://github.com/cloudrac3r) and *Emma* for their kind offering of providing sources and helped debugging it. Read [FAQ](#echo360-cloud) for details.
+**NEWS:** It now works with the `echo360.org` platform as well. Special thanks to [_@cloudrac3r_](https://github.com/cloudrac3r) and _Emma_ for kindly providing sources and helping to debug it. Read the [FAQ](#echo360-cloud) for details.
# Getting Started
@@ -44,7 +44,7 @@ echo360-downloader COURSE_URL # where COURSE_URL is your course url
### Optional
-- ffmpeg (for transcoding ts file to mp4 file) See [here (windows)](https://www.easytechguides.com/install-ffmpeg/) or [here](https://github.com/adaptlearning/adapt_authoring/wiki/Installing-FFmpeg) for a brief instructions of installing it in different OS.
+- ffmpeg (for transcoding ts files to mp4). See [here (Windows)](https://www.easytechguides.com/install-ffmpeg/) or [here](https://github.com/adaptlearning/adapt_authoring/wiki/Installing-FFmpeg) for brief instructions on installing it on different operating systems.
## Manual
@@ -62,9 +62,9 @@ python echo360.py
### Operating System
-- Linux
-- OS X
-- Windows
+- Linux
+- OS X
+- Windows
# Usage
@@ -78,6 +78,7 @@ python echo360.py
```
### Script args
+
```
>>> usage: echo360.py [-h] [--output OUTPUT_PATH]
[--after-date AFTER_DATEYYYY-MM-DD)]
@@ -135,6 +136,7 @@ optional arguments:
downloader will also try to download the second
video, which could be the alternative feed. Might
only work on some 'echo360.org' hosts.
+ --subtitles, -s Download VTT subtitles for each video feed.
--debug Enable extensive logging.
--auto Only effective for 'echo360.org' host. When set, this
script will attempts to automatically redirects after
@@ -145,6 +147,7 @@ optional arguments:
default behaviour and exists only for backward
compatibility reason.
```
+
# Examples
```shell
@@ -208,11 +211,15 @@ This is first built for the echo system in the University of Sydney, and then va
```shell
https://$(hostname)/ess/portal/section/$(UUID)
```
+
or
+
```shell
https://echo360.org[.xx]/
```
+
or with a dot net variant
+
```shell
https://echo360.net[.xx]/
```
@@ -252,34 +259,38 @@ Echo360 cloud refers to websites in the format of `https://echo360.org[.xx]`. Th
This method requires you to setup SSO credentials, therefore, it needs to open up a browser for you to setup your own university's SSO credentials.
To download videos, run:
+
```shell
./run.sh https://echo360.[.xx]/section/$(UUID)/home
```
-where `[.xx]` is an optional country flag specific to your echo360 platform and `$(UUID)` is the unique identifier for your course. This should the url that you can retrieve from your course's *main page* like the following.
+
+where `[.xx]` is an optional country flag specific to your echo360 platform and `$(UUID)` is the unique identifier for your course. This should be the URL that you can retrieve from your course's _main page_, like the following.
Note that this implies `setup-credential` option and will use chrome-webdriver by default. If you don't have chrome or prefer to use firefox, run it with the ` --firefox` flag like so:
+
```shell
./run.sh https://echo360.[.xx]/section/$(UUID)/home --firefox
```
After running the command, it will opens up a browser instance, most likely with a login page. You should then login with your student's credentials like what you would normally do. After you have successfully logged in, the module should automatically redirects you and continues. If the script hangs (e.g. failed to recognises that you have logged in), feel free to let me know.
-
### I'm not sure of how to run it?
First, you'd need to install [Python](https://www.python.org/downloads/) in your system. Then, you can follow the youtube tutorial videos to get an idea of how to use the module.
- For [Windows users](https://www.youtube.com/watch?v=Lv1wtjnCcwI) (and showcased how to retrieve actual echo360 course url)
-[](https://www.youtube.com/watch?v=Lv1wtjnCcwI)
+ [](https://www.youtube.com/watch?v=Lv1wtjnCcwI)
### My credentials does not work?
You can setup any credentials need with manually logging into websites, by running the script with:
+
```sh
./run.sh ECHO360_URL --setup-credential
```
+
This will open up a chrome instance that allows you to log into your website as you normally do. Afterwards, simply type 'continue' into your shell and press enter to continue to proceeds with the rest of the script.
### My credentials does not work (echo360.org)?
@@ -287,21 +298,27 @@ This will open up a chrome instance that allows you to log into your website as
For echo360.org, the default behaviour is it will always require you to setup-credentials, and the module will automatically detect login token and proceed the download process. For some institutions, this seems to be not sufficient (#29).
You can disable such behaviour with
+
```sh
./run.sh ECHO360_ORG_URL --manual
```
+
for manual setup; and once you had logged in, type
+
```sh
continue
```
+
in your terminal to continue.
### How do I download only individual video(s)?
You are in luck! It is now possible to pick a subset of videos to download from (instead of needing to download everything like before). Just pass the interactive argument like this:
+
```sh
./run.sh ECHO360_URL --interactive # or ./run.sh ECHO360_URL -i
```
+
...and it shall presents an interactive screen for you to pick each individual video(s) that you want to download, like the screenshot as shown below.
diff --git a/echo360/course.py b/echo360/course.py
index 8e95795..49b2b26 100644
--- a/echo360/course.py
+++ b/echo360/course.py
@@ -12,13 +12,14 @@
class EchoCourse(object):
- def __init__(self, uuid, hostname=None, alternative_feeds=False):
+ def __init__(self, uuid, hostname=None, alternative_feeds=False, subtitles=False):
self._course_id = None
self._course_name = None
self._uuid = uuid
self._videos = None
self._driver = None
self._alternative_feeds = alternative_feeds
+ self._subtitles = subtitles
if hostname is None:
self._hostname = "https://view.streaming.sydney.edu.au:8443"
else:
@@ -139,7 +140,7 @@ def get_videos(self):
course_data_json = self._get_course_data()
videos_json = course_data_json["data"]
self._videos = EchoCloudVideos(
- videos_json, self._driver, self.hostname, self._alternative_feeds
+ videos_json, self._driver, self.hostname, self._alternative_feeds, self._subtitles
)
# except KeyError as e:
# print("Unable to parse course videos from JSON (course_data)")
diff --git a/echo360/main.py b/echo360/main.py
index 95bd9a4..a2ca8cc 100644
--- a/echo360/main.py
+++ b/echo360/main.py
@@ -162,6 +162,14 @@ def handle_args():
the second video, which could be the alternative feed. Might only work on \
some 'echo360.org' hosts.",
)
+ parser.add_argument(
+ "--subtitles",
+ "-s",
+ action="store_true",
+ default=False,
+ dest="subtitles",
+ help="Download VTT subtitles for each video feed.",
+ )
parser.add_argument(
"--debug",
action="store_true",
@@ -253,6 +261,7 @@ def handle_args():
args["alternative_feeds"],
args["echo360cloud"],
args["persistent_session"],
+ args["subtitles"],
)
@@ -274,6 +283,7 @@ def main():
alternative_feeds,
usingEcho360Cloud,
persistent_session,
+ subtitles,
) = handle_args()
setup_logging(enable_degbug)
@@ -350,7 +360,7 @@ def cmd_exists(x):
course_uuid = re.search(
"[^/]([0-9a-zA-Z]+[-])+[0-9a-zA-Z]+", course_url
).group() # retrieve the last part of the URL
- course = EchoCloudCourse(course_uuid, course_hostname, alternative_feeds)
+ course = EchoCloudCourse(course_uuid, course_hostname, alternative_feeds, subtitles=subtitles)
else:
# import it here for monkey patching gevent, to fix the followings:
# MonkeyPatchWarning: Monkey-patching ssl after ssl has already been
@@ -360,7 +370,7 @@ def cmd_exists(x):
course_uuid = re.search(
"[^/]+(?=/$|$)", course_url
).group() # retrieve the last part of the URL
- course = EchoCourse(course_uuid, course_hostname)
+ course = EchoCourse(course_uuid, course_hostname, subtitles=subtitles)
downloader = EchoDownloader(
course,
output_path,
diff --git a/echo360/videos.py b/echo360/videos.py
index 1efe0e0..1f2c136 100644
--- a/echo360/videos.py
+++ b/echo360/videos.py
@@ -187,7 +187,7 @@ def get_all_parts(self):
class EchoCloudVideos(EchoVideos):
def __init__(
- self, videos_json, driver, hostname, alternative_feeds, skip_video_on_error=True
+ self, videos_json, driver, hostname, alternative_feeds, subtitles, skip_video_on_error=True
):
assert videos_json is not None
self._driver = driver
@@ -199,7 +199,7 @@ def __init__(
try:
self._videos.append(
EchoCloudVideo(
- video_json, self._driver, hostname, alternative_feeds
+ video_json, self._driver, hostname, alternative_feeds, subtitles
)
)
except Exception:
@@ -219,13 +219,14 @@ class EchoCloudVideo(EchoVideo):
def video_url(self):
return "{}/lesson/{}/classroom".format(self.hostname, self.video_id)
- def __init__(self, video_json, driver, hostname, alternative_feeds):
+ def __init__(self, video_json, driver, hostname, alternative_feeds, subtitles):
self.hostname = hostname
self._driver = driver
self.video_json = video_json
self.is_multipart_video = False
self.sub_videos = [self]
self.download_alternative_feeds = alternative_feeds
+ self.download_subtitles = subtitles
if "lessons" in video_json:
# IS a multi-part lesson.
self.sub_videos = [
@@ -330,6 +331,28 @@ def download_single(self, session, single_url, output_dir, filename, pool_size):
# NOW we can finally start downloading!
from .hls_downloader import urljoin
+ if self.download_subtitles:
+ # hacky way to get the current url media id
+ # not sure if each feed can have a different media id, so better download it for every feed.
+ try:
+ media_id = [media["id"] for media in self.video_json['lesson']['medias'] if media["id"] in single_url][0]
+ except IndexError:
+ media_id = None
+ if media_id is not None:
+ print(" > Downloading subtitles:")
+ vtt_url = f"{self.hostname}/api/ui/echoplayer/lessons/{self.video_id}/medias/{media_id}/transcript-file?format=vtt"
+ cookies = {cookie['name']: cookie['value'] for cookie in self._driver.get_cookies()}
+ response = requests.get(vtt_url, cookies=cookies)
+ if response.status_code == 200:
+ head = requests.head(vtt_url, cookies=cookies)
+ if head.status_code == 200:
+ print(f"Original subtitle name: {head.headers['Content-Disposition']}")
+ # Use same filename as mp4 since VLC will automatically use a vtt if the filename matches.
+ with open(os.path.join(output_dir, f"{filename}.vtt"), "wb") as file:
+ file.write(response.content)
+ else:
+ print("No subtitles found.")
+
audio_file = None
if m3u8_audio is not None:
print(" > Downloading audio:")
From 2c71d8e3e8831ed3635064505ba0a528bc01f3d6 Mon Sep 17 00:00:00 2001
From: Peter Tanner
Date: Mon, 10 Feb 2025 17:33:09 +0800
Subject: [PATCH 2/6] Add retry on error
---
.gitignore | 3 +-
echo360/hls_downloader.py | 93 +++++++++++++++++++++------------------
2 files changed, 53 insertions(+), 43 deletions(-)
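Note on the retry loop: the playlist request is repeated with a wait that grows by one minute per attempt (`60 * try_n` seconds) and never gives up on its own. The same pattern in isolation, as a minimal sketch: `fetch_with_linear_backoff` is a hypothetical name, and `fetch` stands in for `self.session.get(m3u8_url, timeout=10)`. The `sleep` parameter is injectable only so the behaviour can be checked without actually waiting.

```python
import time
from itertools import count


def fetch_with_linear_backoff(fetch, sleep=time.sleep, base_seconds=60):
    """Call fetch() until it returns an ok result, waiting longer each time.

    fetch is any zero-argument callable returning an object with `ok` and
    `status_code` attributes (for example a requests.Response).
    """
    for try_n in count(start=1):
        response = fetch()
        if response.ok:
            return response
        wait = base_seconds * try_n  # 60 s, 120 s, 180 s, ...
        print(
            "Failed status code: {}, try {}, waiting {} seconds.".format(
                response.status_code, try_n, wait
            )
        )
        sleep(wait)
```

Like the loop in the patch, this sketch retries forever; a caller that needs an upper bound would have to add one.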
diff --git a/.gitignore b/.gitignore
index e315d54..bf15ceb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -60,4 +60,5 @@ bin/
default_out_path/
_browser_user_data_dir/
-.vscode/
\ No newline at end of file
+.vscode/
+_browser_persistent_session/
diff --git a/echo360/hls_downloader.py b/echo360/hls_downloader.py
index 2b77e94..57c4f42 100644
--- a/echo360/hls_downloader.py
+++ b/echo360/hls_downloader.py
@@ -1,3 +1,4 @@
+from itertools import count
import ffmpy
import gevent
from gevent.pool import Pool
@@ -78,49 +79,57 @@ def run(self, m3u8_url, dir="", convert_to_mp4=True):
self.dir = dir
if self.dir and not os.path.isdir(self.dir):
os.makedirs(self.dir)
- r = self.session.get(m3u8_url, timeout=10)
- if r.ok:
- body = r.content
- if body:
- # use set to prevent duplicates
- ts_list = {
- urljoin(m3u8_url, n.strip())
- for n in body.decode().split("\n")
- if n and not n.startswith("#")
- }
- ts_list = list(ts_list)
- # this is very hacky as well.. But idk how to overcome some m3u8 has nested
- # m3u8 and some don't.
- if len(ts_list) == 1 and ts_list[0].split(".")[-1] not in (
- "ts",
- "mp4",
- "m4s",
- ):
- file_name = ts_list[0].split("/")[-1].split("?")[0]
- chunk_list_url = "{0}/{1}".format(
- m3u8_url[: m3u8_url.rfind("/")], file_name
+ for try_n in count(start=1):
+ r = self.session.get(m3u8_url, timeout=10)
+ if r.ok:
+ body = r.content
+ if body:
+ # use set to prevent duplicates
+ ts_list = {
+ urljoin(m3u8_url, n.strip())
+ for n in body.decode().split("\n")
+ if n and not n.startswith("#")
+ }
+ ts_list = list(ts_list)
+ # this is very hacky as well.. But idk how to overcome some m3u8 has nested
+ # m3u8 and some don't.
+ if len(ts_list) == 1 and ts_list[0].split(".")[-1] not in (
+ "ts",
+ "mp4",
+ "m4s",
+ ):
+ file_name = ts_list[0].split("/")[-1].split("?")[0]
+ chunk_list_url = "{0}/{1}".format(
+ m3u8_url[: m3u8_url.rfind("/")], file_name
+ )
+ r = self.session.get(chunk_list_url, timeout=20)
+ if r.ok:
+ body = r.content
+ ts_list = [
+ urljoin(m3u8_url, n.strip())
+ for n in body.decode().split("\n")
+ if n and not n.startswith("#")
+ ]
+ # re-retrieve to get all ts file list
+
+ ts_list = zip(ts_list, [n for n in range(len(ts_list))])
+ ts_list = list(ts_list)
+
+ if ts_list:
+ self.ts_total = len(ts_list)
+ self.ts_current = 0
+ g1 = gevent.spawn(self._join_file)
+ self._download(ts_list)
+ g1.join()
+ break
+ else:
+ print(
+ "Failed status code: {}, try {}, waiting {} minutes. Ctrl+C to cancel".format(
+ r.status_code, try_n, try_n
)
- r = self.session.get(chunk_list_url, timeout=20)
- if r.ok:
- body = r.content
- ts_list = [
- urljoin(m3u8_url, n.strip())
- for n in body.decode().split("\n")
- if n and not n.startswith("#")
- ]
- # re-retrieve to get all ts file list
-
- ts_list = zip(ts_list, [n for n in range(len(ts_list))])
- ts_list = list(ts_list)
-
- if ts_list:
- self.ts_total = len(ts_list)
- self.ts_current = 0
- g1 = gevent.spawn(self._join_file)
- self._download(ts_list)
- g1.join()
- else:
- print("Failed status code: {}".format(r.status_code))
+ )
+ time.sleep(60 * try_n)
+
infile_name = os.path.join(
self.dir,
self._result_file_name.split(".")[0]
From 13c5eb69d359b11c4a2f982046edd4d700ad0748 Mon Sep 17 00:00:00 2001
From: Peter Tanner
Date: Mon, 10 Feb 2025 19:33:23 +0800
Subject: [PATCH 3/6] Handle groups/folders in courses and strip illegal
filenames
---
echo360/utils.py | 41 +++++++++++++++++++++++++++++++++++
echo360/videos.py | 55 +++++++++++++++++++++++++++++++++++++++++------
2 files changed, 90 insertions(+), 6 deletions(-)
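Note on the group traversal: `EchoCloudVideos.__init__` walks the course JSON with an explicit stack, treating any item whose `type` contains "lesson" as a video and every other item as a group with a `groupInfo.name` and a nested `lessons` list. A minimal sketch of that walk on plain data, assuming the same JSON shape. `collect_lessons` is a hypothetical name, `posixpath` is used instead of `os.path` only to keep the output platform-independent, and the default `sanitize` is a simplified stand-in for `strip_illegal_path`.

```python
import posixpath


def collect_lessons(course_json, sanitize=lambda name: name.replace("/", "_")):
    """Flatten nested groups into a list of (folder_path, lesson_item) pairs."""
    pending = [("", course_json)]
    found = []
    while pending:
        path, items = pending.pop()
        for item in items:
            if not isinstance(item, dict):
                continue
            if "lesson" in item["type"].lower():
                found.append((path, item))
            else:
                # Anything that is not a lesson is assumed to be a group.
                group_name = sanitize(item["groupInfo"]["name"])
                pending.append((posixpath.join(path, group_name), item["lessons"]))
    return found
```

Because `pending.pop()` takes the most recently added group first, lessons from sibling groups are not guaranteed to come out in course order.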
diff --git a/echo360/utils.py b/echo360/utils.py
index 29ea025..9bddcfe 100644
--- a/echo360/utils.py
+++ b/echo360/utils.py
@@ -6,4 +6,45 @@ def naive_versiontuple(v):
return tuple(map(int, (v.split("."))))
+def strip_illegal_path(path: str) -> str:
+ illegal_chars = '<>:"/\\|?*' + "".join(chr(c) for c in range(0, 32))
+ for ch in illegal_chars:
+ path = path.replace(ch, "_")
+
+ reserved_names = {
+ "CON",
+ "PRN",
+ "AUX",
+ "NUL",
+ "COM1",
+ "COM2",
+ "COM3",
+ "COM4",
+ "COM5",
+ "COM6",
+ "COM7",
+ "COM8",
+ "COM9",
+ "LPT1",
+ "LPT2",
+ "LPT3",
+ "LPT4",
+ "LPT5",
+ "LPT6",
+ "LPT7",
+ "LPT8",
+ "LPT9",
+ }
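+ # Strip trailing dots/spaces first so names such as "..." collapse to "".
+ path = path.rstrip(" .")
+
+ # Windows reserves these names even with extensions (e.g. "NUL.tar.gz").
+ if path.split(".", 1)[0].upper() in reserved_names:
+ path = f"_{path}"
+ if not path:
+ path = "_"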
+
+ return path
+
+
PERSISTENT_SESSION_FOLDER = "_browser_persistent_session"
diff --git a/echo360/videos.py b/echo360/videos.py
index 1f2c136..d7954de 100644
--- a/echo360/videos.py
+++ b/echo360/videos.py
@@ -16,6 +16,7 @@
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
+from .utils import strip_illegal_path
from .hls_downloader import Downloader
from .naive_m3u8_parser import NaiveM3U8Parser
@@ -187,11 +188,39 @@ def get_all_parts(self):
class EchoCloudVideos(EchoVideos):
def __init__(
- self, videos_json, driver, hostname, alternative_feeds, subtitles, skip_video_on_error=True
+ self,
+ course_json,
+ driver,
+ hostname,
+ alternative_feeds,
+ subtitles,
+ skip_video_on_error=True,
):
- assert videos_json is not None
+ assert course_json is not None
self._driver = driver
self._videos = []
+
+ # Traverse groups/folders
+ queue = [("", course_json)]
+ videos_json = []
+ # Known item types are 'SyllabusLessonType' and 'SyllabusGroupType'; any non-lesson item is treated as a group.
+ while queue:
+ path, items = queue.pop()
+ for item in items:
+ if isinstance(item, dict):
+ if "lesson" in item["type"].lower():
+ item["path_prefix"] = path
+ videos_json.append(item)
+ else:
+ queue.append(
+ (
+ os.path.join(
+ path, strip_illegal_path(item["groupInfo"]["name"])
+ ),
+ item["lessons"],
+ )
+ )
+
total_videos_num = len(videos_json)
update_course_retrieval_progress(0, total_videos_num)
@@ -222,6 +251,7 @@ def video_url(self):
def __init__(self, video_json, driver, hostname, alternative_feeds, subtitles):
self.hostname = hostname
self._driver = driver
+ self._path_prefix = video_json["path_prefix"]
self.video_json = video_json
self.is_multipart_video = False
self.sub_videos = [self]
@@ -261,6 +291,7 @@ def __init__(self, video_json, driver, hostname, alternative_feeds, subtitles):
self._title = video_json["lesson"]["lesson"]["name"]
def download(self, output_dir, filename, pool_size=50):
+ output_dir = os.path.join(output_dir, self._path_prefix)
print("")
print("-" * 60)
print('Downloading "{}"'.format(filename))
@@ -297,6 +328,7 @@ def download(self, output_dir, filename, pool_size=50):
return final_result
def download_single(self, session, single_url, output_dir, filename, pool_size):
+ filename = strip_illegal_path(filename)
if os.path.exists(os.path.join(output_dir, filename + ".mp4")):
print(" > Skipping downloaded video")
print("-" * 60)
@@ -335,20 +367,31 @@ def download_single(self, session, single_url, output_dir, filename, pool_size):
# hacky way to get the current url media id
# not sure if each feed can have a different media id, so better download it for every feed.
try:
- media_id = [media["id"] for media in self.video_json['lesson']['medias'] if media["id"] in single_url][0]
+ media_id = [
+ media["id"]
+ for media in self.video_json["lesson"]["medias"]
+ if media["id"] in single_url
+ ][0]
except IndexError:
media_id = None
if media_id is not None:
print(" > Downloading subtitles:")
vtt_url = f"{self.hostname}/api/ui/echoplayer/lessons/{self.video_id}/medias/{media_id}/transcript-file?format=vtt"
- cookies = {cookie['name']: cookie['value'] for cookie in self._driver.get_cookies()}
+ cookies = {
+ cookie["name"]: cookie["value"]
+ for cookie in self._driver.get_cookies()
+ }
response = requests.get(vtt_url, cookies=cookies)
if response.status_code == 200:
head = requests.head(vtt_url, cookies=cookies)
if head.status_code == 200:
- print(f"Original subtitle name: {head.headers['Content-Disposition']}")
+ print(
+ f"Original subtitle name: {head.headers['Content-Disposition']}"
+ )
# Use same filename as mp4 since VLC will automatically use a vtt if the filename matches.
- with open(os.path.join(output_dir, f"{filename}.vtt"), "wb") as file:
+ with open(
+ os.path.join(output_dir, f"{filename}.vtt"), "wb"
+ ) as file:
file.write(response.content)
else:
print("No subtitles found.")
From a70245c6673ac76285576a349a73336d9be18161 Mon Sep 17 00:00:00 2001
From: Peter Tanner
Date: Tue, 11 Feb 2025 22:14:11 +0800
Subject: [PATCH 4/6] Add `--dump-json`, fix course names, don't re-download
subtitles
---
echo360/course.py | 44 ++++++++++++++++++----------
echo360/downloader.py | 30 ++++++++++++++-----
echo360/main.py | 14 ++++++++-
echo360/videos.py | 68 ++++++++++++++++++++++---------------------
4 files changed, 100 insertions(+), 56 deletions(-)
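Note on the new `course_name`: it reads the `/user/enrollments` response, looks for the `userSections` entry whose `sectionId` equals the course UUID, and formats `courseCode - sectionName courseName`. The matching step, separated from the HTTP call, as a minimal sketch: `course_name_from_enrollments` is a hypothetical name, the key names are the ones this patch reads, and the sketch leaves out the `strip_illegal_path` call that the patch applies to the result.

```python
def course_name_from_enrollments(enrollments, section_uuid, default="[[UNTITLED]]"):
    """Return 'CODE - SECTION NAME' for the matching section, else default."""
    for term in enrollments.get("data", []):
        for section in term.get("userSections", []):
            if section.get("sectionId") == section_uuid:
                return "{} - {} {}".format(
                    section["courseCode"],
                    section["sectionName"],
                    section["courseName"],
                )
    return default
```

Keeping the lookup as a pure function makes the fallback to `[[UNTITLED]]` easy to check with a hand-written dictionary.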
diff --git a/echo360/course.py b/echo360/course.py
index 49b2b26..d581357 100644
--- a/echo360/course.py
+++ b/echo360/course.py
@@ -1,11 +1,12 @@
+import functools
import json
-import re
import sys
import requests
import selenium
import logging
+from .utils import strip_illegal_path
from .videos import EchoVideos, EchoCloudVideos
_LOGGER = logging.getLogger(__name__)
@@ -140,7 +141,11 @@ def get_videos(self):
course_data_json = self._get_course_data()
videos_json = course_data_json["data"]
self._videos = EchoCloudVideos(
- videos_json, self._driver, self.hostname, self._alternative_feeds, self._subtitles
+ videos_json,
+ self._driver,
+ self.hostname,
+ self._alternative_feeds,
+ self._subtitles,
)
# except KeyError as e:
# print("Unable to parse course videos from JSON (course_data)")
@@ -174,20 +179,29 @@ def course_id(self):
return self._course_id
@property
+ @functools.lru_cache(maxsize=None)
def course_name(self):
- if self._course_name is None:
- # try each available video as some video might be special has contains
- # no information about the course.
- for v in self.course_data["data"]:
- try:
- self._course_name = v["lesson"]["video"]["published"]["courseName"]
- break
- except KeyError:
- pass
- if self._course_name is None:
- # no available course name found...?
- self._course_name = "[[UNTITLED]]"
- return self._course_name
+ cookies = {
+ cookie["name"]: cookie["value"] for cookie in self._driver.get_cookies()
+ }
+ response = requests.get(
+ "{}/user/enrollments".format(self.hostname), cookies=cookies
+ )
+ if response.status_code == 200:
+ course_list = response.json()["data"]
+ for sections_parts in course_list:
+ matching = [
+ x
+ for x in sections_parts["userSections"]
+ if x["sectionId"] == self._uuid
+ ]
+ if len(matching) > 0:
+ course = matching[0]
+ return strip_illegal_path(
+ f"{course['courseCode']} - {course['sectionName']} {course['courseName']}"
+ )
+
+ return "[[UNTITLED]]"
@property
def nice_name(self):
diff --git a/echo360/downloader.py b/echo360/downloader.py
index 0a17fd2..e94d222 100644
--- a/echo360/downloader.py
+++ b/echo360/downloader.py
@@ -1,3 +1,5 @@
+from datetime import datetime
+import json
import dateutil.parser
import os
import sys
@@ -6,7 +8,7 @@
from .course import EchoCloudCourse
from .echo_exceptions import EchoLoginError
-from .utils import naive_versiontuple, PERSISTENT_SESSION_FOLDER
+from .utils import naive_versiontuple, PERSISTENT_SESSION_FOLDER, strip_illegal_path
import pip_ensure_version
from pick import pick
@@ -191,6 +193,7 @@ def __init__(
webdriver_to_use="phantomjs",
interactive_mode=False,
persistent_session=False,
+ dump_json=False,
):
self._course = course
root_path = os.path.dirname(os.path.abspath(sys.modules["__main__"].__file__))
@@ -200,6 +203,7 @@ def __init__(
self._date_range = date_range
self._username = username
self._password = password
+ self._dump_json = dump_json
self.interactive_mode = interactive_mode
self.regex_replace_invalid = re.compile(r"[\\\\/:*?\"<>|]")
@@ -342,14 +346,26 @@ def download_all(self):
self.login()
sys.stdout.write(">> Retrieving echo360 Course Info... ")
sys.stdout.flush()
- videos = self._course.get_videos().videos
- print("Done!")
+
# change the output directory to be inside a folder named after the course
- self._output_dir = os.path.join(
- self._output_dir, "{0}".format(self._course.nice_name).strip()
- )
# replace invalid character for folder
- self.regex_replace_invalid.sub("_", self._output_dir)
+ if isinstance(self._course, EchoCloudCourse):
+ self._output_dir = os.path.join(
+ self._output_dir,
+ "{0}".format(self._course.nice_name).strip(),
+ )
+ if self._output_dir and not os.path.isdir(self._output_dir):
+ os.makedirs(self._output_dir)
+ if self._dump_json:
+ dump_json_path = os.path.join(
+ self._output_dir,
+ f"course_{datetime.now().replace(microsecond=0).isoformat().replace(':','_')}.json",
+ )
+ with open(dump_json_path, "w") as f:
+ f.write(json.dumps(self._course._get_course_data()))
+
+ videos = self._course.get_videos().videos
+ print("Done!")
filtered_videos = [video for video in videos if self._in_date_range(video.date)]
videos_to_be_download = []
diff --git a/echo360/main.py b/echo360/main.py
index a2ca8cc..680a6bf 100644
--- a/echo360/main.py
+++ b/echo360/main.py
@@ -170,6 +170,13 @@ def handle_args():
dest="subtitles",
help="Download VTT subtitles for each video feed.",
)
+ parser.add_argument(
+ "--dump-json",
+ action="store_true",
+ default=False,
+ dest="dump_json",
+ help="Download JSON representation of course to output directory.",
+ )
parser.add_argument(
"--debug",
action="store_true",
@@ -262,6 +269,7 @@ def handle_args():
args["echo360cloud"],
args["persistent_session"],
args["subtitles"],
+ args["dump_json"],
)
@@ -284,6 +292,7 @@ def main():
usingEcho360Cloud,
persistent_session,
subtitles,
+ dump_json,
) = handle_args()
setup_logging(enable_degbug)
@@ -360,7 +369,9 @@ def cmd_exists(x):
course_uuid = re.search(
"[^/]([0-9a-zA-Z]+[-])+[0-9a-zA-Z]+", course_url
).group() # retrieve the last part of the URL
- course = EchoCloudCourse(course_uuid, course_hostname, alternative_feeds, subtitles=subtitles)
+ course = EchoCloudCourse(
+ course_uuid, course_hostname, alternative_feeds, subtitles=subtitles
+ )
else:
# import it here for monkey patching gevent, to fix the followings:
# MonkeyPatchWarning: Monkey-patching ssl after ssl has already been
@@ -382,6 +393,7 @@ def cmd_exists(x):
webdriver_to_use=webdriver_to_use,
interactive_mode=interactive_mode,
persistent_session=persistent_session,
+ dump_json=dump_json,
)
_LOGGER.debug(
diff --git a/echo360/videos.py b/echo360/videos.py
index d7954de..a61dc40 100644
--- a/echo360/videos.py
+++ b/echo360/videos.py
@@ -329,6 +329,41 @@ def download(self, output_dir, filename, pool_size=50):
def download_single(self, session, single_url, output_dir, filename, pool_size):
filename = strip_illegal_path(filename)
+ if self.download_subtitles:
+ # hacky way to get the current url media id
+ # not sure if each feed can have a different media id, so better download it for every feed.
+ try:
+ media_id = [
+ media["id"]
+ for media in self.video_json["lesson"]["medias"]
+ if media["id"] in single_url
+ ][0]
+ except IndexError:
+ print(" > No subtitles found.")
+ else:
+ subtitle_path = os.path.join(output_dir, f"{filename}.vtt")
+ if os.path.exists(subtitle_path):
+ print(" > Skipping downloaded subtitle")
+ else:
+ print(" > Downloading subtitles:")
+ vtt_url = f"{self.hostname}/api/ui/echoplayer/lessons/{self.video_id}/medias/{media_id}/transcript-file?format=vtt"
+ cookies = {
+ cookie["name"]: cookie["value"]
+ for cookie in self._driver.get_cookies()
+ }
+ response = requests.get(vtt_url, cookies=cookies)
+ if response.status_code == 200:
+ # The GET response already carries the headers, so no extra
+ # HEAD request is needed; the header may also be absent.
+ original_name = response.headers.get("Content-Disposition")
+ if original_name:
+ print(f"Original subtitle name: {original_name}")
+ # Use same filename as mp4 since VLC will automatically use a vtt if the filename matches.
+ with open(subtitle_path, "wb") as file:
+ file.write(response.content)
+ else:
+ print("No subtitles found.")
+
if os.path.exists(os.path.join(output_dir, filename + ".mp4")):
print(" > Skipping downloaded video")
print("-" * 60)
@@ -363,39 +398,6 @@ def download_single(self, session, single_url, output_dir, filename, pool_size):
# NOW we can finally start downloading!
from .hls_downloader import urljoin
- if self.download_subtitles:
- # hacky way to get the current url media id
- # not sure if each feed can have a different media id, so better download it for every feed.
- try:
- media_id = [
- media["id"]
- for media in self.video_json["lesson"]["medias"]
- if media["id"] in single_url
- ][0]
- except IndexError:
- media_id = None
- if media_id is not None:
- print(" > Downloading subtitles:")
- vtt_url = f"{self.hostname}/api/ui/echoplayer/lessons/{self.video_id}/medias/{media_id}/transcript-file?format=vtt"
- cookies = {
- cookie["name"]: cookie["value"]
- for cookie in self._driver.get_cookies()
- }
- response = requests.get(vtt_url, cookies=cookies)
- if response.status_code == 200:
- head = requests.head(vtt_url, cookies=cookies)
- if head.status_code == 200:
- print(
- f"Original subtitle name: {head.headers['Content-Disposition']}"
- )
- # Use same filename as mp4 since VLC will automatically use a vtt if the filename matches.
- with open(
- os.path.join(output_dir, f"{filename}.vtt"), "wb"
- ) as file:
- file.write(response.content)
- else:
- print("No subtitles found.")
-
audio_file = None
if m3u8_audio is not None:
print(" > Downloading audio:")
From 6448ddaa3f0c1b41c0b87d7d61b1cf906967d63d Mon Sep 17 00:00:00 2001
From: Peter Tanner
Date: Tue, 11 Feb 2025 23:13:36 +0800
Subject: [PATCH 5/6] Download attached media for echo cloud videos
---
echo360/videos.py | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
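Note on the attached-media loop: every entry of `video_json["lesson"]["medias"]` whose `mediaType` is not `"Video"` is fetched from `{hostname}/media/download/{id}/{title}` and saved under its title. The selection and URL construction on their own, as a minimal sketch: `attached_media_downloads` is a hypothetical name, and the key names and URL shape are the ones this patch uses.

```python
def attached_media_downloads(hostname, medias):
    """Return (filename, url) pairs for every non-video media entry."""
    downloads = []
    for media in medias:
        if media["mediaType"] == "Video":
            continue  # videos are handled by the HLS downloader
        filename = media["title"]
        url = f"{hostname}/media/download/{media['id']}/{filename}"
        downloads.append((filename, url))
    return downloads
```

The sketch only plans the downloads; checking for an existing file and writing the response body stay with the caller, as in the patch.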
diff --git a/echo360/videos.py b/echo360/videos.py
index a61dc40..8cb9d7e 100644
--- a/echo360/videos.py
+++ b/echo360/videos.py
@@ -311,6 +311,28 @@ def download(self, output_dir, filename, pool_size=50):
# download_alternative_feeds defaults to False, slice to include only the first one
urls = urls[:1]
+ # Download attached media (Example: mediaType: Presentation can contain PDF slides)
+ cookies = {
+ cookie["name"]: cookie["value"] for cookie in self._driver.get_cookies()
+ }
+ for media in self.video_json["lesson"]["medias"]:
+ if media["mediaType"] != "Video":
+ media_filename = strip_illegal_path(media["title"])
+ media_filepath = os.path.join(output_dir, media_filename)
+ media_url = (
+ f"{self.hostname}/media/download/{media['id']}/{media['title']}"
+ )
+ if os.path.exists(media_filepath):
+ print(
+ "> Media {} already downloaded, skipped.".format(media_filename)
+ )
+ else:
+ response = requests.get(media_url, cookies=cookies)
+ if response.status_code == 200:
+ print("> Downloading media {}...".format(media_filename))
+ with open(media_filepath, "wb") as file:
+ file.write(response.content)
+
final_result = True
for counter, single_url in enumerate(urls):
if self.download_alternative_feeds:
From 377047cb3c0f9ae8d1563ee1514998fbdcddee7f Mon Sep 17 00:00:00 2001
From: Peter Tanner
Date: Wed, 12 Feb 2025 03:43:37 +0800
Subject: [PATCH 6/6] Add --dump-json to readme
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 3664cee..e0f6489 100644
--- a/README.md
+++ b/README.md
@@ -137,6 +137,7 @@ optional arguments:
video, which could be the alternative feed. Might
only work on some 'echo360.org' hosts.
--subtitles, -s Download VTT subtitles for each video feed.
+ --dump-json Download JSON representation of course to output directory.
--debug Enable extensive logging.
--auto Only effective for 'echo360.org' host. When set, this
script will attempts to automatically redirects after