yt dlp initial pull request #63

mheryerznkanyan · 2024-06-01T12:52:13Z

No description provided.

Alcray · 2024-07-07T09:41:47Z

dataset_configs/armenian/youtube_audio_tmp/config.yaml

+processors_to_run: "0:"
+workspace_dir: /workspace/nemo_capstone
+final_manifest: ${workspace_dir}/final_manifest.json
+
+processors:


Please add the NVIDIA copyright text and the config documentation text.

Alcray · 2024-07-07T09:42:54Z

docker-compose.yaml

Where do you use this docker compose file? Is it possible to run the scripts without it?

Alcray · 2024-07-07T09:43:20Z

sdp/processors/datasets/ytdlp/create_initial_manifest.py

+    Make sure to install yt-dlp tool before running this code.
+
+    Tool link: https://github.com/yt-dlp/yt-dlp


Since you use a third-party tool here, could you add an explicit check in the script for whether the tool is installed, and log a message telling the user to install it if it is missing?
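Something along these lines might work as the check. This is only a sketch: it assumes yt-dlp is invoked as a command-line executable found on PATH (if the processor imports it as a Python module instead, the check would need to change), and the function name `ensure_tool_installed` is hypothetical, not something that exists in the PR.

```python
import logging
import shutil

logger = logging.getLogger(__name__)


def ensure_tool_installed(
    tool: str = "yt-dlp",
    install_url: str = "https://github.com/yt-dlp/yt-dlp",
) -> str:
    """Return the path to `tool`, or log an install hint and raise if it is missing.

    Hypothetical helper: assumes the tool is used as an executable on PATH.
    """
    path = shutil.which(tool)  # None when no matching executable is found
    if path is None:
        message = f"'{tool}' was not found on PATH. Please install it first: {install_url}"
        logger.error(message)
        raise RuntimeError(message)
    return path
```

Calling this once in the processor's `__init__` would make the failure show up before any downloads start, rather than halfway through a run.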

Alcray · 2024-07-07T09:47:07Z

sdp/processors/datasets/ytdlp/create_initial_manifest.py

+    Args:
+        raw_data_dir (str): Root directory of the files to be added to the manifest. Recursively searches for files with the given 'extension'.
+        output_field (str): Field to store the file paths in the dataset. Default is "audio_filepath".
+        extension (str): Extension of the files to include in the dataset. Default is "wav".
+        **kwargs: Additional keyword arguments for the base class `BaseParallelProcessor`.
+    """
+
+    def __init__(
+        self,
+        raw_data_dir: str,
+        output_field: str = "audio_filepath",
+        # extension: str = "wav",
+        **kwargs,
+    ):
+        super().__init__(**kwargs)
+        self.raw_data_dir = Path(raw_data_dir)
+        self.output_field = output_field
+        file_path = "sdp/processors/datasets/ytdlp/search_terms.json"


If you don't use "extension", remove both the commented-out parameter and its entry in the function's documentation.

For consistency we use "key" instead of "field". Please rename output_field to output_key.

Could you rename file_path to something more informative? Also, please don't hardcode it: make it a parameter passed from the config file, with a default value.
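Roughly what I have in mind for the constructor, as a sketch only. The class below is a standalone stand-in so the snippet runs on its own; the real processor would keep inheriting from `BaseParallelProcessor` and forwarding `**kwargs`. The parameter name `search_terms_file` is just a suggestion, and the default is the path currently hardcoded in the PR.

```python
from pathlib import Path


class CreateInitialManifestSketch:
    """Hypothetical stand-in for the processor under review.

    Args:
        raw_data_dir (str): Root directory of the files to be added to the manifest.
        output_key (str): Key to store the file paths under. Default is "audio_filepath".
        search_terms_file (str): Path to the JSON file with search terms
            (suggested name; can be overridden from the config file).
    """

    def __init__(
        self,
        raw_data_dir: str,
        output_key: str = "audio_filepath",
        search_terms_file: str = "sdp/processors/datasets/ytdlp/search_terms.json",
    ):
        # The unused "extension" argument is dropped entirely.
        self.raw_data_dir = Path(raw_data_dir)
        self.output_key = output_key
        self.search_terms_file = Path(search_terms_file)
```

With this shape, the config file can set `search_terms_file` explicitly, and existing configs that omit it keep the current behavior.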

Alcray · 2024-07-07T09:47:37Z

sdp/processors/datasets/ytdlp/downlaod_youtube_audio.py

+    Args:
+        links_filepath_field (str): Field to get the YouTube video link.
+        output_audio_path (str): Path to save the downloaded audio files.
+        **kwargs: Additional keyword arguments for the base class `BaseParallelProcessor`.


Please use "key" instead of "field".

Alcray · 2024-07-07T09:49:16Z

sdp/processors/datasets/ytdlp/search_terms.json

If this file is meant as an example rather than a working configuration, please document how users should work with it, and remove any personal information from it, such as the name.

mheryerznkanyan added 2 commits June 1, 2024 16:48

ytdkp initial commit

b88fa09

Update config.yaml

4c55021

Alcray suggested changes Jul 7, 2024
