-
Check out VideoSubFinder. You can take its output and OCR it with Subtitle Edit (which in my experience is slow and laborious for accurate results), or with FineReader (fast, sometimes inaccurate, but fixing the mistakes afterwards is trivial). I believe there are instructions on how to perform the OCR with FineReader and combine the resulting text with the empty timed SRT file that VideoSubFinder provides.
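For reference, here is a minimal sketch of what that combining step could look like. It's purely illustrative, assuming VideoSubFinder emits an SRT with correct timings but empty text and FineReader exports one line of OCR'd text per cue, in the same order; the file names are placeholders:

```python
import re

def merge_srt_with_text(srt_path, text_path, out_path):
    # Read the empty-but-timed SRT and the OCR'd text lines.
    with open(srt_path, encoding="utf-8") as f:
        srt = f.read()
    with open(text_path, encoding="utf-8") as f:
        texts = [line.strip() for line in f if line.strip()]

    # SRT cues are separated by blank lines: index line, timing line, (empty) text.
    blocks = re.split(r"\n\s*\n", srt.strip())
    if len(blocks) != len(texts):
        raise ValueError(f"{len(blocks)} cues but {len(texts)} text lines")

    cues = []
    for i, (block, text) in enumerate(zip(blocks, texts), start=1):
        timing = next(line for line in block.splitlines() if "-->" in line)
        cues.append(f"{i}\n{timing}\n{text}\n")

    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(cues))

merge_srt_with_text("videosubfinder_empty.srt", "finereader_text.txt", "merged.srt")
```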
-
Hardsub detection & extraction is not a simple task, so it won't be implemented in SE. Personally, VideoSubFinder wasn't good enough for me, so I implemented it in InpaintDelogo (which already had some of the needed functionality from its logo-detection code). InpaintDelogo has more robust "trash" detection/extraction-skip and temporal refinement, and it's faster; the lower the video/subtitle quality, the more of an edge InpaintDelogo should have over VSF. And some users have said it's easier to use because it's better documented.
-
This was previously mentioned in #5885 and at the time apparently this was not possible.
Since it has been some time, I wonder if this could be revisited.
SE already employs OCR for bitmap subtitles like VobSub and PGS. So I wonder why the same could not be used for burned-in hardsubs.
Getting the subtitles really only requires 2 things: getting the timing, and getting the text.
The text is already taken care of via OCR, so the only consideration left is timing.
Bitmap subtitles obviously already have timestamps attached, which don't exist for burned-in subtitles. In a perfect world SE could employ some text-similarity metrics to figure out where a line begins and ends, but I realise that this isn't necessarily realistic given the complexity.
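For what it's worth, the text-similarity idea could look roughly like the sketch below. This is purely illustrative, not anything SE does today; it assumes OpenCV and pytesseract are installed, and the crop region, sampling step and 0.8 threshold are arbitrary placeholders:

```python
import difflib

import cv2
import pytesseract

def detect_cues(video_path, step=5, threshold=0.8):
    """Return (start_sec, end_sec, text) tuples by OCRing the bottom strip of every Nth frame."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    cues, prev_text, start = [], "", None
    frame_no = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_no % step == 0:
            strip = frame[int(frame.shape[0] * 0.8):, :]  # bottom 20% of the picture
            text = pytesseract.image_to_string(strip).strip()
            similarity = difflib.SequenceMatcher(None, prev_text, text).ratio()
            if text and (not prev_text or similarity < threshold):
                # Text changed enough: close the previous cue and open a new one.
                if start is not None and prev_text:
                    cues.append((start, frame_no, prev_text))
                start = frame_no
            elif not text and prev_text:
                # The subtitle disappeared: close the current cue.
                cues.append((start, frame_no, prev_text))
                start = None
            prev_text = text
        frame_no += 1
    if start is not None and prev_text:
        cues.append((start, frame_no, prev_text))
    cap.release()
    return [(s / fps, e / fps, t) for s, e, t in cues]
```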
However, as an alternative, users could be given the option to create the start and end timestamps manually. The "conversion" process would then only have to extract one frame per line (i.e. where the line starts) and run the OCR process. While this does require manual work, it is magnitudes less complex than figuring out timings automatically and also doesn't require running OCR on every frame of the video, which might take longer than setting the timestamps manually in the first place.
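Something like the following would already cover that workflow outside SE. Again, only a sketch, assuming ffmpeg is on the PATH, pytesseract is installed, and the user-supplied SRT has timings but empty text; the file names are placeholders:

```python
import re
import subprocess

import pytesseract
from PIL import Image

TIMING = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->")

def fill_srt_from_video(video_path, srt_in, srt_out):
    """Fill an SRT that already has timings (but no text) by OCRing one frame per cue."""
    with open(srt_in, encoding="utf-8") as f:
        blocks = re.split(r"\n\s*\n", f.read().strip())

    cues = []
    for i, block in enumerate(blocks, start=1):
        timing_line = next(line for line in block.splitlines() if "-->" in line)
        h, m, s, ms = map(int, TIMING.search(timing_line).groups())
        start = h * 3600 + m * 60 + s + ms / 1000 + 0.2  # sample slightly inside the cue

        # Grab a single frame at the cue start and OCR it (frame.png is reused each pass).
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-i", video_path,
             "-frames:v", "1", "frame.png"],
            check=True, capture_output=True)
        text = pytesseract.image_to_string(Image.open("frame.png")).strip()
        cues.append(f"{i}\n{timing_line}\n{text}\n")

    with open(srt_out, "w", encoding="utf-8") as f:
        f.write("\n".join(cues))

fill_srt_from_video("movie.mkv", "timings_only.srt", "filled.srt")
```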
What are the thoughts on this? It would be really nice to get this functionality, as there don't seem to be many tools out there that do this. There are a handful of Python scripts across GitHub, but they all seem barely maintained.