
test-cli: drop dependency, add check #22


Open · wants to merge 5 commits into master

Conversation


@bertsky commented Mar 22, 2022

No description provided.

@bertsky (Author) commented Mar 23, 2022

@crater2150, regarding 22cbdcb, could you please comment on what kind of image input the default and legacy models expect? From my experiments, it looks like:

  • RGB plus binary does not work (the first input must have only 1 channel in its last axis)
  • grayscale plus binary does not work (there are no detections whatsoever)
  • binary * 255 plus binary kind of works (with the same low-quality results as test-cli, which takes only a binary image as input)

In particular, if only binarized images were seen during training, I wonder why there are two inputs at all. Also, should we do cropping and deskewing first? And what about the xheight and resize_height parameters? (They seem to have a large influence, so I'd like some guidance beyond the ocrd-tool descriptions.)
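For reference, here is roughly how I fed the two inputs in these experiments (a minimal sketch; the `model` handle and the input order are my assumptions, not the actual pixel-classifier API):

```python
# Minimal sketch of the combinations I tried; `model` and the input
# order are assumptions on my side, not the actual pixel-classifier API.
import numpy as np
from PIL import Image

page = Image.open("page.png")
binary = (np.array(page.convert("L")) < 128).astype(np.float32)  # 1 = ink

rgb = np.array(page.convert("RGB"), dtype=np.float32)            # H x W x 3
gray = np.array(page.convert("L"), dtype=np.float32)[..., None]  # H x W x 1
bin255 = (binary * 255)[..., None]                               # H x W x 1
bin01 = binary[..., None]                                        # H x W x 1

# 1. RGB + binary: rejected, first input must have 1 channel in the last axis
# model.predict([rgb[None], bin01[None]])
# 2. grayscale + binary: runs, but yields no detections whatsoever
# model.predict([gray[None], bin01[None]])
# 3. binary * 255 + binary: kind of works, same low quality as test-cli
# model.predict([bin255[None], bin01[None]])
```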

@crater2150 (Contributor) commented

@bertsky

> regarding 22cbdcb, could you please comment on what kind of image input the default and legacy models expect?

> In particular, if only binarized images were seen during training, I wonder why there are two inputs at all.

As far as I can remember, the models were trained on the binary images only, but it is certainly possible to train a model on color images too. The bundled models are only for separating text from non-text, which worked better on binary images in our experiments at the time. The reason for the two inputs is that some postprocessing relies on the binary image to distinguish foreground from background, so it is required even when training on full-color images.
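Schematically, the binary image enters postprocessing like this (illustration only, not the actual code):

```python
import numpy as np

def restrict_to_foreground(prediction: np.ndarray, binary: np.ndarray) -> np.ndarray:
    """Illustration only: keep predicted class labels only on actual
    foreground (ink) pixels, as given by the binary image.

    prediction: H x W per-pixel class labels from the network
    binary:     H x W mask, 1 = foreground, 0 = background
    """
    refined = prediction.copy()
    refined[binary == 0] = 0  # treat class 0 as background
    return refined
```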

> Also, should we do cropping and deskewing first?

The models were trained on deskewed images. The input images weren't cropped, but in my experience, cropping images before prediction reduces errors, even if the model was trained on uncropped images.
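If your workflow does not already deskew, a generic projection-profile search is usually good enough (just a sketch of that common technique, not what our training pipeline used):

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary: np.ndarray, max_angle: float = 3.0, steps: int = 31) -> float:
    """Generic projection-profile deskew sketch: the row-sum profile of
    a binary page has maximal variance when text lines are horizontal."""
    best_angle, best_score = 0.0, -1.0
    for angle in np.linspace(-max_angle, max_angle, steps):
        rotated = rotate(binary, angle, order=0, reshape=False)
        score = float(np.var(rotated.sum(axis=1)))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```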

> And what about the xheight and resize_height parameters? (They seem to have a large influence, so I'd like some guidance beyond the ocrd-tool descriptions.)

The xheight parameter should be set to match the scaling used during training of the model; for the bundled models it was left at the default value of 6. The preprocessing estimates the average line height of the input and scales the image so that a lowercase letter such as x ends up about xheight pixels tall. Maybe setting model to __DEFAULT__ should also set xheight?
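In effect, the preprocessing does something like the following (a sketch, not the actual implementation; `estimated_xheight` stands in for whatever the line-height estimation yields):

```python
from PIL import Image

def scale_to_xheight(image: Image.Image, estimated_xheight: float,
                     xheight: int = 6) -> Image.Image:
    """Scale the page so a lowercase x ends up about `xheight` pixels
    tall. `estimated_xheight` is what line-height estimation returns
    for the input; this is a sketch, not the actual implementation."""
    factor = xheight / estimated_xheight
    return image.resize((round(image.width * factor),
                         round(image.height * factor)), Image.LANCZOS)
```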

The resize_height parameter is model-independent; it is only used for scaling down the output of the neural network before postprocessing (splitting regions etc.). It acts as an upper limit on the image height, so it has no effect for inputs smaller than this value. For larger images, it reduces the resolution and may therefore affect the quality of the segmentation. Setting it higher than the height of the largest input image disables downscaling; if performance or memory usage is a problem, the value can be decreased.
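So the logic is just an upper bound, roughly (again a sketch, not the actual code):

```python
from PIL import Image

def cap_height(image: Image.Image, resize_height: int) -> Image.Image:
    """Downscale only if the image is taller than resize_height;
    smaller inputs pass through unchanged (sketch, not actual code)."""
    if image.height <= resize_height:
        return image  # no effect for small inputs
    factor = resize_height / image.height
    return image.resize((round(image.width * factor), resize_height),
                        Image.LANCZOS)
```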
