dependabot-preview[bot]

Bumps tika-parsers from 1.24.1 to 1.27.

Changelog

Sourced from tika-parsers's changelog.

Release 2.0.0 - ???

  • Cleanup of fetcher integration with tika-server.

Release 2.0.0-BETA - 05/19/2021

  • Refactor pipes module for resilience

  • Add transcribe capability (TIKA-94).

Release 2.0.0-ALPHA - 01/13/2021

BREAKING CHANGES in 2.0.0

  • General

    • OCR is now triggered automatically for PDFs if tesseract is on the user's path; see https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr for how to disable OCR.
    • We upgraded from log4j to log4j2 in tika-app, tika-server and anywhere else we used to use log4j.
    • By default, when rendering a page for OCR, the PDFParser does not render glyphs/text.
    • Removed deprecated Metadata keys/properties (TIKA-1974).
    • Removed deprecated PDFPreflightParser (TIKA-3437).
    • Removed dangerous calls that read an input stream or convert to bytes without specifying a charset.
    • Parsers can be configured via tika-config.xml on instantiation. We have moved away from configuration via .properties files because of confusion among users. This affects the PDFParser, TesseractOCRParser and the StringsParser.
  • tika-parsers

    • The parser modules have been broken into three main modules: tika-parsers-standard, tika-parsers-extended and tika-parsers-ml. Users may now need to add tika-parsers-extended to tika-app and tika-server to include parsers that used to be included by default (for example: envi, gdal, grib, isatab, netcdf).
    • ChmParser was moved to org.apache.tika.parser.microsoft.chm
    • RTFParser was moved to org.apache.tika.parser.microsoft.rtf
    • We are now using non-shaded versions of xmpcore with namespaces com.adobe.internal.* vs com.adobe.*.
    • We switched the underlying MP4 parser to Drew Noakes' metadata-extractor's MP4 parser from sannies' isoparser.
  • tika-app

  • tika-server

    • tika-server now by default forks a process to isolate the parsing in the forked process (this was called the -spawnChild option in tika-1.x). Clients must now expect that tika-server will restart on OOM, timeouts, crashes or after parsing a large number of files. When this happens tika-server will restart and not

... (truncated)
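
The tika-server note above says that 2.x clients must expect the server to restart. As a rough illustration only, a client could wrap each request in a small retry helper along the lines below. The class name `RetryOnRestart`, the method `callWithRetry`, and the backoff policy are all hypothetical and are not part of Tika; the sketch simply assumes that a restarting server surfaces to the client as an `IOException` (for example, a refused connection). Note that this PR bumps to 1.27, where forking is still opt-in, so this only becomes relevant for a later 2.x upgrade.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not a Tika API): retries a request while the server restarts. */
public final class RetryOnRestart {

    private RetryOnRestart() {
    }

    /**
     * Runs the request, retrying on IOException with exponential backoff.
     * Any other exception is treated as non-transient and propagates immediately.
     */
    public static <T> T callWithRetry(Callable<T> request, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                // Assumption: an IOException means the server is down or restarting.
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                Thread.sleep(delay);
                delay *= 2; // double the wait before the next attempt
            }
        }
        throw last;
    }
}
```

A caller would pass the actual HTTP call as the `request` lambda, for example `RetryOnRestart.callWithRetry(() -> sendToTika(file), 5, 500L)`, where `sendToTika` stands in for whatever client code the project already uses.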

Commits
  • ccf9442 [maven-release-plugin] prepare release 1.27-rc1
  • 31d44e9 prep for 1.27-rc1
  • f414130 TIKA-3459 -- integrate Drew Noakes metadata-extractor as the underlying MP4 p...
  • 74c5e5a TIKA-3460 -- add missing properties files for jaiimageio-core
  • 57f5912 TIKA-3457 -- general upgrades for 1.27
  • 4ba5fd7 TIKA-3456 -- LanguageDetector should chunk long strings and test for hasEnoug...
  • 90c6ea4 TIKA-3444 -- upgrade to pdfbox 2.0.24
  • 1224f88 TIKA-3441 -- improve likelihood that tesseract processes will be shutdown on ...
  • e8ec223 Merge remote-tracking branch 'origin/branch_1x' into branch_1x
  • d7fa2cd TIKA-3441 -- improve likelihood that tesseract processes will be shutdown on ...
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language
  • @dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

  • Update frequency (including time of day and day of week)
  • Pull request limits (per update run and/or open at any time)
  • Out-of-range updates (receive only lockfile updates, if desired)
  • Security updates (receive only security updates, if desired)

Bumps [tika-parsers](https://github.com/apache/tika) from 1.24.1 to 1.27.
- [Release notes](https://github.com/apache/tika/releases)
- [Changelog](https://github.com/apache/tika/blob/main/CHANGES.txt)
- [Commits](apache/tika@1.24.1...1.27)

Signed-off-by: dependabot-preview[bot] <[email protected]>
dependabot-preview bot added the "dependencies" label (Pull requests that update a dependency file) on Jul 7, 2021.