
Optimization Ideas #238

Open

Description

@hoelzro

Optimization isn't a big priority right now, but I thought I'd at least gather my thoughts on how it can be done. Not every idea here is necessarily a good one; consider this a brain dump:

Profiling

  • Devel::NYTProf
  • Using strace -e trace=open can show how many files ack is opening on Linux.
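A cheap in-process alternative to strace, when strace isn't available: override Perl's global open and count calls. This is just a profiling sketch, not anything ack does; the override has to be installed at compile time, before the code under test is compiled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count open() calls in-process, as a rough stand-in for
# `strace -e trace=open`. The override must be installed in a BEGIN
# block, before any calls to open() are compiled.
our $open_count;
BEGIN {
    $open_count = 0;
    *CORE::GLOBAL::open = sub {
        $open_count++;
        return @_ == 3 ? CORE::open($_[0], $_[1], $_[2])
             : @_ == 2 ? CORE::open($_[0], $_[1])
             :           CORE::open($_[0]);
    };
}

# Exercise it: open this script twice.
open(my $fh1, '<', $0) or die "open: $!";
open(my $fh2, '<', $0) or die "open: $!";
print "open() called $open_count times\n";
```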

Potential Places to Optimize

  • We create a separate resource object for every file/filter combination (each of which results in an open system call). It'd be nice to use one resource per file. (Done in optimization branch)
  • We could lazily open resource objects so that objects filtered out early on never result in an open system call. (Done in optimization branch)
  • Caching the first line read from a file would probably help performance somewhat. (Done in optimization branch)
  • We can order filters (used for --type and --ignore-file/--ignore-dir) by some sort of efficiency rating, so that cheap is filters go first and expensive firstlinematch filters go last. (Done in optimization branch)
  • Since ack needs to open a bunch of files to look at their first lines for shebangs, maybe we could cache the first lines in a shared memory segment or sequentially in a cache file?
  • Cache the config files? (this is probably overkill)
  • We currently build up a list of filter objects and iterate over them for each resource, checking if any match. For some filter types, we might be able to perform specific optimizations. For example, we could extract the extensions we look for into a lookup table, and look up the resource's extension there instead of iterating over a list of objects. (Done for extension filters in optimization branch)
  • Using Git's packfiles (like git ls-files or git-grep) can result in some serious speed gains. It would be nice to leverage this if we could (probably a plugin).
  • App::Ack::iterate (and probably some of the functions that invoke it) likely has room for optimization, but I can't say for certain without running ack through a profiler.
  • App::Ack::iterate is probably a hot code path and could use some love. I'm not 100% sure that the context handling is completely skipped when no context options are given, and the context bookkeeping itself could probably be improved (@before_context and @after_context are simply pushed to/shifted from on each iteration).
  • We create a resource object for filtering, and then we create it again for actual searching. Doing this once would be nice.
  • See http://superuser.com/questions/319286/how-to-totally-clear-the-filesytems-cache-on-linux for how to clear the filesystem cache when we're benchmarking I/O, so that caching doesn't skew the results.
  • Inlining the logic of App::Ack::iterate would help us cut down on subroutine call overhead.
  • Using filehandles directly (rather than via $resource->next_text) would help us reduce subroutine call overhead.
  • Performing newline normalization once on the assembled output, rather than on each line, would probably help.
  • Using openat in ack and File::Next might help.
  • Having some sort of memory about files might be nice; e.g., if we've already determined that a file is binary, we probably don't need to check again.
  • We could probably cache ignore-dir results for each individual directory in the path (this only applies to the implementation in _compile_file_filter)
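To make the lazy-open idea above concrete, here is a minimal sketch. The LazyResource class is hypothetical, not ack's actual Resource API: the constructor only records the filename, and the open(2) happens on first use of the handle, so a resource filtered out by name never touches the filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical LazyResource (not ack's real API): remember the name,
# defer the open(2) until the filehandle is first needed.
package LazyResource;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
sub name { return $_[0]{name} }
sub fh {
    my $self = shift;
    if (!$self->{fh}) {
        open my $fh, '<', $self->{name} or die "open $self->{name}: $!";
        $self->{fh} = $fh;
    }
    return $self->{fh};
}

package main;

my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\n";
close $tmp_fh;

my $res = LazyResource->new($tmp_name);

# A name-based filter accepts or rejects without ever opening the file:
print "filter ran without an open\n" if $res->name !~ /\.pl\z/;

# Only now does the open actually happen:
print "read: ", scalar readline($res->fh);
```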
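The filter-ordering idea could be as simple as a cost table keyed by filter type; the ranks and filter structures here are illustrative, not ack's internals.

```perl
use strict;
use warnings;

# Illustrative cost ranks: cheap checks (exact-name "is" filters) first,
# expensive ones (firstlinematch, which must open the file) last.
my %cost = ( is => 0, ext => 1, match => 2, firstlinematch => 3 );

my @filters = (
    { type => 'firstlinematch', pattern => qr/^#!.*\bperl\b/ },
    { type => 'is',             name    => 'Makefile' },
    { type => 'ext',            ext     => 'pm' },
);

my @ordered = sort { $cost{ $a->{type} } <=> $cost{ $b->{type} } } @filters;
print join(' ', map { $_->{type} } @ordered), "\n";
```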
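The extension lookup-table idea from above, sketched with a plain hash (the extension set is illustrative): one hash lookup per file instead of a loop over filter objects.

```perl
use strict;
use warnings;

# Collapse all extension filters into one hash; a single exists-style
# lookup replaces iterating over a list of filter objects per file.
my %ext_ok = map { $_ => 1 } qw(pm pl t);

for my $file (qw(Foo.pm notes.txt test.t)) {
    my ($ext) = $file =~ /\.([^.]+)\z/;
    my $verdict = ( defined $ext && $ext_ok{$ext} ) ? 'match' : 'skip';
    print "$file: $verdict\n";
}
```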
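For the Git idea, the shell equivalent of what a plugin might do is to ask Git's index for the file list instead of walking the directory tree; a toy demonstration in a throwaway repository:

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'hello world\n'  > a.txt
printf 'nothing here\n' > b.txt
git add a.txt b.txt

# -z / -0 keep filenames with spaces intact
git ls-files -z | xargs -0 grep -l hello
```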
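On the context bookkeeping: a fixed-size ring buffer avoids the per-line push/shift on @before_context. A sketch, with illustrative buffer size and input:

```perl
use strict;
use warnings;

# Keep only the last $N lines in a ring buffer instead of pushing to /
# shifting from an array on every iteration.
my $N = 2;
my @ring;        # slots 0 .. $N-1
my $count = 0;   # total lines seen so far

my @lines = ("one\n", "two\n", "three\n", "four\n", "MATCH\n");
for my $line (@lines) {
    if ($line =~ /MATCH/) {
        # Replay up to $N lines of before-context, oldest first.
        my $have = $count < $N ? $count : $N;
        for my $i (1 .. $have) {
            print $ring[ ($count - $have + $i - 1) % $N ];
        }
        print $line;
    }
    $ring[ $count % $N ] = $line;
    $count++;
}
```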
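Remembering per-file facts such as binary-ness could be a simple cache keyed on path; a sketch using Perl's -B file test (ack's real check may differ):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Remember -B results per path so repeated passes over the same file
# don't re-run the heuristic.
my %is_binary;
sub is_binary {
    my ($path) = @_;
    $is_binary{$path} = ( -B $path ? 1 : 0 ) unless exists $is_binary{$path};
    return $is_binary{$path};
}

my ($fh, $text_file) = tempfile(UNLINK => 1);
print {$fh} "plain text\n";
close $fh;

print "binary: ", is_binary($text_file), "\n";   # first call runs -B
print "binary: ", is_binary($text_file), "\n";   # second call hits the cache
```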
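Finally, caching ignore-dir verdicts per directory component, as suggested for _compile_file_filter, might look like this (the ignore list is illustrative; ack's real defaults are longer):

```perl
use strict;
use warnings;

# Illustrative ignore list.
my %ignore_dir = map { $_ => 1 } qw(.git blib);
my %dir_cache;   # directory component => 0/1, computed at most once

sub dir_ignored {
    my ($dir) = @_;
    $dir_cache{$dir} //= $ignore_dir{$dir} ? 1 : 0;
    return $dir_cache{$dir};
}

sub path_ignored {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    pop @parts;   # drop the filename itself
    for my $dir (@parts) {
        return 1 if dir_ignored($dir);
    }
    return 0;
}

print path_ignored('lib/App/Ack.pm') ? "ignored\n" : "kept\n";
print path_ignored('.git/config')    ? "ignored\n" : "kept\n";
```

Checking a/b/c/f1 and then a/b/c/f2 tests each of a, b, and c only once.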

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions