
Optimization Ideas #238

Open

Description

@hoelzro

Optimization isn't a big priority right now, but I thought I'd at least gather my thoughts on how it can be done. Not every idea here is necessarily a good one; consider this a brain dump:

Profiling

  • Devel::NYTProf
  • Using strace -e trace=open can show how many files ack is opening on Linux.
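A cheap in-process alternative to strace, when strace isn't available: override Perl's global open and count calls. This is just a profiling sketch, not anything ack does; the override has to be installed at compile time, before the code under test is compiled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count open() calls in-process, as a rough stand-in for
# `strace -e trace=open`. The override must be installed in a BEGIN
# block, before any calls to open() are compiled.
our $open_count;
BEGIN {
    $open_count = 0;
    *CORE::GLOBAL::open = sub {
        $open_count++;
        return @_ == 3 ? CORE::open($_[0], $_[1], $_[2])
             : @_ == 2 ? CORE::open($_[0], $_[1])
             :           CORE::open($_[0]);
    };
}

# Exercise it: open this script twice.
open(my $fh1, '<', $0) or die "open: $!";
open(my $fh2, '<', $0) or die "open: $!";
print "open() called $open_count times\n";
```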

Potential Places to Optimize

  • We create a separate resource object for every file/filter combination (each of which results in an open system call). It'd be nice to use one resource per file. (Done in optimization branch)
  • We could lazily open resource objects so that objects filtered out early on never result in an open system call. (Done in optimization branch)
  • Caching the first line read from a file would probably help performance somewhat. (Done in optimization branch)
  • We can order filters (used for --type and --ignore-file/--ignore-dir) by some sort of efficiency rating, so that cheap is filters go first and expensive firstlinematch filters go last. (Done in optimization branch)
  • Since ack needs to open a bunch of files to look at their first lines for shebangs, maybe we could cache the first lines in a shared memory segment or sequentially in a cache file?
  • Cache the config files? (this is probably overkill)
  • We currently build up a list of filter objects and iterate over them for each resource, checking if any match. For some filter types, we might be able to perform specific optimizations. For example, we could extract the extensions we look for into a lookup table, and look up the resource's extension there instead of iterating over a list of objects. (Done for extension filters in optimization branch)
  • Using Git's packfiles (like git ls-files or git-grep) can result in some serious speed gains. It would be nice to leverage this if we could (probably a plugin).
  • App::Ack::iterate (and probably some of the functions that invoke it) likely has room for optimization, but I can't say for certain without running ack through a profiler.
  • App::Ack::iterate is probably a hot code path and could use some love. I'm not 100% sure that the context handling is completely skipped when no context options are given, and the context bookkeeping itself could probably be improved (@before_context and @after_context are simply pushed to/shifted from on each iteration).
  • We create a resource object for filtering, and then we create it again for actual searching. Doing this once would be nice.
  • See http://superuser.com/questions/319286/how-to-totally-clear-the-filesytems-cache-on-linux for how to clear the filesystem cache when we're benchmarking I/O, so that caching doesn't skew the results.
  • Inlining the logic of App::Ack::iterate would help us cut down on subroutine call overhead.
  • Using filehandles directly (rather than via $resource->next_text) would help us reduce subroutine call overhead.
  • Performing newline normalization once on the assembled output, rather than on each line, would probably help.
  • Using openat in ack and File::Next might help.
  • Having some sort of memory about files might be nice; e.g., if we've already determined that a file is binary, we probably don't need to check again.
  • We could probably cache ignore-dir results for each individual directory in the path (this only applies to the implementation in _compile_file_filter)
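To make the lazy-open idea above concrete, here is a minimal sketch. The LazyResource class is hypothetical, not ack's actual Resource API: the constructor only records the filename, and the open(2) happens on first use of the handle, so a resource filtered out by name never touches the filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical LazyResource (not ack's real API): remember the name,
# defer the open(2) until the filehandle is first needed.
package LazyResource;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
sub name { return $_[0]{name} }
sub fh {
    my $self = shift;
    if (!$self->{fh}) {
        open my $fh, '<', $self->{name} or die "open $self->{name}: $!";
        $self->{fh} = $fh;
    }
    return $self->{fh};
}

package main;

my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\n";
close $tmp_fh;

my $res = LazyResource->new($tmp_name);

# A name-based filter accepts or rejects without ever opening the file:
print "filter ran without an open\n" if $res->name !~ /\.pl\z/;

# Only now does the open actually happen:
print "read: ", scalar readline($res->fh);
```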
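The filter-ordering idea could be as simple as a cost table keyed by filter type; the ranks and filter structures here are illustrative, not ack's internals.

```perl
use strict;
use warnings;

# Illustrative cost ranks: cheap checks (exact-name "is" filters) first,
# expensive ones (firstlinematch, which must open the file) last.
my %cost = ( is => 0, ext => 1, match => 2, firstlinematch => 3 );

my @filters = (
    { type => 'firstlinematch', pattern => qr/^#!.*\bperl\b/ },
    { type => 'is',             name    => 'Makefile' },
    { type => 'ext',            ext     => 'pm' },
);

my @ordered = sort { $cost{ $a->{type} } <=> $cost{ $b->{type} } } @filters;
print join(' ', map { $_->{type} } @ordered), "\n";
```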
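The extension lookup-table idea from above, sketched with a plain hash (the extension set is illustrative): one hash lookup per file instead of a loop over filter objects.

```perl
use strict;
use warnings;

# Collapse all extension filters into one hash; a single exists-style
# lookup replaces iterating over a list of filter objects per file.
my %ext_ok = map { $_ => 1 } qw(pm pl t);

for my $file (qw(Foo.pm notes.txt test.t)) {
    my ($ext) = $file =~ /\.([^.]+)\z/;
    my $verdict = ( defined $ext && $ext_ok{$ext} ) ? 'match' : 'skip';
    print "$file: $verdict\n";
}
```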
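For the Git idea, the shell equivalent of what a plugin might do is to ask Git's index for the file list instead of walking the directory tree; a toy demonstration in a throwaway repository:

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'hello world\n'  > a.txt
printf 'nothing here\n' > b.txt
git add a.txt b.txt

# -z / -0 keep filenames with spaces intact
git ls-files -z | xargs -0 grep -l hello
```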
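On the context bookkeeping: a fixed-size ring buffer avoids the per-line push/shift on @before_context. A sketch, with illustrative buffer size and input:

```perl
use strict;
use warnings;

# Keep only the last $N lines in a ring buffer instead of pushing to /
# shifting from an array on every iteration.
my $N = 2;
my @ring;        # slots 0 .. $N-1
my $count = 0;   # total lines seen so far

my @lines = ("one\n", "two\n", "three\n", "four\n", "MATCH\n");
for my $line (@lines) {
    if ($line =~ /MATCH/) {
        # Replay up to $N lines of before-context, oldest first.
        my $have = $count < $N ? $count : $N;
        for my $i (1 .. $have) {
            print $ring[ ($count - $have + $i - 1) % $N ];
        }
        print $line;
    }
    $ring[ $count % $N ] = $line;
    $count++;
}
```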
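Remembering per-file facts such as binary-ness could be a simple cache keyed on path; a sketch using Perl's -B file test (ack's real check may differ):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Remember -B results per path so repeated passes over the same file
# don't re-run the heuristic.
my %is_binary;
sub is_binary {
    my ($path) = @_;
    $is_binary{$path} = ( -B $path ? 1 : 0 ) unless exists $is_binary{$path};
    return $is_binary{$path};
}

my ($fh, $text_file) = tempfile(UNLINK => 1);
print {$fh} "plain text\n";
close $fh;

print "binary: ", is_binary($text_file), "\n";   # first call runs -B
print "binary: ", is_binary($text_file), "\n";   # second call hits the cache
```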
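Finally, caching ignore-dir verdicts per directory component, as suggested for _compile_file_filter, might look like this (the ignore list is illustrative; ack's real defaults are longer):

```perl
use strict;
use warnings;

# Illustrative ignore list.
my %ignore_dir = map { $_ => 1 } qw(.git blib);
my %dir_cache;   # directory component => 0/1, computed at most once

sub dir_ignored {
    my ($dir) = @_;
    $dir_cache{$dir} //= $ignore_dir{$dir} ? 1 : 0;
    return $dir_cache{$dir};
}

sub path_ignored {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    pop @parts;   # drop the filename itself
    for my $dir (@parts) {
        return 1 if dir_ignored($dir);
    }
    return 0;
}

print path_ignored('lib/App/Ack.pm') ? "ignored\n" : "kept\n";
print path_ignored('.git/config')    ? "ignored\n" : "kept\n";
```

Checking a/b/c/f1 and then a/b/c/f2 tests each of a, b, and c only once.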

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions