- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.5k
Functional style #2908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Functional style #2908
Conversation
This cost 5 bytes, but it's both more readable and more efficient.
| if (isFunction(value)) return optimizeCb(value, context, argCount); | ||
| if (isObject(value) && !isArray(value)) return matcher(value); | ||
| return property(value); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity: this code was merged into the public-facing modules/iteratee.js.
|  | ||
| // Iteratively cut `array` in half to figure out the index at which `obj` should | ||
| // be inserted so as to maintain the order defined by `compare`. | ||
| export default function binarySearch(array, obj, iteratee, compare) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a more general version of _.sortedIndex, using compare instead of builtin operator <. _.sortedIndex now calls binarySearch internally.
| var result, iterResult; | ||
| iteratee = cb(iteratee, context); | ||
| var first = true; | ||
| find(collection, function(value, key) { | ||
| var iterValue = iteratee(value, key, collection); | ||
| if (first || compare(iterValue, iterResult)) { | ||
| result = value; | ||
| iterResult = iterValue; | ||
| first = false; | ||
| } | ||
| }); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop represents the "slow path" that I mentioned in the opening post. The fast path is currently still duplicated in _.min and _.max, for reasons explained under _.max.
| switch (argCount == null ? 3 : argCount) { | ||
| case 1: return function(value) { | ||
| return func.call(context, value); | ||
| }; | ||
| // The 2-argument case is omitted because we’re not using it. | ||
| case 3: return function(value, index, collection) { | ||
| return func.call(context, value, index, collection); | ||
| }; | ||
| case 4: return function(accumulator, value, index, collection) { | ||
| return func.call(context, accumulator, value, index, collection); | ||
| }; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part has been replaced by bindCb4.
| return function() { | ||
| return func.apply(context, arguments); | ||
| }; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part has been replaced by bindCb.
| for (var l = getLength(collection), i = 1; i < l; i++) { | ||
| val = collection[i]; | ||
| if ( | ||
| res == null || val != null && +val > +res || +res !== +res | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the -Infinity initializer, but at the same time committed myself to preserving the aforementioned crazy comparison semantic (until Underscore 2.0). This shifted all the crazyness into the if condition here. Note that this would just be val > res if we were following standard semantics.
| iteratee = cb(iteratee, context); | ||
| each(obj, function(v, index, list) { | ||
| computed = iteratee(v, index, list); | ||
| if (computed > lastComputed || computed === -Infinity && result === -Infinity) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original slow path, computed and result were compared to -Infinity on every iteration as a way to ensure that an infinity produced by the iteratee would be preferred over the infinity initializer. This was the second factor that caused #2688. In the new slow path (in extremum), this special condition is no longer present; it is no longer required because the -Infinity initializer is gone.
Note that the special case for null, which I discussed in the fast branch, is not present here. This is due to an oversight of the person who originally added that special case. In effect, this created an additional internal inconsistency in the comparison semantics of _.max and _.min.
| } | ||
| return result; | ||
| return extremum(collection, function(val, res) { | ||
| return res == null || val != null && +val > +res || +res !== +res; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new slow path wraps the comparison condition in a function that is passed to extremum. It is the exact same expression as in the fast path; this is meant to ensure that, despite the code duplication, gzip can efficiently compress both copies together.
| // A **less general** backward variant of `_.each`, specifically catered to | ||
| // implementing `_.reduceRight`. | ||
| function eachRight(obj, func) { | ||
| if (isArrayLike(obj)) { | ||
| findLastIndex(obj, func); | ||
| } else { | ||
| findLastIndex(keys(obj), function(key) { | ||
| func(obj[key], key, obj); | ||
| }); | ||
| } | ||
| } | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the one exception to _.find being the single source of truth about how to iterate a collection. The problem is that there is no findRight, and I think there never should be; "rightward" iteration is only meaningful for arrays. Objects are unordered and Map, Set and iterators can be iterated in one direction only. In my opinion, reduceRight should disappear in Underscore 2.0. People who want to do right-to-left iteration can use _.findLastIndex directly.
| if (getLength(array) < 1) return -1; | ||
| if (typeof controlArg == 'number') { | ||
| start = controlArg; | ||
| } else if (controlArg && forward) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The && forward condition is only here to prevent the isSorted option from creeping into _.lastIndexOf as a side effect, purely because I vowed not to make any interface changes on this branch. It can be removed in a future update.
| High-level review by @jashkenas, quoted from private email with his permission: 
 | 
| First benchmark is in, thanks to awesome work by @m90! | 
| Update: I will be writing an additional real-world benchmark in week 46, which is November 13–17. I hope to finish and merge this branch soon afterwards. | 
This is a substantial refactoring of the Underscore source code, which I've been working on since shortly after the modularization (#2849). Before merging, I would like this to be reviewed extensively, by multiple people, in multiple ways and in multiple stages. Please see the final third of this post for review details.
Edit: I need help from people who can contribute benchmarks based on real-world applications. Please leave a comment if you think you might be able to help!
Goals
Map,Setand iterators in Underscore 2.0. This can only be done in a maintainable way if there is a single source of truth about how to iterate any given collection. More about this below.Principles
Approach
I started by asking myself which existing Underscore function should be the single source of truth about collection iteration. Initially, I considered
_.reduce. It is a very general function and many other collection functions can be cleanly expressed in terms of it, for example_.map,_.filterand_.min/_.max. I also considered_.each, because it doesn't use an accumulator by default._.eachand_.reduceare equally powerful, in the sense that either can be cleanly expressed in terms of the other.I soon realized, however, that there is a function that can't be cleanly expressed in terms of
_.eachor_.reduce:_.find. Like the proceduralfor-inloop with abreakoption,_.findmay stop iteration early. Otherwise, it basically does the same thing as_.each. All Underscore collection functions can be cleanly expressed using_.find, including not only_.each,_.reduce,_.mapand_.minbut also_.some,_.everyand_.contains. For this reason, I have chosen_.findto be the one function that defines how to iterate a collection.Conveniently,
_.findwas already implemented by branching over the collection types: it calls_.findIndexon arrays and_.findKeyon objects. These latter functions, in turn, are the single sources of truth about iterating arrays and objects, respectively (although the situation is a bit more nuanced with regard to arrays; more on this shortly). In Underscore 2.0, I plan to add_.findEntryforMap/Setand_.findIterationfor iterators. By including these in_.findand implementing all other collection functions in terms of_.find, all collection functions would automatically support all five collection types in a consistent way.Besides using
_.findin all collection functions, one of the first things I did was factoring out very generallinearSearchandbinarySearchfunctions. These are the actual single sources of truth on how to iterate/search arrays;_.findIndex,_.findLastIndex,_.indexOf,_.lastIndexOfand_.sortedIndexare all implemented usinglinearSearchand/orbinarySearchunder the hood.linearSearchis so general that I was able to substitute it for nearly all hand-writtenfor/whileloops in the source code. This proved very effective in both producing cleaner code and reducing the bundle size. However, I have reverted this in many places because function call overhead turned out to be costlier than I expected. It seems that even modern JS engines rarely perform inlining, if ever. After discovering this, I adopted the following "safe" rule to choose betweenlinearSearchand a hand-written loop: if the loop body didn't previously involve any function call, keep the hand-written loop; otherwise, replace it bylinearSearchor another Underscore function. In this way, the extra function call introduced by using an Underscore function can never slow down the loop by more than a factor two. I expect the slowdown to be less in real-world scenarios, because a loop body that purely consists of two function calls (i.e., with functions that don't do anything) is trivial. Also, loops that already involved a function call often already involved multiple function calls.I wrote an extensive microbenchmark suite to measure performance, in which each operation is repeated at least 3.6 million times and during at least a whole second. Please follow that link for details. All performance claims in the current post are informed by repeated measurements in Node.js 10, mobile Safari 14 and desktop Firefox 84 (on macOS) with those microbenchmarks. However, I believe that a final performance impact assessment should be made by comparing execution time in real-world applications and across a wider range of engines. More on this in the review section at the bottom of this post.
With this overall approach in mind, I made several passes over all the source modules, looking for opportunities to make the code cleaner, DRYer and more functional. The code I'm presenting today is the result of this effort and the subject of the first round of reviews. Once I have received and processed the first round of reviews, I plan to make one more pass over the source code, in order to make function parameter names more consistent and to fix the linear order of the bundles. After that, I'll take one smaller, final review before finally merging the branch.
Results (stage 1)
_.findare now written in a collection-agnostic way, i.e., they use_.findor another Underscore function to abstract over the collection type.linearSearch,binarySearch,extremum,greater,less,lessEqual. I believe thatlinearSearchandbinarySearchshould remain internal, in favor of the more user-friendly_.*index*family of functions (which can however expand parameters to support more options from the underlying primitives); the others could potentially be made public in a future update._.sortedLastIndexin the future (in fact it's implemented but not added to theindex.jsyet) as well as anisSortedoption for_.lastIndexOf(which is manually disabled for now).linearSearchor another Underscore function. BesideslinearSearch, the main loop primitives are_.findKeyand_.times. Besides those primitives, 20 functions (out of the original 38) still contain hand-written loops for performance reasons.extremum, I fixed _.max and _.min may behave incorrectly if iteratee returns infinite number #2688 and opened the possibility of making_.minand_.maxwork with non-numbers in the future. This is also how I uncovered the "crazy comparison semantic" that I discuss in the next bullet._.minand_.max, which transforms the elements through aniterateebefore comparing, through the outfactoring ofextremum. I couldn't deduplicate the fast path (withoutiteratee) yet, because, for historical reasons, they currently have a rather crazy comparison semantic; they do something that is wildly different from just the builtin</>operators. I could abstract this semantic in a function and pass it toextremum, and in fact, this is exactly what I do in the slow path, but the fast path would suffer a performance degradation of roughly a factor 25 if I were to do the same there. Alternatively, I could inline the comparison logic inextremum, but I don't want to do that because it is really crazy. I want to change it back to regular</>in Underscore 2.0, in which case I'll be able to move the fast path toextremumas well and finish the deduplication. Please see my pre-review comments after this post for more details._.restArgumentsandoptimizeCbare gone. As far as my benchmarks could tell, they had no performance benefit whatsoever over the simpler code that I put in their place._.iterateeand the internalcbhas been simplified substantially._.mapand_.each, take a slight performance hit due to the changes. In my microbenchmarks, with very trivial loop bodies, this could be up to a factor two. I consider this unavoidable, due to the need for a single source of truth about collection iteration, though I have tried hard to keep the sacrifices at a minimum. I'd like to see real-world performance comparisons before drawing any conclusions._.restArguments, which are used internally throughout Underscore.Review
This PR is unusual because it may lead to changes that sacrifice performance in favor of other considerations. For the sake of accountability, I'd like the review to be very thorough and entirely public. This is going to be a lot of work and I don't want all that work to land on the shoulders of only one or two people. Instead, I hope to involve many people, each doing only light work. As a token of appreciation, everyone making a substantial contribution to this review will get a permanent honorable mention in the changelog.
I propose to divide the review work as follows. I'll be inviting specific people for specific parts, but please rest assured that anyone is welcome to participate in any part of the review. I'll keep track of review progress below.
modules/, preferably by at least two independent expert Underscore users. I would like to specifically invite @cambecc, @reubenrybnik and @joshuacc for this part. Please spread the word if you know other expert Underscore users who might be willing to contribute a code review.Benchmark preparation
If you have a real-world JavaScript application (or library function) that meets all of the following criteria:
and you are willing to do the following things, with help as needed:
then please let us know by leaving a comment below!
Performance measurements
You can contribute performance measurements by doing the following.
Final words
I make major contributions to Underscore, like the current one, because I believe it is an excellent library and because I want to help lift it to the next level. If you like what I'm doing, please consider supporting my open source work on Patreon. Shoutout to my existing backers, who encouraged me to do this work, and to @jashkenas, who created this awesome library.