Weight vector
VW's weight vector has 2^b float (4-byte) weights, where b is specified by the -b option, and each example's features are hashed to an index in [0, 2^b). The weight vector is also used to store other vectors needed by more sophisticated learning algorithms, such as the conjugate gradient method (--conjugate_gradient) or adaptive gradient descent (--adaptive, --invariant, and --normalized). In these more sophisticated cases, a small integer multiplier is applied to the size of the weight vector so there is enough room to store all these auxiliary weights side-by-side in the same 'hash-bucket'.
In other words: when more than one vector is stored in the same global space, every hash-value slot stores multiple "weights". The number of floats in each hash-bucket is called the
stride in the vw source.
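A minimal sketch of the stride layout described above (the names and the helper function are illustrative, not VW's actual code): the per-feature values live side-by-side in one flat array, and a feature's hash selects a bucket of `stride` consecutive floats.

```python
# Sketch: a stride of 3 stores three values per feature side-by-side
# in one flat array (illustrative only, not VW's actual data structure).
NUM_BITS = 18   # corresponds to -b 18
STRIDE = 3      # e.g. weight + two auxiliary values for adaptive/normalized SGD

weights = [0.0] * ((1 << NUM_BITS) * STRIDE)

def bucket(feature_hash):
    """Return the slice of the flat array holding this feature's values."""
    # Mask the hash down to b bits, then scale by the stride.
    index = (feature_hash & ((1 << NUM_BITS) - 1)) * STRIDE
    return weights[index:index + STRIDE]
```

Storing the auxiliary values next to the weight (rather than in separate arrays) keeps all the values a single update touches in the same cache line.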
VW uses -b 18 by default. 2^18 is 262,144, so if your training set has far fewer than 262,144 distinct features you should be relatively safe from hash collisions. If you auto-generate many new features on the fly, as with -q (quadratic), --cubic, or --nn, you may want to increase the default by requesting a bigger -b value to avoid hash collisions.
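To make "relatively safe" concrete, a birthday-bound estimate (my own back-of-the-envelope helper, not anything in VW) shows roughly how many colliding feature pairs to expect for a given feature count and -b value:

```python
def expected_collisions(n_features, bits):
    """Birthday-bound estimate: expected number of colliding feature pairs
    when hashing n_features distinct features into 2^bits buckets."""
    buckets = 1 << bits
    # Each of the n*(n-1)/2 pairs collides with probability ~1/buckets.
    return n_features * (n_features - 1) / (2 * buckets)

# With -b 18, a few thousand features collide rarely, but quadratic
# feature generation can easily push the count high enough to matter.
print(expected_collisions(1_000, 18))    # well under 2 expected collisions
print(expected_collisions(100_000, 18))  # thousands of expected collisions
```

This is why increasing -b helps: doubling the table (one extra bit) halves the expected number of collisions.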
By default, vw uses -b 18 and normalized/adaptive/invariant SGD. So the overall size allocated for the weight vector is:
= 2^18 * stride * sizeof(float) bytes
= 262144 * 3 * 4 bytes
= 3,145,728 bytes
= A bit over 3MB