This documentation is up-to-date for voc-release4.01. It mostly applies to the model format used in voc-release5. Updated documentation is in progress. Weight vector documentation =========================== The model parameters are encoded in a weight vector that is updated by a stochastic gradient descent procedure. The mapping from indexes in the weight vector to pieces of the object model (e.g., filters, deformation models, rule offests, ...) is done through 'blocklabels'. The weight vector is composed of contiguous blocks of parameters. Each block holds a related set of weights. A blocklabel is an index into the list of blocks that comprise the weight vector. For example the weight vector w = (w_1, w_2, ..., w_30) has 30 blocks, and a part filter for a model might be stored in w_12 (blocklabel = 12). The block layout of the weight vectors allows feature vectors to be stored in a sparse format where only the non-zero blocks are written. Likewise, learn.cc takes advantage of the sparse representation when computing dot products. Model format documentation ========================== This document describes how the model format has changed relative to the format used in the previous release. The new model format supports describing object models in terms of acyclic grammars. At the top level, the model structure has the following new fields: filters: [1x42 struct] rules: {1x81 cell} symbols: [1x81 struct] start: 2 (Fields are shown with example values for the purpose of explaining their usage.) I'll go through each of these fields starting with the symbols struct array. model.symbols(i) = type: 'T' filter: 3 The grammar is built out of symbols. Each symbol is labeled by a number (which will be synonymous with the word "symbol") and has a type: 'T' => terminal; 'N' => non-terminal. Each terminal corresponds to exactly one filter. In this case, the filter field is used to hold the filter's index. For non-terminal symbols, the filter field is empty. Since we're talking about filters, let's look at the filter struct array next. model.filters{i} = w: [7x11x31 double] blocklabel: 1 symmetric: 'M' size: [7 11] flip: 0 symbol: 1 As expected, each entry holds the filter weights and the blocklabel. The symmetric field can be either: 'M' => has a vertically mirrored partner or 'N' => no symmetry constraint. If a filter has symmetric == 'M', then there will be two filters that share the same blocklabel. The flip field is used to indicate if the weights read from a model block should be flipped to form the filter (and likewise, if features written to the cache should be flipped). The symbol field is simply a reference back to the terminal symbol that uses the filter. Now let's look at model.rules. This cell array holds the grammar's productions. Each symbol has a (possibly empty) cell in model.rules. The cell model.rules{i} holds an array struct that lists the rules for which i acts as the left-hand side symbol. So, model.rules{4}(1) and model.rules{4}(2) are the first two productions for symbol 4 in some imaginary grammar. The field model.start holds the distinguished start symbol for the grammar. Let's use that symbol as an example. model.rules{model.start}(1) = type: 'S' lhs: 2 rhs: [1 11 15 19 23 27 31] offset: { w: -3.394 blocklabel: 2 } anchor: {[0 0 0] [12 0 1] [4 5 1] [13 8 1] [0 0 1] [16 0 1] [0 5 1]} Here is the first rule with model.start on the LHS. In the case of a 6 component mixture model, model.rules{model.start} is a struct array of 6 elements. The field lhs is simply a convenience field indicating what the production's LHS is. The field rhs holds the symbols that appear on the production's right-hand side. The field type is 'S' if this is a structural rule or 'D' if this is a deformation rule. Now I'll split the description into two cases. case 1: structural rule The field offset holds the offset value and its blocklabel (these will be shared for mirrored components). The anchor field holds the parameters of the "structure function" that defines how each of the symbols in the rhs is placed relative to a placement of the lhs symbol. The format is [dx dy ds], where 2^ds is the scale change. So ds = 1 implies that the rhs symbol lives at twice resolution of the lhs symbol. The values of dx, dy are HOG cell displacements at the rhs's native scale. Note that dx and dy are displacements, so they are 1 less than the anchor values that were defined in the old model. The first symbol on the rhs has anchor = [0 0 0], because this symbol is a terminal for the root filter. case 2: deformation rule Let's look at the second symbol on the rhs in the rule above. model.rules{11} = type: 'D' lhs: 11 rhs: 10 def: { w: [0.0209 -0.0015 0.0155 0.0010] blocklabel: 8 flip: 0 symmetric: 'M' } offset: { w: 0 blocklabel: 7 } Deformation rules don't have an anchor field, but do have a def field. The def field describes the deformation model for this rule, and so it includes the coefficients and the blocklabel. The def.symmetric field can be 'M' if there is a vertically mirrored deformation rule or 'N' if there is no symmetry constraint. In the case of 'M', flip indicates how to write features and read models (just like features.flip). There's an implicit assumption in the code that deformation rules only have one symbol on the right-hand side (though it need not be a terminal).