Documentation for specialize.py

Documentation structure

User manual

Overview

specialize.py is a Python script that generates a matrix-vector multiplier for a given matrix using a given method. To use it, first create a folder matrices (which will be used to store the raw contents of matrices). Invoke the script like this:

   python specialize.py matrix method-specification

This first downloads the given matrix from the Matrix Market into the matrices folder (if it is not yet there). It then creates a folder with a set of .c files that together define a function

   void multByM (double v[], double w[])
where v is the input vector and w the output vector.

The name of the folder is derived from the method specification, and the code is generated according to that specification. The .c files include one or more auxiliary function definitions and a file of declarations. The latter, named generated_decls.c, includes mainly array declarations; it also includes declarations of n (the size of the matrix, which can only be square), and a char array matrix, giving the name of the matrix. (For the unfolding specializer, all data is built into the code, so the declarations file declares only n and matrix.)

Method specifications are explained in detail below.

specialize.py runs only in Python 2.6 or higher. (Actually, I've only tested it in version 2.6.1.)

Inputs and outputs

Matrices

The folder in which specialize.py is run must contain a folder called matrices containing matrices in either "mtx" or "mm" format, or both. If a matrix is named on the command line and is not in matrices, specialize will attempt to fetch it from the Matrix Market. "mtx" is the native Matrix Market format, but specialize.py prefers the data in "mm" format. If it needs to fetch the matrix, it will store it in both formats, and if it happens to be in the folder only in mtx format, it will convert it and store the mm format. (Warning: Some matrices in MM are not full matrices but only templates, that is, they give only the indices where non-zeros would go; the input routine will, obviously, report an error on these cases.)

mtx and mm format are very similar. Both begin with a line containing "m n nz" (in our case, m always equals n, since we only handle square matrices), and the remaining lines have the form "r c v", where v = M[r,c]. The differences are: (1) mtx format uses indexing from 1, while mm uses indexing from zero. (2) mtx files can begin with comment lines, which start with %; mm files do not have these. (3) mtx files sometimes contain zero entries (so that the nz given on the first line is not actually the number of non-zeroes, but instead the number of entries given in the file; I do not know why zeroes are ever included, but they sometimes are; mm files contain only non-zero values. (4) mm files are sorted in row-major order.

Command line arguments

The script is invoked as:

    python specialize.py matrix method-specification
where matrix is the name of the matrix and method-specification is a specification describing how the code for this matrix should be generated. In the simplest case, it is just the name of a specializer, as in:
    python specialize.py fidap005 stencil
However, specifications are usually more complicated. Here is a specification that says to extract a band of width 21 and implement multiplication by that part of the matrix using stencil, and the rest using unfolding (with blocking factor of, in effect, infinity):
    python specialize.py fidap031 "[split_by_band, 10, 10, [stencil], [unfolding, 100000, 100000]]"
Here, split_by_band is a function that partitions a matrix into two parts (in-band and out-of-band); such a function is called a splitter. stencil and unfolding are methods of generating code, which are applied here to the two parts of fidap031; these are called specializers.

Specifications are explained in detail below.

Generated files

The output is a folder with a set of C files (with .c extension) containing functions and declarations, plus one non-C file containing data calculated during generation. The name of the folder is obtained from the matrix name and the specification, so that it uniquely identifies a particular specialization of a particular matrix (and can be quite long). The code files have names generated.c (the file containing multByM) and generated_i.c; the declarations file is generated_decls.c. All need to be compiled; the result is a definition of multByM(double *v, double *w) and any auxiliary functions and declarations needed by it.

The decls file includes definitions of variables n and matrix (giving the name of the matrix as a character array). These may be of use to clients.

The file of data is called info.csv, and contains data in csv form, which depends upon the method specification. Note that each matrix that appears (i.e. the original matrix and any matrix produced by a splitter) has a distinct name; splitters create names of the returned matrices by adding _1 or _2 to the name of the matrix being split. For each matrix, the first line of info.csv is the filename, and its n (which in fact is the same for all matrices), and its nz (the number of non-zeros, which is different for each partition). Furthermore, each row of data for the matrix begins with the matrix name. After that, the data depend upon the specializer. (Warning: this list is not complete; the info.csv files are written to be self-explanatory, so they should be consulted.)

(Technical note: At present, only specializers, and not splitters, can add info to a matrix; this doesn't seem to be a problem.)

Specialization specifications

The main idea of this script is that multiple specifications can be used on one matrix, by partitioning the array and assigning different specializers to the different partitions. The allows complete freedom in choosing secondary specializers, which was the initial goal of the script. The script includes several functions to generate code ("specializers"), and several functions to partition matrices ("splitters"). Splitters always divide matrices into two pieces (it is always possible that one of the pieces is empty); in the future, we may want to generalize this.

The command-line arguments to the script are, as already noted, a matrix name and a specification. Abstractly, a specification is a tree whose internal nodes are labelled by splitters (and their arguments) and whose leaf nodes are labelled by specializers (and their arguments). Concretely, the specification is a Python list of one of these two forms:

If the only argument is a specializer (no splitter), and it has no arguments, the brackets can be omitted. (The third case is special; it is explained below, in the section on specializers.) We have seen two examples above. Here are those and two more:
python specialize.py fidap005 stencil
- use stencil for fidap005
python specialize.py fidap005 "[stencil]"
- same as the previous one
python specialize.py add32 "[unrolling, 5]"
- use CSR with an unrolling factor of 5
python specialize.py fidap031 "[split_by_band, 10, 10, [stencil], [unfolding, 100000, 100000]]"
- banded stencil, with band [10, 10], and unfolding for the out-of-band elements
python specialize.py fidap031 "[split_by_block, 3, 4, [OSKI, 3, 4], [unfolding, 100000, 100000]]"
- extract the part of the matrix that divides evenly into 3-by-4 blocks, and use OSKI for that part; treat the remaining few elements by unfolding.
Note that the specification has to be quoted (except in the rare cases when it is a single word).

Specializers and splitters

The following are the current specializers. Most will reliably generated code for any square matrix; OSKI, row-segment, and diagonal can only be used after specific splitters, to ensure the matrices have the required properties. Working Paper 2 describes the methods in detail. Working Paper 3, Low-level Coding Variants, discusses how they are implemented; in most cases, several versions of the specializer was written and one chosen based on experiments.

Earlier we mentioned that a method specification can begin with repeat followed by a number (say, r). The effect is to generate the code given by the specification, and then duplicate it r times in multByM. The result is a multiplication routine that is wrong; in fact, every value in the output vector will be exactly r times what it should be. The purpose of this feature is to allow experimentation on the effect of varying code sizes; by duplicating the code for a method, we ensure that two versions with different values of r will differ only in the code size, and not in any other way (e.g. memory access patterns). Of course, when doing these experiments, one must remember to divide the time by r.

The following are the current splitters. Each produces a pair of matrices [mat1, mat2] that partition the elements of the argument matrix; we describe only the contents of mat1; mat2 contains all the other elements.

The splitters given above do not attempt to split the matrix intelligently. There is no reason that we could not write a splitter that would attempt to divide the matrix "optimally" for some pair of specializers.

Technical documentation

The code contains a lot of comments. Here I'll just write about what specializers and splitters do, and how to write them. Naturally, the code itself is the best guide.

One "gotcha": If you write a new splitter or specializer, you must add it to the function evalspec; just follow the obvious model there.

Matrices

Matrices are represented as Python dictionaries. Initially - that is, when first read in from the '.mm' file - they have these entries:

When splitters are called, as explained below, they return new matrices; those matrices have the same n, but different names and (usually) different data and nz counts.

After specializers are called, they add one or more (but usually all) of the following:

Splitters

A splitter takes a matrix, and possibly some additional arguments, and returns a pair of matrices which together represent a partition of the original matrix (i.e. each element in the original matrix occurs in exactly one of the returned matrices). The two matrices have different names, and usually different non-zero counts (although it is legal for a partition to return a pair with the original matrix and an empty matrix), but the same n.

Aside from producing a partition, the only rule about splitters is that they must have names beginning split_.

Two auxiliary functions useful for writing splitters are:

Also, if you want to write a splitter that is based on the stencils or row-non-zero counts of the matrix, you can call groupRowsByStencil(matrix) or groupRowsByNZ(matrix). Both return the original matrix, with an additional field ('stencils' or 'rowsbynz'); the format of the those fields is explained in the code.

Specializers

Specializers take a matrix and possibly other arguments, add fields to the matrix giving the generated code, and return the modified matrix. At the top level, the fields of the various matrices (if the original was split) are combined and output into the files described above in the section "Generated files".

The fields that can be added are listed above. We elaborate a bit on each here. Two points to keep in mind are that (a) generated code may be divided up into functions and files (this is not done by the specializers, but in a separate pass), and (b) it is possible that the same specializer may be used twice in a specialization tree, so care must be taken to avoid name collisions. A global variable in the script, called stage is incremented for every specializer, and can be used to make names unique.

Abstract syntax

Specializer create code using a simple form of abstract syntax. The purpose of this is to make it easier to divide the code into functions and files. The abstract syntax operators are give in the script (search for AST operations). Many are used rarely or not at all. Here are the important ones:

To-do list

Deal with "first +=" problem.
To allow for multiple methods to be applied (using splitters), we follow this convention: the output vector w is initialized to zeros, and all multiplication routines operate by adding to the elements of w; that is, every assignment generated by a specializer has the form w[i] += .... The problem is that this may mean using "+=" more than necessary, since obviously we don't need to use "+=" the first time we assign to w[i]. Using "+=" means doing one more fetch and add than would be needed if we just used "=". For example, if an entire matrix is treated using unfolding, with infinite block size, then every row gets assigned to exactly once, so we can write "=" instead of "+="; on the other hand, if we use unrolling (i.e. standard CSR), the "+=" is unavoidable. The question is whether we can eliminate "+=" when it is actually unnecessary. This seems to be somewhat complicated, but it might improve performance, albeit only slightly.
Not sure how to filter for zeros in mtx files.
Currently just testing "float(input)<>0.0"; seems to work.