Working papers on staging matrix multiplication


This project is divided into three parts, as described on the matrix multiplication page, each with a number of sub-projects. Since not every sub-problem is independently publishable, we are recording our progress in a set of working papers (formatted in HTML).

WP 1: Timings
Discusses the machines and matrices we are using for testing and the testing procedure, and gives links to the CSV files containing the timings themselves. As usual, we derive each timing from a set of measurements, to minimize noise; this note explains why we take the minimum time from a set of tests, rather than the median.
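To make the two candidate statistics concrete, here is a minimal sketch in C that reduces a set of repeated timings to either their minimum or their median. The function names `min_time` and `median_time` are hypothetical and are not taken from our test harness. One common argument for the minimum is that interference from the rest of the system can only add time to a run; WP 1 gives our actual reasoning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of n timing samples (n must be at least 1). */
double min_time(const double *samples, size_t n)
{
    assert(n > 0);
    double best = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n timing samples (n must be at least 1).
 * Sorts a private copy, so the caller's array is left untouched.
 * Returns -1.0 if the copy cannot be allocated. */
double median_time(const double *samples, size_t n)
{
    assert(n > 0);
    double *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return -1.0;
    memcpy(copy, samples, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_double);
    double med = (n % 2) ? copy[n / 2]
                         : 0.5 * (copy[n / 2 - 1] + copy[n / 2]);
    free(copy);
    return med;
}
```

A single slow outlier among the samples leaves the minimum unchanged and usually shifts the median only slightly, so the two statistics differ mainly when many runs are disturbed.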
WP 2: Matrix multiplication methods
We describe the various methods we are using (a list which keeps growing). Note that we will often use multiple methods for a single matrix, first partitioning it (usually into two parts, but sometimes more) and applying a different method to each part. This WP describes the basic methods.
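The idea of partitioning a matrix and applying a different method to each part can be sketched as follows. This is only an illustration under stated assumptions: it uses a dense, row-major matrix-vector product as a stand-in, splits the matrix by rows into two parts, and the two kernels (`kernel_plain`, `kernel_unrolled2`) and the driver `multiply_partitioned` are hypothetical names, not the methods described in WP 2.

```c
#include <stddef.h>

/* A kernel computes y[i] = sum over j of A[i][j] * x[j] for the rows
 * in [row_begin, row_end). A is stored row-major with ncols columns. */
typedef void (*row_kernel)(const double *A, const double *x, double *y,
                           size_t row_begin, size_t row_end, size_t ncols);

/* Hypothetical method 1: straightforward inner loop. */
void kernel_plain(const double *A, const double *x, double *y,
                  size_t row_begin, size_t row_end, size_t ncols)
{
    for (size_t i = row_begin; i < row_end; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < ncols; j++)
            sum += A[i * ncols + j] * x[j];
        y[i] = sum;
    }
}

/* Hypothetical method 2: inner loop unrolled by two. */
void kernel_unrolled2(const double *A, const double *x, double *y,
                      size_t row_begin, size_t row_end, size_t ncols)
{
    for (size_t i = row_begin; i < row_end; i++) {
        const double *row = A + i * ncols;
        double sum = 0.0;
        size_t j = 0;
        for (; j + 1 < ncols; j += 2)
            sum += row[j] * x[j] + row[j + 1] * x[j + 1];
        if (j < ncols)              /* leftover column when ncols is odd */
            sum += row[j] * x[j];
        y[i] = sum;
    }
}

/* Split the rows at `split`: rows [0, split) go to `first`,
 * rows [split, nrows) go to `second`. */
void multiply_partitioned(const double *A, const double *x, double *y,
                          size_t nrows, size_t ncols, size_t split,
                          row_kernel first, row_kernel second)
{
    first(A, x, y, 0, split, ncols);
    second(A, x, y, split, nrows, ncols);
}
```

Passing the kernels as function pointers keeps the partitioning logic independent of the methods themselves; a partition into more than two parts would take an array of (range, kernel) pairs instead.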
WP 3: Low-level coding variants
The methods all have, at their core, one or more simple loops. Still, despite their simplicity, they can be coded in different ways that can have a measurable impact on their performance. The main differences are in the use of array subscripting vs. address arithmetic and explicit dereferencing (all our codes are written in C). In this note, we describe how we chose a specific implementation of each method.
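As an illustration of the two coding styles, here is the same simple loop (a dot product, chosen only as an example) written once with array subscripting and once with address arithmetic and explicit dereferencing. The function names are hypothetical; the variants actually compared are described in WP 3.

```c
#include <stddef.h>

/* Variant 1: array subscripting. */
double dot_subscript(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Variant 2: address arithmetic and explicit dereferencing.
 * The loop advances both pointers until `a` reaches one past
 * the end of its array. */
double dot_pointer(const double *a, const double *b, size_t n)
{
    const double *end = a + n;
    double sum = 0.0;
    while (a < end)
        sum += *a++ * *b++;
    return sum;
}
```

Both functions compute the same result; whether they compile to the same machine code depends on the compiler and optimization level, which is why the choice has to be measured rather than assumed.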
WP 4: Testing framework
This is not so much a note as a page linking to a Python script that generates our variants, and to its documentation. The script allows matrices to be partitioned in a variety of ways, and a different method to be applied to each partition.
WP 5: The run-time compiler
We are developing a C language extension for staging, and an LLVM-based run-time compiler. This note describes the current state of that compiler.

Last updated on Fri Jul 6 09:00:35 CDT 2012