Week 7 Hamming codes

Version 2023-11-04. To PDF of this week only Open the whole set of notes in PDF

Synopsis. Hamming codes are essentially the first non-trivial family of codes that we shall meet. We give a construction of a \(q\)-ary Hamming code and prove that it is perfect with minimum distance \(3.\) We show that syndrome decoding works for Hamming codes in an especially simple way.

Finding a check matrix

Before we can construct Hamming codes, we need to discuss check matrices further and prove a result (the Distance Theorem) which will allow us to find the minimum distance of a linear code from its check matrix.

The following result allows us to find a generator matrix of \(C^\perp ,\) assuming that \(C\) has a generator matrix in standard form.

Theorem 7.1: a check matrix construction

Assume that \(C\) has a \(k\times n\) generator matrix \(G=[\,I_k\,|\,A\,]\) in standard form. Then the dual code \(C^\perp \) has generator matrix

\[ H = [\,-A^T\, |\, I_{n-k}\, ]. \]

  • Proof. \(H\) has \(n-k\) rows which are linearly independent (due to \(I_{n-k}\) present). It is enough to show that each row \(\ul r\) of \(H\) is a codevector of \(C^\perp :\) indeed, we have \(n-k\) linearly independent vectors in \(C^\perp ,\) and \(n-k\) is the dimension of \(C^\perp \) by Theorem 5.1, so a linearly independent set of \(n-k\) vectors must be a basis of \(C^\perp .\)

    By Theorem 5.1, it is enough to show that \(\ul rG^T =\ul 0.\) We will show this at once for all rows of \(H,\) by proving that \(HG^T\) is the zero matrix. Indeed,

    \begin{equation*} [\,-A^T\, |\, I_{n-k}\, ] \left [\begin{smallmatrix}&I_k& \\ & -\!\!-\!\!-& \\ &A^T &\end {smallmatrix}\right ] = -A^T I_k + I_{n-k}A^T = -A^T+A^T=0. \qedhere \end{equation*}

    (As claimed.)

How can one find a check matrix of \(C\) if \(C\) has no generator matrix in standard form? We address this question below.

Linearly equivalent codes

Definition: linearly equivalent codes

Two linear codes \(C,C'\subseteq \F _q^n\) are linearly equivalent, if \(C'\) can be obtained from \(C\) by a sequence of linear transformations of the following types:

  • (C1) choose indices \(i,\) \(j;\) in every codeword, swap symbols \(x_i\) and \(x_j;\)

  • (C2) choose index \(i\) and non-zero \(\lambda \in \F _q;\) in every codeword, multiply \(x_i\) by \(\lambda .\)

  • Exercise. Linearly equivalent codes have the same length, dimension and weight. They have the same weight enumerator. (Reason: (C1) and (C2) do not change the weight of any vector.)

Fact: known from linear algebra

Every generator matrix can be brought into the standard form by using row operations (R1), (R2), (R3) considered above and column operations (C1).

Reason: any matrix can be brought to reduced row echelon form, RREF, by (R1)–(R3); a generator matrix has linearly independent rows so the RREF won’t have zero rows and will have a leading entry \(1\) in each of the \(k\) rows; the \(k\) columns which contain the leading entries are columns of the identity matrix of size \(k;\) use (C1) to move all these columns to the left.

Conclusion: we can always find a generator matrix in standard form for a linearly equivalent code.

The Distance Theorem

We already know how to read the length and the dimension of a linear code \(C\) off a check matrix \(H\) of \(C:\)

  • • the number of columns of \(H\) is the length of \(C;\)

  • • the number of columns minus the number of rows of \(H\) is the dimension of \(C.\)

The following theorem tells us how to determine the minimum distance of \(C\) using \(H.\)

Theorem 7.2: Distance Theorem for linear codes

Let \(C\subseteq \F _q^n\) be a linear code with check matrix \(H.\) Then \(d(C ) = d\) if and only if every set of \(d - 1\) columns of \(H\) is linearly independent and some set of \(d\) columns of \(H\) is linearly dependent.

  • Proof. Let \(e\) be the size of a smallest linearly dependent subset of the set \(\{\co h_1,\ldots ,\co h_n\}\) of columns of \(H.\) The theorem claims that \(e=d(C).\) Note that \(e\) is the minimum positive number of non-zero coefficients \(x_i\) in the linear combination

    \begin{equation*} x_1 \co h_1+x_2\co h_2+\ldots +x_n \co h_n=\co 0, \end{equation*}

    i.e., the minimum weight of non-zero \(\ul x =(x_1,\dots ,x_n)\) such that \(\ul x H^T = \ul 0.\) By Theorem 5.1, such vectors \(\ul x\) are exactly the codevectors of \(C,\) so \(e=w(C)=d(C)\) as claimed.

Example: calculate \(d(C)\) using the Distance Theorem

Use the Distance Theorem to find the minimum distance of the ternary linear code with check matrix \(H=\begin {bmatrix}0&1&2&1 \\ 2&0&1&1 \end {bmatrix}.\)

Solution. Step 1. \(H\) has no zero columns. Hence every set of \(1\) column is linearly independent (a one-element set is linearly dependent iff that element is zero). So \(d\ge 2.\)

Step 2. Any two columns of \(H\) are linearly independent, because no two columns are proportional to each other. So \(d\ge 3.\)

Step 3. There are three linearly dependent columns in \(H:\) for example, columns 1, 2 and 3 form linear combination \(\begin {bmatrix}0\\2\end {bmatrix} + \begin {bmatrix}1\\0\end {bmatrix} + \begin {bmatrix}2\\1\end {bmatrix}=\co 0.\) Therefore, \(d=3.\)

Hamming codes: the construction

Definition: line, representative vector, projective space

A line is a \(1\)-dimensional subspace of the vector space \(\F _q^n.\)

A representative vector of a line is a non-zero vector \(\ul u\) from that line. The line is then given by \(\{\lambda \ul u\mid \lambda \in \F _q\}.\)

The projective space \(\PP _{n-1}(\F _q)\) is the set of all lines in \(\F _q^n.\)

Remark: the terminology comes from euclidean geometry — in the euclidean plane, the set of all vectors proportional to a given non-zero vector is a straight line through the origin. Projective spaces over the field \(\mathbb R\) of real numbers are well-studied geometric objects.

For example, \(\PP _1(\mathbb R)\) — the set of all lines through the origin in the euclidean plane — can be thought of as the unit circle with antipodes identified. We are working over a finite field \(\F _q\) where these notions are less intuitive.

Definition: Hamming codes

Let \(r \ge 2\) be given. We let \(\Ham (r, q)\) denote an \(\F _q\)-linear code whose check matrix has columns which are representatives of the lines in \(P_{r-1}(\F _q ),\) exactly one representative vector from each line.

Remark: \(\Ham (r,q)\) is not one code but a class of linearly equivalent codes

\(\Ham (r,q)\) is defined up to a linear equivalence. Indeed, we can:

  • • multiply a column by non-zero \(\lambda \) to get another representative of the same line;

  • • put columns in any order.

This means that \(\Ham (r,q)\) is not just one code but a class of linearly equivalent codes. We will therefore say “a \(\Ham (r,q)\) code” to mean any of the linearly equivalent codes.

Let us see how the construction works in historically the first example of a Hamming code.

Example: \(\Ham (3,2)\)

Construct a parity check matrix for a binary Hamming code \(\Ham (3,2).\) Then find a generator matrix in standard form for \(\Ham (3,2).\)

Solution: we need to take one non-zero column from each line in \(\F _2^3.\) For binary vectors, a line \(\{\lambda \ul u\mid \lambda \in \F _2\}\) consists only of two points, \(\ul 0\) and \(\ul u.\) This means that a check matrix for a binary Hamming code consists of all non-zero binary columns or the required size.

Start filling in the check matrix by putting the identity columns at the end (this is convenient for finding a generator matrix). In total, there are \(7\) non-zero binary vectors of size \(3:\)

\[ H=\left [\begin {array}{cccc|ccc} 1& 1& 1&0&1&0&0\\ 1& 1& 0&1&0&1&0\\ 1& 0& 1&1&0&0&1 \end {array}\right ] \]

From this \(H,\) we can reconstruct the generator matrix \(G=[I_k \mid A]\) by Theorem 7.1:

\[ G=\left [\begin {array}{cccc|ccc} 1& 0& 0&0&1&1&1\\ 0& 1& 0&0&1&1&0\\ 0& 0& 1&0&1&0&1\\ 0& 0& 0&1&0&1 &1 \end {array}\right ] \]

This is, up to linear equivalence, the generator matrix of the original code of R. Hamming.

Historical remark. Despite their name, the \(q\)-ary Hamming codes for \(q>2\) were not invented by Hamming. Richard Hamming told Claude Shannon (who he shared an office with at Bell Labs) about his binary \([7,4,3]\)-code, and Shannon mentioned it in his paper of 1948. That paper was read by Marcel J. E. Golay (1902–1989), a Swiss-born American mathematician and electronics engineer, who then suggested the \(\Ham (r,q)\) construction in his paper published in 1949. Golay went further and constructed two perfect codes which are not Hamming codes. He asked whether there are any more perfect codes.

We will see the Golay codes, and will learn about an answer to Golay’s question about perfect codes, later in the course.

Parameters of a Hamming code

We considered an example of a \(\Ham (3,2)\) code, which — by looking at its generator matrix — turns out to be a \([7,4,d]_2\) code. It is not difficult to see directly that \(d=3.\) By explicitly computing the Hamming bound, one can show that all \([7,4,3]_2\)-codes are perfect.

We will now generalise this and show that all Hamming codes are perfect.

Theorem 7.3: properties of Hamming codes

\(\Ham (r,q)\) is a perfect \([n,k,d]_q\) code where \(n=\dfrac {q^r-1}{q-1},\) \(k=n-r,\) \(d=3.\)

  • Proof. The length \(n\) of the code is equal to the number of columns in the check matrix, which is \(\#\mathbb P_{r-1}(\F _q),\) the number of lines in \(\F _q^r.\)

    Observe that two lines intersect only at one point, namely \(\co 0.\) The set \(\F _q^r\setminus \{\co 0\}\) is therefore a disjoint union of lines. Each line \(\{\lambda \co u:\lambda \in F\}\) contains \(q-1\) non-zero points.

    So the number of lines in \(\F _q^r\) can be found as \(\dfrac {\#(\F _q^r\setminus \{\co 0\})}{q-1} =\dfrac {q^r-1}{q-1}.\)

    We have \(k=\dim \Ham (r,q)=n-r\) since, by construction, the check matrix \(H\) has \(r\) rows.

    To find \(d,\) we use the Distance Theorem for linear codes. Any two columns of \(H\) are linearly independent because they are from different lines in \(\F _q^r.\) (Two vectors are linearly dependent only if they are proportional to each other, i.e., belong to the same line.) Therefore, \(d\ge 3.\)

    On the other hand, \(H\) has columns \((a,0,0,\ldots ,0)^T,\) \((0,b,0,\ldots ,0)^T\) and \((c,c,0,\ldots ,0)^T,\) from three different lines (where \(a,b,c\in \F _q\setminus \{0\}).\) These columns are linearly dependent:

    \[ a^{-1}\begin {bmatrix}a\\0\\\vdots \\0\end {bmatrix} + b^{-1}\begin {bmatrix}0\\b\\\vdots \\0\end {bmatrix} -c^{-1}\begin {bmatrix}c\\c\\\vdots \\0\end {bmatrix} =\co 0. \]

    So \(d=3\) by the Distance Theorem.

    It remains to show that \(\Ham (r,q)\) is perfect. We calculate \(t=[(d-1)/2]=[2/2]=1.\) The Hamming bound (in logarithmic form) then says

    \[ k\le n-\log _q\Bigl ( \binom n0 + \binom n1 (q-1)\Bigr ) = n-\log _q\left ( 1 + n(q-1) \right ). \]

    By the already proved formulae for \(n\) and \(k\) we have \(n(q-1)=q^r-1\) and \(k=n-r.\) Hence the bound is \(n-r\le n-\log _q(q^r)=n-r\) — attained. Thus, \(\Ham (r,q)\) is perfect.

Remark: \((q^r-1)/(q-1)\) is an integer

The proof shows that the fraction \(\dfrac {q^r-1}{q-1}\) is an integer. In fact, this can be seen for all integers \(q,r>1\) by a formula for summing a geometric progression, \(\dfrac {q^r-1}{q-1} =q^{r-1}+q^{r-2}+\dots +q+1;\) the right-hand side is obviously an integer.

Decoding a Hamming code

Algorithm 7.4: decoding algorithm for a Hamming code

Let a Hamming code be given by its check matrix \(H.\) Suppose a vector \(\ul y\) is received.

  • • Calculate \(S(\ul y)=\ul y H^T.\) If \(S(\ul y)=\ul 0,\) \(\Decode (\ul y)=\ul y.\)

  • Otherwise, \(S(\ul y)=\lambda \ \times \) some column of \(H.\) Let this be the \(i\)th column of \(H.\)

  • • Subtract \(\lambda \) from the \(i\)th position in \(\ul y.\) The result is the codevector \(\Decode (\ul y).\)

  • Proof of validity of the algorithm. We prove that the algorithm outputs the nearest neighbour of \(\ul y\) in the code \(C.\) This is clear if \(S(\ul y)=\ul yH^T = \ul 0:\) by Proposition 5.2 \(\ul y\) is a codevector, and so its own nearest neighbour in \(C.\) Hence it is correct to decode \(\ul y\) to itself.

    If \(\ul y H^T \ne \ul 0,\) the line in \(\F _q^r\) which contains \(\ul y H^T\) has a representative column in \(H\) — say, \(h_i.\) As \(\ul yH^T\) lies on the line spanned by \(h_i,\) we must have \(\ul y H^T=\lambda h_i\) for some \(\lambda \in \F _q.\)

    Note that \(\lambda h_i\) equals \((\lambda \ul e_i)H^T\) where \(\ul e_i\) is the unit vector with symbol \(1\) in position \(i\) and zeros elsewhere. It follows that

    \[ (\ul y-\lambda \ul e_i)H^T = \lambda h_i - \lambda h_i = \ul 0, \]

    hence \(\ul y-\lambda \ul e_i\) is a codevector. Finally, since \(d(\ul y,\ul y-\lambda \ul e_i)=1,\) and no codevector can be at distance less than \(1\) from \(\ul y,\) we conclude that \(\ul y-\lambda \ul e_i\) is the nearest neighbour of \(\ul y\) in \(C.\)

Remark: properties of a Hamming decoder

The Algorithm and the proof above imply:

  • • Every coset leader of \(\Ham (r,q)\) is \(\ul 0\) or \(\lambda \ul e_i,\) i.e., a vector of weight \(0\) or \(1.\)

  • • The decoder changes at most one symbol in the received vector.

Note that the fact that every coset leader is of weight \(\le 1\) also follows, in a different way, from Exercise 4.3.

For \(\Ham (3,2),\) a clever ordering of columns in the parity check matrix can make the decoding algorithm especially elegant:

Example: special check matrix for \(\Ham (3,2)\)

Construct a decoder for the \(\Ham (3,2)\) code with parity check matrix

\[H=\begin {bmatrix} 0&0&0&1&1&1&1 \\ 0&1&1&0&0&1&1 \\ 1&0&1&0&1&0&1\end {bmatrix}.\]

Solution. If \(\ul y\in \F _2^7\) is received, \(\ul y H^T\) is either \(\ul 0\) or one of the columns of \(H.\) Now note that, by Algorithm 7.4,

  • • if \(\ul yH^T=001,\) the decoder must subtract \(1\) from the first bit in \(\ul y,\) because \(001\) is the first column of \(H;\)

  • • if \(\ul yH^T=010,\) the decoder must subtract \(1\) from the second bit in \(\ul y,\) because \(010\) is the second column of \(H;\)

and so on. Subtracting \(1\) from a bit in \(\F _2\) is the same as “flipping” the bit, i.e., replacing \(0\) by \(1\) and \(1\) by \(0.\)

Thus, to decode the received vector \(\ul y,\) we calculate the syndrome \(\ul y H^T.\) If this is \(000,\) output \(\ul y,\) otherwise read the syndrome \(\ul y H^T\) as the binary representation of a number \(i\in \{1,2,\dots ,7\}\) and decode by flipping the \(i\)th bit in \(\ul y.\)