Week 4 Decoding linear codes

Version 2023-10-07.

Synopsis. We explicitly describe a decoder \(\Decode \colon \F _q^n \to C\) based on coset leaders and a standard array for \(C.\) For binary \(C\) sent via a binary symmetric channel, we find the probability \(P_{\mathrm {undetect}}(C)\) of an undetected transmission error. It is related to the weight enumerator of \(C.\) We also find the probability \(P_{\mathrm {corr}}(C)\) that a codeword is decoded correctly.

Cosets and coset leaders

It turns out that the following notion is of direct relevance to decoding:

Definition: coset

Given a linear code \(C\subseteq \F _q^n\) and a vector \(\ul y\in \F _q^n,\) the coset of \(\ul y\) is the set

\[ \ul y + C = \{\ul y + \ul c\mid \ul c \in C\}. \]

We recall basic facts about cosets (see for example Algebraic Structures 1):

  • \(C=\ul 0+C\) is itself a coset. (\(C\) is called the trivial coset.) Moreover, \(C\) is the coset of any codeword \(\ul c\in C.\)

  • If \(\ul y,\ul z\in \F _q^n,\) then either \(\ul y + C = \ul z+C\) (if \(\ul y-\ul z\in C\)) or \((\ul y + C)\cap (\ul z+C)=\varnothing .\)

  • \(\#(\ul y+C)=\#C=q^k.\)

  • There are \(\dfrac {\#\F _q^n}{\#C}=q^{n-k}\) distinct cosets.

Thus, the whole space \(\F _q^n\) is split (partitioned) into \(q^{n-k}\) cosets:

\[ \F _q^n = C \sqcup (\ul a_1+C)\sqcup \ldots \sqcup (\ul a_{q^{n-k}-1}+C). \]

The above is true for any abelian group; but the following is specific to Coding Theory:
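These facts are easy to verify computationally for a small code. Below is a minimal Python sketch, using the four-word binary code \(C=\{0000, 0111, 1011, 1100\}\subseteq \F_2^4\) that serves as the running example later in these notes:

```python
from itertools import product

# The running example code of these notes: C = {0000, 0111, 1011, 1100} in F_2^4.
C = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0)]

def coset(y):
    """The coset y + C; vector addition is coordinatewise mod 2."""
    return frozenset(tuple((a + b) % 2 for a, b in zip(y, c)) for c in C)

# Collect the distinct cosets as y ranges over all of F_2^4.
cosets = {coset(y) for y in product((0, 1), repeat=4)}

print(len(cosets))             # q^{n-k} = 2^{4-2} = 4 distinct cosets
print(sum(map(len, cosets)))   # together they cover all 2^4 = 16 vectors
```

Here \(n=4\) and \(k=2,\) so the code has \(4\) cosets of size \(4\) each, which indeed partition all \(16\) vectors of \(\F_2^4.\)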

Definition: coset leader

A coset leader of a coset \(\ul y+C\) is a vector of minimum weight in \(\ul y +C.\)

Remark: warning — a coset leader may not be unique

There may be more than one coset leader in a coset. However, all coset leaders of a given coset are of the same weight.
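To see the non-uniqueness concretely, the following Python sketch (same example code as before) lists all minimum-weight vectors of a given coset:

```python
# The running example code C = {0000, 0111, 1011, 1100} in F_2^4.
C = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0)]

def coset_leaders(y):
    """All vectors of minimum weight in the coset y + C (addition mod 2)."""
    cs = [tuple((a + b) % 2 for a, b in zip(y, c)) for c in C]
    m = min(sum(v) for v in cs)           # minimum weight in the coset
    return [v for v in cs if sum(v) == m]

# The coset 0100 + C has two coset leaders, both of weight 1:
print(coset_leaders((0, 1, 0, 0)))   # [(0, 1, 0, 0), (1, 0, 0, 0)]
```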

Proposition 4.1: the formula for a decoder for a linear code

For a linear code \(C\subseteq \F _q^n,\) any decoder \(\Decode \colon \F _q^n \to C\) satisfies:

\[ \forall \ul y\in \F _q^n\quad \Decode (\ul y)=\ul y - \ul e \text { where $\ul e$ is a coset leader of the coset $\ul y + C$ of $\ul y.$} \]

  • Proof. Let \(\ul v=\Decode (\ul y).\) Then \(\ul v\in C.\) Put \(\ul e=\ul y - \ul v.\) Since \(C\) is a linear code, \(-\ul v\in C,\) so \(\ul e=\ul y + (-\ul v)\in \ul y + C.\) We have proved that \(\ul e\) must lie in the coset \(\ul y+C.\)

    Vector \(\ul y\) must be decoded to its nearest neighbour in \(C,\) i.e., \(d(\ul y,\ul v)\) must be minimised. Yet by Lemma 3.1 \(d(\ul y,\ul v)=w(\ul y-\ul v)=w(\ul e).\) Hence the decoder must choose \(\ul e\) so that \(w(\ul e)\) is minimal in the coset \(\ul y+C.\) By definition, \(\ul e\) must be a coset leader of \(\ul y+C.\)

Standard array: construction

We now give a method to construct all cosets and to find one coset leader in each coset.

Definition: standard array

A standard array for a linear code \(C\subseteq \F _q^n\) is a table with the following properties:

  • the table has \(|C|=q^k\) columns and \(q^{n-k}\) rows;

  • each row is a coset;

  • the leftmost entry in each row is a coset leader of that row;

  • the top row is the trivial coset (i.e., \(C\) itself);

  • each entry is the sum of the leftmost entry in its row and the top of its column;

  • the table contains every vector from \(\F _q^n\) exactly once.

We explain how to construct a standard array, using the linear code \(C=\{0000,\) \(0111,\) \(1011, 1100\} \subseteq \F _2^4\) as an example.

Row 0 of the standard array lists all codevectors (elements of \(C=\ul 0+C\)). The row must start with \(\ul 0,\) but otherwise the order is arbitrary.

\(0000\qquad 0111 \qquad 1011 \qquad 1100\)

Row 1: out of vectors not yet listed, choose \(\ul a_1\) of smallest weight — this guarantees that \(\ul a_1\) will be a coset leader. Fill in Row 1 by adding \(\ul a_1\) to each codevector in Row 0.
Say, \(\ul a_1=0001.\) To list its coset, add it to row \(0:\) e.g., \(0001+0111=0110,\) etc.

\(0001\qquad 0110\qquad 1010 \qquad 1101\)

Row 2: choose \(\ul a_2\) of smallest weight not yet listed, and do the same as for Row 1.
Say, \(\ul a_2=0010,\) add it to row 0:

\(0010\qquad 0101\qquad 1001\qquad 1110\)

Row 3: same with, say, \(\ul a_3=0100:\)

\(0100\qquad 0011\qquad 1111\qquad 1000\)

Since we have filled \(4\) rows, and \(q^{n-k} =2^{4-2}=4,\) our standard array is complete:

Example: a standard array for the code \(C=\{0000,\) \(0111,\) \(1011, 1100\}\)

\[ \begin {matrix} 0000& 0111 & 1011 & 1100 \\ 0001& 0110& 1010 & 1101 \\ 0010& 0101& 1001& 1110 \\ 0100& 0011& 1111& 1000 \end {matrix} \]
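The row-by-row construction above can be automated. A Python sketch follows; note that the choice of each leader among equal-weight candidates is arbitrary (here ties are broken lexicographically), so a different ordering may produce a different, equally valid standard array.

```python
from itertools import product

# The running example code C = {0000, 0111, 1011, 1100} in F_2^4.
C = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0)]

def add(y, c):
    """Coordinatewise addition mod 2."""
    return tuple((a + b) % 2 for a, b in zip(y, c))

def standard_array(code, n):
    rows = [list(code)]                  # Row 0: the trivial coset C itself
    listed = set(code)
    # Unlisted vectors sorted by weight; ties keep lexicographic order.
    remaining = sorted((v for v in product((0, 1), repeat=n)
                        if v not in listed), key=sum)
    while remaining:
        leader = remaining[0]            # smallest-weight unlisted vector
        row = [add(leader, c) for c in code]
        rows.append(row)
        listed.update(row)
        remaining = [v for v in remaining if v not in listed]
    return rows

array = standard_array(C, 4)
for row in array:
    print("  ".join("".join(map(str, v)) for v in row))
```

With lexicographic tie-breaking this reproduces the array above; choosing \(1000\) instead of \(0100\) as the last leader gives the alternative array discussed further below.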

Standard array: decoding

Let \(C\subseteq \F _q^n\) be a linear code. By Proposition 4.1, any decoder is given by

\[ \Decode (\ul y) = \ul y - \texttt {COSET LEADER}(\ul y+C). \]

This suggests the following decoding algorithm for \(C.\)

Algorithm 4.2: the standard array decoder

Preparation: construct a standard array for \(C.\)

Decoding:

  • Receive a vector \(\ul y\in \F _q^n.\)

  • Look up \(\ul y\) in the standard array.

  • Return the topmost vector of the column of \(\ul y\) as \(\Decode (\ul y).\)

Justification: the algorithm is correct because, by definition of a standard array,

  • (a) Look-up of \(\ul y\) will succeed as every vector in \(\F _q^n\) is present in the array;

  • (b) the row of \(\ul y\) starts with \(\texttt {COSET LEADER}(\ul y+C),\) so

  • (c) the top of \(\ul y\)’s column is \(\ul y - \texttt {COSET LEADER}(\ul y+C)\) so this is \(\ul y\) decoded.

Example: use the standard array decoder

For the code \(C=\{0000,0111,1011,1100\}\) and standard array constructed above,

  • decode the received vectors \(0011\) and \(1100;\)

  • give an example of one bit error occurring in a codeword and being corrected;

  • give an example of one bit error occurring in a codeword and not being corrected.

Solution. We work with the following standard array for \(C:\)

\[ \begin {matrix} 0000& 0111 & 1011 & 1100 \\ 0001& 0110& 1010 & 1101 \\ 0010& 0101& 1001& 1110 \\ 0100& 0011& 1111& 1000 \end {matrix} \]

The received vector \(0011\) is in the second column, so \(\Decode (0011)=0111.\) The received vector \(1100\) is a codeword (in the fourth column), so \(\Decode (1100)=1100.\)

Suppose that the codeword \(0000\) is sent. If an error occurs in the last bit, the word \(0001\) is received and decoded correctly as \(0000.\) If an error occurs in the first bit, the word \(1000\) is received and decoded incorrectly as \(1100.\)
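The look-up steps of Algorithm 4.2 for this example can be sketched in Python (the array is the one constructed above):

```python
# The standard array constructed above; rows are cosets, leaders leftmost.
ARRAY = [
    ["0000", "0111", "1011", "1100"],
    ["0001", "0110", "1010", "1101"],
    ["0010", "0101", "1001", "1110"],
    ["0100", "0011", "1111", "1000"],
]

def decode(y):
    """Find y in the array and return the top entry of its column."""
    for row in ARRAY:
        if y in row:
            return ARRAY[0][row.index(y)]
    raise ValueError("every vector of F_2^4 should appear in the array")

print(decode("0011"))   # 0111
print(decode("1100"))   # 1100
print(decode("0001"))   # 0000: one bit error in 0000, corrected
print(decode("1000"))   # 1100: one bit error in 0000, not corrected
```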

Discussion: is a standard array decoder unique?

Recall that there may be more than one possible standard array for the code \(C.\) Indeed, in the above example the coset \(0100+C\) has two coset leaders: \(0100\) and \(1000.\) Thus, we could construct a different standard array for \(C:\)

\[ \begin {matrix} 0000& 0111 & 1011 & 1100 \\ 0001& 0110& 1010 & 1101 \\ 0010& 0101& 1001& 1110 \\ 1000& 1111& 0011& 0100 \end {matrix} \]

The decoder associated to this standard array is different from the decoder considered above. Both decoders decode the same linear code \(C.\) A linear code can have more than one decoder.

However, if \(C\) is a perfect linear code, then each coset has only one coset leader, so the decoder is unique. This property of perfect codes appears on the example sheets.

Reminder (the number of errors corrected by a code)

Recall that a code with minimum distance \(d\) corrects \(t=\left [ (d-1)/2 \right ]\) errors.

The code \(C\) in the above example is linear, hence \(d(C)=w(C)=2\) (it is easy to find the minimum weight of the code by inspection). This means that the code corrects \(\left [\frac {2-1}{2}\right ]=0\) errors. That is, \(C\) is not guaranteed to correct even a single bit error occurring in a codevector. And indeed, we saw in an example how one bit error occurred in a codevector and was not corrected.

So, from the point of view of Hamming’s theory, this code \(C\) has no error-correcting capability. It still detects up to one error.

But in Shannon’s theory, error-detecting and error-correcting performance of a code are measured probabilistically.

Error-detecting and error-correcting performance of a linear code: Shannon’s theory point of view

Shannon’s Information Theory is interested in how likely it is that a transmission error in a codeword is not detected/corrected by a decoder of \(C.\) We will answer these questions for a binary linear code \(C\) transmitted via \(\mathit {BSC}(p).\)

Recall that this means that one bit (\(0\) or \(1),\) transmitted via the channel, arrives unchanged with probability \(1-p,\) and gets flipped with probability \(p:\)

(Figure: transition diagram of the binary symmetric channel \(\mathit{BSC}(p).\))

When a codeword \(\ul v\) is transmitted, the channel generates a random error vector and adds it to \(\ul v.\) By definition of \(\mathit {BSC}(p),\) for a given \(\ul e\in \F _2^n\) one has

\[ P(\text {the error vector equals }\ul e) = (1-p)^{n-i}p^i, \qquad \text {where }i=w(\ul e). \]

In determining \(P_{\mathrm {undetect}}(C),\) the following notion is very useful:

Definition: the weight enumerator

The weight enumerator of a linear code \(C\subseteq \F _q^n\) is the polynomial

\[W_C(x,y) = \sum \nolimits _{\ul v\in C}x^{n-w(\ul v)}y^{w(\ul v)} = A_0x^n+A_1x^{n-1}y+A_2x^{n-2}y^{2}+\ldots +A_ny^n \]

in two variables \(x,y,\) where \(A_i = \#\{\ul v\in C: w(\ul v)=i\}.\)
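For a small code the coefficients \(A_i\) can be tallied directly; a Python sketch for the running example:

```python
from collections import Counter

C = ["0000", "0111", "1011", "1100"]   # the running example code
n = 4

# A_i = number of codewords of weight i
A = Counter(word.count("1") for word in C)

def W(x, y):
    """Evaluate the weight enumerator W_C(x, y) = sum_i A_i x^(n-i) y^i."""
    return sum(A[i] * x**(n - i) * y**i for i in range(n + 1))

print([A[i] for i in range(n + 1)])   # [1, 0, 1, 2, 0]
print(W(1, 1))                        # 4 = |C|: setting x = y = 1 counts codewords
```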

Theorem 4.3: \(P_{\mathrm {undetect}}(C),\) the probability of an undetected error

Suppose that a codevector of a binary linear code \(C\) of length \(n\) is transmitted via \(\mathit {BSC}(p).\) The probability of an undetected error is

\[ P_{\mathrm {undetect}}(C) = W_C(1-p,p)-(1-p)^n. \]

  • Proof. Let \(\ul v\in C\) be the codevector being transmitted. Recall that an undetected error means that the received vector \(\ul v+\ul e\) is a codevector not equal to \(\ul v.\) Note that, since \(\ul v\in C\) and \(C\) is a vector space,

    \[ \ul v + \ul e \in C, \ \ul v+\ul e\ne \ul v \quad \iff \quad \ul e \in C, \ \ul e\ne \ul 0. \]

    Therefore, an undetected error means that the error vector is a non-zero codevector. We can now calculate

    \[P_{\mathrm {undetect}}(C) = \sum _{\ul e\in C,\, \ul e\ne \ul 0} P(\text {the error vector is }\ul e) =\sum _{\ul e\in C,\, \ul e\ne \ul 0} (1-p)^{n-w(\ul e)}p^{w(\ul e)}. \]

    This is \(W_C(1-p,p)\) without exactly one term, excluded by the constraint \(\ul e\ne \ul 0,\) namely

    \[ (1-p)^{n-w(\ul 0)}p^{w(\ul 0)} = (1-p)^n, \]

    which gives the expression for \(P_{\mathrm {undetect}}(C) \) as stated.

Remark: In general, \(P_{\mathrm {undetect}}(C) \) is calculated assuming that the codeword \(\ul v\) is picked at random from the code. However, our proof shows that for a linear binary code and for the binary symmetric channel the probability is the same for all codevectors.

Example: calculating the weight enumerator and \(P_{\mathrm {undetect}}\)

The binary linear code \(C=\{0000,0111,1011,1100\}\) has one codeword of weight \(0,\) zero codewords of weight \(1,\) one codeword of weight \(2\) and two codewords of weight \(3.\) Hence the weight enumerator of \(C\) is

\[ W_C(x,y)= x^4 + x^2y^2 + 2xy^3. \]

If a codeword of \(C\) is sent via \(\mathit {BSC}(p),\) an undetected error occurs with probability

\[ P_{\mathrm {undetect}}(C) = (1-p)^2 p^2 + 2(1-p)p^3. \]
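As a sanity check, this closed form agrees with evaluating \(W_C(1-p,p)-(1-p)^4\) of Theorem 4.3 directly. A Python sketch (the value \(p=0.01\) is an arbitrary illustration):

```python
C = ["0000", "0111", "1011", "1100"]   # the running example code
n = 4

def W(x, y):
    """The weight enumerator of C, evaluated termwise over the codewords."""
    return sum(x**(n - c.count("1")) * y**c.count("1") for c in C)

def P_undetect(p):
    return W(1 - p, p) - (1 - p)**n    # Theorem 4.3

p = 0.01
closed_form = (1 - p)**2 * p**2 + 2 * (1 - p) * p**3
print(abs(P_undetect(p) - closed_form) < 1e-12)   # True
```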

Discussion. Knowing \(P_{\mathrm {undetect}}(C)\) is useful when a code is used for error detection, e.g., when the receiver can request retransmission of any codeword in which an error is detected. Then \(P_{\mathrm {undetect}}(C)\) is on average the proportion of incorrect codevectors, hence incorrect symbols, accepted by the receiver. A code should be designed for a particular channel so as to keep this probability below an agreed threshold.

The probability of correct decoding

We will now find the probability of an error being corrected for \(C.\)

Theorem 4.4: \(P_{\mathrm {corr}}(C),\) the probability of correct decoding

Suppose that a codevector of a binary linear code \(C\) is transmitted via \(\mathit {BSC}(p).\) The probability that the received vector will be decoded correctly is

\[ P_{\mathrm {corr}} (C)= \sum \nolimits _{i=0}^n \alpha _i (1-p)^{n-i}p^i, \]

where \(\alpha _i\) denotes the number of cosets where the coset leader is of weight \(i.\)

  • Proof. Recall that \(\ul v\in C\) is decoded correctly if \(\Decode (\ul v+\ul e)=\ul v.\) By Proposition 4.1,

    \begin{align*} \Decode (\ul v+\ul e)&=\ul v + \ul e - \texttt {COSET LEADER}((\ul v+\ul e)+C)\\ & = \ul v + \ul e - \texttt {COSET LEADER}(\ul e+C), \end{align*}

    since \(\ul v\in C\) implies \((\ul v+\ul e)+C=\ul e+C.\) Therefore, correct decoding occurs exactly when the error vector is the chosen coset leader of its own coset.

    We therefore have one good outcome per coset: namely, \(\ul e\) equals the chosen coset leader of that coset. This happens with probability \((1-p)^{n-i}p^i,\) where \(i\) is the weight of the coset leader of the given coset. Summing over all cosets and gathering like terms, we obtain the formula for \(P_{\mathrm {corr}} (C)\) as stated (note that it does not depend on \(\ul v\)).

Discussion. When is it important to know \(P_{\mathrm {corr}}(C)\)? In one-way communication channels there is no retransmission, even if an error is detected; the decoder must produce a best guess as to which codevector was sent. An example is computer memory, where information may have been written (“sent”) a long time ago and cannot be “resent”. Thus, \(1-P_{\mathrm {corr}}(C)\) is on average the proportion of incorrect codevectors, hence incorrect symbols, accepted by the receiver. A code should be designed for a particular channel so as to keep \(1-P_{\mathrm {corr}}(C)\) below an agreed threshold.

Example: calculation of \(P_{\mathrm {corr}}\)

Let \(C=\{ 0000, 0111 , 1011 , 1100\}.\) From the standard array

\[ \begin {matrix} 0000& 0111 & 1011 & 1100 \\ 0001& 0110& 1010 & 1101 \\ 0010& 0101& 1001& 1110 \\ 1000& 1111& 0011& 0100 \end {matrix} \]

we can see that \(\alpha _0=1,\) \(\alpha _1=3,\) \(\alpha _2=\alpha _3=0\) (given by the leftmost column), so

\[ P_{\mathrm {corr}} (C)=(1-p)^4+3(1-p)^3 p. \]
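The formula of Theorem 4.4 with these \(\alpha _i\) can be evaluated directly; a Python sketch (again, \(p=0.01\) is just an illustrative value):

```python
# alpha_i = number of cosets whose leader has weight i, read off the
# leftmost column of the standard array above.
alpha = [1, 3, 0, 0, 0]
n = 4

def P_corr(p):
    """Theorem 4.4: sum over i of alpha_i (1-p)^(n-i) p^i."""
    return sum(a * (1 - p)**(n - i) * p**i for i, a in enumerate(alpha))

p = 0.01
print(abs(P_corr(p) - ((1 - p)**4 + 3 * (1 - p)**3 * p)) < 1e-12)   # True
```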

Comparing codes using approximate values of \(P_{\mathrm {undetect}}\) or \(P_{\mathrm {corr}}\)

Our analysis above gives, for a binary linear code \(C\) transmitted via \(\mathit {BSC}(p),\) the probabilities \(P_{\mathrm {undetect}}(C)\) and \(P_{\mathrm {corr}}(C)\) as polynomials in \(p.\)

In practical situations \(p\) is typically very small, so it is rarely useful to know all terms of these polynomials in \(p.\) The term with the lowest power of \(p\) will dominate and can be used as an approximate value of the probability. This makes comparing codes easier.

Example. We compare the use of the code \(C\) against transmission of unencoded information (“the trivial binary code of length \(1\)”).

If \(C\) is used for error detection: \(P_{\mathrm {undetect}}(C) = (1-p)^2 p^2 + 2(1-p)p^3.\) This is a polynomial of the form \(p^2+o(p^2)\) where \(o(p^2)\) contains powers of \(p\) higher than \(2.\) For small \(p,\) the terms in \(o(p^2)\) are negligible compared to \(p^2.\) We conclude:

\[ P_{\mathrm {undetect}}(C)\sim p^2 \text { whereas } P_{\mathrm {undetect}}(\text {no encoding}) = p, \]

i.e., the use of \(C\) improves the proportion of bad bits in the output from \(p\) to \(p^2.\)

If \(C\) is used for error correction: \(1-P_{\mathrm {corr}}(C)=1-(1-p)^4-3(1-p)^3 p=p+o(p)\) (check this by expanding the brackets). Thus,

\[ 1-P_{\mathrm {corr}}(C)\sim p \text { whereas } 1-P_{\mathrm {corr}}(\text {no encoding}) = p. \]

Hence \(C\) is useless for error correction: its use does not improve the proportion of bad bits in the output, while increasing the volume of information transmitted two-fold (\(R=0.5\)).
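The dominance of the lowest-order term is easy to confirm numerically; a sketch with a small illustrative value of \(p\):

```python
p = 1e-3   # an illustrative small crossover probability

P_undetect = (1 - p)**2 * p**2 + 2 * (1 - p) * p**3
one_minus_P_corr = 1 - (1 - p)**4 - 3 * (1 - p)**3 * p

# Ratios close to 1 confirm P_undetect ~ p^2 and 1 - P_corr ~ p:
print(P_undetect / p**2)
print(one_minus_P_corr / p)
```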

The above suggests that for error correction, more mathematically sophisticated codes need to be designed. In the rest of the course, we will see how ideas from different areas of mathematics are used in code constructions.