On the maximum number of minimal codewords

Minimal codewords have applications in decoding linear codes and in cryptography. We study the maximum number of minimal codewords in binary linear codes of a given length and dimension. Improved lower and upper bounds on the maximum number are presented. We determine the exact values for the case of linear codes of dimension $k$ and length $k+2$ and for small values of the length and dimension. We also give a formula for the number of minimal codewords of linear codes of dimension $k$ and length $k+3$.


Introduction
The minimal codewords of a linear code are those whose supports, i.e., the set of nonzero coordinates, do not properly contain the supports of other nonzero codewords. They are equivalent to circuits in matroids and cycles in graphs. In coding theory, minimal codewords were first used in decoding algorithms [1,2,7,15]. They have also found applications in cryptography: in secret sharing schemes [19] and in secure two-party computation [9].
The set of minimal codewords is only known for a few classes of codes (see [1,6,7,8,11,12,18,20,21,22]) and, in general, it is a very hard problem to determine this set. In this work, we consider the following question: what is the maximum number of minimal codewords of linear codes of a given length and dimension? This problem has already been studied in the case of cycles in graphs [14]. In the matroid setting, the maximum number of circuits was first addressed in [13]. The study of the maximum and minimum number of minimal codewords of linear codes was initiated in [3,4,5,10].
The results in this paper are as follows. We determine the maximum number of minimal codewords for binary linear codes of dimension $k$ and length $k+2$. We also give a formula for the number of minimal codewords for the case of dimension $k$ and length $k+3$. A general construction of linear codes with a relatively large number of minimal codewords is also presented. This gives a lower bound that is asymptotically close to the matroid upper bound. An upper bound that is better than the matroid upper bound is also derived. The key idea is to use the systematic generator matrix for a linear code and analyze the properties of the subsets of rows that produce minimal codewords. We also compute the exact values of the maximum number of minimal codewords for small values of length and dimension (completing the table in [4]).

Preliminaries
Let $q$ be a power of a prime $p$ and $\mathbb{F}_q$ be the finite field of order $q$. An $[n,k]_q$ linear code $C$ is a $k$-dimensional subspace of $\mathbb{F}_q^n$. Given a vector $x \in \mathbb{F}_q^n$, the support of $x$ is defined as $\operatorname{supp}(x) = \{i : x_i \neq 0,\ 1 \le i \le n\}$. A $k \times n$ matrix $G$ whose rows form a basis for $C$ is called a generator matrix. If $G = [I_k \mid A]$, where $I_k$ is the $k \times k$ identity matrix, then we say that $G$ is systematic or in standard form.
A nonzero codeword $c \in C$ is minimal if there does not exist a nonzero codeword $c'$ such that $\operatorname{supp}(c') \subsetneq \operatorname{supp}(c)$. Otherwise (including the case $c = 0$), we call the codeword $c$ non-minimal. General properties of minimal codewords can be found in [7]. Note that a codeword and its nonzero scalar multiples have the same support. We say that two codewords are equivalent if one is a scalar multiple of the other. We use the notation $M(C)$ for the number of non-equivalent minimal codewords of $C$. Let $M_q(n,k)$ be the maximum of $M(C)$ for all $[n,k]_q$ codes $C$. Since $C$ has $q^k - 1$ nonzero codewords, each of which is equivalent to exactly $q - 1$ of them, we have
$$M_q(n,k) \le \frac{q^k - 1}{q - 1}.$$
Bounds for $M_q(n,k)$ and some exact values can be found in [2,4,7,5,10,13]. In the setting of matroids, it was shown in [13] that
$$M_q(n,k) \le \binom{n}{k-1}. \tag{1}$$
This bound is also called the matroid upper bound. Alternative proofs were given in [5]. Inequality (1) is satisfied with equality for MDS codes. Another upper bound, valid for $\frac{k-1}{n} > \frac{1}{2}$, was derived by Agrell in [2] for binary codes with high rate. A lower bound based on random coding was given in [7].
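To make the definition of $M(C)$ concrete for $q = 2$, the following minimal Python sketch counts minimal codewords by brute force. It is only an illustration and not the enumeration method used later in the paper. The function names `codewords`, `support`, and `count_minimal` are hypothetical names introduced here, and the approach is feasible only for small $k$, since all $2^k - 1$ nonzero codewords are listed.

```python
from itertools import product


def codewords(G):
    """Return the set of nonzero codewords (as tuples) of the binary code spanned by the rows of G."""
    k, n = len(G), len(G[0])
    words = set()
    for coeffs in product([0, 1], repeat=k):
        if any(coeffs):
            word = tuple(
                sum(c * row[j] for c, row in zip(coeffs, G)) % 2 for j in range(n)
            )
            words.add(word)
    # The zero word can only appear if the rows of G are linearly dependent.
    words.discard((0,) * n)
    return words


def support(word):
    """Set of coordinates in which the word is nonzero."""
    return frozenset(i for i, x in enumerate(word) if x)


def count_minimal(G):
    """Number of minimal codewords of the binary code generated by G (brute force)."""
    supports = {support(w) for w in codewords(G)}
    # A codeword is minimal iff no other nonzero codeword has a properly smaller support.
    return sum(1 for s in supports if not any(t < s for t in supports))


# Example: the [3, 2] code with generator matrix rows 101 and 011.
print(count_minimal([[1, 0, 1], [0, 1, 1]]))
```

For this $[3,2]_2$ code all three nonzero codewords have weight 2, so the count is 3, which agrees with the matroid upper bound $\binom{3}{1} = 3$.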
It is clear that we have $M_q(n,1) = 1$ and $M_q(k,k) = k$ for all $k \ge 1$. Further results of this type were shown in [4]. For small values of $k$ and $n$, the authors of [4] presented some exact values and bounds on $M_2(n,k)$. In addition, exact values for the case of cycle codes were obtained.

Relations between minimal codewords and the rows of a systematic generator matrix
Let $C$ be a linear $[k+t,k]_2$, i.e. binary, code with systematic generator matrix $G$. By $g_i$ we denote the $i$th row of $G$, where $1 \le i \le k$. For each subset $S \subseteq \{1,\dots,k\}$ let $c_S$ denote the sum of the rows of $G$ with indices in $S$, i.e., $c_S = \sum_{i \in S} g_i \in C$. For each codeword $c \in C$ let $c^S \in \mathbb{F}_2^k$ denote the systematic part of $c$, i.e., the restriction of $c$ to the first $k$ coordinates $c_1,\dots,c_k$. Similarly, for each codeword $c \in C$ let $c^I \in \mathbb{F}_2^t$ denote the information bits, i.e., the restriction of $c$ to the last $t$ coordinates $c_{k+1},\dots,c_{k+t}$. Some of the subsequent observations can also be found in [18].
Proof. The largest cardinality of a set of linearly independent vectors in $\mathbb{F}_2^t$ is $t$. Thus, if $\#S \ge t+1$, then there exists a nonempty subset $T \subseteq S$ with $c_T^I = 0$ and $\#T \le t+1$. We finally apply Lemma 3.1 to conclude $\#S \le t+1$.
As a direct consequence, counting the nonempty subsets $S \subseteq \{1,\dots,k\}$ with $\#S \le t+1$, we conclude
$$M_2(k+t,k) \le \sum_{i=1}^{t+1} \binom{k}{i},$$
which asymptotically tends to $\binom{k+t}{t+1}$ for a fixed value of $t$ (if $k$ tends to infinity). In Proposition 4.3 we will present a strict improvement over the matroid upper bound $\binom{n}{k-1} = \binom{k+t}{t+1}$, see (1), provided that $k$ is large enough.
Lemma 3.5. Let $G$ be a systematic generator matrix of a $[k+t,k]_2$ code $C$ and $1 \le i \le k$ be an index with $g_i^I = 0$. By $G'$ we denote the matrix that arises from $G$ by removing the $i$th row $g_i$ and by $G''$ the matrix if we additionally remove the $i$th column. Let $C'$ and $C''$ be the linear codes generated by $G'$ and $G''$, respectively. Then $M(C) = M(C') + 1 = M(C'') + 1$.
Proof. Consider a codeword $c_S \in C'$. Since $C'$ is a subcode of $C$ we only need to consider the case where $c_S$ is non-minimal in $C$. Then, there exists a subset $\emptyset \neq T \subsetneq S$ with $\operatorname{supp}(c_T) \subsetneq \operatorname{supp}(c_S)$. Since $g_i^I = 0$ we can assume $i \notin T$, so that $c_T \in C'$ and $c_S$ is also non-minimal in $C'$.
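The counting argument behind Corollary 3.3 (only subsets $S$ with $1 \le \#S \le t+1$ can yield minimal codewords) can be compared numerically with the matroid upper bound (1). The following sketch does this for small parameters. The function names `subset_bound` and `matroid_bound` are hypothetical and only serve this illustration.

```python
from math import comb


def subset_bound(k, t):
    """Number of nonempty subsets S of {1, ..., k} with #S <= t + 1."""
    return sum(comb(k, i) for i in range(1, t + 2))


def matroid_bound(k, t):
    """Matroid upper bound binom(n, k - 1) for n = k + t, written as binom(k + t, t + 1)."""
    return comb(k + t, t + 1)


for t in (1, 2, 3):
    for k in (5, 10, 20):
        print(t, k, subset_bound(k, t), matroid_bound(k, t))
```

By Vandermonde's identity, $\binom{k+t}{t+1} = \sum_{j} \binom{k}{j}\binom{t}{t+1-j}$, so the subset count never exceeds the matroid bound, and both agree for $t = 1$.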
So, in the following we may assume c S I = 0 whenever needed and we mention the implication M 2 (k, k) = k for all k ≥ 1.
Definition 3.1. Let $C$ and especially $t$ be given. By $\mathcal{T}$ we denote the set of the $2^t$ elements of $\mathbb{F}_2^t$. For each $\tau \in \mathcal{T}$ we set $a_\tau = \#\{1 \le i \le k : g_i^I = \tau\}$. The counting vector of all $a_\tau$ is denoted by $a$. More precisely, we write $a_\tau(C)$ and $a(C)$ whenever the code $C$ is not clear from the context.
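As a small illustration of Definition 3.1, the counting vector can be read off directly from a systematic generator matrix. The sketch below assumes that $G$ is given as a list of 0/1 rows with the identity matrix in the first $k$ columns; the name `counting_vector` is a hypothetical helper, not notation from the paper.

```python
from collections import Counter


def counting_vector(G):
    """Return a Counter mapping each information vector tau (a tuple of t bits) to a_tau.

    G is assumed to be systematic, i.e. G = [I_k | A] with k = len(G).
    """
    k = len(G)
    for i, row in enumerate(G):
        # Check the assumption that the first k columns form the identity matrix.
        if list(row[:k]) != [1 if j == i else 0 for j in range(k)]:
            raise ValueError("generator matrix is not in systematic form")
    return Counter(tuple(row[k:]) for row in G)


# Example with k = 4 and t = 2: information vectors 10, 10, 11, 00.
G = [
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
a = counting_vector(G)
print(a[(1, 0)], a[(0, 1)], a[(1, 1)], a[(0, 0)])
```

A `Counter` returns 0 for information vectors that do not occur, which matches the convention $a_\tau = 0$ for such $\tau$.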
Column and row permutations of a generator matrix do not change the number of minimal codewords. For the case $t = 1$ we can easily determine $M(C)$ given the vector $a(C) = (a_0, a_1)$.
Proof. For all subsets $S \subseteq \{1,\dots,k\}$ of cardinality 1, the codeword $c_S$ is minimal, which gives $k$ minimal codewords. Due to Corollary 3.3 it suffices to consider codewords of the form $c_S$ with $\emptyset \neq S \subseteq \{1,\dots,k\}$ and $\#S \le 2$, so that it remains to consider the cases with $\#S = 2$. Due to Lemma 3.1, Corollary 3.3, and Lemma 3.4 the codeword $c_{\{i,j\}}$ is minimal iff $i \neq j$ and $g_i^I = g_j^I = 1$.
The same result was also obtained in [4]. Note that the matroid upper bound $M_2(n,k) \le \binom{n}{k-1} = \binom{k+t}{k-1} = \binom{k+t}{t+1}$, see (1), is matched with equality. We remark that the unique code attaining this upper bound is the so-called projective base (of $\mathbb{F}_2^k$) given by a generator matrix consisting of the $k$ unit vectors and the all-1-vector as columns.
In Lemma 3.4 we have characterized whether $c_S$ is minimal for the special case when $c_S^I = 0$ using the information bits of $g_i$, where $i \in S$, only. This can be generalized and formalized as follows.
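Let $S \subseteq \{1,\dots,k\}$ be a subset. With this, we define a code $C_S$, which we call the reduced code of $C$ with respect to $S$.
For the other direction we first assume that $c_S^I$ is non-minimal in $C_S$ and $c_S^I \neq 0$. Here, there exists a subset $\emptyset \neq T \subsetneq S$ with $\operatorname{supp}(c_T^I) \subsetneq \operatorname{supp}(c_S^I)$, which implies $\operatorname{supp}(c_T) \subsetneq \operatorname{supp}(c_S)$, i.e., $c_S$ is non-minimal in $C$. In the other case we assume $c_S^I = 0$ and the existence of a subset $\emptyset \neq T \subsetneq S$ with $c_T^I = 0$. Here we have $\operatorname{supp}(c_T) \subsetneq \operatorname{supp}(c_S)$, i.e., $c_S$ is non-minimal.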
Note that no minimal generating set of cardinality at least two can contain the zero vector.
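Theorem 3.10. Let $C$ be a linear $[k+t,k]_2$ code and $a$ its corresponding vector counting the multiplicities of the occurring information vectors. With this, we have
$$M(C) = k + \sum_{\tau \in \mathbb{F}_2^t \setminus \{0\}} \binom{a_\tau}{2} + \sum_{\substack{\hat{S} \subseteq \mathbb{F}_2^t \,:\, \hat{S} \text{ is minimal generating} \\ \text{and } 2 \le \#\hat{S} \le t+1}} \ \prod_{\tau \in \hat{S}} a_\tau.$$
Proof. Let $c_S$ be a minimal codeword in $C$ for a subset $S \subseteq \{1,\dots,k\}$. Since $c_S \neq 0$ we have $S \neq \emptyset$. If $\#S = 1$, then $c_S$ is minimal in all cases, which gives $k$ possibilities. If $S$ contains two different elements $i$ and $j$ with $g_i^I = g_j^I$, then we deduce $\#S = 2$ from Lemma 3.1 and Lemma 3.4. This gives $\sum_{\tau \neq 0} \binom{a_\tau}{2}$ further possibilities. In the remaining cases we have $2 \le \#S \le t+1$, see Corollary 3.3 for the upper bound, and $g_i^I \neq g_j^I$ for all different $i, j \in S$. In other words, $\hat{S} := \{g_i^I : i \in S\}$ has cardinality $\#S$. Due to Lemma 3.9 and Definition 3.3, $c_S$ is minimal iff $\hat{S}$ is minimal generating. Given $\hat{S}$, the number of choices for $S$ is $\prod_{\tau \in \hat{S}} a_\tau$.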
In some cases it is possible to concretely describe the minimal generating sets in the formula of Theorem 3.10:
Proposition 3.11. Let $C$ be a linear $[k+t,k]_2$ code and $a$ its corresponding vector counting the multiplicities of the occurring information vectors. If $a_\tau > 0$ implies $\tau \in \mathcal{T} := \{e_1,\dots,e_t,\mathbf{1}\}$, where $\mathbf{1} = e_1 + \dots + e_t$, then $M(C)$ can be expressed explicitly in terms of the $a_\tau$.
Proof. Due to Theorem 3.10 it suffices to check which subsets of $\mathcal{T}$ are minimal generating. If $\mathbf{1} \notin \hat{S}$, then $\sum_{x \in \hat{S}} x$ is clearly not minimal within $\hat{S}$. In all other cases $\hat{S}$ is minimal generating, which easily follows from Lemma 3.9.
As an example let $k \ge 2t$ be integers and $A$ be the $k \times t$ matrix whose rows consist of 2 copies each of the unit vectors $e_1,\dots,e_t$ and $k - 2t$ copies of the zero vector. Consider the $[k+t,k]$ linear code $C$ with generator matrix $G = [I_k \mid A]$. Note that $C$ is projective and $M(C) = k + t$. In [16, Lemma 5.1] it is shown that each projective $[k+t,k]_2$ code $C$ satisfies $M(C) \ge k + t$.

Bounds for the maximum number of minimal codewords
A projective base can also be used to construct linear $[k+t,k]_2$ codes with a relatively large number of minimal codewords. To this end, let $e_i$ denote the $i$th unit vector and $\mathbf{1}$ denote the all-1-vector (in $\mathbb{F}_2^t$).
The essential property of $\{e_1,\dots,e_t,\mathbf{1}\}$ used in the proof of Proposition 4.1 is that of a projective basis. The explicit choice of vectors is called canonical basis in that context. We remark that it is also possible to precisely determine $M(C)$ if $a_\tau(C) \neq 0$ implies $\tau \in \{e_1,\dots,e_t,\mathbf{1}\}$ and those $a_\tau$ are given, see Proposition 3.11. The codes constructed in Proposition 4.1 show that the matroid upper bound $M_2(n,k) \le \binom{n}{k-1} = \binom{k+t}{t+1}$ is, up to a constant, asymptotically tight for every fixed value of $t$. Our next aim is to conclude an upper bound for $M_2(k+t,k)$ from Theorem 3.10. To this end, we will utilize an optimization problem:
Lemma 4.2. For non-negative real numbers $x_1,\dots,x_r$ let
$$f(x_1,\dots,x_r) = \sum_{S \subseteq \{1,\dots,r\} \,:\, \#S = s} \ \prod_{h \in S} x_h.$$
Then, the optimization problem $\max f(x_1,\dots,x_r)$ subject to the constraint $\sum_{i=1}^{r} x_i = m$ has the unique optimal solution $x_i = \frac{m}{r}$ for all $1 \le i \le r$ with target value $\binom{r}{s} \cdot \left(\frac{m}{r}\right)^s$. If we additionally require that the $x_i$ have to be integers, then an optimal solution is given by $x_i = \left\lfloor \frac{m+i-1}{r} \right\rfloor$ for $1 \le i \le r$.
Proof. For $r = 1$ the statements are obvious, so that we assume $r \ge 2$ in the following. Assume that for a given optimal solution of the real-valued optimization problem stated above, there are indices $1 \le i, j \le r$ with $x_i \neq x_j$. From the given vector $x$ we construct a vector $\tilde{x}$ by replacing both $x_i$ and $x_j$ by $\frac{x_i + x_j}{2}$. Distinguishing the summands of $f$ according to the cardinality of $S \cap \{i,j\}$, we obtain
$$f(\tilde{x}) - f(x) = \left(\frac{x_i - x_j}{2}\right)^2 \cdot \sum_{S \subseteq \{1,\dots,r\} \setminus \{i,j\} \,:\, \#S = s-2} \ \prod_{h \in S} x_h.$$
Thus, we have $f(\tilde{x}) \ge f(x)$. Next we remark that we have equality iff $\prod_{h \in S} x_h = 0$ for all subsets $S \subseteq \{1,\dots,r\} \setminus \{i,j\}$ with $\#S = s-2$, i.e., there are at most $s-3$ indices $h \in \{1,\dots,r\} \setminus \{i,j\}$ with $x_h \neq 0$, so that $f(x) = 0$, which clearly is not an optimal solution. Thus, in an optimal solution $x$ all entries have to be equal. Since $\sum_{i=1}^{r} x_i = m$ we obtain $x_i = \frac{m}{r}$ and the stated target value is a direct conclusion.
For the case with integral variables we assume that $x = (x_1,\dots,x_r)$ is an optimal solution such that there exist indices $1 \le i, j \le r$ with $x_i - x_j \ge 2$. Now let $\tilde{x}$ arise from $x$ by increasing $x_j$ and decreasing $x_i$ by one, respectively. Since $x \in \mathbb{N}^r$ and $x_i - x_j \ge 2$, also $\tilde{x} \in \mathbb{N}^r$ and $\sum_{h=1}^{r} \tilde{x}_h = \sum_{h=1}^{r} x_h = m$. Next we will show $f(\tilde{x}) \ge f(x)$. To this end, we proceed as before and distinguish the summands in $\sum_{S \subseteq \{1,\dots,r\} : \#S = s} \prod_{h \in S} x_h$ and $\sum_{S \subseteq \{1,\dots,r\} : \#S = s} \prod_{h \in S} \tilde{x}_h$ according to the cardinality of $S \cap \{i,j\}$. As before, for $\#(S \cap \{i,j\}) \le 1$ there is no difference if we compare the sum over all respective subsets $S$. For the cases $\#(S \cap \{i,j\}) = 2$ we can utilize the inequality $(x_i - 1)(x_j + 1) = x_i x_j + (x_i - x_j) - 1 \ge x_i x_j$, which holds since $x_i - x_j \ge 2$, to conclude $f(\tilde{x}) \ge f(x)$. Thus, there exists an optimal solution $x$ with $|x_i - x_j| \le 1$ for all $1 \le i, j \le r$. Due to symmetry we can assume $x_1 \le \dots \le x_r$ w.l.o.g. Since $\sum_{i=1}^{r} x_i = m$, we obtain the stated formula $x_i = \left\lfloor \frac{m+i-1}{r} \right\rfloor$ for $1 \le i \le r$.
Proposition 4.3. Let $C$ be a linear $[k+t,k]_2$ code and $a$ its corresponding vector counting the multiplicities of the occurring information vectors. Then $M(C)$ is bounded from above by an expression depending only on $k$ and $t$.
Proof. We want to apply Theorem 3.10. Since no minimal generating set of cardinality at least two contains the zero vector and the $a_\tau$ are non-negative, we conclude
$$\sum_{\substack{\hat{S} \subseteq \mathbb{F}_2^t \,:\, \hat{S} \text{ is minimal generating} \\ \text{and } 2 \le \#\hat{S} \le t+1}} \ \prod_{\tau \in \hat{S}} a_\tau \ \le \sum_{\substack{\hat{S} \subseteq \mathbb{F}_2^t \setminus \{0\} \,: \\ 2 \le \#\hat{S} \le t+1}} \ \prod_{\tau \in \hat{S}} a_\tau. \tag{2}$$
Since $\sum_{\tau \in \mathbb{F}_2^t} a_\tau = k$ we can assume $a_0 = 0$ when maximizing the right-hand side of Inequality (2). Applying Lemma 4.2 to the right-hand side of Inequality (2), with $s = \#\hat{S}$, $r = 2^t - 1$, and $m = k$, gives the stated upper bound for $M(C)$.
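The integral part of Lemma 4.2 can be checked numerically for small parameters. The sketch below evaluates the sum of all products of $s$ of the variables (the elementary symmetric polynomial) at the balanced vector $x_i = \lfloor (m+i-1)/r \rfloor$ and compares it with an exhaustive search over all non-negative integer vectors with sum $m$. The names `elem_sym`, `balanced`, `compositions`, and `brute_force_max` are hypothetical helpers introduced only for this check.

```python
from itertools import combinations
from math import prod


def elem_sym(x, s):
    """Sum of the products of all s-element subsets of the entries of x."""
    return sum(prod(x[i] for i in S) for S in combinations(range(len(x)), s))


def balanced(m, r):
    """Balanced integer vector x_i = floor((m + i - 1) / r) for i = 1, ..., r."""
    return [(m + i - 1) // r for i in range(1, r + 1)]


def compositions(m, r):
    """Yield all tuples of r non-negative integers with sum m."""
    if r == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, r - 1):
            yield (first,) + rest


def brute_force_max(m, r, s):
    """Maximum of elem_sym over all non-negative integer vectors of length r with sum m."""
    return max(elem_sym(x, s) for x in compositions(m, r))


x = balanced(7, 3)
print(x, elem_sym(x, 2), brute_force_max(7, 3, 2))
```

For $m = 7$, $r = 3$, and $s = 2$ the balanced vector is $(2, 2, 3)$ with value $4 + 6 + 6 = 16$, which is also the maximum found by the exhaustive search.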
We remark that Proposition 4.3 improves upon the matroid upper bound $M_2(k+t,k) \le \binom{k+t}{t+1}$. Note however that the fraction between the coefficients of the leading terms tends to 1 as $t$ tends to infinity. In order to obtain tighter bounds we need to study the properties of minimal generating sets.
Proof. Note that we have $a + b \neq 0$. Since $b \neq 0$ the statement follows from the equivalence $\operatorname{supp}(a) \subseteq \operatorname{supp}(a+b)$ iff $\operatorname{supp}(a) \cap \operatorname{supp}(b) = \emptyset$.
As an application of Theorem 3.10 we compute M (C) in dependence of a for t = 2.
Proof. Due to Lemma 4.4 the set $\{10, 01\}$ is the only subset of $\mathbb{F}_2^2 \setminus \{0\}$ that has cardinality 2 and is not minimal generating. The unique subset $\{01, 10, 11\}$ of $\mathbb{F}_2^2 \setminus \{0\}$ of cardinality 3 is indeed minimal generating. For the second equation note that $k = a_{00} + a_{01} + a_{10} + a_{11}$.
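The case $t = 2$ can be explored with a few lines of code. The sketch below assumes that the count has the shape suggested by the proof of Theorem 3.10 together with the proof above: $k$ single rows, plus $\binom{a_\tau}{2}$ pairs of equal nonzero information vectors, plus the products over the minimal generating sets $\{10, 11\}$, $\{01, 11\}$, and $\{01, 10, 11\}$. The function names `minimal_count_t2` and `best_t2` are hypothetical, and the exhaustive maximization is only meant for small $k$.

```python
from math import comb


def minimal_count_t2(a00, a10, a01, a11):
    """Number of minimal codewords for t = 2, given the counting vector (assumed formula)."""
    k = a00 + a10 + a01 + a11
    # Pairs of rows with equal nonzero information vectors.
    pairs = comb(a10, 2) + comb(a01, 2) + comb(a11, 2)
    # Minimal generating sets {10, 11}, {01, 11}, and {01, 10, 11}.
    mixed = a10 * a11 + a01 * a11 + a10 * a01 * a11
    return k + pairs + mixed


def best_t2(k):
    """Maximize minimal_count_t2 over all counting vectors with sum k (exhaustive search)."""
    best_value, best_vector = -1, None
    for a00 in range(k + 1):
        for a10 in range(k - a00 + 1):
            for a01 in range(k - a00 - a10 + 1):
                a11 = k - a00 - a10 - a01
                value = minimal_count_t2(a00, a10, a01, a11)
                if value > best_value:
                    best_value, best_vector = value, (a00, a10, a01, a11)
    return best_value, best_vector


for k in range(1, 8):
    print(k, best_t2(k))
```

For example, the counting vector $(a_{00}, a_{10}, a_{01}, a_{11}) = (0, 1, 1, 1)$ gives $3 + 0 + 3 = 6$ minimal codewords, which can be confirmed by listing the seven nonzero codewords of the corresponding $[5,3]_2$ code.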
Maximizing the formula from Proposition 4.5 yields the exact value of $M_2(k+2,k)$ for all $k \ge 1$ (Proposition 4.6).
We have computationally checked Conjecture 4.8 for all $k \le 150$. For the leading term of $M_2(k+3,k)$, in terms of $k$, the situation is different to the one of Lemma 4.2, i.e., choosing $a_{000} = 0$ and $a_\tau = \frac{k}{7}$ for $\tau \in \mathbb{F}_2^3 \setminus \{0\}$ just gives $M_2(k+3,k) \ge \frac{k^4}{343} + O(k^3)$, while $a_{000} = a_{110} = a_{101} = a_{011} = 0$ and $a_{100} = a_{010} = a_{001} = a_{111} = \frac{k}{4}$ gives $M_2(k+3,k) \ge \frac{k^4}{256} + O(k^3)$ (ignoring the rounding to integers, whose effect is in $O(k^3)$). Conjecture 4.8 of course implies $M_2(k+3,k) = \frac{k^4}{256} + O(k^3)$. Next we focus on the leading term:
Proposition 4.9. Let $C$ be a linear $[k+t,k]_2$ code and $a$ its corresponding vector counting the multiplicities of the occurring information vectors. If $t \ge 2$, then, up to an additive $O(k^t)$ term, $M(C)$ is given by a sum of the products $\prod_{\tau \in \hat{S}} a_\tau$ over the minimal generating sets $\hat{S}$ of cardinality $t+1$.
Proof. We apply Theorem 3.10. If $t \ge 2$ then only the contributions of the minimal generating sets $\hat{S}$ of cardinality exactly $t+1$ are not covered by the $O(k^t)$ term. Due to Corollary 3.3 we have $\sum_{x \in \hat{S}} x = 0$ in those remaining cases. By Lemma 3.4 we have to guarantee that no proper subset $\emptyset \neq \hat{T} \subsetneq \hat{S}$ satisfies $\sum_{x \in \hat{T}} x = 0$. Since there are $(t+1)!$ possible orders of the elements of $\hat{S}$ we obtain the stated summation formula (which mimics the construction or counting of projective bases of $\mathbb{F}_2^t$).
We remark that the minimal generating sets of $\mathbb{F}_2^t$ of the maximum cardinality $t+1$ have a lot of equivalent descriptions. As mentioned before, they correspond to the projective bases of $\mathbb{F}_2^t$. Due to Corollary 3.3 and Lemma 3.4 they also correspond to minimal dual codewords (of the $t$-dimensional simplex code).
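The minimal generating sets of maximum cardinality can be enumerated directly for small $t$. The sketch below uses the description from the proof of Proposition 4.9: a set of $t+1$ nonzero vectors whose sum is zero while no nonempty proper subset sums to zero. Vectors of $\mathbb{F}_2^t$ are encoded as integers and added via XOR; the function names are hypothetical and the search is exhaustive, so it is only practical for small $t$.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations
from operator import xor


def xor_sum(vectors):
    """Sum of binary vectors encoded as integers (bitwise XOR)."""
    return reduce(xor, vectors, 0)


def is_zero_sum_minimal(subset):
    """True if the subset sums to zero and no nonempty proper subset does."""
    if xor_sum(subset) != 0:
        return False
    for size in range(1, len(subset)):
        for part in combinations(subset, size):
            if xor_sum(part) == 0:
                return False
    return True


def count_projective_bases(t):
    """Number of (t + 1)-subsets of nonzero vectors of F_2^t that are zero-sum minimal."""
    nonzero = range(1, 2 ** t)
    return sum(1 for S in combinations(nonzero, t + 1) if is_zero_sum_minimal(S))


t = 3
bases = count_projective_bases(t)
# Leading coefficients of the two choices for a discussed for t = 3.
uniform = bases * Fraction(1, 2 ** t - 1) ** (t + 1)
concentrated = Fraction(1, t + 1) ** (t + 1)
print(bases, uniform, concentrated)
```

For $t = 3$ this finds 7 such sets, so spreading the rows uniformly over all 7 nonzero information vectors yields the leading coefficient $7 \cdot 7^{-4} = 1/343$, whereas concentrating on a single projective base yields $4^{-4} = 1/256$.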
where $T_i = \mathbb{F}_2^t \setminus \{\tau_j : 1 \le j < i\}$ for $1 \le i \le t$, attains its maximum on $\mathbb{R}_{\ge 0}^{2^t - 1}$ subject to the constraint $\sum_{\tau \in \mathbb{F}_2^t \setminus \{0\}} a_\tau = k$ at $a_\tau = \frac{k}{t+1}$ for all $\tau \in P$ and $a_\tau = 0$ otherwise. If additionally $a_\tau \in \mathbb{N}$ is assumed, then the maximum is attained at the points where $|a_\tau - a_{\tau'}| \le 1$ for all $\tau, \tau' \in P$ and $a_\tau = 0$ otherwise.
A direct implication of this conjecture is $M_2(k+t,k) = \left(\frac{k}{t+1}\right)^{t+1} + O(k^t)$. For $t = 2$ or $t = 3$, $k \le 100$, Conjecture 4.10 is indeed true.

Exact values for small parameters
The aim of this subsection is to determine the exact value of $M_2(n,k)$ for cases with $1 \le k \le n \le 15$. First note that if a linear code $C$ contains a codeword of weight 1 then removing the corresponding coordinate yields a code $C'$ with $n(C') = n(C) - 1$ and $M(C') = M(C) - 1$. (In general we have $M(C) = M(C_1) + M(C_2)$ whenever $C = C_1 \oplus C_2$, i.e., it is sufficient to consider indecomposable codes.) Removing zero or duplicate columns from the generator matrix of a binary code (scalar multiples for $q > 2$) does not change the number of minimal codewords of the corresponding codes. Thus it is sufficient to consider all projective $[n,k]_2$ codes with minimum distance at least 2. These can be generated easily and for each code we can simply count the number of minimal codewords. To this end we have applied the enumeration algorithm from [17], see Table 1 for the numerical results. In most cases we have verified the lower bounds from [4] to be exact and only improved the upper bounds. However, for $n = 15$ there are also some improvements for the lower bounds. We remark that the rather complicated structure of the formula of $M_2(k+3,k)$ for $k \le 26$ in Conjecture 4.8 suggests that the exact determination of $M_2(k+t,k)$ might not admit an easy explicit solution when $k$ is small.
n/k 1 2 3 4