Classifying optimal binary subspace codes of length 8, constant dimension 4 and minimum distance 6

The maximum size $A_2(8,6;4)$ of a binary subspace code of packet length $v=8$, minimum subspace distance $d=6$, and constant dimension $k=4$ is $257$, and the $2$ isomorphism types of optimal codes are extended lifted maximum rank distance codes. In finite geometry terms, the maximum number of solids in $\operatorname{PG}(7,2)$ mutually intersecting in at most a point is $257$. The result was obtained by combining the classification of substructures with integer linear programming techniques. This implies that the maximum size $A_2(8,6)$ of a binary mixed-dimension code of packet length $8$ and minimum subspace distance $6$ is $257$ as well.


Introduction
Let $q > 1$ be a prime power, $\mathbb{F}_q$ the field with $q$ elements, and $V \cong \mathbb{F}_q^v$ a $v$-dimensional vector space over $\mathbb{F}_q$. By $L(V)$ we denote the set of all subspaces of $V$. The set $L(V)$ forms a lattice with respect to the inclusion order $U \le W \Leftrightarrow U \subseteq W$, the lattice of flats of the projective geometry $\operatorname{PG}(V) \cong \operatorname{PG}(\mathbb{F}_q^v) = \operatorname{PG}(v-1,q)$, and a metric space with respect to the subspace distance $d_s(U,W) = \dim(U+W) - \dim(U \cap W) = \dim(U) + \dim(W) - 2\dim(U \cap W)$, which may be viewed as a $q$-analogue of the Hamming space $(\mathbb{F}_2^v, d_{\mathrm{Ham}})$. The metric space $(L(V), d_s)$ plays an important role in network coding. It was introduced as part of the subspace channel model in [20] to describe error-resilient data transmission in packet networks employing random linear network coding.
* Supported by the grants KU 2430/3-1, WA 1666/9-1 "Integer Linear Programming Models for Subspace Codes and Finite Geometry" from the German Research Foundation and by Grant no.
For $k \in \{0,\ldots,v\}$, $\begin{bmatrix} V \\ k \end{bmatrix}$ denotes the set of all $k$-dimensional subspaces in $V$. We have
$$\#\begin{bmatrix} V \\ k \end{bmatrix} = \begin{bmatrix} v \\ k \end{bmatrix}_q = \prod_{i=0}^{k-1} \frac{q^{v-i}-1}{q^{k-i}-1}.$$
A subset $C$ of $\begin{bmatrix} V \\ k \end{bmatrix}$ is called a $k$-dimensional constant-dimension code (CDC). As usual, the elements of $C$ are called codewords. For $\#C \ge 2$, the minimum distance of $C$ is defined as $d_s(C) = \min\{d_s(U,W) \mid U, W \in C, U \ne W\}$. The most important parameters of a CDC $C$ are the order $q$ of the base field, the dimension $v$ of the ambient space $V$, the minimum (subspace) distance $d = d_s(C)$ of $C$, the cardinality $N = \#C$, and the constant dimension $k$ of each element in $C$. We denote them by $(v,N,d;k)_q$. In a $(v,N,d;k)_q$ CDC the minimum distance $d$ is always an even number satisfying $2 \le d \le 2\min\{k, v-k\}$.
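The size of the search space can be made concrete with a few lines of Python. The following is a minimal sketch, assuming the standard product formula for the Gaussian binomial coefficient; the function names `gaussian_binomial` and `max_distance` are illustrative and not taken from the paper.

```python
def gaussian_binomial(v, k, q=2):
    """Number of k-dimensional subspaces of an F_q-vector space of dimension v."""
    if k < 0 or k > v:
        return 0
    numerator = denominator = 1
    for i in range(k):
        numerator *= q ** (v - i) - 1
        denominator *= q ** (k - i) - 1
    return numerator // denominator  # the quotient is always an integer


def max_distance(v, k):
    """Largest possible minimum subspace distance of a CDC with parameters v and k."""
    return 2 * min(k, v - k)


points = gaussian_binomial(8, 1)   # points of PG(7,2)
solids = gaussian_binomial(8, 4)   # candidate codewords for v = 8, k = 4
print(points, solids, max_distance(8, 4))
```

For the parameters of this paper, the sketch reports 255 points and 200787 solids in $\mathbb{F}_2^8$, so an optimal code selects 257 out of 200787 candidate codewords.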
The determination of the corresponding maximal size $A_q(v,d;k)$ and the classification of the optimal codes is known as the main problem of subspace coding, since it forms a $q$-analogue of the main problem of classical coding theory (cf. [23, page 23]).
Without the restriction of all codewords having the same dimension, i.e., $C \subseteq L(V)$, the code $C$ is called a subspace code (per se) or a mixed-dimension code (MDC). The maximal cardinality of an MDC in $V$ having subspace distance $d$ is denoted by $A_q(v,d)$.
Clearly, $A_q(v,d;k) \le A_q(v,d)$ for all $k$.
In the following, let $\beta$ be a fixed non-degenerate symmetric bilinear form on $V$ and $\pi : L(V) \to L(V)$, $U \mapsto U^\perp$ the corresponding polarity. The orthogonal code or dual code of a subspace code $C$ is defined as $C^\perp = \pi(C) = \{U^\perp \mid U \in C\}$. Up to isomorphism of subspace codes as defined further below, the code $C^\perp$ does not depend on the particular choice of $\beta$.
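Considering orthogonal codes allows us to almost halve the parameter space: if $C$ is a $(v,N,d;k)_q$ CDC, then $C^\perp$ is a $(v,N,d;v-k)_q$ CDC, so that we can assume $k \le \frac{v}{2}$ in the following. The iterative application of the so-called Johnson type bound II ([27, Theorem 3], [7, Theorem 4,5]), which is a $q$-generalization of [18, Inequality (5)], gives the upper bound
$$A_q(v,d;k) \le \left\lfloor \frac{q^v-1}{q^k-1} \left\lfloor \frac{q^{v-1}-1}{q^{k-1}-1} \left\lfloor \cdots \left\lfloor \frac{q^{v'+1}-1}{q^{d/2+1}-1} A_q(v',d;d/2) \right\rfloor \cdots \right\rfloor \right\rfloor \right\rfloor, \qquad (1)$$
where $v' = v-k+d/2$. It is attained with equality at $v = ak$ and $d = 2k$, i.e., for spreads, and also at $v = 13$, $k = 3$, $d = 4$ with $A_2(13,4;3) = 1597245$, see [2]. Using $q^r$-divisible linear codes over $\mathbb{F}_q$ with respect to the Hamming metric, this bound was sharpened very recently, see [19], to
$$A_q(v,d;k) \le \left\lfloor\!\left\lfloor \frac{q^v-1}{q^k-1} \left\lfloor\!\left\lfloor \frac{q^{v-1}-1}{q^{k-1}-1} \left\lfloor\!\left\lfloor \cdots \left\lfloor\!\left\lfloor \frac{q^{v'+1}-1}{q^{d/2+1}-1} A_q(v',d;d/2) \right\rfloor\!\right\rfloor_{q^{d/2}} \cdots \right\rfloor\!\right\rfloor_{q^{k-3}} \right\rfloor\!\right\rfloor_{q^{k-2}} \right\rfloor\!\right\rfloor_{q^{k-1}}, \qquad (2)$$
where, writing $\begin{bmatrix} k \\ 1 \end{bmatrix}_q = \frac{q^k-1}{q-1}$, the expression $\left\lfloor\!\left\lfloor a / \begin{bmatrix} k \\ 1 \end{bmatrix}_q \right\rfloor\!\right\rfloor_{q^{k-1}}$ denotes the maximal $b \in \mathbb{N}$ such that $a - b \cdot \begin{bmatrix} k \\ 1 \end{bmatrix}_q$ is a non-negative integer combination of the summands $q^{k-1-i} \cdot \frac{q^{i+1}-1}{q-1}$ for $0 \le i \le k-1$.$^1$ Inequality (1) is implied by Inequality (2). Both bounds refer back to bounds for so-called partial spreads, i.e., $(v,N,2k;k)_q$ codes, for which the minimum distance has the maximal value $d = 2k$. For upper bounds in this special subclass of CDCs, there is a recent series of improvements [21,22,24]. The underlying techniques can be explained using the language of projective $q^{k-1}$-divisible codes and the linear programming method, see [16]. While a lot of upper bounds for the maximum sizes of CDCs have been proposed in the literature, most of them are dominated by Inequality (1), see [11]. Indeed, besides Inequality (2), the only known improvements were $A_2(6,4;3) = 77 < 81$ [13] and $A_2(8,6;4) \le 272 < 289$ [12]. The latter result is improved in this paper. For numerical values of the known lower and upper bounds on the sizes of subspace codes we refer the reader to the online tables at http://subspacecodes.uni-bayreuth.de associated with [10]. The tables in particular contain representatives for the two isomorphism types of $(8,257,6;4)_2$ CDCs. A survey on Galois geometries and coding theory can be found in [6].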
This article investigates binary CDCs with $v = 8$, $d = 6$ and $k = 4$. The so-called Echelon-Ferrers construction, see e.g. [4], gives $A_2(8,6;4) \ge 257$. More precisely, a corresponding code is given by a lifted maximum rank distance code (LMRD code), extended by a single codeword. In Corollary 3 we will show that up to isomorphism there are two such codes. By [5, Theorem 10], this construction is optimal for subspace codes containing an LMRD code. Our main theorem states that this construction is optimal even without the restriction of containing an LMRD code and, moreover, that all subspace codes of maximum possible size 257 are extended LMRD codes. Theorem 1 is the main theorem of this paper.
$^1$ As an example we consider $A_2(9,6;4) \le \left\lfloor\!\left\lfloor \frac{2^9-1}{2^4-1} \cdot A_2(8,6;3) \right\rfloor\!\right\rfloor_{2^3}$ with $A_2(8,6;3) = 34$, i.e., $a = 511 \cdot 34 = 17374$. Since $a - 1158 \cdot 15 = 4$ and $a - 1157 \cdot 15 = 19$ cannot be written as a non-negative integer linear combination of 8, 12, 14, and 15, but $a - 1156 \cdot 15 = 34 = 14 + 12 + 8$, we have $A_2(9,6;4) \le 1156$, which improves upon the iterative Johnson bound by two. Let us remark that [19] contains an easy and fast algorithm to check the representability as a non-negative integer linear combination as specified above.
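The arithmetic of the footnote example can be replayed with a short script. This is a sketch under stated assumptions: one Johnson step is taken as $\lfloor \frac{q^v-1}{q^k-1} B \rfloor$ for a known bound $B$ on $A_q(v-1,d;k-1)$, and the representability test is a plain dynamic program rather than the algorithm of [19]. The names `johnson_step`, `improved_step`, and `representable` are illustrative.

```python
def representable(n, parts):
    """True if n is a non-negative integer combination of the numbers in parts."""
    reachable = [False] * (n + 1)
    reachable[0] = True
    for total in range(1, n + 1):
        reachable[total] = any(total >= p and reachable[total - p] for p in parts)
    return reachable[n]


def johnson_step(q, v, k, inner):
    """One Johnson step: floor((q^v - 1) / (q^k - 1) * inner)."""
    return (q ** v - 1) * inner // (q ** k - 1)


def improved_step(q, v, k, inner):
    """One step of the divisible-code refinement, using the summands named in the text."""
    a = (q ** v - 1) // (q - 1) * inner
    unit = (q ** k - 1) // (q - 1)
    parts = [q ** (k - 1 - i) * ((q ** (i + 1) - 1) // (q - 1)) for i in range(k)]
    b = a // unit
    while b > 0 and not representable(a - b * unit, parts):
        b -= 1
    return b


# A_2(8,6;3) = 34 is the partial spread value used in the footnote.
print(johnson_step(2, 9, 4, 34), improved_step(2, 9, 4, 34))
# The values 81 and 289 quoted for A_2(6,4;3) and A_2(8,6;4) arise the same way.
print(johnson_step(2, 6, 3, 9), johnson_step(2, 8, 4, 17))
```

The inner values 9 and 17 in the last line are the partial spread sizes $A_2(5,4;2)$ and $A_2(7,6;3)$, which are inputs here and not computed by the sketch.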
Theorem 1 and Fact 1 together give the maximum cardinality in the corresponding mixed-dimension case: $A_2(8,6) = 257$.
Given Theorem 1, one may ask whether there exists an integer $k \ge 4$ with $A_2(2k, 2k-2; k) > 2^{2k} + 1$.
The remaining part of the paper is structured as follows. In Section 2 we provide the necessary preliminaries on lifted maximum rank distance codes, acting symmetry groups, and upper bounds for code sizes based on the number of incidences of codewords with a fixed subspace. As in [13], we want to apply integer linear programming methods in order to determine the exact maximum size of CDCs with the specified parameters. Since this algorithmic approach suffers from the presence of a large symmetry group, we use the inherent symmetry to prescribe some carefully chosen substructures up to isomorphism. A general outline of the proof of Theorem 1 is presented in Section 3. The substructures involved are described in Section 4 and the integer linear programming formulations are described in Section 5. All these parts are put together in the proof of our main theorem in Section 6.

Preliminaries
Let $m, n$ be positive integers. The rank distance of $m \times n$ matrices $A$ and $B$ over $\mathbb{F}_q$ is defined as $d_r(A,B) = \operatorname{rk}(A-B)$. The rank distance provides a metric on $\mathbb{F}_q^{m \times n}$. Any subset $C$ of the metric space $(\mathbb{F}_q^{m \times n}, d_r)$ is called a rank metric code. Its minimum distance $d$ is the minimum of the rank distances between pairs of distinct codewords (defined for $\#C \ge 2$). If $C$ is a subspace of the $\mathbb{F}_q$-vector space $\mathbb{F}_q^{m \times n}$, then $C$ is called linear. If $m \le n$ (otherwise transpose), then $\#C \le q^{(m-d+1)n}$ by [3, Theorem 5.4]. Codes achieving this bound are called maximum rank distance (MRD) codes. In fact, MRD codes always exist. A suitable construction has independently been found in [3,8,26]. Today these codes are known as Gabidulin codes. In the square case $m = n$, after the choice of an $\mathbb{F}_q$-basis of $\mathbb{F}_{q^n}$, the Gabidulin code is given by the matrices representing the $\mathbb{F}_q$-linear maps given by the $q$-polynomials $a_0 x^{q^0} + a_1 x^{q^1} + \cdots + a_{n-d} x^{q^{n-d}} \in \mathbb{F}_{q^n}[x]$.
The lifting map $\Lambda : \mathbb{F}_q^{m \times n} \to \begin{bmatrix} \mathbb{F}_q^{m+n} \\ m \end{bmatrix}$ maps an $(m \times n)$-matrix $A$ to the row space $\langle (I_m \mid A) \rangle$, where $I_m$ denotes the $m \times m$ identity matrix. The mapping $\Lambda$ is injective and its image is given by all $m$-dimensional subspaces of $\mathbb{F}_q^{m+n}$ having trivial intersection with the special subspace $S = \langle e_{m+1}, \ldots, e_{m+n} \rangle$ of $\mathbb{F}_q^{m+n}$ ($e_i$ denoting the $i$th unit vector). In fact, the lifting map defines an isometry from $(\mathbb{F}_q^{m \times n}, 2d_r)$ into $(L(\mathbb{F}_q^{m+n}), d_s)$. Of particular interest are the LMRD codes, i.e., CDCs obtained by lifting MRD codes, which are CDCs of fairly large, though not of maximal size.
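The construction above can be checked numerically for the parameters of this paper. The following is a minimal sketch: it builds the $4 \times 4$ Gabidulin code over $\mathbb{F}_2$ from the maps $x \mapsto a_0 x + a_1 x^2$ on $\mathbb{F}_{16}$, lifts it, and computes the minimum distances. The representation of $\mathbb{F}_{16}$ by the polynomial $x^4 + x + 1$, the basis $1, x, x^2, x^3$, and all function names are choices made for this illustration, not taken from the paper.

```python
from itertools import combinations


def gf16_mul(a, b):
    """Multiply two elements of F_16, encoded as 4-bit integers modulo x^4 + x + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result


def rank_gf2(rows):
    """Rank over F_2 of the vectors in rows, each encoded as an integer bit mask."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)  # clears the leading bit of b in r if it is set
        if r:
            basis.append(r)
    return len(basis)


def gabidulin_matrix(a0, a1):
    """Rows are the images of the basis 1, x, x^2, x^3 under z -> a0*z + a1*z^2."""
    return tuple(gf16_mul(a0, e) ^ gf16_mul(a1, gf16_mul(e, e)) for e in (1, 2, 4, 8))


def lift(matrix):
    """Rows of (I_4 | A) as 8-bit masks: identity in the high bits, A in the low bits."""
    return tuple((1 << (4 + i)) | row for i, row in enumerate(matrix))


def subspace_distance(U, W):
    """d_s(U, W) = 2 dim(U + W) - dim(U) - dim(W) for generator tuples U and W."""
    return 2 * rank_gf2(U + W) - rank_gf2(U) - rank_gf2(W)


mrd = [gabidulin_matrix(a0, a1) for a0 in range(16) for a1 in range(16)]
min_rank = min(rank_gf2([x ^ y for x, y in zip(A, B)]) for A, B in combinations(mrd, 2))
lmrd = [lift(A) for A in mrd]
min_ds = min(subspace_distance(U, W) for U, W in combinations(lmrd, 2))
print(len(set(mrd)), min_rank, min_ds)
```

The script reports 256 codewords, minimum rank distance 3, and minimum subspace distance 6, in line with the bound $q^{(m-d+1)n} = 2^8$ and the isometry property of the lifting map.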
Although we use the algebraic dimension v instead of the geometric dimension v − 1 in this paper, we adopt the use of geometric language: Abbreviating k-dimensional subspaces as k-spaces, we call 1-spaces points, 2-spaces lines, 3-spaces planes, 4-spaces solids, and (v − 1)-spaces hyperplanes.
For dimensions $v \ge 3$ the automorphism group of the metric space $(L(V), d_s)$ is generated by $\operatorname{P\Gamma L}(V)$ and the polarity $\pi$. It carries the structure of a semidirect product $\operatorname{P\Gamma L}(V) \rtimes \langle \pi \rangle \cong \operatorname{P\Gamma L}(v,q) \rtimes \mathbb{Z}/2\mathbb{Z}$. Hence, for classifications of CDCs in $\begin{bmatrix} V \\ k \end{bmatrix}$ up to isomorphism, the relevant acting group is $\operatorname{P\Gamma L}(V)$, except for the case $v = 2k$ in which it is the larger group $\operatorname{P\Gamma L}(V) \rtimes \langle \pi \rangle$.
In order to describe suitable substructures of $(8,N,6;4)_2$ codes with large cardinality $N$, we will consider incidences with fixed subspaces. To this end, let $I(S,X)$ be the set of subspaces in $S \subseteq L(V)$ that are incident with $X \le V$, i.e., $I(S,X) = \{U \in S \mid U \le X \vee X \le U\}$. As special subspaces we explicitly label a point $P = \langle (0,0,0,0,0,0,0,1) \rangle$ and a hyperplane $H = \{x \in V \mid x_8 = 0\}$. Note that $P$ and $H$ are not incident. By $\iota : \mathbb{F}_2^7 \to H$ we denote the canonical embedding, which we will apply to subspaces and sets of subspaces.
To keep the paper self-contained, we restate upper bounds for #I (S, X) and N from the earlier conference paper [12] with their complete but short proofs.
In particular, Corollary 2 shows that each point and each hyperplane is incident with at most 17 codewords of an $(8,N,6;4)_2$ CDC. The next lemma refines this counting by including points which are not incident with a fixed hyperplane.
Proof. Let $H$ be a hyperplane and $\mathcal{P} = \begin{bmatrix} \mathbb{F}_2^8 \\ 1 \end{bmatrix}$ be the set of points. Double counting of the set $\{(P,U) \in \mathcal{P} \times C \mid P \le U\}$ gives
$$\sum_{P \in I(\mathcal{P},H)} \#I(C,P) + \sum_{P \in \mathcal{P} \setminus I(\mathcal{P},H)} \#I(C,P) = 15 \cdot \#C,$$
since every solid contains 15 points. By Corollary 2, $\#I(C,P) \le 17$ for all points $P$. Assuming $\#I(C,P) \le 13$ for all points $P \not\le H$, the left-hand side is $\le 127 \cdot 17 + 128 \cdot 13 = 255 \cdot 15 - 2$, which is a contradiction.
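The counting in the proof reduces to a small piece of arithmetic, which the following sketch spells out. It assumes that the lemma is applied to codes with at least 255 codewords (inferred from the right-hand side $255 \cdot 15$), and it also evaluates the variant with caps 16 and 14 that is used later for hyperplanes incident with 16 codewords. The helper names are illustrative.

```python
points_total = 2 ** 8 - 1                     # 255 points of F_2^8
points_in_h = 2 ** 7 - 1                      # 127 points incident with the hyperplane H
points_off_h = points_total - points_in_h     # 128 points not incident with H
points_per_solid = 2 ** 4 - 1                 # 15 points in every codeword


def incidence_cap(cap_in_h, cap_off_h):
    """Upper bound on the number of incident (point, codeword) pairs."""
    return points_in_h * cap_in_h + points_off_h * cap_off_h


def incidences(code_size):
    """Exact number of incident (point, codeword) pairs of a code of the given size."""
    return points_per_solid * code_size


threshold = incidences(255)        # assumption: the code has at least 255 codewords
case_17 = incidence_cap(17, 13)    # caps used in the proof above
case_16 = incidence_cap(16, 14)    # caps if every point is incident with at most 16 codewords
print(case_17, case_16, threshold)
```

In both cases the cap stays below the number of incidences of a code with 255 codewords, so some point off $H$ must exceed the assumed cap.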
Furthermore we need the following lemma to split a difficult problem into multiple small problems.
where the corresponding orbit sizes are decreasing, and $\tau$ :
This lemma will be applied in Section 6 to exploit the symmetry for the computation of representatives of cliques of maximal size as well as for solving a binary linear program, cf. Section 5. If $G = (V,E)$ is a graph with nontrivial automorphism group $\operatorname{Aut}(G)$, we use Lemma 3 with $X = V$, $f$ defined by $f(S) = 1$ iff $S$ is a clique, and $\Gamma \le \operatorname{Aut}(G)$. Let $T$ be a transversal for the action of $\Gamma$ on $V$. Then any nonempty clique of size $c$ is in the same orbit as a clique $\{t\} \cup S'$ with $t \in T$ and $\#S' = c-1$. This argument can also be applied recursively.
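The way a transversal splits a clique search into subproblems can be illustrated on a toy graph. The sketch below is one plausible reading of the technique, not the paper's implementation: subproblem $i$ searches for cliques through the $i$th orbit representative and forbids all vertices from earlier orbits. The octahedron graph, the order-3 rotation used as generator, and all function names are assumptions made for the example.

```python
from itertools import combinations

# Octahedron graph: vertices 0..5, all pairs adjacent except the antipodal pairs {i, i+3}.
VERTICES = range(6)
ADJ = {u: {w for w in VERTICES if w != u and (w - u) % 6 != 3} for u in VERTICES}
GENERATORS = [(1, 2, 0, 4, 5, 3)]  # the rotation (0 1 2)(3 4 5), an automorphism


def orbit(x, generators, act):
    """Orbit of x under the group generated by generators, acting via act(g, x)."""
    seen, stack = {x}, [x]
    while stack:
        y = stack.pop()
        for g in generators:
            z = act(g, y)
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen


def act_point(g, x):
    return g[x]


def act_set(g, s):
    return frozenset(g[x] for x in s)


def cliques_through(t, allowed, size):
    """All cliques of the given size that contain t and otherwise use only allowed vertices."""
    candidates = [u for u in allowed if u in ADJ[t]]
    return [frozenset((t,) + rest) for rest in combinations(candidates, size - 1)
            if all(b in ADJ[a] for a, b in combinations(rest, 2))]


def split_by_orbits(size):
    """One subproblem per vertex orbit; returns a superset of a transversal of the cliques."""
    todo, orbits = set(VERTICES), []
    while todo:
        o = orbit(min(todo), GENERATORS, act_point)
        orbits.append(o)
        todo -= o
    orbits.sort(key=len, reverse=True)      # large orbits first
    remaining, representatives = set(VERTICES), []
    for o in orbits:
        representatives += cliques_through(min(o), remaining, size)
        remaining -= o                      # forbid this orbit in later subproblems
    return representatives


representatives = split_by_orbits(3)
recovered = set()
for rep in representatives:
    recovered |= orbit(rep, GENERATORS, act_set)
all_triangles = {frozenset(c) for c in combinations(VERTICES, 3)
                 if all(b in ADJ[a] for a, b in combinations(c, 2))}
print(len(representatives), len(recovered), recovered == all_triangles)
```

In this toy case the two subproblems return five representatives, and taking the union of their orbits recovers all eight triangles. As in the text, the representatives may contain several members of the same orbit, so they form a superset of a transversal.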

General outline of the proof of Theorem 1
In the first phase we try to extend the $715 + 14445$ hyperplane configurations from Theorems 2 and 3 to $(8,N,6;4)_2$ CDCs with $N \ge 257$. This is accomplished by using the linear programming relaxation of the integer linear programming model from Lemma 4. It turns out that such an extension is impossible for all but 38 of those hyperplane configurations.
For the remaining 38 hyperplane configurations the integer linear programming model for the extension to an $(8,N,6;4)_2$ CDC with $N \ge 257$ is used. This test fails for all but seven of the 38 cases.
In the second phase we try to enlarge the remaining hyperplane configurations to larger substructures. Overall, we get 73 234 possible 31-point-hyperplane configurations.
In the third phase it is again tested whether these configurations can be extended to $(8,N,6;4)_2$ CDCs with $N \ge 257$. For this, the linear programming relaxation of the integer linear programming model from Lemma 5 is used. All but three hyperplane configurations with $195 + 98 + 240$ 31-point-hyperplane configurations fail this test.
Finally, the integer linear programming model shows that from the remaining $195 + 98 + 240$ cases exactly two give $(8,257,6;4)_2$ CDCs. All other configurations lead to smaller codes.

We call those configurations hyperplane configurations and denote a transversal of the isomorphism classes of sets of planes in Theorems 2 and 3 by $\mathcal{A}_{17}$ and $\mathcal{A}_{16}$, respectively. So, $\iota^{-1}(I(C,H))^\perp$ is isomorphic to exactly one set in $\mathcal{A}_{16} \cup \mathcal{A}_{17}$. Computing the LP relaxation of a suitable integer linear programming formulation, see the next section, one can easily check that all but 38 of the $715 + 14445$ hyperplane configurations cannot be extended to $(8,257,6;4)_2$ CDCs. These 38 remaining elements are listed in Table 3 and their LP values are stated in Table 2. By $F_i$ we denote the corresponding sets of solids in $\mathbb{F}_2^8$ for $1 \le i \le 38$.

Next we want to enlarge some of the possible hyperplane configurations to larger substructures, more precisely those with indices $1 \le i \le 7$ in Table 3. To this end we distinguish both possibilities for $\#I(C,H)$. If it is 17, then Lemma 2 guarantees a point $P \not\le H$ such that $\#I(C,H) + \#I(C,P) \ge 17 + 14 = 31$. If $\#I(C,H) = 16$, then we can assume w.l.o.g. that $\#I(C,P) \le 16$ for all points $P$, since otherwise we can apply the orthogonality and are in the first case. Then Lemma 2 guarantees a point $P \not\le H$ such that $\#I(C,H) + \#I(C,P) \ge 16 + 15 = 31$. Since the stabilizer of $H$ in $\operatorname{GL}(\mathbb{F}_2^8)$ acts transitively$^3$ on the set of points not incident with $H$, we can assume $\#I(C,P) + \#I(C,H) \ge 31$. We call sets of $a$ solids in $H$ and $b$ solids containing $P$ with $16 \le a \le 17$, $a + b = 31$, and minimum subspace distance $d = 6$ briefly 31-point-hyperplane configurations.
Anticipating the results from Section 6, we mention that altogether just 242 nonisomorphic 31-point-hyperplane configurations can be extended to CDCs with cardinality 257. Moreover, we will verify indirectly that in all those extensions there exists a codeword U such that C\{U } is isomorphic to an LMRD code. This result has been achieved computationally in the context of the work [9]. However, to make this article as self-contained as possible, we decided to include the idea of the proof.
Proof. Let $C$ be a $4 \times 4$ MRD code over $\mathbb{F}_2$ of minimum rank distance 3. Then $\#C = 256$. For each vector $v \in \mathbb{F}_2^4$, there are exactly 16 matrices in $C$ having $v$ as their last row. After removing this common row, these 16 matrices form a binary $3 \times 4$ MRD code of minimum rank distance 3. These MRD codes have been classified in [14] into 37 isomorphism classes.
Let $C'$ be one of these codes, extended to size $4 \times 4$ by appending the zero vector as the last row to all the matrices in $C'$. Up to isomorphism, $C$ is the extension of one of these 37 codes $C'$ by $256 - 16 = 240$ matrices. In particular, for each $v \in \mathbb{F}_2^4 \setminus \{0\}$, it must be possible to add 16 matrices of size $4 \times 4$ with last row $v$ without violating the rank distance condition. For fixed $v$, this question can be formulated as a clique problem: We define a graph $G_v$ whose vertex set is given by all $4 \times 4$ matrices with last row $v$ having rank distance $\ge 3$ to all matrices in $C'$. Two vertices are connected by an edge if the corresponding matrices have rank distance $\ge 3$. Now the question is whether all graphs $G_v$, $v \in \mathbb{F}_2^4 \setminus \{0\}$, admit a clique of size 16. Using the software [25], we found that out of the 37 types of codes $C'$, this is possible only for a single type.
For this remaining type, the full extension problem to a $4 \times 4$ MRD code is again formulated as a clique problem. The graph is defined in a similar way, but without the restriction on the last row of the matrices in the vertex set. This yields a graph with 1920 vertices. The maximum clique problem is solved within seconds for this graph.$^4$ The result is 8 cliques of the maximum possible size 240. In other words, there are 8 extensions to a rank distance 3 code of size $16 + 240 = 256$, i.e., an MRD code. All 8 codes turned out to be isomorphic to the Gabidulin code.
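The clique formulation used in this proof can be tried out on a much smaller instance. The sketch below is an illustration only: it replaces the $4 \times 4$ case with minimum rank distance 3 by $2 \times 2$ binary matrices with minimum rank distance 2, and it replaces the clique software [25] by exhaustive search, which is feasible for 16 vertices. All names are hypothetical.

```python
from itertools import combinations, product


def rank_2x2(m):
    """Rank over F_2 of the 2 x 2 matrix ((a, b), (c, d)) given as the tuple (a, b, c, d)."""
    a, b, c, d = m
    if (a * d + b * c) % 2:      # determinant over F_2
        return 2
    return 1 if any(m) else 0


def rank_distance(A, B):
    return rank_2x2(tuple(x ^ y for x, y in zip(A, B)))


# Vertices: all 2 x 2 binary matrices. Edges: pairs at rank distance at least 2.
matrices = list(product((0, 1), repeat=4))
edges = {frozenset((A, B)) for A, B in combinations(matrices, 2)
         if rank_distance(A, B) >= 2}


def is_clique(vertices):
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


def clique_number():
    """Largest clique size, found by trying increasing sizes until none exists."""
    best = 1
    for size in range(2, len(matrices) + 1):
        if any(is_clique(S) for S in combinations(matrices, size)):
            best = size
        else:
            break
    return best


omega = clique_number()
mrd_bound = 2 ** ((2 - 2 + 1) * 2)   # q^((m-d+1)n) with q = 2, m = n = 2, d = 2
print(omega, mrd_bound)
```

For this toy instance the clique number equals the MRD bound, so a maximum clique is an MRD code. The paper's instances are far larger and need dedicated clique software together with the symmetry reductions described above.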
and $b \in \mathbb{F}_2^7$, any point that is not incident with $H$, i.e., $(p \mid 1)$ with $p \in \mathbb{F}_2^7$, can be mapped via $\begin{pmatrix} I_7 & 0 \\ p & 1 \end{pmatrix}^{-1}$ to $P$.
$^4$ We noticed that the order of the vertices makes a huge difference for the running time. For fast results, matrices with the same last row should be numbered consecutively.
By the last theorem, in our setting there is only a single type of LMRD code, which is the lifted Gabidulin code. It is iso-dual (isomorphic to its orthogonal code).
Corollary 3. Let C be an (8, 257, 6; 4) Gabidulin code with minimum rank distance 3, I 4 is the 4 × 4 identity matrix, and 0 m×n is the m × n all-zero matrix.
Proof. From Theorem 4 we conclude that $C'$ is the lifted Gabidulin code $M$. The automorphism group $A$ of $C'$ has order $4 \cdot 15^2 \cdot 2^8 = 230\,400$. Identifying $V$ with $\mathbb{F}_{16} \times \mathbb{F}_{16}$ and denoting by $\alpha$ a generator of $\mathbb{F}_{16}^\times$, $A$ is generated by $(x,y) \mapsto (x^2, y^2)$, $(x,y) \mapsto (\alpha x, y)$, $(x,y) \mapsto (x, \alpha y)$, and the "translations" $(x,y) \mapsto (x, a_0 x + a_1 x^2 + y)$ with $a_0 x + a_1 x^2 \in M$. From this it is readily seen that $A$ partitions the 451 solids intersecting each codeword of $C'$ in at most a point (these are precisely the solids intersecting the special solid $S$ of $C'$ in at least a plane) into two orbits: an orbit of size 1 containing $S$, which is fixed by $A$, and an orbit of size 450 containing the solids that meet $S$ in a plane. This accounts for the two indicated isomorphism classes of $C$.
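The numbers 451, 450 and 230 400 in this proof follow from elementary counting, which the following sketch reproduces. It assumes the characterization stated in the proof (the admissible solids are exactly those meeting $S$ in at least a plane) and counts the solids through a fixed plane via the quotient space; the function name is illustrative.

```python
def gaussian_binomial(n, k, q=2):
    """Number of k-dimensional subspaces of an n-dimensional F_q-vector space."""
    numerator = denominator = 1
    for i in range(k):
        numerator *= q ** (n - i) - 1
        denominator *= q ** (k - i) - 1
    return numerator // denominator


planes_in_s = gaussian_binomial(4, 3)                 # planes contained in the solid S
solids_through_plane = gaussian_binomial(8 - 3, 4 - 3)  # solids of F_2^8 containing a fixed plane
# A solid meeting S in exactly a plane is determined by that plane and is not S itself.
meeting_in_plane = planes_in_s * (solids_through_plane - 1)
admissible = 1 + meeting_in_plane                     # S itself plus the solids above
group_order = 4 * 15 ** 2 * 2 ** 8
print(planes_in_s, meeting_in_plane, admissible, group_order)
```

The counts match the two orbit sizes 1 and 450 named in the proof. Whether the 450 solids really form a single orbit is a statement about the group action that this arithmetic does not verify.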

Integer linear programming models
It is well known that the determination of $A_q(v,d;k)$ can be formulated as an integer linear programming problem with binary variables (BLP). If all constraints of the form $x \in \{0,1\}$ are replaced by $x \in \mathbb{R}_{\ge 0}$, we speak of the corresponding linear programming relaxation (LP). Suppose that we already know that a CDC $C$ contains the solids from $F$.
Proof. Interpreting $(x_U)_{U \in \operatorname{Var}_8}$ as the characteristic vector of $C$, the objective function equals $\#C$. The first two sets of constraints are feasible by Lemma 1 and the choice of $f$. The third set of constraints is feasible since $F \subseteq C$.
If $\#F$ is rather small, then the computation of $z^{\mathrm{BLP}}_8(F,f)$ takes too much time, so that we also consider a linear programming formulation for $\#\{U \cap H \mid U \in C\}$, i.e., we consider the image of $C$ in $H$.
Interpreting $(x_U)_{U \in \operatorname{Var}_7(F)}$ as the characteristic vector of $\{U \cap H \mid U \in C \wedge U \not\le H\}$, one can check the correctness of the objective function and the last two lines. Since two solids in $C$ intersect in at most a point, any two elements in $\{U \cap H \mid U \in C\}$ also intersect in at most a point, which gives the constraints with $\dim(W) \in \{2,4\}$. Any 5-space $W$ contains at most $\omega(F,W)$ planes by choice of $\omega$; moreover, $\iota(W)$ is incident with $\begin{bmatrix} 8-5 \\ 6-5 \end{bmatrix}_2 = 7$ 6-spaces, each of which in turn contains at most one codeword of $C$. If $W$ contains a solid of $F$, then any plane in $W$ meets this solid in at least a line. This gives the constraints with $\dim(W) = 5$.
For any point $W$ its embedding $\iota(W)$ is incident with at most $\#F$ codewords of $C$, giving the constraints with $\dim(W) = 1$.
For any 6-subspace $W$ its embedding $\iota(W)$ is contained in $\begin{bmatrix} 8-6 \\ 7-6 \end{bmatrix}_2 = 3$ hyperplanes in $\mathbb{F}_2^8$, one of which is $H$. Since each hyperplane is incident with at most $\#F$ codewords and $H$ is incident with exactly $\#F$ codewords, i.e., $\iota(F)$, the other two hyperplanes are each incident with either $\#F$ codewords if $W$ contains no element of $F$ or $\#F - 1$ codewords if $W$ contains one element of $F$. Obviously two solids in a 6-space intersect in at least a line and hence $W$ contains at most one element of $F$. This gives the constraints with $\dim(W) = 6$.
The last inequality allows the BLP solver to cut the branch & bound tree early since we are only interested in solutions of cardinality at least 255.
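The basic binary program mentioned at the beginning of this section can be written down completely for a tiny case. The sketch below treats $A_2(4,4;2)$, i.e., line spreads of $\operatorname{PG}(3,2)$: one binary variable per line and one constraint per point stating that at most one chosen line passes through it. It is an illustration under stated assumptions: the instance is far smaller than the paper's, the models of Lemmas 4 and 5 carry additional constraints, and a plain branch and bound in pure Python stands in for CPLEX. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Points of PG(3,2): the nonzero vectors 1..15 of F_2^4 (bit masks).
# Lines: the sets {a, b, a + b}; these are the candidate codewords (variables x_U).
POINTS = range(1, 16)
LINES = sorted({frozenset((a, b, a ^ b)) for a, b in combinations(POINTS, 2)}, key=sorted)


def solve_blp(lines, n_points=15):
    """Maximize the number of chosen lines subject to: every point lies on at most one of them."""
    best = 0

    def branch(i, used, size):
        nonlocal best
        best = max(best, size)
        # Bound: every further codeword needs three points that are still unused.
        if i == len(lines) or size + (n_points - len(used)) // 3 <= best:
            return
        if not (lines[i] & used):
            branch(i + 1, used | lines[i], size + 1)   # branch x_U = 1
        branch(i + 1, used, size)                      # branch x_U = 0

    branch(0, frozenset(), 0)
    return best


# A fractional solution of the LP relaxation: x_U = 1/7 for every line.
lines_per_point = {p: sum(1 for line in LINES if p in line) for p in POINTS}
lp_value = sum(Fraction(1, 7) for _ in LINES)
lp_feasible = all(Fraction(count, 7) <= 1 for count in lines_per_point.values())

optimum = solve_blp(LINES)
print(len(LINES), optimum, lp_value, lp_feasible)
```

In this toy case the fractional solution already has the same value as the integral optimum, so the LP relaxation is tight. For the paper's parameters the relaxation is weaker, which is why substructures are prescribed before the LP and BLP computations are run.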

Proof of the main theorem
The algorithmic proof of Theorem 1 is split into several phases that are described in detail in the following subsections; Subsection 6.i corresponds to Phase i. The (integer) linear programming problems are solved with CPLEX [17].
Let $C$ be an $(8,\#C,6;4)_2$ CDC with $\#C \ge 257$. As argued at the beginning of Section 4, $C$ has to contain one of the $715 + 14445$ hyperplane configurations from $\mathcal{A}_{17} \cup \mathcal{A}_{16}$. This list is reduced in Phase 1, see Section 6.1, and then extended to 31-point-hyperplane configurations in Phase 2, see Section 6.2. The resulting list is reduced in Phase 3, see Section 6.3. Then we deduce that $C$ must be an LMRD code extended by a single codeword, see Section 6.4. The classification of such structures at the end of Section 4 concludes the proof. Let us mention that the termination of Phase 1 proves $A_2(8,6;4) \le 271$ and the termination of Phase 3 proves $A_2(8,6;4) = 257$. The required computation times for the four phases are 42 087, 2 214, 1 804, and 2 168 hours, respectively, i.e., 48 273 hours in total.
Besides the internal parallelization performed by the ILP solvers, we employed parallelization only by setting up independent subproblems. We used the cluster of the University of Bayreuth for solving the subproblems and other computers for the management and generation of the subproblems.

Excluding hyperplane configurations
For all $A \in \mathcal{A}_{16} \cup \mathcal{A}_{17}$ we computed $z^{\mathrm{LP}}_8(\iota(A^\perp), \#A)$ and found that all but 33 elements in $\mathcal{A}_{16}$ (37 251 hours) and 5 elements in $\mathcal{A}_{17}$ (1021 hours) have an optimal value smaller than 256.9, i.e., we have implemented a safety threshold of $\varepsilon = 0.1$. These 38 elements are listed in Table 3 and their LP values are stated in Table 2.
For indices $1 \le i \le 38$ we computed $z^{\mathrm{BLP}}_7(\iota(F_i))$ and obtained 6 elements in $\mathcal{A}_{16}$ and 2 elements in $\mathcal{A}_{17}$ that may allow $z^{\mathrm{BLP}}_7(\iota(F_i)) \ge 256.9$, cf. Table 2 for details. This computation was aborted after 100 hours of wall time for each of these 38 subproblems.
$\operatorname{Var}_7(\iota(F_8))$ has exactly 948 planes which form 56 orbits (of type $4^3\, 8^{13}\, 16^{28}\, 32^{12}$) under the action of the automorphism group of order 32. We apply Lemma 3 to obtain 56 subproblems. Less than 15 hours were needed to verify $z^{\mathrm{BLP}}_7 \le 256$ in all cases.

Extending hyperplane configurations to 31-point-hyperplane configurations
The seven hyperplane configurations with indices $1 \le i \le 7$ remaining after Phase 1 are extended to 31-point-hyperplane configurations. We define a graph $G_i = (V_i, E_i)$, whose vertex set $V_i$ consists of all solids in  Table 1 for details about the running times and $\#V_i$; the computation time for the transversal was negligible). The transversal is denoted by $T(F_i)$; see Column 6 of Table 2 for the corresponding orbit lengths.
The clique computation for $G_5$ was aborted after 600 hours and then performed in parallel by applying Lemma 3 with $X$ as the vertex set of $G_5$, $\Gamma$ the automorphism group of $F_5$, and the function $f$ defined by $f(S) = 1$ iff $S$ is a clique in $G_5$. In general, we label the elements of $T$ in decreasing order of the corresponding orbit lengths, since large orbits admit small stabilizers and forbid many elements from $X$ in the subsequent subproblems, resulting in few rather asymmetric large subproblems and many small subproblems.
The 1258 vertices of $G_5$ are partitioned into 24 orbits of size 1 and 617 orbits of size 2 by $\Gamma$, leaving 641 graphs in which we have to enumerate all cliques of size $31 - \#F_5 - 1 = 14$. Since some of these graphs still consist of many vertices, we iteratively apply Lemma 3 with the identity group as $\Gamma$ at most two further times: after the first round we split the 68 subproblems that lead to graphs with at least 700 vertices. Then we split the 81 subproblems that lead to graphs with at least 600 vertices. We are left with 104 029 graphs, for which we have to enumerate all cliques of size 14, 13 or 12. All of these instances were solved in parallel with Cliquer to get a superset of the transversal of all cliques of size 15 of $G_5$. Applying the action of the automorphism group of order 2 then allowed us to obtain a transversal as well as all cliques, simply as the union of the orbits.
This took about 750 hours of CPU time, the smaller problems being preprocessed on a single computer and the remaining 55 420 larger subproblems being processed in parallel with 16 cores.
The extension of the configuration with index 5 took 750 hours, and the extension of the other indices took 1464 hours; see Table 1 for details.

Excluding 31-point-hyperplane configurations
For the 73 234 31-point-hyperplane configurations resulting from Section 6.2, we computed $z^{\mathrm{LP}}_8(.)$ in 953 hours. The maximum value aggregated by the contained hyperplane configuration with index $i$ is stated in Column 7 of Table 2, see also  Table 2) for these remaining $195 + 98 + 240$ cases in 851 hours (see Table 1).
The counts for value exactly 257 are $2 + 0 + 240$.
Note that since we are encoding matrices in reduced row echelon form, the three pivot columns are the first numbers 1, 2, and 4 appearing in this order and no digit is larger than 7. Table 2 lists for these CDCs whether they are in $\mathcal{A}_{16}$ or $\mathcal{A}_{17}$, the size of their automorphism group, the relaxations $z^{\mathrm{LP}}_8(.)$ and $z^{\mathrm{BLP}}_7(.)$, which are applied to the hyperplane configurations, then the orbits of the extension to point-hyperplane configurations of each hyperplane configuration, and finally the maximum of $z^{\mathrm{LP}}_8(.)$ with prescribed point-hyperplane configuration grouped by the contained hyperplane configuration and, if needed, the maximum $z^{\mathrm{LP}}_8(.)$, again for prescribed point-hyperplane configuration grouped by the contained hyperplane configuration. Details for the extension of one of the first seven hyperplane configurations to the corresponding point-hyperplane configurations are shown in Table 1.