PIR Codes with Short Block Length

In this work private information retrieval (PIR) codes are studied. In a $k$-PIR code, $s$ information bits are encoded in such a way that every information bit has $k$ mutually disjoint recovery sets. The main problem under this paradigm is to minimize the number of encoded bits given the values of $s$ and $k$, where this value is denoted by $P(s,k)$. The main focus of this work is to analyze $P(s,k)$ for a large range of parameters of $s$ and $k$. In particular, we improve upon several of the existing results on this value.


I. INTRODUCTION
A k-private information retrieval (k-PIR) code is a coding scheme which encodes by linear operation some s information bits to n encoded bits such that each information bit has k mutually disjoint recovery sets. The main figure of merit when studying PIR codes is the length n of the code, given the values of s and k. Thus, the value P(s, k) denotes the minimum value of n for which a length-n k-PIR code exists.
PIR codes are similar in their definition to locally repairable codes (LRC) with availability [21], [25], [28], however PIR codes do not impose any constraint on the size of the recovery sets as done for LRCs. In fact, these codes have more in common with one-step majority-logic decodable codes that were studied a while ago by Massey [20] and later by Lin and others [16] for applications of fast decoding. The main difference is that one-step majority-logic decodable codes require that each symbol (both information and redundancy) will have multiple recovery sets.
The rest of this paper is organized as follows. In Section II, we formally define the codes studied in the paper, list some of the known previous results which are relevant to our work, and discuss several preliminary results. In Section III, it is shown how to construct PIR codes by lengthening existing codes. Additionally, we give a geometric construction using s-dimensional simplex codes as a starting point. In Section IV we present a general linear programming formulation for PIR codes which provides lower bounds on the parameters of these codes and in many cases completely determines the value of P(s, k). Coding theoretic methods are applied in Section V in order to obtain a few more lower bounds and exact values. We fully solve the cases where s = 4 or s = 5 and present further lower and upper bounds in Section VI. There we also summarize the best known lower and upper bounds for small parameters in Table I.

A. Definitions
In this section we formally define the codes studied in this paper. A binary linear code of length $n$ and dimension $s$ will be denoted by $[n, s]$ or $[n, s, d]$, where $d$ denotes its minimum Hamming distance. The set $[n]$ denotes the set of integers $\{1, 2, \dots, n\}$. The binary field is denoted by $\mathbb{F}_2$.
In this work we focus on private information retrieval (PIR) codes that were defined recently in [10]. This family of codes requires to encode some s information bits into n encoded bits such that every information bit has k mutually disjoint recovery sets. Formally, these codes are defined as follows.
Definition 1. An $[n, s]$ binary linear code $C$ will be called a $k$-PIR code, and will be denoted by $[n, s, k]_P$, if for every information bit $e_i$, $i \in [s]$, there exist $k$ mutually disjoint sets $R_{i,1}, \dots, R_{i,k} \subseteq [n]$ such that $e_i$ is the sum of the bits in $R_{i,j}$ for every $j \in [k]$.
The main problem in studying PIR codes is to minimize the length $n$ given the values of $s$ and $k$. We denote by $P(s, k)$ the smallest $n$ such that there exists an $[n, s, k]_P$ code, and the optimal redundancy of $k$-PIR codes is defined by $r_P(s, k) = P(s, k) - s$. In case $k = 1$, the code $[s, s]$ which simply stores all the information symbols is an $[s, s, 1]_P$ PIR code, so that $P(s, 1) = s$. Similarly, the simple parity check code $[s + 1, s]$ is an $[s + 1, s, 2]_P$ PIR code, which implies that $P(s, 2) = s + 1$.
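To make Definition 1 concrete, the following sketch determines by exhaustive search the largest number of pairwise disjoint recovery sets for each information bit of a very short code. Columns of the generator matrix are stored as integer bitmasks. The helper names `recovery_sets`, `max_disjoint` and `pir_level` are illustrative and do not appear in the paper, and the search is only feasible for tiny lengths.

```python
from functools import reduce
from itertools import combinations

def recovery_sets(columns, target):
    """All index sets whose columns XOR to `target` (exhaustive, tiny n only)."""
    n = len(columns)
    found = []
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            if reduce(lambda a, b: a ^ b, (columns[h] for h in idx)) == target:
                found.append(frozenset(idx))
    return found

def max_disjoint(sets):
    """Largest number of pairwise disjoint sets, found by plain backtracking."""
    best = 0

    def extend(start, used, count):
        nonlocal best
        best = max(best, count)
        for pos in range(start, len(sets)):
            if not (sets[pos] & used):
                extend(pos + 1, used | sets[pos], count + 1)

    extend(0, frozenset(), 0)
    return best

def pir_level(columns, s):
    """Largest k such that the matrix with these columns is a k-PIR generator matrix."""
    return min(max_disjoint(recovery_sets(columns, 1 << i)) for i in range(s))

s = 3
identity = [1 << i for i in range(s)]      # the [s, s] code
parity = identity + [(1 << s) - 1]         # the [s + 1, s] parity check code
print(pir_level(identity, s), pir_level(parity, s))
```

For the two codes mentioned above the search reproduces the values $k = 1$ and $k = 2$.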

B. Previous Work
In [10], it was shown that for any fixed $k \ge 3$ it is possible to construct $[n, s, k]_P$ codes where $n = s + O(\sqrt{n})$, so $r_P(s, k) = O(\sqrt{s})$ for any fixed $k \ge 3$, and in [24], [29] it was proved that $r_P(s, 3) = \Omega(\sqrt{s})$. Since $r_P(n, k) \ge r_P(n, 3)$ for any fixed $k \ge 3$, these results also assure that $r_P(n, k) = \Theta(\sqrt{n})$ for any fixed $k \ge 3$. There are several known results and constructions of PIR codes; see e.g. [11], [17], [27]. We summarize here the most relevant known results for our problem:
1) $P(s_1 + s_2, k) \le P(s_1, k) + P(s_2, k)$.
Despite significant progress on determining the exact value of $P(s, k)$, this problem is far from being solved. The goal of this paper is to build upon previous work and develop new tools which are specifically targeted towards deriving upper and lower bounds on $P(s, k)$ in order to settle many cases which still remained open. For example, if we apply Theorem 4 to a 9-dimensional 10-PIR code of length 28, we can conclude $P(10, 10) \le 33$, which improves the best known construction. By solving an instance of the general lengthening problem this can even be improved to $P(10, 10) \le 31$, see the discussion in Section II-C.
Footnote 1: We remark that with respect to lower bounds on $N(s, k)$ it also makes sense to check the entries at http://mint.sbg.ac.at, which sometimes contain improvements.
Footnote 2: More precisely, it is a strict improvement for $s = 4$ and all $s \ge 7$. An exact formula for $N(s, 3)$, and so also for $N(s, 4)$, exists. It is attained by the Hamming codes and puncturings thereof. The lower bound follows from the Hamming or sphere packing bound.

C. Preliminaries
Note that an $[n, s]$ code is a $k$-PIR code if it admits a generator matrix $G \in \mathbb{F}_2^{s \times n}$ such that for each $1 \le i \le s$ there exist disjoint sets $R^i_1, \dots, R^i_k \subseteq [n]$ such that $\sum_{h \in R^i_j} G_h = e_i$ for all $1 \le j \le k$, where $G_h$ denotes the $h$th column of $G$ and $e_i$ denotes the $i$th unit vector. The interpretation is that $e_i$ can be recovered by the $k$ disjoint sets $R^i_j$, which are therefore also called recovery sets. The set of all recovery sets for $e_i$ is denoted by $\mathcal{R}^i$, i.e., $\mathcal{R}^i = \{R^i_j \mid 1 \le j \le k\}$. We call a recovery set $R$ for $e_i$ minimal if no proper subset $R' \subsetneq R$ is a recovery set for $e_i$.
Example 1. An example for a generator matrix attaining $P(4, 4) = 9$ is given by [matrix missing in source]. For $e_4$ we can use the recovery sets [list missing in source]. Note that there is also a different list of recovery sets: [list missing in source]. The latter might have the advantage that it only uses recovery sets of cardinality at most 3. As a different notation for recovery sets we also use the columns directly (instead of their labels). In our last example $\mathcal{R}^4$ then reads [list missing in source]. For minimal recovery sets these are indeed sets of non-zero vectors in $\mathbb{F}_2^s$, and multisets in general. In the latter case points can have a multiplicity larger than 1. ✷
Each binary linear $[n, s]$ code $C$ of effective length $n$ is in bijection to a multiset $\mathcal{P}$ of points, i.e., 1-dimensional subspaces of $\mathbb{F}_2^s$, of cardinality $n$. Starting from a generator matrix $G$ we can obtain a multiset of points by choosing the point $\langle G_h \rangle$ for every column $G_h$ of $G$. In the other direction we can choose an arbitrary generator for each point, i.e., a column vector, and build up a generator matrix with those column vectors. This geometrical point of view gives an easy way to construct PIR codes. For example, by taking the set of all $2^s - 1$ points in $\mathbb{F}_2^s$, we get the so-called $s$-dimensional simplex code, which yields the known result $P(s, 2^{s-1}) \le 2^s - 1$ for all $s \ge 1$ (Footnote 3). We will use this geometric formulation when deriving several of our results, see especially Proposition 7.
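The simplex code bound $P(s, 2^{s-1}) \le 2^s - 1$ can be checked directly. The sketch below assumes the natural choice of recovery sets for $e_i$, namely the singleton $\{e_i\}$ together with all pairs $\{a, a + e_i\}$. Whether this is exactly the list given in the paper's footnote is an assumption, and the function name is illustrative. Points are the non-zero vectors of $\mathbb{F}_2^s$ encoded as integers.

```python
def simplex_recovery_sets(s, i):
    """Recovery sets for the unit vector e_i in the s-dimensional simplex code."""
    e_i = 1 << i
    sets = [{e_i}]                        # the unit vector recovers itself
    for a in range(1, 1 << s):
        b = a ^ e_i                       # partner point a + e_i
        if a != e_i and a < b:            # skip e_i itself, list each pair once
            sets.append({a, b})
    return sets

def check_simplex(s):
    """True if every e_i has 2^(s-1) disjoint recovery sets covering all points."""
    for i in range(s):
        sets = simplex_recovery_sets(s, i)
        used = [p for r in sets for p in r]
        if len(sets) != 2 ** (s - 1) or len(used) != len(set(used)):
            return False
        if set(used) != set(range(1, 1 << s)):
            return False
        for r in sets:
            acc = 0
            for p in r:
                acc ^= p
            if acc != 1 << i:
                return False
    return True

print(all(check_simplex(s) for s in range(1, 7)))
```

Every point is used exactly once per information bit, so no shorter list of columns can support these particular recovery sets.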
For a lower bound let $H$ be an arbitrary hyperplane of $\mathbb{F}_2^s$. Since $\dim(H) = s - 1$ there exists at least one index $1 \le i \le s$ such that $e_i$ is not contained in $H$. Thus, there have to be at least $k$ points outside of $H$ to form the recovery sets:
Lemma 5. (cf. [26, Theorem 2], [19, Lemma 2]) Let $\mathcal{P}$ be the multiset of points corresponding to an $s$-dimensional $k$-PIR code. For every hyperplane $H$ of $\mathbb{F}_2^s$ we have
$$\#(\mathcal{P} \setminus H) \ge k, \tag{3}$$
counting points with their respective multiplicity.
Summing up Inequality (3) over the $2^s - 1$ hyperplanes gives $P(s, k) \ge \left\lceil \frac{2^s - 1}{2^{s-1}} \cdot k \right\rceil$ [10, Theorem 16], since each point of $\mathbb{F}_2^s$ is contained in exactly $2^{s-1} - 1$ hyperplanes. While this gives $P(s, 2^{s-1}) = 2^s - 1$, it can be improved easily. Since each hyperplane $H$ of $\mathbb{F}_2^s$ corresponds to a codeword $c_H$ of the code whose weight equals the number of points outside of $H$, we have the well-known fact that each $k$-PIR code has a minimum Hamming distance of at least $k$. Thus, we have
$$P(s, k) \ge N(s, k), \tag{4}$$
where $N(s, k)$ denotes the smallest integer $n$ such that an $[n, s, k]$ code exists. In [14] the so-called Griesmer bound
$$N(s, k) \ge G(s, k) := \sum_{i=0}^{s-1} \left\lceil \frac{k}{2^i} \right\rceil$$
was proven. Interestingly enough, for every fixed integer $s$ we have $N(s, k) = G(s, k)$ if $k$ is sufficiently large [2], i.e., for every fixed integer $s$ the determination of the function $N(s, k)$ is a finite problem. Those functions are explicitly known for all $s \le 8$ [4]. Much research has been devoted to the determination of $N(s, k)$ for specific parameters $s$ and $k$. For the currently best known lower and upper bounds on $N(s, k)$ we refer to the online tables [13] (Footnote 4). The values of $N(s, k)$ are a good benchmark for constructions of $s$-dimensional binary $k$-PIR codes, i.e., constructive upper bounds for $P(s, k)$. If $N(s, k)$ is met, then the corresponding PIR code is obviously optimal. It is quite hard to get better lower bounds than $P(s, k) \ge N(s, k)$. One parametric improvement has been stated in the literature so far:
$$P(s, 3) \ge s + \left\lceil \sqrt{2s + \tfrac{1}{4}} + \tfrac{1}{2} \right\rceil,$$
see [24, Theorem 3, Equation 10]. If we combine it with the puncturing constraint $P(s, k) \ge P(s, k - 1) + 1$, then we obtain
$$P(s, k) \ge s + \left\lceil \sqrt{2s + \tfrac{1}{4}} + \tfrac{1}{2} \right\rceil + k - 3 \tag{5}$$
for $k \ge 3$. We remark that for $k = 3$ or $k = 4$ this inequality is always at least as good as the coding theoretic lower bound $P(s, k) \ge N(s, k)$, and it is indeed a strict improvement for larger values of $s$ (Footnote 5). For systematic PIR codes, see the explanation below, the same bound was also proved in [29] and [27].
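The closed-form bounds discussed in this paragraph are easy to evaluate. The following sketch computes the bound obtained by summing the hyperplane inequality, the Griesmer bound $\sum_{i=0}^{s-1} \lceil k/2^i \rceil$ (which only bounds $N(s, k)$ from below), and the bound $s + \lceil \sqrt{2s + 1/4} + 1/2 \rceil$ for $k = 3$. The function names are illustrative, and the last function uses the equivalent integer condition $r(r-1)/2 \ge s$ to avoid floating point rounding.

```python
from math import isqrt

def hyperplane_sum_bound(s, k):
    """ceil(k * (2^s - 1) / 2^(s-1)), from summing the hyperplane inequality."""
    return -(-k * (2 ** s - 1) // 2 ** (s - 1))

def griesmer(s, k):
    """Griesmer lower bound on the length of a binary [n, s, k] code."""
    return sum(-(-k // 2 ** i) for i in range(s))

def three_pir_bound(s):
    """s + ceil(sqrt(2s + 1/4) + 1/2), i.e. s plus the least r with r(r-1)/2 >= s."""
    r = (1 + isqrt(8 * s + 1)) // 2       # never exceeds the least valid r
    while r * (r - 1) // 2 < s:
        r += 1
    return s + r

for s in (4, 7, 92):
    print(s, griesmer(s, 3), three_pir_bound(s))
```

For $s = 4$ the parametric bound gives 8 while the Griesmer bound for minimum distance 3 gives 7, in line with the remark that the parametric bound is a strict improvement for $s = 4$.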
We call a generator matrix $G$ of a linear code systematic if it contains a unit matrix. While every linear code admits a systematic generator matrix, it is not clear whether there always exists a systematic PIR code matching $P(s, k)$, see Question 4 in [26, Sec. 10]. Here we give an example showing that this is not the case. More specifically, we will show $P(6, 8) = 19$ in Section IV, while every systematic PIR code with these parameters has length at least 20, see Proposition 9. An optimal non-systematic generator matrix (of length 19) is given by [matrix missing in source] and an optimal systematic generator matrix (of length 20) is given by [matrix missing in source].
We call a code projective if all columns of an arbitrary generator matrix are pairwise linearly independent, i.e., no column is a multiple of another one. Note that the first of these two generator matrices (and so the code) is projective, while the second is not, since its last four columns correspond to two points with multiplicity 2 each.
According to this observation we notice that while the objects under consideration are called PIR codes, in fact their properties actually can depend on their generator matrix. This is different for codes with disjoint repair groups as e.g. studied in [29]. The code property of disjoint repair groups is more demanding than that of PIR codes (depending on the generator matrix). For systematic generator matrices both notions are the same, so that the lower bound from [29] only works (directly) for systematic generator matrices.
Let $\mathcal{P}$ be a multiset of points in $\mathbb{F}_2^s$ and let $n$ denote its cardinality $\#\mathcal{P}$. By $h_i$ we denote the number of hyperplanes of $\mathbb{F}_2^s$ that contain exactly $i$ points. Counting incidences $(H)$, $(H, P)$, and $(H, P, P')$, where the $H$ are hyperplanes and the $P, P'$ are different points in $\mathcal{P}$, gives the so-called standard equations
$$\sum_i h_i = 2^s - 1, \tag{6}$$
$$\sum_i i\,h_i = n\,(2^{s-1} - 1), \tag{7}$$
$$\sum_i i(i-1)\,h_i = n(n-1)\,(2^{s-2} - 1) + y_2 \cdot 2^{s-2}, \tag{8}$$
where $y_2$ denotes the number of ordered pairs of different elements of $\mathcal{P}$ that correspond to the same geometric point. We remark that the standard equations are a geometric variant of the first three MacWilliams identities. We will use them in the proof of Lemma 12.
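The standard equations can be verified numerically. The sketch below assumes they take the form $\sum_i h_i = 2^s - 1$, $\sum_i i h_i = n(2^{s-1} - 1)$ and $\sum_i i(i-1) h_i = n(n-1)(2^{s-2} - 1) + y_2 2^{s-2}$ for $s \ge 2$, which is what the three incidence counts give. Hyperplanes are represented by their non-zero normal vectors, and all names are illustrative.

```python
import random
from collections import Counter

def hyperplane_spectrum(points, s):
    """h[i] = number of hyperplanes containing exactly i points (with multiplicity)."""
    h = Counter()
    for v in range(1, 1 << s):             # hyperplane {x : <v, x> = 0}
        inside = sum(1 for p in points if bin(p & v).count("1") % 2 == 0)
        h[inside] += 1
    return h

def check_standard_equations(points, s):
    """Check the three standard equations for a multiset of points, s >= 2."""
    n = len(points)
    mult = Counter(points)
    y2 = sum(m * (m - 1) for m in mult.values())   # ordered pairs on one point
    h = hyperplane_spectrum(points, s)
    eq6 = sum(h.values()) == 2 ** s - 1
    eq7 = sum(i * c for i, c in h.items()) == n * (2 ** (s - 1) - 1)
    eq8 = (sum(i * (i - 1) * c for i, c in h.items())
           == n * (n - 1) * (2 ** (s - 2) - 1) + y2 * 2 ** (s - 2))
    return eq6 and eq7 and eq8

random.seed(0)
s = 4
points = [random.randrange(1, 1 << s) for _ in range(12)]
print(check_standard_equations(points, s))
```

Since the equations hold for every multiset, the random sample here only serves as a sanity check of the counting argument.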

III. BASIC CONSTRUCTIONS OF PIR CODES
Next we consider how to lengthen a $k$-PIR code in order to increase the number of information bits it stores while preserving its property as a $k$-PIR code. To this end, we add columns and a row to a generator matrix, where we consider the special case in which all new columns are unit vectors with the one entry in the last row.
Proposition 6. Let $G$ be a generator matrix of an $s$-dimensional linear code of length $n$. If $G$ can be lengthened by one row and $t$ columns $e_{s+1}$ to a generator matrix $G'$ of an [remainder of the hypothesis and items (1), (2) missing in source] for all $1 \le i \le s$ there exist recovery sets $\mathcal{R}^i$ of $e_i$ in $G$ such that for $G'$ the recovery sets each yield either $e_i$ or $e_i + e_{s+1}$, and the latter case occurs at most $t$ times, (4) the recovery sets of $e_{s+1}$ in $G'$, without those that only consist of a unit vector $e_{s+1}$, have the property that for $G$ they sum to zero and there are at least $k - t$ of them.
Proof: The length of $G'$ directly follows from the construction, so that (1) holds. Let $\mathcal{R}'^i$ be the recovery sets of $e_i$ in $G'$ for $1 \le i \le s$. Consider one of these recovery sets, which sums to $e_i$ over $G'$. W.l.o.g. we assume that the recovery sets are reduced so that they contain $e_{s+1}$ at most once. If $e_{s+1}$ is not contained in the recovery set, then it is also a recovery set for $e_i$ in $G$. If $e_{s+1}$ is contained in the recovery set, then we can remove $e_{s+1}$ and obtain a recovery set of $e_i$ in $G$ that sums to $e_i + e_{s+1}$ in $G'$. Of course the latter case can occur at most $t$ times since $e_{s+1}$ is contained exactly $t$ times as a column in $G'$. This gives (3) and (2). Consider the set $\mathcal{R}'^{s+1}$ of the $k$ recovery sets of $e_{s+1}$ in $G'$. At least $k - t$ of them are not given by the singleton $\{e_{s+1}\}$. Since the corresponding columns in $G'$ sum to $e_{s+1}$, they sum to zero in $G$.
The insights of Proposition 6 can be turned into the following algorithm to generate (s + 1)-dimensional k-PIR codes of length n + t from s-dimensional k-PIR codes of length n.
(1) Compute a list $C_1$ of recovery sets $(\mathcal{R}^i)_{1 \le i \le s}$ of the unit vectors in $G$ such that $|\mathcal{R}^i| = k$.
(2) Compute a list $C_2$ of sets $\mathcal{Z}$ of disjoint sets whose corresponding columns in $G$ sum to zero, where $|\mathcal{Z}| \ge k - t$.
(3) [beginning of this step and the displayed matrix missing in source] is the generator matrix of an $(s + 1)$-dimensional $k$-PIR code of length $n + t$.
We remark that recovery sets as well as dual codewords might be found by looping over all binary vectors of length $n$. Collections of disjoint recovery sets $\mathcal{R}^i$ or of disjoint zero sets $\mathcal{Z}$ might be found by clique search. Of course a promising heuristic is to check only binary vectors of rather small weight.
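The lengthening step can be illustrated on a toy instance. The sketch below appends an extension row $r$ and $t$ copies of $e_{s+1}$ to a generator matrix given by its columns (as integer bitmasks) and then tests the PIR property by brute force over recovery sets of bounded size. It is not the clique-based search used in the paper; the exhaustive search and all function names are assumptions made for illustration.

```python
from itertools import combinations

def lengthen(columns, s, row, t):
    """Append `row` as new last row of G and add t columns equal to e_{s+1}."""
    extended = [c | (bit << s) for c, bit in zip(columns, row)]
    return extended + [1 << s] * t

def xor_sets(columns, target, max_size):
    """Index sets of size at most max_size whose columns XOR to `target`."""
    result = []
    for size in range(1, max_size + 1):
        for idx in combinations(range(len(columns)), size):
            acc = 0
            for h in idx:
                acc ^= columns[h]
            if acc == target:
                result.append(frozenset(idx))
    return result

def max_disjoint(sets, used=frozenset(), start=0):
    """Size of a largest pairwise disjoint subfamily (plain backtracking)."""
    best = 0
    for pos in range(start, len(sets)):
        if not (sets[pos] & used):
            best = max(best, 1 + max_disjoint(sets, used | sets[pos], pos + 1))
    return best

def pir_level(columns, dim, max_size):
    """Largest k certified with recovery sets of size at most max_size."""
    return min(max_disjoint(xor_sets(columns, 1 << i, max_size)) for i in range(dim))

G = [0b01, 0b10, 0b11]                  # 2-dimensional simplex code, a 2-PIR code
good = lengthen(G, 2, [0, 0, 1], 1)     # extension row r = 001, t = 1
bad = lengthen(G, 2, [0, 0, 0], 1)      # extension row r = 000, t = 1
print(pir_level(good, 3, 3), pir_level(bad, 3, 3))
```

With $r = 001$ the three original columns sum to zero in $G$ and to $e_3$ in $G'$, which supplies the second recovery set for $e_3$. With $r = 000$ only the singleton $\{e_3\}$ is available, so the 2-PIR property is lost.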
Example 2. In this example, we consider the 9-dimensional 10-PIR code of length $P(9, 10) = 28$ with generator matrix [matrix missing in source] and try $t = 3$. It comes with a collection of recovery sets with cardinality distribution $1^1 3^9$ for each $1 \le i \le s$, which we took as our single candidate in $C_1$. For $C_2$ we built up a graph with 524 287 nodes and determined a clique of maximum cardinality 7. After less than 2 minutes of computation time we found the first extension $r = 0000000101001000100110010000$. For $e_{10}$ the recovery sets in label notation are given by [beginning of the list missing in source] 17, 18, 27}. After less than six minutes we found 9 different extensions certifying $P(10, 10) \le 31$ in total. So, in our example we have chosen both $C_1$ and $C_2$ of cardinality 1. ✷
In general, if we apply the above algorithm as a heuristic and not for exhaustive enumeration, we do not need to find all possibilities. As mentioned above, the same applies to the possibilities for the extension row $r$. This rough idea leaves a lot of space for algorithmic implementations.
It was stated in [17] that the coding theoretic lower bound is tighter if, given the dimension $s$, the number $k$ of disjoint recovery sets is relatively small. Their formulation might be interpreted as the claim that for $k = 6$ (or equivalently $k = 5$) the coding theoretic bound is always at least as tight as Inequality (5) (Footnote 6). However, this is not the case. An example is given by $s = 92$ and $k = 5$, where $N(92, 5) = 106$ (Footnote 7) but $P(92, 5) \ge 107$ due to Inequality (5). Also for $k > 6$ there are such examples; however, they require rather large values of $s$. So, the situation should be as follows: If $k > 4$ and the dimension $s$ is not too big with respect to $k$, then the coding theoretic lower bound is superior. If the dimension gets huge, then Inequality (5) is tighter. For $k = 3$ or $k = 4$, see Footnote 5.
Using even more geometric terms, we can formulate a parametric construction. The rough idea is as follows. We start with an $s$-dimensional simplex code, represented by a set of points $\mathcal{P}$, and the recovery sets stated in Footnote 3. A line in $\mathbb{F}_2^s$ is a set of three collinear points, i.e., three non-zero vectors $a, b, c$ in $\mathbb{F}_2^s$ with $a + b + c = 0$. We iteratively remove the three points of a line from $\mathcal{P}$ and modify the lists $\mathcal{R}^i$ of recovery sets accordingly. A partial line spread in $\mathbb{F}_2^s$ is a set of lines that pairwise do not have a non-zero vector in common. The maximum possible cardinality of a partial line spread is $\frac{2^s - 1}{3}$ if $s$ is even and $\frac{2^s - 5}{3}$ if $s$ is odd, see e.g. [3].
Proposition 7. For every integer $s \ge 3$ and every integer $0 \le \lambda \le \frac{2^{s-1} - 1}{3}$ if $s$ is odd, respectively $0 \le \lambda \le \frac{2^{s-1} - 5}{3}$ if $s$ is even, we have $P(s, 2^{s-1} - 2\lambda) \le 2^s - 1 - 3\lambda$.
Proof: Let $\mathcal{P}$ be the set of $2^s - 1$ points in $\mathbb{F}_2^s$ and $\mathcal{R}^i$ the corresponding lists of recovery sets for $1 \le i \le s$ of the $s$-dimensional simplex code, see Footnote 3. We will remove $3\lambda$ points from $\mathcal{P}$ and modify the $\mathcal{R}^i$ accordingly. To this end, let $H$ be a hyperplane of $\mathbb{F}_2^s$ that does not contain any unit vector and let $\mathcal{L}$ be a partial line spread of $H$ of cardinality $\frac{2^{s-1} - 1}{3}$ if $s$ is odd and $\frac{2^{s-1} - 5}{3}$ if $s$ is even. Note that the upper bound for $\lambda$ equals this cardinality. For an arbitrary subset of $\mathcal{L}$ of cardinality $\lambda$ we remove the corresponding $3\lambda$ points from $\mathcal{P}$ to get the set of points of our final PIR code. We also have to adjust the lists $\mathcal{R}^i$. Since $H$ does not contain any unit vector, all elements of the $\mathcal{R}^i$ that meet $H$ have cardinality two in the beginning and no recovery set of cardinality two is completely contained in a line from $\mathcal{L}$. So, consider one line that is removed and assume that its points are given by $\{a, b, c\}$. For a fixed but arbitrary $1 \le i \le s$ the recovery sets for $e_i$ that contain either $a$, $b$, or $c$ are given by $\{a, a + e_i\}$, $\{b, b + e_i\}$, and $\{c, c + e_i\}$. Those three recovery sets are destroyed by our operation of removing $\{a, b, c\}$, but we can additionally add the recovery set $\{a + e_i, b + e_i, c + e_i\}$, noting that $(a + e_i) + (b + e_i) + (c + e_i) = (a + b + c) + (e_i + e_i + e_i) = 0 + e_i = e_i$. Each removed line thus costs three points and a net loss of two recovery sets.
It remains to check that no point $x$ is used in two recovery sets of cardinality three for the same $e_i$ and that we do not remove a point $y$ that is contained in a constructed recovery set of cardinality three. If $x$ is contained in two recovery sets of cardinality three for $e_i$, then $x + e_i$ is contained in two lines of $\mathcal{L}$, which contradicts the disjointness. If $y$ is removed, then $y \in H$. If, additionally, $y$ is contained in a recovery set of cardinality three for $e_i$, then $y + e_i$ is removed, so that $y + e_i \in H$. Thus, $e_i \in H$, which is a contradiction.
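The construction in this proof can be replayed by machine for small parameters. The sketch below assumes that the proposition asserts $P(s, 2^{s-1} - 2\lambda) \le 2^s - 1 - 3\lambda$ and takes for $H$ the hyperplane of even-weight vectors, which contains no unit vector. The disjoint lines are chosen greedily, which need not reach the maximum partial spread size, and all names are illustrative.

```python
from itertools import combinations

def proposition7_code(s, lam):
    """Remove lam pairwise disjoint lines of the even-weight hyperplane from the simplex code."""
    even = [p for p in range(1, 1 << s) if bin(p).count("1") % 2 == 0]
    lines, covered = [], set()
    for a, b in combinations(even, 2):     # greedy partial line spread in H
        line = {a, b, a ^ b}
        if len(lines) < lam and not (line & covered):
            lines.append(line)
            covered |= line
    assert len(lines) == lam, "greedy search found too few disjoint lines"
    points = set(range(1, 1 << s)) - covered
    return points, lines

def recovery_sets(s, i, points, lines):
    """Surviving singleton and pairs for e_i, plus one triple per removed line."""
    e = 1 << i
    sets = [{e}]
    for a in points:
        if a != e and a < a ^ e and (a ^ e) in points:
            sets.append({a, a ^ e})
    sets += [{p ^ e for p in line} for line in lines]
    return sets

def verify(s, lam):
    """Check length, number, disjointness and correctness of the recovery sets."""
    points, lines = proposition7_code(s, lam)
    for i in range(s):
        sets = recovery_sets(s, i, points, lines)
        used = [p for r in sets for p in r]
        if len(sets) != 2 ** (s - 1) - 2 * lam:
            return False
        if len(used) != len(set(used)) or not set(used) <= points:
            return False
        for r in sets:
            acc = 0
            for p in r:
                acc ^= p
            if acc != 1 << i:
                return False
    return len(points) == 2 ** s - 1 - 3 * lam

print(verify(4, 1), verify(5, 2))
```

For $s = 4$ and $\lambda = 1$ this yields a 6-PIR code of length 12.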
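Two disjoint lines in $H$ are e.g. given by [display missing in source].
[The statement belonging to the following proof is missing in source.]
Proof: Since the result is trivial for $s = 2$ we assume $s \ge 3$ and consider the lower bound [display missing in source]. This lower bound is attained by the construction from Proposition 7.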

IV. INTEGER LINEAR PROGRAMMING FORMULATIONS
In this section we present an integer linear programming (ILP) formulation for the exact determination of $P(s, k)$. Given the dimension $s$, we set $X = \mathbb{F}_2^s \setminus \{0\}$. The generator matrix is sufficiently characterized if we know for each element $j \in X$ the integer-valued multiplicity $x_j$ of that column. By $Y^i$ we denote the set of all minimal recovery sets for $e_i$. For each $j \in Y^i$ we denote by the integer $y^i_j$ the number of times recovery set $j$ for $e_i$ is used. With this, the value of $P(s, k)$ is given by an ILP [objective and constraints (9) to (12) missing in source] that includes the integrality condition
$$x_j \in \mathbb{N} \quad \forall j \in X. \tag{13}$$
Inequality (10) guarantees that the column multiplicities are sufficiently large for the chosen recovery sets. Inequality (12) ensures that there are at least $k$ disjoint recovery sets for each $e_i$. Inequality (11) implements Lemma 5. In principle, those inequalities are not necessary, but in practice they usually speed up the solution process. Additional lower and upper bounds on $x_j$ can be deduced from Proposition 18 and Lemma 20, respectively. (Tighter bounds can be obtained from more sophisticated coding theoretic arguments, see Section V.) The problem with this ILP formulation is that it quickly gets too large to be solved exactly. So, it is applicable for rather small parameters $s$ and $k$ only, where the size of $k$ plays almost no role.
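One way to write down an ILP that matches the verbal description of Inequalities (10), (11) and (12) is sketched next. This is a reconstruction from the surrounding text under the assumption that the objective is the total number of columns. The exact form and numbering in the original may differ.

```latex
\begin{align*}
\min \quad & \sum_{j \in X} x_j \\
\text{s.t.} \quad
 & \sum_{r \in Y^i \,:\, j \in r} y^i_r \le x_j
   && \forall j \in X,\ 1 \le i \le s
   && \text{(multiplicities suffice, cf. (10))} \\
 & \sum_{j \in X \setminus H} x_j \ge k
   && \forall \text{ hyperplanes } H
   && \text{(Lemma 5, cf. (11))} \\
 & \sum_{r \in Y^i} y^i_r \ge k
   && \forall\, 1 \le i \le s
   && \text{($k$ recovery sets, cf. (12))} \\
 & x_j \in \mathbb{N}
   && \forall j \in X \\
 & y^i_r \in \mathbb{N}
   && \forall r \in Y^i,\ 1 \le i \le s
\end{align*}
```

In this reading, disjointness of the recovery sets chosen for a fixed $e_i$ is enforced by the first family of constraints, since no column can be used more often than its multiplicity.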
By imposing further restrictions we can use the above ILP formulation as a heuristic to find good codes that eventually attain the known lower bounds or improve known constructions from the literature. The restriction to systematic PIR codes can be enforced by $x_{e_i} \ge 1$ for all $1 \le i \le s$ and the restriction to projective PIR codes can be implemented by $x_j \in \{0, 1\}$ for all $j \in X$. If the dimension $s$ is not too large, then $\#X$, i.e., the number of $x$ variables, is still manageable. In order to prevent the combinatorial explosion of $\#Y^i$ we can restrict ourselves to recovery sets of cardinality at most $\lambda$ by modifying the definition of $Y^i$ accordingly. In our numerical experiments we mostly chose $\lambda = 3$ (and $\lambda = 4$ in some very small cases). The intuition behind this heuristic is as follows. We know that the simplex code is optimal and uses recovery sets of cardinalities 1 and 2 only. For large values of $k$ the coding theoretic lower bound is usually not too far away from the fractional simplex code, i.e., the mentioned lower bound $P(s, k) \ge N(s, k)$. So, for a good PIR code not too many recovery sets of cardinality larger than 2 can occur. There is some hope that recovery sets of cardinality at most 3 are sufficient provided $k$ is large enough. The ILP constructions in Table I support this hope. For smaller values of $k$ it seems that larger recovery sets are necessary.
Another way to decrease the computational complexity is to prescribe a subgroup of the final automorphism group of the code. In our context automorphisms are permutations of the columns and rows of the generator matrix, cf. [12] where transitive automorphism groups are considered. Counting point multiplicities for the columns, as above, this leaves row permutations only. So, for some subgroup $H \le S_s$ of the symmetric group on $s$ elements we can require $x_j = x_{\pi j}$ and $y^i_j = y^i_{\pi j}$ for every $\pi \in H$, $1 \le i \le s$, and $j \in X$ or $j \in Y^i$, respectively. This reduces the number of variables as well as the number of constraints, since several of them become identical. If the corresponding substitutions and removals of identical constraints are performed directly, this is also known as the Kramer-Mesner approach. As an example we state that prescribing a cyclic $\mathbb{Z}_6$ allowed us to construct an example for $P(6, 16) \le 36$, improving the previous bound $P(6, 16) \le 39$ [27], and prescribing a cyclic $\mathbb{Z}_3$ (Footnote 8) allowed us to construct an example for $P(7, 16) \le 39$, improving the previous bound $P(7, 16) \le 43$ [27].
With respect to lower bounds, we remark that it is possible to modify the initial ILP formulation by restricting the possible sizes of recovery sets to at most $\lambda$ while still obtaining a lower bound that is valid without this assumption. Given a fixed value of $\lambda$ we cannot require Inequality (12) any more, since we ignore recovery sets that have a cardinality strictly larger than $\lambda$. If we set $n = \sum_{j \in X} x_j$, then we can replace Inequality (12) with the following relaxation:
$$\sum_{j \in Y^i} \#j \cdot y^i_j + (\lambda + 1)\Big(k - \sum_{j \in Y^i} y^i_j\Big) \le n \quad \forall\, 1 \le i \le s, \tag{15}$$
where $\#j$ denotes the size of the recovery set $j$. The idea is simple: $k - \sum_{j \in Y^i} y^i_j$ recovery sets for $e_i$ have to be of cardinality at least $\lambda + 1$, and the total number of columns used by all recovery sets for $e_i$ cannot be larger than $n$. So, choosing recovery sets of large size has no consequences for the $x_j$ directly but imposes lower bounds on $n$, which is a relaxation of the original inequality. In the other direction, Inequality (11) can be tightened: [Inequality (16) missing in source] where $\#(j \cap \mathbb{F}_2^s \setminus H)$ denotes the number of elements of the recovery set $j \in Y^i$ that are not contained in $H$. The argument for the hyperplane conditions of Inequality (11), see Lemma 5, was that vectors in $H$ cannot build a recovery set for $e_i$ on their own for at least one $1 \le i \le s$, so that at least one column outside $H$ is needed for each recovery set. Now, if we fix $i$ and we know that some recovery sets use more than one column outside of $H$, then the total number of columns outside of $H$ increases, which gives Inequality (16).
We remark that for $\lambda \le 2$ Inequality (16) is the same as Inequality (11). Assume the contrary and suppose that for a given hyperplane $H$ of $\mathbb{F}_2^s$ and a given unit vector $e_i$ both points of the two-element recovery set $\{a, b\}$ for $e_i$ are not contained in $H$. Since the line $\{a, b, a + b = e_i\}$ intersects $H$ in a point, we have $e_i \in H$, which is a contradiction.
Here we applied symmetry breaking techniques and additional inequalities. More concretely, we started by adding $x_{e_1} \ge x_{e_2} \ge \dots \ge x_{e_6}$, $n \le 35$ and maximizing $x_{e_1}$. After an upper bound of one was verified we stopped and concluded the additional inequalities $x_{e_i} \le 1$ for all $1 \le i \le 6$. Similarly, we tried $x_{\mathbf{1} - e_1} \ge x_{\mathbf{1} - e_2} \ge \dots \ge x_{\mathbf{1} - e_6}$, $n \le 35$ and maximized $x_{\mathbf{1} - e_1}$. After 14 028 seconds and 4 129 360 branch&bound (B&B) nodes an upper bound of 1 was verified, so that $x_{\mathbf{1} - e_i} \le 1$ are valid inequalities for all $1 \le i \le 6$ (assuming $n \le 35$). With these additional inequalities we started a last round of symmetry breaking: We introduced the integer variables $s_i$ counting the sum of those $x_p$ where point $p$ has a one in the $i$th coordinate. By symmetry we can assume $s_1 \ge s_2 \ge \dots \ge s_6$. Maximizing $s_1$ with the additional assumption $n \le 35$ was solved after 102 967 seconds and 24 767 382 B&B nodes to be infeasible, so that $P(6, 16) = 36$. We applied the same technique to computationally verify $P(6, 14) \ge 32$ in 21 566 seconds and 12 701 361 B&B nodes.
Footnote 8: Note that for $s = 7$ there are two possible cycle types of the cyclic group $\mathbb{Z}_3$ in $S_7$ up to conjugation.
Proof: We apply the lower bound ILP with λ = 3 while additionally prescribing the use of the vectors from an s-dimensional unit matrix. After less than 900 seconds and 125603 B&B nodes a solution with n = 20 was proven to be optimal.

V. LOWER BOUNDS AND THE DUAL MINIMUM DISTANCE/DUAL CODE(S)
In this section we want to use coding theoretic methods to provide lower bounds for $P(s, k)$. To this end let $C$ be an $[n, s]$ code. The corresponding dual code $C^\perp$ is the $[n, n - s]$ code whose codewords are those that are perpendicular to all codewords in $C$. By $d^\perp$ we denote the minimum distance of $C^\perp$, which is also called the dual minimum distance.
Lemma 10. Let $C$ be a linear $[n, s, d]$ code with minimum dual distance $d^\perp$ and generator matrix $G$.
(a) If $R$ and $R'$ are two different recovery sets for the same symbol $i$ in $G$, then $|R| + |R'| \ge d^\perp$.
(b) If $G$ is a $k$-PIR generator matrix with $k \ge 2$, then $n \ge k\lceil d^\perp/2 \rceil - 1$ if $d^\perp$ is odd and $n \ge k\lceil d^\perp/2 \rceil$ if $d^\perp$ is even.
(c) If $G$ is a $k$-PIR generator matrix that contains a unit vector $e_i$, then $n \ge 1 + (k - 1)(d^\perp - 1)$.
Proof: If $R \ne R'$ are recovery sets for the same symbol, then the columns indexed by the symmetric difference of $R$ and $R'$ sum to zero, i.e., the symmetric difference is the support of a dual codeword, which gives part (a). Next, we consider an arbitrary unit vector $e_i$ and let $m$ denote the cardinality of the smallest recovery set for $e_i$. From (a) we conclude that every other recovery set has cardinality at least $d^\perp - m$, so that $n \ge (k - 1)(d^\perp - m) + m$. The special case $m = 1$ corresponds to (c). For part (b) we can argue as follows. If $m \ge \lceil d^\perp/2 \rceil$, then $n \ge k\lceil d^\perp/2 \rceil$, so that we assume $m \le \lceil d^\perp/2 \rceil - 1$ and conclude
$$n \ge (k - 1)(d^\perp - m) + m \ge (k - 1)\,d^\perp - (k - 2)\big(\lceil d^\perp/2 \rceil - 1\big),$$
which equals $k\lceil d^\perp/2 \rceil - 1$ if $d^\perp$ is odd and $k\lceil d^\perp/2 \rceil + k - 2$ if $d^\perp$ is even.
Proposition 11. For each integer $s \ge 4$ we have $P(s, 2^{s-2}) \ge 2^{s-1} + 1$.
Proof: It is well known that $N(s, 2^{s-2}) = 2^{s-1}$ with the unique solution being the first order Reed-Muller code, i.e., in geometric terms, all points of $\mathbb{F}_2^s$ except those in a distinguished hyperplane, see e.g. Lemma 12 for a short self-contained proof. As no multiple points or lines (sets of three collinear points) are contained, the dual minimum distance $d^\perp$ is at least 4 (indeed it is 4). Let $G$ be a generator matrix of this code that is $2^{s-2}$-PIR. If $G$ contains a unit vector, then Lemma 10(c) gives $n \ge 1 + (2^{s-2} - 1) \cdot 3$, which is a contradiction for $s \ge 4$. Thus, $G$ does not contain any unit vector and every recovery set has cardinality at least 2. Since $n = 2k$ every recovery set has cardinality exactly 2.
In order to obtain a contradiction we now prove the following statement by induction on $1 \le j \le s - 1$. For each $1 \le j \le s - 1$ there exist vectors $x_1, \dots, x_l \in \mathbb{F}_2^s$, where $l = 2^{s-1-j}$, such that the columns of $G$ are given by $\{x_h + \langle e_1, \dots, e_j \rangle \mid 1 \le h \le l\}$, where we slightly abuse notation. By $x_h + \langle e_1, \dots, e_j \rangle$ we abbreviate the list of $2^j$ vectors contained in the affine $\mathbb{F}_2$-vector space $x_h + \langle e_1, \dots, e_j \rangle$.
For the induction start we remark that the $2^{s-1}$ columns are partitioned into pairs $\{x_h, x_h + e_1\}$ corresponding to the recovery sets of $e_1$. For the induction step we assume that the columns of $G$ are partitioned into $l = 2^{s-1-j}$ sets, which we call blocks, of the form $x_h + \langle e_1, \dots, e_j \rangle$. Now we consider the recovery sets of $e_{j+1}$. Let $R = \{a, b\}$ be a recovery set of $e_{j+1}$ that was not considered before. Note that $a$ and $b$ have to be contained in different blocks since $a + b = e_{j+1}$. W.l.o.g. let $a$ be contained in the first block $x_1 + \langle e_1, \dots, e_j \rangle$ and $b$ in the second block $x_2 + \langle e_1, \dots, e_j \rangle$, so that we can reparameterize to $a + \langle e_1, \dots, e_j \rangle$ and $b + \langle e_1, \dots, e_j \rangle$. Since $a + b = e_{j+1}$ the union of the two blocks can be described by $B = a + \langle e_1, \dots, e_j, e_{j+1} \rangle$. Note that this new block $B$ contains $2^j$ recovery sets for $e_{j+1}$ of cardinality 2. In principle those recovery sets do not need to coincide with those from which we started. However, we can perform the following swaps. Let $\{b_1, c_1\}$ be a so far unconsidered recovery set for $e_{j+1}$ of cardinality 2 with $b_1 \in B$ and $c_1 \notin B$. [definition of the second pair $\{b_2, c_2\}$ missing in source] Instead of the recovery pairs $\{b_1, c_1\}$ and $\{b_2, c_2\}$ we swap to the recovery pairs $\{b_1, b_2\} \subseteq B$ and $\{c_1, c_2\}$. Thus, we can assume that all elements of $B$ pair within $B$. Going on with another unconsidered recovery pair gives us a new block each time, so that the induction step is proven.
For $e_s$ we use the structural information that the columns of $G$ can be described as $x + \langle e_1, \dots, e_{s-1} \rangle$ for some vector $x \in \mathbb{F}_2^s$. Thus there can be no recovery set of cardinality two for $e_s$.
We remark that the uniqueness of the first order Reed-Muller code is not needed in the above proof. It is sufficient to have the information that any length optimal code with dimension $s$ and minimum distance $2^{s-2}$ satisfies $d^\perp \ge 4$, which can e.g. be concluded from the MacWilliams equations.
Another application of Lemma 10 is to use the uniqueness of the binary extended Golay code with parameters $[24, 12, 8]$, see [22] (Footnote 9). Since the code is self-dual, we have $d^\perp = 8$, so that part (b) implies that any generator matrix of the binary extended Golay code cannot be a 7-PIR code. Since $N(12, 8) = 24$ this implies $P(12, 8) \ge 25$.
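The two applications of Lemma 10 above reduce to short computations. The sketch below reads part (b) as $n \ge k\lceil d^\perp/2 \rceil - 1$ for odd $d^\perp$ and $n \ge k\lceil d^\perp/2 \rceil$ for even $d^\perp$, and part (c) as $n \ge 1 + (k-1)(d^\perp - 1)$. The function names are illustrative.

```python
def bound_b(k, d_dual):
    """Length bound of Lemma 10(b) for a k-PIR generator matrix, k >= 2."""
    half = (d_dual + 1) // 2              # ceil(d_dual / 2)
    return k * half - (d_dual % 2)        # subtract 1 only for odd d_dual

def bound_c(k, d_dual):
    """Length bound of Lemma 10(c) when some column is a unit vector."""
    return 1 + (k - 1) * (d_dual - 1)

# Extended Golay code: length 24, self-dual, so d_dual = 8.
golay_needed = bound_b(7, 8)

# First order Reed-Muller code: length 2^(s-1), k = 2^(s-2), d_dual = 4.
rm_excess = [bound_c(2 ** (s - 2), 4) - 2 ** (s - 1) for s in range(4, 9)]

print(golay_needed, rm_excess)
```

A 7-PIR generator matrix with $d^\perp = 8$ would need length at least 28, which exceeds 24, and every entry of `rm_excess` is positive, which is the contradiction used in the proof of Proposition 11 for a generator matrix containing a unit vector.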
As a further relation between the minimum dual distance $d^\perp$ and PIR codes we note that the lower bound (5) was proven in [29] using the dual code and especially the minimum dual distance.
Lemma 12. Let $s \ge 1$ and $\ell \ge 0$ be integers and $C$ be a linear $[2^{s-1} + \ell(2^s - 1), s, (2\ell + 1)2^{s-2}]$ code. If $\mathcal{P}$ denotes the corresponding multiset of points, then the multiplicity of every point in $\mathbb{F}_2^s$ with respect to $\mathcal{P}$ is either $\ell$ or $\ell + 1$. Moreover, the $2^{s-1} - 1$ points with multiplicity $\ell$ form a hyperplane in $\mathbb{F}_2^s$.
Proof: Assume that there exists a point $p$ with multiplicity at least $\ell + 2$. W.l.o.g. we assume that $p = e_i$ for some $1 \le i \le s$, so that shortening gives a $[(2\ell + 1) \cdot 2^{s-1} - 2\ell - 2, s - 1, (2\ell + 1)2^{s-2}]$ code, which is a contradiction since each $[n', s - 1, (2\ell + 1)2^{s-2}]$ code satisfies $n' \ge (2\ell + 1) \cdot (2^{s-1} - 1) = (2\ell + 1) \cdot 2^{s-1} - 2\ell - 1$. Now consider the complementary multiset of points $\mathcal{P}'$, where the multiplicity of each point of $\mathbb{F}_2^s$ is given by $\ell + 1$ minus its original multiplicity with respect to $\mathcal{P}$. Counting points gives $|\mathcal{P}'| = 2^{s-1} - 1$. Let $H$ be an arbitrary hyperplane of $\mathbb{F}_2^s$. Due to Inequality (3) in Lemma 5 we have $|\mathcal{P}' \cap H| \ge 2^{s-2} - 1$ for every hyperplane. Now we use linear combinations of the left-hand and right-hand sides of the standard equations. Using the abbreviation $x = 2^{s-2} - 1$, $x(2x + 1)$ times Equation (6) minus $3x$ times Equation (7) plus Equation (8) gives [...]. Due to Inequality (3) we have $h_i = 0$ for $i < x$, and since the number of points is $2x + 1$ [...] and strictly negative for all $x < i < 2x + 1$. Since $h_i \ge 0$ for all integers $i$, the left-hand side is at most zero. From $x \ge 0$ and $y_2 \ge 0$ we conclude that the right-hand side is at least zero, so that both sides have to be equal to zero. This directly implies $y_2 = 0$ and $h_i = 0$ for all $x < i < 2x + 1$. From Equation (6) and Equation (7) we then conclude $h_x = 2^s - 2$ and $h_{2x+1} = 1$. The equation $y_2 = 0$ tells us that the point multiplicity with respect to $\mathcal{P}'$ is at most 1, so that the point multiplicity with respect to $\mathcal{P}$ is at least $\ell$. From $h_{2x+1} = 1$ we read off that there is exactly one hyperplane $H$ whose $2x + 1 = 2^{s-1} - 1$ points form the set $\mathcal{P}'$, so that the stated result follows.
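The multisets described in Lemma 12 can be generated explicitly for small parameters. The sketch below is an illustration under stated assumptions, not part of the proof: it fixes a hyperplane through a hypothetical normal vector `h`, assigns multiplicity $\ell$ to its points and $\ell + 1$ to all other non-zero points, and checks by brute force that the resulting code has the length and minimum distance named in the lemma and that its length equals the Griesmer bound. It only shows that such multisets yield codes with these parameters; the lemma states the converse.

```python
def parity(v):
    """Parity of the number of ones in the bit mask v."""
    return bin(v).count("1") & 1


def lemma12_multiset(s, ell, h=1):
    """Multiplicity ell on the hyperplane orthogonal to h, ell + 1 elsewhere."""
    return {v: ell + parity(v & h) for v in range(1, 1 << s)}


def length_and_min_distance(mult, s):
    """Length and minimum distance of the code whose columns form the multiset."""
    n = sum(mult.values())
    # The codeword indexed by u has a one exactly in the columns v with <u, v> = 1.
    d = min(
        sum(m for v, m in mult.items() if parity(u & v))
        for u in range(1, 1 << s)
    )
    return n, d


def griesmer(s, d):
    """Griesmer lower bound on the length of a binary code of dimension s."""
    return sum(-(-d // 2**i) for i in range(s))


s, ell = 4, 1
n, d = length_and_min_distance(lemma12_multiset(s, ell), s)
print(n, d, griesmer(s, d))
```

For $s = 4$ and $\ell = 1$ this reproduces the parameters $[23, 4, 12]$, which meet the Griesmer bound with equality.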
We remark that a more complicated proof has been given, for example, in [6]. However, the result should have been well known for several decades.
Lemma 13. Let $C$ be an $s$-dimensional binary $k$-PIR code of length $n$ that contains every non-zero vector of $\mathbb{F}_2^s$ at least once as a column of a generator matrix. Then $n \ge P(s, k - 2^{s-1}) + 2^s - 1$.
Proof: Let $R_i$ be the corresponding recovery sets. We will now show that we can modify the recovery sets so that they contain the recovery sets of the $s$-dimensional simplex code as a subset. First of all, we assume that all recovery sets in $R_i$ are minimal. In particular, $\{e_i\}$ is contained in $R_i$. Due to symmetry we only consider the modification of $R_1$. For every vector $x \in \mathbb{F}_2^s \setminus \{0\}$ with first coordinate equal to zero we have the recovery set $\{x, x + e_1\}$ in the simplex code. If that recovery set is contained in $R_1$, nothing has to be done. Otherwise $x$ is contained in a recovery set $A$ with $|A| \ge 3$ and $x + e_1$ is contained in a recovery set $B$ with $|B| \ge 3$. We replace the recovery sets $A$ and $B$ by $\{x, x + e_1\}$ and $(A \cup B) \setminus \{x, x + e_1\}$ (considered as a multiset union or with removed duplicates).
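The recovery sets of the simplex code used in this proof are easy to enumerate. The following sketch is illustrative only: the function name `simplex_recovery_sets` and the bit mask encoding of vectors are assumptions made here. For the unit vector $e_i$ it lists the singleton $\{e_i\}$ together with all pairs $\{x, x + e_i\}$, where $x$ is non-zero and has $i$th coordinate zero.

```python
def simplex_recovery_sets(s, i):
    """Recovery sets of e_i in the s-dimensional simplex code (bit mask columns)."""
    e_i = 1 << (i - 1)
    sets = [(e_i,)]
    for x in range(1, 1 << s):
        if not x & e_i:  # i-th coordinate of x is zero
            sets.append((x, x ^ e_i))
    return sets


s = 4
recovery_sets = simplex_recovery_sets(s, 1)
used_columns = [c for rs in recovery_sets for c in rs]
print(len(recovery_sets), len(used_columns))
```

For $s = 4$ this gives $2^{s-1} = 8$ mutually disjoint recovery sets that together use each of the $2^s - 1 = 15$ columns exactly once.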
An example is $P(4, 12) \ge 24$, which can indeed be attained and improves the coding theoretic lower bound $N(4, 12) = 23$ by one. We note that we only need the information that every non-zero point of $\mathbb{F}_2^s$ is taken at least once, i.e., $y_2 = 0$ for the complementary multiset of points.
Another example, which is a bit more involved, is $P(5, 8) \ge 18$. The coding theoretic lower bound $N(5, 8) = 16$ is improved by two.
Proof: Let $P$ be a column with multiplicity $m$. Shortening then gives a $[17 - m, 4, 8]$ code $C'$, which implies $m \le 2$. So, we assume $m = 2$ and note that the unique $[15,4,8]$ code is the 4-dimensional simplex code, i.e., the columns of a generator matrix $G'$ of $C'$ consist of all non-zero vectors of $\mathbb{F}_2^4$. Now consider any lengthened $[17,5,8]$ code. The two new columns can be contained in at most two different recovery sets for $e_5$. Recovery sets for $e_5$ that consist solely of some of the first 15 columns of $G'$ have cardinality at least three, since $C'$ has dual minimum distance 3, i.e., no two columns of $G'$ sum to zero. Thus, $2 \cdot 1 + 6 \cdot 3 > 17$ gives a contradiction, so that $m = 1$ and the code has to be projective.
Finally, we note that there are exactly four $[17,5,8]$ codes. Only one of these is projective and has the stated weight distribution.$^{10}$
The unique code determined in Lemma 15 has a dual weight distribution starting with $0^1 3^8$. The weight distribution of dual codes can be used even more directly than in the proof of Lemma 15. Assume that we have an $s$-dimensional $k$-PIR code of length $n$ with generator matrix $G$. Let $G'$ denote the matrix that arises if we remove the $i$th row of $G$. The recovery sets of cardinality $w$ for $e_i$ in $G$ correspond to dual codewords in $G'$ of weight exactly $w$. Obviously, this is not a bijection, since we completely ignore the entries in the $i$th row of $G$.
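The correspondence between recovery sets and dual codewords, and the reason it is not a bijection, can be made concrete on a small example. The sketch below is an illustration under assumptions made here: columns are encoded as integer bit masks, the helper names `dual_supports` and `split_by_recovery` are hypothetical, and the 3-dimensional simplex code is chosen only as a convenient test case. It removes row $i$, lists the supports of dual codewords of weight $w$ of the resulting matrix, and separates those whose columns in $G$ sum to $e_i$ from those whose columns sum to zero.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def dual_supports(columns, w):
    """Index sets of size w whose columns sum to zero (dual codewords of weight w)."""
    return [
        idx
        for idx in combinations(range(len(columns)), w)
        if reduce(xor, (columns[j] for j in idx)) == 0
    ]


def split_by_recovery(columns, i, w):
    """Split weight-w dual supports of G' into recovery sets of e_i and the rest."""
    e_i = 1 << (i - 1)
    expurgated = [c & ~e_i for c in columns]  # remove row i of G
    recovering, other = [], []
    for idx in dual_supports(expurgated, w):
        total = reduce(xor, (columns[j] for j in idx))  # sum in G: zero or e_i
        (recovering if total == e_i else other).append(idx)
    return recovering, other


simplex = list(range(1, 8))  # columns of the 3-dimensional simplex code
pairs, other_pairs = split_by_recovery(simplex, 1, 2)
triples, other_triples = split_by_recovery(simplex, 1, 3)
print(len(pairs), len(other_pairs), len(triples), len(other_triples))
```

In this example all 3 dual supports of weight 2 are recovery sets for $e_1$, whereas only 4 of the 11 dual supports of weight 3 are.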
Lemma 16. Let $G$ be the generator matrix of an $s$-dimensional $k$-PIR code of length $n$. If the cardinality vector of $R_i$, where $1 \le i \le s$ is arbitrary, is given by $1^{m_1} 2^{m_2} 3^{m_3} \ldots$ (clearly $\sum_j m_j = k$), then there exists a matrix $G'$ that is the generator matrix of an $(s - 1)$-dimensional $k$-PIR code of length $n - m_1$ such that there exist $k - m_1$ disjoint dual codewords with weight distribution $2^{m_2} 3^{m_3} \ldots$.
Proof: Apply expurgation, i.e., remove the $i$th row from $G$.
Proof: Due to Corollary 14 we only have to consider length $n = 17$. We apply Lemma 16 and enumerate the $[17,4,8]$ codes $C_i$. There are exactly 23 of them. However, we can use more information of a putative 5-dimensional 8-PIR code of length 17. The possible cardinality vectors of one list $R_i$ are
$1^1 2^5 3^2$, $1^1 2^6 3^1$, $1^1 2^6 4^1$, $1^1 2^7$, $2^7 3^1$,
i.e., in any case we have at least five recovery sets of cardinality 2. For the codes $C_i$ this translates to the requirement that in a generator matrix there have to be at least five disjoint pairs of identical columns and at least 7 disjoint pairs of identical columns if the effective length is 17. This leaves the following four codes with generator matrices [...]. The last one contains a column with multiplicity 3. Since by adding an additional row to the generator matrix the multiplicity of each point can decrease by a factor of at most 2, this contradicts Lemma 15.
$^{10}$ All exhaustive lists of binary linear codes have been enumerated with the software package LinCode, see [15].
The first one is a doubled Reed-Muller code with dual weight distribution $0^1 2^8 4^{252} 6^{952} \ldots$. Of course the code itself is an 8-PIR code. No recovery set of cardinality one can be used, since there is no dual codeword of weight 3, and at least one recovery set of cardinality 2 has to be used for each $e_i$. Thus, every recovery set has cardinality exactly 2. So, everything could be partitioned into two halves and we would obtain two 4-dimensional 4-PIR codes of length 8 each, which do not exist.
The second code has weight distribution $0^1 8^7 9^4 11^4$, so that it clearly cannot be augmented to a code with weight distribution $0^1 8^{14} 9^{16} 16^1$ due to the codewords of weight 11.
So there remains the third code with weight distribution $0^1 8^6 9^8 16^1$. Thus, a 5-dimensional 8-PIR code has a generator matrix without any unit vector, since expurgation would otherwise give a 4-dimensional code of length strictly less than 17. So, the cardinality distribution for every $R_i$ is $2^7 3^1$ and all rows of the generator matrix have a weight of exactly 8.
(In a recovery set of cardinality 2 for $e_i$ there is exactly one 1 in coordinate $i$ and in a recovery set of cardinality 3 there are either 1 or 3 ones in coordinate $i$. Since there is no codeword of weight 10 in the code the stated observation follows.) So, the weight of any row of the generator matrix is divisible by 8, the sum of any two rows has a weight divisible by 4, and the sum of any three different rows has a weight divisible by 2. Thus, the number of codewords of weight 9 is at most $\binom{5}{4} + \binom{5}{5} = 6 < 16$, which is a contradiction.
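The bookkeeping in this proof can be double-checked mechanically. The following sketch only re-verifies numbers quoted in the proof and is not a substitute for the code enumeration; the dictionary encodings and variable names are choices made for this illustration. It checks that each listed cardinality vector consists of 8 recovery sets using at most 17 columns with at least five sets of cardinality 2, that each quoted weight distribution accounts for all $2^{\dim}$ codewords, and that $\binom{5}{4} + \binom{5}{5} = 6$.

```python
from math import comb

# Cardinality vectors of R_i quoted in the proof, encoded as {cardinality: count}.
cardinality_vectors = [
    {1: 1, 2: 5, 3: 2},
    {1: 1, 2: 6, 3: 1},
    {1: 1, 2: 6, 4: 1},
    {1: 1, 2: 7},
    {2: 7, 3: 1},
]


def summarize(vec):
    """Return (number of sets, number of columns used, number of pairs)."""
    num_sets = sum(vec.values())
    num_columns = sum(card * count for card, count in vec.items())
    return num_sets, num_columns, vec.get(2, 0)


summaries = [summarize(vec) for vec in cardinality_vectors]
vectors_ok = all(k == 8 and n <= 17 and pairs >= 5 for k, n, pairs in summaries)

# Weight distributions quoted in the proof: ({weight: count}, dimension).
weight_distributions = {
    "second code": ({0: 1, 8: 7, 9: 4, 11: 4}, 4),
    "third code": ({0: 1, 8: 6, 9: 8, 16: 1}, 4),
    "augmented code": ({0: 1, 8: 14, 9: 16, 16: 1}, 5),
}
distributions_ok = all(
    sum(dist.values()) == 2**dim for dist, dim in weight_distributions.values()
)

weight9_bound = comb(5, 4) + comb(5, 5)
print(vectors_ok, distributions_ok, weight9_bound)
```

All checks pass, and the bound of 6 on the number of codewords of weight 9 is indeed smaller than the 16 such codewords required by the target weight distribution.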

VI. BOUNDS AND EXACT VALUES OF PIR CODES
Lemma 13 has another important consequence.
Proof: Since $P(s, 2^{s-1}) \le 2^s - 1$, we obviously have $P(s, k + 2^{s-1} \cdot \ell) \le P(s, k + 2^{s-1} \cdot (\ell - 1)) + 2^s - 1$. Now let $G$ be a generator matrix of a matching PIR code attaining length $P(s, k + 2^{s-1} \cdot \ell)$. If every non-zero vector of $\mathbb{F}_2^s$ occurs as a column of $G$, then Lemma 13 gives $P(s, k + 2^{s-1} \cdot \ell) \ge P(s, k + 2^{s-1} \cdot (\ell - 1)) + 2^s - 1$. Thus, it remains to assume the existence of a non-zero point $P \in \mathbb{F}_2^s$ with multiplicity zero. By $x_j$ we denote the number of occurrences of vector $j \in \mathbb{F}_2^s \setminus \{0\}$ as column vector of $G$. From Inequality (3) we conclude $\sum_{j \notin H} x_j \ge k'$ for every hyperplane $H$ of $\mathbb{F}_2^s$, where