The subject of this paper is differential compression, the algorithmic task of finding common strings between versions of data and using them to encode one version compactly by describing it as a set of changes from its companion. A main goal of this work is to present new differencing algorithms that (i) operate at a fine granularity (the atomic unit of change), (ii) make no assumptions about the format or alignment of input data, and (iii) in practice run in linear time, use constant space, and achieve good compression. We present new algorithms, which do not always compress optimally but use considerably less time or space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains log n bits), as compared to algorithms that run in O(n) time and O(n) space or in O(n²) time and O(1) space. We introduce two new techniques for differential compression and apply them to obtain additional algorithms that improve compression and time performance. We experimentally explore the properties of our algorithms by running them on actual versioned data. Finally, we present theoretical results that limit the compression power of differencing algorithms that are restricted to making only a single pass over the data.
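To illustrate the kind of copy/insert encoding that differencing algorithms produce, here is a minimal greedy sketch in Python. It is not one of the paper's algorithms; the fixed block size, the hash-index strategy, and the tuple-based delta format are all illustrative assumptions.

```python
def make_delta(ref: bytes, new: bytes, block: int = 4):
    """Encode `new` as a list of ('copy', ref_offset, length) and
    ('insert', literal_bytes) commands against the reference version."""
    # Index every block-length substring of the reference version.
    index = {}
    for i in range(len(ref) - block + 1):
        index.setdefault(ref[i:i + block], i)
    delta, i, lit = [], 0, bytearray()
    while i < len(new):
        j = index.get(new[i:i + block])
        if j is not None:
            # Seed match found; extend it as far as possible.
            k = 0
            while i + k < len(new) and j + k < len(ref) and new[i + k] == ref[j + k]:
                k += 1
            if lit:
                delta.append(('insert', bytes(lit)))
                lit = bytearray()
            delta.append(('copy', j, k))
            i += k
        else:
            lit.append(new[i])
            i += 1
    if lit:
        delta.append(('insert', bytes(lit)))
    return delta

def apply_delta(ref: bytes, delta):
    """Reconstruct the new version from the reference and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == 'copy':
            _, j, k = op
            out += ref[j:j + k]
        else:
            out += op[1]
    return bytes(out)
```

The round trip `apply_delta(ref, make_delta(ref, new)) == new` always holds by construction; the compression achieved depends on how much of `new` can be covered by copies.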
It is a well-known result of Fagin that the complexity class NP coincides with the class of problems expressible in existential second-order logic (Σ_{1}^{1}), which allows sentences consisting of a string of existential second-order quantifiers followed by a first-order formula. Monadic NP is the class of problems expressible in monadic Σ_{1}^{1}, i.e., Σ_{1}^{1} with the restriction that the second-order quantifiers are all unary, and hence range only over sets (as opposed to ranging over, say, binary relations). For example, the property of a graph being 3-colorable belongs to monadic NP, because 3-colorability can be expressed by saying that there exist three sets of vertices such that each vertex is in exactly one of the sets and no two vertices in the same set are connected by an edge. Unfortunately, monadic NP is not a robust class, in that it is not closed under first-order quantification. We define closed monadic NP to be the closure of monadic NP under first-order quantification and existential unary second-order quantification. Thus, closed monadic NP differs from monadic NP in that we allow the possibility of arbitrary interleavings of first-order quantifiers among the existential unary second-order quantifiers. We show that closed monadic NP is a natural, rich, and robust subclass of NP. As evidence for its richness, we show not only that it is a proper extension of monadic NP, but also that it contains properties not in various other extensions of monadic NP. In particular, we show that closed monadic NP contains an undirected graph property not in the closure of monadic NP under first-order quantification and Boolean operations. Our lower-bound proofs require a number of new game-theoretic techniques.
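The 3-colorability example can be written out explicitly as a monadic Σ_{1}^{1} sentence. In the sketch below, E is the edge relation of the graph and R, G, B are the existentially quantified unary (set) predicates:

```latex
\exists R\,\exists G\,\exists B\;\Big[
  \forall x\,\big( (R(x)\lor G(x)\lor B(x)) \land \lnot(R(x)\land G(x))
    \land \lnot(R(x)\land B(x)) \land \lnot(G(x)\land B(x)) \big)
  \;\land\;
  \forall x\,\forall y\,\big( E(x,y) \rightarrow
    \lnot\big((R(x)\land R(y)) \lor (G(x)\land G(y)) \lor (B(x)\land B(y))\big) \big)
\Big]
```

The first conjunct says every vertex lies in exactly one of the three sets; the second says no edge joins two vertices in the same set.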
We prove a lower bound of Ω(log n / loglog n) on the competitive ratio of any (deterministic or randomized) distributed algorithm for solving the mobile user problem, introduced by Awerbuch and Peleg, on certain networks of n processors. Our lower bound holds for various networks, including the hypercube, any network with sufficiently large girth, and any highly expanding graph. A similar Ω(log n / loglog n) lower bound is proved for the competitive ratio of the maximum job delay of any distributed algorithm for solving the distributed scheduling problem of Awerbuch, Kutten and Peleg on any of these networks. The proofs combine combinatorial techniques with tools from linear algebra and harmonic analysis and apply, in particular, a generalization of the vertex isoperimetric problem on the hypercube, which may be of independent interest.
This paper investigates the placement of data and parity on redundant disk arrays. Declustered organizations have been traditionally used to achieve fast reconstruction of a failed disk's contents. In previous work, Holland and Gibson identified six desirable properties for ideal layouts; however, no declustered layout satisfying all properties has been published in the literature. We present a complete, constructive characterization of the collection of ideal declustered layouts possessing all six properties. Given that ideal layouts exist only for a limited set of configurations, we also present two novel layout families. PRIME and RELPR can tolerate multiple failures in a wide variety of configurations with slight deviations from the ideal. Our simulation studies show that the new layouts provide excellent parallel access performance and reduced incremental loads during degraded operation, when compared with previously published layouts. For large accesses and under high loads, response times for the new layouts are typically smaller than those of previously published declustered layouts by a factor of 2.5.
Upper and lower bounds are proved for the time complexity of the problem of reaching agreement in a distributed network in the presence of process failures and inexact information about time. It is assumed that the amount of (real) time between any two consecutive steps of any nonfaulty process is at least c_{1} and at most c_{2}; thus, C = c_{2}/c_{1} is a measure of the timing uncertainty. It is also assumed that the time for message delivery is at most d. Processes are assumed to fail by stopping, so that process failures can be detected by timeouts.
A straightforward adaptation of an (f+1)-round round-based agreement algorithm takes time (f+1)Cd if there are f potential faults, while a straightforward modification of the proof that f+1 rounds are required yields a lower bound of time (f+1)d. The first result of this paper is an agreement algorithm in which the uncertainty factor C is only incurred for one round, yielding a running time of approximately 2fd + Cd in the worst case. (It is assumed that c_{2} << d.) The second result shows that any agreement algorithm must take time at least (f-1)d + Cd in the worst case.
The new agreement algorithm can also be applied in a model where processors are synchronous (C=1), and where message delay during a particular execution of the algorithm is bounded above by a quantity δ which could be smaller than the worst-case upper bound d. The running time in this case is approximately (2f-1)δ + d.
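To see when the new algorithm's worst-case bound improves on the straightforward adaptation, a back-of-the-envelope comparison of the two running times stated above suffices:

```latex
(f+1)Cd - (2fd + Cd) \;=\; fCd + Cd - 2fd - Cd \;=\; fd\,(C - 2).
```

So the roughly 2fd + Cd algorithm is faster precisely when the timing uncertainty C exceeds 2, and its advantage grows linearly in both the fault bound f and the uncertainty C, reflecting that C is incurred for only one round rather than for all f+1.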
In-place reconstruction of differenced data allows information on devices with limited storage capacity to be updated efficiently over low-bandwidth channels. Differencing encodes a version of data compactly as a set of changes from a previous version. Transmitting updates to data as a version difference saves both time and bandwidth. In-place reconstruction rebuilds the new version of the data in the storage or memory the current version occupies – no scratch space is needed for a second version. By combining these technologies, we support highly-mobile applications on space-constrained hardware. We present an algorithm that modifies a differentially encoded version to be in-place reconstructible. The algorithm trades a small amount of compression to achieve this property. Our treatment includes experimental results that show our implementation to be efficient in space and time and verify that compression losses are small. Also, we give results on the computational complexity of performing this modification while minimizing lost compression.
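The obstacle to in-place reconstruction is a copy command whose source bytes are overwritten by an earlier command before it gets to read them. The toy detector below is not the paper's algorithm, and the (src, dst, length) command format is an assumption; it merely flags the conflicts that an in-place-reconstructible encoding must avoid.

```python
def conflicting_pairs(copies):
    """copies: list of (src, dst, length) copy commands over one buffer,
    applied in list order.  Command i conflicts with a later command j
    when i's destination interval [dst, dst+len) overlaps j's source
    interval [src, src+len): applying i first would overwrite bytes
    that j still needs to read."""
    pairs = []
    for i, (_, dst_i, len_i) in enumerate(copies):
        for j, (src_j, _, len_j) in enumerate(copies):
            if i < j and dst_i < src_j + len_j and src_j < dst_i + len_i:
                pairs.append((i, j))
    return pairs
```

Reordering the commands, or downgrading an unresolvable copy to a literal insert, removes such conflicts; the latter is exactly the small compression loss the abstract refers to.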
The purpose of this paper is to study reducibilities that can be computed by combinational logic networks of polynomial size and constant depth containing AND's, OR's, and NOT's, with no bound placed on the fan-in of AND-gates and OR-gates. Two such reducibilities are defined, and reductions and equivalences among several common problems such as parity, sorting, integer multiplication, graph connectivity, bipartite matching and network flow are given. Certain problems are shown to be complete, with respect to these reducibilities, in the complexity classes deterministic logarithmic space, nondeterministic logarithmic space, and deterministic polynomial time. New upper bounds on the size-depth (unbounded fan-in) circuit complexity of symmetric Boolean functions are established.
In data storage applications, a large collection of consecutively numbered data "buckets" is often mapped to a relatively small collection of consecutively numbered storage "bins". For example, in parallel database applications, buckets correspond to hash buckets of data and bins correspond to database nodes. In disk array applications, buckets correspond to logical tracks and bins correspond to physical disks in an array. Measures of the "goodness" of a mapping method include (1) the time (number of operations) needed to compute the mapping; (2) the storage needed to store a representation of the mapping; (3) the balance of the mapping, i.e., the extent to which all bins receive the same number of buckets; and (4) the cost of relocation, that is, the number of buckets that must be relocated to a new bin if a new mapping is needed due to an expansion of the number of bins or the number of buckets. One contribution of this paper is to give a new mapping method, the Interval-Round-Robin (IRR) method. The IRR method has optimal balance and relocation cost, and its time complexity and storage requirements compare favorably with known methods. Specifically, if m is the number of times that the number of bins and/or buckets has increased, then the time complexity is O(log m) and the storage is O(m²). Another contribution of the paper is to identify the concept of a history-independent mapping, meaning informally that the mapping does not "remember" the past history of expansions to the number of buckets and bins, but only the current number of buckets and bins. Thus, such mappings require very little information to be stored. Assuming that balance and relocation are optimal, we prove that history-independent mappings are possible if the number of buckets is fixed (so only the number of bins can increase), but not possible if the number of bins and buckets can both increase.
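To make measure (4) concrete, the sketch below counts relocations for the naive modular mapping when a bin is added. It is illustrative only and does not reproduce the IRR method; the function names are ours.

```python
def mod_map(bucket, bins):
    # Naive mapping: bucket b goes to bin b mod bins.
    return bucket % bins

def relocations(num_buckets, old_bins, new_bins, mapping):
    # Count buckets whose bin assignment changes after bins are added.
    return sum(1 for b in range(num_buckets)
               if mapping(b, old_bins) != mapping(b, new_bins))
```

For example, growing from 3 to 4 bins with 12 buckets, `relocations(12, 3, 4, mod_map)` returns 9: the modular mapping reshuffles three quarters of the buckets, whereas a mapping with optimal relocation cost moves only 12/4 = 3 buckets, exactly enough to populate the new bin while preserving balance.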
Reaching agreement is a primitive of distributed computing. Whereas this poses no problem in an ideal, failure-free environment, it imposes certain constraints on the capabilities of an actual system: A system is viable only if it permits the existence of consensus protocols tolerant to some number of failures. Fischer et al. have shown that in a completely asynchronous model, even one failure cannot be tolerated. In this paper their work is extended: Several critical system parameters, including various synchrony conditions, are identified and how varying these affects the number of faults that can be tolerated is examined. The proofs expose general heuristic principles that explain why consensus is possible in certain models but not possible in others.
We present a randomized parallel algorithm for term matching. Let n be the number of nodes of the directed acyclic graphs (dags) representing the terms to be matched. Then our algorithm uses O(log²n) parallel time and M(n) processors, where M(n) is the complexity of n-by-n matrix multiplication. The randomized algorithm is of the Las Vegas type, that is, the answer is always correct, although with small probability the algorithm might fail to produce an answer. The number of processors is a significant improvement over previously known bounds. Under various syntactic restrictions on the form of the input dags, only O(n²) processors are required in order to achieve deterministic O(log²n) parallel time. Furthermore, we reduce directed graph reachability to term matching using constant parallel time and O(n²) processors. This is evidence that no deterministic algorithm can significantly beat the processor bound of our randomized algorithm. We also improve the P-completeness result of Dwork, Kanellakis and Mitchell on the unification problem, showing that unification is P-complete even if both input terms are linear, i.e., no variable appears more than once in each term.
The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound Δ on the time required for a message to be sent from one processor to another and a known fixed upper bound Φ on the relative speeds of different processors. In an asynchronous system no fixed upper bounds Δ and Φ exist. In one version of partial synchrony, fixed bounds Δ and Φ exist, but they are not known a priori. The problem is to design protocols that work correctly in the partially synchronous system regardless of the actual values of the bounds Δ and Φ. In another version of partial synchrony, the bounds are known, but are only guaranteed to hold starting at some unknown time T, and protocols must be designed to work correctly regardless of when time T occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models. Lower bounds that show in most cases that our protocols are optimal with respect to the number of faults tolerated are also given. Our consensus protocols for partially synchronous processors use new protocols for fault-tolerant "distributed clocks" that allow partially synchronous processors to reach some approximately common notion of time.
We prove that three apparently unrelated fundamental problems in distributed computing, cryptography, and complexity theory are essentially the same problem. These three problems and brief descriptions of them follow.
We construct 2-round (i.e., 2-message), public-coin, black-box (concurrent) zero-knowledge proof systems and arguments for any language in NP under the assumption that the prover is resource-bounded during the execution of the protocol.
An investigation of interactive proof systems (IPS's) where the verifier is a 2-way probabilistic finite state automaton (2pfa) is initiated. In this model it is shown:
The zero knowledge properties of interactive proof systems (IPS's) are studied in the case that the verifier is a 2-way probabilistic finite state automaton (2pfa). The following results are proved:
It is shown that if a 2-way probabilistic finite state automaton (2pfa) M recognizes a nonregular language L with error probability bounded below 1/2, then there is a positive constant b (depending on M) such that, for infinitely many inputs x, the expected running time of M on input x must exceed 2^{n^b} where n is the length of x. This complements a result of Freivalds showing that 2pfa's can recognize certain nonregular languages in exponential expected time. It also establishes a time complexity gap for 2pfa's, since any regular language can be recognized by some 2pfa in linear time. Other results give roughly exponential upper and lower bounds on the worst-case increase in the number of states when converting a polynomial-time 2pfa to an equivalent 2-way nondeterministic finite-state automaton or to an equivalent 1-way deterministic finite-state automaton.
Any notion of "closeness" in pattern matching should have the property that if A is close to B, and B is close to C, then A is close to C. Traditionally, this property is attained because of the triangle inequality (d(A,C) ≤ d(A,B) + d(B,C), where d represents a notion of distance). However, the full power of the triangle inequality is not needed for this property to hold. Instead, a "relaxed triangle inequality" suffices, of the form d(A,C) ≤ c(d(A,B) + d(B,C)), where c is a constant that is not too large. In this paper, we show that one of the measures used for distances between shapes in (an experimental version of) IBM's "Query by Image Content" system (Niblack et al., 1993) satisfies a relaxed triangle inequality, although it does not satisfy the triangle inequality.
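The relaxed inequality still yields transitivity of closeness: if d(A,B) ≤ ε and d(B,C) ≤ ε, then d(A,C) ≤ 2cε. The generic brute-force check below (not the QBIC shape measure itself; the function name is ours) finds, for a given finite distance matrix, the smallest constant c for which the relaxed triangle inequality holds.

```python
def smallest_relaxation_constant(d):
    """For a distance matrix d (d[i][j] >= 0, d[i][i] == 0), return the
    smallest c such that d[i][k] <= c * (d[i][j] + d[j][k]) for all
    triples with d[i][j] + d[j][k] > 0.  A result of 1 means the ordinary
    triangle inequality holds; c > 1 means only the relaxed form does."""
    n = len(d)
    best = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                s = d[i][j] + d[j][k]
                if s > 0:
                    best = max(best, d[i][k] / s)
    return best
```

On the matrix [[0,1,3],[1,0,1],[3,1,0]], which violates the triangle inequality (3 > 1 + 1), the smallest such constant is 1.5.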
It is a well-known result of Fagin that the complexity class NP coincides with the class of problems expressible in existential second-order logic (Σ_{1}^{1}). Monadic NP is the class of problems expressible in monadic Σ_{1}^{1}, i.e., Σ_{1}^{1} with the restriction that the second-order quantifiers range only over sets (as opposed to ranging over, say, binary relations). We prove that connectivity of finite graphs is not in monadic NP, even in the presence of arbitrary built-in relations of moderate degree (that is, degree (log n)^{o(1)}). This extends earlier results of Fagin and de Rougemont. Our proof uses a combination of three techniques: (1) an old technique of Hanf for showing that two (infinite) structures agree on all first-order sentences, under certain conditions, (2) a recent new approach to second-order Ehrenfeucht-Fraïssé games by Ajtai and Fagin, and (3) playing Ehrenfeucht-Fraïssé games over random structures (this was also used by Ajtai and Fagin). Regarding (1), we give a version of Hanf's result that is better suited for use as a tool in inexpressibility proofs for classes of finite structures. The power of these techniques is further demonstrated by using them (actually, using just the first two techniques) to give a very simple proof of the separation of monadic NP from monadic co-NP without the presence of built-in relations.
A new method for fault-tolerant routing in arbitrary dimensional meshes is introduced. The method was motivated by certain routing requirements of an initial design of the Blue Gene supercomputer project currently underway in IBM Research. The machine is planned to be organized as a 3-dimensional mesh containing many thousands of nodes. Among the requirements were to provide deterministic deadlock-free wormhole routing in a 3-dimensional mesh, in the presence of many faults (up to a few percent of the number of nodes in the machine), while using two virtual channels. It was also desired to minimize the number of "turns" in each route, i.e., the number of times that the route changes direction. There has been much work on routing methods for meshes that route messages around faults or regions of faults. The new method is to declare certain good nodes to be "lambs"; a lamb is used for routing but not processing, so a lamb is neither the source nor the destination of a message. The lambs are chosen so that every "survivor node", a node that is neither faulty nor a lamb, can reach every survivor node by at most two rounds of dimension-ordered (such as e-cube) routing. An algorithm for finding a set of lambs is presented. The results of simulations on 2D and 3D meshes of various sizes with various numbers of random node faults are given. For example, on a 32-by-32-by-32 3D mesh with 3% random faults, and using two rounds of e-cube routing for each message, the average number of lambs is less than 68, which is less than 0.21% of the number 32768 of nodes and less than 7% of the number 983 of faults. The computational complexity of finding the minimum number of lambs for a given fault set is also explored, and this problem is shown to be NP-hard for 3-dimensional meshes with two rounds of e-cube routing.
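The dimension-ordered (e-cube) routing that the lamb method builds on can be sketched directly; the function below is illustrative and does not reproduce the lamb-selection algorithm itself.

```python
def e_cube_route(src, dst):
    """Dimension-ordered (e-cube) routing in a mesh: correct the
    coordinates one dimension at a time, lowest dimension first, so the
    route turns at most once per dimension.  Returns the list of nodes
    visited, including src and dst."""
    path = [tuple(src)]
    cur = list(src)
    for d in range(len(src)):
        step = 1 if dst[d] > cur[d] else -1
        while cur[d] != dst[d]:
            cur[d] += step
            path.append(tuple(cur))
    return path
```

In the fault-free case one such round suffices; with faults, the method routes from the source to a well-chosen intermediate survivor node and then to the destination, giving the two rounds of e-cube routing described above.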
We consider regular expressions extended with the interleaving operator, and investigate the complexity of membership and inequivalence problems for these expressions. For expressions using the operators union, concatenation, Kleene star, and interleaving, we show that the inequivalence problem (deciding whether two given expressions do not describe the same set of words) is complete for exponential space. Without Kleene star, we show that the inequivalence problem is complete for the class Σ_{2}^{p} at the second level of the polynomial-time hierarchy. Certain cases of the membership problem (deciding whether a given word is in the language described by a given expression) are shown to be NP-complete. It is also shown that certain languages can be described exponentially more succinctly by using interleaving.
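While certain cases of membership for these extended expressions are NP-complete, the basic special case of deciding whether a word is an interleaving (shuffle) of two given words admits a simple polynomial-time dynamic program. The sketch below is illustrative and not taken from the paper.

```python
from functools import lru_cache

def is_interleaving(u: str, v: str, w: str) -> bool:
    """Return True iff w can be formed by interleaving u and v, i.e.,
    w contains u and v as disjoint subsequences covering all of w."""
    if len(u) + len(v) != len(w):
        return False

    @lru_cache(maxsize=None)
    def go(i, j):
        # go(i, j): can w[i+j:] be formed by interleaving u[i:] and v[j:]?
        if i == len(u) and j == len(v):
            return True
        k = i + j
        return (i < len(u) and u[i] == w[k] and go(i + 1, j)) or \
               (j < len(v) and v[j] == w[k] and go(i, j + 1))

    return go(0, 0)
```

The memoized recursion visits each (i, j) pair at most once, so the running time is O(|u|·|v|); the hardness results above arise only once interleaving is nested with the other regular operators.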
To provide a logic for reasoning about concurrently executing programs, Abrahamson has defined an extension of propositional dynamic logic (PDL) by allowing interleaving as an operator for combining programs, in addition to the regular PDL operators union, concatenation, and star. We show that the satisfiability problem for interleaving PDL is complete for deterministic double-exponential time, and that this problem requires time double-exponential in cn/log n, for some positive constant c. Moreover, this lower bound holds even when restricted to formulas where each program appearing in the formula has the form a_{1} | a_{2} | ... | a_{k} where | denotes the interleaving operator and where a_{1}, ..., a_{k} are regular programs, i.e., programs built from atomic programs using only the regular operators. Another consequence of the method used to prove this result is that the equivalence problem for regular expressions with interleaving requires space 2^{cn/log n} and that this lower bound holds even to decide whether (E_{1} | E_{2} | ... | E_{k}) ∪ F is equivalent to Σ*, where E_{1}, ..., E_{k}, F are ordinary regular expressions; this improves a previous result of the authors. Moreover, the same lower bound holds for the containment problem for expressions of the form E_{1} | E_{2} | ... | E_{k}.
In this paper, we propose and study a new algorithm for choosing segments for garbage collection in Log-Structured File Systems (LFS) and Log-Structured Arrays (LSA). We compare the performance of our new algorithm against previously known algorithms such as greedy and cost-benefit through simulation. The basic idea of our algorithm is that segments which have been recently filled by writes from the system should be forced to wait for a certain amount of time (the age-threshold) before they are allowed to become candidates for garbage collection. The expectation is that if the age-threshold is properly chosen, segments that have reached the age-threshold are unlikely to get significantly emptier due to future rewrites. Among segments that pass the age-threshold and become candidates for garbage collection, we select those that will yield the most free space. We show, through simulation, that our age-threshold algorithm is more efficient at garbage collection (produces more free space per garbage-collected segment) than greedy or cost-benefit; this means that designs using age-threshold will give better system performance than designs using greedy or cost-benefit. It is also simpler to implement a scalable version of the age-threshold algorithm than to implement a scalable version of the cost-benefit algorithm. The performance of the age-threshold algorithm depends on good choice of an age-threshold; therefore, we also give an analysis which can be used to choose an optimal age-threshold under certain workload assumptions. We also suggest how to choose good age-thresholds when nothing is known about the workload.
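The selection rule just described can be sketched in a few lines; this is a minimal illustration of the two-stage filter (age first, then free space), with field names that are our own assumptions rather than the paper's.

```python
def select_segments(segments, now, age_threshold, k):
    """segments: list of dicts with 'filled_at' (time the segment was
    filled) and 'free_bytes' (space reclaimable by collecting it).
    Only segments that have aged past age_threshold are candidates;
    among those, greedily pick the k yielding the most free space."""
    candidates = [i for i, s in enumerate(segments)
                  if now - s['filled_at'] >= age_threshold]
    candidates.sort(key=lambda i: segments[i]['free_bytes'], reverse=True)
    return candidates[:k]
```

Note that a recently filled segment is excluded even if it currently looks attractive, since imminent rewrites may soon make it emptier, and thus cheaper to collect later.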
The purpose of this paper is a study of computation that can be done locally in a distributed network, where "locally" means within time (or distance) independent of the size of the network. Locally Checkable Labeling (LCL) problems are considered, where the legality of a labeling can be checked locally (e.g., coloring). The results include the following:
The theme of this paper is to investigate to what extent approximation, possibly together with randomization, can reduce the complexity of problems in Valiant's class #P. In general, any function in #P can be approximated to within any constant factor by a function in the class Δ_{3}^{p} of the polynomial-time hierarchy. Relative to a particular oracle, Δ_{3}^{p} cannot be replaced by Δ_{2}^{p} in this result. Another part of the paper introduces a model of random sampling where the size of a set X is estimated by checking, for various "sample sets" S, whether or not S intersects X. For various classes of sample sets, upper and lower bounds on the number of samples required to estimate the size of X are discussed. This type of sampling is motivated by particular problems in #P such as computing the size of a backtrack search tree. In the case of backtrack search trees, a sample amounts to checking whether a certain path exists in the tree. One of the lower bounds suggests that such tests alone are not sufficient to give a polynomial-time approximation algorithm for this problem, even if the algorithm can randomize.
A methodology of VLSI layout described by several authors first determines the relative positions of indivisible pieces, called cells, on the chip. Various optimizations are then performed on this initial layout to minimize some cost measure such as chip area or perimeter. If each cell is a rectangle with given dimensions, one optimization problem is to choose orientations of all the cells to minimize the cost measure. A polynomial time algorithm is given for this optimization problem for layouts of a special type called slicings. However, orientation optimization for more general layouts is shown to be NP-complete (in the strong sense).
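For slicings, the orientation problem admits a natural dynamic program over the slicing tree. The sketch below is a simplified, unoptimized variant (the polynomial-time algorithm maintains sorted shape lists to combine subtrees efficiently); it keeps, for each subtree, the set of nondominated (width, height) options, and the tree encoding is our own assumption.

```python
def best_shapes(tree):
    """tree: ('cell', w, h) leaf, or ('h', left, right) / ('v', left, right)
    internal node, where 'h' places children side by side and 'v' stacks
    them.  Each cell may be rotated.  Returns the nondominated
    (width, height) options for the subtree."""
    kind = tree[0]
    if kind == 'cell':
        _, w, h = tree
        opts = {(w, h), (h, w)}
    else:
        _, left, right = tree
        lo, ro = best_shapes(left), best_shapes(right)
        opts = set()
        for (lw, lh) in lo:
            for (rw, rh) in ro:
                if kind == 'h':   # side by side: widths add, heights max
                    opts.add((lw + rw, max(lh, rh)))
                else:             # stacked: heights add, widths max
                    opts.add((max(lw, rw), lh + rh))
    # Prune shapes dominated in both dimensions; pruning is safe because a
    # dominated shape can never lead to a smaller enclosing rectangle.
    return {p for p in opts
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in opts)}

def min_area(tree):
    """Minimum chip area over all orientation choices for the slicing."""
    return min(w * h for (w, h) in best_shapes(tree))
```

For two 1-by-2 cells placed side by side, the nondominated options are (2, 2) and (4, 1), so the minimum area is 4; for non-slicing layouts no such decomposition exists, which is where the NP-completeness result applies.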
The purpose of this paper is to report results of simulations of two algorithms for free space collection in log-structured storage systems. The algorithms considered are the age-threshold algorithm of Menon and Stockmeyer and the fitness algorithm of Butterworth. The simulations were done using a trace collected by Ruemmler and Wilkes from a file system over a period of two months. The performance of an algorithm is measured by the amount of disk I/O done as a result of free space collection. The performance of the algorithms and several variations of them are compared.
An exponential lower bound on the circuit complexity of deciding the weak monadic second-order theory of one successor (WS1S) is proved. Circuits are built from binary operations, or 2-input gates, which compute arbitrary Boolean functions. In particular, to decide the truth of logical formulas of length at most 610 in this second-order language requires a circuit containing at least 10^{125} gates. So even if each gate were the size of a proton, the circuit would not fit in the known universe. This result and its proof, due to both authors, originally appeared in 1974 in the Ph.D. thesis of the first author. In this paper the proof is given, the result is put in historical perspective, and the result is extended to probabilistic circuits.
The goal of this paper is to establish links between computational complexity theory and the theory and practice of constrained block coding. In particular, the complexities of several fundamental problems in constrained block coding are precisely classified in terms of the existing complexity-theoretic structure. One type of problem studied is that of designing encoder and decoder circuits using minimum or approximately minimum hardware; for our purposes, an "input" to this problem is (i) a deterministic, irreducible finite state transition diagram (abbreviated DIF) defining a set of constrained binary sequences, and (ii) a desired rate p:q. Several of these minimum-encoder and minimum-decoder problems are shown to be NP-hard, and more interestingly some are shown to be complete in the second and third levels of the polynomial-time hierarchy. Another fundamental problem is that of computing the maximum rate of a block code; that is, given a DIF and a codeword length q, find the maximum p such that a rate p:q block code exists for the constraint defined by the DIF. This problem is shown to be NP^{#P}-complete. Although it is not known whether NP^{#P} contains problems of super-polynomial complexity, it lies "higher" in the complexity-class structure than NP in the sense that it is possible, given current knowledge, that NP^{#P} contains problems of super-polynomial complexity even if P = NP. Another question studied is whether maximum rate block codes can always be implemented by encoders and decoders of polynomial size. The answer to this question is shown to be closely related to whether the class #P lies "lower" in the complexity-class structure than currently believed -- a proof of either answer to this question would have major implications in complexity theory.
A relationship is established between (i) parallel random-access machines that allow many processors to concurrently read from or write into a common memory, including simultaneous writing into the same memory location (CRCW PRAM), and (ii) combinational logic circuits that contain AND's, OR's, and NOT's, with no bound placed on the fan-in of AND-gates and OR-gates. Parallel time and number of processors for CRCW PRAM's are shown to correspond respectively (and simultaneously) to depth and size for circuits, where the time-depth correspondence is to within a constant factor and the processors-size correspondence is to within a polynomial. By applying a recent result of Furst, Saxe and Sipser, we obtain the corollary that parity, integer multiplication, graph transitive closure and integer sorting cannot be computed in constant time by a CRCW PRAM with a polynomial number of processors. This is the first nonconstant lower bound on the parallel time required to solve these problems by a CRCW PRAM with a polynomial number of processors. We also state and outline the proof of a similar result, due to W. L. Ruzzo and M. Tompa, that relates time and processor bounds for CRCW PRAM's to alternation and space bounds for alternating Turing machines.