Syntactic discontinuities in Latin — A treebank-based study

Syntactic discontinuities are very frequent in classical Latin, and yet these data have never been considered in debates on how expressive grammar formalisms need to be to capture natural languages. In this paper I show with treebank data that Latin frequently displays syntactic discontinuities that cannot be captured in standard mildly context-sensitive frameworks such as Tree-Adjoining Grammars or Combinatory Categorial Grammars. I then argue that there is no principled bound on Latin discontinuities, but that they display a broadly Zipfian distribution where frequency drops quickly for the more complex patterns. Lexical-Functional Grammar can capture these discontinuities in a way that closely reflects their complexity and frequency distributions.


Introduction
Classical Latin, like classical Greek, is famous for its tolerance of syntactic discontinuities. One example is shown in (1).
(1) […]
'What slender boy, drenched with perfumes, presses you on a bed of roses, Pyrrha, under the delightful cave?' (Horace, Carmina 1.5)

This example features no fewer than four discontinuous noun phrases, as indicated with subscript indices on the words. The syntactic dependencies inside these NPs are marked with agreement in case (and number and gender, not shown in the glossing), but not with word order.
Discontinuous NPs are in fact attested in Latin up to the twentieth century, as in (2) from Dyvik (1968). Examples such as (2) are reminiscent of quantifier float, a type of discontinuity which is found even in highly configurational languages such as English. While (2) is in fact the only discontinuous NP in Dyvik (1968), the focus of this article is on the classical stage of Latin, where many other types of discontinuities are attested, as (1) already shows.
At the same time as Dyvik (1968) was composed, linguists were discussing whether natural languages are context-free. The debate was sparked by the definition of the Chomsky hierarchy in Chomsky (1956), which raised the question whether natural languages could be described by context-free grammars. This remained an open question throughout the 1960s and 70s, and it was not until the 1980s that it was settled (in the negative).¹ The classical languages played little role in this discussion. Occasionally, some Latin examples were cited: for example, Ross (1967/1986, p. 74) used (1) to illustrate scrambling. There is no obvious characterization of the elements that can intervene between the different parts of the NPs in this example, and if we assume that there is no theoretical upper bound on the intervening material, it looks as if one could construct an argument for the non-context-freeness of natural language based on such examples. Of course, assuming that there is no upper bound is a leap of faith that can never be truly justified; but in that respect, Latin is not really different from English. The common claim that finite state automata cannot model center-embedding also depends on there being no theoretical upper bound on the level of embedding, and neither in English nor in Latin can we observe infinite embeddings. In fact, corpus studies suggest that the practical upper bound on levels of center-embedding is as low as three. So the main reason for using a context-free grammar to deal with center-embedding is theoretical simplicity and elegance, as pointed out by Harris (1957): "If we were to insist on a finite language, we would have to include in our grammar several highly arbitrary and numerical conditions – saying, for example, that in a given position there are not more than three occurrences of and between N".
Similar considerations would apply to Latin discontinuities. However, to the extent that Latin examples featured in the scholarly discussion, scholars did not object to them on the grounds that their unboundedness could not be demonstrated. Instead, Pullum (1982) argued that (1) comes from the poet Horace, who "is noted for stretching tendencies in the living Latin language beyond all grammatical limits". And so no one attempted to build an argument based on classical data. Another reason for suspicion, no doubt, was the lack of hard facts concerning the extent of syntactic discontinuity in a dead language like Latin. The Latin grammatical tradition has been content to establish that word order is 'generally free' (not just in poetry, but also in prose) and to investigate the stylistic usage of discontinuity. Moreover, word order data from Greek and Latin used to be hard to access. As shown in Haug (2015), scholars could not even agree on the frequencies of basic word orders in Ancient Greek, let alone provide an account of them.

¹ See Pullum (1986) for a fascinating account of the debate.

So how complex is natural language really?
Research in the Generalized Phrase Structure Grammar (GPSG) framework in the early 1980s was to a large extent motivated by the desire to keep the complexity of the formalism low and to develop context-free analyses of seemingly non-context-free phenomena such as long-distance dependencies (Gazdar 1981). That program imploded when Shieber (1985) and Culy (1985) showed that there are phenomena in natural language that cannot be captured in a context-free grammar. There were two main responses to this discovery. Either one tried to extend context-free formalisms as little as possible while achieving coverage of demonstrably non-context-free phenomena, such as the cross-serial dependencies from Dutch and Swiss German discussed in Shieber (1985), leading to so-called mildly context-sensitive formalisms such as (Lexicalized) Tree Adjoining Grammar (LTAG) and Combinatory Categorial Grammar (CCG); or one gave up (almost) completely on the concern about weak generative capacity, as in Lexical Functional Grammar (LFG) and Head-Driven Phrase Structure Grammar (HPSG). A natural question to ask, then, is: who was right? Is it possible to keep the algorithmic complexity of the parsing problem low while maintaining good coverage of the data, as measured in modern treebanks?
Answering that question requires a detour into dependency grammar, since most treebanks these days (and in particular the Latin ones that we will look at here) are based on dependencies rather than phrase structures or CCG derivations. Fortunately, there are formal results that relate the complexity of formalisms like CFGs, LTAGs and CCGs to that of dependency grammars with various restrictions on non-projective (i.e. discontinuous) dependencies.

Measuring discontinuity in a dependency treebank
In order to study discontinuities in dependency trees, we need to introduce some terminology. The projection of a node in a dependency tree is its yield, i.e. the set of nodes in the transitive, reflexive closure of dominance, arranged in linear order. A gap is a discontinuity in a projection, and the gap degree of a node is the number of gaps in its projection. An equivalent measure is the block degree, i.e. the number of continuous blocks in the projection of a node, which is always the gap degree + 1. Consider the dependency tree in Figure 1. The gap degree of mihi is 0, for its projection [mihi, Paulo] is uninterrupted. By contrast, the gap degree of terror is 1, for its projection [nullus, terror] is interrupted by est; equivalently, terror has block degree 2, for its projection consists of the two blocks [nullus] and [terror]. Finally, we may also talk about the gap degree of a dependency tree, which is defined as the highest gap degree among its nodes.
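The definitions above can be made concrete with a short sketch. It assumes, reading off the rules described for Table 1 below, that Figure 1 is the sentence mihi Paulo nullus est terror, with Paulo attached to mihi, nullus to terror, and mihi and terror to est. Encoding a tree as a list of 0-based head positions (-1 for the root) and all function names are illustrative choices, not part of any treebank API.

```python
from typing import List

# Assumed reading of Figure 1: heads[i] is the 0-based position of word i's head, -1 = root.
WORDS = ["mihi", "Paulo", "nullus", "est", "terror"]
HEADS = [3, 0, 4, -1, 3]


def projection(heads: List[int], node: int) -> List[int]:
    """Positions in the reflexive, transitive closure of dominance, in linear order."""
    result = {node}
    changed = True
    while changed:
        changed = False
        for dep, head in enumerate(heads):
            if head in result and dep not in result:
                result.add(dep)
                changed = True
    return sorted(result)


def blocks(positions: List[int]) -> List[List[int]]:
    """Split sorted positions into maximal runs of adjacent positions (the blocks)."""
    runs: List[List[int]] = []
    for pos in positions:
        if runs and pos == runs[-1][-1] + 1:
            runs[-1].append(pos)
        else:
            runs.append([pos])
    return runs


def block_degree(heads: List[int], node: int) -> int:
    return len(blocks(projection(heads, node)))


def gap_degree(heads: List[int], node: int) -> int:
    """Number of gaps in the projection: always block degree minus one."""
    return block_degree(heads, node) - 1


def tree_gap_degree(heads: List[int]) -> int:
    """Gap degree of a tree: the highest gap degree among its nodes."""
    return max(gap_degree(heads, n) for n in range(len(heads)))


if __name__ == "__main__":
    for i, word in enumerate(WORDS):
        proj = [WORDS[p] for p in projection(HEADS, i)]
        print(word, proj, "gap degree:", gap_degree(HEADS, i))
```

Under these assumptions, mihi comes out with gap degree 0, terror with gap degree 1 (blocks [nullus] and [terror]), and the tree as a whole with gap degree 1.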
For our purposes, it will also be useful to study gap depth, which we define as in (3).
(3) A node d in the projection of r introduces a discontinuity in r iff d is in a different block b from r and there is no node in b that dominates d. The depth of the gap introduced by d is the number of edges between d and r. The gap depth of r is the maximum depth of a node that introduces a discontinuity in r.
In Figure 1, the gap depth of terror is 1, as the discontinuity is introduced by its direct dependent nullus.Let us now look at a deeper gap in a classic example of cross-serial dependencies in Dutch.
(4) […]

The nodes helpen and zwemmen both have gap degree 1. The projection of helpen has the two blocks {Piet, Marie} and {helpen, zwemmen}. Both Piet and Marie introduce discontinuities in the projection of helpen, since neither dominates the other. The depths of those discontinuities are 1 and 2 respectively, and hence the gap depth of helpen is 2. Thus the gap depth captures the fact that helpen is not only discontinuous itself, but also dominates a discontinuous dependent without resolving the discontinuity. Intuitively, then, gap depth captures the embedding of discontinuities, e.g. in long-distance extraction, which is generally thought to be associated with human processing difficulty (see e.g. Gibson 2000). To my knowledge, gap depth has never been considered in measures of non-projectivity in dependency treebanks such as Kuhlmann and Nivre (2006), Havelka (2007) or Maier and Lichte (2011), but we will see that corpus evidence suggests this measure is useful.
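Definition (3) can be sketched in code. The sketch assumes that (4) is the familiar sentence ... dat Jan Piet Marie zag helpen zwemmen 'that Jan saw Piet help Marie swim', with Piet attached to helpen, Marie to zwemmen and Jan to zag, chosen so that the blocks match those given above. The word list, attachments and function names are illustrative assumptions. "Dominates" in (3) is read as proper dominance, since otherwise every node would dominate itself.

```python
from typing import List

# Assumed reading of (4); heads[i] is the 0-based position of word i's head, -1 = root.
WORDS = ["Jan", "Piet", "Marie", "zag", "helpen", "zwemmen"]
HEADS = [3, 4, 5, -1, 3, 4]


def projection(heads: List[int], node: int) -> List[int]:
    """Positions in the reflexive, transitive closure of dominance, in linear order."""
    result = {node}
    changed = True
    while changed:
        changed = False
        for dep, head in enumerate(heads):
            if head in result and dep not in result:
                result.add(dep)
                changed = True
    return sorted(result)


def blocks(positions: List[int]) -> List[List[int]]:
    """Split sorted positions into maximal runs of adjacent positions."""
    runs: List[List[int]] = []
    for pos in positions:
        if runs and pos == runs[-1][-1] + 1:
            runs[-1].append(pos)
        else:
            runs.append([pos])
    return runs


def ancestors(heads: List[int], node: int) -> List[int]:
    """Nodes that properly dominate `node`, nearest first."""
    result = []
    head = heads[node]
    while head != -1:
        result.append(head)
        head = heads[head]
    return result


def gap_depth(heads: List[int], r: int) -> int:
    """Maximum depth of a node that introduces a discontinuity in r, following (3)."""
    runs = blocks(projection(heads, r))
    block_of = {pos: i for i, run in enumerate(runs) for pos in run}
    depth = 0
    for d, b in block_of.items():
        if b == block_of[r]:
            continue  # d shares r's block, so it introduces no discontinuity
        up = ancestors(heads, d)
        if set(runs[b]).isdisjoint(up):
            # No node in d's block dominates d; depth = number of edges from r down to d.
            depth = max(depth, up.index(r) + 1)
    return depth


if __name__ == "__main__":
    for i, word in enumerate(WORDS):
        print(word, "gap depth:", gap_depth(HEADS, i))
```

On this encoding, helpen gets gap depth 2 (Piet at depth 1, Marie at depth 2) and zwemmen gap depth 1, as in the text.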

Dependency structures and other grammatical formalisms
While most modern treebanks are based on dependencies, most grammatical theories are not.One early and fairly well-known result connecting dependency grammar to other grammatical formalisms is due to Gaifman (1965) and shows that projective dependency grammars, i.e. dependency grammars that allow no discontinuities, are weakly equivalent to context-free grammars.However, since the focus here is precisely on discontinuities, that result is of little value for us.
Multiple context-free grammars (MCFGs), also known as linear context-free rewriting systems, have emerged as a powerful tool to study complexity questions in the range of the Chomsky hierarchy between context-free grammars and full-blown context-sensitive grammars.Kuhlmann (2013) has established connections between dependency grammars and MCFGs which yield a close correspondence between the non-projectivity of the dependency trees admitted by a grammar on the one hand, and the parsing complexity of the grammar on the other.In the following, we briefly review these results as a background for what follows.
The MCFG formalism is a generalization of CFG which retains ordinary CFG productions for the expression of categorial structure, but uses explicit yield functions to compute the yield of the mother node from the yields of the daughters.In an ordinary CFG, yield computation is conflated with category formation: a rule such as DP → D NP says both that the category DP is formed of a D and an NP, and that the yield of the resulting DP is formed by concatenating the yields of D and NP.In effect, then, a CFG can be seen as an MCFG with concatenation as the only yield function.² To allow for greater expressivity, MCFG allows yields to be tuples of strings.For example, we may want to say that the yield of DP is a pair (2-tuple) consisting of the yields of D and NP.This pair will then be the input to further yield functions that apply to productions with DP on the right-hand side.More generally, we may allow yields to be n-tuples of strings.
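The difference between a CFG-style yield function and an MCFG yield function can be illustrated with yields modelled as tuples of strings. This is a minimal sketch: the function names, the choice of nullus as D and terror as NP, and the rule `wrap` that consumes a two-component DP are hypothetical illustrations, not rules from any actual grammar.

```python
from typing import Tuple

Yield = Tuple[str, ...]  # a yield is a tuple of strings, one per component


def concat(x: Yield, y: Yield) -> Yield:
    """CFG-style yield function for DP -> D NP: one component, D's string then NP's."""
    assert len(x) == 1 and len(y) == 1
    return (x[0] + " " + y[0],)


def pair(x: Yield, y: Yield) -> Yield:
    """MCFG-style yield function for DP -> D NP: keep the two yields apart as a 2-tuple."""
    assert len(x) == 1 and len(y) == 1
    return (x[0], y[0])


def wrap(v: Yield, dp: Yield) -> Yield:
    """Hypothetical rule with DP on its right-hand side: put V between DP's components."""
    assert len(v) == 1 and len(dp) == 2
    return (dp[0] + " " + v[0] + " " + dp[1],)


if __name__ == "__main__":
    d, np, v = ("nullus",), ("terror",), ("est",)
    print(concat(d, np))
    print(wrap(v, pair(d, np)))
```

With concatenation only, nothing can ever intervene between D and NP; keeping them as two components lets a later rule place the verb between them, yielding nullus est terror.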
For our purposes, it is important to note that there is a close correspondence between yield components in an MCFG and blocks in a corresponding dependency structure. We can extract MCFG rules from dependency trees, as shown in Kuhlmann (2013), where a formal exposition is given. Here I just provide an intuitive understanding of how the tree in Figure 1 gives rise to the rules in Table 1.

Table 1: Rules extracted from the tree in Figure 1
Looking at Paulo in Figure 1, we see that it has no dependents; hence the right-hand side of the first rule is a constant function which fixes the yield to the string Paulo, and similarly for nullus. For mihi, things are a bit more interesting: it takes an appos argument, and hence its yield depends on the yield of that argument. Concretely, the yield of the node mihi is computed by concatenating the string mihi with the yield of the appos argument, which is represented as x₁. Here we follow the convention of using x for the yield of the first argument and y for the yield of the second argument, and of subscripting those variables with an index that refers to a component of the yield. In this case, the yield of appos has only one component, so we use x₁. The node terror also takes an argument, a mod, but in this case the resulting yield has two components, one consisting of the yield of the mod and one consisting of the string terror. Finally, the verb takes two arguments, iobj and subj. Its yield is constructed by concatenating the yield of the iobj (i.e. x₁), the first component of the subj (i.e. y₁), the string est, and the second component of the subj (y₂).
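The rules just described can be transcribed as ordinary functions on tuples of strings. This is a sketch of the prose description, not a reproduction of Table 1's notation: the function names are illustrative, and the verb rule `est` is assumed to be the rule the text below calls j. Composing the functions bottom-up recovers the word order the rules imply.

```python
from typing import Tuple

Yield = Tuple[str, ...]  # x[0] plays the role of x1, y[1] of y2, and so on


def paulo() -> Yield:
    return ("Paulo",)  # constant function: no dependents


def nullus() -> Yield:
    return ("nullus",)  # constant function: no dependents


def mihi(x: Yield) -> Yield:
    # x is the yield of the appos dependent; it has a single component x1
    return ("mihi " + x[0],)


def terror(x: Yield) -> Yield:
    # two output components: the mod's yield, and the string "terror"
    return (x[0], "terror")


def est(x: Yield, y: Yield) -> Yield:
    # x = iobj (one component), y = subj (two components): x1 y1 est y2
    return (x[0] + " " + y[0] + " est " + y[1],)


if __name__ == "__main__":
    print(est(mihi(paulo()), terror(nullus())))
```

The subject's yield stays split in two until `est` consumes it, which is exactly the block structure of terror in the dependency tree.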
For our purposes, the primary interest of this construction lies in the fact that it provides a link between dependency treebanks and the required expressivity of corresponding grammars, as investigated in Kuhlmann (2013). On the one hand, the yield components correspond directly to blocks found in the treebanks. On the other hand, the complexity of an MCFG is easily read off its yield functions: the parsing complexity of a yield function equals the sum of the number of components in its input and output yields. For example, the parsing complexity of j in Table 1 is 4, as its two inputs have 1- and 2-component yields and it produces a 1-component yield. The result is a two-dimensional complexity hierarchy, as the complexity depends both on the number of arguments and on the number of yield components of these arguments. If all discontinuities are wellnested, however, we get a simple complexity hierarchy, because any wellnested MCFG can be binarized without increasing the gap degree. A wellnested discontinuity is one whose projection does not interleave with another, non-overlapping projection.³ (5) gives an example of an illnested discontinuity from Latin poetry.
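The complexity measure can be read off a dependency tree directly, assuming (as in Table 1) one extracted rule per node, whose arguments are that node's dependents, and taking the number of components of each yield to be the block degree of the corresponding node. The tree encoding below is an assumed reading of Figure 1 (mihi Paulo nullus est terror), and the function names are illustrative.

```python
from typing import List

# Assumed reading of Figure 1: heads[i] is the 0-based position of word i's head, -1 = root.
HEADS = [3, 0, 4, -1, 3]


def projection(heads: List[int], node: int) -> List[int]:
    """Positions in the reflexive, transitive closure of dominance, in linear order."""
    result = {node}
    changed = True
    while changed:
        changed = False
        for dep, head in enumerate(heads):
            if head in result and dep not in result:
                result.add(dep)
                changed = True
    return sorted(result)


def fan_out(heads: List[int], node: int) -> int:
    """Number of yield components = number of blocks in the node's projection."""
    positions = projection(heads, node)
    return 1 + sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)


def rule_complexity(heads: List[int], node: int) -> int:
    """Sum of the components of the rule's inputs (the dependents) and of its output."""
    deps = [d for d, h in enumerate(heads) if h == node]
    return fan_out(heads, node) + sum(fan_out(heads, d) for d in deps)


def grammar_complexity(heads: List[int]) -> int:
    """Highest rule complexity among the rules extracted from the tree."""
    return max(rule_complexity(heads, n) for n in range(len(heads)))
```

For the verb (position 3) this gives 1 + 1 + 2 = 4, the value stated for j; the rule for terror comes out at 3 and the rule for mihi at 2.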
(5) […]

The projections of the subject and the object do not overlap (neither dominates the other), but they interleave, producing an illnested discontinuity. MCFGs that generate only wellnested dependencies are called wellnested MCFGs. Since they can be binarized without increasing the gap degree, their parsing complexity is uniquely determined by their gap degree. We refer to an MCFG where no argument has more than k components in its yield as a k-MCFG.
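Illnestedness can be checked mechanically: two nodes whose projections are disjoint (so neither dominates the other) produce an illnested configuration if their positions interleave as a₁ < b₁ < a₂ < b₂. The sketch below tests every such pair. The schematic word order adjA adjB V nounA nounB (a 'golden line' shape) is a hypothetical toy tree, not a reproduction of (5); the second tree is an assumed reading of Figure 1.

```python
from itertools import combinations
from typing import List


def projection(heads: List[int], node: int) -> List[int]:
    """Positions in the reflexive, transitive closure of dominance, in linear order."""
    result = {node}
    changed = True
    while changed:
        changed = False
        for dep, head in enumerate(heads):
            if head in result and dep not in result:
                result.add(dep)
                changed = True
    return sorted(result)


def interleave(a: List[int], b: List[int]) -> bool:
    """True if some a1 < b1 < a2 < b2 (or the mirror image) with ai in a and bi in b."""
    labels = [lab for _, lab in sorted([(p, "a") for p in a] + [(p, "b") for p in b])]
    # Collapse runs of equal labels; four or more alternations means interleaving.
    runs = [lab for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1]]
    return len(runs) >= 4


def is_illnested(heads: List[int]) -> bool:
    """True if two nodes have disjoint but interleaving projections."""
    projs = [projection(heads, n) for n in range(len(heads))]
    for u, v in combinations(range(len(heads)), 2):
        if set(projs[u]).isdisjoint(projs[v]) and interleave(projs[u], projs[v]):
            return True
    return False


# Hypothetical toy trees (0-based heads, -1 = root).
GOLDEN = [3, 4, -1, 2, 2]        # adjA adjB V nounA nounB
FIGURE1 = [3, 0, 4, -1, 3]       # mihi Paulo nullus est terror
```

In GOLDEN the projections {adjA, nounA} and {adjB, nounB} interleave, so the tree is illnested; in FIGURE1 the only gap (est inside [nullus, terror]) is wellnested.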
There are several results linking linguistically motivated grammatical formalisms to MCFGs.For example, TAG is weakly equivalent to a wellnested 2-MCFG.The same result applies to 'classical' CCG and linear indexed grammars (Aho 1968), since those formalisms are weakly equivalent to TAG.However, modern lexicalized CCG (i.e. the current version where (restrictions on) the combinators are not grammar-specific but all linguistic variation is captured in the lexicon) is known to be strictly less powerful than TAG (Kuhlmann, Koller, et al. 2015).
The equivalence between wellnested 2-MCFGs and established grammatical formalisms takes on significance in the light of empirical investigations on dependency treebanks.For example, Kuhlmann (2013) shows that by restricting ourselves to wellnested trees of gap degree at most 1, i.e. trees describable by a wellnested 2-MCFG, we lose only between 0.1% (Arabic) and 0.9% (Turkish) of the trees in the CoNLL 2006 treebanks.This suggests that formalisms with the power of TAG are adequate for natural languages.Similar results have been reported by others and will also be shown below for the Universal Dependencies treebanks.But we will also see that Latin behaves in a crucially different way.

Complexity in LFG
Any LFG grammar that determines an upper bound on the number of c-structure nodes corresponding to a given f-structure (a so-called 'finite copy LFG') can be translated into a weakly equivalent MCFG. This gives us polynomial time parsing, because parsing with a (wellnested) k-MCFG can be done in time O(n^{3k}), where n is the length of the input string. But in the general case, parsing with an LFG grammar is NP-complete, as can be shown with a straightforward reduction from the 3SAT problem, i.e. the problem of determining whether a formula of propositional logic in conjunctive normal form, where each clause is limited to at most three literals, is satisfiable: we use c-structure rules to make sure each clause contains at least one true literal and use the f-structure to keep track of the assignment of truth values across clauses.⁴ It is worth pointing out that the universal recognition problem for MCFGs is also NP-complete because, although any given MCFG is a k-MCFG for some k, MCFG as a formalism does not bound that k. In other words, the difference between MCFGs and LFGs is that for any given (finite) instance of the 3SAT problem with n clauses, we can construct an MCFG that solves it, whereas we can write a single, general LFG that can solve any instance of the 3SAT problem.
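Instantiating the O(n^{3k}) bound for small k relates it to familiar parsing results. Under the complexity measure introduced earlier, a binarized rule whose two inputs and output each have k components has complexity k + k + k = 3k, which is where the exponent comes from. For k = 1 this is the cubic bound of CKY-style context-free parsing, and for k = 2, the wellnested case equivalent to TAG, it is the O(n^6) bound usually cited for TAG parsing.

```latex
\[
  T_{k}(n) = O\!\left(n^{3k}\right):\qquad
  T_{1}(n) = O\!\left(n^{3}\right)\ \text{(CFG)},\qquad
  T_{2}(n) = O\!\left(n^{6}\right)\ \text{(TAG)},\qquad
  T_{3}(n) = O\!\left(n^{9}\right).
\]
```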
If we think of the relations between different instances of the same literals in a 3SAT problem as analogues to discontinuous dependencies in linguistics, this means an LFG grammar can deal with an unbounded number of discontinuous dependencies across unbounded distances.We can ask ourselves whether there is any need for the expressivity that LFG gives us.As we will see in section 3.1, the answer is from one point of view negative: we can get extremely good coverage on existing dependency treebanks with a relatively low bound on the discontinuities.Nevertheless, it is worth making the point that the extra expressivity provides for extra linguistic insight.We will now show that this point holds even as we move up the complexity ladder from NP-complete to undecidable.
Undecidability was not a property of LFG originally. While unification grammars in general are Turing-equivalent and hence have an undecidable parsing problem, Kaplan and Bresnan (1982) avoided undecidability by restricting valid derivations as in (6).
(6) A c-structure derivation is valid if and only if no category appears twice in a nonbranching dominance chain, no nonterminal exhaustively dominates an optionality ϵ, and at least one lexical item or controlled e appears between two optionality ϵ's derived by the same rule element.
By disallowing repeated categories in nonbranching dominance chains, this constraint ensures that for any string the size and number of c-structure derivations is bounded as a function of the length of the string. The constraint seems well-motivated: after all, what could be the linguistic motivation for derivations in which, e.g., some NP dominates another NP in a nonbranching structure? As it turns out, such structures can be motivated. In Bresnan, Kaplan, et al. (1982), it was argued that cross-serial dependencies in Dutch cannot be given a linguistically motivated analysis in a context-free grammar. Instead, the authors proposed to give the sentence in (4) the c-structure in Figure 3. This c-structure does not directly capture the object relation between Piet and zag or between Marie and helpen. Instead, the relationship is captured with functional annotations on the VP and V nodes which 'match up' the two branches in the f-structure and give the correct grammatical relations. So, Marie is the object of helpen by virtue of being embedded under the same number of VP nodes as helpen is under V nodes. This kind of analysis is based on what Maxwell and Kaplan (1996) call 'zipper unification'.

⁴ See for example Francez and Wintner (2012, pp. 241-243) for details of the construction.
However, as pointed out by Johnson (1986), this analysis actually leads to nonbranching dominance chains in cases where intermediate verbs in the structure are intransitive, as in (7) with the c-structure in Figure 4. So, if we want to keep the analysis from Bresnan, Kaplan, et al. (1982), we must give up the offline parsability constraint and hence the decidability of the LFG formalism. On the other hand, an alternative analysis was also proposed (Zaenen and Kaplan 1995), where NPs inside VP get the functional uncertainty annotation (↑ xcomp* obj) = ↓ rather than just (↑ obj) = ↓. From a linguistic point of view, there are several problems with this analysis. First, it is unclear how we can ever provide a principled structure-function mapping if we allow non-local GF assignments like this. Second, in order to capture the word order facts, we need complex f-precedence constraints.
And in fact, what happened in this case is that the analysis of Bresnan, Kaplan, et al. (1982) is still well known and cited, whereas the alternative analysis based on nonlocal GF assignment and functional precedence is more or less forgotten. Both the first and the second edition of Bresnan's LFG textbook (Bresnan 2001; Bresnan, Asudeh, et al. 2015) include exercises that ask the student to reproduce Bresnan, Kaplan, et al. (1982), even though a generalization of this analysis to intransitive verbs (not used in the exercise) would not be LFG as defined in Kaplan and Bresnan (1982). In other words, while 'intuitive' is a subjective notion, history lends some justification to the claim that the original analysis is more intuitive than the later one.
There are some lessons we can draw from this.First, the ban on nonbranching dominance chains looks stipulative: it can be removed from the definition of LFG without changing anything else, albeit at the cost of undecidability.An analysis like that in Figures 3 and 4 does not 'feel' substantially un-LFG-like.Second, the original analysis seems linguistically more informative than the alternative in that it captures the word order generalizations in an intuitive way while preserving locality of GF assignment.Again, this is subjective, but the fact that the analysis gets cited and is used in textbooks shows that the intuition is widespread.
Taken together, these two observations suggest that a more expressive grammatical formalism can lead to more linguistically adequate analyses -even if those analyses do not actually exploit that expressivity in a crucial way.In our case, the problem with unary branching dominance chains is that there will be no upper bound on the length of the unary VP chain in Figure 4.But chains of unbounded length are of course not crucial to the analysis.We only need VP-VP chains of a length corresponding to the number of consecutive intransitive verbs in the V-chain.For practical purposes, 5 will be more than sufficient.And even from a theoretical perspective, it is not clear that banning any category α from dominating five instances of α in a nonbranching dominance chain is any more objectionable than banning it from dominating a single instance, as Bresnan and Kaplan did with (6).
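The trade-off between the ban in (6) and a brute-force bound can be sketched as a check on c-structure trees. This models only the first clause of (6) (repeated categories in a nonbranching chain), not the clauses about optionality ϵ; the tree encoding, the parameter `max_repeats` and the toy tree (which is not a reproduction of Figure 4) are all hypothetical. With `max_repeats=1` a category may occur only once per chain, as in (6); larger values implement the kind of relaxed, numerical bound discussed above.

```python
from collections import Counter
from typing import List, Tuple

# A c-structure node is (category, daughters); terminal words are omitted.
Node = Tuple[str, List["Node"]]


def chain_counts(node: Node) -> Counter:
    """Category counts along the nonbranching chain that starts at `node`."""
    category, daughters = node
    counts = Counter({category: 1})
    if len(daughters) == 1:  # the chain continues only through a single daughter
        counts += chain_counts(daughters[0])
    return counts


def max_chain_repeats(node: Node) -> int:
    """Most occurrences of one category in any nonbranching chain of the tree."""
    _, daughters = node
    best = max(chain_counts(node).values())
    for daughter in daughters:
        best = max(best, max_chain_repeats(daughter))
    return best


def respects_bound(tree: Node, max_repeats: int = 1) -> bool:
    """max_repeats=1 mirrors the first clause of (6); larger values relax it."""
    return max_chain_repeats(tree) <= max_repeats


# Toy tree with a unary VP-VP chain of the kind that intransitive verbs give rise to.
TREE: Node = ("S", [("NP", [("N", [])]), ("VP", [("VP", [("V", [])])])])
```

The toy tree violates the original constraint (VP occurs twice in one chain) but passes once repetitions are allowed up to a small fixed number.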

Quantitative data
Let us now have a look at how discontinuities actually distribute in Latin treebanks.To be able to compare across languages we use the Universal Dependencies (UD) corpora,⁵ in particular the version 2 release.This dataset contains three Latin treebanks, the Perseus treebank (Bamman and Crane 2011), the PROIEL treebank (Haug and Jøhndal 2008) and the Index Thomisticus Treebank (Martens and Passarotti 2014).
Table 2 shows the distribution of gap degree and depth across all languages in the UD corpora.⁶ As we can see, the vast majority of edges, 97.3%, are projective. Still, the number of non-projective edges is high enough that we need to be able to deal with them in parsing. But at least from a practical standpoint, we can ignore everything but the simplest type of gap: restricting ourselves to edges of gap degree and depth ≤ 1 yields a coverage of 99.7%.
When we get to Latin, the picture is different. First of all, the number of simple (gap degree 1) non-projectivities is much higher: 9.1% of edges. More interesting is the fact that 0.4% of edges have gap degree 2 and thus reflect dependencies that cannot be captured in a TAG (or, a fortiori, in a CCG). This becomes clearer if we think about tree coverage, as shown in Table 4 for a selection of treebanks.⁷ Here we see that by restricting ourselves to trees where the highest gap degree is 1, we lose 1.8% of the trees in the PROIEL treebank, compared to zero loss in the Norwegian Bokmål treebank and 0.3% loss in the Czech treebank. Overall in the UD treebanks, 0.6% of trees contain an edge of gap degree 2, but it is worth pointing out that almost three quarters of these trees are found in one of the Ancient Greek and Latin treebanks, which only make up roughly a tenth of the trees. So there clearly is something special about these languages.

Table 3: Gap degree and depth in the UD 2.0 Latin-PROIEL treebank
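Tree coverage of the kind reported in Table 4 can be computed from CoNLL-U files along the following lines. This is a minimal sketch, not the script behind the reported numbers: the reader handles only the HEAD column, skips multiword-token ranges and empty nodes, and does no validation, so a dedicated CoNLL-U library would be more robust on real UD data. The three toy sentences (Figure 1 as assumed above, a projective clause, and a hypothetical gap-degree-2 tree with placeholder words) are illustrative.

```python
from typing import Iterator, List


def read_heads(conllu_text: str) -> Iterator[List[int]]:
    """Yield one list of 0-based heads (-1 = root) per sentence in CoNLL-U text."""
    heads: List[int] = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if heads:
                yield heads
                heads = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes are not tree nodes
        heads.append(int(cols[6]) - 1)  # HEAD is the 7th column; 0 marks the root
    if heads:
        yield heads


def projection(heads: List[int], node: int) -> List[int]:
    """Positions in the reflexive, transitive closure of dominance, in linear order."""
    result = {node}
    changed = True
    while changed:
        changed = False
        for dep, head in enumerate(heads):
            if head in result and dep not in result:
                result.add(dep)
                changed = True
    return sorted(result)


def tree_gap_degree(heads: List[int]) -> int:
    """Highest number of gaps in any node's projection."""
    def gaps(p: List[int]) -> int:
        return sum(1 for a, b in zip(p, p[1:]) if b != a + 1)
    return max(gaps(projection(heads, n)) for n in range(len(heads)))


def coverage(trees: List[List[int]], max_gap_degree: int = 1) -> float:
    """Share of trees whose highest gap degree does not exceed max_gap_degree."""
    if not trees:
        return 0.0
    kept = sum(1 for heads in trees if tree_gap_degree(heads) <= max_gap_degree)
    return kept / len(trees)


def _row(idx: int, form: str, head: int) -> str:
    """Build a CoNLL-U line with only ID, FORM and HEAD filled in."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), "_", "_", "_"])


SAMPLE = "\n\n".join(
    "\n".join(_row(i + 1, form, head) for i, (form, head) in enumerate(sentence))
    for sentence in [
        [("mihi", 4), ("Paulo", 1), ("nullus", 5), ("est", 0), ("terror", 4)],
        [("terror", 2), ("est", 0)],
        [("w1", 5), ("w2", 4), ("w3", 5), ("w4", 0), ("w5", 4)],
    ]
) + "\n"
```

On the toy sample the tree gap degrees are 1, 0 and 2, so restricting to gap degree 1 keeps two of the three trees.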
Finally, we look at the illnestedness numbers in Table 5. As has been observed several times in the literature, wellnestedness is a strong constraint on discontinuities in most languages. We see that this constraint is strong also in the PROIEL corpus of Latin (and Greek), but not in the Perseus treebanks. As with non-projective dependencies in general, this is likely due to the large portions of poetry in these treebanks. In (5) we saw an example of an illnested dependency from Vergil. And this was in fact no accident: "… Verb placed in the midst, it is called a Golden Verse." It is not clear whether Latin poets in fact preferred illnested dependencies for their own sake, or whether their frequency results from other, conspiring factors. Whatever the motivation, it is interesting that the poets regularly produced these illnested structures, which are so rare in prose.
The numbers reported in this section are based on data converted to Universal Dependencies. To my knowledge, there is no in-depth study of discontinuity based on the original Perseus or PROIEL data for Latin, but there is a study on Greek (Mambrini and Passarotti 2013), which finds only 25.2% projective trees, compared to 37.6% in the UD version of the same treebank. It should be noted that, due to conversion problems, the UD conversion only includes a subset of the original treebank. One possibility is that the conversion script was particularly likely to fail on non-projective structures, which would explain why the projectivity rate is higher in the converted UD data. The rate of illnestedness is also lower in the UD version, at 1.5% versus 2.6% in the original.

Examples
Let us now have a closer look at some examples of the discontinuities we find in Latin. An important first observation is that a large number of them arise from second-position clitics, which normally appear after the first prosodic word, even if that breaks up a syntactic constituent. The frequency of this phenomenon contributes to the number of gap degree 2 trees, since it is then enough to have one other gap resulting from some other process. An example of this is (8). In this case, we have a normal long-distance dependency, where eo die has been displaced out of the embedded clause aliquid actum in senatu, resulting in one gap. When the clitic then lands inside the fronted constituent, we get a second gap. Such examples are controversial as illustrations of the syntactic complexity of a language, since it is not clear to what extent clitic positioning in Latin is syntactically conditioned: prosodic factors are clearly also important. From a parsing perspective, however, we need to have some way of dealing with clitics, so a more reasonable objection may be that the set of clitic strings is finite, i.e. there is only a finite number of clitics and licit combinations of clitics that can occur in the position of autem in (8). Therefore, we can deal with them without using the full power of a formalism that can derive syntactic discontinuities.⁸ However, trees of gap degree 2 are by no means restricted to those where clitics account for one of the gaps. Example (9) shows a discontinuous NP multa … genera ferarum, with an extraposed relative clause.

(9) […]
'It is certain that many kinds of beasts are born in it which have not been seen in other places' (Caes. Gal. 6.25.5)

(10) shows another example, where we get gap degree 2 because the genitive is displaced from its head noun at the same time as the wh-word quantam is fronted alone.
When it comes to illnestedness, we saw in the previous section that examples are extremely rare in the PROIEL treebank. Nevertheless, it is worth pointing out that the ones that do occur look perfectly 'natural', in the sense that it is hard to come up with alternative analyses that make linguistic sense and capture the sentence structure without an illnested dependency. (12) shows an example where the subject appears inside the object NP, at the same time as there is an extraposed relative clause belonging to it. Taken together with the metrical data discussed in section 3.1, this suggests that illnestedness is not ungrammatical in Latin, although it is clearly strongly dispreferred (in prose).

⁸ An approach based on MCFGs can still be more perspicuous and insightful from a linguistic point of view; see Goldstein and Haug (2016). Even so, it is likely that such a grammar could be 'compiled' into a computationally more tractable grammar by exploiting the finiteness of the set of clitic strings.

So how complex is Latin really?
We can clearly conclude that Latin is not a tree-adjoining language. As Table 4 shows, there are simply too many trees of gap degree > 1, and examples such as (8)-(11) show that these arise through combinations of well-established processes of Latin grammar.
However, Table 4 also shows that we do not really need the ability of LFG to transport unbounded amounts of features across unbounded distances in the tree to capture the data found in Latin treebanks: trees of gap degree 3 are already very rare. Nevertheless, they arise through well-defined grammatical processes (and are not, for example, artefacts of the annotation scheme). It is therefore hard to see how a principled theoretical upper bound on gap degree in Latin could be defined.
In a way, the situation is analogous to what we see in center-embedded recursion. We can deal with finite levels of center-embedded recursion in a regular (finite state) grammar by adding states to the automaton. And center-embedding of more than three levels turns out to be nonexistent in corpora (Karlsson 2007), so for practical purposes, the finite state approach could work. But linguists prefer context-free grammars, both because it is hard or impossible to define a theoretical upper bound on the levels of center-embedding and, crucially, because analyses cast in terms of a CFG are linguistically more perspicuous. A similar argument applies, I contend, to syntactic discontinuities: although we could deal with them in practical terms (at least when we confine our attention to the texts in the existing Latin treebanks) by adopting a k-MCFG as our formalism for some (quite small) k, it is hard to argue theoretically for any particular k, and, as we have already seen, analyses that are cast in more expressive formalisms can turn out to be more intuitive. In other words, we can adapt Harris' argument for assuming infinite levels of center-embedding to unbounded discontinuous dependencies: fixing a k is a "highly arbitrary and numerical condition" that has no place in linguistic theory. In that respect, 2 (the number that, restricted to wellnested dependencies, would give us the expressive power of TAG or classical CCG) is no different from any other number. This gives us an argument for adopting LFG as a formalism even if that is expressive overkill in practical terms.⁹

And although LFG does not provide an obvious way of restricting discontinuities, we will see that it does provide a way of analyzing them that gives us a natural metric for discontinuity complexity, in the form of the number of reentrancies they require. Consider first the mock Latin sentence in (13). In LFG terms, this can be analyzed with the c- and f-structure in Figure 5. A characteristic feature of this analysis is that the ϕ mapping from maximal projections (S and NP) is injective: there are no reentrancies, i.e. no distinct maximal projections mapping to the same f-structure. Now consider what happens if we permute trusit and bonum to yield a discontinuous c-structure. If we want to avoid non-local assignment of grammatical functions, the obvious way to achieve this is by using a c-structure embedding as in Figure 6. This introduces a reentrancy: for this c-structure to yield the correct f-structure (namely the same as in Figure 5), we must make sure that NP₄ and NP₈ map to the same f-structure, i.e. gf on both these nodes must be resolved to the same grammatical function.¹⁰ In other words, the syntactic discontinuity is mirrored by structural complexity in the form of a reentrancy. Obviously, if we had yet another dependent of Fredericum that was discontinuous from bonum, so that we had a gap degree 2 discontinuity, we would need another reentrancy to capture that.
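The reentrancy metric can be stated as a count over the ϕ mapping: every maximal projection beyond the first that maps to a given f-structure adds one reentrancy. In the sketch below the mappings are hypothetical: node names follow the figures only where the text names them (NP₄, NP₈, NP₅, NP₉, NP₁₀), the remaining maximal projections are omitted, and the f-structure labels f, g, h are illustrative.

```python
from collections import Counter
from typing import Dict


def reentrancies(phi: Dict[str, str]) -> int:
    """Count maximal projections beyond the first that map to the same f-structure."""
    counts = Counter(phi.values())
    return sum(n - 1 for n in counts.values())


# Hypothetical partial phi mappings from maximal projections to f-structure labels.
INJECTIVE = {"S": "f", "NP": "g"}                                      # cf. Figure 5
FIG6 = {"S": "f", "NP4": "g", "NP8": "g"}                              # one shared f-structure
FIG7 = {"S": "f", "NP4": "g", "NP9": "g", "NP5": "h", "NP10": "h"}     # two shared
```

An injective mapping yields 0, the single discontinuity of Figure 6 yields 1, and the deeper gap of Figure 7, where NP₄ pairs with NP₉ and NP₅ with NP₁₀, yields 2.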
Now observe what happens if we have a deeper gap, as in (14). This yields the c-structure in Figure 7. The extra depth of the discontinuity yields an extra reentrancy compared with the otherwise similar discontinuity in Figure 6: to get the correct f-structure from Figure 7, we must map NP₄ and NP₅ to the same f-structures as NP₉ and NP₁₀, respectively.

The c-structure in Figure 7 clearly violates the principle in (6). Nevertheless, it is an attractive analysis compared with, say, one where AdjP₆ is directly embedded under S and annotated with (↑ gf⁺): the latter would introduce non-local constraints, whereas Figure 7 wears its complexity on its sleeve, in the form of the length of the unit branch under NP₄. Like the analysis of Dutch cross-serial dependencies, this relies on zipper unification. As pointed out by Maxwell and Kaplan (1996, p. 24), zippers introduce computational complexity because the depth of the f-structures that must be unified can grow as a function of the length of the sentence. (In fact, because we allow a cyclic unit branch, it can grow even without the sentence increasing in length.) But this is really a practical problem, and as such it allows a practical solution, namely a brute-force bound on the length of zippers. This is where the treebank data become interesting, for they suggest that this bound can be set quite low.
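The idea of a brute-force bound can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions: f-structures are plain nested dictionaries without structure sharing, the bound on zipper length is approximated as a cap (`max_depth`) on how many levels of nested f-structures unification may descend into, and the names `unify`, `ZipperTooLong` and `UnificationFailure` are hypothetical rather than taken from any LFG implementation.

```python
from typing import Any


class UnificationFailure(Exception):
    """Raised when two f-structures carry clashing atomic values."""


class ZipperTooLong(Exception):
    """Raised when unification would descend past the brute-force depth bound."""


def unify(a: Any, b: Any, max_depth: int, depth: int = 0) -> Any:
    """Unify two acyclic f-structures (nested dicts or atomic values).

    Nested f-structures are only unified while depth < max_depth; deeper
    unification is refused, standing in for a hard bound on zipper length.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        if depth >= max_depth:
            raise ZipperTooLong(f"unification depth would exceed {max_depth}")
        merged = dict(a)  # copy so that neither input is modified
        for attr, value in b.items():
            if attr in merged:
                merged[attr] = unify(merged[attr], value, max_depth, depth + 1)
            else:
                merged[attr] = value
        return merged
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} does not unify with {b!r}")


# Two partial descriptions of the same object f-structure (hypothetical values).
upper = {"OBJ": {"PRED": "'Fredericus'"}}
lower = {"OBJ": {"CASE": "acc"}}

print(unify(upper, lower, max_depth=2))
# {'OBJ': {'PRED': "'Fredericus'", 'CASE': 'acc'}}

try:
    unify(upper, lower, max_depth=1)
except ZipperTooLong as err:
    print(err)  # unification depth would exceed 1
```

On the assumption that the depth cap tracks zipper length, the treebank observation above suggests that a small `max_depth` would suffice in practice; raising it admits deeper discontinuities at the cost of more unification work.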

Conclusion and challenge
In sum, we have seen that extant Latin treebanks display syntactic discontinuities that require us to go beyond the capacity of well-known mildly context-sensitive grammar formalisms such as CCG and TAG. It has already been argued on theoretical grounds that these formalisms cannot capture data such as German scrambling (Becker et al. 1992). But as pointed out by Kuhlmann (2013), formalisms (weakly) equivalent to TAG still have very good coverage on treebanks. That, however, is not the case in Latin (and still less so in Ancient Greek), which confirms the inadequacy of TAG on actual treebank data. From a theoretical point of view, this means that trees of gap degree 1 have no privileged status. Rather, the corpus data suggest that gap degrees (and depths) have a Zipfian distribution, with frequency dropping quickly beyond 1. So there is no theoretical reason to stay with k-MCFGs for any fixed k. And in fact we have seen that although LFG parsing is intractable, the formalism reflects the complexity of syntactic discontinuities in a rather nice and intuitive way, paving the way for empirical studies on how much of the theoretically desired expressivity is actually needed for practical purposes.
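Gap degree, the measure tallied in Table 4, can be computed directly from a dependency tree. The sketch below assumes the standard definition from the dependency-parsing literature (the gap degree of a node is the number of gaps in its projection, and the gap degree of a tree is the maximum over its nodes) and a CoNLL-U-style encoding in which `heads[i]` is the 1-based head of token i + 1 and 0 marks the root. The function names and toy trees are hypothetical and not drawn from any treebank.

```python
from collections import Counter


def projections(heads: list[int]) -> dict[int, set[int]]:
    """Map each token (1-based) to its projection: itself plus all its descendants.

    Assumes heads encodes a well-formed tree (no cycles); otherwise the
    upward walk below would not terminate.
    """
    proj = {tok: {tok} for tok in range(1, len(heads) + 1)}
    for tok in range(1, len(heads) + 1):
        head = heads[tok - 1]
        while head != 0:  # walk up to the root, adding tok to every ancestor
            proj[head].add(tok)
            head = heads[head - 1]
    return proj


def count_gaps(positions: set[int]) -> int:
    """Number of gaps, i.e. breaks between consecutive positions in a yield."""
    ordered = sorted(positions)
    return sum(1 for left, right in zip(ordered, ordered[1:]) if right - left > 1)


def gap_degree(heads: list[int]) -> int:
    """Gap degree of a tree: the maximum number of gaps in any node's projection."""
    return max((count_gaps(p) for p in projections(heads).values()), default=0)


# Toy trees (hypothetical word orders, one token per position):
projective = [2, 0, 2]      # head in the middle, both dependents adjacent
one_gap = [2, 0, 1]         # e.g. Fredericum trusit bonum, bonum depending on Fredericum
two_gaps = [3, 0, 2, 2, 3]  # token 3 has dependents at positions 1 and 5

treebank = [projective, one_gap, two_gaps]
print([gap_degree(t) for t in treebank])                         # [0, 1, 2]
print(sorted(Counter(gap_degree(t) for t in treebank).items()))  # [(0, 1), (1, 1), (2, 1)]
```

Running `gap_degree` over every sentence of a treebank and counting the results gives figures of the kind summarized in Table 4.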
One challenge remains: we have seen that illnested discontinuities are strongly dispreferred in most treebanks, with the exception of Latin poetry. But unlike gap degree and depth, illnestedness does not correspond to any complexity in the LFG formalism. In other words, LFG as it currently stands lacks the theoretical resources to express the strong dispreference that we observe in corpora.
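Illnestedness, by contrast, is easy to test mechanically on a tree. The sketch below assumes the standard definition: a tree is illnested if two nodes, neither of which dominates the other, have projections that interleave, i.e. there are positions l₁ < l₂ < r₁ < r₂ with l₁, r₁ in one projection and l₂, r₂ in the other. Trees are encoded CoNLL-U style (`heads[i]` is the 1-based head of token i + 1, with 0 for the root); the function names and example word orders are hypothetical.

```python
def projections(heads: list[int]) -> dict[int, set[int]]:
    """Map each token (1-based) to itself plus all its descendants (assumes a tree)."""
    proj = {tok: {tok} for tok in range(1, len(heads) + 1)}
    for tok in range(1, len(heads) + 1):
        head = heads[tok - 1]
        while head != 0:
            proj[head].add(tok)
            head = heads[head - 1]
    return proj


def interleave(a: set[int], b: set[int]) -> bool:
    """For disjoint a and b: True if some l1 < l2 < r1 < r2 has l1, r1 in a and l2, r2 in b."""
    labels = ["a" if pos in a else "b" for pos in sorted(a | b)]
    pattern = ["a", "b", "a", "b"]
    matched = 0
    for label in labels:  # greedy subsequence match of a, b, a, b
        if label == pattern[matched]:
            matched += 1
            if matched == len(pattern):
                return True
    return False


def is_wellnested(heads: list[int]) -> bool:
    """A tree is wellnested iff no two disjoint projections interleave."""
    proj = projections(heads)
    for u in proj:
        for v in proj:
            # In a tree, projections are either nested or disjoint; disjoint
            # projections belong to nodes that do not dominate each other.
            if u != v and proj[u].isdisjoint(proj[v]) and interleave(proj[u], proj[v]):
                return False
    return True


# Hypothetical word orders:
print(is_wellnested([2, 0, 1]))        # True: discontinuous (gap degree 1) but wellnested
print(is_wellnested([4, 5, 0, 3, 3]))  # False: adj1 adj2 verb noun1 noun2, interlaced
```

Applied to every tree in a treebank, such a check yields counts of the kind summarized in Table 5; note that nothing in the check corresponds to an extra reentrancy, which is the gap the challenge above points to.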

Figure 7: C-structure for (14)

… is interrupted by est. Alternatively, we may say that terror has block …

Table 4: Trees by gap degree in selected UD treebanks

Table 5: Wellnestedness

… but a so-called 'golden line', a rhetorical pattern first discovered by Edward Burles in 1652: "If the Verse does consist of two Adjectives, two Substantives and a Verb only, the first Adjective agreeing with the first Substantive, the second with the second, and the …

… With the camp fortified, he left two legions and a part of the auxiliaries there.' (Caes. Gal. 1.49.4)

Finally, since gap degree 2 examples arise naturally even without clitics, there are examples where a clitic intrudes into an otherwise degree 2 discontinuity, yielding gap degree 3. There are also a few gap degree 3 examples without clitics. We refrain from showing examples here, as they inevitably get quite complex.