Lexicographical order

From Indonesian Wikipedia, the free encyclopedia
[Figure: Orderings of the 3-subsets of {1, ..., 6} (and the corresponding binary vectors). When the (blue) triples are in lex order, the (red) vectors are in revlex order, and vice versa. The arrangements on the right side show colex and revcolex order.]

In mathematics, the lexicographic order (also known as lexical order or alphabetical order) is a generalization of the alphabetical order of words, which is based on the order of their letters, starting from the first.

Definition

Given two partially ordered sets A and B, the lexicographical order on the Cartesian product A × B is defined as

(a,b) ≤ (a′,b′) if and only if a < a′ or (a = a′ and b ≤ b′).

The result is a partial order. If A and B are totally ordered, then the result is a total order as well.
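The definition above can be sketched in a few lines of Python. This is only an illustration: the names `lex_leq` and `strictly_divides` are assumptions introduced here, and divisibility on positive integers is chosen merely as a convenient example of a partial order on A.

```python
from operator import le


def strictly_divides(x, y):
    """Strict divisibility order on positive integers (a partial order)."""
    return x != y and y % x == 0


def lex_leq(p, q, lt_a, leq_b):
    """(a, b) <= (a2, b2) iff a < a2, or a == a2 and b <= b2."""
    (a, b), (a2, b2) = p, q
    return lt_a(a, a2) or (a == a2 and leq_b(b, b2))


# 2 strictly divides 4, so the second components are never consulted.
print(lex_leq((2, 5), (4, 1), strictly_divides, le))  # True
# 2 and 3 are incomparable under divisibility, so the pairs are incomparable too.
print(lex_leq((2, 5), (3, 1), strictly_divides, le))  # False
print(lex_leq((3, 1), (2, 5), strictly_divides, le))  # False
```

When both components are totally ordered by Python's built-in `<`, ordinary tuple comparison such as `(1, 9) <= (2, 0)` already behaves this way.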

More generally, one can define the lexicographic order on the Cartesian product of n ordered sets, on the Cartesian product of a countably infinite family of ordered sets, and on the union of such sets.


Motivation and uses

The name of the lexicographic order comes from its generalizing the order given to words in a dictionary: a sequence of letters (that is, a word)

a_1 a_2 ... a_k

appears in a dictionary before a sequence

b_1 b_2 ... b_k

if and only if, at the first position i where a_i differs from b_i, a_i comes before b_i in the alphabet.

That comparison assumes both sequences are the same length. To ensure they are the same length, the shorter sequence is usually padded at the end with enough "blanks" (a special symbol that is treated as coming before any other symbol). This also allows ordering of phrases. For the purpose of dictionaries, etc., padding with blank spaces is always done. See alphabetical order.

For example, the word "Thomas" appears before "Thompson" in dictionaries because the letter 'a' comes before the letter 'p' in the alphabet. The 5th letter is the first that is different in the two words; the first 4 letters are "Thom" in both. Because it is the first difference, the 5th letter is the most significant difference (for an alphabetical ordering).
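The comparison with padding can be sketched as follows. The helper names (`padded`, `first_difference`, `comes_before`) are hypothetical, and the empty string is used as the "blank" on the assumption that it should sort before every letter, which is how Python compares strings.

```python
BLANK = ""  # the empty string compares below every one-character string in Python


def padded(word, length):
    """Symbols of `word`, right-padded with BLANK up to `length`."""
    return [word[i] if i < len(word) else BLANK for i in range(length)]


def first_difference(u, v):
    """Return (1-based position, symbol of u, symbol of v) at the first mismatch."""
    n = max(len(u), len(v))
    for i, (x, y) in enumerate(zip(padded(u, n), padded(v, n)), start=1):
        if x != y:
            return i, x, y
    return None  # the two words are identical


def comes_before(u, v):
    """True if u precedes v in dictionary order."""
    diff = first_difference(u, v)
    return diff is not None and diff[1] < diff[2]


print(first_difference("Thomas", "Thompson"))  # (5, 'a', 'p')
print(comes_before("Thomas", "Thompson"))      # True
print(comes_before("Thom", "Thomas"))          # True: blank sorts before 'a'
```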

A lexicographical ordering may not coincide with conventional alphabetical ordering. For example, the numerical order of Unicode codepoints does not always correspond to traditional alphabetic orderings of the characters, which vary from language to language. So the lexicographic ordering induced by codepoint value sorts strings in an unambiguous canonical order, but it does not necessarily "alphabetize" them in the conventional sense.
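A short illustration of this mismatch, using Python's built-in string comparison, which orders strings by code point. The word list is made up for the example.

```python
# "\u00e9" is the precomposed letter e with acute accent (code point 233).
words = ["zebra", "Apple", "apple", "Zebra", "\u00e9clair"]

# Code point order: all uppercase ASCII letters precede all lowercase ones,
# and the accented letter follows 'z'.
by_codepoint = sorted(words)
print(by_codepoint)

# Case folding removes the upper/lower split, but the accented word still
# sorts after "zebra"; language-specific collation rules would be needed
# to place it among the words starting with "e".
by_casefold = sorted(words, key=str.casefold)
print(by_casefold)
```

Both results are unambiguous and reproducible, but neither matches what a reader of a French or English dictionary would expect.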

An important property of the lexicographical order is that it preserves well-orders, that is, if A and B are well-ordered sets, then the product set A × B with the lexicographical order is also well-ordered.

An important exploitation of lexicographical ordering is expressed in the ISO 8601 date formatting scheme, which expresses a date as YYYY-MM-DD. This date ordering lends itself to straightforward computerized sorting of dates such that the sorting algorithm does not need to treat the numeric parts of the date string any differently from a string of non-numeric characters, and the dates will be sorted into chronological order. Note, however, that for this to work, there must always be four digits for the year, two for the month, and two for the day, so for example single-digit days must be padded with a zero yielding '01', '02', ..., '09'.
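The effect of the zero padding can be checked with a small sketch. The sample dates are invented for illustration; `date.fromisoformat` from the standard library is used only to obtain the true chronological order for comparison.

```python
from datetime import date

iso_dates = ["2021-10-05", "2021-09-30", "1999-12-31", "2021-01-02"]

# Plain string sorting, with no knowledge that the strings are dates.
as_strings = sorted(iso_dates)
# Chronological sorting of the parsed dates.
as_dates = sorted(iso_dates, key=date.fromisoformat)
print(as_strings == as_dates)  # True

# Without zero padding the string order no longer matches the calendar:
# "2021-10-5" sorts before "2021-9-30" because '1' < '9'.
unpadded = ["2021-10-5", "2021-9-30", "1999-12-31", "2021-1-2"]
print(sorted(unpadded))
```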

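Another example is fixed-width digit strings: because every string has the same number of digits, the lexicographic order 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, ..., 200, 201, 202, etc. coincides with the numerical order.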

Another generalization of lexical ordering occurs in social choice theory (the theory of elections). Consider an election in which there are 4 candidates A, B, C and D, each voter expresses a top-to-bottom ordering of the candidates, and the voters' orderings are as follows:

Voters   18%  17%  33%  32%
1st      A    B    C    D
2nd      B    A    D    B
3rd      C    C    A    A
4th      D    D    B    C

The MinMax voting method is a simple Condorcet method that counts the votes as in a round-robin tournament (all possible pairings of candidates) and judges each candidate according to its largest "pairwise" defeat. The winner is the candidate whose largest defeat is the smallest. In the example:

  • The largest defeat of A is by D: 65% (33%+32%) rank D over A.
  • The largest defeat of B is by D: 65% (33%+32%) rank D over B.
  • The largest defeat of C is by A (or B): 67% (18%+17%+32%) rank A over C (and B over C).
  • The largest defeat of D is by C: 68% (18%+17%+33%) rank C over D.

MinMax declares a tie between A and B since the largest defeats for both are the same size, 65%. This is like saying "Thomas" and "Thompson" should be at the same position because they have the same first letter. However, if the defeats are compared lexically, we have the MinLexMax method. With MinLexMax, because the largest defeats of A and B are the same size, their next largest defeats are then compared:

  • A's next largest defeat is 0%. (This is padding, since A has only one defeat.)
  • B's next largest defeat is by A: 51% (18%+33%) rank A over B.

Since B's next largest defeat is larger than A's, MinLexMax elects A, which makes more sense than the MinMax tie since a majority rank A over B.
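The MinMax and MinLexMax comparisons for this example can be reproduced with the following sketch. The function names are assumptions made for illustration, and the code is not a general election implementation: it ignores pairwise ties, which do not occur in this example.

```python
# Ballots from the example: (share of voters in percent, ranking from top to bottom).
ballots = [(18, "ABCD"), (17, "BACD"), (33, "CDAB"), (32, "DBAC")]
candidates = "ABCD"


def support(x, y):
    """Percentage of voters who rank x above y."""
    return sum(share for share, ranking in ballots
               if ranking.index(x) < ranking.index(y))


def padded_defeats(x):
    """Sizes of x's pairwise defeats, largest first, padded with 0%."""
    sizes = sorted((support(y, x) for y in candidates
                    if y != x and support(y, x) > support(x, y)), reverse=True)
    return sizes + [0] * (len(candidates) - 1 - len(sizes))


# MinMax looks only at the largest defeat, so A and B tie at 65%.
largest = {c: padded_defeats(c)[0] for c in candidates}
minmax_winners = [c for c in candidates if largest[c] == min(largest.values())]
print(minmax_winners)  # ['A', 'B']

# MinLexMax compares the whole defeat lists lexicographically.
minlexmax_winner = min(candidates, key=padded_defeats)
print(minlexmax_winner)  # A
```

Python compares lists lexicographically, so `min` with `padded_defeats` as the key performs exactly the "compare the next largest defeat" step described above.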

Another usage in social choice theory is the Ranked Pairs voting method. Although usually defined by a procedure that constructs the order of finish, Ranked Pairs is equivalent to finding which of all possible orders of finish is best according to a minlexmax comparison of the majorities they reverse. In the example above, the Ranked Pairs order of finish is ABCD (which elects A). ABCD affirms the majorities who rank A over B, A over C, B over C and C over D, and reverses the majorities who rank D over A and D over B. The largest majority that ABCD reverses is 65%. The only other ordering that wouldn't reverse a larger majority is BACD (which also reverses 65%). ABCD is a better order of finish than BACD because the lexically relevant set of majorities—the majorities on which ABCD and BACD disagree—is {A over B} and BACD reverses the largest majority in this set.
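The minlexmax characterization described above can be checked by brute force on this small example. This sketch simply enumerates all 24 orders of finish and is not the usual Ranked Pairs procedure; the helper names are assumptions, and pairwise ties (absent here) are not handled.

```python
from itertools import combinations, permutations

ballots = [(18, "ABCD"), (17, "BACD"), (33, "CDAB"), (32, "DBAC")]
candidates = "ABCD"


def support(x, y):
    """Percentage of voters who rank x above y."""
    return sum(share for share, ranking in ballots
               if ranking.index(x) < ranking.index(y))


def reversed_majorities(order):
    """Sizes of the majorities that `order` contradicts, largest first."""
    sizes = []
    for upper, lower in combinations(order, 2):  # upper is placed above lower
        if support(lower, upper) > support(upper, lower):
            sizes.append(support(lower, upper))
    return sorted(sizes, reverse=True)


orders = ["".join(p) for p in permutations(candidates)]
best = min(orders, key=reversed_majorities)
print(best, reversed_majorities(best))  # ABCD [65, 65]
print(reversed_majorities("BACD"))      # [65, 65, 51]
```

Python treats a list that is a proper prefix of another as the smaller one, which has the same effect as padding the shorter list with 0%.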

Case of multiple products

Suppose

  (A_1, A_2, \dots, A_n)

is an n-tuple of sets, with respective total orderings

  (<_1, <_2, \dots, <_n).

The dictionary ordering <^d of

  A_1 \times A_2 \times \cdots \times A_n

is then

  (a_1, a_2, \dots, a_n) <^d (b_1, b_2, \dots, b_n) \iff
    (\exists\ m > 0) \ [ (\forall\ i < m) (a_i = b_i) \land (a_m <_m b_m) ]

That is, at some position m the terms satisfy a_m <_m b_m, and all the preceding terms are equal.

Informally, a_1 represents the first letter, a_2 the second, and so on when looking up a word in a dictionary, hence the name.

This could be more elegantly stated by recursively defining the ordering of any set

  C = A_j \times A_{j+1} \times \cdots \times A_k

denoted by <^d (C). This will satisfy

  a <^d (A_i) a' \iff (a <_i a')

  (a,b) <^d (A_i \times B) (a',b') \iff
    a <^d (A_i) a' \lor ( a = a' \land b <^d (B) b')

where

  B = A_{i+1} \times A_{i+2} \times \cdots \times A_n.

To put it more simply, compare the first terms. If they are equal, compare the second terms – and so on. The relationship between the first corresponding terms that are not equal determines the relationship between the entire elements.
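The "compare the first terms, then the second, and so on" rule can be sketched for tuples whose components each carry their own total order. The name `lex_less` and the example orders are assumptions chosen for illustration.

```python
from operator import gt, lt


def lex_less(a, b, less_fns):
    """True if tuple a precedes tuple b in the dictionary ordering.

    less_fns[m] is the strict total order used for component m.
    """
    for x, y, less in zip(a, b, less_fns):
        if x != y:
            # The first unequal position decides the whole comparison.
            return less(x, y)
    return False  # all components are equal


# Component orders: usual <, reversed order (larger first), usual < on strings.
orders = [lt, gt, lt]
print(lex_less((1, 5, "a"), (1, 3, "z"), orders))  # True: 5 precedes 3 in the reversed order
print(lex_less((1, 3, "z"), (1, 5, "a"), orders))  # False
print(lex_less((0, 0, "z"), (1, 9, "a"), orders))  # True: decided by the first component
```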

Groups and vector spaces

If the component sets are ordered groups then the result is a non-Archimedean group, because e.g. n(0,1) < (1,0) for all n.
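The inequality n(0,1) < (1,0) can be spot-checked for the product of the integers with itself, using componentwise addition and Python's built-in lexicographic tuple comparison. The helper `scale` is a hypothetical name, and a finite loop is of course only an illustration, not a proof.

```python
def scale(n, v):
    """n-fold sum of the pair v under componentwise addition."""
    return (n * v[0], n * v[1])


# However many copies of (0, 1) are added together, the first component
# stays 0, so the sum remains below (1, 0) in the lexicographic order.
always_below = all(scale(n, (0, 1)) < (1, 0) for n in range(1, 10_001))
print(always_below)  # True
```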

If the component sets are ordered vector spaces over R (in particular just R), then the result is also an ordered vector space.

Ordering of sequences of various lengths

Given a partially ordered set A, the above considerations allow one to define naturally a lexicographical partial order <^\mathrm{d} over the free monoid A* formed by the set of all finite sequences of elements in A, with sequence concatenation as the monoid operation, as follows:

u <^\mathrm{d} v if
  • u is a proper prefix of v, or
  • u=wau' and v=wbv', where w is the longest common prefix of u and v, a and b are members of A such that a<b, and u' and v' are members of A*.

If < is a total order on A, then so is the lexicographic order <^d on A*. If A is a finite and totally ordered alphabet, A* is the set of all words over A, and we retrieve the notion of dictionary ordering used in lexicography that gave its name to the lexicographic orderings. However, in general this is not a well-order, even though it is on the alphabet A; for instance, if A = {a, b}, the language {a^n b | n ≥ 0} has no least element: ... <^d aab <^d ab <^d b. A well-order for strings, based on the lexicographical order, is the shortlex order.
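The descending chain in this example can be observed directly with Python's string comparison, which is lexicographic with a < b. The helper `word` is a name introduced here for illustration, and only a finite part of the infinite chain is shown.

```python
def word(n):
    """The word a^n b: n copies of 'a' followed by a single 'b'."""
    return "a" * n + "b"


chain = [word(n) for n in range(6)]  # b, ab, aab, aaab, ...

# Each longer word is strictly smaller than the previous one.
strictly_descending = all(word(n + 1) < word(n) for n in range(1000))
print(strictly_descending)  # True

# Hence the smallest of the first k words is always the longest one,
# and no word in the full language can be the least element.
print(min(chain))  # aaaaab
```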

Similarly we can also compare a finite and an infinite string, or two infinite strings.

Comparing strings of different lengths can also be modeled as comparing strings of infinite length by right-padding finite strings with a special value that is less than any element of the alphabet.

This ordering is the ordering usually used to order character strings, including in dictionaries and indexes.

Quasi-lexicographic order

The quasi-lexicographic order on the free monoid A* over an ordered alphabet A orders strings first by length, so that the empty string comes first, and then, among strings of a fixed length n, by lexicographic order on A^n.[1]
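A minimal sketch of this order in Python uses a sort key that compares length first and the string itself second. The key function name and the two-letter alphabet are assumptions made for the example.

```python
from itertools import product


def quasi_lex_key(w):
    """Sort key: length first, then ordinary lexicographic order."""
    return (len(w), w)


alphabet = "ab"
# All words of length 0, 1 and 2; product() yields each length in lex order.
words = ["".join(t) for n in range(3) for t in product(alphabet, repeat=n)]
print(words)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# Sorting any rearrangement with the key restores the same enumeration.
print(sorted(reversed(words), key=quasi_lex_key) == words)  # True
# Plain lexicographic sorting interleaves the lengths instead.
print(sorted(words))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```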

Generalization

Consider the set of functions f from a well-ordered set X to a totally ordered set Y. For two such functions f and g, the order is determined by the values for the smallest x such that f(x) ≠ g(x).

If Y is also well-ordered and X is finite, then the resulting order is a well-order. As already shown above, if X is infinite this is in general not the case.
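For a finite X, the comparison of two functions can be sketched by representing each function as a dictionary. The name `func_less` and the sample functions are assumptions for illustration; X and Y are taken to be integers with their usual order.

```python
def func_less(f, g, domain):
    """True if f precedes g: decided at the smallest x where f(x) != g(x)."""
    for x in sorted(domain):
        if f[x] != g[x]:
            return f[x] < g[x]
    return False  # f and g agree everywhere


X = {0, 1, 2}
f = {0: 1, 1: 5, 2: 0}
g = {0: 1, 1: 7, 2: -3}

# f and g agree at x = 0 and first differ at x = 1, where 5 < 7.
print(func_less(f, g, X))  # True
print(func_less(g, f, X))  # False
```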

If X is infinite and Y has more than one element, then the resulting set Y^X is not a countable set; see also cardinal exponentiation.

Alternatively, consider the functions f from an inversely well-ordered X to a well-ordered Y with minimum 0, restricted to those that are non-zero at only a finite subset of X. The result is well-ordered. Correspondingly we can also consider a well-ordered X and apply lexicographical order where a higher x is a more significant position. This corresponds to exponentiation of ordinal numbers Y^X. If X and Y are countable then the resulting set is also countable.

Monomials

In algebra it is traditional to order the terms of a polynomial by ordering the monomials in the indeterminates. A fixed ordering is fundamental for obtaining a normal form. Such matters are typically left implicit in discussion between humans, but must be dealt with exactly in computer algebra. In practice one has an alphabet of indeterminates X, Y, ... and orders all monomials formed from them by a variant of lexicographical order. For example, if one decides to order the alphabet by

X < Y < ...

and also to look at higher terms first, that means ordering

... < X^3 < X^2 < X

and also

X < Y^k for all k.

There is some flexibility in ordering monomials, and this can be exploited in Gröbner basis theory.

Decimal fractions

For decimal fractions read from the decimal point onward, a < b holds in the numerical order exactly when it holds in the lexicographic order of the digit strings, provided that numbers with a recurring decimal 9 like .399999... are not included in the set of strings representing numbers. With that restriction there is an order-preserving bijection between the strings and the numbers.
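The agreement of the two orders can be spot-checked for finite digit strings. This sketch makes an extra assumption that is not in the text: trailing zeros are excluded so that each number has exactly one finite string (otherwise "5" and "50" would be different strings for the same number). The helper `value` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product


def value(digits):
    """Exact value of the decimal fraction 0.<digits>."""
    return Fraction(int(digits), 10 ** len(digits))


# All digit strings of length 1 to 3 that do not end in '0'.
strings = ["".join(t) for n in range(1, 4)
           for t in product("0123456789", repeat=n) if t[-1] != "0"]

# Sorting as strings and sorting by numeric value give the same sequence.
print(sorted(strings) == sorted(strings, key=value))  # True
print("25" < "5", value("25") < value("5"))           # True True
```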

Reverse lexicographic order

In a common variation of lexicographic order, one compares elements by reading from the right instead of from the left, i.e., the right-most component is the most significant. This is the order used, for example, in a rhyming dictionary.
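A rhyming-dictionary style ordering can be sketched by sorting on the reversed word, so that the last letter becomes the most significant. The word list is made up for illustration.

```python
words = ["nation", "cat", "station", "hat", "motion"]

# Compare words from the right by using the reversed string as the sort key.
by_ending = sorted(words, key=lambda w: w[::-1])
print(by_ending)  # ['nation', 'station', 'motion', 'cat', 'hat']
```

Words with the same ending end up next to each other, which an ordinary left-to-right sort would not achieve.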

In the case of monomials one may sort the exponents downward, with the exponent of the first base variable as primary sort key, e.g.:

 x^2 y z^2 < x y^3 z^2 .

Alternatively, sorting may be done by the sum of the exponents, downward.
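Both sorting rules can be illustrated by representing each monomial by its tuple of exponents. The dictionary below is an assumed encoding of the two monomials from the example, with exponents listed in the order (x, y, z).

```python
# Exponent vectors (e_x, e_y, e_z) for the two monomials in the example.
monomials = {"x^2 y z^2": (2, 1, 2), "x y^3 z^2": (1, 3, 2)}

# Exponents sorted downward, exponent of x as the primary key.
by_exponents = sorted(monomials, key=monomials.get, reverse=True)
print(by_exponents)  # ['x^2 y z^2', 'x y^3 z^2']

# Sorting by the sum of the exponents (total degree), downward.
by_degree = sorted(monomials, key=lambda m: sum(monomials[m]), reverse=True)
print(by_degree)  # ['x y^3 z^2', 'x^2 y z^2']
```

The two rules order this pair differently: the first is decided by the exponent of x (2 versus 1), the second by the total degrees 5 and 6.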

References

  1. Calude, Cristian (1994). Information and randomness. An algorithmic perspective. EATCS Monographs on Theoretical Computer Science. Springer-Verlag. p. 1. ISBN 3-540-57456-5. Zbl 0922.68073.