Semantic and syntactic patterns of multiword names: A cross-language study

responding syntactic patterns in English, Bulgarian, French, Greek, and Serbian. We will compare these patterns regarding grammatical categories of dependent constituents, definiteness, distribution of clitics, word order and various alternations. Our ultimate goal is to build a universal framework for Named Entity Recognition (NER).

Introduction
Proper names are usually defined as belonging to the following main classes: personal names, location names, and organization names, also called named entities (NEs). They can be single-word nouns or particular types of multiword expression (MWE).
The aim of this paper is to offer a common template for description and classification of proper names in different languages. Our objectives are: i) to formulate semantic patterns for personal, location and organization names that capture the general semantics and should be, to a great extent, language-neutral; ii) to describe language-specific syntactic patterns corresponding to a common semantic pattern. The syntactic patterns provide information about the grammatical class of the head and constituents; dependencies among the constituents; word order and contiguity; and cliticisation (where applicable, for possessive pronoun and interrogative clitics).
This study is based on evidence gathered from five languages (English, Bulgarian, French, Greek, and Serbian) belonging to four different language groups (Germanic, Hellenic, Romance, and Slavic). The utility of language-neutral semantic patterns lies in the fact that they can be applied to new languages, thus paving the way towards more universal solutions for (rule-based) named entity recognition (NER). The set of language-specific syntactic patterns displays correspondences between morphological and syntactic language-specific characteristics, and the patterns may serve as transformation rules in rule-based machine translation, cross-lingual information extraction and summarization.

Names: a general overview
In English (Huddleston 1988: 96), two different terms are often used: proper noun, referring to the part of speech of the word and comprising only single-word proper names, e.g., John, London, Adidas; and proper name, referring to the function of these words as referential elements and comprising single- and multiword proper names, as in John, John Smith Junior, London, the United States of America, Nike, Microsoft Corporation. Following this distinction, proper names can be further specified as: proper nouns (Anna, Asia, Google), multiword expressions (Jean-Pierre Deckles, New York, the United Nations), and noun phrases (Professor Deckles, New York City, the United Nations Organization). Proper names, expressed either by proper nouns or by MWEs, show common semantic and syntactic behavior and we describe them in a uniform way.
Proper names do not "[d]escribe or specify characteristics of objects" but are "logically connected with characteristics of the object to which they refer" (Searle 1958: 173). For example, Saint Petersburg may refer to the second largest city in Russia (They convened in Saint Petersburg), a city in Florida, a city in Pennsylvania, the fictional hometown of Tom Sawyer and Huckleberry Finn (St. Petersburg, Missouri), but also to a college in Florida, Saint Petersburg College. A particular common noun (e.g., city, street, president, actor) specifies the object whose instances may be represented by a set of proper names; such an object is always presupposed for a given proper name even if the common noun is not mentioned explicitly in the text. Furthermore, multiword names may comprise a category word (square in Trafalgar Square; ocean in the Indian Ocean), and in these cases, the category of the particular name is always explicitly shown (Carroll 1985: 144).
The relation between a proper name and its category object is reflected in WordNet, where the relation between a concept and its instances is defined as an instance hypernym (instance hyponym) relation (Rodríguez et al. 1998). The instances (proper nouns) inherit characteristics from the concepts of the hierarchy to which they belong. For example, the name Saint Petersburg is an instance of a city, and the concept {city, metropolis, urban centre} links to the more general concept {region}, which, in its turn, links to an even more general concept {location}.
Different terms are used for common nouns that categorise the referents of proper names as members of different classes: descriptors, designators, category words (Carroll 1985), external evidence (McDonald 1996), triggers (Magnini et al. 2002), trigger words. To avoid confusion with the theory of reference, we will use the term trigger.
Triggers depend semantically on the referent of the personal name, and different names select different classes of triggers. In turn, triggers determine the characteristics of the object to which the name refers. For example, if we know that the word Washington is a family name, it can select the word president or the word actor. Further, the word president and the word actor are similar in the way they designate the concept for a person, and this determines the fact that both nouns can co-occur with adjectives denoting height, age, etc. The meaning of both words also implies that they may be specified by employing expressions for affiliation as complements (the President of the USA, the actor at the Muppet Theatre). However, not all words that are compatible with the first noun are compatible with the second (stage actor vs. *stage president). Therefore, the notion of triggers is central for the classification of the semantic patterns of proper names and, accordingly, for the description of the respective syntactic patterns.

Grammatical features of names in Bulgarian, English, French, Greek and Serbian: a brief overview
Personal names are singular and inherently definite (the same applies to location and organization names that cannot express definiteness). Some Bulgarian, English, French, Greek, and Serbian location and organization names are singularia tantum or pluralia tantum, or are marked for definiteness: with a definite article (English, French, Greek), with a definite article attached to the noun trigger with no pre-nominal modifiers or to the leftmost modifier in Bulgarian, and only with the definite form of adjectives in Serbian. Bulgarian, French, Greek, and Serbian personal, location (apart from cities in French, which usually do not express gender) and organization names are marked for grammatical noun gender (masculine or feminine), in contrast to the English ones. Location and organization names in Bulgarian, Greek and Serbian can be marked for neuter as well. In Greek and Serbian, proper names have the nominative, accusative, genitive and vocative case. In Serbian, names can also be declined in the dative, locative and instrumental, while in Bulgarian the vocative is observed only with some forenames.
Syntactically, proper nouns are heads of noun phrases but show restricted combinatorial properties compared to common nouns. For the five languages discussed in this paper, the forenames can be extended with one or more (rarely more than two) proper nouns: a nickname, a patronym and/or a family name. Agreement in gender and number is observed if they are of Slavic or Greek origin. For feminine surnames of Slavic origin in Serbian, the agreement in gender is allowed but not obligatory. Bulgarian, French, Greek and Serbian adjectives and Bulgarian, French and Serbian possessive pronouns change to agree in gender and number with the nouns they modify. Greek and Serbian adjectives and possessive pronouns agree in case with the head noun.
Compared to personal names, location names have a more diverse structure, while organization names show the highest complexity. Both location and organization names can be proper nouns or proper names, comprising proper and common nouns or noun phrases, which begin to function as names of geographical locations and organizations, respectively.

Names and multiword expressions
Many names are composed of more than one word and are classified as multiword names. They can comprise two or more proper nouns (Ray Jackendoff, Merrill Lynch); common and proper nouns (Bulgarian: Republika Bălgariya 'Republic Bulgaria'); adjectives and a proper or a common noun (International Monetary Fund, Upper Manhattan); abbreviations (Financial Advisors Ltd., John Smith Jr., Miami, FL); numerals or numbers (the Second Generative Grammar Conference; XX Generative Linguistics Conference); verbs and adverbs with names of products such as books, movies, songs (Someone to Watch over Me; Killing Me Softly), etc.
Anderson (2007) provides a detailed classification of proper names, a subset of which is relevant for our study, as follows: simple opaque names (John); simple names that have a resemblance to a common word (Prudence); names based on other names (Lincoln, for a boulevard); names overtly derived from other names (Slavic family names); names based on compounds, some of them containing a name (Queensland, Newtown); names based on longer phrases, which may include another name (the University of Queensland) or not (Long Island, Hen and Chicken Island); and names based on sentences (as with titles of movies).
An important feature that systematically distinguishes location and organization names from personal names is that the triggers may be their integral part, constituting a MWE (Bulgarian: Černo more 'Black Sea', English: President Roosevelt Boulevard, First Investment Bank, French: Banque de France 'Bank of France', Greek: Τράπεζα της Ελλάδος 'Bank of Greece', Serbian: Jadransko more 'Adriatic sea', Međunarodni sud pravde 'International Court of Justice'). Carroll (1985) describes the non-classifying part of the location name as a name-stem (e.g., Trafalgar in Trafalgar Square) and explores rules according to which the name-stem can be used to stand for the whole name. Not only for some location names, but also for some organization names with internal triggers, the name-stem can replace the whole name, e.g., French: la maison d'édition Hachette 'the Hachette publishing company' or just Hachette, Greek: η Ολυμπιακή 'the Olympic' for Olympic Airways.
Further, a location name may feature a personal name specified by a personal trigger (San Jorge River) that cannot be omitted without loss of the name function; similarly, an organization name may feature a personal name specified by a personal trigger (San Jose State University) or a location name specified by a location trigger (Los Angeles City College).
For the purposes of our study, we differentiate the names on the basis of their structure: i) whether the name is a MWE or a noun; ii) whether the multiword name obligatorily incorporates a trigger (an internal trigger); and iii) whether the name (either single or a MWE) is optionally specified by a trigger (an external trigger). The external triggers may be explicit or implicit, depending on the context (the City of New York, New York City vs. New York):
• Single-word personal name (Arthur).
• Multiword personal name, which incorporates an internal personal trigger. When people are famous, combinations with triggers such as holy, aristocratic and religious titles can be widely used and are stable (Pope John Paul II).
• Single-word personal name; it is specified by an external personal trigger (uncle John). Kinship terms are usually combined with a single-word personal name.
• Multiword personal name; it is specified by an external personal trigger (Professor Steven Pinker).
• Single-word location name (it can coincide or not with a personal name) (Danube, Washington).
• Multiword location name (it may or may not coincide, partially, with a personal name) (Little Rock, San Antonio).
• Multiword location name, comprising an internal location trigger (Rocky Mountains). No additional location trigger of the same type can be added; being part of the name, the trigger cannot be omitted either. A multiword location name may include a personal name (and, rarely, an organization name) (Cristina Fort).
• Single-word location name; it is specified by an external location trigger (River Nile).
• Multiword location name; it is specified by an external location trigger (volcano Klyučevskaya Sopka).
• Single-word organization name (it may coincide or not with a personal name or a location name) (Matalan, Poundland).
• Multiword organization name (it may (partially) coincide or not with a personal or a location name) (Mercedes Benz).
• Multiword organization name, comprising an internal organization trigger as an integral part of the proper name. Another organization trigger of the same type cannot be added. The trigger, which is part of the name, cannot be omitted either. The multiword organization name may include a personal or a location name (Princess Basma Youth Resource Center, Melbourne Grammar School).
• Single-word organization name; it is specified by an external organization trigger (Supermarket Galaxy).
• Multiword organization name; it is specified by an external organization trigger (the company Business Models Inc.)
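The three structural criteria behind the list above (single word vs. MWE, internal trigger, external trigger) can be recorded in a small data structure. The following is only a minimal sketch: the class `NameStructure`, its field names and the whitespace-based test for multiword status are illustrative assumptions, not part of the study, and the sketch assumes that an internal trigger is a single token of the name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameStructure:
    """Structural description of a proper name (all names here are illustrative)."""
    name: str                                # e.g. "Rocky Mountains"
    semantic_class: str                      # "personal", "location" or "organization"
    internal_trigger: Optional[str] = None   # trigger that is part of the name
    external_trigger: Optional[str] = None   # optional trigger outside the name

    def __post_init__(self) -> None:
        # An internal trigger is an integral part of a multiword name,
        # so it must be one of the name's own tokens.
        if self.internal_trigger is not None:
            if not self.is_multiword or self.internal_trigger not in self.name.split():
                raise ValueError("internal trigger must be a token of a multiword name")

    @property
    def is_multiword(self) -> bool:
        # Assumption: whitespace tokenisation separates MWEs from single nouns.
        return len(self.name.split()) > 1

    def describe(self) -> str:
        size = "Multiword" if self.is_multiword else "Single-word"
        label = f"{size} {self.semantic_class} name"
        if self.internal_trigger:
            label += f", internal trigger '{self.internal_trigger}'"
        if self.external_trigger:
            label += f", external trigger '{self.external_trigger}'"
        return label
```

For instance, `NameStructure("Rocky Mountains", "location", internal_trigger="Mountains")` and `NameStructure("Nile", "location", external_trigger="River")` correspond to two of the location types listed above.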

Semantic patterns for persons, locations and organizations
Names can be grouped into different semantic classes and subclasses with respect to the properties of their referents (explicated by triggers). A name from a given class (personal, location or organization) selects triggers from a particular set of semantic subclasses. For example, complex personal names are combined with triggers that define a legislative job title, executive job title, judicial position, academic position, academic title, military rank, and profession. The permissible combinations between types of names (proper nouns, MWEs) and semantic subclasses of triggers determine the semantic patterns applicable to the personal, location and organization names. The semantic patterns we propose show semantic compatibility valid for a particular semantic class and describe the permissible combinatory options. For example, a personal name can be extended with a kinship term (e.g., the beautiful step-daughter of John from Paris, Anne Nicole) and the kinship term can be specified in various ways and restricted for possessor and location; thus the respective semantic pattern is: (modifier: referent specification phrase) - trigger: kinship term - (complement: possessor phrase) - (complement: location phrase) - personal name.
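The kinship pattern just given can be written down as an ordered list of slots, where the parenthesised elements are optional. The sketch below is an illustration under stated assumptions: the names `Slot`, `KINSHIP_PATTERN` and `licenses` are hypothetical, and the check assumes that each slot of a pattern has a distinct semantic filler, as is the case here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Slot:
    role: str        # "modifier", "trigger", "complement" or "name"
    filler: str      # semantic type of the phrase filling the slot
    optional: bool   # optional slots are written in parentheses in the pattern


# (modifier) - trigger: kinship term - (possessor) - (location) - personal name
KINSHIP_PATTERN: List[Slot] = [
    Slot("modifier", "referent specification phrase", True),
    Slot("trigger", "kinship term", False),
    Slot("complement", "possessor phrase", True),
    Slot("complement", "location phrase", True),
    Slot("name", "personal name", False),
]


def licenses(pattern: Sequence[Slot], fillers: Sequence[str]) -> bool:
    """Return True if the sequence of semantic fillers fits the pattern.

    Obligatory slots must be present, optional slots may be skipped,
    and the order of the pattern must be respected.
    """
    i = 0
    for slot in pattern:
        if i < len(fillers) and fillers[i] == slot.filler:
            i += 1                      # slot is realised by the next filler
        elif not slot.optional:
            return False                # an obligatory slot is missing
    return i == len(fillers)            # no unexplained fillers may remain
```

Under this sketch, the fully specified example from the text (the beautiful step-daughter of John from Paris, Anne Nicole) corresponds to all five fillers in order, while a bare kinship term followed by a personal name is also licensed because the other three slots are optional.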
As triggers refer to concepts, the semantic relations in which they are involved should be universal and must hold among the relevant concepts in any language.Thus, the semantic patterns describe language-neutral relations and can be regarded as universal structures with correlating language-specific syntactic patterns.
Following the detailed hierarchy employed by Giuliano (2009) for automatic classification of personal NEs, we can conclude that every common noun that determines the referent of a personal name can be a trigger, i.e., words such as chess-player, singer, footballer, etc. Magnini et al. (2002) use the WordNet hierarchy to identify large sets of triggers, namely hyponyms of high-level synsets such as {person}, {location}, {organization}. Some authors suggest verb triggers appearing in the local context of NEs (Zhang et al. 2004), e.g., for water bodies (like rivers) the verb flooded in The Sava flooded the village indicates that Sava is a river and not a person. There are detailed classifications of NEs (of more than 200 categories; cf. Sekine & Nobata 2004), while other classifications build shallow hierarchies with the major classes on the top and sets of subtypes with different granularity at the low levels (ACE 2018; Fleischman & Hovy 2002).
In our study, we distinguish the following semantic subclasses for person, location and organization names and their triggers:
• Persons and personal triggers: legislative job title (prime minister); executive job title (executive officer); judicial position (judge); academic position (associate professor); military rank (major general); profession (engineer); academic title (Ph.D.); true honorific (Mister / Mr.); aristocratic title (Prince); religious title (Bishop); kinship term (sister); holy title (Saint).
We classify proper names (persons, locations and organizations) in the patterns (A) to (I) below according to their shared features. Patterns are described in terms of the categories (a)-(d).
Example: English: the (new) Hebros Bank (in Athens).

Language-specific syntactic patterns for persons, locations and organizations
We define the semantic patterns evoked by different types of proper names when combined with triggers, and the syntactic patterns that involve combinations of: modifiers, one or several, semantically restricted by the head proper noun and the trigger, and complements, semantically restricted by the trigger.
The syntactic patterns are language-specific and differ for personal, location and organization names. The syntactic patterns may involve combinations of adjectival modifiers in pre- or post-nominal position, one or several; pronoun modifiers in pre-nominal position (possessive and demonstrative); complements in post-nominal position, one or several; and a noun modifier in pre- or post-nominal position, alternating with a prepositional phrase. Adjectival modifiers (that in Bulgarian, French, Greek and Serbian agree with the head noun in gender and number) may indicate physical shape, status, etc. Complements may indicate (domain) specification, affiliation, location, possessor, and may be prepositional or case complements, depending on the language structure.
Multiword names have the structure of a noun phrase and exhibit specific properties with respect to constituency of the head noun and the components, including various constraints on modifiers, complements, clitics (in Bulgarian and Greek), etc.
The syntactic patterns represent language-specific grammatical features and dependencies and how these features and dependencies are manifested in a particular language. One or more syntactic patterns from one or different languages may correspond to the same semantic pattern. The syntactic patterns, as they are presented in this paper, define constituency and reflect the morphological and syntactic structure of a particular language, although they do not strictly describe phrase structure and grammatical dependencies. However, the syntactic patterns are formal enough to code the linguistic information correctly and to allow for the conversion to some formalism.
Syntactic patterns corresponding to the largely universal semantic patterns, described in §5, are formulated for English, Bulgarian, French, Greek, and Serbian. The generalizations for semantic patterns and respective syntactic patterns were constructed on the basis of observations and classifications made on dictionaries of NEs, annotated corpora of NEs and grammars for NE recognition developed so far (Krstev et al. 2013; Koeva & Dimitrova 2015).

Syntactic pattern A (single family name or multiword personal name)
Characteristics shared by the five languages:
i) Triggers are placed to the left of the personal name; a complex trigger phrase is likely to be an apposition.
iii) The location name may contain a personal or a location name (rarely an organization name). Example: English: Minnesota River.
iv) The internal trigger may be specified by the same range of modifiers and complements

Comparison of the five languages
At the semantic level, languages do not differ (significantly). The semantic patterns of proper names define the common semantics, regardless of the language in which it is realized: semantic patterns are language-neutral. Languages differ in lexical and phrasal categories, constituency, word order permutations and alternations. The differences in word order and alternations introduce some nuances into the expressed meaning, i.e., the viewpoint of the speaker, but they do not alter the general meaning. The syntactic patterns of proper names show the correspondences among languages at the syntactic level: syntactic patterns are language-specific. A semantic pattern is a representation that can be linked with different syntactic frames in different languages and, vice versa, syntactic patterns from different languages may share a single semantic pattern. Thus, syntactic patterns make explicit the similarities and differences in the grammatical structure of the five languages.
The structure of the language-neutral semantic and language-specific syntactic patterns can be represented as a graph whose nodes are semantic and syntactic patterns while the arcs represent different languages. More than one language-specific syntactic pattern may be linked to one language-neutral semantic frame; in such a case, syntactic patterns are synonymous to the extent that they represent a common semantic structure. Through this type of representation, we offer an interlingual mapping of the syntactic structures of named entities in the five languages. Some of the most distinctive grammatical characteristics of NEs in English, Bulgarian, French, Greek and Serbian with respect to the single and multiword morphology and syntax will be outlined below.
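One possible encoding of such a graph is sketched below, assuming that each arc links one semantic pattern to one syntactic pattern and is labelled with a language code. The class `PatternGraph`, its method names and the pattern labels are hypothetical and serve only as illustration; the commented examples are taken from the Bulgarian and Greek data discussed in this paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class PatternGraph:
    """Semantic patterns linked to syntactic patterns by language-labelled arcs."""

    def __init__(self) -> None:
        # semantic pattern -> list of (language, syntactic pattern) arcs
        self._by_semantic: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def link(self, semantic: str, language: str, syntactic: str) -> None:
        self._by_semantic[semantic].append((language, syntactic))

    def realisations(self, semantic: str, language: str) -> List[str]:
        """Syntactic patterns that express a semantic pattern in one language."""
        arcs = self._by_semantic.get(semantic, [])
        return [syn for lang, syn in arcs if lang == language]

    def counterparts(self, syntactic: str, target_language: str) -> List[str]:
        """Other syntactic patterns in the target language that share a
        semantic pattern with the given syntactic pattern."""
        result: List[str] = []
        for arcs in self._by_semantic.values():
            if any(syn == syntactic for _, syn in arcs):
                result.extend(
                    syn for lang, syn in arcs
                    if lang == target_language and syn != syntactic
                )
        return result


graph = PatternGraph()
SEM = "possessor + trigger + personal name"          # illustrative label
graph.link(SEM, "bg", "possessive pronoun + trigger + name")           # moyata sestra Ana
graph.link(SEM, "bg", "trigger + possessive clitic + name")            # sestra mi Ana
graph.link(SEM, "el", "article + trigger + possessive clitic + name")  # ο καθηγητής μας Χρήστος Τσολάκης
```

In this sketch the two Bulgarian patterns are synonymous in the sense used above, since both are linked to the same semantic pattern, and `counterparts` gives a simple lookup from a pattern in one language to the patterns of another.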

Grammatical categories of dependent constituents
The syntactic patterns of English proper name triggers involve combinations of adjectival modifiers in pre-nominal position, one or several (they can be preceded by a definite article) ('the great poet Burns'); possessive pronoun modifiers in pre-nominal position ('Welcome our new professor, Jennifer S. Locke!'); prepositional complements in post-nominal position; a noun modifier in pre-nominal position, alternating with a genitive determiner and a prepositional phrase, e.g., the Grieg Piano Concerto vs. Grieg's Piano Concerto vs. the Piano Concerto by Grieg.
In Bulgarian, the syntactic patterns for proper name triggers exhibit the following combinations: adjectival modifiers in pre-nominal position (noviyat predsedatel Petrov 'new-the chair Petrov'); possessive pronoun modifiers in pre-nominal position alternating with a possessive pronoun clitic in post-nominal position (moyata sestra Ana 'my-the sister Ana' vs. sestra mi Ana 'sister my.PossCL Ana'); prepositional complements in post-nominal position (kompaniyata na Ivan "Elit" 'company-the of Ivan "Elit"'); and a noun modifier in pre-nominal position, alternating with a PP (karate instruktorăt Ivan 'karate trainer-the Ivan' vs. instruktorăt po karate Ivan 'trainer-the in karate Ivan').
In French, the phrase headed by a trigger is definite (with an alternation of the phrase with a definite article or possessive pronoun: la belle ville des Mille Fontaines, Aix-en-Provence 'the beautiful city of thousand fountains, Aix-en-Provence'; notre belle ville, Paris 'our beautiful city, Paris').The proper name can be introduced by a preposition: la belle ville de Paris 'the beautiful city of Paris'; notre belle ville de Paris 'our beautiful city of Paris'.
In Greek, simple and multiword proper names are preceded by a definite article, e.g., το Παρίσι 'the Paris', οι Ηνωμένες Πολιτείες Αμερικής 'the United States of America'. The phrase headed by the trigger is also definite: η όμορφη πόλη των καταρρακτών, η Έδεσσα 'the beautiful city of waterfalls, the Edessa'; η όμορφή μας πόλη, η Έδεσσα 'the beautiful our.PossCL city, the Edessa'. A location name can be put in the genitive case: η όμορφη πόλη του Παρισιού 'the beautiful city of Paris'. In that case, the use of the possessive pronoun clitic is not possible: *η όμορφή μας πόλη του Παρισιού 'the beautiful our.PossCL city of Paris'.
Coordinated phrases are possible in all languages, e.g., Bulgarian: vicepremierăt i ministăr na obrazovanieto i naukata, Meglena Kuneva 'Deputy Prime Minister-the and Minister of Education-the and Science-the, Meglena Kuneva'; French: Martin Vetterli, professeur à l'École polytechnique fédérale de Lausanne et président du Conseil national de la recherche 'Martin Vetterli, Professor at the Federal Polytechnic School of Lausanne and President of the National Research Council'; Greek: ο πρωθυπουργός και πρόεδρος του ΣΥΡΙΖΑ Αλέξης Τσίπρας 'the prime minister and head of Syriza Alexis Tsipras'; Serbian: Hleb i kifle 'Bread and Rolls' (an organization name). Both triggers and proper names can appear in a coordinated construction (we do not encode coordination in the syntactic patterns).

Definiteness
Definiteness is expressed either by a morpheme as in Bulgarian and Serbian, or by an article as in English, French, and Greek. In English, French, and Greek, the definite article precedes the trigger, e.g., le premier ministre, Justin Trudeau 'the prime minister, Justin Trudeau'. There are other means to express definiteness, i.e., the demonstrative pronouns and the possessive pronouns in English, French, and Serbian, e.g., French: notre belle ville, Paris 'our beautiful city, Paris'; Serbian: njeno rodno mesto Beograd 'her native city Belgrade'.
With personal names in English, the definite article is obligatory when the trigger is modified by an adjective and/or a PP complement: the great poet Burns, the Scottish poet Burns; the poet from Kosovo, Fahredin Shehu; the author of the Concerto, Edvard Grieg. The possessive pronoun, the article and the genitive determiner are in complementary distribution in English.
The definite form in Bulgarian is required when the trigger is modified by an adjective or a possessive pronoun and/or a prepositional phrase. With a pre-nominal modifier, the definite article is part of the first phrasal constituent (noviyat ministăr Valentin Dimitrov 'new-the minister Valentin Dimitrov'); if there are no pre-nominal modifiers, the article is on the trigger word (ministărăt na finansite Valentin Dimitrov 'minister-the of finance-the Valentin Dimitrov').
In Serbian, the definite article is not used; furthermore, neither possessive pronouns nor adjectives are obligatory. However, when adjectives precede proper names, they are in definite form, e.g., od izvesnog Stevice Miletića vs. *od izvesna Stevice Miletića 'from certain Stevica Miletić'.
In Bulgarian, the interrogative particle li (which is always a clitic) may appear after the first definite modifier (if not followed by a possessive pronoun clitic) or after the whole NP, as in: noviyat li direktor Ivanov 'new-the li.QuCL director Ivanov' and noviyat direktor Ivanov li 'new-the director Ivanov li.QuCL'. The above-stated rules for the definite article, possessive pronoun clitic and interrogative particle hold for multiword names too, with the leftmost adjective being part of the proper name itself (Bălgarskata narodna banka 'Bulgarian-the National Bank').
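The Bulgarian placement of the definite article described in this section (on the leftmost pre-nominal modifier if there is one, otherwise on the trigger) can be stated as a short selection rule. The sketch below assumes that the phrase has already been segmented into constituents with role labels; the function name `article_host` and the role labels are illustrative, and no morphology is generated, so the attested definite forms from the examples are used as they stand.

```python
from typing import List, Tuple

Constituent = Tuple[str, str]  # (surface form, role)


def article_host(phrase: List[Constituent]) -> str:
    """Return the constituent that carries the definite article.

    Roles are assumed to be "modifier", "trigger", "complement" or "name".
    The article goes on the leftmost modifier preceding the trigger;
    if the trigger has no pre-nominal modifier, it goes on the trigger.
    """
    trigger_positions = [i for i, (_, role) in enumerate(phrase) if role == "trigger"]
    if not trigger_positions:
        raise ValueError("phrase has no trigger")
    trigger_index = trigger_positions[0]
    for form, role in phrase[:trigger_index]:
        if role == "modifier":
            return form                    # leftmost pre-nominal modifier
    return phrase[trigger_index][0]        # no modifier: the trigger itself


# noviyat ministăr Valentin Dimitrov 'new-the minister Valentin Dimitrov'
with_modifier = [("noviyat", "modifier"), ("ministăr", "trigger"),
                 ("Valentin Dimitrov", "name")]
# ministărăt na finansite Valentin Dimitrov 'minister-the of finance-the Valentin Dimitrov'
without_modifier = [("ministărăt", "trigger"), ("na finansite", "complement"),
                    ("Valentin Dimitrov", "name")]
```

Applied to the two examples, the rule selects noviyat in the first phrase and ministărăt in the second, in line with the glosses.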

Distribution of clitics in Bulgarian and Greek
Greek pronoun clitics are right-adjacent to the proper name, e.g., η Μαρία μου 'the Maria my.PossCL'. Once there is a trigger followed by the proper name, the possessive pronoun clitic is between the trigger and the proper name, e.g., ο καθηγητής μας Χρήστος Τσολάκης 'the professor our.PossCL Christos Tsolakis'.
In Serbian, pronoun clitics can sometimes be used to express possession, as in komšija mi Asan 'neighbor I.CL Asan (my neighbor Asan)'. However, these constructions are rarely used, being considered rather obsolete and non-standard, and are therefore not included in the patterns.

Expression of semantic and grammatical dependencies
Prepositions are used to express semantic and grammatical dependencies, such as affiliation, domain specification, location in English, Bulgarian, French, Greek, and Serbian, and possession in English, Bulgarian, and French. Semantic and grammatical dependencies can be signified by cases in Greek and Serbian. In English, possession may also be expressed by a clitic -'s (marking the genitive determiner), and in Bulgarian by the derivational suffix of possessive adjectives.

Word order -position of the trigger with respect to the proper noun
In French, Greek and Serbian, word order permutations are common for personal names, as the first name and surname(s) can change places. In all languages, the head personal name can be specified by more than one trigger in a preferred order of appearance: English: Director General Prof. Smith; Bulgarian: generalniyat direktor prof. Smit 'general-the director prof. Smith'; French: Directeur général Prof. Smith 'Director General Prof. Smith'; Greek: ο Γενικός Διευθυντής Καθηγητής Σμιθ 'the General Director Professor Smith'; Serbian: Generalni direktor prof. Smit 'General director prof. Smith'.

Alternations
In English, genitive determiners may alternate with a possessive prepositional phrase.
The possessive PP in Bulgarian may alternate with a possessive or relational adjective in pre-position (stolicata na Italiya Rim 'capital-the of Italy Rome', italianskata stolica Rim 'Italian-the capital Rome'). Alternations of possessive pronouns and possessive pronoun clitics in Bulgarian are also observed. A noun modifier in pre-nominal position can alternate with a prepositional phrase (ski instruktorăt 'ski instructor-the' vs. instruktorăt po ski 'instructor-the at ski').
In Greek, the genitive phrase may alternate with the preposition σε 'at' followed by accusative case, e.g., Γεώργιος Μπαμπινιώτης, Καθηγητής του Πανεπιστημίου Αθηνών 'Georgios Babiniotis, Professor of the University of Athens' or Γεώργιος Μπαμπινιώτης, Καθηγητής στο Πανεπιστήμιο Αθηνών 'Georgios Babiniotis, Professor at the University of Athens'. In this case, the two structures may convey a different meaning. The alternation is not possible for all proper names, e.g., Αλέξης Τσίπρας, Πρωθυπουργός της Ελλάδας 'Alexis Tsipras, Prime Minister of the Greece', *Αλέξης Τσίπρας, Πρωθυπουργός στην Ελλάδα 'Alexis Tsipras, Prime Minister in the Greece'. A location proper name describing residency at a continent, a country or a city may alternate as an adjective modifier or a PP complement attached to the trigger of a personal name (the same is true for English, Bulgarian, and Serbian), e.g., Greek: ο Πρωθυπουργός της Ελλάδας 'the Prime Minister of the Greece' or ο Έλληνας Πρωθυπουργός 'the Greek Prime Minister'; Serbian: Ambasada Grčke 'Embassy of Greece' vs. Grčka ambasada 'Greek Embassy' (while in French, the adjective follows the trigger: le président français François Mitterrand 'the president of-French.Adj François Mitterrand').

In French, we may have an alternation of the preposition de 'of' with the preposition à 'at', e.g., Martin Vetterli, professeur de l'École polytechnique fédérale de Lausanne 'Martin Vetterli, professor of the Federal Polytechnic School of Lausanne' or Martin Vetterli, professeur à l'École polytechnique fédérale de Lausanne 'Martin Vetterli, professor at the Federal Polytechnic School of Lausanne'.
In Serbian, syntactic alternations are permissible, to some extent, with organization names: a complement in the genitive case instead of a PP complement (Ministarstvo rada i socijalne politike 'Ministry of Labor and Social Policy' instead of Ministarstvo za rad i socijalnu politiku 'Ministry for Labor and Social Policy').
The features of the triggers in the five languages are summarized in Table 10.

Conclusion
The semantic classification and the syntactic patterns of single and multiword names in Bulgarian, English, French, Greek, and Serbian, may provide reliable data for rule-based Named Entity Recognition (NER).Linguistic features and distribution facts are used to identify MWEs in NER tasks -both in handcrafted rule-based systems that rely heavily on linguistic knowledge, and in machine-learning techniques.In their research on the application of MWEs and NEs in keyphrase extraction, Nagy T. et al. (2011) conclude that previously known noun compounds are beneficial in NER, and that identified NEs enhance MWE detection, as noun compounds and multiword NEs are linguistically similar and sometimes it is not easy to distinguish between the two.
These arguments are further supported by the tagging practice where both compound nouns and multiword NEs are often tagged as nouns, as their linguistic behaviour is similar to that of single-word nouns (Vincze et al. 2011). Approaches such as that of Nagy T. et al. (2011) also use features involving NEs or pertaining to NEs (i.e., orthography and semantics of keyphrase candidates; positions of a token belonging to a specific NE class, as certain classes of NEs can be identified by their position in the beginning, in the middle or at the end of a keyphrase candidate). Galicia-Haro et al. (2004) discuss the (Spanish) composite NEs (titles of books, movies, songs, etc.) that are described in terms of syntactic and semantic features and of local context and consider discourse features such as introductory words, prepositions, redundancy, specific sets of names, etc.
Rule-based systems usually rely on large-scale lexical resources and grammars, often in the form of regular expressions or Finite State Transducers (Savary & Piskorski 2011; Maurel et al. 2011). Much work has been done on rule-based NER for the five languages discussed in this paper, although machine learning methods prevail. A set of general NER rules with reasonable accuracy has been developed for rule-based annotation of NEs in Bulgarian (Karagiozov et al. 2012), French (Maurel et al. 2011), Greek (Farmakiotou et al. 2000), and Serbian (Krstev et al. 2013). Vitas et al. (2007) discuss semantic and morphological (derivational and inflectional) properties of proper names in Serbian (plus French and English) taking into account the significance of regular derivation and the properties and function of possessive and relational adjectives produced from proper names. Koeva & Dimitrova (2015) discuss a strategy for a linguistic description and classification of Bulgarian NEs referring to persons, and their application in several resources (lexicons and an annotated corpus) for the definition and evaluation of a set of NER rules.
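To make the idea of a regular-expression rule concrete, the following is a minimal sketch of one English rule that recognizes an external trigger followed by a capitalized personal name. The trigger list, the name pattern and the function name `find_personal_names` are hypothetical toy choices made for illustration; they are not taken from any of the systems cited above, which rely on large lexicons and grammars.

```python
import re

# Hypothetical toy trigger lexicon; a real system would use a large resource.
TRIGGERS = ("Professor", "Patriarch", "Saint", "Pope", "Lord")

# Assumption: a name token is a capitalized ASCII word (far too strict in practice).
NAME_TOKEN = r"[A-Z][a-z]+"

# Rule: trigger, whitespace, then one or more capitalized name tokens.
RULE = re.compile(
    r"\b(?P<trigger>" + "|".join(TRIGGERS) + r")\s+"
    r"(?P<name>" + NAME_TOKEN + r"(?:\s+" + NAME_TOKEN + r")*)"
)

def find_personal_names(text):
    """Return (trigger, name) pairs matched by the toy rule."""
    return [(m.group("trigger"), m.group("name")) for m in RULE.finditer(text)]

print(find_personal_names("Yesterday Pope Francis met Lord Orsini."))
```

A rule of this shape only covers the pre-nominal order without modifiers or complements; the appositive order and the language-specific properties described in the syntactic patterns would need additional rules.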
The syntactic patterns presented in this paper are formulated as rules comprising morphological characteristics and syntactic dependencies related to the semantic properties of personal, location and organization NEs in Bulgarian, English, French, Greek, and Serbian. We intend to further exploit the formally encoded linguistic information in rule-based NER approaches. Moreover, as the syntactic patterns for different languages are linked to the same semantic pattern, they can be considered equivalent at the conceptual level and may be applied to any task that involves multilingual processing: cross-lingual information extraction and text classification, multilingual summarization and machine translation. Last but not least, the presented approach contributes to comparative language studies and may be further extended to other word classes that show relatively regular morphological properties and syntactic dependencies.
): (a) the semantic subclass of the trigger; (b) the type of the proper name that selects triggers; (c) obligatoriness / optionality of the trigger manifested by an internal or external trigger with respect to the name; (d) the semantic pattern that the proper name evokes.

5.1 Pattern A
(a) Semantic class of the trigger: legislative job title, executive job title, judicial position, academic position, academic title, military rank, profession. Specification of military ranks and top-level legislative, executive, and judicial triggers is not allowed: *prime minister of finance; lower level legislative triggers can be specified: engineer in automatics.
(b) Type of the proper name: personal name extended or substituted by a family name.
(c) External trigger.
(d) Semantic pattern: (referent specification phrase) - trigger - (domain specification phrase) - (possessor phrase) - (affiliation phrase) - (location phrase).
Example: English: (his | Stefan's) (new) professor (of law) (at the University) (in Plovdiv) Ivan Ivanov.

5.2 Pattern B
(a) Semantic class of the trigger: aristocratic title, religious title.
(b) Type of the proper name: personal name or family name. Some aristocratic and religious titles are selected only by a personal name (Pope Francis), while others are selected by a family name (Lord Orsini). A trigger can also be part of a personal name (for distinguished persons) but no separate pattern is defined for this type of name.
(c) External trigger.
(d) Semantic pattern: (referent specification phrase) - trigger - (affiliation phrase) - (location phrase).
Example: English: the (new) Metropolitan (of the Church) (in San Francisco)

5.3 Pattern C
(a) Semantic class of the trigger: kinship term.
(b) Type of the proper name: personal name (rarely modified by family name(s)).
(c) External trigger.
(d) Semantic pattern: (referent specification phrase) - trigger - (possessor phrase) - (location phrase).
Example: English: (his | Ivan's) (blond) step-brother (from Sofia) Stefan.

5.4 Pattern D
(a) Semantic class of the trigger: holy title (a limited set of words).
(b) Type of the proper name: personal name (rarely modified or substituted by a nickname).
(c) External trigger.
(d) Semantic pattern: (referent specification phrase) - trigger - (location phrase).
Example: English: (miraculous) saint (from Patara) Nicholas.

5.5 Pattern E
(a) Semantic class of the trigger: true honorific.
(b) Type of the proper name: personal name extended or substituted by a family name.
(c) External trigger.
(d) Semantic pattern: trigger.
Example: English: Monsieur Ivan Ivanov.

5.6 Pattern F
(a) Semantic class of the trigger: location.
(b) Type of the proper name: location name.
(c) External trigger.
(d) Semantic pattern: (referent specification phrase) - trigger - (specification phrase) - (possessor phrase) - (location phrase).
Example: English: the (beautiful) city (near the big river), Plovdiv.

5.7 Pattern G
(a) Semantic class of the trigger: location.
(b) Type of the proper name: location name.
(c) Internal trigger.
(d) Semantic pattern: (referent specification phrase) - internal trigger - (location phrase).
Example: English: the (beautiful) Mount Fuji (in Japan).

5.8 Pattern H
(a) Semantic class of the trigger: organization.
(b) Type of the proper name: organization name.
(c) External trigger.
(d) Semantic pattern: (referent specification phrase) - trigger - (domain specification phrase) - (possessor phrase) - (affiliation phrase) - (location phrase).
Example: English: the (new) company (of his friends) (in Athens), Tetracom.

5.9 Pattern I
(a) Semantic class of the trigger: organization.
(b) Type of the proper name: organization name.
(c) Internal trigger.
(d) Semantic pattern: (referent specification phrase) - internal trigger - (location phrase).
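The semantic patterns listed above lend themselves to a simple formal encoding. The sketch below, offered only as one possible representation, stores a pattern as a sequence of slots in which the trigger is obligatory and every parenthesized phrase is optional. The names `Slot`, `PATTERN_A`, `PATTERN_D` and `is_valid_realisation` are hypothetical. Surface order is deliberately not checked, on the assumption that ordering belongs to the language-specific syntactic patterns rather than to the semantic pattern.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    name: str
    optional: bool = True  # parenthesized phrases in the patterns are optional

# Pattern A: only the trigger is obligatory.
PATTERN_A = (
    Slot("referent specification phrase"),
    Slot("trigger", optional=False),
    Slot("domain specification phrase"),
    Slot("possessor phrase"),
    Slot("affiliation phrase"),
    Slot("location phrase"),
)

# Pattern D: a smaller inventory of optional phrases.
PATTERN_D = (
    Slot("referent specification phrase"),
    Slot("trigger", optional=False),
    Slot("location phrase"),
)

def is_valid_realisation(pattern, filled):
    """True if `filled` uses only slots of `pattern`, each at most once,
    and includes every obligatory slot. Order is not checked."""
    allowed = {slot.name for slot in pattern}
    required = {slot.name for slot in pattern if not slot.optional}
    filled_set = set(filled)
    return (
        len(filled_set) == len(filled)
        and filled_set <= allowed
        and required <= filled_set
    )

# "his new professor of law ..." fills possessor, referent, trigger, domain.
print(is_valid_realisation(
    PATTERN_A,
    ["possessor phrase", "referent specification phrase",
     "trigger", "domain specification phrase"],
))
```

Under this encoding, a possessor phrase is accepted for Pattern A but rejected for Pattern D, which mirrors the difference between the two slot inventories.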

Table 1: Syntactic pattern A - an example translated in the five languages. The examples do not illustrate all variants.

6.2 Syntactic pattern B (single- or multiword personal name)
Characteristics shared by the five languages: i) The trigger phrase is placed in front of the personal name but the appositive order can also be found, especially if the trigger phrase is complex. Irinej 'His Grace Bishop of-Niš Irinej'; ii) If no trigger modifiers or complements exist, the trigger is indefinite (except for Greek where the article is obligatory); otherwise, it is definite. Examples: Bulgarian: Patriarh Maksim 'Patriarch Maxim'; French: L'archevêque de Paris, Monseigneur André Vingt-Trois 'the Archbishop of Paris, Monsignor André Vingt-Trois'; Le bienheureux père Brottier 'the blessed father Brottier'; iii) Personal names can be extended with one or more (rarely more than two) proper names: a nickname, a patronym and/or a family name. These constituents form a complex name (MWE). Example: Greek: ο πρόσφατα χειροτονηθείς Σεβασμιότατος Μητροπολίτης Κεφαλληνίας, πατέρας Γεώργιος Σαπουνάς 'the newly appointed Most Reverend Bishop Metropolitan of Kefalonia, Father Georgios Sapounas'.
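The definiteness condition in item ii) can be written as a small decision rule. The following is a sketch under the assumption that the rule applies exactly as stated in the text; the function name `trigger_is_definite` and the set `ARTICLE_OBLIGATORY` are hypothetical.

```python
# Languages in which the article on the trigger is obligatory (per item ii).
ARTICLE_OBLIGATORY = {"Greek"}

def trigger_is_definite(language, has_modifiers_or_complements):
    """Item ii): a bare trigger is indefinite, except in Greek;
    a trigger with modifiers or complements is definite."""
    if language in ARTICLE_OBLIGATORY:
        return True
    return has_modifiers_or_complements

# Bulgarian "Patriarh Maksim": bare trigger, hence indefinite.
print(trigger_is_definite("Bulgarian", False))
# French "L'archevêque de Paris, ...": trigger with a complement, hence definite.
print(trigger_is_definite("French", True))
```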

Table 2: Syntactic pattern B - an example translated in the five languages.

Table 3: Syntactic pattern C - an example translated in the five languages.

6.4 Syntactic pattern D (single- and, rarely, multiword personal name)
Characteristics shared by the five languages: i) The trigger appears before the personal name but a complex trigger phrase often occurs in apposition. Examples: English: Saint Haralambos, the Holy Martyr of Magnesia; Serbian: Sveti mučenik i arhiđakon Lavrentije 'Saint martyr and archdeacon Lavrentije'. ii) If no modifiers or complements exist, the trigger is indefinite (except for Greek where the article is obligatory); otherwise, it is definite. Examples: Bulgarian: Sveti Nikola 'Saint Nicholas'; French: le saint de l'Arcadie: Charles De Menou D'Aulnay 'the saint of Arcadia: Charles De Menou D'Aulnay'.

Table 4: Syntactic pattern D - an example translated in the five languages.

Table 5: Syntactic pattern E - an example translated in the five languages.

Table 6: Syntactic pattern F - an example translated in the five languages.

6.7 Syntactic pattern G (multiword location name)
Characteristics shared by the five languages: i) The internal trigger is part of the location name, thus the location name is always an MWE. Examples: Bulgarian: našiyat hubav grad Novi han 'our-the beautiful city Novi han'; Greek: ο Ινδικός Ωκεανός 'the Indian Ocean'; ii) A location name with an internal trigger is fixed, the order of constituents cannot be changed and insertions are not allowed. Examples: Bulgarian: našata Stara planina 'our-the Stara Planina', *Planina stara; French: le célèbre Mont Blanc 'the famous Mont Blanc', *Blanc Mont.

Table 7: Syntactic pattern G - an example translated in all five languages.

6.8 Syntactic pattern H (single- and multiword organization name)
ApacheCorp. ii) The phrase headed by the trigger is definite. If the trigger is a single-word one or specified for domain, the trigger phrase may be indefinite (except for Greek where the article is obligatory). Examples: English: the company of εταιρία Ελληνικός Χρυσός 'the mining company Hellas Gold', η Ελληνικός Χρυσός, μεταλλευτική εταιρία 'the Hellas Gold mining company'; Serbian: Fabrika mašina "Ivo Lola Ribar" 'Machine Factory "Ivo Lola Ribar"'.

Table 8: Syntactic pattern H - an example translated in the five languages.
Characteristics shared by the five languages: i) The trigger is an integral part of the organization name, thus the organization name is always an MWE. Examples: English: the European Bank for Reconstruction and Development in Serbia; the Association of Chartered Certified Accountants; Greek: η Τράπεζα Εμπορίου και Ανάπτυξης της Μαύρης Θάλασσας 'the Black Sea Trade and Development Bank'. ii) Organization names containing an integral trigger are fixed, the order of constituents cannot be changed and insertions are not allowed. Examples: Bulgarian: Evropeyska banka za văzstanovyavane i razvitie 'European Bank for Reconstruction and Development'; novosăzdadeniyat Evropeyski fond za strategičeski investicii 'newly-founded-the European Fund for Strategic Investment'; iii) Organization names with an integral trigger can contain a personal, location or organization name. Example: Serbian: Memorijalni centar "Josip Broz Tito" 'Memorial Center "Josip Broz Tito"'; iv) The internal organization trigger can be specified by the same range of modifiers and complements permissible for it in regular use. Example: French: l'Association des Historiens 'the Association of Historians'; v) Rarely, an organization trigger, different from the integral trigger, can specify the multiword organization name. Examples: Bulgarian: Săyuz na tărgovcite v Bălgariya 'Union of traders-the in Bulgaria'; Asociaciya "Săyuz na tărgovcite v Bălgariya" 'Association Union of traders-the in Bulgaria'.

Table 9: Syntactic pattern I - an example translated in all five languages.

In all languages, the trigger can appear in pre- or post-nominal position: French: le ministre des Finances et des Comptes publics, Michel Sapin 'the Minister of finance and of public accounts, Michel Sapin', or Michel Sapin, ministre des Finances et des Comptes publics 'Michel Sapin, Minister of finance and of public accounts'; Greek: ο Υπουργός Οικονομικών Ευκλείδης Τσακαλώτος 'the Minister of finance Efkleidis Tsakalotos', or ο Ευκλείδης Τσακαλώτος, Υπουργός Οικονομικών 'the Efkleidis Tsakalotos, Minister of finance'. Some abbreviations can appear only before or after the names, as in: Serbian: JP "Srbijašume" 'PC (Acronym for Public Company) Srbijašume' but Takovo d.o.o. 'Takovo (a place name) d.o.o.'. In all languages, a complex trigger phrase is often in apposition (when the trigger appears as an apposition, it is always separated with a comma), e.g., English: Chris, the new Professor of Agriculture and Forestry; French: le ministre des Finances et des Comptes publics, Michel Sapin 'the Minister of finance and of public accounts, Michel Sapin', or Michel Sapin, ministre des Finances et des Comptes publics 'Michel Sapin, Minister of finance and of public accounts'; Greek: ο Καθηγητής Γεώργιος Μπαμπινιώτης 'the Professor Georgios Babiniotis' or ο Γεώργιος Μπαμπινιώτης, Καθηγητής 'the Georgios Babiniotis, Professor'.
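The positional variants described above can be told apart with a simple surface check, given that an appositive trigger is always set off by a comma. The sketch below assumes that the proper name has already been identified and is passed in as a string; the function name `trigger_position` and the label strings are hypothetical.

```python
def trigger_position(phrase, name):
    """Classify where the trigger phrase sits relative to `name`.

    Relies on the observation that an appositive trigger is always
    separated from the name by a comma.
    """
    before, found, after = phrase.partition(name)
    if not found:
        raise ValueError("name does not occur in phrase")
    before, after = before.strip(), after.strip()
    if after.startswith(","):
        return "post-nominal apposition"
    if before.endswith(","):
        return "pre-nominal apposition"
    if before:
        return "pre-nominal"
    return "no trigger"

print(trigger_position(
    "Michel Sapin, ministre des Finances et des Comptes publics",
    "Michel Sapin",
))
```

In the Greek post-nominal example, the material before the name is only the article ο, so the comma after the name decides the classification.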
French: Nicolas Sarkozy vs. Sarkozy Nicolas; Greek: Γεώργιος Κοκκινόπουλος 'Georgios Kokkinopoulos' vs. Κοκκινόπουλος Γεώργιος 'Kokkinopoulos Georgios'; Serbian: Marko Vitas vs. Vitas Marko. In Serbian, a change of the order of the first name and the surname(s) of male persons results in a change of the syntactic properties, as in the former case both names inflect, while in the latter only the first name inflects, e.g., in the genitive Marka Vitasa vs. Vitas Marka.
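The Serbian order-dependent inflection can be illustrated with a toy rule. The sketch below uses only the genitive forms given in the example; the lookup table `GENITIVE` and the function `genitive_male_name` are hypothetical stand-ins for a morphological lexicon, which a real system would consult instead.

```python
# Genitive forms taken from the example in the text (toy lookup table).
GENITIVE = {"Marko": "Marka", "Vitas": "Vitasa"}

def genitive_male_name(first_name, surname, surname_first=False):
    """Genitive of a male full name.

    First name + surname: both names inflect.
    Surname + first name: only the first name inflects.
    """
    if surname_first:
        return f"{surname} {GENITIVE[first_name]}"
    return f"{GENITIVE[first_name]} {GENITIVE[surname]}"

print(genitive_male_name("Marko", "Vitas"))
print(genitive_male_name("Marko", "Vitas", surname_first=True))
```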

Table 10: Comparison of the morphological and syntactic features of the five languages.