Detailed Description

Term URIs

Given a term t in a language L, the URI is constructed as follows:

A term URI generated in this way refers to the term t in language L.

Language URIs

Language URIs consist of the base address http://lexvo.org/id/iso639-3/ followed by a valid three-letter ISO 639-3 language code that is not defined as a special code. A language URI conforming to this specification refers to the language denoted by the language code according to the ISO 639-3 standard. Additionally, because many systems use two-letter ISO 639-1 codes instead of three-letter ISO 639-3 codes, we also provide equivalent URIs consisting of the base address http://lexvo.org/id/iso639-1/ followed by a two-letter ISO 639-1 code.
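As an illustration, the rule above can be sketched in a few lines of Python. This is a minimal sketch, not part of any Lexvo.org API: the function name language_uri is hypothetical, codes are assumed to be written in lowercase, and only the shape of the code is checked, not whether it is actually assigned in the ISO 639 registries.

```python
import re

LANGUAGE_BASE_3 = "http://lexvo.org/id/iso639-3/"
LANGUAGE_BASE_1 = "http://lexvo.org/id/iso639-1/"

# The four ISO 639-3 special codes, which the scheme excludes.
SPECIAL_CODES = {"mis", "mul", "und", "zxx"}


def language_uri(code: str) -> str:
    """Build a language URI from a two- or three-letter code (hypothetical helper).

    Assumption: codes are lowercase. Only the shape is validated here.
    """
    if re.fullmatch(r"[a-z]{3}", code):
        if code in SPECIAL_CODES:
            raise ValueError(f"special code not allowed: {code!r}")
        return LANGUAGE_BASE_3 + code
    if re.fullmatch(r"[a-z]{2}", code):
        return LANGUAGE_BASE_1 + code
    raise ValueError(f"not a two- or three-letter code: {code!r}")


print(language_uri("eng"))  # http://lexvo.org/id/iso639-3/eng
print(language_uri("en"))   # http://lexvo.org/id/iso639-1/en
```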

Script URIs

Script URIs consist of the base address http://lexvo.org/id/script/ followed by an ISO 15924 script code other than Zxxx, Zyyy, or Zzzz. A script URI conforming to this specification refers to the script denoted by the code according to the ISO 15924 standard.
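The same construction for scripts might look as follows. This is a sketch under stated assumptions: script_uri is a hypothetical name, and the check covers only the four-letter, initial-capital shape of ISO 15924 codes plus the three excluded codes, not membership in the actual code list.

```python
import re

SCRIPT_BASE = "http://lexvo.org/id/script/"

# Codes that the scheme explicitly excludes.
EXCLUDED_SCRIPT_CODES = {"Zxxx", "Zyyy", "Zzzz"}


def script_uri(code: str) -> str:
    """Build a script URI from a four-letter ISO 15924 code (hypothetical helper)."""
    if not re.fullmatch(r"[A-Z][a-z]{3}", code):
        raise ValueError(f"not shaped like an ISO 15924 code: {code!r}")
    if code in EXCLUDED_SCRIPT_CODES:
        raise ValueError(f"excluded script code: {code!r}")
    return SCRIPT_BASE + code


print(script_uri("Latn"))  # http://lexvo.org/id/script/Latn
```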

Character URIs

Character URIs consist of the base address http://lexvo.org/id/char/ followed by a Unicode code point in upper-case hexadecimal notation, zero-padded to four digits if shorter and left unpadded if longer. A character URI conforming to this specification refers to the character denoted by the code point according to the Unicode 5.0 standard.
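The padding rule is easy to get wrong, so a short sketch may help. The function name char_uri is hypothetical; the format specifier "04X" gives upper-case hexadecimal with a minimum width of four digits, which matches the rule described above.

```python
CHAR_BASE = "http://lexvo.org/id/char/"


def char_uri(ch: str) -> str:
    """Build a character URI for a single Unicode character (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # "04X": upper-case hex, zero-padded to at least four digits,
    # with no extra padding when the code point needs five or six digits.
    return CHAR_BASE + format(ord(ch), "04X")


print(char_uri("A"))           # http://lexvo.org/id/char/0041
print(char_uri("\u00e9"))      # http://lexvo.org/id/char/00E9
print(char_uri("\U00020000"))  # http://lexvo.org/id/char/20000
```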

Geographical URIs

Geographical URIs consist either of the base address http://lexvo.org/id/iso3166-1/, followed by an ISO 3166-1 alpha-2 code for countries, or of the base address http://lexvo.org/id/un_m49/ followed by a UN M.49 code for regions that are not countries (i.e. only for continents and other groupings).
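A minimal sketch of the two geographical cases follows. The names country_uri and region_uri are hypothetical, alpha-2 codes are assumed to be written in upper case, and UN M.49 codes are treated as three-digit strings so that leading zeros survive. Only the shape of each code is checked.

```python
import re

COUNTRY_BASE = "http://lexvo.org/id/iso3166-1/"
REGION_BASE = "http://lexvo.org/id/un_m49/"


def country_uri(alpha2: str) -> str:
    """Build a URI for a country from its ISO 3166-1 alpha-2 code (hypothetical helper)."""
    if not re.fullmatch(r"[A-Z]{2}", alpha2):
        raise ValueError(f"not shaped like an alpha-2 code: {alpha2!r}")
    return COUNTRY_BASE + alpha2


def region_uri(m49: str) -> str:
    """Build a URI for a non-country region from its UN M.49 code (hypothetical helper)."""
    if not re.fullmatch(r"[0-9]{3}", m49):
        raise ValueError(f"not shaped like a UN M.49 code: {m49!r}")
    return REGION_BASE + m49


print(country_uri("US"))  # http://lexvo.org/id/iso3166-1/US
print(region_uri("150"))  # http://lexvo.org/id/un_m49/150 (150 is Europe in UN M.49)
```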

WordNet URIs

WordNet URIs consist of the base address http://lexvo.org/id/wordnet/30/, followed by a part-of-speech indicator ("noun/", "verb/", "adj/", or "adv/") and a sense key. The sense keys are similar to WordNet's original sense keys but use the following format: lemma + "_" + lex_filenum + "_" + lex_id [+ "_" + head_word + "_" + head_id], where lemma and head_word are percent-encoded as per RFC 3986. These URIs identify not the synsets themselves but the denotational meanings associated with the synsets, just as Lexvo.org's language URIs identify not the language codes themselves but the actual languages.
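The sense-key format can be sketched as a small string-building function. This is an illustration only: wordnet_uri is a hypothetical name, the numeric fields are passed through as strings because the text does not state how they are padded, and the sample values are placeholders rather than verified WordNet entries. Percent-encoding uses urllib.parse.quote with an empty safe set, which leaves only unreserved characters unescaped.

```python
from typing import Optional
from urllib.parse import quote

WORDNET_BASE = "http://lexvo.org/id/wordnet/30/"
PARTS_OF_SPEECH = {"noun", "verb", "adj", "adv"}


def wordnet_uri(pos: str, lemma: str, lex_filenum: str, lex_id: str,
                head_word: Optional[str] = None,
                head_id: Optional[str] = None) -> str:
    """Build a WordNet URI from sense-key fields (hypothetical helper).

    Assumption: lex_filenum, lex_id and head_id are already formatted strings.
    """
    if pos not in PARTS_OF_SPEECH:
        raise ValueError(f"unknown part of speech: {pos!r}")
    if (head_word is None) != (head_id is None):
        raise ValueError("head_word and head_id must be given together")
    # Percent-encode the free-text fields; safe="" also escapes "/".
    parts = [quote(lemma, safe=""), lex_filenum, lex_id]
    if head_word is not None:
        parts += [quote(head_word, safe=""), head_id]
    return WORDNET_BASE + pos + "/" + "_".join(parts)


# Placeholder field values, for illustration only.
print(wordnet_uri("noun", "dog", "05", "00"))
# http://lexvo.org/id/wordnet/30/noun/dog_05_00
print(wordnet_uri("adv", "o'clock", "02", "00"))
# http://lexvo.org/id/wordnet/30/adv/o%27clock_02_00
```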

Kangxi Radical URIs

Kangxi radicals are abstract entities associated with specific semantic components of Chinese characters. Lexvo.org's Kangxi Radical URIs consist of the base address http://lexvo.org/id/kangxi-radical/, followed by a number from 1 to 214 representing the radical numbers used in the 1716 Kangxi dictionary.
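For completeness, the radical URIs can be generated with a simple range check. The function name kangxi_radical_uri is hypothetical, and the number is assumed to be written in plain decimal without zero-padding, which the description above does not state explicitly.

```python
KANGXI_BASE = "http://lexvo.org/id/kangxi-radical/"


def kangxi_radical_uri(number: int) -> str:
    """Build a Kangxi radical URI for a radical number from 1 to 214 (hypothetical helper)."""
    if not isinstance(number, int) or not 1 <= number <= 214:
        raise ValueError(f"radical number must be an integer from 1 to 214: {number!r}")
    # Assumption: plain decimal, no zero-padding.
    return KANGXI_BASE + str(number)


print(kangxi_radical_uri(1))    # http://lexvo.org/id/kangxi-radical/1
print(kangxi_radical_uri(214))  # http://lexvo.org/id/kangxi-radical/214
```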

Ontology/Data Model

The classes and properties used are specified in the Lexvo.org Ontology in OWL/RDF.

Dataset Description

A machine-readable dataset description in VoiD format is available.

Java API

We offer a very simple Java API that creates URIs for languages and terms.

