Catalog

Abstract

Catalog is a JSON data to manage the configuration of a Droonga cluster. A Droonga cluster consists of one or more datasets, and a dataset consists of other portions. They all must be explicitly described in a catalog and be shared with all the hosts in the cluster.

Usage

This version of catalog will be available from Droonga 1.0.0.

Syntax

{
  "version": <Version number>,
  "effectiveDate": "<Effective date>",
  "datasets": {
    "<Name of the dataset 1>": {
      "nWorkers": <Number of workers>,
      "plugins": [
        "Name of the plugin 1",
        ...
      ],
      "schema": {
        "<Name of the table 1>": {
          "type"             : <"Array", "Hash", "PatriciaTrie" or "DoubleArrayTrie">
          "keyType"          : "<Type of the primary key>",
          "tokenizer"        : "<Tokenizer>",
          "normalizer"       : "<Normalizer>",
          "columns" : {
            "<Name of the column 1>": {
              "type"         : <"Scalar", "Vector" or "Index">,
              "valueType"    : "<Type of the value>",
              "vectorOptions": {
                "weight"     : <Weight>,
              },
              "indexOptions" : {
                "section"    : <Section>,
                "weight"     : <Weight>,
                "position"   : <Position>,
                "sources"    : [
                  "<Name of a column to be indexed>",
                  ...
                ]
              }
            },
            "<Name of the column 2>": { ... },
            ...
          }
        },
        "<Name of the table 2>": { ... },
        ...
      },
      "fact": "<Name of the fact table>",
      "replicas": [
        {
          "dimension": "<Name of the dimension column>",
          "slicer": "<Name of the slicer function>",
          "slices": [
            {
              "label": "<Label of the slice>",
              "volume": {
                "address": "<Address string of the volume>"
              }
            },
            ...
          }
        },
        ...
      ]
    },
    "<Name of the dataset 2>": { ... },
    ...
  }
}

Details

Catalog definition

Value
An object with the following key/value pairs.

Parameters

version
Abstract
Version number of the catalog file.
Value
2. (Specification written in this page is valid only when this value is 2)
Default value
None. This is a required parameter.
Inheritable
False.
effectiveDate
Abstract
The time when this catalog becomes effective.
Value
A local time string formatted in the W3C-DTF, with the time zone.
Default value
None. This is a required parameter.
Inheritable
False.
datasets
Abstract
Definition of datasets.
Value
An object keyed by the name of the dataset with value the dataset definition.
Default value
None. This is a required parameter.
Inheritable
False.
nWorkers
Abstract
The number of worker processes spawned for each database instance.
Value
An integer value.
Default value
0 (No worker. All operations are done in the master process)
Inheritable
True. Overridable in dataset and volume definition.

Example

A version 2 catalog effective after 2013-09-01T00:00:00Z, with no datasets:

{
  "version": 2,
  "effectiveDate": "2013-09-01T00:00:00Z",
  "datasets": {
  }
}

Dataset definition

Value
An object with the following key/value pairs.

Parameters

plugins
Abstract
Name strings of the plugins enabled for the dataset.
Value
An array of strings.
Default value
None. This is a required parameter.
Inheritable
True. Overridable in dataset and volume definition.
schema
Abstract
Definition of tables and their columns.
Value
An object keyed by the name of the table with value the table definition.
Default value
None. This is a required parameter.
Inheritable
True. Overridable in dataset and volume definition.
fact
Abstract
The name of the fact table. When a dataset is stored as more than one slice, one fact table must be selected from tables defined in schema parameter.
Value
A string.
Default value
None.
Inheritable
True. Overridable in dataset and volume definition.
replicas
Abstract
A collection of volumes which are the copies of each other.
Value
An array of volume definitions.
Default value
None. This is a required parameter.
Inheritable
False.

Example

A dataset with 4 workers per a database instance, with plugins groonga, crud and search:

{
  "nWorkers": 4,
  "plugins": ["groonga", "crud", "search"],
  "schema": {
  },
  "replicas": [
  ]
}

Table definition

Value
An object with the following key/value pairs.

Parameters

type
Abstract
Specifies which data structure is used for managing keys of the table.
Value
Any of the following values.
Default value
"Hash"
Inheritable
False.
keyType
Abstract
Data type of the key of the table. Mustn’t be assigned when the type is "Array".
Value
Any of the following data types.
Default value
None. Mandatory for tables with keys.
Inheritable
False.
tokenizer
Abstract
Specifies the type of tokenizer used for splitting each text value, when the table is used as a lexicon. Only available when the keyType is "ShortText".
Value
Any of the following tokenizer names.
Default value
None.
Inheritable
False.
normalizer
Abstract
Specifies the type of normalizer which normalizes and restricts the key values. Only available when the keyType is "ShortText".
Value
Any of the following normalizer names.
Default value
None.
Inheritable
False.
columns
Abstract
Column definition for the table.
Value
An object keyed by the name of the column with value the column definition.
Default value
None.
Inheritable
False.

Examples

Example 1: Hash table

A Hash table whose key is ShortText type, with no columns:

{
  "type": "Hash",
  "keyType": "ShortText",
  "columns": {
  }
}
Example 2: PatriciaTrie table

A PatriciaTrie table with TokenBigram tokenizer and NormalizerAuto normalizer, with no columns:

{
  "type": "PatriciaTrie",
  "keyType": "ShortText",
  "tokenizer": "TokenBigram",
  "normalizer": "NormalizerAuto",
  "columns": {
  }
}

Column definition

Value

An object with the following key/value pairs.

Parameters

type
Abstract
Specifies the quantity of data stored as each column value.
Value
Any of the followings.
Default value
"Scalar"
Inheritable
False.
valueType
Abstract
Data type of the column value.
Value
Any of the following data types or the name of another table defined in the same dataset. When a table name is assigned, the column acts as a foreign key references the table.
Default value
None. This is a required parameter.
Inheritable
False.
vectorOptions
Abstract
Specifies the optional properties of a “Vector” column.
Value
An object which is a vectorOptions definition
Default value
{} (Void object).
Inheritable
False.
indexOptions
Abstract
Specifies the optional properties of an “Index” column.
Value
An object which is an indexOptions definition
Default value
{} (Void object).
Inheritable
False.

Examples

Example 1: Scalar column

A scaler column to store ShortText values:

{
  "type": "Scalar",
  "valueType": "ShortText"
}
Example 2: Vector column

A vector column to store ShortText values with weight:

{
  "type": "Scalar",
  "valueType": "ShortText",
  "vectorOptions": {
    "weight": true
  }
}
Example 3: Index column

A column to index address column on Store table:

{
  "type": "Index",
  "valueType": "Store",
  "indexOptions": {
    "sources": [
      "address"
    ]
  }
}

vectorOptions definition

Value
An object with the following key/value pairs.

Parameters

weight
Abstract
Specifies whether the vector column stores the weight data or not. Weight data is used for indicating the importance of the value.
Value
A boolean value (true or false).
Default value
false.
Inheritable
False.

Example

Store the weight data.

{
  "weight": true
}

indexOptions definition

Value
An object with the following key/value pairs.

Parameters

section
Abstract
Specifies whether the index column stores the section data or not. Section data is typically used for distinguishing in which part of the sources the value appears.
Value
A boolean value (true or false).
Default value
false.
Inheritable
False.
weight
Abstract
Specifies whether the index column stores the weight data or not. Weight data is used for indicating the importance of the value in the sources.
Value
A boolean value (true or false).
Default value
false.
Inheritable
False.
position
Abstract
Specifies whether the index column stores the position data or not. Position data is used for specifying the position where the value appears in the sources. It is indispensable for fast and accurate phrase-search.
Value
A boolean value (true or false).
Default value
false.
Inheritable
False.
sources
Abstract
Makes the column an inverted index of the referencing table’s columns.
Value
An array of column names of the referencing table assigned as valueType.
Default value
None.
Inheritable
False.

Example

Store the section data, the weight data and the position data. Index name and address on the referencing table.

{
  "section": true,
  "weight": true,
  "position": true
  "sources": [
    "name",
    "address"
  ]
}

Volume definition

Abstract
A unit to compose a dataset. A dataset consists of one or more volumes. A volume consists of either a single instance of database or a collection of slices. When a volume consists of a single database instance, address parameter must be assigned and the other parameters must not be assigned. Otherwise, dimension, slicer and slices are required, and vice versa.
Value
An object with the following key/value pairs.

Parameters

address
Abstract
Specifies the location of the database instance.
Value
A string in the following format.
${host_name}:${port_number}/${tag}.${name}
  • host_name: The name of host that has the database instance.
  • port_number: The port number for the database instance.
  • tag: The tag of the database instance. The tag name can’t include .. You can use multiple tags for one host name and port number pair.
  • name: The name of the databases instance. You can use multiple names for one host name, port number and tag triplet.
Default value
None.
Inheritable
False.
dimension
Abstract
Specifies the dimension to slice the records in the fact table. Either ‘_key” or a scalar type column can be selected from columns parameter of the fact table. See dimension.
Value
A string.
Default value
"_key"
Inheritable
True. Overridable in dataset and volume definition.
slicer
Abstract
Function to slice the value of dimension column.
Value
Name of slicer function.
Default value
"hash"
Inheritable
True. Overridable in dataset and volume definition.

In order to define a volume which consists of a collection of slices, the way how slice records into slices must be decided.

The slicer function that specified as slicer and the column (or key) specified as dimension, which is input for the slicer function, defines that.

Slicers are categorized into three types. Here are three types of slicers:

Ratio-scaled
Ratio-scaled slicers slice datapoints in the specified ratio, e.g. hash function of _key. Slicers of this type are:
  • hash
Ordinal-scaled
Ordinal-scaled slicers slice datapoints with ordinal values; the values have some ranking, e.g. time, integer, element of {High, Middle, Low}. Slicers of this type are:
  • (not implemented yet)
Nominal-scaled
Nominal-scaled slicers slice datapoints with nominal values; the values denotes categories,which have no order, e.g. country, zip code, color. Slicers of this type are:
  • (not implemented yet)
slices
Abstract
Definition of slices which store the contents of the data.
Value
An array of slice definitions.
Default value
None.
Inheritable
False.

Examples

Example 1: Single instance

A volume at “localhost:24224/volume.000”:

{
  "address": "localhost:24224/volume.000"
}
Example 2: Slices

A volume that consists of three slices, records are to be distributed according to hash, which is ratio-scaled slicer function, of _key.

{
  "dimension": "_key",
  "slicer": "hash",
  "slices": [
    {
      "volume": {
        "address": "localhost:24224/volume.000"
      }
    },
    {
      "volume": {
        "address": "localhost:24224/volume.001"
      }
    },
    {
      "volume": {
        "address": "localhost:24224/volume.002"
      }
    }
  ]

Slice definition

Abstract
Definition of each slice. Specifies the range of sliced data and the volume to store the data.
Value
An object with the following key/value pairs.

Parameters

weight
Abstract
Specifies the share in the slices. Only available when the slicer is ratio-scaled.
Value
A numeric value.
Default value
1.
Inheritable
False.
label
Abstract
Specifies the concrete value that slicer may return. Only available when the slicer is nominal-scaled.
Value
A value of the dimension column data type. When the value is not provided, this slice is regarded as else; matched only if all other labels are not matched. Therefore, only one slice without label is allowed in slices.
Default value
None.
Inheritable
False.
boundary
Abstract
Specifies the concrete value that can compare with slicer’s return value. Only available when the slicer is ordinal-scaled.
Value
A value of the dimension column data type. When the value is not provided, this slice is regarded as else; this means the slice is open-ended. Therefore, only one slice without boundary is allowed in a slices.
Default value
None.
Inheritable
False.
volume
Abstract
A volume to store the data which corresponds to the slice.
Value

An object which is a volume definition

Default value
None.
Inheritable
False.

Examples

Example 1: Ratio-scaled

Slice for a ratio-scaled slicer, with the weight 1:

{
  "weight": 1,
  "volume": {
  }
}
Example 2: Nominal-scaled

Slice for a nominal-scaled slicer, with the label "1":

{
  "label": "1",
  "volume": {
  }
}
Example 3: Ordinal-scaled

Slice for a ordinal-scaled slicer, with the boundary 100:

{
  "boundary": 100,
  "volume": {
  }
}

Realworld example

See the catalog of basic tutorial.