A Droonga network consists of several resources. You need to describe them in a catalog. All the nodes in the network share the same catalog.
This documentation describes the catalog.
So far, you need to write the catalog and share it with all the nodes manually.
Some utility programs will generate the catalog in the near future. Furthermore, the Droonga network will maintain and share the catalog automatically.
This section describes the terms used in the catalog.
A catalog is a series of data which represents the resources in the network.
A zone is a set of farms. Farms in a zone are expected to be close to each other, for example in the same host, on the same switch, or in the same network.
A farm is a Droonga Engine instance. Droonga Engine is implemented as a Fluentd plugin, fluent-plugin-droonga.
A fluentd process can have multiple Droonga Engines. If you add one or more match entries with type droonga into fluentd.conf, a fluentd process instantiates one or more Droonga Engines.
A farm has its own workers and a job queue. A farm pushes requests to its job queue, and workers pull requests from the job queue.
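The push/pull relationship between a farm and its workers can be sketched as follows. This is only an illustration of the pattern, not Droonga's implementation: the function name `run_farm`, the use of threads instead of worker processes, and the `None` sentinel are all assumptions made for the example.

```python
import queue
import threading

def run_farm(requests, n_workers=2):
    """Push requests onto one job queue and let workers pull them (hypothetical sketch)."""
    job_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            request = job_queue.get()   # a worker pulls one request
            if request is None:         # sentinel: no more work for this worker
                break
            with lock:
                results.append(f"handled {request}")

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for request in requests:            # the farm pushes requests
        job_queue.put(request)
    for _ in workers:                   # one sentinel per worker
        job_queue.put(None)
    for w in workers:
        w.join()
    return sorted(results)
```

Because every worker pulls from the same queue, which worker handles a given request is not fixed in advance; the sketch sorts the results only to make them easy to compare.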
A dataset is a set of logical tables. A logical table must belong to only one dataset.
Each dataset must have a unique name in the same Droonga network.
A logical table consists of one or more partitioned physical tables. A logical table doesn’t have physical records. It returns physical records from its physical tables.
You can customize how a logical table is partitioned into one or more physical tables. For example, you can customize the partition key, the number of partitions and so on.
A physical table is a table in a Groonga database. It stores physical records.
A ring is a series of partition sets. A dataset must have one ring. A dataset creates logical tables on the ring.
Droonga Engine replicates each record in a logical table into one or more partition sets.
A partition set is a set of partitions. A partition set stores all records in all logical tables in the same Droonga network. In other words, a dataset is partitioned in a partition set.
A partition set is a replica of another partition set.
Droonga Engine may support partitioning in one or more partition sets in the future. It will be useful for using different partition sizes for old data and new data. Normally, old data are smaller and new data are bigger. It is reasonable to use a larger partition size for bigger data.
A partition is a Groonga database. It has zero or more physical tables.
Droonga Engine can be extended by writing plugin scripts. In most cases, a series of plugins work cooperatively to achieve the required behaviors. So, plugins are organized by behaviors. Each behavior can be attached to datasets and/or tables by adding a “plugins” section to the corresponding entry in the catalog.
More than one plugin can be assigned in a “plugins” section as an array. The order in the array controls the execution order of plugins when adapting messages. When adapting an incoming message, plugins are applied in forward order, whereas they are applied in reverse order when adapting an outgoing message.
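The ordering rule can be illustrated with a small sketch. The class `TracePlugin` and the method names `adapt_input` and `adapt_output` are hypothetical names chosen for this example; they are not taken from the Droonga plugin API.

```python
class TracePlugin:
    """Hypothetical plugin that records when it touches a message."""

    def __init__(self, name):
        self.name = name

    def adapt_input(self, message):
        return message + [f"in:{self.name}"]

    def adapt_output(self, message):
        return message + [f"out:{self.name}"]


def adapt_incoming(message, plugins):
    # Incoming messages pass through the plugins in array order.
    for plugin in plugins:
        message = plugin.adapt_input(message)
    return message


def adapt_outgoing(message, plugins):
    # Outgoing messages pass through the plugins in reverse array order.
    for plugin in reversed(plugins):
        message = plugin.adapt_output(message)
    return message


plugins = [TracePlugin(name) for name in ["groonga", "crud", "search"]]
incoming = adapt_incoming([], plugins)
outgoing = adapt_outgoing([], plugins)
```

With the array `["groonga", "crud", "search"]`, the incoming trace starts with `groonga` and the outgoing trace starts with `search`, so the plugin that sees a request first sees the response last.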
Consider the following case:
Catalog is written as a JSON file. Its file name is catalog.json.
Here is a catalog.json for the above case:
{
  "version": 1,
  "effective_date": "2013-06-05T00:05:51Z",
  "zones": ["localhost:23003/farm0", "localhost:23003/farm1"],
  "farms": {
    "localhost:23003/farm0": {
      "device": "disk0",
      "capacity": 1024
    },
    "localhost:23003/farm1": {
      "device": "disk1",
      "capacity": 1024
    }
  },
  "datasets": {
    "Wiki": {
      "workers": 4,
      "plugins": ["groonga", "crud", "search"],
      "number_of_replicas": 2,
      "number_of_partitions": 2,
      "partition_key": "_key",
      "date_range": "infinity",
      "ring": {
        "localhost:23004": {
          "weight": 10,
          "partitions": {
            "2013-07-24": [
              "localhost:23003/farm0.000",
              "localhost:23003/farm1.000"
            ]
          }
        },
        "localhost:23005": {
          "weight": 10,
          "partitions": {
            "2013-07-24": [
              "localhost:23003/farm1.001",
              "localhost:23003/farm0.001"
            ]
          }
        }
      }
    }
  }
}
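To make the nesting of datasets, ring entries and partitions easier to see, the following sketch walks a trimmed copy of the catalog above and lists every partition. The function name `list_partitions` is hypothetical, and the embedded JSON keeps only the keys the walk needs; it is a reading aid, not part of Droonga.

```python
import json

# Trimmed copy of the example catalog: only the keys used by the walk below.
CATALOG_JSON = """
{
  "version": 1,
  "datasets": {
    "Wiki": {
      "ring": {
        "localhost:23004": {
          "weight": 10,
          "partitions": {
            "2013-07-24": ["localhost:23003/farm0.000", "localhost:23003/farm1.000"]
          }
        },
        "localhost:23005": {
          "weight": 10,
          "partitions": {
            "2013-07-24": ["localhost:23003/farm1.001", "localhost:23003/farm0.001"]
          }
        }
      }
    }
  }
}
"""

def list_partitions(catalog):
    """Yield (dataset, ring key, date, partition name) for every partition."""
    for dataset_name, dataset in catalog["datasets"].items():
        for ring_key, entry in dataset["ring"].items():
            for date, names in entry["partitions"].items():
                for name in names:
                    yield dataset_name, ring_key, date, name

catalog = json.loads(CATALOG_JSON)
partitions = list(list_partitions(catalog))
```

For this catalog the walk finds four partitions, two under each ring entry, all in the single "2013-07-24" time slice.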
Here are descriptions of the parameters in catalog.json.
version
It is the format version of the catalog file.
The catalog.json format will change in the future. With this information, Droonga Engine can provide an automatic format update feature.
The value must be 1.
This is a required parameter.
Example:
{
  "version": 1
}
effective_date
It is a date string representing the day the catalog becomes effective.
The date string format must be W3C-DTF.
This is a required parameter.
Note: fluent-plugin-droonga 0.8.0 doesn’t use this value yet.
Example:
{
  "effective_date": "2013-11-29T11:29:29Z"
}
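As a rough sketch of how such a value could be read, the following parses the complete UTC form used in the examples here. W3C-DTF also allows shorter forms and numeric time zone offsets, which this sketch does not handle. The function names and the rule "effective once the current time has reached the date" are assumptions for illustration; as noted above, fluent-plugin-droonga 0.8.0 does not use the value yet.

```python
from datetime import datetime, timezone

def parse_effective_date(value):
    """Parse a W3C-DTF string of the form YYYY-MM-DDThh:mm:ssZ (UTC only)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

def is_effective(catalog, now):
    """Assumed rule: a catalog is effective once `now` has reached its date."""
    return parse_effective_date(catalog["effective_date"]) <= now
```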
zones
Zones is an array that expresses proximities between farms.
Farms are grouped by a zone, and zones can be grouped by another zone recursively.
Zones make a single tree structure, expressed by nested arrays.
Farms in the same branch are regarded as relatively closer to each other than to other farms.
For example, when the value of zones is as follows,
[["A", ["B", "C"]], "D"]
it expresses the following tree.
    /\
   /\ D
  A /\
   B  C
This tree means that the farms “B” and “C” are closer to each other than to “A” or “D”.
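One way to read proximity out of the nested arrays is to count how many leading branches two farms share. The sketch below does that for the example value; the function names `paths` and `shared_depth` are hypothetical, and Droonga itself may measure proximity differently.

```python
def paths(zones, prefix=()):
    """Map each farm name to its path (tuple of child indexes) in the zone tree."""
    result = {}
    for index, element in enumerate(zones):
        path = prefix + (index,)
        if isinstance(element, list):
            result.update(paths(element, path))   # nested zone: recurse
        else:
            result[element] = path                # leaf: a farm name
    return result

def shared_depth(zones, farm_a, farm_b):
    """Number of leading branches two farms share; larger means closer."""
    table = paths(zones)
    depth = 0
    for a, b in zip(table[farm_a], table[farm_b]):
        if a != b:
            break
        depth += 1
    return depth

zones = [["A", ["B", "C"]], "D"]
```

Here `shared_depth(zones, "B", "C")` is larger than `shared_depth(zones, "B", "A")`, which in turn is larger than `shared_depth(zones, "B", "D")`, matching the tree.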
You should make the elements in a zone close to each other, for example in the same host, on the same switch, or in the same network.
This is an optional parameter.
Note: fluent-plugin-droonga 0.8.0 doesn’t use this value yet.
Example:
{
  "zones": [
    ["localhost:23003/farm0",
     "localhost:23003/farm1"],
    ["localhost:23004/farm0",
     "localhost:23004/farm1"]
  ]
}
TODO: Discuss the call of this parameter. It seems to be exactly the same as the list of keys of farms.
farms
It is a set of Droonga Engine instances. In the examples in this document it is written as an object whose keys are farm names.
TODO: Improve me. For example, we have to describe relations of nested farms, e.g. children.
Farms correspond to fluent-plugin-droonga instances. A fluentd process may have multiple farms if more than one match entry with type droonga appears in fluentd.conf. Each farm has its own job queue. Each farm can attach to a data partition, which is a part of a dataset.
This is a required parameter.
Example:
{
  "farms": {
    "localhost:23003/farm0": {
      "device": "/disk0",
      "capacity": 1024
    },
    "localhost:23003/farm1": {
      "device": "/disk1",
      "capacity": 1024
    }
  }
}
datasets
A dataset is a set of tables which comprise a single logical table virtually. Each dataset must have a unique name in the network.
ring
ring is a series of partitions which comprise a dataset. replica_count, number_of_partitions and time-slice factors affect the number of partitions in a ring.
workers
workers is an integer which specifies the number of worker processes that deal with the dataset.
If 0 is specified, no worker is forked and all operations are done in the master process.
number_of_partitions
number_of_partitions is an integer which represents the number of partitions the dataset is divided into by the hash function. The hash function, which determines the partition in which each record of a dataset resides, is compatible with memcached.
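The general idea of hash partitioning can be sketched as below. This is a minimal illustration under assumptions: CRC32 is used only as a stand-in hash, and the function name `partition_index` is hypothetical. It is not claimed to be the memcached-compatible function that Droonga uses, so the indexes it produces should not be expected to match Droonga's.

```python
import zlib

def partition_index(key, number_of_partitions):
    """Pick a partition for a record key by hashing it (stand-in hash: CRC32)."""
    if number_of_partitions < 1:
        raise ValueError("number_of_partitions must be at least 1")
    # The same key always hashes to the same value, so it always lands
    # in the same partition as long as number_of_partitions is unchanged.
    return zlib.crc32(key.encode("utf-8")) % number_of_partitions
```

With "partition_key" set to "_key", the value passed as `key` would be the record's `_key`. Note that changing number_of_partitions changes the result of the modulo for most keys, which is one reason the value is fixed in the catalog.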
date_range
date_range determines when to split the dataset. If the string “infinity” is assigned, the dataset is never split by the time factor.
number_of_replicas
number_of_replicas represents the number of replicas of the dataset maintained in the network.