7.3.56. select#
7.3.56.1. Summary#
select searches records that are matched to specified conditions
from a table and then outputs them.
select is the most important command in groonga. You need to
understand select to use the full power of Groonga.
7.3.56.2. Syntax#
This command takes many parameters.
The required parameter is only table. Other parameters are
optional:
select table
[match_columns=null]
[query=null]
[filter=null]
[scorer=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns="_key, _nsubrecs"]
[drilldown_offset=0]
[drilldown_limit=10]
[cache=yes]
[match_escalation_threshold=0]
[query_expansion=null]
[query_flags=ALLOW_PRAGMA|ALLOW_COLUMN]
[query_expander=null]
[adjuster=null]
[drilldown_calc_types=NONE]
[drilldown_calc_target=null]
[drilldown_filter=null]
[sort_keys=null]
[drilldown_sort_keys=null]
[match_escalation=auto]
[load_table=null]
[load_columns=null]
[load_values=null]
[drilldown_max_n_target_records=-1]
[n_workers=0]
This command has the following named parameters for dynamic columns:
columns[${NAME}].stage=null
columns[${NAME}].flags=COLUMN_SCALAR
columns[${NAME}].type=null
columns[${NAME}].value=null
columns[${NAME}].window.sort_keys=null
columns[${NAME}].window.group_keys=null
You can use one or more alphabets, digits, _ for ${NAME}. For
example, column1 is a valid ${NAME}. This is the same rule as
normal column. See also name.
Parameters that have the same ${NAME} are grouped.
For example, the following parameters specify one dynamic column:
--columns[name].stage initial
--columns[name].type UInt32
--columns[name].value 29
The following parameters specify two dynamic columns:
--columns[name1].stage initial
--columns[name1].type UInt32
--columns[name1].value 29
--columns[name2].stage filtered
--columns[name2].type Float
--columns[name2].value '_score * 0.1'
This command has the following named parameters for advanced drilldown:
drilldowns[${LABEL}].keys=null
drilldowns[${LABEL}].sort_keys=null
drilldowns[${LABEL}].output_columns="_key, _nsubrecs"
drilldowns[${LABEL}].offset=0
drilldowns[${LABEL}].limit=10
drilldowns[${LABEL}].calc_types=NONE
drilldowns[${LABEL}].calc_target=null
drilldowns[${LABEL}].filter=null
drilldowns[${LABEL}].max_n_target_records=-1
drilldowns[${LABEL}].columns[${NAME}].stage=null
drilldowns[${LABEL}].columns[${NAME}].flags=COLUMN_SCALAR
drilldowns[${LABEL}].columns[${NAME}].type=null
drilldowns[${LABEL}].columns[${NAME}].value=null
drilldowns[${LABEL}].columns[${NAME}].window.sort_keys=null
drilldowns[${LABEL}].columns[${NAME}].window.group_keys=null
Deprecated since version 6.0.3: drilldown[...] syntax is deprecated, Use drilldowns[...]
instead.
You can use one or more alphabets, digits, _ and . for
${LABEL}. For example, parent.sub1 is a valid ${LABEL}.
Parameters that have the same ${LABEL} are grouped.
For example, the following parameters specify one drilldown:
--drilldowns[label].keys column
--drilldowns[label].sort_keys -_nsubrecs
The following parameters specify two drilldowns:
--drilldowns[label1].keys column1
--drilldowns[label1].sort_keys -_nsubrecs
--drilldowns[label2].keys column2
--drilldowns[label2].sort_keys _key
7.3.56.3. Usage#
Let’s learn about select usage with examples. This section shows
many popular usages.
Here are a schema definition and sample data to show usage.
Execution example:
table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5,
"tag": "Hello"},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10,
"tag": "Groonga"},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15,
"tag": "Groonga"},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3,
"tag": "Senna"},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3,
"tag": "Senna"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]
There is a table, Entries, for blog entries. An entry has title,
content, the number of likes for the entry and tag. Title is key of
Entries. Content is value of Entries.content column. The
number of likes is value of Entries.n_likes column. Tag is value
of Entries.tag column.
Entries._key column and Entries.content column are indexed
using TokenBigram tokenizer. So both Entries._key and
Entries.content are fulltext search ready.
OK. The schema and data for examples are ready.
7.3.56.3.1. Simple usage#
Here is the most simple usage with the above schema and data. It outputs
all records in Entries table.
Execution example:
select Entries
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
Why does the command output all records? There are two reasons. The
first reason is that the command doesn’t specify any search
conditions. No search condition means all records are matched. The
second reason is that the number of all records is 5. select
command outputs 10 records at a maximum by default. There are only 5
records. It is less than 10. So the command outputs all records.
7.3.56.3.2. Search conditions#
Search conditions are specified by query or filter. You can
also specify both query and filter. It means that selected
records must be matched against both query and filter.
7.3.56.3.2.1. Search condition: query#
query is designed for search box in Web page. Imagine a search box
in google.com. You specify search conditions for query as space
separated keywords. For example, search engine means a matched
record should contain two words, search and engine.
Normally, query parameter is used for specifying fulltext search
conditions. It can be used for non fulltext search conditions but
filter is used for the propose.
query parameter is used with match_columns parameter when
query parameter is used for specifying fulltext search
conditions. match_columns specifies which columnes and indexes are
matched against query.
Here is a simple query usage example.
Execution example:
select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches records that contain a word fast
in content column value from Entries table.
query has query syntax but its deatils aren’t described here. See
Query syntax for datails.
7.3.56.3.2.2. Search condition: filter#
filter is designed for complex search conditions. You specify
search conditions for filter as ECMAScript like syntax.
Here is a simple filter usage example.
Execution example:
select Entries --filter 'content @ "fast" && _key == "Groonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches records that contain a word fast
in content column value and has Groonga as _key from
Entries table. There are three operators in the command, @,
&& and ==. @ is fulltext search operator. && and
== are the same as ECMAScript. && is logical AND operator and
== is equality operator.
filter has more operators and syntax like grouping by (...)
its details aren’t described here. See
Script syntax for datails.
7.3.56.3.3. Paging#
You can specify range of outputted records by offset and limit.
Here is an example to output only the 2nd record.
Execution example:
select Entries --offset 1 --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
offset is zero-based. --offset 1 means output range is
started from the 2nd record.
limit specifies the max number of output records. --limit 1
means the number of output records is 1 at a maximium. If no records
are matched, select command outputs no records.
7.3.56.3.4. The total number of records#
You can use --limit 0 to retrieve the total number of recrods
without any contents of records.
Execution example:
select Entries --limit 0
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
--limit 0 is also useful for retrieving only the number of matched
records.
7.3.56.3.5. Drilldown#
You can get additional grouped results against the search result in
one select. You need to use two or more SELECT s in SQL but
select in Groonga can do it in one select.
This feature is called as drilldown in Groonga. It’s also called as faceted search in other search engine.
For example, think about the following situation.
You search entries that has fast word:
Execution example:
select Entries --filter 'content @ "fast"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
You want to use tag for additional search condition like
--filter 'content @ "fast" && tag == "???". But you don’t know
suitable tag until you see the result of content @ "fast".
If you know the number of matched records of each available tag, you can choose suitable tag. You can use drilldown for the case:
Execution example:
select Entries --filter 'content @ "fast"' --drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]
--drilldown tag returns a list of pair of available tag and the
number of matched records. You can avoid “no hit search” case by
choosing a tag from the list. You can also avoid “too many search
results” case by choosing a tag that the number of matched records is
few from the list.
You can create the following UI with the drilldown results:
Links to narrow search results. (Users don’t need to input a search query by their keyboard. They just click a link.)
Most EC sites use the UI. See side menu at Amazon.
Groonga supports not only counting grouped records but also finding the maximum and/or minimum value from grouped records, summing values in grouped records and so on. See Drilldown related parameters for details.
7.3.56.3.6. Dynamic column#
You can create zero or more columns dynamically while a select
execution. You can use them for drilldown by computed value, window
function and so on.
Here is an example that uses dynamic column for drilldown by computed
value. This example creates a new column named
n_likes_class. n_likes_class column has classified value of
Entry.n_likes value. This example classifies Entry.n_likes
column value 10 step and the lowest number in the class is the
classified value. If a Entry.n_likes value is between 0 and
9 such as 3 and 5, n_likes_class value (classified
value) is 0. If Entry.n_likes value is between 10 and
19 such as 10 and 15, n_likes_class value (classified
value) is 10.
You can use number_classify function for
the classification. You need to register functions/number plugin
by plugin_register command to use
number_classify function.
This example does drilldown by n_likes_class value. The drilldown
result will help you to know data trend.
Execution example:
plugin_register functions/number
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
--table Entries \
--columns[n_likes_class].stage initial \
--columns[n_likes_class].type UInt32 \
--columns[n_likes_class].value 'number_classify(n_likes, 10)' \
--drilldown n_likes_class \
--drilldown_sort_keys _nsubrecs \
--output_columns n_likes,n_likes_class
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "n_likes",
# "UInt32"
# ],
# [
# "n_likes_class",
# "UInt32"
# ]
# ],
# [
# 5,
# 0
# ],
# [
# 10,
# 10
# ],
# [
# 15,
# 10
# ],
# [
# 3,
# 0
# ],
# [
# 3,
# 0
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 10,
# 2
# ],
# [
# 0,
# 3
# ]
# ]
# ]
# ]
See Dynamic column related parameters for details.
7.3.56.3.7. Window function#
You can compute each record value from values of grouped records. For example, you can compute sums of each group and puts sums to each record. The difference against drilldown is drilldown can compute sums of each group but it puts sums to each group not record.
Here is the result with window function. Each record has sum:
Group No. |
Target value |
Sum result |
|---|---|---|
1 |
5 |
5 |
2 |
10 |
25 |
2 |
15 |
25 |
3 |
3 |
8 |
3 |
5 |
8 |
Here is the result with drilldown. Each group has sum:
Group No. |
Target values |
Sum result |
|---|---|---|
1 |
5 |
5 |
2 |
10, 15 |
25 |
3 |
3, 5 |
8 |
Window function is useful for data analysis.
Here is an example that sums Entries.n_likes per
Entries.tag:
Execution example:
plugin_register functions/number
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
--table Entries \
--columns[n_likes_sum_per_tag].stage initial \
--columns[n_likes_sum_per_tag].type UInt32 \
--columns[n_likes_sum_per_tag].value 'window_sum(n_likes)' \
--columns[n_likes_sum_per_tag].window.group_keys tag \
--output_columns tag,n_likes,n_likes_sum_per_tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "n_likes_sum_per_tag",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5,
# 5
# ],
# [
# "Groonga",
# 10,
# 25
# ],
# [
# "Groonga",
# 15,
# 25
# ],
# [
# "Senna",
# 3,
# 6
# ],
# [
# "Senna",
# 3,
# 6
# ]
# ]
# ]
# ]
See Window function related parameters for details.
7.3.56.4. Parameters#
This section describes all parameters. Parameters are categorized.
7.3.56.4.1. Required parameters#
There is a required parameter, table.
7.3.56.4.1.1. table#
Specifies a table to be searched. table must be specified.
If nonexistent table is specified, an error is returned.
Execution example:
select Nonexistent
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[select][table] invalid name: <Nonexistent>",
# [
# [
# "grn_select",
# "lib/proc/proc_select.c",
# 4452
# ]
# ]
# ]
# ]
7.3.56.4.3. Advanced search parameters#
7.3.56.4.3.1. match_escalation_threshold#
New in version 8.0.1.
Specifies threshold to determine whether search storategy escalation is used or not. The threshold is compared against the number of matched records. If the number of matched records is equal to or less than the threshold, the search storategy escalation is used. See 検索 about the search storategy escalation.
The default threshold is 0. It means that search storategy escalation is used only when no records are matched.
The default threshold can be customized by one of the followings.
--with-match-escalation-thresholdoption of configure
--match-escalation-thresholdoption of groonga command
match-escalation-thresholdconfiguration item in configuration file
Here is a simple match_escalation_threshold usage example. The
first select doesn’t have match_escalation_threshold
parameter. The second select has match_escalation_threshold
parameter.
Execution example:
select Entries --match_columns content --query groo
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query groo --match_escalation_threshold -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
The first select command searches records that contain a word
groo in content column value from Entries table. But no
records are matched because the TokenBigram tokenizer tokenizes
groonga to groonga not gr|ro|oo|on|ng|ga. (The
TokenBigramSplitSymbolAlpha tokenizer tokenizes groonga to
gr|ro|oo|on|ng|ga. See Tokenizers for details.)
It means that groonga is indexed but groo isn’t indexed. So no
records are matched against groo by exact match. In the case, the
search storategy escalation is used because the number of matched
records (0) is equal to match_escalation_threshold (0). One record
is matched against groo by unsplit search.
The second select command also searches records that contain a
word groo in content column value from Entries table. And
it also doesn’t find matched records. In this case, the search
storategy escalation is not used because the number of matched
records (0) is larger than match_escalation_threshold (-1). So no
more searches aren’t executed. And no records are matched.
7.3.56.4.3.2. match_escalation#
Specifies how to use match escalation. See also match_escalation and 検索 about the match escalation.
Here are available values:
Value |
Description |
|---|---|
|
Groonga uses match_escalation_threshold to determine whether match escalation is used or not. This is the default. |
|
Groonga always uses match escalation. |
|
Groonga never use match escalation. |
--match_escalation yes is stronger than
--match_escalation_threshold 9999...999. --filter 'true &&
column @ "query" with --match_escalation yes uses match
escalation. --filter 'true && column @ "query" with
--match_escalation_threshold 9999...999 doesn’t use match
escalation.
Here is a simple match_escalation usage example. The first
select doesn’t have match_escalation parameter. The
second select has match_escalation parameter.
Execution example:
select Entries --filter 'true && content @ "groo"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
select Entries --filter 'true && content @ "groo"' --match_escalation yes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
The first select command searches records that contain a word
groo in content column value from Entries table. But no
records are matched because the TokenBigram tokenizer tokenizes
groonga to groonga not gr|ro|oo|on|ng|ga.
The second select command also searches records that contain a
word groo in content column value from Entries table. And
it uses match escalation. So it can find matched records.
7.3.56.4.3.3. query_expansion#
Deprecated since version 3.0.2: Use query_expander instead.
7.3.56.4.3.4. query_flags#
It customs query parameter syntax. You cannot update column value
by query parameter by default. But if you specify
ALLOW_COLUMN|ALLOW_UPDATE as query_flags, you can update
column value by query.
Here are available values:
ALLOW_PRAGMAALLOW_COLUMNALLOW_UPDATEALLOW_LEADING_NOTQUERY_NO_SYNTAX_ERRORNONE
ALLOW_PRAGMA enables pragma at the head of query. This is not
implemented yet.
ALLOW_COLUMN enables search againt columns that are not included
in match_columns. To specify column, there are COLUMN:...
syntaxes.
ALLOW_UPDATE enables column update by query with
COLUMN:=NEW_VALUE syntax. ALLOW_COLUMN is also required to
update column because the column update syntax specifies column.
ALLOW_LEADING_NOT enables leading NOT condition with -WORD
syntax. The query searches records that doesn’t match
WORD. Leading NOT condition query is heavy query in many cases
because it matches many records. So this flag is disabled by
default. Be careful about it when you use the flag.
QUERY_NO_SYNTAX_ERROR enables never causes syntax error for query.
This flag is useful when an application uses user input directly and doesn’t want to show syntax error to the user and in a log.
This flag is disabled by default.
NONE is just ignores. You can use NONE for specifying no flags.
They can be combined by separated | such as
ALLOW_COLUMN|ALLOW_UPDATE.
The default value is ALLOW_PRAGMA|ALLOW_COLUMN.
Here is a usage example of ALLOW_COLUMN.
Execution example:
select Entries --query content:@mroonga --query_flags ALLOW_COLUMN
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches records that contain mroonga in
content column value from Entries table.
Here is a usage example of ALLOW_UPDATE.
Execution example:
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "alice", "age": 18},
{"_key": "bob", "age": 20}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select Users --query age:=19 --query_flags ALLOW_COLUMN|ALLOW_UPDATE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
The first select command sets age column value of all records
to 19. The second select command outputs updated age
column values.
Here is a usage example of ALLOW_LEADING_NOT.
Execution example:
select Entries --match_columns content --query -mroonga --query_flags ALLOW_LEADING_NOT
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
The select command searches records that don’t contain mroonga
in content column value from Entries table.
Here are a schema definition and sample data to describe other flags:
Execution example:
table_create --name Magazine --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Magazine --name title --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Magazine
[
{"_key":"http://test.jp/magazine/webplus","title":"WEB+"},
{"_key":"http://test.jp/magazine/database","title":"DataBase"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
Here is an example of QUERY_NO_SYNTAX_ERROR:
Execution example:
select Magazine --match_columns title --query 'WEB +' --query_flags ALLOW_PRAGMA|ALLOW_COLUMN|QUERY_NO_SYNTAX_ERROR
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://test.jp/magazine/webplus",
# "WEB+"
# ]
# ]
# ]
# ]
If you don’t specify this flag, the quey causes a syntax error as below.
Execution example:
select Magazine --match_columns title --query 'WEB +' --query_flags ALLOW_PRAGMA|ALLOW_COLUMN
# [
# [
# -63,
# 1337566253.89858,
# 0.000355720520019531,
# "Syntax error: <WEB +||>",
# [
# [
# "yy_syntax_error",
# "lib/grn_ecmascript.lemon",
# 168
# ]
# ]
# ]
# ]
Here is a usage example of NONE.
Execution example:
select Entries --match_columns content --query 'mroonga OR _key:Groonga' --query_flags NONE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches records that contain one of two words
mroonga or _key:Groonga in content from Entries table.
Note that _key:Groonga doesn’t mean that the value of _key
column is equal to Groonga. Because ALLOW_COLUMN flag is not
specified.
See also Query syntax.
7.3.56.4.3.5. query_expander#
It’s for query expansion. Query expansion substitutes specific words to another words in query. Nomally, it’s used for synonym search.
It specifies a column that is used to substitute query parameter
value. The format of this parameter value is
“${TABLE}.${COLUMN}”. For example, “Terms.synonym” specifies
synonym column in Terms table.
Table for query expansion is called “substitution table”. Substitution
table’s key must be ShortText. So array table (TABLE_NO_KEY)
can’t be used for query expansion. Because array table doesn’t have
key.
Column for query expansion is called “substitution
column”. Substitution column’s value type must be
ShortText. Column type must be vector (COLUMN_VECTOR).
Query expansion substitutes key of substitution table in query with
values in substitution column. If a word in query is a key of
substitution table, the word is substituted with substitution column
value that is associated with the key. Substition isn’t performed
recursively. It means that substitution target words in substituted
query aren’t substituted.
Here is a sample substitution table to show a simple
query_expander usage example.
Execution example:
table_create Thesaurus TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Thesaurus synonym COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Thesaurus
[
{"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
{"_key": "groonga", "synonym": ["groonga", "senna"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
Thesaurus substitution table has two synonyms, "mroonga" and
"groonga". If an user searches with "mroonga", Groonga
searches with "((mroonga) OR (tritonn) OR (groonga mysql))". If an
user searches with "groonga", Groonga searches with "((groonga)
OR (senna))".
Normally, it’s good idea that substitution table uses a normalizer. For example, if normalizer is used, substitute target word is matched in case insensitive manner. See Normalizers for available normalizers.
Note that those synonym values include the key value such as
"mroonga" and "groonga". It’s recommended that you include the
key value. If you don’t include key value, substituted value doesn’t
include the original substitute target value. Normally, including the
original value is better search result. If you have a word that you
don’t want to be searched, you should not include the original
word. For example, you can implement “stop words” by an empty vector
value.
Here is a simple query_expander usage example.
Execution example:
select Entries --match_columns content --query "mroonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "mroonga" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
The first select command doesn’t use query expansion. So a record
that has "tritonn" isn’t found. The second select command uses
query expansion. So a record that has "tritonn" is found. The
third select command doesn’t use query expansion but it is same as
the second select command. The third one uses expanded query.
Each substitute value can contain any Query syntax syntax
such as (...) and OR. You can use complex substitution by
using those syntax.
Here is a complex substitution usage example that uses query syntax.
Execution example:
load --table Thesaurus
[
{"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select Entries --match_columns content --query "popular" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The load command registers a new synonym "popular". It is
substituted with ((popular) OR (n_likes:>=10)). The substituted
query means that “popular” is containing the word “popular” or 10 or
more liked entries.
The select command outputs records that n_likes column value
is equal to or more than 10 from Entries table.
7.3.56.4.3.6. n_workers#
New in version 12.0.5.
Note
This is an experimental feature. Currently, this feature is still not stable.
This feature requires Command version 3 or later.
This feature requires that Apache Arrow is enabled in Groonga.
It depends on package provider whether Apache Arrow is enabled or not.
To check whether Apache Arrow is enabled, you can use status command that show the result of apache_arrow is true or not.
If Apache Arrow is disabled, you should build Groonga from the source code with enabling Apache Arrow following the steps in Install or request to enable Apache Arrow to the package provider.
drilldown , drilldowns and slices
are executed in parallel when this parameter is specified -1 or 2 or more.
In a default setting, drilldown, drilldowns and slices are executed in serial.
In other words, a next process is executed after a current process is finished.
So, queries tend to take a long time if there are a lot of drilldown, drilldowns and slices.
n_workers enables to execute independent drilldown, drilldowns and slices in parallel.
The execution time of the total sum of processes can be shourtend by executing them in parallel.
This parallel execution is done for each select command.
“independent” means not using dorilldowns.table to reference the results of other drilldowns or slices.
If there are dependencies as same meaning as using dorilldowns.table, it wait for finish the dependent drilldowns or slices.
Therefore, the degree of parallelism is reduced if they have dependencies.
Executing in parallel means using multiple CPUs at the same time. If executing in parallel without free CPU resource, it may actually slow down the execution time. This is because they have to wait for the other process being executed by the target CPU to finish.
It depends on a system configuration whether or not there are free CPU resources and how many n_workers should be specified.
For example, consider using Groonga HTTP server on a system with 6 CPUs.
Groonga HTTP server allocates 1 thread (= 1CPU) for each request.
When the average number of concurrent connections is 6, there are no free CPU resources because 6 CPUs are already in use. All the CPU is used to process each request.
When the average number of concurrent connections is 2, there are 4 free CPU resources because only 2 CPUs are already in use.
When specifying 2 for n_workers, the select command will use at most 3 CPUs, including the thread for processing requests.
Therefore, if two select commands with 2 specified for n_workers are requested at the same time,
they will use at most 6 CPUs in total and will be processed fastly by using all of the resources.
When specifying greater than 2, the degree of parallelism can be higher than the CPU resources, so it may actually slow down the execution time.
n_workers behaves as follows depending on the specified value.
When specifying
0or1Executes the select command in serial
When specifying
2or moreExecutes the select command in parallel with at most the specified number of threads.
When specifying
-1or lessExecutes the select command in parallel with the threads of at most the number of CPU cores.
The default value of this parameter is 0 .
It means that the select command is executed in serial in default.
Note
The default value can be changed by specifying the environment variable GRN_SELECT_N_WORKERS_DEFAULT.
7.3.56.5. Return value#
The command returns a response with the following format:
[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_1,
DRILLDOWN_RESULT_2,
...,
DRILLDOWN_RESULT_N
]
]
If the command fails, error details are in HEADER.
See Output format for HEADER.
There are zero or more DRILLDOWN_RESULT. If no drilldown and
drilldowns[${LABEL}].keys are specified, they are omitted like the
following:
[
HEADER,
[
SEARCH_RESULT
]
]
If drilldown has two or more keys like --drilldown "_key,
column1, column2", multiple DRILLDOWN_RESULT exist:
[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_FOR_KEY,
DRILLDOWN_RESULT_FOR_COLUMN1,
DRILLDOWN_RESULT_FOR_COLUMN2
]
]
If drilldowns[${LABEL}].keys is used, only one DRILLDOWN_RESULT
exist:
[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_FOR_LABELED_DRILLDOWN
]
]
DRILLDOWN_RESULT format is different between drilldown and
drilldowns[${LABEL}].keys. It’s described later.
SEARCH_RESULT is the following format:
[
[N_HITS],
COLUMNS,
RECORDS
]
See Simple usage for concrete example of the format.
N_HITS is the number of matched records before limit
is applied.
COLUMNS describes about output columns specified by
output_columns. It uses the following format:
[
[COLUMN_NAME_1, COLUMN_TYPE_1],
[COLUMN_NAME_2, COLUMN_TYPE_2],
...,
[COLUMN_NAME_N, COLUMN_TYPE_N]
]
COLUMNS includes one or more output column information. Each
output column information includes the followings:
Column name as string
Column type as string or
null
Column name is extracted from value specified as output_columns.
Column type is Groonga’s type name or null. It doesn’t describe
whether the column value is vector or scalar. You need to determine it
by whether real column value is array or not.
See Data types for type details.
null is used when column value type isn’t determined. For example,
function call in output_columns such as
--output_columns "snippet_html(content)" uses null.
Here is an example of COLUMNS:
[
["_id", "UInt32"],
["_key", "ShortText"],
["n_likes", "UInt32"],
]
RECORDS includes column values for each matched record. Included
records are selected by offset and
limit. It uses the following format:
[
[
RECORD_1_COLUMN_1,
RECORD_1_COLUMN_2,
...,
RECORD_1_COLUMN_N
],
[
RECORD_2_COLUMN_1,
RECORD_2_COLUMN_2,
...,
RECORD_2_COLUMN_N
],
...
[
RECORD_N_COLUMN_1,
RECORD_N_COLUMN_2,
...,
RECORD_N_COLUMN_N
]
]
Here is an example RECORDS:
[
[
1,
"The first post!",
5
],
[
2,
"Groonga",
10
],
[
3,
"Mroonga",
15
]
]
DRILLDOWN_RESULT format is different between drilldown and
drilldowns[${LABEL}].keys.
drilldown uses the same format as SEARCH_RESULT:
[
[N_HITS],
COLUMNS,
RECORDS
]
And drilldown generates one or more DRILLDOWN_RESULT when
drilldown has one ore more keys.
drilldowns[${LABEL}].keys uses the following format. Multiple
drilldowns[${LABEL}].keys are mapped to one object (key-value
pairs):
{
"LABEL_1": [
[N_HITS],
COLUMNS,
RECORDS
],
"LABEL_2": [
[N_HITS],
COLUMNS,
RECORDS
],
...,
"LABEL_N": [
[N_HITS],
COLUMNS,
RECORDS
]
}
Each drilldowns[${LABEL}].keys corresponds to the following:
"LABEL": [
[N_HITS],
COLUMNS,
RECORDS
]
The following value part is the same format as SEARCH_RESULT:
[
[N_HITS],
COLUMNS,
RECORDS
]
See also Output format for drilldowns[${LABEL}] style for
drilldowns[${LABEL}] style drilldown output format.