This is a document for PGroonga 2.x. See the PGroonga 1.x document if you're using an older PGroonga.

pgroonga_highlight_html function

Since 1.0.7.

Summary

The pgroonga_highlight_html function surrounds the specified keywords in the specified text with <span class="keyword"> and </span>. HTML special characters such as & in the specified text are escaped.

Syntax

There are two signatures:

text pgroonga_highlight_html(target, ARRAY[keyword1, keyword2, ...])
text pgroonga_highlight_html(target, ARRAY[keyword1, keyword2, ...], index_name)

The first signature is simpler and is enough for most cases.

The second signature is useful when you use a custom normalizer. It's available since 2.0.7.

Here is the description of the first signature.

text pgroonga_highlight_html(target, ARRAY[keyword1, keyword2, ...])

target is the text to be highlighted. It's text type.

keyword1, keyword2, ... are the keywords to be highlighted. They're an array of text type. You must specify one or more keywords.

pgroonga_highlight_html returns target with the keywords marked up. The return value is text type.

The keywords are surrounded with <span class="keyword"> and </span>. <, >, & and " in target are HTML-escaped.
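To make the behavior of the first signature concrete, here is a simplified Python model of it. It is a sketch under stated assumptions, not PGroonga's implementation: the names HTML_ESCAPES, normalize and highlight_html are hypothetical, the default normalizer is approximated by Unicode NFKC plus lowercasing, keywords are matched as plain substrings of the normalized text, and only the four characters listed above are escaped. It reproduces the outputs of the first-signature examples in the Usage section, but it does not model tokenizer-dependent boundaries such as the leading space in the index_name example.

```python
import unicodedata

# Only these four characters are escaped, as listed in the text above.
HTML_ESCAPES = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}


def normalize(text):
    """Assumed stand-in for the default normalizer: NFKC, then lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def highlight_html(target, keywords):
    """Hypothetical model of the first signature (not PGroonga's code)."""
    if not keywords:
        raise ValueError("specify one or more keywords")

    # Normalize each source character separately and remember which source
    # character produced every normalized character, so matches found in the
    # normalized text can be mapped back to the original text.
    normalized_chars = []
    source_index = []
    for i, ch in enumerate(target):
        for n in normalize(ch):
            normalized_chars.append(n)
            source_index.append(i)
    normalized = "".join(normalized_chars)

    # Mark every source character covered by any keyword occurrence.
    marked = [False] * len(target)
    for keyword in keywords:
        needle = normalize(keyword)
        if not needle:
            continue
        start = normalized.find(needle)
        while start != -1:
            for j in range(start, start + len(needle)):
                marked[source_index[j]] = True
            start = normalized.find(needle, start + 1)

    # Emit escaped text, wrapping each run of marked characters in one span.
    out = []
    inside = False
    for i, ch in enumerate(target):
        if marked[i] and not inside:
            out.append('<span class="keyword">')
            inside = True
        elif not marked[i] and inside:
            out.append("</span>")
            inside = False
        out.append(HTML_ESCAPES.get(ch, ch))
    if inside:
        out.append("</span>")
    return "".join(out)


print(highlight_html("<p>PGroonga is Groonga & PostgreSQL.</p>", ["PostgreSQL"]))
```

Mapping each normalized character back to its source character is what lets a whole source character be highlighted when a keyword matches only part of its normalized form.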

Here is the description of the second signature.

text pgroonga_highlight_html(target, ARRAY[keyword1, keyword2, ...], index_name)

target is the text to be highlighted. It's text type.

keyword1, keyword2, ... are the keywords to be highlighted. They're an array of text type. You must specify one or more keywords.

index_name is the name of the corresponding PGroonga index. It's text type.

index_name can be NULL.

If you use a normalizer other than NormalizerAuto, such as NormalizerNFKC100, you should specify index_name. Otherwise pgroonga_highlight_html uses NormalizerAuto, which may produce unexpected results.

If you specify index_name, the specified PGroonga index must use the TokenNgram tokenizer with the "report_source_location" option.

Here is an example:

CREATE TABLE memos (
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (tokenizer='TokenNgram("report_source_location", true)',
              normalizer='NormalizerNFKC100');

Now, you can use pgroonga_content_index as index_name:

SELECT pgroonga_highlight_html('one two three four five',
                               ARRAY['two three', 'five'],
                               'pgroonga_content_index');
--                               pgroonga_highlight_html                              
-- -----------------------------------------------------------------------------------
--  one<span class="keyword"> two three</span> four<span class="keyword"> five</span>
-- (1 row)

pgroonga_highlight_html returns target with the keywords marked up. The return value is text type.

The keywords are surrounded with <span class="keyword"> and </span>. <, >, & and " in target are HTML-escaped.

It's available since 2.0.7.

Usage

You need to specify at least one keyword:

SELECT pgroonga_highlight_html('PGroonga is a PostgreSQL extension.',
                               ARRAY['PostgreSQL']) AS highlight_html;
--                           highlight_html                          
-- ------------------------------------------------------------------
--  PGroonga is a <span class="keyword">PostgreSQL</span> extension.
-- (1 row)

You can specify multiple keywords:

SELECT pgroonga_highlight_html('PGroonga is a PostgreSQL extension.',
                               ARRAY['Groonga', 'PostgreSQL']) AS highlight_html;
--                                         highlight_html                                         
-- -----------------------------------------------------------------------------------------------
--  P<span class="keyword">Groonga</span> is a <span class="keyword">PostgreSQL</span> extension.
-- (1 row)

You can extract keywords from a query with the pgroonga_query_extract_keywords function:

SELECT pgroonga_highlight_html('PGroonga is a PostgreSQL extension.',
                               pgroonga_query_extract_keywords('Groonga PostgreSQL -extension')) AS highlight_html;
--                                         highlight_html                                         
-- -----------------------------------------------------------------------------------------------
--  P<span class="keyword">Groonga</span> is a <span class="keyword">PostgreSQL</span> extension.
-- (1 row)
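In the example above, the excluded term -extension does not become a keyword, so extension is not highlighted. The following Python function is a hypothetical, heavily simplified stand-in for pgroonga_query_extract_keywords that illustrates only this idea. It assumes a query is a whitespace-separated list of terms where a leading "-" excludes a term and "OR" is an operator; quoted phrases, parentheses and the rest of the query syntax are deliberately not handled.

```python
def extract_keywords(query):
    """Hypothetical simplification; not PGroonga's query parser."""
    keywords = []
    for term in query.split():
        # Operators and excluded ("-"-prefixed) terms are not highlighted.
        if term == "OR" or term.startswith("-"):
            continue
        keywords.append(term)
    return keywords


print(extract_keywords("Groonga PostgreSQL -extension"))  # ['Groonga', 'PostgreSQL']
```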

HTML special characters are escaped automatically:

SELECT pgroonga_highlight_html('<p>PGroonga is Groonga & PostgreSQL.</p>',
                               ARRAY['PostgreSQL']) AS highlight_html;
--                                     highlight_html                                     
-- ---------------------------------------------------------------------------------------
--  &lt;p&gt;PGroonga is Groonga &amp; <span class="keyword">PostgreSQL</span>.&lt;/p&gt;
-- (1 row)

Characters are normalized, so, for example, matching ignores case:

SELECT pgroonga_highlight_html('PGroonga + pglogical = replicatable!',
                               ARRAY['Pg']) AS highlight_html;
--                                     highlight_html                                         
-- ------------------------------------------------------------------------------------------------
--  <span class="keyword">PG</span>roonga + <span class="keyword">pg</span>logical = replicatable!
-- (1 row)

Multibyte characters are also supported:

SELECT pgroonga_highlight_html('10㌖先にある100キログラムの米',
                               ARRAY['キロ']) AS highlight_html;
--                                     highlight_html                                     
-- ---------------------------------------------------------------------------------------
--  10<span class="keyword">㌖</span>先にある100<span class="keyword">キロ</span>グラムの米
-- (1 row)
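In the example above, ㌖ is highlighted for the keyword キロ even though the two strings differ. A plausible explanation is that the default normalizer expands ㌖ into katakana before matching. Python's NFKC normalization, used here only as a stand-in (NormalizerAuto's exact rules are not reproduced), shows that expansion:

```python
import unicodedata

square_km = "㌖"  # U+3316 SQUARE KIROMEETORU
expanded = unicodedata.normalize("NFKC", square_km)

print(hex(ord(square_km)))  # 0x3316
print(expanded)             # キロメートル
print("キロ" in expanded)   # True
```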

You can use a custom tokenizer and normalizer by specifying a PGroonga index name:

CREATE TABLE memos (
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (tokenizer='TokenNgram("report_source_location", true)',
              normalizer='NormalizerNFKC100');

SELECT pgroonga_highlight_html('one two three four five',
                               ARRAY['two three', 'five'],
                               'pgroonga_content_index');
--                               pgroonga_highlight_html                              
-- -----------------------------------------------------------------------------------
--  one<span class="keyword"> two three</span> four<span class="keyword"> five</span>
-- (1 row)

See also