Contents
CAST- Assignment
- Comparison
- Other Operations
- Concatenation
- Scalar functions
- <char substring function>
- <char overlay function>
- <char trim function>
- <fold function>
- <character translation function>
- <form-of-use conversion function>
- <regular expression substring function>
- <char position expression>
- <bit length expression>
- <char length expression>
- <octet length expression>
- Set functions
- Predicates
A character string is compatible with, and comparable to, all other character
strings from the same Character set -- that is, character strings are mutually
comparable and mutually assignable as long as they belong to the same Character
set. Character strings may not be directly compared with, or directly assigned
to, any other <data type> class, though implicit type conversions can occur in
expressions, SELECTs, INSERTs,
DELETEs and UPDATEs. Explicit character
string type conversions can be forced with the CAST operator.
CAST
In SQL, CAST is a scalar operator that converts a given
scalar value to a given scalar <data type>. The required syntax for the
CAST operator is:
CAST (AS ) ::= scalar_expression ::= |
The CAST operator converts values of a source <data type>
into values of a target <data type>, where each <data type> is an SQL
pre-defined <data type> (data conversions between UDTs are done with a
user-defined cast). The source <data type>, or <cast operand>, can be any
expression that evaluates to a single value. The target <data type>, or <cast
target>, is either an SQL predefined <data type> specification or the name of a
Domain whose defined <data type> is the SQL predefined <data type> that you
want to convert the value of "scalar_expression" into. (If you use
CAST ... AS <Domain name>, your current <AuthorizationID>
must have the USAGE Privilege on that Domain.)
It isn't, of course, possible to convert the values of every <data type> into the values of every other <data type>. For character strings, the rules are:
CAST (NULL AS <data type>)andCAST (character_string_source_is_a_null_value AS <data type>)both result in aCASTresult ofNULL.- You can
CASTa fixed length or variable length character string or aCLOBorNCLOBsource to these targets:- exact numeric
- approximate numeric
- fixed length character string (if source and target belong to the same Character set)
- variable length character string (if source and target belong to the same Character set)
CLOB(if source and target belong to the same Character set)NCLOB(if source and target belong to the same Character set)- fixed length bit string
- variable length bit string
- date
- time
- timestamp
- year-month interval
- day-time interval
- boolean
You can also CAST a fixed length or variable length
character string or a CLOB or NCLOB
source to a UDT target or a <reference type> target if a user-defined cast
exists for this purpose and your current <AuthorizationID> has the
EXECUTE Privilege on that user-defined cast.
When you CAST a character string to an exact numeric target
or an approximate numeric target, your DBMS strips any leading or trailing
spaces from the source and converts the remaining string -- which must be the
character representation of a number -- to that number. For example,
CAST '-25' AS SMALLINT results in a
SMALLINT value of -25. If your source string doesn't
represent a number, the CAST will fail: your DBMS
will return the
SQLSTATE error 22018 "data exception-invalid character value for cast".
When you CAST a character string to a fixed length character
string target, a variable length character string target or a
CLOB or NCLOB target, both source and
target must belong to the same Character set, the result has the
COERCIBLE coercibility attribute and the Collation of the
result is the default Collation of the target's Character set.
- For fixed length character string targets, if the length of the
source equals the fixed length of the target, the result of the
CASTis the source string. If the length of the source is shorter than the fixed length of the target, the result of theCASTis the source string padded on the right with however many spaces are required to make the lengths match. If the length of the source is longer than the fixed length of the target, the result of theCASTis a character string that contains as much of the source string as possible -- in this case, if the truncated characters are not all spaces, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation". - For variable length character string or
CLOBorNCLOBtargets, if the length of the source is less than or equals the maximum length of the target, the result of theCASTis the source string. If the length of the source is longer than the maximum length of the target, the result of theCASTis a character string that contains as much of the source string as possible -- in this case, if the truncated characters are not all spaces, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation".
When you CAST a character string to a fixed length bit
string or a variable length bit string target, the result is the character
string converted to a string of bits. That is, when you CAST
a character string (which has a "form- of-use encoding") to a bit string (which
has no encoding), you get the bits that make up the characters in the source
string. For example, assume the source character string for a
CAST belongs to the ISO8BIT Character
set. In this Character set, the code for the letter 'A' is
41 hexadecimal (the binary number 01000001) and the code
for the letter 'B' is 42 hexadecimal (the binary number
01000010) -- so "CAST ('AB' TO BIT(16))"
will result in B'0100000101000010'.
- For fixed length bit string targets, if the bit length of the
converted source string equals the fixed bit length of the target, the result
of the
CASTis the converted source string. If the converted source value's bit length is larger than the fixed bit length of the target, the result of theCASTis a bit string that contains as much of the converted source string as possible -- in this case, if the truncated bits are not all zero-bits, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation". If the converted source value's bit length is less than the fixed bit length of the target, the result of theCASTis the converted bit string, padded on the least significant end with as many zero-bits as required to make the lengths match -- in this case, your DBMS will return theSQLSTATE warning 01008 "warning-implicit zero-bit padding". - For variable length bit string targets, if the bit length of the
converted source string is less than or equals the maximum bit length
of the target, the result of the
CASTis the converted source string. If the converted source value's bit length is larger than the maximum bit length of the target, the result of theCASTis a bit string that contains as much of the converted source string as possible -- in this case, if the truncated bits are not all zero-bits, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation".
When you CAST a character string to a date target, your DBMS
strips any leading or trailing spaces from the source and converts the
remaining string -- which must be the character representation of a valid date
-- to that date. If your source string doesn't represent a valid date, the
CAST will fail: your DBMS will return the
SQLSTATE error 22007 "data exception-invalid datetime format".
When you CAST a character string to a time target, your DBMS
strips any leading or trailing spaces from the source and converts the
remaining string -- which must be the character representation of a valid time
-- to that time. If your source string doesn't represent a valid time, the
CAST will fail: your DBMS will return the
SQLSTATE error 22007 "data exception-invalid datetime format".
If your source string isn't a string that could represent any time (even an
invalid one), the CAST will also fail: your DBMS will return
the
SQLSTATE error 22018 "data exception-invalid character value for cast".
When you CAST a character string to a timestamp target, your
DBMS strips any leading or trailing spaces from the source and converts the
remaining string -- which must be the character representation of a valid
timestamp -- to that timestamp. If your source string doesn't represent a valid
timestamp, the CAST will fail: your DBMS will return the
SQLSTATE error 22007 "data exception-invalid datetime format".
If your source string isn't a string that could represent any timestamp (even
an invalid one), the CAST will also fail: your DBMS will
return the
SQLSTATE error 22018 "data exception-invalid character value for cast".
When you CAST a character string to an interval target, your
DBMS strips any leading or trailing spaces from the source and converts the
remaining string -- which must be the character representation of a valid
interval for the target -- to that interval. If your source string doesn't
represent a valid interval for the target, the CAST will
fail: your DBMS will return the
SQLSTATE error 22XXX "data exception-invalid interval format".
If your source string isn't a string that could represent any interval (even an
invalid one for the target), the CAST will also fail: your
DBMS will return the
SQLSTATE error 22007 "data exception-invalid datetime format".
When you CAST a character string to a boolean target, your
DBMS strips any leading or trailing spaces from the source and converts the
remaining string -- which must be the character representation of one of the
truth values TRUE, FALSE or
UNKNOWN -- to that truth value. If your source string
doesn't represent a truth value, the CAST will fail: your
DBMS will return the
SQLSTATE error 22018 "data exception-invalid character value for cast".
When you CAST a character string to a UDT or a <reference
type> target, your DBMS invokes the user defined cast routine, with the source
value as the routine's argument. The CAST result is the
value returned by the user defined cast.
If you want to restrict your code to Core SQL, don't use <Domain name> as a
CAST target: CAST only to a <data type>
and don't use CAST to convert any CLOB
or NCLOB values to another <data type>.
Assignment
SQL allows you to assign only compatible character strings -- that is,
character strings are mutually assignable only if the source string and the
target string belong to the same Character set. If you need to assign a
character string to a target that belongs to a different Character set, use the
TRANSLATE function to translate the source into an
equivalent string that belongs to the target's Character set.
In SQL, when a character string is assigned to a character string target, the assignment is done one character at a time, from left to right.
When a character string is taken from SQL-data to be assigned to a fixed length
character string target and the source is shorter than the target, the source
is padded (on the right) with spaces until it matches the target's size. If the
source is longer than the target, the source is truncated to fit the target. In
this case, your DBMS will return the
SQLSTATE warning 01004 "warning-string data, right truncation".
When a character string is taken from SQL-data to be assigned to a variable
length character string or CLOB target, the size of the
target is first set either to the size of the source or to its own maximum
length, whichever is less. The source may then be truncated, if necessary, to
match the size of the target. In this case, your DBMS will return the
SQLSTATE warning 01004 "warning-string data, right truncation".
[Obscure Rule] Since only SQL accepts null values, if your source is
NULL, then your target's value is not changed. Instead,
your DBMS will set its indicator parameter to -1, to indicate that an
assignment of the null value was attempted. If your target doesn't have an
indicator parameter, the assignment will fail: your DBMS will return the
SQLSTATE error 22002 "data exception-null value, no indicator parameter".
If your source is a non-null value that fits into your target, your DBMS will
set the target's indicator parameter (if any) to zero. If your source is longer
than your target, your DBMS will set your target's indicator parameter to the
length of the source; that is, if your source is 12 characters long and your
target can accept only 10 characters, your DBMS will set the target's indicator
parameter to 12, to indicate that 2 characters were lost on assignment. If the
source's length is too big to be assigned to the indicator, the assignment will
fail: your DBMS will return the
SQLSTATE error 22022 "data exception-indicator overflow".
We'll talk more about indicator parameters in our chapters on SQL binding
styles.
When a character string is assigned to a fixed length SQL-data character string
target and the source is shorter than the target, the source is padded (on the
right) with spaces until it matches the target's size. If the source is larger
than the target, but the extra characters are all spaces, the source's
significant character string value is assigned to the target. If the source is
larger than the target and the extra characters are not all spaces, the
assignment will fail: your DBMS will return the
SQLSTATE error 22001 "data exception-string data, right truncation".
When a character string is assigned to a variable length SQL-data character
string or CLOB target, the size of the target is first set
either to the size of the source or to its own maximum length, whichever is
less. If the source is larger than the target, but the extra characters are all
spaces, the source's significant character string value is assigned to the
target. If the source is larger than the target and the extra characters are
not all spaces, the assignment will fail: your DBMS will return the
SQLSTATE error 22001 "data exception-string data, right truncation".
[Obscure Rule] There are two ways to assign a null value to an SQL-data target.
Within SQL, you can use the <keyword> NULL in an
INSERT or an UPDATE statement to
indicate that the target should be set to NULL; that is, if
your source is NULL, your DBMS will set your target to
NULL. Outside of SQL, if your source has an indicator
parameter that is set to -1, your DBMS will set your target to
NULL (regardless of the value of the source). (An indicator
parameter with a value less than -1 will cause an error: your DBMS will return
the
SQLSTATE error 22010 "data exception-invalid indicator parameter value".)
Comparison
SQL provides the usual scalar comparison operators -- = and <> and
< and <= and > and >= -- to perform operations on
CHAR, VARCHAR, NCHAR
and NCHAR VARYING character strings but provides only the
= and <> operators to perform operations on CLOB
and NCLOB character strings. All of them will be familiar;
there are equivalent operators in other computer languages. If any of the
comparands are NULL, the result of the operation is
UNKNOWN. For example:
'hello' < 'zebra'
returns TRUE.
'hello' > {result is NULL}returns UNKNOWN.
SQL also provides three quantifiers -- ALL,
SOME, ANY -- which you can use along
with a comparison operator to compare a CHAR,
VARCHAR, NCHAR or
NCHAR VARYING value with the collection of values returned
by a <table subquery>. (You can't use quantifiers in CLOB
or NCLOB comparisons.) Place the quantifier after the
comparison operator, immediately before the <table subquery>. For example:
SELECT char_column FROM Table_1 WHERE char_column = ALL ( SELECT char_column FROM Table_2);
ALL returns TRUE either (a) if the
collection is an empty set (i.e.: if it contains zero rows) or (b) if the
comparison operator returns TRUE for every value in the
collection. ALL returns FALSE if the
comparison operator returns FALSE for at least one value in
the collection.
SOME and ANY are synonyms. They return
TRUE if the comparison operator returns
TRUE for at least one value in the collection. They return
FALSE either (a) if the collection is an empty set or
(b) if the comparison operator returns FALSE for every
value in the collection. (The search condition
"= ANY (collection)" is equivalent to
"IN (collection)".)
SQL allows you to compare character strings only if (a) they belong to the same Character set and (b) have, or can be coerced into having, the same Collation for the comparison -- that is, character strings are mutually comparable only if both their Character sets and their Collations are the same. When a character string is compared to another character string, the comparison is done one character at a time, from left to right. The result of the comparison is determined by the rules of the relevant Collation. Because of this, two strings with different lengths may, or may not, compare as equal.
Although a comparison involves two character strings, both of which have a default Collation, only one of the Collations can be used to govern the result of the comparison. So, when you (a) compare character strings that have different current default Collations and (b) don't explicitly specify a Collation for the comparison, your DBMS will use the coercibility attribute of each string to choose the relevant Collation.
A <Column name>, <Column reference> or other character string value that
includes a COLLATE clause has a coercibility attribute of
EXPLICIT: its Collation is the Collation named. If a
<Column name> or <Column reference> doesn't include a
COLLATE clause, it has a coercibility attribute of
IMPLICIT: its Collation is the Collation specified when the
Column was created (see the <data type> definitions). If any other character
string value (e.g.: a host variable or a <literal>) doesn't include a
COLLATE clause, it normally has a coercibility attribute of
COERCIBLE: its Collation is the default Collation for its
Character set. Sometimes a character string value is the result of an
expression that joins strings with different Collations (e.g.: a concatenation
operation that doesn't include a COLLATE clause). These
character strings have a coercibility attribute of
NO COLLATION.
After determining the coercibility attribute of each character string in a comparison, your DBMS will choose the relevant Collation using these rules:
- Strings with
COERCIBLEcoercibility may be compared to strings with any coercibility attribute exceptNO COLLATION. If both comparands haveCOERCIBLEcoercibility, the relevant Collation is the default Collation of their mutual Character set. If one comparand hasCOERCIBLEcoercibility and the other hasEXPLICITcoercibility, the relevant Collation is theEXPLICITCollation. If one comparand hasCOERCIBLEcoercibility and the other hasIMPLICITcoercibility, the relevant Collation is theIMPLICITCollation. - Strings with
EXPLICITcoercibility may be compared to strings with any coercibility attribute. If one comparand hasEXPLICITcoercibility and the other hasCOERCIBLE,IMPLICITorNO COLLATIONcoercibility, the relevant Collation is theEXPLICITCollation. If both comparands haveEXPLICITcoercibility, they must also have the same Collation. The relevant Collation is their mutualEXPLICITCollation. - Strings with
IMPLICITcoercibility may be compared to strings with any coercibility attribute exceptNO COLLATION.If one comparand hasIMPLICITcoercibility and the other hasCOERCIBLEcoercibility, the relevant Collation is theIMPLICITCollation. If one comparand hasIMPLICITcoercibility and the other hasEXPLICITcoercibility, the relevant Collation is theEXPLICITCollation. If both comparands haveIMPLICITcoercibility, they must also have the same Collation. The relevant Collation is their mutualIMPLICITCollation. - Strings with
NO COLLATIONcoercibility may only be compared to strings with a coercibility attribute ofEXPLICIT. The relevant Collation is theEXPLICITCollation.
When you compare character strings that have different lengths, the result also
depends on whether the relevant Collation has the PAD SPACE
attribute or the NO PAD attribute. If the relevant Collation
has the PAD SPACE attribute, your DBMS will extend the
shorter character string to the length of the larger string (by padding it on
the right with spaces) before comparing the strings. If the relevant Collation
has the NO PAD attribute, then -- all other things being
equal -- the longer string will evaluate as greater than the shorter string.
That is, with a NO PAD Collation, the result of these rules
is that a shorter comparand which is equal to the same-length substring of a
larger comparand will evaluate as less than the larger comparand -- even if the
remainder of the larger string consists only of spaces or control characters.
For example, a comparison of these two <literal>s:
'BOB' 'BOB '
would result in the first <literal> being evaluated as equal to the second with
a PAD SPACE Collation and as less than the second with a
NO PAD Collation (assuming that the Collations both use the
familiar Latin collating rules).
Here's another example of the difference between the
PAD SPACE and the NO PAD attributes for
Collations.
CREATE TABLE Table_1 ( char_column CHAR(5)); INSERT INTO Table_1 (char_column) VALUES ('A');
In this example, the string actually inserted is five characters long, i.e.:
'A '
Thus, with a PAD SPACE Collation, this predicate is
TRUE:
... WHERE char_column = 'A'
and with a NO PAD Collation, the same predicate is
FALSE.
To summarize, SQL doesn't allow character strings to be compared unless they
belong to the same Character set and have the same Collation for the
comparison. You may explicitly specify the relevant Character set or allow it
to default to an implicit Character set chosen by your DBMS. You may also
explicitly specify the relevant Collation by adding a
COLLATE clause to your expression; this will override the
expression's default collating sequence. If you omit the
COLLATE clause, your DBMS will choose the relevant Collation
for you -- see "Character Strings and Collations", later in this chapter.
If you want to restrict your code to Core SQL, don't use
CLOBs or NCLOBs in comparisons.
Other Operations
With SQL, you have a wide range of operations that you can perform on character strings, or on other values to get a character string result.
Concatenation
The required syntax for a character string concatenation is:
character concatenation ::= character_string_operand_1 || character_string_operand_2 [ COLLATE]
The concatenation operator operates on two operands, both of which must
evaluate to character strings belonging to the same Character set. It joins the
strings together in the order given and returns a character string with a
length equal to the sum of the lengths of its operands. If either of the
operands is NULL, the result of the operation is also
NULL. Here are two examples of character string
concatenations:
'hello' || ' bob' -- returns hello bob char_column || 'hello' -- returns CHAR_COLUMN's value followed by hello
[Obscure Rule] If both operands are fixed length character strings, the concatenation result is a fixed length character string with a length equal to the sum of the lengths of the operands -- this length may not exceed the maximum allowed for a fixed length character string.
[Obscure Rule] If either operand is a variable length character string and the
sum of their lengths is not greater than the maximum allowed length for a
variable length character string, the concatenation result is a variable length
character string with a length equal to the sum of the lengths of the operands.
If the sum of the operands' lengths is greater than the maximum allowed, but
the extra characters are all spaces, the concatenation result is a variable
length character string with a length equal to the maximum allowed length. If
the sum of the operands' lengths is greater than the maximum allowed, and the
extra characters are not all spaces, the concatenation will fail: your DBMS
will return the
SQLSTATE error 22001 "data exception-string data, right truncation".
[Obscure Rule] The result of a character string concatenation normally has a
coercibility attribute and Collation determined by Table 7-2 "Collating
Sequences and Coercibility Rules for Dyadic Operations", but you can use the
optional COLLATE clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the relevant Character set. If you're using
COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other
SQL statement, then your current <AuthorizationID> must have the
USAGE Privilege on "<Collation name>". For example:
'hello' || 'bob' COLLATE my.collation_1
specifies that the result of the concatenation should use a Collation named
my.collation_1.
If you want to restrict your code to Core SQL, don't use the concatenation
operator with CLOBs or NCLOBs and don't
use the COLLATE clause to force an
EXPLICIT Collation for any character string concatenation.
Scalar functions
SQL provides eleven scalar functions that return a character string: the <case expression>, the <cast specification>, the <char substring function>, the <char overlay function>, the <char trim function>, the <fold function>, the <character translation function>, the <form-of-use conversion function>, the <regular expression substring function>, the <specific type function> and the <niladic user function>. It also provides four scalar functions that operate on character strings, returning a number: the <char position expression>, the <bit length expression>, the <char length expression> and the <octet length expression>. We'll discuss all but the <specific type function>, the <niladic user function>, the <case expression> and the <cast specification> here. Look for the rest in other chapters; for now, just remember that they all evaluate to a character string and can therefore be used anywhere in SQL that a character string could be used.
<char substring function>
The required syntax for a <char substring function> is:
::= SUBSTRING (character_string_argument FROM start_argument [ FOR length_argument ] [ COLLATE ])
SUBSTRING operates on three arguments: the first must
evaluate to a character string, the other two must evaluate to exact numeric
integers. It extracts a substring from
character_string_argument and returns a variable length
character string with a maximum length that equals the fixed length or maximum
variable length (as applicable) of the character string argument. If any of the
arguments are NULL, SUBSTRING returns
NULL.
The "start_argument" is a number that marks the first
character you want to extract from
character_string_argument. If
SUBSTRING includes the (optional) FOR
clause, "length_argument" is the total number of characters
you want to extract. If you omit the FOR clause,
SUBSTRING will begin at "start_argument"
and extract all the rest of the characters in
character_string_argument. Here are some examples of
SUBSTRING:
SUBSTRING('epiphany' FROM 5)
-- returns hany
SUBSTRING('epiphany' FROM 5 FOR 3)
-- returns han
SUBSTRING(char_column FROM 1 FOR 4)
-- returns the first four characters of the value in CHAR_COLUMN
... WHERE SUBSTRING (char_column FROM 3 FOR 1) = 'A'
-- returns "true" if the third character of the value in CHAR_COLUMN is the letter AIf "length_argument" is negative, your DBMS will return
SQLSTATE error 22011 "data exception-substring error".
If "start_argument" is greater than the length of
character_string_argument, or if
(start_argument + length_argument) is
less than one, SUBSTRING returns a zero-length character
string. If start_argument is negative, or if
(start_argument + length_argument) is
greater than the length of character_string_argument, that's
okay -- the DBMS just ignores any characters before the start of
character_string_argument or after the end of
character_string_argument. Note that:
SUBSTRING('abc' FROM -2 FOR 4)is legal SQL syntax, but pointless because it won't return the "expected"
result. The SQL Standard requires SUBSTRING to return
'a' for this operation -- not 'ab'.
[Obscure Rule] The result of SUBSTRING belongs to the same
Character set that its string argument does. It normally has a coercibility
attribute and Collation determined by Table 7-1 "Collating Sequences and
Coercibility Rules for Monadic Operations", where
character_string_argument is the monadic operator, but
you can use the optional COLLATE clause to force
EXPLICIT
coercibility with a specific Collation. The Collation named must be a
Collation defined for the relevant Character set. If you're using
COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Collation name>". For example:
SUBSTRING(char_column FROM 1 FOR 4) COLLATE my.collation_1
specifies that the result of SUBSTRING should use a
Collation named my.collation_1.
[Obscure Rule] SUBSTRING can also operate on a bit string
and a BLOB. We've ignored these options for now -- look for
them in our chapters on bit strings and BLOBs.
If you want to restrict your code to Core SQL, don't use
SUBSTRING with NCLOBs and don't use the
COLLATE clause to force an EXPLICIT
Collation for any SUBSTRING operation.
<char overlay function>
The required syntax for a <char overlay function> is:
::= OVERLAY (character_string_argument_1 PLACING character_string_argument_2 FROM start_argument [ FOR length_argument ] [ COLLATE ])
OVERLAY operates on four arguments: the first two must
evaluate to character strings belonging to the same Character set, the other
two must evaluate to exact numeric integers. It extracts a substring from
"character_string_argument_1", replacing it with
"character_string_argument_2", and returns the resulting
character string. If any of the arguments are NULL,
OVERLAY returns NULL.
The "start_argument" is a number that marks the first
character you want to replace in
"character_string_argument_1". If
OVERLAY includes the (optional) FOR
clause, "length_argument" is the total number of characters
you want to replace. Thus, start_argument and
length_argument identify the portion of
"character_string_argument_1" you want to replace, while
"character_string_argument_2" is what you want to replace
with. If you omit the FOR clause, then
length_argument defaults to the length of
"character_string_argument_2". Here are some examples of
OVERLAY:
OVERLAY('epiphany' PLACING 'no' FROM 5)
-- returns epipnony
OVERLAY('epiphany' PLACING 'no' FROM 5 FOR 3)
-- returns epipnoy[Obscure Rule] The result of OVERLAY belongs to the same
Character set that its arguments do. It normally has a coercibility attribute
and Collation determined by Table 7-2 "Collating Sequences and Coercibility
Rules for Dyadic Operations", but you can use the optional
COLLATE clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the relevant Character set. If you're using
COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Collation name>". For example:
OVERLAY('epiphany' PLACING 'no' FROM 5) COLLATE my.collation_1specifies that the result of OVERLAY should use a Collation
named my.collation_1.
[Obscure Rule] OVERLAY can also operate on a
BLOB. We've ignored this option for now -- look for it in
our chapter on BLOBs.
If you want to restrict your code to Core SQL, don't use
OVERLAY with character strings.
<char trim function>
The required syntax for a <char trim function> is:
::= TRIM ([ [ {LEADING | TRAILING | BOTH} ] [ character_string_argument_1 ] FROM ] character_string_argument_2 [ COLLATE ])
TRIM operates on two arguments, both of which must evaluate
to character strings that belong to the same Character set and have the same
Collation for the operation. It strips all leading, all trailing or all leading
and all trailing trim characters from
"character_string_argument_2" and returns the resulting
variable length character string. The result has a maximum length that equals
the fixed length or maximum variable length (as applicable) of
"character_string_argument_2". If any of the arguments are
NULL, TRIM returns
NULL.
The trim specification is either LEADING (i.e.: trim all
leading trim characters), TRAILING (i.e.: trim all trailing
trim characters) or BOTH (i.e.: trim all leading and all trailing trim
characters). If this clause is omitted, TRIM defaults to
BOTH. For example, these two TRIM functions are equivalent:
they both strip away all leading and all trailing letters A:
TRIM('A' FROM char_column)
TRIM(BOTH 'A' FROM char_column)The "character_string_argument_1" defines the trim
character: the character that should be stripped away by the
TRIM function. If
"character_string_argument_1" is omitted,
TRIM strips spaces away. For example, these two
TRIM functions are equivalent: they both strip away all
trailing spaces:
TRIM(TRAILING FROM char_column) TRIM(TRAILING ' ' FROM char_column)
These two TRIM functions are equivalent: they both strip
away all leading spaces:
TRIM(LEADING FROM char_column) TRIM(LEADING ' ' FROM char_column)
These two TRIM functions are equivalent: they both strip away all leading and all trailing spaces:
TRIM(char_column) TRIM(BOTH ' ' FROM char_column)
If the length of "character_string_argument_1" is not one
character, TRIM will fail: your DBMS will return the
SQLSTATE error 22027 "data exception-trim error".
[Obscure Rule] The result of TRIM belongs to the same
Character set that its arguments do. It normally has a coercibility attribute
and Collation determined by Table 7-1 "Collating Sequences and Coercibility
Rules for Monadic Operations" (where
"character_string_argument_2" is the monadic operand), but
you can use the optional COLLATE clause to force
EXPLICIT coercibility with a specific Collation. The
Collation named must be a Collation defined for the relevant Character set. If
you're using COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Collation name>". For example:
TRIM(BOTH ' ' FROM char_column) COLLATE my.collation_1
specifies that the result of TRIM should use a Collation
named my.collation_1.
[Obscure Rule] TRIM can also operate on a
BLOB. We've ignored this option for now -- look for it in
our chapter on BLOBs.
If you want to restrict your code to Core SQL, don't use
TRIM with NCLOBs and don't use the
COLLATE clause to force an EXPLICIT
Collation for any TRIM operation.
<fold function>
The required syntax for a <fold function> is:
::= { UPPER | LOWER } (character_string_argument [ COLLATE ])
UPPER and LOWER operate on an argument
that evaluates to a character string. UPPER converts every
lower case letter in character_string_argument to its
corresponding upper case equivalent, while LOWER converts
every upper case letter in character_string_argument to
its corresponding lower case equivalent. Any character that has no upper or
lower case equivalent (as applicable) remains unchanged. The conversion
reflects the normal rules for letters of the simple Latin 26-letter alphabet --
that is "abcdefghijklmnopqrstuvwxyz" converts to and from
"ABCDEFGHIJKLMNOPQRSTUVWXYZ" -- but it also reflects the normal rules for the
accented letters in character_string_argument's Character
set, e.g.: "ö" converts to and from "Ö".
Both UPPER and LOWER return a character
string with a length that equals the fixed length or maximum variable length
(as applicable) of character_string_argument. If the
character string argument is NULL, UPPER
and LOWER return NULL. Here are some
examples:
UPPER('E. E. Cummings')
-- returns E. E. CUMMINGS
LOWER('E. E. Cummings')
-- returns e. e. cummings
UPPER(LOWER('E. E. Cummings'))
-- returns E. E. CUMMINGSIn the last example, UPPER and LOWER do
not cancel each other out; the output string is not the same as the input
string. Such information loss occurs because fold functions don't affect
characters which are already in the right case.
TIP: A string which contains no letters will be the same after
UPPER and LOWER, so if you need to test
whether a character string contains letters do this:
SELECT character_column FROM Table_Of_Character_Strings WHERE UPPER(character_column) <> LOWER(character_column);
Such a query will find '1a 2' but will not find
'1$ 2'; thus you can use fold functions to filter out
strings which contain letters.
[Obscure Rule] The result of UPPER and
LOWER is a fixed length string if
character_string_argument is a fixed length string and a
variable length string if character_string_argument is a
variable length string. In either case, the result belongs to the same
Character set that the argument does. It normally has a coercibility attribute
and Collation determined by Table 7-1 "Collating Sequences and Coercibility
Rules for Monadic Operations", but you can use the optional
COLLATE clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the relevant Character set. If you're using
COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Collation name>". For example:
UPPER('hello') COLLATE my.collation_1specifies that the result of UPPER should use a Collation
named my.collation_1 and
LOWER('HELLO') COLLATE my.collation_1specifies that the result of LOWER should use a Collation
named my.collation_1.
If you want to restrict your code to Core SQL, don't use the
COLLATE clause to force an EXPLICIT
Collation for any UPPER or LOWER
operation.
<character translation function>
The required syntax for a <character translation function> is:
::= TRANSLATE (character_string_argument USING [ COLLATE ])
TRANSLATE operates on an argument that evaluates to a
character string. It converts every character in
character_string_argument to its corresponding equivalent
in another Character set (by changing each character according to some
many-to-one or one-to-one mapping) and returns a variable length character
string that belongs to the target Character set defined for "<Translation
name>". If the character string argument is NULL,
TRANSLATE returns NULL. Here is an
example of TRANSLATE:
TRANSLATE('hello' USING my.translation_1)
-- returns a string, equivalent to hello, that belongs to the target Character set defined for a Translation called my.translation_1(Translations are defined using the CREATE TRANSLATION
statement.)
If you're using TRANSLATE in an SQL-Schema statement, then
the <AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Translation name>". If you're using
TRANSLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Translation name>".
[Obscure Rule] The result of TRANSLATE normally has a
coercibility attribute of IMPLICIT and uses the default
Collation of the Translation's target Character set, but you can use the
optional COLLATE clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the target Character set. If you're using
COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Collation name>". For example:
TRANSLATE('hello' USING my.translation_1) COLLATE my.collation_1
specifies that the result of TRANSLATE should use a
Collation named my.collation_1.
If you want to restrict your code to Core SQL, don't use
TRANSLATE.
<form-of-use conversion function>
The required syntax for a <form-of-use conversion function> is:
::= CONVERT (character_string_argument USING [ COLLATE ])
CONVERT operates on an argument that evaluates to a
character string. It converts every character in
character_string_argument to its corresponding
equivalent using another Form-of-use and returns the resulting variable length
character string. If the character string argument is NULL,
CONVERT returns NULL. Here is an example
of CONVERT:
CONVERT('hello' USING INFORMATION_SCHEMA.new_form)
-- returns a string, equivalent to hello, whose characters are encoded using a Form-of-use called NEW_FORMA Form-of-use is a character repertoire's encoding scheme -- the one-to-one
mapping scheme between each character in the repertoire and a set of internal
codes (usually 8-bit values) that define how the repertoire's characters are
encoded as numbers. (These codes are also used to specify the order of the
characters within the repertoire.) Supported Forms-of-use are all predefined by
your DBMS and thus belong to INFORMATION_SCHEMA. SQL
provides no ability to define your own Forms-of-use.
CONVERT's purpose is to allow you to transfer character
strings between SQL- data and your host application, therefore you may only use
the function in certain places. When transferring SQL-data to the host,
CONVERT is legal only as part of a <select sublist>. For
example:
SELECT CONVERT(char_column USING INFORMATION_SCHEMA.new_form) FROM Table_1 WHERE char_column = 'hello';
When transferring host values into SQL-data, use CONVERT to
change any host parameter. For example:
INSERT INTO Table_1 (char_column) VALUES (CONVERT(:char_parameter USING INFORMATION_SCHEMA.new_form));
TIP: You might want to use CONVERT to change a character
string's encoding scheme from 8-bit to 16-bit.
[NON-PORTABLE] Whether you can use CONVERT or not is
non-standard because the SQL Standard requires implementors to define all
Forms-of-use supported -- but has no requirement that a DBMS must support any
Form-of-use at all. However, you can use TRANSLATE to
provide an equivalent operation.
[OCELOT Implementation] The OCELOT DBMS that comes with this book does
not provide support for any Form-of-use.
[Obscure Rule] The result of CONVERT belongs to a Character
set that consists of the same character repertoire that its argument's
Character set has -- but with a different Form-of-use encoding. It normally has
a coercibility attribute of IMPLICIT and uses the default
Collation of its Character set, but you can use the optional
COLLATE clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the target Character set. If you're using
COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Collation name>". For example:
CONVERT('hello' USING INFORMATION_SCHEMA.new_form) COLLATE my.collation_1specifies that the result of CONVERT should use a Collation
named my.collation_1.
If you want to restrict your code to Core SQL, don't use
CONVERT.
<regular expression substring function>
The required syntax for a <regular expression substring function> is:
::= SUBSTRING (character_string_argument FROM pattern FOR escape_character [ COLLATE ])
SUBSTRING operates on three arguments, all of which
evaluate to character strings that belong to the same Character set. It
extracts a substring based on pattern from
character_string_argument and returns a variable length
character string with a maximum length that equals the fixed length or maximum
variable length (as applicable) of
character_string_argument. Both
pattern and escape_character must be
regular expressions (see our discussion of the SIMILAR
predicate) and escape_character must be exactly one
character long. If any of the arguments are NULL,
SUBSTRING returns NULL.
The pattern shows the substring you want to extract from
character_string_argument. It's actually a triple pattern:
it must consist of three regular expressions, the middle of which is a tagged
regular expression (a regular expression that is delimited by
escape_character immediately followed by a double quote
sign). For example, if your escape character is ?, then
pattern must contain ?" exactly two times, as in:
'The rain?"%?"Spain'
The three parts of pattern --
start pattern ?" middle pattern ?" end pattern -- must
match character_string_argument's start, middle and end;
that is, the expression:
'character_string_argument' SIMILAR TO 'pattern'
must be TRUE if the "escape <double quote>" markers are
stripped from the pattern. If that's not the case -- that is, if the start or
end patterns aren't in the string -- SUBSTRING returns
NULL. Otherwise the result of SUBSTRING
is character_string_argument's middle string which
corresponds to the middle pattern. Thus, for this SUBSTRING
function:
SUBSTRING('The rain in Spain' FROM 'The rain?"%?"Spain' FOR '?')the result is ' in ' -- that is, the return from
SUBSTRING is the string of characters which appears between
the start pattern ('The rain') and the end pattern
('Spain').
[Obscure Rule] The result of SUBSTRING belongs to the same
Character set that its string arguments do. It normally has a coercibility
attribute and Collation determined by Table 7-1 "Collating Sequences and
Coercibility Rules for Monadic Operations", where
"character_string_argument_1" is the monadic operator, but
you can use the optional COLLATE clause to force
EXPLICIT coercibility with a specific Collation. The
Collation named must be a Collation defined for the relevant Character set. If
you're using COLLATE in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the
USAGE Privilege on "<Collation name>". If you're using
COLLATE in any other SQL statement, then your current
<AuthorizationID> must have the USAGE Privilege on
"<Collation name>". For example:
SUBSTRING(char_column FROM 'hi/"[b-o]/"by' FOR '/') COLLATE my.collation_1
specifies that the result of SUBSTRING should use a
Collation named my.collation_1.
If you want to restrict your code to Core SQL, don't use the <regular
expression substring function> form of SUBSTRING.
<char position expression>
The required syntax for a <char position expression> is:
::= POSITION (character_string_argument_1 IN character_string_argument_2)
POSITION operates on two arguments, both of which must
evaluate to character strings that belong to the same Character set. It
determines the first character position (if any) at which
"character_string_argument_1" is found in
"character_string_argument_2" and returns this as an exact
numeric integer. If either of the arguments are NULL,
POSITION returns NULL. If
"character_string_argument_1" is a zero-length character
string, POSITION returns one. If
"character_string_argument_1" is not found in
"character_string_argument_2", POSITION
returns zero. Here are some examples of POSITION:
POSITION('is' IN 'mistake') -- returns 2
POSITION('yy' IN 'mistake') -- returns 0
POSITION('' IN 'mistake') -- returns 1
[NON-PORTABLE] The precision of POSITION's result is
non-standard because the SQL Standard requires implementors to define the
result's precision.
[OCELOT Implementation] The OCELOT DBMS that comes with this book gives
the result of POSITION an INTEGER
<data type>.
[Obscure Rule] POSITION can also operate on a bit string and
a BLOB. We've ignored these options for now -- look for them
in our chapters on bit strings and BLOBs.
If you want to restrict your code to Core SQL, don't use
POSITION with character strings.
<bit length expression>
The required syntax for a <bit length expression> is:
::= BIT_LENGTH (character_string_argument)
BIT_LENGTH operates on an argument that evaluates to a
character string. It determines the length of the argument, in bits, and
returns this as an exact numeric integer, e.g.:
BIT_LENGTH('hello') returns 40 (assuming that an 8-bit
Character set is in use). If the argument is NULL,
BIT_LENGTH returns NULL.
TIP: The length of a character string argument depends on the Character set it
belongs to. Most Character sets are 8-bit sets, so
BIT_LENGTH would return 8 for each character in your
argument. But if you're using a DBCS, remember that
BIT_LENGTH will allot 16 bits for each character.
TIP: BIT_LENGTH will return the total length of your
character string argument -- including any trailing (or leading) spaces. If
you're looking for the length of the significant value only, use
TRIM with BIT_LENGTH. For example:
BIT_LENGTH('hello ') -- returns 64; the length of hello followed by 3 spaces BIT_LENGTH(TRIM('hello ')) -- returns 40; the length of hello
TIP: BIT_LENGTH returns the number of bits taken up by each
Latin letter in your character string argument. For example, if your argument
is 'Chorizo' and you're using a Spanish Collation, the second character is 'h'
-- it is not 'o' despite the digraph, because BIT_LENGTH
doesn't care about the Collation. Thus:
BIT_LENGTH('Chorizo') COLLATE my.spanish_collationreturns 56, not 48.
[NON-PORTABLE] The precision of BIT_LENGTH's result is
non-standard because the SQL Standard requires implementors to define the
result's precision.
[OCELOT Implementation] The OCELOT DBMS that comes with this book gives
the result of BIT_LENGTH an INTEGER
<data type>.
[Obscure Rule] BIT_LENGTH can also operate on a bit string
and a BLOB. We've ignored these options for now -- look for
them in our chapters on bit strings and BLOBs.
If you want to restrict your code to Core SQL, don't use
BIT_LENGTH with NCLOBs.
<char length expression>
The required syntax for a <char length expression> is:
::= {CHAR_LENGTH | CHARACTER_LENGTH} (character_string_argument)
CHAR_LENGTH (or CHARACTER_LENGTH)
operates on an argument that evaluates to a character string. It determines the
length of the argument, in characters, and returns this as an exact numeric
integer, e.g.: CHAR_LENGTH('hello') returns 5. If the
argument is NULL, CHAR_LENGTH returns
NULL.
TIP: CHAR_LENGTH will return the total length of your
character string argument -- including any trailing (or leading) spaces. If
you're looking for the length of the significant value only, use
TRIM with CHAR_LENGTH. For example:
CHAR_LENGTH('hello ') -- returns 8; the length of hello followed by 3 spaces CHAR_LENGTH(TRIM('hello ')) -- returns 5; the length of hello
TIP: CHAR_LENGTH returns the number of Latin letters in your
character string argument. For example, if your argument is 'Chorizo' and
you're using a Spanish Collation, the second character is 'h' -- it is not 'o'
despite the digraph, because CHAR_LENGTH doesn't care about
the Collation. Thus:
CHAR_LENGTH('Chorizo') COLLATE my.spanish_collation
returns 7, not 6.
[NON-PORTABLE] The precision of CHAR_LENGTH's result is
non-standard because the SQL Standard requires implementors to define the
result's precision.
[OCELOT Implementation] The OCELOT DBMS that comes with this book gives
the result of CHAR_LENGTH an
INTEGER <data type>.
[Obscure Rule] CHAR_LENGTH can also operate on a bit string
and a BLOB. We've ignored these options for now -- look for
them in our chapters on bit strings and BLOBs.
If you want to restrict your code to Core SQL, don't use
CHAR_LENGTH with NCLOBs.
<octet length expression>
The required syntax for a <octet length expression> is:
::= OCTET_LENGTH (character_string_argument)
OCTET_LENGTH operates on an argument that evaluates to a
character string. It determines the length of the argument, in octets, and
returns this as an exact numeric integer, e.g.:
OCTET_LENGTH('hello') returns 5 (assuming that an 8-bit
Character set is in use; the octet length of a string is the bit length
divided by 8, ignoring any remainder.) If the argument is
NULL, OCTET_LENGTH returns
NULL.
TIP: The length of a character string argument depends on the Character set it
belongs to. Most Character sets are 8-bit sets, so
OCTET_LENGTH would return 1 for each character in your
argument. But if you're using a DBCS, remember that
OCTET_LENGTH will allot 2 octets for each character.
TIP: OCTET_LENGTH will return the total length of your
character string argument -- including any trailing (or leading) spaces. If
you're looking for the length of the significant value only, use
TRIM with OCTET_LENGTH. For example:
OCTET_LENGTH('hello ') -- returns 8; the length of hello followed by 3 spaces OCTET_LENGTH(TRIM('hello ')) -- returns 5; the length of hello
None
TIP: OCTET_LENGTH returns the number of octets taken up by
each Latin letter in your character string argument. For example, if your
argument is 'Chorizo' and you're using a Spanish Collation, the second
character is 'h' -- it is not 'o' despite the digraph, because
OCTET_LENGTH doesn't care about the Collation. Thus:
OCTET_LENGTH('Chorizo') COLLATE my.spanish_collationreturns 7, not 6.
[NON-PORTABLE] The precision of OCTET_LENGTH's result is
non-standard because the SQL Standard requires implementors to define the
result's precision.
[OCELOT Implementation] The OCELOT DBMS that comes with this book gives
the result of OCTET_LENGTH an
INTEGER <data type>.
[Obscure Rule] OCTET_LENGTH can also operate on a bit string
and a BLOB. We've ignored these options for now -- look for
them in our chapters on bit strings and BLOBs.
If you want to restrict your code to Core SQL, don't use
OCTET_LENGTH with NCLOBs.
Set functions
SQL provides five set functions that operate on CHAR,
VARCHAR, NCHAR and
NCHAR VARYING. SQL also provides three set functions that
operate on CLOB and NCLOB character
strings: COUNT and GROUPING. Since none
of these operate exclusively with character string arguments, we won't discuss
them here; look for them in our chapter on set functions.
Predicates
In addition to the comparison operators, SQL provides ten other predicates that
operate on CHAR, VARCHAR,
NCHAR and NCHAR VARYING character
strings: the <like predicate>, the <similar predicate>, the <between
predicate>, the <in predicate>, the <null predicate>, the <exists predicate>,
the <unique predicate>, the <match predicate>, the <quantified predicate> and
the <distinct predicate>. SQL also provides five predicates that operate on
CLOB and NCLOB character strings: the
<like predicate>, the <similar predicate>, the <null predicate>, the <exists
predicate> and the <quantified predicate>. Each will return a boolean value:
either TRUE, FALSE or
UNKNOWN. Only the <like predicate> and the <similar
predicate> operate strictly on strings; we'll discuss them here. Look for the
rest in our chapter on search conditions.
<like predicate>
The required syntax for a <like predicate> is:
::= character_string_argument [ NOT ] LIKE pattern [ ESCAPE escape_character ]
LIKE is a predicate that operates on three operands that
evaluate to character strings belonging to the same Character set: it searches
for values that contain a given pattern. NOT LIKE is the converse and lets you
search for values that don't contain a given pattern. The
character_string_argument is the character string you're
searching within, the pattern is the pattern you're
searching for and the optional escape_character is a
character that tells your DBMS to treat a metacharacter in the pattern as
itself (rather than as a metacharacter). If
character_string_argument contains the pattern, LIKE returns
TRUE and NOT LIKE returns
FALSE. If character_string_argument does
not contain the pattern, LIKE returns
FALSE and NOT LIKE returns
TRUE. If any of the operands are NULL,
LIKE and NOT LIKE return
UNKNOWN.
The pattern you specify in pattern may contain any
combination of regular characters and metacharacters. Any single character in
pattern that is not a metacharacter or the
escape_character represents itself in the pattern. For
example, this predicate:
char_column LIKE 'A'
is TRUE for 'A'.
Special significance is attached to metacharacters in a pattern. The
metacharacters are: _ and %. If the predicate doesn't include an
ESCAPE clause, they are interpreted as follows:
- _ An underline character means "any single character". For example, this predicate:
char_column LIKE 'A_C'
is TRUE for 'A C', 'AAC', 'ABC', 'A#C' and so on.
- % A percent sign means "any string of zero or more characters". For example, this predicate:
char_column LIKE 'A%C'
is TRUE for 'AC', 'A C', 'AxC', 'AxxxxxxxxC' and so on.
If you want to search for a character that would normally be interpreted as a
metacharacter, you must use the optional ESCAPE clause. To
do so:
- Pick a character that you won't need in the pattern and designate it as your escape character.
- In the pattern, use your escape character followed immediately by the metacharacter, to designate the metacharacter as a character you want to search for. For example:
... LIKE 'B$%'
(without an ESCAPE clause) means "like the letter B followed
by a dollar sign followed by anything at all", while:
... LIKE 'B$?%' ESCAPE '?'
means "like the letter B followed by a dollar sign followed by a percent sign" (since % is preceded by the escape character it has no special significance in this pattern). Your escape character can also be followed by itself in the pattern, if you want to search for the escape character. For example:
... LIKE 'B$??' ESCAPE '?'
means "like the letter B followed by a dollar sign followed by a question mark"
(since ? is preceded by the escape character it has no special significance in
this pattern). Your best choice for an escape character is an SQL special
character which isn't a [NOT] LIKE metacharacter. We suggest
the question mark.
The escape_character must be exactly one character long. If
it isn't, [NOT] LIKE will fail: your DBMS will return the
SQLSTATE error 22019 "data exception-invalid escape character".
If escape_character is _ or % and that metacharacter is used
once only in your pattern, or if escape_character is used
without being followed by a metacharacter (or by itself) in your pattern,
[NOT] LIKE will fail: your DBMS will return the
SQLSTATE error 22025 "data exception-invalid escape sequence".
For example, this predicate will result in SQLSTATE 22025:
LIKE 'B%B' ESCAPE '%'
For the purposes of [NOT] LIKE, a substring of
character_string_argument is a sequence of zero or more
contiguous characters, where each character belongs to exactly one such
substring (this includes any trailing spaces in the argument). A substring
specifier of pattern is either (a) _: an arbitrary
character specifier, (b) %: an arbitrary string specifier, (c)
escape_character followed by _ or % or
escape_character or (d) any other single character. If
character_string_argument and pattern
are both variable length character strings with a length of zero,
LIKE returns TRUE.
LIKE also returns TRUE if
pattern is found in
character_string_argument. That is, LIKE
returns TRUE only if the number of substrings in
character_string_argument equals the number of substring
specifiers in pattern and the following conditions are also
met:
- If the pattern's n-th substring specifier is _, then the argument's n-th substring must be any single character.
- If the pattern's n-th substring specifier is %, then the argument's n-th substring must be any sequence of zero or more characters.
- If the pattern's n-th substring specifier is any other character,
then the argument's n-th substring must be equal, in length and character
representation, to that substring specifier -- without trailing spaces being
added to the argument. Note that this means if the pattern is found in the
argument, but the lengths don't match,
LIKEreturnsFALSE. For example, these four predicates all returnTRUE:
'bob' LIKE 'b_b' 'bob' LIKE 'b%b' 'bob ' LIKE 'b_b ' 'bob ' LIKE 'b%b '
But the following two predicates return FALSE because of the
trailing spaces in character_string_argument that aren't
found in pattern:
'bob ' LIKE 'b_b' 'bob ' LIKE 'b%b'
And these two predicates return FALSE because of the
trailing spaces in pattern that aren't found in
character_string_argument:
'bob' LIKE 'b_b ' 'bob' LIKE 'b%b '
Note that this is only a problem with fixed length character string arguments. Here's a more complete example:
CREATE TABLE Test_Stuffs ( column_1 CHAR(4)); INSERT INTO Test_Stuffs (column_1) VALUES ('ABC'); -- actually inserts 'ABC ' (four characters) SELECT * FROM Test_Stuffs WHERE column_1 = 'ABC'; -- works because comparisons will pad the shorter argument (assuming the -- relevant Collation has the <<fixed>>PAD SPACE<</fixed>> attribute) so the -- test is WHERE 'ABC ' = 'ABC ' SELECT * FROM Test_Stuffs WHERE column_1 LIKE '%C'; -- fails because LIKE never pads the shorter argument, no matter what -- Collation is used, so the test is "find a value of any length that ends -- in C" -- and 'ABC ' ends in a space, not in C
To get around this, use TRIM to get rid of trailing spaces
in your character_string_argument, like this:
SELECT * FROM Test_Stuffs WHERE TRIM (TRAILING FROM column_1) LIKE '%C';
[Obscure Rule] The result of [NOT] LIKE belongs to the same
Character set that its operands do. If you omit the ESCAPE
clause, then it has a Collation determined by Table 7-3 "Collating Sequences
used for Comparisons", where character_string_argument is
comparand 1 and pattern is comparand 2. If you include the
ESCAPE clause, then it also has a Collation determined by
Table 7-3 "Collating Sequences used for Comparisons", where:
- comparand 1 is determined by Table 7-2 "Collating Sequences and
Coercibility Rules for Dyadic Operations", where
character_string_argumentis operand 1 andpatternis operand 2. - comparand 2 is
escape_character.
[Obscure Rule] [NOT] LIKE can also operate on
BLOBs. We've ignored this option for now -- look for it in
our chapter on BLOBs.
If you want to restrict your code to Core SQL, don't use the
[NOT] LIKE predicate with CLOBs or
NCLOBs and, when you do use [NOT LIKE], make sure your
character_string_argument is a <Column reference> and that
your pattern and your escape_character
are both <value specification>s.
<similar predicate>
The required syntax for a <similar predicate> is:
::= character_string_argument [ NOT ] SIMILAR TO pattern [ ESCAPE escape_character ]
SIMILAR is a predicate that operates on three operands that
evaluate to character strings belonging to the same Character set. It works
much like Unix's grep: it searches for values that contain a given pattern.
NOT SIMILAR is the converse and lets you search for values
that don't contain a given pattern. The
character_string_argument is the character string you're
searching within, the pattern is the pattern you're
searching for and the optional escape_character is a
character that tells your DBMS to treat a metacharacter in the pattern as
itself (rather than as a metacharacter). If
character_string_argument contains the pattern,
SIMILAR returns TRUE and NOT SIMILAR
returns FALSE. If
character_string_argument does not contain the pattern,
SIMILAR returns FALSE and
NOT SIMILAR returns TRUE. If any of the
operands are NULL, SIMILAR and
NOT SIMILAR return UNKNOWN.
The pattern you specify in pattern must be a regular
expression: a sequence of ordinary characters combined with some special
characters (or metacharacters). It may contain character ranges, repetitions
and combinations. Any single character in pattern that is
not a metacharacter or the escape_character represents
itself in the pattern. For example, this predicate:
char_column SIMILAR TO 'A'
is TRUE for 'A'.
Special significance is attached to metacharacters in a pattern. The
metacharacters are: _ and % and * and + and | and ( and ) and [ and ] and ^
and - and :. If the predicate doesn't include an ESCAPE
clause, they are interpreted as follows:
- _ An underscore character means "any single character". For example, this predicate:
char_column SIMILAR TO 'A_C'
is TRUE for 'A C', 'AAC', 'ABC', 'A#C' and so on.
- % A percent sign means "any string of zero or more characters". For example, this predicate:
char_column SIMILAR TO 'A%C'
is TRUE for 'AC', 'A C', 'AxC', 'AxxxxxxxxC' and so on.
- * An asterisk means "preceding repeats indefinitely" (from zero to infinity times). For example, this predicate:
char_column SIMILAR TO 'A*'
is TRUE for '', 'A', 'AA', 'AAA', 'AAAA' and so on.
- + A plus sign means "preceding repeats indefinitely" (from one to infinity times). For example, this predicate:
char_column SIMILAR TO 'A+'
is TRUE for 'A', 'AA', 'AAA', 'AAAA' and so on.
- [ ] Brackets are used for character enumeration in the pattern. There are two ways to enumerate: as a simple list or with a minus sign, with the result that a match is made with any one of the characters inside the brackets. For example, this predicate:
char_column SIMILAR TO '[A]'
is TRUE for 'A'. This
predicate:
char_column SIMILAR TO '[AQZ]'
is TRUE for 'A' or 'Q' or 'Z'. This
predicate:
char_column SIMILAR TO '[A-E]'
is TRUE for 'A' or 'B' or 'C' or 'D' or 'E'. And this
predicate:
char_column SIMILAR TO '[A-EQ-S]'
is TRUE for 'A' or 'B' or 'C' or 'D' or 'E' or 'Q' or 'R'
or 'S'.
- [^ ] A circumflex inside enumerating brackets means negative enumeration. The options are the same as for ordinary enumeration, with a negated meaning. For example, this predicate:
char_column SIMILAR TO '[^A-C]'
is TRUE for anything not equal to 'A' or to 'B' or to 'C'.
This predicate:
char_column SIMILAR TO '[^AQZ]'
is TRUE for anything not equal to 'A' or 'Q' or 'Z'. And
this predicate:
'ABCDE' SIMILAR TO '[^C-F]'
is FALSE, since the last character in the character string
argument must not be 'C' or 'D' or 'E' or 'F'.
- [: :] Brackets containing colons surrounding one of:
ALPHA,UPPER,LOWER,DIGITorALNUMare used for set enumeration in the pattern. For example, this predicate:
char_column SIMILAR TO '[:ALPHA:]'
is TRUE for values of CHAR_COLUMN that
are equal to any simple Latin letter, i.e.: to any of
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz". This
predicate:
char_column SIMILAR TO '[:UPPER:']
is TRUE for values of CHAR_COLUMN that
are equal to any simple Latin upper case letter, i.e.: to any of
"ABCDEFGHIJKLMNOPQRSTUVWXYZ". This predicate:
char_column SIMILAR TO '[:LOWER:]'
is TRUE for values of CHAR_COLUMN that
are equal to any simple Latin lower case letter, i.e.: to any of
"abcdefghijklmnopqrstuvwxyz". This predicate:
char_column SIMILAR TO '[:DIGIT:]'
is TRUE for values of CHAR_COLUMN that
are equal to any digit, i.e.: to any of "0123456789". And this
predicate:
char_column SIMILAR TO '[:ALNUM:]'
is TRUE for values of CHAR_COLUMN that
are equal to any simple Latin letter or to any digit, i.e.: to any of
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".
- | The vertical bar means "the logical OR of the first and second expressions". For example, this predicate:
char_column SIMILAR TO '[A-C']|[:DIGIT:]'
is TRUE for any of "ABC0123456789". The | operator has a
lower priority than * and + have.
- || The concatenation operator means "concatenate one element from first expression with one element from second expression". For example, this predicate:
char_column SIMILAR TO '[A-C]||[:DIGIT:]'
is TRUE for: 'A0',
'A1', 'A2', 'A3',
'A4', 'A5', 'A6',
'A7', 'A8', 'A9',
'B0', 'B1', 'B2',
'B3', 'B4', 'B5',
'B6', 'B7', 'B8',
'B9', 'C0', 'C1',
'C2', 'C3', 'C4',
'C5', 'C6', 'C7',
'C8' and 'C9'.
- ( ) Parentheses in a pattern force the order of evaluation, in the usual way. For example, this predicate:
char_column SIMILAR TO '[:UPPER:]|([:DIGIT:][:DIGIT:])'
is TRUE for any single upper case letter, or for any two
digits.
- If
patternis not a valid pattern,[NOT] SIMILARwill fail: your DBMS will return theSQLSTATE error 2201B "data exception-invalid regular expression". Here are two examples of invalid patterns:
'^[:UPER:]' '[:abc:]'
If you want to search for a character that would normally be interpreted as a
metacharacter, you must use the optional ESCAPE clause. To do so:
- Pick a character that you won't need in the pattern and designate it as your escape character.
- In the pattern, use your escape character followed immediately by the metacharacter, to designate it as a character you want to search for. For example:
... SIMILAR TO 'B$%'
(without an ESCAPE clause) means "similar to the letter B
followed by a dollar sign followed by anything at all",
while:
... SIMILAR TO 'B$?%' ESCAPE '?'
means "similar to the letter B followed by a dollar sign followed by a percent sign" (since % is preceded by the escape character it has no special significance in this pattern). Your escape character can also be followed by itself in the pattern, if you want to search for the escape character. For example:
... SIMILAR TO 'B$??' ESCAPE '?'
means "similar to the letter B followed by a dollar sign followed by a
question mark" (since ? is preceded by the escape character it has no special
significance in this pattern). Your best choice for an escape character is an
SQL special character which isn't a [NOT] SIMILAR
metacharacter. We suggest the question mark.
The escape_character must be exactly one character long. If
it isn't, [NOT] SIMILAR will fail: your DBMS will return the
SQLSTATE error 22019 "data exception-invalid escape character".
If escape_character is [ or ] or ( or ) or | or ^ or - or +
or * or _ or % and that metacharacter is used once only in your pattern, or if
escape_character is used without being followed by a
metacharacter (or itself) in your pattern, [NOT] SIMILAR
will fail: your DBMS will return the
SQLSTATE error 2200C "data exception-invalid use of escape character".
For example, this predicate will result in SQLSTATE 2200C:
SIMILAR TO 'B?B' ESCAPE '?'
If escape_character is a colon and your pattern contains
that metacharacter surrounding one of: ALPHA,
UPPER, LOWER, DIGIT or
ALNUM, [NOT] SIMILAR will fail: your DBMS
will return the
SQLSTATE error 2200B "data exception-escape character conflict".
[Obscure Rule] The result of [NOT] SIMILAR belongs to the
same Character set that its operands do. If you omit the
ESCAPE clause, then it has a Collation determined by Table
7-3 "Collating Sequences used for Comparisons", where
character_string_argument is comparand 1 and
pattern is comparand 2. If you include the
ESCAPE clause, then it also has a Collation determined by
Table 7-3 "Collating Sequences used for Comparisons", where:
- comparand 1 is determined by Table 7-2 "Collating Sequences and
Coercibility Rules for Dyadic Operations", where
character_string_argumentis operand 1 andpatternis operand 2. - comparand 2 is
escape_character.
If you want to restrict your code to Core SQL, don't use the
[NOT] SIMILAR predicate.
Common checks
Although [NOT] SIMILAR is not terribly useful in
WHERE clauses (it's too inefficient) it is great for
CHECK clauses. Here are some real-world examples of strings
which have rigid format specifications. [NOT] SIMILAR is
appropriate for making sure the strings meet the specifications.
POSTAL CODES These strings must be "letter digit letter space digit letter
digit" -- and the letters must be upper case simple Latin letters, e.g.:
'T5E 1G7', 'V1K 4K0'. To make sure your
data fits these requirements, use a simple Domain Constraint:
ALTER DOMAIN postal_code ADD CONSTRAINT postal_code_specs CHECK (VALUE SIMILAR TO '[:UPPER:][DIGIT:][:UPPER] [:DIGIT]:[UPPER:][:DIGIT:]');
PERIODIC-TABLE SYMBOLS These strings are either a single upper case simple Latin letter capital letter (e.g.: 'H', 'O') or one upper case and one lower case letter (E.G.: 'Al', 'Fe'). To make sure your data fits these requirements, use another simple Domain Constraint:
CREATE DOMAIN periodic_table_element CHAR(2) CHECK (VALUE SIMILAR TO '[A-Z]|([A-Z][a-z])');
NORTH AMERICAN TELEPHONE NUMBERS These strings must be "digit digit digit minus-sign digit digit digit digit"; optionally preceded by "left-parenthesis digit digit digit right-parenthesis" (e.g.: '498-1234' or '(604)498-1234'). This is a hard one: the string includes both an optional format and a special character that needs "escaping". To make sure your data fits these requirements, use a Table Constraint:
CREATE TABLE Table_1 (
phone_number CHAR(13),
CHECK (phone_number SIMILAR TO
'([:DIGIT:][:DIGIT:][:DIGIT:]?-[:DIGIT:][:DIGIT:][:DIGIT:][:DIGIT:])
|
(?([:DIGIT:][:DIGIT:][:DIGIT:]?)[:DIGIT:][:DIGIT:][:DIGIT:]?-[:DIGIT:][:DIGIT:][:DIGIT:][:DIGIT:])'
ESCAPE '?');(This example is shown on multiple lines for clarity; in reality the string may
not contain carriage returns, nor would the pattern.)
[NOT] LIKE or [NOT] SIMILAR?
[NOT] LIKE and [NOT] SIMILAR are both
pattern-matching predicates. You should continue to use
[NOT] LIKE if the pattern contains only _ and % wildcards --
although [NOT] SIMILAR can use these wildcards too, there is
no advantage in using an SQL3 expression when an SQL-92 expression will do. For
more complex patterns, your choice is between [NOT] SIMILAR
and nothing. When you make the switchover, remember that there are subtle
differences between [NOT] SIMILAR and
[NOT] LIKE, in the way that collating sequences are handled
(which affects which characters are regarded as "equal" and whether there are
pad spaces at the end of a string).