The SQL Standard has numerous rules for such basic issues as what makes a legal name and how to put together SQL syntax. The starting point for these rules is knowing what the basic scalar language elements are. The SQL basic scalar language elements are defined in the set of <SQL language character>s.
<SQL language character>
According to the SQL Standard, the syntactic element <SQL language character> defines "the terminal symbols of the SQL language and the elements of strings". In other words, you'll use <SQL language character>s to write SQL syntax or <token>s. <SQL language character>s are case insensitive; that is, uppercase and lowercase simple Latin letters are interchangeable so that, to an SQL parser, these three words are exactly alike:
SELECT select Select
The set of <SQL language character>s contains:
- The uppercase simple Latin letters A to Z.
- The lowercase simple Latin letters a to z.
- The digits 0 to 9.
- The set of <SQL special character>s.
<SQL special character>
The set of <SQL special character>s is part of the set of <SQL language character>s and contains:
| The space character |
( | The left parenthesis |
) | The right parenthesis |
" | The double quote mark |
' | The single quote mark |
% | The percent sign |
& | The ampersand |
* | The asterisk or multiplication sign |
/ | The solidus or division sign |
+ | The plus sign |
- | The minus sign or dash |
, | The comma |
. | The period |
: | The colon |
; | The semicolon |
< | The less than operator |
> | The greater than operator |
? | The question mark |
[ | The left bracket |
] | The right bracket |
_ | The underline character |
| | The vertical bar |
= | The equals operator |
{ | The left brace |
} | The right brace |
^ | The circumflex |
<token>
A <token> is either a <literal>, a <keyword>, an <identifier> or an <SQL
special character> or symbol -- that is, a <token> is a group of characters
that is recognized as a single unit by an SQL parser. For example, there are a
total of 7 <token>s (SELECT, a, +, 5, FROM,
t, and ;) in the following SQL statement.
SELECT a+5 FROM t;
In SQL, <token>s are grouped into two types: <nondelimiter token>s and <delimiter token>s. The difference between them lies in the fact that, while any <token> may be followed by a <separator>, a <nondelimiter token> must be followed either by a <separator> or a <delimiter token>.
A <nondelimiter token> is an <unsigned numeric literal>, a <national character string literal>, a <bit string literal>, a <hex string literal>, a <keyword>, or a <regular identifier>. A <delimiter token> is a <character string literal>, a <date literal>, a <time literal>, a <timestamp literal>, an <interval literal>, a <delimited identifier>, an <SQL special character>, or one of these symbols:
<> | The not equals operator |
>= | The greater than or equals operator |
<= | The less than or equals operator |
|| | The concatenation operator |
??( | The left trigraph |
??) | The right trigraph |
-> | The right arrow |
=> | The keyword parameter tag |
For example, the <keyword> <token> SELECT may be followed
either by a <separator> (usually a space) or by an <SQL special
character>. Thus, both of the following are examples of legal SQL syntax:
SELECT column_1
is legal syntax because a space separates the <token> SELECT from the
<token> column_1
SELECT*
is legal syntax because, although no space separates the <token> SELECT
from the <token> * , the asterisk is identified as a separate <token>
because it is a <SQL special character>.
A <token> may not include any <separator>s unless it is a <character string literal>, a <bit string literal>, a <hex string literal>, a <timestamp literal>, an <interval literal>, or a <delimited identifier>.
<separator>
[Obscure Rule] applies to this entire section.
Your SQL parser must know where one <token> ends and another begins. To do so, it recognizes white space, a newline character, a simple comment, and a bracketed comment as <separator>s.
White space is usually just one or more spaces, but it can also consist of any consecutive sequence of these Unicode characters:
| U+0009 | Horizontal Tab |
| U+000A | Line Feed |
| U+000B | Vertical Tabulation |
| U+000C | Form Feed |
| U+000D | Carriage Return |
| U+0020 | Space |
| U+00A0 | No-Break Space |
| U+2000 | En Quad |
| U+2001 | Em Quad |
| U+2002 | En Space |
| U+2003 | Em Space |
| U+2004 | Three-Per-Em Space |
| U+2005 | Four-Per-Em Space |
| U+2006 | Six-Per-Em Space |
| U+2007 | Figure Space |
| U+2008 | Punctuation Space |
| U+2009 | Thin Space |
| U+200A | Hair Space |
| U+200B | Zero Width Space |
| U+200C | Zero Width Non-Joiner |
| U+200D | Zero Width Joiner |
| U+200E | Left-To-Right Mark |
| U+200F | Right-To-Left Mark |
| U+3000 | Ideographic Space |
| U+2028 | Line Separator |
| U+2029 | Paragraph Separator |
| U+FEFF | Zero Width No-Break Space |
[NON-PORTABLE] A newline character marks the end of a line. It is non-standard because the SQL Standard requires implementors to define which white space character(s) will be recognized as end- of-line indicators by their parsers. [OCELOT Implementation] The OCELOT DBMS that comes with this book recognizes carriage returns and line feeds as newline characters.
A simple comment begins with two or more consecutive dashes, contains any number of characters (including spaces and more dashes), and ends with a newline character. For example, these two SQL statements are both followed by a simple comment.
SELECT a+5 FROM t; -- this is a simple comment SELECT a+5 FROM t; --- this is a simple comment too
A bracketed comment is a C-style comment. It begins with /* , ends with
*/ and contains any number of characters, including zero or more
<separator>s. For example, this SQL statement is followed by a bracketed
comment.
SELECT a+5 FROM t; /* this is a bracketed comment that contains a carriage return */
If you want to restrict your code to Core SQL, don't use bracketed comments.
<literal>
A <literal> is a <token> that represents a non-null data value. SQL values are
normally atomic -- they cannot be subdivided -- and are either non-null values
or the null value. The null value isn't represented by a <literal>. Instead,
the <keyword> NULL is used whenever it's necessary to indicate that the
null value is represented.
[NON-PORTABLE] The logical representation of the null value is non-standard because the SQL Standard requires implementors to define that character used to display the null value. [OCELOT Implementation] The OCELOT DBMS that comes with this book displays a question mark to represent the null value.
In SQL, a <literal> is either a signed <numeric literal> (for example: +52.6), an unsigned <numeric literal> (for example: 15) or a general literal. (An unsigned literal is thus either an unsigned <numeric literal> or a general literal. A general literal is one of the following:
- A <bit string literal>, for example,
B'1011' - A <hex string literal>, for example,
X'4A' - A <binary string literal>, for example,
X'44AF' - A <character string literal>, for example,
'hello' - A <national character string literal>, for example,
N'hello' - A <date literal>, for example,
DATE '1997-07-15' - A <time literal>, for example,
TIME '19:30:20'
TIME '19:30:20.05'
TIME '19:30:20+03:00' - A <timestamp literal>, for example,
TIMESTAMP '1997-07-15 19:30:20'
TIMESTAMP '1997-07-15 19:30:20.05'
TIMESTAMP '1997-07-15 19:30:20.05-10:30' - A <year-month literal>, for example,
INTERVAL '20' YEAR
INTERVAL '10' MONTH
INTERVAL '20-10' YEAR TO MONTH - A <day-time literal>, for example,
INTERVAL -'20' DAY
INTERVAL '-10' HOUR
INTERVAL '15' MINUTE
INTERVAL '10' SECOND
INTERVAL '20 10:15:10' DAY TO SECOND - A <boolean literal>, either
TRUE,FALSEorUNKNOWN.
<keyword>
A <keyword> is a word that has a special meaning for the SQL parser. There are two types of SQL <keyword>s: reserved <keyword>s and non-reserved <keyword>s. Reserved <keyword>s may not be used as <regular identifier>s. Non-reserved <keyword>s are not so restricted, but it's probably not a good idea to use them as <regular identifier>s anyway.
A <keyword> is case insensitive because all its characters are part of the set of <SQL language character>s. That is, uppercase and lowercase letters within a <keyword> are interchangeable; so that, for example, these three <keyword>s are exactly alike to an SQL parser:
SELECT select Select
The Set of Reserved <keyword>s
| ABSOLUTE |
| ACTION |
| ADD |
| ADMIN |
| AFTER |
| AGGREGATE |
| ALIAS |
| ALL |
| ALLOCATE |
| ALTER |
| AND |
| ANY |
| ARE |
| ARRAY |
| AS |
| ASC |
| ASSERTION |
| AT |
| AUTHORIZATION |
| BEFORE |
| BEGIN |
| BINARY |
| BIT |
| BLOB |
| BOOLEAN |
| BOTH |
| BREADTH |
| BY |
| CALL |
| CASCADE |
| CASCADED |
| CASE |
| CAST |
| CATALOG |
| CHAR |
| CHARACTER |
| CHECK |
| CLASS |
| CLOB |
| CLOSE |
| COLLATE |
| COLLATION |
| COLUMN |
| COMMIT |
| COMPLETION |
| CONDITION |
| CONNECT |
| CONNECTION |
| CONSTRAINT |
| CONSTRAINTS |
| CONSTRUCTOR |
| CONTAINS |
| CONTINUE |
| CORRESPONDING |
| CREATE |
| CROSS |
| CUBE |
| CURRENT |
| CURRENT_DATE |
| CURRENT_PATH |
| CURRENT_ROLE |
| CURRENT_TIME |
| CURRENT_TIMESTAMP |
| CURRENT_USER |
| CURSOR |
| CYCLE |
| DATA |
| DATALINK |
| DATE |
| DAY |
| DEALLOCATE |
| DEC |
| DECIMAL |
| DECLARE |
| DEFAULT |
| DEFERRABLE |
| DEFERRED |
| DELETE |
| DEPTH |
| DEREF |
| DESC |
| DESCRIBE |
| DESCRIPTOR |
| DESTROY |
| DESTRUCTOR |
| DETERMINISTIC |
| DIAGNOSTICS |
| DICTIONARY |
| DISCONNECT |
| DISTINCT |
| DO |
| DOMAIN |
| DOUBLE |
| DROP |
| DYNAMIC |
| EACH |
| ELSE |
| ELSEIF |
| END |
| END-EXEC |
| EQUALS |
| ESCAPE |
| EVERY |
| EXCEPT |
| EXCEPTION |
| EXEC |
| EXECUTE |
| EXIT |
| EXPAND |
| EXPANDING |
| EXTERNAL |
| FALSE |
| FETCH |
| FIRST |
| FLOAT |
| FOR |
| FOREIGN |
| FOUND |
| FROM |
| FREE |
| FULL |
| FUNCTION |
| GENERAL |
| GET |
| GLOBAL |
| GO |
| GOTO |
| GRANT |
| GROUP |
| GROUPING |
| HANDLER |
| HAVING |
| HASH |
| HOST |
| HOUR |
| IDENTITY |
| IF |
| IGNORE |
| IMMEDIATE |
| IN |
| INDICATOR |
| INITIALIZE |
| INITIALLY |
| INNER |
| INOUT |
| INPUT |
| INSERT |
| INT |
| INTEGER |
| INTERSECT |
| INTERVAL |
| INTO |
| IS |
| ISOLATION |
| ITERATE |
| JOIN |
| KEY |
| LANGUAGE |
| LARGE |
| LAST |
| LATERAL |
| LEADING |
| LEAVE |
| LEFT |
| LESS |
| LEVEL |
| LIKE |
| LIMIT |
| LOCAL |
| LOCALTIME |
| LOCALTIMESTAMP |
| LOCATOR |
| LOOP |
| MATCH |
| MEETS |
| MINUTE |
| MODIFIES |
| MODIFY |
| MODULE |
| MONTH |
| NAMES |
| NATIONAL |
| NATURAL |
| NCHAR |
| NCLOB |
| NEW |
| NEXT |
| NO |
| NONE |
| NORMALIZE |
| NOT |
| NULL |
| NUMERIC |
| OBJECT |
| OF |
| OFF |
| OLD |
| ON |
| ONLY |
| OPEN |
| OPERATION |
| OPTION |
| OR |
| ORDER |
| ORDINALITY |
| OUT |
| OUTER |
| OUTPUT |
| PAD |
| PARAMETER |
| PARAMETERS |
| PARTIAL |
| PATH |
| PERIOD |
| POSTFIX |
| PRECEDES |
| PRECISION |
| PREFIX |
| PREORDER |
| PREPARE |
| PRESERVE |
| PRIMARY |
| PRIOR |
| PRIVILEGES |
| PROCEDURE |
| PUBLIC |
| READ |
| READS |
| REAL |
| RECURSIVE |
| REDO |
| REF |
| REFERENCES |
| REFERENCING |
| RELATIVE |
| REPEAT |
| RESIGNAL |
| RESTRICT |
| RESULT |
| RETURN |
| RETURNS |
| REVOKE |
| RIGHT |
| ROLE |
| ROLLBACK |
| ROLLUP |
| ROUTINE |
| ROW |
| ROWS |
| SAVEPOINT |
| SCHEMA |
| SCROLL |
| SEARCH |
| SECOND |
| SECTION |
| SELECT |
| SEQUENCE |
| SESSION |
| SESSION_USER |
| SET |
| SETS |
| SIGNAL |
| SIZE |
| SMALLINT |
| SOME |
| SPACE |
| SPECIFIC |
| SPECIFICTYPE |
| SQL |
| SQLEXCEPTION |
| SQLSTATE |
| SQLWARNING |
| START |
| STATE |
| STATIC |
| STRUCTURE |
| SUCCEEDS |
| SYSTEM_USER |
| TABLE |
| TEMPORARY |
| TERMINATE |
| THAN |
| THEN |
| TIME |
| TIMESTAMP |
| TIMEZONE_HOUR |
| TIMEZONE_MINUTE |
| TO |
| TRAILING |
| TRANSACTION |
| TRANSLATION |
| TREAT |
| TRIGGER |
| TRUE |
| UNDER |
| UNDO |
| UNION |
| UNIQUE |
| UNKNOWN |
| UNTIL |
| UPDATE |
| USAGE |
| USER |
| USING |
| VALUE |
| VALUES |
| VARCHAR |
| VARIABLE |
| VARYING |
| VIEW |
| WHEN |
| WHENEVER |
| WHERE |
| WHILE |
| WITH |
| WITHOUT |
| WORK |
| WRITE |
| YEAR |
| ZONE |
Note: SQL-92 and SQL3 both added a considerable number of words to the set of
SQL reserved <keyword>s. The Standard acknowledges this and -- as an aid to
users -- suggests that you include either a digit or an underline character in
your <regular identifier>s, and avoid names that begin with CURRENT_,
SESSION_, SYSTEM_ or TIMEZONE_ and those that end with
_LENGTH, to avoid conflicts with reserved <keyword>s added in future
revisions.
The set of non-reserved <keyword>s
| ABS |
| ADA |
| ASENSITIVE |
| ASSIGNMENT |
| ASYMMETRIC |
| ATOMIC |
| AVG |
| BETWEEN |
| BIT_LENGTH |
| BITVAR |
| BLOCKED |
| C |
| CARDINALITY |
| CATALOG_NAME |
| CHAIN |
| CHAR_LENGTH |
| CHARACTER_LENGTH |
| CHARACTER_SET_CATALOG |
| CHARACTER_SET_NAME |
| CHARACTER_SET_SCHEMA |
| CHECKED |
| CLASS_ORIGIN |
| COALESCE |
| COBOL |
| COLLATION_CATALOG |
| COLLATION_NAME |
| COLLATION_SCHEMA |
| COLUMN_NAME |
| COMMAND_FUNCTION |
| COMMAND_FUNCTION_CODE |
| COMMITTED |
| CONCATENATE |
| CONDITION_NUMBER |
| CONNECTION_NAME |
| CONSTRAINT_CATALOG |
| CONSTRAINT_NAME |
| CONSTRAINT_SCHEMA |
| CONTAINS |
| CONTROL |
| CONVERT |
| COUNT |
| CURSOR_NAME |
| DATETIME_INTERVAL_CODE |
| DATETIME_INTERVAL_PRECISION |
| DB |
| DISPATCH |
| DLCOMMENT |
| DLFILESIZE |
| DLFILESIZEEXACT |
| DLLINKTYPE |
| DLURLCOMPLETE |
| DLURLPATH |
| DLURLPATHONLY |
| DLURLSCHEMA |
| DLURLSERVER |
| DLVALUE |
| DYNAMIC_FUNCTION |
| DYNAMIC_FUNCTION_CODE |
| EXISTING |
| EXISTS |
| EXTRACT |
| FILE |
| FINAL |
| FORTRAN |
| GENERATED |
| HOLD |
| INFIX |
| INSENSITIVE |
| INSTANTIABLE |
| INTEGRITY |
| KEY_MEMBER |
| KEY_TYPE |
| LENGTH |
| LINK |
| LOWER |
| MAX |
| MIN |
| MESSAGE_LENGTH |
| MESSAGE_OCTET_LENGTH |
| MESSAGE_TEXT |
| METHOD |
| MOD |
| MORE |
| MUMPS |
| NAME |
| NULLABLE |
| NUMBER |
| NULLIF |
| OCTET_LENGTH |
| OPTIONS |
| OVERLAPS |
| OVERLAY |
| OVERRIDING |
| PASCAL |
| PARAMETER_MODE |
| PARAMETER_ORDINAL_POSITION |
| PARAMETER_SPECIFIC_CATALOG |
| PARAMETER_SPECIFIC_NAME |
| PARAMETER_SPECIFIC_SCHEMA |
| PERMISSION |
| PLI |
| POSITION |
| RECOVERY |
| REPEATABLE |
| RESTORE |
| RETURNED_LENGTH |
| RETURNED_OCTET_LENGTH |
| RETURNED_SQLSTATE |
| ROUTINE_CATALOG |
| ROUTINE_NAME |
| ROUTINE_SCHEMA |
| ROW_COUNT |
| ROW_TYPE_CATALOG |
| ROW_TYPE_SCHEMA |
| ROW_TYPE_NAME |
| SCALE |
| SCHEMA_NAME |
| SELECTIVE |
| SELF |
| SENSITIVE |
| SERIALIZABLE |
| SERVER_NAME |
| SIMPLE |
| SOURCE |
| SPECIFIC_NAME |
| SIMILAR |
| STRUCTURE |
| STYLE |
| SUBCLASS_ORIGIN |
| SUBLIST |
| SUBSTRING |
| SUM |
| SYMMETRIC |
| SYSTEM |
| TABLE_NAME |
| TRANSACTIONS_COMMITTED |
| TRANSACTIONS_ROLLED_BACK |
| TRANSACTION_ACTIVE |
| TRANSFORM |
| TRANSLATE |
| TRIGGER_CATALOG |
| TRIGGER_SCHEMA |
| TRIGGER_NAME |
| TRIM |
| TYPE |
| UNCOMMITTED |
| UNLINK |
| UNNAMED |
| UPPER |
| USER_DEFINED_TYPE_CATALOG |
| USER_DEFINED_TYPE_NAME |
| USER_DEFINED_TYPE_SCHEMA |
| YES |
The SQL Standard allows implementations to define more reserved words for their own DBMSs. Here are some words that are reserved in some dialect of one of the major vendors (e.g., Oracle, Sybase, Microsoft). You may be able to use these words as <regular identifier>s, but if you do so, you will lose portability.
| ABORT |
| ACCEPT |
| ANALYZE |
| ARCHIVELOG |
| ARRAY |
| ASSIGN |
| ASYNCH |
| ATTRIBUTES |
| AUDIT |
| BACKUP |
| BINARY_INTEGER |
| BODY |
| CACHE |
| CHAR_BASE |
| CLUSTER |
| CLUSTERS |
| COLAUTH |
| COLUMNS |
| COMPRESS |
| CONSTANT |
| CRASH |
| CURVAL |
| DATA_BASE |
| DATABASE |
| DBA |
| DEBUGOFF |
| DEBUGON |
| DEFINITION |
| DELAY |
| DELTA |
| DICTIONARY |
| DIGITS |
| DISPLACEMENT |
| DISPOSE |
| ELEMENT |
| ENTRY |
| EXCEPTION_INIT |
| FACTOR |
| FORM |
| FREELISTS |
| GENERIC |
| IDENTIFIED |
| IGNORE |
| INCLUDE |
| INDEX |
| INDEXES |
| INFILE |
| INSTEAD |
| INSTANCE |
| LIMITED |
| LIST |
| MAXEXTENTS |
| MINUS |
| MLSLABEL |
| MODE |
| NEW |
| NEW_TABLE |
| NEXTVAL |
| NOCOMPRESS |
| NONE |
| NUMBER |
| NUMBER_BASE |
| OFF |
| OID |
| OLD_TABLE |
| OPERATOR |
| OPERATORS |
| OTHERS |
| PACKAGE |
| PARTITION |
| PCTFREE |
| PENDANT |
| POSITIVE |
| PRAGMA |
| PREORDERED |
| PRIVATE |
| PROTECTED |
| RAISE |
| RANGE |
| RAW |
| RECORD |
| RELEASE |
| REM |
| RENAME |
| REPLACE |
| RESOURCE |
| REUSE |
| REVERSE |
| ROWID |
| ROWLABEL |
| ROWNUM |
| ROWTYPE |
| RUN |
| SEPARATE |
| SEQUENCE |
| SQLCA |
| SQLCODE |
| SQLERRM |
| SQLWARNING |
| STATEMENT |
| STDDEV |
| SUBTYPE |
| SYMBOL |
| TABAUTH |
| TABLES |
| TASK |
| TERM |
| TEST |
| THERE |
| TUPLE |
| USE |
| VARCHAR2 |
| VARIANCE |
| VIEWS |
| VIRTUAL |
| VISIBLE |
| WAIT |
| XOR |
<identifier>
An <identifier> (a <token> that names an SQL Object) is a
character string, up to 128 characters long, from one Character
set. Within a CREATE SCHEMA statement, an <identifier> that
doesn't include an explicit <Schema name> names an Object that
belongs to the Schema you're creating. In any other SQL
statement, an <identifier> that doesn't include an explicit
<Schema name> names an Object that belongs to the Schema named in
the SCHEMA clause (or, if there is no SCHEMA clause, in the
AUTHORIZATION clause) of the MODULE statement that defines the
Module you're running. SQL recognizes three types of
<identifier>s: the <regular identifier>, the <SQL language
identifier>, and the <delimited identifier>.
<regular identifier>
The required syntax for a <regular identifier> is:
<regular identifier> ::= Object name
A <regular identifier> is a character string, up to 128 characters long, that consists only of letters, digits, and underscore characters. It must begin with a letter.
[Obscure Rule] We usually think of a "letter" as one of the simple Latin letters, but in fact -- depending on the Character set being used -- a "letter" can also be an accented character, a character in a non-Latin alphabet, or a syllable or ideograph; i.e., it can be any character with the Unicode alphabetic property or ideographic property. The "letter" that begins a <regular identifier> may not have the Unicode combining property; the letters following it may, with the proviso that these characters are not legal anywhere in a <regular identifier>:
| U+06DD | Arabic End of Ayah. |
| U+06DE | Arabic Start of Rub El Hizb. |
| U+20DD | Combining Enclosing Circle. |
| U+20DE | Combining Enclosing Square. |
| U+20DF | Combining Enclosing Diamond. |
| U+20E0 | Combining Enclosing Circle Backslash. |
Depending on the Character set in use, you may also use these characters in a <regular identifier>, as long as they're not used as the <identifier>'s first character:
| U+00B7 | Middle Dot |
| U+02D0 | Modifier Letter Triangular Colon |
| U+20D1 | Modifier Letter Half Triangular Colon |
| U+0640 | Arabic Tatweel |
| U+0E46 | Thai Character Maiyamok |
| U+0EC6 | Lao Ko La |
| U+3005 | Ideographic Iteration Mark |
| U+3031 to U+3035 inclusive | variations of Vertical Kana Repeat Mark |
| U+309B to U+309E inclusive | variations of Combining Katakana-Hiragana Sound Mark and Hiragana Iteration Mark |
| U+30FC to U+30FE inclusive | variations of Katakana-Hiragana Prolonged Sound Mark and Katakana Iteration Mark |
| U+FF70 | Halfwidth Katakana-Hiragana Prolonged Sound Mark |
| U+FF9E | Halfwidth Katakana Voiced Sound Mark |
| U+FF9F | Halfwidth Katakana Semi-voiced Sound Mark |
| U+200C | Zero Width Non-Joiner |
| U+200D | Zero Width Joiner |
| U+200E | Left-To-Right Mark |
| U+200F | Right-To-Left Mark |
| U+202A | Left-To-Right Embedding |
| U+202B | Right-To-Left Embedding |
| U+202C | Pop Directional Formatting |
| U+202D | Left-To-Right Override |
| U+202E | Right-To-Left Override. |
| U+206A | Inhibit Symmetric Swapping |
| U+206B | Activate Symmetric Swapping |
| U+206C | Inhibit Arabic Form Shaping |
| U+206D | Activate Arabic Form Shaping |
| U+206E | National Digit Shapes |
| U+206F | Nominal Digit Shapes |
| U+FEFF | Zero-Width No-Break Space |
| U+203F | Undertie |
| U+2040 | Character Tie |
| U+FE33 | Presentation Form for Vertical Low Line |
| U+FE34 | Presentation Form for Vertical Wavy Low Line |
| U+FE4D | Dashed Low Line |
| U+FE4E | Centreline Low Line |
| U+FE4F | Wavy Low Line |
| U+FF3F | Fullwidth Low Line |
A <regular identifier> is case insensitive. That is, uppercase and lowercase letters within a <regular identifier> are interchangeable; for example, these three <regular identifier>s are exactly alike to an SQL parser:
SAMS_TABLE sams_table Sams_Table
SQL doesn't allow a reserved <keyword> to be used as a <regular identifier>.
When comparing a <regular identifier> and a reserved <keyword> to check for
equality, your DBMS will replace the lowercase letters in each with their
uppercase equivalents and assume that both belong to the SQL_TEXT
Character set. In fact, your DBMS will replace all lowercase letters in a
<regular identifier> with their uppercase equivalents prior to any comparison
and prior to storing the <identifier> either in a Catalog's
INFORMATION_SCHEMA or a diagnostics area.
Here are some examples of <regular identifier>s:
TABLE_1 | a <regular identifier> |
OCELOT_COMPUTER_SERVICES | another <regular identifier> |
DATE_ | a <regular identifier> that looks like a reserved <keyword> |
MÜLLER_DATEI | a <regular identifier> that doesn't exclusively use simple Latin letters |
If you want to restrict your code to Core SQL, make sure your <regular identifier>s are no more than 18 characters long.
<SQL language identifier>
The required syntax for an <SQL language identifier> is:
::= Object name
An <SQL language identifier> is a <regular identifier> that consists only of simple Latin letters, digits, and underscore characters. It must begin with a simple Latin letter. Here are two examples of <SQL language identifier>s:
TABLE_1 BOB_SCHEMA
<delimited identifier>
The required syntax for a <delimited identifier> is:
::= "Object name"
A <delimited identifier> is a character string, up to 128 characters long, surrounded by a pair of double quote marks. (The delimiting double quotes aren't part of the <identifier>, so they're not included in the calculation of its size.) Two consecutive double quotes within the character string (i.e., "") represent one double quote mark; together, they count as one character when calculating the size of the <identifier>.
A <delimited identifier> is case sensitive. That is, uppercase and lowercase letters within a <delimited identifier> are not interchangeable; for example, to an SQL parser, these three <delimited identifier>s
"SAMS_TABLE" "sams_table" "Sams_Table"
represent three different names. Your DBMS will not replace lowercase letters in a <delimited identifier> with their uppercase equivalents prior to any comparison or storage operation.
Here are some examples of <delimited identifier>s:
"table#1" | a <delimited identifier> that uses lowercase letters and a special character |
"OCELOT Computer Services" | a <delimited identifier> that includes spaces |
"DATE" | a <delimited identifier> that looks like a reserved <keyword> |
If you want to restrict your code to Core SQL, make sure your <delimited identifier>s are no more than 18 characters long.
<identifier> Equivalence
Two <regular identifier>s are the same if they consist of the same characters.
Your DBMS assumes the relevant Character set is SQL_TEXT when comparing them.
A <regular identifier> and a <delimited identifier> are the same if the
<regular identifier> consists of the same characters that make up the body
(i.e., the string of characters inside the double quote marks) of the
<delimited identifier>. Two <delimited identifier>s are the same if their
bodies consist of the same characters. Your DBMS assumes the relevant Character
set is SQL_TEXT with a case sensitive Collation when comparing <regular
identifier>s to <delimited identifier>s and <delimited identifier>s to one
another.
Because of the difference in case sensitivity between <regular identifier>s and <delimited identifier>s, these two <regular identifier>s are the same:
P_TABLE p_table
and both are equal to this <delimited identifier>:
"P_TABLE"
but neither are equal to this <delimited identifier>:
"p_table"
For another example, consider this group of <identifier>s:
| 1. | "E" | A <delimited (uppercase) identifier>. |
| 2. | "e" | A <delimited (lowercase) identifier>. |
| 3. | E | A <regular identifier>. |
| 4. | e | A <regular identifier>. |
| 5. | ë | A <regular identifier>. |
Because delimiting double quotes are not themselves part of an <identifier>,
the <delimited identifier> "E" is the same as the <regular identifier>
E, i.e., examples #1 and #3 are the same name. Because lowercase letters
in a <regular identifier> are mapped to uppercase letters before comparison and
storage, examples #3 and #4 are the same name -- and they're also the same name
as example #1. Because lowercase letters in a <delimited identifier> are not
mapped to uppercase letters at any time, example #2 is not the same name as
example #4. Because there is no mapping of accented characters in an
<identifier>, example #5 is not the same name as any of the others -- but
ë is a letter, and so qualifies as a <regular identifier>. (This example
assumes that the MS-Windows encoding scheme -- the one that Microsoft calls
"ANSI" -- is in use. This is not always the case; the choice of possible
Character sets is broad.)
Qualification of <identifier>s
All SQL Objects have names which are some combination of <regular identifier>s, <delimited identifier>s, or <SQL language identifier>s in an appropriate hierarchy of qualification. The top of the hierarchy is [SQL-server name.], an implicit name, therefore never specified. Then comes [<Catalog name>.], which is the first level of the hierarchy that can be explicitly stated. The next level is [<Schema name>.], then comes [the name of an Object], and (if the Object is a Table) the final level of the hierarchy is <.Column name>. The entire qualification hierarchy always exists but is not necessarily visible; the Standard contains rules by which high-level parts of the combination may be omitted and their values assumed by default.
Note:
Portions of the text in this entry are Copyright © 1999 by Ocelot Computer Services Incorporated. Used by permission.