Relative Truth, Three-Valued Logic, and One Reason Why SQL Sucks

SQL Tripwires

This thought experiment sheds light on the flaws in the way in which the SQL DBMS deals with nulls, that is to say, with missing values. In this section, I argue that the SQL DBMS interprets UNKNOWN in an arbitrary way, usually pretending that it means FALSE, but sometimes pretending that it means TRUE, a pretence made necessary by the fact that UNKNOWN is in fact not a truth value. Since the DBMS is working with pretence, not with truth, not with reality (I will show later in what sense the DBMS does work with the real) -- certain queries in certain circumstances fail in an utterly disastrous way.

I examine first situations in which UNKNOWN gets treated as having the effect of FALSE, then situations in which UNKNOWN is treated as meaning TRUE.

When We Pretend That UNKNOWN means FALSE

Both the SQL DBMS and the ritual in our thought experiment rely upon a truth-value-evaluating operation. In the case of the thought experiment, this operator consists in someone's opening one or the other door based on L.'s statement, an action which reveals in a concrete and immediate way whether that statement has the value TRUE or the value FALSE. Similarly, the SQL DBMS performs a truth-value-evaluating operation when it compares one value with another. Just as the truth-evaluating operation in the thought experiment can fail when the doors jamb, the truth-evaluating operation performed by the SQL DBMS can fail as well. Let's see how.

Suppose that a certain business uses a SQL DBMS running a database that contains, among other tables, a supertype table EMPLOYEE and a subtype table CONVICTED_FELONS (the business is a Mafia operation comprising several lines of business, including loansharking, that ...ahem... work together synergistically). The EMPLOYEE table has, among other columns, the primary key EMPLOYEE_ID and the varchar column KNOWN_AS. (the business always assigns a unique KNOWN_AS alias to each of its employees).

Current Value Of EMPLOYEE Table
EMPLOYEE_ID	KNOWN_AS	HIRED_DATE	POSITION	AGE
1	CHENEY	01/21/2001	CEO	64
2	MUGSY	01/22/2001	HONEST CRIMINAL	49
3	LEFTY	01/22/2001	HONEST CRIMINAL	24
4	SCARFACE	01/22/2001	HONEST CRIMINAL	38
5	MOM	01/22/2001	HONEST CRIMINAL	54
6	NIGHTHAWK	12/09/2001	HONEST CRIMINAL	42
7	UNCLE FOONJIE	02/11/2002	HONEST CRIMINAL	19

The predicate for this table (this relation-manque) is:

It is known that employee is identified by an integer EMPLOYEE_ID, possesses the alias written down as varchar2 KNOWN_AS, was hired on date HIRED_DATE, is in position named by varchar2 POSITION, and is number AGE years old.

The CONVICTED_FELON table also has an EMPLOYEE_ID column serving as both a primary key and as a foriegn key to the EMPLOYEE table. Contrary to good database design principles (but conveniently for this example), however, CONVICTED_FELON also has a KNOWN_AS column repeating the same information from the corresponding EMPLOYEE rows. CONVICTED_FELON.KNOWN_AS is, unfortunately, a column that can contain nulls, and it does in fact contain one null -- that is to say, it is missing one value. Let's say that the value that should be there is the varchar2 string 'UNCLE FOONJIE' that names Uncle Foonjie:¹

Current Value Of CONVICTED_FELON Table
EMPLOYEE_ID	KNOWN_AS	CONVICTION_DATE
2	MUGSY	08/23/2005
3	LEFTY	01/04/2003
4	SCARFACE	07/29/2007
5	MOM	09/14/2002
6	NIGHTHAWK	10/18/2006
7		03/17/2004

The predicate for the table is:

Employee is identified by an integer EMPLOYEE_ID, possesses the alias written down as varchar2 KNOWN_AS, was convicted of a felony on date Conviction_DATE.

Side-note for now -- part of my adventure of discovery, so I don't quite know where to fit it for the present: Any proposition (call it p that is formed from either of these two predicatse will be believed to be true, the users of the database relying upon the information in their day to day tasks, and the belief is to be formed via a reliable channel of information. (It is plausible to suppose that the less reliable the channel of information is, the more likely the database will not be used and eventually eliminated). For the moment, I will take knowledge to be true belief formed via a reliable channel of information. So if p actually is true, then it is known that p. For the moment I will leave 'it is known that' as somewhat vague.

A clerk (he is a contractor, otherwise known as a "pseudo-employee," so he is not listed in EMPLOYEE) who is not completely familiar with the business needs to know how many employees are not also convicted felons. He submits the following query to the database, with the intent of telling the database: 'Give me all the EMPLOYEES who are not also convicted felons.'

select *
from EMPLOYEE
where EMPLOYEE.KNOWN_AS not in (
select CONVICTED_FELON.KNOWN_AS
from CONVICTED_FELON
)
;

(Of course, the Developer would have been safer using the non-nullable column EMPLOYEE_ID rather than KNOWN_AS, but we are taking this query as an example precisely to show the pitfalls one would avoid using non-nullable columns.) As it so happens, there is only one employee, Cheney, who is not also a convicted felon. So if the query in fact conforms with the clerk's intent, only one row (Cheney's) would show up in the result set.

Let's see what happens were a SQL DBMS employing 2-valued logic to try to evaluate this query, given the constitution of the relevant tables just presented. Would it give us the expected row?

For each row that results from the outer query, the 2VL SQL DBMS evaluates the truth value of:

EMPLOYEE.KNOWN_AS != 'MUGSY' and EMPLOYEE.KNOWN_AS != 'LEFTY' and EMPLOYEE.KNOWN_AS != 'SCARFACE' and EMPLOYEE.KNOWN_AS != 'MOM' and EMPLOYEE.KNOWN_AS != 'NIGHTHAWK' and EMPLOYEE.KNOWN_AS != null

where the names of course are generated from the inner query getting its values from CONVICTED_FELON. Let me call this the DBMS' scratchpad predicate to emphasize that this predicate is part of the process [logical? materialized?] by which the SQL DBMS gets its final result, and so to distinguish it from the predicate from which are generated the propositions comprised by the result set of the inner query, the result set of the outer query, and the result set of the total query.

If CONVICTED_FELON.KNOWN_AS contained no nulls, the 2VL SQL DBMS would have proceeded on its merry way, humming a tune to itself while it first substitutes 'CHENEY' for EMPLOYEE_NAME.KNOWN_AS in the predicate, and performed a truth-evaluating operation on the resulting proposition (call it the first scratchpad proposition):

'CHENEY' != 'MUGSY' and 'CHENEY' != 'LEFTY' and 'CHENEY' != 'SCARFACE' and 'CHENEY' != 'MOM' and 'CHENEY' != 'NIGHTHAWK'

The truth-evaluating operation would have generated the result TRUE ( P and Q and R and S and T is true if P, Q, R, S, and T are all true), and the EMPLOYEE row containing 'CHENEY' would have been included in the result set. Then the 2VL SQL DBMS would have gone to the second row returned by the outer query, containing, say, 'MOM', and would have performed the truth-evaluating operation on this scratchpad proposition:

'MOM' != 'MUGSY' and 'MOM' != 'LEFTY' and 'MOM' != 'SCARFACE' and 'MOM' != 'MOM' and 'MOM' != 'NIGHTHAWK'

This time, the truth-evaluating operation would have generated the result FALSE ( P and Q and R and S and T is false if just one of those propositions is false), so the SQL DBMS would have excluded the row from the result set.

So far, the truth-value-evaluating operation of the 2VL SQL DBMS is working as it should. Let me now cash out my 'works with truth, that is to say, with reality' claim above. In these scratchpad propositions, the DBMS is working with truth -- with reality -- not with pretence because in these scratchpad propositions it actually does have access to the things in reality that make the propositions true or false. For example, it is capable of comparing the string 'CHENEY' with the string 'MOM' to determine the truth value of "'CHENEY' = 'MOM'." The DBMS may not have the ability to evaluate the truth value of any proposition (say, 'Cheney was made CEO on January 21, 2001')² but it is able to determine the truth or falsity of the proposition "the varchar2 string 'LEFTY' is identical with the varchar2 string 'LEFTY'." The following statements are equivalent; one does not add anything that the other does not say: 'the proposition "the varchar2 string 'LEFTY' is identical with the varchar2 string 'LEFTY'." is true; it is a fact that "the varchar2 string 'LEFTY' is identical with the varchar2 string 'LEFTY'."; that the "the varchar2 string 'LEFTY' is identical with the varchar2 string 'LEFTY'." is not a pretence, but part of reality -- the DBMS' reality.³

Unfortunately, however, CONVICTED_FELON does contain nulls, so in fact for each row generated by the outer query a proposition similar to the following will be evaluated for its truth value:

'CHENEY' != 'MUGSY' and 'CHENEY' != 'LEFTY' and 'CHENEY' != 'SCARFACE' and 'CHENEY' != 'MOM' and 'CHENEY' != 'NIGHTHAWK' and 'CHENEY' != null

This time, the truth-value-evaluating operation gets stopped short when it comes to the null. No value exists for it to perform the comparison with 'CHENEY', so it can generate neither the truth value TRUE nor the truth value FALSE. The door has jammed, so to speak. To adopt a different metaphor, the truth-value-evaluating operation has stumbled into the hole left by the missing value. It has fallen through the trap door of a table field that looks like it has a value (null) but in fact does not because null means precisely the absence of a value. To adopt an even more florid metaphor, it has been sucked into the black hole formed by the field which should have a value but does not....

The 2VL SQL DBMS has to DO something -- it can't just stand there, or rather, it can't just keep falling down through the null space left by the missing value. It needs to report something back to the outer query, if the query is not to hang forever. But do what? The 2VL SQL DBMS can report back neither TRUE nor FALSE. It would seem that the only thing it can do is report back an error. This particular query has failed, and it will always fail if CONVICTED_FELON has a null value in the KNOWN_AS column. This presents quite a conundrum, because as we all know, missing values will always be with us. Should we then allow a certain proportion of our queries return nothing except an error message?

The SQL DBMS that adopts a TRUE/FALSE/UNKNOWN 3-valued logic thinks it has a solution to this conumdrum. In this logic, the truth-value-evaluating operation generates UNKNOWN for 'CHENEY != null.' Since one of the conjuncts above evaluates to UNKNOWN, the entire proposition also evaluates to UNKNOWN. In TRUE-FALSE-UNKNOWN 3-valued logic, P and Q and R is UNKNOWN if P and Q are each true but R is unknown. Each and every substitution of EMPLOYEE.KNOWN_AS produced by the outer query will get evaluated as UNKNOWN. If the outer query is expecting a truth value, we'll give it a truth value! We won't just get stopped short -- we think.

But what should the 3VL SQL DBMS do with the value UNKNOWN? It can do only one of three things. It can accept the row, reject the row, or return an error.

Should the 3VL SQL DBMS accept the row? Since all rows (propositions) evaluate to UNKNOWN, all rows would be accepted. The query would give a patently incorrect result, saying in effect that there are no rows in CONVICTED_FELON that are also in EMPLOYEE. Each and every employee to have a spotless record...? -- not! The query would generate an incorrect result.

Should the 3VL SQL DBMS reject the row? In that case every single EMPLOYEE row would also be in CONVICTED_FELON. The DBMS would be saying, in effect, that UNKNOWN here has the effect of FALSE, and reject every single row in EMPLOYEE from inclusion in the result set. But while Cheney may very well richly deserve to be in the CONVICTED_FELON table, at the time of this writing he is not. Again, the query would produce an incorrect result.

Should the 3VL SQL DBMS return an error message, then? But in that case, there would be absolutely no point in using the TRUE/FALSE/UNKNOWN 3-valued logic instead of the TRUE/FALSE 2-valued logic!

So UNKNOWN gives the DBMS no basis for any of the three alternatives (accept the row, reject the row, return an error message). The DBMS is at a loss, so to speak, because it doesn't know the information needed to make a decision, just as the townspeople are left without the information needed to make a reliable prediction when one of the doors jam. In both case, UNKNOWN fails to tell us anything about how things stand with regard to the 'is the employee a felon?' issue and the 'is the tiger behind door A or door B' issue respectively -- both issues about how things stand in the world, information about which we could use to guide our actions. Instead, it tells us how things stand with regard to us, to the gap in our knowledge and information about the world. As I have argued, UNKNOWN is not a truth value at all -- it is a knowledge value.

But again, the 3VL SQL DBMS can't just stand there, looking flummoxed. It has to DO something. Let's see what the Oracle 3VL SQL DBMS does to remove the deer-in-the-headlights look from its eyes.

Just as in the thought experiment, where the townspeople sometimes pretend that UNKNOWN means FALSE, and sometimes pretend that it means TRUE, the pretence being completely unrelated to the factors that actually make L.'s utterance either true or false, the Oracle DBMS sometimes pretends that UNKNOWN means FALSE, sometimes pretends that UNKNOWN means TRUE, and bases its pretence on factors that have absolutely nothing to with what makes the proposition whose truth value it has just tried to evaluate either true or false.

In this particular circumtance -- this particular query -- UNKNOWN gets treated as if it meant FALSE, so every row in which the predicate evaluates to UNKNOWN gets rejected. That's all the rows, so the query gives us back 'No Rows Returned' -- a patently incorrect result.

But why should UNKNOWN be treated as false in this particular case? Why couldn't we pretend it means TRUE? Did the designers of the SQL language flip a coin? Perhaps all that lies behind the decision is the psychological feeling that UNKNOWN, resulting from the absence of a value, is negative, just as FALSE is something negative. That the comparison did not occur feels almost the same as saying that had it occurred we would have gotten the result FALSE. It is safer to assume the worst, perhaps the thinking/feeling went, so we'll assume that if the comparison had been able to go through, we would have gotten 'CHENEY' != 'CHENEY', which results in FALSE, which results in rejecting the row because of the NOT IN filter. Also, if the point is to exclude rows (exclude all the ones that are in CONVICTED_FELON), it feels more awkward to end up including all of them instead of excluding all of them.

But basing the decision whether to pretend that UNKNOWN has the same effect as FALSE or TRUE on psychological factors is as arbitrary as the townspeople in the though experiment basing their analogous decision on the price of salt in the marketplace that day or how high the sun is in the sky at the moment. In both cases, the decision is completely arbitrary and has no relation to the truth value of the query proposition or of L.'s statement respectively. [If we COULD base the decision on the truth value it 'also' has, we would. Surely this is devastating to the UNKNOWN as a truth value notion. It's the one that counts, the one on which we can base our actions in a non-arbitrary way.]

In returning no rows when it should have returned one, the query has failed. It is guaranteed to fail if there is a null in CONVICTED_FELON.KNOWN_AS at the same time there is an employee in EMPLOYEE who is not also in CONVICTED_FELON. Moreover, the query has failed in the worst possible way -- silently and insidiously. The user has absolutely no clue that the query has failed unless he already knows that Cheney is the sole employee who is not also a convicted felon. The user who does not know this has only the semantic meaning of the query to go on: give me all the employees who are not also convicted felons. Why should he think that the query is two-faced -- it does not in fact mean what it says? [More on this later: To not be a blunt instrument of thought, a query language must ensure that all the queries written in it mean what they say.....The query does not mean what it says because the TRUE-FALSE-UNKNOWN 3-valued logic is confused and hides a failure] If the user is a trusting soul, he is in trouble.

We are about to see that there is a sort-of-kind-of adequate work-around to this failure that the savvy SQL developer will know about. But -- and this is a point I will be harping on later -- nothing in the meaning of the query carries the slightest warning that it will product an incorrect result. One just has to know that queries with a NOT IN subquery have to be treated with care. The situation is a little like entering a room in which one has to know to turn off a valve that is spewing out Carbon Monoxide gas. The innocent person who enters this room will have absolutely no awareness of the poison gas -- Carbon Monoxide has no smell, produces no harsh or unpleasant feelings in the mouth, throat, or lungs. The query will fail without the non-savvy user knowing it; the non-savvy person going into the room will die. Both would have been better off with an error message or warning, respectively.

Which is to say that the user would have been better off with a 2VL SQL DBMS which reports back an error message when it cannot produce a correct result when it stumbles across a null. In introducing the "truth value" UNKNOWN which, because it is not really a truth value, forces the SQL DBMS to pretend that UNKNOWN means FALSE in some circumstances and TRUE in others, all that is happening is that failure has been covered up. Failure has not been prevented.

In the query above, all of the rows are rejected because they all evaluate to UNKNOWN. This result is unacceptable, of course, because it gives us false information about how things stand in the real (the extra-database) world. (In the real world CHENEY is not a convicted felon.) SQL gives us a way to get the correct result. We could issue:

select *
from EMPLOYEE
where /* not needed here because this is a not null column: EMPLOYEE.KNOWN_AS is not null */
/* and */ EMPLOYEE.KNOWN_AS not in (
select CONVICTED_FELON.KNOWN_AS
from CONVICTED_FELON
where CONVICTED_FELON.KNOWN_AS is not null )
;

This query no longer guarantees an incorrect result if there is a missing value in the CONVICTED_FELON.KNOWN_AS column at the same time there is an employee in EMPLOYEE who is not also in CONVICTED_FELON; nonetheless, whether we get the correct result is a matter of chance. It depends on what the missing value is. -- We can think of this as a second "solution" to the problem of missing values -- simply excise them from the tables we are examining as if they were diseased tissue. 'We just might get the right answer if we are lucky' is an improvement over 'We are guaranteed to get the wrong answer, always.' Nonetheless, the problem of nulls is not solved until we can reliably get the correct answer.

Here the meaning of the query is: give me the employee whose KNOWN_AS alias we have information about and who is not among those convicted felons whose KNOWN_AS alias we have information about. [No longer a transparent window through which to view the world; have to worry about the individual words; we are being like a poet like Paul Celan who worries whether the words in the language are still valid.] This is less than ideal because we don't want to be worrying about the tags we use to identify the employees and convicted felons about which we are thinking our thoughts; we want to be thinking about the employees and convicted felons. It is as if one said: among the employees and convicted felons we can think about because we have tags to identify them in our thoughts, give me the employees who are not also convicted felons.

A somewhat extraneous note: in contrast to the COLUMN != to null in the predicate formed during the evaluation of the query previous to the one directly above, the 'is null' ('is not null') predicate in queries that contain 'where COLUMN is null' or 'where COLUMN is not null' is evaluated using standard 2-value logic. Give me all the rows where COLUMN's value is missing (is not missing): in other words, the predicate COLUMN is null (is not null)' evaluates to either TRUE or FALSE as the SQL DBMS tests each row.

[Because of the ad hoc treatment, because of the not-based-on-the-semantics, SQL is a blunt instrument of thought. This is the true objection I am getting at. This is more important -- though stemming from the same cause as -- 'SQL is more difficult to learn because of the complication.']

Another somewhat extraneous note: The insidious carbon-monoxide poison gas queries enabled by the TRUE-FALSE-UNKNOWN 3-valued logic does not show up if the 'not in' gets changed to 'in' in the query above. This is because the in clause:

EMPLOYEE.KNOWN_AS in ( 'MUGSY', 'LEFTY', 'SCARFACE', 'MOM', 'NIGHTHAWK', null )

is equivalent to:

EMPLOYEE.KNOWN_AS = 'MUGSY' or EMPLOYEE.KNOWN_AS = 'LEFTY' or EMPLOYEE.KNOWN_AS = 'SCARFACE' or EMPLOYEE.KNOWN_AS = 'MOM' or EMPLOYEE.KNOWN_AS != 'NIGHTHAWK' or EMPLOYEE.KNOWN_AS = null

TRUE-FALSE-UNKNOWN 3-valued logic needs only one of the above disjuncts to be true if the entire expression is to be true, regardless of whether any of the other disjuncts have the value FALSE or UNKNOWN. So when EMPLOYEE.KNOWN_AS gets replaced by a particular value from a row generated by the outer query, the resulting proposition will evaluate to TRUE whenever one of the disjuncts are true, and FALSE whenever one of the disjuncts is false. The presence of the UNKNOWN does not doom in this case every resulting proposition to either the value FALSE or the value TRUE. So Cheney is left out of the result set, and Mugsy, Lefty, Scarface, Mom, and Nighthawk are included in. However, even though the TRUE-FALSE-UNKNOWN 3-valued logic doesn't pull its insidious carbon-monoxide-poisoning act here to worsen the problem of nulls, nulls still do present a problem here. If the missing value in CONVICTED_FELON.KNOWN_AS is 'UNCLE FOONJIE', then the query is giving us an incorrect result. The query is supposed to give us all the convicted felons (appended with information from EMPLOYEE); Uncle Foonjie is a convicted felon in this scenario; therefore, he should be included in the result set of the query. The 3-valued logic isn't worsening the problem to the point of turning it into a train wreck; nonetheless, whether we get the correct answer is still hit-and-miss.

Back to the arbitrary effects of UNKNOWN: suppose business' database contains a SQL table DEADBEAT_CUST meant to record how much each of its delinquent customers owes. The table contains rows with columns CUSTOMER_ID and AMOUNT_OWED. CUSTOMER_ID is a mandatory column; incredibly, AMOUNT_OWED is not, and a number of AMOUNT_OWED fields are null. The customer does owe some amount greater than $0.00, but the business has no clue what that value is. The business wants to send pay-up (or else we'll send Uncle Foonjie) letters to the egregiously delinquent customers -- the most delinquent of the delinquent -- who owe more than $499.99, so an employee types into TOAD or SQL*Plus the query:

select CUSTOMER_ID, AMOUNT_OWED
from DEADBEAT_CUST
where AMOUNT_OWED > 499.99
;

Again, the relevant proposition will evaluate to UNKNOWN if the AMOUNT_OWED field is null. Here, again, we pretend that UNKNOWN means false. (This is the equivalent of holding up the sign with the word 'FALSE' written on it, instead of having the actual truth value FALSE ((supposing L.'s statement is in fact false)) revealed to us by the tiger's rendering L. into Fancy Feast for large felines.) We exlude the row from the result set. The SQL DBMS has to DO SOMETHING with the row after all, and UNKNOWN does not give it a third option. It has to treat the row as if it were TRUE, or as if it were FALSE. If we don't actually know the truth value of the proposition, it is safer to treat it as if that value were FALSE. We will err on the side of keeping all false rows out of the result set.

This is not a tasteless-odorless-poison gas situation in which the query is guaranteed to give us the wrong result in certain circumstances; nonetheless, that getting the correct result is a hit-or-miss affair is unacceptable. The correct result matters to the business. Suppose that in fact there is one customer who owes the business one million dollars and is capable of paying that debt to the business. Suppose further that this payment would save the business from bankruptcy. The AMOUNT_OWED column in the row for that customer, however, is missing a value. The query will not include that customer in its result set; consequently, the business will be forced to declare bankruptcy. The SQL DBMS has led the business to this disaster because it is failing to give the business a fact about its customers; instead, it is in effect making the decision in an arbitrary way based a fact about the business, namely, the gap in its information. Truth value about the customer. Knowledge value about the business.

UNKNOWN in PL/SQL Stored Procedures and Functions

In if statements PL/SQL stored procedures and functions, UNKNOWN has the same effect as FALSE. This can lead to insiduously incorrect results just as in the query with the NOT IN subquery:

if EMPLOYEE.AGE <= 45 then dbms_output.PUT_LINE( 'Employee L_KNOWN_AS is 45 or younger' ); else dbms_output.PUT_LINE( 'Employee L_KNOWN_AS is older than 45' ); end if;

But if the EMPLOYEE.AGE field happens to be null for, say, Uncle Foonjie, he will see 'Employee Uncle Foonjie is older than 45' even if Uncle Foonjie is in fact 19. If the expression following the 'if' has the truth value UNKNOWN, it counts as not TRUE, and therefore given the same effect as FALSE in the else clause. As in the first query above, this creates plenty of room for silent, insidious error to creep in. One has to be savvy enough to code for the nulls if nulls are possible. The user would have been better off with an error message if EMPLOYEE.AGE did not have a value for a particular employee. All that the TRUE/FALSE/UNKNOWN 3-valued logic does is mask the failure of the programmatic code. My objection to this, however, is not as strong as my objection to the SQL, which is supposed to be dealing with MEANINGS, i.e., with propositions believed to be true.

When the 3VL SQL DBMS Pretends That UNKNOWN means TRUE

Now suppose that an employee is inserting data into DEADBEAT_CUST. The employee leaves some of the AMOUNT_OWED fields null. DEADBEAT_CUST has a check constraint to reject any candidate rows in which the AMOUNT_OWED value is less than $1.00. In this case, the SQL DBMS pretends that UNKNOWN here means TRUE -- it accepts the row for inclusion in the table even though it is not known whether the row matches the check constraint or not.

The rationale behind pretending the check predicate evaluates to true is perhaps this: in cases such as the EMP table in Oracle's SCOTT schema, the COMM (COMMISSION) field (generally? always?) has a value when the employee is a sales representative, but is left null when the employee is not a sales representative. Any check constraint on this column would then create the possibility that a non-sales-representative row would be rejected from insertion into the table if the DBMS pretended that the check predicate had evaluated to FALSE. Therefore pretending, in the case of the failed evaluation of check predicates, that the predicate has evaluated to TRUE serves the "practical" purpose of enabling a shoddy design for a SQL table, one that mushes together things (regular employee versus sales representative) that should be kept distinct. In this case, we err on the side of getting all the true rows into the table.

TRUE means accept; FALSE means reject; UNKNOWN can mean either accept or reject, depending on factors that have nothing to do with facts about the entity whose information we are storing in SQL tables.

Interpretation of UNKNOWN Is Arbitrary Because It Is In Fact Not A Truth Value

More On What A Query Means

Anyone who has read any in the HEAD FIRST series of tutorial books will be familiar with the sections labeled 'BE the Compiler.' These sections are always accompanied by a picture of a college student conspicuously in concentrated Eastern-religion-style meditation. For the first query above, it is illuminating to BE the DBMS (adopting either a Western or an Eastern meditation style is perfectly kosher).

Max Black on why when we are beset by wolves, it is always by a pack of wolves, never by a set (much less by -- God Forbid -- the set) of wolves. A set is completely abstract; it is the naming of some things all at once. A pack is concrete; it is not fully characterized just by the individual wolves, because we also have to talk about the wolves' hierarchy, how they communicate with one another, how they interact with one another to surround one in the deep snow in the dark woods then bring one down....

With a set, one is engaging in pure thought. I assume the thought can be mechanized because it is formal, but one doesn't have to get into the plumbing details of the mechanics. With SQL, one's thought processes do have to get down to the mechanics; he has to BE the DBMS. Procedural -- one has to process the rows one by one. And after one has gone through all this trouble, these gaps, these blank spaces, keep popping up in his thought whenever he encounters a missing value.

The analogy: SQL with nulls is like the Piero della Francesca fresco with huge gaps -- except here the difference is also illuminating, since one can still get the gist of the painting. With SQL, nulls are like hitting a blank in one's thought: the identifier is completely missing, therefore one cannot represent it and perform the mental juggling acts, the mental manipulations that will produce further, derived propositions. Inability to represent, therefore inability to manipulate, do a calculus, do an algebra to produce further, derived propositions. Inability to represent, either mentally or mechanically.

Notice the parallel here. The DBMS can't rest content with its previous state (predicate is UNKNOWN); the goal of evaluating the predicate attached to WHERE NOT IN is precisely to convert the initial UNKNOWN to 'TRUE (there is no EMPLOYEE_NAME in SALES_REPRESENTATIVE with the value 'SMITH')' or FALSE (there is an EMPLOYEE_NAME in SALES_REPRESENTATIVE with the value 'SMITH').' We demand that the DBMS not just stand there; it has to DO something. It must have seemed impractical to get rid of the nulls, the cause of the failure, the door jambs so to speak. Just as the poor will always be with us, goes the thinking, so missing information will always be with us. So the thing the DBMS has to do is NOT return a message to the effect of 'Failure -- sorry, couldn't determine what if any rows would answer to the query.' So to produce a definite result, to DO something, the DBMS pretends that the result of 'SMITH is not present in SALES_REPRESENTATIVE' is FALSE. Why FALSE rather than TRUE? Maybe nothing more than psychology: UNKNOWN and FALSE are both negative, TRUE is positive.

The PL/SQL case, in which the effect of UNKNOWN is the same as that of false. TRUE vs. (FALSE or UNKNOWN) treated the same as TRUE vs. FALSE.

¹Uncle Foonjie is the relative Ralph calls upon whenever he ... has an issue . . . with someone. (I don't know if he mentions this in his Business Ethics books.) Never knew there was a Finnish mafia, did you?
¹I argue here that this lack of access does not alter the content of the proposition represented by the tuple.
²I think this should be uncontroversial to everyone with the possible exception of L., who, after all, did think for a while that the proposition 'In base ten 1 + 1 = 2' could be false given the "reference frame" of some group that strongly believes it is false.