Learning the foundations of any profession will help you better understand your job and in turn further your career—and database administration is no exception. Database professionals who write a lot of T-SQL code should explore T-SQL’s foundations. SQL Server’s T-SQL is based on standard SQL, which is based on the relational model, which in turn is based on mathematical foundations (i.e., set theory and predicate logic). In this article I discuss a fundamental topic in set theory: the properties of relations on sets. I present T-SQL code that you can use to identify those properties; however, I encourage you to write your own T-SQL code to determine whether a relation has a particular property, before you use my code.

Sets and Relations

Georg Cantor, the creator of set theory, defines a set as “any collection M into a whole of definite, distinct objects m (which are called the “elements” of M) of our perception or of our thought.” The elements of a set can be anything: people, numbers—even sets themselves. The symbols ∈ and ∉ are operators that express set membership and nonmembership, respectively. So the notation x ∈ V means that x is a member of V, and the notation x ∉ V means that x isn’t a member of V.

A binary relation on a set is a collection of ordered pairs of elements of the set. That is, for a set of elements V = \{a, b, c\}, a binary relation R on the set V is any subset of the ordered pairs in the Cartesian product V × V = \{(a, a), (a, b), (a, c), (b, a), (b, b), (b, c), (c, a), (c, b), (c, c)\}. So, for example, say that R = \{(a, b), (b, c), (a, c)\}. R is a valid relation on V. You can say that a is related to b by R. Suppose that R = \{(a, b), (b, c), (c, d)\}. R isn’t a valid relation on V because the ordered pair (c, d) isn’t a member of the Cartesian product V × V. Note that the order of elements in a set isn’t important. V can be expressed as \{a, b, c\} as well as \{b, a, c\}, and so on. However, the order of an ordered pair—for example, (a, b)—is important, so (a, b) ≠ (b, a).

As a more tangible example of a binary relation on a set, suppose that F is the set of my family members \{Itzik, Mickey, Ina, Mila, Gabi\}. Mickey is my twin brother; Ina is my elder sister; Mila is my mom; and Gabi is my dad. An example of a relation R on the set F would be “is a brother of.” The members of this relation are \{(Itzik, Mickey), (Mickey, Itzik), (Itzik, Ina), (Mickey, Ina)\}. Observe that the ordered pair (Itzik, Ina) appears in R, but (Ina, Itzik) doesn’t. Although I’m a brother of Ina, she’s not my brother.

Properties of Relations on Sets

Now that we have some background about sets and relations, let’s proceed to the focus of the article—properties of relations on sets. For sample data, use the code in Listing 1 to create the tables V and R. V represents a set, and R represents a binary relation on V. Use the code in Listing 2 to create the procedure ClearTables, which you can use to clear both tables before populating them with new sample data. Use the code in Listing 3, Listing 4, and Listing 5 to populate the tables with different sets of sample data for your tests (call the sets Sample Data 1, 2, and 3, respectively).

Reflexive. A relation R on a set V is reflexive if whenever v ∈ V (meaning, v is a member of V), then (v, v) ∈ R (meaning, (v, v) belongs to R). A relation R on a set V is not reflexive if there exists some v ∈ V, such that (v, v) ∉ R. Consider again the set of my family members F. The relation “has the same age as” on F is reflexive because every person in the set has the same age as himself or herself. The members of this relation are \{(Itzik, Itzik), (Itzik, Mickey), (Mickey, Mickey), (Mickey, Itzik), (Ina, Ina), (Mila, Mila), (Gabi, Gabi)\}.

Let’s proceed to writing a T-SQL query against the tables V and R (representing a set and a relation on the set) to check whether the relation R on the set V is reflexive:

SELECT
  CASE
    WHEN EXISTS
      (SELECT v, v FROM dbo.V
       EXCEPT
       SELECT r1, r2 FROM dbo.R)
    THEN 'no'
    ELSE 'yes'
  END AS reflexive

The first query in the EXCEPT set operation returns a set with an ordered pair (v, v) for every row in V. The second query returns a set with an ordered pair (r1, r2) for every row in R. The EXCEPT set operation returns all ordered pairs that appear in the first set but not the second. The EXISTS predicate checks whether at least one row exists in the result of the set operation. If at least one such row exists, the CASE expression returns 'no' (not reflexive), otherwise 'yes' (reflexive).

Take a look at the three sample data sets in Listing 3, Listing 4, and Listing 5, and try to determine without running the query in which cases the relation is reflexive. I provide the outputs later in the article.

Irreflexive. A relation R on a set V is irreflexive (not to be confused with not reflexive) if for every element v ∈ V, it follows that (v, v) ∉ R. A relation is not irreflexive if there exists some v ∈ V, such that (v, v) ∈ R. An example for an irreflexive relation on the set of my family members would be “is a parent of” because a person can’t be a parent of himself or herself. The members of this relation are \{(Mila, Itzik), (Mila, Mickey), (Mila, Ina), (Gabi, Itzik), (Gabi, Mickey), (Gabi, Ina)\}.

Following is a query that checks whether R on V is irreflexive:

SELECT
  CASE
    WHEN EXISTS
      (SELECT * FROM dbo.R
       WHERE r1 = r2)
    THEN 'no'
    ELSE 'yes'
  END AS irreflexive

We have foreign keys in place to ensure that only members of the set V can appear in the attributes r1 and r2 of R. So the only thing left to check is whether there’s a row in R where r1 is equal to r2. If such a row exists, the relation is not irreflexive; otherwise, it is irreflexive.

Symmetric. A relation R on a set V is symmetric if whenever (r1, r2) ∈ R, then (r2, r1) ∈ R. A relation is not symmetric if there exists some (r1, r2) ∈ R, such that (r2, r1) ∉ R. In the set of my family members, the relation “is a sibling of” is symmetric. The members of this relation are \{(Itzik, Mickey), (Itzik, Ina), (Mickey, Itzik), (Mickey, Ina), (Ina, Itzik), (Ina, Mickey)\}.

Following is a query that checks whether the relation R on the set V is symmetric:

SELECT
  CASE
    WHEN EXISTS
      (SELECT r1, r2 FROM dbo.R
       EXCEPT
       SELECT r2, r1 FROM dbo.R)
    THEN 'no'
    ELSE 'yes'
  END AS symmetric

The code uses the EXCEPT set operation. The first query in the set operation returns a set with an ordered pair (r1, r2) for every row from R, and the second query returns an ordered pair (r2, r1) for every row from R. If the relation R on the set V is not symmetric, the EXCEPT set operation returns a nonempty set, in which case the EXISTS predicate returns TRUE, and the CASE expression returns 'no.' If the relation is symmetric, the CASE expression returns 'yes.'

Asymmetric. A relation R on a set V is asymmetric (not to be confused with not symmetric) if for every (r1, r2) ∈ R, where r1 ≠ r2, it follows that (r2, r1) ∉ R. An example of an asymmetric relation on the set of my family members is “is a parent of,” as described earlier. As an interesting exercise, try to think of an example in which a relation on a nonempty set is simultaneously symmetric and asymmetric. (Query the sample data in this article for a solution.)

SELECT
  CASE
    WHEN EXISTS
      (SELECT r1, r2 FROM dbo.R WHERE r1 <> r2
       INTERSECT
       SELECT r2, r1 FROM dbo.R WHERE r1 <> r2)
    THEN 'no'
    ELSE 'yes'
  END AS asymmetric

The code uses an INTERSECT set operation. The first query in the set operation returns an ordered pair (r1, r2) for every row in R where r1 <> r2. The second query in the set operation returns an ordered pair (r2, r1) for every row in R where r1 <> r2. If the intersection of the sets yields at least one row, it means that R is not asymmetric; otherwise, R is symmetric.

Transitive. A relation R on a set V is transitive if whenever (a, b) ∈ R and (b, c) ∈ R, it follows that (a, c) ∈ R. An example of a transitive relation on the set of my family members is the relation “is a sibling of,” as described earlier.

The following code checks whether the relation R on the set V is transitive:

SELECT
  CASE
    WHEN EXISTS
      (SELECT *
       FROM dbo.R AS RA
         INNER JOIN dbo.R AS RB
           ON RA.r2 = RB.r1
         LEFT OUTER JOIN dbo.R AS RC
           ON RA.r1 = RC.r1 AND RB.r2 = RC.r2
       WHERE RC.r1 IS NULL)
    THEN 'no'
    ELSE 'yes'
  END AS transitive

The code first uses a self inner join between two instances of R to filter only rows where r2 in the first instance is equal to r1 in the second. Then the code uses a left outer join to a third instance of R where r1 in the first instance equals r1 in the third, and r2 in the second instance is equal to r2 in the third. If an outer row exists (r1 in the third instance is NULL), it means that the relation is not transitive; otherwise, the relation is transitive.

Equivalence relations. An equivalence relation is a relation that is reflexive, symmetric, and transitive. You can use the queries I provided earlier to check each property separately; if the relation has all three properties, you can report that it’s an equivalence relation. Alternatively, you can use the code in Listing 6 to report all properties of the relation R on the set V that I discuss in the article, including whether the relation is an equivalence relation. If you run the code in Listing 6 for the sample data sets 1, 2, and 3 provided in Listing 3, Listing 4, and Listing 5, respectively, you get the outputs shown in Table 1, Table 2, and Table 3, respectively.

Back to T-SQL Roots

In this article I discuss a fundamental topic from mathematical set theory—properties of relations on sets. I use my own queries to check the properties of a relation represented by a table called R on a set represented by a table called V. Going back to the roots of the T-SQL language helps me adopt the correct mindset to better understand the properties of relations on sets.

Table 1: Output of Query for Sample Data 1

reflexive irreflexive symmetric asymmetric transitive equivalence
yes no no yes yes no

Table 2: Output of Query for Sample Data 2

reflexive irreflexive symmetric asymmetric transitive equivalence
yes no yes no yes yes

Table 3: Output of Query for Sample Data 3

reflexive irreflexive symmetric asymmetric transitive equivalence
no yes yes yes yes no

Listing 1: DDL for Set V and Relation R on V

SET NOCOUNT ON;

USE tempdb;

IF OBJECT_ID('dbo.R', 'U') IS NOT NULL DROP TABLE dbo.R;

IF OBJECT_ID('dbo.V', 'U') IS NOT NULL DROP TABLE dbo.V;

CREATE TABLE dbo.V

(

  v VARCHAR(10) NOT NULL

    CONSTRAINT PK_V PRIMARY KEY

);

CREATE TABLE dbo.R

(

  r1 VARCHAR(10) NOT NULL

    CONSTRAINT FK_R1_V REFERENCES dbo.V,

  r2 VARCHAR(10) NOT NULL

    CONSTRAINT FK_R2_V REFERENCES dbo.V,

  CONSTRAINT PK_R PRIMARY KEY(r1, r2)

);

Listing 2: Script to Create the ClearTables Procedure

IF OBJECT_ID('dbo.ClearTables', 'P') IS NOT NULL

  DROP PROC dbo.ClearTables;

GO

CREATE PROC dbo.ClearTables

AS

BEGIN TRAN

  ALTER TABLE dbo.R DROP CONSTRAINT FK_R1_V, FK_R2_V;

  TRUNCATE TABLE dbo.R;

  TRUNCATE TABLE dbo.V;

  ALTER TABLE dbo.R ADD

    CONSTRAINT FK_R1_V FOREIGN KEY(r1) REFERENCES dbo.V,

    CONSTRAINT FK_R2_V FOREIGN KEY(r2) REFERENCES dbo.V;

COMMIT TRAN

GO

Listing 3: Sample Data 1

EXEC dbo.ClearTables;

INSERT INTO dbo.V(v) VALUES

  ('a'),('b'),('c');

INSERT INTO dbo.R(r1, r2) VALUES

  ('a','a'),

  ('b','b'),

  ('c','c'),

  ('a','b'),

  ('b','c'),

  ('a','c');

Listing 4: Sample Data 2

EXEC dbo.ClearTables;

INSERT INTO dbo.V(v) VALUES

  ('a'),('b'),('c');

INSERT INTO dbo.R(r1, r2) VALUES

  ('a','a'),

  ('b','b'),

  ('c','c'),

  ('a','b'),

  ('b','c'),

  ('a','c'),

  ('b','a'),

  ('c','b'),

  ('c','a');

Listing 5: Sample Data 3

EXEC dbo.ClearTables;

INSERT INTO dbo.V(v) VALUES

  ('a'),('b'),('c');

Listing 6: Code to Return Properties of Relation R on V

WITH BaseProperties AS

(

  SELECT

    CASE

      WHEN EXISTS

        (SELECT v, v FROM dbo.V

         EXCEPT

         SELECT r1, r2 FROM dbo.R)

       THEN 'no'

       ELSE 'yes'

    END AS reflexive,

    CASE

      WHEN EXISTS

        (SELECT * FROM dbo.R

         WHERE r1 = r2)

       THEN 'no'

       ELSE 'yes'

    END AS irreflexive,

    CASE

      WHEN EXISTS

        (SELECT r1, r2 FROM dbo.R

         EXCEPT

         SELECT r2, r1 FROM dbo.R)

       THEN 'no'

       ELSE 'yes'

    END AS symmetric,

    CASE

      WHEN EXISTS

        (SELECT r1, r2 FROM dbo.R WHERE r1 <> r2

         INTERSECT

         SELECT r2, r1 FROM dbo.R WHERE r1 <> r2)

       THEN 'no'

       ELSE 'yes'

    END AS asymmetric,

    CASE

      WHEN EXISTS

        (SELECT *

         FROM dbo.R AS RA

           INNER JOIN dbo.R AS RB

             ON RA.r2 = RB.r1

           LEFT OUTER JOIN dbo.R AS RC

             ON RA.r1 = RC.r1 AND RB.r2 = RC.r2

         WHERE RC.r1 IS NULL)

       THEN 'no'

       ELSE 'yes'

    END AS transitive

)

SELECT *,

  CASE

    WHEN  reflexive  = 'yes'

      AND symmetric  = 'yes'

      AND transitive = 'yes'

     THEN 'yes'

     ELSE 'no'

  END AS equivalence

FROM BaseProperties;