\[Editor's Note: Tim Huckaby adapted this article from a chapter he wrote on Directory Services for Professional Windows DNA: Building Distributed Web Applications with COM+ by Christopher Blexrud, et al. (Wrox Press, 2000).\]
Many SQL Server programmers don't realize that they have a powerful and feature-rich data storage mechanism available to them. It's not SQL Server or a relational database; it's the Windows 2000 Active Directory (AD). Because Microsoft recently shipped Exchange 2000 Server, which requires using Win2K Server, Win2K Advanced Server, or Win2K Datacenter Server, software developers will see AD technologies proliferate during the next year. This article shows you how and why to develop software to search, read from, and write to AD. It also shows you which type of data you should store in AD and which kind is right for SQL Server.
About Active Directory
By definition, a directory service stores information about entities on a network. Software developers make this information available to users and network administrators. AD provides users with appropriate permissions access to network resources using a single logon process. AD also gives network administrators an intuitive, hierarchical network view and one administration point for all network objects, including the users who authenticate on AD.
AD's features fall into two categories:
- Authentication. AD provides the location for authenticating securely within the OS.
- Attributes. AD provides an enterprise-available data store for persisting data, called attributes. By design, directory services store information about your users. AD goes further by also storing application and enterprise data. Every entity in AD is an instance of a Class, and every entity consists of a set of attributes that define the entity. The Class of these entities determines which of the attributes possible in the AD schema these entities can have. The attribute definition includes its syntax, which specifies the type of value it can have.
AD's Attribute Features
Attributes describe objects within AD. The types of objects, or pieces of data, stored within AD fall into three main categories: user information, application information, and enterprise information. Attributes in a directory service are essentially synonymous with columns in a SQL Server database table, but the directory service organizes them much more rigidly. It puts them into classes. AD's User class, for example, contains attributes specific to a user, such as name and address. The directory service stores attributes as name/value pairs. For instance, if you persist the value of a user's email address in AD, you persist the attribute name, mail, and value (e.g., TimHuck@InterKnowlogy.com) together.
Unlike a relational database such as SQL Server, where table columns exist regardless of whether they are null, if you don't persist an attribute in a directory service, you don't persist the space for the attribute's name either, eliminating the wasted space. AD's hierarchical structure supports this space-saving approach.
In addition, the attributes in a directory service can be single-valued or multivalued (such as an array). For example, the userGroups attribute is multivalued because it contains the names of the multiple groups that a user belongs to. In a relational database such as SQL Server, you would simply create a table called groups with a primary key that ties you back to the user table. But AD, like all directory services, has no concept of relationships with primary or secondary keys.
As in a relational database, all AD attributes have a syntax that determines the kind of data that the attribute can store. The AD schema allows a fixed number of syntaxes. You can't add new or custom-made syntaxes to AD, as you can to SQL Server. Many software developers view this restriction as beneficial because they need not worry about casting to user-defined types. I have adapted the following tables from the Microsoft Developer Network (MSDN) platform software development kit (SDK) under Directory Services to show you the 23 valid AD attribute syntaxes. Table 1 shows the syntax for simple data types; Table 2 shows the syntax for string data types; Table 3 shows the syntax for the time data types; and Table 4 shows the syntax for the object reference data types.
Attributes can have length or range constraints. For attributes with numeric syntax, the range specifies the minimum and maximum value. For attributes with string syntax, the range specifies the minimum and maximum length. You can also index AD attributes. Indexing attributes helps AD queries more quickly find the objects that persist data in the searched attribute.
Directory Service Structure
The biggest difference between a database such as SQL Server and a directory service such as AD is that a directory service is not relational. Directory services are hierarchical and support object inheritance. Every object in AD is an instance of a particular class. A class defines the attributes available to an instance of that particular class. A class, such as User, defines properties such as first and last name. An instance of a class, such as TimHuck, has AD properties called attributes: firstname=Tim lastname=Huckaby. In AD, classes follow a simple inheritance model. When defining a class, you must declare a parent class, which defines what attributes the new class implicitly inherits. Figure 1 shows an example of the class hierarchy for the User class. Note that the class at the peak of the hierarchy tree is an object called Top; this class is the only one without a parent class.
The AD schema is a description of the object classes and attributes that AD contains. For each object class, the schema defines the attributes that the object class must have, the additional attributes it might have, and the object class that can be its parent. For example, the Person class has one mandatory attribute, cn, and four optional attributes: seeAlso, sn, telephoneNumber, and userPassword, as Figure 2 shows. The Parent for the Person class is the Top class.
The AD schema is extensible, and you can update it dynamically. For example, software developers can write applications that extend the schema with new attributes and classes, and they can use the extensions immediately in their applications. Like every object in AD, schema objects have an ACL, so only authorized users can alter the schema. When you access the schema programmatically, you can modify the schema or read which attributes a particular class instance should have.
Each class definition within AD specifies:
- The structure rules that determine the class's superclass or parent class. AD needs these rules to determine where to implicitly inherit.
- The list of attributes that can exist in an instance of that class. For example, the User class, where domain users exist, has attributes specific to a user, such as an email address. An attribute that determines a printer's color-printing capabilities won't apply in the User class.
- Which attributes are mandatory (mustContain). Username and Password are examples of mustContain attributes in the User class. You can't create a user without persisting mustContain attributes.
- Which attributes are optional (mayContain). The attributes relating to the user's home address are examples of mayContain attributes in the User class. You can create a user without persisting mayContain attributes.
AD uses entities, often called objects (these objects aren't the same as COM objects; they're simply entities of data), that represent network resources, such as users, groups, machines, devices, and application settings. Like all directory services, AD uses containers to represent organizations, such as the marketing department, or collections of related objects, such as printers. AD organizes information in a hierarchical, tree-like structure that comprises these objects and containers, similar to the way you use folders and files to organize information on your computer.
The AD hierarchy is flexible and configurable, supporting the organization of resources in a way that optimizes usability and manageability. In Figure 3, containers represent collections of users, machines, devices, and applications on my Win2K network. You can nest containers—the Users container nests the Admin and Engineer containers—to accurately reflect a company's organizational structure. Nesting objects in AD makes objects more manageable on a macro level (as collections) than handling them one-by-one. This approach increases AD management efficiency because it lets you manage in organized groups and leverage the availability of those objects.
Directory services have object-oriented properties. In AD, all schema classes ultimately come from the special class called Top. With the exception of Top, often called the top-level container, all classes are subclasses of some other class. Inheritance lets you build new classes from existing classes. The original class becomes the new class's parent or superclass. A subclass inherits the parent's attributes, including structure rules and content rules. A class can inherit attributes from more than one parent.
Structure rules define the possible hierarchical relationships between objects in AD. Each class schema object's possible superiors (PossSuperiors) attribute expresses the structure rules. PossSuperiors lists all the possible superclass (parent) objects from which a particular object can inherit attributes.
For example, my company creates new Customer Extranet User classes that define information about our customers who don't specifically or physically work at corporate headquarters. Attributes specific to these extranet users might include the company they work for, the location where they work, project plans, vision scopes, design documents, specs, and any network-specific information regarding how users connect to our corporate LAN. We specify that the Customer Extranet User class is a subclass of the User class. Because the Customer Extranet User class inherits all the mustContain and mayContain attributes of the User class and the attributes of all the User class's parent classes, we don't have to define these attributes. The only attributes we need to define are those specific to our customers' projects, their location, and network connectivity, which are unique for all instances of the Customer Extranet User class.
Every object in AD has a distinguished name (DN). The DN identifies the domain that holds the object, as well as the complete path through the container hierarchy by which the object is reached. A typical DN might be: CN=Tim Huckaby,CN=Users,DC=InterKnowlogy,DC=com. This DN identifies the "Tim Huckaby" User object in the InterKnowlogy.com domain.
Directory services scale horizontally through partitioning, across machines and multinode replication. Availability in this case doesn't mean 24 — 7, which is a feature of Win2K; it means an enterprise-available data store for user and application data. Win2K completely integrates AD, so the security is free—you no longer need to code a user database and security for your enterprise applications. And AD is always network-available without any special configuration. You don't need Data Source Names (DSNs) as you do when connecting with code to SQL Server.
Which Data Goes Where
Deciding which data goes into the relational database and which data goes into the directory service is a source of heated debate. No clear-cut formula for making the distinction exists, but some concepts can make the decision easier. Developers designed relational databases to spread performance evenly throughout the process of writing and reading data; let's call this balancing the Relational Database 50-50 Rule. Developers do an enormous amount of engineering to ensure that writes to the database are just as quick as reads. Microsoft's SQL Server team has succeeded in this task. The software development community has been programming relational databases so long that it takes the 50-50 rule for granted. However, developers designed directory services quite differently.
Microsoft designed AD's hierarchical structure to attain inheritance and other object-oriented properties and for lightning-quick reads. The Microsoft Internet Information Server (IIS) metabase and the Win2K registry are good examples of structures that Microsoft designed hierarchically to support fast read access. The drawback, or by-product, is that the lightning-fast reads occur at the expense of some latency in the writes. Let's call this by-product the Directory Service 80-20 Rule., This rule implies that reads are four times faster than writes, although the actual performance difference between reads and writes depends on the type of directory service, the resources available, the hierarchical structure's format, and other factors.
In addition, Lightweight Directory Access Protocol (LDAP—the high-level protocol that services use to access AD) servers have features that facilitate hosting directory services completely in memory. This approach is similar to the concept of in-memory databases. Hosting directory services completely in memory provides dramatic speed increases in reads and writes. So, as to which data becomes an AD attribute and which becomes a SQL Server field (or column):
- Data that changes infrequently and benefits from being enterprise-available, such as data specific to the user (username, password, email address, home address, etc.) is more appropriate for the directory service.
- Network resources, such as printers and computers, are more appropriate for the directory service.
- Software application settings, such as language settings and look-and-feel settings, are more appropriate for the directory service.
- Data that changes very frequently and doesn't need to be available to all enterprise software applications is more appropriate in a table in a relational database such as SQL Server.
Let's consider the example of designing a business-to-consumer (B2C) e-commerce Web application. As the software developer or architect, you'll want to profile your users, when possible, by authentication. You'll explicitly force them to persist data about themselves when they register on the Web site: name, address, gender, favorite hobbies, age, income, etc. This type of data, which the users enter once and rarely change but which the business reads quite frequently, is perfect for the directory service.
In addition, you'll want to implicitly profile your users. Behind the scenes in the Web application, you'll want to track the URLs and the other resources on your Web site that the user touches. You might also want to track the frequency with which a user on your Web site looks at particular products. And you might want to implicitly track referrals, where a visitor, or user, on your Web site comes from or where they go when they leave. Because you can't control the frequency of this type of data and it has the potential for a massive number of writes, this data is perfect for a relational database such as SQL Server.
I'm not advocating an either/or scenario. You would rarely write a Web application that only uses a directory service such as AD without also using a relational database such as SQL Server. Most likely, you would use them both and persist the data where it seems most applicable—high-volume, frequently written data goes in SQL server. Once written, frequently read data goes into AD. In addition, any data that relates to authentication or authorization, such as data about the users, goes into AD. These rules aren’t hard and fast; they're bendable, depending on the circumstances, but they provide guidelines for your implementations.