You are here: Users Guide > Database Design

4.1 Introduction

A database design is a description of the data that will be stored in a database and the relationships that will exist within that data. The design of a particular database is driven by the requirements of the application it will support and is therefore a major part of the total application design process.

Our purpose in this chapter is to provide you with the information necessary to design RDM Embedded databases. If you carefully study this chapter and the example database design, you should be able to design a workable RDM Embedded database, even if you have no database design experience.

The chapter begins with a detailed description of the RDM Embedded Database Definition Language (DDL). Each DDL statement is explained and illustrated through examples. The operation of the DDL processor is explained followed by a section on database design considerations. The final section builds on the tims database example introduced in Operational Overview by expanding the requirements and describing how the database design supports those requirements.

4.2 Database Definition Language (DDL)

4.2.1 DDL Basics

An RDM Embedded database design is specified in the Database Definition Language. One DDL specification file exists for each RDM Embedded database. In general there is one database per application, although some applications may require the use of several databases. (The Logical Design Considerations section discusses multiple database design.) A DDL specification identifies the database, defines the files that comprise the database, and contains declarations for all record types, data and key fields, and set relationships that will exist in the database.

Below is an example DDL specification. It is an expanded version of the checking account schema given in the Operational Overview. It is provided here to illustrate some additional DDL statements and to serve as a baseline example in the descriptions that follow. A complete DDL syntax summary can be found in Utility Descriptions of the RDM Embedded Reference Manual.

database ckngacct[512]
{
	timestamp records budget;

	data file datfile = "chkg.dat" contains budget, check;
	key file[1024] keyfile1 = "chkg.k01" contains code;
	key file[2048] keyfile2 = "chkg.k02" contains check_no, check_date;

	record budget
	{
		unique key char code[6];
		char cat_desc[48];
		float alloc;
		float balance;
	}

	record check
	{
		unique key int check_no;
		key int check_date;
		char paid_to[48];
		float amount;
	}

	set transactions
	{
		order last;
		owner budget;
		member check;
	}
}

The DDL specification is stored in a text file and is created using your usual text editor. Input is free form with comments specified, as in C, between /* */ pairs.

Nested comments are not allowed.

Identifiers are used to name the database, files, records, fields, and sets. They are formed, as in C, from any combination of letters, digits, and underscores (_) beginning with a letter.

File, record, and set statements may be interspersed provided that:

Timestamp statements may appear anywhere within a DDL specification.

General practice is to place all file statements first, followed by all timestamp statements, followed by all record type declarations, followed by all set declarations. The ddlp can, optionally, produce a record, field, and set name cross-reference listing to help you locate names contained in a long DDL specification (see  DDL Processor Operation).

4.2.2 Database Statement

Syntax:

database dbname [pgsize] [ inmemory [ persistent | read | volatile ]]
        {     ddl statements ... }

Description:

The database statement is used to specify the name of the database and, optionally, the default page size of all files.

The name of the database dbname is an identifier and is used by ddlp to form the names of the database dictionary file and the C or C++ header file. The dictionary file is named dbname.dbd and the header file is named dbname.h. Note that the length of this name can be up to 31 characters.

The database name, dbname, is passed to function d_open to identify the name of the database to be opened and subsequently accessed. It is also passed to utilities such as initdb to identify the database to be operated on (initialized) by the particular utility.

The RDM Embedded data, vardata, blob and key files are blocked or divided into fixed-length pages, each containing as many record or key occurrences as will fit on a page. The pgsize parameter determines the default database page size in bytes. This value should, for performance reasons, always be a multiple of the basic block size for your operating system (a multiple of 512 will work for most systems). If not specified, the default database page size is at least 1024 bytes, but will be the first number divisible by 512 that is large enough to contain the largest record. …

By default the database files are created on the file system, but you can specify that they should be created in shared memory instead, by adding the inmemory qualifier to the database definition. The read, persistent and volatile options control whether the database files are read from disk when the database is opened, and whether they are written to disk when the database is closed.

Examples:

database ckngacct[512] { ... } 

For the checking account example, the name of the database is ckngacct and it has a default database page size of 512 bytes. The name of the dictionary file created by ddlp is ckngacct.dbd. The name of the C header file created by ddlp is ckngacct.h.

database acctsrec { ... } 

The above is a possible database statement for an accounts receivable database. Since no page size parameter is specified the default database page size is 1024 bytes. The dictionary will be stored in file acctsrec.dbd and the C header file will be named acctsrec.h.

4.2.3 Timestamp Statement

Syntax:

timestamp records [ recname[, recname...] ] ;
timestamp sets [ setname[, setname...] ] ;

Description:

Timestamping is a multi-user database technique used to detect possible changes to record and set occurrences since the last time they were accessed by a particular user. Refer to Timestamping for a complete discussion of the use of timestamping.

Timestamp statements may appear anywhere within a DDL specification.

Record timestamping adds eight bytes of additional disk space overhead to each timestamped record slot. Set timestamping adds four bytes of additional overhead to each set pointer contained in set owner records. Physical Design Considerations describes all record and set space overhead.

Examples:

timestamp records budget; 

In the checking account database, only the budget record type is to be timestamped. This allows an appraisal to see if another user has modified or deleted a particular budget record before a check is written to the database, which updates the balance for that budget category.

timestamp records; 
timestamp sets;
 

The database for the above example requires that all record and set types be timestamped.

4.2.4 File Declarations

data file [[pgsize]] [fileid =] filename
        [inmemory [persistent | volatile | read]] [initial=pages] [next=pages] [maxpgs=pages]
        [maxslots=slots]
        contains recname[, recname]... ;

key file [[pgsize]] [fileid =] filename
        [inmemory [persistent | volatile | read]] [initial=pages] [next=pages] [maxpgs=pages]
        contains [recname.]keyfld[, [recname.]keyfld]... ;

[compact] vardata file [[pgsize] [fileid =] filename
        [inmemory [persistent | volatile | read]]  [initial=pages] [next=pages] [maxpgs=pages]
        contains [recname.]varcharfld[, [recname.]varcharfld]... ;

blob file [[pgsize] [fileid =] filename
        [inmemory [persistentvolatileread]]  [initial=pages] [next=pages] [maxpgs=pages]
        contains [recname.]blobidfld[, [recname.]blobidfld]... ;

Description:

File declarations identify the physical files to contain the data stored in the database.

The RDM Embedded data, vardata, blob and key files are blocked or divided into fixed-length pages, each contains many record or key occurrences as will fit on a page. Each slot contains a record occurrence, a key value, or part of a varchar or blob_id field. The optional pgsize parameter specifies the size for the file. If not specified, the page size for the file will be the default database page size. This value should always be a multiple of the basic block size of your operating system (a multiple of 512 will work for most systems). Otherwise, the operating system's file access performance will be impaired.

The name of the file, filename, is a string enclosed in quotation marks (") containing the physical (operating system) name of the file. If the name of the file is not qualified (that is, it does not include a directory name), applications using the database must be executed from within the directory containing the database files, subject to the environmental conditions discussed in Operational Environment.

Associated with a database file is an optional file, fileid, which is used to identify files within an application program independent of the physical name of the file. It is intended to be used for dynamic initialization of individual files using the d_initfile function (Dynamic Database Initialization) or for dynamically substituting files using the d_renfile function (Operational Environment).

Vardata files contain the string data associated with varchar fields, divided up into fixed-length slots. By default, each slot contains the database address of its "owner" record occurrence.

Vardata files defined as compact will not contain database addresses referring back to the "owner" records. This saves some space, making the files smaller, but it may make data recovery impossible in the event of database corruption. It also limits the ability of the dbcheck program to test the integrity of the database.

BLOB files contain the data associated with the blob_id fields. The BLOB is stored as a linked list of fixed-length pages. The record containing the blob_id field stores the address of the first page of the BLOB data in the blob file. There is a one-to-one correspondence between the BLOB data and the record occurrence containing the blob_id field referencing the BLOB data.

Files declared as inmemory will be stored in RAM, in shared memory segments. In-memory database files can be accessed simultaneously by multiple applications, provided these are all running on the system where the database is located. In-memory file options are listed in the table below.

In-memory
Keywords
Description
persistent

The contents of the in-memory "file" will be written back to the specified disk file when the last application closes the database.

readThe contents of the in-memory "file" will be read from the specified disk file when the first application opens the database.
volatileThe in-memory "file" will not be backed by a disk file. Depending on the operating system, it will cease to exist either when the last application closes the database (Windows) or when the system is rebooted (Unix).
initialSpecifies the initial size of the "file" when created by the functions d_initfile or d_initialize, or by the tool initdb. For in-memory files, the default value is 32 pages.
nextSpecifies the size of increments to the "file". When the file grows beyond its initial size, the amount of extra space allocated will be determined by this value. For in-memory files, this equates to the shared memory segment size – the default value is 32 pages.
maxpgsSpecifies the maximum size of the "file". For in-memory files, the default value is 2048 pages.

Creating a large number of records in an in-memory database fails with a S_BADWRITE error even though memory has not run out. For example, build and run the attached program and try to insert 200,000 records.The error is caused by the fact that RDM Embedded's internal limit on the number of shared memory segments it creates has been exceeded. The size of the segments allocated can be increased by using the "next" parameter in the DDL file definitions in the database schema. For example:

data file datafile="acct.d01"
         initial=1024 next=1024 maxpgs=16777216
         contains acct;

This creates segments of size 1024 pages, where the default page size is 1024 bytes.The default segment size is 32 pages. If the segment size is enlarged fewer segments are required to store a given amount of data.

The maxslots parameter specifies the maximum number of slots (record instances) RDM Embedded will allow in the data file. By default this value is determined by the size of database addresses in the database. For example, in a database with 32-bit database addresses 24 bits are used for the record slot number, implying a maximum of 16.7 million slots per file. The maxslots value can be set to any positive value smaller than this default. For files containing circular records the maxslots value must be specified explicitly in the file definition.

All record types defined in the DDL must be contained in a data file. All occurrences of the record types listed in the contains clause will be stored in that file. Occurrences of a given record type can only be stored in a single file. Each page in the data file consists of one or more fixed-length record. Each page of the record slot is based upon the size of the largest record type contained in the file. Smaller record types will occupy the same record slots, thus leaving unused space. You can define all record types to be contained in one file or you can have a separate data file for each individual record type. The choice is yours.

A special system-defined record named system may be listed as the first record type in the contains clause of one (and only one) of the data files. Only one occurrence of the system record exists in the database. It is used as the owner of any number of sets to provide the means for records to be connected to the top or root of the network. When the database is opened, the current record and the current owners of all system-owned sets are initialized to the system record.

All key fields defined in the DDL must be contained in a key file. All occurrences of the key fields listed in the contains clause will be stored in the file. Occurrences of a given key type can only be stored in a single file. Each page in the key file consists of one or more fixed-length key slots. The size of the key slot is based upon the size of the largest key field contained in the file. Smaller key fields will occupy the same slots, thus leaving unused space. You can define all key fields to be contained in one file or you can have a separate key file for each individual key field. The choice is yours.

The Physical Design Considerations section provides some guidelines to help you determine the best file organization for your particular application.

Examples:

data file datfile = "chkg.dat" contains budget, check;
key file[1024] keyfile1 = "chkg.k01" contains code;
key file[2048] keyfile2 = "chkg.k02" contains check_no;

The checking account database consists of one data file and two key files. The data file identified as datfile contains occurrences of both the budget and check record types in the physical file named chkg.dat. From the database statement the page size for datfile is 512 bytes. The key file identified a keyfile1 has a page size of 1024 bytes and contains the index for key field code in file chkg.k01. Key file keyfile2 has a page size of 2048 bytes and contains the index for key field check_no in file chkg.k02.

data file "/client/master.dat" contains system, master; 

The above file name specified in this data file statement includes the fully qualified path for file master.dat located in the client directory. In addition to the occurrences of the master record, this file will contain the system record.

key file "invnt.k01" contains stock.id_code;
key file "invnt.k02" contains bkorder.id_code;

The above example shows duplicate key names that are qualified by the record type in which they are defined. When duplicate field names are used, it is necessary to use the -d option with the ddlp command (see  DDL Processor Operation).

4.2.5 Record Declarations

Syntax:

[static] [circular] recordrecname {
        [field_stmt]
        ...
        [comkey_stmt]
        ...
}

Description:

The record statement defines a group of related data fields named recname that will be stored and accessed together on the database as a single unit. The record declaration consists of zero or more field a statements followed by zero or more compound key statements. A record specification with no data fields is valid, and often is used in the implementation of many-to-many sets (see Use of Sets).

The ddlp utility will use the identifier recname to create two C identifiers in the dbname.h file. A struct named recname, containing the C declarations for the data fields defined in the record will be declared in dbname.h. The name of the struct will identically match the name of the record as specified in the DDL. An upper-case form of recname will define an integer constant record number to be passed to those RDM Embedded runtime functions that manipulate records. Record names, therefore, should never be specified entirely in upper-case.

The static attribute specifies that the records are never updated when the database is opened in shared mode. This allows the RDM Embedded runtime to optimize access to the static record occurrence, yielding much better performance. Static records are used for storing information such as menus, data entry forms, help screens, system messages, coded value information, etc. Multi-user database programs are not required to place locks on this data in order to access it. Static records can only be modified when the database is opened in exclusive access mode. Single-user applications may modify static records at any time. The data and key files containing static record information can only contain static data. The system record is a special case and can be included in a data file that contains either static or not static records. Static records can only be used in sets which themselves are not modified except in exclusive access mode.

If a record is defined as circular, record slots are automatically re-used once the maximum number of slots has been stored. A circular record must be the only record type in its file. The definition of this file must include a maxslots value specifying the maximum number of slots in the file. Once this number of record instances has been stored, RDM Embedded will overwrite the oldest record instance whenever a new one is created. Instances of circular record types cannot be explicitly deleted with functions d_delete or d_disdel.

Example:

record check {
	unique key int check_no;
	int check_date;
	char paid_to[49];
	float amount;
}

The check record type in the checking account database contains two integer fields (one of which is a key), a character string field, and a floating point field. File chkgacct.h will contain the following definitions associated with the check record type:

struct check { 
	int check_no;
	int check_date;
	char paid_to[49];
	float amount;
}; 
#define CHECK 10001

Record trans below contains two compound key field declarations. (For more information, see Use of Keys).

record trans {
	unique key int checkno;
	int trdate;
	char vendid[8];
	long amount;
	compound key tr_key {
		trdate descending;
		vendid ascending;
	}
	compound key ven_chks {
		vendid; checkno;
	}
}

Below are two record types and a set that might be defined for a multi-user data entry, forms management system. Note that both records are declared to be static, since during normal operation of an application using the forms manager, form and field records are not modified.

static record form {
	unique key char form_id[11];
}
static record field {
	unique key char field_id[21]; /* Display name of field */
	int id_row; 	/* Row where field_id displayed */
	int id_col; 	/* Column where field_id displayed /
	int data_row; 	/* Row where data starts */
	int data_col; 	/* Column where data starts */
	int field_len; 	/* Editable field length */
	char required;	/* Required field */
	int edit_fcn; 	/* Editing function called on entry */
	int disp_fcn; 	/* Display function called on output /
	int rtype; 	/* Record type containing the field */
	long ftype; 	/* RDM Embedded field type */
}
set form_fields {
	order last;
	owner form;
	member field;
}

4.2.6 Field Declarations

Data Fields

Syntax:

[[optional] [unique] key[[key_dim]]] type fldname [ [dim]...] ;

or

[[optional] [unique] key] struct {
        field_stmt
        ...
} fldname [[dim]...] ;

type =
            [unsigned] int | [unsigned] short | [unsigned] long
            | [u]int8_t | [u]int16_t | [u]int32_t | [u]int64_t
            | [unsigned] charwcharwchar_t | varcharvarwchar
            | blob_id
            | float | double | DB_ADDR

Description:

The fldname is an identifier that names the particular data field. Field names may not duplicate record or set names, nor by default other field names.

The syntax for a data field statement is similar to, although not as general as, a C data declaration. In fact, most of the basic data types in C are directly supported in the DDL. Arrays of any type are available, where dim specifies the size of a given array dimension.

One-dimensional character arrays are treated by the RDM Embedded runtime functions as C strings. Thus, these fields should always be terminated by a null byte, and the length as specified in the field declaration should include the null byte. If a one-dimensional character array is needed that is not intended to be treated like a string (for example, a byte array), it should be declared as a two-dimensional array where the second dimension is one.

A data field that is also to be used as a key to the record has the key attribute. Key field values are stored on the key file in the natural order based on the data type. If only unique keys are allowed then the field should be qualified as a unique key field. The maximum length of a key field is 246 bytes. Fields defined as a unique key must contain a value that does not already exist on the key file at the time its associated record is entered into the database. If a record is entered (modified) with a duplicate value in a unique key field, the status code S_DUPLICATE is returned and the record is not entered (or modified).

The optional attribute indicates that the data field is an optional key. Optional key values are not inserted into the key file until the application program calls function d_keystore. Optional keys can be manually deleted using function d_keydel. When an optional key is modified (through a d_recwrite or d_crwrite call), the key file will be updated only if the current value exists in the key file.

Character string key fields are defined as "partial keys" if the value key_dim is smaller than the last array subscript dim. This means that the maximum size of the key values stored in the index will be smaller than the maximum field size. This reduces the physical size of the key files at the cost of performance, but does not otherwise affect the functionality of the field.

Fields of type varchar and varwchar are variable length fields. These data types are used in array definitions of string fields where a long maximum length is required, but where the field data actually stored will often be much shorter than the maximum. RDM Embedded will only allocate enough space in the database files for the data actually stored. This differs from the storage of char and wchar_t strings, where the defined maximum length is always allocated regardless of the data actually stored.

All varchar and varwchar fields are stored in vardata files. These files must be defined explicitly in the DDL schema.

Fields declared as struct cannot be nested. Sub-fields of an arrayed struct field cannot be defined as key fields nor accessed individually.

Data fields of type DB_ADDR contain the database addresses of specific record occurrences in the database. Database addresses can be directly accessed using the currency table access functions described in Currency Tables section. This allows you to maintain your own record linkages directly, if desired.

Data fields will be aligned within the record in order to match the struct field alignment rules followed by your particular compiler and computer.

Examples:

unique key char code[6]; 

Field code is a character string field of six characters long (five characters plus one null terminator) that is defined as a unique key.

unique key[16] varchar email[80]; 

Field email is a variable length character string field. Its maximum length is eighty characters (including null terminator), but only the first sixteen characters are stored in the key file. This field is defined as a unique key, which means that its values must be unique over their whole length, not necessarily in the first sixteen characters.

varchar description[32,256];

The definition of a varchar or varwchar field may contain a minimum length specifier. The minimum length definition affects the internal storage of the data, but not the functionality of the field. The description field in the example above has a minimum length of 32 bytes, This is the number of bytes that will always be allocated in the record (in the data file) for this field, even if the field is empty. This value will always be rounded up to a multiple of 4 bytes.

The minimum length also determines the number of pointers to vardata slots that can be stored in the field, and therefore affects the size of the vardata slots that RDM Embedded will allocate. The default minimum value is 20 bytes.

float balance; 

Field balance is a floating point field which, in the checking account database, contains a monetary value.

Key int check_date; 

Key field check_date is used to store a date in the application-defined Julian format (for example, number of elapsed days since January 1, 1900). Its values are stored on the key file in integer order. Thus, check records can be retrieved in check date order through use of the key retrieval functions (see Data Retrieval Using Keys).

struct {
	double imag;
	double real;
} complex[3];

This field statement defines a structure array field, complex, which stores an array of three complex numbers composed of an imaginary part and a real part.

key long coordinates[3]; 

Field coordinates is an array of three long type variables and is also a key. It may be used to locate an object on a large three-dimensional grid.

int bitmap; 

Field bitmap is used to store a bitmap of attribute flags, which are tested using binary operators and masks. For example, assuming this field was declared in record type rec, rec.bitmap & 0x0001 is non-zero if the low order bit is set.

RDM Embedded does not directly support C bit fields.

char byte_array[16][1]; 

Byte array fields are implemented, as in byte_array, as a two-dimensional character array where the second dimension is one. This will force the RDM Embedded runtime to manipulate all 16 bytes of the field rather than stopping at the first null byte as it does with string fields.

DB_ADDR ptr_array[20]; 

Field ptr_array is an array of type DB_ADDR. It is used to store an array of the database addresses of record occurrences that are related to the record type in which ptr_array is defined. Use of DB_ADDR fields provides unlimited data organization possibilities to the programmer. However, these alternatives should only be used in those rare instances when the standard capabilities provided by keys and sets are insufficient for a particular requirement.

optional key struct {
	char last_name[21];
	char first_name[21];
	char initial[2];
} name;

Name is a struct field to contain person names and is composed of three string fields for the last and first names and the middle initial. Last_name is the first field specified since, because name is a key field, the last name can be used in a partial key search to find, for example, all the Smiths on file. Thus, the order of the fields in a keyed struct field specifies the major and minor sort sequences for the key on the key file.

This field is defined as an optional key. Optional keys are often used to defer the overhead associated with storing keys to a time when the system is less busy. For example, it might be that the record which contains name needs to be stored as rapidly as possible during the day. At night a batch program can be run to create the optional keys.

Compound Key Fields

Syntax:

compound [optional] [unique] key[key_dim] keyname {
                fldname [ asc[ending] | desc[ending] ] ;
                ...
}

Description:

The preceding syntax is used to define a compound key field named keyname. Compound keys are key field definitions consisting of any combination of fields (not necessarily contiguous) from a given record. Each sub-field of the compound key can be specified to be sorted in either ascending (default) or descending sequence. Compound keys differ from normal key fields in that they do not define additional data fields in the record. By using compound keys, you can have a field appear in multiple keys within a record, without needing to duplicate the field's value in the data file.

The compound key specifications must follow all other field statements in a record declaration. The order in which the sub-fields are specified determines the major and minor sort sequences. The fldname must be the name of a field that is defined in the record and is not defined as a struct. If the optional qualifier is specified, the key will only be stored when d_keystore is called. Otherwise, the key is created when the record is created. All of the key functions that apply to normal key fields also apply to compound keys.

ddlp will create in the dbname.h file a struct declaration named keyname for each compound key defined in the schema, similar to the struct declarations associated with records. These can be used in conjunction with the key manipulation functions of the RDM Embedded runtime library.

Because of the nature of compound keys, records containing them can only be created using function d_fillnew (not d_makenew).

Examples:

compound key tr_key {
	trdate descending;
	vendid ascending;
}
compound key ven_chks {
	vendid;
	checkno;
}

Assume that a record type called trans (transaction) contains two compound key definitions. Key tr_key is composed of fields trdate and vendid. Scanning through transaction records by tr_key would produce a sorted list in descending transaction date order, and ascending vendor id order within each date. Key ven_chks consists of two fields: vendid and checkno. Scanning through transaction records by ven_chks would give a sorted list in ascending vendor id order, and in ascending check number order within each vendor.

4.2.7 Set Declarations

Syntax:

setsetname {
        order { asc[ending] | desc[ending] | first | last | next };
        ownerrecname;
        memberrecname [ byfldname[, fldname...] ];
        ...
}

Description:

Set declarations define explicit, one-to-many relationships between record types. Sets are implemented as a linked list of member record instances connected to a single instance of an owner record, which serves as the root or head of the list (see Set and Member Pointers). The order in which member records are inserted into this list is specified in the set order clause.

Possible set orderings are defined as follows:

first New member records are connected (that is, inserted) at the front of the list.
last New member records are connected at the end of the list.
ascending New member records are connected in ascending order based on the contents of the data fields specified in the by part of the member clause of the set statement.
descending New member records are connected in descending order based on the contents of the data fields specified in the by part of the member clause of the set statement.
next New member records are connected immediately following the current member of the set, or, if the current member is null, at the front of the list.

The by part of the member clause is only supplied when ascending or descending order is specified. For sorted sets having more than one member record type, the sort field(s) of those record types should correspondingly be of the same type and length and be listed on the by clause in the same order.

When the sort field values of new member records of a sorted set duplicate existing members, the new members are added in front of the members with matching values.

The set owner may be specified as system. Use of the system record is not required. If one is used, do not declare it in a DDL record statement. ddlp automatically creates the system record when it is specified in a data file statement. There is only one occurrence of the system record in the database. It is used as the owner of any number of sets providing the means whereby records can be connected to the top or root of the network. When a database is opened, the current record and the current owner of all system-owned sets are initialized to the system record (see Currency Tables, for a discussion of currency).

Connecting member records to sets of order first, last, or next is faster than connecting to sets of order ascending or descending. This is because the list of member records associated with ascending or descending sets must be scanned each time a new member is connected, in order to find the proper insertion point. Thus, a connection to an ascending or descending set with a large number of members can be relatively slow. If you need a large sorted set, explore alternative approaches using keys or reverse ordering to maintain the ordering.

RDM Embedded does not implement ascending or descending sets through an index. As with all sets, they are implemented as a linked list (or chained) structure. RDM Embedded does provide a keyed record access (through a B-tree index) but it is totally distinct from sets. A record can both have keys and be an owner and/or member of sets, but sets do not use keys and keys do not use sets. Thus, sort fields of ascending or descending sets do not need to be declared as key fields.

Examples:

set transactions {
	order last;
	owner budget;
	member check;
}

Transactions is a set between the budget record and the check record in the checking account database. Each check written is to be applied to a particular budget account. Each budget account has a set of checks that have been written against it. As a new check is written and entered into the database, it is connected to the transactions set instance, where the appropriate budget record has been identified as the current owner of the set. The new check record will be connected to the end of the set. Thus (assuming checks are written in order), they will be stored in check number order (and probably date order as well).

To ensure that checks are stored in date order, the set specification could be modified as follows:

set transactions {
	order ascending;
	owner budget;
	member check by check_date;
}                                                 

Here the order has been changed to ascending and the member clause now includes the by part to indicate that the set is to be sorted on the check_date field of the check record. Again, the connect operation will be slower than if the order is last. However, if the sort field is known to usually force the insertion to be at the end of the set instance, the ordering of the set could be reversed so that the set remained sorted and so that the insertion was usually made quickly at the front of the list.

set comment {
	order first;
	owner note;
	member project;
	member task;
	member work_day;
}

The comment set illustrates a use of multiple member sets to reduce unnecessary space overhead. Each of the project, task, and work_day record types in a project management database can have an optional comment associated with it in the note record. In this example, an occurrence of project, task, or work_day can be associated with only a single occurrence of note. Thus each set instance is strictly one-to-one. Use of a single set is preferred over three separate sets because it will use less space for the set overhead (one set pointer is needed instead of three). See Physical Design Considerations, for more information.

4.3 DDL Processor Operation

4.3.1 DDL Processor Execution

The Database Definition Language Processor, ddlp, is executed as follows:

ddlp [-c] [-d] [-n] [-r] [-x ] [-z] [-sql] [-jni [pkgname]] 
    [-s [on/off]] ddlspec 

File ddlspec is the name of the text file containing the DDL specification. This file is sometimes called the schema file.

The ddlp will compile the DDL in file ddlspec and report any errors to stdout with the line number where the error was detected.

The compiled DDL is stored in the database dictionary file for use by the RDM Embedded runtime library functions. The name of the dictionary is taken from the dbname of the database DDL statement and given an extension .dbd, forming dbname.dbd. The amount of dynamically allocated memory required by the RDM Embedded runtime for storage of the dictionary is reported by ddlp upon completion, if no errors were detected. Otherwise, the dictionary file is not created.

If the -c option is specified, an "in-memory" copy of the database dictionary is created. This consists of a C source file containing the declaration of a statically initialized array, with the same contents as the file dbname.dbd. You can include this C source file in your application, and pass the address of the array to the function d_open_ptr to open the database. This allows your application to open the database without using a dictionary file.

If the -r option is specified, ddlp will display (in stdout) the File Structure Report. (See ddlp File Structure Report for an explanation of the use of this report.)

If the -x option is given, a cross-reference listing of the records, fields, and set identifiers is displayed in stdout. An example report is shown below for the checking account schema.

RDM Embedded 1.0.0, DDL X-Ref Listing of File: ckngacct.ddl
Sun Jul 29 14:12:24 2007

alloc 	        field    13
amount         field    22
balance        field    14
budget         record    3     5    9   27
cat_desc       field    12
check          record    5    17   28
check_date     field     7    20
check_no       field     7    19
code 	        field     6    11
paid_to        field    21
transactions   set      24                                                 

The names are listed in alphabetical order, with the associated type and the line numbers in the DDL file where the name is referenced.

With the -d option, ddlp allows duplicate field names, such that field types within different record types may have the same name. The record structures created by ddlp will contain the name as specified in the schema. The constant definitions for all fields will be a concatenation of the record type and field type, separated by an underscore. For example:

record ticket {
	.
	.
	float unit_cost;
}
record invoice {
	.
	float unit_cost
}

will cause the following constant definitions:

#define TICKET_UNIT_COST 4005
#define INVOICE_UNIT_COST 6014

If duplicate field names are used with key fields, then each reference to the key field name in the key file statement must have a prefix showing the record type that contains the field. The syntax is recname.fldname.

The -n option instructs ddlp to omit writing the ASCII record, set, and field names to the dictionary file. This creates a smaller dictionary file, but makes names unavailable to utilities such as ida, dal, and dbimp.

The -jni option instructs ddlp to create the Java database classes. Optionally, a package name (i.e., -jni com.companyname.productname) can be specified with the -jni option to insert the package directive with the user-defined package class. Specifying this option suppresses generation of the C header file.

ddlp creates C struct declarations for each record type and compound key field defined in the DDL and stored with file id, record, field, and set constants in a header file also named from the dbname of the database statement forming dbname.h. By default, case is preserved on all of the names used. The -s option can be used to instruct ddlp to convert all of the names to lowercase to be compatible with previous versions.

The ddlp can process constants, #defines, predefined structures, and typedefs. All of these keywords must be used before ddlp recognizes the keyword database. Everything before the keyword database is copied to the resulting header file, so the application will not need to redefine anything.

As an example, the contents of file ckngacct.h, which was created when the checking account schema (as given in DDL Basics) was processed by ddlp, is shown below.

#ifndef CKNGACCT_H
#define CKNGACCT_H


/* RDM Embedded 1.0.0 */


/* database ckngacct record/key structure declarations */

struct budget {
	char code[6];
	char cat_desc[48];
	float alloc;
	float balance;
};
struct check {
	int check_no;
	int check_date;
	char paid_to[48];
	float amount;
};

/* record, field and set table entry definitions */


/* File Id Constants */
#define DATFILE 0
#define KEYFILE1 1
#define KEYFILE2 2


/* Record Name Constants */
#define BUDGET 10000
#define CHECK 10001


/* Field Name Constants */
#define CODE 0L
#define CAT_DESC 1L
#define ALLOC 2L
#define BALANCE 3L
#define CHECK_NO 1000L
#define CHECK_DATE 1001L
#define PAID_TO 1002L
#define AMOUNT 1003L

/* Set Name Constants */
#define TRANSACTIONS 20000

/* Field Sizes */
#define SIZEOF_CODE 6
#define SIZEOF_CAT_DESC 48
#define SIZEOF_ALLOC 4
#define SIZEOF_BALANCE 4
#define SIZEOF_CHECK_NO 2
#define SIZEOF_CHECK_DATE 2
#define SIZEOF_PAID_TO 48
#define SIZEOF_AMOUNT 4



#endef /* CKNGACCT_H */

The #define constants are passed to runtime library functions to identify the particular file, record, field, or set type involved in the operation. The actual values represent entries into the tables that make up the database dictionary. The constants also have some additional control information encoded, as follows:

Record name constants consist of a record number (record 0 is the first record defined in the DDL, record 1 is the second, and so on) plus 10000.

Set name constants consist of a set number (numbered sequentially, as are records) plus 20000. This information is used by RDM Embedded runtime functions to distinguish between record and set constants.

Field name constants are formed using the following formula:

(record number * 1000) + number of field within record

This can simplify the work involved in adding a new field to a record. With this technique, if a new field is only added to the end of a record, only those modules that need to reference the new field need to be recompiled. Note that field constants are long integers, whereas file, record, and set constants are standard integers.

The SIZEOF_?????? constants are added in the header file as a convenience of the user. If they are not needed, or are causing problems with large databases, they can be removed with a -z option.

Use the -sql option if your application makes use of SQL. Specifying -sql has the following effects on field declaration syntax in DDL:

Note that the database header file should be #included in every C source module that needs to access the database.

4.3.2 ddlp Alignment

The structures defined in dbname.h may be used to store the contents of a record. When a variable is defined as a certain structure, the variable may be passed by reference to RDM Embedded functions for storage into or retrieval from the database. The offsets of each element in the structure is dependent on how the compiler aligns structure elements. When ddlp is built with a certain compiler, it performs its own measurements on the compiler's alignments and uses those to compute offsets into the record's structure. Hence the offsets computed by ddlp and the offsets in the C structure should match up if the application program is built with the same compiler as ddlp.

To see if a different compiler, or a newer version of the compiler will result in a different alignment, rename your existing database dictionary (.dbd) file and then run a ddlp built with the new compiler on your schema. Then use prdbd to examine the alignments of the two dictionaries. If they're the same, you won't have a problem using the different compiler. If they're different, you can choose whether or not to convert your application to the newer version.

If you have existing data that needs to be preserved, you can use the dbexp and dbimp utilities.

Nested Structures

Be very careful about using nested structures in records. The alignment of nested structures is very compiler-dependent. Although RDM Embedded compensates for this dependence, ddlp may not always be able to produce the correct alignment of nested structures.

While numeric and character data can be included in the same record, we recommend that numeric data and character data not be mixed together. You should put the doubles, longs, ints, and shorts first, with the chars last. This will create the smallest possible records without artificially packing the structures and avoid most alignment problems.

If you plan to use nested structures, we recommend that they start on word boundaries (that is, with numeric data). Starting the structures on eight-byte boundaries may be necessary on some machines when the structures contain double data types.

4.4 Database Design Considerations

C programmers prefer the flexibility to make intelligent decisions based on a number of design alternatives, rather than having to rely on a less flexible system (which may do a lot more of our work for us, but not always in the way we want it done). Much of the power of RDM Embedded stems from the flexibility it allows the application developer in organizing the database. This flexibility, however, requires the programmer to make design decisions from a wide variety of alternatives.

This section presents database design considerations as they pertain to RDM Embedded. These design considerations are separated into logical design and physical design issues. Logical design (as defined here) involves those aspects of the database organization that directly affect the manner in which the C applications have to access and manipulate the stored information. Physical design addresses the organization of the database into data and key files and (except for file renaming and initialization) is transparent to the application program.

4.4.1 Logical Design Considerations

The logical design considerations discussed here address the use of keys, sets, and multiple databases. Note that the examples given show only the DDL and not the actual code that uses it. The Database Manipulation section will complete many of these examples by providing the necessary code.

Use of Keys

General Key Usage

Key fields are basically used:

What are the possibilities for locating a specific record occurrence among many? One could read and inspect each record occurrence in the order they were stored in the file, until the desired record is found. This would take some time if there are many occurrences. On average, half of the records on file would have to be inspected to find a specific record, and all would have to be inspected to determine that the record was not on file.

Alternatively, sets could be navigated through the network structure to locate the desired record (see Database Design). This would generally mean that fewer records would have to be read, but the process still would be too slow, depending on the application requirements.

Because of the efficiency of the B-tree indexing method, locating a record through a key will involve at most only a few disk read operations (usually three or four). Thus, it is an ideal way to locate specific record occurrences rapidly.

In RDM Embedded, any field defined in a record can be a key. Thus there are a wide variety of potential ways to access a particular record. Fields can be defined as unique keys, to prevent the entry of records containing duplicate key values. When key fields are not defined as unique, records with duplicate keys are allowed to reside in the database. For example, zip code may be a key in a person record containing an address.

In the checking account database, check_no was defined as a unique key to allow the program to retrieve the check record for a particular check number. Other examples of data fields that would be good candidates for keys are:

Many different kinds of fields containing coded values are best implemented using a key. If new codes are needed, they can be added to the database without recompiling. For example, consider the following record declarations:

/* vehicle make validation table */
record vehicle {
	unique key vma_code[7]; 	/* vehicle make code */
	char vma_desc[25]; 		/* vehicle make description */
}
/* vehicle fleet record */
record fleet {
	unique key char vin[25]; 	/* vehicle id number */
	char vma[5]; 			/* vehicle make */
	char vmo[5]; 			/* vehicle model */
	char vyr[3]; 			/* vehicle year */
	int lsd; 			/* last svc. date */
}

An example vehicle record occurrence might be:

vma_code:   CHEV 
vma_desc:   Chevrolet 

When a new fleet record is entered, the application program can easily validate that the vma field contains a correct code by doing a key find operation on its contents (specifics of how to do this are explained in Data Retrieval Using Keys).

The check_date field in the checking account database example was not a unique key. This provided the ability to write multiple checks with a common date and to quickly retrieve all checks written on a specific date or within a particular range of dates.

Key fields can also be used for rapid pattern-matching type searches. Many more keys than records can be retrieved in a single disk read. Use of a key field for certain search requirements will result in much faster processing, especially when all occurrences need to be checked by the search operation.

For example, consider a law enforcement database containing "method of operation" records, which give the details of how specific crimes were committed. These records contain perhaps 25 data fields, each containing a numeric code identifying a particular aspect of the crime. One field in the record might describe a burglar's method of entry into a home as follows:

Code Meaning
1 Broke in through front door
2 Broke in through back door
3 Broke in through window
4 Picked front door lock
5 Picked back door lock
6 Came down the chimney
7 Other

The reason for maintaining "method of operation" records in the database is, of course, to allow searches for a match of a selection of the fields. The key for the search would be a 25-element byte array in which each element's value would be a numeric code, with zero meaning "not supplied."

record mo {    
	key char mo_data[25][1]; 
} 

The fields in the key would be ordered with the highest priority search field first and the lowest priority search field last. The highest priority search field is the one most often used in the searches. By listing it first, you can reduce the number of keys to be scanned in the maximum number of cases. If the first field is not used in the search, all keys must be scanned.

Compound Key Usage

In general, compound keys are used when one of the following conditions exists:

Suppose a record is to be searched based on the contents of two fields, where search values for either field are not always available. If a struct key field were used and the first field in the key was not available, all keys would need to be scanned. The use of two compound keys would solve the problem.

record combo_search {
	int field1;
	int field2;
	compound key f1_1st {
		field1; field2;
	}
	compound key f2_1st {
		field2; field1;
	}
}

If only field1 data were available for the search, then key f1_1st would be scanned. If only field2 data were supplied, then key f2_1st would be scanned. If both were supplied, by convention, f1_1st would be scanned.

Compound keys allow sort fields to be individually sorted in either ascending or descending order. Suppose in the checking account database we wanted to be able to list the checks sorted by the paid_to field, and, within that, in check number order with the most recent checks listed first, as in the following example :

record check
{
	unique key int check_no;
	key int check_date;
	char paid_to[48];
	float amount;
	compound key pay_list
	{
		paid_to ascending;
		check_no descending;
	}
}                                              
Optional Key Usage

Optional keys are not created when the record is created, but only when specifically requested by the application program through the d_keystore function.

Optional keys should be used for fields that are not mandatory (the field's contents are not always supplied), but that need to be keyed when they are used.

Optional keys can also be used when it is desirable to defer the key creation to a time other than when the record is first stored. In real-time applications the number of record updates per a given time period (called the transaction rate) often needs to be maximized. Normally, the key fields in a record are created when the record is created, and this involves additional overhead of up to three or four disk input and output operations per key. The ability to defer key creation can greatly improve the transaction rate. Usually a separate program performs the key creation during non-peak system load times (for example, late at night).

Use of Sets

Set relationships form the basis of the network database model. The basic use of sets in forming one-to-many relationships was introduced in Database Concepts and illustrated in the checking account database example. This section will expand the use of sets by showing how they can be used to implement many-to-many relationships and variable-length records.

Many-to-Many Relationships

Many-to-many set relationships are best explained through an example. A typical example is students and classes in college. Each class has many students and each student takes many classes. Thus, there is a many-to-many relationship between classes and students.

Sets, however, only implement one-to-many relationships. On the surface, it might appear that the following incorrect implementation using two sets would work.

record class {
	unique key char class_id[6];
	... other fields
}
record student {
	key char name[36];
	... other fields
}
set my_students {
	order last;
	owner class;
	member student;
}
set my_classes {
	order last;
	owner student;
	member class;
}                                              

The schema diagram in Figure 4-1 illustrates the relationships.

Fig. 4-1. Incorrect Many-to-Many Implementation

Consider, however, the following instance of the my_students set.

Class: 		Computer Science 101 (CS101)
Students: 	Smith
		Jones
		Kelly
		Carlson

A problem occurs when the my_classes set instances for these students is examined. Each student in CS101 must have that same class as a member of his my_classes set instance. This is not possible, however, since CS101 can only be connected to a single owner in the my_classes set.

The correct technique for implementing many-to-many relationships does indeed utilize two sets but through the use of an intersection record type, as follows:

record class {
	unique key char class_id[6];
	... other fields
}
record student {
	key char name[36];
	... other fields
}
record intersect {
}
set my_students {
	order last;
	owner class;
	member intersect;
}
set my_classes {
	order last;
	owner student;
	member intersect;
}                                              

Figure 4-2 shows the schema diagram that corresponds to the above DDL.


Fig. 4-2. Correct Many-to-Many Implementation

Each student record occurrence will have its own set of intersect record occurrences through the my_classes set. Each of these intersect records is also owned by a specific class record through the my_students set. Similarly, each class record will have its own set of intersect record occurrences through the my_students set, each of which is owned by an individual student record through the my_classes set.

In this case the intersection record has no user-defined data in it. However, it could be an ideal place to store the student's grade for the class if that was needed. In that case you would probably want to change the name of the record type from intersect to grade or some other more meaningful name.

A description of the actual database manipulation involved in maintaining many-to-many relationships, based on the database design example presented in the Database Design Example section.

Variable-length Records

RDM Embedded can store variable length strings (varchar or varwchar fields), but these are limited to a maximum length of 4096 characters. If you need to store longer strings than this, you can use a set to store a linked list of fixed-length segments

In the simplest example, a note of arbitrary length is associated with a record such as a customer record. If the note is stored as a field in the customer record, then an arbitrary limit on the length of the note text would need to be imposed. Much of the space taken up by the field would never be used. If, however, a separate record type were defined to store a single line of text, a set relationship between the customer and note text line would allow any number of text lines to be connected to a particular customer. The schema diagram in Figure 4-3 shows this relationship.


Fig. 4-3. Example Variable-Length Text Implementation

The corresponding DDL statements are shown below.

record customer {
	unique key char cust_id[7];
	char company[21];
	... other fields
}
record text {
	char line[80];
}
set cust_notes {
	order last;
	owner customer;
	member text;
}

Use of multiple member sets can save even more space. Consider the alternative schema diagram in Figure 4-4.


Fig. 4-4. Better Variable-Length Text Implementation

The DDL corresponding to this implementation follows:

record customer {
	unique key char cust_id[7];
	char company[21];
	... other fields
}
record text30 {
	char t30_line[30];
}
record text55 {
	char t55_line[55];
}
record text80 {
	char t80_line[80];
}
set cust_notes {
	order last;
	owner customer;
	member text30;
	member text55;
	member text80;
}

Multiple member sets allow any occurrences of the record types that are defined as valid members to exist in the same set instance. Thus, in this example, a customer record may have text30, text55, and text80 record occurrences connected through the cust_notes set. This allows the program to choose the record type that best fits the amount of text actually entered for a given line of text. Note, however, that each of the text record types should be contained in a separate data file, or no real space savings will occur (see Physical Design Considerations).

The C code that manages the cust_notes set connections is described in Database Manipulation.

Use of Multiple Databases

Application programs can open and access more than one database at a time. The use of multiple databases in an application can yield certain advantages and therefore should be a consideration in the design of the database. In this sense, an application's database may actually consist of several RDM Embedded databases.

There are at least two situations in which multiple database design is useful:

Sometimes temporary application control information needs to be manipulated in sufficient quantities to benefit from the use of database operations on that data. This information can be stored in a temporary working database, which is initialized when the program begins and deleted when the program terminates.

An accounting package is a good example of a modular database application. Most such packages provide separate modules for various accounting functions. Companies only purchase the modules they need. A typical package would include these modules :

Almost all the modules require the general ledger module. The amount of shared information between the other modules depends on the application.

When designing an application that will use multiple databases, keep in mind that inter-database relationships can exist, but can only be implemented relationally through keys or through the use of DB_ADDR fields containing the database addresses of records in another database. It is not possible to have a set with an owner defined in one database and a member defined in another. However, an account number stored in the accounts receivable database can be used to find the associated account record in the general ledger.

4.4.2 Physical Design Considerations

This section describes those aspects of the physical characteristics of an RDM Embedded database that can impact the disk space usage and access performance of an application program. An understanding of these physical implementation issues will help you design the most efficient databases possible under RDM Embedded.

The RDM Embedded File Structure

Data and key files consist of pages, which contain a specific number of fixed-length record and key slots. Control information, as well as the normal record and key contents, are stored in these fixed-length slots. The size of the record and key slots in a file is based on the size of the largest record or key contained in that file. Smaller records or keys will of necessity contain unused slot space.

The amount of control information in a key is always 10 bytes. However, the amount of space required for the control information in a record varies, and can therefore have database design implications.

The control information maintained by RDM Embedded in a record is as follows:

The information to be considered in database design is the space required for set and member pointers, as this is determined from the DDL.

File Formats and Dictionary Tables, describes in detail the structure of RDM Embedded files, records, and keys.

Record and Key Placement

The factors that dictate the optimal placement of records and keys into files are not always easy to determine, and are sometimes even mutually exclusive. Here are some "rules of thumb" to assist you. Some of these involve conflicting requirements. The best determination for your particular environment will be based on your own intuition supported by some testing, but ultimately through experience gained from actual use.

The chances of multiple users locking the same file is increased when several record or key types are contained in the same file. Placing each key and record type in its own file will minimize access conflicts.

If new owner and member record occurrences are stored and connected in the same transaction and if they are located on the same file the likelihood of their being placed on the same database page is increased. This could yield better set access performance.

Caching is a technique in which frequently accessed database pages are kept in memory, thus reducing the amount of actual disk input and output required. Using fewer key files increases the probability that needed index pages will be in the cache.

Once the application is built, there will be only a fixed amount of memory left over that can be used for the RDM Embedded cache. Some applications get better performance from a smaller number of larger-sized pages that contain more slots per page. Other applications get better performance from a larger number of smaller-sized pages that contain fewer slots per page.

ddlp File Structure Report

The File Structure Report is produced by using the -r option on the ddlp command line. This report summarizes the physical characteristics for each file and record type defined in the DDL. The report produced for the checking account database is given below.

RDM Embedded 7.2.0 Summary of Database: ckngacct
Fri Aug 17 15:05:41 2007


FILE: chkg.dat
	Id : 0
	Type: data
	Size of record slots : 90
	Record slots per page: 5
	Unused page space : 58

FILE: chkg.k01
	Id : 1
	Type: key
	Size of record slots : 16
	Record slots per page: 63
	Unused page space : 6


FILE: chkg.k02
	Id : 2
	Type: key
	Size of record slots : 14
	Record slots per page: 145
	Unused page space : 8


RECORD: BUDGET
	Id : 0
	File: chkg.dat [0]
	Total set pointers : 1
	Total member pointers: 0
	Offset to data : 26
	Size of record : 90
	Unused slot space : 0


RECORD: CHECK
	Id : 1
	File: chkg.dat [0]
	Total set pointers : 0
	Total member pointers: 1
	Offset to data : 18
	Size of record : 78
	Unused slot space : 12

The summary for each data and key file includes the following:

The information most significant to database design is the number of slots per page and the amount of unused page space in a data file. The unused page space occurs because the slot size multiplied by the number of slots per page does not always exactly equal the size of the database page. Much of this unused space can be reserved by adding an extra (unused) data field to the largest record on the file. The length of the field is computed as follows:

length = (unused page space) / (slots per page)

The unused space is contained in a record so that if additional data needs to be stored in that record, space will be available. If the reserved space is large enough, the new field can be added without restructuring the database.

The record summary contains the following information.

The design-related information in this report is the unused slot space. This space can easily be converted into a usable form by adding an extra field the length of the unused slot space to the record. Then, as explained above, the space is available for additional data to be stored in the record at a later date, without requiring a restructure of the database.

Also important in calculating the size of the records is the ordering and size of the data types used in the record. Most systems require numeric data to start on word boundaries. This means that if the field preceding a numeric field does not end on a word boundary, unseen and unusable padding is inserted to force the numeric data to the word boundary. When character and numeric data are not carefully mixed, wasted space can be generated.

File Page Sizes

To optimize file access performance, the page sizes for data and key files should be a multiple of the file block size used by your operating system. Many systems have a block size of 512 bytes. In these systems block sizes of 512, 1024, 1536, etc. would be acceptable.

The best page size for a given file is not always easily determined, and will be based on application implementation details. Key file pages should be large enough to hold a reasonable number of keys, so that the number of levels in the resulting B-tree are kept as small as possible (less than or equal to four). Data file page sizes should be based on the number of records you want to store on each page. For example, if all members of a set are stored together, then they will likely be stored in contiguous record slots. If they will usually be accessed together as well, then you may decide that the page size should be large enough to hold the average number of set members in order to minimize the actual disk accesses necessary to read each member.

The initial RDM Embedded runtime page buffers are sized based upon the largest page size specified in the DDL. For example, if you have specified four files with a 1024-byte page size and one file with a 4096-byte page size, RDM Embedded will allocate 4096 bytes for each page in the cache. You need to be aware of your memory requirements in deciding on page sizes.

4.5 Database Design Example

4.5.1 Introduction

The example to be presented in this section is an elaboration of the tims database introduced in Database Design. It will be used in examples throughout the remainder of this document. A solid understanding of this example design is necessary.

The requirements for the tims application are given first, followed by a description of the schema with explanations of how the design will be used to satisfy the stated requirements.

4.5.2 Requirements

The system is to be used to maintain a database of technical information contained in books, technical journals or magazines, and articles. In the following discussion, a single book, journal issue, or article will be generally referred to as an info item.

The following data is to be stored for each book, journal, or article:

The id code will be a unique Dewey-Decimal library code assigned by the user. The abstract will be a brief description (up to several paragraphs) of the info item. Each info item may have several key words associated with it that identify topics discussed in it.

Functions are to be provided to allow info item entry and deletion.

The info item data is to be retrieved as follows:

The ability to keep track of loaned books and magazines is also to be provided, where the borrower's name, the date borrowed, and the date returned are stored for each item loaned. A loan history is to be maintained for each info item. In addition, the ability to report all unreturned info items is to be provided.

4.5.3 Database Design

The schema diagram for the database design is shown in Figure 4-5 below.


Fig. 4-5. tims Database Schema

The principal data for each info item is stored in a record called info. This includes the id code, title, publisher, and publication date. Also included is a coded-value field for storing the type of info, where 0 = book, 1 = journal or magazine and 2 = article. The id code will be a key in order to quickly find specific info occurrences.

Because there may be many books or articles written by a single author, storing the author name in the info record would replicate multiple occurrences of the same author. This is often referred to as redundant data. So, the author is stored in a separate record with a set, has_published, connecting an author to that author's info records.

Since the info records for a given author are to be retrievable by author name, a set, ordered by author name, called author_list, has been defined with the system record as the owner. To find a specified author, this set is searched. This is sufficient for a small personal library where there would be relatively few authors (less than 100) and the system is only used by a single user. In a large library with many authors, access would be faster if we used the author name as a key field and did not use a set. Here (mainly for instructional purposes), the assumption is that the system is for a small, personal library.

A simple variable-length text structure is used for storing the abstract. A record type called text is defined that stores a text string of up to 80 characters, including a sentinel null byte. A set called abstract with order last is defined with info as owner and text as member, forming a one-to-many set between an info record and each line of abstract text.

The relationship between key words and item info records is many-to-many. A key word is stored in a record type named key_word. The key word is a string field that is keyed to allow rapid retrieval of individual key word occurrences and to allow alphabetized key word perusal. The many-to-many relationship is implemented through the use of two sets, as described in "Logical Design Considerations." Key_word records and info records are connected to an intersection record called intersect. Set key_to_info is used to find the info records corresponding to a particular key word. Set info_to_key is used to find the key words associated with a given info record. The intersect record has one field to hold a copy of the info type from its info owner through the info_to_key set. By eliminating an extra disk read of the info record for non-books, this facilitates the kind of key word searches where, for example, you're only interested in finding the books covering a specific topic. Redundant data is sometimes incorporated into a database design in order to improve data access performance.

A record type named borrower will contain the name of the borrower, the date loaned, and the date returned. The borrower's name will be a key field, in order to be able to quickly find all of the items borrowed by a particular person. Dates will be stored as a long integer of the form YYMMDD (for example, 870709 is July 9, 1987). A date of zero indicates that the loaned item has not yet been returned. When an item is loaned, a new borrower record is created and is connected to two sets. A set called loaned_books connects the borrower record to the info record for the loaned item. These records will normally remain members of this set even after the item is returned, to maintain a loan history for each item in the library. The borrower record is also connected to a set called loan_history, which is owned by the system record. This set is scanned when a list of all unreturned books is desired. Both sets are in last order so the records will be connected in chronological order (without having to specify ascending order by date loaned).

One final set has been included. The set named article_list has info records participating as both owner and member of the same set (which is legal in RDM Embedded). Here, the set is intended to connect article info records to the info record of the journal or magazine in which it is published.

The RDM Embedded DDL that implements the tims database design is presented on the next page. Two data files and two key files have been defined. Data file tims.d01 contains the system record (of which there is only one occurrence and is small because it has no fields), key_word records and intersect records. Data file tims.d02 contains the occurrences of record types author, borrower, info, and text. This organization is arbitrary in this case since the database is not large.

Key field id_code is much smaller than keys friend and word and is therefore stored in a separate key file, as is shown in the example below.

/*-------------------------------------------------------------------
Technical Information Management System (TIMS) Database
--------------------------------------------------------------------*/
database tims
{
	data file "tims.d01" contains system, key_word, intersect;
	data file "tims.d02" contains author, borrower, info, text;
	key file "tims.k01" contains id_code;
	key file "tims.k02" contains friend, word;
	
record author {
	char name[32]; 			/* author's name: "last, first" */
} 					/* or editor's name */
record info {
	unique key char id_code[16]; 	/* dewey dec. code */
	char info_title[80]; 		/* title of book, article, mag. */
	char publisher[32]; 		/* name of publisher */
	char pub_date[12]; 		/* date of publication */
	short info_type; 		/* 0=book, 1=mag, 2=art */
}
record borrower {
	key char friend[32]; 		/* name of borrower */
	long date_borrowed; 		/* dates are stored initially */
	long date_returned; 		/* numeric YYMMDD */
}
record text {
	char line[80]; 			/* line of abstract text */
}
record key_word {
	unique key char word[32]; 	/* subject key words */
}
record intersect {
	short int_type; 		/* copy of info_type */
}
set author_list {
	order ascending;
	owner system;
	member author by name;
}
set has_published {
	order ascending;
	owner author;
	member info by info_title;
}
set articles {
	order last;
	owner info;
	member info;
}
set loaned_books {
	order last;
	owner info;
	member borrower;
}
set abstract {
	order last;
	owner info; 
	member text;
}
set key_to_info {
	order last;
	owner key_word;
	member intersect;
}
set info_to_key {
	order last;
	owner info;
	member intersect;
}
set loan_history {
	order last;
	owner system;
	member borrower;
	}
}

Copyright © 2009 by Birdstep Technology, Inc. All rights reserved.  RDM Embedded Product Family