A full range of computer services in the Central Texas Hill Country

No Job too small - Office Machines or Home PC's - Minimum $25 Labor Chg, free estimate

Home Websites OS Installs Repairs Upgrades Database Computers Payments Contact
Formatted Database Results Just me. Relational Databases
Home Websites OS Installs Repairs Upgrades Database Computers Payments Newsletter Contact

Hopefully this page will help anyone with an interest to understand the terminology used in describing the database services which I offer and the logic behind my choice of tools.

If you find technical discussions boring or you're just too pressed for time and not up to the task of reading this humble treatise, relax and let me take the wheel.

I assume you think it's possible a database could fill some need or you wouldn't be looking at this page; my phone number is at the bottom or you can click the "Contact" button.

Give us a try

Whatever your computing needs Give us a Try!

Need something else? Click to Contact me

Phone: (512) 564-0367

Email me at jmay@linuxfamily.org

Psalm 30

4 Sing unto the LORD, O ye saints of his, and give thanks at the remembrance of his holiness.

5 For his anger endureth but a moment; in his favour is life: weeping may endure for a night, but joy cometh in the morning.

6 And in my prosperity I said, I shall never be moved.

7 LORD, by thy favour thou hast made my mountain to stand strong: thou didst hide thy face, and I was troubled.

8 I cried unto thee, O LORD; and unto the LORD I made supplication.

9 What profit is there in my blood, when I go down to the pit? Shall the dust praise thee? shall it declare thy truth?

10 Hear, O LORD, and have mercy upon me: LORD, be thou my helper.

11 Thou hast turned for me my mourning into dancing: thou hast put off my sackcloth, and girded me with gladness;

12 To the end that my glory may sing praisse to thee, and not be silent. O LORD my God, I will give thanks unto thee forever.


Historical lessons learned

I have to start somewhere; let's begin with a little history. Not so long ago businesses first began to realize the advantage of computerized record keeping and databases were widely adapted. This movement started when the first PC (Personal Computer) was introduced in 1977.

The first was the Altair and IBM, quick (but not the quickest) to see the potential market, followed with the XT and the stampede was on. The acronym PC became a household word and businesses sprouted, grew, merged, split and/or perished like a colony of yeast culture.

Prior to the PC, databases existed on the legendary mainframes (System 360/370 with JCL to submit your jobs), the smaller DEC (Digital Equipment Corporation) machines, Hewlett Packard and others too numerous to mention. Mainframes had operators to mount tapes & disk packs when the JCL requested a database not online; this and distributed devices such as very large and expensive printers requiring maintenance contracts led competitors to quip "Starring IBM and a cast of thousands." Only large businesses or government entities could even think about such implementations.

These databases were not flexible when the need arose to sort by different criteria than primary sorts so they were quickly fitted with Alternate Indexing sometimes called AIX (no relation to a certain flavor of Unix) to accomplish sorted access. You had to preallocate disk extents and build these indexes before they could be used. If you didn't have millions to budget for your system or work for someone who did, such knowledge was useless trivia. The PC counterparts of these mainframe roots often use the same acronyms or terminology for their database drivers; for example ISAM for Indexed Sequential Access Method.

Many of the first databases were "flat file" in nature; that is internal organization made it impractical to quickly access specific information after the db grew too large. I've personally seen a customer database that took up to an hour to pull up a specific customers record.

In those days there simply wasn't a hardware combination available that was fast enough to overcome a badly glutted db, so as tempting as it might be you couldn't just throwing money at the problem wasn't an option. Many schemes were developed to overcome this problem; let's look at just a few presented roughly in chronological order.

One early solution was the linked list; essentially each record had a pointer to the next record in some logical sorting sequence; the record pointed to by each preceding record contained a backward pointer to that previous record so the list could be traversed in either direction.

Insertion of a new record required that the list be traversed in the logical sorting sequence till it was determined you had gone one record too far. Then you backed up one record (using the backward pointer), changed the forward pointer to the new record, set the new records backward pointer to that record and the new records forward pointer to the "one record too far", and the "one record too far" backward pointer to the new record. The "one record too far" did not have it's forward pointer changed. Thusly, we accomplished a new record insertion. Now wasn't that simple?

This strategy worked if it was carefully done, but it had two fatal flaws. First it didn't take into account hard drive manufacturers internal access methods or operating system organization and access methods to the critical hard drive. In addition, since everything was done programmatically, it opened up new opportunities for programmers to create magnificent mistakes.

The new records added were logically sorted, but in actuality might be located physically anywhere on the hard drive (we aren't going far enough back in time here to address real time tape access with fast forward, rewind etc.). Usually as the disk filled up, new records were added nearer the end (disallowing for disk fragmentation due to deleted files). Sooner or later the system began to crawl.

On a multi-user system, a users time slice might end before the record was located. If several users were in the same situation, the server could end up using all of it's resources restoring machine state and repositioning the disk read heads all the while leaving gobs of file locks in place across time slices. This phenomena is known as "thrashing" and the memory hungry server operating systems of that era frequently ran afoul of page faults while thrashing and hung.

One solution to this was off-loading; at night the linked list would be traversed on the primary sort field and dumped to another storage device (often a tape drive) in the correct order and then dumped back to disk in actual sorted sequence. Secondary and tertiary sorts complicated this making it more risky. However, offloads also cured disk file fragmentation making all files contiguous, so it was not an unusual operation; those defragmentation utilities are wonderful things.


One solution to the dilemma was Btrees. This solution stored the data in nodes of a predetermined size; each node could be roughly equated to a clump of records. Each node had a pair of indexes at the beginning indicating the first and last entry in that node relative to the logical sort. When inserting new records you traversed these nodes checking the indexes to see if your record was in the clump instead of sequentially accessing individual records.

This arrangement had the added advantage of reading in many logically related records in small batches and as a side effect of disk transfer methodology, (and the result of really careful node sizing) leveraged the physical disk transfer routines which read slightly before and considerably beyond the actual requested sectors. This behavior is controlled by the disk IDE and OS. The chances of getting a hit on the desired record without a lot of disk reads were greatly enhanced.

Before inserting a new record into the node, a predetermined maximum number of records for any given node was checked. If the new record bumped you over the limit, the node was then split into two nodes and the new record was then inserted as appropriate in one of these two (all the while juggling pointers). A further refinement was to also have a minimum number of records in a node, disallowing nodes with only one or a few records except for the first few records inserted in the root node.

Each node, in a manner similar to linked lists, had pointers to the next and previous nodes in the logical sort; searches always started at the root node. In most ways this was an efficient method of quickly accessing desired information and lessened the load on the hard drive.  However, it did have some shortcomings.

With improper planning, a multi-user system could be overloaded. Occasionally, record insertion could result in a lot of node splits at the same time in large databases. Obviously, the nodes being manipulated had to be locked and several users could be waiting for the first users process to finish reallocation and release the locks while riding the merry-go-round of time slices. Some users would think the system was frozen and start switching things off and on. Have patience people.

A more serious problem was that, similar to the linked list, the nodes had to be traversed by programmatic routines which increased the chance of errors (magnificent mistakes).

To overcome all of this many companies were formed offering proprietary solutions with various indexing schemes for quick retrieval. Like the first PC manufacturers, everybody wanted their solution to capture the entire market. Early in the game, this might have made you rich; now it will make you enemies. Enter the world of Structured Query Language.


The World of Structure Query Language

There are many databases designed to interact with SQL which has exceptional flexibility and portability. SQL has brought to databases a consistent user interface for queries, updates and extensions in the form of new database and table creation. Properly written SQL statements will work on a variety of platforms whether the flavor is Linux, Unix or Windows. For that reason, many of the database creators have versions of their software for both Windows & Linux and other Unix-like OS's.

This leads us to a short discussion of Relational Databases and key fields within tables. If you want portability (and you should) SQL compatible databases are the way to go. A relational database has internal organization that allows a user to specify relationships between fields in different tables; the user can write any syntactically correct SQL statement defining the result set.

To illustrate, consider this simple example. We have a database with a primary recordset of customer data. This primary table has a key field which is the customer id; within this recordset the customerid field is the primary field and must always be unique to each customer.

Now we create a second recordset in a table which must allow duplicate customer ids. This second table contains one record for each transaction the customer has done; hence in this second table the customerid field must allow duplication.

Pretend I am viewing a customers record from the primary table; I have pulled him up by first and last name (or customer id if I know it) with an SQL statement like this:

"select * from custrecs where firstname = 'Sylvester' and lastname = 'Cat'"

Allow that this statement returned only one record and that the customerid was 1234. Now I'm viewing Sylvester's record and I have his customer id since I selected ALL fields from custrecs using the asterisk; you can always limit an SQL statement to return only specified fields separated by commas.

Consider this subsequent SQL statement for our fictitious database to view all Sylvester Cat's purchases:

"select * from purchrecs where customerid = '1234'"

This returns all fields of every purchase Sylvester ever made.

If I want to return all customers whose last name starts with 'Ca' I can use the 'like' keyword instead of the = operator and see all the 'Cantons', 'Calleys', 'Cats' etc. in the database. I can sort them by saying, 'order by lastname' or any other field I want them sorted on. Note that this is true even if there isn't a specific index on the specified field; the sort will just take a little longer.

I can create databases, tables, fields within the tables based on multiple criteria and sort by any of the fields in the result set. The query can merge fields from different tables. I can keep the result sets or discard them. The selection process or updates or creation actions are limited by your imagination.

The SQL statements can be made to sample a specific number of records from a large database to speed things up; when you have it just the way you want it, you can save it as a stored procedure.

This short page doesn't even scratch the surface, but I hope if you are new to databases you will see the potential for your own searches realize just how flexible SQL compliant databases are.


I service Windows + Linux, Solaris, & other Unix-like operating systems.