Utilizing Gdbm, GNU's Easy-to-Use Database Management Library

Thursday Jan 4th 2001 by Jay Link

You've outgrown plain-text flatfiles, but you're not quite ready to plan and maintain a full RDBMS? GNU's Database Management Library (gdbm) is the simple, open source solution.

Lately, a number of open source relational databases have received favorable press. Indeed, MySQL and PostgreSQL have been touted as capable alternatives to Oracle and Sybase, the traditional heavyweights of the commercial RDBMS world.

But let's say you, the programmer, need something simpler. You may have outgrown plain-text "flatfiles," but you might not be quite ready for the planning and maintenance that goes along with a full-fledged RDBMS.

Further, as you likely run Linux or some version of BSD, Microsoft Access isn't a viable option, either. Isn't there an intermediate data storage solution, somewhere between clunky text files and full-blown relational databases?

Fortunately, the answer is yes: Gdbm!

Gdbm, or the GNU Database Manager, is a fully-capable, yet simple means of storing and retrieving data. There aren't 1,001 features and procedures to memorize, making for a gentle learning curve. Tuning is not an issue, and you don't need sophisticated algorithms to get the most out of your data store. Yet, you can still use gdbm-powered databases for customer files, back ends to Websites, and anywhere else that an RDBMS would be appropriate.

On the downside, gdbm isn't as fast or as flexible as other systems can be. A gdbm database is essentially a hash table on disk: you can look a record up only by its exact key, and there are no secondary indexes and no query language, so you certainly don't want to use it for a multi-terabyte data warehouse. But, for small to mid-sized data collections, the time differential is negligible.

Best of all, you don't need to worry about mounting and unmounting anything created with gdbm. Nor is there any server to run. Your data is simply stored as a regular file on disk.

Now, how do we use it?

The first step is to make sure that you have gdbm. Most Linux distributions include it, but you can make a quick check by looking in your /lib directory. If you see a file named libgdbm.so followed by a version number (e.g., libgdbm.so.1.7.3), you're in good shape. If not, then all you have to do is make a quick trip to www.gnu.org. Simply download, compile, and install, as you would with any other package.

Now that you have the gdbm library, we'll jump right in and do some coding.

You access the gdbm library by writing C programs that call the functions declared in the gdbm.h header file. As with other libraries, you link with a simple command line flag -- in this case, -lgdbm.

Gdbm Functions

Creating a new database and opening an existing one are both accomplished with the same function: gdbm_open(). Here's how it works:
   GDBM_FILE dbf;
   dbf = gdbm_open(name, block_size, flags, mode, fatal_func);
The "flags" argument is what differentiates between creating a new database and opening an old one. But let's not get out of sequence. Here, then, are the arguments and explanations:
char *name  -- The name of your database. This will be the
               name of the binary data file that's written
               to disk.
int block_size  -- A block is a group of bytes that's
                   transferred as a single unit. The minimum
                   is 512; if you pass a smaller value, gdbm
                   uses the file system's block size instead.
                   Once the database is created, the block
                   size cannot be changed.
int flags  --
   GDBM_READER   Opens the database in read-only mode.
                 Multiple programs can access the
                 database while in GDBM_READER mode.
   GDBM_WRITER   Allows both reading and writing. Only
                 one program can access the database
                 in GDBM_WRITER mode.
   GDBM_WRCREAT  Creates a new database if none
                 previously exists; otherwise the same
                 as GDBM_WRITER.
   GDBM_NEWDB    Creates a new database regardless of
                 whether another one with the same
                 name exists (i.e., it will overwrite
                 old databases); otherwise, it's the
                 same as GDBM_WRITER.
int mode  -- This sets the file permissions of the
             database, just as you would with chmod.
             A sample value would be 0644.
void (*fatal_func) ()  -- A function for gdbm to call
                          if it detects a fatal error.
                          The only parameter of this is
                          a string. If the value of NULL
                          is provided, gdbm will use a
                          default function.
Finally, gdbm_open()'s return value, dbf, is a pointer. This is needed by all the other gdbm functions to access the opened file. If the return value is NULL, however, then your gdbm_open() was unsuccessful. When you're finished with your database, just call:
   gdbm_close(dbf);
That's easy enough, isn't it?

Adding Data

Data is stored in gdbm files by means of key/value pairs. The "key" is analogous to a data string's name, and the "value" (or "content") is, of course, your data. So, for example, you might have a key called "name," and the content would be "John Smith." Here's how you add a record:
   ret = gdbm_store(dbf, key, content, flag);
Both the "key" and the "content" arguments are of type "datum," which is defined by this structure:
   typedef struct {
      char *dptr;
      int   dsize;
   } datum;
The "flag" argument tells gdbm what to do if the "key" you're adding already exists. (Of course, if it doesn't yet exist, then there's no problem.) The two options for the flag are:
   GDBM_REPLACE  -- Trash the old data, and replace
                    it with the new.
   GDBM_INSERT   -- Only add the data if it won't
                    overwrite anything else. If a
                    key with the same name already
                    exists, then return an error
                    (without writing anything).
Then, as you've probably guessed, "ret" is your return value. Here are the three possible outcomes:
   -1   The data was NOT stored, because the database
        was opened in read-only mode, or the data was NULL.
    1   The data was NOT stored, because the "flag"
        used was GDBM_INSERT and a key with the same
        name was already in the database.
    0   The data was written successfully.
So, anything other than a zero means "no dice."

Retrieving Data

Now that you've populated your database, how do you get the information back out? Simple! Use gdbm_fetch(), like this:
   content = gdbm_fetch(dbf, key);
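If the key isn't found, content.dptr is NULL; otherwise, it points to memory that gdbm allocated with malloc(), which you must free() when you're done with it. Because gdbm is a hashed database, fetching by exact key is quick and doesn't mean reading the whole file. What gdbm can't do is search on anything other than the key: to find records by their content, you'd have to walk through the entire database. This is why gdbm isn't the best choice for, say, the telephone records of the entire eastern seaboard. Nevertheless, gdbm's simplicity can't be beat. If you merely want to see if a record exists (without reading it), use: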
   ret = gdbm_exists(dbf, key);

Trashing Records

If you want to delete a record without replacing it, just use gdbm_delete(), which returns 0 on success and -1 if the key doesn't exist or the database was opened in read-only mode:
   ret = gdbm_delete(dbf, key);

Retrieving ALL Records

If you want each and every piece of data in your database, you can use the next two functions:
   key = gdbm_firstkey(dbf);
   nextkey = gdbm_nextkey(dbf, key);
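Keep calling gdbm_nextkey(), passing in the key you got back last time, until the dptr field of the returned key is NULL, indicating that you've reached the end of the database. Each key's dptr points to memory that gdbm allocated with malloc(), so free() it once you're done with it. Note that the keys come back in hash-table order, not sorted.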

Deleting an Entire Database

Because each gdbm database is a single file, all the programmer has to do to delete a database is delete that one file. Trivial!

The gdbm man page lists a few other functions, but I'm not going to cover them here, as they're not imperative to the operation of a gdbm database, and this is just a simple overview.

Can the functionality of gdbm be accessed via shell scripts? The answer is yes, but it takes a little bit of ingenious manipulation. What you'll need to do is create some programs that accept command line arguments, and then pass those values on to the gdbm functions. For example, let's make a simple program that allows you to populate a database from the command line. Here's how it might look:

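#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <gdbm.h>

#define BLOCK_SIZE  512
#define MODE        0644

int main(int argc, char *argv[])
{
   int ret;
   datum key;
   datum value;
   GDBM_FILE dbf;

   if (argc < 4)
   {
      fprintf(stderr, "Usage: insert <database name> <key> <value>\n");
      exit(EXIT_FAILURE);
   }

   /* Keys and values are stored without the terminating '\0'. */
   key.dptr = argv[2];
   key.dsize = strlen(argv[2]);
   value.dptr = argv[3];
   value.dsize = strlen(argv[3]);

   /* Open the database, creating it if it doesn't exist yet. */
   dbf = gdbm_open(argv[1], BLOCK_SIZE, GDBM_WRCREAT, MODE, NULL);
   if (dbf == NULL)
   {
      fprintf(stderr, "Could not open %s.\n", argv[1]);
      exit(EXIT_FAILURE);
   }

   ret = gdbm_store(dbf, key, value, GDBM_INSERT);
   gdbm_close(dbf);

   if (ret == 1)
   {
      printf("That key already exists.\n");
   }
   else if (ret != 0)
   {
      fprintf(stderr, "The data could not be stored.\n");
   }

   /* Report failure to the calling shell script, too. */
   return (ret == 0) ? EXIT_SUCCESS : EXIT_FAILURE;
}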
If this program were named insert.c, then you'd compile it with:
   gcc -O3 -Wall -o insert insert.c -lgdbm
Then, to use it, all you'd have to do is type the following:
   insert <database name> <key> <value>
For example:
   insert employees name1 Mary
Now, if the value for "name1" were "Mary Smith," then you'd have a problem, because the shell would pass "Mary" and "Smith" as two separate arguments and only the first would be stored. You could solve this by quoting the value on the command line, by making two values, firstname1 and lastname1, or by modifying insert.c to look for ALL the command line arguments given after the name of the key, which you could then lump into one value. Either way, it's hardly insurmountable.
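
For the "lump all the arguments together" approach, a helper along these lines would do. This is a sketch only: join_args() is a hypothetical function, and it assumes a single space is an acceptable separator between words.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join argv[first] through argv[argc - 1]
   with single spaces. Returns a malloc()ed string that the caller
   must free(), or NULL if the allocation fails. */
char *join_args(int argc, char *argv[], int first)
{
   size_t len = 1;                     /* room for the final '\0' */
   char *joined;
   int i;

   for (i = first; i < argc; i++)
   {
      len += strlen(argv[i]) + 1;      /* the word plus one space */
   }

   joined = malloc(len);
   if (joined == NULL)
   {
      return NULL;
   }

   joined[0] = '\0';
   for (i = first; i < argc; i++)
   {
      if (i > first)
      {
         strcat(joined, " ");
      }
      strcat(joined, argv[i]);
   }
   return joined;
}
```

In insert.c, you would then set value.dptr to the result of join_args(argc, argv, 3), set value.dsize to its strlen(), and free() the string after calling gdbm_store().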

I'm sure you could now figure out how to make another program to retrieve specific key values, using gdbm_fetch()!

In conclusion, I find gdbm to be an excellent tool for creating small data files. Whether used within C programs or called from the shell through a program such as insert, gdbm functions are much easier to work with than the tedious alternative of opening a text file and searching through line after line. This is especially true when dealing with bash scripts!

Related Resources

1. GNU's home page: This is the home page for the GNU project. You can download gdbm -- including the source code -- from here.

2. The Linux Documentation Project: The LDP is a vast storehouse of knowledge. More information on gdbm can be found in the ELF-HOWTO guide.

3. PostgreSQL home page: When you're ready to move up to a full RDBMS, check out PostgreSQL. Pay no attention to the silly name!

About Author

Jay Link is twentysomething and lives in Springfield, Illinois. Aside from Linux, his interests include mountain climbing and flying. He administrates InterLink BBS (an unintentionally not-for-profit Internet provider) in his fleeting spare moments, as well as working various odd jobs to pay the rent.

Copyright 2017 © QuinStreet Inc. All Rights Reserved