The C structures

 

We write programs to do at least three things: we want to capture data, we want to work on that data, and we want to use the results. C has built-in basic data types, such as signed and unsigned integers (‘short’, ‘int’ and ‘long’; on typical systems ‘short’ is 16 bits, ‘int’ is 16 or 32 bits and ‘long’ is 32 or 64 bits), the character or byte (‘char’, almost always 8 bits), and floating point numbers, which are always signed (‘float’, usually 32 bits, and ‘double’, usually 64 bits). One problem, though: not much in the real world comes to us as simple data types. The individual pieces of the data may exist in one of the basic forms, but what do you do when you have to deal with 12-, 24-, 36- or 64-bit data?

 

We usually have to keep track of when the individual parts were captured so we can relate them to each other and to the real world. How do you keep track of what pieces of the data belong together in a record? How do we easily find a particular record? What happens to our program when we have to add or remove data fields from what we are handling? There is an old software engineering rule that states: “The more difficult it is to modify your data definition, the more frequently it will change” (Berry’s Rule 009).

 

The answer to these and other data definition problems is almost always to put the data into a structure.  C gives us an elegant and powerful means of keeping related data together: the ‘struct’ keyword.  As in all of C, learn to build the basic shell of a syntax element (like ‘struct’) then fill in the insides.  You don’t want to spend an hour figuring out you forgot the semicolon at the end of a syntax structure!  The basic syntax for the ‘struct’ keyword is as follows:

 

struct fribits

{

};

 

That declares a new data type for your program, ‘struct fribits’, defined as a struct that contains the data elements you will put inside it. (Standard C does not allow a struct with no members, so the empty shell above won’t compile until you add at least one.) It DOES NOT create the data element itself! It only tells the program what you mean when you use the ‘struct fribits’ syntax. To actually get a structure you can use, you have to create one (called instantiation). Do that with the following:

 

struct fribits theStruct;

 

Now, you actually have a structure, theStruct, of type fribits.

 

Now, decide what you need in the structure. We define the elements of the structure using either basic data types or our own data types. We will not cover creating new data types here, but will do so in another chapter. For this example, let’s define a structure that will capture the name of the data in a string that is 16 bytes long, a 32-bit record index so we can locate the data more easily, four 16-bit data values, a 32-bit floating point value and five 8-bit values. You may easily get this type of record if you are capturing a command input from another part of the system so you can parse the data (break it down into its parts and decide what action you will take based on its contents).

 

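One thing that often gets programmers into trouble is the length of their structures. Unless you tell the compiler that you want a packed structure, the compiler is free to insert unused padding bytes. (There is no standard C keyword for packing; compilers provide their own extensions, such as GCC’s __attribute__((packed)) or the #pragma pack directive supported by Microsoft’s compiler and GCC.) The compiler places each member at an address that is a multiple of that member’s alignment requirement, and it pads the end of the structure so that its size is a multiple of the strictest alignment inside it. This is known as “word alignment”, and compilers do it because many microprocessors can only access multi-byte data efficiently, or at all, when it starts on a word boundary (for a 16-bit value, an address that is a multiple of 2). Thus, you might think that you can find a new data record every 19 bytes when the compiler has actually padded the storage to put one every 20 bytes. This is never a good thing!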

 

For that reason, unless there is a good reason you must store only the exact number of bytes you need, fill your structures out yourself by adding dummy members so that every field starts on a boundary that suits its type and the structure ends on an even byte boundary. If the data comes in as a short (16 bits), then a char (8 bits), then a short (16 bits), add an extra dummy char between the real char and the next short so the short starts on a word boundary. If you don’t, the compiler will do it for you anyway, and the second short will start at the fifth byte of the record, not at the fourth byte where you might have expected it. Adding the dummy yourself makes that layout visible to anyone who reads the definition.

 

To illustrate this, let’s look at the data just described.  First we look at what we actually get assuming we are getting the high byte first followed by the low byte on multi-byte transfers:

 

Odd Byte (1st, 3rd, 5th)     Even Byte (2nd, 4th, 6th)
=========================    =========================
High byte, first short       Low byte, first short
Char                         High byte, second short
Low byte, second short
======================================================

 

If we define the following to hold that data:

struct AlsoFribits

{

  short firstIn;

  char secondIn;

  short thirdIn;

};

 

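most compilers will actually store it in memory like this (the table keeps the byte order the data arrived in; a little-endian processor such as an x86 stores each short low byte first, but the padding byte lands in the same place):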

 

Odd Byte (1st, 3rd, 5th)     Even Byte (2nd, 4th, 6th)
=========================    =========================
High byte, first short       Low byte, first short
Char                         Padding byte (value unspecified)
High byte, second short      Low byte, second short
======================================================

 

This may not be a big deal if you always use the structure-defined names to access the data since the compiler knows how to generate the correct access to the data.  But, and this is a very big but, if you need to capture 10,000 of these records and you are trying to figure out how much memory you need, that one extra byte translates to using 10,000 more bytes than you counted on!

 

Not to beat the subject into the ground, but you will often have to rearrange the data for further processing. Let’s take the above example and do some digital signal processing on the data to extract signal features a scientist has determined are hidden in the data. We want to take the low byte of the first short, the char and the high byte of the second short, use them as a 24-bit number and do processing on that number. Yes, Alice, this is what you run into every day in the real world and in Kansas, too.

 

Assume we have already defined a 24-bit data type (C has no built-in one, so this would be our own type), which we will call TWENTYFOURBITS, and declared a variable of that type called B24. If I want to grab the 24 bits of data, I will use an advanced C feature called typecasting, which lets me tell the compiler how the data is organized. We will cover all this in detail later, but for now, try to follow along.

 

We create a pointer to unsigned character, puc as follows:

 

unsigned char *puc;

The ‘unsigned’ keyword means the highest bit of each byte is not a sign bit but is actually data. Thus, the binary number 1111 1111 is 255, not -1.

 

We then point puc at our data. AlsoFribits is only the name of a structure type (its tag), not a variable, so we first need an actual structure variable to point at. Here we use one called rec:

struct AlsoFribits rec;

puc = (unsigned char *)&rec;

Note that the ‘(unsigned char *)’ syntax casts what follows it to a pointer to unsigned char. That “just happens” to be the data type we declared for puc. The ‘&’ means “address of” in this context, because a pointer holds the address of something, not its contents. Casting the address of rec to a pointer to unsigned char tells the compiler we really do want to assign that address to puc, even though rec was declared with a different data arrangement (type). C always allows you to look at any object through an unsigned char pointer like this, which is what makes byte-by-byte inspection of a structure possible.

 

We can now copy the data into our 24-bit variable B24 by casting the right portion of it to the TWENTYFOURBITS type, as follows:

 

B24 = *(TWENTYFOURBITS*)(&puc[1]);

 

This advanced C syntax means: take the address of the second byte of the data pointed to by puc (‘&puc[1]’; the first byte is ‘puc[0]’), treat that address as a pointer to TWENTYFOURBITS (the ‘(TWENTYFOURBITS*)’ cast), and copy the contents at that address (the leading ‘*’) into B24, which is 24 bits long. We expect this to take the second, third and fourth bytes of the structure and put them into B24. BUT, since the compiler padded the data storage to word boundaries, we are actually getting the low byte of the first short, the char and the padding byte! Our poor scientist is going to end up as bald as I am from tearing his hair out trying to figure out why his filtering is not working, and you will be looking for another job when he discovers why.

 

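As I said, this is advanced C syntax and you aren’t expected to understand it at this point. It just illustrates that we quite often have to access data in ways we didn’t envision when the software and data definitions were first designed, and that we often don’t understand the real consequences of how we define our data structures. (Be aware, too, that reading data through a pointer cast to a different type, like the TWENTYFOURBITS pointer above, may violate C’s alignment and strict-aliasing rules, which is undefined behavior; copying the bytes out one at a time is the portable approach.)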

 

Exercise 1:

How would you define the data fields of AlsoFribits so we can correctly access the inner data as a 24-bit number but still read it in as a short followed by a char followed by a short?  Remember, we will probably be using typecasting when we read the data in.

 

To get back to our fribits definition:

 

Our struct now looks like this:

 

struct fribits

{

  char DataName[16];

  long Index;

  short data1;

  short data2;

  short data3;

  short data4;

  float data5;

  char data6;

  char data7;

  char data8;

  char data9;

  char data10;

  char fill;

};

 

If you know what each of the data fields actually contains, give them meaningful names, of course.  If the float value is actually a temperature, you would declare it as float Temperature, not float data5.

 

Remember, we declared a variable theStruct as having type fribits. We can now access each of the data elements of theStruct individually. We do this using the ‘dot’ operator, ‘.’, since theStruct is the structure itself and not a pointer to a fribits structure. To access data8 of theStruct, we type theStruct.data8 and can then read its value or assign a new one to that element of the structure. We can set the name in the structure like this:

 

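strcpy(theStruct.DataName, "This is a name");

You cannot write theStruct.DataName = "This is a name"; because C does not allow assigning to an array; copy the string in with strcpy(), which is declared in <string.h>. Make sure you don’t try to put more data into the field than there is room for: DataName holds 16 bytes, which is 15 characters plus the terminating null character (‘\0’).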

 

One other keyword usually associated with structures is ‘typedef’. It lets us give a type a new name of our own, which is how we usually declare new data types. We can then put the new definition in a header file (.h extension) and use it in any file that includes that header file (#include "theheader.h"). You definitely want to do this because your definition is now in one place instead of being copied into every file that uses the structure type. If and when something changes, you will only have to change it in one place, rather than risk missing one copy and having your program blow up in spectacular and humiliating ways later. Berry’s law of Software Engineering number 7 states “The harder your program is to maintain, the more often it will change.”

 

In our header file, let’s define our fribits data structure so multiple routines and files can access it.  We want to create a new data type that contains the data in fribits.  Do so as follows:

 

typedef struct tfribits

{

  char DataName[16];

  long Index;

  short data1;

  short data2;

  short data3;

  short data4;

  float data5;

  char data6;

  char data7;

  char data8;

  char data9;

  char data10;

  char fill;

} FRIBITS, *P_FRIBITS;

 

This actually declares two data types for us: FRIBITS is the structure type and P_FRIBITS is a pointer to a FRIBITS structure type.  We then use these data types to declare structures and pointers to that type of structure as follows:

 

FRIBITS theStruct;

P_FRIBITS ptheStruct;

 

Note that you don’t need the ‘struct’ keyword, because FRIBITS is already a name for the structure type, and you don’t need the pointer syntax ‘*’ when declaring ptheStruct, because P_FRIBITS is already a pointer type.

 

You now use theStruct exactly as you did before, such as ‘theStruct.data5’. You use ptheStruct as you would any pointer, with the arrow operator ‘->’ to reach a member of the structure it points to. REMEMBER, a pointer is just an empty box until we put something inside it. A pointer declared inside a function without an initializer does not point to anything except random garbage until we assign it to point to something. If you do the following, you can get serious and ugly system crashes, or you will corrupt data because you will overwrite whatever the garbage contents of the pointer happen to be pointing to!

 

P_FRIBITS ptheStruct;

 

ptheStruct->data5 = -345.9;   /* WRONG: ptheStruct was never set */

 

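The best habit you can get into is to initialize new pointers to NULL, which means they don’t point anywhere at all. The compiler will not stop you from using a NULL pointer, but on most systems dereferencing one fails immediately (typically with a segmentation fault or access violation) instead of silently corrupting data, and your code can check for NULL before using the pointer. Many compilers can also warn about a pointer that is used before it is set, if you turn their warnings on. To correctly use P_FRIBITS, you will do something like the following: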

 

FRIBITS theStruct;

P_FRIBITS ptheStruct = NULL;

 

ptheStruct = &theStruct;

ptheStruct->data5 = -345.9;

 

float Otherflt = ptheStruct->data5 + 15.7f;