Saturday, April 17, 2010

Archive: Binary data from a Structure

For this post, I will be using the header structure of the trusty old DBF file format.

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct DBFHeader
{
  public byte Tag;
  public byte Year;
  public byte Month;
  public byte Day; 
  public int RecCount;
  public short HeaderSize;
  public short RecSize;
  public short Reserved1;
  public byte Trans;
  public byte Encrypt;
  public long Reserved2_1;
  public int Reserved2_2;    // with Reserved2_1, covers the 12 reserved bytes at offsets 16-27 of the 32-byte header
  public byte MDX;
  public byte LangId;
  public short Reserved3;
}

Given the above structure, our challenge now is to transfer it from its in-memory binary format to a stream. The stream may be a physical file, a memory stream, or even a network stream. The most important requirement for every technique that I will investigate here is that the resulting binary data is laid out exactly as the reader expects.
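Since the byte layout is what matters, it can help to check it before writing anything. The following is a minimal sketch, not part of the original technique: PackedSample and DefaultSample are hypothetical three-field structures introduced only to show what Pack = 1 changes, and Marshal.SizeOf and Marshal.OffsetOf are used to inspect the marshaled layout.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structures, used only to illustrate the effect of packing.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedSample
{
  public byte Tag;
  public int RecCount;
  public short HeaderSize;
}

[StructLayout(LayoutKind.Sequential)]
struct DefaultSample
{
  public byte Tag;
  public int RecCount;
  public short HeaderSize;
}

static class LayoutCheck
{
  public static int PackedSize() { return Marshal.SizeOf(typeof(PackedSample)); }
  public static int DefaultSize() { return Marshal.SizeOf(typeof(DefaultSample)); }

  // Marshaled offset of RecCount inside the packed structure.
  public static int PackedRecCountOffset()
  {
    return Marshal.OffsetOf(typeof(PackedSample), "RecCount").ToInt32();
  }

  static void Main()
  {
    Console.WriteLine(PackedSize());            // 7: 1 + 4 + 2, no padding
    Console.WriteLine(DefaultSize());           // 12: padding after Tag and at the end
    Console.WriteLine(PackedRecCountOffset());  // 1: RecCount directly follows Tag
  }
}
```

Without Pack = 1 the marshaler aligns RecCount on a 4-byte boundary, so the bytes written would no longer match a file format that has no padding.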

After looking at the interface provided by the Stream class, from which all streams inherit, I came to the conclusion that I would have to convert the structure to a byte array (byte[]).

Again our first stop is the interop services provided by the .NET framework. The following piece of code will convert a structure to a byte[] that can then be written to a stream.

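DBFHeader hdr = new DBFHeader();
byte[] data = new byte[Marshal.SizeOf(hdr)];
unsafe
{
  // hdr is a local value type, so its address can be taken without a fixed statement
  DBFHeader *p = &hdr;
  // copy the raw bytes of the structure into the managed array
  Marshal.Copy( (IntPtr)p, data, 0, data.Length );
}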

This code requires that you compile with unsafe code allowed. You can either supply the /unsafe switch to the command line compiler or, in the Visual Studio .NET project properties under Configuration Properties>Build, set Allow Unsafe Code Blocks to True.

After creating an instance of the DBFHeader structure, a byte[] large enough to hold its contents is allocated. The contents of the DBFHeader instance then need to be transferred to the byte[]. To accomplish this, I declared an unsafe block and took the address of the structure. Because DBFHeader is a value type declared as a local variable, it lives on the stack, where the garbage collector cannot move it, so it is implicitly pinned. The Marshal.Copy method then copies the data from that address to the byte[]. Now we are free to write the byte array to a stream using the Write method provided by the stream.
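The last step described above, handing the byte[] to the stream, might look like the following sketch. The WriteAll helper and the placeholder bytes are assumptions made for illustration; only Stream.Write itself comes from the framework.

```csharp
using System;
using System.IO;

static class StreamWriteSketch
{
  // Hypothetical helper: writes the whole array to any writable Stream.
  public static void WriteAll(Stream stm, byte[] data)
  {
    stm.Write(data, 0, data.Length);
  }

  static void Main()
  {
    // Placeholder bytes standing in for the converted structure.
    byte[] data = new byte[] { 0x03, 0x6E, 0x04, 0x11 };
    using (MemoryStream stm = new MemoryStream())
    {
      WriteAll(stm, data);
      Console.WriteLine(stm.Length);  // 4
    }
  }
}
```

Because the helper takes the Stream base class, the same call works for a FileStream or a NetworkStream.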

There are a number of things that count against this technique. Using unsafe code blocks means that the assembly cannot be verified, so it needs the SkipVerification permission to run. The Marshal methods require that the immediate caller has the unmanaged code permission (SecurityPermissionFlag.UnmanagedCode). This technique also moves the data in two stages, from the structure to the byte[] and then from there to the stream, which could be costly for large structures.

The first problem, non-verifiable code, can be overcome with a very similar technique that does not require unsafe code blocks. The following code demonstrates it.

DBFHeader hdr = new DBFHeader();
byte[] data = new byte[Marshal.SizeOf(hdr)];
IntPtr p = Marshal.AllocHGlobal( data.Length );
try
{
  Marshal.StructureToPtr( hdr, p, false );
  Marshal.Copy( p, data, 0, data.Length );
}
finally
{
  Marshal.FreeHGlobal( p ); // release the unmanaged block even if the copy throws
}

Notice that it still requires that the data be moved in two stages and the use of the Marshal class carries the same security requirements as earlier. The only benefit is that the resulting assembly remains verifiable, not requiring the /unsafe option. However, allocating memory from the unmanaged heap carries with it additional performance overheads.
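One possible way around the unmanaged allocation, which is not one of the techniques covered in this post, is to pin the destination byte[] with a GCHandle and marshal the structure straight into it. The sketch below assumes a hypothetical three-field SampleHeader in place of DBFHeader to stay short. It still relies on the interop classes, so the security requirements mentioned earlier should be expected to apply.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical cut-down header, used only for this illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct SampleHeader
{
  public byte Tag;
  public int RecCount;
  public short HeaderSize;
}

static class PinnedCopy
{
  public static byte[] ToBytes(SampleHeader hdr)
  {
    byte[] data = new byte[Marshal.SizeOf(typeof(SampleHeader))];
    // Pin the array so the garbage collector cannot move it while we write to it.
    GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
      // Marshal the structure directly into the pinned array.
      Marshal.StructureToPtr(hdr, handle.AddrOfPinnedObject(), false);
    }
    finally
    {
      handle.Free();  // unpin even if marshaling fails
    }
    return data;
  }

  static void Main()
  {
    SampleHeader hdr = new SampleHeader();
    hdr.Tag = 0x03;
    hdr.RecCount = 2;
    hdr.HeaderSize = 65;
    // Prints the 7 bytes as hex pairs, in the machine's byte order.
    Console.WriteLine(BitConverter.ToString(ToBytes(hdr)));
  }
}
```

The structure is copied once into the array instead of going through an intermediate unmanaged block, at the cost of pinning a managed object for the duration of the call.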

Finally, there is the pure managed code solution. While not as easy as the prior techniques, it has the benefit that the code is easily ported to other .NET languages and does not carry the same security restrictions as the other two techniques. The solution is to use the BinaryWriter class to write each field of the structure to the stream.

using (BinaryWriter wr = new BinaryWriter( stm ))
{
  wr.Write( hdr.Tag );
  wr.Write( hdr.Year );
  wr.Write( hdr.Month );
  wr.Write( hdr.Day );
  wr.Write( hdr.RecCount );
  wr.Write( hdr.HeaderSize );
  wr.Write( hdr.RecSize );
  wr.Write( hdr.Reserved1 );
  wr.Write( hdr.Trans );
  wr.Write( hdr.Encrypt );
  wr.Write( hdr.Reserved2_1 );
  wr.Write( hdr.Reserved2_2 );
  wr.Write( hdr.MDX );
  wr.Write( hdr.LangId );
  wr.Write( hdr.Reserved3 ); 
}

This code first creates a BinaryWriter, which provides typed binary write methods on top of the underlying stream. In this case the underlying stream is designated by the stm argument passed to the BinaryWriter constructor. Each member of the structure is then written to the stream, in order, through the BinaryWriter instance. Note that disposing the BinaryWriter at the end of the using block also closes the underlying stream. I have already mentioned the clear advantages of this technique, but it does carry its share of disadvantages. The first and most obvious is that it requires you to write out each member individually, and with larger structures this can be a tedious and error-prone task. This brings me to the second disadvantage: if the data is not written out in the exact order expected by the reader, the reader will either fail to read the structure or, even worse, read corrupt data.
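To make the ordering requirement concrete, here is a minimal round-trip sketch. It assumes a hypothetical three-field SampleHeader rather than the full DBFHeader, and the Write and Read helpers are illustrative names. The reader has to consume the fields in exactly the order the writer produced them.

```csharp
using System;
using System.IO;

// Hypothetical cut-down header, used only for this illustration.
struct SampleHeader
{
  public byte Tag;
  public int RecCount;
  public short HeaderSize;
}

static class RoundTrip
{
  public static void Write(BinaryWriter wr, SampleHeader hdr)
  {
    wr.Write(hdr.Tag);
    wr.Write(hdr.RecCount);
    wr.Write(hdr.HeaderSize);
  }

  // Must mirror Write field for field; swapping two reads would misinterpret the bytes.
  public static SampleHeader Read(BinaryReader rd)
  {
    SampleHeader hdr = new SampleHeader();
    hdr.Tag = rd.ReadByte();
    hdr.RecCount = rd.ReadInt32();
    hdr.HeaderSize = rd.ReadInt16();
    return hdr;
  }

  static void Main()
  {
    SampleHeader hdr = new SampleHeader();
    hdr.Tag = 0x03;
    hdr.RecCount = 2;
    hdr.HeaderSize = 65;

    MemoryStream stm = new MemoryStream();
    BinaryWriter wr = new BinaryWriter(stm);
    Write(wr, hdr);
    wr.Flush();

    stm.Position = 0;  // rewind before reading back
    SampleHeader copy = Read(new BinaryReader(stm));
    Console.WriteLine(copy.RecCount);    // 2
    Console.WriteLine(copy.HeaderSize);  // 65
  }
}
```

Keeping the write and read routines side by side like this is one way to make an ordering mistake easier to spot.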

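Just as a closing note, the final technique could also be used to get a byte[] by writing to a MemoryStream and then calling the ToArray method provided by MemoryStream. The GetBuffer method gives access to the underlying byte[] without a copy, but that buffer is usually longer than the data written, so it must be used together with the stream's Length.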
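A short sketch of the MemoryStream approach follows. The Serialize helper and the two values it writes are assumptions for illustration. It also prints the length of the array returned by GetBuffer, since that is the stream's internal buffer and may be longer than the data actually written.

```csharp
using System;
using System.IO;

static class BufferDemo
{
  // Hypothetical helper: writes two fields and returns exactly the bytes written.
  public static byte[] Serialize(int recCount, short headerSize)
  {
    using (MemoryStream stm = new MemoryStream())
    using (BinaryWriter wr = new BinaryWriter(stm))
    {
      wr.Write(recCount);
      wr.Write(headerSize);
      wr.Flush();
      return stm.ToArray();  // copy of the written bytes only
    }
  }

  static void Main()
  {
    Console.WriteLine(Serialize(2, 65).Length);  // 6: 4 + 2 bytes

    MemoryStream stm = new MemoryStream();
    BinaryWriter wr = new BinaryWriter(stm);
    wr.Write(2);
    wr.Write((short)65);
    wr.Flush();
    // The internal buffer is at least as long as the data, and typically longer.
    Console.WriteLine(stm.GetBuffer().Length >= stm.Length);  // True
  }
}
```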


Conclusion

As with every design, a series of compromises determines the choices we ultimately make. The above techniques provide a number of alternatives, ranging from simplicity of code to flexibility of implementation. I believe that the final option is most likely the purest and, from a reusability standpoint, the most effective solution.
