# Intro

A while ago, I was given a binary file reading script written in MATLAB. The script's goal was to read a binary output file from an Inertial Measurement Unit (IMU) into a MATLAB matrix. My task was to convert the code from MATLAB to C++, specifically for use with the Armadillo C++ matrix library. Furthermore, I wanted to embed it within the gmwm R package, which provides a method for modeling IMU error processes. ‘Twas here that the tale begins.

# Binary 1-0

Binary file formats are a bit different from ASCII formats, which are what we are accustomed to seeing. Specifically, a binary file contains blocks of bits such as “01000010 01101100 01100001 01100011 01101011” to represent the traditional word “Black”. To view such files, you really cannot use your traditional IDE, since more often than not it will try to convert the file encoding into UTF-8 or Windows ISO. Instead, you should seek out a binary file editor (Windows: Hexplorer, OS X: Hex Fiend, Linux: `xxd -b` in the shell).

# Conversion from MATLAB to C++

The function was easily ported from MATLAB into C++. The main differences between the two versions centered on how I structured the IMU data type and, as you guessed, how the binary file was read. The latter was very problematic when it came to cross-platform deployment. The entire IMU function is available on GitHub for you to peruse or use in your standalone applications per the LICENSE.

# The Bug

The binary record format was either all double or a double and six longs for int. On Windows, everything just “worked” in the C++ port, and I was able to easily recover the data within the matrix. On OS X, however, it was a mess. Only IMU binary record formats that used double exclusively could be loaded. When a record format contained long, the results were really, really odd.

The worst part is that the bug came up while I was doing a live demo on OS X. When I went back to my Windows development machine afterwards, I couldn’t replicate it at all. Thinking it was just a corrupted file, I went back to try the demo again. Yet again, the bug reared its head.

When I started this section, it might have seemed odd that I wrote the data type as a long for int instead of long int. Part of the reason is that the original code used long as the primitive type. The other part is that I wanted to emphasize that long is, by default, associated with int. So, long is equivalent to long int.

With this being said, after several rounds of running the code on my Windows machine and then on OS X, I narrowed the problem down to the data types I was using. I coded up a straightforward primitive data type comparison file:

```cpp
#include <Rcpp.h>   // Way to bundle C++ with R; replace with <iostream> in plain C++
#include <stdint.h>

// [[Rcpp::export]]
void sizeme() {
  std::cout << "Size of double: " << sizeof(double) << std::endl;
  std::cout << "Size of long double: " << sizeof(long double) << std::endl;

  std::cout << "Size of float: " << sizeof(float) << std::endl;

  std::cout << "Size of int: " << sizeof(int) << std::endl;
  std::cout << "Size of long int: " << sizeof(long int) << std::endl;
  std::cout << "Size of long: " << sizeof(long) << std::endl;

  std::cout << "Size of int32_t: " << sizeof(int32_t) << std::endl;
}

/*** R
sizeme()
*/
```


Under gcc on Windows, data types* have the following values:

- Size of double: 8
- Size of long double: 16
- Size of float: 4
- Size of int: 4
- Size of long int: 4
- Size of long: 4
- Size of int32_t: 4

Under clang on OS X, data types* have the following values:

- Size of double: 8
- Size of long double: 16
- Size of float: 4
- Size of int: 4
- Size of long int: 8
- Size of long: 8
- Size of int32_t: 4

\* Sizes above given in bytes.

Notice the difference in the number of bytes allocated to the long int type: 4 on Windows versus 8 on OS X. At long last, it was becoming clear what the issue was and what the solution would be.

# Solution

Use portable integer types!

| Type | Description | Value Range $[\min, \max]$ |
|------|-------------|----------------------------|
| `int8_t` | 8-bit signed integer | $[-2^{7}, 2^{7} - 1]$ |
| `uint8_t` | 8-bit unsigned integer | $[0, 2^{8} - 1]$ |
| `int16_t` | 16-bit signed integer | $[-2^{15}, 2^{15} - 1]$ |
| `uint16_t` | 16-bit unsigned integer | $[0, 2^{16} - 1]$ |
| `int32_t` | 32-bit signed integer | $[-2^{31}, 2^{31} - 1]$ |
| `uint32_t` | 32-bit unsigned integer | $[0, 2^{32} - 1]$ |
| `int64_t` | 64-bit signed integer | $[-2^{63}, 2^{63} - 1]$ |
| `uint64_t` | 64-bit unsigned integer | $[0, 2^{64} - 1]$ |

These primitive data types are guaranteed to have the same size in bytes across platforms. However, depending on the C++ compiler, they may or may not be declared! Since the world has largely moved on from the era of C++98 (except for R), this is less likely, but I would be remiss if I didn’t mention it as a potential future source of error.

Specifically, I ended up using int32_t in place of long to ensure the read buffer matched the 4-byte integers in the file on every platform.