Easy Computer Learning : Google Buffer Protocol V2

Introduction

Data exchange formats are standards used to transmit data between applications running on a variety of platforms across the internet. For a long time, XML and JSON were the leaders. However, those formats lack some programming constructs that developers tend to rely on, such as ENUMs and methods (OOP compliant). In addition, human-readable formats (like JSON) are slower to parse and quite large compared to binary formats; these reasons led Google to release Protocol Buffers.

Google Protocol Buffer (Protobuf)

Google Protocol Buffer (also known as protobuf) was developed by Google as a language-neutral, platform-neutral data exchange format. Its goal is to represent complex structures and transmit them in a more efficient, reliable and safer way across different programming languages like Python and C++. Protobuf is seen as a better and more extensible format compared with XML or JSON :

It has support for ENUMs
Generated methods are used to get (and set) fields and to serialize (and deserialize) structures.
According to Google's documentation, it is 20 to 100 times faster and 3 to 10 times smaller than traditional data exchange formats (like XML)
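
To get a feel for the size difference, here is a minimal sketch that hand-encodes a two-field record in the protobuf wire format and compares it with the same record as JSON. The helper names (encode_varint, encode_string_field, encode_int_field) are made up for this illustration and are not part of the protobuf library; the field names are borrowed from the EmployeeInformation example later in this post. It only handles non-negative integers and strings, and a single tiny record says nothing about the general speed or size ratios.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag (wire type 2 = length-delimited), then the length, then the UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag (wire type 0 = varint), then the value itself as a varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


record = {"employee_name": "Ada", "employee_id": 42}

# Field numbers 1 and 2 stand in for the field names on the wire.
binary = encode_string_field(1, record["employee_name"]) + encode_int_field(2, record["employee_id"])
as_json = json.dumps(record).encode("utf-8")

print("protobuf-style bytes:", len(binary))
print("JSON bytes:", len(as_json))
```

Most of the saving in this sketch comes from sending field numbers instead of field names.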

Generating Google Buffer Format

Only three steps are required to serialize your data to protobuf compliant format :

Describe your data in a .proto file (the example in the picture above shows a class representing car information).
Use the Protocol Buffer compiler (called protoc) to generate classes that can get/set and serialize/deserialize the data defined in the .proto file
Include generated classes in your code and have fun

Note : protoc names the generated classes after the messages defined in the .proto file (in our example, the class name is CarInfo).

General Syntax Overview

A proto file looks much like a standard C structure (with only some additional specifications). Let's look at an example :

syntax = "proto2"; // syntax version, declared first in the file ("proto3" - protobuf version 3 exists also)

package com.company.test; // the package declaration is optional

message EmployeeInformation{
  required string employee_name = 1;
  required int32  employee_id = 2;
  optional bool gender = 3;
  repeated float last_three_month_salary = 4;
}

We will analyse this message step by step (see schematic below which serves as an introduction).

Data type

As one can see, familiar data types (string, bool, int32, float, etc.) are used for the message fields (employee_name, employee_id, etc.).
Needless to say, other data types are available : the full list can be found at https://developers.google.com/protocol-buffers/docs/proto#simple

Field modifiers

To understand them better, we can compare those modifiers to some regular expression concepts :

Modifier	Regular expression	Meaning
required	{1}	mandatory field that cannot be empty (otherwise protobuf throws an error when serializing or deserializing)
optional	? or {0,1}	field can be empty and not set by user
repeated	* or {0,}	can have zero or more values (used to create arrays)
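
The table above can be read as a rule on how many values a field may carry. Here is a minimal sketch of that rule in Python; CARDINALITY and count_is_valid are hypothetical names chosen for this illustration, not something protobuf exposes.

```python
# Allowed number of values per modifier as (minimum, maximum).
# A maximum of None means "no upper bound".
CARDINALITY = {
    "required": (1, 1),     # like {1}
    "optional": (0, 1),     # like ? or {0,1}
    "repeated": (0, None),  # like * or {0,}
}


def count_is_valid(modifier: str, count: int) -> bool:
    """Return True if a field with this modifier may hold `count` values."""
    minimum, maximum = CARDINALITY[modifier]
    if count < minimum:
        return False
    return maximum is None or count <= maximum


print(count_is_valid("required", 0))  # a missing required field is rejected
print(count_is_valid("optional", 0))  # an unset optional field is fine
print(count_is_valid("repeated", 3))  # e.g. last_three_month_salary with three entries
```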

Field identifiers

Every field must be associated with an identifier (field number) that is unique within its message. It is this number, and not the field name, that protobuf serializes and deserializes. The field's (wire) type and its identifier are encoded together. However, the number of bytes needed to represent them (field type + identifier) depends on the identifier's value. When the identifier is less than 16, only one byte is required, so reserve those values for the most frequently transmitted fields.
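
The one-byte limit follows from how the type and identifier are packed: the wire format stores (identifier << 3) | wire_type as a varint, which holds 7 useful bits per byte. Below is a small sketch of that arithmetic; tag_size_in_bytes is a hypothetical helper written for this post, not a protobuf API.

```python
def tag_size_in_bytes(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes needed for a field's tag (identifier plus wire type).

    The 3 low bits hold the wire type, the remaining bits hold the identifier,
    and a varint carries 7 bits of that value per byte.
    """
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    if not 0 <= wire_type <= 7:
        raise ValueError("the wire type must fit in 3 bits")
    tag_value = (field_number << 3) | wire_type
    return (tag_value.bit_length() + 6) // 7  # ceiling of bit_length / 7


for number in (1, 15, 16, 2047, 2048):
    print(number, "->", tag_size_in_bytes(number), "byte(s)")
```

Identifiers 1 to 15 fit in one byte because 15 << 3 still leaves the value below 128; 16 is the first identifier that spills into a second byte.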

Default values

Default values can be assigned to fields declared as "optional". If the user does not set an optional field, its default value is used. If no default value was declared, protobuf implicitly assigns one as follows :

Field type	Value (absence of default value)
numeric (int, float, double, ..., etc)	0
string	empty
ENUM	first value in ENUM
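
The lookup order described above (value set by the user, then declared default, then implicit default) can be sketched as follows. IMPLICIT_DEFAULTS and effective_value are hypothetical names for this illustration; the generated protobuf classes handle this internally.

```python
_UNSET = object()  # sentinel meaning "nothing was provided"

# Implicit defaults from the table above (only a few scalar types listed).
IMPLICIT_DEFAULTS = {"int32": 0, "float": 0.0, "double": 0.0, "string": ""}


def effective_value(field_type, set_value=_UNSET, declared_default=_UNSET, enum_values=()):
    """Return the value a reader would see for an optional field."""
    if set_value is not _UNSET:
        return set_value                  # 1. the user set the field
    if declared_default is not _UNSET:
        return declared_default           # 2. [default = ...] in the .proto file
    if enum_values:
        return enum_values[0]             # 3a. enum: first declared value
    return IMPLICIT_DEFAULTS[field_type]  # 3b. scalar: zero or empty


print(effective_value("string", declared_default="No Profession"))
print(effective_value("int32"))
print(effective_value("enum", enum_values=("SEDAN", "COUPE")))
```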

Nested messages

Multiple messages can be nested like C structures as shown in the example below :

message AirCraft{
    required string aircraft_model = 1;
    required double aircraft_max_speed = 2;
    
    message TransmissionAndDiscovery{
        required string transmission_frequency_technology_version = 1;
        required double transmission_frequency = 2;
        required double transmission_frequency_range = 3;
        
    }

    required bool aircraft_stealthy = 3;
}

Remark : Field identifiers only need to be unique within their own message, so numbering starts over inside a nested message: transmission_frequency_technology_version = 1 and not 3

Miscellaneous concepts

Syntax version

Two different versions of the language are used in the wild (version 2 is more dominant). The version is declared on the first line of the file (syntax = "proto2" or syntax = "proto3").

Package name

The package name is translated by protoc (the protobuf compiler) into a namespace in the generated classes in order to avoid naming conflicts (this field is optional but highly recommended).
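
For C++ output, each dot in the package name becomes a nested namespace. The sketch below only illustrates that mapping; cpp_qualified_name is a hypothetical helper, not something protoc provides.

```python
def cpp_qualified_name(package: str, message: str) -> str:
    """Join a dotted proto package and a message name the way C++ scopes them."""
    namespaces = package.split(".") if package else []
    return "::".join(namespaces + [message])


print(cpp_qualified_name("com.company.bodyMassIndex", "BodyMassIndex"))
# prints com::company::bodyMassIndex::BodyMassIndex
```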

Naming conventions

Message name : use camel case and capitalize the first letter of every word, for instance :
- message teslaCarfeatures ==> Wrong
- message TeslaCarFeatures ==> Correct
Field name : use lower case words separated by underscores
- required int32 chargingSpeed ==> Wrong
- required int32 charging_speed ==> Correct
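
These conventions are easy to check mechanically. Here is a rough sketch using regular expressions; the patterns and function names are my own approximation of the rules above, not an official linter. A pattern cannot know where words begin, so a name such as TeslaCarfeatures would still pass.

```python
import re

# One or more words, each starting with a capital letter: TeslaCarFeatures
MESSAGE_NAME = re.compile(r"(?:[A-Z][a-z0-9]*)+")
# Lower case words joined by single underscores: charging_speed
FIELD_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_message_name(name: str) -> bool:
    return MESSAGE_NAME.fullmatch(name) is not None


def is_valid_field_name(name: str) -> bool:
    return FIELD_NAME.fullmatch(name) is not None


for candidate in ("teslaCarfeatures", "TeslaCarFeatures"):
    print("message", candidate, "==>", "Correct" if is_valid_message_name(candidate) else "Wrong")
for candidate in ("chargingSpeed", "charging_speed"):
    print("field", candidate, "==>", "Correct" if is_valid_field_name(candidate) else "Wrong")
```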

Practical Protobuf

As we have already said, Google Protocol Buffer can be used with various programming languages (C++, Go, Dart, and more). We will demonstrate its usage with two widespread languages, C++ and Python.

Protobuf in PYTHON

In this section, we are going to use Google Protocol Buffer to serialize the data required to identify a given person and save it to a file. Another process will then parse the file and read the data back for display.

Describing the data using protocol buffer format :

syntax = "proto2";

package com.company.humainIdentity;

message HumainIdentityDescription{
    required string humain_first_name = 1;
    required string humain_last_name = 2;
    required int32 humain_age = 3; 
    required bool humain_gender = 4;
    optional string humain_profession = 5 [default="No Profession"];
}

Save the file as humainIdentity.proto (any other name such as example.proto works too, but the generated module name, humainIdentity_pb2 below, follows the file name).

Generating classes using protoc (the Protocol Buffer compiler)

protoc --python_out=. humainIdentity.proto

Use generated classes in Python code

serializer.py :

import humainIdentity_pb2  # import the module generated by protoc


humainId = humainIdentity_pb2.HumainIdentityDescription()  # Create a HumainIdentityDescription object
humainId.humain_first_name = "Jugurtha"  # Set first name
humainId.humain_last_name = "BELKALEM"  # Set last name
humainId.humain_age = 26  # Set age
humainId.humain_gender = True  # Set gender (True means male, see deserializer.py)
humainId.humain_profession = "Embedded System Engineer"  # Set profession (this field is optional, so it can be left unset)


humainIdSerialized = humainId.SerializeToString()  # Serialize the message to bytes using the SerializeToString method

print(humainIdSerialized)  # display the serialized data

# Save the serialized data to a file; the with block closes the file automatically
with open("humain_id", "wb") as fileOutputStream:
    fileOutputStream.write(humainIdSerialized)

deserializer.py

import humainIdentity_pb2  # import the module generated by protoc (required to deserialize)


def getGender(humainGender):
    if humainGender:
        return "Male"
    else:
        return "Female"


def parseDeserializedData(hId):
    print("Full Name : " + hId.humain_first_name + " " + hId.humain_last_name)
    print("Gender : " + getGender(hId.humain_gender))
    print("Age : " + str(hId.humain_age))
    print("Profession : "+ hId.humain_profession)

humainId = humainIdentity_pb2.HumainIdentityDescription()  # Create an object instance

# Read the serialized data from the file; the with block closes the file automatically
with open("humain_id", "rb") as fileInputStream:
    humainIdSerialized = fileInputStream.read()

humainId.ParseFromString(humainIdSerialized)  # deserialize the data using the ParseFromString method

parseDeserializedData(humainId)  # display the deserialized data

Executing the above programs yields the following :

Protobuf in C++

In this example we are going to gather the information necessary to compute the BMI (Body Mass Index) of an individual, save it to a file and read it back to calculate the BMI.

Protocol buffer file specification :

syntax = "proto2";

package com.company.bodyMassIndex;


message BodyMassIndex{
    required float height = 1; 
    required float weight = 2;
}

Generate C++ classes :
```
protoc --cpp_out=. bodyMassIndex.proto
```
In the case of C++, a header (bodyMassIndex.pb.h) and a class implementation (bodyMassIndex.pb.cc) are generated.

Use generated classes in your code :

serializer.cpp :

#include <iostream>
#include <fstream>
#include <string>
#include "bodyMassIndex.pb.h"
using namespace std;


int main(){
    GOOGLE_PROTOBUF_VERIFY_VERSION; // recommended by Google to make sure that a compatible protobuf library is loaded

    // package com.company.bodyMassIndex maps to the C++ namespace com::company::bodyMassIndex
    com::company::bodyMassIndex::BodyMassIndex bmi; // Create an instance of BodyMassIndex

    bmi.set_height(1.75); // Set the height
    bmi.set_weight(60); // Set the weight

    // Create an output file stream (in order to save the BodyMassIndex instance)
    fstream outFileStream("bmi", ios::out | ios::trunc | ios::binary);

    // Serialize bmi and save it; report an error if the file or the serialization fails
    if (!outFileStream || !bmi.SerializeToOstream(&outFileStream)) {
        cerr << "file error" << endl;
        return 1;
    }
    google::protobuf::ShutdownProtobufLibrary(); // free all resources
    return 0;
}

deserializer.cpp :

#include <iostream>
#include <fstream>
#include <string>
#include "bodyMassIndex.pb.h"
using namespace std;


int main(){
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    // package com.company.bodyMassIndex maps to the C++ namespace com::company::bodyMassIndex
    com::company::bodyMassIndex::BodyMassIndex bmi;

    // Create an input stream (to read the serialized data from a file)
    fstream inFileStream("bmi", ios::in | ios::binary);

    // Deserialize the data and load it into bmi; stop here if the file or the parsing fails
    if (!inFileStream || !bmi.ParseFromIstream(&inFileStream)) {
        cerr << "file error" << endl;
        return 1;
    }

    cout << "Height : " << bmi.height() << " ==> Weight : " << bmi.weight() << endl; // display deserialized data

    cout << "BMI = " << bmi.weight() / (bmi.height() * bmi.height()) << endl; // bmi = (weight / (height * height))
    google::protobuf::ShutdownProtobufLibrary();
    return 0;
}

The above programs produce the following output :


Saturday, December 22, 2018
