Introduction
Data exchange formats are standards used to transmit data between applications running on a variety of platforms across the internet. For a long time, XML and JSON were the leaders. However, those formats lack some constructs that developers tend to rely on, like ENUMs and methods (OOP compliant). In addition, human-readable formats (like JSON) are slower to parse and quite large in size compared to binary formats; these reasons led Google to release Google Protocol Buffer.
Google Protocol Buffer (Protobuf)
Google Protocol Buffer (also known as protobuf) was developed by Google as a language-neutral and platform-neutral data exchange format. Its goal is to represent complex structures and transmit them in a more efficient, reliable and safer way across different programming languages like Python and C++. Protobuf is seen as a better and more extensible format compared with XML or JSON :
- It has support for ENUMs
- Generated methods are used to get (and set) and serialize (and deserialize) structures.
- It is reported to be faster to parse (20 to 100 times) and smaller in size (3 to 10 times) than traditional data exchange formats (like XML)
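The size argument can be illustrated with a small sketch. It does not use the protobuf library: it encodes a two-field record by hand, following the protobuf wire format (varint tags, length-prefixed strings), and compares the result with the JSON text for the same record. The record contents and the helper names (encode_varint, encode_string_field, encode_int_field) are assumptions made for this illustration, and the ratio obtained for real messages depends on the data.

```python
import json

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag with wire type 2, then the length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload

def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0, then the value as a varint (non-negative values only)."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

# Hypothetical record used only for the comparison.
record = {"employee_name": "Alice", "employee_id": 1234}

binary = encode_string_field(1, record["employee_name"]) + encode_int_field(2, record["employee_id"])
text = json.dumps(record).encode("utf-8")

print(len(binary), "bytes in protobuf wire format")
print(len(text), "bytes as JSON")
```

Most of the saving comes from the field names: JSON repeats them in every record, while the binary form only carries the numeric field identifiers.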
Generating Protocol Buffer Data
Only three steps are required to serialize your data to a protobuf compliant format :
- Write your data description in a .proto file (the original example is a message describing car information).
- Use the Protocol Buffer compiler (called protoc) to generate classes which can get/set and serialize/deserialize the data defined in the .proto file
- Include the generated classes in your code and have fun
Note : protoc names the generated classes after the messages defined in the .proto file (for a message named CarInfo, the class name is CarInfo).
General Syntax Overview
A proto file looks much like a standard C structure (with only some additional specifications). Let's have an example :

    syntax = "proto2"; // syntax version (proto3, protobuf version 3, also exists)
    package com.company.test; // the package is optional

    message EmployeeInformation{
        required string employee_name = 1;
        required int32 employee_id = 2;
        optional bool gender = 3;
        repeated float last_three_month_salary = 4;
    }
Data type
As one can see, familiar data types (string, bool, int32, float, etc.) are used for the message fields (employee_name, employee_id, etc.).
Needless to say, other data types are possible : more can be found here https://developers.google.com/protocol-buffers/docs/proto#simple
Field modifiers
To understand them better, we can compare those modifiers to some regular expression concepts :
Modifier | Regular expression | Meaning |
---|---|---|
required | {1} | mandatory field that cannot be empty (otherwise protobuf throws an error when serializing or deserializing) |
optional | ? or {0,1} | field can be empty and not set by user |
repeated | * or {0,} | can have zero or more values (used to create arrays) |
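The analogy in the table can be taken literally. The sketch below is only an illustration of the comparison, not part of protobuf: it maps each modifier to the corresponding quantifier and checks how many occurrences of a field are acceptable. The names MODIFIER_PATTERNS and cardinality_ok are hypothetical.

```python
import re

# One regular expression per proto2 field modifier, taken from the table above.
# The letter "x" stands for one occurrence of the field.
MODIFIER_PATTERNS = {
    "required": r"x{1}",
    "optional": r"x?",
    "repeated": r"x*",
}

def cardinality_ok(modifier: str, count: int) -> bool:
    """Return True if a field with this modifier may hold `count` values."""
    return re.fullmatch(MODIFIER_PATTERNS[modifier], "x" * count) is not None

for modifier in MODIFIER_PATTERNS:
    allowed = [n for n in range(4) if cardinality_ok(modifier, n)]
    print(modifier, "accepts these counts (checked up to 3):", allowed)
```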
Field identifiers
Every field must be associated with an identifier that is unique within its message. This identifier (and not the field name) is what protobuf serializes and deserializes. The field type and the identifier are encoded together, and the number of bytes needed to represent them depends on the identifier's value : when it is less than 16, only one byte is required, so try to keep those identifiers for the most frequently transmitted fields.
Default values
Default values can be assigned to fields declared as "optional". If the user does not set an optional field, its default value is used. If no default value was provided, protobuf implicitly uses another value, as follows :
Field type | Value (absence of default value) |
---|---|
numeric (int, float, double, ..., etc) | 0 |
string | empty |
ENUM | first value in ENUM |
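The lookup order described above (value set by the user, then the declared default, then the implicit default) can be modelled in a few lines. This is a plain Python model of the rule, not the protobuf API; the function effective_value, the simplified type names and the enum values SEDAN and COUPE are assumptions made for the illustration.

```python
# Implicit defaults from the table above, keyed by a simplified type name.
IMPLICIT_DEFAULTS = {"numeric": 0, "string": ""}

_UNSET = object()  # sentinel meaning "nothing was provided"

def effective_value(field_type, user_value=_UNSET, declared_default=_UNSET, enum_values=()):
    """Model the lookup order for an optional field."""
    if user_value is not _UNSET:
        return user_value              # 1. value set by the user
    if declared_default is not _UNSET:
        return declared_default        # 2. default declared in the .proto file
    if field_type == "enum":
        return enum_values[0]          # 3a. first value listed in the enum
    return IMPLICIT_DEFAULTS[field_type]  # 3b. implicit default for the type

print(effective_value("string", user_value="pilot", declared_default="unknown"))
print(effective_value("string", declared_default="unknown"))
print(effective_value("numeric"))
print(effective_value("enum", enum_values=("SEDAN", "COUPE")))
```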
Nested messages
Multiple messages can be nested like C structures as shown in the example below :

    message AirCraft{
        required string aircraft_model = 1;
        required double aircraft_max_speed = 2;
        message TransmissionAndDiscovery{
            required string transmission_frequency_technology_version = 1;
            required double transmission_frequency = 2;
            required double transmission_frequency_range = 3;
        }
        required bool aircraft_stealthy = 3;
    }

Remark : Field identifiers are scoped to the message in which they are declared, so numbering starts again inside a nested message : transmission_frequency_technology_version = 1 and not 3
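Coming back to the cost of field identifiers, the one-byte rule for identifiers below 16 can be checked with a short calculation. The sketch assumes the usual protobuf key layout, where the identifier is shifted left by three bits to make room for the type and the result is stored as a varint carrying seven bits per byte. The function name tag_size is hypothetical.

```python
def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes needed to encode the key (identifier + type) of a field."""
    key = (field_number << 3) | wire_type
    # A varint stores 7 bits of the key per byte, so round the bit count up.
    return (key.bit_length() + 6) // 7

for number in (1, 15, 16, 2047, 2048):
    print("field identifier", number, "->", tag_size(number), "byte(s)")
```

Identifiers 1 to 15 fit in a single byte and identifiers 16 to 2047 need two, which is why the small ones are worth keeping for the fields that are sent most often.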
Miscellaneous concepts
Syntax version
Two different versions of the protocol are used in the wild (version 2 is more dominant). The file should declare which one it uses (typically syntax = "proto2" or syntax = "proto3"); when the declaration is omitted, protoc assumes proto2.
Package name
The package name is translated by protoc (the protocol buffer compiler) to a namespace in the generated classes in order to avoid naming conflicts (this field is optional but highly recommended).
Naming conventions
- Message name : use camel case with a capital letter at the start of every word, for instance :
    - message teslaCarfeatures ==> Wrong
    - message TeslaCarFeatures ==> Correct
- Field name : use lower case words separated by underscores, for instance :
    - required int32 chargingSpeed ==> Wrong
    - required int32 charging_speed ==> Correct
Practical Protobuf
As we have already said, Google Protocol Buffer can be used with various programming languages (C++, Go, Dart, and more). We can demonstrate its usage with two widespread languages, C++ and Python.
Protobuf in Python
In this section, we are going to use Google Protocol Buffer to serialize the data required to identify a given person and save it to a file. Another process will parse the file and read the data back for display.
- Describing the data using protocol buffer format :
Save the file as humainIdentity.proto (or any other name like example.proto).

    syntax = "proto2";
    package com.company.humainIdentity;

    message HumainIdentityDescription{
        required string humain_first_name = 1;
        required string humain_last_name = 2;
        required int32 humain_age = 3;
        required bool humain_gender = 4;
        optional string humain_profession = 5 [default="No Profession"];
    }
- Generating classes using protoc (the protocol buffer compiler) :

    protoc --python_out=. humainIdentity.proto

- Use generated classes in Python code
- serializer.py :

    import humainIdentity_pb2  # import generated class

    humainId = humainIdentity_pb2.HumainIdentityDescription()  # Create a HumainIdentityDescription object
    humainId.humain_first_name = "Jugurtha"  # Set first name
    humainId.humain_last_name = "BELKALEM"  # Set last name
    humainId.humain_age = 26  # Set age
    humainId.humain_gender = True  # Set gender
    humainId.humain_profession = "Embedded System Engineer"  # Set profession (this is optional, so it can be left unset)

    humainIdSerialized = humainId.SerializeToString()  # Serialize data using SerializeToString method
    print(humainIdSerialized)  # display serialized data

    with open("humain_id", "wb") as fileOutputStream:
        fileOutputStream.write(humainIdSerialized)  # save serialized data to a file

- deserializer.py :

    import humainIdentity_pb2  # import generated class (required to deserialize)

    def getGender(humainGender):
        if humainGender:
            return "Male"
        else:
            return "Female"

    def parseDeserializedData(hId):
        print("Full Name : " + hId.humain_first_name + " " + hId.humain_last_name)
        print("Gender : " + getGender(hId.humain_gender))
        print("Age : " + str(hId.humain_age))
        print("Profession : " + hId.humain_profession)

    humainId = humainIdentity_pb2.HumainIdentityDescription()  # Create object instance

    with open("humain_id", "rb") as fileInputStream:
        humainIdSerialized = fileInputStream.read()  # Read data from file

    humainId.ParseFromString(humainIdSerialized)  # deserialize data read from file using ParseFromString method
    parseDeserializedData(humainId)  # display deserialized data
Protobuf in C++
In this example we are going to gather the information needed to compute the BMI (Body Mass Index) of an individual, save it to a file and read it back to calculate the BMI.
- proto buffer file specification :

    syntax = "proto2";
    package com.company.bodyMassIndex;

    message BodyMassIndex{
        required float height = 1;
        required float weight = 2;
    }
- Generate C++ classes :
In the case of C++, a header (bodyMassIndex.pb.h) and a class implementation (bodyMassIndex.pb.cc) are generated.

    protoc --cpp_out=. bodyMassIndex.proto
- Use generated classes in your code :
- serializer.cpp :

    #include <iostream>
    #include <fstream>
    #include <string>
    #include "bodyMassIndex.pb.h"
    using namespace std;

    int main(){
        GOOGLE_PROTOBUF_VERIFY_VERSION; // recommended by Google to make sure that the correct protobuf library is loaded

        // The package com.company.bodyMassIndex becomes the C++ namespace com::company::bodyMassIndex
        com::company::bodyMassIndex::BodyMassIndex bmi; // Create an instance of BodyMassIndex
        bmi.set_height(1.75); // Set the height
        bmi.set_weight(60);   // Set the weight

        // Create an output file stream (in order to save the BodyMassIndex instance)
        fstream outFileStream("bmi", ios::out | ios::trunc | ios::binary);
        if (!outFileStream || !bmi.SerializeToOstream(&outFileStream)) { // Serialize bmi and save it
            cerr << "file error" << endl;
            return 1;
        }

        google::protobuf::ShutdownProtobufLibrary(); // free all resources
        return 0;
    }
- deserializer.cpp :

    #include <iostream>
    #include <fstream>
    #include <string>
    #include "bodyMassIndex.pb.h"
    using namespace std;

    int main(){
        GOOGLE_PROTOBUF_VERIFY_VERSION;

        // The package com.company.bodyMassIndex becomes the C++ namespace com::company::bodyMassIndex
        com::company::bodyMassIndex::BodyMassIndex bmi;

        // Create an input stream (to read serialized data from a file)
        fstream inFileStream("bmi", ios::in | ios::binary);
        if (!inFileStream || !bmi.ParseFromIstream(&inFileStream)) { // Deserialize the data and load it into bmi
            cerr << "file error" << endl;
            return 1; // do not print values that were never read
        }

        cout << "Height : " << bmi.height() << " ==> Weight : " << bmi.weight() << endl; // display deserialized data
        cout << "BMI = " << bmi.weight() / (bmi.height() * bmi.height()) << endl; // bmi = (weight / (height * height))

        google::protobuf::ShutdownProtobufLibrary();
        return 0;
    }
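As a cross-check written in Python, the sketch below builds by hand the bytes that the C++ serializer would be expected to write to the bmi file, then reads them back and computes the BMI. It assumes the standard wire encoding of a float field (a one-byte key with wire type 5 followed by a little-endian 32-bit value) and does not use the protobuf library; encode_float_field and decode_float_fields are hypothetical helper names.

```python
import struct

FLOAT_WIRE_TYPE = 5  # 32-bit fixed-width value

def encode_float_field(field_number: int, value: float) -> bytes:
    """One key byte (valid for identifiers below 16) plus a little-endian float32."""
    return bytes([(field_number << 3) | FLOAT_WIRE_TYPE]) + struct.pack("<f", value)

def decode_float_fields(data: bytes) -> dict:
    """Read back a sequence of float fields, keyed by field identifier."""
    fields = {}
    for offset in range(0, len(data), 5):  # 1 key byte + 4 value bytes per field
        key = data[offset]
        if key & 0x07 != FLOAT_WIRE_TYPE:
            raise ValueError("this sketch only handles float fields")
        (fields[key >> 3],) = struct.unpack_from("<f", data, offset + 1)
    return fields

# Same values as serializer.cpp: height = 1.75, weight = 60
payload = encode_float_field(1, 1.75) + encode_float_field(2, 60.0)
print(len(payload), "bytes:", payload.hex())

fields = decode_float_fields(payload)
height, weight = fields[1], fields[2]
print("BMI =", round(weight / (height * height), 4))
```

If the generated C++ code follows the same encoding, the bmi file should be 10 bytes long, which can be compared against the file produced by serializer.cpp.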