Protocol Buffers: not another markup language

We have seen quite a lot of message interchange formats: some good, some bad, and some of them very, very ugly.

My life with message interchange formats started with CSV. It was simple and easy to understand, but not really useful when the messages contained one-to-many relationships among their attributes. For better or for worse, a large number of people were using XML, and soon there was XML all over, not only for message interchange but for keeping configurations as well. Even though it might seem a bit confusing the first time, once you get familiar with the concepts of tags, properties, attributes and namespaces, you will be like a duck in water. Then one fine day JSON came into the picture, and YAML after it. In my area of work, JSON requests and responses flew to and fro for years.

Recently I came across a new and exciting way of sharing messages. Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data.

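As you know, the journey of a serialized message goes roughly like this: the source application serializes an object, the serialized message travels to the destination, and the destination application parses it back into an object.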

As you are already aware, the source application needs class definitions that determine the structure of the serialized JSON, and the destination application needs a class of a similar structure in order to parse it properly. We all know the pain of mapping issues.

Protocol buffers become interesting at this point, and that's why this is not just another markup language. Protobuf lets us create .proto files in which we specify the structure of the message that we are passing. A sample .proto file from the Google developer documentation is given below.

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

The proto file carries the type of each attribute, and each attribute is assigned a unique number. That number identifies the field in the serialized binary form of the message, so it should not be changed once the message type is in use.
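To make the role of those numbers a bit more concrete, here is a minimal sketch of how a field number ends up on the wire. In the protobuf binary encoding, every field is preceded by a tag built from the field number and a wire type. The class and method names below (FieldTagSketch, tag) are mine, purely for illustration; this is not code that protoc generates.

```java
// Sketch: how a field number from the .proto file becomes the tag on the wire.
// FieldTagSketch and tag() are illustrative names, not part of any protobuf API.
final class FieldTagSketch {
    // Two of the wire types defined by the protobuf encoding.
    static final int WIRE_VARINT = 0;           // int32, int64, bool, enum
    static final int WIRE_LENGTH_DELIMITED = 2; // string, bytes, nested messages

    // A tag is the field number shifted left by three bits, OR-ed with the wire type.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        // The three fields of the SearchRequest message shown above.
        System.out.printf("query           -> 0x%02X%n", tag(1, WIRE_LENGTH_DELIMITED));
        System.out.printf("page_number     -> 0x%02X%n", tag(2, WIRE_VARINT));
        System.out.printf("result_per_page -> 0x%02X%n", tag(3, WIRE_VARINT));
    }
}
```

Notice that the field names never appear in the tag. Only the numbers travel with the data, which is why the reader needs the same .proto definition to make sense of the bytes.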

An explanation of each component of the above example is available in the same Google developer documentation.

-“Okay, wait... how is this different from the other 10000 markup languages that are already available?”

-“Ah, these proto files can be compiled to classes in your favourite language.”

-“Wait, like JSON can be converted to POJOs?”

-“hmm… there is more.”

The protoc compiler is easy to install, and once you have the compiler and the proto file available, you can generate the classes as below.

protoc --java_out=. *.proto

For more options, refer to the protoc documentation.

If done correctly, the generated Java code is much more than a POJO: the class is quite elaborate, and it even comes with a builder that can be used to construct instances of the message.

Now comes the beauty of Protobuf: once compiled into application classes, the generated classes can be used like any other class and can be serialized. The serialized binary can be sent to any other platform and deserialized there into native classes generated from the same .proto file. The serialized messages are also much lighter than JSON or XML.
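One reason for the smaller size is that integers are written as varints: seven bits of payload per byte, with the high bit saying whether more bytes follow. The sketch below only counts bytes, so it is a rough illustration and not a benchmark. The class and method names (VarintSizeSketch, varintSize) are made up for this example.

```java
// Sketch: how many bytes an unsigned varint needs, compared with the same
// number written as decimal text (as it would appear in JSON).
// VarintSizeSketch and varintSize() are illustrative names only.
final class VarintSizeSketch {
    // Each varint byte carries 7 bits of the value; keep going while bits remain.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    public static void main(String[] args) {
        long[] samples = {1L, 300L, 5674622L};
        for (long v : samples) {
            System.out.println(v + ": " + varintSize(v) + " byte(s) as varint, "
                    + Long.toString(v).length() + " character(s) as text");
        }
    }
}
```

On top of that, JSON repeats every field name in every message, while Protobuf only writes a small numeric tag per field.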

Let's see it in action.

Let's create a Java Maven project and add the dependency below.

<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>3.11.0</version>
</dependency>

Now, let's add the Maven protoc plugin to compile the proto files.

<plugin>
  <groupId>com.github.os72</groupId>
  <artifactId>protoc-jar-maven-plugin</artifactId>
  <version>3.2.0.1</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <!-- <includeDirectories> <include>src/main/protobuf</include> </includeDirectories> -->
        <inputDirectories>
          <include>src/main/protobuf</include>
        </inputDirectories>
        <!-- Create java files. And put them in the src/main/java directory. -->
        <outputTargets>
          <outputTarget>
            <type>java</type>
            <outputDirectory>src/main/java</outputDirectory>
          </outputTarget>
        </outputTargets>
      </configuration>
    </execution>
  </executions>
</plugin>

Once the dependencies are resolved, we are good to go.

Let's use the following .proto content for this example.

syntax = "proto3";

package com.protobuff.examples;

message Account {
  int64 accoutNumber = 1;
  string accountHolderName = 2;
  enum AccountType {
    UNKNOWN_TYPE = 0;
    SAVINGS = 1;
    CURRENT = 2;
  }
  AccountType accountType = 3;
}

Place the .proto file in the src/main/protobuf directory and run the com.github.os72:protoc-jar-maven-plugin:3.2.0.1:run goal to generate the Java classes corresponding to the proto file.

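Source code from proto file

The source code from the proto file is written to the output directory configured in the plugin (src/main/java in the configuration above; when no outputDirectory is set, the plugin should fall back to a generated-sources directory under target).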

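Since the class is available now, we can put some data into it.

import java.io.FileOutputStream;
import java.io.IOException;

public static void main(String[] args) throws IOException {
    System.out.println("Protocol Buffers example");
    // Person is the generated outer class (by default named after the .proto file);
    // Account is the message defined above.
    Person.Account account = Person.Account.newBuilder()
            .setAccountHolderName("Amal")
            .setAccoutNumber(5674622)
            .setAccountType(Person.Account.AccountType.SAVINGS)
            .build();
    // try-with-resources closes the file even if the write fails.
    try (FileOutputStream output = new FileOutputStream("out.bin")) {
        // writeTo serializes the message in the protobuf binary format.
        account.writeTo(output);
    }
}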

Note that the enums are also created in Java now.

Once the code executes, the object will be serialized and written into the file out.bin. Now let us copy this file and feed it to another application on a different platform.
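If you are curious about what actually lands in out.bin, the sketch below hand-encodes the same three Account fields following the protobuf wire format rules (tag, then varint or length-prefixed bytes). It is my own illustration and not the generated code: the names AccountWireSketch, encodeAccount and toHex are made up, and it assumes every field holds a non-default value, since proto3 skips fields that are left at their defaults.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: hand-rolled encoding of the Account message, for illustration only.
final class AccountWireSketch {
    // Writes an unsigned varint: 7 bits per byte, high bit set while more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes the fields in field-number order; assumes all values are non-default.
    static byte[] encodeAccount(long accoutNumber, String accountHolderName, int accountType) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (1 << 3) | 0);            // field 1, wire type 0 (varint)
        writeVarint(out, accoutNumber);
        byte[] name = accountHolderName.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (2 << 3) | 2);            // field 2, wire type 2 (length-delimited)
        writeVarint(out, name.length);
        out.write(name, 0, name.length);
        writeVarint(out, (3 << 3) | 0);            // field 3, enum travels as a varint
        writeVarint(out, accountType);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeAccount(5674622L, "Amal", 1); // 1 = SAVINGS
        System.out.println(bytes.length + " bytes: " + toHex(bytes));
    }
}
```

For the sample account this comes to 13 bytes. You can compare the result with a hex dump of the real out.bin to check that the sketch matches what the library produced.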

Let's consider Go.

Copy the proto file into a Go workspace.

Generate the Go structs using the protoc compiler (the Go output needs the protoc-gen-go plugin to be installed).

protoc --go_out=. account.proto

The generated Go file will be available as account.pb.go.

Once the Account struct is available within the workspace, copy the out.bin file that the Java code wrote the account object to into the Go workspace.

The Go code reads the binary from the out.bin file and unmarshals it with proto.Unmarshal into the struct generated from the proto file. Note that enums will also be converted.
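Whatever the language, the unmarshalling step boils down to walking the bytes tag by tag. To keep the examples in one language, here is a hand-rolled Java sketch of that walk. It is not the Go program and not how the generated code is written. It only handles the two wire types the Account message uses, and the sample bytes are what I would expect out.bin to contain, derived by hand from the wire format rules instead of copied from a real run. The names WireDecodeSketch, decode and readVarint are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: decode a flat protobuf message into "fieldNumber=value" strings.
final class WireDecodeSketch {
    // Reads an unsigned varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Handles wire type 0 (varint) and 2 (length-delimited, read here as UTF-8 text).
    static List<String> decode(byte[] data) {
        List<String> fields = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            if (wireType == 0) {
                fields.add(fieldNumber + "=" + readVarint(data, pos));
            } else if (wireType == 2) {
                int length = (int) readVarint(data, pos);
                fields.add(fieldNumber + "=" + new String(data, pos[0], length, StandardCharsets.UTF_8));
                pos[0] += length;
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hand-derived bytes for Account{5674622, "Amal", SAVINGS}; an assumption, not a capture.
        byte[] sample = {0x08, (byte) 0xFE, (byte) 0xAC, (byte) 0xDA, 0x02,
                         0x12, 0x04, 'A', 'm', 'a', 'l', 0x18, 0x01};
        System.out.println(decode(sample));
    }
}
```

The decoder only recovers field numbers and raw values. Turning 1 into accoutNumber and 3 into SAVINGS is exactly what the classes generated from the .proto file add on top.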

The serialised data is really small compared to JSON and XML and the proto files are human readable.

According to the Google developer documentation:

You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Protocol buffers currently support generated code in Java, Python, Objective-C, and C++. With our new proto3 language version, you can also work with Dart, Go, Ruby, and C#, with more languages to come.