
Erlang Supervision Trees

This is a post written in the past and brought here with minor changes only.

Erlang Supervision Trees - This presentation was made for the meetup on Erlang supervision trees, which was arranged to discuss and understand how supervision trees bring fault tolerance, recovery and robustness to Erlang applications. The presentation covers the following:

  • Basics - A supervisor is responsible for starting, stopping and monitoring its child processes. The basic idea of a supervisor is that it has to keep its child processes alive by restarting them when necessary.

  • Supervision Trees - Supervisors can supervise workers (leaf nodes) or other supervisors, forming a supervision tree; workers should only ever sit under a supervisor.

  • Supervision Strategy - A supervision strategy consists of two steps: first forming the supervision tree, and then choosing a restart strategy at each level to follow when a child dies, which may affect the other children as well (a minimal supervisor sketch follows this list).

  • Complete examples - Complete supervision trees from open source projects such as RabbitMQ and ejabberd, to get an understanding of how supervisors are used in the real world.
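As a minimal sketch of the points above (the child modules demo_worker and demo_sub_sup are hypothetical names, not from the presentation), a supervisor declares its restart strategy and child specifications in init/1:

```erlang
-module(demo_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Restart strategy: one_for_one restarts only the child that died;
    %% give up if more than 5 restarts happen within 10 seconds.
    SupFlags = #{strategy => one_for_one,
                 intensity => 5,
                 period => 10},
    %% One worker (a leaf node) and one nested supervisor, forming a small tree.
    Children = [#{id => demo_worker,
                  start => {demo_worker, start_link, []},
                  type => worker},
                #{id => demo_sub_sup,
                  start => {demo_sub_sup, start_link, []},
                  type => supervisor}],
    {ok, {SupFlags, Children}}.
```

Swapping one_for_one for one_for_all or rest_for_one changes which siblings are restarted when a child dies, which is exactly the second step of the supervision strategy described above.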

Here is the presentation for the Erlang Supervision Trees

https://www.slideshare.net/digikrit/erlang-supervision-trees

Here is the GitHub repository with basic and advanced examples of Erlang supervision trees

https://github.com/agrawalakhil/erlang-supervision-trees

A Few Interesting Meetups - Erlang & JavaScript

This is a post written in the past and brought here with minor changes only.

This month has been a little busy: a lot of networking with companies at the incubation center, organizing the second Erlang meetup (on build tools), and speaking on AngularJS at a JavaScript meetup. These meetups were good from different perspectives; here are the details of the presentations prepared for these events.

Erlang Meetups - Bangalore Erlang-OTP-ians

I have been working with Erlang since 2010, building a scalable ad server, and have developed some depth on the topic, so I wanted to meet other people working on Erlang-based development.

  • Erlang Latest Version - This was the first Erlang meetup I hosted; more than 20 developers joined, and there was a lot of interesting discussion around:
    • Extended Time Functionality
    • SSL & SSH Improvements
    • License Change
    • Performance & Scalability

    Here is the Erlang latest version & open source projects presentation and a snap of the event, which was hosted at the Bizense (my first startup) office on 23rd Apr, 2016.

  • Erlang Build Tools - This presentation was made for the meetup on Erlang build tools, covering the different build tools available for Erlang with simple examples of each (a minimal configuration sketch follows after this list). Here is a summary of the tools discussed:
    • Emakefile - A make utility for Erlang providing a set of functions similar to Unix-style make. It is packaged with the Erlang/OTP distribution and is the default build tool.
    • Erlang.mk - An include file for GNU Make; including it in a Makefile lets you build the project, build/fetch dependencies and more.
    • rebar & rebar3 - rebar is a self-contained Erlang script that is easy to distribute and embed in a project and provides dependency management; version 3.x has many improvements over 2.x.
    • Mix - A command line utility that manages Elixir projects but can be used to manage Erlang projects as well.

    Here is the presentation for the Erlang Build Tools

  • Erlang Supervision Trees - This was the last Erlang meetup I hosted; here is the blog post for it. The Erlang community in India is very small, and I met and came to know many excellent Erlang developers during these meetups. Here is the presentation for the Erlang Supervision Trees
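To make the build-tools item above a little more concrete: the default Emakefile is just a file of Erlang terms read by erl -make, and rebar.config uses the same term syntax. A minimal sketch (the options shown are illustrative, not from the presentation):

```erlang
%% Emakefile - read by `erl -make`; compiles every module under src/
%% into ebin/ with debug information.
{'src/*', [debug_info, {outdir, "ebin"}]}.
```

A rebar3 project expresses the same thing, plus dependency management, in rebar.config using the same Erlang-term syntax, for example {erl_opts, [debug_info]}. and {deps, [...]}.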

JavaScript

I have been associated with frontend development for a long time, since my college days even before jQuery became popular. Our product was built using the AngularJS framework, and I got an opportunity to discuss how it works with other developers during a meetup event.

  • AngularJS Anatomy & Directives - at the JS Meetup Event

    I presented basic and advanced code examples for AngularJS along with an example on directives. The presentation is much more detailed than the talk itself, as it was created for training employees on front-end development at my previous company, covering web development evolution & concepts, AngularJS anatomy, demos, directives, and testing & debugging. The presentation covered the following:

    • Evolution - An overview of how web application development evolved over the six years from 2011 to 2016.
    • Concepts - AngularJS combines a lot of good programming concepts to create an effective & powerful web application development framework.
    • Anatomy - The structure of an AngularJS application, explained with basic & advanced examples.
    • Demos - Get familiar with a wide range of demos on different aspects of AngularJS as well as complete apps and open source projects.
    • Directives - Extend HTML with directives so that application-specific functionality is abstracted into a clean, reusable interface.
    • Testing & Debugging - How AngularJS helps in creating testable applications and what tools are available for testing & debugging your app.

    Here is the AngularJS Anatomy & Directives presentation and a snap of the event, which was hosted at the Calm.io office on 18th Jun, 2016.

There were a lot of interesting technical discussions with developers of varied backgrounds and experience. Our goal in the coming months is to build interest among developers in the open source projects we are working on, and to keep the discussions on Erlang & JavaScript going deeper.

Another interesting meetup I attended this month was the Open API Meetup from AWS, showing how to integrate with APIs from Exotel, Freshdesk & Reverie. There were a lot of insights into the power of APIs from three companies that have each solved very different but real-world problems.

Lambda Architecture On Elasticsearch

This is a blog post written in the past and brought here with minor changes only. There will be another version with more insights coming out soon.

Elasticsearch provides an extremely robust platform for building custom analytics applications through its flexible aggregations. However, as data grows into the 100 GB range, query performance starts to degrade, so a mechanism like the lambda architecture is needed to keep queries fast (especially on fields with high cardinality) without increasing infrastructure cost.

We used a similar architecture and design for a mobile advertising product, Adatrix. The raw data and the batch views (created through background jobs) were both stored in Elasticsearch, in different indexes. The UI used only the batch views, and the speed layer was not implemented since there was little requirement for real-time statistics; as we were already using the in-memory database Aerospike, the speed layer could be implemented whenever the requirement came up.

Cardinality is an important factor in Elasticsearch query performance, and fields with high cardinality seem to suffer the most degradation. In raw data, there is a very high chance of an id field (like session id or message id) having high cardinality, and statistics often have to be built on exactly those fields.

A few important considerations for creating batch views:

  1. Breaking batch views up by time (hourly, daily, weekly, etc.) helps keep the cardinality of these fields reasonable compared to queries that run on the raw data.

  2. Breaking batch views up by dimension: only the master dimensions are included when creating the batch view. For example, if your data has attributes like location, device, advertiser and publisher, you can select some of those to create batch views, and the rest of the dimensions are flattened away (see the sketch after this list).
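As a sketch of what such a batch-view job might send to Elasticsearch (field names like timestamp, advertiser and impressions are examples, the aggregation syntax follows the Elasticsearch versions of that era, and JSON encoding/transport via something like jsx and httpc is left out), the aggregation below rolls raw events up into an hourly view keyed by advertiser only:

```erlang
-module(batch_views).
-export([hourly_rollup_query/0]).

%% Elasticsearch aggregation body, written as an Erlang map mirroring the
%% JSON DSL: bucket raw events per hour, then per advertiser, and sum the
%% impressions. Dimensions that are not listed (device, location, ...) are
%% flattened away in the resulting batch view.
hourly_rollup_query() ->
    #{<<"size">> => 0,
      <<"aggs">> =>
          #{<<"per_hour">> =>
                #{<<"date_histogram">> =>
                      #{<<"field">> => <<"timestamp">>,
                        <<"interval">> => <<"hour">>},
                  <<"aggs">> =>
                      #{<<"by_advertiser">> =>
                            #{<<"terms">> => #{<<"field">> => <<"advertiser">>},
                              <<"aggs">> =>
                                  #{<<"impressions">> =>
                                        #{<<"sum">> => #{<<"field">> => <<"impressions">>}}}}}}}}.
```

The buckets returned by this query are then written to a separate batch-view index, which is what the UI actually queries.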

A few important considerations for queries:

  • Direct queries on raw data (should not be run on very large data sets).

  • Rolling up summaries or batch views.

  • Batch views limit the kinds of queries that can be run (since dimensions are flattened, cross-dimension queries are not possible, as illustrated in the sketch below).
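Continuing the batch_views sketch above, a query over the resulting daily batch views only needs to sum pre-aggregated counts over the dimensions that survived flattening (field names like day, advertiser and impressions are again examples):

```erlang
%% Weekly totals per advertiser, computed from daily batch-view documents
%% that carry a `day` date field and a pre-aggregated `impressions` count.
%% A breakdown by a flattened dimension (say advertiser x device) is no
%% longer possible here, because that field does not exist in the view.
weekly_totals_query() ->
    #{<<"size">> => 0,
      <<"query">> =>
          #{<<"range">> => #{<<"day">> => #{<<"gte">> => <<"now-7d/d">>}}},
      <<"aggs">> =>
          #{<<"by_advertiser">> =>
                #{<<"terms">> => #{<<"field">> => <<"advertiser">>},
                  <<"aggs">> =>
                      #{<<"impressions">> =>
                            #{<<"sum">> => #{<<"field">> => <<"impressions">>}}}}}}.
```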

Also, for storing raw data in Elasticsearch, a few important optimizations are required (a mapping sketch follows this list):

  • Only keeping the inverted index and not the raw documents (disabling _source storage).

  • Keyword (not analyzed) fields instead of analyzed text fields.

  • Optimizing Elasticsearch storage (disabling _all and keeping the cardinality of fields reasonable, where possible).
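A raw-data index mapping with these optimizations applied might look like the sketch below (the mapping syntax roughly follows Elasticsearch 5.x, and the field names are examples):

```erlang
%% Mapping for the raw events index: _source and _all are disabled so only
%% the inverted index is kept, and id-like fields are stored as
%% non-analyzed keyword fields.
raw_index_mapping() ->
    #{<<"mappings">> =>
          #{<<"event">> =>
                #{<<"_source">> => #{<<"enabled">> => false},
                  <<"_all">> => #{<<"enabled">> => false},
                  <<"properties">> =>
                      #{<<"session_id">> => #{<<"type">> => <<"keyword">>},
                        <<"advertiser">> => #{<<"type">> => <<"keyword">>},
                        <<"timestamp">> => #{<<"type">> => <<"date">>}}}}}.
```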

Lastly, one important consideration is counting unique values for high-cardinality fields like audience. Unique counts cannot be rolled up: for example, the unique audience per day cannot simply be added up to get the unique audience per week. So an approximate algorithm like HyperLogLog has to be used to compute unique statistics for high-cardinality fields.
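Elasticsearch exposes exactly this through its cardinality aggregation, which is based on HyperLogLog++. A sketch (the field name is an example):

```erlang
%% Approximate distinct count of audience_id; precision_threshold trades
%% memory for accuracy of the HyperLogLog++ estimate.
unique_audience_query() ->
    #{<<"size">> => 0,
      <<"aggs">> =>
          #{<<"unique_audience">> =>
                #{<<"cardinality">> =>
                      #{<<"field">> => <<"audience_id">>,
                        <<"precision_threshold">> => 40000}}}}.
```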

Background - Lambda Architecture

Once data generation goes above roughly 3 GB per day, you are moving into big-data territory, and the most popular architecture for this kind of workload is the Lambda Architecture defined by Nathan Marz, the creator of Apache Storm (open-sourced by Twitter); more details are provided at the end. Briefly, the lambda architecture is a layered architecture with a speed layer (creating real-time views over recent data), a batch layer (storing the raw data and creating batch views over older data) and a serving layer (which answers queries by combining the batch views and the real-time views).
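A serving-layer sketch, assuming both views are simple maps of key to count: the answer to a query is just the precomputed batch totals merged with the real-time deltas.

```erlang
%% Combine batch-layer totals (older data) with speed-layer totals
%% (events that arrived after the last batch run).
serve(BatchView, RealtimeView) ->
    maps:fold(fun(Key, Count, Acc) ->
                      maps:update_with(Key, fun(C) -> C + Count end, Count, Acc)
              end,
              BatchView,
              RealtimeView).
```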

Interesting video from Yieldbot (an advertising company)

Other important links

Attribution Reporting - Beyond Last Touch Point

This is a post written in the past and brought here with minor changes only.

Introduction

Accurate attribution has become an increasingly important aspect of digital advertising because users are reached through multiple channels and touchpoints. To determine the ROI of a particular channel or touchpoint, it is extremely important that returns are attributed correctly. This not only gives a better understanding of the results of campaigns already executed, but also provides insight and direction for the media planning of future campaigns.

The concept of the purchase funnel, which starts with creating awareness, then generating interest, then invoking desire, and finally results in an action, has not been built into digital advertising systems (which mostly depend on a last-click model). As these systems mature, attribution reporting and modeling are being added, which will help with accurate ROI calculation.

Attribution reporting is essentially retroactive reporting that compares the contribution of each type of touchpoint and ad event, while attribution modeling is more of a proactive "what if" analysis that helps optimize how ad events occur based on an attribution model. For example, a campaign creating awareness for a newly launched product will give the most importance to the last impression, with importance decaying for older impressions. When a conversion happens, credit is given to all the touchpoints on the path to conversion according to the attribution model designed.
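As a small sketch of such a time-decay model (the half-life parameterisation and normalisation below are my own illustration, not a description of Adatrix): each touchpoint's credit halves for every half-life period between the touchpoint and the conversion, and the weights are scaled to sum to 1.

```erlang
-module(attribution).
-export([time_decay_weights/2]).

%% AgesInDays: ages of the touchpoints on the conversion path, in days
%% before the conversion. HalfLifeDays: a touchpoint's credit halves after
%% this many days. Returns normalised credit shares that sum to 1.0.
time_decay_weights(AgesInDays, HalfLifeDays) ->
    Raw = [math:pow(0.5, Age / HalfLifeDays) || Age <- AgesInDays],
    Total = lists:sum(Raw),
    [W / Total || W <- Raw].
```

For example, time_decay_weights([0, 2, 7], 7) gives the most recent impression the largest share of the conversion credit and the week-old impression the smallest.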

Attribution modeling can also improve bidding in RTB systems, since the bid can be determined using the attribution model, and it can surface optimization opportunities based on consumer responses. But given the impact it has, the modeling must be done with caution. With Adatrix, we are building attribution reporting and providing the ability to specify simple attribution models. In the future, we will bring attribution modeling into the bidding process and into trend-based optimization.

Presentation

I have created the first feature presentation on attribution reporting & modeling, linked below. I will convert it into a reveal.js presentation sometime.

https://www.slideshare.net/bizense/attribution-reporting-and-modeling

How Have DSPs, Exchanges and SSPs Evolved?

This is a post from the past and brought here with minor changes only.

Background

Last week, during discussions on integrations with exchanges, DSPs and so on, we were curious about how these have evolved and what makes each of them unique in the digital advertising landscape. Below, briefly, is my understanding of the various jargon used in online advertising today.

In the general workflow, advertisers hire agencies (for their expertise) to spend their money; these agencies have buying desks which have relationships/partnerships with different entities on the supply side. Two things have evolved over the last three years, mostly around real-time bidding.

Buying desks, along with their traditional buying relationships and partnerships, now have something called automated trading desks, which are similar to DSPs but are fully owned by the agencies. Most automated trading desks run on technology that is either licensed from another company or came through an acquisition. The video highlights the conflict that arises when a technology company works as an agency or vice versa; strangely, this conflict has not yet surfaced with Google's display network.

Ad networks (publisher networks) started out with the simple model of grouping publisher sites into verticals, but due to the abundance of networks and the lack of differentiation, these networks have rebranded themselves as DSPs and SSPs. Even now, many of them do not have a technology platform of their own, but offer licensed technology combined with the inventory they had earlier. Many of these will now evolve into private exchanges (providing the benefits of real-time bidding along with the inventory they bring), which seems to be the next logical step. The private exchange concept is picking up as a way to move more direct-buy spending through the real-time bidding model. Exchanges were mostly used for remnant business, not just in terms of inventory left over but also money left over after premium buys.

Some links that bring clarity on these topics are provided. Most of them are interesting to read, each giving a slightly different perspective and argument.

Definitions

  • DSP – Demand-side platform – provides integrations with exchanges and ad networks to buy with RTB. Ex: AdChemy, X+1, Media Math, DataXu
  • Exchange – RTB – Real-time bidding – provides auction capability across different kinds of systems (DSPs, SSPs, networks). Ex: DoubleClick, Right Media
  • SSP – Supply-side platform – provides integrations with exchanges and ad networks to sell with RTB. Ex: Rubicon, AdMeld, Pubmatic
  • Networks – Used to refer to ad networks, which are basically publisher networks – thousands of networks exist