31 Aug 2016
Erlang Supervision Trees - This presentation was made for the meetup on Erlang supervision trees, which was arranged to discuss and understand how Erlang supervision trees help bring fault tolerance, recovery and robustness to Erlang applications. The presentation covers the following:
Basics - A supervisor is responsible for starting, stopping and monitoring its child processes. The basic idea of a supervisor is that it has to keep its child processes alive by restarting them when necessary.
Supervision Trees - Supervisors can supervise workers or other supervisors, forming a supervision tree; workers should appear only as leaf nodes under a supervisor.
Supervision Strategy - A supervision strategy consists of two steps: first forming the supervision tree, then choosing a restart strategy at each level that defines what happens when a child dies, which may affect the other children as well.
Complete examples - Complete examples of supervision trees from open source projects like RabbitMQ and ejabberd, to get some understanding of how supervisors are used in the real world.
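The restart strategies above can be sketched outside Erlang as well. Below is a minimal, language-neutral Python sketch of the one_for_one versus one_for_all semantics; the class and method names are purely illustrative, not part of any Erlang API (real supervisors are declared via the OTP supervisor behaviour).

```python
# Illustrative sketch of supervisor restart strategies (not real Erlang).

class Supervisor:
    def __init__(self, strategy, child_specs):
        self.strategy = strategy          # "one_for_one" or "one_for_all"
        self.specs = child_specs          # name -> start function
        self.children = {name: start() for name, start in child_specs.items()}
        self.restarts = []                # log of restarted child names

    def child_died(self, name):
        if self.strategy == "one_for_one":
            # Restart only the child that died.
            self.children[name] = self.specs[name]()
            self.restarts.append(name)
        elif self.strategy == "one_for_all":
            # Restart every child when any one of them dies.
            for n, start in self.specs.items():
                self.children[n] = start()
                self.restarts.append(n)

sup = Supervisor("one_for_all", {"db": lambda: "db-proc", "cache": lambda: "cache-proc"})
sup.child_died("db")
print(sup.restarts)  # one_for_all restarts both children
```

In real Erlang code the same decision is expressed declaratively in the supervisor's init/1 callback, and the runtime handles the monitoring and restarting.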
Here is the presentation for the Erlang Supervision Trees
Here is the github repository for the basic and advanced examples of Erlang supervision tree
25 Jun 2016
I have been working with Erlang since 2010, building a scalable ad server, and have built some depth on the topic; I wanted to meet other people working on Erlang-based development.
- Erlang Latest Version - This was the first Erlang meetup I hosted; more than 20 developers joined, with a lot of interesting discussion around:
- Extended Time Functionality
- SSL & SSH Improvements
- License Change
- Performance & Scalability
Here is the Erlang latest version & open source projects presentation and a snap of the event, which was hosted at the Bizense (my first startup) office on 23rd Apr, 2016.
- Erlang Build Tools - This presentation was made for the meetup on Erlang build tools, which discussed the different build tools available in Erlang with simple examples for each. Here is a summary of the build tools discussed:
- Emakefile - A make utility for Erlang, providing a set of functions similar to UNIX-style make. It is packaged with the Erlang distribution and is the default build tool.
- Erlang.mk - An include file for GNU Make; including it in a Makefile allows building the project, building/fetching dependencies and more.
- rebar & rebar3 - rebar is a self-contained Erlang script, easy to distribute and embed in a project, and provides dependency management; version 3.x has a lot of improvements over 2.x.
- Mix - Mix is a command line utility that manages Elixir projects but can be used for managing Erlang projects as well.
Here is the presentation for the Erlang Build Tools
- Erlang Supervision Trees - This was the last Erlang meetup I hosted; here is the blog post for it. The Erlang community in India is very small, and I met and came to know many excellent Erlang developers during these meetups.
Here is the presentation for the Erlang Supervision Trees
I have been associated with frontend development for a long time, since my college days even before jQuery became popular. Our product was built using the AngularJS framework, and I got an opportunity to discuss how it works with other developers during a meetup event.
AngularJS Anatomy & Directives - at the JS Meetup Event
Presented basic and advanced code examples for AngularJS along with an example on directives. The presentation is quite detailed, as it was created for training employees on frontend development in my previous company, covering web development evolution & concepts, AngularJS anatomy, demos, directives, testing & debugging. The presentation talked about the following:
- Evolution - An overview of the evolution of web application development over the last six years, from 2011 to 2016.
- Concepts - Angular combines a lot of good programming concepts to create an effective & powerful web application development framework.
- Anatomy - Anatomy of an AngularJS application - understanding the structure of an Angular app with basic & advanced examples.
- Demos - Get familiar with a wide range of demos on different aspects of AngularJS as well as complete apps and open source projects.
- Directives - Extend HTML with directives for your application specific functionality abstracted into a reusable & clean interface.
- Testing & Debugging - How AngularJS helps in creating testable application and what tools are available for testing & debugging your app.
Here is the AngularJS Anatomy & Directives presentation and a snap of the event, which was hosted at the Calm.io office on 18th Jun, 2016.
Another interesting meetup I attended this month was the Open API Meetup from AWS, showing how to integrate with APIs from Exotel, Freshdesk & Reverie. There were a lot of insights about the power of APIs from three companies that have already solved very different, real-world problems.
02 Sep 2015
Elasticsearch provides an extremely robust platform for building custom analytics applications through flexible aggregations. However, as data grows into the 100 GB range, query performance starts to degrade, so a mechanism like the lambda architecture is needed to keep queries fast (especially on fields with high cardinality) without increasing the infrastructure cost.
We used a similar architecture and design for a mobile advertising product, Adatrix. The raw data and batch views (created through background jobs) were both stored in Elasticsearch in different indexes. The UI used only the batch views; the speed layer was not implemented, as the requirement for real-time statistics was not strong, although since we were using Aerospike, an in-memory database, the speed layer could be implemented whenever a requirement came up.
Cardinality is an important factor in Elasticsearch query performance, and fields with high cardinality seem to suffer the most degradation. In raw data there is a very high possibility of an id field (like a session id or message id) having high cardinality, and statistics would have to be built on those fields.
A few important considerations for creating batch views:
breaking batch views by time (hourly, daily, weekly, etc.) helps keep the cardinality of these fields reasonable for queries that run on raw data
breaking batch views by dimension (the master dimensions included when creating the batch view; for example, if your data has attributes like location, device, advertiser and publisher, you can select some of those to create batch views, and the rest of the dimensions will be flattened)
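The two ideas above can be sketched in a few lines of Python (the event fields and the hourly bucketing are illustrative): raw events are rolled into a batch view keyed by a time bucket plus a chosen subset of dimensions, while the remaining dimensions are flattened away.

```python
from collections import defaultdict

# Raw events; the field names here are illustrative.
events = [
    {"ts": "2015-09-01T10:05", "location": "IN", "device": "mobile", "clicks": 1},
    {"ts": "2015-09-01T10:40", "location": "IN", "device": "desktop", "clicks": 2},
    {"ts": "2015-09-01T11:10", "location": "US", "device": "mobile", "clicks": 1},
]

def build_batch_view(events, dimensions):
    """Aggregate events into an hourly batch view over the chosen
    dimensions; all other dimensions are flattened (dropped)."""
    view = defaultdict(int)
    for e in events:
        hour = e["ts"][:13]                       # bucket by hour
        key = (hour,) + tuple(e[d] for d in dimensions)
        view[key] += e["clicks"]
    return dict(view)

# Keep only "location"; "device" is flattened into the totals.
print(build_batch_view(events, ["location"]))
# {('2015-09-01T10', 'IN'): 3, ('2015-09-01T11', 'US'): 1}
```

The hourly keys keep per-bucket cardinality small, and selecting only a few master dimensions keeps the number of distinct keys manageable.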
A few important considerations for queries:
Direct queries on raw data (should not be done on very large data sets)
Rolling up of summaries or batch views
Batch views limit the kinds of queries that can be run (as dimensions are flattened, cross-dimension queries are not possible)
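The roll-up point can be made concrete with a small sketch (the view shape is illustrative): additive metrics in hourly batch views sum cleanly into daily figures, but a dimension that was flattened when the view was built can never be queried again from that view.

```python
from collections import defaultdict

# An hourly batch view keyed by (hour, location); "device" was
# flattened when the view was built, so queries by device are impossible.
hourly_view = {
    ("2015-09-01T10", "IN"): 3,
    ("2015-09-01T11", "IN"): 2,
    ("2015-09-01T11", "US"): 1,
}

def roll_up_daily(hourly_view):
    """Roll additive metrics up the time axis: sum hours into days."""
    daily = defaultdict(int)
    for (hour, location), clicks in hourly_view.items():
        day = hour[:10]                 # "2015-09-01"
        daily[(day, location)] += clicks
    return dict(daily)

print(roll_up_daily(hourly_view))
# {('2015-09-01', 'IN'): 5, ('2015-09-01', 'US'): 1}
```

This is why the choice of dimensions at batch-view creation time effectively fixes the query vocabulary for that view.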
Also, for storing raw data in Elasticsearch, a few important optimizations are required:
keeping only the inverted index and not the raw documents (removing _source storage)
using non-analyzed keyword fields instead of raw text fields
optimizing Elasticsearch storage (removing _all, and keeping the cardinality of fields reasonable where possible)
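For the Elasticsearch versions current at the time of writing (1.x/2.x), the optimizations above correspond roughly to mapping settings like the following; the index type and field names are illustrative.

```json
{
  "mappings": {
    "raw_events": {
      "_source": { "enabled": false },
      "_all":    { "enabled": false },
      "properties": {
        "session_id": { "type": "string", "index": "not_analyzed" },
        "location":   { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

Note that disabling _source makes reindexing and document retrieval impossible, so it should only be done when the raw documents are retained elsewhere.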
Lastly, one important consideration is finding unique values for high-cardinality fields like audience. Unique counts cannot be rolled up; for example, the unique audience per day can't be added to find uniques per week. So an approximate algorithm like HyperLogLog has to be used to compute unique statistics for high-cardinality fields.
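The roll-up problem with uniques is easy to demonstrate with sets (a minimal sketch; in Elasticsearch the HyperLogLog-based cardinality aggregation serves this purpose at scale):

```python
# Daily unique audiences: the same user may appear on multiple days.
monday  = {"u1", "u2", "u3"}
tuesday = {"u2", "u3", "u4"}

# Adding the daily unique counts overstates the weekly uniques...
naive_weekly = len(monday) + len(tuesday)   # 3 + 3 = 6
# ...because the true weekly count must deduplicate across days.
true_weekly = len(monday | tuesday)         # {u1, u2, u3, u4} -> 4

print(naive_weekly, true_weekly)
```

Keeping exact sets per bucket is too expensive for high-cardinality fields, which is why a mergeable approximate structure like HyperLogLog is used instead: its per-bucket sketches can be combined, unlike plain counts.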
Background - Lambda Architecture
Once data generation goes above 3 GB per day, you are entering the big-data use case, and the most popular architecture for this kind of workload is the Lambda Architecture, defined by Nathan Marz of Apache Storm (contributed by Twitter); more details are provided at the end. Briefly, the lambda architecture is a layered architecture with a speed layer (creating real-time views over recent data), a batch layer (storing raw data and creating batch views over older data) and a serving layer (answering queries by combining the batch views and real-time views).
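The serving layer's job can be sketched as follows (the names and data shapes are illustrative, not from any specific framework). The key assumption of the lambda architecture is that the real-time view covers only data arriving after the last batch run, so merging the two views never double-counts.

```python
# Batch views cover everything up to the last completed batch run;
# the real-time (speed-layer) view covers events arriving since then.
batch_view    = {"2015-09-01": 120, "2015-09-02": 95}   # per-day clicks
realtime_view = {"2015-09-02": 7, "2015-09-03": 12}     # not yet batched

def serve(day, batch_view, realtime_view):
    """Serving layer: merge the batch view (older data) with the
    real-time view (recent data). By construction the two views
    cover disjoint event sets, so simple addition is correct."""
    return batch_view.get(day, 0) + realtime_view.get(day, 0)

print(serve("2015-09-02", batch_view, realtime_view))  # 95 + 7 = 102
print(serve("2015-09-03", batch_view, realtime_view))  # 0 + 12 = 12
```

When the next batch run completes, the newly batched days move into the batch view and are dropped from the real-time view.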
Interesting video from Yieldbot (an advertising company)
Other important links
10 Sep 2011
Accurate attribution has become an increasingly important aspect of digital advertising, because users are being reached through multiple channels and touchpoints. To determine the ROI from a particular channel or touchpoint, it is extremely important that returns are attributed correctly. This not only helps in better understanding the results of campaigns already executed but also provides great insight and direction for media planning of future campaigns.
The concept of the purchase funnel, which starts with creating awareness, then generating interest and later invoking desire, finally resulting in an action, has not been built into digital advertising systems (which mostly depend on the last-click model). As these systems mature, attribution reporting and modeling are being added, which will help in accurate ROI calculation.
Attribution reporting is basically retroactive reporting that helps compare the contribution of each type of touchpoint and ad event, while attribution modeling is more like a proactive "what if" analysis that helps optimize how ad events occur based on the attribution model. For example, a campaign creating awareness for a newly launched product will give the most importance to the last impression, with the weight decaying for older impressions. When a conversion happens, attribution is given to all the touchpoints on the path to conversion, based on the attribution model designed.
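The decay example can be made concrete with a small sketch. The geometric decay used here is one common weighting choice, not a claim about how any particular system (including Adatrix) implements it; the touchpoint names are illustrative.

```python
def decay_attribution(touchpoints, decay=0.5):
    """Split conversion credit across touchpoints, giving the most
    weight to the last (most recent) event and geometrically less
    to older ones. `touchpoints` is ordered oldest -> newest."""
    n = len(touchpoints)
    raw = [decay ** (n - 1 - i) for i in range(n)]  # newest gets weight 1
    total = sum(raw)
    return {tp: w / total for tp, w in zip(touchpoints, raw)}

# Path to conversion: two impressions followed by a click.
credit = decay_attribution(["impression_1", "impression_2", "click"])
print(credit)
# The click gets the largest share; older impressions get progressively less.
```

The weights always sum to 1, so the full conversion is distributed across the path; a last-click model is just the degenerate case where all weight goes to the final event.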
Attribution modeling can also help with better bidding in RTB systems, as the bid will be determined using the attribution model. It can also provide optimization opportunities based on consumer responses. But given the impact it has, the modeling has to be done with caution. With Adatrix, we are building attribution reporting and providing the ability to specify simple attribution models. In future, we will bring attribution modeling into the bidding process and trend-based optimization.
I have created the first feature presentation on attribution reporting & modeling below. Will convert it into a reveal.js presentation sometime.
07 Sep 2011
Last week, during discussions on integrations with exchanges, DSPs, etc., we were curious to know how these have evolved and what makes each unique in the digital advertising landscape. Below is, briefly, my understanding of the various jargon being used in online advertising today.
Considering the general workflow, advertisers hire agencies (for expertise) to spend their money; these agencies have buying desks which have relationships/partnerships with different entities on the supply side. Two things have evolved over the last three years, mostly surrounding real-time bidding.
Buying desks, along with their traditional buying relationships and partnerships, now have something called Automated Trading Desks, which are similar to DSPs but are fully owned by agencies. Most automated trading desks run on technology either licensed from another company or gained through acquisition. The video highlights the conflict that arises when a technology company works as an agency or vice versa. Strangely, this conflict has not occurred so far with Google's display network.
Ad networks or publisher networks started out with the simple model of combining publisher sites into verticals, but due to the abundance of networks and lack of differentiation, these networks have re-branded themselves as DSPs and SSPs. Even now, many of them don't have a technology platform but provide a combination of licensed technology with the inventory they had earlier. Many of these will now evolve into private exchanges (providing the benefits of real-time bidding along with the inventory they bring), which seems to be the next logical step. The private exchange concept is picking up to increase the spending of direct buys through the real-time bidding model. Exchanges were mostly used for remnant stuff, not just in terms of inventory left over but also in terms of money left over after premium buys.
Some links that bring clarity on these are provided below. Most of the links are interesting reads, each giving a slightly different perspective and argument.
- DSP – Demand side platform – providing integrations with exchanges and ad networks to buy with RTB Ex: AdChemy, X+1, Media Math, DataXu
- Exchange – RTB – Real Time Bidding – providing auction capability across different kinds of systems (DSP, SSP, Networks) Ex: DoubleClick, Right Media
- SSP – Supply side platform – providing integrations with exchanges and ad networks to sell with RTB Ex: Rubicon, AdMeld, Pubmatic
- Networks – Used to refer to ad networks, which are basically publisher networks – thousands of such networks exist