Data Mart - User Sessions
NOTE***: Certain information has been redacted/changed to protect proprietary information
While at Intuit, I created a data mart which measured user sessions in specific parts of the product using SQL. By creating an automated job using Github and SuperGlue, any analyst could use the model hosted in the AWS data lake.
The development of this data mart came from the need to understand user behavior within the product on a deeper level. Past analyses had shown that the more time users spent in the product (controlling for things like complexity & user proficiency) the more sub-optimal the interaction was. Analyzing overall interaction length could only take us so far; we needed granular metrics by each widget, which we felt a user session data mart would provide.
The behavior in the product is mostly tracked via clickstream, which produces hundreds of millions, and sometimes billions, of data points. The main difficulty of creating sessions in this example is the necessity of chronology. We need to know each time a user enters and exits a certain part of the product & when they do it. In this case, order matters.
Since order matters, the main way to develop sessions is through window functions. Using functions such as row_number(), lead() & lag(), we can flag each time a User enters and exits the area & the timestamps related to it. The final output is a table with every session that occurs in that area of the platform, it's start and end, the user who experienced it, and other relevant dimensions.
Want to dig into the redacted code yourself? Head on over to my Github repo!