Saturday, April 27, 2019

AWS Glue python ApplyMapping / apply_mapping example

The ApplyMapping class converts types and renames fields in your data. To apply the mapping, you need two things:
  1. A DynamicFrame (Glue's version of a dataframe)
  2. The mapping list
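
As a rough illustration of what the mapping list does, here is a plain-Python stand-in that renames and casts fields on dict records. The helper `apply_mapping_to_records`, the `CASTERS` table, and the sample fields are hypothetical; on Glue itself the call is `ApplyMapping.apply(frame=..., mappings=[...])`, with each mapping given as a (source field, source type, target field, target type) tuple.

```python
# Plain-Python stand-in for the idea behind ApplyMapping; not the Glue API.
# Each mapping is (source_field, source_type, target_field, target_type),
# the same 4-tuple shape that Glue's ApplyMapping expects.
MAPPINGS = [
    ("id", "string", "user_id", "long"),
    ("amount", "string", "amount", "double"),
    ("name", "string", "full_name", "string"),
]

# Hypothetical casters for a few Glue type names.
CASTERS = {"long": int, "int": int, "double": float, "string": str}


def apply_mapping_to_records(records, mappings):
    """Rename and cast the mapped fields; drop fields that are not mapped."""
    result = []
    for record in records:
        row = {}
        for source, _source_type, target, target_type in mappings:
            if source in record:
                row[target] = CASTERS[target_type](record[source])
        result.append(row)
    return result


rows = [{"id": "42", "amount": "9.5", "name": "Ada", "unused": "x"}]
mapped = apply_mapping_to_records(rows, MAPPINGS)
print(mapped)

# On Glue, the equivalent call would look like this (dyf is a DynamicFrame):
# from awsglue.transforms import ApplyMapping
# mapped_dyf = ApplyMapping.apply(frame=dyf, mappings=MAPPINGS)
```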

Friday, April 5, 2019

The Glue code that runs on AWS Glue and on Dev Endpoint

When you develop code for Glue with the Dev Endpoint, you soon get annoyed that the code is different in Glue and on the Dev Endpoint:
  • glueContext is created in a different manner
  • there's no concept of 'job' on dev endpoint, and therefore
  • no arguments for the job, either
So Mike from The MIS Theorist asked if there was a simpler way. And sure there is!
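
One possible way to smooth over the missing job arguments, sketched in plain Python. This is an assumption about the approach, not necessarily the one the post describes: `resolve_args` is a hypothetical helper that reads `--NAME value` pairs from the argument list and falls back to defaults when they are absent, as they would be on a Dev Endpoint. The default values, including the S3 path, are placeholders.

```python
import sys


def resolve_args(argv, defaults):
    """Return job arguments from '--NAME value' pairs, else the defaults.

    Hypothetical helper: on a Glue job the arguments arrive in sys.argv,
    while on a Dev Endpoint there are none, so the defaults are used.
    """
    args = dict(defaults)
    for i, token in enumerate(argv[:-1]):
        name = token[2:]
        if token.startswith("--") and name in defaults:
            args[name] = argv[i + 1]
    return args


# Same script, two environments: only the argument list differs.
defaults = {
    "JOB_NAME": "dev-endpoint-session",          # placeholder name
    "target_path": "s3://example-bucket/tmp/",   # placeholder path
}
args = resolve_args(sys.argv, defaults)
print(args["JOB_NAME"])
```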

Friday, March 22, 2019

AWS Glue, Dev Endpoint and Zeppelin Notebook

AWS Glue is quite a powerful tool. What I like about it is that it's managed: you don't need to take care of infrastructure yourself, but instead AWS hosts it for you. You can schedule scripts to run in the morning and your data will be in its right place by the time you get to work.

The downside is that developing scripts for AWS Glue is cumbersome, a real pain in the butt. I first tried to code the scripts through the console, but you end up waiting a long time only to realize you had a syntax error in your code.

Tuesday, October 2, 2018

Using survival plot to analyze churn in Power BI

I did not expect to be working with Kaplan-Meier survival plots so soon.

Analyzing churn and figuring out which kinds of users are more likely to churn is not easy. To calculate churn, you need a good volume of users that you can follow month over month: the number of active users per month and the share of them that is lost every month. But what if you want to select a different set of users, another segment? Producing an analytics cube with the necessary dimensions takes time. And if you end up with a segment that doesn't have high volumes every month, interpreting the results can be quite tricky.
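
To make the idea concrete, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. The function name and the sample data are illustrative assumptions: each user has a duration (months observed) and a flag telling whether the user churned at that point or was still active when observation ended (censored).

```python
def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time with a churn event.

    durations: months each user was observed
    churned:   True if the user churned at that time, False if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Users still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Users who actually churned at time t (censored users excluded).
        events = sum(1 for d, c in zip(durations, churned) if d == t and c)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
    return curve


# Illustrative data: six users, two of them still active (censored).
durations = [1, 2, 2, 3, 4, 4]
churned = [True, True, False, True, False, True]
for month, prob in kaplan_meier(durations, churned):
    print(month, round(prob, 3))
```

Censored users still count toward the at-risk group until they leave observation, which is why the estimate remains usable for segments without high volumes every month.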

Thursday, September 13, 2018

Create a funnel analysis tool with Redshift and Power BI in 5 minutes

If you're not collecting events from your product, get started right away!

Events are a great way to collect behavioral data on how your users use your product: what paths they take, what errors they encounter, how long something takes, etc. When you have events, there isn't a lot you cannot analyze.
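
As a small sketch of what event data makes possible, the following plain-Python example counts how many users reach each step of a funnel in order. The event names, the tuple layout, and the `funnel_counts` helper are assumptions for illustration; the post itself builds the funnel with Redshift and Power BI.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users who completed each funnel step in order.

    events: iterable of (user_id, timestamp, event_name)
    steps:  ordered list of event names that make up the funnel
    """
    by_user = defaultdict(list)
    for user_id, _timestamp, name in sorted(events, key=lambda e: e[1]):
        by_user[user_id].append(name)

    counts = [0] * len(steps)
    for names in by_user.values():
        position = 0  # index of the next step this user has to reach
        for name in names:
            if position < len(steps) and name == steps[position]:
                counts[position] += 1
                position += 1
    return dict(zip(steps, counts))


events = [
    ("u1", 1, "open_app"), ("u1", 2, "view_item"), ("u1", 3, "purchase"),
    ("u2", 1, "open_app"), ("u2", 2, "view_item"),
    ("u3", 1, "view_item"),  # never opened the app first, so not counted
]
result = funnel_counts(events, ["open_app", "view_item", "purchase"])
print(result)
```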

Thursday, September 6, 2018

How to ETL in Amazon AWS? AWS Glue for dummies

You can do ETL in AWS in a few different ways:

  1. Glue
  2. DataPipeline
  3. A custom solution, e.g. a Docker container

Monday, August 27, 2018

An insights strategy for winning companies

An executive summary

Companies struggle to gain maximum benefit from analytics and insights since

  1. Analytics is seen as a support function, not a business partner, and therefore is not prioritized highly enough
  2. Analytics is separated from business processes, and insights are produced away from execution
  3. The analytics stack is under-resourced and inflexible, which makes it slow to react to changing needs

Monday, June 4, 2018

A more holistic purpose of analytics

Why do people make bad decisions? Mostly because of insufficient or wrong information, but not always. So what should be done about it?

Monday, May 28, 2018

Simple Big Data setup on Amazon AWS

Everyone wants to do some big data stuff, right? In all honesty, no one cares if your data is big or small - size doesn't matter. What matters is your ability to take any size of data and generate understanding from it. At some point the data you are gathering might become inconvenient to process with more traditional tools. Big data tools might help you then - or not. The bottom line is, it is a tool you want to have in your toolbox.

Monday, April 30, 2018

Simple way to query Amazon Athena in python with boto3

ETL takes time and it's a lot to maintain. Sometimes it breaks when you didn't expect a string to contain emojis. You might decide the transformation needs to be changed, which means you need to refresh all your data. So what can you do to avoid this?
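
A minimal sketch of querying Athena from Python, assuming a boto3 Athena client and an S3 location for query results. The helper name `run_athena_query`, the database, and the bucket are placeholders; the client is passed in as a parameter so the polling logic can be read on its own.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait until it finishes, and return rows as lists of strings."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a final state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page of results is read in this sketch.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]


# Usage, assuming boto3 is installed and AWS credentials are configured:
# import boto3
# rows = run_athena_query(
#     boto3.client("athena"),
#     "SELECT 1",
#     database="default",
#     output_location="s3://example-bucket/athena-results/",  # placeholder
# )
```

For SELECT queries the first returned row holds the column names, so the data starts at the second row.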

Tuesday, February 13, 2018

Know exactly how much you pay to acquire any user: Python with Google API

So you've read about how to optimize your marketing efforts through data. With that, you should know the kind of users different marketing campaigns are bringing in. Some campaigns might be bringing more high-quality users than others. Can you do that now?
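
The core arithmetic can be sketched in a few lines: divide each campaign's spend by the number of users it brought in. The campaign names and numbers below are made up for illustration, and `cost_per_user` is a hypothetical helper; in practice the spend would come from the advertising API rather than a hard-coded dict.

```python
def cost_per_user(spend_by_campaign, users_by_campaign):
    """Return acquisition cost per user for each campaign with at least one user."""
    return {
        campaign: spend / users_by_campaign[campaign]
        for campaign, spend in spend_by_campaign.items()
        if users_by_campaign.get(campaign, 0) > 0
    }


# Made-up numbers: spend per campaign and users attributed to each campaign.
spend = {"search_brand": 500.0, "video_promo": 1200.0, "display_test": 300.0}
users = {"search_brand": 250, "video_promo": 300}
costs = cost_per_user(spend, users)
print(costs)
```

Campaigns with no attributed users are left out rather than reported with an infinite cost.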