
Ilkka Peltola

Business, Data and Technology

Posts

Showing posts from June, 2018


A more holistic purpose of analytics


- June 04, 2018

Why do people make bad decisions? Mostly because of insufficient or wrong information, but not always. So what should be done about it?

ThumbmarkJS: A free, open source device fingerprinting JavaScript library for the web

- December 30, 2023

I needed a decent JavaScript fingerprinting library. I wanted something that was 'good enough': not crappy, but it didn't need to be perfect. I noticed the great FingerprintJS, but sadly, they changed their license to a paid one. Boo! What is a good alternative to FingerprintJS? There are alternatives out there, but to be honest, they all have faults. FingerprintJS is great, but they're monetizing their product in a way that I don't like. I might need hundreds of thousands of requests per month, but I can't pay thousands of dollars. It doesn't need to be perfect either, so I don't want to pay such a high premium. ImprintJS used to be a thing, but it has been archived for a few years already. ClientJS hasn't been updated for a few years either. It is promising, but I find it a little too complicated to extend, and I can't find any statistics on how good it is. BroprintJS is the new kid on the block and hats off for trying, but it's very lim...

Simple way to query Amazon Athena in python with boto3

- April 30, 2018

ETL takes time and it's a lot to maintain. Sometimes it breaks when you didn't expect a string to contain emojis. You might decide the transformation needs to be changed, which means you need to refresh all your data. So what can you do to avoid this?

How to access AWS S3 with pyspark locally using AWS profiles tutorial

- April 14, 2021

At Zervant, we currently use Databricks for our ETL processes, and it's quite great. However, it has been difficult to set up scripts that work both locally and on the Databricks cloud. Specifically, Databricks uses its own proprietary libraries to connect to AWS S3, based on AWS Hadoop 2.7. That version does not support access using AWS profiles. Internally, we use SSO to create temporary credentials for an AWS profile that then assumes a role. Therefore, reading the ACCESS_ID and ACCESS_SECRET from the .credentials file is something we don't want to do. To use the profile instead, we need to set a Hadoop configuration on the Spark context: the key fs.s3a.aws.credentials.provider gets the value com.amazonaws.auth.profile.ProfileCredentialsProvider. This is done by running this line of code: sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider") Note! You need to set your environment var...
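As a rough sketch, the configuration line quoted in the teaser above could be wrapped in a small helper so the same script can opt into profile-based credentials when it runs locally. The helper name use_profile_credentials is hypothetical and not from the original post; the configuration key and provider class are the ones quoted in the text. The sketch assumes an already-created SparkContext whose classpath includes the s3a connector and the AWS SDK, and it leaves out the environment variable step that the teaser cuts off.

```python
# Hypothetical helper (not from the original post): applies the Hadoop
# setting quoted in the text to an existing SparkContext.

S3A_PROVIDER_KEY = "fs.s3a.aws.credentials.provider"
PROFILE_PROVIDER = "com.amazonaws.auth.profile.ProfileCredentialsProvider"


def use_profile_credentials(spark_context):
    """Tell the s3a connector to resolve credentials from an AWS profile.

    `spark_context` is assumed to be a live pyspark SparkContext. The
    `_jsc` attribute is the underlying Java context, as in the post's
    one-liner. Returns the Hadoop configuration object for inspection.
    """
    hadoop_conf = spark_context._jsc.hadoopConfiguration()
    hadoop_conf.set(S3A_PROVIDER_KEY, PROFILE_PROVIDER)
    return hadoop_conf


# Example usage, assuming pyspark is installed and configured locally:
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   use_profile_credentials(spark.sparkContext)
```

Keeping the call in one function makes it easy to skip on Databricks, where the platform's own S3 libraries handle access, and apply it only for local runs.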
