Ilkka Peltola

Posts

Showing posts from April, 2021

How to access AWS S3 with pyspark locally using AWS profiles tutorial

- April 14, 2021

At Zervant, we currently use Databricks for our ETL processes, and it works well for us. However, it has been difficult to set up scripts that work both locally and on the Databricks cloud. Specifically, Databricks uses its own proprietary libraries to connect to AWS S3, based on AWS Hadoop 2.7. That version does not support access through AWS profiles. Internally, we use SSO to create temporary credentials for an AWS profile that then assumes a role, so reading the ACCESS_ID and ACCESS_SECRET from the .credentials file is something we don't want to do.

To use the profile instead, we need to set two Hadoop configurations on the Spark context: fs.s3a.aws.credentials.provider, with the value com.amazonaws.auth.profile.ProfileCredentialsProvider. This is done by running this line of code:

sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")

Note! You need to set your environment var...
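As a rough sketch of how that one-liner might be wrapped for reuse in a local script, the helper below takes an existing SparkContext and applies the setting from the excerpt. The function name `use_profile_credentials` and the two constant names are made up for illustration; only the configuration key and the provider class come from the post. It also assumes the S3A connector and the AWS SDK are already on Spark's classpath, which the excerpt does not cover.

```python
# Key and value as given in the post; the constant names are hypothetical.
S3A_PROVIDER_KEY = "fs.s3a.aws.credentials.provider"
PROFILE_PROVIDER = "com.amazonaws.auth.profile.ProfileCredentialsProvider"


def use_profile_credentials(sc):
    """Tell the S3A connector to resolve credentials from an AWS profile.

    `sc` is expected to be a pyspark SparkContext. `_jsc` is its underlying
    Java context, the same private attribute the post's one-liner uses.
    Returns the Hadoop configuration object so the caller can inspect it.
    """
    hadoop_conf = sc._jsc.hadoopConfiguration()
    hadoop_conf.set(S3A_PROVIDER_KEY, PROFILE_PROVIDER)
    return hadoop_conf


# Possible usage in a local script (requires pyspark to be installed):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("s3-profile-demo").getOrCreate()
#   conf = use_profile_credentials(spark.sparkContext)
#   print(conf.get(S3A_PROVIDER_KEY))
```

Keeping the setting in one small function means a script can call it only when running locally and skip it on Databricks, where the platform's own S3 libraries are used.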

