The Glue code that runs on AWS Glue and on Dev Endpoint

- April 05, 2019

When you develop code for Glue with the Dev Endpoint, you soon get annoyed by the fact that the code has to be different on Glue and on the Dev Endpoint:

- glueContext is created in a different manner
- there's no concept of a 'job' on the Dev Endpoint, and therefore no arguments for the job, either

So Mike from The MIS Theorist asked if there was a simpler way. And sure there is!

Template boilerplate Glue code

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
# Some common stuff not needed in this boilerplate
#from pyspark.sql.functions import *
#from awsglue.dynamicframe import DynamicFrame
#from awsglue.transforms import *

_dev_ep = False
try:
    ## On the Dev Endpoint there's no JOB_NAME in sys.argv, so
    ## getResolvedOptions raises an exception here
    args = getResolvedOptions(sys.argv, ['JOB_NAME', 'DAYS'])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

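except Exception as e:
    ## Running on the Dev Endpoint: no job, so fall back to default arguments
    print("Exception:", e)
    _dev_ep = True
    ## Same keys as requested above; values are strings, just like on Glue
    args = {'JOB_NAME': 'Your Glue Job', 'DAYS': '1'}
    glueContext = GlueContext(SparkContext.getOrCreate())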

spark = glueContext.spark_session

## Do your thing after this line
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "your_database_name",
    table_name = "your_table_name")
datasource0.printSchema()

## Don't change the rest
if not _dev_ep:
    job.commit()

This makes developing Glue code easier, since you can copy-paste your development code directly into Glue and it still works.

Let me know if you found this helpful 👇🏻. Cheers! 🙃


Ilkka Peltola
