Friday, April 5, 2019

The Glue code that runs on AWS Glue and on Dev Endpoint


When you develop code for Glue with a Dev Endpoint, you soon get annoyed that the code has to differ between a Glue job and the Dev Endpoint:
  • glueContext is created in a different manner
  • there's no concept of 'job' on dev endpoint, and therefore
  • no arguments for the job, either
So Mike from The MIS Theorist asked if there was a simpler way. And sure there is!


Template boilerplate Glue code


import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
# Some common stuff not needed in this boilerplate
#from pyspark.sql.functions import *
#from awsglue.dynamicframe import DynamicFrame
#from awsglue.transforms import *

_dev_ep = False
try:
    ## On a Dev Endpoint there's no JOB_NAME in sys.argv, so getResolvedOptions raises an exception here
    args = getResolvedOptions(sys.argv, ['JOB_NAME', 'DAYS'])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

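except Exception as e:
    ## Dev Endpoint: no job to init, so fall back to default arguments
    print("Exception:", e)
    _dev_ep = True
    args = {'JOB_NAME': 'Your Glue Job', 'DAYS': '1'}
    ## Reuse the SparkContext that already exists on the Dev Endpoint
    glueContext = GlueContext(SparkContext.getOrCreate())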

spark = glueContext.spark_session

## Do your thing after this line
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "your_database_name",
    table_name = "your_table_name")
datasource0.printSchema()

## Don't change the rest
if not _dev_ep:
    job.commit()

This makes developing Glue code easier, since you can copy-paste your development code directly into Glue and it still works.
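
If you use this pattern in several scripts, the argument fallback from the template could be pulled into a small helper. The following is only a sketch: `resolve_args` and `fake_resolver` are hypothetical names that are not part of awsglue, and the resolver is passed in so the snippet runs without Glue installed. In a real script you would presumably pass `getResolvedOptions` and `sys.argv` instead.

```python
def resolve_args(resolver, argv, names, dev_defaults):
    """Return (args, is_dev_ep).

    `resolver` is assumed to behave like awsglue.utils.getResolvedOptions:
    called as resolver(argv, names), it returns a dict, or raises when a
    required argument such as JOB_NAME is missing.
    """
    try:
        return resolver(argv, names), False
    except Exception as e:
        # Same fallback as in the template: report the error, use the defaults.
        print("Falling back to dev defaults:", e)
        return dict(dev_defaults), True


# Hypothetical stand-in for getResolvedOptions, mimicking a Dev Endpoint
# where no job arguments were passed.
def fake_resolver(argv, names):
    raise ValueError("missing --JOB_NAME")


args, is_dev_ep = resolve_args(
    fake_resolver,
    ["script.py"],
    ["JOB_NAME", "DAYS"],
    {"JOB_NAME": "Your Glue Job", "DAYS": "1"},
)

# The default for DAYS is a string (as in the template), so convert it
# before doing arithmetic with it.
days = int(args["DAYS"])
```

The returned flag plays the same role as `_dev_ep` in the template: only create and commit the `Job` when it is `False`.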

Let me know if you found this helpful 👇🏻. Cheers! 🙃
