Job variables store and retrieve metadata across job steps. They let you pass metadata from one step to another, collect logging information, or drive additional logic in the job.
Job variables are accessed through the etl.globals data structure, using a key such as etl.globals['start'].
Example of storing data:
<query connection-id="groovy">
    <![CDATA[
        etl.globals['start'] = new Date().getTime();
    ]]>
</query>
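Because this step has no nested elements, it does not call query.next(). A hedged alternative is a plain <script> element, which is how the job below runs its aggregation step. This is a minimal sketch that reuses the "groovy" connection; the 'count' key is only illustrative:

```xml
<script connection-id="groovy">
    <![CDATA[
        // Store a timestamp and initialize an illustrative counter for later steps
        etl.globals['start'] = new Date().getTime()
        etl.globals['count'] = 0
    ]]>
</script>
```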
Example of reading a stored value (note that etl.globals can also be used in the if condition of a job step):
<script connection-id="logger" if="etl.globals['logMe'] == true">
    ${etl.globals['start']}
</script>
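Values can also be read inside a Groovy step, for example to derive a new value from one stored earlier. A minimal sketch, assuming the 'start' key was set as in the storing example and reusing the "groovy" and "logger" connections; the 'elapsed' key is hypothetical:

```xml
<script connection-id="groovy">
    <![CDATA[
        // Read the stored 'start' timestamp and save the elapsed milliseconds under a hypothetical key
        etl.globals['elapsed'] = new Date().getTime() - etl.globals['start']
    ]]>
</script>
<script connection-id="logger">
    Elapsed: ${etl.globals['elapsed']}ms
</script>
```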
Below is an example of a job that imports rows from an XLSX file into a Gremlin graph connection, measures how long each insert takes, and writes the average duration as a JOB_LOG entry (through the logger connection) every 100 rows.
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
    <description>Import from XLSX to gremlin graph connection</description>
    <properties>
        file=/path/to/data.xlsx
    </properties>
    <connection id="gremlin" driver="graphConnection">
        project_id=1
    </connection>
    <connection id="groovy" driver="script">language=groovy</connection>
    <connection id="logger" driver="log">
        level=WARN
    </connection>
    <connection id="excel" driver="excel">
        format=XLSX
    </connection>
    <!-- Initialize measuring variables with groovy. -->
    <!-- Notice the use of 'query.next()': it is required to execute the nested queries/scripts -->
    <query connection-id="groovy">
        <![CDATA[
            etl.globals['sheet'] = sheet
            etl.globals['timeAgg'] = 0
            etl.globals['count'] = 0
            etl.globals['average'] = 0
            query.next()
        ]]>
        <!-- Process XLSX as usual -->
        <query connection-id="excel">
            path=$file
            sheet=${etl.globals['sheet']}
            skip_rows=1
            num_columns=15
            escapeChars='"
            <!-- Fetch start time -->
            <query connection-id="groovy">
                <![CDATA[
                    etl.globals['start'] = new Date().getTime();
                    query.next()
                ]]>
                <!-- Execute the given query -->
                <!-- Notice we use apostrophes in this query, so values in $row must not contain unescaped apostrophes -->
                <!-- For this reason we set escapeChars='" in the excel query above -->
                <script connection-id="gremlin">
                    g.addV('Imported').property('uid', '$row.A').property('name', '$row.B').id()
                </script>
            </query>
            <!-- Aggregate durations, increment execution counts -->
            <script connection-id="groovy">
                <![CDATA[
                    etl.globals['timeAgg'] = etl.globals['timeAgg'] + (new Date().getTime() - etl.globals['start'])
                    etl.globals['count'] = etl.globals['count'] + 1
                    etl.globals['average'] = etl.globals['timeAgg'] / etl.globals['count']
                    etl.globals['logMe'] = etl.globals['count'] % 100 == 0
                ]]>
            </script>
            <!-- Log the measured value when logMe is true (every 100th script call, as defined above) -->
            <script connection-id="logger" if="etl.globals['logMe'] == true">
                ${etl.globals['average']}ms
            </script>
        </query>
    </query>
</etl>
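Because the job relies on etl.globals keeping its values from one step to the next, the totals can also be logged once after the whole import. A minimal sketch, assuming this element is placed as the last child of <etl> in the job above (directly before </etl>) and reusing its "logger" connection; the message text is illustrative:

```xml
<!-- Log the totals once, after the import query has finished -->
<script connection-id="logger">
    Processed ${etl.globals['count']} rows, average ${etl.globals['average']}ms per row
</script>
```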