Required Permission: Jobs management (Read more about permissions in User Groups)

Groovy scripts can be used in ETL scripts so it's possible to write scripts to delete files on local disk (and other potentially dangerous tasks). Please be careful when writing scripts and with granting the Jobs management permission to users.

Table of contents

Graphlytic has an embedded instance of the Scriptella framework that is the foundation of the ETL engine used for managing automation jobs. The jobs are usually used for data import into a graph database, running queries and sending emails with data results (notifications in case a forbidden pattern is found in the graph, weekly reports, etc.).

ETL engine

ETL engine contains several drivers for databases: Postgresql, MySQL, MSSQL, Oracle, DB2, Derby, H2, HSQLDB, Sybase, XSL. Not all JDBC drivers are included in Graphlytic.

ETL engine contains other drivers: CSV, Neo4j, Mail, Groovy, Log, Text, XPath and more. ETL engine is configured in XML.

Using combination of driver you can create many scenarios. Here is a small overview of using scenarios which can be configured. Some examples are in Administration manual (see ETL job examples).

The Scriptella engine with several custom drivers is used as the ETL module in Graphlytic. Please see the reference documentation of Scriptella for details : http://scriptella.org/reference/

ETL engine contains several drivers for database connection and data manipulation (JDBC): Postgresql, MySQL, MSSQL, Oracle, DB2, Derby, H2, HSQLDB, Sybase, XSL. Not all JDBC drivers are included in Graphlytic and can be included in a custom build of Graphlytic. Please contact us for more information.

ETL engine contains also other drivers like CSV, Neo4j, Mail, Groovy, Log, Text, XPath and more. Using the combination of these drivers it's possible to create many automatization scenarios for data processing, pattern searching, logging or notifications.

ETL jobs are XML files that can be executed on demand, planned to execute at a defined time or executed periodically (with CRON like expressions).

See the examples here for more information.

Example 1: Load data from SQL database by SQL query and insert every record of result into CSV file.

Example 2: Load data from CSV into Neo4j using Cypher (LOAD CSV command).

Example 3: Load data from SQL database by SQL query, transform every record of result by Groovy (for example remove special characters) then insert every transformed record into CSV file. After this load data from CSV into Neo4j using Cypher (LOAD CSV command).

Example 4: Load data from Neo4j using Cypher query, transform every record of result by Groovy and append every record into result string. Send string as html mail to several recipients.

Create a new ETL job

  1. Main menu (top right) - Page Jobs
  2. Click on create new job
  3. Fill the form:
    • name of job
    • if there is need to run job periodically enter "cron expression" into field "New trigger" and click on button "Add". If there is need to remove runner, select "X" icon. Example of cron expression:
      • 0 45 13 * * ? - starts every day at 13:45
      • 0 0/15 * * * ? - starts every 15 minutes
      • 0/30 * * * * ? - starts every 30 seconds
    • enter script text into area "script" (see example of ETL job in next paragraph)
    • confirm by pressing "Create job".

Example of an ETL job

Write information to the log file

This ETL job writes some information into the log file.

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Test log</description>
<connection id="logInfo" driver="log">
level=INFO
</connection>
<connection id="groovy" driver="script">language=groovy</connection>
<script connection-id="groovy">
etl.globals['sysStart'] = new Date().format("yyyy-MM-dd'T'HH:mm:ss")
</script>
<script connection-id="logInfo">
start: ${etl.globals['sysStart']}
</script>
</etl>

Using predefined properties in ETL jobs

This ETL job shows how to use internally or externally defined properties in an ETL job.

Let's say we have a property file properties.conf with these values:

properties.conf
neo4j.connector.username=neo4j
neo4j.connector.password=admin

The external property file can be then used in a job:

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Use internal and external properties in a job</description>
<properties>
<include href="/path/to/external/property/file/properties.conf"/>
neo4j.connector.bolt=bolt://localhost:7687
neo4j.script.prop=some_prop_name
</properties>
<connection id="neo4j" driver="neo4j" url="$neo4j.connector.bolt" user="$neo4j.connector.username" password="$neo4j.connector.password"/>
<script connection-id="neo4j">
MATCH (n) SET n.$neo4j.script.prop="some value"
</script>
</etl>


Cron expressions

A cron expression is a string of 6 or 7 fields separated by spaces. Fields can contain any of the allowed values, along with various combinations of the allowed special characters for that field. The fields are as follows:

Field Name

Mandatory

Allowed Values

Allowed Special Characters

Seconds

YES

0-59

, - * /

Minutes

YES

0-59

, - * /

Hours

YES

0-23

, - * /

Day of month

YES

1-31

, - * ? / L W

Month

YES

1-12 or JAN-DEC

, - * /

Day of week

YES

1-7 or SUN-SAT

, - * ? / L #

Year

NO

empty, 1970-2099

, - * /

More examples can be found in the Quartz's library tutorial here: http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html

Manage Jobs

  1. Main menu (top right) - Page Jobs
  2. List of existing jobs is shown
  3. There are several icons next to the each job. You can execute several operations with job:
    • run - job is started immediately
    • view history of executions - view when was job started, when was finished and if there was an error during execution
    • update - you can change job details
    • delete - confirmation dialog is shown. Job is deleted after confirmation