ETL: MS Excel
Table of Contents
1. Description
Executes the given query for each row of the provided MS Excel file. Allows to read from or write to MS Excel files of the "XLS" and "XLSX" formats.
Read files from a local disk or over HTTP, HTTPS, or FTP. Writes only to a local file.
Reading large datasets (more than 100.000 rows) can take a long time. Please consider using CSV format instead, e.g. ETL: CSV to Neo4j.
2. Connection
Defines a connection to a single MS Excel file.
2.1. Parameters
Name | Example | Description |
---|---|---|
format mandatory | format=XLSX | MS Excel format. Values: XLS, XLSX. |
path mandatory | path=/path/to/file.xlsx path=ftp://name:password@some.server/file.xlsx path=https://some.server/file.xlsx | Path to the imported MS Excel file. |
2.2. Example
<connection id=
"excel"
driver=
"excel"
>
format=XLSX
path=/path/to/file.xlsx
</connection>
3. Query
Reads from an MS Excel file. Provides values in the form of
$row.<column> or
$row.<column>.<text|url|formula|formatted>
to be used in subsequent queries/scripts.
Lines containing no values in all columns (from column 0 to max_columns) are skipped.
3.1. Parameters
Name | Example | Description |
---|---|---|
sheet mandatory | sheet=Sheet 1 | Name of the imported MS Excel sheet. |
skip_rows | skip_rows=10 | The number of rows in the sheet that will be skipped. Default: 0 |
max_columns | max_columns=10 | The maximal number of columns to read. Default: 255 |
escape_chars | escape_chars='" | In any output values all characters in this parameter will be escaped using "\". E.g if in your MS Excel file is value Jack O'Neil and you have set escape_chars=' then the output from the query will be Jack O\'Neil The escape_chars parameter can have multiple characters configured. E.g. using escape_chars='" will escape all single quotation marks and also all double quotes. |
3.2. Examples
Query example: Queries an excel file and inserts values into a Gremlin database.
<!DOCTYPE etl SYSTEM "
https://scriptella.org/dtd/etl.dtd
">
<
etl
>
<
description
>Import excel - gremlin example</
description
>
<
connection
id
=
"gremlin"
driver
=
"graphConnection"
>
project_id=1
</
connection
>
<
connection
id
=
"excel"
driver
=
"excel"
>
format=XLSX
path=/path/to/file.xlsx
</
connection
>
<
connection
id
=
"logInfo"
driver
=
"log"
>
level=INFO
</
connection
>
<
query
connection-id
=
"excel"
>
sheet=Nodes
skip_rows=1
max_columns=6
<
script
connection-id
=
"gremlin"
>
g.addV('$row.C').property('name', '$row.B').property('url', '$row.F.url')
</
script
>
<
script
connection-id
=
"logInfo"
>
Imported value ($rownum): '$row.B', '$row.C', '$row.F.url'
</
script
>
</
query
>
</
etl
>
4. Script
Writes entries into an MS Excel file. The file will be created if not present within the path. The sheet will also be created, if not present in the file.
4.1. Parameters
Name | Example | Description |
---|---|---|
sheet mandatory | sheet=Sheet 1 | Name of the connected MS Excel sheet. |
line_number mandatory | line_number=1 line_number=${rownum + 10} | Line number in the MS Excel sheet where the editing of cells will be executed (numbering starts with 1, which is the first line in the Excel sheet). Default: 1 Usually, the line_number parameter is defined as an incrementing number $rownum, which is the number of iterations of the parent query. $rownum numbering starts with 1. To start writing at line 10 and write every iteration into a new line please use ${rownum + 10}. |
append_mode xlsx only | append_mode=true | Applicable only for XLSX files. |
row.X mandatory | row.AB=example value | Contains value to be inserted, where X is the column character in the Excel sheet. |
row.X.type | row.AB.type=NUMERIC | Type of the cell to be created. Possible values: STRING, BOOLEAN, NUMERIC, FORMULA, BLANK |
4.2. Examples
Script example: Copy values from one MS Excel file (excel_1) into another (excel_2). The sheet will be created if it does not exist yet.
<query connection-id=
"excel_1"
>
sheet=input sheet
skip_rows=
0
max_columns=
3
<script connection-id=
"excel_2"
>
sheet=incrementing example
line_number=${rownum +
10
}
row.A=$
1
row.B=$
2
row.C=$
3
</script>
</query>
Script example: Insert cells into the 'test sheet' in the connected MS Excel file. The sheet will be created if it does not exist yet.
<!DOCTYPE etl SYSTEM "
https://scriptella.org/dtd/etl.dtd
">
<
etl
>
<
description
>Inserts entries into given excel file.</
description
>
<
properties
>
output=/path/to/excel/file/excel.xlsx
</
properties
>
<
connection
id
=
"excel_in"
driver
=
"excel"
>
format=XLSX
path=$output
</
connection
>
<
script
connection-id
=
"excel_in"
>
sheet=test sheet
line_number=0
row.A=Test1
row.B=Test2
row.C=5
row.C.type=NUMERIC
</
script
>
<
script
connection-id
=
"excel_in"
>
sheet=test sheet
line_number=1
row.A=Test3
row.B=Test4
row.C=6
row.C.type=NUMERIC
</
script
>
<
script
connection-id
=
"excel_in"
>
sheet=test sheet
line_number=0
row.E=SUM(C1:C2)
row.E.type=FORMULA
</
script
>
</
etl
>