ETL: Neo4jCsvNodes
Table of Contents
Description
Loads data from CSV into Neo4j. Data from CSV are used to insert or update nodes. Driver creates "LOAD CSV" Cypher in a background to load data.
This driver is useful in scenario where there are many columns in CSV and all columns need to be loaded into Neo4j nodes and node attribute name is equal to CSV column name. This is also useful when columns in CSV are changed from time to time because this driver loads all columns automatically.
Driver can be configured to insert all data from CSV or update data.
Connection
Attributes
Name | Description | Required |
---|---|---|
url | url to Neo4j server, Bolt or REST url | yes |
user | username for authorization at Neo4j | no |
password | password for authorization at Neo4j | no |
Parameters
Name | Description | Required | Default |
---|---|---|---|
node_labels | labels of nodes in Neo4j which will be created or updated (separated by comma) | yes | |
merge_columns | name of columns which are used to merge data (separated by comma) | no | |
merge_dbId_column | name of column which contains database id. This column is used to match node by database id. | no | |
timestamp_properties | names of attributes which will be set to current timestamp (separated by comma) | no | |
empty_string_is_null | true / false. If true and CSV column contains empty string then null is set into Neo4j attribute. If false and CSV column contains empty string then empty string is set into Neo4j attribute. | no | false |
commit_size | size of commit. If it is used then "USING PERIODIC COMMIT" is added to Cypher and data are committed on the fly. If it is not used data are committed at the end. | no | |
csv_path | path to CSV file with nodes | yes | |
csv_delimiter | delimiter of columns in CSV file | no | ; |
Query
Not used.
Script
Load all data from CSV and create or update nodes in Neo4j.
Create
If driver is configured to create data then driver takes all columns from CSV and create new nodes in Neo4j with attributes equal to columns. Example of final Cypher:
LOAD CSV WITH HEADERS FROM 'file:/nodes.csv' AS line FIELDTERMINATOR ';' CREATE (n:ci) SET n.logicalName=line.logicalName, n.type=line.type, n.subtype=line.subtype, n._created=timestamp(), n._updated=timestamp()
Merge
If driver is configured to merge data with Neo4j nodes then driver takes data from CSV and create merge statement with merge columns which are configured.
Data are merged in this way:
- if there is a node in CSV which matches node in Neo4j:
- if CSV contains column and Neo4j contains same attribute then update this attribute
- if CSV contains column and Neo4j doesn't contain attribute with this name then create this attribute
- if CSV doesn't contain column with name of some Neo4j attribute then attribute in Neo4j is not changed
- if there is a node in CSV which is not matched in Neo4j: create new node in Neo4j
- if there isn't a node in CSV which is in Neo4j: node in Neo4j is not changed
mergeColumns
If this parameter is used then columns with these names are used to merge with existing attributes. These merge columns are used only for merge and are not set in Neo4j. Example of final cypher if mergeColumns=logicalName:
LOAD CSV WITH HEADERS FROM 'file:/nodes.csv' AS line FIELDTERMINATOR ';' MERGE (n:Ci{logicalName:line.logicalName}) SET n.type=line.type, n.subtype=line.subtype, n._created=timestamp(), n._updated=timestamp()
mergeDbIdColumn
If this parameter is used then column with this name is used to merge with database id. This column name is used only for match and is not set in Neo4j. Example of final cypher if mergeDbIdColumn=_dbId:
LOAD CSV WITH HEADERS FROM 'file:/nodes.csv' AS line FIELDTERMINATOR ';' MATCH (n:Ci) WHERE id(n) = toInt(line._dbId) SET n.logicalName=line.logicalName, n.type=line.type, n.subtype=line.subtype, n._created=timestamp(), n._updated=timestamp()
Examples
Sample CSV data with nodes:
logicalName;type;subtype;comment;vendor;responsible
Server1;Server;Physical;HP server;HP;ITOPS
Server2;Server;Physical;HP server;HP;ITOPS
App1;Application modul;Application;IB application;Microsoft;MS support
Server3;Server;Physical;HP server;HP;ITOPS
App2;Application modul;Application;IB application;Microsoft;MS support
Create data (no merge).
<!DOCTYPE etl SYSTEM "
https://scriptella.org/dtd/etl.dtd
">
<
etl
>
<
description
>Nodes from CSV</
description
>
<
connection
id
=
"nodeImport"
driver
=
"neo4jCsvNodes"
url
=
"bolt://localhost:7687"
user
=
"neo4j"
password
=
"admin"
>
node_labels=Ci
timestamp_properties=_created,_updated
csv_path=/nodes.csv
</
connection
>
<
script
connection-id
=
"nodeImport"
/>
</
etl
>
Merge data based on column name.
<!DOCTYPE etl SYSTEM "
https://scriptella.org/dtd/etl.dtd
">
<
etl
>
<
description
>Nodes from CSV</
description
>
<
connection
id
=
"nodeImport"
driver
=
"neo4jCsvNodes"
url
=
"bolt://localhost:7687"
user
=
"neo4j"
password
=
"admin"
>
node_labels=Ci
merge_columns=logicalName
timestamp_properties=_created,_updated
csv_path=/nodes.csv
</
connection
>
<
script
connection-id
=
"nodeImport"
/>
</
etl
>
Merge data based on database id.
<!DOCTYPE etl SYSTEM "
https://scriptella.org/dtd/etl.dtd
">
<
etl
>
<
description
>Nodes from CSV</
description
>
<
connection
id
=
"nodeImport"
driver
=
"neo4jCsvNodes"
url
=
"bolt://localhost:7687"
user
=
"neo4j"
password
=
"admin"
>
node_labels=Ci
merge_dbId_column=_dbId
timestamp_properties=_created,_updated
csv_path=/nodes.csv
</
connection
>
<
script
connection-id
=
"nodeImport"
/>
</
etl
>