Table of Contents
Load data from CSV into Neo4j using Cypher (LOAD CSV command).
Used drivers:
- CSV
- Neo4j or Neo4jCsvNodes with Neo4jCsvRelationships
You can use Neo4jCsvNodes and Neo4jCsvRelationships to automatically generate Cypher for you or you can use Neo4j driver to execute custom Cypher. See examples below. Both examples import same data and the result in Neo4j is the same. Here are CSV files which are used in the examples.
Nodes
logicalName;type;subtype;comment;vendor;responsibleServer1;Server;Physical;HP server;HP;ITOPSServer2;Server;Physical;HP server;HP;ITOPSApp1;Application modul;Application;IB application;Microsoft;MS supportServer3;Server;Physical;HP server;HP;ITOPSApp2;Application modul;Application;IB application;Microsoft;MS supportRelationships
sourceLogicalName;targetLogicalName;relationshipName;type;subtypeApp1;Server1;App1-Server1;Physical;UsesApp2;Server2;App2-Server2;Physical;UsesImport using Neo4j driver
This import uses the Neo4j driver which can execute a Cypher command. To import CSV we can use the LOAD CSV command.
Advantage: Changes in the name of properties or adding calculations or conditions to the Cypher is possible.
Disadvantage: You need to write long Cypher to set every column of CSV. If new columns will be added to the CSV file it's needed to change the Cypher in the ETL job also. If you want to set the same property names as CSV column names and don't want to map anything it's better to use Neo4jCsvNodes and Neo4jCsvRelationships drivers instead of Neo4j driver.
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd"><etl> <description>Load CSV nodes and rels into Neo4j</description> <properties> csv.file1=/nodes.csv csv.file2=/rels.csv </properties> <connection id="neo4j" driver="neo4j" url="bolt://localhost:7687" user="neo4j" password="admin"/> <script connection-id="neo4j"> USING PERIODIC COMMIT 5000 LOAD CSV WITH HEADERS FROM 'file:$csv.file1' AS line FIELDTERMINATOR ';' MERGE (n:Ci{logicalName:line.logicalName}) SET n.logicalName=line.logicalName, n.type=line.type, n.subtype=line.subtype, n.comment=line.comment, n.vendor=line.vendor, n.responsible=line.responsible, n._created=timestamp() </script> <script connection-id="neo4j"> USING PERIODIC COMMIT 5000 LOAD CSV WITH HEADERS FROM 'file:$csv.file2' AS line FIELDTERMINATOR ';' MATCH (n1:Ci{logicalName:line.sourceLogicalName}),(n2:Ci{logicalName:line.targetLogicalName}) MERGE (n1)-[r:RELATED{relationshipName:line.relationshipName}]->(n2) SET r.type=line.type, r.subtype=line.subtype, r._created=timestamp() </script></etl>Import using Neo4jCsvNodes and Neo4jCsvRelationships driver
This import uses Neo4jCsvNodes and Neo4jCsvRelationships drivers which automatically generate LOAD CSV Cypher for you in a background. There is no need to write Cypher to set every column of CSV.
Advantage: Short ETL script. There is no need to change ETL job if some columns are added into CSV.
Disadvantage: Cypher is generated in a background so you can't modify this Cypher to add conditions or something else. But you can use Neo4j driver after Neo4jCsvNodes and Neo4jCsvRelationships drivers to execute some custom Cypher and modify data in Neo4j.
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd"><etl> <description>Load CSV nodes and rels into Neo4j</description> <properties> csv.file1=/nodes.csv csv.file2=/rels.csv </properties> <connection id="nodeImport" driver="neo4jCsvNodes" url="bolt://localhost:7687" user="neo4j" password="admin"> node_labels=Ci merge_columns=logicalName empty_string_is_null=true timestamp_properties=_created,_updated csv_path=$csv.file1 </connection> <connection id="relationshipImport" driver="neo4jCsvRelationships" url="bolt://localhost:7687" user="neo4j" password="admin"> source_node_labels=Ci target_node_labels=Ci relationship_type=RELATED source_node_match_columns=logicalName:sourceLogicalName target_node_match_columns=logicalName:targetLogicalName relationship_merge_columns=relationshipName empty_string_is_null=true timestamp_properties=_created,_updated csv_path=$csv.file2 </connection> <script connection-id="nodeImport"/> <script connection-id="relationshipImport"/></etl>Import using Neo4j driver without LOAD CSV
This import uses the Neo4j driver which can execute a Cypher command. To import CSV we don't use the LOAD CSV command in this case.
Advantage: You can make some custom CREATE or MERGE statements. For example to customize what node label will be created or what relationship type will be created. Node label and relationship type can't be customized in LOAD CSV (label and relationship type has to be a constant string). For example, you can write Cypher query:
Disadvantage: This approach is not very performant and can be used only on small datasets (less than 50.000 nodes and 50.000 relationships).
MERGE (n:'$type'{id:toint('$id')}) SET n.logicalName='$logicalName'Disadvantage: This import is very inefficient because every record is inserted separately and this cause poor performance.
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd"><etl> <description>CSV to Neo4j. Insert every record from CSV by own Cypher</description> <properties> csv.nodes=/nodes.csv csv.rels=/rels.csv </properties> <connection id="csv_nodes" driver="csv" url="$csv.nodes"> separator=; </connection> <connection id="csv_rels" driver="csv" url="$csv.rels"> separator=; </connection> <connection id="neo4j" driver="neo4j" url="bolt://localhost:7687" user="neo4j" password="admin"/> <query connection-id="csv_nodes"> <script connection-id="neo4j"> MERGE (n:Ci{logicalName:'$logicalName'}) SET n.logicalName='$logicalName', n.type='$type', n.subtype='$subtype', n.comment='$comment', n.vendor='$vendor', n.responsible='$responsible', n._created=timestamp(), n._updated=timestamp() </script> </query> <query connection-id="csv_rels"> <script connection-id="neo4j"> MATCH (n1:Ci{logicalName:'$sourceLogicalName'}),(n2:Ci{logicalName:'$targetLogicalName'}) MERGE (n1)-[r:RELATED{relationshipName:'$relationshipName'}]->(n2) SET r.type='$type', r.subtype='$subtype', r._created=timestamp(), r._updated=timestamp() </script> </query></etl>