Graphlytic 5.1.1

1. Description

Executes a bulk import of data into Cosmos DB through the Cosmos DB .NET endpoint.

2. Connection

2.1. Parameters

Name	Description	Required
host	Cosmos DB .NET endpoint. Example: https://my-cosmos.documents.azure.com:443/	yes
database	Database name.	yes
graph	Graph name.	yes
throughput	Maximum allowed throughput.	yes
partition	Partition key path. Example: /Region	yes
key	Primary key to the database.	yes
commit_size	Number of entities (vertices or edges) collected before they are committed to Cosmos DB, which frees resources. The default is 10000.	no
rewrite	When true, entities with the same IDs will be overwritten. The default is false, which means only entities with unique IDs will be created.	no
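
As an illustration, a connection that sets only the required parameters might look like the fragment below. This is a sketch, not a complete ETL file: the host, database, graph, partition path, and key values are placeholders, and the driver name cosmosBulkImport is the one used in the example in section 5. With commit_size and rewrite omitted, the defaults listed above (10000 and false) are expected to apply.

<!-- Placeholder values; replace them with the settings of your own Cosmos DB account -->
<connection id="import" driver="cosmosBulkImport">
    host=https://my-cosmos.documents.azure.com:443/
    database=MY-DATABASE
    graph=MY-GRAPH
    throughput=10000
    partition=/Region
    key=......primary key
</connection>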

3. Query

Not applicable

4. Script

Executes a bulk import of vertices or edges. Vertices and edges are pushed to Cosmos DB as soon as either the number of vertices or the number of edges reaches commit_size. If commit_size is not reached, the data is pushed when the entire job is done.
Example: with commit_size = 1000, if only 950 vertices or edges are processed by the <script> element, the data is pushed at the end of the job.

4.1. Parameters

Name	Applies to	Description	Example
ENTITY	vertex edge	Either VERTEX or EDGE.	EDGE
ID	vertex edge	ID of the newly created vertex or edge. If not provided, a random UUID will be used.	Any unique string
LABEL	vertex edge	Label for a vertex or an edge.	Person
PARTITION	vertex	Value of the property used as the partition.	Asia
VERTEX_TAG	vertex	Saves the vertex information under the given tag so that later scripts can refer to it. The tag is overridden when a new vertex is created with the same tag. Usage: in one script, create a vertex with VERTEX_TAG=source_vertex; in a subsequent script, create a vertex with VERTEX_TAG=target_vertex. A following script can then create an edge with SOURCE_TAG=source_vertex and TARGET_TAG=target_vertex without defining SOURCE_VERTEX_ID, SOURCE_VERTEX_LABEL, SOURCE_VERTEX_PARTITION, TARGET_VERTEX_ID, TARGET_VERTEX_LABEL, and TARGET_VERTEX_PARTITION.	Any string
SOURCE_TAG	edge	VERTEX_TAG of the source vertex.	Any string
SOURCE_VERTEX_ID	edge	ID of the source vertex.
SOURCE_VERTEX_LABEL	edge	Label of the source vertex.
SOURCE_VERTEX_PARTITION	edge	Partition value of the source vertex.
TARGET_TAG	edge	VERTEX_TAG of the target vertex.	Any string
TARGET_VERTEX_ID	edge	ID of the target vertex.
TARGET_VERTEX_LABEL	edge	Label of the target vertex.
TARGET_VERTEX_PARTITION	edge	Partition value of the target vertex.
<any_other_parameter>	vertex edge	Any other parameter is considered a string property of a vertex/edge to be created.	Any valid property value
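
The example in section 5 links vertices through tags. When the vertices are not created in the same job, an edge can presumably be defined by naming its endpoints explicitly instead. The fragment below is a minimal sketch under that assumption: the KNOWS label, the since property, and the CSV column positions ($1 to $5) are hypothetical, and the connection IDs csv and import are assumed to be declared as in section 5.

<!-- Hypothetical CSV layout: $1 = source ID, $2 = target ID, $3 = source partition, $4 = target partition, $5 = start date -->
<query connection-id="csv">
    <script connection-id="import">
        ENTITY=EDGE
        LABEL=KNOWS
        SOURCE_VERTEX_ID=$1
        SOURCE_VERTEX_LABEL=PERSON
        SOURCE_VERTEX_PARTITION=$3
        TARGET_VERTEX_ID=$2
        TARGET_VERTEX_LABEL=PERSON
        TARGET_VERTEX_PARTITION=$4
        since=$5
    </script>
</query>

Because no ID is given, the edge would receive a random UUID, and since would be stored as a string property of the edge, as described in the table above.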

5. Example

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
    
    <description>CosmosDB CSV bulk import</description>
    
    <connection id="logger" driver="log">
        level=WARN
    </connection>
    
    <connection id="import" driver="cosmosBulkImport">
        host=https://my-cosmos.documents.azure.com:443/
        database=MY-DATABASE
        graph=MY-GRAPH
        throughput=10000
        partition=/partition
        commit_size=10000
        key=......primary key
        rewrite=true
    </connection>
    
    <connection id="csv" driver="csv" url="/path-to-csv/data.csv">
        separator=,
        quote=
        empty_string=""
    </connection>
 
    <script connection-id="logger">
        STARTING BULK IMPORT
    </script>
    
    <!-- Process CSV: each row creates a PERSON vertex, an ADDRESS vertex, and a LIVES_IN edge between them -->
    <query connection-id="csv">
        <script connection-id="import">
            LABEL=PERSON
            ENTITY=VERTEX
            VERTEX_TAG=source_node
            PARTITION=$6
            name=$3
            city=$2
            phone_number=$7
        </script>
        <script connection-id="import">
            LABEL=ADDRESS
            ENTITY=VERTEX
            VERTEX_TAG=target_node
            PARTITION=$6
            street=$5
            zip=$8
        </script>
        <script connection-id="import">
            LABEL=LIVES_IN
            ENTITY=EDGE
            SOURCE_TAG=source_node
            TARGET_TAG=target_node
        </script>
    </query>
    
    <script connection-id="logger">
        BULK IMPORT ENDED.
    </script>
    
</etl>