1. Description
Executes bulk import of data into Cosmos DB through the Cosmos .NET endpoint.
2. Connection
2.1. Parameters
Name | Description | Required |
---|---|---|
host | Cosmos DB .NET endpoint. | yes |
database | Database name. | yes |
graph | Graph name. | yes |
throughput | Maximum allowed throughput. | yes |
partition | Partition key path. | yes |
key | Primary key to the database. | yes |
commit_size | Number of entities to accumulate before they are committed to Cosmos DB, which frees resources. The default is 10000. | no |
rewrite | When true, entities with the same IDs will be overwritten. The default is false, which means only entities with unique IDs will be created. | no |
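
As a minimal sketch, a connection that sets only the required parameters could look like the fragment below. Because commit_size and rewrite are omitted, the defaults from the table (10000 and false) would apply. The connection id, the host, database, graph, throughput, and partition values are placeholders, and YOUR-PRIMARY-KEY stands in for a real key.

```xml
<!-- Placeholder values throughout; replace them with your own account details. -->
<connection id="import" driver="cosmosBulkImport">
    host=https://my-cosmos.documents.azure.com:443/
    database=MY-DATABASE
    graph=MY-GRAPH
    throughput=10000
    partition=/partition
    key=YOUR-PRIMARY-KEY
</connection>
```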
3. Query
Not applicable
4. Script
Executes bulk import of vertices or edges. Vertices and edges are pushed to Cosmos DB once either the number of vertices or the number of edges reaches commit_size. If commit_size is not reached, the data is pushed when the entire job is done.
Example: when commit_size = 1000 and only 950 vertices or edges were processed with the <script> element, the data is pushed at the end of the job.
4.1. Parameters
Name | Applies to | Description | Example |
---|---|---|---|
ENTITY | vertex, edge | Either VERTEX or EDGE. | EDGE |
ID | vertex, edge | ID of the newly created vertex or edge. If not provided, a random UUID will be used. | Any unique string |
LABEL | vertex, edge | Label for a vertex or an edge. | Person |
PARTITION | vertex | Value of the property used as the partition. | Asia |
VERTEX_TAG | vertex | Saves the vertex information under the given tag. The saved information is replaced when a new vertex is created with the same tag. Usage: in one script, create a vertex with VERTEX_TAG=source_vertex; in a subsequent script, create a vertex with VERTEX_TAG=target_vertex. A later script can then create an edge with SOURCE_TAG=source_vertex and TARGET_TAG=target_vertex, without defining SOURCE_VERTEX_ID, SOURCE_VERTEX_LABEL, SOURCE_VERTEX_PARTITION, TARGET_VERTEX_ID, TARGET_VERTEX_LABEL, and TARGET_VERTEX_PARTITION. | Any string |
SOURCE_TAG | edge | VERTEX_TAG of the source vertex. | Any string |
SOURCE_VERTEX_ID | edge | ID of the source vertex. | |
SOURCE_VERTEX_LABEL | edge | Label of the source vertex. | |
SOURCE_VERTEX_PARTITION | edge | Partition value of the source vertex. | |
TARGET_TAG | edge | VERTEX_TAG of the target vertex. | Any string |
TARGET_VERTEX_ID | edge | ID of the target vertex. | |
TARGET_VERTEX_LABEL | edge | Label of the target vertex. | |
TARGET_VERTEX_PARTITION | edge | Partition value of the target vertex. | |
<any_other_parameter> | vertex, edge | Any other parameter is treated as a string property of the vertex or edge being created. | Any valid property value |
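
When no vertex tags are available, an edge can presumably be described with the explicit source and target parameters from the table instead. The fragment below is a sketch under that assumption: the connection id "import", the vertex IDs, the labels, the partition value, and the extra "since" property are all hypothetical and only illustrate how the parameters fit together.

```xml
<!-- Hypothetical IDs, labels, and partition values; not taken from a real graph. -->
<script connection-id="import">
    ENTITY=EDGE
    LABEL=LIVES_IN
    SOURCE_VERTEX_ID=person-1
    SOURCE_VERTEX_LABEL=PERSON
    SOURCE_VERTEX_PARTITION=Asia
    TARGET_VERTEX_ID=address-1
    TARGET_VERTEX_LABEL=ADDRESS
    TARGET_VERTEX_PARTITION=Asia
    since=2020
</script>
```

Here `since` is not one of the reserved names, so according to the last table row it would be stored as a string property of the edge.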
5. Example
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
    <description>CosmosDB CSV bulk import</description>
    <connection id="logger" driver="log">
        level=WARN
    </connection>
    <connection id="import" driver="cosmosBulkImport">
        host=https://my-cosmos.documents.azure.com:443/
        database=MY-DATABASE
        graph=MY-GRAPH
        throughput=10000
        partition=/partition
        commit_size=10000
        key=......primary key
        rewrite=true
    </connection>
    <connection id="csv" driver="csv" url="/path-to-csv/data.csv">
        separator=,
        quote=
        empty_string=""
    </connection>
    <script connection-id="logger">
        STARTING BULK IMPORT
    </script>
    <!-- Process CSV -->
    <query connection-id="csv">
        <script connection-id="import">
            LABEL=PERSON
            ENTITY=VERTEX
            VERTEX_TAG=source_node
            PARTITION=$6
            name=$3
            city=$2
            phone_number=$7
        </script>
        <script connection-id="import">
            LABEL=ADDRESS
            ENTITY=VERTEX
            VERTEX_TAG=target_node
            PARTITION=$6
            street=$5
            zip=$8
        </script>
        <script connection-id="import">
            LABEL=LIVES_IN
            ENTITY=EDGE
            SOURCE_TAG=source_node
            TARGET_TAG=target_node
        </script>
    </query>
    <script connection-id="logger">
        BULK IMPORT ENDED.
    </script>
</etl>