DVC (Data Version Control)

We use DVC to version-control large data files (models, input datasets) without storing them in Git. DVC tracks file hashes in .dvc files committed to Git, while the actual data lives on remote storage.

Authentication

Create a file .dvc/config.local (already gitignored) with credentials for the remotes:

['remote "goodcloud"']
    user = nhi_api
    password = <RIBASIM_NL_CLOUD_PASS>
['remote "modeldata"']
    user = nhi_api
    password = <RIBASIM_NL_CLOUD_PASS>
['remote "minio"']
    region = eu-west-1
    endpointurl = https://s3.deltares.nl
    access_key_id = <MINIO_ACCESS_KEY>
    secret_access_key = <MINIO_SECRET_KEY>

The password is the same as RIBASIM_NL_CLOUD_PASS in your .env file. The MinIO keys correspond to MINIO_ACCESS_KEY and MINIO_SECRET_KEY in .env.
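Because every value in the template comes from .env, the file can also be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: the function name write_dvc_config_local is hypothetical, and it assumes .env contains plain KEY=value lines that a POSIX shell can source.

```shell
# Sketch: write .dvc/config.local from the values in .env.
# Assumption: .env holds plain KEY=value lines that a POSIX shell can source.
write_dvc_config_local() {
    env_file="$1"   # e.g. ./.env
    out_file="$2"   # e.g. .dvc/config.local

    # "." searches PATH for names without a slash, so make the path explicit.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac

    # Work in a subshell so the secrets do not leak into the calling shell.
    (
        . "$env_file" || exit 1

        # Stop before writing anything if a required value is missing.
        if [ -z "${RIBASIM_NL_CLOUD_PASS:-}" ] ||
           [ -z "${MINIO_ACCESS_KEY:-}" ] ||
           [ -z "${MINIO_SECRET_KEY:-}" ]; then
            echo "missing credentials in $env_file" >&2
            exit 1
        fi

        umask 077   # the file holds credentials: owner access only
        mkdir -p "$(dirname "$out_file")"
        cat > "$out_file" <<EOF
['remote "goodcloud"']
    user = nhi_api
    password = ${RIBASIM_NL_CLOUD_PASS}
['remote "modeldata"']
    user = nhi_api
    password = ${RIBASIM_NL_CLOUD_PASS}
['remote "minio"']
    region = eu-west-1
    endpointurl = https://s3.deltares.nl
    access_key_id = ${MINIO_ACCESS_KEY}
    secret_access_key = ${MINIO_SECRET_KEY}
EOF
    )
}
```

From the repository root this would be called as write_dvc_config_local ./.env .dvc/config.local. The subshell keeps the sourced secrets out of the interactive session.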

Note

Since pixi run install runs dvc pull, this configuration is needed before that step.
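A small pre-flight check can catch a missing or half-filled credentials file before the install step reaches dvc pull. This is only a sketch: check_dvc_credentials is a hypothetical helper, and it assumes unfilled placeholders keep the <UPPER_CASE> form used in the template above.

```shell
# Return 0 only if the credentials file exists and no <PLACEHOLDER> is left in it.
check_dvc_credentials() {
    config_file="${1:-.dvc/config.local}"

    if [ ! -f "$config_file" ]; then
        echo "missing $config_file" >&2
        return 1
    fi

    # Template placeholders look like <RIBASIM_NL_CLOUD_PASS>.
    if grep -q '<[A-Z_][A-Z_]*>' "$config_file"; then
        echo "unfilled placeholder in $config_file" >&2
        return 1
    fi

    return 0
}
```

It could then guard the install step, for example check_dvc_credentials && pixi run install.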

Pulling data

After authentication is set up, pull all tracked data:

pixi run dvc pull

If you have local changes that you want to overwrite, add --force.

Remotes

Three remotes are configured in .dvc/config:

| Remote | URL | Purpose |
|---|---|---|
| minio (default) | s3://ribasim-nl/dvc | Primary DVC cache on MinIO |
| goodcloud | The Good Cloud /dvc | Legacy DVC storage |
| modeldata | The Good Cloud /Ribasim modeldata | Source data not under DVC control |
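dvc pull uses the default remote unless another one is named with --remote (short form -r). The sketch below wraps that option; pull_from_remote is a hypothetical helper, and it assumes you might occasionally need data that exists only on the legacy goodcloud remote. It rejects modeldata because that remote does not hold DVC-tracked data.

```shell
# Override DVC (for example DVC=echo) to preview the command without running it.
DVC="${DVC:-pixi run dvc}"

pull_from_remote() {
    if [ "$#" -eq 0 ]; then
        echo "usage: pull_from_remote REMOTE [TARGET...]" >&2
        return 1
    fi
    remote="$1"
    shift

    case "$remote" in
        minio|goodcloud) ;;
        *)
            echo "not a DVC storage remote: $remote" >&2
            return 1
            ;;
    esac

    # Any remaining arguments are passed through as pull targets.
    $DVC pull --remote "$remote" "$@"
}
```

For example, pull_from_remote goodcloud runs pixi run dvc pull --remote goodcloud.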

Pipeline

The DVC pipeline is defined in dvc.yaml. DVC runs stages in dependency order, so a stage starts only after every stage it depends on has finished:

flowchart LR
    node1["bathymetry"]
    node2["bergend@aa_en_maas"]
    node3["bergend@brabantse_delta"]
    node4["bergend@de_dommel"]
    node5["bergend@drents_overijsselse_delta"]
    node6["bergend@hunze_en_aas"]
    node7["bergend@limburg"]
    node8["bergend@noorderzijlvest"]
    node9["bergend@rijn_en_ijssel"]
    node10["bergend@stichtse_rijnlanden"]
    node11["bergend@vallei_en_veluwe"]
    node12["bergend@vechtstromen"]
    node13["data/Basisgegevens/Baseline/baseline-nl_land-j23_6-v1/baseline.gdb.dvc"]
    node14["dynamic@aa_en_maas"]
    node15["dynamic@brabantse_delta"]
    node16["dynamic@de_dommel"]
    node17["dynamic@drents_overijsselse_delta"]
    node18["dynamic@hunze_en_aas"]
    node19["dynamic@limburg"]
    node20["dynamic@noorderzijlvest"]
    node21["dynamic@rijn_en_ijssel"]
    node22["dynamic@stichtse_rijnlanden"]
    node23["dynamic@vallei_en_veluwe"]
    node24["dynamic@vechtstromen"]
    node25["feedback@amstel_gooi_en_vecht"]
    node26["feedback@delfland"]
    node27["feedback@hollands_noorderkwartier"]
    node28["feedback@hollandse_delta"]
    node29["feedback@rijnland"]
    node30["feedback@rivierenland"]
    node31["feedback@scheldestromen"]
    node32["feedback@schieland_en_de_krimpenerwaard"]
    node33["feedback@wetterskip_fryslan"]
    node34["feedback@zuiderzeeland"]
    node35["forcing@amstel_gooi_en_vecht"]
    node36["forcing@delfland"]
    node37["forcing@hollands_noorderkwartier"]
    node38["forcing@hollandse_delta"]
    node39["forcing@rijnland"]
    node40["forcing@rivierenland"]
    node41["forcing@scheldestromen"]
    node42["forcing@schieland_en_de_krimpenerwaard"]
    node43["forcing@wetterskip_fryslan"]
    node44["forcing@zuiderzeeland"]
    node45["hws_demand"]
    node46["hws_transient"]
    node47["koppelen"]
    node48["parameterized@aa_en_maas"]
    node49["parameterized@brabantse_delta"]
    node50["parameterized@de_dommel"]
    node51["parameterized@drents_overijsselse_delta"]
    node52["parameterized@hunze_en_aas"]
    node53["parameterized@limburg"]
    node54["parameterized@noorderzijlvest"]
    node55["parameterized@rijn_en_ijssel"]
    node56["parameterized@stichtse_rijnlanden"]
    node57["parameterized@vallei_en_veluwe"]
    node58["parameterized@vechtstromen"]
    node59["profiles@amstel_gooi_en_vecht"]
    node60["profiles@delfland"]
    node61["profiles@hollands_noorderkwartier"]
    node62["profiles@hollandse_delta"]
    node63["profiles@rijnland"]
    node64["profiles@rivierenland"]
    node65["profiles@scheldestromen"]
    node66["profiles@schieland_en_de_krimpenerwaard"]
    node67["profiles@wetterskip_fryslan"]
    node68["profiles@zuiderzeeland"]
    node69["rwzi"]
    node70["samenvoegen"]
    node1-->node45
    node2-->node14
    node3-->node15
    node4-->node16
    node5-->node17
    node6-->node18
    node7-->node19
    node8-->node20
    node9-->node21
    node10-->node22
    node11-->node23
    node12-->node24
    node13-->node45
    node14-->node70
    node15-->node70
    node16-->node70
    node17-->node70
    node18-->node70
    node19-->node70
    node20-->node70
    node21-->node70
    node22-->node70
    node23-->node70
    node24-->node70
    node25-->node59
    node26-->node60
    node27-->node61
    node28-->node62
    node29-->node63
    node30-->node64
    node31-->node65
    node32-->node66
    node33-->node67
    node34-->node68
    node35-->node70
    node36-->node70
    node37-->node70
    node38-->node70
    node39-->node70
    node40-->node70
    node41-->node70
    node42-->node70
    node43-->node70
    node44-->node70
    node45-->node46
    node45-->node47
    node46-->node70
    node48-->node2
    node49-->node3
    node50-->node4
    node51-->node5
    node52-->node6
    node53-->node7
    node54-->node8
    node55-->node9
    node56-->node10
    node57-->node11
    node58-->node12
    node59-->node35
    node60-->node36
    node61-->node37
    node62-->node38
    node63-->node39
    node64-->node40
    node65-->node41
    node66-->node42
    node67-->node43
    node68-->node44
    node69-->node14
    node69-->node15
    node69-->node16
    node69-->node17
    node69-->node18
    node69-->node19
    node69-->node20
    node69-->node21
    node69-->node22
    node69-->node23
    node69-->node24
    node69-->node35
    node69-->node36
    node69-->node37
    node69-->node38
    node69-->node39
    node69-->node40
    node69-->node41
    node69-->node42
    node69-->node43
    node69-->node44
    node69-->node45
    node70-->node47
Figure 1: DVC pipeline DAG

Reproduce the full pipeline:

pixi run dvc repro

Reproduce a single stage:

pixi run dvc repro dynamic@aa_en_maas
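Stage names in the diagram follow the pattern stage@suffix, and dvc repro accepts several targets at once. The sketch below builds those names for one stage and first previews the run with dvc repro --dry, which prints the commands without executing them. The helper name repro_stage_for is hypothetical.

```shell
# Override DVC (for example DVC=echo) to preview the commands without running them.
DVC="${DVC:-pixi run dvc}"

repro_stage_for() {
    if [ "$#" -lt 2 ]; then
        echo "usage: repro_stage_for STAGE SUFFIX..." >&2
        return 1
    fi
    stage="$1"
    shift

    # Build names such as dynamic@aa_en_maas from the stage and each suffix.
    targets=""
    for suffix in "$@"; do
        targets="$targets $stage@$suffix"
    done

    # $targets is left unquoted on purpose so each name becomes its own argument.
    $DVC repro --dry $targets && $DVC repro $targets
}
```

For example, repro_stage_for dynamic aa_en_maas limburg previews and then reproduces dynamic@aa_en_maas and dynamic@limburg, together with any stages they depend on that are out of date.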

Importing data from remote storage

Use dvc import-url to download a file or directory from a remote URL and place it under DVC control. The remote://modeldata alias refers to the modeldata remote, The Good Cloud storage that holds source data. The -f flag overwrites an existing local copy:

pixi run dvc import-url -f remote://modeldata/Zuiderzeeland/modellen/Zuiderzeeland_parameterized_2025_9_0 data/Zuiderzeeland/modellen/
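The command above places the import under data/ at the same relative path it has on the remote. Assuming that mirroring is the intended convention (it is inferred from this one example), the local directory can be derived from the remote path. The helper name import_modeldata is hypothetical.

```shell
# Override DVC (for example DVC=echo) to preview the command without running it.
DVC="${DVC:-pixi run dvc}"

import_modeldata() {
    if [ "$#" -ne 1 ]; then
        echo "usage: import_modeldata PATH_ON_REMOTE" >&2
        return 1
    fi

    # Strip a leading slash so the URL never contains "modeldata//".
    remote_path="${1#/}"

    # Assumed convention: mirror the remote directory layout under data/.
    local_dir="data/$(dirname "$remote_path")/"

    $DVC import-url -f "remote://modeldata/$remote_path" "$local_dir"
}
```

Calling import_modeldata Zuiderzeeland/modellen/Zuiderzeeland_parameterized_2025_9_0 produces the same command as the example above.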

Pushing changes

After producing new outputs (running pipeline stages or adding new data), push to the remote:

pixi run dvc push

Do this before pushing the updated dvc.lock file to Git, so that the data hashes it references are available to everyone.
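The ordering can be enforced by chaining the commands so that nothing reaches Git unless the data upload succeeded. This is a sketch under assumptions: publish_outputs is a hypothetical helper, and it stages only dvc.lock, so any other changed files would still need to be added separately.

```shell
# Override DVC and GIT (for example DVC=echo GIT=echo) to preview the commands.
DVC="${DVC:-pixi run dvc}"
GIT="${GIT:-git}"

publish_outputs() {
    message="$1"   # commit message

    # Upload the data first; stop here if that fails, so Git never
    # references hashes that are missing from the remote.
    $DVC push || return 1

    $GIT add dvc.lock &&
        $GIT commit -m "$message" &&
        $GIT push
}
```

For example, publish_outputs "Update pipeline outputs" runs dvc push, then commits and pushes dvc.lock.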