Jump to page content

World Ocean Database

noaa-logo-transparent Logo EOSC node - DTO iode-logo-transparent


This use case describes a dedicated Beacon instance for the World Ocean Database (WOD) for fast, on-demand subsetting of ocean profile data. The basis for this work has been developed within the BlueCloud2026 project (now EOSC Node - Digital Twin of the Ocean) where it has been used within the VRE as resource for research on physical parameters, but this WOD Beacon instance including Python Notebooks to work with the data is now fully open to all.

World Ocean Database
World Ocean Database temperature data (2022-2023) accessed via Beacon using the notebook example below.

Datacollection context

WOD is NOAA NCEI's flagship archive of oceanographic profile observations, supported by IOC-IODE. It aggregates global historical and modern measurements into a consistent, quality-controlled collection that supports climate, oceanography, and marine science workflows.

Use case goal

Make WOD profiles directly queryable without downloading or indexing millions of NetCDF files. Users can select only the columns and time/space ranges they need and receive results immediately.

How Beacon is used

A Beacon instance is deployed using docker. WOD NetCDF files are converted into Beacon Binary Format (BBF) collections. BBF consolidates many source files into a small number of optimized datasets that support efficient filtering and projection.

Benefits for users:

  • Query the full WOD collection with server-side filters.
  • Directly access the WOD from your notebooks on the fly.
  • Retrieve only required columns and rows.
  • Avoid local preprocessing and large downloads.

Example workflow

Use the Jupyter notebook example to connect to the Beacon node, inspect schemas, and run queries with the Beacon Python SDK.

Install the SDK

pip install beacon-api

Connect

from beacon_api import Client

client = Client("https://beacon-wod.maris.nl")

Inspect available columns

available_columns = client.available_columns_with_data_type()
list(available_columns)[:50]

Query a subset

query_builder = client.query()

query_builder.add_select_column("wod_unique_cast")
query_builder.add_select_column("Platform", alias="PLATFORM")
query_builder.add_select_column("Institute", alias="INSTITUTE")
query_builder.add_select_column("Temperature", alias="TEMPERATURE")
query_builder.add_select_column("Temperature_WODflag", alias="TEMPERATURE_QC")
query_builder.add_select_column("Temperature.units", alias="TEMPERATURE_UNIT")
query_builder.add_select_column("z", alias="DEPTH")
query_builder.add_select_column("z.units", alias="DEPTH_UNIT")
query_builder.add_select_column("time", alias="TIME")
query_builder.add_select_column("lon", alias="LONGITUDE")
query_builder.add_select_column("lat", alias="LATITUDE")
query_builder.add_select_column(".featureType", alias="FEATURE_TYPE")

query_builder.add_range_filter("TIME", "2022-01-01T00:00:00", "2023-01-01T00:00:00")
query_builder.add_is_not_null_filter("TEMPERATURE")
query_builder.add_not_equals_filter("TEMPERATURE", -1e+10)
query_builder.add_equals_filter("TEMPERATURE_QC", 0.0)
query_builder.add_range_filter("DEPTH", 0, 10)

df = query_builder.to_pandas_dataframe()
df

Additional resources