Data-processing services

New in version 0.6.0

One of the core features of the Galactica web application is to provide access to online data-processing services through a web form. Authenticated users can submit job requests online to trigger the remote execution of post-processing services (a.k.a. Terminus services). Once these jobs are completed, the web application notifies the requesting users by email so they can retrieve their post-processed datasets online.

For Galactica platform contributors, the astrophysix package provides a way to bind data-processing services already available (and defined by an admin) on the Galactica web application to the Snapshots and Catalogs of a study, so that authenticated visitors of the web application can submit job requests on these Snapshots/Catalogs.

Warning

When you upload your SimulationStudy HDF5 file to Galactica, you must be in the list of service providers of the Terminus data host servers you defined in your study. Otherwise, you won’t have the necessary permissions to bind your Snapshots/Catalogs to the services available on those hosts. Get in touch with a Galactica admin to register as a service provider for a specific data host.

Snapshot-bound services

To link a particular Snapshot with a data-processing service, you must define a DataProcessingService with the mandatory service_name and data_host attributes and add it to the Snapshot.processing_services list property:

>>> from astrophysix.simdm.results import Snapshot
>>> from astrophysix.simdm.services import DataProcessingService
>>>
>>> sn = Snapshot(name="Z~2", data_reference="output_00481")
>>>
>>> # Add a data-processing service to the snapshot
>>> dps = DataProcessingService(service_name="column_density_map",
...                             data_host="My_Dept_Cluster")
>>> sn.processing_services.add(dps)

Catalog-bound services

To link the items of a particular Catalog with a data-processing service, you must define a CatalogDataProcessingService with the mandatory service_name and data_host attributes and add it to the Catalog.processing_services list property:

>>> from astrophysix.simdm.services import CatalogDataProcessingService
>>> from astrophysix.simdm.catalogs import TargetObject, ObjectProperty, Catalog, CatalogField
>>> import numpy as N
>>> from astrophysix import units as U
>>>
>>> # Define a Target object : a spiral galaxy
>>> tobj = TargetObject(name="Spiral galaxy")
>>> x = tobj.object_properties.add(ObjectProperty(property_name="x", unit=U.Mpc,
...                                               description="Galaxy position coordinate along x-axis"))
>>> y = tobj.object_properties.add(ObjectProperty(property_name="y",  unit=U.Mpc,
...                                               description="Galaxy position coordinate along y-axis"))
>>> z = tobj.object_properties.add(ObjectProperty(property_name="z",  unit=U.Mpc,
...                                               description="Galaxy position coordinate along z-axis"))
>>> rad = tobj.object_properties.add(ObjectProperty(property_name="radius", unit=U.kpc,
...                                                 description="Galaxy half-mass radius"))
>>> m = tobj.object_properties.add(ObjectProperty(property_name="M_gas", unit=U.Msun,
...                                               description="Galaxy gas mass"))
>>>
>>> # Define a catalog of spiral galaxies
>>> gal_cat = Catalog(target_object=tobj, name="Spiral galaxy catalog")
>>> # Add the catalog fields into the catalog (positions, radii, masses)
>>> fx = gal_cat.catalog_fields.add(CatalogField(x, values=N.array([...]))) # xgal1, xgal2, ... xgaln
>>> fy = gal_cat.catalog_fields.add(CatalogField(y, values=N.array([...]))) # ygal1, ygal2, ... ygaln
>>> fz = gal_cat.catalog_fields.add(CatalogField(z, values=N.array([...]))) # zgal1, zgal2, ... zgaln
>>> frad = gal_cat.catalog_fields.add(CatalogField(rad, values=N.array([...]))) # rgal1, rgal2, ... rgaln
>>> fm = gal_cat.catalog_fields.add(CatalogField(m, values=N.array([...]))) # mgal1, mgal2, ... mgaln
>>>
>>> # Add the catalog in the snapshot (won't work if you insert it into a GenericResult instead)
>>> sn.catalogs.add(gal_cat)
>>>
>>> # Add a data processing service to the galaxy catalog
>>> dps = CatalogDataProcessingService(service_name="column_density_map",
...                                    data_host="Inst_cluster")
>>> gal_cat.processing_services.add(dps)

Warning

Only Catalogs belonging to a Snapshot can be bound to a CatalogDataProcessingService.

Catalog field bindings

For Catalogs, a data-processing service is meant to target a user-selected item in the catalog. To execute a service for that specific catalog item, (at least) some properties of the catalog item must be linked to some parameters of the data-processing service.

Otherwise, the data-processing service does not specifically target any item of the catalog. It is only executed as a generic data-processing service on the Catalog’s parent Snapshot.

As an example, let us assume one needs to execute a 2D column density map service (with e.g. map center coordinates, map size and image resolution parameters) on a selection of galaxies identified in a catalog of spiral galaxies extracted from a cosmological simulation. All the galaxies of the catalog are characterized by x/y/z coordinates, mass and radius properties. To post-process column density maps of a set of galaxies from this catalog:

  • the coordinates (x/y/z) of the galaxies need to be used as map center coordinates parameter values of the service,

  • the radius of each galaxy needs to be used as the map size parameter value of the service (modulo a chosen scaling factor).

To define which CatalogField must be used as input value for a given data-processing service parameter, CatalogFieldBinding instances must be created and added into the CatalogDataProcessingService using its catalog_field_bindings property. Optionally, you can define a scaling relation \(\textrm{param\_value} = \textrm{scale} \times \textrm{field\_value} + \textrm{offset}\):

>>> from astrophysix.simdm.services import CatalogFieldBinding
>>>
>>> # Here the galaxy coordinates are defined in the catalog with respect to the
>>> # box (100 Mpc wide) center, in the range [-50;50] Mpc.
>>> # Galaxy position normalisation [-50 Mpc; 50 Mpc] / 100 Mpc + 0.5 = [0.0; 1.0]
>>> fbx = CatalogFieldBinding(param_key="xmap", catalog_field=fx,
...                           scale=1.0e-2, offset=0.5)
>>> fby = CatalogFieldBinding(param_key="ymap", catalog_field=fy,
...                           scale=1.0e-2, offset=0.5)
>>> fbz = CatalogFieldBinding(param_key="zmap", catalog_field=fz,
...                           scale=1.0e-2, offset=0.5)
>>> # The 'column_density_map' service map center parameters are in box normalised units ([0.; 1.])
>>> # The bindings are added to the catalog data-processing service (dps), not to the catalog
>>> dps.catalog_field_bindings.add(fbx)
>>> dps.catalog_field_bindings.add(fby)
>>> dps.catalog_field_bindings.add(fbz)
>>>
>>> # Here we choose a map size equal to four times the galaxy radius.
>>> fb_rad = CatalogFieldBinding(param_key="map_size", catalog_field=frad,
...                              scale=4.0)
>>> dps.catalog_field_bindings.add(fb_rad)

Note

By default, the scaling factor is 1.0 and the offset is 0.0 (no scaling).
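
To make the scaling relation concrete, here is a minimal plain-Python sketch of how a bound parameter value could be derived from a catalog field value. It does not use the astrophysix API: the bound_param_value helper and the galaxy dictionary are hypothetical names introduced only for illustration, and the numerical values are made up. The scale and offset values mirror the bindings of the example above.

```python
def bound_param_value(field_value, scale=1.0, offset=0.0):
    """Hypothetical helper (not part of astrophysix).

    Applies the scaling relation param_value = scale * field_value + offset.
    With the default scale (1.0) and offset (0.0), the value is unchanged.
    """
    return scale * field_value + offset


# Made-up catalog item: coordinates in Mpc relative to the center of a
# 100 Mpc wide box, plus a radius value.
galaxy = {"x": -12.5, "y": 0.0, "z": 50.0, "radius": 8.0}

# Same scale/offset values as the CatalogFieldBinding instances above.
params = {
    "xmap": bound_param_value(galaxy["x"], scale=1.0e-2, offset=0.5),
    "ymap": bound_param_value(galaxy["y"], scale=1.0e-2, offset=0.5),
    "zmap": bound_param_value(galaxy["z"], scale=1.0e-2, offset=0.5),
    "map_size": bound_param_value(galaxy["radius"], scale=4.0),
}

for key, value in params.items():
    print(key, value)
```

With these bindings, a coordinate in the range [-50; 50] Mpc maps onto the box-normalised range [0; 1], and the map size is four times the galaxy radius.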