
Services

app.services.base

Service

Bases: UUIDAuditBase

close_tunnel(session: AsyncSession) -> None async

Kill the ssh tunnel connecting to the API. Assumes the service object is attached to session.

Finds all processes named "ssh" and kills any associated with the service's local port.

This is roughly equivalent to the following shell commands, where 8080 stands for the service's local port:

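# PIDs (column 2 of ps aux) of ssh processes whose command line mentions the port
pids=$(ps aux | grep "[s]sh" | grep 8080 | awk '{print $2}')
kill $pids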
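For illustration, here is a minimal Python sketch of the same lookup done in-process instead of through a shell pipeline. It assumes the output of ps aux is available as a string and that the tunnel's ssh command line contains the local port as a colon-separated field (as in an -L forwarding spec). The helpers ssh_pids_for_port and kill_pids are hypothetical and not part of this API.

```python
import os
import signal


def ssh_pids_for_port(ps_output: str, port: int) -> list[int]:
    """Return PIDs of ssh processes whose arguments contain `port` as a field.

    `ps_output` is the text printed by `ps aux`: a header row, then one row per
    process with the PID in column 2 and the full command from column 11 on.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, argv = int(fields[1]), fields[10].split()
        # Only match the ssh client itself (not sshd, grep, etc.)
        if os.path.basename(argv[0]) != "ssh":
            continue
        # Match the port as a whole colon-separated field, e.g. "8080:node:8000",
        # so that port 8080 does not also match 18080
        if any(str(port) in arg.split(":") for arg in argv[1:]):
            pids.append(pid)
    return pids


def kill_pids(pids: list[int]) -> None:
    """Send SIGTERM to each PID, ignoring processes that have already exited."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
```

In use, the input would come from something like subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout, and the resulting list would be passed to kill_pids. Matching whole fields is slightly stricter than the grep pipeline, which would also match a port such as 18080.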

get_job(provider: str = None) -> Job

Fetch the Slurm job backing the service.

open_tunnel(session: AsyncSession) -> None async

Create an ssh tunnel to connect to the service. Assumes the service object is attached to session.

After the tunnel is created, the remote port is updated and recorded in the database.
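A minimal sketch of the kind of ssh command such a tunnel could use, assuming standard OpenSSH local port forwarding (-L). The login host, the compute node name, and the helpers tunnel_command and launch_tunnel are hypothetical; how open_tunnel actually chooses its ports and records them is not shown here.

```python
import subprocess


def tunnel_command(local_port: int, node: str, remote_port: int, login_host: str) -> list[str]:
    """Build an ssh command forwarding localhost:local_port to node:remote_port."""
    return [
        "ssh",
        "-N",  # no remote command: the connection only carries the forward
        "-f",  # go to the background after authentication
        "-o", "ExitOnForwardFailure=yes",  # exit with an error if the forward cannot be set up
        "-L", f"{local_port}:{node}:{remote_port}",  # local port forwarding spec
        login_host,
    ]


def launch_tunnel(local_port: int, node: str, remote_port: int, login_host: str) -> None:
    # Because of -f, ssh backgrounds itself, so run() returns while the tunnel stays open
    subprocess.run(tunnel_command(local_port, node, remote_port, login_host), check=True)
```

For example, tunnel_command(8080, "node01", 8000, "user@login.example.org") describes a tunnel that forwards local port 8080 to port 8000 on node01 through the login host.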

refresh(session: AsyncSession, config: State) async

Update the service status. Assumes running in an attached state.

Determines the service status by pinging the service and, if the ping is unsuccessful, checking the Slurm job state. Updates the service's database entry and returns the new status.

The status returned depends on the status before the update, because a service in the STARTING status cannot transition to UNHEALTHY. The status life-cycle is as follows:

Slurm job submitted -> SUBMITTED
    Slurm job switches to pending -> PENDING
        Slurm job switches to running -> STARTING
            API ping successful -> HEALTHY
            API ping unsuccessful -> STARTING
            API ping unsuccessful and time limit exceeded -> TIMEOUT
        Slurm job switches to failed -> FAILED
    Slurm job switches to failed -> FAILED
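The start-up branch of this life-cycle can be written as a small pure function. This is a sketch, not the service's implementation: it assumes Slurm job states arrive as strings (None before Slurm reports one), treats the states in the hypothetical SLURM_FAILED_STATES set as failures, and models the time limit as a hypothetical startup_limit_s parameter.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    STARTING = "STARTING"
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    FAILED = "FAILED"
    TIMEOUT = "TIMEOUT"
    STOPPED = "STOPPED"


# Assumption: which Slurm job states count as a failure.
SLURM_FAILED_STATES = {"FAILED", "CANCELLED", "NODE_FAIL", "OUT_OF_MEMORY"}


def startup_status(
    current: Status,
    ping_ok: bool,
    job_state: Optional[str],
    elapsed_s: float,
    startup_limit_s: float,
) -> Status:
    """Next status for a service that has not yet become HEALTHY."""
    if ping_ok:
        return Status.HEALTHY  # the API answered, so the service is up
    # The ping failed, so fall back to the Slurm job state
    if job_state is None:
        return Status.SUBMITTED  # Slurm has not reported a state yet
    if job_state in SLURM_FAILED_STATES:
        return Status.FAILED
    if job_state == "PENDING":
        return Status.PENDING
    if job_state == "RUNNING":
        # The job runs but the API is not answering yet
        return Status.TIMEOUT if elapsed_s > startup_limit_s else Status.STARTING
    return current  # unrecognized job state: leave the status unchanged
```

Note that a STARTING service never maps to UNHEALTHY here, which is the asymmetry described above.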

A service that successfully starts will be in a HEALTHY status. The status remains HEALTHY as long as subsequent updates ping successfully. Unsuccessful pings will transition the service status to FAILED if the Slurm job has failed; TIMEOUT if the Slurm job times out; and UNHEALTHY otherwise.

An UNHEALTHY service becomes HEALTHY if the update pings successfully. Otherwise, the service status changes to FAILED if the Slurm job has failed or TIMEOUT if the Slurm job times out.

Services that enter a terminal status (FAILED, TIMEOUT, or STOPPED) cannot be restarted.
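The transitions after start-up can be sketched the same way, under the same assumptions about Slurm state strings. The function running_status and the SLURM_FAILED_STATES set are hypothetical; the function accepts only HEALTHY, UNHEALTHY, or terminal statuses and returns terminal statuses unchanged, since they cannot be restarted.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    STARTING = "STARTING"
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    FAILED = "FAILED"
    TIMEOUT = "TIMEOUT"
    STOPPED = "STOPPED"


TERMINAL = {Status.FAILED, Status.TIMEOUT, Status.STOPPED}
# Assumption: which Slurm job states count as a failure.
SLURM_FAILED_STATES = {"FAILED", "CANCELLED", "NODE_FAIL", "OUT_OF_MEMORY"}


def running_status(current: Status, ping_ok: bool, job_state: Optional[str]) -> Status:
    """Next status for a service that has already started (or already ended)."""
    if current in TERMINAL:
        return current  # terminal statuses are final
    if current not in (Status.HEALTHY, Status.UNHEALTHY):
        raise ValueError(f"{current.name} is a start-up status")
    if ping_ok:
        return Status.HEALTHY  # a successful ping keeps or restores HEALTHY
    if job_state in SLURM_FAILED_STATES:
        return Status.FAILED
    if job_state == "TIMEOUT":
        return Status.TIMEOUT  # Slurm ended the job at its time limit
    return Status.UNHEALTHY  # the job is alive but the API is not answering
```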

start(session: AsyncSession, config: State, container_options: dict, job_options: dict) async

Start the service with the provided Slurm job and container options. Assumes running in an attached state.

Submits a Slurm job request, creates a new database entry and waits for the service to start.

Parameters:

  • container_options (dict) –

    a dict containing container options (see ContainerConfig).

  • job_options (dict) –

    a dict containing job options (see JobConfig).

Returns:

  • None.
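Waiting for the service to start can be pictured as polling refresh until the status leaves the start-up phase. The sketch below assumes refresh is exposed as an argument-free coroutine that returns the status name; wait_until_started and poll_interval_s are hypothetical names. Because refresh itself reports TIMEOUT once the start-up time limit is exceeded, the loop needs no separate deadline.

```python
import asyncio
from typing import Awaitable, Callable

# Statuses from which the service may still become HEALTHY
WAITING = {"SUBMITTED", "PENDING", "STARTING"}


async def wait_until_started(
    refresh: Callable[[], Awaitable[str]],
    poll_interval_s: float = 5.0,
) -> str:
    """Call `refresh` repeatedly until the status leaves the start-up phase."""
    while True:
        status = await refresh()
        if status not in WAITING:
            return status  # HEALTHY, or a terminal status such as FAILED or TIMEOUT
        await asyncio.sleep(poll_interval_s)
```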

stop(session: AsyncSession, config: State, delay: int = 0, timeout: bool = False, failed: bool = False) async

Stop the service after waiting delay seconds. Assumes running in an attached state.

The default terminal status is STOPPED, which indicates that the service was stopped normally. Use the failed or timeout flag to indicate that the service stopped because its Slurm job failed or timed out, respectively.

This process updates the database after stopping the service.

Parameters:

  • delay (int, default: 0) –

    The number of seconds to wait before stopping the service.

  • timeout (bool, default: False) –

    A flag indicating that the service timed out.

  • failed (bool, default: False) –

    A flag indicating that the service's Slurm job failed.
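A minimal sketch of how the delay and the two flags could combine, assuming the actual shutdown step (for example, cancelling the Slurm job) is supplied as a coroutine. The names terminal_status, stop_after, and cancel_job are hypothetical, and rejecting timeout and failed together is an assumption, since the behavior in that case is not documented here.

```python
import asyncio
from typing import Awaitable, Callable


def terminal_status(timeout: bool = False, failed: bool = False) -> str:
    """Pick the terminal status recorded when the service stops."""
    if timeout and failed:
        # Assumption: the two flags are mutually exclusive
        raise ValueError("timeout and failed cannot both be set")
    if failed:
        return "FAILED"
    if timeout:
        return "TIMEOUT"
    return "STOPPED"  # normal shutdown


async def stop_after(
    delay: int,
    cancel_job: Callable[[], Awaitable[None]],
    timeout: bool = False,
    failed: bool = False,
) -> str:
    await asyncio.sleep(delay)  # wait `delay` seconds before stopping
    await cancel_job()  # hypothetical shutdown step, e.g. cancelling the Slurm job
    return terminal_status(timeout=timeout, failed=failed)
```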

app.services.text_generation

TextGeneration

Bases: Service

A containerized service running a text-generation API.