🐳
whale
  • Getting started
  • Setup
    • Connection configuration
    • Selective indexing
    • Git setup
  • Additional features
    • Running SQL queries
    • Query automation and alerting
    • Jinja2 templating
  • Customization
    • Custom UI/UX
    • All customizations
  • For Developers
    • ~/.whale file structure
    • Custom extraction
Powered by GitBook
On this page
  • Basic usage
  • Slack alerts
  • Setup [WIP]
  • Syntax

Was this helpful?

  1. Additional features

Query automation and alerting

PreviousRunning SQL queriesNextJinja2 templating

Last updated 4 years ago

Was this helpful?

Supported connections: BigQuery, Postgres, Presto, Redshift, Snowflake

Whale supports automatic, barebones scheduled metrics calculation. Metrics are defined by creating a ```metrics block, as explained below. Any metric defined in this way will automatically be scheduled alongside the metadata scraping job. Metric definitions support Jinja2 templating -- for more information on how to set this up, see .

Basic usage

A metric is simply a named SQL statement that returns a single value, defined in plain yaml in a table stub, as shown below:

```metrics
metric-name:
  sql: |
    select statement

For example, below two metrics, null-registrations and distinct-registrations are defined:

```metrics
null-registrations:
  sql: |
    select
      count(distinct user_id)
    from mart.user_signups
    where user_id is null
distinct-registrations:
  sql: |
    select
      count(distinct user_id)
    from mart.user_signups
    where user_id is not null

The same block is shown within the context of a full table stub, below:

schema.table

## Column details

## Partition info

------------------------------------------------------
*Do not make edits above this line.*

```metrics
null-registrations:
  sql: |
    select
      count(distinct user_id)
    from mart.user_signups
    where user_id is null
distinct-registrations:
  sql: |
    select
      count(distinct user_id)
    from mart.user_signups
    where user_id is not null

These metrics will be scheduled, with the latest calculations injected into the programmatic portion of the table stub. An example is shown below:

schema.table

## Column details

## Partition info

## Metrics
null-registrations: 103 @ 2020-04-01 05:12:15
distinct-registrations: 30104 @ 2020-04-01 05:12:18

------------------------------------------------------
*Do not make edits above this line.*

```metrics
null-registrations:
  sql: |
    select
      count(*)
    from mart.user_signups
    where user_id is null
distinct-registrations:
  sql: |
    select
      count(distinct user_id)
    from mart.user_signups
    where user_id is not null

A full list of all historical values are saved in ~/.whale/metrics.

Slack alerts

Metrics can be enhanced with Slack alerts. These will send a message to you or your channel if a certain condition is met.

Setup [WIP]

Syntax

The syntax is as follows:

```metrics
metric-name:
  sql: |
    select statement
  alerts:
    - condition: "condition"
      message: "message"
      slack: 
        - "channel"

Using the earlier example we could set an alert every time we find a null in column user_id like this:

```metrics
null-registrations:
  sql: |
    select
      count(*)
    from mart.user_signups
    where user_id is null
  alerts:
    - condition: "> 0"
      message: "Nulls found in column 'user_id' of mart.user_signups."
      slack:
        - "#data-monitoring"
        - "@bob"

As you can see, you can send a message on Slack to individuals as well as Slack channels. In case you are interested, it's also possible to attach several conditions and messages to one metric.

All in all your table.md file with metrics and corresponding alerts could look like this:

schema.table

## Column details

## Partition info

## Metrics
null-registrations: 103 @ 2020-04-01 05:12:15
distinct-registrations: 30104 @ 2020-04-01 05:12:18

------------------------------------------------------
*Do not make edits above this line.*

```metrics
null-registrations:
  sql: |
    select
      count(*)
    from mart.user_signups
    where user_id is null
  alerts:
    - condition: ">0"
      message: "Nulls found in column 'id' of mart.user_signups."
      slack:
        - "#data-monitoring"
        - "@bob"
    - condition: "> 100"
      message: "More than 100 nulls found in column 'id' of mart.user_signups."
      slack:
        - "#incident-room"
        - "@joseph"

distinct-registrations:
  sql: |
    select
      count(distinct user_id)
    from mart.user_signups
    where user_id is not null
  alerts:
    - condition: "<10"
      message: "Less than 10 users in mart.user_signups."
      slack:
        - "#data-monitoring"
        - "@bob"

We are building out our dedicated 🐳 Slack app, but in the meantime, feel free to join and set up alerts there (ask us how to set this up in #general).

To enable Slack alerts for your Slack workspace first add the 🐳 Slack app by clicking . This will provide you with a code you'll need to convert into a token through Slack's API (TODO: instructions), which should be stored as an environment variable WHALE_SLACK_TOKEN. That's all!

Jinja2 templating
our community
this link