Code Monkey home page Code Monkey logo

aspects-dbt's People

Contributors

bmtcril avatar feanil avatar ian2012 avatar soryrawyer avatar

Watchers

 avatar  avatar  avatar

aspects-dbt's Issues

Add display names to models

Problem

The dbt models currently only have entity IDs. Superset dashboards should have the entity names, which are much more readable and would bring the dashboards in line with previous insights UX. This data currently lives in the event_sink schema in two tables: course_blocks, which has display names for entities like videos; and course_overviews, which contains the course name.

There are a couple different ways of approaching this problem and I'd love to get feedback before proceeding.

Approaches

As I see it, there are two high-level approaches:

  1. Create new views in dbt that include display names
  2. Create virtual datasets in Superset that build off the existing dbt models and joins in display names

The key performance difference between these two approaches is that ClickHouse will not use primary keys for right-hand side tables in joins. This means that the following query will not do any filtering of the course_blocks table despite filtering on org, which is the first part of the primary key in course_blocks:

select
    b.location as video_name,
    count(*) as num_plays
from
    xapi.video_plays vp
    join event_sink.course_blocks b
        on vp.video_id = b.location
where
    vp.org = 'edX' and b.org = 'edX'
group by 1

This could become a problem if the right-hand side table is sufficiently large, as ClickHouse defaults to reading the entire right-hand side into memory (though ClickHouse has a setting where queries could adapt to use a slower, more memory-resilient join algorithm).

The following query, however, will use the primary key in course_blocks:

with blocks as (
    select
        location,
        display_name
    from
        event_sink.course_blocks
    where org ='edX'
)
select
    b.display_name as video_name,
    count(*) as num_plays
from
    xapi.video_plays vp
    join blocks b
        on vp.video_id = b.location
where
    vp.org = 'edX'
group by 1

The advantage of Superset would be that we could use SQL templates to use a version of the second query to power dashboards. Given a dashboard filter with the name org, the templated version would look similar to this:

with blocks as (
    select
        location,
        display_name
    from
        event_sink.course_blocks
    where
        -- where_in is provided by Superset. If we want to filter on the org 'edX', where_in
        -- converts the filter value to something like this: `('edX')` 
        org in {{ filter_values('org') | where_in }}
)
select
    b.display_name as video_name,
    count(*) as num_plays
from
    xapi.video_plays vp
    join blocks b
        on vp.video_id = b.location
where
    vp.org {{ filter_values('org') | where_in }}
group by 1

Given this information, I think the dbt approach is more appealing from a developer experience perspective while sacrificing performance. The Superset approach has a performance benefit but at the cost of added complexity by defining what is essentially another dbt model but outside of dbt.

enrollments_by_day does not include the current day

I would expect there to be rows returned for the following query, but instead it is 0.

select count(*)
from reporting.enrollments_by_day
where enrollment_status_date = today()
and course_key = 'course-v1:UMONTREAL+GPT101+2023_04'
and enrollment_status = 'registered'
;

I'm able to get the expected rows by changing int_enrollment_windows window_end_date to:

dateTrunc('day', coalesce(window_end_at, today() + toIntervalDay(1))) AS window_end_date, but I'm not sure what the impacts of that change are. I think this is just a "midnight time boundary" issue?

Models for enrollment reports

Reports being served:

  • Cumulative enrollment events over time, broken down by enrollment mode
  • Daily enrollments and un-enrollments

WIP PR:
#1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.