avni-etl's Issues

Add organisation id to ETL tables

Story

In an ETL schema for an organisation group, it is hard to identify the source organisation of each row of data. As an implementer, I would like to have the organisation_id in the ETL tables so that I can use it in my reports.

Background

A new organisation has come up as part of the RWB group of organisations, and it needs a copy of its own set of reports. The complication this year is that there are two sets of geographies that are exactly the same, and this causes trouble in reports.

One mechanism is to keep a copy of all the reports for this new organisation and connect this new set of reports to a different schema populated by the new organisation alone. Maintaining two sets of reports can be hard, though. The alternative is to add the organisation id to each report and mark users with their organisation.

Changing the data source based on login is possible only by modifying code, and is not a good idea.

Acceptance criteria

  • organisation_id is added to all tables that are populated by ETL (see the sketch below).
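
A minimal sketch of the change for one table, reusing the rwb.address and organisation tables that appear later in these issues; the backfill join is illustrative only, since mapping each row back to its source organisation is the substance of this story.

-- Illustrative sketch: add the source organisation to an ETL table and backfill it.
alter table rwb.address add column if not exists organisation_id integer;

update rwb.address
set organisation_id = (select id from organisation where name = 'RWB')  -- assumed mapping
where organisation_id is null;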

PS: We might need to make this release sooner to allow for reports development.

Move job to run separately per organisation

Acceptance criteria

  • ETL job history for each organisation/group needs to be available so that this can be shared on a per-organisation basis
  • Jobs per organisation should still be run only once an hour
  • Failure logs should be available and queryable separately
  • Jobs should not block each other or cause deadlocks in the system (ensure jobs do not run concurrently etc)
  • When has_analytics_db is turned on/off for an org, then ETL is scheduled/unscheduled for that organisation
  • If ETL service is down during this time, then has_analytics_db switch cannot be turned on or off
  • Org creation/edit should work with has_analytics_db status "Unknown" if ETL service is not running. Rest of the functionality should work fine

Tech notes

  • Use API call to ETL service when has_analytics_db is toggled
  • Do not use the REST API. Use regular API and wrap with our call (with admin privileges)
  • Remove has_analytics_db flag. Use/create API to figure out if ETL is turned on/scheduled for an org (with admin privs)
  • Verify if 2 orgs can run in parallel

Out of scope

  • Public availability of job run history

Improve performance of loading Analytics reports in Avni-webapp

Improve the performance of loading Analytics reports in Avni-webapp. The change is in the way our React web app uses Redux to repeatedly invoke the same API requests, even when the user only switches between the various tabs in the Canned Analytics screen.

  • Do not refresh report on changing tabs, use takeOnce instead of takeAlways

ETL enhancements v1

This epic will make some necessary modifications that will improve the operational and other capabilities of ETL.

Both implementers and users need to understand if the ETL for their organisation is healthy. Currently, the only way to find this out is by looking at the logs. In order to change that, we should be able to see the status of ETL broken down by organisation.

We also need to be able to run ETL for an organisation separately (with auth) so that any issues can be fixed separately without using up too many system resources.

We also have to ensure that no two ETL jobs run together, to avoid data corruption.

ETL is failing for RWB

If we clear the metadata it works, but after some time it fails again.

Stack trace

org.springframework.jdbc.UncategorizedSQLException: StatementCallback; uncategorized SQLException for SQL [reset role;]; SQL state [25P02]; error code [0]; ERROR: current transaction is aborted, commands ignored until end of transaction block; nested exception is org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
	at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1542) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:393) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:431) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	at org.avniproject.etl.repository.JdbcContextWrapper.wrap(JdbcContextWrapper.java:19) ~[classes!/:na]
	at org.avniproject.etl.repository.JdbcContextWrapper.runInOrgContext(JdbcContextWrapper.java:8) ~[classes!/:na]
	at org.avniproject.etl.repository.sync.TransactionalTablesSyncAction.perform(TransactionalTablesSyncAction.java:34) ~[classes!/:na]
	at org.avniproject.etl.repository.sync.TransactionalTablesSyncAction$$FastClassBySpringCGLIB$$9ad1e530.invoke(<generated>) ~[classes!/:na]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.avniproject.etl.repository.sync.TransactionalTablesSyncAction$$EnhancerBySpringCGLIB$$5c706208.perform(<generated>) ~[classes!/:na]
	at org.avniproject.etl.repository.sync.EntityRepository.lambda$saveEntities$0(EntityRepository.java:32) ~[classes!/:na]
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) ~[na:na]
	at org.avniproject.etl.repository.sync.EntityRepository.saveEntities(EntityRepository.java:31) ~[classes!/:na]
	at org.avniproject.etl.repository.sync.EntityRepository$$FastClassBySpringCGLIB$$763ad55c.invoke(<generated>) ~[classes!/:na]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.avniproject.etl.repository.sync.EntityRepository$$EnhancerBySpringCGLIB$$c19c3f19.saveEntities(<generated>) ~[classes!/:na]
	at org.avniproject.etl.service.SyncService.migrateTable(SyncService.java:49) ~[classes!/:na]
	at org.avniproject.etl.service.SyncService.lambda$sync$0(SyncService.java:40) ~[classes!/:na]
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) ~[na:na]
	at org.avniproject.etl.service.SyncService.sync(SyncService.java:40) ~[classes!/:na]
	at org.avniproject.etl.service.SyncService$$FastClassBySpringCGLIB$$196cbf61.invoke(<generated>) ~[classes!/:na]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698) ~[spring-aop-5.3.16.jar!/:5.3.16]
	at org.avniproject.etl.service.SyncService$$EnhancerBySpringCGLIB$$3ebde848.sync(<generated>) ~[classes!/:na]
	at org.avniproject.etl.service.EtlService.runForOrganisation(EtlService.java:63) ~[classes!/:na]
	at org.avniproject.etl.service.EtlService.lambda$runForOrganisationSchemaNames$1(EtlService.java:49) ~[classes!/:na]
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) ~[na:na]
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[na:na]
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[na:na]
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[na:na]
	at org.avniproject.etl.service.EtlService.runForOrganisationSchemaNames(EtlService.java:47) ~[classes!/:na]
	at org.avniproject.etl.EtlApplication.run(EtlApplication.java:46) ~[classes!/:na]
	at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:777) ~[spring-boot-2.6.4.jar!/:2.6.4]
	at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:761) ~[spring-boot-2.6.4.jar!/:2.6.4]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:310) ~[spring-boot-2.6.4.jar!/:2.6.4]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1312) ~[spring-boot-2.6.4.jar!/:2.6.4]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1301) ~[spring-boot-2.6.4.jar!/:2.6.4]
	at org.avniproject.etl.EtlApplication.main(EtlApplication.java:39) ~[classes!/:na]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49) ~[avni-etl.jar:na]
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:108) ~[avni-etl.jar:na]
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:58) ~[avni-etl.jar:na]
	at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88) ~[avni-etl.jar:na]
Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:329) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:315) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:291) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:286) ~[postgresql-42.3.3.jar!/:42.3.3]
	at com.zaxxer.hikari.pool.ProxyStatement.execute(ProxyStatement.java:94) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.pool.HikariProxyStatement.execute(HikariProxyStatement.java) ~[HikariCP-4.0.3.jar!/:na]
	at org.springframework.jdbc.core.JdbcTemplate$1ExecuteStatementCallback.doInStatement(JdbcTemplate.java:422) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:381) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	... 75 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: column "Total silt requested by the family members โ€“ Nu (1206887472)" of relation "farmer_farmer_interaction_" does not exist
  Position: 943
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:329) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:315) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:291) ~[postgresql-42.3.3.jar!/:42.3.3]
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:286) ~[postgresql-42.3.3.jar!/:42.3.3]
	at com.zaxxer.hikari.pool.ProxyStatement.execute(ProxyStatement.java:94) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.pool.HikariProxyStatement.execute(HikariProxyStatement.java) ~[HikariCP-4.0.3.jar!/:na]
	at org.springframework.jdbc.core.JdbcTemplate$1ExecuteStatementCallback.doInStatement(JdbcTemplate.java:422) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:381) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:431) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
	at org.avniproject.etl.repository.sync.TransactionalTablesSyncAction.lambda$perform$0(TransactionalTablesSyncAction.java:35) ~[classes!/:na]
	at org.avniproject.etl.repository.JdbcContextWrapper.wrap(JdbcContextWrapper.java:17) ~[classes!/:na]
	... 73 common frames omitted

Optimize ETL Jobs performance and resource consumption

  1. Minimize the impact of long-running queries: evaluate query plans to optimise the SQL queries used in ETL jobs (see the sketch after this list).
  2. Optimize scheduling for various organisations: provide a way to specify a different ETL job trigger frequency for each organisation. This would help address performance concerns while satisfying customer needs, which vary across organisations.
  3. Distribute load over a larger period of time: this depends on the ability above to trigger ETL jobs individually for each organisation, so that we can space out the jobs with sleep time interleaved rather than triggering the ETL job for all orgs at once.
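
A minimal sketch of how a query plan could be evaluated for the first item; the table and filter are illustrative only.

-- Illustrative only: inspect the plan and buffer usage of a slow ETL query.
explain (analyze, buffers)
select * from rwb.address
where last_modified_date_time > now() - interval '1 hour';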

Full hierarchy is not getting added in Address table

In RWB, the district and taluka are missing for some dams.

Locations in DB

set role rwbka;

select dam.title, dam.type_id, tal.title, tal.type_id, dist.title, dist.type_id
from address_level dam
left join address_level tal on tal.id = dam.parent_id
left join address_level dist on dist.id = tal.parent_id
where dam.id in (373674, 373675, 116208, 373676);

Location in ETL Table

select * from organisation;

select * from rwb.address where id in (373674, 373675, 116208, 373676);

Reference ticket:
https://avni.freshdesk.com/a/tickets/2380

Convert ETL to service

Motivation

With multiple applications in Avni now, we need to start delineating the boundary of data for each service. This becomes important for us to make sense of the data and actors working on the data.

Proposed End-State

https://docs.google.com/document/d/11RMG3JmfjUWicF11H1J7jt8JuMRi2Ib-R5NawkxlIbU/edit?usp=sharing

As a first step to get there, we will move the ETL job into a service.

Tasks

  • Upgrade to Java 17 and latest possible Spring/Boot versions
  • Move to a Spring Boot service avni-etl
  • Change CircleCI deploy for etl to target integration ec2 instance
  • Add a ping service (https://int-staging.avniproject.org/etl/ping)
  • Move the ETL job to run as a batch job (quartz/spring batch etc) every hour
  • Setup systemd service for etl service
  • Create CircleCI deploy for int-server and int-adminapp

Run ETL through API

Story

Earlier, we were able to run ETL for an organisation. However, now that it is running as a service, we are not able to run ETL for an organisation and need to wait an hour each time (unless we restart the service). As part of this story, we will enable running ETL for a single organisation on demand.

Acceptance criteria

  • POST /job?schema="abc" with admin role required
  • Job should not interfere with existing jobs
  • GET /job/status?schema="abc" to get status of jobs for schema "abc" - gives start time, status

Add tables to etl

The following items have not yet been moved to ETL, and should be

  • individual_relationship
  • group_subject
  • checklists

Handle group form elements

  • Fix column name for group form elements
  • Ensure values of the individual items of the group are showing up in the final table

Allow ETL to be run per organisation

Motivation

Both implementers and users need to understand if the ETL for their organisation is healthy. Currently, the only way to find this out is by looking at the logs. In order to change that, we should be able to see the status of ETL broken down by organisation.

We also need to be able to run ETL for an organisation separately (with auth) so that any issues can be fixed separately without using up too many system resources.

We also have to ensure that no two ETL jobs run together, to avoid data corruption.

Timezone problems for date concepts

Date observations are stored in the format "2022-01-01 18:30:00.000" in the observations jsonb. This gets converted to 2022-01-01 in ETL (ideally it should have been 2022-01-02, because the date is based on IST).

Solution

  • Move all dates to the IST date (see the sketch below)
  • Recreate ETL data in prod so that dates show up correctly
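
A minimal sketch of the conversion for the example above, assuming the stored value is a UTC timestamp; the literal stands in for the value read from the observations jsonb.

-- Illustrative only: interpret the stored timestamp as UTC and take the date in IST.
select (('2022-01-01 18:30:00.000'::timestamp at time zone 'UTC')
         at time zone 'Asia/Kolkata')::date;  -- yields 2022-01-02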

Acceptance criteria

  • A date recorded as "2022-01-01" on the avni-client should show up as "2022-01-01" on ETL reports as well.

https://avni.freshdesk.com/a/tickets/2056

Create basic tables from schema

Create the basic tables required for different subject types, their encounters and enrolments.

Difference between existing views

  1. Add audit columns
  2. Remove unnecessary columns such as encounter type, encounter type name, subject type etc
  3. Add address_id everywhere

Support group subjects in ETL schema

Reason

Typically, the group_subject table is just an m-n relationship table that is usually searched by a specific foreign key. However, when it comes to households, retrieving the "head of household" is a common need. Having a separate table for each group type improves performance and also simplifies the queries (if we add head of household as a direct column).

Schema description

  • table named groupSubject_memberSubject
  • If the group subject is of type household, then head_of_household_id is additionally added to the table schema (see the sketch below)
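
A minimal sketch of such a table for a household group, following the groupSubject_memberSubject naming above; all column names other than head_of_household_id are assumptions.

-- Illustrative sketch for a household group subject table.
create table rwb.household_member (
  id                       integer,
  uuid                     text,
  group_subject_id         integer,  -- the household
  member_subject_id        integer,  -- the member individual
  head_of_household_id     integer,  -- present only because the group type is household
  is_voided                boolean,
  last_modified_date_time  timestamp with time zone
);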

Testing notes

  • Change of head of household must be handled correctly

Media table ETL mandates presence of registration sync concept 1 and 2

ETL fails if a subject type for an org does not have registration sync concept 1 and 2 configured.

Analysis:
resources/sql/etl/media.sql has the registration sync concepts as placeholders, which resolve to the column name "entity.null" when the registration sync concepts are not configured for a subject type; that column cannot be found.
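
A minimal sketch of the direction a fix could take, assuming the generated select currently interpolates the concept-derived column directly; the fallback and table name are illustrative only.

-- Illustrative only: emit a NULL value instead of the non-existent "entity.null" column
-- when a registration sync concept is not configured for the subject type.
select
  entity.uuid,
  null::text as sync_parameter_value_1,  -- when sync concept 1 is absent
  null::text as sync_parameter_value_2   -- when sync concept 2 is absent
from rwb.individual entity;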

[avni-etl] ETL not running

ETL was stuck at some point without any relevant log or DB command info.

This was the last log entry showing:

2023-04-17 11:45:09.314 INFO 7706 --- [ main] org.avniproject.etl.service.SyncService : Migrating table sakhiapp.individual_child_hbyc_cancel
2023-04-17 11:45:09.483 INFO 7706 --- [ main] org.avniproject.etl.service.SyncService : Migrating table sakhiapp.individual_child
2023-04-17 11:45:10.170 INFO 7706 --- [ main] org.avniproject.etl.service.SyncService : Migrating table sakhiapp.address
2023-04-17 11:45:10.983 INFO 7706 --- [ main] org.avniproject.etl.service.EtlService : Running ETL for schema rwbniti with dbUser rwbniti

Media listing API

Story

  • avni-etl will have a media listing API so that avni-media can show the media on screen

PS: Please talk to Vinay when picking up this story

API Details

  • API /media?size=x&page=y to retrieve the following values
[
    {
      uuid: "" (from the media table)
      url: 
      subjectName: <subjectName>
      subjectTypeName: <subjectTypeName>
      programEnrolment: <programName> 
      encounterTypeName: <encounterTypeName>
      lastModifiedDateTime: <lastModifiedDateTime> 
      createdDateTime: <createdDateTime>
      syncConcept1Name: <Sync Concept 1 for this subject type> 
      syncConcept2Name:  <Sync Concept 2 for this subject type> 
      syncParameterValue1
      syncParameterValue2
      address: {
        District: "abc",
        Block: "abc",
       }
    }
]
  • Auth based on value in cookie or auth-token. App will identify organisation/schema based on this
  • Will be sorted by createdDateTime desc so that latest pictures show up first
  • We will eventually also build a filter functionality here
  • Pagination will be required

Ability to run sync for a single organisation

Context

Eventually, we would want the ability to run ETL for a single organisation on demand. Currently, the need arises because our QA would want to make some quick changes and run tests for this.

Acceptance criteria

  • ETL job will have a mechanism to run for a single organisation. This can be passed in as a command line parameter.
  • If no parameters are passed, then ETL will run for all organisations
  • The distinguishing parameter will be provided by the schema name, since this is the only parameter that distinguishes multiple ETL schemas
  • Tests should be done against both organisations and organisation groups.

Post this change, we should be able to pass organisation schema names as comma-separated values in the first parameter of the start.sh script.
Ex: sh ./start.sh ihmp,jnscp

Getting error while generating ETL tables for Muktipath

While generating tables for Muktipath, ETL gives an error. They have added concepts in Marathi.

Error:
create table muktipath.reporting_user_samupdeshak_daily_report ("id" integer, "uuid" text, "is_voided" boolean, "created_by_id" integer, "last_modified_by_id" integer, "created_date_time" timestamp with time zone, "last_modified_date_time" timestamp with time zone, "name" text, "individual_id" integer, "address_id" integer, "earliest_visit_date_time" timestamp with time zone, "max_visit_date_time" timestamp with time zone, "encounter_date_time" timestamp with time zone, "encounter_location" point, "cancel_date_time" timestamp with time zone, "cancel_location" point, "legacy_id" text, "latest_approval_status" text, "????? ??? ?????" text, "?? ?????? ????????? ??? ???? ???? ???? ????? ?? ? (257783007)" numeric, "?? ?? ??????? ????? ???? ???? ????? ???" numeric, "????? ??? ??????" text, "?? ?? ??????? ???? ????? ???? ???? ????? ???" numeric, "?? ?????? ?????? ????? ???? ????? ????? ???? ?" time, "??????? ????????? ????????? ?????" text, "???? ???? ????? ?? ????? ????? ?????" numeric, "????? ???" text, "?? ??? ?????? ???? ????? ?? ???????? ?????" numeric, "?? ?????? ????????? ??? ???? ???? ???? ????? ?? (1514654694)" numeric, "?? ?????? ??????? ????????? ??? ????" text, "????? ?????" date, "?? ?????? ??? ??? ???? ?? ?????" text, "village" text, "????? ????? ??? ??? ?? ?????" text, "?????? ?? ??????? ????????? ??? ?????" text, "?? ?????? ??? ???? ???? ???" text, "?? ???????? ??? ??? ?????? ???? ???? ??? ????? ?" text, "???? ??? ?????" text, "?? ?????? ??????? ????????? ??? ????" text, "?? ?????? ???? ???? ?????? ?????? ?" numeric, "?? ?????? ????????? ????? ??? ????????? ???????? (932606255)" text ); grant all privileges on all tables in schema muktipath to muktipath; ]; nested exception is org.postgresql.util.PSQLException: ERROR: column "?? ?????? ????????? ???" specified more than once

Search API should be able to search by concept

The current search does not search by concept. With this epic, we will be able to search by concept

Acceptance criteria

  • New conceptFilters field added
  • The filter can be at any level. We need to figure out where the field is present, and then add relevant filters
  • Concept filters should work for Coded, Numeric and Date filters.
{
    "subjectTypeNames": ["Individual"],
    "programNames": [],
    "encounterTypeNames": [],
    "conceptFilters": [
    	{
    		conceptName: "CodedConcept",
    		values: ["", ""]
    	},
    	{
    		conceptName: "DateConcept | NumericConcept",
    		from: ""
    		to: "" //inclusive filter
    	},
    ],
    "fromDate": "2019-04-26T03:46:41.694Z",
    "toDate": "2022-04-26T03:46:41.694Z",
    "imageConcepts": ["Image"],
    "syncValues": [{
        "syncConceptName": "concept",
        "syncConceptValues": ["value1", "value2"]
    }],
    "addresses": [
        {
            "addressLevelType": "Subcenter",
            "addressLevelIds": [1224, 1122]
        },
        {
            "addressLevelType": "Village",
            "addressLevelIds": [1244, 1122]
        }
    ]
}

ETL failures

ETL fails for organisations that have a form for ManualProgramEligibility. See stack trace below for lahi in staging

java.lang.IllegalArgumentException: No enum constant org.avniproject.etl.domain.metadata.TableMetadata.Type.ManualProgramEnrolmentEligibility
	at java.base/java.lang.Enum.valueOf(Enum.java:273)
	at org.avniproject.etl.domain.metadata.TableMetadata$Type.valueOf(TableMetadata.java:211)
	at org.avniproject.etl.repository.rowMappers.TableMetadataMapper.getTableType(TableMetadataMapper.java:88)
	at org.avniproject.etl.repository.rowMappers.TableMetadataMapper.populateCommonColumns(TableMetadataMapper.java:78)
	at org.avniproject.etl.repository.rowMappers.TableMetadataMapper.create(TableMetadataMapper.java:58)
	at org.avniproject.etl.repository.SchemaMetadataRepository.lambda$getFormTables$4(SchemaMetadataRepository.java:110)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at org.avniproject.etl.repository.SchemaMetadataRepository.getFormTables(SchemaMetadataRepository.java:110)
	at org.avniproject.etl.repository.SchemaMetadataRepository.getNewSchemaMetadata(SchemaMetadataRepository.java:40)
	at jdk.internal.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
	at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:390)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:702)
	at org.avniproject.etl.repository.SchemaMetadataRepository$$SpringCGLIB$$0.getNewSchemaMetadata(<generated>)
	at org.avniproject.etl.service.SchemaMigrationService.migrate(SchemaMigrationService.java:34)
	at jdk.internal.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:390)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:702)
	at org.avniproject.etl.service.SchemaMigrationService$$SpringCGLIB$$0.migrate(<generated>)
	at org.avniproject.etl.service.EtlService.runForOrganisation(EtlService.java:62)
	at org.avniproject.etl.service.EtlService.lambda$runForOrganisationSchemaNames$1(EtlService.java:48)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at org.avniproject.etl.service.EtlService.runForOrganisationSchemaNames(EtlService.java:46)
	at org.avniproject.etl.service.EtlService.run(EtlService.java:36)
	at org.avniproject.etl.scheduler.EtlJob.execute(EtlJob.java:23)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)

Secondary table for the group form element

Store all the group form element observations in a separate table. This will be essential if we are to run reports for grouped form elements.

With more implementations using groups (especially repeated groups), this becomes important to implement.

ETL fails consistently for muktipath

The ETL job works fine the first time but starts to fail when run again, giving an error that a column with some name does not exist.
This happens because, while truncating non-UTF-8 characters, the function sometimes returns a name longer than 63 bytes (see the illustration below).
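
An illustration of why names longer than 63 bytes cause trouble: PostgreSQL silently truncates identifiers to 63 bytes, so two long column names that differ only beyond that limit collide. The names below are made up.

-- Made-up example: both identifiers share the same first 63 bytes, so after
-- truncation the second column duplicates the first and the DDL fails with
-- ERROR: column "aaa..." specified more than once
do $$
begin
  execute format('create temp table truncation_demo (%I text, %I text)',
                 repeat('a', 63) || '_first',
                 repeat('a', 63) || '_second');
end $$;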

ETL for media table

  • New task for the etl job to create a media table that contains the following fields (a sketch follows this list)
    • address_id
    • created_date_time
    • last_modified_date_time
    • image_url
    • sync_parameter_key1
    • sync_parameter_value1
    • sync_parameter_key2
    • sync_parameter_value2
    • subject_type_name
    • encounter_type_name
    • program_name
    • concept_name
    • entity_id
    • uuid
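
A minimal sketch of the table implied by the list above; the schema name and column types are assumptions based on similar columns elsewhere in the ETL schema.

-- Illustrative sketch only; types are assumptions.
create table rwb.media (
  uuid                     text,
  entity_id                integer,
  address_id               integer,
  subject_type_name        text,
  program_name             text,
  encounter_type_name      text,
  concept_name             text,
  image_url                text,
  sync_parameter_key1      text,
  sync_parameter_value1    text,
  sync_parameter_key2      text,
  sync_parameter_value2    text,
  created_date_time        timestamp with time zone,
  last_modified_date_time  timestamp with time zone
);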

Fixes required for avni-etl

Missed requirements on avni-etl

  1. Sync parameters in tables
  • Ensure that sync_parameters are available in encounter, program_encounter and program_enrolment tables
  • These columns will be named after the name of the concept
  • These columns will be indexed (see the sketch after this list)
  2. Location attributes in address table
  • These attributes are important dimensions when reporting. It is best if they are available on the etl tables.
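
A minimal sketch of the sync parameter column and its index, assuming a sync concept named "Catchment number" on program_encounter; the names are illustrative only.

-- Illustrative only: column named after the sync concept, plus an index on it.
alter table rwb.program_encounter add column if not exists "Catchment number" text;
create index if not exists program_encounter_catchment_number_idx
  on rwb.program_encounter ("Catchment number");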

ETL issues

Acceptance criteria

The following acceptance criteria were missing in the original media table creation story and are added here.

  • When media observations are an array, then the app should create multiple rows in the system (see the sketch after this list)
  • When media observations are updated, then the older observations are removed and new observations are added
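
A minimal sketch of the array expansion, assuming media observations are stored under a concept key in an observations jsonb column; the table and key names are illustrative only.

-- Illustrative only: expand an array-valued media observation into one row per item.
select e.id as entity_id,
       img.value as image_url
from public.program_encounter e
cross join lateral jsonb_array_elements_text(e.observations -> 'image-concept-uuid') img
where jsonb_typeof(e.observations -> 'image-concept-uuid') = 'array';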

Respond faster to ETL job failures

We currently do not have sufficient notifications when ETL jobs fail, and therefore do not look at the failures.

We need better monitoring mechanisms that will notify us within a day so that someone can take a look.

Tech tasks

  • Set up healthchecks for the job
  • Set up bugsnag for each exception with org name in the message

Ensure that etl job works for organisations extending from 1

Testing

  • Ensure works for IHMP
  • Ensure doesn't break for existing organisations
  • Try new org
  • Verify data for IHMP and see that it comes through fine

Cases: removed form elements, replaced form elements, normal form elements, introduced form elements
