dassl / classdb Goto Github PK
View Code? Open in Web Editor NEWAn open-source system to let students experiment with relational data
Home Page: https://dassl.github.io/ClassDB/
License: Other
An open-source system to let students experiment with relational data
Home Page: https://dassl.github.io/ClassDB/
License: Other
metaFunctions.sql
uses pg_views
, prepareClassDB.sql
uses pg_roles
. Using Information_Schema
increases the use of standards compliant code (desirable), and makes it easier to port the scripts to another implementation.
Evaluate the suitability of the applicable views provided by Information_Schema
and implement them wherever possible.
Since all SECURITY DEFINER functions are currently set to be owned by DBManager, all privileges necessary for the functions to operate are also given to users with the DBManager role.
Some of these privileges can lead making changes that cause anomalies in an instance of ClassDB. One issue in particular is the ability to modify the userName
column in the classdb.Student
and classdb.Instructor
tables (or alternatively, the potential future classdb.User
table). We cannot restrict that privilege to only one function, but rather only to a particular role.
It seems that most straightforward approach to solve this would be to create another role that would own the functions marked as SECURITY DEFINER. This new role would need the createrole
privilege, CREATE on current_database
, and ALL on the classdb.Student
and classdb.Instructor
/ classdb.User
tables. The DBManager and Instructor roles would then only have SELECT granted on the userName column in those tables. This new role could also become the owner of the classdb schema. (A potential name for this new role could also be "ClassDB".)
I believe that Dr. Murthy had mentioned a set up similar to this previously.
NAME seems to be heavily discouraged for non-internal operations, however, there does not seem to be a straightfoward way to implement the userName field, without placing a constraint on the rolenames allowed. Andrew Figuero(@afig)
When importing logs, the lastActivity column should only be updated by connections that have happened later than the last activity. This should also apply to incrementing the connection count.
testPrepareClassDB.sql
throws an exception when executing classdb.createStudent()
for the first example student. There error appears to be from the ALTER DEFAULT PRIVILEGES
statement. The docs state that, "You can change default privileges only for objects that will be created by yourself or by roles that you are a member of," which I think may be the root of the issue. We may be able to solve this with SET ROLE
or similar.
Code in createStudent
, createInstructor
, and createDBManager
can be refactored.
Just as example, functions in metaFunctions.sql
use TEXT
for parameters instead of using narrower types.
Not sure if it is ISO SQL, but consider using the table_name.column_name%TYPE
to reference the type of a column. (Also not sure if this type can be used to refer to view columns.)
In general, review all functions for possible improvement.
Also see discussion in Issue #4.
The use of current_timestamp
stores the time at which the row is updated instead of the time at which the query was issued. Use statement_timestamp()
instead?
Connection logging should be deployed on the local VM soon to generate a couple days worth of log data to analyze.
Presently, there is no way for an instructor to learn if/when a student (last) changed their password. I wonder if an event trigger or a log entry provides the information required to discover pwd-change activity.
I was hoping ALTER ROLE is an event trigger, but it doesn't appear to be.
The pg_event_trigger_dropped_objects() does not seem to correctly return object_identity in prepareUserLogging.sql. I will investigate this further.
prepareClassDB.sql
is now 500+ lines long and I expect it to get close to 600 lines. At this point, it is rather hard to see where code for any aspect begins/ends. For now, I am placing a dashed line between code segments so I can see the boundaries, but that is just band-aid.
Perhaps it can help to split the file into multiple files and define a sequence of execution for the files.
Storing timestamps in UTC is the right thing to do, but it is typical for systems to return timestamps in "local timezone"
Perhaps a function such as getStudents(optional timezone)
: will perform SELECT *
on the table but convert timestamps to the specified timezone or to "local" timezone if parameter is missing.
Developing a view is an alternative, but views cannot be parameterized. A view that reuses the aforementioned function to always apply the local timezone is an option. However, a view seems redundant if a function is available.
The \p (password) command provided by the psql
client provides a better way for users to change their password. It is also more secure than the functions created by the scripts, since the password is never displayed or transmitted in clear-text when using the command. Consider whether all functions in passwordProcedures.sql
are necessary.
pgAdmin stores its user data in a separate sqlite database. We should see if a script can add user data to this database, and if user information can be synced between postgres and pgAdmin.
Currently, if the script is run as a user who is not a superuser, privileges on the public
schema are not revoked and the following warning is reported:
prepareClassDB.sql:53: WARNING: no privileges could be revoked for "public"
In short, only the owner of a schema can revoke privileges on it. By default, the owner of the public
schema is the postgres
user, not the owner of the database.
Two possible solutions:
Require the user running the script to be a superuser (easiest, but most restrictive, and might not be a good idea in general).
Require the person setting up the ClassDB instance to ensure that the user they are running the script as has been set as the owner of the public
schema. Issue a descriptive warning/exception if they are not the owner.
In order to prevent confusion and unnecessary long-running queries, student roles should have a timeout on execution time for queries sent by them. This will mitigate frustration if students do not know how to cancel a query, and communicate to users that a long running time is likely to be an issue with the query, rather than a connection/server-client issue.
The amount of time to allow queries also needs to be decided. The setting is measured in milliseconds.
Some mentions of group roles ClassDB, Student, etc. are in upper case; others are in lower case. Some of these instances cause query failures (especially in the code I just checked in).
Need some guidance on when to use which case, using the same case as much as possible.
DB-level
prepareUserLogging.sql
to a new addLogMgmt.sql
prepareUserLogging.sql
to a new addDDLMonitors.sql
Server-level
ALTER SYSTEM
statements and the call to pg_reload_conf();
from prepareUserLogging.sql
to enableServerLogging.sql
with a clear note about running the script from psql or running each statement individuallyprepareUserLogging.sql
should be no more after this revision.
Not being format-police, but:
At least in prepareClassDB.sql, some tabs are 3 spaces and others are 4 spaces; some tabs are spaces; others are tab characters (or so it seems). Likewise, some code is broken to be under ~80 chars/line, but other lines have 120+ chars.
Any decision is OK as long we have consensus and follow it reasonably. Specifically, line width does not have to be exactly adhered, but we shouldn't also exceed it frequently by more than 5-10 chars.
We need a means to know which version of ClassDB scripts were used to create a DB. For example, we do not presently know which version was used to create the dassl DB on campus.
I propose creating and populating a settings table in the DB prepared by prepareClassDB.sql
and other .sql files used to set up the DB. Specifically, the hash of the file used should be maintained any time that file is run.
Could we automate getting git's hash for files? I imagine yes.
Generally, Make sure queries use ISO SQL as much as possible. This includes using ISO standard data types, statements, and schemas such as Information_Schema
(see #1).
A function such as dropAllStudents()
can be helpful so instructors can drop all existing students in preparation for a new semester.
Currently, classdb.killUserConnections
requires that its definer have permission to kill any user's connections. If prepareClassDB.sql
is not run as superuser, then classdb.killUserConnections
will only be able to target connections belonging to its defining user.
Postgres provides a facility to fix this problem. There is a predefined role pg_signal_backend
, which allows a granted user to kill connections from any non-superusers. I propose that we grant this to the ClassDB
role.
One requirement for this fix is that the user running prepareClassDB.sql
must have ADMIN
rights on pg_signal_backend
. It seems common for non-superuser 'admin' accounts, for example the ones for AWS and Bluemix, to have ADMIN
rights on pg_signal_backend
to other users. We will definitely want to document this requirement.
Only the owner of the current database or a superuser can grant roles CONNECT on the current database. The script needs to be updated to declare this, and check to see if the current user is either the owner of the current database or a superuser.
In its current state, the script does not fail if the current user is not the DB owner or a superuser, but does result in several WARNINGS, along with not allowing the proper roles to connect to the DB.
The column name classDB.Student.schoolID can be construed as the "ID of the student's school", instead of "the student's school-issued ID". I don't presently have alternatives to suggest.
Attempting to run the prepareClassDB script results in the transaction being aborted due to a few syntax errors in the file, and an incompatibility with an older version. The specific problems are listed below:
CREATE OR REPLACE does not work if the functions have different parameter names. This means that in its current state the script cannot be run over an older version with different parameter names. This can be fixed by adding a DROP FUNCTION IF EXISTS prior to the affected functions.
Line 102 uses %1
to refer to a parameter in the createUser function. This resulted in an error when running the script. It can be fixed by using $1
instead. Here is a copy of the line:
format('CREATE USER %I ENCRYPTED PASSWORD %L', %1, COALESCE($2, $1));
Lines 294, 347, and 380: It seems like there is a missing AND
between the two boolean expressions. Here is an example:
WHERE pg_catalog.pg_has_role($1, oid, 'member') rolname != $1
The ON CONFLICT clause resulted in an error when attempting to execute the createStudent and createInstructor functions. The error is copied below. It seems like there was an issue with username
being both the name of a parameter and of a column in each table. A quick fix was to change the name of the parameter, but we may want to choose a different solution.
ERROR: column reference "username" is ambiguous
LINE 2: ON CONFLICT (username) DO UPDATE SET studentname =...
^
DETAIL: It could refer to either a PL/pgSQL variable or a table column.
QUERY: INSERT INTO classdb.Student VALUES($1, $2, $3)
ON CONFLICT (username) DO UPDATE SET studentname = $2
CONTEXT: PL/pgSQL function createstudent(
Since the event triggers have been moved to prepareUserLogging, I don't think there is anything in prepareClassDB that requires a superuser. This change would also make prepareClassDB suitable for PaaS deployment
Tests should be run to ensure users only have permissions on the appropriate objects and schemas for their role(s). These tests should be run on a complete instance of the application using roles that have been created by the application.
Related to #27 and #23. classdb.listConnections(username)
can't be owned by dbManager or similar, because only a superuser can read all rows and columns in pg_stat_activity. I will do some research to understand how the row/column permissions are set in it, and if there is any way to grant full read permissions to other roles.
Strangely, pid for ALL connections can be read by anyone, only details like the timestamps and ip address are protected by default.
As we saw today, it will be helpful to provide a function to dbManagers, and possibly instructors, to kill the connections for a given user. We should discuss which role should be required for this functionality. Below is the query I used to do this today. It should be sufficient to wrap this in a function to start:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE usename = '<username>';
File prepareClassDB.sql in branch pginstance-2ndrev-dev contains code to test existence of users, schemas, etc. These tests use variables such as valueExists which is then used in IF statements. Instead, the SELECT statements could directly be used in the IF statement in conjunction with the EXISTS operator. This change eliminates the need for variables and the use of EXISTS makes the code’s intent much clearer.
Columns studentname
and instructorname
need to enforce NOT NULL
.
Function changeUserPassword
should be executable only by ClassDB
for security reasons. Allowing execution by other roles means the function can be called with the new pwd in clear from a client.
I propose merging the tables Student
and Instructor
to a single User
table. This unification eliminates some repeated code, simplifies some operations, and provides connection/DLL tracking for all users. (The unified table will probably also add more parameters to existing functions.)
This issue is based on a comment by @srrollo.
Club creation of logging-related fields in table Student with the appropriate logging-code: use ALTER table to add the fields lastConnection
and connectionCount
just before the procedure to import server log. Also, add the fields lastDDL*
just before the function to update DDL activity.
At end only the core fields should remain in CREATE TABLE STUDENT
.
Directories need to be urgently reorged. It will be nice if we can create the directory structure, move files, and merge to master before making any more code changes.
Here is a summary of the top-level dirs based on e-mail conversations on this topic. I am making it a to do list so it can be updated.
docs
can directly contain .MD files or sub-dirs md
and html
if we have both MD and equivalent HTML files. We should start with the assumption that we will have only MD files. So, it is OK to start by putting MD files directly in the docs
folderexamples
can contain all files related to example schemas. For now we will have only the Shelter schema. As long as we have only one schema, we can put all the files directly inside the examples folder with a README
src
can contain all the .sql
files (excluding those intended for testing). At this point, there is no reason to maintain sub-dirstests
can contain all files related to testing. It should have a README
Information such as connection limit are presently hard coded. Perhaps the (default) value for such settings could be maintained in a "Settings" table and those values could be used instead of hard coding.
For example, the following schema permits maintaining role-specific default settings related to connections:
Role Category Setting Value
Student Connection Timeout 1000
Student Connection Number 5
A thorough discussion is needed. No need to implement now.
Most functions in ClassDB are created with CREATE OR REPLACE
and are not dropped prior to creation. We should find all instances of CREATE OR REPLACE
and replace them with CREATE
and a preceeding DROP FUNCTION IF EXISTS
. This will help in making the setup scripts idempotent.
We can maintain a list of instances that need to be/have been fixed here:
public.listTables
in metaFunctions.sql
public.describe
in metaFunctions.sql
pg_temp.createGroupRole
in prepareClassServer.sql
(Even though this function is placed in pg_temp, we may want to add the drop for consistency.classdb.importLog(DATE)
in prepareUserLogging.sql
classdb.importLog()
in prepareUserLogging.sql
classdb.updateStudentActivity
in prepareUserLogging.sql
classdb.createUser
in prepareClassDB.sql
classdb.createStudent
in prepareClassDB.sql
classdb.createInstructor
in prepareClassDB.sql
classdb.createDBManager
in prepareClassDB.sql
classdb.dropStudent
in prepareClassDB.sql
dropAllStudents
in prepareClassDB.sql
(This also needs to be scoped to classdb)classdb.dropInstructor
in prepareClassDB.sql
classdb.dropDBManager
in prepareClassDB.sql
classdb.dropUser
in prepareClassDB.sql
classdb.changeUserPassword
in prepareClassDB.sql
classdb.resetUserPassword
in prepareClassDB.sql
classdb.killUserConnections
in prepareClassDB.sql
classdb.listUserConnections
in prepareClassDB.sql
classdb.killConnection
in prepareClassDB.sql
A script/procedure to add students from a CSV file (for example, a roster) will be helpful
Presently, a student's password is either something explicitly supplied or a school ID if it exists, or the same as the user name. However, after a student is created, it is not clear what the initial password is. This will especially be a problem if students are created in batch by reading a roster.
I propose we remove the automatic choice of school ID (if it exists) as the initial password. Specifically, use the supplied value if provided or use the username. This revision will make it easier for instructors to communicate initial password to students and make the logic uniform for all kinds of users.
For improved cognition, it is helpful to place each CREATE TABLE statement prior to any procedure/block where that table is used.
It is a good idea to limit the number of connections that a student is allowed to make, in order to prevent idle or abandoned connections from causing too many issues. This can be set with the SET CONNECTION LIMIT
setting when creating or altering a role. The default for that setting is -1 (no limit). According to @smurthys , 5 connections seems like a good limit, and I would agree.
The script createPVFC.sql
does not currently contain any meaningful queries. We will have to port the existing schema over to Postgres.
Functions KillConnection
and KillUserConnections
are not granted to DBManager
. They are granted to Instructor
.
In general, avoid executing dynamically constructed SQL as much as possible. Using SQL functions rather than PLpgSQL ones increases performance and portability in some situations.
Connection logging setup has two parts - One time setting of server configuration, and creation of logging functions. These should be split into their own SQL files.
Rename attr LastActivity
to lastDDLStartTime
; add attr such as ddlStartCount
which maintains a count of ddl starts; consider capturing some information about the last ddl that was run, e.g. object name, operation, etc. (say, DROP TABLE X)
Sean Murthy (smurthys)
I propose a createClassDB.sql
to create a DB with the role ClassDB
as owner. This script should be run after running prepareClassServer.sql
but before running prepareClassDB.sql
.
CREATEDB
privilegeprepareClassDB.sql
)ClassDB
Parameterizing the script to accept a database name probably fixes psql
as the client, but it can be instructive to consult the docs for the createdb
program.
It looks like the function dropUser
could be called (with some revision if necessary) from functions dropStudent
, dropInstructor
, and dropDBManager
(each of which have quite a bit of repeated code).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.