bight23checker's Issues
Mismatch type coercion flags nonexistent logic errors
Previously, the merge in mismatch raised an error if the types of the columns did not match.
Now there is a new (and very strange) issue: the type coercion sometimes changes the date format of one dataframe but not the other. This causes a mismatch, even though the records should not be flagged as a logic error.
Field, Trawl, Grab Logic checks are broken
@pdsmith sent this email describing the issue
Robert,
It looks to me like the logic checks aren't working at all. I cleared the station "B18-10015" from the database just in case a database call was being made:
check:
Trawl and Grab submissions require a related StationOccupation record. Records are matched on StationID, SampleDate, and Sampling Organization.
occupation:
grab:
Columns are sometimes of the incorrect datatype, not what we would expect while the code is running
There was an issue with this file
2023_RHMP_Fish_Abundance_and_Biomass_Data_Bight_format_for_upload.xlsx
In Excel, the sampledate column was stored as text, although the text was a perfectly formatted date.
Pandas read the column in as an "object" datatype, and we ran into trouble when we merged it with another dataframe in custom checks. We expected that since the dataframe passed Core checks, the datatype would be correct, but it was not.
Of course, we can make it a habit to coerce merging columns to the same type, and I think we might want to do that regardless, but this seems like a cheap duct-tape fix. If the data matches the database rules and the columns are matched to the database table, I don't see any reason why the datatypes of the pandas dataframe should not match the column datatypes of the postgres table.
Although the checker has improved a lot over the years and we have seen many bugs come up and get resolved, this datatype issue remains. Even now, at the end of 2023, it exposed another bug and caused the app to crash yet again.
We need a way to ensure that the pandas dataframe column datatypes match those of the postgres table.
This sounds simple, but if we are not careful, implementing a fix can cause many problems.
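As a rough illustration of the idea above, the following is a minimal sketch that casts dataframe columns to dtypes derived from the postgres column types. The mapping `PG_TO_PANDAS`, the helper `align_dtypes`, and the example schema are all hypothetical; in the checker the column types would presumably come from the database catalog rather than a hard-coded dict.

```python
import pandas as pd

# Hypothetical mapping from PostgreSQL type names to pandas dtypes.
PG_TO_PANDAS = {
    "character varying": "string",
    "text": "string",
    "integer": "Int64",
    "smallint": "Int64",
    "bigint": "Int64",
    "numeric": "float64",
    "double precision": "float64",
    "timestamp without time zone": "datetime",
}


def align_dtypes(df, column_types):
    """Return a copy of df with each listed column cast to the dtype implied by its postgres type."""
    out = df.copy()
    for col, pg_type in column_types.items():
        target = PG_TO_PANDAS.get(pg_type)
        if col not in out.columns or target is None:
            continue  # unknown column or unmapped type: leave untouched
        if target == "datetime":
            out[col] = pd.to_datetime(out[col])
        else:
            out[col] = out[col].astype(target)
    return out


# Example: sampledate arrives as text, as it did in the Excel file described above.
df = pd.DataFrame(
    {"stationid": ["B18-10015"], "sampledate": ["2023-07-19"], "abundance": [3]}
)
schema = {
    "stationid": "character varying",
    "sampledate": "timestamp without time zone",
    "abundance": "integer",
}
aligned = align_dtypes(df, schema)
```

One caveat with this approach: casting after the file has been read cannot restore information that was already lost during parsing, such as leading zeros in an ID that pandas inferred as an integer.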
Toxicity summary mussel control acceptability criteria (mean control value) - Incorrect species name and value
station occupation distance check broken
Paul dropped a file
bight18-field-subset-clean-validating-checks.xlsx
He changed a longitude from -117 to -118, which should have put the two points more than 100 m apart (which they actually are), but the checker was not giving us the error.
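For reference, a quick way to sanity-check the expected result is a great-circle distance calculation. This is a minimal sketch using the haversine formula; the function name `haversine_m` and the latitude of 33.0 are illustrative assumptions, and the checker's actual distance calculation may differ.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Same latitude (assumed here), longitude moved from -117 to -118.
distance = haversine_m(33.0, -117.0, 33.0, -118.0)
exceeds_limit = distance > 100  # the check's threshold is 100 meters
```

At this latitude a full degree of longitude is on the order of 90 km, so the 100 m threshold should be exceeded by a wide margin.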
Loading section of the checker is failing to handle qc sample stationid 0000
The qc sample stationid 0000 passes core checks when matched against the lu_station lookup list, but fails against the same lookup list when the data is actually loaded to the database. The value 0000 is being converted to the number 0 instead of being kept as the text value 0000.
Error:
Checker application came across an error
(psycopg2.errors.ForeignKeyViolation) insert or update on table "tbl_toxresults" violates foreign key constraint "tbl_toxresults_stationid_fkey"
DETAIL: Key (stationid)=(0) is not present in table "lu_station".
Potential fix:
The same portion of code that tells the checker how to handle each data type so that it matches the database schema (used in checks) needs to be used prior to data loading.
Wrong fix: Don't use the ceriodaphnia checker fix for this problem. We do not want the values truncated to 0.
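To illustrate why the value has to be kept as text from the moment it is read, here is a small sketch. It uses `pd.read_csv` on an in-memory string so that it is self-contained; `pd.read_excel` accepts the same `dtype` argument. The column names other than stationid are made up for the example.

```python
import io

import pandas as pd

csv_text = "stationid,result\n0000,95\n0000,88\n"

# With type inference, an all-digit column is parsed as integers
# and the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Declaring the column as text keeps the value exactly as submitted.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"stationid": str})

inferred_ids = inferred["stationid"].tolist()
text_ids = as_text["stationid"].tolist()
```

Once the column has been parsed as an integer, converting it back to a string yields "0", not "0000", so the text type needs to be enforced before or during reading rather than just before loading.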
ToxBatch duplicates check doesn't work
For some reason, the checkDuplicatesInProduction check for ToxBatch is not catching errors when the submission contains records already in the database
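The source does not show how checkDuplicatesInProduction is implemented. As a point of comparison only, here is a minimal sketch of one way to flag submitted rows whose key already exists in production. The function name `rows_already_in_production`, the key columns, and the sample data are hypothetical; keys are compared as strings so that differing column dtypes do not hide a match.

```python
import pandas as pd


def rows_already_in_production(submission, existing, pkey):
    """Boolean list: True where a submitted row's key is already present in existing."""
    sub_keys = pd.MultiIndex.from_frame(submission[pkey].astype(str))
    prod_keys = pd.MultiIndex.from_frame(existing[pkey].astype(str))
    return sub_keys.isin(prod_keys).tolist()


# Hypothetical key columns and data, for illustration only.
pkey = ["toxbatch", "lab"]
existing = pd.DataFrame({"toxbatch": [101, 102], "lab": ["LabA", "LabA"]})
submission = pd.DataFrame({"toxbatch": ["101", "103"], "lab": ["LabA", "LabA"]})

flags = rows_already_in_production(submission, existing, pkey)
```

Here the production table stores toxbatch as an integer while the submission holds text; without the string coercion the first row would not be recognized as a duplicate.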
tox_demo.xlsx
App crashes if 2 out of 3 tables match a dataset
@pdsmith tried submitting field, trawl and grab data today. The field and grab tabs matched, but trawl didn't. The match dataset step let him through anyway, giving him a match dataset of "field_grab", and the app crashed because the trawl tab was still in the file but was not matching any db table. This needs to be fixed in the match tables routine.
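One way to close this gap is to refuse to assign a dataset whenever any tab is left unmatched. The sketch below shows that rule under simplifying assumptions: tabs are matched to tables by exact column-set equality, and the function names, the table names other than tbl_stationoccupation, and the column lists are hypothetical rather than taken from the checker.

```python
def match_tables(sheet_columns, table_columns):
    """Map each sheet to the table with the same column set; collect sheets with no match."""
    matched, unmatched = {}, []
    for sheet, cols in sheet_columns.items():
        table = next(
            (t for t, tcols in table_columns.items() if set(cols) == set(tcols)), None
        )
        if table is None:
            unmatched.append(sheet)
        else:
            matched[sheet] = table
    return matched, unmatched


def resolve_dataset(sheet_columns, table_columns, datasets):
    """Return (dataset_name, unmatched_sheets); no dataset is assigned if any sheet is unmatched."""
    matched, unmatched = match_tables(sheet_columns, table_columns)
    if unmatched:
        return None, unmatched
    tables = set(matched.values())
    for name, required in datasets.items():
        if tables == set(required):
            return name, []
    return None, []


# Hypothetical schema and dataset definitions.
table_columns = {
    "tbl_stationoccupation": {"stationid", "occupationdate", "samplingorganization"},
    "tbl_trawlevent": {"stationid", "sampledate", "samplingorganization", "trawlnumber"},
    "tbl_grabevent": {"stationid", "sampledate", "samplingorganization", "grabeventnumber"},
}
datasets = {
    "field_grab": ["tbl_stationoccupation", "tbl_grabevent"],
    "field_trawl_grab": ["tbl_stationoccupation", "tbl_trawlevent", "tbl_grabevent"],
}

# The trawl tab has a misnamed column, so it matches no table.
bad_submission = {
    "field": {"stationid", "occupationdate", "samplingorganization"},
    "trawl": {"stationid", "sampledate", "samplingorganization", "trawl_no"},
    "grab": {"stationid", "sampledate", "samplingorganization", "grabeventnumber"},
}
bad_result = resolve_dataset(bad_submission, table_columns, datasets)

good_submission = dict(bad_submission, trawl=table_columns["tbl_trawlevent"])
good_result = resolve_dataset(good_submission, table_columns, datasets)
```

With this rule, the submission described above would be stopped at the match step with the trawl tab reported as unmatched, instead of proceeding as "field_grab".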
Mismatch function fails when column datatypes differ
This bug was discovered in the microplastics checker. In a logic check, the "labbatch" column of one dataframe had the value "batch1", while the labbatch column of the other dataframe had simply the number 1.
I got an error which said "You are trying to merge on object and int64 columns".
This bug is fixed in the microplastics checker, and the fix needs to be applied to the other checker applications.
It is in the function called "mismatch" in proj/custom/functions.py
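The actual fix lives in the microplastics checker and is not reproduced here. The following is only a sketch of the general technique, coercing the merge keys on both sides to a common type before merging; the function name `unmatched_rows` and the sample data are hypothetical and do not reflect the real signature of mismatch.

```python
import pandas as pd


def unmatched_rows(left, right, on):
    """Rows of left whose key columns (a list of names) have no counterpart in right."""
    # Cast keys on both sides to str so the merge never sees object vs int64.
    left_keys = left[on].astype(str)
    right_keys = right[on].astype(str).drop_duplicates()
    merged = left_keys.merge(right_keys, on=on, how="left", indicator=True)
    # right_keys is unique, so merged has one row per row of left, in the same order.
    return left[(merged["_merge"] == "left_only").to_numpy()]


results = pd.DataFrame({"labbatch": ["batch1", "2"], "value": [0.5, 0.7]})
batches = pd.DataFrame({"labbatch": [1, 2]})  # int64 column

missing = unmatched_rows(results, batches, ["labbatch"])
missing_batches = missing["labbatch"].tolist()
```

String coercion is a blunt tool: values that are equal but formatted differently, such as dates rendered in two formats or 1 versus 1.0, will still be reported as mismatches, so casting to the database column type is preferable where that type is known.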
Column inf does not exist
When loading the clean Bight 18 tox data, the app dies trying to load the tox summary table, saying "column inf does not exist".
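The issue does not say where the inf comes from. One plausible cause (an assumption, not confirmed by the source) is an infinite float produced in the summary calculations, for example by a division by zero, which is then written into the SQL statement as a bare inf and interpreted by postgres as a column name. A minimal sketch for detecting and neutralizing such values before loading, with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Hypothetical summary table with one infinite value.
summary = pd.DataFrame({"toxbatch": ["b1", "b2"], "pctcontrol": [98.5, np.inf]})

numeric_cols = summary.select_dtypes(include="number").columns

# Flag rows containing +/- infinity so they can be reported or investigated.
has_inf = np.isinf(summary[numeric_cols]).any(axis=1)

# Replace infinities with NaN so they are not sent to the database as inf.
cleaned = summary.copy()
cleaned[numeric_cols] = cleaned[numeric_cols].replace([np.inf, -np.inf], np.nan)
```

Replacing the value only prevents the crash; the rows flagged in `has_inf` would still need to be traced back to the calculation that produced them.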
Toxicity batch - check for duplicate toxbatch
Errors/warnings tabs not updated when a new file that fails match tables is dropped
The warnings/errors for the previous file are still displayed in the errors and warnings tabs if a different file is dropped and fails the match tables routine. Based on the order of checks, no errors or warnings should be displayed:
drop file 1: file with warnings
test-file-with-warnings.xlsx
drop file 2: file that fails match table
test-file-with-warnings.xlsx
Emails for critical errors are sent to the spam folder
Duplicate check didn't work
@pdsmith dropped this file
bight18-field-subset-clean-use-with-toxicity.xlsx
The checker didn't catch the duplicates, which caused a critical error on final submit.
Here is the error message:
(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "tbl_stationoccupation_pkey"
DETAIL: Key (stationid, occupationdate, occupationtime, samplingorganization, collectiontype)=(0, 2018-01-01 00:00:00, 00:00:00, Southern California Coastal Water Research Project, Grab) already exists.
[SQL:
INSERT INTO tbl_stationoccupation
(occupationdepth, navtype, created_date, windspeed, swellperiod, seastate, login_email, salinity, stationfail, comments, stationid, occupationtimezone, collectiontype, created_user, occupationdate, winddirection, submissionid, swellheight, occupationlatitude, last_edited_user, objectid, globalid, vessel, login_agency, occupationdepthunits, samplingorganization, warnings, occupationdatum, salinityunits, abandoned, swelldirection, weather, swellheightunits, occupationlongitude, occupationtime, last_edited_date, windspeedunits)
VALUES (1.3, 'GPS', '2022-10-14 19:23:51', 10, 0, 'Calm', '[email protected]', 34.5, 'None or No Failure', 'Bolsa Chica - North of pedestrian wooden foot bridge. Original SITE ID = BBOLR', 'B18-10156', 'PDST', 'Grab', 'checker', '2018-08-29 00:00:00', 'W', 1665775431, 0, 33.69625, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'Early Bird 3', 'Southern California Coastal Water Research Project', 'm', 'AMEC, Foster, & Wheeler / WOOD', 'SamplingOrganization - More than one agency detected', 'WGS84', 'psu', 'No', 'NR', 'Clear', 'ft', -118.04604, '13:20:00', '2022-10-14 19:23:51', 'kts'),
(5.0, 'DGPS', '2022-10-14 19:23:51', 8, 0, 'Calm', '[email protected]', -88.0, 'None or No Failure', NULL, 'B18-10000', 'PDST', 'Trawl 5 Minutes', 'checker', '2018-07-19 00:00:00', 'SW', 1665775431, 0, 33.75935, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'Marine Surveyor', 'Southern California Coastal Water Research Project', 'm', 'City of Los Angeles Environmental Monitoring Division', NULL, 'NAD83', 'psu', 'No', 'C', 'Clear', 'ft', -118.16285, '12:41:15', '2022-10-14 19:23:51', 'kts'),
(5.0, 'DGPS', '2022-10-14 19:23:51', 4, 0, 'Calm', '[email protected]', -88.0, 'None or No Failure', NULL, 'B18-10000', 'PDST', 'Grab', 'checker', '2018-07-24 00:00:00', 'SW', 1665775431, 0, 33.75918333, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'Marine Surveyor', 'Southern California Coastal Water Research Project', 'm', 'City of Los Angeles Environmental Monitoring Division', NULL, 'NAD83', 'psu', 'No', 'C', 'Clear', 'ft', -118.16263333, '11:48:40', '2022-10-14 19:23:51', 'kts'),
(0.0, 'AGPS', '2022-10-14 19:23:51', 0, 0, 'Calm', '[email protected]', -88.0, 'None or No Failure', 'placeholder - do not delete', 0000, 'PDST', 'Grab', 'checker', '2018-01-01 00:00:00', 'C', 1665775431, 0, 33.0, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'None', 'Southern California Coastal Water Research Project', 'm', 'Southern California Coastal Water Research Project', 'StationID - Distance from Occupation Latitude/Longitude in submission to Target Latitude/Longitude in field assignment table is greater than 100 meters.', 'NAD83', 'psu', 'No', 'C', 'Clear', 'ft', -117.0, '00:00:00', '2022-10-14 19:23:51', 'kts')
]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
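The report does not say whether the duplicate key was repeated inside the file or already present in the database. For the in-file case, a minimal sketch of a duplicate check on the primary key columns named in the error message could look like this; the sample rows are invented for illustration.

```python
import pandas as pd

# Key columns taken from the tbl_stationoccupation_pkey error message above.
pkey = [
    "stationid",
    "occupationdate",
    "occupationtime",
    "samplingorganization",
    "collectiontype",
]

# Invented sample rows: the first and last share the same key.
submission = pd.DataFrame(
    {
        "stationid": ["0000", "B18-10000", "0000"],
        "occupationdate": ["2018-01-01", "2018-07-24", "2018-01-01"],
        "occupationtime": ["00:00:00", "11:48:40", "00:00:00"],
        "samplingorganization": ["OrgA", "OrgB", "OrgA"],
        "collectiontype": ["Grab", "Grab", "Grab"],
    }
)

# keep=False marks every member of a duplicated group, not just the repeats.
dup_mask = submission.duplicated(subset=pkey, keep=False)
duplicate_rows = submission.index[dup_mask].tolist()
```

A check like this only works reliably if the key columns have consistent types; a stationid stored as "0000" in one row and 0 in another would not be treated as equal.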