
bight23checker's People

Contributors

duynguyen2019, r7butler


bight23checker's Issues

Mismatch type coercion flagging nonexistent logic errors

There was an issue where the merge inside mismatch raised an error if the types of the merge columns did not match.

Now there is a new (and very weird) issue: the type coercion sometimes changes the date format of one dataframe but not the other. That causes a mismatch, so rows get flagged as logic errors even though they should not be.
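One way to keep the date formats from diverging is to parse the date columns on both sides into real datetimes before merging, instead of comparing string renderings. The sketch below is only an illustration: `normalize_dates` is a hypothetical helper, not a function from the checker, and the column names are assumed.

```python
import pandas as pd

def normalize_dates(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Return a copy with the given columns parsed as midnight-normalized datetimes."""
    out = df.copy()
    for col in cols:
        parsed = pd.to_datetime(out[col], errors="coerce")
        # Drop any time-of-day part and pin the resolution so both sides compare equal.
        out[col] = parsed.dt.normalize().astype("datetime64[ns]")
    return out

# One side holds dates as text, the other as timestamps (assumed example data).
left = pd.DataFrame({"stationid": ["B18-10000"], "sampledate": ["2018-07-24"]})
right = pd.DataFrame(
    {"stationid": ["B18-10000"], "sampledate": [pd.Timestamp("2018-07-24 00:00:00")]}
)

left_n = normalize_dates(left, ["sampledate"])
right_n = normalize_dates(right, ["sampledate"])

merged = left_n.merge(right_n, on=["stationid", "sampledate"], how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
```

With both sides normalized the same way, `unmatched` is empty for this data, so nothing would be flagged.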

Field, Trawl, Grab Logic checks are broken

@pdsmith sent this email describing the issue

Robert,

It looks to me like the logic checks aren't working at all. I cleared the station "B18-10015" from the database just in case a database call was being made:

check:

Trawl and Grab submissions require a related StationOccupation record. Records are matched on StationID, SampleDate, and Sampling Organization.

occupation:

[image: occupation]

grab:

[image: grab]
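For reference, the check quoted in the email could be sketched roughly like this. It is not the checker's actual code: `rows_missing_occupation` is a hypothetical name, and the column names `sampledate` and `occupationdate` are assumptions about the grab/trawl and occupation tabs.

```python
import pandas as pd

def rows_missing_occupation(sample: pd.DataFrame, occupation: pd.DataFrame) -> list:
    """Return index labels of sample rows that have no related occupation record."""
    sample_keys = sample[["stationid", "sampledate", "samplingorganization"]].astype(str)
    occ_keys = occupation[["stationid", "occupationdate", "samplingorganization"]].astype(str)
    # Collect the occupation keys as plain tuples for fast membership tests.
    known = set(occ_keys.itertuples(index=False, name=None))
    return [
        idx
        for idx, key in zip(sample_keys.index, sample_keys.itertuples(index=False, name=None))
        if key not in known
    ]

grab = pd.DataFrame({
    "stationid": ["B18-10015", "B18-10000"],
    "sampledate": ["2018-07-24", "2018-07-24"],
    "samplingorganization": ["Org A", "Org A"],
})
occupation = pd.DataFrame({
    "stationid": ["B18-10000"],
    "occupationdate": ["2018-07-24"],
    "samplingorganization": ["Org A"],
})

missing = rows_missing_occupation(grab, occupation)
```

Here only the B18-10015 row lacks an occupation record, so that row is the one the check should flag.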

Columns are sometimes the wrong datatype - not what we would expect while the code is running

There was an issue with this file
2023_RHMP_Fish_Abundance_and_Biomass_Data_Bight_format_for_upload.xlsx

In Excel, the sampledate column was stored as text, although the text was a perfectly formatted date.

Pandas read the column in as an "object" dtype, and we ran into trouble when we merged it with another dataframe in custom checks. We expected that, since the dataframe passed Core checks, the column would have the correct datatype, but it did not.

Of course, we can make it a habit to coerce merging columns to the same type, and I think we might want to do that regardless, but this seems like a cheap duct-tape kind of fix. If the data matches the database rules, and the columns are matched to the database table, I don't see any reason why the datatypes of the pandas dataframe should not match the datatypes of the corresponding postgres table columns.

Although the checker has improved a lot over the years and we have seen many bugs come up and get resolved, this datatypes issue remains. Even now, at the end of 2023, it exposed another bug and caused the app to crash yet again.

We need a way to ensure that the pandas dataframe column datatypes match the postgres table.

This sounds simple, but if we are not careful, implementing a fix could cause many new problems.
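A rough sketch of what such a step could look like is below. Everything in it is an assumption for illustration: `PG_TO_PANDAS` and `align_to_schema` are hypothetical names, the type mapping is incomplete, and the `schema` dict stands in for whatever the app would read from the database (for example from `information_schema.columns`).

```python
import pandas as pd

# Hypothetical, partial mapping from postgres type names to pandas dtypes.
PG_TO_PANDAS = {
    "character varying": "string",
    "text": "string",
    "integer": "Int64",
    "numeric": "float64",
    "double precision": "float64",
    "timestamp without time zone": "datetime64[ns]",
}

def align_to_schema(df: pd.DataFrame, schema: dict[str, str]) -> pd.DataFrame:
    """Return a copy of df whose columns are coerced to the dtypes implied by schema."""
    out = df.copy()
    for col, pg_type in schema.items():
        target = PG_TO_PANDAS.get(pg_type)
        if col not in out.columns or target is None:
            continue  # leave unknown columns and unmapped types untouched
        if target.startswith("datetime"):
            out[col] = pd.to_datetime(out[col], errors="coerce")
        elif target == "string":
            out[col] = out[col].astype("string")
        else:
            out[col] = pd.to_numeric(out[col], errors="coerce").astype(target)
    return out

# Assumed example: a date stored as text, a numeric-looking id, a count stored as text.
raw = pd.DataFrame({"sampledate": ["2023-06-01"], "stationid": [10015], "abundance": ["3"]})
schema = {
    "sampledate": "timestamp without time zone",
    "stationid": "character varying",
    "abundance": "integer",
}
aligned = align_to_schema(raw, schema)
```

Note that `errors="coerce"` silently turns unparseable values into missing values, which is one of the ways a fix like this could cause new problems; it is only reasonable if the data has already passed Core checks.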

Loading section of the checker is failing to handle qc sample stationid 0000

The QC sample stationid 0000 passes core checks when matched against the lu_station lookup list, but fails against the same lookup list when the data is actually loaded to the database. The loader converts 0000 to the number 0 instead of keeping the datatype as text and the value as 0000.

Error:
Checker application came across an error
(psycopg2.errors.ForeignKeyViolation) insert or update on table "tbl_toxresults" violates foreign key constraint "tbl_toxresults_stationid_fkey"
DETAIL: Key (stationid)=(0) is not present in table "lu_station".

Potential fix:
The same portion of code that coerces each column's datatype to match the database schema (used in checks) needs to be run prior to data loading.

Wrong fix: Don't use the ceriodaphnia checker fix for this problem. We do not want the values truncated to 0.
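The snippet below illustrates how the value can get lost and how reading the column as text keeps it. It uses an in-memory CSV only to stay self-contained; `pd.read_excel` accepts the same `dtype` argument. The data is made up for the example.

```python
import io
import pandas as pd

csv_text = "stationid,result\n0000,1.5\n0001,2.0\n"

# Default type inference treats the column as integers, so 0000 becomes 0.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to text keeps the value exactly as submitted.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"stationid": str})
```

After this, `inferred["stationid"]` holds the integers 0 and 1, while `as_text["stationid"]` still holds the strings "0000" and "0001", which is what the lu_station foreign key needs.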

App crashes if 2 out of 3 tables match a dataset

@pdsmith tried submitting field, trawl, and grab data today. The field and grab tabs matched, but trawl didn't. The match dataset routine let him through with a match dataset of "field_grab", and then the app crashed because the trawl tab was still in the file but wasn't matching any db table. This needs to be fixed in the match tables routine.
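One possible shape for the fix is to have the matching step report unmatched tabs explicitly, so the app can stop the submission instead of continuing with a partial match. This is a sketch under assumptions: `match_tabs` is a hypothetical function, and matching by exact column-set equality is a simplification of whatever the real routine does.

```python
def match_tabs(
    tab_columns: dict[str, set[str]],
    table_columns: dict[str, set[str]],
) -> tuple[dict[str, str], list[str]]:
    """Return (tab -> table matches, list of tabs that matched no table)."""
    matched: dict[str, str] = {}
    unmatched: list[str] = []
    for tab, cols in tab_columns.items():
        # Find the first table whose columns are exactly the tab's columns.
        hit = next((tbl for tbl, tcols in table_columns.items() if cols == tcols), None)
        if hit is None:
            unmatched.append(tab)
        else:
            matched[tab] = hit
    return matched, unmatched

# Assumed example: the trawl tab has a misspelled column and matches nothing.
tables = {
    "tbl_field": {"stationid", "occupationdate"},
    "tbl_grab": {"stationid", "sampledate"},
    "tbl_trawl": {"stationid", "trawlnumber"},
}
tabs = {
    "field": {"stationid", "occupationdate"},
    "grab": {"stationid", "sampledate"},
    "trawl": {"stationid", "trawlnum"},
}
matched, unmatched = match_tabs(tabs, tables)
```

The caller would then reject the file whenever `unmatched` is non-empty, rather than accepting "field_grab" and carrying the stray trawl tab forward.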

Mismatch function fails when column datatypes differ

This bug was discovered in the microplastics checker. In a logic check, the "labbatch" column in one dataframe had the value "batch1", while the labbatch column in the other dataframe had simply the number 1.

I got an error which said "You are trying to merge on object and int64 columns"

This bug is fixed in the microplastics checker and needs to be applied to the other checker applications.

It's the function called "mismatch" in proj/custom/functions.py
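The general idea of the fix can be sketched as follows. This is not the code from the microplastics checker; `mismatch_sketch` is a hypothetical stand-in whose signature is assumed, and it simply casts the merge columns on both sides to strings before merging.

```python
import pandas as pd

def mismatch_sketch(df1: pd.DataFrame, df2: pd.DataFrame, mergecols: list[str]) -> list:
    """Return index labels of df1 rows with no matching row in df2 on mergecols."""
    # Cast both sides to str so object and int64 columns can be merged.
    left = df1[mergecols].astype(str).reset_index()
    right = df2[mergecols].astype(str).drop_duplicates()
    merged = left.merge(right, on=mergecols, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", "index"].tolist()

results = pd.DataFrame({"labbatch": ["batch1", "1"]})  # object column
batches = pd.DataFrame({"labbatch": [1]})              # int64 column

bad_rows = mismatch_sketch(results, batches, ["labbatch"])
```

Casting to strings is blunt: a float 1.0 becomes "1.0" and will not match "1", and missing values become the literal text "nan", so a production version would probably need to handle those cases.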

Column inf does not exist

When loading the clean Bight 18 tox data, the app dies trying to load the tox summary table, saying "column inf does not exist".
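The cause is not confirmed here, but one plausible explanation (an assumption) is that a computed summary column contains floating-point infinity, for example from a division by zero, and the value ends up in the SQL as a bare `inf`. If so, replacing infinities with missing values before loading would avoid it. The column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

def replace_infinities(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with +inf and -inf replaced by NaN."""
    return df.replace([np.inf, -np.inf], np.nan)

# Assumed example: a percent-of-control style calculation with a zero denominator.
summary = pd.DataFrame({"mean": [50.0, 10.0], "control_mean": [100.0, 0.0]})
summary["pctcontrol"] = summary["mean"] / summary["control_mean"] * 100

cleaned = replace_infinities(summary)
```

Whether a missing value is the right replacement for the tox summary table is a separate question for whoever owns the calculation.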

Toxicity batch - check for duplicate toxbatch

Each toxbatch ID should be unique within a toxicity submission, so the checker should make sure the toxbatch field in the toxicity batch Excel tab is unique. See attached file for an example of the issue.
toxbatch-issue
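A minimal sketch of such a check, assuming the batch tab is already loaded into a dataframe with a `toxbatch` column (`duplicate_rows` is a hypothetical helper name):

```python
import pandas as pd

def duplicate_rows(df: pd.DataFrame, cols: list[str]) -> list:
    """Return index labels of every row whose values in cols appear more than once."""
    # keep=False marks all members of a duplicate group, not just the repeats.
    return df.index[df.duplicated(subset=cols, keep=False)].tolist()

batch = pd.DataFrame({"toxbatch": ["TB-01", "TB-02", "TB-01"]})
flagged = duplicate_rows(batch, ["toxbatch"])
```

Using `keep=False` means both copies of "TB-01" are reported, which is usually what a submitter needs to see in order to fix the file.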

Duplicate check didn't work

@pdsmith dropped this file
bight18-field-subset-clean-use-with-toxicity.xlsx

The checker didn't catch the duplicates, which caused a critical error on final submit.

Here is the error message:
(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "tbl_stationoccupation_pkey"
DETAIL: Key (stationid, occupationdate, occupationtime, samplingorganization, collectiontype)=(0, 2018-01-01 00:00:00, 00:00:00, Southern California Coastal Water Research Project, Grab) already exists.

[SQL:
INSERT INTO tbl_stationoccupation
(occupationdepth, navtype, created_date, windspeed, swellperiod, seastate, login_email, salinity, stationfail, comments, stationid, occupationtimezone, collectiontype, created_user, occupationdate, winddirection, submissionid, swellheight, occupationlatitude, last_edited_user, objectid, globalid, vessel, login_agency, occupationdepthunits, samplingorganization, warnings, occupationdatum, salinityunits, abandoned, swelldirection, weather, swellheightunits, occupationlongitude, occupationtime, last_edited_date, windspeedunits)
VALUES (1.3, 'GPS', '2022-10-14 19:23:51', 10, 0, 'Calm', '[email protected]', 34.5, 'None or No Failure', 'Bolsa Chica - North of pedestrian wooden foot bridge. Original SITE ID = BBOLR', 'B18-10156', 'PDST', 'Grab', 'checker', '2018-08-29 00:00:00', 'W', 1665775431, 0, 33.69625, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'Early Bird 3', 'Southern California Coastal Water Research Project', 'm', 'AMEC, Foster, & Wheeler / WOOD', 'SamplingOrganization - More than one agency detected', 'WGS84', 'psu', 'No', 'NR', 'Clear', 'ft', -118.04604, '13:20:00', '2022-10-14 19:23:51', 'kts'),
(5.0, 'DGPS', '2022-10-14 19:23:51', 8, 0, 'Calm', '[email protected]', -88.0, 'None or No Failure', NULL, 'B18-10000', 'PDST', 'Trawl 5 Minutes', 'checker', '2018-07-19 00:00:00', 'SW', 1665775431, 0, 33.75935, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'Marine Surveyor', 'Southern California Coastal Water Research Project', 'm', 'City of Los Angeles Environmental Monitoring Division', NULL, 'NAD83', 'psu', 'No', 'C', 'Clear', 'ft', -118.16285, '12:41:15', '2022-10-14 19:23:51', 'kts'),
(5.0, 'DGPS', '2022-10-14 19:23:51', 4, 0, 'Calm', '[email protected]', -88.0, 'None or No Failure', NULL, 'B18-10000', 'PDST', 'Grab', 'checker', '2018-07-24 00:00:00', 'SW', 1665775431, 0, 33.75918333, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'Marine Surveyor', 'Southern California Coastal Water Research Project', 'm', 'City of Los Angeles Environmental Monitoring Division', NULL, 'NAD83', 'psu', 'No', 'C', 'Clear', 'ft', -118.16263333, '11:48:40', '2022-10-14 19:23:51', 'kts'),
(0.0, 'AGPS', '2022-10-14 19:23:51', 0, 0, 'Calm', '[email protected]', -88.0, 'None or No Failure', 'placeholder - do not delete', 0000, 'PDST', 'Grab', 'checker', '2018-01-01 00:00:00', 'C', 1665775431, 0, 33.0, 'checker', sde.next_rowid('sde','tbl_stationoccupation'), sde.next_globalid(), 'None', 'Southern California Coastal Water Research Project', 'm', 'Southern California Coastal Water Research Project', 'StationID - Distance from Occupation Latitude/Longitude in submission to Target Latitude/Longitude in field assignment table is greater than 100 meters.', 'NAD83', 'psu', 'No', 'C', 'Clear', 'ft', -117.0, '00:00:00', '2022-10-14 19:23:51', 'kts')
]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
