Thanks @jeancochrane for investigating. Pasting our discussion here for @julia-klauss and @dfsnow and closing for now.
Query to check dupe comps:
WITH unpivoted_comps AS (
    SELECT pin, comp_pin
    FROM (
        SELECT
            pin,
            ARRAY[
                comp_pin_1, comp_pin_2, comp_pin_3, comp_pin_4, comp_pin_5,
                comp_pin_6, comp_pin_7, comp_pin_8, comp_pin_9, comp_pin_10,
                comp_pin_11, comp_pin_12, comp_pin_13, comp_pin_14, comp_pin_15,
                comp_pin_16, comp_pin_17, comp_pin_18, comp_pin_19, comp_pin_20
            ] AS comp_array
        FROM model.comp
        WHERE run_id = '2024-03-17-stupefied-maya'
    ) AS comp_data
    CROSS JOIN UNNEST(comp_array) AS comp_pin(comp_pin)
)
SELECT
    pin,
    COUNT(comp_pin) AS num_comps,
    COUNT(DISTINCT comp_pin) AS num_distinct_comps
FROM unpivoted_comps
GROUP BY pin
HAVING COUNT(DISTINCT comp_pin) < COUNT(comp_pin);
What we learned:
- Results suggest that 137,094 / 1,098,988, or about 12.5% of PINs, have at least one comp PIN that shows up more than once in their set of 20 comps
- But if we restrict the query to look for dupes only in the top two comps (comp_pin_1 and comp_pin_2), we get just 6,204 results, or about 0.6% of PINs
Therefore, when analyzing comps, we should take care not to assume that a PIN appears only once in the training data or in a comp set. Specifically:
- If a PIN appears only once in a comp set, it is reasonably safe to select its most recent sale.
- If a PIN appears more than once in a comp set, assign the most recent sale to the occurrence with the highest comp_score, the next most recent sale to the occurrence with the next-highest comp_score, and so on.
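The matching rule above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `assign_sales_to_comps`, the `(comp_pin, comp_score)` and `(sale_date, sale_price)` tuple shapes, and the sample PINs and prices are all hypothetical and not taken from model-res-avm. It also assumes that an occurrence with no remaining sale should be left unmatched (`None`).

```python
from collections import defaultdict
from datetime import date


def assign_sales_to_comps(comps, sales_by_pin):
    """Pair each comp occurrence with a sale of the same PIN.

    comps: list of (comp_pin, comp_score) tuples for one target PIN.
    sales_by_pin: dict mapping a PIN to a list of (sale_date, sale_price).
    Returns a list of (comp_pin, comp_score, sale_or_None) in the input order.
    """
    # Record every position at which each comp PIN occurs.
    positions = defaultdict(list)
    for idx, (pin, _score) in enumerate(comps):
        positions[pin].append(idx)

    assigned = [None] * len(comps)
    for pin, idxs in positions.items():
        # Occurrences ranked by comp_score, highest first.
        ranked = sorted(idxs, key=lambda i: comps[i][1], reverse=True)
        # Sales ranked by date, most recent first.
        recent_first = sorted(
            sales_by_pin.get(pin, []), key=lambda s: s[0], reverse=True
        )
        # Highest score gets the most recent sale, and so on down the ranks.
        for rank, idx in enumerate(ranked):
            if rank < len(recent_first):
                assigned[idx] = recent_first[rank]

    return [(pin, score, assigned[i]) for i, (pin, score) in enumerate(comps)]


# Hypothetical data: PIN "A" appears twice in the comp set and has two sales.
comps = [("A", 0.9), ("B", 0.8), ("A", 0.7)]
sales_by_pin = {
    "A": [(date(2021, 5, 1), 300_000), (date(2023, 2, 1), 350_000)],
    "B": [(date(2022, 1, 1), 250_000)],
}
matched = assign_sales_to_comps(comps, sales_by_pin)
```

With this sample data, the higher-scoring occurrence of "A" is paired with the 2023 sale and the lower-scoring one with the 2021 sale, while "B" simply takes its only sale.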
from model-res-avm.