hyrise / index_selection_evaluation
Platform to evaluate index selection algorithms
License: MIT License
We currently consider ALL columns in a query. However, it should be sufficient to consider only columns that are part of the WHERE clause, right? That way, we can reduce the number of candidate indexes that have to be evaluated.
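A rough sketch of the idea, assuming each query already exposes its columns grouped by the clause they appear in. The `QueryColumns` container, its field names, and `candidate_columns` are hypothetical and not taken from the repository:

```python
from dataclasses import dataclass, field


@dataclass
class QueryColumns:
    """Hypothetical container: the columns of one query, split by clause."""
    where: set = field(default_factory=set)
    other: set = field(default_factory=set)  # e.g. SELECT list, ORDER BY


def candidate_columns(queries, where_only=True):
    """Collect the columns for which single-column index candidates are built."""
    columns = set()
    for query in queries:
        columns |= query.where
        if not where_only:
            columns |= query.other
    return columns


queries = [
    QueryColumns(where={"l_shipdate"}, other={"l_quantity", "l_extendedprice"}),
    QueryColumns(where={"o_orderdate", "o_custkey"}, other={"o_totalprice"}),
]
all_candidates = candidate_columns(queries, where_only=False)
where_candidates = candidate_columns(queries, where_only=True)
print(len(all_candidates), len(where_candidates))
```

Whether columns used only in ORDER BY or GROUP BY should still be kept as candidates is left open in this sketch.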
I am not fully convinced by the architecture/indirection in which CostEvaluation uses WhatIfIndexCreation, which in turn uses DBConnector.
But this is probably not a high priority.
Besides, I see the danger that CostEvaluation and WhatIfIndexCreation become inconsistent when reset() is called.
Calling all_simulated_indexes() should not be that expensive.
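One way to avoid the inconsistency would be for the cost evaluation to keep no index state of its own and to ask the what-if layer every time. The classes below are simplified in-memory stand-ins written for illustration. Only the names `CostEvaluation`, `WhatIfIndexCreation`, `all_simulated_indexes()` and `reset()` come from this issue; every other method and attribute is an assumption:

```python
class WhatIfIndexCreation:
    """Stand-in: tracks simulated indexes in memory instead of a database."""

    def __init__(self):
        self._simulated = {}  # hypothetical: oid -> tuple of column names
        self._next_oid = 1

    def simulate_index(self, columns):
        oid = self._next_oid
        self._next_oid += 1
        self._simulated[oid] = tuple(columns)
        return oid

    def all_simulated_indexes(self):
        return dict(self._simulated)

    def reset(self):
        self._simulated.clear()


class CostEvaluation:
    """Stand-in: holds no copy of the index set, so it cannot go stale."""

    def __init__(self, what_if):
        self.what_if = what_if

    def current_indexes(self):
        # Always derived from the single source of truth.
        return set(self.what_if.all_simulated_indexes().values())


what_if = WhatIfIndexCreation()
evaluation = CostEvaluation(what_if)
what_if.simulate_index(["l_shipdate"])
before_reset = evaluation.current_indexes()
what_if.reset()
after_reset = evaluation.current_indexes()
```

The trade-off is one extra lookup per evaluation, which matches the expectation above that the call is cheap.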
I didn't find the implementation of CoPhy in the code. Could you please add the CoPhy code?
Remove all magic numbers, remove unused functions
Add information to readme about how to generate diagrams
python3 csv_to_tikz.py tpcds.csv tpcds_cost.tex cost
pdflatex tpcds_cost.tex
Currently does not check whether multi-attribute extensions are within budget if the corresponding single-column index is not.
Add DOI, DBLP verbal reference or link to paper.
Hello, thank you for sharing the code!
INUM refers to this paper: Efficient Use of the Query Optimizer for Automated Physical Design.
CoPhy divides costs into internal sub plan costs and access costs based on INUM, and models them using integer programming. Although the cophy_input_generation.py code indicates that this approach is not necessary, I am still interested in the implementation details of INUM. If there is INUM code, I would greatly appreciate it.
▶ python3 -m selection
INFO:root:Starting Index Selection Evaluation
INFO:root:Using config file example_configs/config.json
DEBUG:root:Database connector created: None
DEBUG:root:Postgres connector created: None
DEBUG:root:Database with given scale factor already existing
DEBUG:root:Database connector created: indexselection_tpch___0_1
DEBUG:root:Postgres connector created: indexselection_tpch___0_1
INFO:root:Generating TPC-H Queries
DEBUG:root:No need to run make
INFO:root:Queries generated
INFO:root:Dropping indexes
INFO:root:Postgres: Run `vacuum analyze`
INFO:root:Dropping indexes
INFO:root:Postgres: Run `vacuum analyze`
DEBUG:root:Init selection algorithm
INFO:root:Dropping indexes
DEBUG:root:Init cost evaluation
INFO:root:Cost estimation with whatif
DEBUG:root:Init WhatIfIndexCreation
Do we have to create statistics before each algorithm?
Currently, complete support for Microsoft server is missing even though it seems to be almost done. There is mainly an issue with hypothetical indexes not properly picked up for TPC-H queries.
Another problem is the missing functionality to predict index sizes as it is provided by HypoPG.
A test should verify that the benchmark queries are correct. This could maybe be achieved by checking whether results are returned or by comparing to a fixed validation result set.
The Benchmark class has a lot of parameters. This could be made a bit smaller, e.g. by creating a dictionary for values that are only stored in the CSV.
It should call COPY on the server, similar to this:
self.exec_only(f"COPY {table} FROM '/tmp/{path}' WITH (FORMAT csv, DELIMITER '{delimiter}')")
At least the DropHeuristic has to be adapted.
See: #18 (comment)
See also comment in _generate_job() in selection/query_generator.py.
E.g. comparing different index objects with same columns
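A minimal sketch of what such a comparison could look like, assuming equality is meant to depend only on the ordered column list. The `Index` class here is a simplified stand-in, not the repository's implementation:

```python
class Index:
    """Simplified stand-in: an index identified by its ordered columns."""

    def __init__(self, columns):
        self.columns = tuple(columns)

    def __eq__(self, other):
        if not isinstance(other, Index):
            return NotImplemented
        return self.columns == other.columns

    def __hash__(self):
        # Must agree with __eq__ so that equal indexes collapse in sets/dicts.
        return hash(self.columns)


a = Index(["l_shipdate", "l_quantity"])
b = Index(["l_shipdate", "l_quantity"])  # distinct object, same columns
c = Index(["l_quantity", "l_shipdate"])  # same columns, different order

assert a == b and a is not b
assert a != c
assert len({a, b, c}) == 2
```

Column order is treated as significant here, since a multi-column index on (x, y) serves different predicates than one on (y, x).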
Hello, I found that b_to_mb and mb_to_b in selection/utils.py multiply or divide by 1000. But I remember people often use 1 MB = 2^20 B instead of 10^6 B (a decimal factor, as in the conversion between seconds and milliseconds). Will this matter?
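To put a number on the difference, here is a small sketch comparing the two conventions. The helper names are made up for this example and are not the functions in selection/utils.py:

```python
MB_DECIMAL = 10**6  # SI megabyte
MIB_BINARY = 2**20  # binary mebibyte (MiB)


def bytes_to_mb_decimal(n_bytes):
    return n_bytes / MB_DECIMAL


def bytes_to_mib(n_bytes):
    return n_bytes / MIB_BINARY


# Relative gap between the two units.
relative_gap = (MIB_BINARY - MB_DECIMAL) / MB_DECIMAL

size = 500 * MIB_BINARY  # an index of 500 MiB
print(bytes_to_mib(size), bytes_to_mb_decimal(size), relative_gap)
```

The two units differ by a bit under 5%, so the choice mainly matters when a budget given in one convention is compared against sizes reported in the other. As long as budgets and index sizes go through the same conversion, relative comparisons are unaffected.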