index_selection_evaluation's People

Contributors

bensk1, klauck, marcelja

index_selection_evaluation's Issues

Change indexable columns and possible index methods

We currently consider ALL columns in a query. However, it should be sufficient to consider only the columns that are part of the WHERE clause, right? This would reduce the number of candidate indexes to evaluate.
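A minimal sketch of the proposed filter, assuming a plain regex scan is good enough for a first experiment. The function name where_clause_columns is hypothetical, and it only looks at the text after the first WHERE keyword, so it does not handle subqueries, CTEs, or alias-qualified columns.

```python
import re

# Keywords that end a WHERE clause in simple, non-nested queries (assumption).
_WHERE_END = re.compile(r"\b(group\s+by|order\s+by|having|limit)\b", re.IGNORECASE)


def where_clause_columns(query_text, columns):
    """Return the subset of `columns` mentioned in the query's WHERE clause.

    Hypothetical helper: naive text matching, no SQL parsing.
    """
    match = re.search(r"\bwhere\b", query_text, re.IGNORECASE)
    if match is None:
        return set()
    where_text = query_text[match.end():]
    # Cut the clause off at the first GROUP BY / ORDER BY / HAVING / LIMIT.
    end = _WHERE_END.search(where_text)
    if end is not None:
        where_text = where_text[: end.start()]
    # Keep a column only if its name appears as a whole word in the clause.
    return {
        column
        for column in columns
        if re.search(rf"\b{re.escape(column)}\b", where_text, re.IGNORECASE)
    }


query = (
    "SELECT l_returnflag, SUM(l_quantity) FROM lineitem "
    "WHERE l_shipdate <= date '1998-09-02' GROUP BY l_returnflag"
)
print(where_clause_columns(query, ["l_returnflag", "l_quantity", "l_shipdate"]))
# → {'l_shipdate'}
```

One trade-off to keep in mind: columns that appear only in GROUP BY or ORDER BY are dropped by this filter, although an index on them can sometimes avoid a sort.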

Refactor architecture of CostEvaluation

I am not fully convinced by the architecture/indirection in which CostEvaluation uses WhatIfIndexCreation, which in turn uses DBConnector.
But this is probably not a high priority.
Besides, I see a risk that CostEvaluation and WhatIfIndexCreation become inconsistent when reset() is called.
Calling all_simulated_indexes() should not be that expensive.
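One way to rule out the inconsistency is to keep no copy of the simulated indexes in CostEvaluation and always ask WhatIfIndexCreation instead. The sketch below illustrates that idea under stated assumptions: FakeConnector stands in for DBConnector, and every method name other than all_simulated_indexes() and reset() is hypothetical, not the repository's API.

```python
class FakeConnector:
    """Hypothetical stand-in for DBConnector that tracks simulated indexes in memory."""

    def __init__(self):
        self._simulated = set()

    def simulate_index(self, index):
        self._simulated.add(index)

    def drop_simulated_indexes(self):
        self._simulated.clear()

    def simulated_indexes(self):
        return set(self._simulated)


class WhatIfIndexCreation:
    def __init__(self, connector):
        self.connector = connector

    def simulate_index(self, index):
        self.connector.simulate_index(index)

    def all_simulated_indexes(self):
        # Query the database instead of maintaining a second cache.
        return self.connector.simulated_indexes()

    def drop_all_simulated_indexes(self):
        self.connector.drop_simulated_indexes()


class CostEvaluation:
    def __init__(self, what_if):
        self.what_if = what_if

    def current_indexes(self):
        # No local state: derived on demand, so reset() cannot leave
        # CostEvaluation and WhatIfIndexCreation out of sync.
        return self.what_if.all_simulated_indexes()

    def reset(self):
        self.what_if.drop_all_simulated_indexes()


what_if = WhatIfIndexCreation(FakeConnector())
cost_evaluation = CostEvaluation(what_if)
what_if.simulate_index("lineitem(l_shipdate)")
cost_evaluation.reset()
print(cost_evaluation.current_indexes())  # → set()
```

The cost of this design is one extra database round trip per lookup, which is the trade-off the issue suggests should be acceptable.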

Where is CoPhy code

I didn't find the implementation of CoPhy in the code. Could you please add the CoPhy code?

Cleanup csv_to_tikz.py

Remove all magic numbers and unused functions.

Add information to the README about how to generate diagrams

python3 csv_to_tikz.py tpcds.csv tpcds_cost.tex cost
pdflatex tpcds_cost.tex

Do you have a code implementation for INUM?

Hello, thank you for sharing the code!
INUM refers to this paper: Efficient Use of the Query Optimizer for Automated Physical Design.
CoPhy divides costs into internal subplan costs and access costs, based on INUM, and models them using integer programming. Although the code in cophy_input_generation.py indicates that this approach is not necessary, I am still interested in the implementation details of INUM. If INUM code exists, I would greatly appreciate it.
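To make the cost decomposition concrete: in the INUM model, a query's cost under an index configuration X is taken over a set of cached template plans k as cost(q, X) = min_k ( beta_k + sum over tables t of min_{a usable in X} gamma_{k,t,a} ), where beta_k is the internal (subplan) cost and gamma_{k,t,a} is the cost of accessing table t with access method a. The sketch below evaluates that formula; the data layout, the "scan" key for a full table scan, and all numbers are hypothetical, and it is not the INUM authors' implementation.

```python
import math

FULL_SCAN = "scan"  # access method that needs no index (assumption)

# Hypothetical cached template plans for one query.
plans = [
    {
        "internal_cost": 100.0,  # e.g., a nested-loop plan that benefits from indexes
        "access_costs": {
            "orders": {FULL_SCAN: 500.0, "orders(o_orderdate)": 50.0},
            "lineitem": {FULL_SCAN: 2000.0, "lineitem(l_orderkey)": 300.0},
        },
    },
    {
        "internal_cost": 400.0,  # e.g., a hash-join plan that only scans
        "access_costs": {
            "orders": {FULL_SCAN: 500.0},
            "lineitem": {FULL_SCAN: 2000.0},
        },
    },
]


def query_cost(template_plans, configuration):
    """Cheapest plan cost: internal cost plus the best usable access per table."""
    best = math.inf
    for plan in template_plans:
        total = plan["internal_cost"]
        for access_costs in plan["access_costs"].values():
            usable = [
                cost
                for method, cost in access_costs.items()
                if method == FULL_SCAN or method in configuration
            ]
            if not usable:  # plan cannot be executed with this configuration
                total = math.inf
                break
            total += min(usable)
        best = min(best, total)
    return best


print(query_cost(plans, set()))                    # → 2600.0
print(query_cost(plans, {"orders(o_orderdate)"}))  # → 2150.0
```

Because beta and gamma are cached once, evaluating a new configuration needs no optimizer call; CoPhy's integer program builds on exactly this separability.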

Remove second statistics creation (vacuum analyze)

▶ python3 -m selection              
INFO:root:Starting Index Selection Evaluation
INFO:root:Using config file example_configs/config.json
DEBUG:root:Database connector created: None
DEBUG:root:Postgres connector created: None
DEBUG:root:Database with given scale factor already existing
DEBUG:root:Database connector created: indexselection_tpch___0_1
DEBUG:root:Postgres connector created: indexselection_tpch___0_1
INFO:root:Generating TPC-H Queries
DEBUG:root:No need to run make
INFO:root:Queries generated
INFO:root:Dropping indexes
INFO:root:Postgres: Run `vacuum analyze`
INFO:root:Dropping indexes
INFO:root:Postgres: Run `vacuum analyze`
DEBUG:root:Init selection algorithm
INFO:root:Dropping indexes
DEBUG:root:Init cost evaluation
INFO:root:Cost estimation with whatif
DEBUG:root:Init WhatIfIndexCreation

Do we have to create statistics before each algorithm?
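A minimal sketch of running `vacuum analyze` only when statistics may be stale, assuming that only data changes (loading or modifying rows) invalidate them, while dropping plain indexes does not. StatisticsTracker, data_changed(), and ensure_statistics() are hypothetical names; `execute` is any callable that runs one SQL statement.

```python
class StatisticsTracker:
    """Hypothetical wrapper that skips redundant `vacuum analyze` runs."""

    def __init__(self, execute):
        self._execute = execute  # callable that runs one SQL statement
        self._statistics_current = False

    def data_changed(self):
        # Call after loading data or other changes that invalidate statistics.
        self._statistics_current = False

    def ensure_statistics(self):
        # Run `vacuum analyze` only if nothing has refreshed the statistics yet.
        if not self._statistics_current:
            self._execute("vacuum analyze")
            self._statistics_current = True


statements = []
tracker = StatisticsTracker(statements.append)
tracker.ensure_statistics()
tracker.ensure_statistics()  # skipped: nothing changed since the last run
print(statements)  # → ['vacuum analyze']
```

With such a guard, each selection algorithm can call ensure_statistics() before it starts, and the second run in the log above would be skipped.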

Complete Microsoft SQL Server Support

Currently, complete support for Microsoft SQL Server is missing, even though it seems to be almost done. The main issue is that hypothetical indexes are not properly picked up for TPC-H queries.

Another problem is the missing functionality to predict index sizes, which HypoPG provides for PostgreSQL.

Improved query testing

A test should verify that the benchmark queries are correct. This could perhaps be achieved by checking whether results are returned or by comparing against a fixed validation result set.
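Both checks from the issue can be combined in one helper: compare against a stored result when one exists, otherwise require at least one row. validate_queries is a hypothetical name, and the demo uses an in-memory SQLite database purely as a stand-in for the real PostgreSQL connection.

```python
import sqlite3


def validate_queries(execute, queries, expected_results=None):
    """Return the ids of queries that fail validation (hypothetical helper).

    `execute` runs SQL and returns a list of rows. Queries with a stored
    result must match it exactly; all others must return at least one row.
    """
    expected_results = expected_results or {}
    failures = []
    for query_id, text in queries.items():
        rows = execute(text)
        if query_id in expected_results:
            passed = rows == expected_results[query_id]
        else:
            passed = len(rows) > 0
        if not passed:
            failures.append(query_id)
    return failures


# SQLite stand-in database with a tiny table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE nation (n_nationkey INTEGER, n_name TEXT)")
connection.executemany(
    "INSERT INTO nation VALUES (?, ?)", [(0, "ALGERIA"), (1, "ARGENTINA")]
)


def execute(sql):
    return connection.execute(sql).fetchall()


queries = {
    1: "SELECT n_name FROM nation ORDER BY n_nationkey",
    2: "SELECT n_name FROM nation WHERE n_nationkey > 10",  # returns no rows
}
print(validate_queries(execute, queries, {1: [("ALGERIA",), ("ARGENTINA",)]}))
# → [2]
```

Exact comparison only works if the queries have a deterministic row order (ORDER BY) and floating-point aggregates are rounded consistently; otherwise a tolerance-based comparison would be needed.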

Benchmark interface

The Benchmark class has many parameters. Their number could be reduced, e.g., by grouping the values that are only stored in the CSV into a dictionary.
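A minimal sketch of the suggested refactoring, assuming a dataclass is acceptable. The fields workload, config, and csv_metadata and the csv_row() method are hypothetical and do not reproduce the repository's actual Benchmark signature: parameters the benchmark logic needs stay explicit, and everything that only ends up in the CSV travels in one dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Benchmark:
    """Hypothetical, trimmed-down Benchmark signature."""

    workload: list
    config: dict
    # Values that are only written to the CSV output.
    csv_metadata: dict = field(default_factory=dict)

    def csv_row(self, results):
        # Metadata columns first, then the measured results.
        return {**self.csv_metadata, **results}


benchmark = Benchmark(
    workload=["q1", "q6"],
    config={"budget_MB": 500},
    csv_metadata={"benchmark_name": "tpch", "scale_factor": 10},
)
print(benchmark.csv_row({"overall_cost": 1234.5}))
# → {'benchmark_name': 'tpch', 'scale_factor': 10, 'overall_cost': 1234.5}
```

Adding a new CSV column then only touches the caller that fills csv_metadata, not the constructor signature.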

Refactor Postgres' import_data

It should call COPY on the server, similar to this:

self.exec_only(f"COPY {table} FROM '/tmp/{path}' WITH (FORMAT csv, DELIMITER '{delimiter}')")
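The f-string above interpolates the table name, path, and delimiter directly into SQL. A small, hedged sketch of building the same statement with PostgreSQL quoting rules (double quotes around identifiers, single quotes around literals, embedded quotes doubled) follows; copy_statement is a hypothetical helper, and its result would be passed to exec_only as in the line above.

```python
def copy_statement(table, path, delimiter="|"):
    """Build a server-side COPY statement with quoted identifier and literals.

    Note: a double-quoted identifier is case-sensitive, so pass the table
    name exactly as stored (lowercase for unquoted PostgreSQL names).
    """
    quoted_table = '"' + table.replace('"', '""') + '"'
    quoted_path = "'" + path.replace("'", "''") + "'"
    quoted_delimiter = "'" + delimiter.replace("'", "''") + "'"
    return (
        f"COPY {quoted_table} FROM {quoted_path} "
        f"WITH (FORMAT csv, DELIMITER {quoted_delimiter})"
    )


print(copy_statement("lineitem", "/tmp/lineitem.tbl"))
# → COPY "lineitem" FROM '/tmp/lineitem.tbl' WITH (FORMAT csv, DELIMITER '|')
```

Two practical points: server-side COPY FROM a file reads the file as the database server process, so it needs superuser rights or membership in pg_read_server_files, and TPC-H dbgen output ends every line with the delimiter, which COPY reports as an extra column unless that trailing delimiter is stripped first.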

Question about b_to_mb and mb_to_b

Hello, I found that b_to_mb and mb_to_b in selection/utils.py multiply or divide by 1000. But I remember that people often use 1 MB = 2^20 B instead of 10^6 B, which is the decimal convention used for units such as seconds and milliseconds. Will this matter?
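To quantify the difference, the sketch below compares the decimal megabyte (10^6 B) with the binary mebibyte (2^20 B). The function names are hypothetical and do not reproduce the repository's utils.py.

```python
DECIMAL_MB = 1000 * 1000  # 10^6 bytes (SI megabyte)
BINARY_MIB = 1024 * 1024  # 2^20 bytes (mebibyte)


def b_to_mb_decimal(b):
    # Decimal convention: divide by 10^6.
    return b / DECIMAL_MB


def b_to_mib(b):
    # Binary convention: divide by 2^20.
    return b / BINARY_MIB


size = 500 * BINARY_MIB       # e.g., an index of exactly 500 MiB
print(b_to_mb_decimal(size))  # → 524.288
print(b_to_mib(size))         # → 500.0
```

The two conventions differ by a constant factor of 2^20 / 10^6 ≈ 1.049, i.e., about 4.9%. As long as index sizes and storage budgets are converted with the same function, comparisons between algorithms are unaffected; the choice only matters when the numbers are compared with sizes reported elsewhere in binary units.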
