A framework to help me, as a QA Engineer, check the data quality of a dataset.
To practice this, I used a data-driven approach: the test class runs against a CSV file containing different data, applying a structure of five data checks (type, constraint, structured, consistency and code).
A Python unittest-based test suite for validating data quality in CSV datasets.
This test suite is designed to give fast feedback on data quality in CSV datasets. It performs five kinds of checks: data type validation, constraint validation, structured validation, consistency validation and code validation.
To run the tests, you need Python 3.x and the following dependencies:
pip install pandas parameterized
Clone the repository, then run the tests with unittest:
python -m unittest -v test_data_validation.py
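To illustrate the data-driven approach, here is a minimal sketch of how a test class can turn each CSV row into its own checked case. It uses only the standard library (`csv` and `unittest.subTest`) so it runs anywhere; the actual suite loads the file with pandas and generates cases with `@parameterized.expand`, and the sample data and column names below are hypothetical.

```python
import csv
import io
import unittest

# Hypothetical sample standing in for the real CSV file;
# the actual suite reads the data with pandas.
SAMPLE_CSV = """name,age,active
Alice,34,Yes
Bob,29,No
"""

class TestDataValidation(unittest.TestCase):
    """Data-driven checks: every CSV row is validated as a subtest."""

    def setUp(self):
        reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
        self.rows = list(reader)

    def test_age_is_valid_int(self):
        for row in self.rows:
            with self.subTest(row=row):
                age = int(row["age"])            # type check
                self.assertTrue(0 < age < 120)   # constraint check

    def test_active_flag_code(self):
        for row in self.rows:
            with self.subTest(row=row):
                # code check: only allowed classifications pass
                self.assertIn(row["active"], ("Yes", "No"))

if __name__ == "__main__":
    unittest.main()
```

Because each row runs inside `subTest`, a failing row is reported individually instead of aborting the whole test method.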
- Type validation: checks that the data in each field, column, list, range or file corresponds to the correct type and format, e.g. int, str, bool.
- Constraint validation: checks that the data meets valid value ranges or expectations, e.g. the age field stays within a valid limit, is not null and not empty.
- Structured validation: checks compliance with a data format, structure or schema, e.g. primary key, foreign key, table name, column name.
- Consistency validation: checks data style, e.g. date masks, currency formats.
- Code validation: checks that classifications adhere to an allowed set of codes, e.g. 'Yes'/'No', '0'/'1'.
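The five check categories above can be sketched as small validator functions. This is a minimal illustration, not the suite's actual API: the function names, the age range and the date mask are assumptions chosen for the example.

```python
from datetime import datetime

# Hypothetical helpers illustrating the five check categories;
# names and concrete rules are assumptions, not the suite's API.

def check_type(value, expected_type):
    """Type check: the value converts cleanly to the expected type."""
    try:
        expected_type(value)
        return True
    except (TypeError, ValueError):
        return False

def check_constraint(age):
    """Constraint check: age is not null/empty and within a valid range."""
    return age not in (None, "") and 0 < int(age) < 120

def check_structured(row, schema):
    """Structured check: the row has exactly the expected column names."""
    return set(row.keys()) == set(schema)

def check_consistency(date_str):
    """Consistency check: the date matches the YYYY-MM-DD mask."""
    try:
        datetime.strptime(date_str, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def check_code(flag):
    """Code check: the classification is one of the allowed codes."""
    return flag in ("Yes", "No", "0", "1")
```

In the real suite each of these rules would be asserted inside a unittest test method, with the CSV rows supplying the values.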
- Python unittest Library Documentation: Official documentation for the Python unittest library.
- Unit Testing in Python: A comprehensive guide to unit testing in Python.
- Data Validation - TechTarget: Information about data validation on TechTarget.
This project is licensed under the MIT License - see the LICENSE file for details.