faux_lars's Introduction

Faux-lars

faux_lars is a speedy fake data generation library.

Example

Install the library

python -m venv .venv && source .venv/bin/activate && python -m pip install faux_lars

Let's generate some data

import polars as pl
from faux_lars import generate_lazyframe

rows = 5000
df = (
    generate_lazyframe(
        {
            "clicks": pl.UInt8,
            "Name": "name",
            "City Name": "city_name",
            "Phone": "mobile_number",
        },
        rows,
        "en",
    ).collect()
)
print(df)
┌────────┬──────────────────┬──────────────────────────┬────────────────┐
│ clicks ┆ Name             ┆ City Name                ┆ Phone          │
│ ---    ┆ ---              ┆ ---                      ┆ ---            │
│ u8     ┆ str              ┆ str                      ┆ str            │
╞════════╪══════════════════╪══════════════════════════╪════════════════╡
│ 207    ┆ Erik Conn        ┆ Beier stad               ┆ 512-797-9451   │
│ 66     ┆ Hildegard Murphy ┆ Wintheiser bury          ┆ (907) 011-3125 │
│ 38     ┆ Tremayne Casper  ┆ Okuneva land             ┆ 933.955.5987   │
│ 157    ┆ Kamille Haley    ┆ McGlynn furt             ┆ 329.524.7080   │
│ 123    ┆ Erika Kozey      ┆ East Favian Fisher burgh ┆ 1-846-364-8772 │
│ …      ┆ …                ┆ …                        ┆ …              │
│ 25     ┆ Chadd Rosenbaum  ┆ Nader furt               ┆ 201-837-7966   │
│ 50     ┆ Tevin Jerde      ┆ Davis shire              ┆ 1-125-407-6570 │
│ 22     ┆ Antwan Jones     ┆ West Sammie Hirthe shire ┆ 365-256-8860   │
│ 120    ┆ Eden McCullough  ┆ Goldner shire            ┆ 1-950-582-2326 │
│ 76     ┆ Tevin Batz       ┆ Bauch mouth              ┆ 560-202-3844   │
└────────┴──────────────────┴──────────────────────────┴────────────────┘

Docs

The faux_lars python API is deliberately small and lightweight.

There are two functions:

  • generate_lazyframe
  • generate_dataframe

Each takes the following arguments:

  • schema: A dictionary mapping each column name to a data type. A data type can be one of the string names in the table below, or a non-complex polars DataType object.
  • rows: The number of rows to generate.
  • language: Any of the supported locales; see the table below.
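The example above only exercises generate_lazyframe, so here is a minimal sketch of the eager variant. It assumes generate_dataframe takes the same three positional arguments (as stated above) and returns an already-collected polars DataFrame; the return type is an assumption, not something this README confirms. The column names and the helper name make_people are illustrative only.

```python
# Schema using string data types that appear in this README:
# "name" and "city_name" are used in the example above, "ipv4" is listed in the table below.
PEOPLE_SCHEMA = {
    "Name": "name",
    "City": "city_name",
    "IP": "ipv4",
}


def make_people(rows: int, language: str = "en"):
    """Hypothetical helper: build a small frame of fake people."""
    # Imported lazily so this module still loads where faux_lars is not installed.
    from faux_lars import generate_dataframe

    # Same positional order as the generate_lazyframe example: schema, rows, language.
    # Assumption: the result is eager, so no .collect() call is needed.
    return generate_dataframe(PEOPLE_SCHEMA, rows, language)
```

With faux_lars installed, `print(make_people(10))` should print a ten-row frame with three string columns.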

Supported Locales

Data Type en fr jp ar pt_br zh_tw zh_cn
polars_datatypes ✓* ✓* ✓* ✓* ✓* ✓* ✓*
name
first_name
last_name
building_number ✓* ✓* ✓* ✓* ✓* ✓* ✓*
city_name
country_code ✓* ✓* ✓* ✓* ✓* ✓* ✓*
country_name
latitude
longitude
postcode
secondary_address
secondary_address_type
state_abbr
state_name
street_name
time_zone
zip_code
isbn ✓* ✓* ✓* ✓* ✓* ✓* ✓*
isbn_10 ✓* ✓* ✓* ✓* ✓* ✓* ✓*
isbn_13 ✓* ✓* ✓* ✓* ✓* ✓* ✓*
bs
company_name
industry
profession
credit_card_number ✓* ✓* ✓* ✓* ✓* ✓* ✓*
currency_code ✓* ✓* ✓* ✓* ✓* ✓* ✓*
currency_name
currency_symbol ✓* ✓* ✓* ✓* ✓* ✓* ✓*
dir_path
file_ext
file_name
file_path
mime_type
semver
semver_stable
semver_unstable
licence_plate
health_insurance_code
free_email
bic ✓* ✓* ✓* ✓* ✓* ✓* ✓*
isin ✓* ✓* ✓* ✓* ✓* ✓* ✓*
ip ✓* ✓* ✓* ✓* ✓* ✓* ✓*
ipv4 ✓* ✓* ✓* ✓* ✓* ✓* ✓*
ipv6 ✓* ✓* ✓* ✓* ✓* ✓* ✓*
mac_address ✓* ✓* ✓* ✓* ✓* ✓* ✓*
password
safe_email
user_agent ✓* ✓* ✓* ✓* ✓* ✓* ✓*
user_name ✓* ✓* ✓* ✓* ✓* ✓* ✓*
field
position
seniority
title
lorem ✓* ✓* ✓* ✓* ✓* ✓* ✓*
cell_number ✓* ✓* ✓* ✓* ✓* ✓* ✓*
phone_number ✓* ✓* ✓* ✓* ✓* ✓* ✓*

Key:

  • ✓ : supported for this locale.
  • ✓* : supported, but this value does not vary by locale.

All non-complex polars DataType objects are supported. These can be passed by name as a string, or as a polars DataType object (see the example above).

For non-string types, locale is irrelevant. A complex polars DataType is a struct, array, list, or enum.

Locales:

  • en: English
  • fr: French
  • jp: Japanese
  • ar: Saudi Arabian Arabic
  • pt_br: Brazilian Portuguese
  • zh_tw: Traditional Chinese
  • zh_cn: Simplified Chinese
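Since the locale codes and string data type names are plain strings, a typo is easy to make. The following is a rough pre-flight check, not part of faux_lars: check_request is a hypothetical helper, and its two sets are simply copied from the table and list above. A flagged name is not necessarily invalid: polars data types passed by name will be flagged, and so will mobile_number, which the example above uses but the table does not list.

```python
# Locale codes copied from the list above.
SUPPORTED_LOCALES = {"en", "fr", "jp", "ar", "pt_br", "zh_tw", "zh_cn"}

# String data type names copied from the table above (the table may be incomplete).
KNOWN_STRING_TYPES = set(
    """
    name first_name last_name building_number city_name country_code country_name
    latitude longitude postcode secondary_address secondary_address_type state_abbr
    state_name street_name time_zone zip_code isbn isbn_10 isbn_13 bs company_name
    industry profession credit_card_number currency_code currency_name currency_symbol
    dir_path file_ext file_name file_path mime_type semver semver_stable semver_unstable
    licence_plate health_insurance_code free_email bic isin ip ipv4 ipv6 mac_address
    password safe_email user_agent user_name field position seniority title lorem
    cell_number phone_number
    """.split()
)


def check_request(schema, language):
    """Hypothetical pre-flight check: return a list of human-readable problems."""
    problems = []
    if language not in SUPPORTED_LOCALES:
        problems.append(f"unsupported locale: {language!r}")
    for column, data_type in schema.items():
        # Non-string values are assumed to be polars DataType objects and are not inspected.
        if isinstance(data_type, str) and data_type not in KNOWN_STRING_TYPES:
            problems.append(f"column {column!r}: {data_type!r} is not in the table")
    return problems


print(check_request({"Name": "name", "Phone": "mobile_number"}, "de"))
```

Returning a list rather than raising keeps the check advisory, which suits a table that may lag behind the library.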

Benchmarks

  • On a laptop with 16GB of RAM and 8 cores, generating 5,000 rows with one UInt8 column and three string columns takes under 0.25 seconds.
  • 1,000,000 rows generate in under 7 seconds.
  • Take a look at benchmarks for comparison with two popular python fake data generation libraries: mimesis and faker.
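To check figures like these on your own machine, a small timing harness is enough. This is a sketch, not the project's benchmark code: time_generation is a hypothetical helper, and the list comprehension is only a stand-in workload so the block runs without faux_lars installed.

```python
import time


def time_generation(generate, rows, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of generate(rows)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        generate(rows)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload so the sketch runs anywhere; swap in a faux_lars call to measure it.
elapsed = time_generation(lambda n: [str(i) for i in range(n)], 5_000)
print(f"5,000 rows: {elapsed:.4f} s")
```

To time the library itself, pass something like `lambda n: generate_lazyframe(schema, n, "en").collect()`, with `schema` set to the dictionary from the example at the top. Taking the best of several runs reduces noise from other processes.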

faux_lars's People

Contributors

tomburdge

faux_lars's Issues

Non-string data types evaluated as string

Some non-string data types are evaluated as a Polars String type rather than a more appropriate type (an integer, for example).
This is a personal pet peeve of mine: casting types inappropriately to string.
This can probably be fixed by generating the values with a different method than the boilerplate macro.
