
Data Generator - kickoff (splinter, closed)

leckijakub commented on September 17, 2024

Data Generator - kickoff

from splinter.

Comments (4)

Debskij commented on September 17, 2024

Hi @leckijakub @marekhering!
My initial proposal is to split this into three major modules:

Mark Generator - generates randomized marks. It receives the mark's shape name (str), a tuple with the shape of the expected output (Tuple[int]) and the probability that an answer is "negated" (float). An example call could look like this:
```python
mg = MarkGenerator(config=config)  # TBD on config
cross_marks = mg.generate(mark='cross', shape=(1, 10, 15), negated=0.2)
```
This means we want 10 sets of random marks (each set containing marks for 15 questions) with a negation rate of 20%.
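To make the proposed `generate(mark=..., shape=..., negated=...)` call concrete, here is a minimal, stdlib-only sketch of how it could behave. It is an assumption, not a settled design: `config` is replaced by hypothetical `size` and `seed` arguments, `Mark`, `draw_cross` and `build` are hypothetical helpers, and each mark is a small binary grid rather than an image. `negated` is treated as a per-mark Bernoulli probability, and the nested lists follow the requested shape.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]


@dataclass
class Mark:
    kind: str      # mark shape name, e.g. 'cross'
    negated: bool  # True when this answer is treated as withdrawn
    pixels: Grid   # size x size binary mask (1 = ink)


def draw_cross(size: int, rng: random.Random) -> Grid:
    """Rasterise an 'X' with a +/-1 pixel wobble per row to mimic handwriting."""
    grid = [[0] * size for _ in range(size)]
    for row in range(size):
        for col in (row, size - 1 - row):  # main diagonal and anti-diagonal
            col = min(max(col + rng.choice((-1, 0, 1)), 0), size - 1)
            grid[row][col] = 1
    return grid


def build(shape: Tuple[int, ...], make: Callable[[], Mark]) -> list:
    """Create a nested list with the given shape, filling each leaf via make()."""
    if len(shape) == 1:
        return [make() for _ in range(shape[0])]
    return [build(shape[1:], make) for _ in range(shape[0])]


class MarkGenerator:
    # Hypothetical stand-in for config=config: explicit mark size and seed.
    def __init__(self, size: int = 9, seed: Optional[int] = None):
        self.size = size
        self.rng = random.Random(seed)

    def generate(self, mark: str, shape: Tuple[int, ...], negated: float = 0.0) -> list:
        if mark != 'cross':
            raise ValueError(f"unsupported mark: {mark!r}")
        if not 0.0 <= negated <= 1.0:
            raise ValueError("negated must be a probability in [0, 1]")

        def make() -> Mark:
            is_negated = self.rng.random() < negated  # Bernoulli(negated)
            return Mark(mark, is_negated, draw_cross(self.size, self.rng))

        return build(shape, make)


mg = MarkGenerator(seed=0)
cross_marks = mg.generate(mark='cross', shape=(1, 10, 15), negated=0.2)
```

Seeding the generator makes a dataset reproducible, which helps when comparing training runs.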
Personal Data Generator - receives a list of names (or rules for generating values) and renders them as handwritten text. As with Mark Generator, it receives a config file, after which a method can be called to generate a sequence of values. For example, let's say the personal data consists of the student's ID (index), name and surname:
```python
pdg = PersonalDataGenerator(config=config)  # TBD on config
personal_data = pdg.generate([{'type': int, 'shape': (1, 10, 6)},
                              {'type': str, 'shape': (1, 10, 12), 'min_length': 5, 'capitalize': True},
                              {'type': str, 'shape': (1, 10, 12), 'min_length': 3, 'capitalize': True}])
print([data.shape for data in personal_data])  # [(1, 10, 6), (1, 10, 12), (1, 10, 12)]
```
This means that personal_data holds 10 sets each of indexes (6 handwritten digits each), names (5-12 letter strings with the first letter capitalized) and surnames (3-12 letter strings with the first letter capitalized).
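A minimal sketch of how the spec dictionaries above could be interpreted, assuming the last axis of `shape` is the maximum number of characters and the leading axes give the number of samples. `PersonalDataGenerator(seed=...)` and `build` are hypothetical; the sketch only produces the text values (padded with spaces to a fixed width) and leaves out the step that renders them as handwriting.

```python
import random
import string
from typing import Any, Callable, Dict, List, Optional, Tuple


def build(shape: Tuple[int, ...], make: Callable[[], str]) -> list:
    """Create a nested list with the given shape, filling each leaf via make()."""
    if len(shape) == 1:
        return [make() for _ in range(shape[0])]
    return [build(shape[1:], make) for _ in range(shape[0])]


class PersonalDataGenerator:
    """Hypothetical sketch: produces the values that would later be rendered as
    handwriting (the rendering step itself is not shown)."""

    def __init__(self, seed: Optional[int] = None):
        self.rng = random.Random(seed)

    def _value(self, spec: Dict[str, Any]) -> str:
        width = spec['shape'][-1]  # last axis = number of character slots
        if spec['type'] is int:
            text = ''.join(self.rng.choice(string.digits) for _ in range(width))
        elif spec['type'] is str:
            length = self.rng.randint(spec.get('min_length', 1), width)
            text = ''.join(self.rng.choice(string.ascii_lowercase) for _ in range(length))
            if spec.get('capitalize', False):
                text = text.capitalize()
        else:
            raise TypeError(f"unsupported type: {spec['type']!r}")
        return text.ljust(width)  # pad with spaces so every value fills `width` slots

    def generate(self, specs: List[Dict[str, Any]]) -> List[list]:
        # Leading axes of 'shape' give the nesting; the last axis is the string width.
        return [build(spec['shape'][:-1], lambda spec=spec: self._value(spec))
                for spec in specs]


pdg = PersonalDataGenerator(seed=1)
ids, names, surnames = pdg.generate([
    {'type': int, 'shape': (1, 10, 6)},
    {'type': str, 'shape': (1, 10, 12), 'min_length': 5, 'capitalize': True},
    {'type': str, 'shape': (1, 10, 12), 'min_length': 3, 'capitalize': True},
])
```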
Exams Combiner - receives an exam template with the positions of the answers that can be marked and of the personal data fields. It places the generated marks (as a mask) and the personal data on top of the template, with a random position offset. Using the datasets defined above, preparing a training-ready set could look like this:
```python
ec = ExamsCombiner(config=config)  # TBD on config
exams = ec.extract(templates=[{'path': path, 'positions': positions}])
marked_exams = exams.mask(marks=cross_marks, data=personal_data)
print(marked_exams.shape)  # (10, height_of_exam, width_of_exam, 3)
```
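A simplified sketch of the combining step, assuming binary grids (1 = ink) instead of RGB images and a uniform random offset of up to `max_offset` pixels per axis. `ExamsCombiner(max_offset=..., seed=...)`, `place` and `combine` are hypothetical names, not the `extract`/`mask` API proposed above; rendered personal data fields could be placed the same way.

```python
import random
from typing import List, Optional, Sequence, Tuple

Grid = List[List[int]]


class ExamsCombiner:
    """Hypothetical sketch on binary grids (1 = ink). A real version would work on
    RGB images of shape (height, width, 3)."""

    def __init__(self, max_offset: int = 2, seed: Optional[int] = None):
        self.max_offset = max_offset
        self.rng = random.Random(seed)

    def place(self, page: Grid, mask: Grid, position: Tuple[int, int]) -> Grid:
        """Return a copy of `page` with `mask` OR-ed in at `position` + random offset."""
        out = [row[:] for row in page]
        top = position[0] + self.rng.randint(-self.max_offset, self.max_offset)
        left = position[1] + self.rng.randint(-self.max_offset, self.max_offset)
        for r, mask_row in enumerate(mask):
            for c, value in enumerate(mask_row):
                y, x = top + r, left + c
                # skip blank mask pixels and pixels that fall outside the page
                if value and 0 <= y < len(out) and 0 <= x < len(out[0]):
                    out[y][x] = 1
        return out

    def combine(self, template: Grid, positions: Sequence[Tuple[int, int]],
                masks_per_exam: Sequence[Sequence[Grid]]) -> List[Grid]:
        """Produce one exam per entry of masks_per_exam; masks pair up with positions."""
        exams = []
        for masks in masks_per_exam:
            exam = template
            for position, mask in zip(positions, masks):
                exam = self.place(exam, mask, position)
            exams.append(exam)
        return exams


template = [[0] * 20 for _ in range(10)]
cross = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
ec = ExamsCombiner(max_offset=1, seed=0)
exams = ec.combine(template, positions=[(2, 2), (2, 10)],
                   masks_per_exam=[[cross, cross], [cross, cross]])
```

Copying the page before drawing keeps the template reusable across all generated exams.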


leckijakub commented on September 17, 2024

@Debskij

Mark Generator

I'd extract negation from the shape generation and make it a separate mark (a circle or similar).
If an answer is withdrawn, we expect another one to be given; in that case we should generate two cross marks and one circle mark.
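A small sketch of this withdrawal rule for a single question, assuming single-choice questions and that the circle is drawn on the withdrawn option next to its cross. `sample_question_marks` and `withdraw_prob` are hypothetical names.

```python
import random
from typing import List


def sample_question_marks(n_options: int, withdraw_prob: float,
                          rng: random.Random) -> List[List[str]]:
    """For one question, return the marks drawn on each answer option.

    The final answer gets a cross. With probability `withdraw_prob` another option
    also gets a cross and a circle (the withdrawn answer), so a withdrawal yields
    two crosses and one circle in total.
    """
    marks: List[List[str]] = [[] for _ in range(n_options)]
    final = rng.randrange(n_options)
    marks[final].append('cross')
    if n_options > 1 and rng.random() < withdraw_prob:
        withdrawn = rng.choice([i for i in range(n_options) if i != final])
        marks[withdrawn] += ['cross', 'circle']
    return marks


rng = random.Random(0)
question = sample_question_marks(n_options=4, withdraw_prob=1.0, rng=rng)
```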

Personal Data Generator

agree

Exams Combiner

For consistency, let's call it Exam Generator, with the ability to generate a single exam as well as multiple exams.
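One possible way to support both a single exam and a batch, sketched under the assumption that each exam is derived from its own seed, so generating one exam at a given index reproduces the same exam from a batch. `ExamGenerator`, `generate_one` and the returned dictionary are hypothetical placeholders; a real generator would render the exam image instead of returning only the ground-truth answers.

```python
import random
from typing import Dict, List


class ExamGenerator:
    """Hypothetical sketch: each exam is derived from its own seed, so generating a
    single exam reproduces the same exam at that index in a batch."""

    def __init__(self, n_questions: int = 15, n_options: int = 4, seed: int = 0):
        self.n_questions = n_questions
        self.n_options = n_options
        self.seed = seed

    def generate_one(self, index: int = 0) -> Dict[str, object]:
        rng = random.Random(self.seed * 1_000_003 + index)  # per-exam seed
        # Placeholder for the real rendering step: only the ground-truth answers.
        answers = [rng.randrange(self.n_options) for _ in range(self.n_questions)]
        return {'index': index, 'answers': answers}

    def generate(self, n: int = 1) -> List[Dict[str, object]]:
        return [self.generate_one(i) for i in range(n)]


eg = ExamGenerator(seed=42)
batch = eg.generate(5)
```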

I like the overall architecture of the modules 👍


Debskij commented on September 17, 2024

@leckijakub

> I'd extract negation from the shape generation and make it a separate mark (a circle or similar).
> If an answer is withdrawn, we expect another one to be given; in that case we should generate two cross marks and one circle mark.

That's a good point. How about doing it this way?
```python
mg = MarkGenerator(config=config)  # TBD on config
cross_marks = mg.generate([{'mark': 'cross', 'shape': (1, 10, 15)}, {'mark': 'negated_cross', 'shape': (1, 10, 3)}])
```
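A minimal sketch of this list-of-specs variant, assuming a registry that maps mark names to drawing functions and assuming a `negated_cross` is a cross with a circle drawn around it. `DRAWERS`, `draw_circle`, `draw_negated_cross`, `build` and the `size`/`seed` arguments are hypothetical; marks are binary grids, not images.

```python
import math
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]


def draw_cross(size: int, rng: random.Random) -> Grid:
    """'X' shape with a +/-1 pixel wobble per row."""
    grid = [[0] * size for _ in range(size)]
    for row in range(size):
        for col in (row, size - 1 - row):
            grid[row][min(max(col + rng.choice((-1, 0, 1)), 0), size - 1)] = 1
    return grid


def draw_circle(size: int, rng: random.Random) -> Grid:
    """Ring of pixels within 0.5 of the radius (rng unused; kept for a uniform signature)."""
    c = (size - 1) / 2
    radius = c - 0.5
    return [[int(abs(math.hypot(y - c, x - c) - radius) <= 0.5) for x in range(size)]
            for y in range(size)]


def draw_negated_cross(size: int, rng: random.Random) -> Grid:
    """Assumption: a withdrawn answer is a cross with a circle drawn around it."""
    cross, circle = draw_cross(size, rng), draw_circle(size, rng)
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(cross, circle)]


DRAWERS: Dict[str, Callable[[int, random.Random], Grid]] = {
    'cross': draw_cross,
    'circle': draw_circle,
    'negated_cross': draw_negated_cross,
}


def build(shape: Tuple[int, ...], make: Callable[[], Grid]) -> list:
    """Create a nested list with the given shape, filling each leaf via make()."""
    if len(shape) == 1:
        return [make() for _ in range(shape[0])]
    return [build(shape[1:], make) for _ in range(shape[0])]


class MarkGenerator:
    def __init__(self, size: int = 9, seed: Optional[int] = None):
        self.size = size  # hypothetical stand-in for config=config
        self.rng = random.Random(seed)

    def generate(self, specs: List[Dict[str, Any]]) -> List[list]:
        """Return one nested list of masks per spec, in the order given."""
        results = []
        for spec in specs:
            draw = DRAWERS.get(spec['mark'])
            if draw is None:
                raise ValueError(f"unsupported mark: {spec['mark']!r}")
            results.append(build(spec['shape'], lambda draw=draw: draw(self.size, self.rng)))
        return results


mg = MarkGenerator(seed=0)
cross_marks, negated_marks = mg.generate([{'mark': 'cross', 'shape': (1, 10, 15)},
                                          {'mark': 'negated_cross', 'shape': (1, 10, 3)}])
```

With a registry, adding another mark type only requires registering one more drawing function.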

> For consistency, let's call it Exam Generator, with the ability to generate a single exam as well as multiple exams.

Agree


leckijakub commented on September 17, 2024

@Debskij

> That's a good point. How about doing it this way?
> ```python
> mg = MarkGenerator(config=config)  # TBD on config
> cross_marks = mg.generate([{'mark': 'cross', 'shape': (1, 10, 15)}, {'mark': 'negated_cross', 'shape': (1, 10, 3)}])
> ```

Yes, something like this 👍
