draynut's Issues

Annotated dataset

We require an annotated dataset

We require an annotated dataset to perform the machine learning tasks. An annotated dataset is a database, spreadsheet, or file of any kind that formally connects each PDF with a full data model example (as provided in previous correspondence) in which all fields are complete and visible. There is no need for marks or references on the PDF itself; what we need is a fully populated example in JSON or CSV that follows the data model and includes a link to the PDF file it corresponds to. Please assign the right resources to this activity, as discussed before.
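
As an illustration of the kind of file described above, here is a minimal sketch of a CSV manifest that links each PDF to a populated record. The column names (`pdf_path`, `record_json`), the file path, and the sample values are assumptions made for this example only; the real layout should follow the agreed data model.

```python
import csv
import io
import json

# Hypothetical manifest layout: one row per PDF, with the populated
# data-model record stored as a JSON string in the second column.
FIELDS = ["pdf_path", "record_json"]


def write_manifest(rows):
    """Serialize (pdf_path, record) pairs into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for pdf_path, record in rows:
        writer.writerow(
            {"pdf_path": pdf_path, "record_json": json.dumps(record, sort_keys=True)}
        )
    return buf.getvalue()


def read_manifest(text):
    """Parse CSV text back into (pdf_path, record) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["pdf_path"], json.loads(row["record_json"])) for row in reader]


# Illustrative record only: a real record would carry every field of the
# data model, using null for values that are absent from the PDF.
example = [
    ("orders/example-001.pdf", {"Container": "UCLA4323445", "Size": 20, "VesselName": None})
]
manifest = write_manifest(example)
```

Keeping the record as JSON inside a CSV column is only one option; a folder of JSON files named after their PDFs would serve the same purpose.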

This issue was marked as urgent. Completion of the following milestones depends on it.

Dictionary: Defining all fields for the data model

Issue
The provided data model has several ambiguous fields and concepts we need to define ASAP.
It is very important to understand that even if these fields are not always, or not necessarily, extracted from the PDF, we still need to know exactly what each of them means within the ingestion data, with no exceptions. The reason is that the named entity recognition system detects its target entities by discarding those that do not match the criteria, so all possible entities in a PDF have to be accounted for in a deterministic way. The current mechanics are expected to retrieve data from Tideworks; however, for all the reasons described here, formal definitions of all these entities are critical.

Please create a Google Sheet to specify each definition and link it to this ticket when responding. This ticket was flagged as urgent; please attend to it ASAP, as development milestones depend on your delivery times.

Data model
Below is an annotated version of the JSON data model; please respond inline for each case. Please note that we have separated certain fields with blank lines according to the group of things we believe they define. WARNING: Please make no assumptions when responding to these inquiries. Should this data model come from a solution currently in beta, please consider that the very company defining these fields might have different definitions for certain values when they happen to be ambiguous. It is critical that we know for sure, making no assumptions based on previous experience, that the definitions below match exactly the usage Tideworks/eModal expects to make of them.
**Please provide algorithmically complete and unambiguous definitions.**

Begin Definitions Required
`[SmartTruckingJobExtended] [int]

##job specs
IDENTITY(1,1) NOT NULL,
[JobId] [int] NOT NULL, (??) # Who defines this ID? We're assuming it is an integer created by Tideworks. Is this something to be retrieved from Tideworks, assuming the job was created and exists at the port? What happens with orders that are received by the system but are not active on Tideworks, so that no further information is available? How is a data queue expected to be created, and where should it be loaded?
[Booking] nvarchar NULL, # Same as above: what is this entity? We assume it defines a relationship between cargo and a delivery engagement, but it is not defined anywhere.

##container specs
[Container] nvarchar NULL, # It is usually a string of ~4 letters and ~7 digits (no fewer than 6). We're going to work assuming a minimum string length; please clarify whether exceptions can happen, and also what to do with a case like UCLA432344/5, which we will translate to UCLA4323445.
[Size] [int] NULL, # This is an integer giving the length in feet. Please specify what to do when an order has a multiplier in front of the length, suggesting there is more than one container (e.g. what to do with 2 X 20').
[ContainerType] [int] NULL, # We need a complete list, no exceptions made, of the container types expected to be ingested by this system.
##timeframes
[EstimateAvailebleOn] [date] NULL, # Is this the ETA from the PDF, or is it something we retrieve from Tideworks? At this time we're assuming it is the ETA on the PDF.
[LastFreeDay] [date] NULL, # We've seen just one or zero entries of this value, please provide some examples.
[DaysToReturn] [int] NULL, # We've seen just one or zero entries of this value, please provide some examples.
[LastDayToReturn] [date] NULL, #We've seen just one or zero entries of this value, please provide some examples.

##flags
[Status] [int] NULL, # Is this on Tideworks?

##location
[Terminal] nvarchar NULL, # Is this retrieved from Tideworks exclusively using the PORT value? We have no PORT entity in the data model! One of the provided examples marks this entity next to the "Miami" string and then clarifies "This is the port, not the terminal", so which is it? We speculate that you expect us to retrieve the terminal using the port name (again, the port name is not an entity in the data model, and we don't retrieve things that are not in the data model!). We also speculate that the codes identifying a terminal (on Tideworks, perhaps) contain some string indicating "Miami", e.g. MIA0304 for Miami Terminal 03, dock 04. Please define.
[FeesDues] [decimal](18, 2) NULL, # We speculate this is some amount to be paid to free the container. Please define, and tell us how to retrieve this data from Tideworks.
[FeesPaid] [decimal](18, 2) NULL, # Same as above. Please define.
[PreGate] nvarchar NULL, # We speculate this is the pre-announced retrieving gate, please define.
[YardLocation] nvarchar NULL, # We speculate this is location data for the container and comes from Tideworks. Please define.
[YardStatus] nvarchar NULL, # We speculate this is location data for the container and comes from Tideworks. Please define.
[Holds] nvarchar NULL, # We speculate this is location data for the container and comes from Tideworks. Please define.
[Line] nvarchar NULL, # This apparently represents the company handling the shipping. Please note that the word "Line" is again used ambiguously in the examples provided (in one PDF it labels integers). Please specify.

##sort this please
[LoadEmpty] nvarchar NULL, # Is this a Tideworks flag? Please define.
[Weight] [decimal](18, 2) NULL, # Weight load.
[OverDimension] nvarchar NULL, # Please define.
[Hazmat] nvarchar NULL, # Please define.
[VoyageCode] nvarchar NULL, # Please define.
[LloydsNo] nvarchar NULL, # Please define.
[VesselName] nvarchar NULL, # We're currently reading this from the PDF.
[EstimateDischarge] [datetime] NULL, # Please define. Arrival date? Is this the same as EstimateAvailebleOn? Ambiguous.
[SCAC] nvarchar NULL, # Please define. Carrier Alpha Code? Where is it?
[EarlyReturnDate] [datetime] NULL, # Please define. Some PDFs may or may not have these fields.

misc values -> we don't understand who governs / creates the metadata below

[CreatedBy] nvarchar NOT NULL, # This appears to be a string referring to a username on the eModal platform and specific to it. It may or may not match the "Sent By" entry in the PDF. It is very possibly an eModal/Tideworks-specific login. Please be very careful when defining it.
[CreatedOn] [datetime] NOT NULL, # Same as above.
[ModifiedOn] [datetime] NOT NULL, # Same as above.
[ModifiedBy] nvarchar NOT NULL,` # Same as above.
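
To make the open questions on `[Container]` and `[Size]` concrete, here is a minimal sketch of how those two values might be normalized under our current working assumptions. The function names, the "4 letters followed by 6 or 7 digits" pattern, and the decision to return a (count, feet) pair for a multiplier are all assumptions for discussion, not confirmed rules.

```python
import re

# Working assumption from the notes above: 4 letters followed by 6 or 7 digits.
CONTAINER_RE = re.compile(r"^[A-Z]{4}[0-9]{6,7}$")

# Optional "<count> X " multiplier, then a length in feet with an optional
# trailing apostrophe, e.g. "2 X 20'" or "40".
SIZE_RE = re.compile(r"^\s*(?:([0-9]+)\s*[xX]\s*)?([0-9]+)\s*'?\s*$")


def normalize_container(raw):
    """Strip separators such as '/' or spaces and upper-case the result.

    Returns None when the cleaned string does not match the assumed pattern.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    return cleaned if CONTAINER_RE.match(cleaned) else None


def parse_size(raw):
    """Return (container_count, length_in_feet), or None if unparseable.

    A missing multiplier is treated as a single container.
    """
    match = SIZE_RE.match(raw)
    if match is None:
        return None
    count = int(match.group(1)) if match.group(1) else 1
    return count, int(match.group(2))


print(normalize_container("UCLA432344/5"))  # UCLA4323445
print(parse_size("2 X 20'"))                # (2, 20)
```

Whether a multiplier should produce several jobs, a single job with a count, or a rejection is exactly the decision we need from you; the sketch only shows that the value can be detected.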

Access to Tideworks

Access to Tideworks is Required

Many of the fields to be tested in the ingestion depend on entities coming from Tideworks. Access to the Tideworks platform and its documentation is critical. All items in the data model require a definition and a recipe for retrieval.

This issue was marked as urgent. Completion of the following milestones depends on it.
