
reestimator's Introduction

Data analysis

  • Document here the project: reestimator
  • Description: Project Description
  • Data Source:
  • Type of analysis:

Please document the project as best you can.

Start up the project

The initial setup.

Create virtualenv and install the project:

sudo apt-get install virtualenv python-pip python-dev
deactivate; virtualenv ~/venv ; source ~/venv/bin/activate ;\
    pip install pip -U; pip install -r requirements.txt

Run the unit tests:

make clean install test

Check for reestimator in gitlab.com/{group}. If your project is not set up there yet, please add it:

  • Create a new project on gitlab.com/{group}/reestimator
  • Then populate it:
##   e.g. if group is "{group}" and project_name is "reestimator"
git remote add origin git@gitlab.com:{group}/reestimator.git
git push -u origin master
git push -u origin --tags

Functional test with a script:

cd
mkdir tmp
cd tmp
reestimator-run

Install

Go to https://github.com/{group}/reestimator to see the project, manage issues, set up your SSH public key, ...

Create a python3 virtualenv and activate it:

sudo apt-get install virtualenv python-pip python-dev
deactivate; virtualenv -ppython3 ~/venv ; source ~/venv/bin/activate

Clone the project and install it:

git clone git@github.com:{group}/reestimator.git
cd reestimator
pip install -r requirements.txt
make clean install test                # install and test

Functional test with a script:

cd
mkdir tmp
cd tmp
reestimator-run

Column descriptions

Columns kept

Format: column_name (column index) description (dtype/conversion) change to make

id_mutation (0) ID key (str)

date_mutation (1) date of the mutation, i.e. the property transfer (to be converted to datetime)

nature_mutation (3) nature of the mutation: sale (vente), partition (partage), auction (adjudication) (str). Keep only the 'ventes' rows and send the others to the 'non-traité' table

valeur_fonciere (5) our target! (to be converted to int32)

adresse_numero (6) street number in the address (to be converted to an integer; int8 only reaches 127, so a wider type such as int16 is needed)

adresse_suffixe (7) street-number suffix: A, B, bis, ter... (str)

adresse_nom_voie (8) street name (str)

code_commune (11) commune code on the cadastral map

code_departement (12) (str because of Corsica, whose codes are 2A and 2B)

id_parcelle (15) concatenates commune code / cadastral sector code / parcel number. Extract the cadastral sector code into a separate column

type_local (30) type of premises: house or apartment (an outbuilding, "dépendance", is encoded in a new column)

surface_reelle_bati (31) one of our few predictors (convert to int32)

nombre_pieces_principales (32) one of our few predictors (convert to int32)

surface_terrain (37) one of our few predictors (convert to int32)

longitude (38) latitude (39) coordinates for geolocation (float64). Some communes are not vectorized, so geolocation is not available for them
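The conversions listed above can be sketched on plain dictionaries, without assuming anything about the project's actual pandas code. The sample rows are hypothetical, and two details are assumptions rather than facts from this README: that sale rows carry the label "Vente", and that date_mutation arrives as a YYYY-MM-DD string.

```python
from datetime import datetime
from typing import Optional

# Hypothetical sample rows; the keys reuse the column names listed above.
RAW_ROWS = [
    {"id_mutation": "2020-1", "date_mutation": "2020-01-07",
     "nature_mutation": "Vente", "valeur_fonciere": "250000.0",
     "code_departement": "2A"},
    {"id_mutation": "2020-2", "date_mutation": "2020-02-11",
     "nature_mutation": "Adjudication", "valeur_fonciere": "90000.0",
     "code_departement": "75"},
]


def clean_row(row: dict) -> Optional[dict]:
    """Return a typed copy of a sale row, or None for any other mutation type."""
    # Assumption: sales are labelled "Vente" in nature_mutation.
    if row["nature_mutation"] != "Vente":
        return None  # the caller can route these rows to the 'non-traité' table
    return {
        "id_mutation": row["id_mutation"],
        # Assumption: dates are stored as YYYY-MM-DD strings.
        "date_mutation": datetime.strptime(row["date_mutation"], "%Y-%m-%d").date(),
        "valeur_fonciere": int(float(row["valeur_fonciere"])),
        # Kept as a string because Corsican codes ("2A", "2B") are not numeric.
        "code_departement": row["code_departement"],
    }


kept = [cleaned for cleaned in map(clean_row, RAW_ROWS) if cleaned is not None]
rejected = [row for row in RAW_ROWS if clean_row(row) is None]
print(kept)
print(len(rejected))
```

Returning None for non-sale rows keeps the filtering decision in one place, so the same function can feed both the cleaned table and the table of unprocessed rows.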

Columns dropped

adresse_code_voie (9) FANTOIR code used by the administration

ancien_code_commune (13)

ancien_nom_commune (14) useful only when digging into past cadastral records

ancien_id_parcelle (16) useful only when digging into past cadastral records

numero_volume (17) useful only when digging into past cadastral records

code_type_local (29) encoding of the type of premises; redundant with type_local (kept)

code_nature_culture (33)

nature_culture (34)

code_nature_culture_speciale (35)

nature_culture_speciale (36)

The nature_culture columns show no correlation with valeur_fonciere

lot1_numero (18)

lot1_surface_carrez (19)

lot2_numero (20)

lot2_surface_carrez (21)

lot3_numero (22)

lot3_surface_carrez (23)

lot4_numero (24)

lot4_surface_carrez (25)

lot5_numero (26)

lot5_surface_carrez (27)

nombre_lots (28) no correlation with valeur_fonciere, and not always filled in properly

numero_disposition (4) order number when several sales happen simultaneously. Not always filled in properly

Columns still under question

code_postal (10) postal code; differs from the commune code, but is used for addressing

Columns to create

Price per m²

Presence of an outbuilding (dépendance)
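As a rough illustration of the two columns to create, the sketch below derives a price per m² and an outbuilding flag from plain dictionaries. It is not the project's implementation: the function names are hypothetical, and it assumes that outbuildings appear as rows whose type_local is "Dépendance" within the same mutation.

```python
from typing import Optional


def price_per_m2(valeur_fonciere: float, surface_reelle_bati: Optional[float]) -> Optional[float]:
    """Price per square metre, or None when the built surface is missing or zero."""
    if not surface_reelle_bati:
        return None
    return valeur_fonciere / surface_reelle_bati


def has_outbuilding(rows: list) -> bool:
    """True if any row of a mutation is typed as an outbuilding."""
    # Assumption: outbuildings are recorded with type_local == "Dépendance".
    return any(row.get("type_local") == "Dépendance" for row in rows)


# Hypothetical rows that share one id_mutation.
mutation_rows = [
    {"type_local": "Maison", "valeur_fonciere": 300000, "surface_reelle_bati": 120},
    {"type_local": "Dépendance", "valeur_fonciere": 300000, "surface_reelle_bati": 0},
]

print(price_per_m2(300000, 120))
print(has_outbuilding(mutation_rows))
```

Returning None instead of raising on a missing surface leaves the choice of how to handle such rows (drop, impute) to the caller.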

Function descriptions

Preprocessing

get_data.py

Methods (class dloading) to get data (as DataFrames) from the Housing_France database

class dloading:

load_data_chunk(table_name, chunksize) Loads a DataFrame in chunks of size chunksize from a database table

get_random_rows(table_name, numrows) Loads a DataFrame of numrows rows drawn at random from a database table

get_all_rows(table_name) Loads a DataFrame from an entire database table

get_num_rows(table_name, rownums) Loads a DataFrame of rownums rows from a database table

show_tables() Shows all the tables in the Housing_France database

data_to_sql(df, tablename, if_exists) Exports data to SQL; if_exists takes one of two strings: 'replace' or 'append'
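The chunked and random loading described above can be illustrated with the standard library alone. This is only a sketch: an in-memory SQLite database stands in for Housing_France, the functions return lists of tuples rather than DataFrames, and the table and its contents are made up for the example.

```python
import sqlite3
from typing import Iterator


def _check_table_name(table_name: str) -> None:
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")


def load_data_chunk(conn: sqlite3.Connection, table_name: str, chunksize: int) -> Iterator[list]:
    """Yield the rows of table_name in lists of at most chunksize rows."""
    _check_table_name(table_name)
    cursor = conn.execute(f"SELECT * FROM {table_name}")
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        yield rows


def get_random_rows(conn: sqlite3.Connection, table_name: str, numrows: int) -> list:
    """Return numrows rows picked at random from table_name."""
    _check_table_name(table_name)
    query = f"SELECT * FROM {table_name} ORDER BY RANDOM() LIMIT ?"
    return conn.execute(query, (numrows,)).fetchall()


# Stand-in database with a hypothetical table of five rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id_mutation TEXT, valeur_fonciere INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(f"m{i}", 1000 * i) for i in range(5)])

chunk_sizes = [len(chunk) for chunk in load_data_chunk(conn, "sales", 2)]
print(chunk_sizes)
print(len(get_random_rows(conn, "sales", 3)))
```

Yielding chunks from a generator keeps memory use bounded by chunksize, which is the point of loading a large table piece by piece.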

exploration.py

Methods (class Exploration_data) to explore data

class Exploration_data:

get_float_columns(self): Get float columns

get_int_columns(self): Get integer columns

get_object_columns(self): Get object columns

get_count_of_missing_values(self): Get count of missing values in DataFrame

get_columns_with_missing_values(self): Get the columns of the DataFrame that have missing values

get_columns_without_missing_values(self): Get the columns of the DataFrame that have no missing values

get_count_missing_vals_in_1column(self, col_name): Get the count of missing values in the column named col_name

visualize_feature_types(self): Plot a bar chart of the different feature types

visualize_type_local(self): Plot a bar chart of the number of rows for each type of premises (type_local)

visualize_lot_surface_columns(self): Plot a bar chart of the lot surfaces for the lot1 to lot5 surface columns

visualize_lot_numero_columns(self): Plot a bar chart of the lot numbers for the lot1 to lot5 number columns
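The missing-value helpers above boil down to counting empty cells per column. A minimal standard-library sketch of that idea follows; it works on a list of dictionaries instead of a DataFrame, and the function names and sample rows are hypothetical.

```python
import math


def is_missing(value) -> bool:
    """Treat None, empty strings and float NaN as missing."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def count_missing_by_column(rows: list) -> dict:
    """Map each column name to its number of missing values."""
    columns = sorted({key for row in rows for key in row})
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}


# Hypothetical sample reusing column names from this README.
rows = [
    {"surface_terrain": 500, "lot1_numero": None, "type_local": "Maison"},
    {"surface_terrain": float("nan"), "lot1_numero": None, "type_local": "Appartement"},
]

missing = count_missing_by_column(rows)
columns_with_missing = [col for col, count in missing.items() if count > 0]
columns_without_missing = [col for col, count in missing.items() if count == 0]
print(missing)
print(columns_with_missing)
```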

preprocessing.py

Methods (class Preprocessing_data) to preprocess data

class Preprocessing_data:

conv_int(col): Convert the dtype of column 'col' (str, float, int) to the smallest integer type that fits the data

conv_downcast(df): Downcast the numeric dtypes in DataFrame df to save memory

conv_date(col): Convert a date-string column 'col' to datetime format YYYY-MM-DD

drop_rows_of_specific_column(df, col_name): Drop the rows where column col_name is NaN

remplacement_mutation(df): Replace 'Sale' with 1 and the other types of mutation with 0

cadastral_sector(df): Extract secteur_cadastral from id_parcelle and add it to df as a new column
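Two of these helpers lend themselves to a short sketch: picking the smallest integer type that fits a column, and slicing the sector out of id_parcelle. The code below is not the project's implementation. The function names are illustrative, the parcel identifier is made up, and the id_parcelle layout (5-character commune code, 5-character sector, 4-digit parcel number) is an assumption that should be checked against the data.

```python
# Signed integer ranges, from narrowest to widest.
INT_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("int64", -2**63, 2**63 - 1),
]


def smallest_int_dtype(values) -> str:
    """Name of the narrowest signed integer type that holds every value."""
    lowest, highest = min(values), max(values)
    for name, type_min, type_max in INT_RANGES:
        if type_min <= lowest and highest <= type_max:
            return name
    raise OverflowError("values do not fit in int64")


def extract_cadastral_sector(id_parcelle: str) -> str:
    """Slice the sector out of a parcel identifier.

    Assumed layout: 5-char commune code + 5-char sector + 4-digit parcel number.
    """
    if len(id_parcelle) != 14:
        raise ValueError(f"unexpected id_parcelle length: {id_parcelle!r}")
    return id_parcelle[5:10]


print(smallest_int_dtype([1, 12, 127]))
print(smallest_int_dtype([1, 250]))
print(extract_cadastral_sector("75101000AB0012"))  # hypothetical identifier
```

The second call shows why a street-number column cannot safely be stored as int8: any value above 127 pushes the column to int16.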

Docker steps to GCP

Two steps remain before developers anywhere in the world can use the prediction API:

  • Push the Docker image to Google Container Registry
  • Deploy the image on Google Cloud Run so that it gets instantiated into a Docker container

1) Push our prediction API image to Google Container Registry

  1. Make sure the Google Container Registry API is enabled for your project in GCP: https://console.cloud.google.com/flows/enableapi?apiid=containerregistry.googleapis.com&redirect=https://cloud.google.com/container-registry/docs/quickstart

  2. If your account is not listed, authenticate:
     gcloud auth login

  3. Configure the gcloud command for use with Docker:
     gcloud auth configure-docker

  4. Verify your config. You should see your GCP account and default project:
     gcloud config list

  5. Define an environment variable for the name of your project:
     export PROJECT_ID=wagon-bootcamp-323012
     echo $PROJECT_ID
     gcloud config set project $PROJECT_ID

  6. Define an environment variable for the name of your Docker image:
     export DOCKER_IMAGE_NAME=reestimator_docker
     echo $DOCKER_IMAGE_NAME

  7. Build the image from which the container will be created:
     docker build -t eu.gcr.io/$PROJECT_ID/$DOCKER_IMAGE_NAME .

  8. Make sure that the image runs correctly:
     docker run -e PORT=8000 -p 8000:8000 eu.gcr.io/$PROJECT_ID/$DOCKER_IMAGE_NAME

  9. Push the image to Google Container Registry:
     docker push eu.gcr.io/$PROJECT_ID/$DOCKER_IMAGE_NAME

  10. Check the image in Google Container Registry: https://console.cloud.google.com/gcr/images/wagon-bootcamp-323012?project=wagon-bootcamp-323012
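The docker run command in this list passes the listening port through the PORT environment variable, and Cloud Run injects the same variable when it starts a container. A minimal sketch of how an entrypoint could read it is shown below; resolve_port is a hypothetical helper, not part of this project, and the default of 8000 simply mirrors the local test command.

```python
import os


def resolve_port(environ=os.environ, default: int = 8000) -> int:
    """Read the listening port from the PORT variable, falling back to a default."""
    port = int(environ.get("PORT", default))
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


# The result would typically be handed to the web server at startup,
# e.g. uvicorn.run(app, host="0.0.0.0", port=resolve_port()),
# where `app` is the project's API object (not shown here).
print(resolve_port({"PORT": "8000"}))
print(resolve_port({}))
```

Reading the port from the environment rather than hard-coding it lets the same image run unchanged both locally and on Cloud Run.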

2) Deploy the Container Registry image to Google Cloud Run

We have pushed the Docker image for our Prediction API to Google Container Registry. The image is now available for deployment by Google services such as Cloud Run. We are going to deploy our image to production using Google Cloud Run. Cloud Run will instantiate the image into a container and run the CMD instruction from the image's Dockerfile. This last step starts the uvicorn server that serves our Prediction API to the world 🌍

  1. Run one last command:
     gcloud run deploy --image eu.gcr.io/$PROJECT_ID/$DOCKER_IMAGE_NAME --platform managed --region europe-west1

  2. Any developer in the world 🌍 can now browse to the deployed URL and make a prediction using the API. Warning: keep in mind that you pay for the service as long as it is up 💸

First test

RESULTS: https://reestimatordockerimage-jw6jz6q2fq-ew.a.run.app

Service name (reestimatordockerimage): reestimatordockerimage
API [run.googleapis.com] not enabled on project [607412583234].

Second test

Service name (reestimatordocker): reestimatordocker
Allow unauthenticated invocations to [reestimatordocker] (y/N)? y

Deploying container to Cloud Run service [reestimatordocker] in project [wagon-bootcamp-323012] region [europe-west1]
✓ Deploying new service... Done.
  ✓ Creating Revision...
  ✓ Routing traffic...
  ✓ Setting IAM Policy...
Done.
Service [reestimatordocker] revision [reestimatordocker-00001-six] has been deployed and is serving 100 percent of traffic.
Service URL: https://reestimatordocker-jw6jz6q2fq-ew.a.run.app

Third test

➜ reestimator git:(krys_urldockerGCP) ✗ gcloud run deploy \
    --image eu.gcr.io/$PROJECT_ID/$DOCKER_IMAGE_NAME \
    --platform managed \
    --region europe-west1 \
    --set-env-vars "GOOGLE_APPLICATION_CREDENTIALS=/credentials.json"
Service name (reestimatordocker): reestimatordocker
Deploying container to Cloud Run service [reestimatordocker] in project [wagon-bootcamp-323012] region [europe-west1]
✓ Deploying... Done.
  ✓ Creating Revision...
  ✓ Routing traffic...
Done.
Service [reestimatordocker] revision [reestimatordocker-00002-for] has been deployed and is serving 100 percent of traffic.
Service URL: https://reestimatordocker-jw6jz6q2fq-ew.a.run.app

3) Writing to Google Cloud Storage from Google Cloud Run

  1. Add your credentials to your image so that your code is allowed to push data to your bucket:

    1. Check the path to the Google Cloud Platform credentials you created during setup day:
       echo $GOOGLE_APPLICATION_CREDENTIALS
    2. Update your Dockerfile with the correct path to your credentials file:
       COPY /path/to/your/credentials.json /credentials.json
  2. Deploy the new image, which is now able to write to GCS:
     gcloud run deploy \
       --image eu.gcr.io/$PROJECT_ID/$DOCKER_IMAGE_NAME \
       --platform managed \
       --region europe-west1 \
       --set-env-vars "GOOGLE_APPLICATION_CREDENTIALS=/credentials.json"
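Before deploying, it can help to confirm that the file referenced by GOOGLE_APPLICATION_CREDENTIALS really looks like a service-account key. The sketch below does that with the standard library only. check_credentials is a hypothetical helper, the key written to the temporary file is fake, and the check covers only a few fields commonly found in such key files, not their validity.

```python
import json
import os
import tempfile

# Fields expected in a service-account key file (a subset, used as a sanity check).
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_credentials(path: str) -> str:
    """Return the project_id of a service-account key file, or raise ValueError."""
    with open(path, encoding="utf-8") as handle:
        info = json.load(handle)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"credentials file lacks keys: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"unexpected credentials type: {info['type']!r}")
    return info["project_id"]


# Demonstration with a fake key file; a real check would instead use
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] as the path.
fake_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "not-a-real-key",
    "client_email": "sa@my-project.iam.gserviceaccount.com",
}
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "credentials.json")
    with open(key_path, "w", encoding="utf-8") as handle:
        json.dump(fake_key, handle)
    project = check_credentials(key_path)

print(project)
```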

Other option: continuous deployment

4) Create and configure a Cloud Run service for Continuous Deployment

  1. Go to Cloud Run. https://console.cloud.google.com/run?project=wagon-bootcamp-322821&folder=&organizationId=

  2. Click on the Create Service button:
    • Enter a name for your service
    • Select a region in which to run the container of the project (for example europe-west1 for Belgium)
    • Click Next

  3. Select Continuously deploy new revisions from a source repository:
    • Click on Set up with Cloud Build

  4. Connect your GitHub account:
    • Select GitHub as a repository provider
    • Click on Authenticate to connect to your GitHub account

  5. Install the Google Cloud Build app on the project repository:
    • Click Install Google Cloud Build
    • If asked to, select your GitHub account
    • Check Only selected repositories
    • Select the repository of your project (🚨 Container Registry will only work correctly with repositories whose name follows the kebab-case naming convention: my-repo-name). To understand kebab-case, see https://betterprogramming.pub/string-case-styles-camel-pascal-snake-and-kebab-case-981407998841

  6. Select the source repository:
    • Select the configured repository
    • Read and check I understand …
    • Click Next

  7. Configure your project:
    • Select the branch of your repository on which new commits will trigger the CD (for example ^master$)
    • Select the Dockerfile build type and enter the path to the Dockerfile in your project if required
    • Click Save

  8. Select the parameters for the service:
    • Allow all traffic
    • Allow all unauthenticated invocations
    • Click Create

  9. Get the production URL from the interface. It should look something like this example: https://lw-docker-test-xi54eseqrq-ew.a.run.app/

  10. Once your application is in production, you will see the built image stored in Container Registry, as usual.
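Two details in the steps above are easy to get wrong: the kebab-case repository name and the anchored branch pattern such as ^master$. The sketch below illustrates both with Python's re module. It only approximates what the Google Cloud console does, since the console's own regex engine may differ from Python's, and both function names are hypothetical.

```python
import re

# Lowercase letters and digits, in groups separated by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab_case(name: str) -> bool:
    """True if the whole name follows the kebab-case convention."""
    return KEBAB_CASE.fullmatch(name) is not None


def branch_triggers(pattern: str, branch: str) -> bool:
    """True if the branch name matches the trigger pattern anywhere."""
    return re.search(pattern, branch) is not None


print(is_kebab_case("my-repo-name"))
print(is_kebab_case("My_Repo_Name"))
print(branch_triggers("^master$", "master"))
print(branch_triggers("^master$", "master-hotfix"))
print(branch_triggers("master", "master-hotfix"))
```

The last two calls show why the example pattern is anchored: without ^ and $, any branch whose name merely contains "master" would also trigger a deployment.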

reestimator's People

Contributors

krys28, kelhoussaini, jehadel, frenchcoder345, anna-pl, heryson
