YinsPy: Yin's Python Projects

This is YinsPy, the GitHub repo for Yin's Python Projects, offered by Mr. Yin at Yin's Capital. The site stores sample .py scripts and .ipynb notebooks for software development, statistical analysis, and machine learning applications.

  • Copyright © Yiqiao Yin, 2017 – Present
  • Contact / Author: Yiqiao Yin
  • Email: [email protected]

Why Is It Special Here

I pursue each and every one of my data science, machine learning, and AI projects using the following procedures. There are two phases. Phase I is about end-to-end research and due diligence. Phase II is about product management and software maintenance as well as client relationships. In brief, the steps are listed below.

Phase I:

  1. Develops operational procedures for collection, editing, verification, and management of statistical data.
  2. Develops and implements relevant statistical programs to incorporate data from multiple projects.
  3. Designs comprehensive relational databases with working knowledge of the scientific applications that impact data analysis and reporting.
  4. Assists faculty, supervisors, clients, and other research professionals with the formulation and description of appropriate statistical methods.
  5. Evaluates research studies and recommends statistical procedures to analyze the data.
  6. Carries out comprehensive statistical analysis for a broad spectrum of types of data and types of studies.
  7. Integrates the research methodologies of multiple projects into bio-statistical analyses, statistical analysis, big data analysis, machine learning experimental design, deep learning architecture design, and so on.

Phase II:

  1. Prepares reports summarizing the analysis of research data, interpreting the findings and providing conclusions and recommendations.
  2. Presents talks, seminars, or other oral presentations on the methodology and analysis used in scientific studies.
  3. Assists investigators in preparation of research grant applications by writing research methods sections pertaining to acquisition, analysis, relevance and validity of data.
  4. Participates in the preparation of manuscripts that are submitted for peer reviewed publication.
  5. May supervise and provide training for lower level or less experienced employees.
  6. May perform other duties as assigned.
  7. Develops statistical packages for clients, debugs them, and handles product management.

From my experience, it is important to evaluate oneself against the above checklist once in a while to ensure that proper steps are taken to execute data science, machine learning, and AI projects in an appropriate and efficient manner.

Money Management

Having a sound long-term investment strategy as a part of a comprehensive money management plan helps investors keep their focus on their personal benchmarks rather than meaningless market benchmarks or indexes, enabling them to ignore short-term market events. It is about personal wealth improvement.

  • As the first section discussed on this site, Yin's Timer will always be the frontier pipeline of our products. We are proud to present a beta version of the Python notebook here.

  • When it comes to portfolio construction, Markowitz is my go-to reference. Here is a notebook that discusses his view of the efficient portfolio (a minimal sketch appears after this list). Link is here.

  • The most basic pricing model, the Capital Asset Pricing Model (CAPM), is without a doubt an important discussion here on my platform. Here is a Python notebook for a quick discussion; a small CAPM regression sketch also appears after this list.

  • With the foundations of capital markets in place, we have some understanding of asset-class risk premiums. How about the risk premiums of different asset classes or different portfolios, and how do we explain these quantitative factors? In this notebook I discuss the Fama-MacBeth regression and how 17 industry portfolios, downloaded live from Fama and French's website, can be used to carry out a cross-sectional panel study (see the two-pass sketch after this list).

  • An important skill in money management is conducting simulations. Markov Chain Monte Carlo (MCMC) is a good method to adopt, hence this notebook to execute the idea (a Metropolis-Hastings sketch appears after this list).

  • Traditional asset pricing models or factor-based trading algorithms look into historical data to help fund managers make decisions. Instead of only looking backward, we can also use machine learning to gain insights about future data. In this notebook, I discuss how to use Long Short-Term Memory (LSTM), a recurrent neural network (RNN) architecture, to forecast stock prices. The algorithm can be generalized into a package, and I wrote a notebook to discuss it. The algorithms are written in YinsDL, and one can always load this script by typing `%run "../scripts/YinsDL.py"` into a Python notebook. A minimal LSTM sketch also appears after this list.
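
For the Markowitz notebook above, here is a minimal sketch of the global minimum-variance portfolio on toy data; the simulated returns are an assumption for illustration and not the data used in the actual notebook.

```python
# Minimal Markowitz sketch: global minimum-variance weights on toy returns.
import numpy as np

np.random.seed(0)
returns = np.random.normal(0.0005, 0.01, size=(250, 4))  # 250 days, 4 assets (toy data)

mu = returns.mean(axis=0)            # expected returns
cov = np.cov(returns, rowvar=False)  # covariance matrix

# Closed-form global minimum-variance weights: w = inv(Cov) 1 / (1' inv(Cov) 1)
ones = np.ones(len(mu))
inv_cov = np.linalg.inv(cov)
w_gmv = inv_cov @ ones / (ones @ inv_cov @ ones)

print("weights:", np.round(w_gmv, 3))
print("expected return:", mu @ w_gmv)
print("volatility:", np.sqrt(w_gmv @ cov @ w_gmv))
```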
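For the CAPM notebook, a hedged single-asset sketch: beta is simply the OLS slope of the asset's excess returns on the market's excess returns (both simulated here for illustration).

```python
# Minimal CAPM sketch: estimate alpha and beta by OLS on simulated excess returns.
import numpy as np
from scipy import stats

np.random.seed(1)
market_excess = np.random.normal(0.0004, 0.01, 500)                     # market excess returns
asset_excess = 0.001 + 1.2 * market_excess + np.random.normal(0, 0.005, 500)

beta, alpha, r_value, p_value, std_err = stats.linregress(market_excess, asset_excess)
print(f"alpha={alpha:.5f}, beta={beta:.3f}, R^2={r_value**2:.3f}")
```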
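For the Fama-MacBeth notebook, a rough two-pass sketch on simulated data; the real study uses the 17 industry portfolios and factors from Ken French's data library.

```python
# Minimal two-pass Fama-MacBeth sketch on simulated one-factor data.
import numpy as np

rng = np.random.default_rng(0)
T, N = 240, 17                                    # months, portfolios
factor = rng.normal(0.005, 0.02, T)               # one simulated factor
true_beta = rng.uniform(0.5, 1.5, N)
returns = 0.002 + np.outer(factor, true_beta) + rng.normal(0, 0.01, (T, N))

# Pass 1: time-series regression per portfolio to estimate betas
X = np.column_stack([np.ones(T), factor])
betas = np.linalg.lstsq(X, returns, rcond=None)[0][1]   # slope for each portfolio

# Pass 2: cross-sectional regression of returns on betas, period by period
Z = np.column_stack([np.ones(N), betas])
lambdas = np.array([np.linalg.lstsq(Z, returns[t], rcond=None)[0] for t in range(T)])
print("average risk premium (lambda):", lambdas[:, 1].mean())
```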
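For the simulation notebook, a minimal Metropolis-Hastings sampler, one common flavor of Markov Chain Monte Carlo; the standard-normal target here is an assumption, and the notebook may use its own target distribution.

```python
# Minimal Metropolis-Hastings sketch with a random-walk proposal.
import numpy as np

def metropolis(log_target, n_samples=5000, step=0.5, x0=0.0, seed=42):
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        proposal = x + rng.normal(0, step)               # symmetric random-walk proposal
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal                                  # accept
        samples[i] = x
    return samples

log_normal_target = lambda x: -0.5 * x ** 2              # log-density of N(0, 1) up to a constant
draws = metropolis(log_normal_target)
print("mean ~", draws.mean(), "std ~", draws.std())
```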
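For the LSTM notebook, a minimal univariate forecasting sketch with tf.keras on a toy random-walk series; the full, generalized pipeline lives in scripts/YinsDL.py and is not reproduced here.

```python
# Minimal LSTM forecasting sketch: sliding windows over a toy price series.
import numpy as np
import tensorflow as tf

def make_windows(series, lookback=20):
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., None], np.array(y)            # shape (n, lookback, 1)

prices = np.cumsum(np.random.normal(0, 1, 500)) + 100     # toy random-walk "prices"
X, y = make_windows(prices)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1], 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print("next-step forecast:", model.predict(X[-1:], verbose=0).ravel())
```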

Data Structures

Data structures are a key part of many computer algorithms, as they allow programmers to manage data efficiently. The right choice of data structure can greatly enhance the efficiency of a program or algorithm.

  • An important component of data structures is navigating different formats of data and extracting information. I was actually asked the following question in an interview (the source is provided by HackerRank): since the original is an encrypted interview question, I cannot copy/paste it here, but I took the idea and wrote my own replication of the problem, Validate and Report IP Address (a small sketch appears after this list).

  • I have seen arrays and strings tested in technical interviews for software engineering positions. I start with a simple Evaluate Palindrome here (see the sketch after this list).

  • The capability to partition a data set is very important, and this functionality plays an important role in data structures. I wrote this notebook to practice coding an influence measure by partitioning a data set according to discretized variable levels (a partitioning sketch appears after this list).
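
For the IP-address exercise above, a small sketch assuming the task is to validate IPv4 dotted-quad strings; the original problem (and my replication) also covers reporting and IPv6, which is omitted here.

```python
# Minimal IPv4 validation sketch.
def is_valid_ipv4(text: str) -> bool:
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for p in parts:
        # each octet: digits only, no leading zeros (except "0"), value 0..255
        if not p.isdigit() or (len(p) > 1 and p[0] == "0") or int(p) > 255:
            return False
    return True

for candidate in ["192.168.0.1", "256.1.1.1", "01.2.3.4", "hello"]:
    print(candidate, "->", is_valid_ipv4(candidate))
```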
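For the palindrome exercise, a minimal check that follows the usual interview convention of ignoring case and non-alphanumeric characters.

```python
# Minimal palindrome check.
def is_palindrome(s: str) -> bool:
    cleaned = [c.lower() for c in s if c.isalnum()]
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("hello"))                           # False
```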
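For the partitioning notebook, a small pandas sketch that discretizes a variable into levels and summarizes the response within each partition; the influence measure itself is defined in the notebook and is not reproduced here.

```python
# Minimal partitioning sketch: discretize x into levels and summarize y per level.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = (df["x"] + rng.normal(0, 0.5, size=200) > 0).astype(int)       # noisy binary response

df["x_level"] = pd.cut(df["x"], bins=3, labels=["low", "mid", "high"])   # discretized levels
partition_stats = df.groupby("x_level", observed=True)["y"].agg(["count", "mean"])
print(partition_stats)
```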

Feature Selection and Feature Engineering

Domain knowledge always gives a data scientist, a machine learning practitioner, or an AI specialist the edge they need to design appropriate machine learning algorithms. The way to represent domain knowledge is to construct informative features.

  • The most common feature engineering methodology is to use k-nearest neighbors, and in this notebook I explain how to do that on the Titanic data set (a small sketch follows below).
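
Below is a hedged sketch of the k-nearest-neighbors idea on Titanic-style toy data: an out-of-fold kNN prediction is appended as a new feature. The column names are illustrative assumptions, not the notebook's exact pipeline.

```python
# Minimal kNN feature-engineering sketch: append an out-of-fold kNN prediction as a feature.
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.uniform(1, 80, 300),        # illustrative Titanic-style columns
    "Fare": rng.uniform(5, 200, 300),
    "Survived": rng.integers(0, 2, 300),
})

X, y = df[["Age", "Fare"]], df["Survived"]
knn = KNeighborsClassifier(n_neighbors=5)
# out-of-fold probabilities avoid leaking the label into the engineered feature
df["knn_survival_prob"] = cross_val_predict(knn, X, y, cv=5, method="predict_proba")[:, 1]
print(df.head())
```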

Machine Learning

Machine Learning is another big component of Yin's Capital's products.

  • One should always start the machine learning journey with a vanilla regression problem. This is why I begin with a simple Python notebook on Housing Price Analysis.

  • As data gets bigger, we can start to see how multiple variables together impose an impact on a dependent variable. It is not terribly difficult to make the transition from simple linear regression to a multivariate linear regression problem. A data science project investigating startup profits using a multivariate regression model is here (a regression sketch appears after this list).

  • Simple and multivariate linear models may be fast at making predictions. However, the variables can only be assumed to affect the dependent variable marginally, which may miss interactions that have a joint effect on the target variable. This motivates tree-based learning algorithms. In this notebook I discuss the Decision Tree Regressor as a leap from conventional regression problems, and we use visualization tools to help us understand why the machine does what it does (a small tree example appears after this list).

  • Machine learning has a lot of components. In this notebook I introduce a famous machine learning technique, K-Nearest Neighbors, and I walk through each step of a standardized machine learning project, including but not limited to examining training and validation sets, cross validation, feature selection, exploring different loss functions, and performance visualization (a cross-validated kNN sketch appears after this list).
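
For the regression notebooks above, a minimal scikit-learn sketch of a multivariate linear regression on simulated startup-style data; the feature meanings are illustrative assumptions.

```python
# Minimal multivariate linear regression sketch on toy "startup" data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 100, size=(n, 3))                              # e.g. R&D, marketing, admin spend
y = 50 + X @ np.array([2.0, 0.5, 0.1]) + rng.normal(0, 10, n)     # toy profit

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("coefficients:", model.coef_)
print("R^2 on test:", model.score(X_test, y_test))
```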
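For the Decision Tree Regressor notebook, a small sketch on data with an interaction that a purely marginal linear model would miss; the notebook's visualizations are stood in for here by a plain-text dump of the fitted splits.

```python
# Minimal decision tree regressor sketch on data with an interaction effect.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 300)       # interaction a linear model would miss

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))  # text view of the learned splits
```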
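For the K-Nearest Neighbors notebook, a compact cross-validation sketch on the Iris data set, standing in for the fuller workflow (train/validation examination, feature selection, loss comparison, and plots) described above.

```python
# Minimal cross-validated kNN sketch: scale features, then compare a few values of k.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7):
    clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"k={k}: mean accuracy={scores.mean():.3f}")
```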

Deep Learning

A higher form of machine learning is deep learning. The mystery of deep learning poses great potential as well as a threat to mankind. Data is unbiased; algorithms are biased. However, the design of an experiment by a human can almost never be 100% impartial.

  • To take things to another level, I move away from regression problems and attempt a very basic classification problem. The notebook here builds a basic 3-layer neural network architecture and lands on software development (a minimal network sketch appears after this list).

  • Object detection is a higher-level, almost artistic use of deep learning. Specifically, there are object detection, facial detection, gender detection, and object localization. In this notebook, I use the open-source cvlib library as a playground to illustrate some basic applications of this style. An advanced version is the YOLO algorithm with a live camera feed. Fortunately, open-source cvlib has made the production code fairly easy for customers to use, and in this notebook I explain how to deploy such technology (a short detection sketch appears after this list). An interesting application is posted below.
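
For the 3-layer network notebook, a minimal tf.keras classification sketch; the layer sizes and the toy labels are illustrative assumptions rather than the notebook's exact configuration.

```python
# Minimal 3-layer neural network sketch for binary classification on toy data.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("int32")        # toy binary labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print("training accuracy:", model.evaluate(X, y, verbose=0)[1])
```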
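For the object detection notebook, a minimal cvlib sketch; the image path "input.jpg" is a hypothetical placeholder, and the YOLO live-camera version builds on the same calls inside a video-capture loop.

```python
# Minimal cvlib object detection sketch on a single local image.
import cv2
import cvlib as cv
from cvlib.object_detection import draw_bbox

image = cv2.imread("input.jpg")                       # hypothetical local image path
bbox, labels, conf = cv.detect_common_objects(image)  # pretrained common-object detector
output = draw_bbox(image, bbox, labels, conf)         # draw boxes and labels on the image
cv2.imwrite("output.jpg", output)
print(list(zip(labels, conf)))
```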

Personal Home AI Surveillance:

Deep Animation:

Though this branch of application is commonly called "DeepFake", I simply call it a fun animation using deep learning, specifically a particular kind of Generative Adversarial Network (GAN). I have not yet seen or found any legally profitable business operation that can be built or designed using this technology. However, I do believe this type of technology can be used to examine real versus fake content on social media. In other words, we fight the people who abuse the technology with the same technology they use to abuse celebrities or famous people for their own interest.
