
me's Introduction

Yorkie Liu

Welcome to Yorkie's landscape. Here is what you will find:

Overview

Property Value
Birth XXth, April, 1XXX
Languages Node.js, C
Platforms Linux/Mac/Windows
Email [email protected]
Github @yorkie
Twitter @yorkienell
NPM yorkie

Side Projects

Project description
ShadowNode Use Node.js on your embedded devices
tensorflow-nodejs The Node.js binding for TensorFlow's C APIs.
react-native-wechat React Native bindings for WeChat login, sharing, favorites and payment
rust.js An NPM- and Node.js-compatible backend JavaScript platform in Rust
watjs Write WebAssembly Text Format files (.wat) in JavaScript

Visit here to get the complete list of repositories.

Memberships in Organizations

Organization description
nodejs Node.js collaborator
clibs clibs member
bpkg bpkg member

Contributions

Node.js Core

Repository Description Commits
nodejs/node The Node.js repository commits
libuv/libuv The event-driven I/O library commits
npm/npm The npm package manager commits

Node.js Ecosystem

Embedded Operating System

Standards & Proposals

Machine Learning

C/C++

Golang

Bash and Bpkg

Ruby

Python

Haskell

  • pixbi/duplo An opinionated, framework-less build tool for web applications

Work Experience

  • Hola, Israel (2014.7-2014.9)
  • Pixbi, NYC, US (2014.9-2015.3)
  • 51DegreesMobi, Reading, UK (2014.10-2016.5)
  • WeFlex, Shanghai, China (2015.3-2016.7)
  • Alibaba Group, Hangzhou, China (2016.10-2017.8)
  • Rokid, Hangzhou, China (2017.8-2020.1)

AMA(Ask me anything)

If you want to ask me anything, go to https://github.com/yorkie/ama and I will post my answers publicly. This is inspired by https://github.com/sindresorhus/ama.

By the way, you can also visit Notes to read what I have jotted down.


me's Issues

How to inject DOM to GMail's user card

I was working on a Chrome extension for Gmail this week. At first I was assigned to build a tooltip that shows when an email address is hovered, and hides once the pointer moves off that element.

The interaction is the same as Gmail's user card, so following this design could confuse users, because two "cards" would be shown simultaneously.

OK, that is the problem I met this week; let's address the issue more gracefully.

Caveat

Gmail's user card is an iframe served from a different domain than Gmail's, as below:

  • Gmail: mail.google.com
  • UserCard: apis.google.com

So when you try to access anything in the cross-origin iframe's document via:

document.querySelector('#cross-iframe-id').contentWindow.document;

Chrome will print this error:

SecurityError: Blocked a frame with origin "https://mail.google.com" from accessing a cross-origin frame.

This is prohibited by every browser, Chrome included, of course; but Chrome extensions can break through it.

Plugin Permission

In manifest.json, I just added the two domains like this:

{
  ...
  "permissions": [
    "*://mail.google.com/*",
    "*://apis.google.com/*"
  ],
  ...
}

Okay, then write:

var usercard = document.querySelector('#cross-iframe-id').contentWindow.document;
console.log(usercard);

in your extension scripts, then run it. Now you can get the document and do any DOM injection on it, such as adding a line with your company name, changing the card avatar, etc.
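
For instance, a minimal sketch of such an injection (the selector and the injected content are invented for illustration):

var doc = document.querySelector('#cross-iframe-id').contentWindow.document;

// inject one extra line into the user card
var line = doc.createElement('div');
line.textContent = 'ACME Inc.'; // a hypothetical company name
doc.body.appendChild(line);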

In fact, the permissions field is also used by Firefox add-ons; for more details you can visit:
https://developer.mozilla.org/en-US/Add-ons/SDK/Tools/package_json

WebAssembly: create a library to convert JavaScript code to WAT

To enable WebAssembly to be read and edited by humans, there is a textual representation of the wasm binary format.

The sentence above is from MDN's article Understanding WebAssembly text format.

Even so, the code we ship to modern browsers is still hardly readable. Here is an example wasm text format file: https://github.com/mafintosh/blake2b-wasm/blob/master/blake2b.wat. I could hardly read past 300 lines, yet the whole source file contains 2800+ lines.

There are two ways to improve this source for now:

  1. split the big file into many modules in a structured layout, which plays better with WAT
  2. write the same implementation in an existing language that is friendly to developers

I'll go for the latter. The former may work, but it has a steep learning curve. Obviously the existing language is JavaScript itself. Aha, so this is what we would do: write JavaScript, compile it to the binary executable format, and run it in the JavaScript VM. Interesting, right?

Having said that, this idea looks like what I had hoped for in yorkie/lv, which gets an AST via esprima and generates assembly source for NASM. For this new idea we don't need esprima any more, because we can provide APIs following the WAT specification and generate WAT source directly.
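
To make this concrete, here is a minimal sketch of what such an API could look like; the func/mod helpers below are hypothetical, not the actual watjs API, and simply emit WAT s-expressions as strings:

// Hypothetical WAT-emitting helpers: each call returns a WAT
// s-expression as a string, so composing calls composes the module.
function func(name, params, result, body) {
  var sig = params.map(function (p) {
    return '(param $' + p.name + ' ' + p.type + ')';
  }).join(' ');
  return '(func $' + name + ' ' + sig + ' (result ' + result + ') ' + body + ')';
}

function mod() {
  return '(module ' + Array.prototype.slice.call(arguments).join(' ') + ')';
}

console.log(mod(
  func('add', [{ name: 'a', type: 'i32' }, { name: 'b', type: 'i32' }], 'i32',
       '(i32.add (local.get $a) (local.get $b))')
));
// => (module (func $add (param $a i32) (param $b i32) (result i32) ...))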

If you are interested in this project, feel free to leave comments here :)

How does Node.js start?

Every time we install node, I mean install from source, we begin by running ./configure, which executes the configure file under joyent/node. Below is a brief look at how much work node does in configure.

Start reading the file from line 637:

It declares a default configuration object, output, and then calls:

configure_node(output)
configure_libz(output)
configure_http_parser(output)
configure_cares(output)
configure_libuv(output)
configure_v8(output)
configure_openssl(output)
configure_winsdk(output)

to initialize output. Each configure function adds configuration entries for the corresponding dependency, which makes it easier to integrate those libraries into the Node binary later.

  • configure_node: defines configuration parameters related to the operating system
  • configure_libz: defines zlib-related parameters; zlib is a data-compression library whose code lives in deps/zlib.
  • configure_http_parser: a C library for parsing HTTP messages; its code can likewise be found under node's deps directory, and similarly for the rest.
  • ...

Now turn to line 663 of the code, pprint.pprint(output, indent=2), which prints all of the current configuration. I cloned the node v0.10.12 source and ran ./configure in its directory, getting the following output:

{ 'target_defaults': { 
                       'cflags': [],
                       'default_configuration': 'Release',
                       'defines': ['OPENSSL_NO_SSL2=1'],
                       'include_dirs': [],
                       'libraries': [] },
  'variables': { 'clang': 1,
                 'host_arch': 'x64',
                 'node_install_npm': 'true',
                 'node_prefix': '',
                 'node_shared_cares': 'false',
                 'node_shared_http_parser': 'false',
                 'node_shared_libuv': 'false',
                 'node_shared_openssl': 'false',
                 'node_shared_v8': 'false',
                 'node_shared_zlib': 'false',
                 'node_tag': '',
                 'node_use_dtrace': 'true',
                 'node_use_etw': 'false',
                 'node_use_mdb': 'false',
                 'node_use_openssl': 'true',
                 'node_use_perfctr': 'false',
                 'python': '/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python',
                 'target_arch': 'x64',
                 'uv_parent_path': '/deps/uv/',
                 'uv_use_dtrace': 'true',
                 'v8_enable_gdbjit': 0,
                 'v8_no_strict_aliasing': 1,
                 'v8_use_snapshot': 'true'}}
creating  ./config.gypi
creating  ./config.mk

In the last two lines of the output you can see that the script generates two files, ./config.gypi and ./config.mk, which will both be used by the makefile.

On the script's last line, subprocess.call([sys.executable, 'tools/gyp_node'] + gyp_args) executes the command python tools/gyp_node, so we continue by reading the code in tools/gyp_node.

Code: tools/gyp_node
Analysis: this is another Python script. Start from line 23: from there to the second-to-last line, the program builds the gyp arguments according to the operating system type, and finally generates everything through the run_gyp function.

This is only the beginning of how Node starts; I am a lazy person, though, so if you stumbled upon this, just skim through and don't take it too seriously.

How tensorflow stores data

In tensorflow's world, the low-level implementor has to split all types into two groups: non-TF_String and TF_String.

That is because tensorflow does not treat TF_String the way traditional languages treat strings: it is an element of variable length. For example, the string "yorkie is so cute" is one TF_String element, and likewise the single character "x" is also one TF_String element.

What's a tensor

Let's start with the keyword "tensor". A tensor is really a data structure, a multidimensional array. I should add that an array, the vector of mathematics, is an edge case of the tensor structure.

For example, a tensor can be represented as:

100
[ 1, 2, 3 ]
[ [ 2, 3 ], [ 5, 7 ] ]
[ [ [ 1 ], [ 2 ], [ 3 ], [ 4 ] ] ]

In the above example, every line is a tensor. The lone number at the top is called a scalar tensor, and after that come the vector, the matrix and the n-tensor, where n is the dimension of your tensor or array:

  • scalar: n = 0
  • vector: n = 1
  • matrix: n = 2

Formally, let me introduce the concept of the shape array and show how it works together with the parameter n. Every tensor's structure is shaped by a vector, and the number n is the length of this vector.

From the start position to the end, every element of the shape describes the size of its specific dimension. The shape [5] describes a vector that owns 5 elements; [3, 2] describes a matrix that owns 3 sub-vectors, each owning 2 scalar elements, so the total number of elements is 3 * 2 = 6.

Take a more complex example: the shape [100, 99, 5, 5] represents a tensor that owns 100 elements, each of shape [99, 5, 5]; the total number of elements of this tensor is 100 * 99 * 5 * 5.
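
In code, the rule is a one-line reduction; a quick sketch:

// The number of elements in a tensor is the product of its shape vector.
function numElements(shape) {
  return shape.reduce(function (acc, dim) { return acc * dim; }, 1);
}

numElements([3, 2]);          // => 6
numElements([100, 99, 5, 5]); // => 247500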

Store a tensor

In the last section we covered what a tensor is and how to represent one in a human-readable way. Next we take a look at how a tensor is stored in the machine.

{
  "type": "int8/int16/int32/float16/float32/string",
  "shape": <vector>,
  "buffer": <....>
}

The structure above has 3 fields. All the real data goes into the buffer field, a fixed-size array in storage, and we can call the type and shape fields the metadata of the buffer:

  • type describes the element size
  • shape describes how to encode to and decode from the buffer

Oh, string has no fixed size

As written at the beginning of this note, tensorflow treats strings as a variable-length type, which breaks the encoding method described so far.

To represent a tensor composed of strings correctly, the implementor has to introduce another array, the offsets indices, to tell the encoder/decoder the size of every element. The offsets indices is a uint64 array whose length is the number of elements. For example, suppose we have this string tensor:
"foobar", "yorkie is so cute"

The encoder writes the data as before; the only difference is that we put the start position of every string into the offsets indices vector. Correspondingly, we decode the buffer by reading the same offsets indices vector.
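
Here is a simplified sketch of that scheme in JavaScript (the real C layout, as far as I understand it, also varint-length-prefixes each string via TF_StringEncode, which this sketch omits):

// Encode an array of strings into an offsets table plus one data buffer.
function encodeStrings(strings) {
  var offsets = [];
  var pos = 0;
  var chunks = strings.map(function (s) {
    offsets.push(pos);
    var buf = Buffer.from(s, 'utf8');
    pos += buf.length;
    return buf;
  });
  return { offsets: offsets, data: Buffer.concat(chunks) };
}

// Decode by reading each element's start from the offsets table.
function decodeStrings(encoded) {
  return encoded.offsets.map(function (start, i) {
    var end = i + 1 < encoded.offsets.length
      ? encoded.offsets[i + 1]
      : encoded.data.length;
    return encoded.data.slice(start, end).toString('utf8');
  });
}

decodeStrings(encodeStrings(['foobar', 'yorkie is so cute']));
// => ['foobar', 'yorkie is so cute']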

Here we have a little C API story. In tensorflow's C API there are 4 relevant functions:

  • TF_StringEncode
  • TF_StringDecode
  • TF_EncodeStrings
  • TF_DecodeStrings

TF_EncodeStrings and TF_DecodeStrings are not exposed, however. In my first attempt to implement string tensor encoding/decoding, at yorkie/tensorflow-nodejs@ce922f7, I mistakenly assumed the full implementation lived inside the exposed functions, and got this error:

Malformed TF_STRING tensor; element 0 out of range

After reading TF_Tensor_DecodeStrings and TF_Tensor_EncodeStrings, I saw that the offsets indices logic is defined there and learned that TF_StringEncode/TF_StringDecode serve another purpose. So I re-implemented the offsets indices logic in JavaScript in my own code.

Summary

In this note I shared how tensorflow stores data internally, along with a little story about its C API. If you are going to implement a tensorflow client, this might help you build the foundation of your library.

I have implemented the encoding and decoding in tensorflow-nodejs; if you are interested in more details, take a look there.

2015-12-07

Today I learned

  • In loopback, {"root": true} makes the API return the incoming object itself (from a teammate)
  • In loopback, the foreignKey of a belongsTo relation is a field name on the current model, e.g. userId, commentId
  • Starbucks bottled drinks are good at fighting drowsiness
  • So even JD.com deliveries can't always arrive the same day

How to build a relationship graph with MongoDB?

I have spent the last couple of days adding social features to a project, so here are some notes on what I learned.

I had always been fuzzy about the difference between document databases and relational databases, until yesterday I read an article:

neo4j-vs-mysql-vs-mongodb: http://addisonlee.azurewebsites.net/neo4j-vs-mysql-vs-mongodb/

It compares document, relational and graph databases, and it prompted today's thinking. Each type of database is directly tied to its data structure, and the different structures lead to different usage patterns. In short:

  • A relational database is essentially a collection with generics, e.g. List<User>, List<Order>
  • A document database can be seen as a set of objects, e.g. new User(), new Order()
  • A graph database, in my view, is just a document database that additionally stores an action from object A to object B

Along the way I looked at an example from Wordnik. As the team's name suggests, they build dictionary applications. From their keynote I learned that a graph, as a general-purpose data structure, is typically used for:

  1. Map applications: finding paths from place 1 to place 2, a path problem
  2. Dictionary applications: finding which related words a word a can expand to, a relatedness problem

The talk has a lot of filler, so let me jump straight to their data structure:

{
  "id": "cat+context",
  "tn": [
    { "weight": 1, "id": "dog+context" },
    { "weight": 2, "id": "yorkie+context" }
  ]
}

The concrete data is made up by me, of course. Their implementation is roughly what I had in mind, with some differences; my thinking leaned more toward a relational style:

[{ "source": "cat+context"
,  "linkTo": "dog+context" },
{  "source": "cat+context"
,  "linkTo": "yorkie+context" }]

So the former is the right way to do it in a document database.

I then borrowed Wordnik's design and changed id from pointing to a word to pointing to a username:

{
  "id": "jobs",
  "tn": [
    { "type": "like", "id": "yorkie" },
    { "type": "hate", "id": "bill" }
  ]
}

And with that, a simple graph data structure is defined in MongoDB.
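
A minimal sketch of reading a node's neighbors back with the Node.js MongoDB driver (database, collection and field names follow the made-up documents above):

const { MongoClient } = require('mongodb');

async function neighborsOf(id) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  // one document lookup returns the node's whole adjacency list
  const node = await client.db('social').collection('graph').findOne({ id });
  await client.close();
  return node ? node.tn : [];
}

neighborsOf('jobs').then(console.log);
// => [ { type: 'like', id: 'yorkie' }, { type: 'hate', id: 'bill' } ]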

Use Repo with Git/GitHub repository

Recently I have been switching the company's OS development from the buildroot framework to openwrt. In the past I simply wrote a buildroot package whose *_SITE_METHOD was git, and each time buildroot would just download the .tgz from GitHub.

When switching to openwrt, many Buildroot features are not supported, such as locally debugging a git-based package; that means we cannot modify the source code under the build directory.

So I had to figure out a way to make local debugging easy. Here is how to make the Android repo tool work with any Git/GitHub repository.

Open your local repo manifest and add a remote tag like this:

<manifest>
  <remote name="github" fetch="https://github.com" review="https://github.com"/>
</manifest>

Then add a project:

<manifest>
  ...
  <project name="Rokid/ShadowNode" remote="github" path="frameworks/shadow-node" revision="master"/>
</manifest>

Now you can run repo sync to download the source code from GitHub. The advantage over a buildroot/openwrt remote package is that you are free to modify the source code yourself; with the correct permissions set, this workflow lets any Git/GitHub repository work with Repo.

The keyword let in Rust

The programming keyword let is basic and useful, but very error-prone, for reasons I'll show you later.

How does it work in JavaScript?

In JavaScript we use var to define our variables and constants (actually there are no real constants in JavaScript). Very simple, but let me point out one key thing:

var x = 10;
var y = x;
console.log(x, y);

We can use both x and y freely. Let's see what happens in these 3 lines.

Line 1: declare a variable x, then assign the value 10 to it; in memory, it claims a block to do this.

Line 2: declare a variable y, then read the value of the previous variable x and copy it into y's memory. At this point we have spent 2x the blocks to store the same value. It's like sleeping in two beds at once: luxurious, right?

This is not a failure of JavaScript; in fact, in the C-style language family, all the members look this wasteful.

How does it work in Rust?

let x = String::from("hello");
let y = x;
print!("x: {}", x);

Line 3 will not compile, because x has been moved into y. Please notice this: Rust moves, it does not copy. (Small scalar types such as the integer 10 implement Copy and would be copied, which is why this example uses a String.)

It means the program spends only 1x the memory blocks (the value itself), and that is why I think a program written in Rust does better here than other languages.

When I began learning Rust, this feature (which felt like a bug at the time) impeded me somewhat, and I suspect it will impede you, too. In the next section I want to show you how to get past this hindrance and enjoy Rust.

Tip: take your brain out first, then put something about Rust in

Yes, you must change the way you think; I can compare this problem to a computing process.

The keyword var is a lot like NPM: it creates plenty of redundancy, in my opinion. But NPM uses disk, which is cheap and easy to get; memory is not. Once you think about programming this way, it is easy to get past the let hindrance. The real meaning of this creative way of defining variables is to guide us (programmers, geeks) onto the right road: save your precious resources as soon as possible.

End

Rust has a large number of features introduced by its developers; wherever the language is going, it is valuable to you.

Rust Language: http://rust-lang.org
Github: https://github.com/rust-lang/rust

Tip of Loopback

This is not a travel-related note; it doesn't cover the loopback address 127.0.0.1 (if you know it), and it doesn't try to cover every aspect of the web framework loopback.io either.

Hey, I'm just going to tell you the story of my using the loopback.io framework over the past month.

Background

A few months ago I was still working for a New York startup and an Asian startup, and we definitely used express. The NYC project looked pretty good on the front end because we were not using expressjs there; the backend, built on the express framework, kept getting worse, and we had to refactor code over and over. If more than a week passed without cleaning up the stale lines, the whole codebase drifted somewhere hard to maintain and develop.

The plight of using express got even worse on the Asian team, though we had other issues there as well: the members come from across Asia and beyond, including mainland China, Japan, Vietnam, Canada and India, and we work remotely across different cities and time zones.

I have to say we kept delaying progress because the express-based architecture was really bad; there was too much bike-shedding to distract us. Then one day last month I was nominated backend team leader, and my first task was to adopt loopback.io to make development more agile.

Why not choose Meteor?

As for choosing loopback over meteor, the other, more popular web framework on Node.js: the reason is super simple, we only need an API service.

What loopback.io helped us with

The biggest reason for using loopback is that we can implement most business needs by only updating the model file. Plus, the StrongLoop team ships many built-in models, such as User and PersistedModel, which are easy to use for most business needs.
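
To give a flavor of what "only updating the model file" means, here is a minimal sketch of a loopback model definition (the model name and fields are invented):

{
  "name": "Note",
  "base": "PersistedModel",
  "properties": {
    "title": { "type": "string", "required": true },
    "content": { "type": "string" }
  },
  "relations": {
    "owner": { "type": "belongsTo", "model": "User", "foreignKey": "userId" }
  }
}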

Thanks to the clear structure of the loopback source (GitHub: https://github.com/strongloop/loopback), we can learn it and track down what happened easily and quickly.

After comparing the complexity of the two codebases, we got the following report:

  • The total number of lines dropped by 70%
  • The lines of controllers dropped by 90%
  • The lines of models dropped by 10%
  • The lines of views stayed unchanged
  • The lines unrelated to business logic dropped by 99%

The improvements are obvious, and the 5th point especially is why I strongly suggest you use loopback in your business projects. The reason is equally obvious: too many failures in my previous projects were caused by bike-shedding problems.

Where loopback.io bothered me

Yes, as you may have worried, loopback's community is less active than express's, koa's or Meteor's.

But that is absolutely OK for using this framework. Again, thanks to the clear structure of the loopback source, we can easily track down problems by reading the documentation and the source code.

Trust me, that's really not too hard for most Node developers, because it is exactly what I do in my current projects.

If you have any problems using loopback, you can send your question to my email address: [email protected].

By the way, I'm not actually staff at StrongLoop, Inc. The purpose of this story is to help developers write more robust and reliable apps in the simplest way.

Random thoughts on web crawlers

Over the past few weeks I have written a lot of JavaScript implementing a server model quite different from HTTP. I no longer care about timely responses: the program I wrote is actually a crawler. It runs in a process group on a server and visits other servers' resources like a client; it doesn't occupy any Unix socket or TCP port for its business. It is more like the woman quietly standing behind an HTTP service, supporting it. What I really mean is: just as the world has men and women, today's server world can be divided into TCP servers and crawler-style server programs.

The most deeply rooted concept today is the web. Mobile may be in the spotlight, and the App Store keeps producing excellent apps, but those things will exhaust themselves and retreat to the second line; the web is the future of the internet, its favored child, and the assault troops laying its foundation are the powerful web servers that can withstand a billion or two requests. Facing violent growth in server load, geeks naturally do not want to solve the problem by simply adding lots of servers.

First, to raise request concurrency, programmers remembered the classic line "everything is data": record every request, order the requests by time or some other criterion, and process the queue with another process. But here is the problem: if one task in the queue takes a lot of time or machine resources, the requests behind it suffer badly.

Geeks came up with another solution: simplify the HTTP request logic as much as possible, so that when a request arrives, it just fetches a simple, precomputed record from the database, with no full-text search, text processing, sorting, IO or other expensive steps. But business keeps growing, so how could the logic stay that simple? The work that must be done always needs someone to do it, and that is how crawler programs were born: they predict the data HTTP clients expect, prepare it ahead of time, and push it to users through some channel. Crawlers and HTTP servers complement each other: the former simplifies the latter's pipeline, and the latter strongly depends on the former.

Now a word about load:
Load matters a lot to HTTP, because it maps directly to the number of users; once the request rate crosses a threshold, it puts enormous pressure on the server and can crash the server program. A crawler, on the other hand, does not lose its ability to work when the user count suddenly grows; at worst the HTTP server complains, "Hey, why haven't you updated this user's data yet?", and the crawler lets it go in one ear and out the other, carrying on with whatever it was doing. Annoying, isn't it? Haha.

So the two kinds of servers respond differently when they hit a bottleneck: for HTTP the usual fix is changing program code (front end included) to merge requests, cache data and so on, while for a crawler the simplest effective fix is adding hardware to increase processing capacity.

2020-01-20

Leaving Rokid. Thanks to everyone here, and I wish Rokid an even better future.

2015-12-02

Docker:

  • RUN executes at docker build time, so it usually runs installation and setup scripts
  • CMD executes at docker run time, and starts the container's own process

About this diary: here I record what I learn and feel each day, along with personal ideas, managed as a to-do list and synced to GitHub and my WeChat public account.

2015-12-05

VC Dimension:

  • The key point is the hyperplane: in a two-dimensional space, every point of a given data set is labeled positive or negative, and we need a straight line (the hyperplane) to split the data by label. So when the data set forms a rectangle with positives and negatives sitting on opposite corners of the diagonal, no single line can classify it; but for a data set of only three points, we can always find a line that cuts it.
  • The previous item is the example. The VC dimension takes two variables: a data set S and a hypothesis class H. In the example above, H is a straight line (the hyperplane of 2D space). For the four points we could instead pick a thin ellipse as H, in which case VC(S, H) = 4.

After learning how to compute the VC dimension, I looked into what the number is actually used for, and found this Quora answer:

This is where the VC dimension comes in - it enables you to conduct your search in a principled way. For a family of surfaces - or to be precise, a family of functions - the VC dimension gives you a number on which you can peg its capability to separate labels.

The general idea is that the VC dimension points you to a reasonable family of functions to inspect. You pick a specific member within this family based on the exact data-set at hand.

My understanding is: when doing prediction or classification, the VC dimension helps you effectively screen which part of the data can be classified well. But the author also points out:

Risk <= (Empirical Risk) + (VC dimension)

I do not fully understand the Empirical Risk term yet, but roughly it is a quantity that can be driven down with effort, thereby lowering the error rate. So there is a trade-off here:

  • A larger VC dimension lets us screen with more of the data, but it also raises the error bound
  • A smaller VC dimension keeps the error bound low, but data often falls outside its range and can only be handled with human intervention
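
For reference, the standard Vapnik generalization bound that the inequality above paraphrases has, if I recall it correctly, roughly this shape (holding with probability at least $1 - \delta$, where $d$ is the VC dimension and $n$ the sample size):

$$ R(h) \le R_{\mathrm{emp}}(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}} $$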


Firedoc: a new doc building tool compatible with YUIDoc

Last week I started a new task for the Fireball team; other team members had a bundle of new requirements for YUIDoc, like the following:

  • the ability to define methods, properties or events under a module rather than a class.
  • the ability to generate markdown files.
  • multiple-language support in an easy, handy way.

For more of the features I have implemented in Firedoc, check the GUIDE.md.

Yes, as you may have guessed, the new build tool is fireball-x/firedoc, which is currently used by both of my teams.

The Magic: generate markdown-based documentation in your project

The first magic trick of this tool is letting you generate your own markdown-based documentation automatically. The quickest example of this trick is firedoc itself: https://github.com/fireball-x/firedoc/tree/master/docs. I build its documentation with firedoc instead of the original HTML pages.

Only one line is required to set this up: https://github.com/fireball-x/firedoc/blob/master/Makefile#L11. Once you push the changes firedoc made on your local machine, you can check your live documentation on your GitHub repo page.

Anyway, I have now released this tool to share with you. If you are interested in the project, feel free to use it and submit your suggestions, pull requests or issues. Thank you.

Why bcrypt has both sync and async methods?

Note: this is another instructive case in using Node.js properly.

For the past few weeks I have been working with loopback, which uses bcrypt as the algorithm for encrypting users' passwords. Looking at some speed test results, I found that tests which create a user and then encrypt the password always took over 3s, which struck me as very weird.

So I set out to find out what the issue was and why those cases were so slow. The answer is today's topic: bcrypt. Maybe you already know the word; I knew it costs a lot of CPU in order to defeat rainbow-table attacks. But then I looked at the following line in loopback:

this.$password = bcrypt.hashSync(plain, salt);

I was a little confused by loopback's usage: this is inside an async routine, right? Contrary to Node.js best practice, shouldn't we use an async function here, too?

I was still convinced loopback was wrong and was about to submit a PR, but just now, when I sat down to write this post, I realized I was the one who was wrong.

The point is that this is a purely CPU-consuming operation, so there is no async I/O method for it. Should we call it an "async CPU" function? I am not sure what the answer is, but let's take a look at how bcrypt.js handles the arguments of its async function.

In [email protected], the normal usage of the async hash function is:

var bcrypt = require('bcryptjs');

bcrypt.hash(
    'password',
    'salt',
    function ondone (err, hash) {
        // fired once, when hashing is done or fails
    },
    function onprogress (num) {
        // fired many times with the current progress number
    }
);

Internally, the bcryptjs library splits the whole job into small subtasks according to the declared rounds, a number between 4 and 31. When one round is done, it does the following:

  • Checks whether the optional onprogress callback was provided.
  • Wraps the next round in Node.js's nextTick utility.

To simplify the workflow with an example: if I pick 31 as the rounds number, the onprogress function gets called 32 times and the work executes across at least 32 ticks. The equivalent sync function runs straight through, without any nextTick wrapping.
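
A minimal sketch of the same pattern with made-up work (bcryptjs wraps rounds in nextTick; I use setImmediate here, which also yields to pending I/O between slices):

function heavyTask(rounds, onprogress, ondone) {
  var i = 0;
  (function next() {
    // ...one small, cheap slice of the real CPU work goes here...
    i++;
    if (typeof onprogress === 'function') onprogress(i / rounds);
    if (i < rounds) {
      setImmediate(next); // yield, so other callbacks can run in between
    } else {
      ondone(null, 'result');
    }
  })();
}

heavyTask(32,
  function onprogress(p) { console.log('progress', p); },
  function ondone(err, res) { console.log('done', res); });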

OK, that was the preparation for the final part: is it necessary for loopback to use the async variant? I have to tell you I still have no answer. Without the async call, while the server is encrypting someone's password, all other requests are blocked, so the sequence of users looks like A1-A2-A3-B1-B2-B3-C1-C2-C3. With the async function we interleave instead: A1-B1-C1-A2-B2-C2-A3-B3-C3. It is still a toss-up; neither option costs less overall. The former simply lets the first user finish first, while the latter avoids blocking the others.

But the difference shows in the other requests: assuming the encryption takes 3s, then whenever someone logs in or registers, every other request that does not need to run the algorithm gets blocked, which is unfair to them. This is why Node.js and other event-loop-based servers cannot handle large CPU tasks. It is not that JavaScript or other dynamic languages are bad at CPU work (JavaScript/V8 has even been shown to rival C/C++ on speed); the best practice on an event loop is simply to split the procedure into micro tasks small enough that each costs extremely little CPU time.

SIGKILL

The problem is about forever. While using this module to start a script, everything is OK; the opposite operation, stopping it, is not so pretty. The following simple code is an example:

process.on("exit",function(){
  // cleanup some bad stuff, failed stuff.
});

At first, I took a look inside forever and found that it stops a script or process by killing it, namely by sending the signal named SIGKILL. The code is at: nodejitsu/forever-monitor.
I thought complete success was coming, so I wrote this:

process.on("SIGKILL",function(){
  // cleanup some bad stuff, failed stuff.
}

Foolish, overconfident me.
After googling some blog posts, I realized this approach is absolutely wrong: I cannot handle SIGKILL. It is an unconditional quit signal, and now I knew why forever stops scripts this way.
But what is the solution to this problem? The answer I found looks like this:

// head of script
// clean up the bad or failed stuff that the previous
// process left behind when it was killed

That means I do the cleanup at the next startup of the script.
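
A small sketch of that workaround, together with the signals that can be caught (the lock-file path and the cleanup itself are invented for illustration):

var fs = require('fs');
var LOCK = '/tmp/myapp.lock';

// head of script: clean up whatever a killed previous process left behind
if (fs.existsSync(LOCK)) fs.unlinkSync(LOCK);
fs.writeFileSync(LOCK, String(process.pid));

// unlike SIGKILL, SIGTERM and SIGINT can be handled
process.on('SIGTERM', function () {
  fs.unlinkSync(LOCK);
  process.exit(0);
});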

Core Dump Tutorial

Hi, my readers, I would like to share a little debugging tip with you. As you have seen, the topic is the core dump. Before anything else, here is a description of what a core dump is.

In computing, a core dump (in Unix parlance), memory dump, or system dump[1]
consists of the recorded state of the working memory of a computer program
at a specific time, generally when the program has terminated abnormally
(crashed).

Yes, the core dump is an awesome tool that provides traces of a program at runtime, especially in production.

Next, I will show the complete usage of core dumps on OSX.

Write a program that may crash and then create a core dump file

Look at the signals table first; then we know that when a signal like SIGQUIT(3), SIGILL(4), SIGTRAP(5) or certain others occurs, the OS creates the core image we need.

Then we create a source file named core-dump-file.c:

#include <unistd.h>
#include <signal.h>

int main(int argc, char **argv) {
  pid_t pid = getpid();
  kill(pid, SIGQUIT); /* signal 3: quit and dump core */
  return 0;
}

Compile and run it:

$ gcc core-dump-file.c -o core-dump-file
$ ./core-dump-file

Next we get the output Quit: 3; everything is OK, right?

Enable Core Dump in your OS

Actually, I want to say no. In this OSX man page, we see:

This memory image is written to a file named by default core.pid, where pid is the
process ID of the process, in the /cores directory, provided the terminated process
had write permission in the directory, and the directory existed.

What we really expect is to find a core dump file once the program crashes with one of the specified signals. In this example, however, we cannot find any file in /cores/, which is incorrect behavior.

The reason is that we have never enabled core dumps in OSX. Just run the following:

$ ulimit -c
> 0
$ ulimit -c unlimited
$ ulimit -c
> unlimited

All right, run ./core-dump-file once more and you will get a different output:

Quit: 3 (core dumped)

lldb/gdb and Core Dump

Finally, we just run the command below:

$ lldb ./core-dump-file /cores/core.19504

Now we can get a stack trace of the program that just crashed.
This is a very practical technique once your programs crash in a production environment.

Refs

  1. Core Man Page in OSX
  2. Where is my core dump?
  3. Core dump - Wikipedia, the free encyclopedia

How to use libomp in a Node.js addon (node-gyp)

OpenMP is an awesome parallel computing tool for C/C++. Recently I was working on a Node.js project that requires OpenMP to boost application performance on a MacBook.

Unfortunately, after a few hours of googling there were no write-ups about node-gyp with libomp (OpenMP), so I decided to work it out myself :)

Install

First, we need to install libomp on the MacBook. I use Homebrew, via the following command:

$ brew install libomp

Oops, you might hit an error telling you there is no binary for your version of OSX; just run the following to fix that:

$ brew install --build-from-source libomp

Test

Now we need to test whether OpenMP works on your machine. Here is an example source file:

#include <iostream>

int main()
{
  #pragma omp parallel
  {
    std::cout << "Hello World" << std::endl;
  }
  return 0;
}

OpenMP does not work with the default clang, but gcc-{n} does, such as gcc-8 and gcc-10. On my MacBook I used gcc-10 to compile this example:

$ gcc-10 -fopenmp main.cc -o out

Executing the ./out binary compiled above outputs:

Hello World
Hello World
Hello World
...
Hello World

Congratulations! OpenMP is working for you!

Final: compile with node-gyp

Now we can configure your binding.gyp to make libomp work with node-gyp:

"conditions": [
  ['OS=="mac"', {
    "xcode_settings": {
      "OTHER_CPLUSPLUSFLAGS": [
        "-fopenmp",
      ],
    },
  }],
]

Add the above OTHER_CPLUSPLUSFLAGS: ["-fopenmp"] config to enable OpenMP for your target(s).

If you want to use the OpenMP runtime library as well, add the following to the linker config:

"libraries": [
  "-L<!@(brew --prefix libomp)/lib -lomp",
]
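
Putting the two fragments together, a minimal binding.gyp might look like this (the target and source names are placeholders):

{
  "targets": [
    {
      "target_name": "addon",
      "sources": [ "src/addon.cc" ],
      "conditions": [
        ['OS=="mac"', {
          "xcode_settings": {
            "OTHER_CPLUSPLUSFLAGS": [ "-fopenmp" ],
          },
          "libraries": [
            "-L<!@(brew --prefix libomp)/lib -lomp",
          ],
        }],
      ],
    },
  ],
}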

Finally, we can use OpenMP directives and runtime functions in a Node.js addon!

The first experience with the new MacBook (12-inch) in 2015

The Bad Parts

  • can't physically feel the keyup and keydown on the new keyboard.
  • the new Force Touch click is fun to play with but falls short; I just went back to disable it and use the 3-finger gesture for Look Up.
  • the USB-C port makes me need an extra adapter to connect my iPhones and iPad, which is really awful; I really don't think a single USB-C port is enough for one machine, even though the port is compatible with all devices.

The Good Parts

  • the Retina screen looks really, really, really nice

2015-12-13

The meaning of travel is to open your own heart, to accept ideas that seem strange yet have their own logic, to respect one another, and to become friends.

WeakMap and Weak References

While changing the Fibula.js code today, I decided to introduce an internal object to cache a result set that is computed on every run. I planned to try a WeakMap, but got this error:

TypeError: Invalid value used as weak map key
    at WeakMap.set (native)

I was puzzled, because I was clearly using it like this:

let dirs = new WeakMap();
// balabala...
dirs.set('name', []);

So I first went to the v8 source, and quickly located the error at line 70 of src/js/weak-collection.js:

function WeakMapSet(key, value) {
  if (!IS_WEAKMAP(this)) {
    throw MakeTypeError(kIncompatibleMethodReceiver,
                        'WeakMap.prototype.set', this);
  }
  if (!IS_SPEC_OBJECT(key)) throw MakeTypeError(kInvalidWeakMapKey); // right here
  return %WeakCollectionSet(this, key, value, GetHash(key));
}

Then I wondered: why must the key be an Object? I checked MDN and got this explanation:

The key in a WeakMap is held weakly. What this means is that, if there are no other strong references to the key, then the entire entry will be removed from the WeakMap by the garbage collector.

Reading this, I understood why it throws, but terms like "strong reference" still left me uneasy. To understand why it is designed this way, I kept digging and found the following code:

Counter counter = new Counter(); // strong reference - line 1 
WeakReference<Counter> weakCounter = new WeakReference<Counter>(counter); 
//weak reference 
counter = null; // now Counter object is eligible for garbage collection

Suddenly it clicked, and I came up with my own explanation of weak references:

A weak reference means a key in a WeakMap points to an object, but that reference does not influence the garbage collector's decision about whether the object can be collected. In other words, a weak reference is a one-way reference.

Using this property for key/value pairs is a rather memory-friendly implementation. For example:

var map = new WeakMap();
function init(map) {
  var key = new String('this is a key');
  map.set(key, 'any value');
}
init(map);

In the code above, after init returns, key (which was defined inside the function) is eventually collected: even though the global-scope WeakMap references it, the reference is weak, so key gets garbage-collected anyway.

If we replaced the WeakMap with {} or a Map, key would certainly be protected until map itself was collected first. So this style of reference is essentially a convention on keys enforced through code, with no special requirements on the keys themselves.

Then I went back and compared the WeakMap and Map APIs, and found that Map additionally has Map.prototype.entries (similar to Object.keys), so we cannot iterate over a WeakMap.

WeakMap is especially suited to this kind of application: when I have a long-lived key/value collection used in many places of the program, after each use I can freely create new objects as keys without worrying that circular references will keep the related objects from ever being freed.

So the old best-practice advice that Node.js is unsuited to maintaining long-lived variables probably no longer holds.

Notes on Unix domain sockets

A Unix domain socket, or IPC socket, is an endpoint for exchanging data between two processes on the same machine. Compared with a named pipe, though, this local socket can be created in connection mode or as a connectionless byte stream, while a pipe only offers the latter.

Processes using a Unix domain socket do not need to share a common object; the API resembles that of a network socket, but no network protocol is involved in the communication. Unix domain sockets are a standard component of POSIX operating systems.

To use a Unix domain socket you specify a file path on disk, usually with a .sock extension. There is no need to worry about performance here: the file path serves only as a namespace, marking that processes listening on the same path may exchange data, while the actual data exchange happens entirely inside kernel space.
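
In Node.js this is a one-argument change; a minimal sketch (the .sock path is arbitrary):

const net = require('net');

// passing a filesystem path instead of a port creates a unix domain socket
const server = net.createServer((conn) => {
  conn.end('hello over a unix domain socket\n');
});

server.listen('/tmp/echo.sock', () => {
  const client = net.connect('/tmp/echo.sock');
  client.pipe(process.stdout);
  client.on('end', () => server.close());
});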

2015-12-03

How to push to Docker Hub:

  • Docker Hub does not store your source code, so run docker build first, preferably naming the image with --tag
  • Then run docker login
  • Finally run docker push [tag-name] and you're done

Pass through arguments to instantiate a given class

Here is the demo function:

// utils for pass a class constructor and arguments, and return the
// instance of the given class.
function passthroughModule(initializer, args) {
  return new (Function.prototype.bind.apply(initializer, [null].concat(args)));
}

Then we can use the function like this:

class Foo { ... }
const foo = passthroughModule(Foo, ['bar', 42]);
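
On ES2015+ engines the same helper can be written more directly; a quick sketch:

function passthroughModule(initializer, args) {
  return new initializer(...args);
  // or, equivalently: return Reflect.construct(initializer, args);
}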

2016-01-18

Reading notes: Database Design and Relational Theory

The original relational model has three major components: structure, integrity and operations.

Why are relations usually defined in table form?

After googling "table and relationships", I only got articles from the Microsoft Access documentation about how users can create a relationship between tables.

Nothing more about why a relation is usually presented as a table.

A type is the pool of values allowed in each column of the database.

In a relational database, selecting a primary key is the usual practice, but not required.

Operations include the following cases:

  1. Restriction, namely querying with a where clause
  2. Projection, namely selecting fields
  3. Product, namely joining

A relation consists of tuples and attributes, namely the rows and columns.

A document database, however, consists only of documents, namely JSON/BSON objects. We can group documents into a collection, and put documents or BSON files under the same directory; in MongoDB we call that directory a database.

Don't use a relational system as a plain, simple store for saving and reading. (Why?)

The relational model's goal is to make the system, not the users, worry about performance. (This sounds good; I used to think SQL was simply slow.)

2015-12-06

typify-json

Today I finished a new library: https://github.com/weflex/typify-json. It grew out of developing Fibula.js, where I used code like this:

const exec = require('child_process').execSync;
exec(`mongo --eval "db.foo.insert(${JSON.stringify(input)})"`);

If input contains a value such as a Date, JSON.stringify silently converts the Date into a string before it gets inserted into the database, which is not what we want.

So typify-json uses an API similar to JSON's:

const typifyJSON = require('typify-json');
const exec = require('child_process').execSync;
exec(`mongo --eval "db.foo.insert(${typifyJSON.stringify(input)})"`);

Now if we pass in:

{
  "date": new Date()
}

it is equivalent to the following shell statement:

> db.foo.insert({"date":new Date()})

While developing the library I tried several approaches:

  1. Rewriting via JSON.stringify(input, replacer); but I eventually realized that what I need in the end is not standard JSON, so this approach cannot produce it.
  2. After that failed, storing the Date values under special hash keys in a WeakMap and swapping them back with a regex at the end; but this was neither simple nor intuitive, and the WeakMap API was not stable enough, so I dropped it.

While trying approach 2, I also found a rather odd behavior of JSON.stringify's replacer function. Take this code:

JSON.stringify({
  foo: {
    date: new Date()
  }
}, function (key, val) {
  console.log(key, val);
});

When key is "date", the corresponding val is no longer a Date (it has already been serialized to a string), so the check can only happen one level up, when key equals "foo".
