
[draft] Regression Test Selection in Ruby


Introduction

For a serious project, before merging code into the main branch, we need to run all of the project's tests.

As the project grows, it will have more and more tests, and the testing time will increase from seconds to minutes, or even hours.

Most teams integrate the test-running phase into CI and run the tests in parallel.

In my team, we use 45 instances to run the tests in order to achieve a reasonable time.

As we have too many test cases, we can't run them all during development, so we have to submit the code and rely on the slow CI.

Such a process is slow and disruptive.

Most pull requests only affect some of the test cases, and regression test selection (RTS) techniques can help us select those test cases.

In this article, I will briefly analyze our codebase and introduce some RTS methods. Then I will propose a possible solution for our use case.

Do we need to run all tests?

We have so many tests that it takes too much time to run them all locally, and I just wanted to get a rough picture.

So I chose to run all the tests in the spec/requests directory.

Ruby provides Coverage.peek_result to collect coverage info. We can call it before and after a test case;

the difference between the two results is the code affected by that test. The details are described in @tenderlove's article[1], and the code is here[2], which is based on crystalball[3].
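The idea can be sketched with a small helper. The name `coverage_for` and the diffing logic are my own simplification; the real rts_rb code is more involved.

```ruby
require 'coverage'

# Coverage must be started before the code under test is loaded.
Coverage.start

# Runs a block (e.g. one test case) and returns the coverage delta:
# the files whose line counters changed while the block ran.
def coverage_for
  before = Coverage.peek_result
  yield
  after = Coverage.peek_result
  after.each_with_object({}) do |(file, lines), diff|
    diff[file] = lines if before[file] != lines
  end
end
```

Calling `coverage_for { run_one_spec }` around each spec yields a test-to-files mapping that we can store and query later.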

After collecting the coverage info, I checked some recent PRs to see how many test cases they really affect.

The result is shown below:

[screenshot]

From the result, we can see that most PRs do not edit a lot of files and do not affect many tests.

But there are some really popular files. For example, authorize_helper.rb affects 1011 specs, which is 21.9% of the total request test cases.

Across all files, 90% of files affect less than 1% of the request tests; the p95 is 3% and the p99 is 10.1%.

So most PRs will affect only a small subset of all the test cases.
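Given such a coverage mapping, the selection step itself is a simple intersection between the files a PR changed and each test's covered files. The file and spec names below are made up for illustration:

```ruby
# Select the tests whose covered files overlap the changed files.
# coverage_map: test name => list of files the test executes.
def select_tests(coverage_map, changed_files)
  coverage_map.select do |_test, files|
    (files & changed_files).any?
  end.keys
end

coverage_map = {
  "spec/requests/users_spec.rb"  => ["app/models/user.rb", "app/helpers/authorize_helper.rb"],
  "spec/requests/orders_spec.rb" => ["app/models/order.rb"]
}
select_tests(coverage_map, ["app/models/user.rb"])
# => ["spec/requests/users_spec.rb"]
```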

RTS Techniques

Slicing Technique

Agrawal et al. introduced a family of test case selection techniques based on different program slicing techniques[4].

They are based on a simple observation:

If a statement is not executed under a test case, it cannot affect the program output for that test case.

@tenderlove's Predicting Test Failures[1] uses this technique.

But if the control flow graph of the program is altered, the technique is not safe.

For example, suppose we have a bar function,

def bar
  'bar'
end

and then we change it to

def bar
  'bar'
  'bar2'
end

The change obviously affects the result, but it will not be detected. Agrawal et al. provide additional methods to handle such cases,

such as detecting the variables affected by the added statement, or finding the control dependencies of a predicate statement[4][6].

An Unsafe Technique from Google

The Firewall Technique

Future Work and Additional Things

Find a fast way to collect execution info

Report affected APIs/features to the QA team

The QA team does regression testing regularly.

If we can provide the affected APIs to QA, that should be helpful.

Recently, our team has been working on adding feature tags to APIs.

After we finish the tagging, we can match APIs to features, and we can provide the affected features to QA too.

References

  1. Predicting Test Failures
    https://tenderlovemaking.com/2015/02/13/predicting-test-failues.html
  2. rts_rb
    https://github.com/yfractal/rts_rb
  3. Crystalball
    https://github.com/toptal/crystalball
  4. Hiralal Agrawal. Incremental Regression Testing, 1993
  5. Gregg Rothermel. Analyzing Regression Test Selection Techniques, 1996
  6. S. Yoo, M. Harman. Regression Testing Minimisation, Selection and Prioritisation: A Survey, 2007

[WIP] User level thread

This article will introduce how the CPU handles function calls and how to handle user-level thread switches.

Function Call

Let's imagine a simple computer: all instructions are stored in memory, and the CPU executes the instructions one by one, in sequence.

To pick an instruction to execute, we need to know the instruction's address in memory.

So we have to find a place to hold this info for the CPU; the CPU uses registers for this purpose.

Registers are much faster than memory: accessing a register may take one CPU cycle, while accessing a memory location may take hundreds of CPU cycles.

But a CPU has only a limited number of registers, maybe 32 or 64, so we can't get as many as we want.

The register that stores the next instruction's address is the PC (program counter).

After we execute one instruction, we move the PC 4 bytes forward (assuming 32 bits per instruction).

Then the CPU executes the instruction pointed to by the PC.

We can describe this in code

instructions = [i0, i1, ..., in]

pc = 0
while true
  # pc is a byte address and each instruction takes 4 bytes,
  # so instructions[pc] means "the instruction at byte address pc"
  CPU_exec(instructions[pc])
  pc += 4
end

[screenshot]

Now let's see how a loop works.

A loop needs to execute some instructions again and again: instruction-0, instruction-1, ..., instruction-n, then instruction-0, instruction-1, ..., instruction-n, and so on.

As we know, the PC points to the next instruction to execute, so after executing instruction-n, we just set the PC to instruction-0's address, and the CPU will loop again.

[screenshot]

Now it's time for function calls.

Suppose there is some code as below:

1 foo :=
2  instruction0
3  call bar
4  instruction1
5  instruction2
6
7 bar :=
8  instruction0
9  ret

For function foo, we execute some instructions, then call the bar function, and then execute the remaining instructions.

For function bar, we execute some instructions and return.

[screenshot]

So we need to jump to bar and then jump back.

When we jump to bar, we just update the PC to bar's location. Then the CPU will execute bar's instructions.

After we finish executing bar's code, we need to jump back to line 4's address.

To do this, we need a place to hold line 4's address (for jumping back).

We could store it in memory, but memory is slow, so we store it in a register. This special-purpose register is usually called ra (return address).

So call and ret can be defined as below:

call label :=
  ra <- pc + 4 // assign next instruction for ret, which is line 4's address in our example
  pc <- label  // jump to the callee, for foo is bar's address

ret :=
  pc <- ra // jump back

This works for a one-level function call.

But what if we have 3 functions? Say function f calls foo, and foo calls bar.

1  0000  f :=                    exec order       ra's value      description
2  0000    instruction-x            0                 -
3  0004    call foo                 1                 0008
4  0008    instruction-x
5  000c    ret
6  0010  foo :=
7  0010    instruction-x            2                 0008
8  0014    call bar                 3                 0018
9  0018    instruction-x            6                 0018
10 001c    ret                      7                 0018        jump to 0018, line 9 again...
11 0020  bar :=
12 0020    instruction-x            4                 0018
13 0024    ret                      5                 0018        jump to 0018, line 9

Let's walk through the code above step by step.

At line 3, ra is set to 0008 (the address of line 4), and we jump to the foo function (line 6).

Then we execute call bar at line 8, and ra is updated to 0018 (the address of line 9).

After bar has been executed, ra's value is 0018 (line 9), so we jump back to line 9.

But when we execute line 10, ra's value is still 0018 (line 9), so we jump back to line 9 again.

It's an infinite loop.

That happens because at line 8 we overwrote ra's original value.

We need many places to store the ongoing functions' return addresses, but we only have a limited number of registers.

It is impossible to store all those return addresses in registers, so we store them in memory.

For each function call, we need a small piece of memory to save ra; after the function has been executed, we get the value back and free the memory.

For f's call to foo, we allocate memory and store 0008 (line 4); for foo's call to bar, we allocate memory and store 0018 (line 9).

After bar has been executed, we take the last value (0018, line 9) back and deallocate its memory,

then we do the same for the value 0008 (line 4).

The operations are push, push and pop, pop, so we can use a stack.

For a stack, we allocate some memory (e.g. 4KB) and one pointer which points to the top of the stack.

When we push, we move the pointer forward and store ra at the new top.

After a function has been called, we pop the value and move the pointer backward.

As this happens so often, hardware designers provide a register for it, usually called sp (stack pointer).
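The stack mechanism can be sketched as a toy interpreter in Ruby (the :call/:ret/:nop instruction names are made up): return addresses are pushed at call and popped at ret, so the nested f → foo → bar example terminates instead of looping forever.

```ruby
# A toy CPU: pc walks a list of instructions; call pushes the return
# address on a stack (whose top acts like sp) and ret pops it back into pc.
def run(program)
  pc = 0
  stack = []    # return addresses live in memory, not in a single register
  executed = []
  loop do
    op, target = program[pc]
    executed << pc
    case op
    when :call
      stack.push(pc + 1) # save the address of the next instruction
      pc = target        # jump to the callee
    when :ret
      break if stack.empty? # the outermost function returned
      pc = stack.pop        # jump back to the saved address
    when :nop
      pc += 1
    end
  end
  executed
end

# f (0..2) calls foo (3..5), which calls bar (6..7):
program = [
  [:nop], [:call, 3], [:ret],   # f
  [:nop], [:call, 6], [:ret],   # foo
  [:nop], [:ret]                # bar
]
run(program) # => [0, 1, 3, 4, 6, 7, 5, 2] -- every call returns exactly once
```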

Register ra's value only has meaning in the current function; when we execute an inner function such as bar, ra will hold a different value.

It is a temporary register, and such registers should be saved by the caller.

There is another kind of register which is preserved across function calls:

if the callee needs to use such a register, it must save it and restore it before returning to the caller.

These are called callee-saved registers.

That is how we handle function calls.

TCP Congestion Control Brief Description by Pseudo Code


Introduction

TCP must provide a reliable service over an unreliable lower-layer service (IP) in heterogeneous, dynamic network environments, and congestion control is an essential mechanism of TCP.

TCP congestion control is itself a very difficult, and also very interesting, problem.

This article gives a brief description of the state changes in TCP congestion control, based mainly on RFC 2001[1].

TCP Congestion Control Brief Description

TCP congestion control regulates traffic by adjusting cwnd (the congestion window): the total amount of unacknowledged data cannot exceed cwnd.

TCP congestion control has three states, and in different states cwnd is adjusted in different ways.

State Changes

Slow Start and Congestion Avoidance

The transition between the slow start and congestion avoidance states is controlled by cwnd and ssthresh (the slow start threshold).

When cwnd > ssthresh, TCP switches from slow start to congestion avoidance.

In the slow start state, cwnd grows exponentially as new ACKs arrive; in the congestion avoidance state, it grows linearly.

When the current state is slow start or congestion avoidance, receiving a duplicate ACK sets ssthresh to half of the window size (min{cwnd, rwnd}).
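As a rough sketch of the two growth modes, counting cwnd in segments and hypothetically assuming exactly cwnd ACKs arrive per RTT:

```ruby
# cwnd growth per RTT (cwnd in segments, no loss events).
def grow(cwnd, rtts, ssthresh)
  rtts.times do
    if cwnd < ssthresh
      cwnd *= 2 # slow start: +1 segment per ACK => doubles every RTT
    else
      cwnd += 1 # congestion avoidance: +1/cwnd per ACK => ~+1 per RTT
    end
  end
  cwnd
end

grow(1, 4, 16)  # => 16 (exponential: 1 -> 2 -> 4 -> 8 -> 16)
grow(16, 4, 16) # => 20 (linear: 16 -> 17 -> 18 -> 19 -> 20)
```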

Fast Recovery

When 3 or more duplicate ACKs are received, TCP enters the fast recovery state; when a new ACK arrives, fast recovery ends and TCP enters congestion avoidance.

When entering fast recovery, TCP sets ssthresh to half of the window size (min{cwnd, rwnd}), retransmits the lost segment, and sets cwnd to ssthresh + 3 segment sizes.

Timeout

When the retransmission timer times out, TCP sets ssthresh to half of the window size (min{cwnd, rwnd}) and sets cwnd to one segment size, i.e., it restarts slow start.

How Congestion Window Changes

Image captured from MIT 6.829 Computer Networking L8[4]

[screenshot]

Pseudo Code

TCPCongestionControl#receive is triggered when an ACK is received or the retransmission timer times out.

You can think of it as a kind of callback: it is invoked when an ACK arrives, and also when a timeout event occurs, similar to Erlang's receive.

class TCPCongestionControl
  def initialize(segment_size, rwnd)
    # segment size announced by the other end, or the default,
    # typically 536 or 512
    @segment_size = segment_size
    @rwnd = rwnd # receiver's advertised window
    @cwnd = segment_size
    @ssthresh = 65535 # bytes
    @in_fast_recovery = false
    # other logic ....
  end

  def receive
    if current_algorithm == :slow_start && new_ack?
      # exponential growth; slow start continues until TCP is halfway to
      # where it was when congestion occurred (since it recorded half of
      # the window size that caused the problem), then congestion
      # avoidance takes over
      @cwnd += @segment_size
    elsif current_algorithm == :congestion_avoidance && new_ack?
      # linear growth
      @cwnd += @segment_size * @segment_size / @cwnd
    elsif three_or_more_duplicate_ack?
      # TCP does not know whether a duplicate ACK is caused by a lost segment
      # or just a reordering of segments, so it waits for a small number and
      # assumes it is just a reordering; 3 or more duplicate ACKs in a row
      # are a strong indication that a segment has been lost
      @in_fast_recovery = true # go to fast recovery
      @ssthresh = current_window_size / 2
      retransmit_missing # retransmit directly without waiting for the timer
      # inflate the congestion window by the number of segments that have
      # left the network and which the other end has cached (3)
      @cwnd = @ssthresh + 3 * @segment_size
    elsif current_algorithm == :fast_recovery && new_ack?
      @cwnd = @ssthresh # go to congestion avoidance
      @in_fast_recovery = false
    elsif (current_algorithm == :slow_start || current_algorithm == :congestion_avoidance) && duplicate_ack?
      @ssthresh = current_window_size / 2
    elsif current_algorithm == :fast_recovery && duplicate_ack?
      # inflate the congestion window for the additional segment that has left
      # the network; transmit a packet if allowed by the new value of cwnd
      @cwnd += @segment_size
    elsif timeout?
      # slow_start and congestion_avoidance come here;
      # not sure whether fast_recovery should come here on timeout too...
      @ssthresh = current_window_size / 2
      @cwnd = @segment_size # go to slow start
    end

    maybe_transmit_packet
  end

  private

  def current_algorithm
    return :fast_recovery if @in_fast_recovery

    @cwnd <= @ssthresh ? :slow_start : :congestion_avoidance
  end

  def current_window_size
    [@cwnd, @rwnd].min
  end
end

Related

Congestion Avoidance and Control[2] explains the principles behind congestion control in detail, but it is rather hard to read (I only understood part of it).

Computer Networking: A Top-Down Approach and Principles of Computer System Design both describe TCP congestion control. For more depth, see MIT's Computer Networking course[3].

There is a similar problem in operating systems: when incoming packets exceeded the system's capacity, the CPU was spent entirely on handling interrupts and had no time to run application logic, so no packet could be processed completely. The solution shares ideas with Congestion Avoidance and Control but uses a different method[4].

In industry, similar mechanisms include rate limiting and back pressure. Facebook uses a similar mechanism in its memcache cluster[5].

Netflix open-sourced Hystrix[6], which has since been replaced by concurrency-limits[7]; concurrency-limits uses a mechanism similar to TCP congestion control.

References

Training

How Linux `mmap` works

Needs to explain:

  1. process
    1. page table
    2. VMA
    3. fork
  2. trap and the page fault handler

Page tables and virtual addressing may need a separate blog post.

How Meltdown Works

Can one tab's JavaScript read the other tab's data?

In theory, it's not possible, as the operating system guarantees isolation.

How the operating system guarantees isolation and provides multiplexing

Kernel Mode and User Mode:

[screenshot]

  • A process can only affect itself in user mode.
  • When accessing shared resources, it must be under OS control through kernel mode.
    • Shared resources include disk, network card, and kernel memory.
    • Procedure for accessing shared resources:
      The user program calls a system call (e.g., read file).
      User mode switches to kernel mode.
      The job is performed in kernel mode, and the data is returned (copied) to the user program.
  • When a process is in user mode, it cannot access other processes' data or kernel data.
  • This concept is similar (very, very roughly) to a front-end and back-end setting:
    • Front-end app: a user-mode process, accessing its own data and functions.
    • Back-end server: controls shared resources, performing operations requested by the front-end app.
      e.g. when a user wants to withdraw money, the backend first checks whether the account has enough money.
    • RPC: Similar to a system call

Virtual memory ensures data privilege:

  • Virtual memory is a hardware-based mapping that maps virtual addresses to physical addresses.
  • How it works:
    • When data is requested, virtual memory translates the virtual address to the corresponding physical address.
      [screenshot]
      def read(virtual_address)
        pte = pte_through_virtual_address(virtual_address)
        physical_address = physical_address_through_pte(pte)
        read_data(physical_address)
      end
    • Before returning the data, virtual memory checks for the appropriate privilege.
      [screenshot]
      def read(virtual_address)
        pte = pte_through_virtual_address(virtual_address)
        can_read!(pte)
        physical_address = physical_address_through_pte(pte)
        read_data(physical_address)
      end
      
      def can_read!(pte)
        if required_mode(pte) == :kernel_mode && current_mode == :user_mode
          raise "No right"
        end
      end

Reading Kernel Memory in User Mode:

Plan:

  • Reading kernel data in user mode.
  • Making the result visible to the user program.

Reading kernel data in user mode:

  • Precondition: user mode has a kernel memory mapping.
    [screenshot]

    • Kernel memory is mapped at high memory addresses to make system calls faster, which was nearly universal until these attacks were discovered.
  • speculative execution executes unprivileged Instructions:

    if a # load a through memory
      do_some_thing
    else
     do_other_thing
    end
    • The CPU may execute "do_some_thing" without knowing the value of "a".
    • If the CPU guesses correctly, it saves time; if not, it must revert the changes.
    • The CPU must not surface an exception raised inside "do_some_thing" before a's value is known.
  • accessing kernel memory:

    if a # load data from memory
     read_a_kernel_address
    end
    • The CPU accesses "read_a_kernel_address" due to speculative execution.
    • However, the CPU doesn't check the privilege, bypassing the security checks (I don't know the details; the CPU is a black box...).
    • It accesses the kernel data, but the data is not visible to the user program.

Making Data Visible to User:

  • The CPU guarantees that such actions are invisible to programs and cannot be directly observed.

  • We can determine whether data is in the CPU cache by measuring access time, using instructions like "rdtsc" on the x86 architecture:
    L1 hit: a few cycles.
    L2 hit: a dozen or two cycles.
    RAM: around 300 cycles.

    [screenshot]

Goal: Read one kernel memory bit.

  • Plan:

    1. Make speculative execution visible.
    2. Read one kernel memory bit.
  • Making speculative execution visible:

    char buf[8192];
    clflush(buf[0]);          // evict buf[0] from the cache
    clflush(some_condition);  // force the branch condition to be read from memory
    
    if(some_condition) { // some_condition is false, but reading it is slow
      // speculative execution starts
      a = buf[0];
      // speculative execution ends
    }
    
    c0 = rdtsc;
    a = buf[0];
    c1 = rdtsc;
    
    // CACHE_HIT_THRESHOLD: a machine-dependent cycle count
    if (c1 - c0 < CACHE_HIT_THRESHOLD) {
      // buf[0] had been loaded into the cache
    } else {
      // buf[0] hadn't been loaded into the cache
    }
    • Read the data during speculative execution, then read it again and measure the time.
    • If the second access is fast, the address has been loaded into the cache, which makes the speculative execution visible.
  • Reading one bit in the kernel.

    char buf[8192];
    clflush(buf[0]);
    clflush(buf[4096]);
    clflush(some_condition);
    
    if(some_condition) { // some_condition is false, but reading it is slow
      // speculative execution starts
    
      r1 = a_kernel_virtual_address;
      r2 = *r1;         // read content
      r2 = r2 & 1;      // first bit
      r2 = r2 * 4096;   // 0 or 4096
      r3 = buf[r2];     // access the data
      // speculative execution ends
    }
    
    c0 = rdtsc;
    a = buf[0];
    c1 = rdtsc;
    b = buf[4096];
    c2 = rdtsc;
    
    if (c1 - c0 < c2 - c1) {
      // buf[0] was cached: the low bit was probably 0
    } else {
      // buf[4096] was cached: the low bit was probably 1
    }
