Code Monkey home page Code Monkey logo

riscv-isa-manual's People

Contributors

asb avatar aswaterman avatar avpremjith avatar brucehoult avatar columbus240 avatar daniellustig avatar david-horner avatar davideschiavone avatar dp-sc avatar fshaked avatar gameboo avatar gfavor avatar imphil avatar jcb62281 avatar jhauser-us avatar joshuascheid avatar kasanovic avatar luismarques avatar neelgala avatar pabs3 avatar palmer-dabbelt avatar pdonahue-ventana avatar pmundkur avatar richardxia avatar rongcuid avatar rsnikhil avatar sorear avatar stnolting avatar takahirox avatar tmw-wdc avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Forkers

mahmoudzamani

riscv-isa-manual's Issues

Typo in atomicity axiom

\newcommand{\atomicityaxiom}{If $r$ and $w$ are a paired load and store, and if $s$ is any store to byte $x$ from which $r$ returns a value and to which $x$ stores, then there can be no store from another hart to byte $i$ following $s$ and preceding $w$ in the global memory order.}

should be

\newcommand{\atomicityaxiom}{If $r$ and $w$ are a paired load and store, and if $s$ is any store to byte $x$ from which $r$ returns a value, then there can be no store from another hart to byte $x$ following $s$ and preceding $w$ in the global memory order.}

Preventing SC from being reordered with its paired LR

The intention (if not the wording) of the A-extension was for an LR to be visible before its paired SC.
This behavior has been lost in the current memory model.
I suggest adding it back with the following rule:

\newcommand{\ppopair}{$a$ and $b$ are paired}

Reporting Failure of SC

The question of how long an SC instruction has to stall subsequent memory accesses that depend on its result flag has been raised. The following is a discussion about the case where the SC reports failure, not success.

Luc has suggested three types of dependencies:

  1. result flag can only be used for subsequent memory accesses once both data and address are available
  2. result flag can only be used for subsequent memory accesses once the address is available
  3. result flag can be used for subsequent memory accesses immediately when the implementation decides to fail the SC

Luc favors suggestion 2), and as far as I understood Paul and I favor suggestion 3).

Luc also posted a set of tests to showcase some of the differences between suggestions 1) ("variant") and 3) ("herd"):
http://moscova.inria.fr/~maranget/cats7/riscv/variant.html

Atomicity of LR/SC with intermittent stores and fences

The intent of the A-extension was for no memory access to the same address of an LR to be performed by any hart between the LR and a paired SC. This is still written in the text, but no longer valid in the memory model.

An example for how this could make a difference would be
LR x
ST x
SC x

where the SC in the text has to fail, but in the memory model can succeed.

Furthermore, the memory model allows for a second hart to see the value of the ST, which may be considered to violate atomicity of the LR/SC sequence.

I suggest changing the text to match the memory model, keep this behavior as allowed for performance/cost reasons (and the fact that this code is not common), but document this behavior as a
litmus test in the relevant section.

Clarifying distinction between memory operations/instructions and memory accesses

In the spec, the words memory operation/instruction/access are sometimes used interchangeably, even though the spec clearly defines instructions and memory accesses to be separate things.

This affects mostly mathematical consistency, e.g., of definitions such as PO, PPO, and the dependencies, but at least in the case of PPO Rule 14 (\ppoaddrpo) creates an ambiguity, in particular in case of a failed SC.

The rule should be clarified to hold only in case the instruction actually causes a memory access.

The remainder of the uses of instruction/memory access/memory operation should also be cleared up, and the definitions of PO, PPO, etc., should be fixed accordingly.

Reporting Success of SC

The question of how long an SC instruction has to stall subsequent memory accesses that depend on its result flag has been raised. The following is a discussion about the case where the SC reports success, not failure.

In particular, in implementations that can guarantee success of SC instructions long before address and data are known -- such as a single-hart implementation that did not observe any exceptions between the LR and the SC -- could:

  1. speculatively* execute loads that depend on the result flag before the address of the SC is available
  2. speculatively* execute loads that depend on the result flag before the address of the LR is available
  3. execute loads and stores that depend on the result flag before the the data of the SC is available

The current status is that 1) and 3) are forbidden, but 2) is allowed.

For example, the following appears to be valid:

Hart1:
r1 = LD from y
LR r1
r2 = SC 0 into x 
LD from z+r2

Hart2:
ST 1 into z
fence
ST &x into y

Memory order:
LD 0 from z+r2 (r2 = 0)
ST 1 into z
ST &x into y
r1 = LD from y (r1 = &x)
LR r1 (LR x)
SC 0 into x (successful)

I believe that there are three sensible options:
a) forbid all of them
b) forbid 1) and 2) only
c) forbid none of them

Forbidding 1) but allowing 2) seems inconsistent.

My personal preference is c).

*I want to emphasize that the speculation here is not on the result flag, but on a lack of address clash. If the address of the LR/SC is found to be equal to the address of the load, the load has to be quashed and retried, in order to enforce the load/load ordering and the load value axiom. The result flag on the other hand is completely known at this point and is not speculative. The question is just whether the implementation is allowed to use it, or whether it has to artificially stall execution of subsequent loads until the addresses are resolved. Compare this with:


Hart1:
r1 = LD from y
LR r1
r2 = SC 0 into r1 
if (r2 == 0): LD from z

Hart2:
ST 1 into z
fence
ST &x into y

The program is nearly the same as the one above, for two exceptions: i) the address of the SC is never known before the address of the LR, ii) instead of an address dependency, there is a control dependency.
In this program, the load can always be executed speculatively, even if all behaviors 1)-3) are forbidden.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.