Comments (19)
Well the immediate thing that strikes me is that wherever we do a type switch maybe that's a hint that you could perhaps make it an interface instead (or type switching over interfaces), perhaps, for example on types that have an "Elem()" method.
I think in general if the user wants to implement their own objects they have to accept that things might break. At least this kind of breakage will tend to be fairly obvious!
from llvm.
I start to analyze this issue, in my view, there have two problems have to solve:
-
which structures should be replaced with interfaces
I think this is quite hard to answer. In my opinion, every instruction should be able to be customized, even those primitive(int, float), for consistently sum-type.
For every instruction would haveIs<InstructionName>
, e.g.Int
hasIsInt()
,Struct
hasIsStruct
. -
where is typecasting
To answer the second, run
ag 'case \*' --ignore "*_test.go"
under directoryllvm/ir
can see the result as following shows(ignore*_test.go
).constant/expr_memory.go:149: case *Int: constant/expr_memory.go:152: case *ZeroInitializer: constant/expr_memory.go:154: case *Vector: constant/expr_memory.go:170: case *Int: constant/expr_memory.go:193: case *Undef: constant/expr_conversion.go:148: case *Int: constant/expr_other.go:47: case *types.IntType, *types.PointerType: constant/expr_other.go:49: case *types.VectorType: constant/expr_other.go:107: case *types.FloatType: constant/expr_other.go:109: case *types.VectorType: constant/expr_aggregate.go:143: case *types.ArrayType: constant/expr_aggregate.go:145: case *types.StructType: inst_memory.go:551: case *constant.Int: inst_memory.go:554: case *constant.ZeroInitializer: inst_memory.go:556: case *constant.Vector: inst_memory.go:572: case *constant.Int: inst_memory.go:595: case *constant.Undef: func.go:312: case *InstCall, *TermInvoke, *TermCallBr: inst_other.go:54: case *types.IntType, *types.PointerType: inst_other.go:56: case *types.VectorType: inst_other.go:119: case *types.FloatType: inst_other.go:121: case *types.VectorType: inst_aggregate.go:145: case *types.ArrayType: inst_aggregate.go:147: case *types.StructType: inst_aggregate.go:149: case *types.PointerType:
Once we know the decision of the first question we know how to update these codes.
p.s. Does sumgen can handle the multiple levels of algebra data type? For example Instruction = Struct | Int, Struct = Struct | StructWithComment
.
from llvm.
Hi @dannypsnl,
Thanks for starting to take a look at this! I agree, we have to figure out what kind of interfaces we need to define.
I think the list of usages you produced is very interesting, as from there we should be able to infer common behaviour that needs an interface definition. In the case of ir/types
I think @pwaller pointed out that we need something like:
type Elemer interface {
Elem() types.Type
}
This would handle *types.VectorType
, *types.PointerType
and *types.ArrayType
.
Note that most of the code is used to handle getelementptr
.
- Used by
getelementptr
instructions and expressions (to compute type).
inst_memory.go:551: case *constant.Int:
inst_memory.go:554: case *constant.ZeroInitializer:
inst_memory.go:556: case *constant.Vector:
inst_memory.go:572: case *constant.Int:
inst_memory.go:595: case *constant.Undef:
constant/expr_memory.go:149: case *Int:
constant/expr_memory.go:152: case *ZeroInitializer:
constant/expr_memory.go:154: case *Vector:
constant/expr_memory.go:170: case *Int:
constant/expr_memory.go:193: case *Undef:
- Used by
extractvalue
andinsertvalue
instructions and expressions (to compute type):
inst_aggregate.go:145: case *types.ArrayType:
inst_aggregate.go:147: case *types.StructType:
inst_aggregate.go:149: case *types.PointerType:
constant/expr_aggregate.go:143: case *types.ArrayType:
constant/expr_aggregate.go:145: case *types.StructType:
- Used by
icmp
andfcmp
instructions and expressions (to compute type).
inst_other.go:54: case *types.IntType, *types.PointerType:
inst_other.go:56: case *types.VectorType:
inst_other.go:119: case *types.FloatType:
inst_other.go:121: case *types.VectorType:
constant/expr_other.go:47: case *types.IntType, *types.PointerType:
constant/expr_other.go:49: case *types.VectorType:
constant/expr_other.go:107: case *types.FloatType:
constant/expr_other.go:109: case *types.VectorType:
- Used when simplifying constant expressions:
constant/expr_conversion.go:148: case *Int:
- Used by
Func.AssignIDs
to check forvoid
calls.
func.go:312: case *InstCall, *TermInvoke, *TermCallBr:
from llvm.
check for void calls: @mewmew Why this case didn't just use return n.Type().Equal(types.Void)? Could there any instruction with type void an not a call, invoke or terminator?
Fair point, this would work.
I think the reason is historical, as before the type of a call instruction could be either the return type of the callee or the function type of the callee. This was needed to handle variadic functions, but as of rev c56773e this logic was simplified.
As such, we can definitely use your suggested approach instead; that is to just use n.Type().Equal(types.Void)
directly.
from llvm.
I understand your approach and do think it solves a lot of problems, but we still have some remaining issues unless we further refine how we handle sumtypes internally within the llir/llvm/ir
code base.
For instance, lets consider the user-defined EStruct
type. It's prefect use case of how to use struct embedding to extend the *types.StructType
to also support field names.
The problem is that it will not work for some parts of our type checking code.
Take for instance *types.StructType.Equal, which is implemented as follows:
func (t *StructType) Equal(u Type) bool {
if u, ok := u.(*StructType); ok {
if len(t.TypeName) > 0 || len(u.TypeName) > 0 {
// Identified struct types are uniqued by type names, not by structural
// identity.
//
// t or u is an identified struct type.
return t.TypeName == u.TypeName
}
// Literal struct types are uniqued by structural identity.
if t.Packed != u.Packed {
return false
}
if len(t.Fields) != len(u.Fields) {
return false
}
for i := range t.Fields {
if !t.Fields[i].Equal(u.Fields[i]) {
return false
}
}
return true
}
return false
}
Notice on line 788 the runtime type assertion:
if u, ok := u.(*StructType); ok {
If we pass a *EStructType
to *ir.StructType.Equal
, then the check would always return false
. e.g.:
structType := types.NewStruct(types.I32, types.Float)
structTypeWithNames := &EStruct{
StructType: structType,
FieldNames: []string{"x", "y"},
}
structTypeWithNames.Equal(structType) // return true.
structType.Equal(structTypeWithNames) // return false. <<< NOTE: unexpected result.
structType.Equal(structTypeWithNames.StructType) // return true
So, for these reasons, we still have to think about relaxing the internal code base of llir/llvm/ir
to handle interfaces rather than using type assertions, to make it easier to create user-defined types (and instructions), such as the extended types you provided.
Cheers,
Robin
from llvm.
While here, we should also add remaining test cases to ensure that sumtypes are implemented and enforced.
e.g.
// Assert that each constant implements the constant.Constant interface.
var _ constant.Constant = (*Global)(nil)
from llvm.
I don't have a strong opinion on this. I can see the benefit of "locking down an API to only hold objects you exactly know what they are". But the alternative view in my mind is that you don't care what the object is, but only the interface it satisfies. If someone is willing to pass an object that satisfies the interface, I don't immediately see a strong reason to disallow it.
Someone also recently pointed out that you can actually defeat this by embedding anyway!
https://news.ycombinator.com/item?id=19347281
The context they brought it up was that this technique is actually used in go SSA.
from llvm.
I can see the benefit of "locking down an API to only hold objects you exactly know what they are".
The benefit is unquestionably stability. However, with stability comes limitation, just like in life :p
If someone is willing to pass an object that satisfies the interface, I don't immediately see a strong reason to disallow it.
The main reason I can think of is that there are definitely parts of the llir/llvm/ir
code base that would start to break down, as they perform type switches on known concrete types.
For instance, let say someone defines their own llir/llvm/ir/types.Type
. This would break when used with the getelementptr
instruction, as we use the concrete types.FooType
types to determine the element type of the gep
instruction.
Line 560 in fe7863f
However, we also want to allow such use cases as they are very powerful. For instance, it is easy to envision a user-defined LLVM IR type for structures with named fields.
package foolang
type StructType struct {
FieldNames []string
*types.StructType
}
I really wish we can find a way to fix such fragile code, so that we may allow users to define their own LLVM IR types, values, etc. I think it is good to enable using the IR package in ways not envisioned by the authors of the IR package. Essentially following the principles of Unix.
Someone also recently pointed out that you can actually defeat this by embedding anyway!
That's a fair point.
from llvm.
I stumbled upon sumgen, and it may be worth investigating whether we should use such a tool to generate the dummy method definitions for sumtypes. Perhaps it's worth it, perhaps not. Regardless, it would be interesting to explore.
from llvm.
Having worked a bit on getting Geode up to speed with using v0.3.0 (in geode-lang/geode#27), I'm quite convinced that we should work on getting sumtypes right for the v0.4.0 release, to allow for user-defined instructions (e.g. line comments), types (e.g. structs with field names), constants, etc.
Just putting this here, so we can prioritize working on this issue early on as it could have quite invasive API changes.
from llvm.
When I implementing getelementptr
I found using Elemer
would be a huge change. I haven't dig into how gep work in codebase but can see there would be a big impact. Currently it actually wants gep.Index
rather than get element directly. So I create a Indexer as following code changes. I don't feel index at here is a good abstraction, but if we keep this I think I would like to use the same implementation rather than current way, that means have to use llir/llvm ast rather than the llir/ll ast. But first focus on how to implement gep use Elemer
.
// NOTE: keep getIndex in sync with getIndex in:
//
// * ast/inst_memory.go
// * ir/inst_memory.go
// * ir/constant/expr_memory.go
//
// The reference point and source of truth is in ir/constant/expr_memory.go.
code changes
diff --git a/ir/constant/const_indexer.go b/ir/constant/const_indexer.go
new file mode 100644
index 0000000..0f54aee
--- /dev/null
+++ b/ir/constant/const_indexer.go
@@ -0,0 +1,64 @@
+package constant
+
+import (
+ "fmt"
+
+ "github.com/llir/llvm/internal/gep"
+)
+
+type Indexer interface {
+ GetIndex() gep.Index
+}
+
+func (i *Int) GetIndex() gep.Index {
+ return gep.NewIndex(i.X.Int64())
+}
+
+func (z *ZeroInitializer) GetIndex() gep.Index {
+ return gep.NewIndex(0)
+}
+
+func (u *Undef) GetIndex() gep.Index {
+ return gep.Index{HasVal: false}
+}
+
+func (index *Vector) GetIndex() gep.Index {
+ // ref: https://llvm.org/docs/LangRef.html#getelementptr-instruction
+ //
+ // > The getelementptr returns a vector of pointers, instead of a single
+ // > address, when one or more of its arguments is a vector. In such
+ // > cases, all vector arguments should have the same number of elements,
+ // > and every scalar argument will be effectively broadcast into a vector
+ // > during address calculation.
+ if len(index.Elems) == 0 {
+ return gep.Index{HasVal: false}
+ }
+ // Sanity check. All vector elements must be integers, and must have the
+ // same value.
+ var val int64
+ for i, elem := range index.Elems {
+ switch elem := elem.(type) {
+ case *Int:
+ x := elem.X.Int64()
+ if i == 0 {
+ val = x
+ } else if x != val {
+ // since all elements were not identical, we must conclude that
+ // the index vector does not have a concrete value.
+ return gep.Index{
+ HasVal: false,
+ VectorLen: uint64(len(index.Elems)),
+ }
+ }
+ default:
+ // TODO: remove debug output.
+ panic(fmt.Errorf("support for gep index vector element type %T not yet implemented", elem))
+ //return gep.Index{HasVal: false}
+ }
+ }
+ return gep.Index{
+ HasVal: true,
+ Val: val,
+ VectorLen: uint64(len(index.Elems)),
+ }
+}
diff --git a/ir/constant/expr_memory.go b/ir/constant/expr_memory.go
index 392c5f8..be2b73d 100644
--- a/ir/constant/expr_memory.go
+++ b/ir/constant/expr_memory.go
@@ -116,7 +116,7 @@ func (index *Index) String() string {
func gepExprType(elemType, src types.Type, indices []Constant) types.Type {
var idxs []gep.Index
for _, index := range indices {
- idx := getIndex(index)
+ idx := GetIndex(index)
// Check if index is of vector type.
if indexType, ok := index.Type().(*types.VectorType); ok {
idx.VectorLen = indexType.Len
@@ -126,7 +126,7 @@ func gepExprType(elemType, src types.Type, indices []Constant) types.Type {
return gep.ResultType(elemType, src, idxs)
}
-// NOTE: keep getIndex in sync with getIndex in:
+// NOTE: keep GetIndex in sync with GetIndex in:
//
// * ast/inst_memory.go
// * ir/inst_memory.go
@@ -134,8 +134,8 @@ func gepExprType(elemType, src types.Type, indices []Constant) types.Type {
//
// The reference point and source of truth is in ir/constant/expr_memory.go.
-// getIndex returns the gep index corresponding to the given constant index.
-func getIndex(index Constant) gep.Index {
+// GetIndex returns the gep index corresponding to the given constant index.
+func GetIndex(index Constant) gep.Index {
// unpack inrange indices.
if idx, ok := index.(*Index); ok {
index = idx.Constant
@@ -146,52 +146,8 @@ func getIndex(index Constant) gep.Index {
index = idx.Simplify()
}
switch index := index.(type) {
- case *Int:
- val := index.X.Int64()
- return gep.NewIndex(val)
- case *ZeroInitializer:
- return gep.NewIndex(0)
- case *Vector:
- // ref: https://llvm.org/docs/LangRef.html#getelementptr-instruction
- //
- // > The getelementptr returns a vector of pointers, instead of a single
- // > address, when one or more of its arguments is a vector. In such
- // > cases, all vector arguments should have the same number of elements,
- // > and every scalar argument will be effectively broadcast into a vector
- // > during address calculation.
- if len(index.Elems) == 0 {
- return gep.Index{HasVal: false}
- }
- // Sanity check. All vector elements must be integers, and must have the
- // same value.
- var val int64
- for i, elem := range index.Elems {
- switch elem := elem.(type) {
- case *Int:
- x := elem.X.Int64()
- if i == 0 {
- val = x
- } else if x != val {
- // since all elements were not identical, we must conclude that
- // the index vector does not have a concrete value.
- return gep.Index{
- HasVal: false,
- VectorLen: uint64(len(index.Elems)),
- }
- }
- default:
- // TODO: remove debug output.
- panic(fmt.Errorf("support for gep index vector element type %T not yet implemented", elem))
- //return gep.Index{HasVal: false}
- }
- }
- return gep.Index{
- HasVal: true,
- Val: val,
- VectorLen: uint64(len(index.Elems)),
- }
- case *Undef:
- return gep.Index{HasVal: false}
+ case Indexer:
+ return index.GetIndex()
case Expression:
// should already have been simplified to a form we can handle.
return gep.Index{HasVal: false}
diff --git a/ir/inst_memory.go b/ir/inst_memory.go
index f0aef35..0a1e90a 100644
--- a/ir/inst_memory.go
+++ b/ir/inst_memory.go
@@ -538,69 +538,5 @@ func gepInstType(elemType, src types.Type, indices []value.Value) types.Type {
// getIndex returns the gep index corresponding to the given constant index.
func getIndex(index constant.Constant) gep.Index {
- // unpack inrange indices.
- if idx, ok := index.(*constant.Index); ok {
- index = idx.Constant
- }
- // Use index.Simplify() to simplify the constant expression to a concrete
- // integer constant or vector of integers constant.
- if idx, ok := index.(constant.Expression); ok {
- index = idx.Simplify()
- }
- switch index := index.(type) {
- case *constant.Int:
- val := index.X.Int64()
- return gep.NewIndex(val)
- case *constant.ZeroInitializer:
- return gep.NewIndex(0)
- case *constant.Vector:
- // ref: https://llvm.org/docs/LangRef.html#getelementptr-instruction
- //
- // > The getelementptr returns a vector of pointers, instead of a single
- // > address, when one or more of its arguments is a vector. In such
- // > cases, all vector arguments should have the same number of elements,
- // > and every scalar argument will be effectively broadcast into a vector
- // > during address calculation.
- if len(index.Elems) == 0 {
- return gep.Index{HasVal: false}
- }
- // Sanity check. All vector elements must be integers, and must have the
- // same value.
- var val int64
- for i, elem := range index.Elems {
- switch elem := elem.(type) {
- case *constant.Int:
- x := elem.X.Int64()
- if i == 0 {
- val = x
- } else if x != val {
- // since all elements were not identical, we must conclude that
- // the index vector does not have a concrete value.
- return gep.Index{
- HasVal: false,
- VectorLen: uint64(len(index.Elems)),
- }
- }
- default:
- // TODO: remove debug output.
- panic(fmt.Errorf("support for gep index vector element type %T not yet implemented", elem))
- //return gep.Index{HasVal: false}
- }
- }
- return gep.Index{
- HasVal: true,
- Val: val,
- VectorLen: uint64(len(index.Elems)),
- }
- case *constant.Undef:
- return gep.Index{HasVal: false}
- case constant.Expression:
- // should already have been simplified to a form we can handle.
- return gep.Index{HasVal: false}
- default:
- // TODO: add support for more constant expressions.
- // TODO: remove debug output.
- panic(fmt.Errorf("support for gep index type %T not yet implemented", index))
- //return gep.Index{HasVal: false}
- }
+ return constant.GetIndex(index)
}
from llvm.
extractvalue
andinsertvalue
:aggregateElemType
inir
andir/constant
has different behavior. One allowsPointerType
and one not, if this is not expected then we could use one sumtype else has to create two.icmp
andfcmp
: They actually are type, I think they are not the target of this issue. But we could have another issue for user-extended type, it could be useful, e.g. simulateclass
from a high perspective.- simplifying: Only
*Int
for now, I'm not sure would users need to extend Int?
4. check for void calls: @mewmew Why this case didn't just usereturn n.Type().Equal(types.Void)
? Could there any instruction with typevoid
an not acall
,invoke
orterminator
?
BTW, in document mentions that pointer could be an integer in some cases. This probably a good reason to unify them by an interface.
If the operands are pointer typed, the pointer values are compared as if they were integers.
from llvm.
I have another idea: What if we make every IR structure became an interface. Therefore Function
for example, would be:
type Function interface {
// ...
}
var _ Function = &function{}
type function struct {
// ...
}
Anyone want to extend it would be able to:
type Class struct {
ir.Function
ir.Struct
}
func NewClass() *Class {
return &Class {
Function: ir.NewFunction(...),
Struct: ir.NewStruct(...),
}
}
The benefit I can see at here, is users don't need to learn advanced concept like Elemer
. People can understand Elemer
, but which can be an Elemer
is not clean as extend ir.Vec
, ir.Array
and ir.Pointer
.
BTW, this kind of implementation also using in Kubernetes.
from llvm.
I have another idea: What if we make every IR structure became an interface. Therefore Function for example, would be:
I like the idea! We should definitely create an experimental branch of llir/llvm/ir
where we try this out!
The main downside I see is that the fields would no longer be exported, so we would have to create a lot of getters and setters, and this would make the API quite bloated.
Besides that, the extensibility of the sumtypes would be hugely improved. So there are definitely benefits and drawbacks. So, to get a better understanding of how to balance this, we should implement it in an experimental branch and get more experience.
Cheers,
Robin
from llvm.
@mewmew You can get the experimental implementation at this commit, it only changes ir.Func
BTW.
from llvm.
@mewmew You can get the experimental implementation at this commit, it only changes ir.Func BTW.
Thanks for starting to experiment with this!
As you already pointed out, there are benefits and drawbacks with the interface-everything apprach. The benefits of such extensive use of interfaces is that we would enable users to extend the IR in almost every way. Whether we need this full power of extensibility is yet to be determined, thus great to have more concrete experiments to base such decisions on.
For now, lets consider some of the drawbacks with the current experimental implementation. One of the drawbacks to the API seems to be the handling of slices (as you already pointed out in https://github.com/dannypsnl/llvm/commit/ac2baf5820d9c13291f0be568546aba18fceac9e#r37512892). E.e.
f.SetFuncAttrs(append(f.GetFuncAttrs(), funcAttr))
The other issue is that type assertions on interfaces may not be enough to fully and uniquely discriminate them. The basic issue is that an interface with just getters/setters for its fields may not be enough to fully determine a unique concrete underlying type, e.g. add
and sub
instructions have the same fields (see https://github.com/dannypsnl/llvm/commit/ac2baf5820d9c13291f0be568546aba18fceac9e#r37514418). For what it's worth, the fix should be quite simple, we just add a IsAddInst
method or something along those lines to differentiate e.g. add
instructions from sub
instructions.
Cheers,
Robin
Edit: one possible outcome that we see from this experiment is that some IR concepts may be turned into interfaces to allow user-defined types, but others may be kept as core IR concepts with concrete struct types. Rather than going for a purist approach of having everything be an interface, it may turn out more pragmatic to have the concepts of the IR which are most likely for a user to modify and extend be turned into interfaces. For this, instructions is the obvious contender. And also the LLVM IR defined types. One such example I've wanted personally is an LLVM IR struct type which also tracks the names of its fields. When it comes to instructions, the pseudo-instruction used for comments is also a common example.
from llvm.
I suddenly realize we can embed ir.Module
as a solution for use-cases mentioned here, should I close the issue?
from llvm.
I suddenly realize we can embed ir.Module as a solution for use-cases mentioned here, should I close the issue?
Can you provide a quick example code which uses embedding of ir.Module
?
from llvm.
For example, we want to generate a class handle, in the meanwhile affect our declaration~~~
type EModule struct {
*ir.Module
}
func (m *EModule) NewClass() *Class {
return &Class {
Function: m.NewFunction(...),
Struct: m.NewStruct(...),
}
}
or
type EModule struct {
*ir.Module
}
type EStruct struct {
FieldNames []string
*types.StructType
}
func (m *EModule) NewStruct(fieldNames []string, fieldTypes []types.Type) *EStruct {
s := &EStruct {
StructType: m.NewStruct(...),
}
s.FieldNames = fieldNames
return s
}
from llvm.
Related Issues (20)
- `SRet` struct location HOT 3
- How to use Function Pointers? HOT 3
- Wrong function pointer type HOT 1
- indirect br only takes constant addr HOT 1
- Returning void as a value HOT 3
- update llir/llvm to support 14.0 HOT 2
- update llir/llvm to support LLVM 15.0 HOT 7
- How to have two structs referencing eachother HOT 4
- How to convert to exe? HOT 3
- `module.NewTypeDef` has unexpected side effects HOT 17
- Provide working Hello, World example HOT 1
- support llvm 16.0.0 HOT 2
- incorrect cmpxchg signature HOT 4
- How to generate ir for embedded struct? HOT 6
- Convert to string HOT 1
- Hello 10 instead of Hello world. Passing a IntPtr to printf (go - LLIR/LLVM) HOT 3
- malloc and strcpy in LLVM HOT 1
- unable to generate LLVM: structure, external , noundef and ptr HOT 1
- No way to find underlying type of pointer or array HOT 1
- How to compile LLVM IR to machine code programatically? HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from llvm.