zasm's Issues

Please provide an example ready-to-build project.

I couldn't use it in my own project. It gives linker errors.

Build started...
1>------ Build started: Project: Test, Configuration: Release x64 ------
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::Error __cdecl zasm::Serializer::serialize(class zasm::Program const &,__int64)" (?serialize@Serializer@zasm@@QEAA?AW4Error@2@AEBVProgram@2@_J@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Serializer::~Serializer(void)" (??1Serializer@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Serializer::Serializer(void)" (??0Serializer@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::Error __cdecl zasm::x86::Assembler::emit(class zasm::Instruction const &)" (?emit@Assembler@x86@zasm@@QEAA?AW4Error@3@AEBVInstruction@3@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: virtual __cdecl zasm::x86::Assembler::~Assembler(void)" (??1Assembler@x86@zasm@@UEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::x86::Assembler::Assembler(class zasm::Program &)" (??0Assembler@x86@zasm@@QEAA@AEAVProgram@2@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::MachineMode __cdecl zasm::Program::getMode(void)const " (?getMode@Program@zasm@@QEBA?AW4MachineMode@2@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Program::~Program(void)" (??1Program@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Program::Program(enum zasm::MachineMode)" (??0Program@zasm@@QEAA@W4MachineMode@1@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: class zasm::Expected<class zasm::Instruction,enum zasm::Error> __cdecl zasm::Decoder::decode(void const *,unsigned __int64,unsigned __int64)" (?decode@Decoder@zasm@@QEAA?AV?$Expected@VInstruction@zasm@@W4Error@2@@2@PEBX_K1@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Decoder::Decoder(enum zasm::MachineMode)" (??0Decoder@zasm@@QEAA@W4MachineMode@1@@Z)

Serializer fails at 64-bit relative addresses

Based on the provided example, trying to serialize a far jmp / call or an instruction that reads or writes memory results in an ImpossibleInstruction error, e.g.:

    using namespace zasm;

    const uint64_t address = 0x00000001400019A4;
    const std::vector<uint8_t> code = {
        0xFF, 0x15, 0x73, 0x16, 0x00, 0x00 // CALL QWORD PTR DS:[0x0000000140003028]
    };

    Program program(MachineMode::AMD64);
    x86::Assembler assembler(program);
    Decoder decoder(program.getMode());
    Serializer serializer;

    // Decode all bytes.
    size_t bytesDecoded = 0;
    while (bytesDecoded < code.size())
    {
        const auto curAddress = address + bytesDecoded;
        auto decoderRes = decoder.decode(code.data() + bytesDecoded, code.size() - bytesDecoded, curAddress);
        if (!decoderRes)
        {
            std::cout << "Failed to decode at " << std::hex << curAddress  << "\n";
            return;
        }
        const auto& instr = decoderRes.value();
        assembler.emit(instr);
        bytesDecoded += instr.getLength();
    }
    serializer.serialize(program, address); // Error
    const auto codeDump = getHexDump(serializer.getCode(), serializer.getCodeSize());
    std::cout << codeDump << "\n";
}

Impossible to set ReadWrite access to file descriptor in FileStream

Hello, it seems there is a bug that prevents opening a file in read/write mode, even though the StreamMode::ReadWrite enum value exists. The problem is in the following line of the FileStream::open function: _wfopen_s(&fp, path.wstring().c_str(), mode == StreamMode::Read ? L"rb" : L"wb");. If we pass the ReadWrite value to the FileStream::open method or to the constructor, the file is opened in wb mode and its content is overwritten.

This is not a problem for the load and store functions: as I understand it, there is no sense in passing a ReadWrite FileStream to the save function, because it has to clear the content before writing bytes in its own format. But it is a problem for a user who wants to use FileStream as a general wrapper for working with files, because the file cannot be opened in rw mode. I could create a pull request, but I cannot figure out how to fix this without breaking the load and store functions.

P.S. As an option, we could write something like _wfopen_s(&fp, path.wstring().c_str(), mode == StreamMode::Read ? L"rb" : mode == StreamMode::Write ? L"wb" : L"r+b"); but that would break the save function, because the file would not be cleared before writing our Program.
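One way to reconcile the two needs might be to make truncation an explicit choice. The sketch below is not zasm code: the StreamMode enum is a hypothetical mirror of the values named in the report, and both fopenModeFor and its truncate flag are assumptions introduced for illustration. It relies only on the standard fopen mode strings, where "r+b" updates an existing file in place and "w+b" creates or truncates it.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the StreamMode values named in the report.
enum class StreamMode { Read, Write, ReadWrite };

// Picks an fopen-style mode string. `truncate` is an assumed extra flag,
// not part of zasm: a save routine would pass true, a generic read/write
// user would pass false to keep the existing content.
std::wstring fopenModeFor(StreamMode mode, bool truncate)
{
    switch (mode)
    {
        case StreamMode::Read:
            return L"rb";  // read only, file must exist
        case StreamMode::Write:
            return L"wb";  // write only, always truncates
        case StreamMode::ReadWrite:
            return truncate ? L"w+b"  // read/write, create or truncate
                            : L"r+b"; // read/write, keep content, file must exist
    }
    assert(false && "unknown StreamMode");
    return L"rb";
}
```

With a helper like this, the string could be handed to _wfopen_s via c_str(), and a save path that needs an empty file would ask for truncation instead of depending on the mode alone.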

Needs proper documentation!

The example code is useful but barely scratches the surface of any kind of practical application. Documentation definitely needed.

Calculating offset of memory operand type relative to instruction

[screenshot not preserved]

In the example above, I need to know the offset of the uint (highlighted in grey) within memory operands, for the purpose of fixing up RVAs in a mutation engine.

Could you please suggest a way to extract the size of the mnemonic so I can work it out from there or, even better, a way to get the offset of the operand bytes relative to the bytes of the full instruction?

Thanks :)

Calculating addresses of call, jmp etc.

With Zydis, you can call ZydisCalcAbsoluteAddressEx, but there is no way to retrieve the data needed for this from the Decoder.

Can you suggest any way to do this please?
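For relative operands (call/jmp rel and RIP-relative memory), the arithmetic itself is small once the instruction address, length and displacement are known. The following is a minimal sketch of that arithmetic only; absoluteTarget and relativeDisplacement are hypothetical helper names, and how the length and displacement are obtained from the decoder is left open.

```cpp
#include <cstdint>

// Absolute address referenced by a relative operand. The displacement is
// relative to the address of the *next* instruction, hence "+ instrLength".
constexpr std::uint64_t absoluteTarget(std::uint64_t instrAddress,
                                       std::uint8_t instrLength,
                                       std::int64_t displacement)
{
    return instrAddress + instrLength + static_cast<std::uint64_t>(displacement);
}

// Inverse direction: the displacement needed to reach `target` from an
// instruction of `instrLength` bytes located at `instrAddress`.
constexpr std::int64_t relativeDisplacement(std::uint64_t instrAddress,
                                            std::uint8_t instrLength,
                                            std::uint64_t target)
{
    return static_cast<std::int64_t>(target - (instrAddress + instrLength));
}

// A 7-byte instruction at 0x7ff7cf0055b3 with displacement 0 refers to the
// address immediately after itself.
static_assert(absoluteTarget(0x7ff7cf0055b3, 7, 0) == 0x7ff7cf0055ba);
```

The same two helpers cover both directions: resolving a decoded branch to an address, and recomputing the displacement after an instruction has been moved.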

Invalid relative instructions size estimation

Hello, first of all, thanks for all the huge work. I've been running into some problems with program size estimation that I want to report, and maybe you can help me fix them (if that's even possible).

In the example below I've used the estimateCodeSize and allocatePage from the basic_jit example.

static std::size_t estimate_code_size(const zasm::Program& program) {
    std::size_t size = 0;
    for (auto* node = program.getHead(); node != nullptr; node = node->getNext()) {
        if (auto* nodeData = node->getIf<zasm::Data>(); nodeData != nullptr) {
            size += nodeData->getTotalSize();
        } else if (auto* nodeInstr = node->getIf<zasm::Instruction>(); nodeInstr != nullptr) {
            const auto& instrInfo = nodeInstr->getDetail(program.getMode());
            if (instrInfo.hasValue()) {
                size += instrInfo->getLength();
            } else {
                std::cout << "Error: Unable to get instruction info\n";
            }
        } else if (auto* nodeEmbeddedLabel = node->getIf<zasm::EmbeddedLabel>(); nodeEmbeddedLabel != nullptr) {
            const auto bitSize = nodeEmbeddedLabel->getSize();
            if (bitSize == zasm::BitSize::_32)
                size += 4;
            if (bitSize == zasm::BitSize::_64)
                size += 8;
        }
    }
    return size;
}

static void* allocate_page(std::size_t codeSize) {
    return VirtualAlloc(0, codeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
}

int main(int argc, char* argv[]) {
    using namespace zasm;

    Program program(MachineMode::AMD64);
    x86::Assembler a(program);

    {
        auto test_cond = program.createLabel("test_cond");

        a.cmp(x86::rcx, 0x1337);
        a.jz(test_cond);
        a.ret();

        a.bind(test_cond);
        a.nop();
        a.ret();
    }

    const auto estimated_size = estimate_code_size(program);
    void* code_page = allocate_page(estimated_size);

    Serializer serializer;
    if (auto err = serializer.serialize(program, reinterpret_cast<int64_t>(code_page)); err != zasm::Error::None) {
        std::cout << "Serialization failure: " << zasm::getErrorName(err) << "\n";
        return EXIT_FAILURE;
    }

    const auto serialized_size = serializer.getCodeSize();
    std::memcpy(code_page, serializer.getCode(), serialized_size);

    auto fmt_args = disasm::c_instance::fmt_args_t{.data_ptr = (uint8_t*)code_page,
                                                   .address = code_page,
                                                   .dump_opcodes = true,
                                                   .dump_address = true,
                                                   .dump_offset = true};
    logger::info("Generated:\n{}", disasm::get().format_range(fmt_args, estimated_size));

    logger::warn<1>("estimated size: {} | serialized_size: {}", estimated_size, serialized_size);

    return 0;
}

The results of executing this function are:
[screenshot not preserved]

As you can see, the estimated code size and the serialized code size differ because of the JZ instruction: it was encoded as a 5-byte instruction while the code size was being estimated, but as a 2-byte instruction during serialization. As a result, the output of my format function shows 2 invalid instructions at the end.

The reason this happens is that, while estimating the code size, we encode instructions one by one without knowing the addresses of the previous/next instructions, so we can't estimate the relative offset.

Originally I thought about just creating a PR where I fix this, but the more I think the more questions I get.
Surely we can fix this kind of behaviour for the JMPs/JCCs to the instruction label, but I am not entirely sure how we can calculate the right instruction size if we are dealing with immediate addresses.
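For jumps to labels, one common approach is to repeat the size pass until no jump changes size any more. The sketch below models that idea on a plain vector and is not zasm code: Node and estimateWithRelaxation are hypothetical names, and the byte sizes in the usage note are illustrative assumptions rather than values read from the encoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified node: either a fixed-size item or a jump to node `target`.
struct Node
{
    std::size_t size;          // current size estimate in bytes
    bool isJump = false;
    std::size_t target = 0;    // index of the node the jump lands on
    std::size_t shortSize = 0; // size of the rel8 form of the jump
};

// Starts from the long form of every jump and shrinks one jump at a time
// until a full pass finds nothing left to shrink. Returns the total size.
std::size_t estimateWithRelaxation(std::vector<Node>& nodes)
{
    std::vector<std::size_t> offsets(nodes.size() + 1, 0);
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (std::size_t i = 0; i < nodes.size(); ++i)
            offsets[i + 1] = offsets[i] + nodes[i].size;

        for (std::size_t i = 0; i < nodes.size(); ++i)
        {
            Node& n = nodes[i];
            if (!n.isJump || n.size <= n.shortSize)
                continue;
            assert(n.target < nodes.size());
            const std::size_t saved = n.size - n.shortSize;
            // A forward target moves closer by `saved` bytes if this jump shrinks.
            const std::size_t targetOffset = n.target > i ? offsets[n.target] - saved : offsets[n.target];
            const auto disp = static_cast<std::int64_t>(targetOffset)
                - static_cast<std::int64_t>(offsets[i] + n.shortSize);
            if (disp >= -128 && disp <= 127)
            {
                n.size = n.shortSize;
                changed = true;
                break; // offsets are stale now, recompute them
            }
        }
    }
    return offsets.back();
}
```

Modelling the program from this report as a 7-byte cmp, a conditional jump with a 6-byte long form and a 2-byte short form, and three 1-byte instructions, the estimate drops from 16 to 12 bytes once the jump is relaxed. Jumps to immediate addresses would still need the final base address, which this sketch does not attempt to handle.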

More basic examples needed

I asked before for the ability to get the address of a label, and you supplied that.

Now I need a few other things:
Examples of basic things
a) actually calling code you assembled. There are no examples of calling code you created.
b) an example of calling into user or library code from assembly.
c) disassembling to human-readable form. Getting the ABI right for a platform is complex, and you pretty much need to be able to disassemble and see the code that the compiler generates, so that you can make templates from entry and exit code that are correct in every detail. It would also be useful for making sure that the code you're generating is what you thought it was.
d) deallocating the result of an assembly - for a jit you not only create code, but you throw it away when you're done with it and are replacing it with altered code.

Edit: I guess I can just use the debugger for disassembly.

Zasm wrong jump calculation

Hello, I recently ran into a problem with incorrect jump encoding. Don't worry about whether the code makes sense; it is just an example.
    Program program(MachineMode::I386);

    x86::Assembler assembler(program);

    auto label = assembler.createLabel();

    ASSERT_EQ(assembler.jmp(label), Error::None);
    for (int i = 0; i < 100; i++)
        ASSERT_EQ(assembler.nop(), Error::None);
    ASSERT_EQ(assembler.bind(label), Error::None);
    ASSERT_EQ(assembler.int3(), Error::None);
    ASSERT_EQ(assembler.align(Align::Type::Code, 10), Error::None);

    Serializer serializer;
    ASSERT_EQ(serializer.serialize(program, 0x0000000000401000), Error::None);

My jmp should go to the int3 instruction, but it lands 3 bytes further. I looked at the zasm source code and noticed what I think is incorrect logic. When the jmp is encoded the first time, its size is 5. On the extra pass its size is 2, because the label is now bound and the jmp changes from the long form to the short form. So ctx.drift is now 3 and the pass should run one more time. But because of the alignment at the end of the code, the drift becomes 0 (3 - 3), so zasm does not run a third pass (even though the alignment is not within the range of our jmp) and still thinks the offset of int3 is 105 instead of 102. Am I doing something wrong, or is it a bug?

P.S. The code is just an example; don't take it as something meaningful.
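The cancellation described above can be checked with plain arithmetic. This sketch is not zasm code: paddingFor and layoutFor are hypothetical helpers, and it assumes a 5-byte long jmp, a 2-byte short jmp, 1-byte nop and int3, and alignment to a multiple of 10 counted from offset 0 (the base address 0x401000 is itself a multiple of 10, so offsets and addresses align identically here).

```cpp
#include <cstddef>

// Bytes of padding needed to bring `offset` up to a multiple of `alignment`.
constexpr std::size_t paddingFor(std::size_t offset, std::size_t alignment)
{
    return (alignment - offset % alignment) % alignment;
}

struct Layout
{
    std::size_t labelOffset; // offset of the int3 the label is bound to
    std::size_t totalSize;   // size including the trailing alignment padding
};

// Layout of: jmp label; 100 x nop; label: int3; align 10
constexpr Layout layoutFor(std::size_t jmpSize)
{
    const std::size_t labelOffset = jmpSize + 100; // jmp followed by 100 one-byte nops
    const std::size_t end = labelOffset + 1;       // int3 is one byte
    return {labelOffset, end + paddingFor(end, 10)};
}

// Both layouts end at 110 bytes, yet the label sits at 105 versus 102.
static_assert(layoutFor(5).totalSize == layoutFor(2).totalSize);
static_assert(layoutFor(5).labelOffset - layoutFor(2).labelOffset == 3);
```

Under these assumptions the padding grows from 4 to 7 bytes when the jmp shrinks, so a check based on the net change in total size reads zero while the label has still moved. Tracking whether any individual node changed would not be fooled by this.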

Compiling question

Hello, I am trying to incorporate zasm into my project, and I found that to make it compile I have to include the headers and link the libraries not only for zasm but also for Zydis (Zydis.lib and its headers). Is that okay, or am I doing something wrong?

Improve error handling for serialization.

Currently it's quite difficult to tell which node causes an error. There should be a way to get extended information about what went wrong and which node is causing it.

jmp imm bug

My target address is 0x347486 (0x224AB8A1486), but the generated assembly instruction's target address is 0x347485 (0x224AB8A1485). Is this a bug?

    zasm::Program program(zasm::MachineMode::AMD64);
    zasm::x86::Assembler assembler(program);

    assembler.jmp(zasm::Imm(instr.jmp_rva));

    zasm::Serializer serializer{};
    auto res = serializer.serialize(program, instr.rva);
    if (res == zasm::ErrorCode::None) {
      auto ptr = serializer.getCode();
      auto size = serializer.getCodeSize();
      std::memcpy(reinterpret_cast<void *>(base.module_base + instr.rva),
                  serializer.getCode(), serializer.getCodeSize());
    }

Invalid encoding

Instructions are encoded incorrectly: I ran into a problem where encoding fails for any instruction. After looking at the code for a bit, I realized that the encoder thinks I am passing the wrong operand size to the instruction. That is just a preface; things became clearer when I looked at the function below.

I want to encode push 0xDEADC0DE (opcode 0x68). I get an error because I use the x64 build of the library to encode 32-bit instructions, and the encoder treats my immediate operand as a signed value. Since my application is 64-bit, this is what happens:

static ZyanU8 ZydisGetSignedImmSize(ZyanI64 imm)
{
    if (imm >= ZYAN_INT8_MIN && imm <= ZYAN_INT8_MAX)
    {
        return 8;
    }
    if (imm >= ZYAN_INT16_MIN && imm <= ZYAN_INT16_MAX)
    {
        return 16;
    }
    if (imm >= ZYAN_INT32_MIN && imm <= ZYAN_INT32_MAX)
    {
        return 32;
    }

    return 64;
}

All checks fail and I get a 64-bit size, which is wrong. This happens because the application is 64-bit.

To tell the function to encode a 32-bit value, I have to pass a value of the form 0xffffffff????????, which is not convenient.

a.emit(ZYDIS_MNEMONIC_PUSH, Imm((int32_t)0xdeadc0de));

Solved, but how can I do this in a more convenient way?
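The effect can be reproduced in isolation. The sketch below re-implements the size check quoted above with standard types (signedImmSize is a stand-in name, not the Zydis function itself) and shows why the int32_t cast helps: the cast sign-extends the value so that it falls inside the signed 32-bit range.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for the quoted helper, written with standard types:
// smallest signed width (in bits) that can hold `imm`.
constexpr int signedImmSize(std::int64_t imm)
{
    if (imm >= std::numeric_limits<std::int8_t>::min() && imm <= std::numeric_limits<std::int8_t>::max())
        return 8;
    if (imm >= std::numeric_limits<std::int16_t>::min() && imm <= std::numeric_limits<std::int16_t>::max())
        return 16;
    if (imm >= std::numeric_limits<std::int32_t>::min() && imm <= std::numeric_limits<std::int32_t>::max())
        return 32;
    return 64;
}

// Widens a 32-bit pattern to 64 bits with sign extension, so that
// 0xDEADC0DE becomes 0xFFFFFFFFDEADC0DE (a negative 32-bit value).
constexpr std::int64_t signExtend32(std::uint32_t bits)
{
    return static_cast<std::int32_t>(bits);
}
```

Here signedImmSize(0xDEADC0DE) yields 64, because the literal is zero-extended to the positive value 3735929054, while signedImmSize(signExtend32(0xDEADC0DE)) yields 32. A small wrapper like signExtend32 (a hypothetical helper) is one way to avoid writing the cast at every call site.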

A few touches to make zasm usable for something

Let's say I want to use this to write a JIT. In that case I need to be able to pass in the addresses of library routines and to get the addresses of generated routines out.

I.e., I need to be able to take the address of a label after serialization, and I need to be able to SET a constant address for some labels before serialization.

I don't see a way to do these things. There are certainly no examples that do them.

I made a fork to do this myself.

Immediate too large

#include "examples.common.hpp"

#include <iostream>
#include <windows.h>

static void* allocatePage(std::size_t codeSize)
{
#ifdef _WIN32
    return VirtualAlloc(0, codeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
#else
    // TODO: mmap for Linux.
    return nullptr;
#endif
}

static std::size_t estimateCodeSize(const zasm::Program& program)
{
    std::size_t size = 0;
    for (auto* node = program.getHead(); node != nullptr; node = node->getNext())
    {
        if (auto* nodeData = node->getIf<zasm::Data>(); nodeData != nullptr)
        {
            size += nodeData->getTotalSize();
        }
        else if (auto* nodeInstr = node->getIf<zasm::Instruction>(); nodeInstr != nullptr)
        {
            const auto& instrInfo = nodeInstr->getDetail(program.getMode());
            if (instrInfo.hasValue())
            {
                size += instrInfo->getLength();
            }
            else
            {
                std::cout << "Error: Unable to get instruction info\n";
            }
        }
        else if (auto* nodeEmbeddedLabel = node->getIf<zasm::EmbeddedLabel>(); nodeEmbeddedLabel != nullptr)
        {
            const auto bitSize = nodeEmbeddedLabel->getSize();
            if (bitSize == zasm::BitSize::_32)
                size += 4;
            if (bitSize == zasm::BitSize::_64)
                size += 8;
        }
    }
    return size;
}

int main()
{
    using namespace zasm;

    const uint64_t address = 0x7ff7cf0055b3;

    const std::vector<uint8_t> code = { 
        0x48, 0x2B, 0x1D, 0x00, 0x00, 0x00, 0x00 
        //sub    rbx,QWORD PTR [rip+0x0]  
    };

    Program program(MachineMode::AMD64);
    x86::Assembler assembler(program);

    Decoder decoder(program.getMode());
    Serializer serializer;

    size_t bytesDecoded = 0;

    while (bytesDecoded < code.size())
    {
        const auto curAddress = address + bytesDecoded;

        const auto decoderRes = decoder.decode(code.data() + bytesDecoded, code.size() - bytesDecoded, curAddress);

        const auto& instrInfo = *decoderRes;

        const auto instr = instrInfo.getInstruction();
        printf( "%s\n", formatter::toString(&instr).c_str());
        assembler.emit(instr);

        bytesDecoded += instrInfo.getLength();
    }

    const auto requiredSize = estimateCodeSize(program);
    void* pCodePage = allocatePage(requiredSize);
    Error error = serializer.serialize(program, reinterpret_cast<uint64_t>(pCodePage)); // Error

    if (error != Error::None){
        printf("error = %s\n", getErrorName(error));
    }
    system("pause");
    return 0;
}

output:

sub rbx, qword ptr ds:[rel 0x7ff7cf0055ba]
imm = 139215200671155 | 7E9D909555B3
error = Error::ImpossibleInstruction

https://github.com/zyantific/zydis/blob/ffde0f46398a86417c860462f6af0556331cb5bb/src/Encoder.c#L455
https://github.com/zyantific/zydis/blob/ffde0f46398a86417c860462f6af0556331cb5bb/src/Encoder.c#L1598
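One plausible reading of the output above is that the decoded operand keeps its absolute target (0x7ff7cf0055ba), so re-encoding it at the freshly allocated page requires a RIP-relative displacement that no longer fits in 32 bits. The sketch below checks only that arithmetic. fitsRel32 is a hypothetical helper, and the page address 0x15A3E6B0000 is an assumption inferred from the printed value (0x7ff7cf0055ba - 0x7E9D909555B3 - 7), not something stated in the report.

```cpp
#include <cstdint>
#include <limits>

// True if `target` is reachable through a signed 32-bit displacement from an
// instruction of `instrLength` bytes placed at `instrAddress`.
constexpr bool fitsRel32(std::uint64_t target, std::uint64_t instrAddress, std::uint8_t instrLength)
{
    // Displacements are measured from the end of the instruction.
    const auto delta = static_cast<std::int64_t>(target - (instrAddress + instrLength));
    return delta >= std::numeric_limits<std::int32_t>::min()
        && delta <= std::numeric_limits<std::int32_t>::max();
}
```

At the original address the target is reachable: fitsRel32(0x7ff7cf0055ba, 0x7ff7cf0055b3, 7) is true. At the assumed page address the delta is 0x7E9D909555B3, which matches the printed imm value and is far outside the 32-bit range.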

Mem constructor for [base + index]

Should the Mem constructor that takes base and index registers have scale set to 1 instead of 0?

static constexpr Mem ptr(BitSize bitSize, const Gp& base, const Gp& index) noexcept

Succeeds:

a.mov(zasm::x86::al, zasm::x86::byte_ptr(zasm::x86::rsi, zasm::x86::rcx, 1, 0));

Fails with an impossible instruction error:

a.mov(zasm::x86::al, zasm::x86::byte_ptr(zasm::x86::rsi, zasm::x86::rcx));

Likely cause:

    // ptr [base + index]
    // ex.: mov eax, ptr [ecx+edx]
    static constexpr Mem ptr(BitSize bitSize, const Gp& base, const Gp& index) noexcept
    {
        return Mem(bitSize, Seg{}, base, index, 0, 0);
    }
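For background, the x86 SIB byte stores the scale in two bits, so only the factors 1, 2, 4 and 8 are representable and a scale of 0 has no encoding. The sketch below illustrates that mapping; sibScaleBits is a hypothetical helper, and whether the encoder rejects scale 0 for exactly this reason is an assumption.

```cpp
#include <cstdint>
#include <optional>

// Maps an index scale factor to the 2-bit SIB.scale field.
// Returns std::nullopt for factors that the SIB byte cannot express.
constexpr std::optional<std::uint8_t> sibScaleBits(std::uint8_t scale)
{
    switch (scale)
    {
        case 1:
            return std::uint8_t{ 0 };
        case 2:
            return std::uint8_t{ 1 };
        case 4:
            return std::uint8_t{ 2 };
        case 8:
            return std::uint8_t{ 3 };
        default:
            return std::nullopt; // includes 0, the value passed by the overload above
    }
}
```

Under that assumption, an overload taking base and index would need to default the scale to 1 for [base + index] to be encodable.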

Performance problem.

I'm currently writing a program which will re-encode nearly every instruction of a big program (a game).
So there will be millions of calls to the re-encode callback. (see here.)

The problem is that if I declare the Program and Assembler as local variables, there will be millions of object/memory allocations and deallocations, which will have a big impact on speed.

But if I declare them as global variables, I don't see any method to clear them, so the generated code accumulates on every call.

Are there any solutions to this?
