
Repository for a library focused on binary analysis (mainly for Java related bytecodes)

License: BSD 3-Clause "New" or "Revised" License


shuriken-analyzer's Introduction

Shuriken-Analyzer

Welcome to the repository of Shuriken Analyzer, a library intended for bytecode analysis! Shuriken is an evolution of the Kunai-Static-analyzer project, with the architecture of the library modified so that it adapts better to other bytecodes. Shuriken is intended to offer analysts parsing, disassembly and analysis capabilities, and an improved version of the Intermediate Representation (IR) provided by Kunai is planned.

Inside the repository you will find the following folders:

shuriken: folder with the code of the main library. It holds the core code of Shuriken: the parsers, the disassemblers, etc.
shuriken-dump: command line tool for dumping the structure of a DEX file (for the moment).

APIs In Other Programming Languages

To support other programming languages, we are working on offering a shim API in C. Once we have a stable API in C, we plan to start writing the APIs for other languages. Right now we plan the following APIs:

C API
Python API

The Project

The project is still in an "alpha" version, but it is under continuous development. If you want to help, do not hesitate to open an issue. If you want to write some code, check the open issues and read the CONTRIBUTING.md, which contains a few points about the coding style of the project.

The logo was designed and created by ShanShan Bu, and is distributed under a Creative Commons license.

Shuriken Analyzer Logo by ShanShan Bu is licensed under Attribution-ShareAlike 4.0 International


shuriken-analyzer's Issues

Create github CI/CD

Create the necessary configuration for compiling and running the tests in CI on GitHub. Two CIs can be created:

One for testing each push into a branch.
A testing CI that gates merge requests into the main branch.

Different compilers, as well as different systems, can be used for the tests.

Create a Dalvik Disassembler

Create a Dalvik Disassembler that allows different algorithms for the disassembly process. It would be useful for the disassembler to produce its output in the smali format. More information about this format can be found in the repository: https://github.com/JesusFreke/smali.
The disassembler must be written inside the folder disassembler/dalvik, and an easy to use API should be provided to the user.

Modify the API to Return `std::reference_wrapper` instead of `std::unique_ptr`

Now that the API is more stable, we want to stop sharing a unique_ptr by reference as the way of keeping ownership inside the library. The getter functions will need to be replaced by functions that return the same structure, but as a std::reference_wrapper returned by value instead of a std::unique_ptr&, which avoids problems with ownership.
You can check the following example: https://github.com/Fare9/Shuriken-Analyzer/blob/cpp-api/shuriken/lib/api/Dex/ShurikenClassManager.cpp

We can probably create cache objects to avoid generating the same object all the time.
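The getter change described above can be sketched as follows. This is only an illustration: ClassInfo, ClassManager and get_class are hypothetical stand-ins, not the real Shuriken classes, and the sketch assumes the manager keeps owning its objects through std::unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed class; not Shuriken's real type.
class ClassInfo {
public:
    explicit ClassInfo(std::string name) : name_(std::move(name)) {}
    const std::string &name() const { return name_; }

private:
    std::string name_;
};

// Hypothetical manager that owns every ClassInfo it creates.
class ClassManager {
public:
    void add_class(std::string name) {
        classes_.push_back(std::make_unique<ClassInfo>(std::move(name)));
    }

    // Old style (exposes the owning smart pointer itself):
    //   std::unique_ptr<ClassInfo> &get_class(std::size_t i);

    // New style: a copyable, non-owning handle to the same object.
    std::reference_wrapper<ClassInfo> get_class(std::size_t i) {
        assert(i < classes_.size());
        return std::ref(*classes_[i]);
    }

    std::size_t size() const { return classes_.size(); }

private:
    std::vector<std::unique_ptr<ClassInfo>> classes_;
};
```

A caller would write manager.get_class(0).get().name(). The handle can be copied and stored in containers, but it cannot reset or move the owned object, and it is only valid while the manager is alive.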

Create a logo for the project

Create a logo for Shuriken Analyzer project and include it in the README.md file.

Start the development of a proper documentation

To help people use Shuriken, it would be good to have usage documentation as well as doxygen documentation. Doxygen documentation is generated directly from the comments in the source code. For the usage documentation, it would probably be useful to pick a format that allows generating a website with all the usage information, examples, etc.

Improve Python part of Shuriken

In order to improve the Python library of Shuriken and make it more usable, we will need to do the following:

Add a better way to look for the shared library libshuriken on different systems (Linux, Mac & Windows).
Add an installation mechanism for Python, probably based on setuptools or some other way of creating a pip package.

Create a Shim in C for Dalvik Parser

To make it easy to write bindings in other languages, one idea is to create a shim of the code in C: a header containing structures that represent all the data from the C++ code. This is known as an Hourglass API interface. The Binary Ninja tool offers this as a Core API, available at https://github.com/Vector35/binaryninja-api/blob/dev/binaryninjacore.h, and they state the following about this header: "The Core API is designed to only be used as a shim from other languages and is not currently intended to be used to build C plugins directly."
This is related to the following issue: #4
And to the following branch: https://github.com/Fare9/Shuriken-Analyzer/tree/4-create-a-simpler-dalvik-parser
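A minimal sketch of the hourglass idea, assuming a made-up C++ core class: the C++ object is hidden behind an opaque handle and only plain C functions cross the boundary. Every name here (core::DexFile, demo_dex and the demo_dex_* functions) is hypothetical and does not correspond to the actual Shuriken shim.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <utility>
#include <vector>

// C++ core (hypothetical): a richer class that C callers never see.
namespace core {
class DexFile {
public:
    explicit DexFile(std::vector<std::string> names)
        : names_(std::move(names)) {}
    const std::vector<std::string> &class_names() const { return names_; }

private:
    std::vector<std::string> names_;
};
} // namespace core

// C shim: an opaque handle plus functions using only C-compatible types.
extern "C" {
typedef struct demo_dex demo_dex; // never defined; used only as a handle

demo_dex *demo_dex_create(void);
std::size_t demo_dex_class_count(const demo_dex *dex);
const char *demo_dex_class_name(const demo_dex *dex, std::size_t index);
void demo_dex_destroy(demo_dex *dex);
}

static const core::DexFile *unwrap(const demo_dex *dex) {
    return reinterpret_cast<const core::DexFile *>(dex);
}

demo_dex *demo_dex_create(void) {
    try {
        // A real shim would parse a file; the sketch uses fixed data.
        auto *file = new core::DexFile(
            std::vector<std::string>{"LMain;", "LHelper;"});
        return reinterpret_cast<demo_dex *>(file);
    } catch (...) {
        return nullptr; // exceptions must not cross the C boundary
    }
}

std::size_t demo_dex_class_count(const demo_dex *dex) {
    return unwrap(dex)->class_names().size();
}

const char *demo_dex_class_name(const demo_dex *dex, std::size_t index) {
    const auto &names = unwrap(dex)->class_names();
    return index < names.size() ? names[index].c_str() : nullptr;
}

void demo_dex_destroy(demo_dex *dex) {
    delete reinterpret_cast<core::DexFile *>(dex);
}
```

Bindings in Python or other languages would then only need to call the four C functions, while the returned strings stay owned by the handle until demo_dex_destroy is called.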

Refactor headers

Improve the code from the headers to avoid memory management in .h files.

Add a CONTRIBUTING.md

To allow people to contribute easily to the project, we need a document describing the coding standards and the way to contribute. For the coding standards, I think it would be a nice idea to follow the LLVM Coding Standards. Another good idea is to follow the recommendations from the CppCoreGuidelines.

How to include code in the repository?

If there's an issue with the problem or the feature

Create a branch from the issue and work on the issue only on that branch. The main branch is protected, so no merge request will be allowed directly into master without at least 1 review.

If no issue exists for the feature or problem

Create an issue explaining the feature or the problem, and from there apply the previous point.

How to add commits?

Commits in master should explain the features included or the problems fixed. We will also try to keep the number of commits as low as possible.

How to do that?

While you are working on an issue or feature, push commits to your branch whenever you need to. Once you have finished working on the branch, try to squash the commits, grouping all those which are related. If you don't know what this is, I recommend reading this post

I think all these notes will be needed in the CONTRIBUTING.md file.

Create installers for Shuriken and APIs

With CMake it is possible to add commands for installing the different artifacts generated during compilation. In this case the idea is to install all the header files into a header directory, the library into a proper folder where the system can find it, and the APIs into paths where the interpreter (for example, in the case of Python) can find the libraries, so that they can be included when writing a program or script with them.

Generate the documentation with doxygen

Generate the documentation with doxygen. The CI should generate that documentation when a branch is merged into master. Once the documentation has been generated, it must be uploaded to a GitHub website for the documentation.

Create a Simpler Dalvik Parser

Create a new Dalvik Parser in its corresponding folder. In Kunai, the parser became a little difficult to follow in the Annotations part because there were many nested lists. A super class could be created, with sub-classes for the different annotations. The parser should provide ranges for accessing the data, since that gives a safe way of accessing it.
Below you can find the Parser from Kunai: https://github.com/Fare9/KUNAI-static-analyzer/tree/refactoring/kunai-lib/include/Kunai/DEX/parser

This issue can be used for design discussion.

Add LICENSE File

We need to provide a licensing mechanism.

Fix issues in headers to adapt to newer version of libstdc++

There have been some changes in newer versions of libstdc++, and some headers now need to be included explicitly. Taken directly from the GCC website:

The following headers are used less widely in libstdc++ and may need to be included explicitly when compiling with GCC 13:

<string> (for std::string, std::to_string, std::stoi etc.)
<system_error> (for std::error_code, std::error_category, std::system_error).
<cstdint> (for std::int8_t, std::int32_t etc.)
<cstdio> (for std::printf, std::fopen etc.)
<cstdlib> (for std::strtol, std::malloc etc.)

More information can be found in: https://gcc.gnu.org/gcc-13/porting_to.html

This can probably be fixed easily by including the missing headers.
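As an illustration of the fix, the sketch below includes every header it depends on explicitly instead of relying on another standard header to pull it in transitively. The two helper functions, format_offset and parse_u32, are hypothetical and exist only to use names from each header.

```cpp
// Each header is named explicitly; none is assumed to arrive transitively.
#include <cstdint>      // std::uint32_t
#include <cstdio>       // std::snprintf
#include <cstdlib>      // std::strtoll
#include <string>       // std::string
#include <system_error> // std::error_code, std::errc, std::make_error_code

// Hypothetical helper: format an offset as a fixed-width hex string.
std::string format_offset(std::uint32_t offset) {
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), "0x%08x",
                  static_cast<unsigned>(offset));
    return std::string(buffer);
}

// Hypothetical helper: parse a decimal string into a 32-bit unsigned value.
std::error_code parse_u32(const std::string &text, std::uint32_t &out) {
    char *end = nullptr;
    const long long value = std::strtoll(text.c_str(), &end, 10);
    // Reject empty input and trailing characters that are not digits.
    if (end == text.c_str() || *end != '\0')
        return std::make_error_code(std::errc::invalid_argument);
    // Reject values that do not fit in 32 unsigned bits.
    if (value < 0 || value > 0xFFFFFFFFLL)
        return std::make_error_code(std::errc::result_out_of_range);
    out = static_cast<std::uint32_t>(value);
    return {};
}
```

With the includes spelled out like this, the file does not depend on which internal headers a given libstdc++ release happens to include.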

Add `Analysis` classes in order to manage things like

We will need analysis classes that calculate things like cross-references between classes, methods, strings and fields. These analysis classes will also create a control-flow graph for each method, and each control-flow graph will contain basic blocks of instructions. Because of how Shuriken now manages the instructions, a basic block will not hold a copy of the instructions from the method, but a pointer to the beginning and the end of the range of instructions that it represents. This saves memory and should also give better performance.
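The range-based basic block can be sketched as a pair of pointers into the method's instruction storage. This is a simplified illustration under stated assumptions: Instruction, BasicBlock and split_into_blocks are hypothetical names, and blocks are cut only after instructions flagged as block terminators, whereas a real control-flow-graph builder would also split at branch targets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction record; not Shuriken's real instruction class.
struct Instruction {
    std::uint32_t address;
    bool ends_block; // true for branches, returns, throws, ...
};

// A basic block is a view: it points into the method's instruction
// storage and copies nothing.
struct BasicBlock {
    const Instruction *first; // first instruction of the block
    const Instruction *last;  // one past the final instruction

    const Instruction *begin() const { return first; }
    const Instruction *end() const { return last; }
    std::size_t size() const {
        return static_cast<std::size_t>(last - first);
    }
};

// Cut the instruction list after every terminator instruction.
std::vector<BasicBlock> split_into_blocks(
    const std::vector<Instruction> &code) {
    std::vector<BasicBlock> blocks;
    const Instruction *start = code.data();
    const Instruction *const stop = code.data() + code.size();
    for (const Instruction *it = start; it != stop; ++it) {
        if (it->ends_block) {
            blocks.push_back({start, it + 1});
            start = it + 1;
        }
    }
    if (start != stop) // trailing instructions without a terminator
        blocks.push_back({start, stop});
    return blocks;
}
```

Because each block only stores two pointers, the blocks stay valid only while the method's instruction vector is alive and unmodified, which is the trade-off for avoiding copies.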

Error: Process completed with exit code 100.

There's an error when compiling in the GCC pipeline. The following error appears:

Error: Process completed with exit code 100.

Example of job with an error: https://github.com/Fare9/Shuriken-Analyzer/actions/runs/9100651167/job/25407937453

Write a command line tool for showing information

A command line tool could be useful for testing, since it can be used for analyzing files, parsing them, and printing information. That information can be used in tests to check that some analyses are correct, and it can also give ideas for improving the library.

Write testing documentation & think about the DEX files

For testing purposes we need to compile a small folder of Java files into DEX files. To spare people from installing javac and d8, we can upload the DEX files to the repository and work with them from there. It would also be useful to document in the README how to compile and run the tests.

Add more tests for the different modules of Shuriken and include tests in CI

As a way to have a more stable library, it is always good to include tests, tests and more tests. We will include these tests as part of the CI, so that for a pull request to be accepted into the main branch, all these tests must run properly.

Architectural design and research of current work

To develop an easily extensible library, it will be necessary to focus on creating an appropriate architecture. Some of the ideas for Shuriken include:

Writing APIs for other languages (Python, Lua, Rust...). One idea could be using a shim in C over the C++ classes.
Writing a plugin for ****** that works as a backend for structures like DEX or CLASS.
Writing simpler parser classes that also allow the data to be written back to a file in a proper way.

It would also be nice to start researching the current state of the art and the current work done in the area, and to write the findings as comments on this issue.