lukhio / dexparser
License: Apache License 2.0
For now, the parser just disassembles the full app and prints the result to stdout. Add some CLI arguments to allow for:
Some instructions use signed values (e.g., relative branch offsets). Right now the code does not handle these properly and treats every operand as unsigned.
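One way to handle signed operands is to sign-extend the raw bits when decoding. The following is a minimal sketch, not the parser's actual code: the helper names `sign_extend` and `goto_offset` are hypothetical, and the layout assumed for `goto` (format 10t, opcode in the low byte, signed offset in the high byte) follows the Dalvik instruction format documentation.

```rust
/// Sign-extend the low `bits` bits of `value` into an i32.
/// (Hypothetical helper, e.g. for 4-bit literals or 8/16-bit offsets.)
fn sign_extend(value: u32, bits: u32) -> i32 {
    debug_assert!(bits >= 1 && bits <= 32);
    let shift = 32 - bits;
    // Move the sign bit to bit 31, then shift back arithmetically.
    ((value << shift) as i32) >> shift
}

/// Decode the signed 8-bit branch offset of a `goto` (format 10t) code unit.
/// The low byte is the opcode; the high byte is the signed offset,
/// expressed in 16-bit code units relative to the current instruction.
fn goto_offset(code_unit: u16) -> i32 {
    i32::from((code_unit >> 8) as u8 as i8)
}
```

For instance, the code unit `0xFE28` would decode to an offset of -2 code units instead of 254.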
There is no need to use a trait here since there is a finite set of instruction types. Instead we can use an enum, which should simplify things down the road.
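As a rough illustration of the enum approach, the sketch below models a handful of instructions. The `Instruction` type, its variants, and its field names are hypothetical and do not reflect the parser's actual definitions. The compiler checks that every `match` is exhaustive, which a trait with dynamic dispatch would not do.

```rust
/// A small, illustrative subset of Dalvik instructions (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Nop,
    Move { dst: u8, src: u8 },
    Const16 { dst: u8, value: i16 },
    Goto { offset: i8 },
    ReturnVoid,
}

impl Instruction {
    /// Size of the instruction in 16-bit code units.
    fn code_units(&self) -> usize {
        match self {
            Instruction::Const16 { .. } => 2,
            Instruction::Nop
            | Instruction::Move { .. }
            | Instruction::Goto { .. }
            | Instruction::ReturnVoid => 1,
        }
    }

    /// Mnemonic used when printing the disassembly.
    fn mnemonic(&self) -> &'static str {
        match self {
            Instruction::Nop => "nop",
            Instruction::Move { .. } => "move",
            Instruction::Const16 { .. } => "const/16",
            Instruction::Goto { .. } => "goto",
            Instruction::ReturnVoid => "return-void",
        }
    }
}
```

Adding a new variant then produces a compile error at every `match` that does not yet handle it.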
List of structures that can appear in a DEX file:
header_item
map_list
map_item
string_id_item
string_data_item
type_id_item
proto_id_item
field_id_item
method_id_item
class_def_item
call_site_id_item
call_site_item
method_handle_item
class_data_item
encoded_field
encoded_method
type_list
type_item
code_item
try_item
encoded_catch_handler_list
encoded_catch_handler
encoded_type_addr_pair
debug_info_item
annotations_directory_item
field_annotation
method_annotation
parameter_annotation
annotation_set_ref_list
annotation_set_ref_item
annotation_set_item
annotation_off_item
annotation_item
encoded_array_item
hiddenapi_class_data_item
dalvik.annotation.AnnotationDefault
dalvik.annotation.EnclosingClass
dalvik.annotation.EnclosingMethod
dalvik.annotation.InnerClass
dalvik.annotation.MemberClasses
dalvik.annotation.MethodParameters
dalvik.annotation.Signature
dalvik.annotation.Throws
If a string cannot be decoded from MUTF-8 to ASCII, it is added "raw" (i.e., undecoded) to the list of strings. This can lead to issues down the road because the various DEX classes that use strings expect the list of strings to be ordered "by string contents, using UTF-16 code point values" (see documentation).
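To make the ordering requirement concrete, here is a minimal sketch of a check that a decoded string list is in the expected order. It assumes that "UTF-16 code point values" can be read as comparing UTF-16 code units one by one; the function names are hypothetical. The comparison is strict because the DEX format also forbids duplicate entries in the string list.

```rust
use std::cmp::Ordering;

/// Compare two strings by their UTF-16 code units (assumed interpretation).
fn cmp_utf16(a: &str, b: &str) -> Ordering {
    a.encode_utf16().cmp(b.encode_utf16())
}

/// Return true if `strings` is strictly increasing in UTF-16 order,
/// i.e. sorted and free of duplicates.
fn is_sorted_utf16(strings: &[String]) -> bool {
    strings
        .windows(2)
        .all(|pair| cmp_utf16(&pair[0], &pair[1]) == Ordering::Less)
}
```

Note that this order can differ from Rust's default `str` ordering: a character outside the Basic Multilingual Plane is encoded as a surrogate pair, so it sorts before a BMP character such as U+FF5E here, but after it under `str` comparison.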
There are three possibilities for the source of this bug:
There are still some lesser-used instructions that we do not disassemble (we do parse them, though). This issue is to keep track of them:
invoke-polymorphic
invoke-polymorphic/range
filled-new-array/range
invoke-custom
invoke-custom/range
const-method-handle
const-method-type
packed-switch-payload
sparse-switch-payload
fill-array-data-payload
Some apps have multiple DEX files when they have more than 65,536 methods. The code is instead split into multiple DEX files, which are then merged by the system when installing the app. Such apps have multiple DEX files named classes.dex, classes2.dex, etc.
It looks like merging them in our parser is simply a matter of parsing each DEX file individually and merging the lists of parsed data (e.g., strings, prototypes).
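A minimal sketch of that merge step might look like the following. The `ParsedDex` struct and `merge_dex` function are hypothetical stand-ins for whatever the parser produces, and prototypes are represented as plain descriptor strings for simplicity. The sketch also assumes that the parsed data is already fully resolved: any index into a per-file list would no longer be valid after merging and would need to be remapped.

```rust
/// Hypothetical container for the data parsed from one DEX file.
#[derive(Debug, Default)]
struct ParsedDex {
    strings: Vec<String>,
    prototypes: Vec<String>,
}

/// Merge several parsed DEX files into one, removing duplicates.
fn merge_dex(files: Vec<ParsedDex>) -> ParsedDex {
    let mut merged = ParsedDex::default();
    for file in files {
        merged.strings.extend(file.strings);
        merged.prototypes.extend(file.prototypes);
    }
    // Keep the string list ordered by UTF-16 code units, as a single
    // DEX file would be, then drop the now-adjacent duplicates.
    merged
        .strings
        .sort_by(|a, b| a.encode_utf16().cmp(b.encode_utf16()));
    merged.strings.dedup();
    // Prototypes are only deduplicated here; the sort order is arbitrary.
    merged.prototypes.sort();
    merged.prototypes.dedup();
    merged
}
```

Strings shared between classes.dex and classes2.dex then appear only once in the merged list.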