wechat-dump's Introduction

Dump WeChat Messages from Android

Export WeChat chat data from Android

WeChat (微信), the most popular mobile IM app in China, gives users no way to export their message history in a well-formatted form. This tool parses and exports WeChat messages from a rooted Android phone.

Right now it can dump messages in text-only mode, or generate a single-file HTML page containing voice messages, images, emoji, etc.

NEWS: WeChat 6.0+ uses silk to encode audio. The code is updated.

NEWS: WeChat 6.3 uses a new avatar storage. The code is updated.

HELP NEEDED: Since May 2016, the first 1KB of every emoji file in resource/emoji has been encrypted. For now I fetch emojis by URL instead, which covers most of them.

If you are good at cryptography or reverse engineering, or you work at Tencent, feel free to contact me or help take a look. It is also possible to recover the image without knowing the first 1KB (by detecting chunks without the metadata), but I don't have time to do that either.

If this tool works for you, please take a moment to add your phone/OS to the wiki. If it doesn't work, please open an issue and include your phone, OS, and WeChat version.

How to use:

Dependencies:

  • requests
  • python-PIL
  • PyQuery
  • pysox and sox
  • pysqlcipher
  • numpy
  • csscompressor (suggested, optional)
  • adb, and a rooted Android phone connected to a Linux, Mac OS X, or Windows 10 + Bash machine.
  • Silk audio decoder (included; just run ./third-party/compile_silk.sh)
  • gnu-sed

On Debian/Ubuntu systems, these dependencies can be installed via:

sudo apt-get install sox android-tools-adb python-requests python-pil python-pyquery python-numpy libsox-dev libsqlite3-dev libsqlcipher-dev
pip install --user pysqlcipher csscompressor
pip install --user --pre pysox

Get Necessary Data:

  • Get the decrypted WeChat database and the avatar index:

    • Automatic: ./android-interact.sh db-decrypt
    • Manual:
      • Figure out your ${userid} by inspecting the contents of /data/data/com.tencent.mm/MicroMsg on the root filesystem of the device. It should be a 32-character-long name consisting of hexadecimal digits.

      • Get /data/data/com.tencent.mm/MicroMsg/${userid}/{EnMicroMsg.db,sfs/avatar.index} from the device. Possible ways:

        • ./android-interact.sh db
        • Use your rooted file system manager app
      • Get the WeChat uin (an integer). Possible ways:

        • ./android-interact.sh uin, which pulls the value from /data/data/com.tencent.mm/shared_prefs/system_config_prefs.xml
        • Log in to web WeChat and read wxuin=1234567 from document.cookie
      • Get your phone's IMEI/MEID number (a positive integer). Possible ways:

        • ./android-interact.sh imei
        • Call *#06# on your phone
        • Find IMEI in system settings
      • Decrypt the database, which produces decrypted.db:

        ./decrypt-db.py <path to EnMicroMsg.db> <imei> <uin>
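
The manual steps above come down to collecting a few values before calling ./decrypt-db.py. As a rough sketch of two of them, the following Python picks the ${userid} candidates out of a directory listing and reads the uin from a copied document.cookie string. The helper names `find_userids` and `uin_from_cookie` and the sample inputs are made up for illustration; they are not part of wechat-dump.

```python
import re

# A ${userid} directory name is 32 characters long, all hexadecimal digits.
USERID_RE = re.compile(r"[0-9a-fA-F]{32}")


def find_userids(names):
    """Return the entries that look like a WeChat ${userid} directory."""
    return [name for name in names if USERID_RE.fullmatch(name)]


def uin_from_cookie(cookie):
    """Extract the integer that follows 'wxuin=' in a document.cookie string."""
    match = re.search(r"(?:^|;\s*)wxuin=(-?\d+)", cookie)
    if match is None:
        raise ValueError("no wxuin entry found in cookie string")
    return int(match.group(1))


# Made-up sample inputs, for illustration only.
listing = ["0123456789abcdef0123456789abcdef", "logs", "tmp"]
print(find_userids(listing))
print(uin_from_cookie("lang=en; wxuin=1234567; other=value"))
```

If `find_userids` returns more than one name, the phone has probably been used with several accounts, and each candidate may need to be tried.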
        

    NOTE: You may need to try different ways of getting the IMEI/MEID, because phones behave differently. Some phones have multiple IMEIs, in which case you may need to try each one. See #33.

      If decryption doesn't work, you can also try the [password cracker](https://github.com/chg-hou/EnMicroMsg.db-Password-Cracker)
      to brute-force the password.
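
This README does not say how the IMEI and uin are turned into a database key. A commonly reported scheme for EnMicroMsg.db is the first 7 hex characters of MD5(IMEI + uin); treat that as an assumption here, not as a description of what decrypt-db.py does. Under that assumption, the sketch below lists the candidate key for each IMEI/MEID, which may help when a phone reports more than one. The function names and sample numbers are hypothetical.

```python
import hashlib


def candidate_key(imei, uin):
    """Assumed scheme: first 7 lowercase hex chars of MD5(imei + uin)."""
    text = str(imei) + str(uin)
    return hashlib.md5(text.encode("ascii")).hexdigest()[:7]


def candidate_keys(imeis, uin):
    """Map each IMEI/MEID to the key it would produce for this uin."""
    return {imei: candidate_key(imei, uin) for imei in imeis}


# Placeholder values, for illustration only.
for imei, key in candidate_keys(["123456789012345", "987654321098765"], 1234567).items():
    print(imei, key)
```

If the key really is 7 hex characters, there are only 16^7 = 268,435,456 possibilities, which would explain why brute-forcing the password is practical.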
    
  • Copy the WeChat user resource directory /mnt/sdcard/tencent/MicroMsg/${userid}/{avatar,emoji,image2,sfs,video,voice2} from the phone to the resource directory:

    • ./android-interact.sh res
    • You might need to change RES_DIR in the script if the default is incorrect on your phone.
    • This script needs busybox and base64 on your phone. If they are not available, there is a slow fallback method in the script you can use.
    • This can take a few minutes. One way to do it faster:
      • If there's enough free space on your phone, you can log in and archive all required files via busybox tar with or without compression, and use adb pull to copy the archive. Note that busybox is needed as the Android system's tar may choke on long paths.
    • What you'll need in the end is a resource directory with the following subdirectories: avatar, emoji, image2, sfs, video, voice2.
  • (Optional) Download the emoji cache with the commands below and decompress it under wechat-dump. This avoids downloading too many emojis during rendering.

      wget -c https://github.com/ppwwyyxx/wechat-dump/releases/download/0.1/emoji.cache.tar.bz2
      tar xf emoji.cache.tar.bz2
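
Before rendering, it can help to confirm that the copied resource directory has all six subdirectories listed above. A minimal sketch, assuming the directory is named resource and sits in the current working directory; `missing_subdirs` is a hypothetical helper, not part of wechat-dump.

```python
import os

# Subdirectories the resource directory is expected to contain.
REQUIRED_SUBDIRS = ("avatar", "emoji", "image2", "sfs", "video", "voice2")


def missing_subdirs(resource_dir):
    """Return the required subdirectories that are absent from resource_dir."""
    return [name for name in REQUIRED_SUBDIRS
            if not os.path.isdir(os.path.join(resource_dir, name))]


if __name__ == "__main__":
    missing = missing_subdirs("resource")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("resource directory looks complete")
```

An empty result only means the directories exist; it does not check that the copy from the phone finished without errors.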
    

Run:

  • Parse and dump text messages of every chat (requires decrypted.db):

    ./dump-msg.py decrypted.db output_dir
    
  • List all chats (requires decrypted.db):

    ./list-chats.py decrypted.db
    
  • Generate a statistical report on text messages (requires output_dir from ./dump-msg.py):

    ./count-message.sh output_dir
    
  • Dump messages of one contact to HTML, containing voice messages, emojis, and images (requires decrypted.db, avatar.index, and resource):

    ./dump-html.py "<contact_display_name>"
    

    The output file is output.html.

    Check ./dump-html.py -h to use different paths.
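
Because ./dump-html.py always writes output.html, dumping several contacts in a row would overwrite the file each time. One possible workaround, sketched below, is to rename the file after every run. This assumes the script's default paths work on your setup and that it is run from the wechat-dump directory. `dump_contacts` is a hypothetical helper; its `run` and `move` parameters exist only so the function can be tried without the real script.

```python
import shutil
import subprocess


def dump_contacts(names, run=subprocess.run, move=shutil.move):
    """Run ./dump-html.py once per contact and keep each output.html."""
    saved = []
    for index, name in enumerate(names):
        # check=True makes a failed dump raise instead of passing silently.
        run(["./dump-html.py", name], check=True)
        # Numbered file names avoid characters that display names may
        # contain but file systems reject.
        target = "contact_{:03d}.html".format(index)
        move("output.html", target)
        saved.append((name, target))
    return saved
```

Calling `dump_contacts(["Alice", "Bob"])` with two made-up display names would leave contact_000.html and contact_001.html behind and return the name-to-file pairs.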

Examples:

Screenshots of generated HTML:

[screenshot: byvoid]

See here for an example html.

TODO List

  • Attack the emoji encryption problem
  • Fix rare unhandled message types: > 10000 and < 0
  • Better user experience; for details, see the output of grep 'TODO' wechat -R
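
To see how often the unhandled message types in the list above occur in your own data, a query along these lines may help. It assumes that decrypted.db is a plain SQLite file with a `message` table that has an integer `type` column. That schema is not documented in this README, so check it against your database first. `unhandled_type_counts` is a hypothetical helper.

```python
import sqlite3


def unhandled_type_counts(db_path):
    """Count messages whose type is > 10000 or < 0, grouped by type."""
    # Assumed schema: table `message` with an integer column `type`.
    query = (
        "SELECT type, COUNT(*) FROM message "
        "WHERE type > 10000 OR type < 0 "
        "GROUP BY type ORDER BY type"
    )
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The result is a list of (type, count) pairs, so `unhandled_type_counts("decrypted.db")` gives a quick view of which rare types are worth handling first.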
