pixiv_ai_crawler's Introduction

An AI-powered crawler for high-quality NSFW illustrations on pixiv

An AI crawler for NSFW illustrations that can learn your personal preferences.

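The crawler is a modified version of PixivCrawler. Image recognition and classification use a classifier with ConvNeXt as its backbone, which performs better than Transformer-based models.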
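The interplay between the two parts can be pictured as a filter loop: the crawler yields candidate images and the classifier decides which ones to keep. The following is only a minimal sketch under stated assumptions. `collect_images`, the `score` callable, and the `threshold` value are hypothetical names for illustration and are not taken from the project code, where the score would come from the ConvNeXt classifier.

```python
from typing import Callable, Iterable, List


def collect_images(
    candidates: Iterable[str],
    score: Callable[[str], float],
    n_images: int,
    threshold: float = 0.5,
) -> List[str]:
    """Keep candidates whose score reaches `threshold` until `n_images` are kept.

    `candidates` stands in for whatever the crawler yields (hypothetical here),
    and `score` stands in for the classifier's output for one image.
    """
    kept: List[str] = []
    for item in candidates:
        if len(kept) >= n_images:
            break  # enough images collected, stop crawling
        if score(item) >= threshold:
            kept.append(item)
    return kept


if __name__ == "__main__":
    # Toy scores instead of a real model, purely for illustration.
    fake_scores = {"a.png": 0.9, "b.png": 0.2, "c.png": 0.7, "d.png": 0.8}
    print(collect_images(fake_scores, fake_scores.get, n_images=2))
```

Stopping as soon as enough images are kept avoids downloading and scoring more candidates than needed.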

Automatic filtering results

Environment setup

For environment setup, refer to ConvNeXt.

Requires pytorch==1.8 and timm==0.3.2.

Download miniconda, then create and activate a new Python environment:

conda create -n pixivai python=3.9
conda activate pixivai

Install PyTorch:

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c conda-forge
# Use this instead if you do not have an NVIDIA GPU
conda install pytorch torchvision torchaudio cpuonly -c pytorch-lts

Install the other dependencies:

pip install -r requirements.txt
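After installation, it can help to confirm that the versions named above are the ones actually present. The snippet below is a small sketch using only the standard library. It assumes the packages are registered under the distribution names `torch` and `timm`, and `check_environment` and `version_matches` are hypothetical helper names, not part of this project.

```python
from importlib import metadata

# Versions named in this README, keyed by the assumed distribution names.
REQUIRED = {"torch": "1.8", "timm": "0.3.2"}


def version_matches(installed: str, required: str) -> bool:
    """True if `installed` equals `required` or refines it (1.8.2+cu111 matches 1.8)."""
    # Drop a local build tag such as "+cu111" before comparing.
    base = installed.split("+", 1)[0]
    return base == required or base.startswith(required + ".")


def check_environment(required=REQUIRED):
    """Map each package name to 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if version_matches(found, wanted) else f"found {found}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```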

Usage

Download the pretrained weights and place them in the ckpt/ folder:

Download weights from Baidu Netdisk (extraction code: mmwi), or use the alternative weights download link.
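To confirm that the weights ended up where the crawler expects them, a quick check of the ckpt/ folder is enough. This is a minimal sketch: the file name and extension of the weights are not stated here, so the suffix list is an assumption, and `find_checkpoints` is a hypothetical helper.

```python
from pathlib import Path

# Assumed extensions for PyTorch checkpoints; the README does not name the file.
CKPT_SUFFIXES = {".pth", ".pt", ".ckpt"}


def find_checkpoints(folder="ckpt"):
    """Return checkpoint files directly inside `folder`, sorted by path.

    Returns an empty list if the folder does not exist or holds no checkpoint.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in CKPT_SUFFIXES
    )


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        for path in found:
            print(f"found weights: {path}")
    else:
        print("no weights found in ckpt/")
```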

Configure the crawler following the PixivCrawler instructions: set your account and cookie, and choose what to crawl.

Set the crawler's basic parameters in pixiv_crawler/config.py.

Run the following command to start the AI crawler:

# Without a keyword, the daily ranking is crawled by default
python AIcrawler.py --ckpt <model weights> --n_images <total number of images> [--keyword <keyword>]
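The command line above can be modelled with `argparse` as follows. This is only an illustrative sketch of the three flags shown in the command, not the actual contents of AIcrawler.py. Which flags are required, their types, and the `describe` helper are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the flags shown in the command above (types are assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the AI crawler command line")
    parser.add_argument("--ckpt", required=True, help="path to the model weights")
    parser.add_argument("--n_images", type=int, required=True,
                        help="total number of images to collect")
    parser.add_argument("--keyword", default=None,
                        help="search keyword; omit to crawl the daily ranking")
    return parser


def describe(args):
    """Summarise what a run with these arguments would do."""
    source = f"keyword search for {args.keyword!r}" if args.keyword else "daily ranking"
    return f"collect {args.n_images} images from the {source} using {args.ckpt}"


if __name__ == "__main__":
    print(describe(build_parser().parse_args()))
```

Leaving `--keyword` as `None` by default makes the daily-ranking fallback explicit in one place.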

Train a model on your own preferences

Data preparation

Prepare at least 5000 images. Label them with labeler.py; the dataset labels are stored in JSON format.

Put each class in its own folder and use labeler_folder.py to label all images in one step.

images
|--0
|  |--1.png
|  |--2.png
|
|--1
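Given the folder layout above, folder-based labeling amounts to using each folder name as the class of the images inside it. The sketch below illustrates that idea only. The actual JSON schema written by labeler_folder.py is not described here, so the `"class/filename" -> label` mapping, the accepted image suffixes, and the helper names are assumptions.

```python
import json
from pathlib import Path

# Assumed image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def label_from_folders(root):
    """Map 'class/filename' to the integer class taken from the folder name."""
    labels = {}
    for class_dir in sorted(Path(root).iterdir()):
        # Only folders with numeric names (0, 1, ...) are treated as classes.
        if not class_dir.is_dir() or not class_dir.name.isdigit():
            continue
        for image in sorted(class_dir.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                labels[f"{class_dir.name}/{image.name}"] = int(class_dir.name)
    return labels


def save_labels(labels, path):
    """Write the label mapping as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(labels, handle, indent=2)


if __name__ == "__main__":
    if Path("images").is_dir():
        save_labels(label_from_folders("images"), "labels.json")
```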

data_proc.py splits the data into training and test sets and preprocesses the images.
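A train/test split of this kind can be sketched as a seeded shuffle followed by a cut. This is not the code in data_proc.py. The 80/20 ratio, the seed, and the `split_dataset` name are assumptions chosen for illustration, and image preprocessing is left out.

```python
import random


def split_dataset(items, test_ratio=0.2, seed=0):
    """Return (train, test) lists from a reproducible shuffle of `items`."""
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    names = [f"{i}.png" for i in range(10)]
    train, test = split_dataset(names)
    print(len(train), "training images,", len(test), "test images")
```

Fixing the seed keeps the split stable across runs, so test images never leak into training when the script is run again.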

Adjust the parameters, then run the script to train:

bash train.sh

For training parameter settings, refer to ConvNeXt.

pixiv_ai_crawler's People

Contributors

irisrainbowneko
