
opencsgs / csghub


CSGHub is an open-source, trustworthy large model asset management platform, similar to an on-premise Hugging Face, that helps users govern the assets involved in the lifecycle of LLMs and LLM applications (datasets, model files, code, and more). It manages LLM assets in the same way that OpenStack Glance manages virtual machine images, Harbor manages container images, and Sonatype Nexus manages artifacts. Feedback and stars ⭐️ are welcome.

Home Page: https://portal.opencsg.com/models

License: Apache License 2.0

Dockerfile 0.04% Ruby 28.30% JavaScript 6.51% CSS 2.65% SCSS 2.21% Vue 55.28% HTML 4.78% Shell 0.22%
ai huggingface llm management-system models platform datasets

csghub's Introduction


CSGHub README

CSGHub is an open-source, trustworthy large model asset management platform that helps users govern the assets involved in the lifecycle of LLMs and LLM applications (datasets, model files, code, etc.).

With CSGHub, users can upload, download, store, verify, and distribute LLM assets through a web interface, the Git command line, or a natural-language chatbot. The platform also provides microservice submodules and standardized OpenAPIs that can be integrated with users' own systems.

CSGHub is committed to bringing users an asset management platform that is natively designed for large models and can be deployed on-premise for fully offline operation. CSGHub offers functionality similar to a private, on-premise Hugging Face, managing LLM assets in the same way that OpenStack Glance manages virtual machine images, Harbor manages container images, and Sonatype Nexus manages artifacts.

You can try the free SaaS version of CSGHub on the official OpenCSG Community website: https://portal.opencsg.com/models
You can also jump to the Quick Start section to launch a local instance and explore all the features of CSGHub.

UPDATES

  • [2024.03.15] v0.3 Plan: online file editing, organization editing, dataset preview.
  • [2024.02.15] v0.2 Improved model and dataset hosting, and added the ability to invite new organization members.
  • [2024.01.15] v0.1 CSGHub Alpha release, supporting model and dataset management; the detailed functions are described below.

CORE FUNCTIONS

In the era of LLM, data and models are increasingly becoming the most important digital assets for businesses and individual users. However, there are currently issues such as fragmented management tools, limited management methods, and localization, which not only pose potential threats to secure operations but also might hinder the updating and iteration of enterprise-scale models. If you believe that large models will become a major driving force in the upcoming revolution, you may also be considering how to manage core assets — models, data, and large model application code — more efficiently and securely. CSGHub is an open-source project designed to address these issues.

CSGHub's core functions (updated regularly):

  • Unified Management of LLM Assets: A one-stop Hub for unified management of model files, datasets, and large-scale model application codes.
  • Development Ecosystem Compatibility: Supports both HTTPS and SSH protocols for Git commands and web interface operations, ensuring convenient usage for different users.
  • Large Model Capability Expansion: Natively supports version management, model format conversion, automatic data processing, and dataset preview functions.
  • Permissions and Security: Supports integration with corporate user systems, setting of asset visibility, and zero-trust authentication interface design for both external and internal users, maximizing security.
  • Support for Private Deployment: Independent of internet and cloud vendors, enabling one-click initiation of private deployment.
  • Native Design for Large Models: Supports natural language interaction, one-click model deployment, and asset management for Agent and Copilot App.
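
As a rough illustration of the HTTPS and SSH Git access mentioned in the list above, the sketch below builds a remote URL for either protocol. The helper name `csghub_remote_url`, the host `hub.example.com`, the repository names, and the `<type>/<namespace>/<name>.git` path layout are all assumptions made for this example; the actual clone URL should be copied from your instance's web UI.

```shell
#!/usr/bin/env bash
# Build a Git remote URL for a repository hosted on a CSGHub instance.
# ASSUMPTION: the "<type>/<namespace>/<name>.git" path layout and the "git"
# SSH user are illustrative only; check the clone URL shown by your instance.
csghub_remote_url() {
  local protocol="$1" host="$2" repo_type="$3" repo="$4"
  case "$protocol" in
    https) printf 'https://%s/%s/%s.git\n' "$host" "$repo_type" "$repo" ;;
    ssh)   printf 'git@%s:%s/%s.git\n' "$host" "$repo_type" "$repo" ;;
    *)     echo "unsupported protocol: $protocol" >&2; return 1 ;;
  esac
}

# Example with a hypothetical host and repository (not executed here):
#   git lfs install
#   git clone "$(csghub_remote_url https hub.example.com models demo/my-model)"
```

Large model files are stored through Git LFS, so running `git lfs install` once before cloning is usually needed to fetch the actual file contents rather than pointer files.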

TECH DESIGN

The technical design of CSGHub is as follows:

  • CSGHub integrates multiple technologies including Git Servers, Git LFS (Large File Storage) protocol, and Object Storage Service (OSS), providing a reliable data storage layer, a flexible infrastructure access layer, and extensive support for development tools.
  • Utilizing a service-oriented architecture, CSGHub offers backend services through CSGHub Server and a management interface via CSGHub Web Service. Ordinary users can quickly initiate services using Docker compose or Kubernetes Helm Chart for enterprise-level asset management. Users with in-house development capabilities can utilize CSGHub Server for secondary development to integrate management functions into external systems or to customize advanced features.
  • Leveraging outstanding open-source projects like Apache Arrow and DuckDB, CSGHub supports previewing of Parquet data file formats, facilitating localized dataset management for researchers and common users.
  • CSGHub provides an intuitive web interface and permission design for enterprise organization structure. Users can realize version control management, online browsing and downloading through the web UI, as well as set the visibility scope of datasets and model files to realize data security isolation, and can also initiate topic discussions on models and datasets.

Our R&D team has long focused on AI + DevOps, and we hope to solve the pain points in the development process of large models through the CSGHub project. We encourage everyone to contribute high-quality development, operations, and maintenance documents, and to work together to improve the platform, so that large model assets can be managed more traceably and efficiently.

DEMO VIDEO

To help users quickly understand the features and usage of CSGHub, we have recorded a demo video covering the main features and operating procedures.

  • The CSGHub demo video can also be viewed on YouTube or Bilibili.

ROADMAP

  • Asset Management
    • Built-in Code Repo: Built-in code repository management to associate code with models, datasets, and Space applications.
    • Multi-source data synchronization: Support configuring and enabling remote repositories with automatic data synchronization, covering the OpenCSG community, Huggingface, and other remote sources.
  • AI Enhancement
    • One-Click Fine-Tuning: Support integration with OpenCSG llm-finetune tool to start model fine-tuning training with one click.
    • One-Click Inference: Support integration with OpenCSG llm-inference tool to start a model inference service with one click.
  • LLM App and Enterprise Features
    • App Space: Support hosting Gradio/Streamlit applications and publishing them to App Space.
    • Fine-grained Permission Control: Fine-grained permission and access control settings for enterprise architecture.
  • Security Compliance
    • GitServer Adapter: Generic GitServer adapter to support multiple major Git repository types through the adapter pattern.
    • Asset Metadata: Asset metadata management mechanism, supporting customized metadata types and corresponding AutoTag rules.

For the detailed roadmap, see: full roadmap

ARCHITECTURE

CSGHub consists of two main parts: Portal and Server. This repo corresponds to CSGHub Portal, while CSGHub Server is a separate high-performance backend project implemented in Golang.

If you want to dig into the details of CSGHub Server, or wish to integrate the Server with your own frontend system, you can check the CSGHub Server open-source project.

CSGHub Portal Architecture

CSGHub Server Architecture

QUICK START

You can quickly deploy a CSGHub instance with portal/server and all other relevant dependencies to your environment using the following commands:

# please replace [IP Address] with your own LAN/WLAN ip address
export SERVER_DOMAIN=[IP Address]
curl -L https://raw.githubusercontent.com/OpenCSGs/csghub/main/all-in-one.yml -o all-in-one.yml
docker compose -f all-in-one.yml up -d

If you are in China or run into Docker Hub connection issues, you can try the alternative version below, which uses our Aliyun Docker registry:

# please replace [IP Address] with your own LAN/WLAN ip address
export SERVER_DOMAIN=[IP Address]
curl -L https://raw.githubusercontent.com/OpenCSGs/csghub/main/all-in-one-CN.yml -o all-in-one-CN.yml
docker compose -f all-in-one-CN.yml up -d

Or, if you still run into GitHub connection issues, you can try this one:

# please replace [IP Address] with your own LAN/WLAN ip address
export SERVER_DOMAIN=[IP Address]
curl -L https://opencsg-public-resource.oss-cn-beijing.aliyuncs.com/csghub/all-in-one-CN.yml -o all-in-one-CN.yml
docker compose -f all-in-one-CN.yml up -d
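
The two download locations for all-in-one-CN.yml shown above can also be tried in sequence. The following is a minimal sketch, not part of the project: the function name `fetch_first_available` and the 10-second connect timeout are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Try each URL in order and keep the first one that downloads successfully.
# Usage: fetch_first_available <output-file> <url> [<url> ...]
fetch_first_available() {
  local out="$1" url
  shift
  for url in "$@"; do
    # -f: treat HTTP errors as failures; -sS: quiet but still report errors;
    # -L: follow redirects; --connect-timeout: give up quickly on unreachable hosts
    if curl -fsSL --connect-timeout 10 -o "$out" "$url"; then
      echo "downloaded from: $url"
      return 0
    fi
    echo "failed: $url" >&2
  done
  return 1
}

# Example (not executed here), using the two URLs from this section:
#   fetch_first_available all-in-one-CN.yml \
#     https://raw.githubusercontent.com/OpenCSGs/csghub/main/all-in-one-CN.yml \
#     https://opencsg-public-resource.oss-cn-beijing.aliyuncs.com/csghub/all-in-one-CN.yml
```

The short connect timeout matters here: without it, curl may wait more than a minute on an unreachable host before the next source is tried.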

Then visit http://[IP Address] in your web browser to access your new CSGHub instance. You can try all features with the initial admin account: admin001/admin001. For more user guide information, see our website: User Guide

Note:

  • SERVER_DOMAIN ([IP Address]) should be the IP address or domain name of the target host. Please avoid using 127.0.0.1 or localhost.
  • Released container images are for the x86_64 architecture only and have been tested on Linux, Windows, and Mac environments. Apple Silicon Mac users need to enable the Rosetta for x86/AMD64 emulation feature in Docker Desktop.
  • WARNING: This quick start is only for trial testing and does not support production-level deployment. A CSGHub instance deployed with this all-in-one script does not reliably persist user data, and errors may occur when the service is reloaded with docker compose up. In that case, use docker compose down -v to completely remove the instance before relaunching it. Always follow the Step-by-Step Deployment Guide for regular service deployment.
  • WARNING: The quick start does not include deployment of space applications. The space function is supported starting from CSGHub v0.4.0, but it still requires additional Kubernetes and other services; please refer to the Full Deployment Guide.
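
The first note above (avoid 127.0.0.1 and localhost for SERVER_DOMAIN) can be checked before launching. This is a minimal sketch under stated assumptions: the function name `check_server_domain` is hypothetical, and the check only covers the cases named in the note plus an empty or unreplaced placeholder value.

```shell
#!/usr/bin/env bash
# Return 0 if the value looks usable as SERVER_DOMAIN, non-zero otherwise.
# Rejects an empty value, a value that still contains the "[IP Address]"
# placeholder brackets, "localhost", and anything starting with "127.".
check_server_domain() {
  local value="$1"
  case "$value" in
    "")              echo "SERVER_DOMAIN is empty" >&2; return 1 ;;
    *"["*|*"]"*)     echo "SERVER_DOMAIN still contains the placeholder" >&2; return 1 ;;
    localhost|127.*) echo "SERVER_DOMAIN must not be a loopback address" >&2; return 1 ;;
  esac
  return 0
}

# Example (not executed here):
#   check_server_domain "$SERVER_DOMAIN" && docker compose -f all-in-one.yml up -d
```

It does not verify that the address is actually reachable from other machines; it only catches the most common mistakes before the containers start.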

Tech docs in detail

Contributing

We welcome developers of all levels to contribute to our open-source project, CSGHub. If you would like to get involved, please refer to our contributing guidelines. We look forward to your participation and suggestions.

ACKNOWLEDGEMENTS

This project is based on Rails, Vue3, Tailwind CSS, Administrate, PostgreSQL, Apache Arrow, DuckDB and GoGin, whose open source contributions are deeply appreciated!

CONTACT US

If you run into any problems while using CSGHub, you can contact us in any of the following ways:

  1. open an issue on GitHub
  2. join our WeChat group by scanning the WeChat helper QR code
  3. join our official Discord channel: OpenCSG Discord Channel
  4. join our Slack workspace: OpenCSG Slack Channel


csghub's Issues

Notes on problems encountered during quick installation, shared for discussion

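1. The quick-deployment section of the README could state which Docker versions are supported. I was using Docker 20.x, so the compose file failed because bool types were not supported, and the docker compose -f command also failed.
2. After deployment, entering the IP address directly in the browser does not open the CSGHub interface. You need to find the corresponding port with docker ps and then open ip:port.
It would help to document these points in more detail, to reduce back-and-forth with the technical staff and improve efficiency.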
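
For the second point in this issue, the published host port can be read from the Ports column of docker ps. The sketch below is illustrative only: the helper name `host_ports` is an assumption, and it relies on the usual `host:port->container_port/proto` form of that column.

```shell
#!/usr/bin/env bash
# Print the published host ports found in Docker "Ports" strings read from
# stdin, one per line, de-duplicated and sorted numerically.
# e.g. "0.0.0.0:8080->80/tcp, :::8080->80/tcp" yields 8080
host_ports() {
  grep -oE ':[0-9]+->' | tr -d ':>-' | sort -un
}

# Example (not executed here; requires a running Docker daemon):
#   docker ps --format '{{.Names}}\t{{.Ports}}'    # show names with port mappings
#   docker ps --format '{{.Ports}}' | host_ports   # list host ports only
```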

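docker cannot show the logs of the opencsg-nginx-1 component

After deploying CSGHub in one step with docker-compose, running docker logs -f opencsg-nginx-1 to view the component's logs hangs indefinitely. The logs can still be read from the nginx log directory inside the container, but that is less convenient than docker logs.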

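git server keeps restarting when the project is started with docker compose on Tencent Cloud Ubuntu

Steps to reproduce

After bringing up the csghub environment with the commands from the project README.md, the git server keeps restarting. The logs show that the process in the git server container is killed right after it starts.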


System environment

Distribution

NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

System architecture

Linux VM-0-11-ubuntu 5.4.0-169-generic #187-Ubuntu SMP Thu Nov 23 14:52:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Privileges of the user that started the docker daemon process

root     1138500  1.2  0.1 5071680 128548 ?      Ssl  Jan02 273:42 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ubuntu   3163390  0.0  0.0   6432   724 pts/0    S+   14:25   0:00 grep --color=auto dockerd

docker version

Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:52:22 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:52:22 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Does the docker image registry use a proxy: no

Hardware information (memory, CPU): [screenshots not preserved]

Virtual environment: Tencent Cloud

Failed to download csghub artifact.

(base) samchen@bogon ~ % export SERVER_DOMAIN=172.20.7.104
(base) samchen@bogon ~ % curl -L https://raw.githubusercontent.com/OpenCSGs/csghub/main/all-in-one-CN.yml -o all-in-one-CN.yml
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:46 --:--:-- 0

0 0 0 0 0 0 0 0 --:--:-- 0:01:15 --:--:-- 0
curl: (28) Failed to connect to raw.githubusercontent.com port 443 after 75015 ms: Couldn't connect to server
(base) samchen@bogon ~ %
(base) samchen@bogon ~ %
(base) samchen@bogon ~ % ping raw.githubusercontent.com
PING raw.githubusercontent.com (182.43.124.6): 56 data bytes
64 bytes from 182.43.124.6: icmp_seq=0 ttl=53 time=45.702 ms
64 bytes from 182.43.124.6: icmp_seq=1 ttl=53 time=41.738 ms
64 bytes from 182.43.124.6: icmp_seq=2 ttl=53 time=42.661 ms
64 bytes from 182.43.124.6: icmp_seq=3 ttl=53 time=42.102 ms
64 bytes from 182.43.124.6: icmp_seq=4 ttl=53 time=42.455 ms
64 bytes from 182.43.124.6: icmp_seq=5 ttl=53 time=36.326 ms
64 bytes from 182.43.124.6: icmp_seq=6 ttl=53 time=35.578 ms
64 bytes from 182.43.124.6: icmp_seq=7 ttl=53 time=44.073 ms
^C
--- raw.githubusercontent.com ping statistics ---
8 packets transmitted, 8 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 35.578/41.329/45.702/3.327 ms
