
minima's Introduction

About Me

I am currently a Ph.D. student at Beijing Institute of Technology. My current research interests lie in the general area of natural language processing, particularly efficient language models and language agents.

📫 Contact me via chenzhang9702[AT]outlook[DOT]com.


minima's People

Contributors

genezc


minima's Issues

Getting errors when trying to replicate the distilling operation

Trying with the Llama-2 base weights, I get the following error:

File "/root/MiniMA/minima/modules/flash_attn_monkey_patch_sparsellama.py", line 47, in forward
    assert not use_cache, "use_cache is not supported"

After hardcoding use_cache=False and continuing, I get the following error:

File "/venv/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 52, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only support fp16 and bf16 data type

Can you help please?

Also: `from modules.fused_rope_monkey_patch_llama import apply_rotary_pos_emb` at line 10 of flash_attn_monkey_patch_llama.py seems to be a wrong import. Should it be `from modules.modeling_llama import apply_rotary_pos_emb`?
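For context, the fp16/bf16 RuntimeError above usually means the model (or its inputs) is still in fp32 when it reaches the FlashAttention kernels, which only accept half-precision tensors. A likely fix is to load the teacher/student weights in bfloat16 (e.g. `torch_dtype=torch.bfloat16` in `from_pretrained`). A minimal sketch of the dtype issue, using plain tensors rather than the actual MiniMA modules:

```python
import torch

# FlashAttention kernels only accept fp16/bf16 inputs; fp32 tensors trigger
# "FlashAttention only support fp16 and bf16 data type". Casting the
# query/key/value tensors (or loading the whole model in bf16) avoids it.
q = torch.randn(2, 8, 16, 64)      # torch defaults to fp32
assert q.dtype == torch.float32    # this dtype would raise inside flash-attn

q = q.to(torch.bfloat16)           # cast before calling the kernel
assert q.dtype == torch.bfloat16   # now acceptable to flash-attn
```

When loading with Hugging Face transformers, passing `torch_dtype=torch.bfloat16` to `AutoModelForCausalLM.from_pretrained` casts all weights up front, so no per-tensor casting is needed.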

Distill Mistral 7B?

Mistral-7B is a much better model (and perhaps a better teacher) than Llama-2-7B. Would you kindly release checkpoints for a distilled Mistral? It would be greatly appreciated!

Code for Training MiniMoE

Can you please release code for "upcycling" LLMs to make MoEs? I have a use-case for multi-lingual LLMs where this would be incredibly helpful!
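For reference, the core of upcycling a dense LLM into an MoE is initializing every expert as a copy of the dense FFN and adding a freshly initialized router. The sketch below is illustrative only (the function name and a single `nn.Linear` standing in for the full MLP are assumptions, not MiniMA's actual code):

```python
import torch
import torch.nn as nn

def upcycle_ffn(dense_ffn: nn.Linear, num_experts: int):
    """Hypothetical sketch: clone a dense FFN into N identical experts
    plus a new router, the standard 'upcycling' initialization."""
    experts = nn.ModuleList()
    for _ in range(num_experts):
        expert = nn.Linear(dense_ffn.in_features, dense_ffn.out_features)
        expert.load_state_dict(dense_ffn.state_dict())  # copy dense weights
        experts.append(expert)
    # Router is trained from scratch; it learns to dispatch tokens to experts.
    router = nn.Linear(dense_ffn.in_features, num_experts)
    return experts, router

dense = nn.Linear(16, 32)
experts, router = upcycle_ffn(dense, num_experts=4)
# Every expert starts out identical to the dense layer.
assert all(torch.equal(e.weight, dense.weight) for e in experts)
```

In a real transformer each "expert" would be the full gated MLP block (gate/up/down projections), not a single linear layer, and the experts diverge during continued training.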

Inconsistent response from interactive MiniChat-3B

Hi, happy new year!!

Good work, first of all!!
I am trying to use MiniChat-3B as an interactive chatbot in my application. However, the model's responses tend to be one of the following:

  1. "?" or nothing at all
  2. "I am a language model, I do not have feelings", etc.
  3. Irrelevant content (sometimes the first response is perfect, but after the first response it gets out of control)

It might be my misuse of the model, but here is my use case in brief: I have a custom prompt, treated as a scenario (defined upfront), which is used as the starting prompt together with the MiniChat prompt. The user can then interact with the bot and receive responses.

I have a few questions:

  1. Is there a mechanism to store the message history (for both user and assistant; currently only the last message is saved) so that the model outputs consistent responses?
  2. How do I send the message history as input to the model so that it returns consistent responses?
  3. Is it feasible to use MiniChat as a chatbot?
  4. How can we control the response flow to make sure the responses stay consistent?
  5. How can we reduce responses like "I am a bot", etc.?
  6. What is the best practice for prompt engineering with MiniChat?
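One common pattern for the history handling asked about in questions 1 and 2 is to keep a list of (role, message) pairs and rebuild the full prompt every turn, rather than sending only the latest message. This sketch is generic, not MiniChat-specific; the role tags and `build_prompt` helper are illustrative assumptions, and the real template should come from MiniChat's conversation class:

```python
# Hedged sketch: rebuild the whole prompt from the stored history each turn,
# so the model always sees the full conversation. Role tags are illustrative.
def build_prompt(system_prompt, history):
    parts = [system_prompt]
    for role, text in history:
        parts.append(f"{role}: {text}")
    parts.append("ASSISTANT:")  # trailing cue so the model continues as assistant
    return "\n".join(parts)

history = []
history.append(("USER", "Hello"))
history.append(("ASSISTANT", "Hi, how can I help?"))
history.append(("USER", "Tell me a joke"))

prompt = build_prompt("You are a helpful assistant.", history)
assert prompt.count("USER:") == 2
assert prompt.endswith("ASSISTANT:")
```

After each model reply, the response is appended to `history` as an assistant turn, so the next call carries the full context.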

Here's my sample usage, based on your sample code, which returns inconsistent responses:
```python
def generate_response(self, user_input, main_topic, subtopic):
    # Retrieve and print the system prompt
    system_prompt = self.get_prompt(main_topic, subtopic)
    print("System Prompt:", system_prompt)
    if system_prompt is None:
        return "Prompt not found for the given topic and subtopic.", None

    # Append user input to the conversation history and print it
    self.conv.append_message(self.conv.roles[0], user_input)
    print("Appended User Input:", user_input)

    # Generate and print the conversation history prompt
    conversation_prompt = self.conv.get_prompt()
    print("Conversation History Prompt:", conversation_prompt)

    # Combine the system prompt with the conversation history and print it
    combined_prompt = system_prompt + "\n" + conversation_prompt
    print("Combined Prompt for Model:", combined_prompt)

    # Generate model input IDs
    input_ids = self.tokenizer([combined_prompt]).input_ids

    # Generate output from the model
    output_ids = self.model.generate(
        torch.as_tensor(input_ids).cuda(),
        do_sample=True,
        temperature=0.7,
        max_new_tokens=50,
    )

    # Decode and print the chatbot's response
    output_ids = output_ids[0][len(input_ids[0]):]
    response = self.tokenizer.decode(output_ids, skip_special_tokens=True).strip()
    print("Chatbot Response:", response)

    # Append the chatbot's response to the conversation history
    self.conv.append_message(self.conv.roles[1], response)

    return response
```

Please take a look if you have time

Thanks a lot!
