
minima's Introduction

About Me

I am currently a Ph.D. student at Beijing Institute of Technology. My current research interests lie in the general area of natural language processing, particularly efficient language models and language agents.

📫 Contact me via chenzhang9702[AT]outlook[DOT]com.


minima's People

Contributors

genezc


minima's Issues

Getting errors when trying to replicate the distilling operation

Trying with the Llama-2 base weights, I get the following error:

File "/root/MiniMA/minima/modules/flash_attn_monkey_patch_sparsellama.py", line 47, in forward
    assert not use_cache, "use_cache is not supported"

After hardcoding use_cache=False and continuing, I get the following error:

File "/venv/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 52, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only support fp16 and bf16 data type

Can you help please?

Also: `from modules.fused_rope_monkey_patch_llama import apply_rotary_pos_emb` at line 10 of flash_attn_monkey_patch_llama.py seems to be a wrong import. Should it be `from modules.modeling_llama import apply_rotary_pos_emb`?
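For context, the fp16/bf16 RuntimeError above usually means the model (or its inputs) is still in fp32 when it reaches the FlashAttention kernels, which only accept half-precision tensors. A likely fix is to load the teacher/student weights in bfloat16 (e.g. `torch_dtype=torch.bfloat16` in `from_pretrained`). A minimal sketch of the dtype issue, using plain tensors rather than the actual MiniMA modules:

```python
import torch

# FlashAttention kernels only accept fp16/bf16 inputs; fp32 tensors trigger
# "FlashAttention only support fp16 and bf16 data type". Casting the
# query/key/value tensors (or loading the whole model in bf16) avoids it.
q = torch.randn(2, 8, 16, 64)      # torch defaults to fp32
assert q.dtype == torch.float32    # this dtype would raise inside flash-attn

q = q.to(torch.bfloat16)           # cast before calling the kernel
assert q.dtype == torch.bfloat16   # now acceptable to flash-attn
```

When loading with Hugging Face transformers, passing `torch_dtype=torch.bfloat16` to `AutoModelForCausalLM.from_pretrained` casts all weights up front, so no per-tensor casting is needed.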

Distill Mistral 7B?

Mistral-7B is a much better model (and perhaps a better teacher) than Llama-2-7B. Would you kindly release checkpoints for a distilled Mistral? It would be greatly appreciated!

Code for Training MiniMoE

Can you please release code for "upcycling" LLMs to make MoEs? I have a use-case for multi-lingual LLMs where this would be incredibly helpful!
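For reference, the core of upcycling a dense LLM into an MoE is initializing every expert as a copy of the dense FFN and adding a freshly initialized router. The sketch below is illustrative only (the function name and a single `nn.Linear` standing in for the full MLP are assumptions, not MiniMA's actual code):

```python
import torch
import torch.nn as nn

def upcycle_ffn(dense_ffn: nn.Linear, num_experts: int):
    """Hypothetical sketch: clone a dense FFN into N identical experts
    plus a new router, the standard 'upcycling' initialization."""
    experts = nn.ModuleList()
    for _ in range(num_experts):
        expert = nn.Linear(dense_ffn.in_features, dense_ffn.out_features)
        expert.load_state_dict(dense_ffn.state_dict())  # copy dense weights
        experts.append(expert)
    # Router is trained from scratch; it learns to dispatch tokens to experts.
    router = nn.Linear(dense_ffn.in_features, num_experts)
    return experts, router

dense = nn.Linear(16, 32)
experts, router = upcycle_ffn(dense, num_experts=4)
# Every expert starts out identical to the dense layer.
assert all(torch.equal(e.weight, dense.weight) for e in experts)
```

In a real transformer each "expert" would be the full gated MLP block (gate/up/down projections), not a single linear layer, and the experts diverge during continued training.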

Inconsistent response from interactive MiniChat-3B

Hi, happy new year!!

Good work, first of all!!
I am trying to use MiniChat-3B as an interactive chatbot in my application. However, the model's responses tend to be one of the following:

  1. "?" or nothing at all
  2. "I am a language model, I do not have feelings", etc.
  3. Irrelevant content (sometimes the first response is perfect, but after the first response it gets out of control)

It might be my misuse of the model, but here is my use case in brief: I have a custom prompt, treated as a scenario (defined upfront), which is used as the starting prompt together with the MiniChat prompt. The user can then interact with the bot and receive responses.

I have a few questions:

  1. Is there a mechanism to store the message history (for both user and assistant; currently only the last message is saved) so that the model outputs consistent responses?
  2. How do I send the message history as input to the model so that it returns consistent responses?
  3. Is it feasible to use MiniChat as a chatbot?
  4. How can we control the response flow to make sure the responses stay consistent?
  5. How can we reduce responses like "I am a bot", etc.?
  6. What is the best practice for prompt engineering with MiniChat?
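One common pattern for the history handling asked about in questions 1 and 2 is to keep a list of (role, message) pairs and rebuild the full prompt every turn, rather than sending only the latest message. This sketch is generic, not MiniChat-specific; the role tags and `build_prompt` helper are illustrative assumptions, and the real template should come from MiniChat's conversation class:

```python
# Hedged sketch: rebuild the whole prompt from the stored history each turn,
# so the model always sees the full conversation. Role tags are illustrative.
def build_prompt(system_prompt, history):
    parts = [system_prompt]
    for role, text in history:
        parts.append(f"{role}: {text}")
    parts.append("ASSISTANT:")  # trailing cue so the model continues as assistant
    return "\n".join(parts)

history = []
history.append(("USER", "Hello"))
history.append(("ASSISTANT", "Hi, how can I help?"))
history.append(("USER", "Tell me a joke"))

prompt = build_prompt("You are a helpful assistant.", history)
assert prompt.count("USER:") == 2
assert prompt.endswith("ASSISTANT:")
```

After each model reply, the response is appended to `history` as an assistant turn, so the next call carries the full context.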

Here's my sample usage, based on your sample code, which returns inconsistent responses:
```python
def generate_response(self, user_input, main_topic, subtopic):
    # Retrieve and print the system prompt
    system_prompt = self.get_prompt(main_topic, subtopic)
    print("System Prompt:", system_prompt)
    if system_prompt is None:
        return "Prompt not found for the given topic and subtopic.", None

    # Append user input to the conversation history and print it
    self.conv.append_message(self.conv.roles[0], user_input)
    print("Appended User Input:", user_input)

    # Generate and print the conversation history prompt
    conversation_prompt = self.conv.get_prompt()
    print("Conversation History Prompt:", conversation_prompt)

    # Combine the system prompt with the conversation history and print it
    combined_prompt = system_prompt + "\n" + conversation_prompt
    print("Combined Prompt for Model:", combined_prompt)

    # Generate model input IDs
    input_ids = self.tokenizer([combined_prompt]).input_ids

    # Generate output from the model
    output_ids = self.model.generate(
        torch.as_tensor(input_ids).cuda(),
        do_sample=True,
        temperature=0.7,
        max_new_tokens=50,
    )

    # Decode and print the chatbot's response
    output_ids = output_ids[0][len(input_ids[0]):]
    response = self.tokenizer.decode(output_ids, skip_special_tokens=True).strip()
    print("Chatbot Response:", response)

    # Append the chatbot's response to the conversation history
    self.conv.append_message(self.conv.roles[1], response)

    return response
```

Please take a look if you have time

Thanks a lot!
