From readme: max_seq_length = 2048 # Can c

Longer than 4096 Token Sequences (RoPE Scaling) (unsloth issue, closed)

lapp0 commented on July 23, 2024

Longer than 4096 Token Sequences (RoPE Scaling)

Comments (4)

danielhanchen commented on July 23, 2024

@lapp0 Oops, apologies, I didn't reply. I made CodeLlama-34b work, and it has a RoPE theta of 1000000, so Yi-34B with the extra RoPE scaling should work fine!

danielhanchen commented on July 23, 2024

@lapp0 Yes, technically RoPE scaling was just added as preliminary support.
You can either change max_seq_length to whatever number you want, and I will handle RoPE scaling internally,

or you can add a RoPE scaling dictionary yourself.

This is all preliminary support, though.
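
To make the two options above concrete, here is a minimal sketch of how a linear RoPE scaling dictionary could be derived from a requested `max_seq_length`. The helper name `linear_rope_scaling` and the parameter `native_max_positions` are hypothetical, and this is not confirmed to be what unsloth does internally; it only assumes the `{"type": "linear", "factor": ...}` dictionary shape used by Hugging Face Llama configs.

```python
from typing import Optional


def linear_rope_scaling(max_seq_length: int, native_max_positions: int) -> Optional[dict]:
    """Build a Hugging Face style rope_scaling dict (hypothetical helper).

    Returns None when the requested length already fits within the
    model's native context window, so no scaling is needed.
    """
    if max_seq_length <= native_max_positions:
        return None
    # Linear scaling stretches positions by requested / native length.
    factor = max_seq_length / native_max_positions
    return {"type": "linear", "factor": factor}


print(linear_rope_scaling(8192, 4096))  # {'type': 'linear', 'factor': 2.0}
print(linear_rope_scaling(2048, 4096))  # None
```

Passing a dictionary like this yourself corresponds to the second option; leaving it out and only raising max_seq_length corresponds to the first.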

lapp0 commented on July 23, 2024

Does unsloth currently implement the expected behavior of rope theta being set by the base model's config.json on HF?

e.g. https://huggingface.co/chargoddard/Yi-34B-Llama/blob/main/config.json#L18

Or are there any issues I might run into which I should be aware of when dealing with long context base models?

danielhanchen commented on July 23, 2024

@lapp0 I'm not 100% sure. I think it'll use the model's internal rope theta / scaling if you don't touch the rope_scaling parameter. I'll have to get back to you.
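
For context on why rope theta matters for long-context base models, the sketch below computes the standard RoPE inverse frequencies, `theta ** (-2i / d)`, and compares the longest rotation wavelength for two theta values. The function names and the head dimension of 128 are illustrative assumptions, not values read from unsloth or from any particular model's config.json.

```python
import math
from typing import List


def rope_inv_freq(head_dim: int, theta: float) -> List[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Wavelength, in token positions, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]


# A larger theta slows the rotations, so positions stay distinguishable
# over longer sequences. head_dim=128 is an illustrative assumption.
for theta in (10_000.0, 1_000_000.0):
    print(theta, round(longest_wavelength(128, theta)))
```

If a loader ignored the theta in config.json and fell back to a smaller default, these wavelengths would shrink, which is the kind of issue the question above is concerned with.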
