OpenMOSE (@_m0se_)

RWKV-LM-State-4bit

I implemented 4-bit quantization of the main weights using Bitsandbytes together with State-Tuning, and enabled differential output for the saved checkpoints.

Quantizing to 4 bits can reduce VRAM usage by about 40% 😀 (2B model @ ~10 GB)

github.com/OpenMOSE/RWKV-…
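
Not code from the linked repo, just a minimal sketch of the idea, assuming bitsandbytes' `Linear4bit` API (the `StateTunedBlock` wrapper and `save_state_diff` helper are hypothetical): the main weights sit frozen in NF4, only a small per-layer state tensor trains, and the "differential" checkpoint saves just the trainable tensors.

```python
# Sketch only: StateTunedBlock / save_state_diff are made-up names,
# not part of RWKV-LM-State-4bit.
import torch
import bitsandbytes as bnb

class StateTunedBlock(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Main weight frozen and stored as 4-bit NF4 (~4x smaller than fp16).
        self.proj = bnb.nn.Linear4bit(
            dim, dim, bias=False,
            quant_type="nf4", compute_dtype=torch.bfloat16,
        )
        for p in self.proj.parameters():
            p.requires_grad = False
        # The only trainable part: a small per-layer state vector.
        self.state = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.proj(x) + self.state

def save_state_diff(model: torch.nn.Module, path: str):
    # "Differential" checkpoint: keep only trainable tensors, skip the 4-bit base.
    diff = {n: p.detach().cpu() for n, p in model.named_parameters() if p.requires_grad}
    torch.save(diff, path)
```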

Roberto Luis Rodriguez (@rluis77)

new portal. Produced by RLuis77

“I Can Make Your Peoples DANCE”

YOU CANT??? Kendrick Lamar

How?🤔

What does that say?🤔🤷🏽‍♂️

REALLY???

THEY MY PPL’S…

The Drums Never Lie…Kendrick Lamar

NO QUANTIZING

STRAIGHT SOUL.

Benjamin Trent (@benwtrent)

A new parameter is added to the linear offset correction created when scalar quantizing for dot-product. This means we can optimize the quantization bucketing and that correction: elastic.co/search-labs/bl…
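
As a rough illustration of what the baseline "linear offset correction" does for dot products (toy NumPy sketch, not the Lucene/Elasticsearch code, and it does not include the new parameter the linked post adds):

```python
import numpy as np

def quantize(x, bits=8):
    # Affine scalar quantization: x ~ lo + scale * q
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((x - lo) / scale).astype(np.int64)
    return q, lo, scale, int(q.sum())  # sum(q) feeds the correction terms

def corrected_dot(qx, lo_x, sx, sum_x, qy, lo_y, sy, sum_y, dim):
    # Integer dot product plus three cheap linear offset corrections.
    return (sx * sy * float(qx @ qy)
            + lo_x * sy * sum_y
            + lo_y * sx * sum_x
            + dim * lo_x * lo_y)

x = np.random.randn(128).astype(np.float32)
y = np.random.randn(128).astype(np.float32)
print(corrected_dot(*quantize(x), *quantize(y), dim=128), float(x @ y))  # close, up to quantization error
```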

OpenMOSE (@_m0se_)

Experimenting with quantizing the permanently frozen layers with Bitsandbytes to reduce VRAM usage.

On RWKV x060 7B, with LISA enabled on the last 4 layers, DeepSpeed stage-2 offload (ds2offload), and 1 layer active per step:

NonQuant: OOM
NF4 quant: around 19 GB

But it loses the speed advantage of TorchJIT...
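
Rough sketch of that schedule with made-up helper names (not OpenMOSE's trainer): the permanently frozen blocks stay NF4-quantized and never get gradients, while LISA flips `requires_grad` on one random full-precision block per step plus the always-on last layers.

```python
import random
import torch

def lisa_set_active(blocks, candidate_idx, always_on_idx, n_active=1):
    """Enable grads on n_active random candidate blocks plus the always-on ones;
    everything else (including the NF4-quantized frozen blocks) keeps grads off."""
    chosen = set(random.sample(candidate_idx, n_active)) | set(always_on_idx)
    for i, block in enumerate(blocks):
        active = i in chosen
        for p in block.parameters():
            p.requires_grad = active
    return chosen
```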

Christian Hernandez (@ChriStannnis)

Never felt more vindicated than hearing Danny Carey say he’s never tracked a Tool song to a metronome. I’m always against quantizing (for the most part) and a big advocate for organic sound and letting a groove breathe

Yorkie (@heyitsyorkie)

Pretty cool that I have my own fine-tuned model working in LM Studio now. Rather successful weekend learning about fine-tuning, quantizing models etc. Here's to more!

Burny — Effective Omni (@burny_tech)

Loop quantum gravity and string theory are the two main approaches that attempt to reconcile quantum mechanics and general relativity to develop a theory of quantum gravity. However, they have some key differences:

- LQG focuses on quantizing space-time itself, treating it as…
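
A concrete example of "quantizing space-time itself": in LQG, geometric operators have discrete spectra. The standard area spectrum (γ is the Barbero-Immirzi parameter, ℓ_P the Planck length, and the sum runs over the spin labels j_i of the links puncturing the surface) is:

```latex
A = 8 \pi \gamma \, \ell_P^{2} \sum_i \sqrt{j_i \left(j_i + 1\right)},
\qquad j_i \in \left\{ \tfrac{1}{2},\, 1,\, \tfrac{3}{2},\, \dots \right\}
```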

KD (@Reveur_7)

Just finished quantizing and uploading Mixtral 8x7B to Apple MLX. Enjoy!

huggingface.co/mlx-community/…
huggingface.co/mlx-community/…
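
For anyone wanting to do the same, a hedged sketch using mlx-lm's `convert` API (argument names and defaults vary across mlx-lm versions, and the Hugging Face repo id below is a placeholder, so treat this as an assumption rather than the exact command used here):

```python
# Hedged sketch: quantize a Hugging Face checkpoint for Apple MLX with mlx-lm.
from mlx_lm import convert

convert(
    "mistralai/Mixtral-8x7B-v0.1",     # placeholder hf_path
    mlx_path="mixtral-8x7b-4bit-mlx",  # local output directory
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```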

Rohan Paul (@rohanpaul_ai)

A nice example from the official Half-Quadratic Quantization (HQQ) implementation repo

HQQ’s `HQQLinear.quantize` and `HQQLinear.dequantize` methods have been modified to support FSDP training by viewing int dtype quantized weights as a selectable float dtype when quantizing, and…
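
The trick being referenced, in isolation (illustrative only, not the HQQ source): FSDP flat-shards parameters and expects a uniform floating dtype, so the packed integer weights are bit-cast to a float dtype for storage and cast back before dequantizing.

```python
import torch

# Stand-in for HQQ's packed integer quantized weights.
packed = torch.randint(0, 256, (1024, 64), dtype=torch.uint8)

# Store the same bytes under a float dtype so FSDP is happy (a bit-cast, not a value cast).
as_float = packed.view(torch.bfloat16)  # last dim's byte count must divide evenly

# Recover the original integer payload before dequantizing.
restored = as_float.view(torch.uint8)
assert torch.equal(restored, packed)
```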

Private LLM (@private_llm)

M Maarouf Quantizing the model's embedding layer hurts the model's perplexity, and by extension, not quantizing it improves perplexity. It's similar with Gemma 2B IT. For Phi-3-mini, it comes at the cost of a ~100 MB increase in memory footprint, which we feel is a fair trade-off.
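
Back-of-envelope for that trade-off (the dimensions below are Phi-3-mini's published config; the exact overhead depends on the baseline bit-width and packing, so this is only a ballpark):

```python
# Rough memory cost of the token-embedding matrix at different precisions.
vocab_size, hidden = 32064, 3072  # Phi-3-mini config (assumed here)

def embedding_mib(bits):
    return vocab_size * hidden * bits / 8 / 2**20

for bits in (16, 8, 4):
    print(f"{bits}-bit embeddings: {embedding_mib(bits):.0f} MiB")
# Holding the embeddings at higher precision than the 4-bit body costs roughly
# 50-150 MiB extra, in the same ballpark as the ~100 MB quoted above.
```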

cocktail peanut (@cocktailpeanut)

llama3-gradient first impressions

At least for the quantized llama3-gradient from ollama, the quality is pretty bad. Hallucinates like hell.

I did hear that quantizing llama3 generally hurts quality, and maybe that's why. Anyone tried the original one?

Some results:

Ozlem Peksoy Bishop 💥♻️🛸 (@opeksoy)

: continuums inquiries are precious :: our efforts of quantizing and categorizing everything artificially confuse our ‘understanding’ ::: fluidity as step one affords quite a lot especially when you free your perceptions of time limitations

Black Radioactive Boi 🚂☢ (@TokenOfTheMonth)

We as a society are COOKED if mfrs are coming on sayin quantizing a track is harder than rocket science like bro what is HAPPENING in our schools LMAOOO
