Matrix 2 is a fine-tuned version of DeepSeek-R1-Distill-Qwen-7B, trained on a focused mixture of chain-of-thought reasoning, math, coding, and logic data. It is the flagship reasoning model of the Inelly lineup, built for deep, accurate, step-by-step problem solving.
Matrix 2 is intended for:
Matrix 2 was fine-tuned for 1 epoch on ~5,225 samples drawn from:
| Dataset | Samples | Purpose |
|---|---|---|
| Bespoke-Stratos-35k | 3,000 | Chain-of-thought math & reasoning |
| OpenThoughts-114k | 2,500 | Code generation with reasoning |
| dolphin-r1 | 2,000 | General reasoning (DeepSeek-R1 distill) |
All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
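
The card does not publish its preprocessing script. The following is a minimal sketch of "deduplicate, then oversample CoT examples 2x". It assumes exact-match deduplication on the trimmed, lower-cased prompt/response text, and it treats any example with a non-empty reasoning trace as a CoT example. The record fields `prompt`, `response` and `reasoning` are hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical record schema: the card does not publish its preprocessing code.
Example = Dict[str, str]


def fingerprint(ex: Example) -> str:
    """Hash the trimmed, lower-cased prompt+response so exact duplicates collide."""
    key = ex["prompt"].strip().lower() + "\n" + ex["response"].strip().lower()
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe_and_weight(examples: List[Example], cot_factor: int = 2) -> List[Example]:
    """Drop exact duplicates, then repeat chain-of-thought examples cot_factor times.

    Assumption: an example counts as CoT if its "reasoning" field is non-empty.
    """
    seen = set()
    unique: List[Example] = []
    for ex in examples:
        fp = fingerprint(ex)
        if fp not in seen:  # keep only the first copy of each duplicate
            seen.add(fp)
            unique.append(ex)

    weighted: List[Example] = []
    for ex in unique:
        repeats = cot_factor if ex.get("reasoning") else 1
        weighted.extend([ex] * repeats)  # 2x oversampling for CoT by default
    return weighted


data = [
    {"prompt": "2+2?", "response": "4", "reasoning": "2 plus 2 is 4."},
    {"prompt": "2+2?", "response": "4", "reasoning": "2 plus 2 is 4."},  # exact duplicate
    {"prompt": "Capital of France?", "response": "Paris", "reasoning": ""},
]
print(len(dedupe_and_weight(data)))  # 3: the CoT example twice, the other once
```

In a real pipeline, the weighted list would normally be shuffled before training so that repeated copies are not adjacent.
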

Training configuration:

| Parameter | Value |
|---|---|
| Base model | DeepSeek-R1-Distill-Qwen-7B |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~74 min |
| Hardware | RTX 3090 (24GB VRAM) |
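
The card names a cosine scheduler with a 0.05 warmup ratio but does not give the step count. The rough sketch below makes two assumptions: ~5,225 samples per epoch (the card does not say whether this count is before or after oversampling) and an effective batch size of 8. The schedule itself is a linear warmup followed by a single half-cosine decay to zero. That is the shape produced by common warmup-plus-cosine schedulers such as `get_cosine_schedule_with_warmup` in Hugging Face Transformers.

```python
import math


def cosine_lr_with_warmup(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then a half-cosine decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumed schedule length: ~5,225 samples, effective batch size 8, 1 epoch.
# The exact count depends on how the trainer rounds partial accumulation windows.
samples, batch_size, epochs = 5225, 8, 1
total_steps = math.ceil(samples / batch_size) * epochs  # 654 optimizer steps
warmup_steps = math.ceil(0.05 * total_steps)            # 33 warmup steps

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, cosine_lr_with_warmup(step, 2e-4, warmup_steps, total_steps))
```

With these assumptions, the learning rate reaches the 2e-4 peak after about 33 steps and decays to zero by the end of the single epoch.
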

Model architecture:

| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 3,584 |
| Layers | 28 |
| Attention heads | 28 |
| Head dim | 128 |
| Intermediate size | 18,944 |
| Vocab size | 152,064 |
| Context length | 131,072 |
| Total parameters | ~7.62B |
| Trainable parameters | ~6.5M (LoRA) |
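
The card reports ~6.5M trainable LoRA parameters but does not say which projection matrices were adapted. The sketch below counts LoRA parameters as rank × (d_in + d_out) per adapted linear layer, using the dimensions in the architecture table. Two inputs are assumptions not stated in the card: the number of key/value heads (4, as in the public Qwen2.5-7B configuration) and the target-module sets. The printed totals therefore illustrate how the count depends on the chosen modules; they are not a reconstruction of the reported figure.

```python
from typing import Dict, Iterable, Tuple

# Dimensions from the architecture table.
HIDDEN = 3584
INTERMEDIATE = 18944
LAYERS = 28
NUM_HEADS = 28
HEAD_DIM = 128
# Assumption: 4 key/value heads (grouped-query attention), as in the public
# Qwen2.5-7B config; this value is not stated in the card.
NUM_KV_HEADS = 4

# (d_in, d_out) of each linear projection inside one decoder layer
PROJ_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int, targets: Iterable[str]) -> int:
    """Each adapted layer gains A (rank x d_in) and B (d_out x rank): rank * (d_in + d_out)."""
    per_layer = sum(rank * sum(PROJ_SHAPES[name]) for name in targets)
    return per_layer * LAYERS


RANK, ALPHA = 16, 32
print("LoRA update scaling (alpha / r):", ALPHA / RANK)  # 2.0

# Hypothetical target-module sets; the card does not say which were used.
for targets in (["q_proj", "v_proj"], ["q_proj", "k_proj", "v_proj", "o_proj"]):
    print(targets, f"{lora_param_count(RANK, targets) / 1e6:.2f}M")
```

Adding the MLP projections (`gate_proj`, `up_proj`, `down_proj`) raises the count sharply, because each of them touches the 18,944-wide intermediate dimension.
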
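
Example inference with Hugging Face Transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision and let Accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained("path/to/matrix-2", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/matrix-2")

# Format the prompt with the model's chat template
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sampling must be enabled for temperature/top_p to take effect
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```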
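
The base model is a DeepSeek-R1 distill, so its completions typically place the reasoning trace before a closing `</think>` tag and the final answer after it. Assuming Matrix 2 keeps that format (the card does not say), a small helper, hypothetically named `split_reasoning`, can separate the two parts of `response`:

```python
from typing import Tuple


def _clean(part: str) -> str:
    """Trim whitespace and an optional leading <think> tag."""
    return part.strip().removeprefix("<think>").strip()


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer) around the last </think> tag.

    Assumes the R1-style format "<think> ... </think> answer"; the opening tag may be
    absent when the chat template already placed it in the prompt. If no closing tag
    is found (e.g. generation stopped mid-reasoning), the whole text is returned as
    reasoning with an empty answer.
    """
    head, sep, tail = text.rpartition("</think>")
    if not sep:
        return _clean(text), ""
    return _clean(head), tail.strip()


reasoning, answer = split_reasoning("<think>\n3x = 15, so x = 5.\n</think>\n\nx = 5")
print(answer)  # x = 5
```

With `max_new_tokens=256`, a long reasoning trace may be cut off before `</think>`. In that case the helper returns the partial trace and an empty answer, which signals that the token budget should be raised.
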
Informal GPU testing across 8 categories:
| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Excellent multi-step logic |
| Math | ✅ Accurate with detailed work shown |
| Code generation | ✅ Clean, well-commented Python |
| Logic puzzles | ✅ Thorough deductive reasoning |
| General knowledge | ✅ Accurate, detailed explanations |
| Complex reasoning | ✅ Handles multi-step word problems well |

Models in the Inelly lineup:

| Model | Size | Focus |
|---|---|---|
| Matrix 2 (this model) | 7B | Deep CoT reasoning, math, coding |
| Inelly 4.5 | 3B | Conversation + politeness + CoT |
| Inelly 4.5 Blaze | 1.5B | Fast reasoning + CoT |

Citation:

```bibtex
@misc{matrix2,
  title = {Matrix 2: A 7B Chain-of-Thought Reasoning Model},
  author = {Bry},
  organization = {GenueAI},
  year = {2026},
  note = {Fine-tuned from DeepSeek-R1-Distill-Qwen-7B using QLoRA},
}
```