Instructions to use User01110/testing-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use User01110/testing-2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="User01110/testing-2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("User01110/testing-2")
model = AutoModelForCausalLM.from_pretrained("User01110/testing-2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use User01110/testing-2 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "User01110/testing-2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "User01110/testing-2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/User01110/testing-2
```
- SGLang
How to use User01110/testing-2 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "User01110/testing-2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "User01110/testing-2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "User01110/testing-2" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "User01110/testing-2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use User01110/testing-2 with Docker Model Runner:
```shell
docker model run hf.co/User01110/testing-2
```
testing-2
This is an experimental ChatML supervised fine-tuning (SFT) run starting from SupraLabs/Supra-1.5-50M-Base-exp.
Training Setup
| Field | Value |
|---|---|
| Base model | SupraLabs/Supra-1.5-50M-Base-exp |
| Output repo | User01110/testing-2 |
| Sequence length | 1024 |
| Max optimizer steps | 10,000 |
| Per-device batch size | 128 |
| Gradient accumulation | 4 |
| Sample presentations per GPU | 5,120,000 |
| Max token slots per GPU | 5,242,880,000 |
| Learning rate | 2.00e-04 |
| Warmup steps | 100 |
| Weight decay | 0.05 |
| Save/push cadence | every 1,000 optimizer steps plus final |
| Loss mask | all assistant spans only |
| Chat format | ChatML |
| System prompt | You are a helpful assistant. |
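The "Loss mask" row above means only assistant tokens contribute to the loss. The sketch below shows one common way to build such labels, assuming the PyTorch/Transformers convention that label `-100` is ignored by the cross-entropy loss. The `build_labels` helper and the `(start, end)` span format are hypothetical and are not taken from the training script.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy by default


def build_labels(input_ids, assistant_spans):
    """Copy token ids into labels only inside assistant spans.

    input_ids: token ids for one rendered conversation.
    assistant_spans: (start, end) index pairs, end exclusive (hypothetical format).
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels


# Toy conversation of 10 tokens with two assistant turns at positions 3..4 and 8..9.
ids = list(range(100, 110))
labels = build_labels(ids, [(3, 5), (8, 10)])
print(labels)
```

Every position outside the two spans keeps the ignore value, so system and user tokens are still seen as context but produce no gradient.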
The stream reloops datasets as needed to reach the fixed step budget. Cutecat6152/python-data-basic is capped at three passes because it only has 100 rows.
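The relooping behaviour with an optional pass cap can be sketched as a small generator. This is an illustration only: the `reloop` name and its `max_passes` parameter are hypothetical, and the actual training stream (including how sources are interleaved) is not described in this card.

```python
from itertools import islice


def reloop(rows, max_passes=None):
    """Yield rows repeatedly; stop after max_passes full passes if a cap is given.

    rows must be re-iterable (for example a list or a range).
    """
    passes = 0
    while max_passes is None or passes < max_passes:
        yielded = False
        for row in rows:
            yielded = True
            yield row
        if not yielded:
            return  # empty source: avoid looping forever
        passes += 1


# Stand-in for a 100-row source capped at three passes.
tiny = [{"id": i} for i in range(100)]
capped = list(reloop(tiny, max_passes=3))
print(len(capped))

# Stand-in for a larger source that reloops for as long as the consumer asks.
large = range(1000)
first_2500 = list(islice(reloop(large), 2500))
print(len(first_2500), first_2500[-1])
```

The capped source stops contributing after 300 presentations, while the uncapped one keeps cycling until the step budget (here, the `islice` limit) is reached.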
Unique one-pass source rows listed below: 4,128,528. First-cycle source presentations with the python-data-basic cap included: 4,128,728. The 10,000-step training budget presents 5,120,000 examples per GPU, so larger sources are expected to reloop during training.
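The figures above follow directly from the Training Setup and Dataset Mix tables. The short check below recomputes them; the row counts are copied from the Dataset Mix table, and the final ratio is only a rough indication of relooping, since it assumes each GPU draws from the full mix.

```python
steps = 10_000
per_device_batch = 128
grad_accum = 4
seq_len = 1024

samples_per_gpu = steps * per_device_batch * grad_accum
token_slots_per_gpu = samples_per_gpu * seq_len

# Row counts copied from the Dataset Mix table.
rows = {
    "nvidia/Nemotron-SFT-Instruction-Following-Chat-v2": 1_068_273,
    "microsoft/orca-math-word-problems-200k": 200_035,
    "TIGER-Lab/MathInstruct": 262_039,
    "Programming-Language/codeagent-python": 296_837,
    "Cutecat6152/python-data-basic": 100,
    "flytech/python-codes-25k": 49_626,
    "QuixiAI/open-instruct-uncensored": 1_756_115,
    "User01110/multiturn-instruct": 2_732,
    "Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned": 457_825,
    "openai/gsm8k (main)": 7_473,
    "openai/gsm8k (socratic)": 7_473,
    "EleutherAI/arithmetic": 20_000,
}

one_pass = sum(rows.values())
capped_passes = 3  # python-data-basic is presented at most three times
first_cycle = one_pass + (capped_passes - 1) * rows["Cutecat6152/python-data-basic"]

print(f"{samples_per_gpu:,} samples, {token_slots_per_gpu:,} token slots per GPU")
print(f"{one_pass:,} one-pass rows, {first_cycle:,} first-cycle presentations")
print(f"budget / first cycle = {samples_per_gpu / first_cycle:.2f}")
```

The budget is roughly 1.24 times the first-cycle size, which is why the larger sources are expected to start a second pass before training ends.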
ChatML Compatibility
The tokenizer is saved with:
| Token | Purpose |
|---|---|
| `<\|im_start\|>` | Opens a ChatML message turn |
| `<\|im_end\|>` | Closes a ChatML message turn |
The uploaded tokenizer includes the ChatML template, so inference and future SFT should not require manually adding these tokens again.
Example prompt:
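```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the ChatML template.
tokenizer = AutoTokenizer.from_pretrained("User01110/testing-2")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]
# Render to a plain string and append the opening of the assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```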
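For readers who want to see what a ChatML prompt looks like without downloading the tokenizer, here is a minimal pure-Python sketch of the standard ChatML layout. The `render_chatml` function is hypothetical; the template actually saved with this tokenizer may differ in details such as whitespace or default system prompt handling, so treat the uploaded template as the source of truth.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the standard ChatML layout (illustrative only)."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]
chatml_prompt = render_chatml(messages)
print(chatml_prompt)
```

Each turn is wrapped in `<|im_start|>` and `<|im_end|>`, and the prompt ends with an open assistant turn when `add_generation_prompt` is true.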
Dataset Mix
| Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
|---|---|---|---|---|---|---|
| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content, reasoning_content}] | user/assistant message pairs; reasoning_off only | reloops as needed |
| microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | user=question; assistant=answer | reloops as needed |
| TIGER-Lab/MathInstruct | default | train | 262,039 | instruction, output | user=instruction; assistant=output | reloops as needed |
| Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | user=prompt; assistant=response | reloops as needed |
| Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | user=instruction; assistant=response | max 3 passes, 300 presentations max |
| flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | user=instruction plus optional Input block; assistant=output | reloops as needed |
| QuixiAI/open-instruct-uncensored | default | train | 1,756,115 | dataset, id, messages[{role, content}] | user/assistant message pairs | reloops as needed |
| User01110/multiturn-instruct | default | train | 2,732 | messages[{role, content}], token_count, session metadata | full multi-turn ChatML; labels only on assistant spans | reloops as needed |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | 4 configs: General-Distillation, General-Math, MultilingualSTEM, PHD-Science | train | 457,825 | id, conversations[{from, value}], input, output, domain, meta | user=input; assistant=output after removing `<think>...</think>` blocks | reloops as needed |
| openai/gsm8k | main | train | 7,473 | question, answer | user=question; assistant=answer | reloops as needed |
| openai/gsm8k | socratic | train | 7,473 | question, answer | user=question; assistant=answer | reloops as needed |
| EleutherAI/arithmetic | 10 selected subsets | validation raw JSONL | 20,000 | context, completion | user=context with trailing Answer: stripped; assistant=completion | reloops as needed |
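The Mapping column can be read as a set of small row-to-messages functions. The sketch below illustrates four of them under stated assumptions: all function names are hypothetical, the layout of the optional Input block is a guess, and the regular expressions are one plausible way to strip `<think>` blocks and a trailing `Answer:`, not the code used for this run.

```python
import re

# Non-greedy match so several <think> blocks in one output are removed separately.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
TRAILING_ANSWER_RE = re.compile(r"\s*Answer:\s*$")


def pair(user, assistant):
    """Build a single-turn user/assistant conversation."""
    return [
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]


def map_question_answer(row):
    # orca-math and gsm8k: user=question; assistant=answer
    return pair(row["question"], row["answer"])


def map_python_codes(row):
    # flytech/python-codes-25k: user=instruction plus optional Input block
    user = row["instruction"]
    if row.get("input"):
        user += "\n\nInput:\n" + row["input"]  # block layout is an assumption
    return pair(user, row["output"])


def map_kimi(row):
    # Kimi K2.5: user=input; assistant=output after removing <think>...</think> blocks
    return pair(row["input"], THINK_RE.sub("", row["output"]).strip())


def map_arithmetic(row):
    # EleutherAI/arithmetic: user=context with trailing "Answer:" stripped
    return pair(TRAILING_ANSWER_RE.sub("", row["context"]), row["completion"])


# Made-up rows for illustration, not taken from the datasets.
kimi = map_kimi({"input": "Hi", "output": "<think>plan a reply</think>Hello!"})
arith = map_arithmetic({"context": "Question: What is 2 plus 2?\nAnswer:", "completion": " 4"})
print(kimi[1]["content"])
print(arith[0]["content"])
```

Keeping each mapping as a separate function makes it easy to check one source at a time against its schema in the table.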
Notes
- Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
- Nemotron is loaded from the direct `reasoning_off.jsonl` file to avoid mixing in reasoning-on schema fields.
- EleutherAI arithmetic is loaded from raw JSONL files to avoid old dataset-script loading issues.
- Kimi K2.5 cleaned reasoning rows are loaded from all four train configs. Assistant `<think>...</think>` blocks are stripped before ChatML rendering, so hidden reasoning is not included in the completion-only loss.
- Streaming source open/read failures are retried and reopened before a source is dropped, which protects long cloud runs from transient Hub DNS/client errors.
- RoPE buffers and tokenizer/model load are verified during final export.
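The retry-and-reopen note above can be illustrated with a small generator wrapper. This is a sketch under assumptions: `read_with_retries`, its parameters, the choice of `OSError` as the transient error type, and the resume-by-skipping strategy are all hypothetical and may differ from what the training script does.

```python
import time


def read_with_retries(open_source, max_retries=3, base_delay=0.0):
    """Yield rows from open_source(), reopening the source after transient failures.

    open_source: zero-argument callable returning a fresh iterator of rows.
    The source is dropped after more than max_retries consecutive failures.
    """
    yielded = 0
    failures = 0
    while failures <= max_retries:
        try:
            for index, row in enumerate(open_source()):
                if index < yielded:
                    continue  # row was already delivered before the failure
                yield row
                yielded += 1
                failures = 0  # any progress resets the consecutive-failure count
            return  # source exhausted normally
        except OSError:  # ConnectionError and similar network errors subclass OSError
            failures += 1
            time.sleep(base_delay * 2 ** (failures - 1))  # exponential backoff
    # Falling out of the loop means the source is dropped.


attempts = {"count": 0}


def flaky_source():
    """Toy source that fails once, partway through its first read."""
    attempts["count"] += 1
    for i in range(5):
        if attempts["count"] == 1 and i == 2:
            raise ConnectionError("transient failure")
        yield i


recovered = list(read_with_retries(flaky_source))
print(recovered, attempts["count"])
```

In a real run `base_delay` would be set above zero so that repeated failures back off instead of hammering the Hub.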