Instructions for using Maincode/Maincoder-1B with libraries and local apps.

Libraries

How to use Maincode/Maincoder-1B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Maincode/Maincoder-1B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

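# Load the model directly (AutoModelForCausalLM includes the language-modeling head needed for generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Maincode/Maincoder-1B", trust_remote_code=True, dtype="auto")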

Local Apps

vLLM

How to use Maincode/Maincoder-1B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Maincode/Maincoder-1B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Maincode/Maincoder-1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
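
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` shape.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "Maincode/Maincoder-1B") -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST the request to a running server and return the first reply's text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the reply follows the standard OpenAI response layout.
    return payload["choices"][0]["message"]["content"]
```

Calling `chat("What is the capital of France?")` requires the server started above to be running. Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead.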

Use Docker

docker model run hf.co/Maincode/Maincoder-1B

SGLang

How to use Maincode/Maincoder-1B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Maincode/Maincoder-1B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Maincode/Maincoder-1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Maincode/Maincoder-1B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Maincode/Maincoder-1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Maincode/Maincoder-1B with Docker Model Runner:
```
docker model run hf.co/Maincode/Maincoder-1B
```

	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	tags:
	- code
	- python
	- maincoder
	- code-generation
	- reinforcement-learning
	- mcpo
	pipeline_tag: text-generation
	# base_model: Maincode/Maincoder-1B
	---
	<img src="https://huggingface.co/datasets/Maincode/assets/resolve/e51154e034201be1a5dad0e9c8de31d8b9f17643/maincoder_logo.png" alt="" width="1250">

	[Maincoder-1B](https://maincode.com/maincoder/) is a code-focused language model optimized for code generation and completion tasks. It scores 0.7622 on HumanEval and 0.7090 on MBPP+ while remaining compact enough for local deployment.

	# Key Features

	- Code Generation: Optimized for Python code completion and generation tasks.
	- Compact Size: 1 billion parameters, lightweight enough to run on consumer hardware.
	- Deep Architecture: Modern transformer architecture with RoPE embeddings, grouped-query attention, QK normalization, and a high depth-to-width ratio.
	- Advanced Data Mixing: Pre-trained and mid-trained on custom data mixes developed for high-performance coding.
	- MCPO Algorithm: Fine-tuned with a specialised reinforcement learning policy optimisation algorithm to improve training stability and accelerate convergence.
	- SOTA Performance: State-of-the-art results on the Python coding benchmarks HumanEval, HumanEval+ and MBPP+ among the comparably sized models listed under Benchmark Results.

	# Benchmark Results

	<img src="https://huggingface.co/datasets/Maincode/assets/resolve/main/performance_h.png" alt="Benchmark Performance Across Baseline LLMs" width="1050">

	| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
	|---|---:|---:|---:|---:|---:|
	| [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) | 0.7622 | 0.7256 | 0.7090 | 0.3054 | 0.2976 |
	| [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
	| [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 0.5366 | 0.5000 | 0.6799 | 0.5928 | 0.5505 |
	| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
	| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 0.4024 | 0.3780 | 0.5582 | 0.5571 | 0.6865 |
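
To give the fractions in the table a concrete scale, the sketch below converts the Maincoder-1B scores into approximate solved-problem counts. It rests on two assumptions that the table does not state: that the scores are pass@1 fractions, and that the benchmark sizes are 164 problems for HumanEval and HumanEval+ and 378 for MBPP+.

```python
# Assumed benchmark sizes (not stated in the table above).
PROBLEM_COUNTS = {"HumanEval": 164, "HumanEval+": 164, "MBPP+": 378}

# Scores for Maincode/Maincoder-1B, copied from the table above.
MAINCODER_SCORES = {"HumanEval": 0.7622, "HumanEval+": 0.7256, "MBPP+": 0.7090}


def solved_count(score: float, total: int) -> int:
    """Convert a fractional score into the nearest whole number of problems."""
    return round(score * total)


for name, score in MAINCODER_SCORES.items():
    total = PROBLEM_COUNTS[name]
    print(f"{name}: about {solved_count(score, total)} of {total} problems")
```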

	# Model Overview

	Maincoder uses a modern transformer decoder architecture with:

	- Rotary Position Embeddings: With theta of 1,000,000.
	- RMSNorm: Pre-normalization for stable training.
	- Grouped Query Attention: 4:1 ratio of query to key-value heads.
	- QK Normalization: RMSNorm applied to attention queries and keys.
	- SwiGLU MLP: Gated linear units with SiLU activation.
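
The components listed above can be illustrated with a few lines of plain Python. This is a sketch of the textbook formulations, not the model's own code (which is loaded via `trust_remote_code` and may differ). The head counts and head dimension come from the attribute table in this section; the query-to-KV head layout and the `eps` value are assumptions.

```python
import math

NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 4
HEAD_DIM = 96
ROPE_THETA = 1_000_000.0


def kv_head_for(query_head: int) -> int:
    """Grouped-query attention: map a query head to the KV head it reads.

    Assumes consecutive query heads share a KV head (4 per group here).
    """
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS
    return query_head // group_size


def rope_inverse_frequencies(head_dim: int = HEAD_DIM,
                             theta: float = ROPE_THETA) -> list:
    """One rotation frequency per dimension pair: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rms_norm(x: list, weight: list, eps: float = 1e-6) -> list:
    """RMSNorm: divide by the root mean square, then apply a learned scale."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def swiglu(gate: list, up: list) -> list:
    """SwiGLU gating: SiLU(gate) multiplied elementwise with the up projection."""
    return [(g / (1.0 + math.exp(-g))) * u for g, u in zip(gate, up)]
```

With a theta of 1,000,000 the slowest rotation frequency is close to 1e-6, so the longest wavelength is far larger than the 2,048-token context.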

	| Attribute | Value |
	|-----------|-------|
	| Parameters | 1B |
	| Hidden Size | 1536 |
	| Layers | 32 |
	| Attention Heads | 16 (4 KV heads) |
	| Head Dimension | 96 |
	| Vocabulary Size | 151,936 |
	| Context Length | 2,048 |
	| Precision | bfloat16 |
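
As a worked example of what these attributes imply for memory, the sketch below estimates the key-value cache size for a single sequence. It assumes the cache stores one key tensor and one value tensor per layer in bfloat16 (2 bytes per value) and ignores framework overhead, so the figures are rough estimates.

```python
LAYERS = 32
KV_HEADS = 4
QUERY_HEADS = 16
HEAD_DIM = 96
CONTEXT_LENGTH = 2048
BYTES_PER_VALUE = 2  # bfloat16


def kv_cache_bytes(tokens: int, kv_heads: int = KV_HEADS) -> int:
    """Bytes needed to cache keys and values for `tokens` positions."""
    # Factor 2: one key tensor plus one value tensor in every layer.
    return 2 * LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * tokens


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(CONTEXT_LENGTH)
without_gqa = kv_cache_bytes(CONTEXT_LENGTH, kv_heads=QUERY_HEADS)

print(f"per token:    {per_token} bytes")                 # 49152 bytes
print(f"full context: {full_context / 2**20:.0f} MiB")    # 96 MiB
print(f"without GQA:  {without_gqa / 2**20:.0f} MiB")     # 384 MiB
```

Under these assumptions, sharing each KV head across four query heads cuts the cache to a quarter of what 16 independent KV heads would need.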

	# Usage

	### Installation

	```bash
	pip install transformers torch
	```

	### Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained(
	    "Maincode/Maincoder-1B",
	    torch_dtype="auto",
	    device_map="auto",
	    trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(
	    "Maincode/Maincoder-1B",
	    trust_remote_code=True,
	)

	# Code completion example
	prompt = '''def fibonacci(n: int) -> int:
	    """Return the n-th Fibonacci number."""
	'''

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=256,
	    temperature=0.2,
	    do_sample=True,
	)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	### Code Completion

	```python
	# Function completion
	prompt = '''def quicksort(arr: list) -> list:
	    """Sort a list using the quicksort algorithm."""
	'''

	# Class completion
	prompt = '''class BinarySearchTree:
	    """A binary search tree implementation."""

	    def __init__(self):
	'''

	# Algorithm implementation
	prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
	    """Find the shortest path using Dijkstra's algorithm.

	    Args:
	        graph: Adjacency list representation of the graph
	        start: Starting node
	        end: Target node

	    Returns:
	        Tuple of (distance, path)
	    """
	'''
	```
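
A completion model may keep generating past the end of the requested function. One common remedy is to cut the generated text at the first marker that starts a new top-level block. The sketch below illustrates that idea; the stop sequences and the helper name `truncate_completion` are hypothetical choices, not part of the model or its tokenizer.

```python
# Hypothetical stop sequences: text that usually begins a new top-level block.
STOP_SEQUENCES = ("\ndef ", "\nclass ", "\nif __name__", "\nprint(")


def truncate_completion(completion: str, stop_sequences=STOP_SEQUENCES) -> str:
    """Cut the completion at the earliest stop sequence, if any appears."""
    cut = len(completion)
    for stop in stop_sequences:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut]
```

The helper is meant for the newly generated text only, not the prompt. With the Quick Start variables, that text can be obtained by decoding `outputs[0][inputs["input_ids"].shape[1]:]` instead of `outputs[0]`.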

	# Additional Notes

	## Reproducibility

	<details>
	<summary>Model evaluations were run on 8 AMD MI355X GPUs using EleutherAI's <a href="https://github.com/EleutherAI/lm-evaluation-harness">lm-evaluation-harness</a>.</summary>

	```bash
	docker run --rm -it \
	    --device=/dev/kfd --device=/dev/dri --group-add=video \
	    --ipc=host --security-opt seccomp=unconfined \
	    -v $(pwd):/workspace -w /workspace \
	    -e HF_TOKEN \
	    -e PYTHONHASHSEED=0 \
	    -e TORCH_DETERMINISTIC=1 \
	    -e ROCBLAS_ATOMICS_MODE="0" \
	    -e MIOPEN_FIND_MODE="1" \
	    -e CUBLAS_WORKSPACE_CONFIG=":4096:8" \
	    -e HF_ALLOW_CODE_EVAL="1" \
	    rocm/pytorch:rocm7.1.1_ubuntu24.04_py3.12_pytorch_release_2.9.1 \
	    bash -c 'pip install "lm_eval[hf]" && \
	        accelerate launch -m lm_eval \
	        --model hf --model_args "pretrained=Maincode/Maincoder-1B,trust_remote_code=True,dtype=float32" \
	        --tasks humaneval,humaneval_plus,mbpp_plus,mmlu,gsm8k \
	        --device cuda:0 --batch_size 32 --seed 42 \
	        --confirm_run_unsafe_code'
	```

	</details>

	## Limitations

	- Context length is limited to 2,048 tokens.
	- Primarily optimized for Python; performance may vary on other languages.
	- May generate code with bugs or security issues, so always review generated code.
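
Because the prompt and the generated tokens share the 2,048-token context, long prompts need to be trimmed before generation. The following is a minimal sketch of one way to budget for this, assuming it is acceptable to drop the oldest prompt tokens; the helper name `fit_prompt` is hypothetical and operates on a plain list of token ids.

```python
CONTEXT_LENGTH = 2048


def fit_prompt(token_ids: list, max_new_tokens: int = 256,
               context_length: int = CONTEXT_LENGTH) -> list:
    """Keep the most recent prompt tokens so prompt plus output fits the context."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    # Negative slicing keeps the tail; shorter prompts are returned unchanged.
    return token_ids[-budget:]
```

With the default of 256 new tokens, the prompt budget is 1,792 tokens. For code completion, keeping the tail of the prompt preserves the lines closest to the insertion point.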

	<div style="margin-left:14px; border-left:4px solid #3b82f6; background:rgba(59,130,246,0.08); padding:8px 10px; border-radius:8px; font-size:0.92em; margin:10px 0;">
	<strong>Disclaimer</strong>: This model has <strong>not</strong> undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.
	</div>

	## License

	This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

	## Citation

	```bibtex
	@misc{maincoder2025,
	  title = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
	  author = {Maincode Team},
	  year = {2025},
	  organization = {Maincode},
	  howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
	}
	```

	## Contact

	For questions, issues, or collaboration inquiries, please visit [Maincode](https://maincode.com).