---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
pipeline_tag: text-generation
# base_model: Maincode/Maincoder-1B
---
<img src="https://huggingface.co/datasets/Maincode/assets/resolve/e51154e034201be1a5dad0e9c8de31d8b9f17643/maincoder_logo.png" alt="" width="1250">

[**Maincoder-1B**](https://maincode.com/maincoder/) is a code-focused language model optimized for code generation and completion tasks. It scores 0.7622 on HumanEval, 0.7256 on HumanEval+, and 0.7090 on MBPP+ while remaining compact enough for local deployment.

# Key Features

- **Code Generation**: Optimized for Python code completion and generation tasks.
- **Compact Size**: 1 billion parameters, lightweight enough to run on consumer hardware.
- **Deep Architecture**: Modern transformer architecture with RoPE embeddings, grouped-query attention, QK normalization, and a high depth-to-width ratio.
- **Advanced Data Mixing**: Pre-trained and mid-trained on custom data mixes developed for high-performance coding.
- **MCPO Algorithm**: Fine-tuned with a specialised reinforcement learning policy optimisation algorithm to improve training stability and accelerate convergence.
- **SOTA Performance**: State-of-the-art results on the Python coding benchmarks HumanEval, HumanEval+ and MBPP+ among the compared models in the 1B to 3B parameter range.
# Benchmark Results

<img src="https://huggingface.co/datasets/Maincode/assets/resolve/main/performance_h.png" alt="Benchmark Performance Across Baseline LLMs" width="1050">

| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
|---|---:|---:|---:|---:|---:|
| [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) | **0.7622** | **0.7256** | **0.7090** | 0.3054 | 0.2976 |
| [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
| [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 0.5366 | 0.5000 | 0.6799 | **0.5928** | 0.5505 |
| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 0.4024 | 0.3780 | 0.5582 | 0.5571 | **0.6865** |
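
To make the coding scores above more concrete, the sketch below converts the Maincoder-1B fractions into approximate counts of solved problems. It assumes the scores are pass@1 fractions and that the benchmark versions contain 164 problems (HumanEval, HumanEval+) and 378 problems (MBPP+); neither assumption is stated in this card, and `solved_count` is a hypothetical helper.

```python
# Problem counts are assumptions about the benchmark versions used.
PROBLEMS = {"HumanEval": 164, "HumanEval+": 164, "MBPP+": 378}

# Maincoder-1B scores copied from the benchmark table.
SCORES = {"HumanEval": 0.7622, "HumanEval+": 0.7256, "MBPP+": 0.7090}


def solved_count(score: float, total: int) -> int:
    """Convert a pass fraction into the nearest whole number of solved problems."""
    return round(score * total)


for name, score in SCORES.items():
    total = PROBLEMS[name]
    print(f"{name}: about {solved_count(score, total)} of {total} problems")
```

Under these assumptions the table corresponds to roughly 125, 119, and 268 solved problems respectively.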
# Model Overview

Maincoder uses a modern transformer decoder architecture with:

- **Rotary Position Embeddings**: With theta of 1,000,000.
- **RMSNorm**: Pre-normalization for stable training.
- **Grouped Query Attention**: 4:1 ratio of query to key-value heads.
- **QK Normalization**: RMSNorm applied to attention queries and keys.
- **SwiGLU MLP**: Gated linear units with SiLU activation.
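
The building blocks listed above can be sketched in a few lines of plain Python. This is a generic illustration of the standard formulas, not the model's actual implementation: the `eps` value, the function names, and the use of Python lists instead of tensors are all assumptions, and `swiglu` only shows the gating step (the linear projections around it are omitted).

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps is an assumed value
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """Gate the `up` projection elementwise with SiLU(gate)."""
    return [silu(g) * u for g, u in zip(gate, up)]


def rope_inv_freq(head_dim=96, theta=1_000_000.0):
    """Per-pair rotation frequencies theta^(-2i/d) used by rotary embeddings."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


normed = rms_norm([3.0, 4.0], [1.0, 1.0])
print(normed)                # values rescaled to a root mean square of about 1
print(len(rope_inv_freq()))  # one frequency per pair of head dimensions
```

With QK normalization, the same `rms_norm` operation would be applied to each query and key vector before the attention scores are computed.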
| Attribute | Value |
|-----------|-------|
| Parameters | 1B |
| Hidden Size | 1536 |
| Layers | 32 |
| Attention Heads | 16 (4 KV heads) |
| Head Dimension | 96 |
| Vocabulary Size | 151,936 |
| Context Length | 2,048 |
| Precision | bfloat16 |
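
The numbers in this table are related by simple arithmetic, which the following sketch works through. It assumes a conventional key-value cache (one key and one value vector per layer, per KV head, per token, stored in bfloat16); the actual memory use of a given runtime may differ.

```python
# Values copied from the attribute table.
HIDDEN_SIZE = 1536
LAYERS = 32
ATTENTION_HEADS = 16
KV_HEADS = 4
CONTEXT_LENGTH = 2048
VOCAB_SIZE = 151_936
BYTES_PER_VALUE = 2  # bfloat16

head_dim = HIDDEN_SIZE // ATTENTION_HEADS          # 96, matches the table
queries_per_kv_head = ATTENTION_HEADS // KV_HEADS  # 4, the 4:1 GQA ratio

# Assumed cache layout: keys and values for every layer and every KV head.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * head_dim * BYTES_PER_VALUE
kv_bytes_full_context = kv_bytes_per_token * CONTEXT_LENGTH

# Size of the input embedding matrix alone.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

print(head_dim, queries_per_kv_head)
print(kv_bytes_per_token // 1024, "KiB of KV cache per token")
print(kv_bytes_full_context // 2**20, "MiB of KV cache at full context")
print(f"{embedding_params:,} input-embedding parameters")
```

Under these assumptions, grouped-query attention keeps the cache at 48 KiB per token (96 MiB for a full 2,048-token context), a quarter of what 16 KV heads would need.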
# Usage

### Installation

```bash
# accelerate is needed for device_map="auto" in the example below
pip install transformers torch accelerate
```

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Maincode/Maincoder-1B",
    trust_remote_code=True,
)

# Code completion example: the model continues the function body.
prompt = '''def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.2,
    do_sample=True,
)

# The decoded text contains the prompt followed by the completion.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Code Completion

```python
# Each prompt below can replace `prompt` in the Quick Start example.

# Function completion
prompt = '''def quicksort(arr: list) -> list:
    """Sort a list using the quicksort algorithm."""
'''

# Class completion
prompt = '''class BinarySearchTree:
    """A binary search tree implementation."""
    def __init__(self):
'''

# Algorithm implementation
prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
    """Find the shortest path using Dijkstra's algorithm.
    Args:
        graph: Adjacency list representation of the graph
        start: Starting node
        end: Target node
    Returns:
        Tuple of (distance, path)
    """
'''
```
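
A completion model may keep generating past the end of the requested function. One common way to handle this is to cut the generated text at the first top-level construct that follows it. The sketch below is a minimal version of that idea; the stop sequences and the `truncate_completion` helper are hypothetical choices for illustration, not something this model card prescribes, and the input is assumed to be the generated text without the prompt.

```python
# Hypothetical stop sequences: top-level constructs that usually mean the
# model has moved past the function it was asked to complete.
STOP_SEQUENCES = ("\ndef ", "\nclass ", "\nif __name__", "\nprint(", "\n#")


def truncate_completion(completion: str, stop_sequences=STOP_SEQUENCES) -> str:
    """Cut the completion at the earliest stop sequence, if any occurs."""
    cut = len(completion)
    for stop in stop_sequences:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut]


# Example: a function body followed by an unrelated definition.
generated = (
    "    if n < 2:\n"
    "        return n\n"
    "    return fibonacci(n - 1) + fibonacci(n - 2)\n"
    "\ndef unrelated():\n"
    "    pass\n"
)
print(truncate_completion(generated))
```

Only the indented function body is kept; the trailing `unrelated` definition is dropped.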
# Additional Notes

## Reproducibility

<details>
<summary>Model evaluations were run on 8 AMD MI355X GPUs with the <a href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI lm-evaluation-harness</a>.</summary>

```bash
docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri --group-add=video \
  --ipc=host --security-opt seccomp=unconfined \
  -v $(pwd):/workspace -w /workspace \
  -e HF_TOKEN \
  -e PYTHONHASHSEED=0 \
  -e TORCH_DETERMINISTIC=1 \
  -e ROCBLAS_ATOMICS_MODE="0" \
  -e MIOPEN_FIND_MODE="1" \
  -e CUBLAS_WORKSPACE_CONFIG=":4096:8" \
  -e HF_ALLOW_CODE_EVAL="1" \
  rocm/pytorch:rocm7.1.1_ubuntu24.04_py3.12_pytorch_release_2.9.1 \
  bash -c 'pip install "lm_eval[hf]" && \
    accelerate launch -m lm_eval \
      --model hf --model_args "pretrained=Maincode/Maincoder-1B,trust_remote_code=True,dtype=float32" \
      --tasks humaneval,humaneval_plus,mbpp_plus,mmlu,gsm8k \
      --device cuda:0 --batch_size 32 --seed 42 \
      --confirm_run_unsafe_code'
```

</details>
## Limitations

- Context length is limited to 2,048 tokens.
- Primarily optimized for Python; performance may vary on other languages.
- May generate code with bugs or security issues, so always review generated code.

<div style="margin-left:14px; border-left:4px solid #3b82f6; background:rgba(59,130,246,0.08); padding:8px 10px; border-radius:8px; font-size:0.92em; margin:10px 0;">
<strong>Disclaimer</strong>: This model has <strong>not</strong> undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.
</div>
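
Because of the 2,048-token context limit noted above, long prompts may need to be trimmed before generation. The sketch below shows one possible approach, assuming that prompt tokens and newly generated tokens share the same context window and that the most recent tokens are the most useful for completion. `fit_prompt` is a hypothetical helper operating on a plain list of token ids.

```python
CONTEXT_LENGTH = 2048  # context length stated in this model card


def fit_prompt(token_ids, max_new_tokens=256, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt plus generation fits the context."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Slicing from the end keeps the code closest to the completion point.
    return token_ids[-budget:]


long_prompt = list(range(3000))  # stand-in for real token ids
trimmed = fit_prompt(long_prompt)
print(len(trimmed))
```

Dropping tokens from the start is only one option; for code, it may be preferable to cut at a function or file boundary instead.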
## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

```bibtex
@misc{maincoder2025,
  title = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
  author = {Maincode Team},
  year = {2025},
  organization = {Maincode},
  howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
}
```

## Contact

For questions, issues, or collaboration inquiries, please visit [Maincode](https://maincode.com).