CodeRankEmbed-f16
An f16 (half-precision) cast of
nomic-ai/CodeRankEmbed
(the 137M-parameter NomicBert bi-encoder for code retrieval) in safetensors
format, intended for GPU inference (e.g. candle on Apple Silicon Metal) at
roughly half the memory of the f32 base model.
This repo contains weights only; the architecture is identical. Every
tensor is the corresponding base-model tensor cast from f32 to f16, with
tensor names and shapes unchanged. Use it exactly like the base model
(same config.json, tokenizer.json, CLS pooling, and the required query
instruction prefix).
Why
The base repo ships f32 safetensors (~547 MB). On the Metal GPU, the
f16 weights halve both the memory working set and the memory bandwidth
consumed by matmuls, with no measurable change to retrieval quality (see
Validation below), so this is the form used by
embedding-search on Apple Silicon.
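A back-of-the-envelope check of the size claim, assuming the rounded 137M parameter count quoted above and ignoring the small safetensors JSON header: f32 stores 4 bytes per weight and f16 stores 2.

```python
PARAMS = 137_000_000  # approximate parameter count quoted in this card


def weights_mb(params: int, bytes_per_param: int) -> float:
    """Raw weight storage in megabytes (10**6 bytes), excluding file headers."""
    return params * bytes_per_param / 1e6


f32_mb = weights_mb(PARAMS, 4)  # 4 bytes per f32 weight
f16_mb = weights_mb(PARAMS, 2)  # 2 bytes per f16 weight

print(f"f32: {f32_mb:.0f} MB, f16: {f16_mb:.0f} MB, ratio {f16_mb / f32_mb:.2f}")
# → f32: 548 MB, f16: 274 MB, ratio 0.50
```

The ~547 MB f32 file matches this estimate to within the rounding of the parameter count. Peak RSS in the validation table also counts process memory outside the weight tensors, so it is larger in absolute terms, but its f16/f32 ratio (570/1116 ≈ 0.51) follows the same halving.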
Validation (f16 vs f32, CodeSearchNet Python, N=300)
Same code and corpus; dtype is the only difference:
| dtype | peak RSS | MRR@10 | Recall@1 |
|---|---|---|---|
| f32 (base) | 1116 MB | 0.9573 | 0.9367 |
| f16 (this) | 570 MB | 0.9573 | 0.9367 |
- per-document cosine(f16, f32): mean 0.999998, min 0.999996
- top-1 retrieval agreement, f16 vs f32: 1.0000
- MRR@10 / Recall@1 deltas: 0.0000
On this benchmark, f16 is effectively a numerical no-op for retrieval at about half the peak RAM. (The absolute MRR is high because the eval uses a small 300-document distractor pool; it is an f16-vs-f32 parity check, not a full CodeSearchNet reproduction of the base model's published score.)
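For reference, a minimal sketch of how MRR@10, Recall@1, and top-1 agreement are commonly computed, assuming exactly one relevant document per query (as with CodeSearchNet query/code pairs). The document ids and rankings are hypothetical toy data, not the eval's actual outputs, and the helper names are illustrative.

```python
def reciprocal_rank_at_k(ranking: list[str], relevant: str, k: int = 10) -> float:
    """1/rank of the relevant doc if it appears in the top k, else 0."""
    for rank, doc_id in enumerate(ranking[:k], start=1):
        if doc_id == relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings: list[list[str]], relevants: list[str], k: int = 10) -> float:
    """Mean reciprocal rank over all queries, truncated at k."""
    total = sum(reciprocal_rank_at_k(r, g, k) for r, g in zip(rankings, relevants))
    return total / len(rankings)


def recall_at_1(rankings: list[list[str]], relevants: list[str]) -> float:
    """Fraction of queries whose top-ranked doc is the relevant one."""
    return sum(r[0] == g for r, g in zip(rankings, relevants)) / len(rankings)


def top1_agreement(rankings_a: list[list[str]], rankings_b: list[list[str]]) -> float:
    """Fraction of queries where two models rank the same doc first."""
    return sum(a[0] == b[0] for a, b in zip(rankings_a, rankings_b)) / len(rankings_a)


# Hypothetical best-first rankings for three queries.
relevant = ["d1", "d2", "d3"]
f32_rankings = [["d1", "d5"], ["d4", "d2"], ["d9", "d8", "d3"]]
f16_rankings = [["d1", "d5"], ["d4", "d2"], ["d9", "d3", "d8"]]

print(f"MRR@10   f32={mrr_at_k(f32_rankings, relevant):.4f} "
      f"f16={mrr_at_k(f16_rankings, relevant):.4f}")
print(f"Recall@1 f32={recall_at_1(f32_rankings, relevant):.4f} "
      f"f16={recall_at_1(f16_rankings, relevant):.4f}")
print(f"top-1 agreement: {top1_agreement(f32_rankings, f16_rankings):.4f}")
```

In this toy case top-1 agreement is 1.0 while MRR@10 still differs, because a lower-ranked hit moved; that is why reporting the MRR@10 delta alongside top-1 agreement, as above, is a stronger parity check than either number alone.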
Usage
The query must use the task instruction prefix (same as the base model); code/documents get no prefix:
`Represent this query for searching relevant code: <your query>`
Take the CLS-token vector from the last hidden state, L2-normalize it, and rank documents by cosine similarity (which, for normalized vectors, is just the dot product).
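The sketch below covers only the steps around the encoder: prefixing the query, CLS pooling, L2 normalization, and cosine ranking. It assumes you already have each text's last hidden state as a list of per-token vectors with the CLS token at position 0 (as with BERT-style tokenizers); the tiny 3-dimensional hidden states and all helper names are hypothetical stand-ins for real model output, and running the model itself (e.g. with candle) is out of scope.

```python
import math

QUERY_PREFIX = "Represent this query for searching relevant code: "


def format_query(query: str) -> str:
    """Queries get the instruction prefix; code/documents are embedded as-is."""
    return QUERY_PREFIX + query


def cls_pool(last_hidden_state: list[list[float]]) -> list[float]:
    """CLS pooling: keep the first token's vector and ignore the rest."""
    return last_hidden_state[0]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_by_cosine(query_vec: list[float], doc_vecs: list[list[float]]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs sorted best-first.

    Inputs must already be L2-normalized, so cosine reduces to a dot product.
    """
    scores = [(i, sum(q * d for q, d in zip(query_vec, doc))) for i, doc in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical hidden states: one row per token, CLS first (real vectors are much wider).
query_hidden = [[0.9, 0.1, 0.0], [0.2, 0.2, 0.2]]
doc_hiddens = [
    [[0.1, 0.9, 0.0], [0.5, 0.5, 0.5]],  # doc 0
    [[1.8, 0.3, 0.1], [0.0, 0.1, 0.0]],  # doc 1
]

print(format_query("parse a CSV file"))  # the text you would actually tokenize
q = l2_normalize(cls_pool(query_hidden))
docs = [l2_normalize(cls_pool(h)) for h in doc_hiddens]
ranking = rank_by_cosine(q, docs)
print(ranking)  # doc 1 ranks first; its CLS vector points the same way as the query's
```

Normalizing once at embedding time lets an index store unit vectors and score with plain dot products, which is the usual design for cosine retrieval.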
Provenance & license
Produced by a pure dtype cast (on CPU, with candle) of
nomic-ai/CodeRankEmbed's model.safetensors; config.json and
tokenizer.json are copied unchanged. The model inherits the base model's MIT
license. Credit and citation belong to the original authors; see the
base model card and
the CoRNStack paper (arXiv:2412.01007).
Base model (per the Hugging Face model tree): Snowflake/snowflake-arctic-embed-m-long