# Natasha/Navec news_v1 Model Card
This repository provides a Sentence-Transformers version of the Natasha/Navec news_v1 word embeddings model.
The underlying Navec embedding weights are unchanged. This revision adds an explicit `Normalize` module after `StaticEmbedding`, so `model.encode(...)` returns L2-normalized sentence embeddings by default.
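Conceptually, the pipeline mean-pools the static word vectors of a sentence and the added `Normalize` module then divides by the L2 norm. A minimal numpy sketch of that post-processing step (the vocabulary and 4-d vectors below are toy stand-ins, not the actual 300-d Navec weights):

```python
import numpy as np

# Toy 4-dimensional "word embeddings" standing in for the real 300-d Navec vectors.
word_vectors = {
    "хорошая": np.array([0.2, -0.1, 0.4, 0.0]),
    "погода": np.array([0.1, 0.3, -0.2, 0.5]),
}

def encode(tokens):
    """Mean-pool the token vectors, then L2-normalize (the Normalize step)."""
    pooled = np.mean([word_vectors[t] for t in tokens], axis=0)
    return pooled / np.linalg.norm(pooled)

emb = encode(["хорошая", "погода"])
print(np.linalg.norm(emb))  # ~1.0
```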
## Source
The original word embeddings come from the Navec project:
- Repository: https://github.com/natasha/navec
- Authors: Natasha NLP project
- License: MIT License
Navec is a compact and efficient set of Russian word embeddings trained on Russian corpora.
## Usage

```shell
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("BorisTM/natasha_navec_news_v1_1B_250K_300d_100q")

sentences = [
    "Сегодня хорошая погода.",   # "The weather is nice today."
    "На улице солнечно.",        # "It is sunny outside."
    "Команда выиграла матч.",    # "The team won the match."
]

embeddings = model.encode(sentences)
print(embeddings.shape)                     # (3, 300)
print(np.linalg.norm(embeddings, axis=1))   # close to 1.0

similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
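Because the returned embeddings are already unit-length, cosine similarity reduces to a plain dot product, so `embeddings @ embeddings.T` yields the same matrix as the model's similarity call. A small numpy illustration with made-up unit vectors:

```python
import numpy as np

# Two made-up L2-normalized embeddings (unit vectors).
a = np.array([0.6, 0.8])
b = np.array([1.0, 0.0])

# For unit vectors the denominator is 1, so cosine similarity equals the dot product.
cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(cos, np.dot(a, b)))  # True
```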
## Results
Results on MTEB (rus, v1.1), evaluated with normalized embeddings. Scores are percentages.
| Task | navec_hudlit_v1_12B_500K_300d_100q | navec_news_v1_1B_250K_300d_100q |
|---|---|---|
| Mean (Task, 23 tasks) | 36.94 | 36.31 |
| Mean (Task Type) | 34.59 | 34.23 |
| CEDRClassification | 34.14 | 32.81 |
| GeoreviewClassification | 33.09 | 34.03 |
| GeoreviewClusteringP2P | 34.10 | 28.78 |
| HeadlineClassification | 55.33 | 63.47 |
| InappropriatenessClassification | 53.39 | 53.15 |
| KinopoiskClassification | 45.25 | 44.89 |
| MassiveIntentClassification | 48.54 | 43.86 |
| MassiveScenarioClassification | 55.20 | 49.88 |
| MIRACLReranking | 10.88 | 10.88 |
| MIRACLRetrievalHardNegatives.v2 | 1.75 | 1.60 |
| RiaNewsRetrievalHardNegatives.v2 | 15.43 | 23.47 |
| RuBQReranking | 38.00 | 37.71 |
| RuBQRetrieval | 5.80 | 5.09 |
| RUParaPhraserSTS | 41.38 | 41.12 |
| RuReviewsClassification | 49.35 | 48.80 |
| RuSciBenchGRNTIClassification | 43.63 | 40.54 |
| RuSciBenchGRNTIClusteringP2P | 40.94 | 38.43 |
| RuSciBenchOECDClassification | 35.62 | 32.70 |
| RuSciBenchOECDClusteringP2P | 36.89 | 33.98 |
| SensitiveTopicsClassification | 19.49 | 18.57 |
| STS22 | 50.20 | 51.57 |
| TERRa | 52.57 | 53.99 |
| RuSTSBenchmarkSTS | 48.59 | 45.84 |
Evaluation artifacts for this update are stored locally in the article workspace under `data/metrics/navec_baselines_mteb_rus_v1_1_summary.csv` and `data/metrics/navec_baselines_mteb_rus_v1_1_task_scores.csv`.
## License
MIT
## Contact

Telegram: @btmalov