Feature request
I got the following error while trying to optimise this embedding model with ONNX:
```
KeyError: 'gemma3_text model type is not supported yet in NormalizedConfig. Only albert, bart, bert, big_bird, bigbird_pegasus, blenderbot, blenderbot-small, bloom, falcon, camembert, codegen, cvt, deberta, deberta-v2, deit, dinov2, distilbert, donut-swin, electra, encoder-decoder, gemma, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gptj, imagegpt, internlm2, llama, longt5, marian, markuplm, mbart, mistral, mixtral, modernbert, mpnet, mpt, mt5, m2m_100, nystromformer, olmo, olmo2, opt, pegasus, pix2struct, phi, phi3, poolformer, regnet, resnet, roberta, segformer, speech_to_text, splinter, t5, trocr, vision-encoder-decoder, vit, whisper, xlm-roberta, yolos, qwen2, qwen3, qwen3_moe, smollm3, granite, clip are supported. If you want to support gemma3_text please propose a PR or open up an issue.'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/louis/lab/emb/tarka/.venv/bin/optimum-cli", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "/home/louis/lab/emb/tarka/.venv/lib/python3.13/site-packages/optimum/commands/optimum_cli.py", line 219, in main
    service.run()
    ~~~~~~~~~~~^^
  File "/home/louis/lab/emb/tarka/.venv/lib/python3.13/site-packages/optimum/commands/export/onnx.py", line 264, in run
    main_export(
    ~~~~~~~~~~~^
        model_name_or_path=self.args.model,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<20 lines>...
        **input_shapes,
        ^^^^^^^^^^^^^^^
    )
    ^
  File "/home/louis/lab/emb/tarka/.venv/lib/python3.13/site-packages/optimum/exporters/onnx/__main__.py", line 399, in main_export
    onnx_export_from_model(
    ~~~~~~~~~~~~~~~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
    ...<18 lines>...
        **kwargs_shapes,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/louis/lab/emb/tarka/.venv/lib/python3.13/site-packages/optimum/exporters/onnx/convert.py", line 1096, in onnx_export_from_model
    optimizer = ORTOptimizer.from_pretrained(output, file_names=onnx_files_subpaths)
  File "/home/louis/lab/emb/tarka/.venv/lib/python3.13/site-packages/optimum/onnxruntime/optimization.py", line 126, in from_pretrained
    return cls(onnx_model_path, config=config, from_ortmodel=from_ortmodel)
  File "/home/louis/lab/emb/tarka/.venv/lib/python3.13/site-packages/optimum/onnxruntime/optimization.py", line 75, in __init__
    raise NotImplementedError(
    ...<2 lines>...
    )
NotImplementedError: Tried to use ORTOptimizer for the model type gemma3_text, but it is not available yet. Please open an issue or submit a PR at https://github.com/huggingface/optimum.
```
Motivation
Inference with the resulting embeddings is very fast on GPU but about 20x slower on CPU! This is presumably due to the lack of graph optimisation, since `-O2` and `-O3` fail with the error above.
Your contribution
I can look into it, but the error message said to report it here first!
To reproduce:

```shell
# Try exporting with specific CPU optimization
uv pip install sentence-transformers "optimum[onnxruntime-gpu]"
optimum-cli export onnx \
  --model Tarka-AIR/Tarka-Embedding-150M-V1 \
  --task feature-extraction \
  --optimize O3 \
  tarka-150m-v1-onnx-o3/
```