BBoxes are super weird
Hi! Thank you for releasing Infinity-Parser2-Pro.
I am testing the model for document layout parsing and noticed an issue with bounding boxes on some real documents.
For example, I send a non-square document image to the model, e.g. 1280x923 px. The raw model output returns bboxes in normalized [0..1000] coordinates, which seems consistent with the official Python package postprocessing (restore_abs_bbox_coordinates, scaling bbox coordinates from 0..1000 back to image pixels).
However, after applying the same scaling formula:
x_px = x / 1000 * image_width
y_px = y / 1000 * image_height
some bounding boxes still look inaccurate. They are often shifted or too large, especially around tables, flowcharts, dense text areas, and structured documents. In some cases the boxes look more like rough layout regions than precise text/element boxes.
Could you please clarify:
- Are raw bbox coordinates always expected to be normalized to a 1000x1000 coordinate space?
- Should they be scaled independently by image width and image height, as done in restore_abs_bbox_coordinates?
- Are the bboxes intended to be precise text/element boxes, or only approximate layout regions?
- Is there a recommended prompt or postprocessing step to improve bbox accuracy?
- For non-square images, is there any internal resize/letterbox/crop behavior that should be accounted for when mapping bboxes back to pixels?
I can share examples where the image is 1280x923, the model returns bbox values like [53, 103, 109, 144], and scaling them to pixels mostly follows the document layout, but the resulting boxes are still visibly misaligned or overly broad.
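For reference, the mapping described above can be sketched as follows. This is a minimal illustration, not the package's actual restore_abs_bbox_coordinates code: the helper name scale_bbox is hypothetical, and the [x1, y1, x2, y2] corner order is an assumption about the output format.

```python
def scale_bbox(bbox, image_width, image_height, norm=1000):
    """Map a bbox from the normalized 0..norm space to pixel coordinates.

    Assumes bbox is [x1, y1, x2, y2]; x and y are scaled independently
    by the image width and height.
    """
    x1, y1, x2, y2 = bbox
    return (
        x1 / norm * image_width,
        y1 / norm * image_height,
        x2 / norm * image_width,
        y2 / norm * image_height,
    )


# The example values from this post: a 1280x923 image.
box = scale_bbox([53, 103, 109, 144], 1280, 923)
print([round(v, 1) for v in box])  # [67.8, 95.1, 139.5, 132.9]
```

Under these assumptions the box comes out roughly 72 px wide and 38 px tall, with its top-left corner near (68, 95).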
Thanks!
Thanks for the detailed report and for testing Infinity-Parser2-Pro!
To address your questions:
- The coordinate pipeline is correct. The model predicts bounding boxes in a 1000×1000 normalized coordinate space, and postprocessing scales them back to pixel coordinates using x_px = x / 1000 * image_width and y_px = y / 1000 * image_height. This is the same strategy used by Qwen3-VL and Qwen3.5.
- The model was primarily trained on high-resolution document images. For lower-resolution inputs (such as your 648 × 781 example), localization accuracy may degrade, which can result in shifted or overly broad bounding boxes — especially around dense regions like tables, flowcharts, and structured layouts. We plan to address this in the next release.
- Recommended workaround. As a temporary fix, we recommend upscaling the image so that its longer side is at least 1000 pixels before passing it to the model. This should meaningfully improve bounding box accuracy for lower-resolution inputs. We tested your example by resizing the image to 923×1280 (long side ≥ 1000), and the bounding boxes were accurate.
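One way to pick the target size for the upscaling workaround is sketched below. The helper name upscaled_size and the choice to preserve the aspect ratio are assumptions for illustration, not part of the official package. The resize itself can then be done with any image library.

```python
def upscaled_size(width, height, min_long_side=1000):
    """Return a (width, height) whose longer side is at least min_long_side.

    Images that are already large enough are returned unchanged; smaller
    ones are scaled up uniformly so the aspect ratio is preserved.
    """
    long_side = max(width, height)
    if long_side >= min_long_side:
        return width, height
    scale = min_long_side / long_side
    return round(width * scale), round(height * scale)


# The 648 x 781 input mentioned above.
print(upscaled_size(648, 781))  # (830, 1000)
```

Because the predicted coordinates are normalized to 0..1000, a uniform resize like this should not change how they map back: the boxes can still be scaled by the original image's width and height.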
We appreciate you surfacing this with concrete examples — feedback like this directly informs our next iteration.


