[Bug]: Max Tokens not being honoured in Chat Completions for GPTOSS model

### Your current environment

It seems that in recent versions of vLLM (0.11+), Chat Completions has stopped honouring `max_tokens` with the GPT-OSS 120B model. The request payload below no longer works when `max_tokens` is set. Earlier, the same payload returned output truncated at the `max_tokens` limit.

Interestingly, the `usage` block in the response reports `completion_tokens` as 500, but the output is blank.

```json
{
  "messages": [
    {
      "role": "user",
      "content": "What is the role of AI in medicine?"
    }
  ],
  "model": "openai/gpt-oss-120b",
  "max_tokens": 500,
  "reasoning": {"effort": "low"},
  "stream": false
}
```
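
For anyone trying to reproduce this, here is a minimal sketch that sends the payload above using only the Python standard library. The endpoint `http://localhost:8000/v1/chat/completions` is an assumption about the deployment, and `build_payload` and `post_chat` are hypothetical helper names for illustration, not part of vLLM.

```python
import json
import urllib.request

# Assumed address of a locally running OpenAI-compatible vLLM server; adjust as needed.
CHAT_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(max_tokens=None):
    """Build the request body from this report; leave out max_tokens when it is None."""
    payload = {
        "messages": [
            {"role": "user", "content": "What is the role of AI in medicine?"}
        ],
        "model": "openai/gpt-oss-120b",
        "reasoning": {"effort": "low"},
        "stream": False,
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


def post_chat(payload, url=CHAT_URL, timeout=600):
    """POST the payload as JSON and return the decoded response body as a dict."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_chat(build_payload(500))` and `post_chat(build_payload())` against a running server should give the two cases compared in this report (with and without `max_tokens`).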

The output is blank (`content` is `null`), even though `usage` reports a `completion_tokens` count equal to `max_tokens`:

```json
{
  "id": "chatcmpl-c71e934ac0b74bd4b8f99fe9b5516ea3",
  "object": "chat.completion",
  "created": 1764300020,
  "model": "openai/gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": "Need to answer.",
        "reasoning_content": "Need to answer."
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 78,
    "total_tokens": 578,
    "completion_tokens": 500,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
```
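
The symptom in the response above can be checked mechanically. The sketch below flags a response that hit the length limit without returning any visible content. `summarize_choice` and `is_blank_truncation` are hypothetical helper names, and `observed` is a trimmed copy of the response above that keeps only the fields the helpers read.

```python
def summarize_choice(response):
    """Pull out the fields relevant to this report from a chat.completion dict."""
    choice = response["choices"][0]
    message = choice["message"]
    return {
        "finish_reason": choice.get("finish_reason"),
        "completion_tokens": response["usage"]["completion_tokens"],
        # content is null in the failing case, so treat None as an empty string
        "content_chars": len(message.get("content") or ""),
        "reasoning_chars": len(message.get("reasoning_content") or ""),
    }


def is_blank_truncation(response):
    """True when generation stopped at the length limit but content is empty."""
    summary = summarize_choice(response)
    return summary["finish_reason"] == "length" and summary["content_chars"] == 0


# Trimmed copy of the failing response from this report.
observed = {
    "choices": [
        {
            "message": {"content": None, "reasoning_content": "Need to answer."},
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 78, "total_tokens": 578, "completion_tokens": 500},
}

print(summarize_choice(observed))
print(is_blank_truncation(observed))
```

Note that the only visible text in this response is the short `reasoning_content` string, while `usage` still reports 500 completion tokens.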

When `max_tokens` is removed, the response contains the full output, and `usage` reports `completion_tokens` of around 1600 (1665 in the response below).
It seems that starting with vLLM 0.11+, truncating the output at `max_tokens` has stopped working:

```json
{
 "id": "chatcmpl-61b60144d43147e2b007158712ad4920",
 "object": "chat.completion",
 "created": 1764300423,
 "model": "openai/gpt-oss-120b",
 "choices": [
 {
 "index": 0,
 "message": {
 "role": "assistant",
 "content": "**The role of AI in medicine is expanding rapidly and touches virtually every aspect of healthcare—from the way doctors diagnose patients to how hospitals run their operations.** Below is a structured overview that covers the major domains, concrete examples, benefits, challenges, and future directions.\n\n---\n\n## 1. Clinical Care\n\n| Sub‑area | What AI Does | Real‑World Examples | Benefits |\n|----------|--------------|---------------------|----------|\n| **Diagnostics** | Image analysis, pattern recognition, risk stratification | • Radiology: Google DeepMind’s AI detects lung cancer on CT scans with >95% accuracy. • Dermatology: FDA‑cleared apps (e.g., SkinVision) classify skin lesions from photos. • Pathology: Paige.ai assists in detecting prostate cancer in biopsy slides. | Faster, more consistent readings; can catch subtle findings that human eyes miss. |\n| **Predictive Analytics** | Forecast disease onset, complications, readmission risk | • Sepsis prediction models (e.g., Epic Sepsis Model) trigger alerts hours before clinical signs. • Cardiovascular risk calculators incorporating genomics and wearables. | Enables proactive interventions, reduces morbidity and cost. |\n| **Treatment Planning** | Decision support, dose optimisation, drug selection | • IBM Watson for Oncology (clinical trial matching). • Radiation oncology: AI‑driven dose‑painting to spare healthy tissue. • Pharmacogenomics: AI predicts drug‑gene interactions. | Personalises therapy, improves outcomes, reduces adverse events. |\n| **Robotics & Minimally Invasive Surgery** | Real‑time image guidance, autonomous suturing, task automation | • Da Vinci Surgical System (augmented with AI for instrument tracking). • VERDICT AI for autonomous suturing in animal models. | Increases precision, reduces surgeon fatigue, shortens recovery. |\n\n---\n\n## 2. 
Patient‑Facing Applications\n\n| Application | Description | Example |\n|-------------|-------------|---------|\n| **Virtual Assistants & Chatbots** | Symptom triage, medication reminders, mental‑health chat | • Babylon Health (AI‑driven triage). • Woebot (CBT‑based mental‑health chatbot). |\n| **Telemedicine Enhancements** | Real‑time vitals extraction from video, automated note‑taking | • KardiaMobile ECG integration with AI‑based arrhythmia detection. |\n| **Wearables & Remote Monitoring** | Continuous data streams analysed for early alerts | • Apple Watch ECG + AI arrhythmia detection; Fitbit heart‑rate trend alerts. |\n\n---\n\n## 3. Operational & Administrative Efficiency\n\n| Domain | AI Functions | Example |\n|--------|--------------|---------|\n| **Scheduling & Resource Allocation** | Predictive staffing, OR utilization optimisation | • Qventus AI platform reduces ER wait times by 30% in pilot sites. |\n| **Revenue Cycle Management** | Claim coding validation, fraud detection | • Change Healthcare’s AI coding assistant. |\n| **Supply Chain** | Demand forecasting for meds, PPE | • GE Healthcare’s AI‐driven inventory management. |\n\n---\n\n## 4. Research & Drug Development\n\n| Stage | AI Contribution | Notable Projects |\n|-------|----------------|------------------|\n| **Target Identification** | Deep learning on genomics & proteomics | • Insilico Medicine discovered a novel DDR1 inhibitor in 46 days. |\n| **Compound Screening** | Virtual screening of billions of molecules | • Atomwise’s AI screened 10M compounds for COVID‑19 antivirals. |\n| **Clinical Trial Design** | Patient‑centering enrollment, adaptive trial simulations | • Deep 6 AI matches patients to trials with 4× higher enrollment speed. |\n\n---\n\n## 5. Public Health & Population Health\n\n* **Epidemiology** – AI models (e.g., BlueDot) flagged COVID‑19 spread days before WHO alerts. \n* **Health Equity** – Bias‑aware algorithms identify underserved populations for targeted interventions. 
\n* **Surveillance** – AI parses social‑media, EMS calls, and wastewater data for outbreak detection.\n\n---\n\n## 6. Benefits at a Glance\n\n| Dimension | Impact |\n|-----------|--------|\n| **Speed** | Real‑time image and data processing → quicker diagnosis. |\n| **Accuracy** | Reduced inter‑observer variability; higher sensitivity/specificity. |\n| **Scalability** | Extends specialist expertise to remote or low‑resource settings. |\n| **Cost Savings** | Preventive alerts lower expensive complications; automation cuts labor costs. |\n| **Personalisation** | Tailors treatment to genetic, lifestyle, and environmental factors. |\n\n---\n\n## 7. Key Challenges & Risks\n\n1. **Data Quality & Bias** \n - Training data often lacks diversity → risk of health disparities. \n - Need rigorous bias‑mitigation pipelines (e.g., fairness metrics, adversarial debiasing).\n\n2. **Interpretability & Trust** \n - Black‑box models hinder clinician acceptance. \n - Emerging solutions: Explainable AI (XAI) dashboards, attention maps, counterfactual explanations.\n\n3. **Regulatory & Legal Landscape** \n - FDA’s “Software as a Medical Device (SaMD)” pathways, EU’s AI Act, and emerging global standards. \n - Liability unclear when AI recommendations lead to harm.\n\n4. **Integration with Clinical Workflow** \n - Alert fatigue, EMR incompatibility, and need for seamless UI/UX. \n - Human‑in‑the‑loop design is critical.\n\n5. **Data Privacy & Security** \n - HIPAA, GDPR, and emerging “AI‑specific” regulations require robust de‑identification and federated learning techniques.\n\n---\n\n## 8. Future Outlook (Next 5‑10 Years)\n\n| Trend | What to Expect |\n|-------|----------------|\n| **Federated & Edge AI** | Models trained on device (e.g., wearables) without moving PHI, preserving privacy. |\n| **Multimodal Foundation Models** | Large language/vision models (e.g., MedPaLM, ClinicalBERT‑2) that ingest notes, imaging, labs simultaneously for holistic suggestions. 
|\n| **AI‑driven Clinical Trials** | Real‑time adaptive designs powered by continuous data streams, shrinking development timelines. |\n| **Digital Twins of Patients** | Simulated virtual patients for therapy testing, surgical planning, and disease progression forecasting. |\n| **AI Governance Frameworks** | Standardized audit trails, certification bodies (e.g., ISO 82304‑2), and “AI ethics boards” embedded in hospitals. |\n\n---\n\n## 9. Practical Take‑aways for Stakeholders\n\n| Role | Actionable Steps |\n|------|------------------|\n| **Clinicians** | • Start with FDA‑cleared decision‑support tools. • Participate in model validation studies. • Keep a “human‑in‑the‑loop” mindset. |\n| **Hospital Administrators** | • Conduct ROI analyses for AI pilots. • Build a cross‑functional AI governance committee. • Invest in data infrastructure (FHIR, interoperable APIs). |\n| **Patients** | • Ask providers how AI influences their care. • Review consent forms for data use. • Use FDA‑approved consumer health apps and verify privacy policies. |\n| **Developers / Researchers** | • Prioritize diverse datasets and bias testing. • Implement explainability from day one. • Align with regulatory pathways early (e.g., pre‑submissions to FDA). |\n\n---\n\n### TL;DR\n\nAI is reshaping medicine across **diagnosis, treatment planning, surgery, patient engagement, operations, research, and public health**. It brings speed, accuracy, scalability, and personalization, but it also raises challenges around bias, interpretability, regulation, workflow integration, and privacy. Successful adoption will hinge on thoughtful governance, transparent models, and a collaborative “human‑AI partnership.”",
 "refusal": null,
 "annotations": null,
 "audio": null,
 "function_call": null,
 "tool_calls": [],
 "reasoning": "Need to answer.",
 "reasoning_content": "Need to answer."
 },
 "logprobs": null,
 "finish_reason": "stop",
 "stop_reason": null,
 "token_ids": null
 }
 ],
 "service_tier": null,
 "system_fingerprint": null,
 "usage": {
 "prompt_tokens": 78,
 "total_tokens": 1743,
 "completion_tokens": 1665,
 "prompt_tokens_details": null
 },
 "prompt_logprobs": null,
 "prompt_token_ids": null,
 "kv_transfer_params": null
}
```
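
As a quick sanity check on the numbers, the sketch below compares the two `usage` blocks from this report. The helper names `usage_is_consistent` and `tokens_over_limit` are hypothetical, and the values are copied from the responses above.

```python
# usage blocks copied from the two responses in this report
with_limit = {"prompt_tokens": 78, "total_tokens": 578, "completion_tokens": 500}
without_limit = {"prompt_tokens": 78, "total_tokens": 1743, "completion_tokens": 1665}


def usage_is_consistent(usage):
    """Check that prompt_tokens + completion_tokens adds up to total_tokens."""
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


def tokens_over_limit(usage, max_tokens):
    """How many completion tokens exceed max_tokens (0 if within the limit)."""
    return max(0, usage["completion_tokens"] - max_tokens)


print(usage_is_consistent(with_limit), usage_is_consistent(without_limit))
print(tokens_over_limit(without_limit, 500))
```

Both `usage` blocks are internally consistent, and the unrestricted answer is well over the 500-token limit, so a request with `max_tokens` set to 500 would be expected to return a truncated but non-empty answer.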

### 🐛 Describe the bug

```python
from vllm import LLM, SamplingParams

prompts = ["What is the role of AI in medicine?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="openai/gpt-oss-120b")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Bug]: Max Tokens not being honoured in Chat Completions for GPTOSS model #29641
