[Bug]: Max Tokens not being honoured in Chat Completions for GPTOSS model #29641

@soodrohit

### Your current environment

In vLLM 0.11+, Chat Completions appears to have stopped honouring max_tokens with the GPT-OSS 120B model (openai/gpt-oss-120b). The request payload below used to return output truncated at the max_tokens limit. With max_tokens set, it now returns no content.

Interestingly, the usage block reports completion_tokens as 500, but the output is blank.

{
    "messages": [
        {
            "role": "user",
            "content": "What is the role of AI in medicine?"
        }
    ],
    "model": "openai/gpt-oss-120b",
    "max_tokens": 500,
    "reasoning": {"effort": "low"},
    "stream": false
}

The response has blank output (content is null), even though completion_tokens in usage matches max_tokens:

{
    "id": "chatcmpl-c71e934ac0b74bd4b8f99fe9b5516ea3",
    "object": "chat.completion",
    "created": 1764300020,
    "model": "openai/gpt-oss-120b",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning": "Need to answer.",
                "reasoning_content": "Need to answer."
            },
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "token_ids": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 78,
        "total_tokens": 578,
        "completion_tokens": 500,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "prompt_token_ids": null,
    "kv_transfer_params": null
}
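
The symptom in the response above can be checked programmatically. The following is a minimal sketch, not part of vLLM: `is_blank_truncation` is a hypothetical helper name, and it only inspects fields that appear in the response above (`content`, `finish_reason`, `usage.completion_tokens`). It detects the symptom and makes no claim about the cause.

```python
from typing import Any


def is_blank_truncation(response: dict[str, Any]) -> bool:
    """Return True when a choice stopped on the length limit but has no content.

    Hypothetical helper for illustration; it is not a vLLM API.
    """
    completion_tokens = response.get("usage", {}).get("completion_tokens", 0)
    for choice in response.get("choices", []):
        content = choice.get("message", {}).get("content")
        # Symptom from the report: tokens were generated and the limit was hit,
        # yet the message content is null or empty.
        if (
            choice.get("finish_reason") == "length"
            and not content
            and completion_tokens > 0
        ):
            return True
    return False


# Trimmed copy of the response reported above.
reported = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": None,
                "reasoning_content": "Need to answer.",
            },
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 78, "total_tokens": 578, "completion_tokens": 500},
}

print(is_blank_truncation(reported))  # prints True
```

A check like this could be run over responses from different vLLM versions to confirm where the behaviour changed.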

When max_tokens is removed, the response contains output, and usage reports about 1,600 completion tokens (1665 in the response below).
It seems that, starting with vLLM 0.11+, truncating the output at max_tokens has stopped working:

{
    "id": "chatcmpl-61b60144d43147e2b007158712ad4920",
    "object": "chat.completion",
    "created": 1764300423,
    "model": "openai/gpt-oss-120b",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "**The role of AI in medicine is expanding rapidly and touches virtually every aspect of healthcare—from the way doctors diagnose patients to how hospitals run their operations.** Below is a structured overview that covers the major domains, concrete examples, benefits, challenges, and future directions.\n\n---\n\n## 1. Clinical Care\n\n| Sub‑area | What AI Does | Real‑World Examples | Benefits |\n|----------|--------------|---------------------|----------|\n| **Diagnostics** | Image analysis, pattern recognition, risk stratification | • Radiology: Google DeepMind’s AI detects lung cancer on CT scans with >95% accuracy.<br>• Dermatology: FDA‑cleared apps (e.g., SkinVision) classify skin lesions from photos.<br>• Pathology: Paige.ai assists in detecting prostate cancer in biopsy slides. | Faster, more consistent readings; can catch subtle findings that human eyes miss. |\n| **Predictive Analytics** | Forecast disease onset, complications, readmission risk | • Sepsis prediction models (e.g., Epic Sepsis Model) trigger alerts hours before clinical signs.<br>• Cardiovascular risk calculators incorporating genomics and wearables. | Enables proactive interventions, reduces morbidity and cost. |\n| **Treatment Planning** | Decision support, dose optimisation, drug selection | • IBM Watson for Oncology (clinical trial matching).<br>• Radiation oncology: AI‑driven dose‑painting to spare healthy tissue.<br>• Pharmacogenomics: AI predicts drug‑gene interactions. | Personalises therapy, improves outcomes, reduces adverse events. |\n| **Robotics & Minimally Invasive Surgery** | Real‑time image guidance, autonomous suturing, task automation | • Da Vinci Surgical System (augmented with AI for instrument tracking).<br>• VERDICT AI for autonomous suturing in animal models. | Increases precision, reduces surgeon fatigue, shortens recovery. |\n\n---\n\n## 2. 
Patient‑Facing Applications\n\n| Application | Description | Example |\n|-------------|-------------|---------|\n| **Virtual Assistants & Chatbots** | Symptom triage, medication reminders, mental‑health chat | • Babylon Health (AI‑driven triage).<br>• Woebot (CBT‑based mental‑health chatbot). |\n| **Telemedicine Enhancements** | Real‑time vitals extraction from video, automated note‑taking | • KardiaMobile ECG integration with AI‑based arrhythmia detection. |\n| **Wearables & Remote Monitoring** | Continuous data streams analysed for early alerts | • Apple Watch ECG + AI arrhythmia detection; Fitbit heart‑rate trend alerts. |\n\n---\n\n## 3. Operational & Administrative Efficiency\n\n| Domain | AI Functions | Example |\n|--------|--------------|---------|\n| **Scheduling & Resource Allocation** | Predictive staffing, OR utilization optimisation | • Qventus AI platform reduces ER wait times by 30% in pilot sites. |\n| **Revenue Cycle Management** | Claim coding validation, fraud detection | • Change Healthcare’s AI coding assistant. |\n| **Supply Chain** | Demand forecasting for meds, PPE | • GE Healthcare’s AI‐driven inventory management. |\n\n---\n\n## 4. Research & Drug Development\n\n| Stage | AI Contribution | Notable Projects |\n|-------|----------------|------------------|\n| **Target Identification** | Deep learning on genomics & proteomics | • Insilico Medicine discovered a novel DDR1 inhibitor in 46 days. |\n| **Compound Screening** | Virtual screening of billions of molecules | • Atomwise’s AI screened 10M compounds for COVID‑19 antivirals. |\n| **Clinical Trial Design** | Patient‑centering enrollment, adaptive trial simulations | • Deep 6 AI matches patients to trials with 4× higher enrollment speed. |\n\n---\n\n## 5. Public Health & Population Health\n\n* **Epidemiology** – AI models (e.g., BlueDot) flagged COVID‑19 spread days before WHO alerts.  \n* **Health Equity** – Bias‑aware algorithms identify underserved populations for targeted interventions.  
\n* **Surveillance** – AI parses social‑media, EMS calls, and wastewater data for outbreak detection.\n\n---\n\n## 6. Benefits at a Glance\n\n| Dimension | Impact |\n|-----------|--------|\n| **Speed** | Real‑time image and data processing → quicker diagnosis. |\n| **Accuracy** | Reduced inter‑observer variability; higher sensitivity/specificity. |\n| **Scalability** | Extends specialist expertise to remote or low‑resource settings. |\n| **Cost Savings** | Preventive alerts lower expensive complications; automation cuts labor costs. |\n| **Personalisation** | Tailors treatment to genetic, lifestyle, and environmental factors. |\n\n---\n\n## 7. Key Challenges & Risks\n\n1. **Data Quality & Bias**  \n   - Training data often lacks diversity → risk of health disparities.  \n   - Need rigorous bias‑mitigation pipelines (e.g., fairness metrics, adversarial debiasing).\n\n2. **Interpretability & Trust**  \n   - Black‑box models hinder clinician acceptance.  \n   - Emerging solutions: Explainable AI (XAI) dashboards, attention maps, counterfactual explanations.\n\n3. **Regulatory & Legal Landscape**  \n   - FDA’s “Software as a Medical Device (SaMD)” pathways, EU’s AI Act, and emerging global standards.  \n   - Liability unclear when AI recommendations lead to harm.\n\n4. **Integration with Clinical Workflow**  \n   - Alert fatigue, EMR incompatibility, and need for seamless UI/UX.  \n   - Human‑in‑the‑loop design is critical.\n\n5. **Data Privacy & Security**  \n   - HIPAA, GDPR, and emerging “AI‑specific” regulations require robust de‑identification and federated learning techniques.\n\n---\n\n## 8. Future Outlook (Next 5‑10 Years)\n\n| Trend | What to Expect |\n|-------|----------------|\n| **Federated & Edge AI** | Models trained on device (e.g., wearables) without moving PHI, preserving privacy. |\n| **Multimodal Foundation Models** | Large language/vision models (e.g., MedPaLM, ClinicalBERT‑2) that ingest notes, imaging, labs simultaneously for holistic suggestions. 
|\n| **AI‑driven Clinical Trials** | Real‑time adaptive designs powered by continuous data streams, shrinking development timelines. |\n| **Digital Twins of Patients** | Simulated virtual patients for therapy testing, surgical planning, and disease progression forecasting. |\n| **AI Governance Frameworks** | Standardized audit trails, certification bodies (e.g., ISO 82304‑2), and “AI ethics boards” embedded in hospitals. |\n\n---\n\n## 9. Practical Take‑aways for Stakeholders\n\n| Role | Actionable Steps |\n|------|------------------|\n| **Clinicians** | • Start with FDA‑cleared decision‑support tools.<br>• Participate in model validation studies.<br>• Keep a “human‑in‑the‑loop” mindset. |\n| **Hospital Administrators** | • Conduct ROI analyses for AI pilots.<br>• Build a cross‑functional AI governance committee.<br>• Invest in data infrastructure (FHIR, interoperable APIs). |\n| **Patients** | • Ask providers how AI influences their care.<br>• Review consent forms for data use.<br>• Use FDA‑approved consumer health apps and verify privacy policies. |\n| **Developers / Researchers** | • Prioritize diverse datasets and bias testing.<br>• Implement explainability from day one.<br>• Align with regulatory pathways early (e.g., pre‑submissions to FDA). |\n\n---\n\n### TL;DR\n\nAI is reshaping medicine across **diagnosis, treatment planning, surgery, patient engagement, operations, research, and public health**. It brings speed, accuracy, scalability, and personalization, but it also raises challenges around bias, interpretability, regulation, workflow integration, and privacy. Successful adoption will hinge on thoughtful governance, transparent models, and a collaborative “human‑AI partnership.”",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning": "Need to answer.",
                "reasoning_content": "Need to answer."
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null,
            "token_ids": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 78,
        "total_tokens": 1743,
        "completion_tokens": 1665,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "prompt_token_ids": null,
    "kv_transfer_params": null
}
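
To compare the two cases side by side, the same payload can be sent with and without max_tokens. The following is a rough sketch using only the Python standard library. It assumes a vLLM OpenAI-compatible server reachable at `http://localhost:8000` (an assumption, adjust to your deployment); `build_payload`, `post_chat`, `summarize`, and `compare` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: a vLLM OpenAI-compatible server is listening at this address.
BASE_URL = "http://localhost:8000"


def build_payload(max_tokens: Optional[int]) -> dict[str, Any]:
    """Build the request from the report, optionally without max_tokens."""
    payload: dict[str, Any] = {
        "messages": [
            {"role": "user", "content": "What is the role of AI in medicine?"}
        ],
        "model": "openai/gpt-oss-120b",
        "reasoning": {"effort": "low"},
        "stream": False,
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


def post_chat(payload: dict[str, Any], base_url: str = BASE_URL) -> dict[str, Any]:
    """POST the payload to the Chat Completions endpoint and parse the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def summarize(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the fields that differ between the two reported responses."""
    choice = response["choices"][0]
    return {
        "finish_reason": choice["finish_reason"],
        "has_content": bool(choice["message"].get("content")),
        "completion_tokens": response["usage"]["completion_tokens"],
    }


def compare(base_url: str = BASE_URL) -> dict[str, dict[str, Any]]:
    """Send both variants of the request and return a summary of each."""
    return {
        "with_max_tokens": summarize(post_chat(build_payload(500), base_url)),
        "without_max_tokens": summarize(post_chat(build_payload(None), base_url)),
    }
```

Calling `compare()` against a running server returns one summary per variant. Nothing is sent until `compare` or `post_chat` is called.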

### 🐛 Describe the bug

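from vllm import LLM, SamplingParams

# Single prompt, matching the Chat Completions request above.
prompts = ["What is the role of AI in medicine?"]
# Sampling settings: temperature and nucleus (top-p) sampling.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model used in the report.
llm = LLM(model="openai/gpt-oss-120b")

# Generate one completion per prompt.
outputs = llm.generate(prompts, sampling_params)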

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

Labels: bug (Something isn't working)
