Responses API, Structured Extraction, and Web Search

Why the Responses API matters

Microsoft Foundry now exposes a newer v1 data-plane endpoint for Azure OpenAI:

https://<resource>.openai.azure.com/openai/v1/responses

Unlike the older chat API, which puts the deployment name in the URL path, the v1 Responses API takes the deployment name in the JSON body. It also adds stateful response chaining, built-in tools, structured output formats, and richer output metadata.
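To make the difference concrete, here is a minimal sketch of the two request shapes, built with jsonlite and never sent. The resource and deployment names are placeholders, and the code only illustrates the wire format; it is not foundryR's internal implementation.

```r
library(jsonlite)

resource   <- "my-resource"     # placeholder resource name
deployment <- "my-deployment"   # placeholder deployment name

# Older chat API: the deployment is part of the URL path
# (the real endpoint also takes an api-version query parameter).
legacy_url <- sprintf(
  "https://%s.openai.azure.com/openai/deployments/%s/chat/completions",
  resource, deployment
)

# v1 Responses API: fixed path, deployment travels in the JSON body.
v1_url  <- sprintf("https://%s.openai.azure.com/openai/v1/responses", resource)
v1_body <- toJSON(
  list(model = deployment, input = "Hello"),
  auto_unbox = TRUE
)

cat(v1_url, "\n")
cat(v1_body, "\n")
```

Because the path no longer changes per deployment, switching models only means changing one field in the body.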

foundryR wraps this with foundry_response() while keeping the package’s tidy interface: generated text, citations, tool calls, token usage, and the raw response are returned as tibble columns.

The examples below omit model =, so foundryR reads the deployment from AZURE_FOUNDRY_MODEL. Set it once, or pass model = to override per call.
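A minimal sketch of setting the variable for the current R session. The deployment name is a placeholder; for a persistent setting, the same key can go in your .Renviron file.

```r
# Placeholder deployment name; replace with your own.
Sys.setenv(AZURE_FOUNDRY_MODEL = "my-gpt-deployment")

# Confirm what later calls will pick up.
Sys.getenv("AZURE_FOUNDRY_MODEL")
```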

Basic response

library(foundryR)

foundry_response(
  "Answer in one sentence: what is retrieval-augmented generation?"
)
#> # A tibble: 1 × 17
#>   response_id     status model output_text structured structured_error citations
#>   <chr>           <chr>  <chr> <chr>       <list>     <chr>            <list>   
#> 1 resp_08bfb26cf… compl… gpt-… Retrieval-… <NULL>     NA               <tibble> 
#> # ℹ 10 more variables: tool_calls <list>, refusal <chr>,
#> #   incomplete_reason <chr>, created_at <dttm>, input_tokens <int>,
#> #   output_tokens <int>, reasoning_tokens <int>, cached_input_tokens <int>,
#> #   total_tokens <int>, raw_response <list>

The result includes:

response_id: the stored Responses API object ID
output_text: generated text aggregated from the response output items
citations: a list-column of URL citations, when present
tool_calls: a list-column of tool calls, such as web-search calls
token usage columns: input, output, and total tokens, plus reasoning and cached input tokens when the API reports them
raw_response: the raw API response as a list-column

Stateful turns

Responses are stored by the service by default. You can chain turns by passing the previous response_id:

first <- foundry_response(
  "Define catastrophic forgetting in one sentence."
)

second <- foundry_response(
  "Explain it for a college freshman in one sentence.",
  previous_response_id = first$response_id
)

second$output_text
#> [1] "Catastrophic forgetting is when a model forgets how to do earlier tasks after being trained on new ones because the updates to its parameters overwrite or disrupt the previously learned knowledge."

If you do not want the service to store a response, pass store = FALSE. Stateful chaining with previous_response_id requires the previous response to be stored.
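On the wire, chaining comes down to one extra field in the request body. The sketch below builds such a body with jsonlite without sending it. The response ID and deployment name are placeholders, and the exact body foundryR constructs may differ.

```r
library(jsonlite)

first_id <- "resp_123"  # placeholder; in practice use first$response_id

follow_up <- list(
  model = "my-deployment",                 # placeholder deployment name
  input = "Explain it for a college freshman in one sentence.",
  previous_response_id = first_id,         # links this turn to the stored one
  store = TRUE                             # keep this turn chainable as well
)

cat(toJSON(follow_up, auto_unbox = TRUE, pretty = TRUE))
```

With store = FALSE there is no stored response to point at, so the earlier turns would have to be resent as input instead.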

Structured extraction with JSON Schema

foundry_extract() is designed for data scientists and researchers who need to turn free text into analyzable variables. You provide a JSON Schema as an R list; foundryR sends it through the Responses API structured output format and returns one row per input text.

foundry_extract() sends strict = TRUE in the JSON Schema format by default. On models that support strict structured outputs, the service constrains its reply to data that conforms to the schema.
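For orientation, this is roughly how a schema is wrapped in the Responses API structured output format. It is a sketch built with jsonlite and not sent; the format name "extraction" and the deployment name are placeholders, and foundryR's actual request may be assembled differently.

```r
library(jsonlite)

tiny_schema <- list(
  type = "object",
  properties = list(summary = list(type = "string")),
  # list() keeps a single name as a JSON array even with auto_unbox = TRUE
  required = list("summary"),
  additionalProperties = FALSE
)

body <- list(
  model = "my-deployment",        # placeholder deployment name
  input = "Participants reported confusion about the consent form.",
  text = list(
    format = list(
      type = "json_schema",
      name = "extraction",        # placeholder format name
      schema = tiny_schema,
      strict = TRUE
    )
  )
)

json <- toJSON(body, auto_unbox = TRUE, pretty = TRUE)
cat(json)
```

One detail worth checking in your own schemas: a length-one character vector such as required = "summary" is unboxed to a plain string by auto_unbox = TRUE, which is not a valid JSON Schema required value.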

schema <- list(
  type = "object",
  properties = list(
    sentiment = list(
      type = "string",
      enum = c("positive", "negative", "neutral")
    ),
    entities = list(
      type = "array",
      items = list(type = "string")
    ),
    summary = list(type = "string")
  ),
  required = c("sentiment", "entities", "summary"),
  additionalProperties = FALSE
)

texts <- c(
  "The new data pipeline reduced manual coding time by half.",
  "Participants reported confusion about the consent form."
)

foundry_extract(
  texts,
  schema = schema
)
#> # A tibble: 2 × 11
#>   .input_idx .input_text     .response_id .status .output_text .error .error_msg
#>        <int> <chr>           <chr>        <chr>   <chr>        <lgl>  <chr>     
#> 1          1 The new data p… resp_0daf26… comple… "{\"sentime… FALSE  NA        
#> 2          2 Participants r… resp_0df47e… comple… "{\"sentime… FALSE  NA        
#> # ℹ 4 more variables: raw_response <list>, sentiment <chr>, entities <list>,
#> #   summary <chr>

Top-level scalar fields become regular tibble columns. Arrays and nested objects become list-columns, which work naturally with tidyverse workflows.
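As an illustration of working with those list-columns, the sketch below expands an entities column to one row per entity using base R only. The data frame is a hand-built stand-in with made-up values, not real foundry_extract() output.

```r
# Hand-built stand-in for an extraction result (values are invented).
extracted <- data.frame(
  .input_idx = 1:2,
  sentiment  = c("positive", "negative")
)
extracted$entities <- list(
  c("data pipeline"),
  c("participants", "consent form")
)

# Repeat each row's scalar fields once per entity, then flatten the list.
n <- lengths(extracted$entities)
long <- data.frame(
  .input_idx = rep(extracted$.input_idx, n),
  sentiment  = rep(extracted$sentiment, n),
  entity     = unlist(extracted$entities)
)

long
```

In a tidyverse pipeline, tidyr::unnest_longer(extracted, entities) does the same reshaping in one step.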

User-defined R tools

The Responses API function-calling contract has two parts: tool definitions with type = "function", and follow-up tool outputs with type = "function_call_output" plus a matching call_id. foundry_tool() builds the tool schema and keeps a reference to the R function for local execution. foundry_agent() runs a bounded loop: call the model, execute any requested tools locally, and return their outputs to the model.

get_weather <- function(location) {
  list(location = location, temperature = "70 F")
}

weather_tool <- foundry_tool(
  get_weather,
  description = "Get weather for a location",
  parameters = list(
    type = "object",
    properties = list(location = list(type = "string")),
    required = list("location")  # list() keeps a single name a JSON array
  )
)

turns <- foundry_agent(
  "What is the weather in San Francisco?",
  tools = list(weather_tool),
  max_iterations = 4
)

turns[, c("iteration", "final", "output_text")]
turns$tool_results[[1]]

The loop stops with an error if the model is still requesting tools once max_iterations is reached. This protects batch jobs from unbounded tool use.
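The sketch below shows the general shape of such a bounded loop. It is not foundryR's implementation: fake_model() is a hypothetical stand-in for the service that requests the tool once and then answers, run_bounded_loop() is a hypothetical helper, and tools are passed as a plain named list of R functions rather than foundry_tool() objects.

```r
library(jsonlite)

get_weather <- function(location) {
  list(location = location, temperature = "70 F")
}

# Hypothetical stand-in for the service: asks for the tool until it
# sees a function_call_output item, then returns a final message.
fake_model <- function(items) {
  has_output <- any(vapply(
    items,
    function(x) identical(x$type, "function_call_output"),
    logical(1)
  ))
  if (has_output) {
    list(type = "message", text = "It is 70 F in San Francisco.")
  } else {
    list(type = "function_call", call_id = "call_1", name = "get_weather",
         arguments = '{"location":"San Francisco"}')
  }
}

run_bounded_loop <- function(prompt, tools, max_iterations = 4) {
  items <- list(list(type = "message", role = "user", content = prompt))
  for (i in seq_len(max_iterations)) {
    out <- fake_model(items)
    if (!identical(out$type, "function_call")) {
      return(list(iterations = i, text = out$text))
    }
    # Execute the requested R function with the JSON-decoded arguments.
    result <- do.call(tools[[out$name]], fromJSON(out$arguments))
    # Send the result back, matched to the request by call_id.
    items <- c(items, list(out), list(list(
      type = "function_call_output",
      call_id = out$call_id,
      output = as.character(toJSON(result, auto_unbox = TRUE))
    )))
  }
  stop("Model still requesting tools after max_iterations = ", max_iterations)
}

res <- run_bounded_loop(
  "What is the weather in San Francisco?",
  tools = list(get_weather = get_weather)
)
res$iterations
```

Setting max_iterations = 1 with the same stand-in makes the loop error out, because the first and only iteration is spent on the tool request.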

Remote MCP tools

Microsoft documents remote Model Context Protocol tools for the Responses API. foundryR does not add a separate MCP helper yet because foundry_response() already accepts raw Responses API tool objects:

mcp_tool <- list(
  type = "mcp",
  server_label = "my_mcp_server",
  server_url = Sys.getenv("MY_MCP_SERVER_URL"),
  require_approval = "never"
)

foundry_response(
  "Use the MCP server if it helps answer the question.",
  tools = list(mcp_tool)
)

Only attach MCP servers you trust and whose data-handling behavior your organization has approved.
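One way to enforce that rule in scripts is an allowlist check before the tool object is built. The helper below is a hypothetical sketch in base R, not part of foundryR, and the host name is a placeholder.

```r
# Placeholder allowlist of approved MCP hosts.
trusted_hosts <- c("mcp.example.org")

# Hypothetical helper: TRUE only for https URLs whose host is on the list.
is_trusted_mcp <- function(server_url, trusted = trusted_hosts) {
  if (!grepl("^https://", server_url)) {
    return(FALSE)
  }
  # Everything between "https://" and the first "/", ":", "?" or "#".
  host <- sub("^https://([^/:?#]+).*$", "\\1", server_url)
  tolower(host) %in% trusted
}

is_trusted_mcp("https://mcp.example.org/sse")
is_trusted_mcp("http://mcp.example.org/sse")
is_trusted_mcp("https://evil.example.net/mcp.example.org")
```

A script can then call stop() when the check fails instead of passing the tool to foundry_response().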

Web-grounded answers with citations

foundry_web_search() uses the Responses API web_search tool and parses URL citations into a tidy list-column:

answer <- foundry_web_search(
  "What changed recently in Azure AI Foundry Responses API?",
  search_context_size = "high"
)

answer$output_text
answer$citations[[1]]
answer$tool_calls[[1]]

You can optionally provide approximate location fields:

foundry_web_search(
  "Find a recent AI research event near me.",
  country = "US",
  region = "Washington",
  city = "Seattle",
  timezone = "America/Los_Angeles"
)
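For reference, these arguments plausibly correspond to a tool object like the one below. This is a sketch of the Responses API web_search tool shape built with jsonlite; how foundryR maps its arguments onto the tool is an assumption here, not something taken from the package source.

```r
library(jsonlite)

web_search_tool <- list(
  type = "web_search",
  search_context_size = "high",
  user_location = list(
    type = "approximate",            # location is a hint, not a precise fix
    country = "US",
    region = "Washington",
    city = "Seattle",
    timezone = "America/Los_Angeles"
  )
)

cat(toJSON(list(tools = list(web_search_tool)), auto_unbox = TRUE, pretty = TRUE))
```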

Responsible use of web search

Microsoft documents that web search uses Grounding with Bing Search and/or Grounding with Bing Custom Search. The Data Protection Addendum does not apply to data sent to these services, data can leave compliance and geographic boundaries, and tool usage can incur additional costs. Avoid sending secrets or sensitive research data in web-search prompts.

Reasoning models and token accounting

foundry_response() accepts reasoning_effort and sends it as reasoning = list(effort = ...), the Responses API shape documented by Microsoft for reasoning models. foundry_chat() also accepts reasoning_effort but sends it in the chat-completions shape, as a top-level field such as reasoning_effort = "medium".

foundry_response(
  "Compare the two arguments and identify the weaker premise.",
  model = "my-reasoning-deployment",
  reasoning_effort = "medium"
)
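The two request shapes differ only in where the effort setting sits. The sketch below builds both bodies with jsonlite for comparison, without sending either; the deployment name is a placeholder and the bodies are trimmed to the relevant fields.

```r
library(jsonlite)

prompt <- "Compare the two arguments and identify the weaker premise."

# Responses API: effort is nested under a reasoning object.
responses_body <- list(
  model = "my-reasoning-deployment",   # placeholder deployment name
  input = prompt,
  reasoning = list(effort = "medium")
)

# Chat completions: effort is a top-level field
# (the deployment is named in the URL path instead).
chat_body <- list(
  messages = list(list(role = "user", content = prompt)),
  reasoning_effort = "medium"
)

cat(toJSON(responses_body, auto_unbox = TRUE, pretty = TRUE), "\n")
cat(toJSON(chat_body, auto_unbox = TRUE, pretty = TRUE), "\n")
```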

The returned tibble includes reasoning_tokens and cached_input_tokens when the API reports them. These fields matter for cost review because reasoning tokens may be billed even when they are not visible in output_text.
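A small cost-review sketch in base R follows. The data frame is a hand-built stand-in with invented numbers, and it assumes reasoning_tokens is reported as a subset of output_tokens, as in the API's output token details.

```r
# Hand-built stand-in for the usage columns (numbers are invented).
usage <- data.frame(
  input_tokens        = c(120L, 95L),
  cached_input_tokens = c(0L, 64L),
  output_tokens       = c(410L, 380L),
  reasoning_tokens    = c(320L, 256L)
)

# Assumption: reasoning tokens are counted inside output_tokens.
usage$visible_output_tokens <- usage$output_tokens - usage$reasoning_tokens
usage$reasoning_share <- usage$reasoning_tokens / usage$output_tokens

usage[, c("output_tokens", "reasoning_tokens",
          "visible_output_tokens", "reasoning_share")]
```

A high reasoning_share flags calls where most of the billed output never appears in output_text.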

Streaming

The Azure OpenAI Responses API supports Server-Sent Events streaming, but foundryR does not implement streaming. The package focuses on reproducible, tibble-returning analytical workflows. Use ellmer when you need interactive streaming chat in R.

When to use foundry_chat() vs foundry_response()

Use foundry_chat() when you want the established chat-completions interface and simple assistant replies.

Use foundry_response() when you need newer v1 capabilities: stateful response IDs, built-in tools, structured output formats, richer output items, or a forward-looking API surface for new Microsoft Foundry model capabilities.