HomeDocumentationAPI Reference
Documentation

Create an Automated Pipeline

Introduction

This tutorial shows how to build an automated pipeline that pulls UserTesting Results API data and loads it into a data warehouse (such as Snowflake, BigQuery, or Redshift) on a scheduled basis.

The goal is to enable nightly or periodic syncs of session-level research data so analytics, BI, and AI teams can work with UserTesting insights alongside product and business metrics.

What you'll build

A repeatable pipeline that:

  1. Uses a known testId to list all completed sessions
  2. Pages through session results safely
  3. Retrieves structured session details from the v3 endpoint
  4. Retrieves session transcripts
  5. Loads the data into warehouse-friendly tables

Target audience

  • Data engineers
  • Analytics engineers
  • Research operations teams
  • ML / AI teams preparing data for downstream analysis

Prerequisites

  • A valid access token (ACCESS_TOKEN). Go to Authorization for details.
  • A known test ID, referred to as TEST_ID or testId.
  • A scheduler (cron, Airflow, dbt Cloud, GitHub Actions, etc.)
  • A destination warehouse (Snowflake, BigQuery, Redshift, etc.)

High-level architecture

  1. Scheduler triggers the job (nightly or on demand)
  2. Extractor calls the Results API
  3. Transformer normalizes JSON into tables
  4. Loader writes data to the warehouse

graph LR;
    Scheduler --> ResultsAPI;
    ResultsAPI --> JSON;
    JSON --> Normalize;
    Normalize --> WarehouseTables;

Steps

Step 1 - List sessions for a test (pagination aware)

Endpoint

GET /api/v2/sessionResults

Why this matters

  • Tests can have many sessions (use obtained session IDs on the endpoints of the following steps)
  • Pagination must be handled correctly for full exports

Example (curl)

curl --location 'https://api.use2.usertesting.com/api/v2/sessionResults?testId=TEST_ID&limit=25&offset=50' \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --header 'Content-Type: application/json'
👍

Best Practice: Always use pagination query parameters (limit and offset) along with the appropriate logic to ensure all sessions are retrieved.

  • Use the response data from limit, offset, and totalCount in the meta.pagination property to calculate the number of page iterations needed to collect all sessions in the test.
  • By default, limit is set to 25 and offset to 0.
  • In the response, sessions are sorted in descending order, from newest to oldest.

Key fields to store

  • testId
  • sessionId
  • status
  • startTime
  • finishTime
  • audienceId (if present)

Pagination logic

  • Retrieve meta.pagination.totalCount to determine the total number of sessions.
  • Increment the offset by the limit value in each request until the offset reaches or exceeds the totalCount, ensuring all sessions are retrieved.
  • Save a checkpoint if running in incremental mode to track progress and avoid reprocessing.

Step 2 - Retrieve session details

Endpoint

GET /api/v3/sessionResults/SESSION_ID

Purpose

This endpoint provides the core structured research data, including:

  • Session, audience, and test plan identifiers
  • Participant demographics under sessionParticipant
  • Task-group metadata under taskGroups[]
  • Per-task responses under tasks[]

Example (curl)

curl --location 'https://api.use2.usertesting.com/api/v3/sessionResults/SESSION_ID' \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --header 'Content-Type: application/json'

What the v3 response looks like

{
  "sessionId": "UUID|null",
  "audienceId": "UUID|null",
  "testPlanId": "UUID|null",
  "sessionParticipant": {
    "participantId": "UUID|null",
    "demographicsInfo": []
  },
  "taskGroups": [
    {
      "uuid": "UUID",
      "type": "QX_SCORE|BALANCED_COMPARISON",
      "response": {}
    }
  ],
  "tasks": [
    {
      "uuid": "UUID",
      "text": "string",
      "type": "RATING_SCALE|MULTIPLE_CHOICE|...",
      "response": {
        "startTimeMs": 0,
        "endTimeMs": 0,
        "skipped": false,
        "answer": {}
      }
    }
  ]
}

How to model it

  • sessionParticipant contains participant-level context
  • taskGroups[] contains metadata such as QX_SCORE and BALANCED_COMPARISON
  • tasks[] contains the individual task responses you will analyze most often
  • tasks[].response.answer is task-type-specific, so treat it as semi-structured JSON

Recommended warehouse tables

sessions
  • session_id
    • participant_id
    • audience_id
    • test_plan_id
    • extracted_at
demographics
  • session_id
    • participant_id
    • demographic_code
    • label
    • value
    • type
task_groups
  • session_id
    • participant_id
    • task_group_uuid
    • task_group_type
    • task_group_response (JSON / VARIANT)
tasks
  • session_id
    • participant_id
    • task_uuid
    • task_text
    • task_type
    • start_time_ms
    • end_time_ms
    • skipped
    • answer_json (JSON / VARIANT)
👍

Carry session_id and participant_id into flattened task_groups and tasks rows. Those keys make downstream joins to demographics and session metadata simple and reliable.


Step 3 - Retrieve transcripts (text-based insights)

For sessions that have video recordings, such as those based on think-out-loud tests.

Endpoint

GET /api/v2/sessionResults/SESSION_ID/transcript

Response format

  • text/vtt (WebVTT subtitle format)

Example (curl)

curl --location 'https://api.use2.usertesting.com/api/v2/sessionResults/SESSION_ID/transcript' \
--header 'Authorization: Bearer ACCESS_TOKEN'

Warehouse strategy

  • Store raw VTT text as-is
  • Optionally:
    • Strip timestamps
    • Store a derived plain_text column
    • Chunk text for LLM usage

Recommended table

transcripts
  • session_id
    • transcript_vtt (TEXT)
    • transcript_plain_text (TEXT, optional)

Step 4 - (Optional) Video handling strategy

For sessions with recorded videos, such as those based on interaction tests.

Endpoint

GET /api/v2/sessionResults/SESSION_ID/videoDownloadUrl


Request sample

curl --location 'https://api.use2.usertesting.com/api/v2/sessionResults/SESSION_ID/videoDownloadUrl' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ACCESS_TOKEN'

Important behavior

  • URLs are pre-signed
  • URLs expire after one hour

Best practice for pipelines

  • Do not store the video URL itself
  • Either:
    • Download the video immediately into object storage (S3, GCS)
    • Or skip video ingestion for nightly syncs

Recommended table (metadata only)

session_videos
  • session_id
    • video_storage_path
    • downloaded_at

Rate limits and batching

Take into consideration:

  • Endpoints allow 10 requests per minute

Use:

  • Request batching
  • Exponential backoff on rate limit error messages (429 status code)
  • Parallelism with limits (for example, 3 to 5 concurrent requests)

Incremental vs full syncs

Full sync (simplest)

  • Re-pull all sessions for a test on a nightly basis
  • Overwrite or upsert warehouse tables

Incremental sync (recommended at scale)

  • Track:
    • session finishTime
    • last successful sync timestamp
  • Only process newly completed sessions

Example pseudo-code

checkpoint = read_checkpoint(test_id)  # e.g. last_finish_time or last_run_at

for page in paginate_sessions(test_id, page_size=100):
    for s in page.sessions:
        if checkpoint and s.finishTime <= checkpoint:
            continue

        details = with_backoff(lambda: get_session_details_v3(s.sessionId))
        transcript = with_backoff(
            lambda: get_transcript(s.sessionId)
        )  # optional / not always present

        participant_id = (
            details.get("sessionParticipant", {}) or {}
        ).get("participantId")

        upsert_session(
            session_id=details["sessionId"],
            participant_id=participant_id,
            audience_id=details.get("audienceId"),
            test_plan_id=details.get("testPlanId"),
        )

        upsert_demographics(
            session_id=details["sessionId"],
            participant_id=participant_id,
            demographics=(details.get("sessionParticipant", {}) or {}).get(
                "demographicsInfo", []
            ),
        )

        upsert_task_groups(
            session_id=details["sessionId"],
            participant_id=participant_id,
            task_groups=details.get("taskGroups", []),
        )

        upsert_tasks(
            session_id=details["sessionId"],
            participant_id=participant_id,
            tasks=details.get("tasks", []),
        )

        upsert_transcripts(s.sessionId, transcript)

        # optional: if ingesting video
        # video_url = with_backoff(lambda: get_video_download_url(s.sessionId))
        # download_immediately(video_url)  # URL expires quickly
        # upsert_session_videos_metadata(...)

update_checkpoint(test_id, max_finish_time_processed)
🖥️

Code overview: The pipeline reads a saved checkpoint, pages through completed sessions for a test, retrieves v3 session details and transcripts with retry and backoff handling, and then writes normalized session, demographic, task-group, and task rows into warehouse tables. Finally, it updates the checkpoint so the next run processes only newly completed sessions.


Common errors and troubleshooting

ErrorMeaningFix
401Invalid or missing access tokenVerify authorization header
404Test or session not foundValidate IDs
429Rate limit exceededAdd backoff + batching

What you can build next

Once this pipeline is running, you can:

  • Power BI dashboards with UX metrics
  • Feed transcripts into LLM workflows
  • Join UserTesting data with product analytics
  • Trigger alerts when certain task patterns appear

Summary

You now have a scalable pattern to:

  • Extract UserTesting research data programmatically
  • Normalize it for analytics and AI
  • Keep insights continuously in sync with your organization's data ecosystem

This pipeline forms the backbone for advanced dashboards, automation, and AI-powered insight generation.