Create an Automated Pipeline
Introduction
This tutorial shows how to build an automated pipeline that pulls UserTesting Results API data and loads it into a data warehouse (such as Snowflake, BigQuery, or Redshift) on a scheduled basis.
The goal is to enable nightly or periodic syncs of session-level research data so analytics, BI, and AI teams can work with UserTesting insights alongside product and business metrics.
What you'll build
A repeatable pipeline that:
- Uses a known
testIdto list all completed sessions - Pages through session results safely
- Retrieves structured session details from the v3 endpoint
- Retrieves session transcripts
- Loads the data into warehouse-friendly tables
Target audience
- Data engineers
- Analytics engineers
- Research operations teams
- ML / AI teams preparing data for downstream analysis
Prerequisites
- A valid access token (
ACCESS_TOKEN). Go to Authorization for details. - A known test ID, referred to as
TEST_IDortestId. - A scheduler (cron, Airflow, dbt Cloud, GitHub Actions, etc.)
- A destination warehouse (Snowflake, BigQuery, Redshift, etc.)
High-level architecture
- Scheduler triggers the job (nightly or on demand)
- Extractor calls the Results API
- Transformer normalizes JSON into tables
- Loader writes data to the warehouse
graph LR;
Scheduler --> ResultsAPI;
ResultsAPI --> JSON;
JSON --> Normalize;
Normalize --> WarehouseTables;
Steps
Step 1 - List sessions for a test (pagination aware)
Endpoint
Why this matters
- Tests can have many sessions (use obtained session IDs on the endpoints of the following steps)
- Pagination must be handled correctly for full exports
Example (curl)
curl --location 'https://api.use2.usertesting.com/api/v2/sessionResults?testId=TEST_ID&limit=25&offset=50' \
--header 'Authorization: Bearer ACCESS_TOKEN' \
--header 'Content-Type: application/json'Best Practice: Always use pagination query parameters (
limitandoffset) along with the appropriate logic to ensure all sessions are retrieved.
- Use the response data from
limit,offset, andtotalCountin themeta.paginationproperty to calculate the number of page iterations needed to collect all sessions in the test.- By default,
limitis set to25andoffsetto0.- In the response, sessions are sorted in descending order, from newest to oldest.
Key fields to store
testIdsessionIdstatusstartTimefinishTimeaudienceId(if present)
Pagination logic
- Retrieve
meta.pagination.totalCountto determine the total number of sessions. - Increment the
offsetby thelimitvalue in each request until theoffsetreaches or exceeds thetotalCount, ensuring all sessions are retrieved. - Save a checkpoint if running in incremental mode to track progress and avoid reprocessing.
Step 2 - Retrieve session details
Endpoint
GET /api/v3/sessionResults/SESSION_ID
Purpose
This endpoint provides the core structured research data, including:
- Session, audience, and test plan identifiers
- Participant demographics under
sessionParticipant - Task-group metadata under
taskGroups[] - Per-task responses under
tasks[]
Example (curl)
curl --location 'https://api.use2.usertesting.com/api/v3/sessionResults/SESSION_ID' \
--header 'Authorization: Bearer ACCESS_TOKEN' \
--header 'Content-Type: application/json'What the v3 response looks like
{
"sessionId": "UUID|null",
"audienceId": "UUID|null",
"testPlanId": "UUID|null",
"sessionParticipant": {
"participantId": "UUID|null",
"demographicsInfo": []
},
"taskGroups": [
{
"uuid": "UUID",
"type": "QX_SCORE|BALANCED_COMPARISON",
"response": {}
}
],
"tasks": [
{
"uuid": "UUID",
"text": "string",
"type": "RATING_SCALE|MULTIPLE_CHOICE|...",
"response": {
"startTimeMs": 0,
"endTimeMs": 0,
"skipped": false,
"answer": {}
}
}
]
}How to model it
sessionParticipantcontains participant-level contexttaskGroups[]contains metadata such asQX_SCOREandBALANCED_COMPARISONtasks[]contains the individual task responses you will analyze most oftentasks[].response.answeris task-type-specific, so treat it as semi-structured JSON
Recommended warehouse tables
- session_id
- participant_id
- audience_id
- test_plan_id
- extracted_at
- session_id
- participant_id
- demographic_code
- label
- value
- type
- session_id
- participant_id
- task_group_uuid
- task_group_type
- task_group_response (JSON / VARIANT)
- session_id
- participant_id
- task_uuid
- task_text
- task_type
- start_time_ms
- end_time_ms
- skipped
- answer_json (JSON / VARIANT)
Carry
session_idandparticipant_idinto flattenedtask_groupsandtasksrows. Those keys make downstream joins to demographics and session metadata simple and reliable.
Step 3 - Retrieve transcripts (text-based insights)
For sessions that have video recordings, such as those based on think-out-loud tests.
Endpoint
GET /api/v2/sessionResults/SESSION_ID/transcript
Response format
text/vtt(WebVTT subtitle format)
Example (curl)
curl --location 'https://api.use2.usertesting.com/api/v2/sessionResults/SESSION_ID/transcript' \
--header 'Authorization: Bearer ACCESS_TOKEN'Warehouse strategy
- Store raw VTT text as-is
- Optionally:
- Strip timestamps
- Store a derived
plain_textcolumn - Chunk text for LLM usage
Recommended table
- session_id
- transcript_vtt (TEXT)
- transcript_plain_text (TEXT, optional)
Step 4 - (Optional) Video handling strategy
For sessions with recorded videos, such as those based on interaction tests.
Endpoint
GET /api/v2/sessionResults/SESSION_ID/videoDownloadUrl
Request sample
curl --location 'https://api.use2.usertesting.com/api/v2/sessionResults/SESSION_ID/videoDownloadUrl' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ACCESS_TOKEN'Important behavior
- URLs are pre-signed
- URLs expire after one hour
Best practice for pipelines
- Do not store the video URL itself
- Either:
- Download the video immediately into object storage (S3, GCS)
- Or skip video ingestion for nightly syncs
Recommended table (metadata only)
- session_id
- video_storage_path
- downloaded_at
Rate limits and batching
Take into consideration:
- Endpoints allow 10 requests per minute
Use:
- Request batching
- Exponential backoff on rate limit error messages (429 status code)
- Parallelism with limits (for example, 3 to 5 concurrent requests)
Incremental vs full syncs
Full sync (simplest)
- Re-pull all sessions for a test on a nightly basis
- Overwrite or upsert warehouse tables
Incremental sync (recommended at scale)
- Track:
- session
finishTime - last successful sync timestamp
- session
- Only process newly completed sessions
Example pseudo-code
checkpoint = read_checkpoint(test_id) # e.g. last_finish_time or last_run_at
for page in paginate_sessions(test_id, page_size=100):
for s in page.sessions:
if checkpoint and s.finishTime <= checkpoint:
continue
details = with_backoff(lambda: get_session_details_v3(s.sessionId))
transcript = with_backoff(
lambda: get_transcript(s.sessionId)
) # optional / not always present
participant_id = (
details.get("sessionParticipant", {}) or {}
).get("participantId")
upsert_session(
session_id=details["sessionId"],
participant_id=participant_id,
audience_id=details.get("audienceId"),
test_plan_id=details.get("testPlanId"),
)
upsert_demographics(
session_id=details["sessionId"],
participant_id=participant_id,
demographics=(details.get("sessionParticipant", {}) or {}).get(
"demographicsInfo", []
),
)
upsert_task_groups(
session_id=details["sessionId"],
participant_id=participant_id,
task_groups=details.get("taskGroups", []),
)
upsert_tasks(
session_id=details["sessionId"],
participant_id=participant_id,
tasks=details.get("tasks", []),
)
upsert_transcripts(s.sessionId, transcript)
# optional: if ingesting video
# video_url = with_backoff(lambda: get_video_download_url(s.sessionId))
# download_immediately(video_url) # URL expires quickly
# upsert_session_videos_metadata(...)
update_checkpoint(test_id, max_finish_time_processed)Code overview: The pipeline reads a saved checkpoint, pages through completed sessions for a test, retrieves v3 session details and transcripts with retry and backoff handling, and then writes normalized session, demographic, task-group, and task rows into warehouse tables. Finally, it updates the checkpoint so the next run processes only newly completed sessions.
Common errors and troubleshooting
| Error | Meaning | Fix |
|---|---|---|
| 401 | Invalid or missing access token | Verify authorization header |
| 404 | Test or session not found | Validate IDs |
| 429 | Rate limit exceeded | Add backoff + batching |
What you can build next
Once this pipeline is running, you can:
- Power BI dashboards with UX metrics
- Feed transcripts into LLM workflows
- Join UserTesting data with product analytics
- Trigger alerts when certain task patterns appear
Summary
You now have a scalable pattern to:
- Extract UserTesting research data programmatically
- Normalize it for analytics and AI
- Keep insights continuously in sync with your organization's data ecosystem
This pipeline forms the backbone for advanced dashboards, automation, and AI-powered insight generation.
