Use Cases

FirstHandAPI powers crowdsourced file collection for a wide range of industries. Below are three of the most common categories, with concrete examples and sample API calls.

Every file returned includes auto-generated annotations — object detection labels, OCR text extraction, scene classification, color palettes, speaker counts, transcripts, and more — so you get both raw content and structured metadata without running a separate labeling pipeline.

User-Generated Content (UGC)

Brands, marketplaces, and social platforms need authentic photos, videos, and audio from real people. FirstHandAPI lets you post a job describing exactly what you need, and workers across the country capture and upload content from their phones. AI scoring ensures every delivered file meets your quality bar.

Example scenarios:

Product review photos — Collect real-world images of customers using your product in their homes, kitchens, or offices
Local business imagery — Gather storefront photos, interior shots, and neighborhood context for a listings platform
Testimonial audio clips — Request short spoken testimonials from users about their experience with a service
Event coverage — Crowdsource photos and short video clips from attendees at concerts, festivals, or sporting events

Sample job creation:

curl -X POST https://api.firsthandapi.com/v1/jobs \
  -H "Authorization: Bearer fh_live_..." \
  -H "Idempotency-Key: ugc-storefront-batch-001" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "data_collection",
    "description": "Photos of local coffee shop storefronts. Must show the full exterior, signage clearly visible, taken during daylight. No people in frame.",
    "files_needed": 200,
    "accepted_formats": ["image/jpeg", "image/png"],
    "price_per_file_cents": 30
  }'

Typical pricing: $0.15 - $0.75 per file depending on specificity and location requirements.
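To budget a job before posting it, a rough estimate is the file count multiplied by the per-file price. The sketch below applies that to the sample job above (200 files at 30 cents each). It assumes the total is simply files_needed times price_per_file_cents; any platform fees or taxes are not described here and are not included. The helper names estimate_budget_cents and format_dollars are hypothetical and not part of FirstHandAPI.

```shell
#!/bin/sh
# Assumption: total payout = files_needed * price_per_file_cents.
# Fees or taxes, if any, are not covered by this estimate.

# Hypothetical helper: total cost in integer cents.
estimate_budget_cents() {
  local files_needed=$1
  local price_per_file_cents=$2
  echo $(( files_needed * price_per_file_cents ))
}

# Hypothetical helper: format integer cents as a dollar amount.
format_dollars() {
  local cents=$1
  printf '$%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

# Sample storefront job: 200 files at 30 cents per file.
total=$(estimate_budget_cents 200 30)
format_dollars "$total"   # prints $60.00
```

Working in integer cents avoids floating-point rounding, which matches the API's use of price_per_file_cents rather than a decimal dollar value.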

Ground Truth & Validation Data

ML teams need labeled, real-world data to validate model predictions, measure accuracy, and catch edge cases. FirstHandAPI provides a fast way to collect ground truth datasets — photos of real-world conditions, audio samples across accents and environments, or video of physical processes — all scored for quality before delivery.

Example scenarios:

Retail shelf audits — Collect photos of grocery store shelves to validate planogram compliance models
Road condition photos — Gather images of potholes, cracks, and signage to benchmark autonomous driving perception systems
Accent-diverse speech samples — Request short spoken phrases from workers in different regions to test speech recognition accuracy
Handwriting samples — Collect photos of handwritten text for OCR model validation across different styles

Sample job creation:

curl -X POST https://api.firsthandapi.com/v1/jobs \
  -H "Authorization: Bearer fh_live_..." \
  -H "Idempotency-Key: ground-truth-shelf-audit-042" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "data_collection",
    "description": "Photos of grocery store cereal aisles. Capture the full shelf from eye level, 3-4 feet away. Boxes must be legible. No flash, natural or store lighting only.",
    "files_needed": 500,
    "accepted_formats": ["image/jpeg", "image/png", "image/webp"],
    "price_per_file_cents": 40
  }'

Typical pricing: $0.25 - $1.00 per file depending on specificity and domain expertise required.

LLM Training Data

Foundation model teams and fine-tuning practitioners need diverse, high-quality multimodal data. FirstHandAPI collects images, audio, and video that represent real-world variety — different lighting, environments, accents, handwriting styles, and perspectives — at scale. Every file is AI-scored to filter out blurry, off-topic, or low-effort submissions before it reaches your dataset.

Example scenarios:

Instruction-following image pairs — Collect photos taken according to specific written prompts (e.g., “a red object next to a blue object”) to train vision-language models
Conversational audio — Gather recordings of workers reading scripted dialogues or describing scenes aloud for speech model training
Document photography — Collect photos of receipts, menus, whiteboards, and handwritten notes for document understanding models
Video narration — Request short clips of workers narrating what they see in their environment for video-language model training

Sample job creation:

curl -X POST https://api.firsthandapi.com/v1/jobs \
  -H "Authorization: Bearer fh_live_..." \
  -H "Idempotency-Key: llm-training-receipts-007" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "data_collection",
    "description": "Photos of paper receipts from any store or restaurant. Lay receipt flat on a contrasting surface, capture the entire receipt including total and line items. Must be in focus and fully legible.",
    "files_needed": 1000,
    "accepted_formats": ["image/jpeg", "image/png"],
    "price_per_file_cents": 20
  }'

Typical pricing: $0.10 - $0.50 per file depending on volume and complexity of the capture instructions.
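The typical price ranges quoted for the three categories can be used as a sanity check before submitting a job. The sketch below is one possible way to do that locally. The category keys (ugc, ground_truth, llm_training) and both function names are made up for illustration; the API itself is not known to enforce these ranges, and a price outside them may still be valid.

```shell
#!/bin/sh
# Typical per-file ranges in cents, taken from the pricing notes on this page.
# Category keys are hypothetical labels, not API values.
typical_range_cents() {
  case "$1" in
    ugc)          echo "15 75" ;;
    ground_truth) echo "25 100" ;;
    llm_training) echo "10 50" ;;
    *)            return 1 ;;
  esac
}

# Returns 0 if the price is inside the typical range for the category,
# 1 if it is outside, and 2 if the category is unknown.
price_in_typical_range() {
  local category=$1 price=$2 range low high
  range=$(typical_range_cents "$category") || return 2
  low=${range% *}    # text before the space
  high=${range#* }   # text after the space
  [ "$price" -ge "$low" ] && [ "$price" -le "$high" ]
}

# Sample receipts job: 20 cents per file in the LLM training category.
if price_in_typical_range llm_training 20; then
  echo "within typical range"
else
  echo "outside typical range"
fi
```

A check like this is advisory only: a price below the typical range may simply fill more slowly, which is a reasonable guess rather than documented behavior.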
