
AI Scoring Rubric

Every file uploaded by a worker is scored by a multi-model AI ensemble (Claude Vision for images/video, Whisper + Claude for audio). The scorer evaluates each submission against your job description and returns a structured rating.

Star Rating Scale (1-5)

| Rating | Meaning | Result |
|---|---|---|
| 5 stars | Excellent — fully addresses the job description with high quality and attention to detail. No meaningful improvements needed. | Auto-approved |
| 4 stars | Good — clearly addresses the job description with only minor issues. Well-executed. | Auto-approved |
| 3 stars | Acceptable — meets basic requirements but with notable gaps. On-topic and usable but lacks polish. | Auto-approved (default threshold) |
| 2 stars | Below threshold — partially relevant but with major deficiencies. Wrong format, missing core elements, or very low quality. | Rejected (retry allowed on first attempt) |
| 1 star | Rejected — completely off-topic, unintelligible, or corrupted. | Rejected (no retry, strike issued) |

Three Scoring Dimensions

Each submission is scored on three dimensions (each 1-5):

Relevance

How well does the submission match what was asked for in the job description? Does it address the correct topic, format, and subject matter?

Quality

How well-executed is the submission? This covers technical quality (resolution and clarity for images, clarity and fidelity for audio, framing and stability for video) and craft (composition, coherence, professionalism).

Completeness

Does the submission cover all aspects of the job description? Are there missing elements, truncated content, or incomplete coverage?

The overall star rating reflects the weighted average of these dimensions, rounded to the nearest integer. If one dimension is critically low, the overall rating may be adjusted down by 1 star.
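As a minimal sketch of this aggregation — the actual weights and the "critically low" cutoff are not published, so both are assumptions here:

```python
# Hypothetical sketch of the rating aggregation described above.
# The weights and the "critically low" cutoff are illustrative only.
def overall_rating(relevance, quality, completeness,
                   weights=(0.4, 0.35, 0.25)):
    dims = (relevance, quality, completeness)
    weighted = sum(w * d for w, d in zip(weights, dims))
    stars = round(weighted)  # rounded to the nearest integer
    # A critically low dimension can pull the overall rating down one star.
    if min(dims) <= 1 and stars > 1:
        stars -= 1
    return max(1, min(5, stars))
```

For example, a submission scored 5/5/1 would average to 4.0 under these assumed weights, then drop to 3 stars because one dimension is critically low.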

Approval Threshold

By default, submissions scoring 3 stars or higher are auto-approved and delivered to your folder. You can customize this per job using the min_star_rating field:

curl -X POST https://api.firsthandapi.com/v1/jobs \
  -H "Authorization: Bearer fh_live_..." \
  -d '{
    "type": "data_collection",
    "description": "High-quality product photos...",
    "files_needed": 50,
    "accepted_formats": ["image/jpeg"],
    "price_per_file_cents": 100,
    "min_star_rating": 4
  }'

Setting min_star_rating: 4 means only 4-star and 5-star submissions are approved. Workers with 2-3 star submissions can retry once.
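The approval/retry decision described above can be sketched as follows — the logic is inferred from this guide and the rating table, not an official implementation:

```python
# Sketch of the approval/retry decision, inferred from this guide.
def outcome(stars, min_star_rating=3, first_attempt=True):
    if stars >= min_star_rating:
        return "approved"            # delivered to your folder
    if stars == 1:
        return "rejected_strike"     # no retry, strike issued
    # 2-star (or below-threshold) submissions get one retry.
    return "retry_allowed" if first_attempt else "rejected"
```

With min_star_rating raised to 4, a 3-star submission falls below the threshold and becomes retry-eligible instead of auto-approved.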

AI Feedback

Every scored submission includes two text fields:

  • ai_reasoning — 2-3 sentences explaining why the file received its rating
  • ai_feedback — Actionable suggestions for improvement if the worker retries

These are shown to workers in the iOS app immediately after uploading, enabling them to improve and resubmit.
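For illustration, here is one way to surface those two fields from a parsed GET /v1/jobs/:id/files response on your side — the response shape shown is an assumption based on this guide, so check the API reference for the authoritative schema:

```python
# Sketch: summarize ai_reasoning / ai_feedback from a parsed
# GET /v1/jobs/:id/files response. Response shape is assumed.
def feedback_summary(files_response):
    lines = []
    for f in files_response.get("files", []):
        lines.append(f"{f['star_rating']} stars: {f['ai_reasoning']}")
        if f.get("ai_feedback"):
            lines.append(f"  retry tip: {f['ai_feedback']}")
    return "\n".join(lines)
```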

Tips for Better Scores

  • Write clear, specific job descriptions — “Photo of a residential mailbox from the front, in daylight, with the house number visible” scores better than “mailbox photo”
  • List explicit requirements — the AI checks each requirement as a completeness criterion
  • Specify format preferences — orientation, lighting, background, etc.

Pre-Check Gates (Before Scoring)

Some submissions are rejected before the AI ensemble runs — saving compute and protecting the scoring model. Pre-check rejects never charge your credits and do not generate annotations.

| Gate | Triggered when | Effect |
|---|---|---|
| Resolution floor | min_width or min_height set on the job, and the image/video falls below either | Auto-reject at upload (image EXIF or ffprobe dimensions). No credit deduction. |
| Duration bounds | min_duration_seconds or max_duration_seconds set on the job, and audio/video falls outside the range | Auto-reject after metadata extraction. No credit deduction. |
| Reverse image search | Image matches a stock photo, known dataset image, or prior submission via perceptual hash | Auto-reject as likely fraud/duplicate. 1-star equivalent; counts toward worker strikes. |
| Authenticity signals | Image exhibits AI-generation tells (artifact patterns, metadata anomalies, color space tells) | Auto-reject. 1-star; counts toward strikes. |
| Whisper hallucination | Audio/video transcribed as speech, but per-segment no_speech_prob, avg_logprob, and compression_ratio indicate ambient noise misinterpreted as words | Transcript suppressed; file scored as ambient audio (not 1-star). |
| Content hash dedup | File bytes exactly match a file already submitted by the same worker on the same job | Reject as duplicate. No credit deduction. |
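The content-hash dedup gate amounts to an exact byte-level comparison. A minimal sketch, assuming a per-worker-per-job set of digests (the actual hash function used is not documented; SHA-256 is an assumption):

```python
# Sketch of the content-hash dedup gate: exact byte match against
# files the same worker already submitted to the same job.
# SHA-256 is an assumption; any collision-resistant hash works here.
import hashlib

def is_exact_duplicate(file_bytes, seen_hashes):
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```

Note this gate only catches byte-identical resubmissions; near-duplicates (re-encoded or cropped copies) are the reverse-image-search gate's job.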

Pre-check rejections emit submission.rejected and submission.scored webhook events with a reason field (pre_check_resolution, pre_check_stock_photo, etc.) so you can monitor gate performance without polling.
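To monitor gate performance from those events, you might tally rejections by reason — the event shape below is an assumption based on the reason values this guide lists:

```python
# Sketch: count pre-check rejections by reason from received webhook
# events. Event dict shape is assumed from this guide.
def gate_counter(events):
    counts = {}
    for ev in events:
        if ev.get("type") == "submission.rejected":
            reason = ev.get("reason", "unknown")
            if reason.startswith("pre_check_"):
                counts[reason] = counts.get(reason, 0) + 1
    return counts
```

A spike in one reason (say, pre_check_resolution) usually means the job's constraints need clarifying in the description rather than that workers are submitting bad-faith files.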

Auto-Labeling

In addition to quality scoring, the AI ensemble also generates structured annotation metadata for every scored submission — object detection labels, OCR text extraction, scene classification, color palettes, speaker counts, transcripts, and more. Annotations are included in the GET /v1/jobs/:id/files response alongside each file.

See the Auto-Labeling & Annotations guide for full schema documentation and examples.