
AI Scoring Rubric

Every file uploaded by a worker is scored by a multi-model AI ensemble (Claude Vision for images/video, Whisper + Claude for audio). The scorer evaluates each submission against your job description and returns a structured rating.

Star Rating Scale (1-5)

| Rating | Meaning | Result |
|---|---|---|
| 5 stars | Excellent — fully addresses the job description with high quality and attention to detail. No meaningful improvements needed. | Auto-approved |
| 4 stars | Good — clearly addresses the job description with only minor issues. Well-executed. | Auto-approved |
| 3 stars | Acceptable — meets basic requirements but with notable gaps. On-topic and usable but lacks polish. | Auto-approved (default threshold) |
| 2 stars | Below threshold — partially relevant but with major deficiencies. Wrong format, missing core elements, or very low quality. | Rejected (retry allowed on first attempt) |
| 1 star | Rejected — completely off-topic, unintelligible, or corrupted. | Rejected (no retry, strike issued) |

Three Scoring Dimensions

Each submission is scored on three dimensions (each 1-5):

Relevance

How well does the submission match what was asked for in the job description? Does it address the correct topic, format, and subject matter?

Quality

How well-executed is the submission? This covers technical quality (resolution and clarity for images, clarity and fidelity for audio, framing and stability for video) and craft (composition, coherence, professionalism).

Completeness

Does the submission cover all aspects of the job description? Are there missing elements, truncated content, or incomplete coverage?

The overall star rating reflects the weighted average of these dimensions, rounded to the nearest integer. If one dimension is critically low, the overall rating may be adjusted down by 1 star.
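As a minimal sketch of this aggregation — the actual weights and the "critically low" cutoff are not published, so both are assumptions here:

```python
# Hypothetical sketch of the rating aggregation described above.
# The weights and the "critically low" cutoff are illustrative only.
def overall_rating(relevance, quality, completeness,
                   weights=(0.4, 0.35, 0.25)):
    dims = (relevance, quality, completeness)
    weighted = sum(w * d for w, d in zip(weights, dims))
    stars = round(weighted)  # rounded to the nearest integer
    # A critically low dimension can pull the overall rating down one star.
    if min(dims) <= 1 and stars > 1:
        stars -= 1
    return max(1, min(5, stars))
```

For example, a submission scored 5/5/1 would average to 4.0 under these assumed weights, then drop to 3 stars because one dimension is critically low.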

Approval Threshold

By default, submissions scoring 3 stars or higher are auto-approved and delivered to your folder. You can customize this per job using the min_star_rating field:

curl -X POST https://api.firsthandapi.com/v1/jobs \
  -H "Authorization: Bearer fh_live_..." \
  -d '{
    "type": "data_collection",
    "description": "High-quality product photos...",
    "files_needed": 50,
    "accepted_formats": ["image/jpeg"],
    "price_per_file_cents": 100,
    "min_star_rating": 4
  }'

Setting min_star_rating: 4 means only 4-star and 5-star submissions are approved. Workers with 2-3 star submissions can retry once.
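The approval/retry decision described above can be sketched as follows — the logic is inferred from this guide and the rating table, not an official implementation:

```python
# Sketch of the approval/retry decision, inferred from this guide.
def outcome(stars, min_star_rating=3, first_attempt=True):
    if stars >= min_star_rating:
        return "approved"            # delivered to your folder
    if stars == 1:
        return "rejected_strike"     # no retry, strike issued
    # 2-star (or below-threshold) submissions get one retry.
    return "retry_allowed" if first_attempt else "rejected"
```

With min_star_rating raised to 4, a 3-star submission falls below the threshold and becomes retry-eligible instead of auto-approved.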

AI Feedback

Every scored submission includes two text fields:

  • ai_reasoning — 2-3 sentences explaining why the file received its rating
  • ai_feedback — Actionable suggestions for improvement if the worker retries

These are shown to workers in the iOS app immediately after uploading, enabling them to improve and resubmit.
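For illustration, here is one way to surface those two fields from a parsed GET /v1/jobs/:id/files response on your side — the response shape shown is an assumption based on this guide, so check the API reference for the authoritative schema:

```python
# Sketch: summarize ai_reasoning / ai_feedback from a parsed
# GET /v1/jobs/:id/files response. Response shape is assumed.
def feedback_summary(files_response):
    lines = []
    for f in files_response.get("files", []):
        lines.append(f"{f['star_rating']} stars: {f['ai_reasoning']}")
        if f.get("ai_feedback"):
            lines.append(f"  retry tip: {f['ai_feedback']}")
    return "\n".join(lines)
```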

Tips for Better Scores

  • Write clear, specific job descriptions — “Photo of a residential mailbox from the front, in daylight, with the house number visible” scores better than “mailbox photo”
  • List explicit requirements — the AI checks each requirement as a completeness criterion
  • Specify format preferences — orientation, lighting, background, etc.

Pre-Check Gates (Before Scoring)

Some submissions are rejected before the AI ensemble runs — saving compute and protecting the scoring model. Pre-check rejects never charge your credits and do not generate annotations.

| Gate | Triggered when | Effect |
|---|---|---|
| Resolution floor | min_width or min_height set on the job, and the image/video falls below either | Auto-reject at upload (image EXIF or ffprobe dimensions). No credit deduction. |
| Duration bounds | min_duration_seconds or max_duration_seconds set on the job, and audio/video falls outside the range | Auto-reject after metadata extraction. No credit deduction. |
| Reverse image search | Image matches a stock photo, known dataset image, or prior submission via perceptual hash | Auto-reject as likely fraud/duplicate. 1-star equivalent; counts toward worker strikes. |
| Authenticity signals | Image exhibits AI-generation tells (artifact patterns, metadata anomalies, color space tells) | Auto-reject. 1-star; counts toward strikes. |
| Whisper hallucination | Audio/video transcribed as speech, but per-segment no_speech_prob, avg_logprob, and compression_ratio indicate ambient noise misinterpreted as words | Transcript suppressed; file scored as ambient audio (not 1-star). |
| Content hash dedup | File bytes exactly match a file already submitted by the same worker on the same job | Reject as duplicate. No credit deduction. |
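The content-hash dedup gate amounts to an exact byte-level comparison. A minimal sketch, assuming a per-worker-per-job set of digests (the actual hash function used is not documented; SHA-256 is an assumption):

```python
# Sketch of the content-hash dedup gate: exact byte match against
# files the same worker already submitted to the same job.
# SHA-256 is an assumption; any collision-resistant hash works here.
import hashlib

def is_exact_duplicate(file_bytes, seen_hashes):
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```

Note this gate only catches byte-identical resubmissions; near-duplicates (re-encoded or cropped copies) are the reverse-image-search gate's job.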

Pre-check rejections emit submission.rejected and submission.scored webhook events with a reason field (pre_check_resolution, pre_check_stock_photo, etc.) so you can monitor gate performance without polling.
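To monitor gate performance from those events, you might tally rejections by reason — the event shape below is an assumption based on the reason values this guide lists:

```python
# Sketch: count pre-check rejections by reason from received webhook
# events. Event dict shape is assumed from this guide.
def gate_counter(events):
    counts = {}
    for ev in events:
        if ev.get("type") == "submission.rejected":
            reason = ev.get("reason", "unknown")
            if reason.startswith("pre_check_"):
                counts[reason] = counts.get(reason, 0) + 1
    return counts
```

A spike in one reason (say, pre_check_resolution) usually means the job's constraints need clarifying in the description rather than that workers are submitting bad-faith files.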

Auto-Labeling

In addition to quality scoring, the AI ensemble also generates structured annotation metadata for every scored submission — object detection labels, OCR text extraction, scene classification, color palettes, speaker counts, transcripts, and more. Annotations are included in the GET /v1/jobs/:id/files response alongside each file.

See the Auto-Labeling & Annotations guide for full schema documentation and examples.