CLIENT DATA REQUEST — INTAKE FORM
v3
The form adapts as you fill it. Downstream sections unlock and reshape based on your data type, industry, and sensitivity tier.
Stage 1 · Drivers (complete these to unlock the rest)
1
Request & Use Case
Date submitted *
Client reference / PO #
Priority *
— select —
Standard
Expedited
Critical
Brief description *
Intended use *
Pre-training
SFT / fine-tune
RLHF / preference
Eval / benchmark
Red-team / safety
Agent / tool-use traces
Commercial product
Research
Other…
Specify
2
Data Type (drives sections below)
Pick every type that applies. Each unlocks its own detail panel in Stage 2.
Type(s) *
Ops data
Legacy code bases
Modern code bases
Consumer data
Egocentric data
Multimodal data
Other…
Specify
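The unlock rule for this section can be sketched as a lookup from each selected type to its Stage 2 panel. This is only an illustration: the option labels and panel titles come from the form, while `DATA_TYPE_PANELS` and `unlocked_panels` are hypothetical names, and treating "Other…" as having no dedicated panel is an assumption.

```python
# Hypothetical lookup: Stage 1 "Type(s)" option -> Stage 2 panel title.
DATA_TYPE_PANELS = {
    "Ops data": "Ops Data",
    "Legacy code bases": "Legacy Code Bases",
    "Modern code bases": "Modern Code Bases",
    "Consumer data": "Consumer Data",
    "Egocentric data": "Egocentric Data",
    "Multimodal data": "Multimodal Data",
}


def unlocked_panels(selected_types):
    """Return the Stage 2 panels to show, in the order the types were selected."""
    # Assumption: options without an entry (such as "Other…") unlock no panel.
    return [DATA_TYPE_PANELS[t] for t in selected_types if t in DATA_TYPE_PANELS]
```

Calling `unlocked_panels(["Ops data", "Consumer data"])` would list the two matching panels and nothing else.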
3
Industry (drives regulation & tooling)
Primary sector *
— select —
Healthcare / Life Sciences
Fintech / Financial Services
Insurance
Legal
Logistics / Supply Chain
Manufacturing / Industrial
Retail / E-commerce
Government / Public Sector
Education
Telecom
Media / Entertainment
Energy / Utilities
Pharma
Call Center / BPO
Other…
Specify
4
Modality & Sensitivity
Modality *
Structured records
Free-text docs
Chat / conversational
Audio
Video
Images
Code
Telemetry / sensor
3D / point cloud
Biometric
Other…
Specify
Sensitivity tier *
T0 · Public
T1 · Business confidential
T2 · PII-light
T3 · Regulated PII (DPA required)
T4 · PHI (BAA required)
T5 · Privileged / classified (reject)
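The tier rules above can be sketched as a small gating check. The tier labels, the DPA and BAA requirements, and the T5 rejection are taken from the form; `TIER_RULES`, `check_tier`, and the shape of the return value are hypothetical and only illustrate one way to encode them.

```python
# Tier rules as stated on the form; the dictionary layout is illustrative.
TIER_RULES = {
    "T0": {"label": "Public", "contracts": [], "reject": False},
    "T1": {"label": "Business confidential", "contracts": [], "reject": False},
    "T2": {"label": "PII-light", "contracts": [], "reject": False},
    "T3": {"label": "Regulated PII", "contracts": ["DPA"], "reject": False},
    "T4": {"label": "PHI", "contracts": ["BAA"], "reject": False},
    "T5": {"label": "Privileged / classified", "contracts": [], "reject": True},
}


def check_tier(tier, signed_contracts):
    """Return (ok, reason) for a request at the given sensitivity tier."""
    rule = TIER_RULES[tier]
    if rule["reject"]:
        return False, f"{tier} ({rule['label']}) requests are rejected"
    missing = [c for c in rule["contracts"] if c not in signed_contracts]
    if missing:
        return False, "missing contracts: " + ", ".join(missing)
    return True, "ok"
```

For example, `check_tier("T4", ["MSA"])` fails until a BAA is among the signed contracts, and `check_tier("T5", ...)` fails regardless of what has been signed.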
Stage 2 · Data-type detail (shown based on §2)
5
Ops Data
Function(s)
Sales
Marketing
Finance
HR
Legal
Procurement
IT / DevOps
Customer Support
Operations
Product
Source systems
Salesforce
HubSpot
Microsoft Dynamics
Zendesk
Intercom
Freshdesk
ServiceNow
Jira
Asana
Linear
Slack
Teams
SAP
Oracle ERP
NetSuite
Workday
QuickBooks
Marketo
Braze
Iterable
Other…
Record types
Time range
5
Legacy Code Bases
Language(s)
COBOL
Fortran
RPG
PL/I
Delphi
VB6 / VBScript
Classic ASP
Perl
ABAP
PowerBuilder
ColdFusion
MUMPS
Other…
Other
Platform
Mainframe (z/OS)
IBM i (AS/400)
Unix (Solaris/AIX/HP-UX)
Legacy Windows
Approx LOC
Migration target
5
Modern Code Bases
Language(s)
Python
JavaScript
TypeScript
Go
Rust
Java
Kotlin
Swift
C#
C/C++
Ruby
PHP
Scala
Elixir
Dart
Other…
Other
Repo hosting
GitHub
GitLab
Bitbucket
Azure Repos
Self-hosted
Deliverable scope
Source snapshot
Full git history
PR / review threads
Paired bugfixes / refactors
5
Consumer Data
Consumer data is always routed through the PII redaction pipeline.
Class
Behavioral / clickstream
Transactional / purchase
Demographic / profile
Reviews / UGC
Support interactions
Location / geo
Source platforms
GA4
Mixpanel
Amplitude
Segment
Snowplow
Adobe Analytics
Shopify
Amazon Seller
Other…
Other
Consent basis
Explicit opt-in
Contractual
Legitimate interest
Public only
Unsure
5
Egocentric Data
Capture device
Meta Ray-Ban / Aria
Meta Quest
Apple Vision Pro
GoPro (head/chest)
Smartphone rig
HoloLens 2
Pupil Labs / Tobii (gaze)
Other…
Other
Activity
Cooking / household
Assembly / repair
Medical procedures
Manufacturing task
Navigation / driving
Social interaction
Sports / fitness
Sensor streams
RGB
Depth / stereo
Audio
IMU
Eye gaze
Hand tracking
Duration (hrs)
Consent
Wearer-only
Wearer + bystander
Controlled studio
Public space (no bystander PII)
5
Multimodal Data
Modality pairs
Text + Image
Text + Audio
Text + Video
Image + 3D / depth
Video + Audio + Text
Tabular + Text
Code + Text + Diagram
Alignment
Strict time/frame-aligned
Loose (within segment)
Independent, shared labels
Stage 3 · Industry detail (shown based on §3)
6
Healthcare / Life Sciences
T4 (PHI) requires a signed BAA before kickoff.
Clinical specialty
Primary care
Cardiology
Oncology
Radiology
Pathology
Neurology
OB/GYN
Emergency
ICU
Source systems
Epic
Cerner / Oracle Health
Meditech
Allscripts / Veradigm
athenahealth
PACS
Optum / Change Healthcare
REDCap
Other…
Other
IRB status
N/A
Approved
Exempt
Pending
6
Fintech / Financial Services
Data category
Transaction / payments
Market / pricing
KYC / AML
Credit / underwriting
Fraud signals
Trading
On-chain / crypto
Vendors / systems
Plaid
Stripe
MX / Yodlee
Bloomberg
Refinitiv / LSEG
Experian / Equifax / TransUnion
FIS / Fiserv
Other…
Other
PCI-DSS scope?
Out of scope
Tokenized
Raw PAN (reject)
6
Insurance
Lines of business
P&C
Auto
Life
Health
Workers' comp
Reinsurance
Artifact
FNOL calls
Claim notes
Policy docs
Adjuster reports
SIU / fraud files
Core system
Guidewire
Duck Creek
Majesco
Insurity
Legacy AS/400
6
Legal
Document class
Executed contracts
Drafts / redlines
Litigation pleadings
E-discovery
Regulatory filings
Patents
Deposition transcripts
M&A diligence
Jurisdiction
Privilege review
Not required
Client reviews
Vendor privilege team (+premium)
6
Logistics / Supply Chain
Data category
Shipment tracking / visibility
WMS / warehouse
TMS / transportation
Fleet telematics
Customs / trade
Last-mile
Platforms
SAP TM / EWM
Oracle TMS / WMS
Manhattan Associates
Blue Yonder
FourKites
project44
Samsara
Other…
Other
6
Manufacturing / Industrial
Data type
IoT / sensor streams
MES logs
Quality / QC inspection
ERP transactions
CAD / PLM
Maintenance (CMMS)
Systems
SAP (ECC / S/4HANA)
Siemens (Teamcenter / Opcenter)
Rockwell FactoryTalk
GE Digital (Proficy)
PTC (Windchill / ThingWorx)
Aveva / OSIsoft PI
Other…
Other
6
Call Center / BPO
Channels
Voice inbound
Voice outbound
Chat
Email
SMS
Platform
Genesys Cloud
NICE CXone
Five9
Talkdesk
Amazon Connect
Twilio Flex
Two-party consent?
N/A — one-party OK
All parties consented
To be verified
6
Retail / E-commerce
Data category
Product catalog
Orders
Inventory
Pricing / promos
Reviews
Clickstream
Platform
Shopify
Amazon
BigCommerce
Magento / Adobe Commerce
Salesforce Commerce
6
Government / Public Sector
Level
Federal
State
Local
International
Classification
Public / FOIA
CUI
Confidential (reject)
Classified (reject)
Export control
None
ITAR
EAR99
EAR-controlled
6
Education
FERPA applies to US student records; under-13 triggers COPPA.
Level
K-12
Higher ed
Corporate L&D
Platform
Canvas
Blackboard
Moodle
Google Classroom
PowerSchool
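The note at the top of this panel (FERPA for US student records, COPPA for under-13) could be turned into a pre-fill for the regulatory regimes list. The sketch below assumes two yes/no inputs that the form does not collect explicitly; `education_regimes` and its parameters are hypothetical.

```python
def education_regimes(us_student_records, under_13):
    """Suggest regimes for an Education request from two yes/no facts."""
    regimes = []
    if us_student_records:
        regimes.append("FERPA")  # FERPA applies to US student records
    if under_13:
        regimes.append("COPPA")  # under-13 triggers COPPA
    return regimes
```

The result is only a suggestion; the requester would still confirm the regimes in the compliance section.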
6
Telecom
Data category
CDRs / signaling
Network telemetry (OSS)
BSS / billing
Customer support
Field ops
6
Media / Entertainment
Content type
News articles
Video / broadcast
Podcast / audio
Social / UGC
Scripts / screenplays
Rights
Fully licensed
Public domain
Fair use review needed
6
Energy / Utilities
Segment
Upstream O&G
Midstream
Downstream
Power gen
T&D / Grid
Renewables
Data category
SCADA / OT telemetry
Smart meter (AMI)
Seismic / subsurface
Asset / maintenance
6
Pharma
Category
Clinical trial (CTMS/EDC)
Real-world evidence
Pharmacovigilance
Regulatory submissions
HCP / commercial
GMP manufacturing
GxP scope
Non-GxP
GLP
GCP
GMP
Stage 4 · Commercial & compliance
7
Volume & Quality
Target quantity *
Minimum viable
Language(s)
Geographic origin
Time period
Annotation / labels
None
Schema-tagged
Rubric-scored
Human-reviewed
Expert / SME-labeled
Exclusions
8
Compliance & PII Redaction (mandatory)
Every ingestion pipeline routes incoming data through PII redaction before downstream use.
Fields below configure that step.
Regulatory regimes
HIPAA
GLBA
PCI-DSS
FERPA
GDPR
CCPA / CPRA
Two-party consent
ITAR / EAR
COPPA
None / public only
PII redaction pipeline *
Engine
Microsoft Presidio
AWS Comprehend
Google Cloud DLP
Azure AI Language
Custom NER + regex
Other…
Other
Entities to redact
Name
Email
Phone
Address
DOB
SSN / gov ID
MRN / patient ID
Financial account / card
IP / device ID
Face / voice biometric
Strategy
Mask ([PERSON], [EMAIL])
Remove span
Tokenize (reversible)
Synthetic replacement
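To make the four strategies concrete, here is a minimal standard-library sketch applied to a single entity type (email). It does not reproduce any of the engines listed above; the regex is deliberately simplified, and `redact_emails`, `detokenize`, the `vault` mapping, the token format, and the synthetic value `user@example.com` are all assumptions made for illustration.

```python
import re

# Simplified email pattern for illustration only; a real engine covers far more cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
STRATEGIES = {"mask", "remove", "tokenize", "synthetic"}


def redact_emails(text, strategy, vault=None):
    """Apply one redaction strategy to every email address found in text.

    vault (hypothetical) maps original value -> token; only "tokenize" uses it.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if vault is None:
        vault = {}

    def replace(match):
        value = match.group(0)
        if strategy == "mask":
            return "[EMAIL]"
        if strategy == "remove":
            return ""
        if strategy == "tokenize":
            # Reuse the same token when a value repeats, so it stays reversible.
            if value not in vault:
                vault[value] = f"<EMAIL_{len(vault) + 1}>"
            return vault[value]
        return "user@example.com"  # synthetic replacement

    return EMAIL_RE.sub(replace, text)


def detokenize(text, vault):
    """Reverse "tokenize" using the vault built during redaction."""
    for original, token in vault.items():
        text = text.replace(token, original)
    return text
```

Only the tokenize strategy keeps enough information to restore the original text, which is why it needs the vault to be stored and access-controlled separately from the redacted data.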
De-identification level
Safe Harbor
Expert Determination (+premium)
Raw with PII (only if cleared)
Contracts needed
MSA
DPA
BAA
NDA
9
Format & Delivery
Format
JSONL
Parquet
CSV
PDF
WAV / FLAC
MP4
PNG / JPEG
DICOM
HuggingFace dataset
WebDataset
PLY / PCD / LAS
Other…
Other
Delivery
S3
GCS
Azure Blob
SFTP
HTTPS download
Cadence
One-shot
Weekly
Monthly
Streaming
10
Commercial & Sign-off
Budget (USD)
License scope
Internal use
Commercial redistribution
Model training only
Hard deadline
Attestations
No re-identification attempts.
Use only for stated case in §1.
PII redaction pipeline (§8) accepted.
BAA/DPA will be executed before kickoff.
Anything else?
Signed by