Data Ingest
Upload partner datasets, commit manifests, inspect quality checks, and resolve failed imports.
Sentra supports S3-based ingest through presigned uploads followed by a manifest commit. The Admin Console wraps this flow under /admin, while backend integrations can call the API directly.
API flow
sequenceDiagram
participant Admin as Admin Console or script
participant API as Sentra API
participant S3 as sentra-raw-data
participant Worker as Ingest worker
participant DB as Sentra DB
participant Processed as sentra-processed-data
Admin->>API: POST /v1/ingest/presign
API-->>Admin: Presigned upload URLs
Admin->>S3: PUT files
Admin->>API: POST /v1/ingest/commit
API->>DB: Create ingest_run
API->>Worker: Launch/queue ingest worker
Worker->>S3: Read raw files
Worker->>DB: Upsert transactions/entities/labels/docs
Worker->>Processed: Write processed snapshot
Worker->>DB: Mark completed or failed
Presign files
curl "$SENTRA_API_BASE/v1/ingest/presign" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_name": "partner_batch_20260315",
    "files": [
      { "name": "transactions.csv", "type": "transactions", "content_type": "text/csv" },
      { "name": "labels.csv", "type": "labels", "content_type": "text/csv" },
      { "name": "kyc_docs.jsonl", "type": "kyc_docs", "content_type": "application/jsonl" }
    ]
  }'
Response includes upload URLs and S3 keys:
{
  "dataset_name": "partner_batch_20260315",
  "files": [
    {
      "name": "transactions.csv",
      "file_type": "transactions",
      "s3_key": "partner_batch_20260315/transactions.csv",
      "upload_url": "https://..."
    }
  ],
  "manifest_s3_key": "partner_batch_20260315/manifest.json"
}
Commit ingest
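The manifest sent to the commit endpoint carries the same file entries that the presign response returns, except that the examples on this page show `file_type` in the response and `type` in the manifest. A minimal sketch of that mapping follows; the helper name `manifest_from_presign` is hypothetical, and the field names are taken only from the examples on this page, not from a published schema.

```python
import json


def manifest_from_presign(presign_response: dict,
                          schema_version: str = "sentra.ingest.v0") -> dict:
    """Build a commit manifest from a presign response (hypothetical helper)."""
    return {
        "schema_version": schema_version,
        "dataset_name": presign_response["dataset_name"],
        "files": [
            # The response uses "file_type"; the manifest example uses "type".
            {"name": f["name"], "type": f["file_type"], "s3_key": f["s3_key"]}
            for f in presign_response["files"]
        ],
    }


# Presign response copied from the example on this page.
presign_response = {
    "dataset_name": "partner_batch_20260315",
    "files": [
        {
            "name": "transactions.csv",
            "file_type": "transactions",
            "s3_key": "partner_batch_20260315/transactions.csv",
            "upload_url": "https://...",
        }
    ],
    "manifest_s3_key": "partner_batch_20260315/manifest.json",
}

commit_body = {"manifest": manifest_from_presign(presign_response)}
print(json.dumps(commit_body, indent=2))
```

The upload URL is deliberately left out of the manifest, since the manifest example lists only `name`, `type`, and `s3_key` per file.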
You can commit by manifest object:
curl "$SENTRA_API_BASE/v1/ingest/commit" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": {
      "schema_version": "sentra.ingest.v0",
      "dataset_name": "partner_batch_20260315",
      "files": [
        {
          "name": "transactions.csv",
          "type": "transactions",
          "s3_key": "partner_batch_20260315/transactions.csv"
        }
      ]
    }
  }'
Or by manifest S3 key:
{ "manifest_s3_key": "partner_batch_20260315/manifest.json" }
Inspect ingest runs
curl "$SENTRA_API_BASE/v1/admin/ingest-runs" \
  -H "Authorization: Bearer $ADMIN_TOKEN"
Detailed run:
curl "$SENTRA_API_BASE/v1/admin/ingest-runs/$RUN_ID" \
  -H "Authorization: Bearer $ADMIN_TOKEN"
Detail response can include:
- row_counts_json
- error_summary
- snapshot_key
- snapshot_json.quality_checks
  - missing columns
  - duplicate primary keys
  - null required fields
  - warnings
  - preview rows
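When triaging a failed import, it can help to reduce a run detail response to a few lines. The sketch below is illustrative only: the key names under `quality_checks` (`missing_columns`, `duplicate_primary_keys`, `null_required_fields`, `warnings`) are assumptions derived from the list above, their value shapes are guessed, and the sample payload is made up. Check a real response before relying on any of them.

```python
# ASSUMPTION: these key names are guessed from the quality-check list above.
ASSUMED_CHECK_KEYS = (
    "missing_columns",
    "duplicate_primary_keys",
    "null_required_fields",
    "warnings",
)


def summarize_run(detail: dict) -> list[str]:
    """Return one line per non-empty problem found in a run detail payload."""
    lines = []
    if detail.get("error_summary"):
        lines.append(f"error: {detail['error_summary']}")
    checks = (detail.get("snapshot_json") or {}).get("quality_checks") or {}
    for key in ASSUMED_CHECK_KEYS:
        value = checks.get(key)
        if value:
            # Lists and dicts are reported by size; anything else is shown as-is.
            count = len(value) if isinstance(value, (list, dict)) else value
            lines.append(f"{key}: {count}")
    return lines


# Made-up payload for illustration; not a captured API response.
sample_detail = {
    "error_summary": None,
    "snapshot_json": {
        "quality_checks": {
            "missing_columns": [],
            "duplicate_primary_keys": ["t_17", "t_42"],
            "warnings": ["example warning"],
        }
    },
}

for line in summarize_run(sample_detail):
    print(line)
```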
Local validation
Before uploading private data:
.venv/bin/python scripts/validate_partner_dataset.py /path/to/partner_batch_001
Optional JSON report:
.venv/bin/python scripts/validate_partner_dataset.py /path/to/partner_batch_001 \
  --output-json /tmp/sentra-dataset-report.json
Blocking issues include:
- Missing transactions.csv.
- Missing required columns.
- Unsupported schema version.
- Unsupported file type.
- Invalid label taxonomy.
- Non-binary label values.
- Invalid KYC JSONL rows.
Warnings include:
- Unknown canonical values.
- Duplicate IDs.
- Null required fields.
- Labels stored but ignored by fraud target construction.
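To make two of the blocking checks concrete, here is a rough pre-check that can run before the real validator. It is not the implementation of `validate_partner_dataset.py`: it only tests that `transactions.csv` exists and that each non-empty line of `kyc_docs.jsonl` parses as JSON, on the assumption that unparseable lines are what counts as an invalid KYC JSONL row. The function name `precheck` and the sample field `entity_id` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def precheck(dataset_dir: Path) -> list[str]:
    """Return blocking issues found by two simple, assumed checks."""
    issues = []
    if not (dataset_dir / "transactions.csv").is_file():
        issues.append("missing transactions.csv")

    kyc_path = dataset_dir / "kyc_docs.jsonl"
    if kyc_path.is_file():
        with kyc_path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    issues.append(f"kyc_docs.jsonl line {line_no}: invalid JSON")
    return issues


# Demonstration on a throwaway directory with one good and one bad KYC row.
with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp)
    (batch / "kyc_docs.jsonl").write_text(
        '{"entity_id": "e1"}\n{not json}\n', encoding="utf-8"
    )
    issues = precheck(batch)

print(issues)
```

The project validator covers more than this (required columns, schema version, label taxonomy), so the sketch is a quick first pass rather than a substitute.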
Demo helper
The local demo includes a repeatable ingest helper:
python scripts/demo_ingest_batch.py --api-base http://127.0.0.1:8000 --token local-admin
It generates a small dataset, uploads it through the presigned URLs returned by /v1/ingest/presign, commits the manifest, and polls ingest status.
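For integrations that call the API directly, the same presign, upload, commit, and poll sequence can be sketched with the Python standard library. This is not the source of `demo_ingest_batch.py`. Several details are assumptions: that the commit response exposes the run identifier as `id`, that the run detail has a `status` field whose terminal values are `completed` and `failed`, and that the presigned PUT expects the content type sent to presign. The transport is injectable so the sketch can be exercised with a stand-in instead of a live server.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# send(method, url, headers, body) -> parsed JSON dict, or None for an empty body
Send = Callable[[str, str, dict, Optional[bytes]], Optional[dict]]


def http_send(method: str, url: str, headers: dict, body: Optional[bytes]) -> Optional[dict]:
    """Real transport built on urllib; raises on HTTP error statuses."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if payload else None


def run_ingest(api_base: str, token: str, dataset_name: str, files: list,
               send: Send = http_send, poll_seconds: float = 2.0,
               max_polls: int = 30) -> dict:
    """files: dicts with name, type, content_type, and data (bytes)."""
    auth = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

    def call_api(method: str, path: str, payload: Optional[dict] = None):
        body = json.dumps(payload).encode() if payload is not None else None
        return send(method, f"{api_base}{path}", auth, body)

    # 1. Ask for presigned upload URLs.
    presign = call_api("POST", "/v1/ingest/presign", {
        "dataset_name": dataset_name,
        "files": [{k: f[k] for k in ("name", "type", "content_type")} for f in files],
    })

    # 2. PUT each file to its presigned URL (no bearer token on this request).
    by_name = {f["name"]: f for f in files}
    for target in presign["files"]:
        local = by_name[target["name"]]
        send("PUT", target["upload_url"],
             {"Content-Type": local["content_type"]}, local["data"])

    # 3. Commit by manifest object, mirroring the commit example on this page.
    run = call_api("POST", "/v1/ingest/commit", {"manifest": {
        "schema_version": "sentra.ingest.v0",
        "dataset_name": presign["dataset_name"],
        "files": [{"name": t["name"], "type": t["file_type"], "s3_key": t["s3_key"]}
                  for t in presign["files"]],
    }})

    # 4. Poll the run. ASSUMPTION: "id" and "status" field names, terminal values.
    run_id = run["id"]
    for _ in range(max_polls):
        detail = call_api("GET", f"/v1/admin/ingest-runs/{run_id}")
        if detail.get("status") in ("completed", "failed"):
            return detail
        time.sleep(poll_seconds)
    raise TimeoutError(f"ingest run {run_id} did not finish")


# Dry run against a stand-in transport; every response below is made up.
calls = []


def fake_send(method, url, headers, body):
    calls.append((method, url))
    if url.endswith("/v1/ingest/presign"):
        return {"dataset_name": "demo_batch", "files": [{
            "name": "transactions.csv", "file_type": "transactions",
            "s3_key": "demo_batch/transactions.csv",
            "upload_url": "https://uploads.invalid/transactions.csv"}],
            "manifest_s3_key": "demo_batch/manifest.json"}
    if url.endswith("/v1/ingest/commit"):
        return {"id": "run-1"}
    if "/v1/admin/ingest-runs/" in url:
        return {"id": "run-1", "status": "completed"}
    return None  # the PUT to the presigned URL


result = run_ingest(
    "http://127.0.0.1:8000", "local-admin", "demo_batch",
    [{"name": "transactions.csv", "type": "transactions",
      "content_type": "text/csv", "data": b"placeholder bytes\n"}],
    send=fake_send, poll_seconds=0,
)
print(result["status"], [method for method, _ in calls])
```

Against a real deployment, omit the `send` argument so that `http_send` is used, and adjust the assumed field names to match actual responses.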