M2Square

Data Ingest

Upload partner datasets, commit manifests, inspect quality checks, and resolve failed imports.

Sentra supports S3-based ingest through presigned uploads followed by a manifest commit. The Admin Console wraps this flow under /admin, while backend integrations can call the API directly.

API flow

sequenceDiagram
  participant Admin as Admin Console or script
  participant API as Sentra API
  participant S3 as sentra-raw-data
  participant Worker as Ingest worker
  participant DB as Sentra DB
  participant Processed as sentra-processed-data

  Admin->>API: POST /v1/ingest/presign
  API-->>Admin: Presigned upload URLs
  Admin->>S3: PUT files
  Admin->>API: POST /v1/ingest/commit
  API->>DB: Create ingest_run
  API->>Worker: Launch/queue ingest worker
  Worker->>S3: Read raw files
  Worker->>DB: Upsert transactions/entities/labels/docs
  Worker->>Processed: Write processed snapshot
  Worker->>DB: Mark completed or failed

Presign files

curl "$SENTRA_API_BASE/v1/ingest/presign" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_name": "partner_batch_20260315",
    "files": [
      { "name": "transactions.csv", "type": "transactions", "content_type": "text/csv" },
      { "name": "labels.csv", "type": "labels", "content_type": "text/csv" },
      { "name": "kyc_docs.jsonl", "type": "kyc_docs", "content_type": "application/jsonl" }
    ]
  }'

The response includes upload URLs and S3 keys:

{
  "dataset_name": "partner_batch_20260315",
  "files": [
    {
      "name": "transactions.csv",
      "file_type": "transactions",
      "s3_key": "partner_batch_20260315/transactions.csv",
      "upload_url": "https://..."
    }
  ],
  "manifest_s3_key": "partner_batch_20260315/manifest.json"
}
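The presign-then-upload step can be sketched in Python as follows. This is a minimal sketch, not the project's client: the helper names (`build_presign_request`, `build_upload_requests`, `send`) are hypothetical, and sending the same Content-Type on the PUT as was declared at presign time is an assumption about how the URLs are signed.

```python
import json
import urllib.request


def build_presign_request(api_base, token, dataset_name, files):
    """Build (but do not send) the POST /v1/ingest/presign request."""
    body = json.dumps({"dataset_name": dataset_name, "files": files}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/v1/ingest/presign",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_upload_requests(presign_response, local_bytes, content_types):
    """Build one PUT request per file entry in the presign response.

    local_bytes maps file name -> file contents as bytes.
    content_types maps file name -> the content type declared at presign time
    (assumption: the upload must repeat it).
    """
    requests = []
    for entry in presign_response["files"]:
        name = entry["name"]
        requests.append(
            urllib.request.Request(
                entry["upload_url"],
                data=local_bytes[name],
                headers={"Content-Type": content_types[name]},
                method="PUT",
            )
        )
    return requests


def send(request):
    """Send a prepared request and return (status, body bytes).

    Needs a reachable API or bucket and valid credentials.
    """
    with urllib.request.urlopen(request) as resp:
        return resp.status, resp.read()
```

Keeping request construction separate from sending makes the payloads easy to inspect or log before any private data leaves the machine.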

Commit ingest

You can commit by manifest object:

curl "$SENTRA_API_BASE/v1/ingest/commit" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": {
      "schema_version": "sentra.ingest.v0",
      "dataset_name": "partner_batch_20260315",
      "files": [
        {
          "name": "transactions.csv",
          "type": "transactions",
          "s3_key": "partner_batch_20260315/transactions.csv"
        }
      ]
    }
  }'

Or by manifest S3 key:

{ "manifest_s3_key": "partner_batch_20260315/manifest.json" }
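A script that has just called presign may want to derive the commit body from that response. The sketch below assumes the presign response's `file_type` maps directly onto the manifest's `type` field, and that the two commit forms are mutually exclusive; neither is confirmed by this page, and both helper names are hypothetical.

```python
import json

# Schema version value taken from the manifest example on this page.
SCHEMA_VERSION = "sentra.ingest.v0"


def manifest_from_presign(presign_response):
    """Build a manifest object from a presign response.

    Assumption: presign's "file_type" corresponds to the manifest's "type".
    """
    files = [
        {"name": f["name"], "type": f["file_type"], "s3_key": f["s3_key"]}
        for f in presign_response["files"]
    ]
    return {
        "schema_version": SCHEMA_VERSION,
        "dataset_name": presign_response["dataset_name"],
        "files": files,
    }


def commit_body(manifest=None, manifest_s3_key=None):
    """Return the JSON body for POST /v1/ingest/commit.

    Assumption: exactly one of the two forms should be supplied.
    """
    if (manifest is None) == (manifest_s3_key is None):
        raise ValueError("pass exactly one of manifest or manifest_s3_key")
    if manifest is not None:
        payload = {"manifest": manifest}
    else:
        payload = {"manifest_s3_key": manifest_s3_key}
    return json.dumps(payload)
```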

Inspect ingest runs

curl "$SENTRA_API_BASE/v1/admin/ingest-runs" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Fetch a single run in detail:

curl "$SENTRA_API_BASE/v1/admin/ingest-runs/$RUN_ID" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

The detail response can include:

  • row_counts_json
  • error_summary
  • snapshot_key
  • snapshot_json.quality_checks
  • missing columns
  • duplicate primary keys
  • null required fields
  • warnings
  • preview rows
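When scripting against the detail endpoint, it can help to reduce the response to the few fields that matter for triage. The sketch below assumes the response is a JSON object whose `snapshot_json` is a nested object containing `quality_checks`, as the dotted name suggests. The exact response shape is not specified here, so every field is read defensively, and `summarize_run` is a hypothetical helper.

```python
def summarize_run(detail):
    """Reduce an ingest-run detail response (already parsed from JSON) to a triage summary.

    Every field is optional, since the detail response "can include" them.
    """
    snapshot = detail.get("snapshot_json") or {}
    return {
        "row_counts": detail.get("row_counts_json") or {},
        "error_summary": detail.get("error_summary"),
        "snapshot_key": detail.get("snapshot_key"),
        # Assumption: quality_checks is nested under snapshot_json.
        "quality_checks": snapshot.get("quality_checks") or {},
        "has_errors": bool(detail.get("error_summary")),
    }
```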

Local validation

Before uploading private data, validate the dataset directory locally:

.venv/bin/python scripts/validate_partner_dataset.py /path/to/partner_batch_001

Optional JSON report:

.venv/bin/python scripts/validate_partner_dataset.py /path/to/partner_batch_001 \
  --output-json /tmp/sentra-dataset-report.json

Blocking issues include:

  • Missing transactions.csv.
  • Missing required columns.
  • Unsupported schema version.
  • Unsupported file type.
  • Invalid label taxonomy.
  • Non-binary label values.
  • Invalid KYC JSONL rows.

Warnings include:

  • Unknown canonical values.
  • Duplicate IDs.
  • Null required fields.
  • Labels stored but ignored by fraud target construction.
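To make two of the blocking checks above concrete, here is a minimal pre-check sketch. It is not the logic of validate_partner_dataset.py; it only illustrates "Missing transactions.csv" and "Invalid KYC JSONL rows". The file name kyc_docs.jsonl comes from the presign example, while tolerating blank lines and requiring each row to be a JSON object are assumptions.

```python
import json
from pathlib import Path


def precheck_dataset(dataset_dir):
    """Return a list of blocking-issue strings for a dataset directory."""
    root = Path(dataset_dir)
    issues = []

    if not (root / "transactions.csv").is_file():
        issues.append("missing transactions.csv")

    kyc = root / "kyc_docs.jsonl"
    if kyc.is_file():
        with kyc.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # assumption: blank lines are tolerated
                try:
                    row = json.loads(line)
                except json.JSONDecodeError:
                    issues.append(f"kyc_docs.jsonl line {lineno}: invalid JSON")
                    continue
                # Assumption: each KYC row must be a JSON object.
                if not isinstance(row, dict):
                    issues.append(f"kyc_docs.jsonl line {lineno}: not a JSON object")
    return issues
```

An empty list means neither of these two issues was found; the remaining checks still need the real validator.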

Demo helper

The local demo includes a repeatable ingest helper:

python scripts/demo_ingest_batch.py --api-base http://127.0.0.1:8000 --token local-admin

It generates a small dataset, uploads via /v1/ingest/presign, commits the manifest, and polls ingest status.
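The polling step could look roughly like the sketch below. It is not the code in demo_ingest_batch.py. The terminal states "completed" and "failed" come from the API flow diagram; the `status` field name, the default interval and timeout, and the `fetch_run` callable (which would wrap GET /v1/admin/ingest-runs/$RUN_ID) are assumptions.

```python
import time

# Terminal states named in the API flow ("Mark completed or failed").
TERMINAL_STATUSES = {"completed", "failed"}


def poll_ingest_run(fetch_run, run_id, interval_s=2.0, timeout_s=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_run(run_id) until the run reaches a terminal status.

    fetch_run is a caller-supplied function returning the parsed run detail.
    Assumption: the detail carries a "status" field.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"ingest run {run_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Injecting `fetch_run`, `sleep`, and `clock` keeps the loop independent of any HTTP client and lets it be exercised without a running API.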
