Provider: SageMaker
The sagemaker provider runs the pipeline as a multi-step SageMaker Pipeline on AWS. It scales from thousands to millions of rows without any other change to the YAML: only the provider changes.
Prerequisites
pip install godml[aws]
# AWS credentials
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
export SAGEMAKER_ROLE_ARN=arn:aws:iam::123456789012:role/SageMakerRole
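Before running godml, it can help to confirm that these variables are actually set in the current shell. The following is a minimal sketch in Python that assumes only the four variable names exported above; `missing_aws_env` is a hypothetical helper and not part of godml.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "SAGEMAKER_ROLE_ARN",
)


def missing_aws_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required AWS variables are set.")
```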
Minimal configuration
name: customer-churn
version: 1.0.0
provider: sagemaker
dataset:
  uri: s3://mi-bucket/data/churn.csv # must be an S3 URI
  target: churned
aws:
  role_arn: ${SAGEMAKER_ROLE_ARN}
  region: us-east-1
  s3_bucket: mi-bucket
model:
  type: xgboost
  hyperparameters:
    max_depth: 6
    eta: 0.3
metrics:
  - name: auc
    threshold: 0.80
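The configuration relies on two conventions: `${NAME}` placeholders filled from the environment, and a dataset URI that must point to S3. How godml resolves and validates these internally is not documented here, so the sketch below only illustrates the assumed semantics. `expand_placeholders` and `is_s3_uri` are hypothetical helpers.

```python
import os
import re

# Matches ${NAME}, where NAME is a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value, env=None):
    """Replace each ${NAME} in value with env[NAME]; raise KeyError if NAME is unset."""
    env = os.environ if env is None else env

    def _lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_lookup, value)


def is_s3_uri(uri):
    """Return True for URIs of the form s3://bucket/key."""
    return re.fullmatch(r"s3://[^/]+/.+", uri) is not None
```

Failing on an unset variable, rather than silently substituting an empty string, is a design choice of this sketch: a missing role ARN is easier to diagnose locally than in a failed pipeline execution.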
Full configuration
provider: sagemaker
aws:
  role_arn: ${SAGEMAKER_ROLE_ARN}
  region: us-east-1
  s3_bucket: mi-bucket
  s3_prefix: godml # default: "godml"
  kms_key_id: ${KMS_KEY_ID} # optional: KMS encryption
  compute:
    preprocessing: ml.m5.large # default
    training: ml.m5.2xlarge # default
    evaluation: ml.m5.large # default
  registry:
    model_package_group: godml-churn
    approval: manual # manual | auto
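The comments in the configuration above mark several values as defaults, which means a user only needs to specify what differs. One way to picture that behavior is a recursive merge of user settings over a defaults dictionary. This is a sketch under assumptions: the dictionary shape and the `with_defaults` helper are hypothetical, and only the default values themselves come from the comments above.

```python
import copy

# Default values as stated in the configuration comments above.
DEFAULTS = {
    "s3_prefix": "godml",
    "compute": {
        "preprocessing": "ml.m5.large",
        "training": "ml.m5.2xlarge",
        "evaluation": "ml.m5.large",
    },
}


def with_defaults(user_cfg, defaults=DEFAULTS):
    """Return a new dict: defaults overridden by user_cfg, merging nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in user_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Override only the training instance; everything else keeps its default.
settings = with_defaults({"s3_bucket": "mi-bucket", "compute": {"training": "ml.m5.4xlarge"}})
print(settings["compute"])
```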
Generated pipeline
godml automatically creates this pipeline in SageMaker:
s3://bucket/data/churn.csv
  ↓
┌──────────────────────────────────────────────┐
│ PreprocessingStep [ml.m5.large]              │
│ • compliance + PII masking                   │
│ • train/test split 80/20                     │
│ → s3://bucket/godml/pipeline/preprocessed/   │
└──────────────────────────────────────────────┘
  ↓
┌──────────────────────────────────────────────┐
│ TrainingStep [ml.m5.2xlarge]                 │
│ • XGBoost built-in container                 │
│ → s3://bucket/godml/pipeline/model/          │
└──────────────────────────────────────────────┘
  ↓
┌──────────────────────────────────────────────┐
│ EvaluationStep [ml.m5.large]                 │
│ • AUC, F1, Precision, Recall                 │
│ → evaluation.json                            │
└──────────────────────────────────────────────┘
  ↓ (only if AUC ≥ threshold)
┌──────────────────────────────────────────────┐
│ RegisterModel (conditional)                  │
│ • SageMaker Model Package Group              │
│ • PendingManualApproval / Approved           │
└──────────────────────────────────────────────┘
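The final step of the diagram is conditional: the model is registered only when the evaluated AUC reaches the configured threshold. The sketch below expresses that gate in Python. It assumes that `approval: manual` corresponds to the `PendingManualApproval` status and `approval: auto` to `Approved`; that mapping is inferred from the two option lists and is not confirmed by the source. `registration_decision` is a hypothetical helper.

```python
def registration_decision(auc, threshold, approval="manual"):
    """Return the model package approval status, or None when registration is skipped."""
    if approval not in ("manual", "auto"):
        raise ValueError(f"approval must be 'manual' or 'auto', got {approval!r}")
    if auc < threshold:
        # Below the threshold the RegisterModel step does not run.
        return None
    # Assumed mapping: auto -> Approved, manual -> PendingManualApproval.
    return "Approved" if approval == "auto" else "PendingManualApproval"


print(registration_decision(0.85, 0.80))          # registered, awaiting manual approval
print(registration_decision(0.85, 0.80, "auto"))  # registered and approved
print(registration_decision(0.79, 0.80))          # not registered
```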
Recommended instance types
| Dataset size | Recommended training instance | Approx. cost |
|---|---|---|
| < 100K rows | ml.m5.large | $0.12/h |
| 100K to 1M rows | ml.m5.2xlarge | $0.46/h |
| > 1M rows | ml.m5.4xlarge | $0.92/h |
| With GPU | ml.g4dn.xlarge | $0.74/h |
Each step starts and shuts down its instance automatically, so you pay only for the time it runs.
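As a rough illustration of how the table translates into a budget, the sketch below picks the CPU training instance by row count and multiplies the approximate hourly rate by the training time. The function names are hypothetical, the rates are the approximate figures from the table, the GPU row is left out, and the estimate covers the training step only.

```python
def recommend_training_instance(n_rows):
    """Return (instance_type, approx_usd_per_hour) for a dataset of n_rows rows."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows < 100_000:
        return ("ml.m5.large", 0.12)
    if n_rows <= 1_000_000:
        return ("ml.m5.2xlarge", 0.46)
    return ("ml.m5.4xlarge", 0.92)


def estimate_training_cost(n_rows, hours):
    """Return (instance_type, approx_cost_usd) for a training run lasting `hours`."""
    instance, rate = recommend_training_instance(n_rows)
    return instance, round(rate * hours, 2)


# 500K rows trained for an hour and a half.
print(estimate_training_cost(500_000, 1.5))
```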
Required IAM role
{
  "Effect": "Allow",
  "Action": [
    "sagemaker:*",
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
    "iam:PassRole",
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
In the AWS console: IAM → Roles → Create role → SageMaker → AmazonSageMakerFullAccess.
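The JSON above is a single statement. IAM expects statements inside a policy document with `Version` and `Statement` keys. The sketch below wraps the same actions into such a document and prints it; `build_policy_document` is a hypothetical helper, and the action list is copied from the statement above.

```python
import json

# Actions copied from the statement above.
ACTIONS = [
    "sagemaker:*",
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
    "iam:PassRole",
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
]


def build_policy_document(actions=ACTIONS, resource="*"):
    """Wrap one Allow statement in an IAM policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource}
        ],
    }


print(json.dumps(build_policy_document(), indent=2))
```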
Supported models
| model.type | AWS container |
|---|---|
| xgboost | SageMaker XGBoost built-in |
| random_forest | SageMaker SKLearn |
| logistic_regression | SageMaker SKLearn |
| lightgbm | SageMaker SKLearn + lightgbm |
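The table amounts to a lookup from `model.type` to a container. A minimal sketch of that lookup, with an explicit error for unsupported types, could look as follows. `container_for` is a hypothetical helper; the values are the descriptive labels from the table, not container image URIs.

```python
# model.type -> container label, as listed in the table above.
SUPPORTED_MODELS = {
    "xgboost": "SageMaker XGBoost built-in",
    "random_forest": "SageMaker SKLearn",
    "logistic_regression": "SageMaker SKLearn",
    "lightgbm": "SageMaker SKLearn + lightgbm",
}


def container_for(model_type):
    """Return the container label for model_type, or raise ValueError if unsupported."""
    try:
        return SUPPORTED_MODELS[model_type]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(
            f"unsupported model.type {model_type!r}; expected one of: {supported}"
        ) from None


print(container_for("xgboost"))
```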
Run
godml run -f godml.yml
godml:
- Builds the Pipeline definition
- Performs an upsert in SageMaker (creates or updates the pipeline)
- Starts the execution
- Waits and shows the status of each step
- Reports the execution ARN for traceability
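The wait-and-report part of this sequence can also be reproduced outside godml with the execution ARN it reports. The sketch below polls a SageMaker client until the execution reaches a terminal status. It is not godml's implementation: `wait_for_execution` is a hypothetical helper, and the client is assumed to be a boto3 SageMaker client (or any object exposing the same two methods). Pagination of the step list is ignored, which is an assumption that holds for a pipeline of a few steps.

```python
import time

# Statuses after which a pipeline execution no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_execution(client, execution_arn, poll_seconds=30,
                       sleep=time.sleep, report=print):
    """Poll a pipeline execution until it is terminal; return the final status."""
    while True:
        steps = client.list_pipeline_execution_steps(
            PipelineExecutionArn=execution_arn
        )
        for step in steps["PipelineExecutionSteps"]:
            report(f"{step['StepName']}: {step['StepStatus']}")
        status = client.describe_pipeline_execution(
            PipelineExecutionArn=execution_arn
        )["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   wait_for_execution(client, execution_arn)  # ARN as reported by godml
```

Passing the client, `sleep`, and `report` as parameters keeps the loop testable without AWS access.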