The Model custom resource defines a machine learning model deployment in Plastikube.

Specification

Properties

Property          Type       Description
image             string     Container image to use for the model
imagePullSecret   string     Secret to use for pulling the image
modelStorage      object     Storage configuration for the model
engine            string     Model engine to use (currently only supports “llamacpp”)
features          []string   Model features
resourceProfile   string     Reference to a predefined resource profile
entrypoint        []string   Override the image entrypoint
args              []string   Override the image args
env               []object   Extra environment variables to set in the container
replicas          integer    Default number of replicas
autoscaling       object     Autoscaling configuration
nodeSelector      object     Node labels for pod assignment
tolerations       []object   Taints that the pod can tolerate
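
The scheduling-related properties are not exercised by the examples further down, so here is a minimal sketch of how they might be set. It assumes that nodeSelector is a plain label map and that tolerations use the standard Kubernetes toleration fields, as in a pod spec; the Secret name, node name, and taint key are hypothetical.

```yaml
apiVersion: plastikube.dev/v1
kind: Model
metadata:
  name: scheduled-model                  # hypothetical name
spec:
  image: my-model:latest
  imagePullSecret: my-registry-secret    # hypothetical Secret name
  engine: llamacpp
  resourceProfile: my-resource-profile
  replicas: 2
  modelStorage:
    existingVolume: model-pvc
    path: /models/my-model
  # Assumption: a plain label map, as in a Kubernetes pod spec.
  nodeSelector:
    kubernetes.io/hostname: gpu-node-1   # hypothetical node
  # Assumption: standard Kubernetes toleration fields.
  tolerations:
  - key: example.com/gpu                 # hypothetical taint key
    operator: Exists
    effect: NoSchedule
```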

modelStorage Object

Property                Type     Description
PersistentVolumeClaim   object   New PVC to create for storing the model
existingVolume          string   Existing PVC to use for storing the model
path                    string   Path within the PVC to load the model from
download                object   How to download the model
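
A minimal sketch of how these fields might fit together, assuming a new PVC and a download can be declared on the same Model (the examples on this page show them only separately). The model name and the 20Gi size are hypothetical.

```yaml
apiVersion: plastikube.dev/v1
kind: Model
metadata:
  name: pvc-download-model   # hypothetical name
spec:
  image: my-model:latest
  engine: llamacpp
  resourceProfile: my-resource-profile
  modelStorage:
    # Assumption: PersistentVolumeClaim and download may be combined.
    PersistentVolumeClaim:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi      # hypothetical size; must fit the downloaded model
    download:
      type: http
      source: https://example.com/model.bin
```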

download Object

Property   Type     Description
type       string   Download type (huggingface-dl | http | s3)
source     string   Download source: a URL or a Hugging Face repo/filename
job        object   Configuration for the download job

job Object

Property          Type       Description
image             string     Container image to use for downloading the model
imagePullSecret   string     Secret to use for pulling the downloader image
entrypoint        []string   Override the image entrypoint for the downloader
args              []string   Override the image args for the downloader
env               []object   Extra environment variables to set in the downloader container
securityContext   object     Security settings for the downloader pod
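
The securityContext property is not shown in the examples on this page. The fragment below (placed under spec) is a sketch that assumes it accepts the standard Kubernetes pod-level security context fields; the user and group IDs are hypothetical.

```yaml
modelStorage:
  download:
    type: http
    source: https://example.com/model.bin
    job:
      # Assumption: standard Kubernetes PodSecurityContext fields are accepted.
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000   # hypothetical UID
        fsGroup: 1000     # hypothetical GID, so the downloader can write to the volume
```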

autoscaling Object

Property        Type      Description
minReplicas     integer   Minimum replicas when autoscaling is enabled
maxReplicas     integer   Maximum replicas when autoscaling is enabled
idleScaleDown   integer   Seconds the model can be idle before scaling down to minReplicas
busyScaleUp     object    Scale-up configuration

busyScaleUp Object

Property        Type      Description
bucket          integer   Size of the measurement bucket, in seconds
activePercent   integer   Percentage of the bucket during which the model must be busy to trigger a scale-up
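
To make the arithmetic concrete, here is a fragment (placed under spec) with hypothetical values. It reads the two busyScaleUp settings as "busy for at least activePercent of bucket seconds"; how busyness is measured and how many replicas are added per scale-up are not specified here.

```yaml
autoscaling:
  minReplicas: 1        # keep one replica warm instead of scaling to zero
  maxReplicas: 3
  idleScaleDown: 600    # 600 s (10 min) idle before scaling down to minReplicas
  busyScaleUp:
    bucket: 120         # bucket size: 120 s
    activePercent: 50   # 50% of 120 s = 60 s busy within the bucket triggers a scale-up
```

With the values from the basic example (bucket: 60, activePercent: 70), the same reading gives 70% of 60 s, or 42 s of busy time per bucket.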

Examples

Basic Model with Download

apiVersion: plastikube.dev/v1
kind: Model
metadata:
  name: example-model
spec:
  image: ghcr.io/plastikube/llamacpp:latest-rocm63 # other tags available for other libraries
  engine: llamacpp
  resourceProfile: amd-gpu-hostdev-plugin
  env:
  # example: force gfx906 on rocm
  - name: HSA_OVERRIDE_GFX_VERSION
    value: "9.0.6"
  modelStorage:
    download:
      type: huggingface-dl
      source: my-repo/my-model
      job:
        env:
        - name: HUGGINGFACE_TOKEN
          valueFrom:
            secretKeyRef:
              name: huggingface-secret
              key: token
  autoscaling:
    minReplicas: 0
    maxReplicas: 1
    idleScaleDown: 300
    busyScaleUp:
      bucket: 60
      activePercent: 70
  

Model with Custom Download Job

apiVersion: plastikube.dev/v1
kind: Model
metadata:
  name: custom-download-model
spec:
  image: my-model:latest
  engine: llamacpp
  resourceProfile: my-resource-profile
  modelStorage:
    download:
      type: http
      source: https://example.com/model.bin
      job:
        image: curlimages/curl:latest
        args:
        - "-L"
        - "https://example.com/model.bin"
        - "-o"
        - "/model/model.bin"
        env:
        - name: CUSTOM_VAR
          value: "value"
  

Model with Existing PVC

apiVersion: plastikube.dev/v1
kind: Model
metadata:
  name: existing-pvc-model
spec:
  image: my-model:latest
  engine: llamacpp
  resourceProfile: my-resource-profile
  modelStorage:
    existingVolume: model-pvc
    path: /models/my-model
  

Model with New PVC

apiVersion: plastikube.dev/v1
kind: Model
metadata:
  name: new-pvc-model
spec:
  image: my-model:latest
  engine: llamacpp
  resourceProfile: my-resource-profile
  modelStorage:
    PersistentVolumeClaim:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    path: /models