Production-Ready AI Applications with SAP AI Core

In our previous posts, we built a Support Ticket System with AI orchestration and RAG capabilities. Now it's time to deploy it to production with enterprise-grade reliability, security, and observability.

This post is part of a series:

  1. Getting Started with SAP AI Core and the SAP AI SDK in CAP
  2. Leveraging LLM Models and Deployments in SAP AI Core
  3. Orchestrating AI Workflows with SAP AI Core
  4. Document Grounding with RAG in SAP AI Core
  5. Production-Ready AI Applications with SAP AI Core (this post)

What Production-Ready Means

A production AI application needs:

  • Security: protect sensitive data and API keys
  • Monitoring: track performance, costs, and failures
  • Resilience: handle errors gracefully and retry transient failures
  • Scalability: support growing user load
  • Cost Control: manage token consumption and API costs
  • Observability: debug issues quickly in production

Step 1: Secure Configuration Management

Never hardcode credentials. Use SAP BTP services for secrets management.
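
On Cloud Foundry, bound service credentials reach the app through the VCAP_SERVICES environment variable; in a CAP app you would normally read them with @sap/xsenv (used in Step 2) rather than parse them by hand. A minimal sketch of the mechanism, with a stubbed binding for illustration only:

```javascript
// How bound credentials reach the app on Cloud Foundry. In practice use
// @sap/xsenv instead of parsing VCAP_SERVICES yourself; the 'aicore'
// binding below is a fake inserted purely for illustration.
function getServiceCredentials(label) {
  const vcap = JSON.parse(process.env.VCAP_SERVICES || '{}');
  const [instance] = vcap[label] || [];
  if (!instance) throw new Error(`No bound service for label "${label}"`);
  return instance.credentials;
}

// Stubbed binding:
process.env.VCAP_SERVICES = JSON.stringify({
  aicore: [{ name: 'support-ticket-aicore', credentials: { clientid: 'demo-id' } }]
});
console.log(getServiceCredentials('aicore').clientid); // 'demo-id'
```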

Create mta.yaml for Deployment

_schema-version: '3.3'
ID: support-ticket-ai
version: 1.0.0
description: Support Ticket System with AI

parameters:
  enable-parallel-deployments: true

build-parameters:
  before-all:
    - builder: custom
      commands:
        - npm ci
        - npx cds build --production

modules:
  # CAP Application
  - name: support-ticket-srv
    type: nodejs
    path: gen/srv
    parameters:
      buildpack: nodejs_buildpack
      memory: 512M
      disk-quota: 1024M
    build-parameters:
      builder: npm
    provides:
      - name: srv-api
        properties:
          srv-url: ${default-url}
    requires:
      - name: support-ticket-db
      - name: support-ticket-auth
      - name: support-ticket-destination
      - name: support-ticket-aicore

  # Database Deployer
  - name: support-ticket-db-deployer
    type: hdb
    path: gen/db
    parameters:
      buildpack: nodejs_buildpack
    requires:
      - name: support-ticket-db

resources:
  # HANA Cloud Database
  - name: support-ticket-db
    type: com.sap.xs.hdi-container
    parameters:
      service: hana
      service-plan: hdi-shared
    properties:
      hdi-service-name: ${service-name}

  # XSUAA Authentication
  - name: support-ticket-auth
    type: org.cloudfoundry.managed-service
    parameters:
      service: xsuaa
      service-plan: application
      path: ./xs-security.json
      config:
        xsappname: support-ticket-${org}-${space}
        tenant-mode: dedicated
        scopes:
          - name: '$XSAPPNAME.Admin'
            description: Admin access
          - name: '$XSAPPNAME.User'
            description: User access
        role-templates:
          - name: Admin
            description: Administrator
            scope-references:
              - '$XSAPPNAME.Admin'
          - name: User
            description: Regular user
            scope-references:
              - '$XSAPPNAME.User'

  # Destination Service
  - name: support-ticket-destination
    type: org.cloudfoundry.managed-service
    parameters:
      service: destination
      service-plan: lite

  # AI Core Service
  - name: support-ticket-aicore
    type: org.cloudfoundry.managed-service
    parameters:
      service: aicore
      service-plan: extended

Create xs-security.json

{
  "xsappname": "support-ticket",
  "tenant-mode": "dedicated",
  "description": "Security configuration for Support Ticket AI",
  "scopes": [
    {
      "name": "$XSAPPNAME.Admin",
      "description": "Admin access"
    },
    {
      "name": "$XSAPPNAME.User",
      "description": "User access"
    }
  ],
  "role-templates": [
    {
      "name": "Admin",
      "description": "Administrator",
      "scope-references": [
        "$XSAPPNAME.Admin"
      ]
    },
    {
      "name": "User",
      "description": "Regular User",
      "scope-references": [
        "$XSAPPNAME.User"
      ]
    }
  ]
}

Step 2: Implement Authentication & Authorization

Update srv/server.js to require authentication:

const cds = require('@sap/cds');
const xsenv = require('@sap/xsenv');

// Load environment variables
xsenv.loadEnv();

module.exports = cds.server;

// Add authentication middleware
cds.on('bootstrap', (app) => {
  const passport = require('passport');
  const { JWTStrategy } = require('@sap/xssec');
  
  // Configure passport with JWT strategy
  passport.use(new JWTStrategy(xsenv.getServices({ uaa: { tag: 'xsuaa' } }).uaa));
  
  app.use(passport.initialize());
  app.use(passport.authenticate('JWT', { session: false }));
});

// Add authorization checks once all services are served.
// Note: there is no global cds.before — register handlers on the service.
cds.on('served', (services) => {
  const { TicketService } = services; // service name from earlier posts in this series

  // Restrict access based on roles
  TicketService.before('CREATE', 'Tickets', (req) => {
    if (!req.user.is('User')) {
      return req.reject(403, 'Insufficient privileges');
    }
  });

  TicketService.before('UPDATE', 'Tickets', (req) => {
    if (!req.user.is('Admin')) {
      return req.reject(403, 'Only admins can update tickets');
    }
  });
});

Update package.json to add authentication:

{
  "cds": {
    "requires": {
      "auth": {
        "kind": "xsuaa"
      },
      "db": {
        "kind": "hana"
      }
    }
  }
}

Step 3: Implement Comprehensive Error Handling

Create /srv/lib/error-handler.js:

const cds = require('@sap/cds');

class AIErrorHandler {
  
  /**
   * Handle AI Core errors with proper retry logic
   */
  static async handleWithRetry(operation, options = {}) {
    const maxRetries = options.maxRetries || 3;
    const initialDelay = options.initialDelay || 1000;
    const backoffMultiplier = options.backoffMultiplier || 2;
    
    let lastError;
    
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error;
        
        // Don't retry certain errors
        if (this.isNonRetryableError(error)) {
          throw this.enhanceError(error);
        }
        
        // Log the error
        console.error(`Attempt ${attempt + 1} failed:`, {
          message: error.message,
          status: error.response?.status,
          code: error.code
        });
        
        // Wait before retrying
        if (attempt < maxRetries - 1) {
          const delay = initialDelay * Math.pow(backoffMultiplier, attempt);
          await this.sleep(delay);
        }
      }
    }
    
    throw this.enhanceError(lastError);
  }
  
  /**
   * Check if error should not be retried
   */
  static isNonRetryableError(error) {
    const status = error.response?.status;
    
    // Don't retry client errors (except 429 rate limit)
    if (status && status >= 400 && status < 500 && status !== 429) {
      return true;
    }
    
    // Don't retry authentication errors
    if (error.code === 'EAUTH' || error.message?.includes('authentication')) {
      return true;
    }
    
    return false;
  }
  
  /**
   * Enhance error with additional context
   */
  static enhanceError(error) {
    const enhanced = new Error(error.message);
    enhanced.name = 'AIOperationError';
    enhanced.originalError = error;
    enhanced.timestamp = new Date().toISOString();
    
    // Add status code if available
    if (error.response?.status) {
      enhanced.statusCode = error.response.status;
    }
    
    // Add rate limit info if available
    if (error.response?.headers) {
      const headers = error.response.headers;
      if (headers['x-ratelimit-remaining']) {
        enhanced.rateLimitRemaining = headers['x-ratelimit-remaining'];
        enhanced.rateLimitReset = headers['x-ratelimit-reset'];
      }
    }
    
    return enhanced;
  }
  
  /**
   * Sleep helper
   */
  static sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
  
  /**
   * Handle content filtering errors
   */
  static handleContentFilterError(error) {
    if (error.response?.status === 400) {
      const data = error.response.data;
      if (data?.error?.message?.includes('content filter')) {
        return {
          filtered: true,
          message: 'Your request was blocked by content safety filters.',
          categories: this.extractFilterCategories(data)
        };
      }
    }
    return null;
  }
  
  /**
   * Extract filter categories from error
   */
  static extractFilterCategories(errorData) {
    // Parse error data to extract which filters triggered
    const categories = [];
    const message = errorData?.error?.message || '';
    
    if (message.includes('hate')) categories.push('hate');
    if (message.includes('violence')) categories.push('violence');
    if (message.includes('self-harm')) categories.push('self-harm');
    if (message.includes('sexual')) categories.push('sexual');
    
    return categories;
  }
}

module.exports = AIErrorHandler;

Use it in your services:

const AIErrorHandler = require('./lib/error-handler');
const { OrchestrationClient } = require('@sap-ai-sdk/orchestration');

class TicketService {
  async processTicket(ticket) {
    return await AIErrorHandler.handleWithRetry(async () => {
      const client = new OrchestrationClient(/* ... */);
      return await client.chatCompletion(/* ... */);
    }, {
      maxRetries: 3,
      initialDelay: 1000
    });
  }
}

Step 4: Implement Monitoring & Observability

Create /srv/lib/ai-telemetry.js:

class AITelemetry {
  constructor() {
    this.metrics = {
      requests: 0,
      errors: 0,
      totalTokens: 0,
      totalCost: 0,
      latencies: []
    };
  }
  
  /**
   * Track AI request metrics
   */
  trackRequest(operation, result, duration) {
    this.metrics.requests++;
    
    if (result.usage) {
      this.metrics.totalTokens += result.usage.total_tokens || 0;
      
      // Estimate cost (rates live in estimateCost below; adjust per model)
      this.metrics.totalCost += this.estimateCost(result.usage).total;
    }
    }
    
    this.metrics.latencies.push(duration);
    
    // Log detailed metrics
    console.log('AI Request Completed', {
      operation,
      duration,
      tokens: result.usage,
      cost: this.estimateCost(result.usage)
    });
  }
  
  /**
   * Track errors
   */
  trackError(operation, error) {
    this.metrics.errors++;
    
    console.error('AI Request Failed', {
      operation,
      error: error.message,
      statusCode: error.statusCode,
      timestamp: new Date().toISOString()
    });
  }
  
  /**
   * Estimate cost for a request
   */
  estimateCost(usage) {
    if (!usage) return 0;
    
    const inputCost = (usage.prompt_tokens || 0) * 0.00000112;
    const outputCost = (usage.completion_tokens || 0) * 0.00000320;
    
    return {
      input: inputCost,
      output: outputCost,
      total: inputCost + outputCost
    };
  }
  
  /**
   * Get metrics summary
   */
  getMetrics() {
    const avgLatency = this.metrics.latencies.length > 0
      ? this.metrics.latencies.reduce((a, b) => a + b, 0) / this.metrics.latencies.length
      : 0;
    
    return {
      ...this.metrics,
      avgLatency,
      errorRate: this.metrics.requests > 0 
        ? this.metrics.errors / this.metrics.requests 
        : 0
    };
  }
  
  /**
   * Reset metrics (for testing or periodic reports)
   */
  reset() {
    this.metrics = {
      requests: 0,
      errors: 0,
      totalTokens: 0,
      totalCost: 0,
      latencies: []
    };
  }
}

// Singleton instance
const telemetry = new AITelemetry();

module.exports = telemetry;

Integrate telemetry into your service:

const telemetry = require('./lib/ai-telemetry');
const AIErrorHandler = require('./lib/error-handler');

class TicketService {
  async processTicketWithTelemetry(ticket) {
    const startTime = Date.now();
    
    try {
      const result = await AIErrorHandler.handleWithRetry(async () => {
        return await this.orchestrationClient.chatCompletion(/* ... */);
      });
      
      const duration = Date.now() - startTime;
      telemetry.trackRequest('processTicket', result, duration);
      
      return result;
    } catch (error) {
      telemetry.trackError('processTicket', error);
      throw error;
    }
  }
}

Add a metrics endpoint:

// In srv/server.js
cds.on('bootstrap', (app) => {
  const telemetry = require('./lib/ai-telemetry');
  
  app.get('/metrics', (req, res) => {
    res.json(telemetry.getMetrics());
  });
});

Step 5: Cost Optimization Strategies

Implement Response Caching

Create /srv/lib/response-cache.js:

const NodeCache = require('node-cache');

class AIResponseCache {
  constructor(ttlSeconds = 3600) {
    this.cache = new NodeCache({ 
      stdTTL: ttlSeconds,
      checkperiod: 600
    });
  }
  
  /**
   * Generate cache key from request
   */
  generateKey(prompt, model, temperature = 0) {
    const crypto = require('crypto');
    const data = JSON.stringify({ prompt, model, temperature });
    return crypto.createHash('sha256').update(data).digest('hex');
  }
  
  /**
   * Get cached response
   */
  get(prompt, model, temperature) {
    const key = this.generateKey(prompt, model, temperature);
    return this.cache.get(key);
  }
  
  /**
   * Store response in cache
   */
  set(prompt, model, temperature, response) {
    const key = this.generateKey(prompt, model, temperature);
    this.cache.set(key, response);
  }
  
  /**
   * Clear cache
   */
  clear() {
    this.cache.flushAll();
  }
  
  /**
   * Get cache statistics
   */
  getStats() {
    return this.cache.getStats();
  }
}

module.exports = AIResponseCache;

Use caching in your service:

const AIResponseCache = require('./lib/response-cache');
const cache = new AIResponseCache(3600); // 1 hour TTL

class TicketService {
  async processTicketWithCache(ticket) {
    const prompt = `${ticket.subject} ${ticket.description}`;
    const model = 'gpt-4o';
    const temperature = 0.3;
    
    // Check cache first
    const cached = cache.get(prompt, model, temperature);
    if (cached) {
      console.log('Cache hit - saved API call');
      return cached;
    }
    
    // Generate new response
    const response = await this.processTicket(ticket);
    
    // Cache the response
    cache.set(prompt, model, temperature, response);
    
    return response;
  }
}

Token Usage Optimization

class TokenOptimizer {
  
  /**
   * Truncate prompt to fit within token limit
   */
  static truncatePrompt(text, maxTokens = 4000) {
    // Rough estimate: 1 token ≈ 4 characters for English
    const maxChars = maxTokens * 4;
    
    if (text.length <= maxChars) {
      return text;
    }
    
    // Truncate and add ellipsis
    return text.substring(0, maxChars - 3) + '...';
  }
  
  /**
   * Optimize prompt by removing unnecessary whitespace
   */
  static optimizePrompt(text) {
    return text
      .replace(/\s+/g, ' ')  // Replace multiple spaces with single space
      .replace(/\n\s*\n/g, '\n')  // Remove empty lines
      .trim();
  }
  
  /**
   * Estimate token count (rough approximation)
   */
  static estimateTokens(text) {
    return Math.ceil(text.length / 4);
  }
  
  /**
   * Split long documents into chunks
   */
  static chunkDocument(text, chunkSize = 1000) {
    const words = text.split(/\s+/);
    const chunks = [];
    
    for (let i = 0; i < words.length; i += chunkSize) {
      chunks.push(words.slice(i, i + chunkSize).join(' '));
    }
    
    return chunks;
  }
}

module.exports = TokenOptimizer;
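
The optimizer is static, so it can wrap any prompt before it is sent. A quick usage sketch; the two methods used are inlined (condensed from the class above) so the snippet runs on its own:

```javascript
// Condensed from TokenOptimizer above: collapse whitespace, then estimate
// tokens with the rough 4-characters-per-token heuristic.
const optimizePrompt = (text) => text.replace(/\s+/g, ' ').trim();
const estimateTokens = (text) => Math.ceil(text.length / 4);

const raw = '  Printer   not \n\n  working since   Monday  ';
const prompt = optimizePrompt(raw);
console.log(prompt);                 // 'Printer not working since Monday'
console.log(estimateTokens(prompt)); // 8 (32 characters / 4)
```

Keep in mind the 4-characters rule is only an approximation for English text; use a real tokenizer if you need exact counts.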

Model Selection Strategy

class ModelSelector {
  
  /**
   * Select appropriate model based on task complexity
   */
  static selectModel(task) {
    const complexity = this.assessComplexity(task);
    
    if (complexity === 'simple') {
      return 'gpt-4o-mini';  // Cheaper for simple tasks
    } else if (complexity === 'medium') {
      return 'gpt-4o';
    } else {
      return 'gpt-4o';  // Most capable for complex tasks
    }
  }
  
  /**
   * Assess task complexity
   */
  static assessComplexity(task) {
    const text = task.subject + ' ' + task.description;
    
    // Simple heuristics
    if (text.length < 100) {
      return 'simple';
    } else if (text.length < 500) {
      return 'medium';
    } else {
      return 'complex';
    }
  }
  
  /**
   * Get model configuration
   */
  static getModelConfig(modelName) {
    const configs = {
      'gpt-4o-mini': {
        max_tokens: 500,
        temperature: 0.5
      },
      'gpt-4o': {
        max_tokens: 1000,
        temperature: 0.3
      }
    };
    
    return configs[modelName] || configs['gpt-4o'];
  }
}

module.exports = ModelSelector;
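
In use, the selector routes short tickets to the cheaper model. A standalone sketch (logic condensed from the class above; the length thresholds are the same simple heuristics):

```javascript
// Condensed from ModelSelector above: text length decides complexity,
// complexity decides the model.
const assessComplexity = (text) =>
  text.length < 100 ? 'simple' : text.length < 500 ? 'medium' : 'complex';
const selectModel = (task) => {
  const complexity = assessComplexity(task.subject + ' ' + task.description);
  return complexity === 'simple' ? 'gpt-4o-mini' : 'gpt-4o';
};

const ticket = { subject: 'Password reset', description: 'Cannot log in' };
console.log(selectModel(ticket)); // 'gpt-4o-mini' (28 characters => simple)
```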

Step 6: CI/CD Pipeline

Create .github/workflows/deploy.yml:

name: Deploy to Cloud Foundry

on:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run tests
        run: npm test
      
      - name: Build MTA
        run: |
          npm install -g mbt
          mbt build
      
      - name: Install Cloud Foundry CLI
        run: |
          curl -sL "https://packages.cloudfoundry.org/stable?release=linux64-binary&version=v8&source=github" | sudo tar -zx -C /usr/local/bin
      
      - name: Deploy to Cloud Foundry
        run: |
          cf api "${{ secrets.CF_API }}"
          cf auth "${{ secrets.CF_USERNAME }}" "${{ secrets.CF_PASSWORD }}"
          cf target -o "${{ secrets.CF_ORG }}" -s "${{ secrets.CF_SPACE }}"
          cf install-plugin multiapps -f
          cf deploy mta_archives/*.mtar -f

Step 7: Health Checks & Readiness Probes

Add health check endpoints in srv/server.js:

cds.on('bootstrap', (app) => {
  const telemetry = require('./lib/ai-telemetry');
  
  // Liveness probe - is the app running?
  app.get('/health/live', (req, res) => {
    res.status(200).json({ status: 'alive' });
  });
  
  // Readiness probe - is the app ready to serve?
  app.get('/health/ready', async (req, res) => {
    try {
      // Check database connection
      await cds.tx(async () => {
        await SELECT.one.from('sap.capire.tickets.Tickets');
      });
      
      // Check AI Core connection
      const { OrchestrationClient } = require('@sap-ai-sdk/orchestration');
      const client = new OrchestrationClient({
        promptTemplating: { model: { name: 'gpt-4o' } }
      });
      
      // Simple ping (don't use tokens)
      // Just check if we can initialize the client
      
      res.status(200).json({ 
        status: 'ready',
        metrics: telemetry.getMetrics()
      });
    } catch (error) {
      res.status(503).json({ 
        status: 'not ready',
        error: error.message 
      });
    }
  });
});

Step 8: Environment-Specific Configuration

Create environment-specific configurations:

// config/production.js
module.exports = {
  ai: {
    orchestration: {
      maxRetries: 3,
      timeout: 30000,
      cacheEnabled: true,
      cacheTTL: 3600
    },
    models: {
      default: 'gpt-4o',
      simple: 'gpt-4o-mini'
    }
  },
  monitoring: {
    enabled: true,
    logLevel: 'info'
  }
};

// config/development.js
module.exports = {
  ai: {
    orchestration: {
      maxRetries: 1,
      timeout: 10000,
      cacheEnabled: false
    },
    models: {
      default: 'gpt-4o-mini',  // Use cheaper model in dev
      simple: 'gpt-4o-mini'
    }
  },
  monitoring: {
    enabled: true,
    logLevel: 'debug'
  }
};

Load configuration:

// srv/lib/config.js
const path = require('path');
const env = process.env.NODE_ENV || 'development';

let config;
try {
  config = require(path.join(__dirname, '../../config', env));
} catch (error) {
  console.warn(`No config found for ${env}, using defaults`);
  config = {};
}

module.exports = config;

Step 9: Logging Best Practices

Create structured logging utility:

// srv/lib/logger.js
const config = require('./config');

class Logger {
  constructor(component) {
    this.component = component;
    this.level = config.monitoring?.logLevel || 'info';
  }
  
  log(level, message, data = {}) {
    if (!this.shouldLog(level)) return;
    
    const logEntry = {
      timestamp: new Date().toISOString(),
      level,
      component: this.component,
      message,
      ...data
    };
    
    // In production, send to application logging service
    if (process.env.NODE_ENV === 'production') {
      console.log(JSON.stringify(logEntry));
    } else {
      console.log(`[${level.toUpperCase()}] ${this.component}:`, message, data);
    }
  }
  
  shouldLog(level) {
    const levels = ['debug', 'info', 'warn', 'error'];
    const currentLevelIndex = levels.indexOf(this.level);
    const messageLevelIndex = levels.indexOf(level);
    return messageLevelIndex >= currentLevelIndex;
  }
  
  debug(message, data) { this.log('debug', message, data); }
  info(message, data) { this.log('info', message, data); }
  warn(message, data) { this.log('warn', message, data); }
  error(message, data) { this.log('error', message, data); }
}

module.exports = Logger;

Use in services:

const Logger = require('./lib/logger');
const logger = new Logger('TicketService');

class TicketService {
  async processTicket(ticket) {
    logger.info('Processing ticket', { ticketId: ticket.ID });
    
    try {
      const result = await this.generateResponse(ticket);
      logger.info('Ticket processed successfully', { 
        ticketId: ticket.ID,
        tokens: result.usage?.total_tokens
      });
      return result;
    } catch (error) {
      logger.error('Failed to process ticket', { 
        ticketId: ticket.ID,
        error: error.message 
      });
      throw error;
    }
  }
}

Step 10: Performance Optimization

Connection Pooling

// srv/lib/ai-client-pool.js
const { OrchestrationClient } = require('@sap-ai-sdk/orchestration');

class AIClientPool {
  constructor(size = 5) {
    this.size = size;
    this.clients = [];
    this.available = [];
    this.initialize();
  }
  
  initialize() {
    for (let i = 0; i < this.size; i++) {
      const client = new OrchestrationClient({
        promptTemplating: {
          model: { name: 'gpt-4o' }
        }
      });
      this.clients.push(client);
      this.available.push(client);
    }
  }
  
  async acquire() {
    if (this.available.length > 0) {
      return this.available.pop();
    }
    
    // Wait for a client to become available
    return new Promise((resolve) => {
      const interval = setInterval(() => {
        if (this.available.length > 0) {
          clearInterval(interval);
          resolve(this.available.pop());
        }
      }, 100);
    });
  }
  
  release(client) {
    this.available.push(client);
  }
}

module.exports = new AIClientPool();
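
Callers must pair every acquire with a release, or a failed request silently shrinks the pool. A sketch of that discipline (FakePool is a stand-in for AIClientPool so the snippet runs on its own):

```javascript
// acquire / try / finally / release: the client returns to the pool even
// when the work function throws. FakePool stands in for AIClientPool.
class FakePool {
  constructor() { this.available = [{ id: 1 }, { id: 2 }]; }
  async acquire() { return this.available.pop(); }
  release(client) { this.available.push(client); }
}

async function withPooledClient(pool, work) {
  const client = await pool.acquire();
  try {
    return await work(client);
  } finally {
    pool.release(client); // runs on success and on error alike
  }
}

// Even a failing operation hands its client back:
const pool = new FakePool();
withPooledClient(pool, async () => { throw new Error('boom'); })
  .catch(() => console.log('available:', pool.available.length)); // available: 2
```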

Batch Processing

class BatchProcessor {
  constructor(batchSize = 10) {
    this.batchSize = batchSize;
    this.queue = [];
  }
  
  async addToQueue(ticket) {
    this.queue.push(ticket);
    
    if (this.queue.length >= this.batchSize) {
      await this.processBatch();
    }
  }
  
  async processBatch() {
    const batch = this.queue.splice(0, this.batchSize);
    
    // Process tickets in parallel
    const results = await Promise.all(
      batch.map(ticket => this.processTicket(ticket))
    );
    
    return results;
  }
  
  /**
   * Flush remaining tickets (call on shutdown or from a timer,
   * otherwise a partial batch would sit in the queue forever)
   */
  async flush() {
    if (this.queue.length > 0) {
      return this.processBatch();
    }
  }
  
  async processTicket(ticket) {
    // Your AI processing logic
  }
}

Deployment Checklist

Before deploying to production:

  • All secrets stored in BTP services (no hardcoded credentials)
  • Authentication and authorization configured
  • Error handling with retry logic implemented
  • Monitoring and telemetry in place
  • Response caching configured
  • Cost tracking enabled
  • Health check endpoints working
  • Logging structured and searchable
  • CI/CD pipeline tested
  • Performance optimizations applied
  • Documentation updated
  • Disaster recovery plan documented

Deployment Commands

# Build the MTA
npm install -g mbt
mbt build

# Login to Cloud Foundry
cf login -a <api-endpoint>

# Install the MultiApps plugin (required for `cf deploy`)
cf install-plugin multiapps

# Deploy the application
cf deploy mta_archives/support-ticket-ai_1.0.0.mtar

# Check deployment status
cf apps

# View logs
cf logs support-ticket-srv --recent

# Check service bindings
cf services

# Scale the application
cf scale support-ticket-srv -i 2 -m 1G

Monitoring in Production

Key Metrics to Track

  1. Request Metrics
     • Total requests per minute
     • Average response time
     • Error rate

  2. Token Usage
     • Tokens per request
     • Daily/monthly token consumption
     • Cost per request

  3. Cache Performance
     • Hit rate
     • Miss rate
     • Cache size

  4. Model Performance
     • Latency by model
     • Success rate by model
     • Cost by model
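
Most of these figures fall straight out of the telemetry snapshot from Step 4. A small derivation sketch (field names follow the AITelemetry class; the sample numbers are invented):

```javascript
// Derive per-request KPIs from an AITelemetry getMetrics() snapshot.
function deriveKpis(m) {
  const safeDiv = (a, b) => (b > 0 ? a / b : 0);
  return {
    errorRatePct: safeDiv(m.errors, m.requests) * 100,
    tokensPerRequest: safeDiv(m.totalTokens, m.requests),
    costPerRequest: safeDiv(m.totalCost, m.requests)
  };
}

// Invented sample snapshot:
const kpis = deriveKpis({ requests: 200, errors: 4, totalTokens: 90000, totalCost: 1.8 });
console.log(kpis.errorRatePct, kpis.tokensPerRequest); // 2 450
```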

Setting Up Alerts

// srv/lib/alerting.js
class AlertManager {
  
  checkThresholds(metrics) {
    const alerts = [];
    
    // High error rate
    if (metrics.errorRate > 0.05) {
      alerts.push({
        severity: 'high',
        message: `Error rate is ${(metrics.errorRate * 100).toFixed(2)}%`,
        metric: 'errorRate',
        value: metrics.errorRate
      });
    }
    
    // High cost
    if (metrics.totalCost > 100) {
      alerts.push({
        severity: 'medium',
        message: `Daily cost is $${metrics.totalCost.toFixed(2)}`,
        metric: 'cost',
        value: metrics.totalCost
      });
    }
    
    // High latency
    if (metrics.avgLatency > 5000) {
      alerts.push({
        severity: 'medium',
        message: `Average latency is ${metrics.avgLatency}ms`,
        metric: 'latency',
        value: metrics.avgLatency
      });
    }
    
    return alerts;
  }
  
  sendAlert(alert) {
    // Send to alerting service (email, Slack, PagerDuty, etc.)
    console.error('ALERT:', alert);
  }
}

module.exports = new AlertManager();
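
checkThresholds is a pure function over the metrics snapshot, which makes it easy to poll on a timer; in the app you would periodically call alertManager.checkThresholds(telemetry.getMetrics()) and forward the results to sendAlert. A condensed, self-contained sketch of that check:

```javascript
// Threshold logic condensed from AlertManager above, run against a
// sample snapshot (numbers invented for illustration).
const checkThresholds = (m) => {
  const alerts = [];
  if (m.errorRate > 0.05) alerts.push({ severity: 'high', metric: 'errorRate' });
  if (m.avgLatency > 5000) alerts.push({ severity: 'medium', metric: 'latency' });
  return alerts;
};

const alerts = checkThresholds({ errorRate: 0.08, avgLatency: 1200 });
console.log(alerts); // one high-severity errorRate alert
```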

Recap

We've covered essential production requirements:

  1. Security: XSUAA authentication, service bindings, no hardcoded secrets
  2. Error Handling: Retry logic, graceful degradation, enhanced error messages
  3. Monitoring: Telemetry, metrics tracking, structured logging
  4. Cost Optimization: Response caching, token optimization, smart model selection
  5. CI/CD: Automated testing and deployment pipeline
  6. Performance: Connection pooling, batch processing, health checks
  7. Observability: Comprehensive logging, alerting, health monitoring

Your AI application is now production-ready with enterprise-grade reliability and observability!

Next Steps

With your production deployment complete, consider:

  • Advanced Features: Fine-tuning models, custom embeddings, multi-tenant architecture
  • Scaling: Load balancing, auto-scaling policies, database optimization
  • Compliance: Data residency, audit logging, compliance certifications
  • Innovation: Explore new SAP AI Core capabilities, experiment with new models

Congratulations! You've built and deployed a production-ready AI application on SAP BTP! 🎉