Production-Ready AI Applications with SAP AI Core
In our previous posts, we built a Support Ticket System with AI orchestration and RAG capabilities. Now it's time to deploy it to production with enterprise-grade reliability, security, and observability.
This post is part of a series:
- Getting Started with SAP AI Core and the SAP AI SDK in CAP
- Leveraging LLM Models and Deployments in SAP AI Core
- Orchestrating AI Workflows with SAP AI Core
- Document Grounding with RAG in SAP AI Core
- Production-Ready AI Applications with SAP AI Core (this post)
What Production-Ready Means
A production AI application needs:
| Requirement | Why It Matters |
|---|---|
| Security | Protect sensitive data and API keys |
| Monitoring | Track performance, costs, and failures |
| Resilience | Handle errors gracefully and retry transient failures |
| Scalability | Support growing user load |
| Cost Control | Manage token consumption and API costs |
| Observability | Debug issues quickly in production |
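Most of these requirements converge on a single pattern: route every AI call through one wrapper that times it, counts it, and contains its failures. Here is a minimal sketch of that idea, which the rest of this post fleshes out step by step (the names `instrumentedCall` and `metrics` are illustrative, not part of any SDK):

```javascript
// Illustrative sketch: one instrumented path for every AI call.
// Resilience, observability, and cost control all hook in here.
const metrics = { requests: 0, errors: 0, totalMs: 0 };

async function instrumentedCall(operation) {
  const start = Date.now();
  metrics.requests++;
  try {
    return await operation();
  } catch (error) {
    metrics.errors++;
    throw error; // count and report failures, don't swallow them
  } finally {
    metrics.totalMs += Date.now() - start;
  }
}

module.exports = { instrumentedCall, metrics };
```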
Step 1: Secure Configuration Management
Never hardcode credentials. Use SAP BTP services for secrets management.
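To make "no hardcoded credentials" concrete: on Cloud Foundry, the credentials of bound services arrive at runtime in the `VCAP_SERVICES` environment variable. The following sketch parses it by service label just to show what the platform injects; in a real CAP app you would normally let `@sap/xsenv` or the SAP AI SDK resolve bindings for you:

```javascript
// Sketch only: read credentials of a bound service from VCAP_SERVICES.
// Prefer @sap/xsenv in production code; this helper is illustrative.
function getServiceCredentials(label) {
  const vcap = JSON.parse(process.env.VCAP_SERVICES || '{}');
  const [instance] = vcap[label] || [];
  if (!instance) {
    throw new Error(`No bound service instance found for label "${label}"`);
  }
  return instance.credentials;
}

module.exports = { getServiceCredentials };
```

With the bindings defined in the deployment descriptor below, the AI Core credentials appear under the `aicore` label.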
Create mta.yaml for Deployment
_schema-version: '3.3'
ID: support-ticket-ai
version: 1.0.0
description: Support Ticket System with AI

parameters:
  enable-parallel-deployments: true

build-parameters:
  before-all:
    - builder: custom
      commands:
        - npm ci
        - npx cds build --production

modules:
  # CAP Application
  - name: support-ticket-srv
    type: nodejs
    path: gen/srv
    parameters:
      buildpack: nodejs_buildpack
      memory: 512M
      disk-quota: 1024M
    build-parameters:
      builder: npm
    provides:
      - name: srv-api
        properties:
          srv-url: ${default-url}
    requires:
      - name: support-ticket-db
      - name: support-ticket-auth
      - name: support-ticket-destination
      - name: support-ticket-aicore

  # Database Deployer
  - name: support-ticket-db-deployer
    type: hdb
    path: gen/db
    parameters:
      buildpack: nodejs_buildpack
    requires:
      - name: support-ticket-db

resources:
  # HANA Cloud Database
  - name: support-ticket-db
    type: com.sap.xs.hdi-container
    parameters:
      service: hana
      service-plan: hdi-shared
    properties:
      hdi-service-name: ${service-name}

  # XSUAA Authentication
  - name: support-ticket-auth
    type: org.cloudfoundry.managed-service
    parameters:
      service: xsuaa
      service-plan: application
      path: ./xs-security.json
      config:
        xsappname: support-ticket-${org}-${space}
        tenant-mode: dedicated
        scopes:
          - name: '$XSAPPNAME.Admin'
            description: Admin access
          - name: '$XSAPPNAME.User'
            description: User access
        role-templates:
          - name: Admin
            description: Administrator
            scope-references:
              - '$XSAPPNAME.Admin'
          - name: User
            description: Regular user
            scope-references:
              - '$XSAPPNAME.User'

  # Destination Service
  - name: support-ticket-destination
    type: org.cloudfoundry.managed-service
    parameters:
      service: destination
      service-plan: lite

  # AI Core Service
  - name: support-ticket-aicore
    type: org.cloudfoundry.managed-service
    parameters:
      service: aicore
      service-plan: extended

Create xs-security.json
{
  "xsappname": "support-ticket",
  "tenant-mode": "dedicated",
  "description": "Security configuration for Support Ticket AI",
  "scopes": [
    {
      "name": "$XSAPPNAME.Admin",
      "description": "Admin access"
    },
    {
      "name": "$XSAPPNAME.User",
      "description": "User access"
    }
  ],
  "role-templates": [
    {
      "name": "Admin",
      "description": "Administrator",
      "scope-references": [
        "$XSAPPNAME.Admin"
      ]
    },
    {
      "name": "User",
      "description": "Regular User",
      "scope-references": [
        "$XSAPPNAME.User"
      ]
    }
  ]
}

Step 2: Implement Authentication & Authorization
Update srv/server.js to require authentication:
const cds = require('@sap/cds');
const xsenv = require('@sap/xsenv');

// Load environment variables
xsenv.loadEnv();

// Add authentication middleware
cds.on('bootstrap', (app) => {
  const passport = require('passport');
  const { JWTStrategy } = require('@sap/xssec');

  // Configure passport with the JWT strategy of the bound XSUAA instance
  passport.use(new JWTStrategy(xsenv.getServices({ uaa: { tag: 'xsuaa' } }).uaa));
  app.use(passport.initialize());
  app.use(passport.authenticate('JWT', { session: false }));
});

// Add authorization checks once the services are served
cds.on('served', (services) => {
  const srv = services.TicketService;

  // Restrict access based on roles
  srv.before('CREATE', 'Tickets', (req) => {
    if (!req.user.is('User')) {
      return req.reject(403, 'Insufficient privileges');
    }
  });
  srv.before('UPDATE', 'Tickets', (req) => {
    if (!req.user.is('Admin')) {
      return req.reject(403, 'Only admins can update tickets');
    }
  });
});

module.exports = cds.server;

Update package.json to add authentication:
{
  "cds": {
    "requires": {
      "auth": {
        "kind": "xsuaa"
      },
      "db": {
        "kind": "hana"
      }
    }
  }
}

Step 3: Implement Comprehensive Error Handling
Create /srv/lib/error-handler.js:
class AIErrorHandler {
  /**
   * Handle AI Core errors with proper retry logic
   */
  static async handleWithRetry(operation, options = {}) {
    const maxRetries = options.maxRetries || 3;
    const initialDelay = options.initialDelay || 1000;
    const backoffMultiplier = options.backoffMultiplier || 2;
    let lastError;

    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error;

        // Don't retry certain errors
        if (this.isNonRetryableError(error)) {
          throw this.enhanceError(error);
        }

        // Log the error
        console.error(`Attempt ${attempt + 1} failed:`, {
          message: error.message,
          status: error.response?.status,
          code: error.code
        });

        // Wait before retrying (exponential backoff)
        if (attempt < maxRetries - 1) {
          const delay = initialDelay * Math.pow(backoffMultiplier, attempt);
          await this.sleep(delay);
        }
      }
    }
    throw this.enhanceError(lastError);
  }

  /**
   * Check if error should not be retried
   */
  static isNonRetryableError(error) {
    const status = error.response?.status;
    // Don't retry client errors (except 429 rate limit)
    if (status && status >= 400 && status < 500 && status !== 429) {
      return true;
    }
    // Don't retry authentication errors
    if (error.code === 'EAUTH' || error.message?.includes('authentication')) {
      return true;
    }
    return false;
  }

  /**
   * Enhance error with additional context
   */
  static enhanceError(error) {
    const enhanced = new Error(error.message);
    enhanced.name = 'AIOperationError';
    enhanced.originalError = error;
    enhanced.timestamp = new Date().toISOString();

    // Add status code if available
    if (error.response?.status) {
      enhanced.statusCode = error.response.status;
    }

    // Add rate limit info if available
    if (error.response?.headers) {
      const headers = error.response.headers;
      if (headers['x-ratelimit-remaining']) {
        enhanced.rateLimitRemaining = headers['x-ratelimit-remaining'];
        enhanced.rateLimitReset = headers['x-ratelimit-reset'];
      }
    }
    return enhanced;
  }

  /**
   * Sleep helper
   */
  static sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  /**
   * Handle content filtering errors
   */
  static handleContentFilterError(error) {
    if (error.response?.status === 400) {
      const data = error.response.data;
      if (data?.error?.message?.includes('content filter')) {
        return {
          filtered: true,
          message: 'Your request was blocked by content safety filters.',
          categories: this.extractFilterCategories(data)
        };
      }
    }
    return null;
  }

  /**
   * Extract filter categories from error
   */
  static extractFilterCategories(errorData) {
    // Parse the error message to find which filters triggered
    const categories = [];
    const message = errorData?.error?.message || '';
    if (message.includes('hate')) categories.push('hate');
    if (message.includes('violence')) categories.push('violence');
    if (message.includes('self-harm')) categories.push('self-harm');
    if (message.includes('sexual')) categories.push('sexual');
    return categories;
  }
}

module.exports = AIErrorHandler;

Use it in your services:
const AIErrorHandler = require('./lib/error-handler');
const { OrchestrationClient } = require('@sap-ai-sdk/orchestration');

class TicketService {
  async processTicket(ticket) {
    return await AIErrorHandler.handleWithRetry(async () => {
      const client = new OrchestrationClient(/* ... */);
      return await client.chatCompletion(/* ... */);
    }, {
      maxRetries: 3,
      initialDelay: 1000
    });
  }
}

Step 4: Implement Monitoring & Observability
Create /srv/lib/ai-telemetry.js:
class AITelemetry {
  constructor() {
    this.metrics = {
      requests: 0,
      errors: 0,
      totalTokens: 0,
      totalCost: 0,
      latencies: []
    };
  }

  /**
   * Track AI request metrics
   */
  trackRequest(operation, result, duration) {
    this.metrics.requests++;
    if (result.usage) {
      this.metrics.totalTokens += result.usage.total_tokens || 0;
      this.metrics.totalCost += this.estimateCost(result.usage).total;
    }
    this.metrics.latencies.push(duration);

    // Log detailed metrics
    console.log('AI Request Completed', {
      operation,
      duration,
      tokens: result.usage,
      cost: this.estimateCost(result.usage)
    });
  }

  /**
   * Track errors
   */
  trackError(operation, error) {
    this.metrics.errors++;
    console.error('AI Request Failed', {
      operation,
      error: error.message,
      statusCode: error.statusCode,
      timestamp: new Date().toISOString()
    });
  }

  /**
   * Estimate cost for a request (adjust the rates to your model's pricing)
   */
  estimateCost(usage) {
    if (!usage) return { input: 0, output: 0, total: 0 };
    const inputCost = (usage.prompt_tokens || 0) * 0.00000112;
    const outputCost = (usage.completion_tokens || 0) * 0.00000320;
    return {
      input: inputCost,
      output: outputCost,
      total: inputCost + outputCost
    };
  }

  /**
   * Get metrics summary
   */
  getMetrics() {
    const avgLatency = this.metrics.latencies.length > 0
      ? this.metrics.latencies.reduce((a, b) => a + b, 0) / this.metrics.latencies.length
      : 0;
    return {
      ...this.metrics,
      avgLatency,
      errorRate: this.metrics.requests > 0
        ? this.metrics.errors / this.metrics.requests
        : 0
    };
  }

  /**
   * Reset metrics (for testing or periodic reports)
   */
  reset() {
    this.metrics = {
      requests: 0,
      errors: 0,
      totalTokens: 0,
      totalCost: 0,
      latencies: []
    };
  }
}

// Singleton instance
const telemetry = new AITelemetry();
module.exports = telemetry;

Integrate telemetry into your service:
const telemetry = require('./lib/ai-telemetry');
const AIErrorHandler = require('./lib/error-handler');

class TicketService {
  async processTicketWithTelemetry(ticket) {
    const startTime = Date.now();
    try {
      const result = await AIErrorHandler.handleWithRetry(async () => {
        return await this.orchestrationClient.chatCompletion(/* ... */);
      });
      const duration = Date.now() - startTime;
      telemetry.trackRequest('processTicket', result, duration);
      return result;
    } catch (error) {
      telemetry.trackError('processTicket', error);
      throw error;
    }
  }
}

Add a metrics endpoint:
// In srv/server.js
cds.on('bootstrap', (app) => {
  const telemetry = require('./lib/ai-telemetry');
  // NOTE: in production, restrict this endpoint to operators
  // (e.g. require the Admin scope)
  app.get('/metrics', (req, res) => {
    res.json(telemetry.getMetrics());
  });
});

Step 5: Cost Optimization Strategies
Implement Response Caching
Create /srv/lib/response-cache.js:
const crypto = require('crypto');
const NodeCache = require('node-cache');

class AIResponseCache {
  constructor(ttlSeconds = 3600) {
    this.cache = new NodeCache({
      stdTTL: ttlSeconds,
      checkperiod: 600
    });
  }

  /**
   * Generate cache key from request
   */
  generateKey(prompt, model, temperature = 0) {
    const data = JSON.stringify({ prompt, model, temperature });
    return crypto.createHash('sha256').update(data).digest('hex');
  }

  /**
   * Get cached response
   */
  get(prompt, model, temperature) {
    const key = this.generateKey(prompt, model, temperature);
    return this.cache.get(key);
  }

  /**
   * Store response in cache
   */
  set(prompt, model, temperature, response) {
    const key = this.generateKey(prompt, model, temperature);
    this.cache.set(key, response);
  }

  /**
   * Clear cache
   */
  clear() {
    this.cache.flushAll();
  }

  /**
   * Get cache statistics
   */
  getStats() {
    return this.cache.getStats();
  }
}

module.exports = AIResponseCache;

Use caching in your service:
const AIResponseCache = require('./lib/response-cache');
const cache = new AIResponseCache(3600); // 1 hour TTL

class TicketService {
  async processTicketWithCache(ticket) {
    const prompt = `${ticket.subject} ${ticket.description}`;
    const model = 'gpt-4o';
    const temperature = 0.3;

    // Check cache first
    const cached = cache.get(prompt, model, temperature);
    if (cached) {
      console.log('Cache hit - saved API call');
      return cached;
    }

    // Generate new response
    const response = await this.processTicket(ticket);

    // Cache the response
    cache.set(prompt, model, temperature, response);
    return response;
  }
}

Token Usage Optimization
class TokenOptimizer {
  /**
   * Truncate prompt to fit within token limit
   */
  static truncatePrompt(text, maxTokens = 4000) {
    // Rough estimate: 1 token ≈ 4 characters for English
    const maxChars = maxTokens * 4;
    if (text.length <= maxChars) {
      return text;
    }
    // Truncate and add ellipsis
    return text.substring(0, maxChars - 3) + '...';
  }

  /**
   * Optimize prompt by removing unnecessary whitespace
   */
  static optimizePrompt(text) {
    return text
      .replace(/[ \t]+/g, ' ')   // Collapse runs of spaces/tabs (keep newlines)
      .replace(/\n\s*\n/g, '\n') // Remove empty lines
      .trim();
  }

  /**
   * Estimate token count (rough approximation)
   */
  static estimateTokens(text) {
    return Math.ceil(text.length / 4);
  }

  /**
   * Split long documents into chunks of roughly chunkSize words
   */
  static chunkDocument(text, chunkSize = 1000) {
    const words = text.split(/\s+/);
    const chunks = [];
    for (let i = 0; i < words.length; i += chunkSize) {
      chunks.push(words.slice(i, i + chunkSize).join(' '));
    }
    return chunks;
  }
}

module.exports = TokenOptimizer;

Model Selection Strategy
class ModelSelector {
  /**
   * Select an appropriate model based on task complexity
   */
  static selectModel(task) {
    const complexity = this.assessComplexity(task);
    if (complexity === 'simple') {
      return 'gpt-4o-mini'; // Cheaper for simple tasks
    }
    return 'gpt-4o'; // More capable for medium and complex tasks
  }

  /**
   * Assess task complexity
   */
  static assessComplexity(task) {
    const text = task.subject + ' ' + task.description;
    // Simple length-based heuristics; refine as needed
    if (text.length < 100) {
      return 'simple';
    } else if (text.length < 500) {
      return 'medium';
    }
    return 'complex';
  }

  /**
   * Get model configuration
   */
  static getModelConfig(modelName) {
    const configs = {
      'gpt-4o-mini': {
        max_tokens: 500,
        temperature: 0.5
      },
      'gpt-4o': {
        max_tokens: 1000,
        temperature: 0.3
      }
    };
    return configs[modelName] || configs['gpt-4o'];
  }
}

module.exports = ModelSelector;

Step 6: CI/CD Pipeline
Create .github/workflows/deploy.yml:
name: Deploy to Cloud Foundry

on:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Build MTA
        run: |
          npm install -g mbt
          mbt build
      - name: Deploy to Cloud Foundry
        uses: cloud-foundry/cf-cli-action@v1
        with:
          api_endpoint: ${{ secrets.CF_API }}
          username: ${{ secrets.CF_USERNAME }}
          password: ${{ secrets.CF_PASSWORD }}
          org: ${{ secrets.CF_ORG }}
          space: ${{ secrets.CF_SPACE }}
      - name: Deploy application
        run: cf deploy mta_archives/*.mtar

Step 7: Health Checks & Readiness Probes
Add health check endpoints in srv/server.js:
cds.on('bootstrap', (app) => {
  const telemetry = require('./lib/ai-telemetry');

  // Liveness probe - is the app running?
  app.get('/health/live', (req, res) => {
    res.status(200).json({ status: 'alive' });
  });

  // Readiness probe - is the app ready to serve?
  app.get('/health/ready', async (req, res) => {
    try {
      // Check database connection
      await cds.tx(async () => {
        await SELECT.one.from('sap.capire.tickets.Tickets');
      });

      // Check AI Core connection: just verify the client can be constructed
      // (instantiation alone doesn't consume any tokens)
      const { OrchestrationClient } = require('@sap-ai-sdk/orchestration');
      new OrchestrationClient({
        promptTemplating: { model: { name: 'gpt-4o' } }
      });

      res.status(200).json({
        status: 'ready',
        metrics: telemetry.getMetrics()
      });
    } catch (error) {
      res.status(503).json({
        status: 'not ready',
        error: error.message
      });
    }
  });
});

Step 8: Environment-Specific Configuration
Create environment-specific configurations:
// config/production.js
module.exports = {
  ai: {
    orchestration: {
      maxRetries: 3,
      timeout: 30000,
      cacheEnabled: true,
      cacheTTL: 3600
    },
    models: {
      default: 'gpt-4o',
      simple: 'gpt-4o-mini'
    }
  },
  monitoring: {
    enabled: true,
    logLevel: 'info'
  }
};

// config/development.js
module.exports = {
  ai: {
    orchestration: {
      maxRetries: 1,
      timeout: 10000,
      cacheEnabled: false
    },
    models: {
      default: 'gpt-4o-mini', // Use the cheaper model in dev
      simple: 'gpt-4o-mini'
    }
  },
  monitoring: {
    enabled: true,
    logLevel: 'debug'
  }
};

Load configuration:
// srv/lib/config.js
const path = require('path');

const env = process.env.NODE_ENV || 'development';

let config;
try {
  config = require(path.join(__dirname, '../../config', env));
} catch (error) {
  console.warn(`No config found for ${env}, using defaults`);
  config = {};
}

module.exports = config;

Step 9: Logging Best Practices
Create structured logging utility:
// srv/lib/logger.js
const config = require('./config');

class Logger {
  constructor(component) {
    this.component = component;
    this.level = config.monitoring?.logLevel || 'info';
  }

  log(level, message, data = {}) {
    if (!this.shouldLog(level)) return;
    const logEntry = {
      timestamp: new Date().toISOString(),
      level,
      component: this.component,
      message,
      ...data
    };
    // In production, emit JSON for the application logging service to parse
    if (process.env.NODE_ENV === 'production') {
      console.log(JSON.stringify(logEntry));
    } else {
      console.log(`[${level.toUpperCase()}] ${this.component}:`, message, data);
    }
  }

  shouldLog(level) {
    const levels = ['debug', 'info', 'warn', 'error'];
    return levels.indexOf(level) >= levels.indexOf(this.level);
  }

  debug(message, data) { this.log('debug', message, data); }
  info(message, data) { this.log('info', message, data); }
  warn(message, data) { this.log('warn', message, data); }
  error(message, data) { this.log('error', message, data); }
}

module.exports = Logger;

Use in services:
const Logger = require('./lib/logger');
const logger = new Logger('TicketService');

class TicketService {
  async processTicket(ticket) {
    logger.info('Processing ticket', { ticketId: ticket.ID });
    try {
      const result = await this.generateResponse(ticket);
      logger.info('Ticket processed successfully', {
        ticketId: ticket.ID,
        tokens: result.usage?.total_tokens
      });
      return result;
    } catch (error) {
      logger.error('Failed to process ticket', {
        ticketId: ticket.ID,
        error: error.message
      });
      throw error;
    }
  }
}

Step 10: Performance Optimization
Connection Pooling
// srv/lib/ai-client-pool.js
const { OrchestrationClient } = require('@sap-ai-sdk/orchestration');

class AIClientPool {
  constructor(size = 5) {
    this.size = size;
    this.clients = [];
    this.available = [];
    this.initialize();
  }

  initialize() {
    for (let i = 0; i < this.size; i++) {
      const client = new OrchestrationClient({
        promptTemplating: {
          model: { name: 'gpt-4o' }
        }
      });
      this.clients.push(client);
      this.available.push(client);
    }
  }

  async acquire() {
    if (this.available.length > 0) {
      return this.available.pop();
    }
    // Poll until a client is released (simple, adequate for modest load)
    return new Promise((resolve) => {
      const interval = setInterval(() => {
        if (this.available.length > 0) {
          clearInterval(interval);
          resolve(this.available.pop());
        }
      }, 100);
    });
  }

  release(client) {
    this.available.push(client);
  }
}

module.exports = new AIClientPool();

Batch Processing
class BatchProcessor {
  constructor(batchSize = 10) {
    this.batchSize = batchSize;
    this.queue = [];
  }

  async addToQueue(ticket) {
    this.queue.push(ticket);
    if (this.queue.length >= this.batchSize) {
      await this.processBatch();
    }
  }

  async processBatch() {
    const batch = this.queue.splice(0, this.batchSize);
    // Process tickets in parallel
    const results = await Promise.all(
      batch.map(ticket => this.processTicket(ticket))
    );
    return results;
  }

  // Call this on shutdown or on a timer so a partially filled
  // queue doesn't sit unprocessed forever
  async flush() {
    while (this.queue.length > 0) {
      await this.processBatch();
    }
  }

  async processTicket(ticket) {
    // Your AI processing logic
  }
}

Deployment Checklist
Before deploying to production:
- All secrets stored in BTP services (no hardcoded credentials)
- Authentication and authorization configured
- Error handling with retry logic implemented
- Monitoring and telemetry in place
- Response caching configured
- Cost tracking enabled
- Health check endpoints working
- Logging structured and searchable
- CI/CD pipeline tested
- Performance optimizations applied
- Documentation updated
- Disaster recovery plan documented
Deployment Commands
# Build the MTA
npm install -g mbt
mbt build
# Login to Cloud Foundry
cf login -a <api-endpoint>
# Deploy the application
cf deploy mta_archives/support-ticket-ai_1.0.0.mtar
# Check deployment status
cf apps
# View logs
cf logs support-ticket-srv --recent
# Check service bindings
cf services
# Scale the application
cf scale support-ticket-srv -i 2 -m 1G

Monitoring in Production
Key Metrics to Track
- Request Metrics
  - Total requests per minute
  - Average response time
  - Error rate
- Token Usage
  - Tokens per request
  - Daily/monthly token consumption
  - Cost per request
- Cache Performance
  - Hit rate
  - Miss rate
  - Cache size
- Model Performance
  - Latency by model
  - Success rate by model
  - Cost by model
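For latency in particular, averages hide outliers; percentiles such as p95 tell you what your slow requests actually look like. Here is a small helper you could add alongside the telemetry class above (the nearest-rank method used here is one common convention):

```javascript
// Nearest-rank percentile over an array of latencies in milliseconds.
// percentile(data, 95) returns the value below which ~95% of samples fall.
function percentile(latencies, p) {
  if (!latencies.length) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

module.exports = { percentile };
```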
Setting Up Alerts
// srv/lib/alerting.js
class AlertManager {
  checkThresholds(metrics) {
    const alerts = [];

    // High error rate
    if (metrics.errorRate > 0.05) {
      alerts.push({
        severity: 'high',
        message: `Error rate is ${(metrics.errorRate * 100).toFixed(2)}%`,
        metric: 'errorRate',
        value: metrics.errorRate
      });
    }

    // High cost
    if (metrics.totalCost > 100) {
      alerts.push({
        severity: 'medium',
        message: `Daily cost is $${metrics.totalCost.toFixed(2)}`,
        metric: 'cost',
        value: metrics.totalCost
      });
    }

    // High latency
    if (metrics.avgLatency > 5000) {
      alerts.push({
        severity: 'medium',
        message: `Average latency is ${metrics.avgLatency}ms`,
        metric: 'latency',
        value: metrics.avgLatency
      });
    }
    return alerts;
  }

  sendAlert(alert) {
    // Send to your alerting channel (email, Slack, PagerDuty, etc.)
    console.error('ALERT:', alert);
  }
}

module.exports = new AlertManager();

Recap
We've covered essential production requirements:
- Security: XSUAA authentication, service bindings, no hardcoded secrets
- Error Handling: Retry logic, graceful degradation, enhanced error messages
- Monitoring: Telemetry, metrics tracking, structured logging
- Cost Optimization: Response caching, token optimization, smart model selection
- CI/CD: Automated testing and deployment pipeline
- Performance: Connection pooling, batch processing, health checks
- Observability: Comprehensive logging, alerting, health monitoring
Your AI application is now production-ready with enterprise-grade reliability and observability!
Additional Resources
- SAP BTP Cloud Foundry Documentation
- CAP Production Best Practices
- SAP AI Core Monitoring
- XSUAA Security
- MTA Development Guide
- Cloud Foundry Logging
Next Steps
With your production deployment complete, consider:
- Advanced Features: Fine-tuning models, custom embeddings, multi-tenant architecture
- Scaling: Load balancing, auto-scaling policies, database optimization
- Compliance: Data residency, audit logging, compliance certifications
- Innovation: Explore new SAP AI Core capabilities, experiment with new models
Congratulations! You've built and deployed a production-ready AI application on SAP BTP! 🎉
