June 20, 2024
Decision Nodes

Consuming AI APIs: Off-The-Shelf vs Custom

Gwendolyn Faraday
Tech Lead

Gwen is a tech lead with over 8 years of experience architecting and developing applications. Outside of her professional career, she shares knowledge about software development and productivity through her newsletter and YouTube channel under the brand Faraday Academy.



The purpose of this article is to guide you through the process, tradeoffs, and practicalities of integrating your applications with AI APIs, from the initial decision-making phase to actual implementation. By examining both off-the-shelf and custom AI solutions, I will provide a comprehensive overview to help you make informed decisions that align with your technical capabilities and business goals.

When it comes to AI APIs, the variety of options and potential hybrid architectures can be overwhelming. In this article, I will focus on the primary categories and main tradeoffs that most developers will encounter when implementing LLM-specific AI features in an application. There are primarily two categories to consider: off-the-shelf APIs and custom AI solutions. For the purpose of this article, I define off-the-shelf APIs, such as OpenAI’s GPT-4, Anthropic's Claude, AWS Bedrock, Google Cloud AI, and Microsoft Azure Cognitive Services, as those services that offer ready-to-use solutions that can be quickly integrated into applications. I use the term "custom AI solution" to define self-hosted solutions with models like Llama and Mixtral, as well as managed services like Ollama and Groq that are set up to host and run these models.

Definitions for Terms Used in this Article

  • Off-the-Shelf APIs: Pre-built AI services provided by third-party vendors, ready for immediate integration into applications with minimal setup.
  • Custom AI Solutions: Tailored AI implementations that can be self-hosted, managed by a third-party service, or fine-tuned to meet specific needs.
  • Model: In AI and machine learning, a model is a mathematical representation of a real-world process or system. It is created by training on a dataset and is used to make predictions or decisions based on new data inputs. Models can vary in complexity from simple linear regressions to complex deep neural networks.
  • LLMs (Large Language Models): Advanced AI models trained on vast amounts of text data to perform a variety of language tasks.
  • Fine-Tuning: The process of taking a pre-trained model (like an LLM) and further training it on a specific dataset to adapt it to a particular task or domain. Fine-tuning involves adjusting the model’s parameters so that it performs better on the specialized data provided, resulting in improved accuracy and relevance for specific applications.

Understanding Custom AI Solutions

Before diving into the main comparison in this article, it's important to understand what a custom AI solution is and why it is becoming increasingly popular. It's fairly straightforward to grasp the offerings of AI API services like OpenAI's APIs, but many people may not fully appreciate why an organization might opt for an open-source model instead. There's a common misconception that open-source models are only for learning and experimental projects, while "real" companies rely exclusively on off-the-shelf solutions.

In reality, custom AI solutions offer significant benefits that can be crucial for specific use cases. These solutions provide greater flexibility, control over data, and the ability to tailor the AI to meet precise business needs. To illustrate this, let’s explore some examples.

Self-Hosted LLMs: Hosting Models Like Llama on AWS

Imagine a software consultancy, "DataGuard Advisors," that specializes in providing strategic advice to large enterprises on data management and security. They want to leverage AI to enhance their internal tooling for generating reports, analyzing client data, and providing insights. However, they are extremely concerned about the privacy and confidentiality of their clients' data. Using off-the-shelf AI services from third-party vendors could potentially expose sensitive client information, which is unacceptable for their business model.

To address this concern, DataGuard Advisors decides to create an internal service using a self-hosted LLM, specifically the Llama model, on AWS. This approach allows them to have full control over their AI infrastructure, ensuring that all client data remains within their secured environment. The firm sets up the necessary infrastructure on AWS, taking advantage of AWS’s robust security features to protect their data.

DataGuard Advisors’ team then gathers and preprocesses anonymized client data to fine-tune the Llama model. This custom training allows the model to understand the specific terminology and nuances relevant to their industry, resulting in highly accurate and insightful analyses. By hosting the model themselves, they ensure that no external parties have access to the data or the AI processes.

The result is a powerful internal tool that can generate detailed, accurate reports and provide deep insights, all while maintaining the highest standards of data privacy and security. This setup not only enhances their operational efficiency but also reinforces their commitment to client confidentiality, giving them a competitive edge in the consultancy market.

Fine-Tuning and Custom Training: Extending Base Models with Specific Data

Consider a financial services company, "FinSecure," that wants to implement an AI-powered fraud detection system. Most off-the-shelf solutions provide generic fraud detection capabilities, but FinSecure needs a system tailored to their specific transaction patterns and fraud risk factors.

They initially considered using OpenAI's GPT models but were wary of the recent turnover in OpenAI's leadership. Since they were going to invest substantial time and money in fine-tuning a model, they wanted more assurance and control over it in the long term.

FinSecure decided to start with a pre-trained LLM and take it through a custom training process. They went on to gather extensive historical transaction data, including known fraudulent transactions, and fine-tune the model to recognize patterns specific to their operations. This involved using advanced techniques like transfer learning, where the base LLM is trained further on FinSecure's unique dataset to improve its accuracy in detecting anomalies and fraudulent activities.

The custom-trained model now excels at identifying potential fraud in real-time, reducing false positives and catching more fraudulent transactions than generic solutions. This highly specialized model enhances FinSecure's security measures, providing their customers with a safer and more reliable service. However, this approach requires significant investment in data collection, model training, and continuous monitoring to ensure the model adapts to new fraud patterns.

Comparative Analysis of AI APIs

In the previous section, we explored the scope and customization options available with custom AI solutions. Now, let's dive into a detailed comparison between off-the-shelf AI APIs and custom AI solutions, providing specific examples and highlighting their respective advantages and disadvantages.

Off-the-Shelf AI APIs


Examples:

  • OpenAI’s GPT-4: A powerful language model capable of various tasks, including text generation, translation, and summarization.
  • AWS Bedrock: Amazon's suite of foundation models designed for integration into AWS services.
  • Google Cloud AI: A collection of AI services, including machine learning models for vision, speech, language, and structured data.
  • Microsoft Azure Cognitive Services: AI services that provide vision, speech, language, and decision capabilities.


Advantages:

  • Ease of Use: These APIs are user-friendly and designed for quick and easy integration. For instance, integrating GPT-4 into a chatbot can be done with a few lines of code, significantly reducing development time.
  • Cost Efficiency: Initial costs are generally lower since there's no need for extensive development or infrastructure setup. For example, Google Cloud AI offers pay-as-you-go pricing, making it affordable for smaller projects.
  • Reliability: Leveraging cloud providers' infrastructure ensures high reliability and availability. Microsoft Azure Cognitive Services, for instance, benefits from Azure's global network and robust infrastructure.
  • Scalability: Built-in scalability allows these services to handle varying workloads without significant effort on your side. AWS Bedrock running Claude, for example, can scale seamlessly to meet high demand.
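The "few lines of code" claim is easy to make concrete. Below is a minimal sketch of a chat integration written against the shape of OpenAI's Python client; the function name and model string are illustrative, and a real call requires the `openai` package and an API key.

```python
# Minimal chatbot integration sketch. `client` is expected to be an
# OpenAI-style client (e.g. openai.OpenAI()); the function only relies on
# the chat.completions.create(...) call shape, so it can be stubbed in tests.
def ask_chatbot(client, question: str, model: str = "gpt-4") -> str:
    """Send one user question and return the assistant's reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```

With the real client, usage is just `ask_chatbot(OpenAI(), "What are your store hours?")`.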


Disadvantages:

  • Limited Customization: Off-the-shelf APIs offer limited customization options. For example, while GPT-4 can handle a wide range of tasks, fine-tuning it for very specific use cases can be challenging.
  • Higher Usage Costs: While initial costs are low, usage fees can accumulate quickly, especially with high-volume applications. For instance, using OpenAI’s GPT-4 extensively can become costly over time.
  • Vendor Dependency: Relying on third-party services can create dependency on the vendor for updates, support, and pricing. Changes in API pricing or service availability can impact your application.
  • Data Privacy Concerns: Sharing data with third-party vendors can raise privacy and compliance issues. For example, sensitive data processed through cloud services may be subject to additional regulatory scrutiny.

Custom AI Solutions


Examples:

  • Self-hosted models like Llama on AWS: Allows full control over the AI model and data, ensuring privacy and customization.
  • Managed services like Ollama and Groq: Provide robust infrastructure and management while offering customization options.
  • Fine-tuned models on base LLMs: Tailored to specific datasets and requirements, enhancing performance for specialized tasks.


Advantages:

  • Full Customization: Custom solutions can be tailored to meet specific needs, offering (almost) complete control over the functionality and performance. For example, a consultancy like DataGuard Advisors can fine-tune an LLM on proprietary client data to provide specialized insights.
  • Ownership and Control: Greater control over data, infrastructure, and AI models, reducing dependency on external vendors. Self-hosting Llama on AWS allows complete data privacy and security.
  • Competitive Advantage: Unique customizations can provide a competitive edge by addressing specific business challenges more effectively. For instance, FinSecure’s custom-trained fraud detection model significantly reduces false positives compared to generic solutions.


Disadvantages:

  • Higher Development Costs: Significant upfront investment is required for development, infrastructure, and ongoing maintenance. Setting up and maintaining a self-hosted LLM like Llama on AWS involves costs for computing resources, data storage, and skilled personnel.
  • Knowledge and Complexity: Developing and maintaining custom solutions requires advanced technical expertise and can be complex. Fine-tuning a model to recognize specific fraud patterns requires in-depth knowledge of both AI and the domain.
  • Scalability Challenges: Ensuring scalability comparable to commercial APIs requires careful planning and substantial resources. Custom solutions must be designed to handle increasing data volumes and user demands without compromising performance.


My Proposed Decision-Making Framework: Off-The-Shelf vs Custom

In this section, I will present a decision-making framework to help you choose between off-the-shelf and custom AI solutions, building on the insights from the examples and tradeoffs discussed previously.

Step-by-Step Guide

  • Identify Specific Needs: Determine the specific business problems you aim to solve with AI.
    • Example: Do you need AI for customer service, data analysis, or product recommendations?
  • Customization vs. Out-of-the-Box Functionality: Evaluate whether your requirements can be met with off-the-shelf solutions or if they need custom AI capabilities.
    • Example: If you need highly specialized functionality, custom solutions might be necessary.
  • Initial vs. Ongoing Costs: Compare the upfront costs of development with the ongoing operational costs.
    • Example: Off-the-shelf solutions might have low initial costs but high usage fees.
  • Total Cost of Ownership: Conduct a comprehensive analysis considering all costs over the solution's lifecycle.
    • Example: Include costs for development, maintenance, scaling, and support.
  • Speed of Deployment: Assess how quickly you need the solution to be operational.
    • Example: Off-the-shelf solutions can be deployed quickly, often within days or weeks.
  • Development Timelines: Compare the rapid deployment of off-the-shelf solutions with the longer development time of custom solutions.
    • Example: Custom solutions might take months to develop and deploy.
  • In-House Expertise: Determine if your team has the necessary skills.
    • Example: Does your team have experience with AI model development and maintenance?
  • External Resources: Consider if you need to hire or train additional personnel for custom AI solutions.
    • Example: Hiring AI specialists or providing training can increase costs and time.
  • Data Sensitivity: Evaluate the sensitivity of your data and privacy requirements.
    • Example: Handling medical or financial data might require stricter privacy controls.
  • Compliance: Ensure that your chosen solution complies with relevant data privacy regulations.
    • Example: Custom solutions can be designed to meet specific regulatory requirements like GDPR or HIPAA.
  • Long-Term Maintenance: Consider the resources needed for maintaining the solution.
    • Example: Off-the-shelf solutions often include maintenance, while custom solutions require dedicated resources.
  • Scalability: Plan for future scalability requirements and how each solution handles scaling.
    • Example: Ensure the solution can handle increased data and user loads over time.
  • Weighing Tradeoffs: Consider the tradeoffs discussed (customization, cost, reliability, etc.).
    • Example: Balance the need for customization against the ease of use and lower costs of off-the-shelf solutions.
  • Use Cases and Real-World Examples: Refer to examples to understand how similar decisions have been implemented successfully.
    • Example: Review case studies of companies that have implemented both types of solutions.
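The cost-related steps above lend themselves to a quick worked comparison. The sketch below contrasts hypothetical monthly costs of usage-based API pricing versus a self-hosted deployment; every rate is a placeholder, not real vendor pricing.

```python
# Illustrative total-cost comparison for the framework's cost steps.
# All rates here are hypothetical placeholders, not real vendor pricing.

def monthly_api_cost(requests: int, tokens_per_request: int,
                     price_per_1k_tokens: float) -> float:
    """Usage-based cost of an off-the-shelf API."""
    return requests * tokens_per_request / 1000 * price_per_1k_tokens

def monthly_self_hosted_cost(gpu_hours: float, gpu_hourly_rate: float,
                             fixed_ops: float) -> float:
    """Infrastructure plus fixed operational cost of a self-hosted model."""
    return gpu_hours * gpu_hourly_rate + fixed_ops

# At low volume the API is cheaper; past some volume, self-hosting wins.
api = monthly_api_cost(requests=50_000, tokens_per_request=1_500,
                       price_per_1k_tokens=0.03)        # 2250.0
hosted = monthly_self_hosted_cost(gpu_hours=720, gpu_hourly_rate=4.0,
                                  fixed_ops=2_000)      # 4880.0
print(f"API: ${api:.0f}/mo, self-hosted: ${hosted:.0f}/mo")
```

Running the same comparison at ten times the request volume flips the result, which is exactly the break-even analysis the total-cost-of-ownership step asks for.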

Practical Implementation Guide

Implementing AI solutions involves understanding the infrastructure and integration processes required for both off-the-shelf and custom AI solutions. In this section, I will provide specific examples, including infrastructure diagrams, to illustrate how these implementations can be handled.

Off-the-Shelf AI API Example: Customer Insights with RAG


"RetailInsights," a retail analytics firm, wants to enhance their ability to generate detailed customer insights by leveraging AI. They decide to use the Claude API with a Retrieval-Augmented Generation (RAG) approach to ensure their analyses are both comprehensive and contextually relevant. This approach allows them to combine the power of a language model with specific examples retrieved from their extensive customer data stored in a vector database.

Steps for Implementation:

  1. Choose the API Provider: RetailInsights selects the Claude API for its advanced language capabilities.
  2. Sign Up and Obtain API Key: They register for the service and obtain the necessary API key.
  3. Set Up Development Environment: Install the official Anthropic client library and configure the development environment to use the API key securely.
  4. Generate Embeddings: Build an integration that runs customer data through an embedding model (the Claude API itself does not produce embeddings, so a separate embedding service is used for this step).
  5. Store Embeddings: Store the generated embeddings, alongside their source documents, in a vector database for efficient retrieval.
  6. Create Prompts with Examples: Use embedding similarity search to retrieve the most relevant documents from the vector database and include them as examples in prompts sent to the Claude API.
  7. Integrate with Existing Systems: Embed the API calls and embedding retrieval logic into their analytics platform to provide seamless functionality.
  8. Deploy and Monitor: Deploy the enhanced insights generation system and continuously monitor its performance, making adjustments as necessary.
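The embedding, retrieval, and prompt-assembly steps (4 through 6) can be sketched end-to-end with a toy retriever. The `embed()` stand-in below just counts words; a real pipeline would call an embedding model and a vector database, but the retrieve-then-prompt flow is the same.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words counts. A real system would call an
    # embedding model and store dense vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query (step 6's search).
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Include the most relevant examples as context for the LLM call.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Relevant examples:\n{context}\n\nAnalyze: {query}"
```

Swapping `embed()` for a real embedding model and the list of docs for a vector database query turns this toy into the production shape described above.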

Infrastructure Diagram:


Custom AI Solution Example: Financial Fraud Detection


"FinSecure," a financial services company, needs a highly customized AI solution for fraud detection that aligns with their specific transaction patterns and security requirements. Due to the sensitivity of their data, they opt for a self-hosted LLM using Llama on AWS.

Steps for Implementation:

  1. Select Base Model: FinSecure chooses the Llama model as their base LLM. This model is selected for its adaptability and robust performance in handling complex data sets.
  2. Set Up Infrastructure: Deploy the necessary infrastructure on AWS, including EC2 instances for hosting the model and S3 for storing transaction data. This setup ensures scalability and robust performance.
  3. Preprocess Data: Collect and preprocess transaction data, including known fraudulent and legitimate transactions. This involves data cleaning, normalization, and feature extraction to ensure the model receives high-quality inputs.
  4. Fine-Tune Model: Fine-tune the Llama model using the processed data. This step involves training the model with transaction data to enhance its ability to detect patterns indicative of fraud.
  5. Develop API Endpoints: Create custom API endpoints to expose the model’s functionalities to FinSecure’s transaction processing systems. These endpoints allow seamless integration and real-time interaction with the model.
  6. Integrate with Existing Systems: Embed the API endpoints within FinSecure’s transaction processing systems. This integration ensures that all transactions are analyzed in real-time for potential fraud.
  7. Deploy and Monitor: Deploy the model and set up a monitoring system to continuously track its performance. Regular monitoring and adjustments are essential to maintain high detection accuracy and adapt to new fraud patterns. This includes setting up alerts and logs to track the model’s performance and retraining the model as new data becomes available.
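Step 3 (preprocessing) can be illustrated with a small feature-extraction sketch. The field names, home country, and night-hours cutoff below are all hypothetical; the point is the normalize-and-derive-signals pattern that precedes fine-tuning.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    hour: int       # hour of day, 0-23
    country: str    # ISO country code

def extract_features(tx: Transaction, amount_mean: float,
                     amount_std: float, home_country: str = "US") -> dict:
    """Normalize the amount and derive simple fraud-relevant signals."""
    z = (tx.amount - amount_mean) / amount_std if amount_std else 0.0
    return {
        "amount_zscore": z,
        "is_night": tx.hour < 6 or tx.hour >= 22,   # illustrative cutoff
        "is_foreign": tx.country != home_country,
    }
```

Cleaned records like these, paired with fraud/legitimate labels, are what the fine-tuning step in the list above consumes.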

Infrastructure Diagram:


These examples illustrate how to implement both off-the-shelf and custom AI solutions, highlighting the infrastructure and integration processes involved. The off-the-shelf example demonstrates quick deployment and ease of use, while the custom solution showcases the ability to tailor AI functionalities to meet specific business needs, despite higher complexity and resource requirements. By following these steps and understanding the infrastructure needs, organizations can effectively leverage AI to enhance their operations.

Best Practices and Recommendations

Hybrid Approaches

Example: Viable

Viable, a company focused on analyzing large-scale qualitative data, uses a hybrid approach by leveraging both off-the-shelf and custom AI solutions. Initially, they fine-tune OpenAI’s GPT-4 to analyze customer feedback more effectively, allowing businesses to improve their Net Promoter Score (NPS) and reduce support ticket volumes. As they expand, they incorporate a vector database to manage embeddings for quick retrieval of relevant information, further enhancing their ability to provide fast and accurate insights while maintaining cost efficiency.

Best Practice:

  • Combining Solutions: Start with off-the-shelf APIs for quick deployment and progressively integrate custom solutions to handle specialized tasks. This can include fine-tuning models for specific needs and using vector databases for efficient data retrieval.
  • Incremental Transition: Use off-the-shelf solutions initially and gradually transition to custom solutions as specific needs are identified.

When to Use Fine-Tuning vs. Vector Databases:

  • Fine-Tuning: Ideal for scenarios where specific domain knowledge is crucial and the model needs to be highly specialized. For example, fine-tuning is beneficial for customizing a language model to understand industry-specific jargon or detailed customer feedback analysis.
  • Vector Databases: Useful for managing large sets of embeddings and enabling quick retrieval of relevant data. This approach is beneficial when the speed of accessing information is critical, such as in real-time recommendation systems or customer support scenarios.
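When fine-tuning is the right tool, the training data usually has to be shaped into a provider-specific format first. The sketch below writes examples in the widely used messages-style JSONL layout; the exact schema varies by provider, and the feedback labels here are invented.

```python
import json

# Hypothetical labeled-feedback examples for a fine-tuning dataset.
examples = [
    {"messages": [
        {"role": "user", "content": "Feedback: 'Checkout keeps timing out.'"},
        {"role": "assistant", "content": "category=bug area=checkout sentiment=negative"},
    ]},
    {"messages": [
        {"role": "user", "content": "Feedback: 'Love the new dashboard!'"},
        {"role": "assistant", "content": "category=praise area=dashboard sentiment=positive"},
    ]},
]

# One JSON object per line is the common upload format for fine-tuning jobs.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```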

Continuous Evaluation

Example: OpenAI Content Moderation

OpenAI uses GPT-4 for content moderation, requiring regular updates and assessments to keep up with evolving content policies. By iteratively refining policies based on AI and human feedback, OpenAI ensures consistent and accurate content moderation.

Best Practice:

  • Regular Assessments: Implement frequent evaluations and updates of AI models to maintain relevance and accuracy.
  • Monitoring and Analytics: Use robust monitoring tools to track performance and make data-driven adjustments.

Staying Informed

Example: FinTech Innovators

FinTech Innovators, a financial services company, uses AI-driven fraud detection systems and stays updated with the latest AI advancements. They participate in industry conferences and professional networks to continuously improve their knowledge and practices.

Best Practice:

  • Ongoing Learning: Keep up with AI trends through industry conferences, publications, and professional networks.
  • Innovation Culture: Foster a culture of continuous learning and experimentation within your team.

Security and Compliance

Example: MedData Inc.

MedData Inc., a company handling sensitive patient data, regularly conducts security audits and updates protocols to ensure compliance with regulations like GDPR and HIPAA.

Best Practice:

  • Regular Audits: Conduct frequent security audits and update protocols to comply with data privacy regulations.
  • Data Encryption: Implement strong data encryption and access control measures to protect sensitive information.

Scalability Planning

Example: EduLearn

EduLearn, an online education platform, experienced rapid user growth and transitioned to a scalable cloud-based infrastructure to handle increased demand efficiently.

Best Practice:

  • Scalable Architecture: Plan for scalability from the outset using cloud infrastructure and optimized models.
  • Future-Proofing: Ensure your AI solution can handle increasing loads and data volumes.

Collaboration and Knowledge Sharing

Example: RetailMax

RetailMax developed AI tools for analyzing customer behavior by fostering collaboration between data scientists, engineers, and business analysts. This approach led to innovative and effective solutions.

Best Practice:

  • Cross-Functional Teams: Establish cross-functional teams to improve the quality and relevance of AI solutions.
  • Knowledge Sharing: Encourage regular knowledge-sharing sessions to disseminate best practices and foster continuous improvement.

Testing and Monitoring

Example: Salesforce Einstein

Salesforce Einstein emphasizes rigorous testing and continuous monitoring to ensure the accuracy and reliability of its AI models. This proactive approach helps them identify and address issues promptly, maintaining consistent performance.

Best Practice:

  • Rigorous Testing:
    • Unit Tests: Validate individual components to ensure they function correctly.
    • Integration Tests: Ensure different components work together as intended.
    • Real-World Scenario Tests: Simulate real-world conditions to verify practical performance.

Example: Salesforce Einstein uses unit tests to check each algorithm's accuracy and integration tests to ensure seamless data flow between their CRM and AI components.

  • Continuous Monitoring:
    • Performance Metrics: Monitor accuracy, response time, and resource usage in real-time.
    • Anomaly Detection: Use automated tools to spot unusual patterns.
    • User Feedback: Collect and analyze user feedback to identify improvement areas.

Example: Salesforce Einstein employs continuous monitoring to track AI performance and uses anomaly detection tools to catch and rectify deviations quickly.
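A minimal version of such anomaly detection can be sketched as a z-score check over a metric stream. The three-sigma threshold is a common default, not Salesforce's actual method, and real deployments rely on dedicated monitoring tooling rather than a hand-rolled check like this.

```python
import statistics

def find_anomalies(samples: list[float], threshold: float = 3.0) -> list[float]:
    """Flag samples more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        return []
    return [s for s in samples if abs(s - mean) / std > threshold]

# e.g. response times in ms: a sudden 1000 ms spike stands out.
latencies = [100.0] * 20 + [1000.0]
print(find_anomalies(latencies))
```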

  • Feedback Loops:
    • Data Collection: Gather data from user interactions and monitoring tools.
    • Model Retraining: Regularly retrain models with new data.
    • Regular Updates: Schedule periodic updates based on performance data.

Example: Salesforce Einstein continuously collects user interaction data and retrains its models weekly to adapt to new patterns and improve accuracy.

Challenges and Solutions:

  • Testing Complexity:
    • Example: Salesforce Einstein uses robust test cases and synthetic data to cover a wide range of scenarios, ensuring comprehensive validation.
  • Monitoring and Validation:
    • Example: Automated monitoring systems at Salesforce Einstein alert the team to performance issues in real-time, allowing for quick fixes and maintaining model reliability.


Conclusion

In my experience, choosing between off-the-shelf and custom AI solutions requires a deep understanding of your specific business needs and technical capabilities. As mentioned above, it's important to weigh the tradeoffs in customization, cost, scalability, data privacy, and time to market carefully.

There is no one-size-fits-all solution; you might find multiple options that seem viable. Align your choice with your business goals and the expertise of your team. A balanced approach, considering both immediate needs and long-term vision, will help you effectively integrate AI technologies into your applications. This strategy ensures that your solution not only meets your current requirements but also remains adaptable and future-proof.