methodology
SaaS
cloud

Want to create a SaaS? Here are the key factors - Part 2

Summary. Explore vital factors for building robust SaaS apps. Learn about stateless processes, port binding, concurrency, disposability, dev/prod parity, logs, and admin tasks. These practices ensure scalability, fast startup/shutdown, consistent environments, and more.

Alireza Tanoomandian

2024-06-14


In a previous post, we discussed "What is SaaS?" and its pros and cons. We then went over the factors and practices a SaaS application should follow and introduced the Twelve-Factor App together with the 15-factor app.

Here we continue with the remaining factors:

6. Processes

Execute the app as one or more stateless processes

In the big picture, every application executes as one or more processes in the target execution environment, whether that is your local system (e.g. your computer or laptop) or a production environment with more resources. This factor emphasizes that each process should be stateless: any data the process needs should be stored in backing services, such as a database or a cache service.

By designing processes this way, we keep the platform's state more robust: the shared data stays available, and we prevent its loss during code deployments, config changes, or scaling the system up and down for high-load peaks at specific times.

Although an in-memory cache beside the running process can decrease response time, for example when loading an asset (without a backing service), relying on it is not recommended, since anything kept there is lost whenever the process restarts.
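As a minimal sketch, the snippet below assumes a Redis backing service reachable via a REDIS_URL environment variable (both names are illustrative, not prescribed by the factor). Session data is kept in the backing service rather than in the process's own memory, so any process instance, including a freshly started one, can serve the next request:

```python
import os
import redis  # third-party "redis" client, used here as an illustration

# The backing service location comes from the environment, not from code.
store = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379/0"))

def save_session(session_id: str, user_id: str) -> None:
    # State lives in the backing service, so it survives restarts and redeploys.
    store.set(f"session:{session_id}", user_id, ex=3600)  # expires after one hour

def load_session(session_id: str) -> str | None:
    value = store.get(f"session:{session_id}")
    return value.decode() if value else None
```

Because nothing important is kept in process memory, killing, restarting, or adding process instances doesn't lose any session.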

7. Port binding

Export services via port binding

Some technologies used for building web apps or services do not expose themselves directly; instead they rely on a container such as Apache HTTPD or Nginx, or, for Java apps, run inside Tomcat. This factor says that, whether or not you use this kind of technology, a service should be exported via port binding, regardless of whether you use HTTP or another protocol.

It's important to note that with the port-binding approach, our service can itself become a backing service for other services, simply by providing its URL in the consumer app's configuration.
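As a rough sketch using only Python's standard library (the PORT variable name is a common convention, not something mandated by the factor), a self-contained web process can bind to its own port like this:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello from a self-contained web process\n")

if __name__ == "__main__":
    # The app exports HTTP itself by binding to a port taken from the config.
    port = int(os.environ.get("PORT", "8000"))
    HTTPServer(("0.0.0.0", port), HelloHandler).serve_forever()
```

In production, a routing layer can forward public traffic to this port, and another app can consume the service simply by pointing at whatever URL its own config provides.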

8. Concurrency

Scale out via the process model

Before talking about concurrency, we need to understand the process model precisely. By the process model, or more precisely the Unix process model, we mean that in a web or SaaS app each process should have the following properties:

  • Each process should have a single entry point, meaning that there's only one command to invoke the process
  • Processes should be scalable as the load increases
  • A fault in one process shouldn't affect other processes
  • Long-running or resource-intensive tasks should be handled in a separate process to improve the performance and responsiveness of the main application
  • Some dedicated tasks, like data processing, queuing, or communication with external services, should preferably run in separate processes

The importance of this model shows when we want to scale out the system: adding more concurrency becomes a simple and reliable operation, since we simply run more copies of our processes.

Keep in mind that these kinds of apps should not daemonize or write PID files to manage their own lifetime. Instead, the system's process manager (such as systemd, or the cloud platform's process manager) captures the output stream (like logs), responds to crashed processes, and handles controlled restarts and shutdowns.
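As an illustration (the module name worker.py and the job-fetching function are hypothetical), a worker entry point in this model is just a foreground loop; scaling out means running more copies of it, and the process manager, not the app, owns daemonization, restarts, and output:

```python
# worker.py - hypothetical single entry point for one process type.
import time

def fetch_and_process_one_job() -> str:
    # Placeholder for pulling one task from a queue (RabbitMQ, Redis, ...).
    time.sleep(1)
    return "processed job"

if __name__ == "__main__":
    # Runs in the foreground: no daemonizing, no PID file. The process manager
    # (systemd, Docker, a cloud scheduler) starts N copies of this command to
    # scale out, restarts it if it crashes, and captures its stdout.
    while True:
        print(fetch_and_process_one_job(), flush=True)
```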

9. Disposability

Maximize robustness with fast startup and graceful shutdown

If you can start or stop the app at any time with little to no pain, you have implemented this factor well. Beyond robustness, it lets you roll out code or config changes quickly across different environments.

Ideally, a process should start fast: the time from running the command (or otherwise starting the process) until everything is ready to use should be very short, only a few seconds. This makes us more agile and helps us release or scale up easily.

Shutting down a process needs care, as it might interrupt an ongoing task or request and cause data corruption or leave data in an invalid state. It's common to listen for termination signals like SIGTERM (or SIGINT in development). After receiving this signal, synchronous processes should stop accepting new requests, wait until the ongoing tasks reach a termination state (most often the task simply ends), and then exit.
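A minimal sketch of this idea (the request handler is a placeholder): the process traps SIGTERM and SIGINT, stops picking up new work, lets the in-flight task finish, and then exits:

```python
import signal
import sys
import time

shutting_down = False

def request_shutdown(signum, frame):
    # Stop accepting new work; the current task is allowed to finish.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, request_shutdown)
signal.signal(signal.SIGINT, request_shutdown)  # convenient in development

def handle_one_request():
    # Placeholder for one unit of work (an HTTP request, a short task, ...).
    time.sleep(0.1)

while not shutting_down:
    handle_one_request()

print("graceful shutdown complete", flush=True)
sys.exit(0)
```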

For tasks that may not be completed promptly, such as long polling, the consumer should attempt to reconnect to ensure continued operation.

In async and worker processes, graceful shutdown is achieved by returning the current task to the work queue. For example, with message brokers like RabbitMQ, the worker should send a NACK, which returns the message to the broker so another process can handle it.
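As a hedged sketch using the pika client for RabbitMQ (the queue name, task handler, and connection details are assumptions), the worker acknowledges a message only after the work succeeds and NACKs it with requeue=True otherwise, so the broker can hand it to another process:

```python
import signal
import pika  # third-party RabbitMQ client, used here as an assumption

shutting_down = False

def request_shutdown(signum, frame):
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, request_shutdown)

def do_work(body: bytes) -> None:
    # Placeholder for the actual task handler.
    print(f"working on {body!r}", flush=True)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)

# Pull messages one at a time; the inactivity timeout lets us check the flag.
for method, properties, body in channel.consume(queue="tasks", inactivity_timeout=1):
    if shutting_down:
        break
    if method is None:  # no message arrived within the timeout
        continue
    try:
        do_work(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # NACK with requeue=True: the broker gives the message to another worker.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.cancel()      # requeues any pending, unacknowledged messages
connection.close()
```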

The most common problems that can arise from an incorrect shutdown include:

  • Database deadlocks
  • Data corruption
  • Resource leaks
  • Application crashes
  • System instability
  • Security vulnerabilities
  • Delayed startup
  • Increased maintenance overhead

10. Dev/prod parity

Keep development, staging, and production as similar as possible

before-devops-after-devops comic (source: turnoff.us)

Software typically runs in various environments: local developer machines for writing and modifying code, staging or UAT environments for testing and validation, and finally production servers for deployment and live operation. Despite the common properties and requirements of these environments, some differences exist. For example, in local development you run the same codebase that will run in the other environments, but to keep development and testing fast you might use a different setup for backing services, config, etc., or run the code on another OS (e.g. you develop on macOS while production runs on an Ubuntu server).

On the other hand, sadly, the developer might only change the code or config, while others, such as operations engineers, are responsible for deploying the changes.

This factor says that the differences between the running environments should be as small as possible; for example, use the same stack in every environment. Don't use SQLite for local development and PostgreSQL in production, and don't rely on local in-memory caching in development while using Memcached in the other environments. Nowadays, with containerization and virtualization technologies such as Docker or Vagrant, it is easy to keep the backing service technologies the same everywhere. The developer should also be involved in the deployment to observe the app's behavior and, if some actions are required, take them right away.
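One small, hedged illustration of keeping the stack identical (the DATABASE_URL variable and the SQLAlchemy usage are assumptions, not something prescribed by the factor): the app always talks to PostgreSQL, and only the connection URL differs per environment:

```python
import os
from sqlalchemy import create_engine, text  # third-party SQLAlchemy, as an example

# Development might use postgresql://app:secret@localhost:5432/app_dev and
# production a URL pointing at the real database server - the same engine
# everywhere, never a SQLite fallback that behaves slightly differently.
engine = create_engine(os.environ["DATABASE_URL"])

with engine.connect() as conn:
    print(conn.execute(text("SELECT version()")).scalar())
```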

Another important note is to keep the technology versions the same across environments. Most languages and technologies are constantly evolving to bring new features, fix issues, and enhance their capabilities. Although it's common practice to keep such changes backward compatible, it's reasonable not to accept the risk of external dependencies changing on the fly and instead pin the versions we are using.

11. Logs

Treat logs as event streams

Software without logs is like a black-box system: it's impossible to trace, monitor, or debug its activity. Logs are typically separated into three categories:

  1. Developer logs
  2. Business event logs
  3. Audit logs

Each type has its own properties (e.g. storage strategy, privacy and sensitivity, and the way it is used), but the first step is to gather the logs from the app. This factor says, "Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes and backing services". In some software, logs are stored in files that rotate at specific times; in others, logs are unbuffered and written to stdout.
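As a minimal sketch (the JSON-ish format is just one possible choice), the app simply writes each event, unbuffered, to stdout and leaves routing and storage to the execution environment:

```python
import logging
import sys

handler = logging.StreamHandler(sys.stdout)  # stdout, not a file the app rotates
handler.setFormatter(logging.Formatter(
    '{"time": "%(asctime)s", "level": "%(levelname)s", "message": "%(message)s"}'
))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Each call becomes one time-ordered event in the aggregated stream.
logger.info("order created")
logger.warning("payment retried")
```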

It is crucial to emphasize that the application is not responsible for managing or directing its output stream; this responsibility lies with the execution environment. In local development, developers view this stream in the foreground, while in staging and production environments the process logs are captured by the execution environment, grouped with other processes' logs, and routed to one or more final destinations for viewing and long-term archival. For example, if you use Docker to manage your environment, you can use an open-source log router like fluentd, or if you plan to store your data in Elasticsearch, you can use Filebeat to gather all logs and ship them to Elasticsearch for further processing.

Keep in mind that well-structured and informative logs are essential for effectively responding to potential issues and identifying opportunities for improvement.

12. Admin processes

Run admin/management tasks as one-off processes

Every app has processes for regular business-related tasks, such as handling web requests. There may also be some that help developers perform one-off administrative or maintenance tasks for the app, like:

  • running database migrations
  • running specific jobs
  • running a console (a.k.a. REPL) to run arbitrary code, inspect the app's models against the live database, view current configuration, etc.

These processes should run in an environment identical to the app's regular long-running processes: they ship with the same codebase and config, on the same release version. In local development, developers run the one-off admin processes from a shell in the codebase directory, and in production they use SSH or another remote command execution mechanism provided by the execution environment to run the process.
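A hypothetical one-off admin process might look like the script below (the file name scripts/migrate.py, the DATABASE_URL variable, and the table are all illustrative); it lives in the same codebase and reads the same config as the web and worker processes:

```python
# scripts/migrate.py - run once, then exit.
import os
from sqlalchemy import create_engine, text  # same dependencies as the main app

def migrate() -> None:
    engine = create_engine(os.environ["DATABASE_URL"])  # same config as the web process
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS invoices (id SERIAL PRIMARY KEY, total NUMERIC)"
        ))

if __name__ == "__main__":
    migrate()
    print("migration finished", flush=True)
```

Locally this is just `python scripts/migrate.py` from the codebase directory; in production the same command runs over SSH or the platform's remote execution mechanism (e.g. a one-off container).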

That's all twelve factors

We discussed the 12-Factor App in this and the previous post and explained each factor to make you familiar with the practices that give software as a service its core characteristics. There are also three more factors that we will explore together in the future. I hope you find this content helpful and apply these practices in your future projects!
