TensorFlow Extended (TFX)
Developing and maintaining an integrated platform for reliably producing and deploying machine learning models requires orchestrating many components: a learner for generating models from training data, modules for validating both data and models, and finally an infrastructure for serving models in production. This becomes challenging when the data changes over time and fresh models need to be produced continuously. Unfortunately, this kind of orchestration is often done ad hoc, using glue code and scripts written by individual teams for specific business use cases, which leads to duplicated effort and to fragile, poorly scaling systems with high maintenance costs.
TensorFlow Extended (TFX) is a Google-production-scale ML platform built on TensorFlow. It provides a shared configuration framework for defining ML pipelines composed of TFX components. TFX pipelines can be orchestrated with technologies such as Apache Airflow and Kubeflow Pipelines, and both the components and the orchestrator integrations can be extended. TFX components interact with an ML Metadata (MLMD) backend that keeps a record of component runs, their input and output artifacts, and runtime configuration.
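To make the role of the metadata backend more concrete, here is a minimal plain-Python sketch of the kind of bookkeeping it performs: recording each component run together with its configuration, inputs, and outputs, and then walking that record backwards to recover an artifact's lineage. The classes `Artifact`, `Execution`, and `InMemoryMetadataStore`, and the example URIs, are hypothetical illustrations under that assumption; they are not the `ml_metadata` API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Artifact:
    artifact_id: int
    type_name: str  # e.g. "Examples", "Schema", "Model"
    uri: str        # where the artifact's payload is stored


@dataclass
class Execution:
    execution_id: int
    component: str             # e.g. "Trainer"
    config: Dict[str, object]  # runtime configuration for this run
    inputs: List[int]          # ids of artifacts consumed
    outputs: List[int]         # ids of artifacts produced


class InMemoryMetadataStore:
    """Hypothetical in-memory stand-in for a metadata backend (not ml_metadata)."""

    def __init__(self) -> None:
        self.artifacts: Dict[int, Artifact] = {}
        self.executions: List[Execution] = []

    def put_artifact(self, type_name: str, uri: str) -> int:
        artifact_id = len(self.artifacts) + 1
        self.artifacts[artifact_id] = Artifact(artifact_id, type_name, uri)
        return artifact_id

    def record_run(self, component: str, config: Dict[str, object],
                   inputs: List[int], outputs: List[int]) -> int:
        execution_id = len(self.executions) + 1
        self.executions.append(
            Execution(execution_id, component, dict(config), list(inputs), list(outputs)))
        return execution_id

    def producer_of(self, artifact_id: int) -> Optional[Execution]:
        # Return the recorded run that produced this artifact, if any.
        for execution in self.executions:
            if artifact_id in execution.outputs:
                return execution
        return None

    def lineage(self, artifact_id: int) -> List[str]:
        # Walk backwards through the runs that (transitively) produced the artifact.
        trail: List[str] = []
        frontier = [artifact_id]
        seen_artifacts, seen_runs = set(), set()
        while frontier:
            current = frontier.pop()
            if current in seen_artifacts:
                continue
            seen_artifacts.add(current)
            execution = self.producer_of(current)
            if execution is not None and execution.execution_id not in seen_runs:
                seen_runs.add(execution.execution_id)
                trail.append(execution.component)
                frontier.extend(execution.inputs)
        return trail


# Usage: record a tiny three-step pipeline and ask where the model came from.
store = InMemoryMetadataStore()
examples = store.put_artifact("Examples", "/tmp/pipeline/examples")  # hypothetical URIs
schema = store.put_artifact("Schema", "/tmp/pipeline/schema")
model = store.put_artifact("Model", "/tmp/pipeline/model")
store.record_run("ExampleGen", {"eval_split_percent": 20}, [], [examples])
store.record_run("SchemaGen", {}, [examples], [schema])
store.record_run("Trainer", {"train_steps": 1000}, [examples, schema], [model])
print(store.lineage(model))  # ['Trainer', 'SchemaGen', 'ExampleGen']
```

Because every run is recorded with its inputs, questions such as "which data and configuration produced this model?" become a simple graph walk rather than an archaeology exercise.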
Thus, a TFX pipeline is a sequence of components that together implement an ML pipeline specifically designed for scalable, high-performance ML tasks. The major tasks include modelling, training, validation, serving inference, and managing deployments both online (generally in the cloud) and to native mobile apps. By integrating these components into a single platform, it becomes possible to standardize components, simplify platform configuration, and reduce both the total cost of ownership (TCO) and the time to production from the order of months to weeks, while providing the platform stability and quality that minimize disruptions.
The pipeline components are built using TFX libraries, which can also be used individually. A TFX pipeline typically includes the following components:
- ExampleGen is the initial input component of a pipeline that ingests data and optionally splits the input dataset into training and validation sets.
- StatisticsGen calculates descriptive statistics for the dataset, such as counts, missing values, means, and variances of each feature.
- SchemaGen examines the statistics and infers a data schema (for example, each feature's expected type and presence).
- ExampleValidator looks for anomalies (such as data type mismatches) and missing values in the dataset, and can detect skew between training and serving data.
- Transform performs feature engineering on the dataset.
- Trainer trains the model using the training dataset.
- Evaluator performs deep analysis of the training results using the evaluation data.
- ModelValidator validates exported models, ensuring that they are “good enough” to be pushed to production (newer TFX releases fold this role into Evaluator).
- Pusher deploys the model to a serving infrastructure (for example, TensorFlow Serving on in-house servers or a cloud service).
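The data-side steps of this list can be sketched in plain Python to show what flows from one component to the next: a deterministic hash-based split (ExampleGen), per-feature statistics (StatisticsGen), an inferred schema (SchemaGen), anomaly checks of new data against that schema (ExampleValidator), and a z-score normalization whose mean and standard deviation come from the training split (Transform). All function names are hypothetical; the real components process data at scale with Apache Beam and the TFDV/TFT libraries, and real schemas record far more than type and presence.

```python
import hashlib
import statistics
from typing import Dict, List, Tuple

Example = Dict[str, object]


def example_gen_split(examples: List[Example], key: str, eval_buckets: int = 2,
                      total_buckets: int = 10) -> Tuple[List[Example], List[Example]]:
    # ExampleGen-like: hash a stable key so every re-run yields the same ~20% eval split.
    train, evaluation = [], []
    for ex in examples:
        bucket = int(hashlib.md5(str(ex[key]).encode()).hexdigest(), 16) % total_buckets
        (evaluation if bucket < eval_buckets else train).append(ex)
    return train, evaluation


def statistics_gen(examples: List[Example]) -> Dict[str, dict]:
    # StatisticsGen-like: per-feature counts, missing values, and mean/std for numbers.
    stats = {}
    for name in sorted({n for ex in examples for n in ex}):
        values = [ex.get(name) for ex in examples]
        present = [v for v in values if v is not None]
        entry = {"count": len(present), "missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            entry.update(type="FLOAT", mean=statistics.fmean(present),
                         std=statistics.pstdev(present))
        else:
            entry["type"] = "STRING"
        stats[name] = entry
    return stats


def schema_gen(stats: Dict[str, dict]) -> Dict[str, dict]:
    # SchemaGen-like: infer each feature's type and whether it must always be present.
    return {name: {"type": s["type"], "required": s["missing"] == 0}
            for name, s in stats.items()}


def example_validator(stats: Dict[str, dict], schema: Dict[str, dict]) -> List[str]:
    # ExampleValidator-like: compare statistics of new data against the schema.
    anomalies = []
    for name, expected in schema.items():
        observed = stats.get(name)
        if observed is None:
            anomalies.append(f"{name}: feature absent")
            continue
        if observed["type"] != expected["type"]:
            anomalies.append(f"{name}: expected {expected['type']}, got {observed['type']}")
        if expected["required"] and observed["missing"] > 0:
            anomalies.append(f"{name}: {observed['missing']} missing value(s)")
    return anomalies


def transform_zscore(examples: List[Example], train_stats: Dict[str, dict],
                     feature: str) -> List[Example]:
    # Transform-like: normalize with statistics computed once on the training split,
    # so training and serving apply exactly the same transformation.
    mean = train_stats[feature]["mean"]
    std = train_stats[feature]["std"] or 1.0  # guard against a constant feature
    return [{**ex, feature: (ex[feature] - mean) / std} for ex in examples]


raw = [{"user_id": i, "plays": float(i % 7), "genre": "pop" if i % 2 else "rock"}
       for i in range(100)]
train, evaluation = example_gen_split(raw, key="user_id")
train_stats = statistics_gen(train)
schema = schema_gen(train_stats)
normalized_eval = transform_zscore(evaluation, train_stats, "plays")

# Simulated drift in serving data: "plays" arrives as text and "genre" is sometimes missing.
serving = [{"user_id": 1, "plays": "3"}, {"user_id": 2, "plays": "5", "genre": "pop"}]
print(example_validator(statistics_gen(serving), schema))
# ['genre: 1 missing value(s)', 'plays: expected FLOAT, got STRING']
```

The key design point is that the schema and the normalization constants are derived from training data once and then reused, so the same expectations and transformations apply to every later batch.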
TFX includes both libraries and ML pipeline components. The libraries are a set of Python packages used to build the pipeline components. The TFX libraries are illustrated in the diagram below:
- TensorFlow Data Validation (TFDV) helps to understand, validate, and monitor ML data at large scale. TFDV is used to summarize and validate petabytes of data at Google every day, and has a track record of helping TFX users maintain the health of their ML pipelines.
- TensorFlow Transform (TFT) helps with data preprocessing, which takes a lot of effort when applying ML to real-world datasets. This includes converting between formats, tokenizing and stemming text and forming vocabularies, and performing a variety of numerical operations such as normalization.
- TensorFlow Model Analysis (TFMA) helps to compute and visualize evaluation metrics for ML models. Before deploying a model, evaluating its performance is a must, to ensure that it meets quality thresholds and behaves as expected on all relevant slices of data.
- TensorFlow Serving (TFS) supports model versioning (so that models can be updated continuously with the option to roll back) and multiple models (for experimentation via A/B testing), while achieving high throughput on hardware accelerators (GPUs and TPUs) with low latency and jitter.
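The idea of checking quality thresholds on every slice of data, rather than only on the aggregate, can be illustrated with a small sketch. It computes accuracy overall and per slice, then "blesses" a candidate model only if no slice falls below an absolute threshold or regresses too far against a baseline model. This is a hypothetical plain-Python stand-in for what TFMA and the Evaluator do; the function names, the accuracy metric, the slice values, and the thresholds are illustrative assumptions, not TFMA's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int, int]  # (slice value, true label, predicted label)


def sliced_accuracy(records: List[Record]) -> Dict[str, float]:
    # Accuracy for the whole dataset ("Overall") and for each slice value.
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, count]
    for slice_value, label, prediction in records:
        for key in ("Overall", slice_value):
            totals[key][0] += int(label == prediction)
            totals[key][1] += 1
    return {key: correct / count for key, (correct, count) in totals.items()}


def bless(candidate: Dict[str, float], baseline: Dict[str, float],
          min_accuracy: float = 0.7, max_regression: float = 0.02) -> Tuple[bool, List[str]]:
    # Reject the candidate if any slice misses the absolute threshold
    # or drops more than max_regression below the baseline model.
    reasons = []
    for key, accuracy in sorted(candidate.items()):
        if accuracy < min_accuracy:
            reasons.append(f"{key}: accuracy {accuracy:.2f} < {min_accuracy}")
        if key in baseline and baseline[key] - accuracy > max_regression:
            reasons.append(f"{key}: regressed from {baseline[key]:.2f} to {accuracy:.2f}")
    return not reasons, reasons


candidate = sliced_accuracy([("mobile", 1, 1)] * 8 + [("web", 1, 1), ("web", 1, 0)])
baseline = {"Overall": 0.85, "mobile": 0.9, "web": 0.6}  # hypothetical baseline metrics
blessed, reasons = bless(candidate, baseline)
print(candidate)  # {'Overall': 0.9, 'mobile': 1.0, 'web': 0.5}
print(blessed, reasons)
# False ['web: accuracy 0.50 < 0.7', 'web: regressed from 0.60 to 0.50']
```

Here the aggregate accuracy of 0.9 looks healthy, but the `web` slice fails both checks, which is exactly the kind of problem an aggregate-only evaluation would hide.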
The diagram below illustrates the relationships between TFX libraries and pipeline components:
Case Study for Google Play Songs:
Google Play Songs, a commercial mobile app for playing, searching, buffering, and saving songs, is one deployment of a recommender system built with TFX. The Google Play Songs recommender system suggests songs that match each user's tastes when the user visits the app homepage. The input to the ML system is a “query” that contains raw data about the user's context. The recommender system returns a list of similar songs or a customized playlist, which the user can either listen to or save for later. Since the corpus contains over a million songs, it is impossible to score every song for every query. Hence, a retrieval step first returns a short list of candidate songs based on the query, and the ranking system then uses the ML model to compute a score per candidate and present a ranked list to the user. The ranking model is trained continuously as fresh training data arrives (usually in batches). A typical training dataset contains hundreds of billions of examples, where each example has query features (e.g., the user's context) as well as features of the song being ranked (e.g., its ratings and artist). After validation (e.g., comparing quality metrics with the models serving live traffic), the trained models are deployed through TensorFlow Serving in data centres and collectively serve thousands of queries per second with a strict latency requirement of tens of milliseconds. Because fresh models are trained daily, the servers reload multiple models (both production and experimental) per day. The figure below illustrates the serving flow from a query (raw data), to a shortlist of candidates (retrieval), to ranked songs (prediction).
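The retrieval-then-ranking design can be sketched in a few lines. Retrieval is a cheap filter that narrows the corpus to a handful of candidates, and only those candidates are scored by the more expensive ranking model. The corpus, the genre/popularity retrieval rule, and the linear scoring function below are hypothetical stand-ins chosen for illustration; the production system described here ranks candidates with a trained ML model.

```python
import heapq
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical corpus entry: (genre, popularity in [0, 1], song feature vector)
Song = Tuple[str, float, Tuple[float, ...]]


def retrieve(query_genres: Set[str], corpus: Dict[str, Song], k: int) -> List[str]:
    # Stage 1 (retrieval): keep the k most popular songs in the user's genres.
    candidates = [(popularity, song_id)
                  for song_id, (genre, popularity, _) in corpus.items()
                  if genre in query_genres]
    return [song_id for _, song_id in heapq.nlargest(k, candidates)]


def rank(user_vector: Sequence[float], candidate_ids: List[str],
         corpus: Dict[str, Song], weights: Tuple[float, float]) -> List[str]:
    # Stage 2 (ranking): score only the shortlisted candidates with a stand-in model
    # (a linear mix of user-song affinity and popularity) and sort best-first.
    def score(song_id: str) -> float:
        _, popularity, song_vector = corpus[song_id]
        affinity = sum(u * s for u, s in zip(user_vector, song_vector))
        return weights[0] * affinity + weights[1] * popularity
    return sorted(candidate_ids, key=score, reverse=True)


corpus = {
    "s1": ("pop", 0.9, (1.0, 0.0)),
    "s2": ("pop", 0.5, (0.0, 1.0)),
    "s3": ("rock", 0.8, (1.0, 1.0)),
    "s4": ("pop", 0.7, (0.2, 0.9)),
    "s5": ("jazz", 0.95, (1.0, 0.0)),
}
shortlist = retrieve({"pop", "rock"}, corpus, k=3)
print(shortlist)                                         # ['s1', 's3', 's4']
print(rank((0.1, 1.0), shortlist, corpus, (1.0, 0.5)))  # ['s3', 's4', 's1']
```

The design choice is about cost: the ranking model runs on `k` candidates per query instead of on every song in the corpus, which is what makes a tens-of-milliseconds latency budget feasible.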
TFX is designed to be portable across environments, hardware accelerators, and orchestration frameworks. It is also portable to different computing platforms, including bare metal, Amazon Web Services (AWS), and Google Cloud Platform (GCP). It offers a simple, consistent set of usage patterns whether running locally or in the cloud, on a single machine or distributed, with in-memory data flow or with large data sharded across HDFS, following a "write once, run anywhere" approach. It also provides a toolbox of useful abstractions with the right entry point for the task at hand, starting with off-the-shelf algorithms that let users focus on feature engineering and hyperparameter tuning. Using YAML, JSON, and simplified Python interfaces, it minimizes the amount of boilerplate code.
ML systems in general accrue high technical debt: the conceptual workflow of applying an ML model is simple, but the actual workflow becomes far more complex. TFX offers several benefits over ad hoc ML workflows: (1) a single ML platform for many different learning tasks, (2) continuous training and serving, (3) human-in-the-loop support, and (4) production-level reliability and scalability.
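Continuous training and serving means that a serving system must keep loading fresh model versions, fall back to an earlier version when a new one misbehaves, and route a share of traffic to experimental models. The class below is a hypothetical sketch of that bookkeeping (newest-version selection, rollback, and deterministic hash-based A/B routing); the class name, method names, and model names are assumptions for illustration, and this is not the TensorFlow Serving API, which configures comparable behaviour through its own version policies.

```python
import hashlib
from typing import List, Optional, Set


class ModelVersionManager:
    """Hypothetical sketch of serving-side version bookkeeping (not TensorFlow Serving)."""

    def __init__(self, experiment_share: float = 0.1) -> None:
        self.versions: List[int] = []          # loaded production versions, ascending
        self.disabled: Set[int] = set()        # versions that were rolled back
        self.experiment: Optional[str] = None  # name of an experimental model, if any
        self.experiment_share = experiment_share

    def load(self, version: int) -> None:
        # Called whenever continuous training exports a new model version.
        if version not in self.versions:
            self.versions.append(version)
            self.versions.sort()

    def _live(self) -> List[int]:
        return [v for v in self.versions if v not in self.disabled]

    def active_version(self) -> int:
        # Serve the newest version that has not been rolled back.
        live = self._live()
        if not live:
            raise RuntimeError("no servable production version")
        return live[-1]

    def rollback(self) -> int:
        # Disable the current version; the previous live version takes over.
        live = self._live()
        if len(live) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.disabled.add(live[-1])
        return live[-2]

    def route(self, request_id: str) -> str:
        # Hash the request id so the same request always goes to the same arm.
        if self.experiment is not None:
            bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 1000
            if bucket < self.experiment_share * 1000:
                return self.experiment
        return f"production:v{self.active_version()}"


manager = ModelVersionManager(experiment_share=0.1)
for version in (1, 2, 3):  # e.g. three daily training runs
    manager.load(version)
print(manager.active_version())  # 3
print(manager.rollback())        # 2
manager.experiment = "experimental_ranker"  # hypothetical experimental model
share = sum(manager.route(f"req-{i}") == "experimental_ranker" for i in range(10_000)) / 10_000
print(share)
```

Hashing the request id, rather than sampling randomly, keeps routing reproducible, so the printed share should land close to the configured 10% while any given request always reaches the same model.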
One of the core philosophies of TFX is to streamline and automate the process of training quality models and moving them to production, across a wide range of training use cases. In doing so, TFX aims to balance simplicity and flexibility, a tension that every high-level ML framework has to manage.