This post is a report on the Google Summer of Code 2019 TensorFlow.js project "Reasonable Effectiveness of Mobile Inference: Adaptive Growth of the TensorFlow.js Model Garden", in collaboration with Manraj Singh.

Since 2012, the amount of compute in the largest machine learning training runs has been doubling every three and a half months: the deployment of the state-of-the-art developments is expensive. With the slow, hard to scale inference, compression techniques like knowledge distillation become key in realizing the full potential of these models on low-power platforms.

Knowledge distillation helps reduce the lag and size of applications from lane detection and steering to semantic segmentation and pose estimation by teaching the lighter model mimic the output and internal data representations of the parent model.

TensorFlow.js makes using machine learning in the browser and Node.js easy, providing the tools to convert and optimize existing models in Python or to develop new ones in JavaScript. For fast prototyping and fine-tuning, TensorFlow.js maintains the model garden with pre-trained models. This project adds the implementations and demos of three architectures for semantic segmentation, image classification and text detection.

Plans

The original goal was to add five new low-power, high-accuracy applications controlled via an interactive dashboard, providing a proof of concept for the mobile-first paradigm of machine learning.

After the sync with the core TensorFlow.js team, the project scope then was narrowed down to align with the promise of tfjs-models:

The tfjs-examples module is similar in form to tfjs-models: both show the capabilities of TensorFlow.js with demos and implement overlapping models. Nevertheless, they are different in spirit: while the tfjs-models repo serves as a convenient toolbox providing off-the-shelf building blocks, the tfjs-examples repo is a starter kit for personal projects.

The model garden cannot capture all of the use cases, striving instead to make fine-tuning as easy as possible. Since some models, like MobileNet, are more suitable for re-training than others, like Pix2Pix, the ones with the larger minimum viable audience are given priority.

Models

DeepLab: making sense of the infinite content

Pull Request | Demo | Usage | Conversion Script

DeepLab assigns a semantic label (a human, road, Harley-Davidson, and so on) to each pixel of the input image. Three types of pre-trained models are available:

Despite the fact that CityScapes recognizes the least number of objects, it is the slowest and most compute-intensive model of all three variants. This is a known bug which is yet to be resolved.

Applications

- medical: identification of healthy and cancerous cells from CT scans
- geographical: early detection of forest fires
- artistic: low-cost CGI overlays

EfficientNet: porting a heavyweight sibling of MobileNetV2

Pull Request | Demo | Usage | Conversion Script

EfficientNet classifies images into 1000 ImageNet classes, building on the success of MobileNet.

Despite the greater accuracy than other alternatives focusing on mobile platforms and a lot of excitement around its release, the model did not pass the quality assurance tests of tfjs-models: even B0, the lightest variant of EfficientNet, is slower in-browser than MobileNet.

When the full WebGPU API support comes to TensorFlow.js, a factor of magnitude improvements in performance might allow offering heavier, more accurate models as a viable alternative to the existing solutions.

Converting EfficientNet from the pre-trained checkpoint revealed a bug in tfjs-converter, brought by breaking changes associated with the upcoming TF 2.0 release. This might have been resolved by the recent updates, but further testing is required to identify the source of the issue.

Applications

- agriculture: automated sorting of the produce into grades
- retail: visual search of similar products
- marketing: adaptive ads reacting to customers wearing specific brands

PSENet: detecting text with arbitrary shapes

Pull Request | Demo | Usage | Conversion Script (tf.keras) | Conversion Script (tf.slim)

PSENet detects text by first feeding the image through feature pyramid network extracting features from the image classifier lacking the top dense and activation layers, applying the progressive scale expansion algorithm to extract pixels that most likely correspond to text regions, separating them into distinct components, and then reducing the components to bounding boxes.

Two non-trivial post-processing methods are available out of the box:

The demo also supports drawing contours via the OpenCV.js library, reduced in size from 9 to 2 MB using a custom build.

The progressive scale expansion algorithm is written in pure TypeScript to take advantage of the JS engine optimizations and avoid the overhead associated with the TensorFlow.js implementation details.

The model is available in two variants:

Inputs

Sample input Sample label

The culprit was pinned down to the tf.estimator-based setup and resolved for the final, 186th AI Platform job by re-writing the pipeline using only the machinery of tf.data and tf.keras.

Outputs

tf.keras.Model converted to tf.estimator tf.keras.Model in the tf.estimator model function tf.keras implementation
poor quality segmentation result poor quality segmentation result satisfactory quality segmentation result

Despite the weight improvements and good validation performance (0.98 accuracy, 0.99 precision, 0.99 F1-score on 2000 images), several pipeline design decisions prevented the new model from beating the results of the tf.slim version:

Some examples behave well...

And some miss important features...

While others show the imperfections of training beyond any doubt.

Despite these hurdles, PSENet offers a promising approach for in-browser text detection, and the TensorFlow.js port will be finalized when the weights are updated.

Applications
- privacy: masking of sensitive text information from photos
- education: extraction of text written on whiteboards
- knowledge management: detecting key parts of architectural drawings for automated annotation

Avoiding the pitfalls

Adopting the following heuristics would have helped to avoid a lot of the timesinks in the development process:

Andrej Karpathy gives this and other advice in his recipe on training neural networks.

Future Work

Other ideas for growing the model garden are collected in the scratchpad.

Links

Porting the models called for contributions to the TensorFlow.js ecosystem and beyond.

The list below highlights all of them.

Acknowledgements

I am grateful to the following amazing people for the gift of a valuable learning experience that GSoC has become, helping me grow into a pro and making the world a better place: