Core ML 3

Core ML

Core ML is optimized for on-device performance of a broad variety of model types by leveraging Apple hardware and minimizing memory footprint and power consumption.

Experience more with Core ML

Run models fully on-device

Core ML models run strictly on the user’s device and remove any need for a network connection, keeping your app responsive and your users’ data private.

Run advanced neural networks

Core ML supports the latest models, such as cutting-edge neural networks designed to understand images, video, sound, and other rich media.

Deploy models

With Core ML Model Deployment, you can easily distribute models to your app using CloudKit.

Convert models to Core ML

Models from libraries like TensorFlow or PyTorch can be converted to Core ML using Core ML Converters more easily than ever before.

Personalize models on-device

Models bundled in apps can be updated with user data on-device, helping models stay relevant to user behavior without compromising privacy.

Encrypt models

Xcode supports model encryption, enabling additional security for your machine learning models.

Powerful Apple Silicon

Core ML is designed to take seamless advantage of powerful hardware, including the CPU, GPU, and Neural Engine, maximizing performance while minimizing memory footprint and power consumption.

Create ML

Build and train Core ML models right on your Mac with no code.

Learn more

Core ML Converters

Convert models from third-party training libraries into Core ML using the coremltools Python package.

Learn more


Get started with models from the research community that have been converted to Core ML.

Browse models


Core ML and Vision Tutorial: On-device training on iOS

Update note: Christine Abernathy updated this tutorial for Xcode 11, Swift 5 and iOS 13. Audrey Tam wrote the original.

Apple released Core ML and Vision in iOS 11. Core ML gives developers a way to bring machine learning models into their apps. This makes it possible to build intelligent features on-device like object detection.

iOS 13 added on-device training in Core ML 3 and unlocked new ways to personalize the user experience.

In this tutorial, you’ll learn how to fine-tune a model on the device using Core ML and Vision Framework. To learn this, you’ll start with Vibes, an app that generates quotes based on the given image. It also allows you to add your favorite emojis using shortcuts after training the model.

Getting Started

To get started, click the Download Materials button at the top or bottom of this tutorial. Inside the zip file, you’ll find two folders: starter and final. Double-click Vibes.xcodeproj in the starter project to open it in Xcode.

Build and run the project. You’ll see this:

Vibes starter screen

Tap the camera icon and select a photo from the library to view a quote. Next, tap the sticker icon and select a sticker to add to the image. Move the sticker around to any desired location:

Vibes starter app flow: Blank screen, waterfall image with quote, selection of emojis, waterfall with quote and emoji

There are two things you can improve:

Little bug guy saying Do tell

  1. The quote is randomly selected. How about displaying a quote that’s related to the selected image?
  2. Adding stickers takes too many steps. What if you could create shortcuts for stickers you use the most?

Your goal in this tutorial is to use machine learning to tackle these two challenges.

What is Machine Learning?

If you’re new to machine learning, it’s time to demystify some common terms.

Artificial Intelligence, or AI, is the power added to a machine programmatically to mimic human actions and thoughts.

Machine Learning, or ML, is a subset of AI that trains machines to perform certain tasks. For example, you can use ML to train a machine to recognize a cat in an image or translate text from one language to another.

Deep Learning is one method of training a machine. This technique mimics the human brain, which consists of neurons organized in a network. Deep Learning trains an artificial neural network from the data provided.

Phrases going into a microphone, an arrow pointing to a web of dots and lines

Say you want the machine to recognize a cat in an image. You can feed the machine lots of images that are manually labeled cat and not cat. You then build a model that can make accurate guesses or predictions.
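To make this less abstract, here is a single artificial neuron, the building block of the networks described above, in plain Swift. The weights and inputs are invented purely for illustration; real networks learn millions of such weights from labeled examples:

```swift
import Foundation

// A single artificial neuron: a weighted sum of inputs passed
// through an activation function (here, the sigmoid).
func sigmoid(_ x: Double) -> Double {
    1.0 / (1.0 + exp(-x))
}

func neuron(inputs: [Double], weights: [Double], bias: Double) -> Double {
    let weightedSum = zip(inputs, weights).map(*).reduce(0, +) + bias
    return sigmoid(weightedSum)
}

// A toy "cat score" from two hypothetical image features.
let score = neuron(inputs: [0.8, 0.2], weights: [1.5, -0.5], bias: 0.1)
print(score > 0.5 ? "cat" : "not cat")  // prints "cat"
```

Training is the process of nudging those weights so the network's outputs match the labels in the data.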

Training With Models

Apple defines a model as “the result of applying a machine-learning algorithm to a set of training data”. Think of a model as a function that takes an input, performs an operation on it, such as classifying it, and produces an output.

A cat, arrow to web of dots and lines, arrow to the word cat

Training with labeled data is called supervised learning. You want lots of good data to build a good model. What does good mean? It means the data represents the use cases for which you’re building.

If you want your model to recognize all cats but only feed it a specific breed, it may miss others. Training with biased data can lead to undesired outcomes.

Yoda head, arrow to confused little guy, arrow to cat??

Training is compute-intensive and often done on servers. With their parallel computing capabilities, GPUs typically speed things up.

Once training is complete, you can deploy your model to production to run predictions or inferences on real-world data.

Machine learning workflow: cylinder stack, arrow to web of dots and lines, arrow to rectangle labeled ML

Inference isn’t as computationally demanding as training. However, in the past mobile apps had to make remote calls to a server for model inference.

Advances to mobile chip performance have opened the door to on-device inference. The benefits include reduced latency, less network dependency and improved privacy. But you get increases in app size and battery drain due to computational load.

This tutorial showcases Core ML for on-device inference and on-device training.

Apple’s Frameworks and Tools for Machine Learning

Core ML works with domain-specific frameworks such as Vision for image analysis. Vision provides high-level APIs to run computer vision algorithms on images and videos. Vision can classify images using a built-in model that Apple provides or custom Core ML models that you provide.

Core ML is built on top of lower-level primitives: Accelerate with BNNS and Metal Performance Shaders:

iOS machine learning architecture

Other domain-specific frameworks that Core ML works with include Natural Language for processing text and Sound Analysis for identifying sounds in audio.

Integrating a Core ML Model Into Your App

To integrate with Core ML, you need a model in the Core ML Model format. Apple provides pre-trained models you can use for tasks like image classification. If those don’t work for you, you can look for models created by the community or create your own.

For your first enhancement to Vibes, you need a model that does image classification. Models are available with varying degrees of accuracy and model size. You’ll use SqueezeNet, a small model trained to recognize common objects.

Drag SqueezeNet.mlmodel from the starter Models directory into your Xcode project’s Models folder:

User adding a model in Xcode

Select SqueezeNet.mlmodel and review the model details in Project navigator:

Model and its contents in Xcode

The Prediction section lists the expected inputs and outputs:

  • The input expects an image of size 227×227.
  • There are two outputs: one returns a dictionary with the probability for each category, and the other returns the category with the highest probability.

Click the arrow next to the model:

Under Model Class, SqueezeNet with arrow in red box

Xcode auto-generates a file for the model that includes classes for the input, output and main class. The main class includes various methods for making predictions.

The standard Vision framework workflow is:

  1. First, create a Core ML model.
  2. Then, create one or more requests.
  3. Finally, create and run a request handler.

You’ve already created your model, SqueezeNet.mlmodel. Next, you’ll create a request.

Creating a Request

Go to CreateQuoteViewController.swift and add the following after the import:

import CoreML
import Vision

Vision helps you work with images, for example by converting them to the format a model expects.

Add the following property:

// 1
private lazy var classificationRequest: VNCoreMLRequest = {
  do {
    // 2
    let model = try VNCoreMLModel(for: SqueezeNet().model)
    // 3
    let request = VNCoreMLRequest(model: model) { request, _ in
      if let classifications =
        request.results as? [VNClassificationObservation] {
        print("Classification results: \(classifications)")
      }
    }
    // 4
    request.imageCropAndScaleOption = .centerCrop
    return request
  } catch {
    // 5
    fatalError("Failed to load Vision ML model: \(error)")
  }
}()

Here’s a breakdown of what’s going on:

  1. Define an image analysis request that’s created when first accessed.
  2. Create an instance of the model.
  3. Instantiate an image analysis request object based on the model. The completion handler receives the classification results and prints them.
  4. Use Vision to crop the input image to match what the model expects.
  5. Handle model load errors by killing the app. The model is part of the app bundle so this should never happen.
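The centerCrop option in step 4 scales the image and crops it to a centered square before it reaches the model. As a rough sketch of the geometry involved (the helper below is hypothetical, not part of Vision):

```swift
// A sketch of the geometry behind .centerCrop: take the largest
// centered square from an image, which is then scaled down to the
// model's expected input size (227×227 for SqueezeNet).
func centeredSquare(width: Double, height: Double) -> (x: Double, y: Double, side: Double) {
    let side = min(width, height)
    return ((width - side) / 2, (height - side) / 2, side)
}

let crop = centeredSquare(width: 1024, height: 768)
print(crop)  // (x: 128.0, y: 0.0, side: 768.0)
```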

Integrating the Request

Add the following at the end of the private extension:

func classifyImage(_ image: UIImage) {
  // 1
  guard let orientation = CGImagePropertyOrientation(
    rawValue: UInt32(image.imageOrientation.rawValue)) else {
    return
  }
  guard let ciImage = CIImage(image: image) else {
    fatalError("Unable to create \(CIImage.self) from \(image).")
  }
  // 2
  DispatchQueue.global(qos: .userInitiated).async {
    let handler = VNImageRequestHandler(
      ciImage: ciImage, orientation: orientation)
    do {
      try handler.perform([self.classificationRequest])
    } catch {
      print("Failed to perform classification.\n\(error.localizedDescription)")
    }
  }
}

Here’s what this classification request method does:

  1. Gets the orientation of the image and creates its CIImage representation.
  2. Kicks off an asynchronous classification request in a background queue. You create a handler to perform the Vision request, and then schedule the request.

Finally, call classifyImage(_:) at the end of the image picker’s completion handler. This triggers the classification request when the user selects an image.

Build and run the app. Tap the camera icon and select a photo. Nothing changes visually:

Waterfall with random quote at the bottom

However, the console should list the raw classification results:

Raw classification results

In this example, the classifier has a 27.9% confidence that this image is a cliff, drop, drop-off. Find the print statement in the classification completion handler and replace it with the code below to log only the top results:

let topClassifications = classifications.prefix(2).map {
  (confidence: $0.confidence, identifier: $0.identifier)
}
print("Top classifications: \(topClassifications)")

Build and run the app and go through the steps to select a photo. The console should log the top results:

Pretty classification results

You can now use the extracted prediction details to show a quote related to the image.

Adding a Related Quote

In the image picker’s completion handler, remove the following:

if let quote = getQuote() {
  quoteTextView.text = quote.text
}

This displays a random quote and is no longer needed.

Next, you’ll add logic to get a quote using the classification results. Add the following to the extension:

func processClassifications(for request: VNRequest, error: Error?) {
  DispatchQueue.main.async {
    // 1
    if let classifications =
      request.results as? [VNClassificationObservation] {
      // 2
      let topClassifications = classifications.prefix(2).map {
        (confidence: $0.confidence, identifier: $0.identifier)
      }
      print("Top classifications: \(topClassifications)")
      let topIdentifiers = topClassifications.map {
        $0.identifier.lowercased()
      }
      // 3
      if let quote = self.getQuote(for: topIdentifiers) {
        self.quoteTextView.text = quote.text
      }
    }
  }
}

Here’s what’s going on in the code above:

  1. This method processes the results from an image classification request.
  2. The method extracts the top two predictions using code you’ve seen before.
  3. The predictions feed into getQuote(for:) to get a matching quote.

The method runs on the main queue to ensure that the quote display update happens on the UI thread.

Finally, wire this method up by changing the completion handler in classificationRequest to the following:

let request = VNCoreMLRequest(model: model) { [weak self] request, error in
  guard let self = self else {
    return
  }
  self.processClassifications(for: request, error: error)
}

Here, your completion handler calls your new method to process the results.

Build and run the app. Select a photo with a lemon or lemon tree in it. If necessary, download one from the browser. You should see the lemon quote selected instead of a random quote:

Close up of tree with related quote

Verify that the console logs a matching classification:

Classification results that match the image above

Test the flow a few times to verify the consistency of the results.

Great stuff! You’ve learned how to use Core ML for on-device model inference. :]

Happy, cheering iPhone with medals

Personalizing a Model on the Device

With Core ML 3, you can fine-tune an updatable model on the device during runtime. This means you can personalize the experience for each user.

On-device personalization is the idea behind Face ID. Apple ships a model that recognizes generic faces; during Face ID setup, each user fine-tunes the model to recognize their own face.

It doesn’t make sense to ship this updated model back up to Apple for deployment to other users. This underscores the advantage of the privacy that on-device personalization brings.

An updatable model is a Core ML model that’s marked as updatable. You also define the training inputs that you’ll use to update the model.

k-Nearest Neighbors

You’ll enhance Vibes using an updatable drawing classifier model. The classifier recognizes new drawings based on k-Nearest Neighbors, or k-NN. K-what?

The k-NN algorithm assumes that similar things are close to each other.

Bunch of sunglass smiley faces and one awkward rectangle

It does this by comparing feature vectors. A feature vector contains important information that describes an object’s characteristics. An example feature vector is RGB color represented by R, G, B.

Comparing the distance between feature vectors is a simple way to see if two objects are similar. k-NN categorizes an input by using its k nearest neighbors.

The example below shows a spread of drawings classified as squares and circles. Let’s say you want to find out what group the new mystery drawing in red belongs to:

k-NN plot with groups of blue rectangles and green circles

Choosing k = 3 predicts that this new drawing is a square:

k-NN classification graph with three blue rectangles circled

k-NN models are simple and fast. You don’t need many examples to train them. Performance can slow down though, if there’s lots of example data.
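To make the algorithm concrete, here is a minimal k-NN classifier in plain Swift, mirroring the square/circle example above. The feature vectors and labels are invented for illustration; Core ML’s k-NN model does the same kind of distance-and-vote work for drawings:

```swift
import Foundation

// A labeled training example: a feature vector plus its category.
struct Example {
    let features: [Double]
    let label: String
}

// Euclidean distance between two feature vectors.
func distance(_ a: [Double], _ b: [Double]) -> Double {
    sqrt(zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +))
}

// Classify an input by majority vote among its k nearest neighbors.
func classify(_ input: [Double], examples: [Example], k: Int) -> String {
    let nearest = examples
        .sorted { distance(input, $0.features) < distance(input, $1.features) }
        .prefix(k)
    var votes: [String: Int] = [:]
    for neighbor in nearest { votes[neighbor.label, default: 0] += 1 }
    return votes.max { $0.value < $1.value }!.key
}

let training = [
    Example(features: [1.0, 1.0], label: "square"),
    Example(features: [1.2, 0.8], label: "square"),
    Example(features: [0.9, 1.1], label: "square"),
    Example(features: [5.0, 5.0], label: "circle"),
    Example(features: [5.2, 4.8], label: "circle"),
]

// The mystery point sits near the squares, so k = 3 votes "square".
print(classify([1.1, 0.9], examples: training, k: 3))  // square
```

Note the trade-off visible even in this sketch: there’s no training step at all, but every prediction compares the input against every stored example.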

k-NN is one of the model types that Core ML supports for training. Vibes uses an updatable drawing classifier with:

  1. A neural network that acts as a feature extractor. The neural network knows how to recognize drawings. You need to extract the features for the k-NN model.
  2. A k-NN model for on-device drawing personalization.

Half circle, arrow to Static web of dots, arrow to unadaptable classifier, arrow to laughing emoji

In Vibes, the user can add a shortcut by selecting an emoji then drawing three examples. You’ll train the model with the emoji as the label and the drawings as the training examples.

Setting Up Training Drawing Flow

First, prepare the screen to accept user input to train your model by:

  1. Adding a screen that’s shown when the user selects an emoji.
  2. Adding a save action.
  3. Removing the sticker drag gesture that’s no longer needed.

Open AddStickerViewController.swift and, in the emoji selection handler, replace the existing call with the following:

performSegue(withIdentifier: "AddShortcutSegue", sender: self)

This transitions to the examples drawing view when the user selects an emoji.

Next, open AddShortcutViewController.swift and add the following code to implement the save action:

print("Training data ready for label: \(selectedEmoji ?? "")")
performSegue(
  withIdentifier: "AddShortcutUnwindSegue",
  sender: self)

This unwinds the segue to go back to the main screen when the user taps Save.

Finally, open CreateQuoteViewController.swift and remove the following code:

stickerLabel.isUserInteractionEnabled = true
let panGestureRecognizer = UIPanGestureRecognizer(
  target: self,
  action: #selector(handlePanGesture(_:)))
stickerLabel.addGestureRecognizer(panGestureRecognizer)

This removes the code that allows the user to move stickers around. This was only useful when the user couldn’t control the sticker location.

Build and run the app then select a photo. Tap the sticker icon and select an emoji. You’ll see your selected emoji as well as three drawing canvases:

Screen with Add a Shortcut header, laughing emoji and three blank rectangles

Now, draw three similar images. Verify that Save is enabled when you complete the third drawing:

Same screen as before but with hand-drawn half circles each rectangle

Then, tap Save and verify that the selected emoji is logged in the console:

Console logs selected emoji

You can now turn your attention to the flow that triggers the shortcut.

Adding the Shortcut Drawing View

It’s time to prepare the drawing view on image by following these steps:

  1. First, declare a DrawingView.
  2. Next, add the drawing view in the main view.
  3. Then, call the new setup method when the view loads.
  4. Finally, clear the canvas on selecting an image.

Open CreateQuoteViewController.swift and add the following property after the declarations:

var drawingView: DrawingView!

This contains the view where the user draws the shortcut.

Next, add the following code to implement addCanvasForDrawing():

drawingView = DrawingView(frame: stickerView.bounds)
view.addSubview(drawingView)
drawingView.translatesAutoresizingMaskIntoConstraints = false
NSLayoutConstraint.activate([
  drawingView.topAnchor.constraint(equalTo: stickerView.topAnchor),
  drawingView.leftAnchor.constraint(equalTo: stickerView.leftAnchor),
  drawingView.rightAnchor.constraint(equalTo: stickerView.rightAnchor),
  drawingView.bottomAnchor.constraint(equalTo: stickerView.bottomAnchor)
])

Here you create an instance of the drawing view and add it to the main view. You set Auto Layout constraints so that it overlaps only the sticker view.

Then, add the following to the end of viewDidLoad():

addCanvasForDrawing() drawingView.isHidden = true

Here you add the drawing view and make sure it’s initially hidden.

Now, in the image picker’s completion handler, add the following right after the sticker button is enabled:

drawingView.clearCanvas() drawingView.isHidden = false

Here you clear any previous drawings and unhide the drawing view so the user can add stickers.

Build and run the app and select a photo. Use your mouse, or finger, to verify that you can draw on the selected image:

Waterfall with quote and drawings added

Progress has been made. Onwards!

Making Model Predictions

Drag UpdatableDrawingClassifier.mlmodel from the starter’s Models directory into your Xcode project’s Models folder:

Add updatable model in Xcode

Now, select UpdatableDrawingClassifier.mlmodel in Project navigator. The Update section lists the two inputs the model expects during training. One represents the drawing and the other the emoji label:

Updatable model details in Xcode

The Prediction section lists the input and outputs. The input format matches that used during training. The output represents the predicted emoji label.

Select the Model folder in Xcode’s Project navigator. Then, go to File ▸ New ▸ File…, choose the iOS ▸ Source ▸ Swift File template, and click Next. Name the file UpdatableModel.swift and click Create.

Now, replace the import with the following:

import CoreML

This brings in the machine learning framework.

Now add the following extension to the end of the file:

extension UpdatableDrawingClassifier {
  var imageConstraint: MLImageConstraint {
    return model.modelDescription
      .inputDescriptionsByName["drawing"]!
      .imageConstraint!
  }

  func predictLabelFor(_ value: MLFeatureValue) -> String? {
    guard
      let pixelBuffer = value.imageBufferValue,
      let prediction = try? prediction(drawing: pixelBuffer).label
      else {
        return nil
    }
    if prediction == "unknown" {
      print("No prediction found")
      return nil
    }
    return prediction
  }
}

This extends UpdatableDrawingClassifier, the generated model class. Your code adds the following:

  1. imageConstraint, to make sure the image matches what the model expects.
  2. predictLabelFor(_:), to call the model’s prediction method with the pixel buffer representation of the drawing. It returns the predicted label, or nil if there’s no prediction.

Updating the Model

Add the following after the import statement:

struct UpdatableModel {
  private static var updatedDrawingClassifier: UpdatableDrawingClassifier?
  private static let appDirectory = FileManager.default.urls(
    for: .applicationSupportDirectory,
    in: .userDomainMask).first!
  private static let defaultModelURL =
    UpdatableDrawingClassifier.urlOfModelInThisBundle
  private static var updatedModelURL =
    appDirectory.appendingPathComponent("personalized.mlmodelc")
  private static var tempUpdatedModelURL =
    appDirectory.appendingPathComponent("personalized_tmp.mlmodelc")

  private init() { }

  static var imageConstraint: MLImageConstraint {
    let model = updatedDrawingClassifier ?? UpdatableDrawingClassifier()
    return model.imageConstraint
  }
}

The struct represents your updatable model. The definition here sets up properties for the model. These include locations to the original compiled model and the saved model.

Note: Core ML uses a compiled model file with an .mlmodelc extension which is actually a folder.

Loading the Model Into Memory

Now, add the following private extension after the struct definition:

private extension UpdatableModel {
  static func loadModel() {
    let fileManager = FileManager.default
    if !fileManager.fileExists(atPath: updatedModelURL.path) {
      do {
        let updatedModelParentURL =
          updatedModelURL.deletingLastPathComponent()
        try fileManager.createDirectory(
          at: updatedModelParentURL,
          withIntermediateDirectories: true,
          attributes: nil)
        let toTemp = updatedModelParentURL
          .appendingPathComponent(defaultModelURL.lastPathComponent)
        try fileManager.copyItem(
          at: defaultModelURL,
          to: toTemp)
        try fileManager.moveItem(
          at: toTemp,
          to: updatedModelURL)
      } catch {
        print("Error: \(error)")
        return
      }
    }
    guard let model = try? UpdatableDrawingClassifier(
      contentsOf: updatedModelURL) else {
      return
    }
    updatedDrawingClassifier = model
  }
}

This code loads the updated, compiled model into memory. Next, add the following public extension right after the struct definition:

extension UpdatableModel {
  static func predictLabelFor(_ value: MLFeatureValue) -> String? {
    loadModel()
    return updatedDrawingClassifier?.predictLabelFor(value)
  }
}

This method loads the model into memory, then calls the prediction method you added to the UpdatableDrawingClassifier extension.

Now, open Drawing.swift and add the following after the import:

import CoreML

You need this to prepare the prediction input.

Preparing the Prediction

Core ML expects you to wrap the input data for a prediction in an MLFeatureValue object. This object includes both the data value and its type.
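Conceptually, a feature value is just a value tagged with its type so the framework can validate model inputs. A toy sketch of the idea in plain Swift (this enum is invented for illustration; Core ML’s real type is MLFeatureValue):

```swift
// A conceptual stand-in for a feature value: pair a raw value with
// its type so a framework could check it against a model's inputs.
enum FeatureValue {
    case string(String)
    case double(Double)
    case multiArray([Double])

    var typeName: String {
        switch self {
        case .string: return "String"
        case .double: return "Double"
        case .multiArray: return "MultiArray"
        }
    }
}

// A label input for training, tagged as a string feature.
let label = FeatureValue.string("😂")
print(label.typeName)  // String
```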

In Drawing.swift, add the following property to the struct:

var featureValue: MLFeatureValue {
  let imageConstraint = UpdatableModel.imageConstraint
  let preparedImage = whiteTintedImage
  let imageFeatureValue = try? MLFeatureValue(
    cgImage: preparedImage,
    constraint: imageConstraint)
  return imageFeatureValue!
}

This defines a computed property that sets up the drawing’s feature value. The feature value is based on the white-tinted representation of the image and the model’s image constraint.

Now that you’ve prepared the input, you can focus on triggering the prediction.

First, open CreateQuoteViewController.swift and add the extension to the end of the file:

extension CreateQuoteViewController: DrawingViewDelegate {
  func drawingDidChange(_ drawingView: DrawingView) {
    // 1
    let drawingRect = drawingView.boundingSquare()
    let drawing = Drawing(
      drawing: drawingView.canvasView.drawing,
      rect: drawingRect)
    // 2
    let imageFeatureValue = drawing.featureValue
    // 3
    let drawingLabel =
      UpdatableModel.predictLabelFor(imageFeatureValue)
    // 4
    DispatchQueue.main.async {
      drawingView.clearCanvas()
      guard let emoji = drawingLabel else {
        return
      }
      self.addStickerToCanvas(emoji, at: drawingRect)
    }
  }
}

Recall that you added a DrawingView to draw sticker shortcuts. In this code, you conform to the DrawingViewDelegate protocol to get notified whenever the drawing has changed. Your implementation does the following:

  1. Creates a Drawing instance with the drawing info and its bounding square.
  2. Creates the feature value for the drawing prediction input.
  3. Makes a prediction to get the emoji that corresponds to the drawing.
  4. Updates the view on the main queue to clear the canvas and add the predicted emoji to the view.

Then, remove the existing code that clears the drawing view after each stroke. You don’t need to clear the drawing there; you’ll do it after making a prediction.

Testing the Prediction

Next, in addCanvasForDrawing(), add the following right after creating the drawing view:

drawingView.delegate = self

This makes the view controller the drawing view delegate.

Build and run the app and select a photo. Draw on the canvas and verify that the drawing is cleared and the following is logged in the console:

Log results shows no prediction

That’s to be expected. You haven’t added a sticker shortcut yet.

Now walk through the flow of adding a sticker shortcut. After you come back to the view of the selected photo, draw the same shortcut:

Close-up of dandelion with quote

Oops, the sticker still isn’t added! You can check the console log for clues:

Sticker still not added

After a bit of head-scratching, you may notice that your model has no clue about the shortcut you’ve added. Time to fix that.

Updating the Model

You update a model by creating an MLUpdateTask. The update task initializer requires the compiled model file, the training data and a completion handler. Generally, you want to save your updated model to disk and reload it, so new predictions make use of the latest data.

You’ll start by preparing the training data based on the shortcut drawings.

Recall that you made model predictions by passing in an MLFeatureValue input. Likewise, you can train a model by passing in an MLFeatureProvider input. You can make batch predictions, or train with many inputs, by passing in an MLBatchProvider containing multiple feature providers.

First, open DrawingDataStore.swift and replace the import with the following:

import CoreML

You need this to set up the Core ML training inputs.

Next, add the following method to the extension:

func prepareTrainingData() throws -> MLBatchProvider {
  // 1
  var featureProviders: [MLFeatureProvider] = []
  // 2
  let inputName = "drawing"
  let outputName = "label"
  // 3
  for drawing in drawings {
    if let drawing = drawing {
      // 4
      let inputValue = drawing.featureValue
      // 5
      let outputValue = MLFeatureValue(string: emoji)
      // 6
      let dataPointFeatures: [String: MLFeatureValue] =
        [inputName: inputValue,
         outputName: outputValue]
      // 7
      if let provider = try? MLDictionaryFeatureProvider(
        dictionary: dataPointFeatures) {
        featureProviders.append(provider)
      }
    }
  }
  // 8
  return MLArrayBatchProvider(array: featureProviders)
}

Here’s a step-by-step breakdown of this code:

  1. Initialize an empty array of feature providers.
  2. Define the names for the model training inputs.
  3. Loop through the drawings in the data store.
  4. Wrap the drawing training input in a feature value.
  5. Wrap the emoji training input in a feature value.
  6. Create a data point for the training input. This is a dictionary of the training input names and feature values.
  7. Create a feature provider for the data point and append it to the feature providers array.
  8. Finally, create a batch provider from the array of feature providers.

Now, open UpdatableModel.swift and add the following method to the end of the UpdatableDrawingClassifier extension:

static func updateModel(
  at url: URL,
  with trainingData: MLBatchProvider,
  completionHandler: @escaping (MLUpdateContext) -> Void
) {
  do {
    let updateTask = try MLUpdateTask(
      forModelAt: url,
      trainingData: trainingData,
      configuration: nil,
      completionHandler: completionHandler)
    updateTask.resume()
  } catch {
    print("Couldn't create an MLUpdateTask.")
  }
}

The code creates the update task with the compiled model URL. You also pass in a batch provider with the training data. The call to resume() starts the training, and the completion handler is called when training finishes.

Saving the Model

Now, add the following method to the private extension for UpdatableModel:

static func saveUpdatedModel(_ updateContext: MLUpdateContext) {
  // 1
  let updatedModel = updateContext.model
  let fileManager = FileManager.default
  do {
    // 2
    try fileManager.createDirectory(
      at: tempUpdatedModelURL,
      withIntermediateDirectories: true,
      attributes: nil)
    // 3
    try updatedModel.write(to: tempUpdatedModelURL)
    // 4
    _ = try fileManager.replaceItemAt(
      updatedModelURL,
      withItemAt: tempUpdatedModelURL)
    print("Updated model saved to:\n\t\(updatedModelURL)")
  } catch let error {
    print("Could not save updated model to the file system: \(error)")
    return
  }
}

This helper method does the work of saving the updated model. It takes in an MLUpdateContext, which has useful info about the training. The method does the following:

  1. First it gets the updated model from memory. This is not the same as the original model.
  2. Then it creates an intermediary folder to save the updated model.
  3. It writes the updated model to a temporary folder.
  4. Finally, it replaces the model folder’s content. Overwriting the existing mlmodelc folder gives errors. The solution is to save to an intermediate folder then copy the contents over.
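The write-to-temp-then-swap pattern in steps 2 through 4 isn’t specific to models. Here’s the same idea sketched with plain text files using Foundation’s FileManager (the paths and contents are invented for the demo):

```swift
import Foundation

// The save-then-swap pattern: write the new content to a temporary
// location first, then move it over the old one, so a failed write
// never corrupts the original.
let fileManager = FileManager.default
let directory = fileManager.temporaryDirectory
    .appendingPathComponent("model-demo", isDirectory: true)
try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)

let finalURL = directory.appendingPathComponent("model.txt")
let tempURL = directory.appendingPathComponent("model.tmp")

try "original".write(to: finalURL, atomically: true, encoding: .utf8)
// 1. Write the update somewhere safe.
try "updated".write(to: tempURL, atomically: true, encoding: .utf8)
// 2. Swap it into place, removing the stale copy first.
try fileManager.removeItem(at: finalURL)
try fileManager.moveItem(at: tempURL, to: finalURL)

print(try String(contentsOf: finalURL, encoding: .utf8))  // updated
```

The tutorial’s code uses replaceItemAt(_:withItemAt:) for the swap, which performs the replacement safely in a single call.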

Performing the Update

Add the following method to the public extension:

static func updateWith(
  trainingData: MLBatchProvider,
  completionHandler: @escaping () -> Void
) {
  loadModel()
  UpdatableDrawingClassifier.updateModel(
    at: updatedModelURL,
    with: trainingData) { context in
      saveUpdatedModel(context)
      DispatchQueue.main.async {
        completionHandler()
      }
  }
}

The code loads the model into memory then calls the update method you defined in its extension. The completion handler saves the updated model then runs this method’s completion handler.

Now, open AddShortcutViewController.swift and replace the save action’s implementation with the following:

do {
  let trainingData = try drawingDataStore.prepareTrainingData()
  DispatchQueue.global(qos: .userInitiated).async {
    UpdatableModel.updateWith(trainingData: trainingData) {
      DispatchQueue.main.async {
        self.performSegue(
          withIdentifier: "AddShortcutUnwindSegue",
          sender: self)
      }
    }
  }
} catch {
  print("Error updating model", error)
}

Here you’ve put everything together for training. After setting up the training data, you start a background queue to update the model. The update method calls the unwind segue to transition to the main screen.

Build and run the app and go through the steps to create a shortcut.

Add a shortcut screen with heart eyes emoji, there gray rectangles with hearts drawn in them

Verify that when you tap Save the console logs the model update:

Model update logged

Draw the same shortcut on the selected photo and verify that the right emoji shows:

Four screens: flower close up with quote, add a shortcut with hearts, flower close up with heart, flower closeup with heart eye emoji

Congratulations, you machine learning ninja!

Machine Learning ninja warrior

Where to Go From Here?

Download the completed version of the project using the Download Materials button at the top or bottom of this tutorial.

Check out the Machine Learning in iOS video course to learn more about how to train your own models using Create ML and Turi Create. Beginning Machine Learning with Keras & Core ML walks you through how to train a neural network and convert it to Core ML.

The Create ML app lets you build, train and deploy machine learning models with no machine learning expertise required. You can also check out the official WWDC 2019 sessions on What’s New in Machine Learning and Training Object Detection Models in Create ML.

I hope you enjoyed this tutorial! If you have any questions or comments, please join the discussion below.


Apple debuts Core ML 3 with on-device machine learning


Apple today introduced Core ML 3, the latest iteration of its machine learning model framework for iOS developers bringing machine intelligence to smartphone apps. Core ML 3 is the first version to support on-device training, enabling iOS apps to deliver personalized experiences. The ability to train multiple models with different data sets will also be part of a new Create ML app on macOS, for applications like object detection and sound identification.

Apple’s machine learning framework will be able to support more than 100 model layer types.

On-device machine learning is growing in popularity as a way to deploy quickly on the edge and respect privacy. Solutions for popular frameworks like Google’s TensorFlow and Facebook’s PyTorch to supply on-device machine learning through approaches like federated learning arrived in recent months.

Today’s news was announced at Apple’s Worldwide Developers Conference (WWDC) being held this week in San Jose, California.

VentureBeat has reached out to an Apple spokesperson for more details about Core ML 3, and we will update this story as they become available.

Also announced today: watchOS 6 with Voice Memos and menstrual cycle tracking, iOS 13 with a more expressive Siri and personalized results with Apple’s HomePod, a modular Mac Pro on wheels, and the ability to control Apple’s tvOS with PlayStation and Xbox controllers.

The Core ML framework is used internally at Apple for things like training Siri, the QuickType keyboard, language learning for the app Memrise, and predictions for the Polarr photo editing app.

At last year’s WWDC, Apple introduced Core ML 2, a framework the company called 30% faster than its predecessor, and Create ML, a GPU-accelerated framework for training custom AI models with Xcode and the Swift programming language. The initial Core ML framework for iOS was introduced at WWDC in 2017 and incorporated into iOS 11. Premade models that work right out of the box include Apple’s Vision API and Natural Language Framework.

Unlike Google’s ML Kit that works for both Android and iOS developers, Core ML is made exclusively for developers creating apps for Apple’s iOS operating system. Google integrated Core ML with its TensorFlow Lite back in late 2017.


Core ML 3 Framework

WWDC19 · Frameworks · Machine Learning and Vision · Xcode · iOS · macOS · tvOS · watchOS

Description: Core ML 3 now enables support for advanced model types that were never before available in on-device machine learning. Learn how model personalization brings amazing personalization opportunities to your app. Gain a deeper understanding of strategies for linking models and improvements to Core ML tools used for conversion of existing models.

On-device model personalization

  • Previously, models could only be bundled in the app
  • Apple has intentionally avoided trying to load models from the cloud as this compromises privacy and scalability
  • Provide some new training examples and receive a new model
  • The demo uses PencilKit to capture an input drawing and find a matching emoji. With the new API the user can train the model to understand their own way of drawing.
    • Import an updatable Core ML model; this model has some parameters declared as updatable
    • Core ML generates some classes to work with these inputs
    • To update:
      • Load the model from the bundle
      • Prepare the training data
      • Create an MLUpdateTask which async produces a CoreML model
  • We can update the model in the background. iOS will give you up to several minutes (see notes on the background tasks framework)

In short, Core ML 3 lets us update the model entirely locally (ad-hoc for the given user).

What’s inside an mlmodel


In Core ML 2, models consist mostly of:

  • Parameters: things like the weights of the layers, if it's a neural network for example
  • Metadata: licensing and authors
  • Interface: describes how our app can interact with this model.

What’s new in Core ML 3:

  • Update parameters: describe which parts of the model are updatable, and how.
  • Update interface: the model interface our app can use to make these updates happen.

Neural Networks

Neural networks are great at solving challenging tasks, such as understanding the content of an image, a document, or an audio clip.

  • Core ML 3 has more advanced features available for Control flow in Neural Networks (e.g. Branching, Loops)
  • Internal improvements: existing layers made more generic, control flow added, and new mathematical operations introduced

BERT in depth

BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art machine learning model.

The BERT model is actually a neural network that can perform multiple tasks for natural language understanding.

Inside the BERT model there are a bunch of modules.

And inside these modules there are a bunch of layers.

(Modules on the left, layers on the right)

In short this is what’s happening behind the scenes of the demo:

Other improvements

  • Linked Models: Shared models between pipelines.
  • Image features: images (from a URL, for example) are now supported as model inputs rather than just pixel buffers.
  • You can set a preferred Metal device (useful in multi-GPU environments) and an option to speed up GPU calculations (at some accuracy cost).


An in-depth look at Core ML 3

As you may have seen in the WWDC 2019 videos, Core ML 3 adds a lot of new stuff to machine learning on iOS. The new killer feature is on-device training of models, but it can now also run many advanced model architectures — and thanks to the addition of many new layer types, it should even be able to run new architectures that haven’t been invented yet!

Core ML logo

This is by far the biggest update to Core ML since its original release in 2017.

Core ML is not perfect, but with these new additions — as well as the A12 chip’s Neural Engine — Apple is definitely way ahead of everyone else when it comes to machine learning on mobile.

In this blog post I’ll describe in excruciating detail what Core ML 3’s new features are, all the way down to the individual layer types. We’re mostly going to look at the mlmodel format here, not the API from CoreML.framework (which, except for adding training functionality, didn’t really change much).

Do you need to know this stuff if you’re just looking to convert an existing model and use it in your app? Probably not. But it’s definitely good to have as a reference for when you’re designing your own machine learning models that you intend to use with Core ML 3, or when you’re not sure an existing model can be converted.

It’s all in the proto files

So, what is new? If you look at the API documentation for CoreML.framework, you won’t get much wiser. The nitty-gritty isn’t in the API docs but in the Core ML model format specification.

This specification consists of a number of .proto files containing protobuf message definitions. Protobuf, or “protocol buffers”, is the serialization format used by Core ML’s mlmodel files. It’s a common serialization technology, also used by TensorFlow and ONNX. The proto files describe the different objects that can be found in an mlmodel file.

You can find the Core ML model specification here but this website isn’t always up-to-date. It’s better to look directly at the proto files. You can find those inside the coremltools repo. They’re regular text files so you can open them with any editor.

Note: If you’re serious about using Core ML, I suggest getting familiar with the proto files and the protobuf format in general. This is the only place where the capabilities and limitations of Core ML are documented, and you’ll learn a ton by reading through them. You can read more about the mlmodel internals in my book, which I’m updating to include Core ML 3 as we speak.

The main file in the format specification is Model.proto. This defines what a model is, what kind of inputs and outputs a model can have, and what different types of models exist.

An important property of the definition is the specification version. This version number determines which functionality is available in the mlmodel file, and which operating system can run the model.

The new specification version is 4, not 3 as you might expect. There have been three major releases of Core ML, but there was also a small update for iOS 11.2 that bumped the version number.

Core ML models with specification version 4 can only run on iOS 13 and macOS 10.15 (Catalina) or later. If you’re targeting iOS 12 or even 11, forget about using any of the new features shown in this blog post.

Note: When you convert a model, coremltools will choose the lowest possible specification version that your model is compatible with. v3 models can run on iOS 12, v2 models on iOS 11.2, and v1 models on iOS 11.0. Of course, if your model uses any of the newly introduced features, it’s iOS 13 or later only.
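
That version-picking behavior amounts to a small lookup table. A quick sketch (the helper name is mine; the version/OS pairs come from the text above):

```python
# Minimum iOS release that can load an mlmodel, keyed by spec version.
# (Versions and OS releases as described above; the helper name is illustrative.)
MIN_IOS_FOR_SPEC = {
    1: "iOS 11.0",
    2: "iOS 11.2",
    3: "iOS 12",
    4: "iOS 13",
}

def minimum_ios(spec_version: int) -> str:
    """Return the lowest iOS release that can run a model of this spec version."""
    return MIN_IOS_FOR_SPEC[spec_version]
```

A converter that picks the lowest compatible spec version is effectively minimizing this requirement for you.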

New model types

Core ML has always supported the following model types (spec v1):

  • Identity — this does nothing, it simply passes the input data through to the output, useful only for testing
  • GLM regressor and classifier — for linear and logistic regression
  • Support vector machines for regression and classification
  • Tree ensemble regressor and classifier — for XGBoost models
  • Neural networks — for regression, classification, or general-purpose neural networks
  • Pipeline models — these let you combine multiple models into one big model
  • Model types for feature engineering — one-hot encoding, imputation of missing values, input vectorization, and so on. These are mostly useful for converting scikit-learn models to Core ML. The model is turned into a Pipeline that has several of these feature engineering models in a row.

Specification version 2 was only a small update that added support for 16-bit floating point weights. Enabling this makes your mlmodel files about 2× smaller but, contrary to what some people expect, it does not make the model run any faster.
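
The 2× figure follows directly from the storage format: a float32 takes 4 bytes, a float16 takes 2. A toy illustration with Python’s struct module (not an actual model conversion):

```python
import struct

# Pack the same "weights" as float32 ("f") and float16 ("e") and compare sizes.
weights = [0.5, -1.25, 3.0, 0.125]  # toy values, all exactly representable in fp16
size_f32 = len(struct.pack(f"{len(weights)}f", *weights))
size_f16 = len(struct.pack(f"{len(weights)}e", *weights))
ratio = size_f32 / size_f16  # 2.0: the weights take half the bytes
```

At inference time the weights are expanded back to full precision, which is why the smaller file does not translate into a faster model.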

In Core ML 2 (spec v3), the following model types were added:

  • Bayesian probit regressor — a fancier version of logistic regression
  • Non-maximum suppression — useful for post-processing object detection results, typically used as the last model in a Pipeline. See my blog post on SSDLite in Core ML for more info.
  • VisionFeaturePrint — this is a convolutional neural network for extracting features from images. The output is a 2048-element feature vector. Create ML uses this for transfer learning when training image classifiers, but you can also use it in your own models (such as for image similarity).
  • Other models from Create ML: text classifier, word tagger.
  • Custom models: sometimes you may have a model type that Core ML doesn’t understand, but you’d still like to put it in a pipeline alongside other models. A custom model lets you put the learned parameters (and any other data) inside the mlmodel file, while you put the actual implementation of the custom logic inside your app.

Other features added by v3 were weight quantization for even smaller mlmodel files (but still no change in inference speed) and flexible input sizes. The API added batch predictions and better support for dealing with sequential data.

OK, that was the story until now. As of Core ML 3 (spec v4), the following model types can also be described by your mlmodel files:

  • k-Nearest Neighbors classifier (or k-NN)
  • ItemSimilarityRecommender — you can use this to build recommender models, like the one that now comes with Create ML
  • SoundAnalysisPreprocessing — this is for Create ML’s new sound classification model. It takes audio samples and converts them to mel spectrograms. This can be used in a Pipeline as the input to an audio feature extraction model (typically a neural network).
  • Gazetteer – this is Create ML’s new model, used with the Natural Language framework. A gazetteer is a fancy look-up table for words and phrases.
  • WordEmbedding – for Create ML’s new model that is a dictionary of words and their embedding vectors; also used with the Natural Language framework.
  • Linked models — a linked model is simply a reference to another mlmodel file (actually, the compiled version, mlmodelc) in your app bundle. This lets you reuse expensive feature extractors across multiple classifiers — if two different Pipelines use the same linked model, it only gets loaded the once.

The Model object now has an isUpdatable property. If this is true, the model can be trained on-device with new data. This currently only works for neural networks and k-Nearest Neighbors (either as a standalone model or inside a Pipeline).

k-Nearest Neighbors is a simple algorithm, but that simplicity makes it quite suitable for on-device training. A common method is to have a fixed neural network, such as VisionFeaturePrint, extract the features from the input data, and then use k-NN to classify those feature vectors. Such a model is really fast to “train” because k-NN simply memorizes any examples you give it — it doesn’t do any actual learning.

One downside of k-NN is that making predictions becomes slow when you have a lot of examples memorized, but Core ML supports a K-D tree variant that should be quite efficient.
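
The memorize-and-vote scheme described above is easy to sketch in plain Python. Everything here (the class name, the toy 2-D vectors) is illustrative; the vectors stand in for the output of a fixed feature extractor such as VisionFeaturePrint:

```python
import math
from collections import Counter

class KNNClassifier:
    """Minimal k-NN: "training" memorizes examples; prediction is a majority vote."""
    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (feature_vector, label) pairs

    def update(self, vector, label):
        # No gradient descent, no epochs: just remember the example.
        self.examples.append((vector, label))

    def predict(self, vector):
        # Find the k memorized vectors closest (Euclidean) to the query...
        nearest = sorted(self.examples, key=lambda e: math.dist(e[0], vector))[: self.k]
        # ...and return the most common label among them.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

knn = KNNClassifier(k=3)
for v, y in [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
             ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]:
    knn.update(v, y)
```

Note how predict has to scan every memorized example, which is the slowdown mentioned above; the K-D tree variant avoids the linear scan.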

What’s new in neural networks

This is where it gets interesting… the vast majority of changes in Core ML 3 are related to neural networks.

Where Core ML 2 supported “only” about 40 different layer types, Core ML 3 adds over 100 new ones. But let’s not get carried away: some of these new layers are merely refinements of older layer types to make them suitable for handling flexible tensor shapes.

For Core ML 2 and earlier, the data that flowed through the neural network was always a tensor of rank 5. That means each tensor was made up of the following five dimensions, in this order: sequence length, batch size, channels, height, width.

This choice makes a lot of sense when the input to your neural network is mostly images, but it’s not very accommodating to other types of data.

For example, in a neural network that processes 1-dimensional vectors, you were supposed to use the “channels” dimension to describe the size of the vector and set the other dimensions to size 1. In that case, the shape of the input tensor would be (1, 1, C, 1, 1), where C is the length of the vector. That’s just awkward.
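
In NumPy terms, the old convention forced something like this (a sketch, with a 10-element vector):

```python
import numpy as np

# Under the old rank-5 convention (sequence, batch, channels, height, width),
# a plain 10-element vector had to masquerade as a 5-D tensor, with the vector
# length in the "channels" slot and every other dimension set to 1.
vector = np.arange(10.0)
rank5 = vector.reshape(1, 1, 10, 1, 1)  # the awkward Core ML 2 layout

# With Core ML 3's arbitrary-rank tensors, the vector can simply stay rank 1.
```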

Many of the new layers that were added in Core ML 3 support tensors of arbitrary rank and shape, making Core ML much more suitable to data other than images.

Tip: This is why it’s important to know how to read the proto files, as they explain for each layer how tensors of different ranks are handled. This kind of thing isn’t documented anywhere else!

All the neural network stuff is described in NeuralNetwork.proto. It’s a big file at almost 5000 lines…

The main object is NeuralNetwork, although there are two other variations: NeuralNetworkClassifier and NeuralNetworkRegressor. The difference is that a plain neural network outputs MultiArray objects or images, while the classifier variant outputs a dictionary with the classes and their predicted probabilities, and the regressor simply outputs a numeric value. Other than that small difference in how the output is interpreted, these three model types all work the same.

The NeuralNetwork object has a list of layers, as well as a list of preprocessing options for any image inputs. Core ML 3 adds a few new properties that describe:

  • how inputs of type MultiArray are converted into tensors. You can choose between the old way, which creates that rank-5 tensor shown previously, or the new way, which simply passes the input tensor through unchanged. For most types of data that is not images, it makes sense to use this new method.
  • how image inputs are converted into tensors. Instead of the old rank-5 tensor, you can choose to use a rank-4 tensor, which is just (batch, channels, height, width). This drops the “sequence length” dimension, which you usually don’t need for images (unless, of course, you have a sequence of them).
  • the hyperparameters for training this model, if you chose to enable that. This is a NetworkUpdateParameters object (described below).

I plan to write a detailed blog post about on-device training soon, but just to give you an idea of what is involved:

  • The isUpdatable property of the model must be set to true.
  • The isUpdatable property of any layers that you wish to train must be set to true. This allows you to limit training to specific layers only. Currently, training is only supported for convolution and fully-connected layers.
  • The WeightParams objects that hold the learnable parameters for the layers you wish to train must also have their isUpdatable property set.
  • You need to define additional “training inputs” on the model that are used to provide the ground-truth labels to the loss function.

In addition, the NetworkUpdateParameters object describes:

  • which loss function(s) to use — supported loss functions are categorical cross entropy and MSE. Inside the mlmodel file, a loss function is just another layer. It only has two properties: the name of one of the model’s output layers, and the name of the model’s training input that provides the target labels. For cross entropy loss, the input must be connected to the output of a softmax layer.
  • what optimizer to use — currently only SGD and Adam are supported
  • the number of epochs to train for

Update: New in beta 3 are shuffle and seed parameters that tell Core ML to randomly shuffle the training data before each epoch.

Note that hyperparameters such as the number of epochs, the learning rate, and so on, can be overridden inside the app. The values inside the mlmodel should be set to reasonable defaults but you’re not stuck with them if you don’t like them.

Neural network layers in Core ML 2

Now let’s get to the good stuff: the neural network layers. The first versions of Core ML supported only the following layer types:

  • Convolution: 2D only, although you can fake 1D convolution by setting the kernel width or height to 1. Also supports dilated or atrous convolutions, grouped (depthwise) convolution, and deconvolution.
  • Pooling: max, average, L2, and global pooling.
  • Fully-connected, also known as “inner product” or “dense” layer.
  • Activation functions: linear, ReLU, leaky ReLU, thresholded ReLU, PReLU, tanh, scaled tanh, sigmoid, hard sigmoid, ELU, softsign, softplus, parametric softplus. All the different activation functions are handled by a single layer type, ActivationParams. Note that, unlike in Keras, where the activation can be a property of the convolution layer, in Core ML they are always layers of their own. For extra speed, the Core ML runtime will “fuse” the activation function with the preceding layer, if possible.
  • Batch normalization.
  • Other types of normalization, such as using mean & variance, L2 norm, and local response normalization (LRN).
  • Softmax: usually the last layer of a NeuralNetworkClassifier model.
  • Padding: for adding extra zero-padding around the edges of the image tensor. Convolution and pooling layers can already take care of padding themselves, but with this layer you can do things such as reflection or replication padding.
  • Cropping: for removing pixels around the edges of the tensor.
  • Upsampling: nearest neighbor or bilinear upsampling by an integer scaling factor.
  • Unary operations: sqrt, 1/sqrt, 1/x, x^power, exp, log, abs, thresholding.
  • Element-wise operations between two or more tensors: add, multiply, average, maximum, minimum. These support broadcasting to some extent.
  • Element-wise operations on a single tensor: multiply by a scale factor, add bias. These support broadcasting.
  • Reduction operations on a single tensor: sum, sum of natural logarithm, sum of squares, average, product, L1 norm, L2 norm, maximum, minimum, argmax.
  • Dot product between two vectors, can also compute cosine similarity.
  • Layers that reorganize the contents of a tensor: reshape, flatten, permute, space-to-depth, depth-to-space.
  • Concat, split, and slice: these combine or pull apart tensors.
  • Recurrent neural network layers: basic RNN, uni- and bi-directional LSTM, GRU (unidirectional only).
  • Sequence repeat: duplicates the given input sequence a number of times.
  • Embeddings.
  • Load constant: can be used to provide data to some of the other layers, for example anchor boxes in an object detection model.

Specification version 2 added support for custom layers in neural networks. This was a very welcome addition, as now it became possible to convert many more models.

Inside the mlmodel file, a custom layer is simply a placeholder, possibly with trained weights and configuration parameters. In your app, you’re supposed to provide a Swift or Objective-C implementation of the layer’s functionality, and possibly a Metal version as well for running it on the GPU. (Unfortunately, the Neural Engine isn’t currently an option for custom layers.)

For example, if a model requires an activation function that is not in the list above, it can be implemented as a custom layer. However, you can also do this by cleverly combining some of the other layer types. For example, a ReLU6 can be made by first doing a regular ReLU, then multiplying the data by -1, thresholding to -6, and finally multiplying by -1 again. That requires 4 different layers but — in theory — the Core ML framework could optimize this away at runtime.
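
The ReLU6 construction described above can be checked numerically. Each step below corresponds to one of the four layers (regular ReLU, multiply by -1, threshold at -6, multiply by -1 again); a NumPy sketch:

```python
import numpy as np

def relu6_via_four_layers(x):
    # Layer 1: regular ReLU
    y = np.maximum(x, 0.0)
    # Layer 2: multiply the data by -1
    y = -y
    # Layer 3: threshold at -6 (clamp from below, like a threshold layer)
    y = np.maximum(y, -6.0)
    # Layer 4: multiply by -1 again
    return -y

x = np.array([-2.0, 0.0, 3.0, 6.0, 10.0])
out = relu6_via_four_layers(x)  # identical to np.clip(x, 0.0, 6.0)
```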

In Core ML 2 (spec v3), the following layer types were added:

  • Resize bilinear: unlike the upsampling layer, which only accepts integer scaling factors, this lets you perform a bilinear resize to an arbitrary image size.
  • Crop-resize: for extracting regions of interest from a tensor. This can be used to implement an RoI Align layer as used in Mask R-CNN.

As you can tell, even though there are some layer types for dealing with vectors or sequence data, most of the layers are very focused on convolutional neural networks for dealing with images. In Core ML 2, all the layers expect tensors of shape (sequence, batch, channels, height, width), even if your data is only one-dimensional.

Core ML 3 (spec v4) relaxes that requirement a little bit for these existing layer types. For example, an inner product layer can now work on input tensors from rank 1 to rank 5. So in addition to adding a whole bunch of new layers, Core ML 3 also made the existing layer types more flexible.

Note: The doc comments inside NeuralNetwork.proto explain for each of these layers how tensors of different ranks are handled. If you’re ever wondering what the right tensor shape is for a layer, that’s the place to look.

Now let’s finally look at the new stuff!

The new neural network layers

Over 100 new layers… phew! It’s a lot but I’m going to list them all because it is useful to have this blog post as a reference. (Of course, the proto files are the authoritative source.)

Keep in mind that in the previous version of the spec, certain operations were combined into a single layer. For example, all the unary tensor operations were part of the layer type UnaryFunction. But with Core ML 3, a whole bunch of new unary operations were added and they all have their own layer type, which obviously inflates the total count.

Note: In the proto files, the name of every layer type ends with LayerParams, so the unary function layer type is really named UnaryFunctionLayerParams. For the sake of readability, I’m dropping the “LayerParams” from the layer names here.

Core ML 3 adds the following layers for element-wise unary operations:

  • clip: clamps the input between a minimum and maximum value
  • ceil and floor: the usual ceil and floor functions applied to an entire tensor at once
  • sign: tells you whether a number is positive, zero, or negative
  • round: rounds off the values of a tensor to whole numbers
  • exp2: calculates 2^x for every element in the tensor
  • sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, asinh, acosh, atanh: the well-known (hyperbolic) trig functions
  • erf: computes the Gauss error function

This seriously expands the number of math primitives supported by Core ML. Unlike the math functions it already had, these can deal with tensors of any rank.

There is only one new activation function:

  • gelu: the Gaussian error linear unit activation, either exact or using a tanh or sigmoid approximation

Of course, you can use any unary function as an activation function, or create one by combining different math layers.

There are also new layer types for comparing tensors:

  • equal, notEqual, lessThan, lessEqual, greaterThan, greaterEqual
  • logicalAnd, logicalOr, logicalXor, logicalNot

These output a new tensor that is 1 where the condition is true, and 0 where the condition is false (also known as a tensor “mask”). These layer types support broadcasting, so you can compare tensors of different ranks. You can also compare a tensor with a (hardcoded) scalar value.
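
A NumPy sketch of the mask semantics (the comparison layers behave analogously on Core ML tensors):

```python
import numpy as np

# A comparison produces a mask: 1.0 where the condition holds, 0.0 elsewhere.
# Here the tensor is compared against a scalar, which broadcasts over it.
scores = np.array([[0.1, 0.9],
                   [0.7, 0.3]])
mask = (scores > 0.5).astype(np.float32)  # greater-than against a scalar

# A mask like this can drive a branch or select elements in a later layer.
```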

One place where these layer types come in useful is with the new control flow operations (see below), so that you can branch based on the outcome of a comparison, or create a loop that keeps repeating until a certain condition becomes false.

Previously, there were a handful of layers for element-wise operations between two or more tensors. Core ML 3 adds a few new types, and — as you can tell from the name — these are now much more flexible because they fully support NumPy-style broadcasting:

  • addBroadcastable: addition
  • subtractBroadcastable: subtraction
  • multiplyBroadcastable: multiplication
  • divideBroadcastable: division
  • floorDivBroadcastable: division followed by rounding down, to get a whole number result
  • modBroadcastable: remainder of division
  • powBroadcastable: raise the first tensor to the power of the second
  • minBroadcastable, maxBroadcastable: minimum and maximum
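
These follow the same broadcasting rules as NumPy, so the semantics can be illustrated directly (a NumPy sketch, not the Core ML API):

```python
import numpy as np

# A (2, 3) tensor combined element-wise with a (3,) tensor: the smaller
# operand is stretched across the missing dimension.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])  # shape (3,)

added = a + b                     # element-wise addition with broadcasting
floordiv = np.floor_divide(b, a)  # division followed by rounding down
```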

Reduction operations have now moved into their own layers. Core ML already supported most or all of these, but these new versions can work with arbitrary tensors, not just images. You can now do the reduction over one or more axes.

  • reduceSum: compute the sum over the specified axes
  • reduceSumSquare: compute the sum of the squares of the tensor’s elements
  • reduceLogSum: compute the sum of the natural logarithm of the elements
  • reduceLogSumExp: the log-sum-exp trick! Exponentiate the elements, sum them up, and take the natural logarithm.
  • reduceMean: compute the average of the elements
  • reduceProd: multiply all the elements together
  • reduceL1, reduceL2: compute the L1 or L2 norm
  • reduceMax, reduceMin: find the maximum or minimum value
  • argMax, argMin: find the index of the maximum or minimum value
  • topK: find the k top (or bottom) values and their indices; this is a more general version of argmax and argmin. The value of k can be provided by an input, so it does not have to be hardcoded into the model.
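
The log-sum-exp reduction deserves a quick illustration of why it’s called a “trick”: subtracting the maximum before exponentiating avoids overflow, yet gives the same answer as the naive formula. A plain-Python sketch:

```python
import math

def logsumexp(values):
    # Stable log(sum(exp(v))): shift by the max so no exp() can overflow.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

naive = math.log(sum(math.exp(v) for v in [1.0, 2.0, 3.0]))
stable = logsumexp([1.0, 2.0, 3.0])  # same value as naive

# The naive formula would overflow here (exp(1000) is infinite in float64),
# but the shifted version yields 1000 + log(2) without trouble.
huge = logsumexp([1000.0, 1000.0])
```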

Speaking of math stuff, Core ML 3 also adds the following:

  • batchedMatMul: a general-purpose matrix multiplication on two input tensors, or between a single input tensor and a fixed set of weights (plus an optional bias). Supports broadcasting and can transpose the inputs before doing the multiplication. In other words, gemm.
  • A simple normalization layer that subtracts beta (such as the mean) and divides by gamma (e.g. the standard deviation), both of which are provided as fixed weights. This is different from the existing meanVarianceNormalize layer, which performs the same formula but actually computes the mean and variance from the tensor at inference time.

A number of other existing operations have been extended to use arbitrary size tensors, also known as rank-N tensors or N-dimensional tensors. You can recognize such layer types by the “ND” in their name:

  • softmaxND: the old softmax could only be applied to the channel axis, this one can use any axis
  • concatND: concatenate two or more outputs across any axis
  • splitND: the opposite of concat. Previously you could only split on the channel axis, and only into parts with the same size. Now it lets you split on any axis and the sizes of the splits can be different.
  • transpose: OK, this doesn’t have ND in its name but it’s the same as permute except it supports N-dimensional tensors
  • loadConstantND: like the existing loadConstant layer but with more flexible tensor shapes
  • : like the existing layer but with more flexible tensor shapes

Slicing lets you keep only a part of the original tensor and throw away the rest. The old slicing layer could slice the input tensor across the width, height, or channel axis. Core ML 3 gives us two new slicing layers that support slicing across any axis: sliceStatic and sliceDynamic.

I’m not 100% sure yet how these layers work, as the documentation isn’t very helpful, but it looks like they can slice by indices or by a mask. In any case, these layers slice and dice!

Why two different versions? You’ll actually see this distinction between static and dynamic in some of the upcoming layer types too.

Static basically means “everything is known about this operation beforehand”, while dynamic means “the arguments of this operation can change between runs”. For example, the static version of a layer may have a hardcoded output shape, while the dynamic version can use a different output shape every time.

Note: In the first version of Core ML, the size of the input was hardcoded — for example, only 224×224 images. Since version 2, Core ML has supported flexible input shapes, where you can tell the mlmodel that a given input can accept tensors between a minimum and maximum size, or from a list of predefined sizes. That sounds pretty dynamic! However, by dynamic operations in Core ML 3 we mean something slightly different…

Here “dynamic” means that inside the graph itself, from one run to the next, the shapes of the intermediate tensors may be different, even if the input tensor is always of the same size.

For example, if the first part of your model predicts a bounding box that then gets cropped and fed to the second part of your model, it’s likely that this bounding box and the resulting cropped tensor will have a different size every time. Therefore, the layers in the second part of the model cannot make any assumptions about the shape of that cropped tensor.

Because Core ML is no longer limited to static image-based models but now also contains methods for control flow and other dynamic operations, it has to be able to manipulate tensors in all kinds of fancy ways. Let’s look at those functions.

  • getShape: this returns a vector containing the shape of the input tensor, which lets you inspect at runtime how big a given tensor is
  • broadcastToStatic, broadcastToLike, broadcastToDynamic: these change the shape of the tensor according to the common NumPy broadcasting rules
  • rangeStatic, rangeDynamic: fills the tensor with evenly spaced values in a given interval, much like NumPy’s arange function.
  • fillStatic, fillLike, fillDynamic: these functions fill the tensor with a constant scalar value — usually all zeros or ones, but any floating point value will do.

Notice how some of these layer types come in three different variants: static, like, and dynamic. What do these mean?

  • Static is the simplest one: all the properties for this layer are hardcoded in the mlmodel file. If you know that, regardless of what happens, you’re always going to need a tensor of the same fixed shape, you would use fillStatic. Note that fillStatic and rangeStatic do not take an input tensor, but broadcastToStatic does.

  • Like: takes an additional input tensor and outputs a new tensor that has the same shape as that input. The layer ignores the actual values from that extra input tensor — it only looks at its shape. fillLike takes only one input and uses this to determine the shape of the output tensor, while broadcastToLike takes two input tensors: the one to broadcast, and the second one whose shape it will broadcast to.

  • Dynamic is similar to like: it also takes an additional input tensor, but this time it’s not the shape of that tensor that’s important but its contents. For example, to fill a tensor of shape (W, H, C) you would pass in a tensor of shape (3,) that holds the three values W, H, and C. Interestingly, fillDynamic doesn’t let you pass in the scalar value dynamically.
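
The three variants can be mimicked in NumPy to make the distinction concrete (illustrative only; Core ML expresses these as layers, not function calls):

```python
import numpy as np

# "Static": the output shape is hardcoded ahead of time.
static = np.full((2, 3), 7.0)

# "Like": an extra input tensor is consulted only for its *shape*;
# its values are ignored.
template = np.zeros((2, 3))
like = np.full_like(template, 7.0)

# "Dynamic": an extra input tensor is consulted for its *contents*,
# which spell out the desired shape, here the values (2, 3).
requested = np.array([2, 3])
dynamic = np.full(tuple(requested), 7.0)
```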

Note: Static / dynamic isn’t always about the output shape, it depends on the layer. For example, in the random distribution layers (see next), you can set the random seed dynamically too. Some of the dynamic layers have several different inputs that let you override their default properties.

    Core ML 3 also lets you create new tensors by sampling from random distributions:

    • RandomNormalStatic, RandomNormalLike, RandomNormalDynamic
    • RandomUniformStatic, RandomUniformLike, RandomUniformDynamic
    • RandomBernoulliStatic, RandomBernoulliLike, RandomBernoulliDynamic

    There were already layers for reshaping and flattening, but more variants have been added:

    • Squeeze: remove any dimensions that have size 1
    • ExpandDims: the opposite of Squeeze, adds new dimensions with size 1
    • FlattenTo2D: flatten the input tensor into a two-dimensional matrix
    • ReshapeStatic, ReshapeLike, ReshapeDynamic
    • RankPreservingReshape: this is like using reshape with a -1 in NumPy. The layer automatically infers the rest of the new shape. Handy!
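These reshape layers map almost one-to-one onto NumPy operations, so a short NumPy sketch shows exactly what each one does (illustrative only, not Core ML API):

```python
import numpy as np

x = np.zeros((1, 3, 1, 4))

s = np.squeeze(x)               # Squeeze: drop all size-1 dimensions
e = np.expand_dims(s, axis=0)   # ExpandDims: insert a new size-1 dimension
m = x.reshape(x.shape[0], -1)   # FlattenTo2D: collapse into a 2-D matrix
i = x.reshape(2, -1)            # rank-preserving style: -1 infers the rest
```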

    Besides concat and split operations for arbitrary tensors, Core ML 3 also adds the following tensor manipulation operations:

    • Tile: repeat the tensor a given number of times
    • Stack: join tensors along a new axis (as opposed to concat, which joins the tensors along an existing axis)
    • Reverse: reverses one or more dimensions of the input tensor
    • ReverseSeq: reverses the sequence, for tensors that store a sequence of data
    • SlidingWindows: slides a window over the input data and returns a new tensor with the contents of the window at every step
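Again, NumPy makes the semantics easy to see. A sketch of what these manipulation layers compute (my own code, not any Core ML API; the sliding window here uses size 2 and step 1):

```python
import numpy as np

x = np.array([1, 2, 3])

t = np.tile(x, 2)              # Tile: repeat the tensor
st = np.stack([x, x], axis=0)  # Stack: join along a NEW axis
rv = np.flip(x, axis=0)        # Reverse: flip a dimension

# SlidingWindows: slide a window of size 2, step 1, over the data
w = np.array([x[i:i + 2] for i in range(len(x) - 1)])
```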

    Also new is support for gather and scatter:

    • Gather, GatherAlongAxis, GatherND: given a set of indices, keeps only the parts of the input tensor at those indices
    • Scatter, ScatterAlongAxis, ScatterND: copies the values of one tensor into another tensor, but only at the given indices. Besides copying there are also other accumulation modes: add, subtract, multiply, divide, maximum, and minimum.
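A minimal NumPy sketch of the two operations, including scatter's "add" accumulation mode (illustrative only; np.add.at stands in for the scatter-with-add behavior):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])
idx = np.array([3, 0])

# Gather: keep only the entries at the given indices
g = np.take(x, idx)

# Scatter with "add" accumulation: write values into a target tensor
# at the given indices, adding to whatever is already there
out = np.zeros(4)
np.add.at(out, idx, [1.0, 2.0])
```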

    Speaking of selecting elements based on some condition, here are a few more layer types for dealing with masks:

    • WhereNonZero: creates a new tensor with only the elements that were not zero. You could use this with a mask tensor produced by one of the comparison layers, such as GreaterThan, for example.
    • WhereBroadcastable: takes three input tensors, two data tensors and a mask that contains ones (true) or zeros (false). Returns a new tensor containing the elements of the first data tensor or the second data tensor, depending on whether the value from the mask is true or false.
    • LowerTriangular, UpperTriangular: zeroes out the elements above or below the diagonal, respectively
    • MatrixBandPart: zeroes out the elements outside a central band
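NumPy has direct counterparts for most of these mask operations, which makes a quick sketch possible (my own illustration, not Core ML code):

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[1, 0], [0, 1]])

nz = x[np.nonzero(mask)]     # WhereNonZero-style: keep masked-in elements
sel = np.where(mask, x, -x)  # WhereBroadcastable: pick from tensor 1 or 2
lower = np.tril(x)           # LowerTriangular: zero out above the diagonal
upper = np.triu(x)           # UpperTriangular: zero out below the diagonal
```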

    Beta 3 of coremltools 3.0 snuck in a few new layer types:

    • ConstantPad: adds a certain amount of padding around a tensor. Unlike the existing padding layer, this one works for any axis, not just the width and height dimensions.
    • NonMaximumSuppression: there already was a separate model type for doing NMS on bounding boxes, which you'd put into a pipeline following an object detection model, but now it's also possible to do NMS directly inside the neural network.
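The padding-on-any-axis idea is the same as NumPy's pad with per-axis widths. A sketch (illustrative values):

```python
import numpy as np

x = np.ones((2, 3))

# Pad one row at the front of axis 0 and two columns at the end of axis 1,
# filling with a constant value of 0.0 -- any axis works, not just W and H.
p = np.pad(x, pad_width=((1, 0), (0, 2)), constant_values=0.0)
```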

    Finally — and perhaps most excitingly — Core ML 3 adds layers for control flow such as decision making and loops.

    Previously, Core ML would run the neural network graph from top to bottom just once for each prediction. But now it can repeat certain parts of the graph and skip others. Exactly which parts of the neural network get executed by Core ML can vary between one run of the model and the next — this depends purely on the contents of your input data.

    The control flow layers are:

    • BranchLayer: this is like an if-else statement. It contains two NeuralNetwork objects, one that runs when the input to this layer is true, the other when the input is false. Yep, you read that correctly: a branch contains a smaller neural network inside the main neural network. Because Core ML doesn't have a boolean tensor type, you'll actually pass in 1 or 0 instead of true or false. (Core ML considers the condition to be true if the value is greater than 1e-6.)
    • LoopLayer: this is like a while loop. If no input is given, the loop repeats for the maximum number of iterations specified in the layer. You can override this by passing in the number of iterations you want to loop for. The LoopLayer contains a "body" NeuralNetwork that represents the inside of the while loop. The layers from this neural net are run on every iteration. It's also possible to include a condition NeuralNetwork that acts as the condition of the while loop. This "condition" neural network is run once before the loop starts and again before every new iteration. As long as it outputs a value greater than 0, the loop keeps repeating.
    • LoopBreakLayer: you can put this into the loop's body to terminate the loop, just like a regular break statement.
    • LoopContinueLayer: you'd put this into the body if you want to stop the current loop iteration and skip ahead to the next one, just like a regular continue statement.
    • CopyLayer: this is used to overwrite a previous tensor, for example to replace an old result with a new one. Without CopyLayer, tensors in the graph could never change once they have been computed.

    Note that the BranchLayer and LoopLayer do not have outputs. They always pass control to one of their child NeuralNetwork objects, which will have an output of some kind. (I haven't tried it, but it seems reasonable to assume you can nest these loops and branches too.)
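To pin down the execution semantics described above, here is a toy interpreter in plain Python. Every name here is illustrative — this is how the branch and loop layers behave, not any actual Core ML API:

```python
def run_branch(condition_value, if_net, else_net, x):
    # Core ML treats the condition as true if it's greater than 1e-6
    return if_net(x) if condition_value > 1e-6 else else_net(x)

def run_loop(body_net, condition_net, x, max_iterations):
    # The condition network runs once before the loop starts and again
    # before every iteration; the body keeps running while it outputs
    # a value greater than 0 (up to the maximum number of iterations).
    for _ in range(max_iterations):
        if condition_net(x) <= 0:
            break
        x = body_net(x)
    return x
```

For example, `run_loop(lambda v: v / 2, lambda v: v - 1, 16.0, 100)` halves the value until the condition network outputs 0, returning 1.0.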

    For an example of how to use these new control flow layers, check out this Jupyter notebook from the coremltools repo. It shows how to implement a simple iterative process inside the Core ML model and uses many of the new layer types.

    The example works like this:

    1. use a LoadConstantND layer to load an initial value of 0 into the iteration-count output
    2. add a LoopLayer that will loop for a certain maximum number of iterations
    3. inside the loop, add a new neural network that performs some kind of computation
    4. at the end of the computation, use an AddBroadcastable arithmetic layer to increment the current iteration count, and then a CopyLayer to overwrite the value inside the iteration-count output
    5. they also use another CopyLayer to copy the result of the computation back into the original tensor, so that the next iteration of the loop can use this new value
    6. add a LessThan layer to compare the output of the computation to some convergence threshold, and feed this yes/no result into a BranchLayer
    7. add a new neural network to the BranchLayer that just has a LoopBreakLayer inside it. In other words, if the branch is taken, because the output of the computation went under the convergence threshold, we break out of the loop.
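In ordinary Python, the graph built by those steps amounts to roughly the following loop. The function and parameter names are stand-ins I chose for illustration, not names from the notebook:

```python
def run_until_converged(x, compute, threshold, max_iterations):
    iteration_count = 0                  # step 1: initialize the counter
    for _ in range(max_iterations):      # step 2: the loop layer
        x = compute(x)                   # steps 3 and 5: the inner network
        iteration_count += 1             # step 4: increment and copy back
        if abs(x) < threshold:           # step 6: compare to the threshold
            break                        # step 7: the branch with the break
    return x, iteration_count
```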

    It’s a little weird perhaps, but very flexible! The key point is to remember to use CopyLayer to overwrite existing tensors with new values, much like an assignment statement in Swift. After you run the model, the iteration-count output tells you how many times the loop was repeated. Of course, this count may be different every time, depending on the values of the inputs to the model, as some will converge quicker than others. Pretty cool!

    Thanks to these control flow layers, Core ML 3 graphs can go way beyond the traditional acyclic graphs. However, you only get branches and loops — there is currently no such thing as a “goto”. Core ML is not Turing-complete quite yet. 😁

    At the very bottom of NeuralNetwork.proto are the layer definitions for on-device training. We already briefly looked at those, but here they are again:

    • CategoricalCrossEntropyLossLayer, MeanSquaredErrorLossLayer: the loss functions
    • SGDOptimizer, AdamOptimizer: the available optimizers

    Note: I find it a little odd that the loss function is defined in the mlmodel file. This makes it impossible to train with other loss functions. Likewise for the optimizers. Perhaps a future version of Core ML will allow us to provide custom implementations of these.

    And that’s it, those are all the new layer types in Core ML 3!

    Most of the new layer types are for creating, shaping, and manipulating tensors. There are also many new mathematics primitives. Not a whole lot of “real” neural network stuff has been added. But having these low-level operations will make it a lot easier to support all kinds of new, still unimagined, layer types.

    Then again, if implementing a new layer type requires adding 20 different math layers to your Core ML mlmodel, it might be faster to write a custom layer… 😉

    If you liked this post, say hi on Twitter @mhollemans or by email [email protected]
    Find the source code on my GitHub.
