
How to apply a filter to a video stream in iOS

Have you ever wanted to apply a blur (or any other visual effect really) to an HTTP Live Stream? If so, you might have discovered that it isn’t as easy as you might think. We faced this problem and, after research and testing, came up with a solution.


One does not simply blur the HTTP Live Stream

Of course, blurring a stream may be as easy as putting a blur view on top of the AVPlayerViewController's view. The drawback to a blur view is that you have no control over how much blur is applied. This was unacceptable for our use case, so we created our own blurred video view so we could get exactly the amount of blur required. Feel free to follow along with our test project.
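For reference, the naive overlay looks something like this (a quick sketch; the helper function is ours and is not part of the sample project):

import UIKit
import AVKit

// Lay a UIVisualEffectView over the player's view. UIBlurEffect only offers fixed styles
// (.light, .dark, .regular, ...), so there is no way to dial in a specific blur radius,
// which is exactly the limitation we hit.
func addBlurOverlay(to playerViewController: AVPlayerViewController) {
    let blurView = UIVisualEffectView(effect: UIBlurEffect(style: .regular))
    blurView.frame = playerViewController.view.bounds
    blurView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
    playerViewController.view.addSubview(blurView)
}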

Core Image

Fortunately, Apple provides an entire library of image manipulation via the Core Image framework. Core Image offers well over 100 different manipulations and effects (known as filters and implemented as CIFilter), along with objects and functions to string those filters into chains and to render the final result as a CGImage. By way of analogy, Core Image resembles cooking: you have ingredients, a recipe, and a kitchen to turn it all into a meal.

Following this analogy, there are three main objects in Core Image, which we'll tie together in a short example after the list:

  1. CIFilter is the ingredient (along with the base image). Use one of the ~160 pre-defined filters spiced with varying properties for creating visual effects or manipulation, or create your own.
  2. CIImage is the recipe. It is a lightweight object that contains a reference to the base image and a chain of filters that is applied to it. It is lightweight in the sense that applying a CIFilter to a CIImage doesn’t do anything more than add the filter to the recipe for the final image - no rendering is done yet. For that we need…
  3. CIContext is the kitchen, where the actual cooking happens. A CIContext takes the CIImage with filters applied and creates the output image. It can do this either on the CPU, creating a CGImage, or use either an OpenGL or Metal context to draw the image directly to the GPU.
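To make the analogy concrete, here is a minimal sketch of the full pipeline on a still image (this helper is ours, for illustration only; it is not part of the sample project):

import UIKit
import CoreImage

func sepiaImage(from inputImage: UIImage) -> UIImage? {
    // The ingredients: the base image wrapped in a CIImage, plus a pre-defined filter
    guard let base = CIImage(image: inputImage),
          let filter = CIFilter(name: "CISepiaTone") else { return nil }
    filter.setValue(base, forKey: kCIInputImageKey)
    filter.setValue(0.8, forKey: kCIInputIntensityKey)

    // The recipe: outputImage is just a description of the work; nothing is rendered yet
    guard let recipe = filter.outputImage else { return nil }

    // The kitchen: a CIContext does the actual rendering into a CGImage
    let context = CIContext()
    guard let cgImage = context.createCGImage(recipe, from: base.extent) else { return nil }
    return UIImage(cgImage: cgImage)
}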

Of course there is a CIFilter for Gaussian blur. There is even a convenience method on CIImage to apply it directly. It takes a single argument, the blur radius (higher values mean more blur). With this in mind, we knew that our manipulation would look like this:

let base = // get the base image from somewhere
let blurred = base.clampedToExtent().applyingGaussianBlur(sigma: 6.0).cropped(to: base.extent)

The clampedToExtent method extends the edges of the image out to infinity. This is helpful because otherwise the blur creates a thick black border at the edges of the image. Since we don't actually want an infinite image, we then crop it back to the size (CIImage calls this the extent) of the source image.
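You can see the extent behavior for yourself with a solid-color CIImage in a playground (this snippet is purely illustrative):

import CoreImage

let solid = CIImage(color: .green).cropped(to: CGRect(x: 0, y: 0, width: 100, height: 100))
print(solid.extent)                                 // (0.0, 0.0, 100.0, 100.0)
print(solid.clampedToExtent().extent.isInfinite)    // true; the edges now extend to infinity
let blurred = solid.clampedToExtent()
    .applyingGaussianBlur(sigma: 6.0)
    .cropped(to: solid.extent)
print(blurred.extent)                               // back to (0.0, 0.0, 100.0, 100.0)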

AVFoundation

The only problem was, how do we apply our filter to each frame of a video? For that, we dove into the AVFoundation framework and came up with what looked like the silver bullet: AVVideoComposition and the AVPlayerItem.videoComposition property. This seemed to do exactly what we wanted - apply a filter to every frame of the video. It seemed easy to use, too - just create an AVVideoComposition object with the init(asset:applyingCIFiltersWithHandler:) initializer. The first argument is the AVAsset backing the AVPlayerItem you will attach the composition to. The second is a closure that takes an AVAsynchronousCIImageFilteringRequest; this is where you define your image manipulation. For our purpose, it would look something like this:

let blurRadius = 6.0
let asset = AVAsset(url: streamURL)
let item = AVPlayerItem(asset: asset)
item.videoComposition = AVVideoComposition(asset: asset) { request in
    let blurred = request.sourceImage.clampedToExtent().applyingGaussianBlur(sigma: blurRadius)
    let output = blurred.cropped(to: request.sourceImage.extent)
    request.finish(with: output, context: nil)
}

Well, that was easy, right? Problem solved, high fives all around! A little glue code, compile, run and…nothing.

Nothing?

Well, not nothing. The original stream, sans blur, was happily playing for us.

So we opened up the documentation for AVPlayerItem.videoComposition, and to our dismay found this line in the discussion:

A video composition can only be used with file-based media and is not supported for use with media served using HTTP Live Streaming.

Well, shucks. Now what?

It seemed clear that we would need to dig deeper into the bowels of AVFoundation to solve this problem. With the clear-cut solution exposed as non-viable, we wouldn't be able to use an AVPlayerViewController or an AVPlayerLayer to output the video. But fear not! AVFoundation is a very powerful framework. Just because we couldn't hook our AVPlayer up to a pre-fab output source didn't mean we couldn't make our own output!

Enter AVPlayerItemVideoOutput, whose descriptive name tells us it is an object for getting the video out of an AVPlayerItem. Specifically, AVPlayerItemVideoOutput has a method, copyPixelBuffer(forItemTime:itemTimeForDisplay:), which extracts a frame from the video so it can be modified and displayed. The last piece was timing - how do we know when a new frame is ready?

CADisplayLink

The answer to this question is: it doesn't matter. What matters more is that we know what our target display frame rate is. The naive approach at this point would be to either use a Timer or set up an observer on the AVPlayer with the addPeriodicTimeObserver(forInterval:queue:using:) method. However, there's a better alternative: CADisplayLink. Per the documentation, a CADisplayLink is “A timer object that allows your application to synchronize its drawing to the refresh rate of the display.”

Sweet, that’s just what we want. Basically, you set a desired frame rate and get callbacks at the closest rate the device supports. It’s more robust than a Timer and has less overhead than a periodic time observer.
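For comparison, the periodic-observer route we passed on would look roughly like this (a sketch; the wrapper function is ours):

import AVFoundation

// Ask the AVPlayer to call us back roughly 20 times per second during playback.
// The returned token must be retained and eventually passed to removeTimeObserver(_:).
func addFrameTimer(to player: AVPlayer, handler: @escaping (CMTime) -> Void) -> Any {
    let interval = CMTime(value: 1, timescale: 20)
    return player.addPeriodicTimeObserver(forInterval: interval, queue: .main, using: handler)
}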

Putting it all together

Our general strategy now became clear:

  1. Create an AVPlayerItem with our live stream, and add an AVPlayerItemVideoOutput to it.
  2. Create a CADisplayLink to allow us to update the video at a regular interval.
  3. Grab the image from the AVPlayerItemVideoOutput, blur it, and display it.

We wrote a UIView subclass that handles these three steps. You can get the full class here; we also break it down in depth below. You can also clone the sample project from GitHub. Be advised that this runs much better on an actual device than on the simulator (especially an iOS 11 simulator) due to limitations in how the simulator implements graphics.

The end result lets us do stuff like this:

Blur that video

First, we set up our class and instance variables:

class BlurredVideoView: UIView {
    var blurRadius: Double = 6.0
    var player: AVPlayer!

    private var output: AVPlayerItemVideoOutput!
    private var displayLink: CADisplayLink!
    private var context: CIContext = CIContext(options: [kCIContextWorkingColorSpace : NSNull()]) // 1
    private var playerItemObserver: NSKeyValueObservation? // 2

We create a CIContext for repeat use (1). Unlike the other Core Image classes, CIContext is expensive to create, so you should generally create it early (before you need to draw with it) and reuse it. We pass it an option to not use a working color space, which trades a small degradation in color quality for a big improvement in performance. The optional NSKeyValueObservation (2) may be unfamiliar, as it is a new addition in Swift 4.0 for use with Swift’s new type-safe Key-Value Observation (KVO). Its use will be explained below.

Next is the real setup of the BlurredVideoView:

  func play(stream: URL, withBlur blur: Double? = nil, completion:  (()->Void)? = nil) {
      layer.isOpaque = true
      blurRadius = blur ?? blurRadius

      // 1
      let item = AVPlayerItem(url: stream)
      output = AVPlayerItemVideoOutput(outputSettings: nil)
      item.add(output)

      // 2
      playerItemObserver = item.observe(\.status) { [weak self] item, _ in
          guard item.status == .readyToPlay else { return }
          self?.playerItemObserver = nil
          self?.setupDisplayLink()

          self?.player.play()
          completion?()
      }

      player = AVPlayer(playerItem: item)
  }

  // 3
  private func setupDisplayLink() {
      displayLink = CADisplayLink(target: self, selector: #selector(displayLinkUpdated(link:)))
      displayLink.preferredFramesPerSecond = 20
      displayLink.add(to: .main, forMode: .commonModes)
  }

Here’s where the work happens. In play(stream:withBlur:completion:) we set up our AVFoundation stack (1) by creating our AVPlayerItem and adding our AVPlayerItemVideoOutput to its outputs. We also make our backing layer opaque. As we don’t plan to display any transparent content, this is an easy performance improvement, since it allows the system to ignore any views behind ours during the draw cycle. We then use Swift 4.0’s new KVO to observe the item’s status (2). The first argument to observe is the key path of the property to observe, using Swift 4’s new key-path notation. We observe the status specifically so we can be told when the item is ready to begin playback, allowing us to set up our display link (3), begin playback, and call the completion handler.

This new method of KVO has several advantages over the legacy method. Most importantly, the call to observe vends an observer object, which automatically removes the KVO observation when it is destroyed. Because of this, we no longer have to remember to remove the observer ourselves (a major cause of bad behavior!). All we have to do instead is ensure that the observer object lives as long as we need to continue the observation. Since playerItemObserver is a member variable, it will live until the BlurredVideoView is destroyed. However, once we are ready to play, we don’t need to continue the observation, so we assign nil to the observer to destroy it.

With all of that set up, we get to the actual manipulation code in our display link callback:

    @objc func displayLinkUpdated(link: CADisplayLink) {
        // 1
        let time = output.itemTime(forHostTime: CACurrentMediaTime())
        guard output.hasNewPixelBuffer(forItemTime: time),
              let pixbuf = output.copyPixelBuffer(forItemTime: time, itemTimeForDisplay: nil) else { return }
        // 2
        let baseImg = CIImage(cvImageBuffer: pixbuf)
        let blurImg = baseImg.clampedToExtent().applyingGaussianBlur(sigma: blurRadius).cropped(to: baseImg.extent)
        // 3
        guard let cgImg = context.createCGImage(blurImg, from: baseImg.extent) else { return }

        layer.contents = cgImg
    }

    func stop() {
        player.rate = 0
        displayLink.invalidate()
    }
}

First (1), we get the current time of the video, then guard that there is a new frame and retrieve it if so. This yields a CVPixelBuffer, a thin wrapper around a C-level struct that holds the raw image data. Next (2), we create a CIImage using the CVPixelBuffer as the backing data, then apply our filter chain. Finally (3), we ‘cook’ our CIImage to create a CGImage, which we then set as our layer’s contents, drawing it to the view via the standard Core Animation process.

The final method, stop(), stops video playback and also invalidates the display link, which removes it from its run loop.

The End. Any questions?

Yes, you in the back waving your arms and jumping up and down? You have a question? What’s that - why didn’t we use OpenGL and do all the image processing directly on the GPU? Yeah, I knew you’d ask that…

If you could just use the GPU, that'd be great

OpenGL and Metal

This seems like an easy choice - who doesn’t want to offload expensive image processing to the processor that literally exists for that purpose? We sure do! There are two methods for gaining access to the GPU on iOS devices. The first (and oldest) is to use OpenGL. Apple makes it easy by providing the GLKit framework and the EAGLContext, which hooks up to your CIContext to perform rendering directly onto the screen.

But really, who wants to use OpenGL when Apple has given us an even more powerful new graphics library, the Metal framework? Certainly not us! After getting some advice here and here, we went about changing our implementation to use Metal. We created a BlurredVideoMetalView class, which is a MTKView subclass. Most of our existing code could be reused, and we only had to change how we rendered and drew the images. The relevant changes are shown below:

class BlurredVideoMetalView: MTKView {
  /// AVFoundation properties
  private let colorSpace = CGColorSpaceCreateDeviceRGB()

  private lazy var commandQueue: MTLCommandQueue? = {
      return self.device!.makeCommandQueue()
  }()

  private lazy var content: CIContext = {
      return CIContext(mtlDevice: self.device!, options: [kCIContextWorkingColorSpace : NSNull()])
  }()

  private var image: CIImage? {
    didSet {
        draw()
    }
  }

  override init(frame frameRect: CGRect, device: MTLDevice?) {
    super.init(frame: frameRect, device: device ?? MTLCreateSystemDefaultDevice())
    setup()
  }

  required init(coder aDecoder: NSCoder) {
      super.init(coder: aDecoder)
      device = MTLCreateSystemDefaultDevice()
      setup()
  }

  private func setup() {
      framebufferOnly = false       // allow Core Image to write to the drawable's texture
      isPaused = true               // explicit drawing: render only when we call draw()
      enableSetNeedsDisplay = false
  }

  /// play, setupDisplayLink and stop remain unchanged

The big thing here is that we now create our CIContext using an MTLDevice, specifically the MTKView's device. We set up this device in the initializer, either by accepting a passed-in MTLDevice or by getting the system default with MTLCreateSystemDefaultDevice. In the setup() method, we set isPaused to true and enableSetNeedsDisplay to false, so that the view is only updated when we deliberately call draw(). We also expose a property, image, which calls draw() when it is set.

Then, instead of rendering the image in displayLinkUpdated(link:), we set this image, which triggers a draw event.

  @objc private func displayLinkUpdated(link: CADisplayLink) {
    let time = output.itemTime(forHostTime: CACurrentMediaTime())
    guard output.hasNewPixelBuffer(forItemTime: time),
          let pixbuf = output.copyPixelBuffer(forItemTime: time, itemTimeForDisplay: nil) else { return }
    let baseImg = CIImage(cvImageBuffer: pixbuf)
    image = baseImg.clampedToExtent().applyingGaussianBlur(sigma: blurRadius).cropped(to: baseImg.extent)
  }

  override func draw(_ rect: CGRect) {
      guard let image = image,
            let currentDrawable = currentDrawable,
            let commandBuffer = commandQueue?.makeCommandBuffer()
              else {
          return
      }
      let currentTexture = currentDrawable.texture
      let drawingBounds = CGRect(origin: .zero, size: drawableSize)

      let scaleX = drawableSize.width / image.extent.width
      let scaleY = drawableSize.height / image.extent.height
      let scaledImage = image.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

      content.render(scaledImage, to: currentTexture, commandBuffer: commandBuffer, bounds: drawingBounds, colorSpace: colorSpace)

      commandBuffer.present(currentDrawable)
      commandBuffer.commit()
  }
}

We override the draw(_:) function to render our image. To do this, we need to get an MTLCommandBuffer, render the image onto it with CIContext.render(_:to:commandBuffer:bounds:colorSpace:), present the buffer to the current drawable, and commit the change to trigger the draw.

To use this class in the sample project, simply re-type the videoView outlet in the ViewController as a BlurredVideoMetalView, and change the type in the Storyboard as well. Then run it on a device (not a simulator, as Metal is not implemented in the simulator).

When we profiled the Metal solution, however, we saw that it still was using a decent amount of CPU. Sure, it was half as much as the CPU-bound solution required, but it wasn’t entirely on the GPU. What really stuck out, though, was how badly it performed on older hardware. We tested it with an iPhone 5S, and the framerate dropped to a mere 6 frames per second as the GPU moaned in agony. With an iPhone 6S, it was hitting our target framerate without a problem.

We can do better, as long as we have newer hardware. On newer devices, Metal has more capabilities, thanks both to improved hardware and to new software designed specifically for that hardware. In particular, the Metal Performance Shaders (MPS) framework provides a set of texture-processing filters similar to CIFilter, but optimized for Metal. A good overview of MPS can be found here. So we went about replacing our CIFilters with MPSImageGaussianBlur. This is actually an approximation of a true Gaussian blur that is highly optimized to run on Metal. Per the documentation,

The Gaussian blur utilizes a very fast algorithm that typically runs at approximately half the speed of copy speeds. Notably, it is faster than either the tent or box blur except perhaps for very large filter windows. Mathematically, it is an approximate Gaussian. Some non-Gaussian behavior may be detectable with advanced analytical methods such as FFT.

Sounds good, as long as it blurs! Let's hook it up! We created a new class, BlurredVideoMPSView, which you can see in full here. This class is designed to use Metal Performance Shaders when available, and to fall back to Core Image when they’re not.

First, we import the framework and add some properties:

import UIKit
import MetalKit
import MetalPerformanceShaders
import AVKit

class BlurredVideoMPSView: MTKView {
  // unchanged properties omitted
  private var gaussianBlur: MPSImageGaussianBlur?

  var blurRadius: Double = 6.0 {
      didSet {
          createGaussianBlur()
      }
  }

  // omitted

  private func createGaussianBlur() {
      if let device = device, MPSSupportsMTLDevice(device) {
          gaussianBlur = MPSImageGaussianBlur(device: device, sigma: Float(blurRadius))
      }
  }

We also call createGaussianBlur from the setup() method. The MPSImageGaussianBlur kernel is immutable, so we have to create a new one each time we change the blur (not to worry, as it is a lightweight object). Before creating it, we ensure that our Metal device supports performance shaders; if it does not, we default to using regular Metal. This, however, is not optimal - if the device is old enough that it doesn’t support MPS, it is probably also not going to perform well just using Metal. A better alternative would be to default to the CPU-bound solution which, while more CPU heavy, results in much better framerates on old devices.
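One hypothetical way to make that choice at runtime (this factory function is ours, and it assumes BlurredVideoMPSView exposes the same init(frame:device:) initializer as BlurredVideoMetalView):

import UIKit
import MetalKit
import MetalPerformanceShaders

// Prefer the MPS-backed view; if the GPU can't run performance shaders, assume it is old
// enough that the CPU-bound Core Image view will give smoother playback than plain Metal.
func makeBlurredVideoView(frame: CGRect) -> UIView {
    if let device = MTLCreateSystemDefaultDevice(), MPSSupportsMTLDevice(device) {
        return BlurredVideoMPSView(frame: frame, device: device)
    }
    return BlurredVideoView(frame: frame)
}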

The big change comes in the displayLinkUpdated(link) and draw(_:) methods:

  @objc private func displayLinkUpdated(link: CADisplayLink) {
      let time = output.itemTime(forHostTime: CACurrentMediaTime())
      guard output.hasNewPixelBuffer(forItemTime: time),
            let pixbuf = output.copyPixelBuffer(forItemTime: time, itemTimeForDisplay: nil) else { return }
      let baseImg = CIImage(cvImageBuffer: pixbuf)

      if gaussianBlur != nil {
          image = baseImg
      } else {
          image = baseImg.clampedToExtent()
                  .applyingGaussianBlur(sigma: blurRadius)
                  .cropped(to: baseImg.extent)
      }
  }

  override func draw(_ rect: CGRect) {
      guard let image = image,
            let currentDrawable = currentDrawable,
            let commandBuffer = commandQueue?.makeCommandBuffer()
              else {
          return
      }
      let currentTexture = currentDrawable.texture
      let drawingBounds = CGRect(origin: .zero, size: drawableSize)

      let scaleX = drawableSize.width / image.extent.width
      let scaleY = drawableSize.height / image.extent.height
      let scaledImage = image.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

      content.render(scaledImage, to: currentTexture, commandBuffer: commandBuffer, bounds: drawingBounds, colorSpace: colorSpace)
      commandBuffer.present(currentDrawable)

      if let gaussianBlur = gaussianBlur {
          // apply the Gaussian blur with MPS by encoding it in place on the drawable's texture;
          // passing the texture by reference avoids leaking a manually allocated pointer each frame
          var inplaceTexture: MTLTexture = currentTexture
          gaussianBlur.encode(commandBuffer: commandBuffer, inPlaceTexture: &inplaceTexture)
      }

      commandBuffer.commit()
  }

In displayLinkUpdated(link:), if we are using performance shaders, we don’t do anything with the image - just assign it as is. If we’re not using shaders, we have to continue to apply the CIFilters as before. Then, in draw(_:), we render the image to the current drawable. It will either be the base image (which we will transform in place), or the blurred image.

The magic happens inside the if block. After pointing our MPSImageGaussianBlur at the currentTexture (which we have already drawn the base image to), we encode the blur in place. And this is fast! CPU usage is down, framerates are up, and we’re all smiles.

Conclusion

We discovered that there are actually many ways to blur a video, each of which comes with its own caveats. If we had a local video, it could have been as easy as using AVPlayerItem.videoComposition, but as that doesn’t work with HTTP live streams, we had to craft our own solution. Our final solution, using MetalKit and Metal Performance Shaders, is screaming fast, as long as it runs on capable hardware. For older hardware, a CPU-bound solution is also feasible and provides decent results (as long as you’re not trying to do other CPU-intensive work concurrently!). The project requirements also come into play - if the video to be blurred is short, or small, or isn’t expected to live very long, the CPU-bound solution is probably good enough. If, however, the video is a major piece of your app, or will be played alongside other computationally intensive operations, it is probably worth looking at the faster (albeit more complex) Metal solution.

Finally, this method is not limited solely to Gaussian blurs! Indeed, it can be generalized to any form of image manipulation with Core Image or MetalKit. We’ve also used the same technique to apply gradients directly to the video. A useful extension of this class would be to extract most of the functionality into a base class and move the image manipulation into a function (one that probably takes a CIImage and returns a CIImage) that subclasses could override.
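Here is a rough sketch of what that refactor could look like (all names here are ours, and the AVFoundation and display link plumbing from BlurredVideoView is assumed to move into the base class unchanged):

import UIKit
import CoreImage
import CoreVideo

class FilteredVideoView: UIView {
    // The player, output and CADisplayLink setup would live here, exactly as in BlurredVideoView.
    private let context = CIContext(options: [kCIContextWorkingColorSpace : NSNull()])

    /// Subclasses override this single hook to define the per-frame effect.
    func process(_ image: CIImage) -> CIImage {
        return image
    }

    /// Called from the display link callback once a fresh pixel buffer has been copied.
    func render(_ pixelBuffer: CVPixelBuffer) {
        let baseImg = CIImage(cvImageBuffer: pixelBuffer)
        guard let cgImg = context.createCGImage(process(baseImg), from: baseImg.extent) else { return }
        layer.contents = cgImg
    }
}

// The blur from this article then becomes one small subclass.
class GaussianBlurVideoView: FilteredVideoView {
    var blurRadius: Double = 6.0

    override func process(_ image: CIImage) -> CIImage {
        return image.clampedToExtent()
            .applyingGaussianBlur(sigma: blurRadius)
            .cropped(to: image.extent)
    }
}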

Resources

Besides the Apple documentation, we drew heavily from this objc.io article on Core Image and Video (especially the sample project). We also found this Realm talk informative, and this StackOverflow thread helpful. For inspiration on Metal Performance Shaders, this article was crucial. And for a great, short example of Swift 4.0’s KVO, check out this blog post!

Greg Niemann