Camera app lags after switching from AVCaptureVideoPreviewLayer to MTKView

I have an app that takes photos and videos, then does some transforms and adds some text, then saves. It worked very smoothly until I decided to switch to Metal, refactoring my app to use MTKView rather than AVCaptureVideoPreviewLayer (based on Apple’s AVCamFilter demo app). Now, the transform step causes a significant lag (1-2 seconds) in the camera view.

Using Instruments, I can confirm the hang comes from the steps where I do the transforms and add the text. What's interesting is that if I profile the old, non-Metal version, it registers a hang at the same point too; however, in the old version that hang doesn't translate into a visible lag in the camera view.
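
If it helps, here's roughly how the two steps could be bracketed with signposts so they show up as named intervals in Instruments. This is just a sketch; the subsystem string and interval name are placeholders, not identifiers from my real project:

import Foundation
import os.signpost

// Placeholder subsystem; not the real identifier from my project.
private let pointsOfInterest = OSLog(subsystem: "com.example.camera", category: .pointsOfInterest)

func measureTransforms(_ work: () -> Void) {
    let id = OSSignpostID(log: pointsOfInterest)
    os_signpost(.begin, log: pointsOfInterest, name: "Photo transforms", signpostID: id)
    work()   // e.g. cropImageAndAddBackground + addMetadataViewToImage
    os_signpost(.end, log: pointsOfInterest, name: "Photo transforms", signpostID: id)
}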

Here's the transform code, which runs in didFinishProcessingPhoto. This is the Metal version, but from var latitude = 0.0 onward it's identical to the old version.

func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
    guard let photoPixelBuffer = photo.pixelBuffer else {
        print("Error occurred while capturing photo: Missing pixel buffer (\(String(describing: error)))")
        return
    }
    
    
    startSpinnerInGallery?()
    
    var photoFormatDescription: CMFormatDescription?
    CMVideoFormatDescriptionCreateForImageBuffer(allocator: kCFAllocatorDefault, imageBuffer: photoPixelBuffer, formatDescriptionOut: &photoFormatDescription)
    
    processingQueue.async {
        var finalPixelBuffer = photoPixelBuffer
        if let filter = self.photoFilter {
            if !filter.isPrepared {
                if let unwrappedPhotoFormatDescription = photoFormatDescription {
                    filter.prepare(with: unwrappedPhotoFormatDescription, outputRetainedBufferCountHint: 2)
                }
            }
            
            guard let filteredPixelBuffer = filter.render(pixelBuffer: finalPixelBuffer) else {
                print("Unable to filter photo buffer")
                return
            }
            finalPixelBuffer = filteredPixelBuffer
        }
        
        let metadataAttachments: CFDictionary = photo.metadata as CFDictionary
        guard let data = CameraViewMetal.jpegData(withPixelBuffer: finalPixelBuffer, attachments: metadataAttachments) else {
            print("Unable to create JPEG photo")
            return
        }
        
        var latitude = 0.0
        var longitude = 0.0
        let locationManager = CLLocationManager()
        if let location = locationManager.location {
            latitude = location.coordinate.latitude
            longitude = location.coordinate.longitude
        }

        self.backgroundTask = UIApplication.shared.beginBackgroundTask { [weak self] in
            print("iOS has signaled time has expired")
        }


        Task.init {
            let croppedImage = self.cropImageAndAddBackground(data: data)
            let croppedData = croppedImage.jpegData(compressionQuality: 0.6)!
            let fkStoredAsset = FKStoredAsset(assetType: .photo, videoData: emptyData, imageData: croppedData, latitude: latitude, longitude: longitude, geocodeString: "", date: Date(), customTitle: "", notes: "", capturedSettings: saveState)

            // SAVE TO FILESYSTEM
            var pathComponent = "\(fkStoredAsset.dateStringIncludingSeconds)"
            let fileURL = pathWithFKAssetsFolder(component: pathComponent)
            if saveToFilesystem == true {
                var retrievedCurrentScene = currentScene
                retrievedCurrentScene.assetPathComponents.append(pathComponent)
                allProjects[currentProjectIndex].scenes[currentSceneIndex] = retrievedCurrentScene

                let encodedAsset = try? JSONEncoder().encode(fkStoredAsset)
                do {
                    try encodedAsset!.write(to: fileURL)
                    print("^^File saved: \(fileURL)")
                } catch {
                    print("^^error in saving to filesystem: \(error)")
                }
            }
            
            // GEOCODE, TRANSFORMS, AND RE-SAVE TO FILESYSTEM
            // we do an initial save without the transforms in case someone wants to see it fast. Then we take the time to geocode in the background, do the transforms + text, and re-save.

            // set up asset
            var asset = emptyAsset
            asset = fkStoredAsset

            // add street address geocode string
            if streetAddressGeocoding == true && (fkStoredAsset.longitude != 0 && fkStoredAsset.latitude != 0) {
                let locationString = try await self.fetchLocation(latitude: fkStoredAsset.latitude, longitude: fkStoredAsset.longitude)
                asset.geocodeString = locationString
            }

            let metadataView = createMetadataView(fkStoredAsset: asset)
            let imageWithMetadata = addMetadataViewToImage(fkStoredAsset: asset, metadataView: metadataView)
            asset.imageDataWithMetadataStrip = imageWithMetadata

            if saveToFilesystem == true {
                overwriteSavedAssetWithGeocodedAsset(asset: asset, url: fileURL)
            }

            func overwriteSavedAssetWithGeocodedAsset(asset: FKStoredAsset, url: URL) {
                let encodedAsset = try? JSONEncoder().encode(asset)
                do {
                    try encodedAsset!.write(to: url)
                    print("File saved: \(url)")
                    saveUserData()
                } catch {
                    print(error.localizedDescription)
                }
            }

        

            UIGraphicsEndImageContext()
            UIApplication.shared.endBackgroundTask(self.backgroundTask)
            self.backgroundTask = .invalid
        }

    }
}

Instruments says the biggest culprit is cropImageAndAddBackground:

func cropImageAndAddBackground(data: Data) -> UIImage {
    
    let image = UIImage(data: data)!
    
    var height = image.size.height
    var width = image.size.width
    var sensor = SensorSize(width: 0, height: 0, name: "")
    sensor = currentSensorSizeWithAccessories
    let aspect = sensor.aspect
    if sensorBackgroundLeading.isActive == true {
        height = width / aspect
    } else {
        width = height * aspect
    }
    let frame = CGRect(x: 0, y: 0, width: width, height: height)
    let croppedImageView = UIView(frame: frame)
    let imageView = UIImageView(frame: frame)
    
    imageView.image = image
    croppedImageView.insertSubview(imageView, at: 0)
    imageView.center = croppedImageView.center
    imageView.clipsToBounds = true
    imageView.contentMode = .scaleAspectFill
    croppedImageView.clipsToBounds = true
    
    let gridView = UIImageView(frame: frame)
    let pincushion = UIImage(named: "Pincushion Grid", in: Bundle(identifier:"com.grayhour.GrayHourFrameworks"), compatibleWith: nil)
    gridView.image = pincushion
    croppedImageView.insertSubview(gridView, at: 0)
    gridView.center = croppedImageView.center
    gridView.contentMode = .scaleAspectFill
    tintImage(icon: gridView, color: Colors.specLabelInset)
    croppedImageView.backgroundColor = Colors.specLabel
    
    let overlaysWereVisible = !overlayView.isHidden
    if burnInOverlays == false { overlayView.isHidden = true }
    UIGraphicsBeginImageContextWithOptions(framelineView.frame.size, false, 1.0)
    framelineView.layer.render(in: UIGraphicsGetCurrentContext()!)
    let overlays = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    let view = UIImageView(frame: frame)
    view.image = overlays
    croppedImageView.insertSubview(view, aboveSubview: imageView)
    view.center = croppedImageView.center
    view.contentMode = .scaleAspectFill
    if overlaysWereVisible { overlayView.isHidden = false }
    
    
    let factor = currentActiveImageWidth / sensorViewForBackground.bounds.width
    imageView.transform = imageView.transform.scaledBy(x: factor, y: factor)
    
    
    UIGraphicsBeginImageContextWithOptions(croppedImageView.frame.size, true, 1.0)
    croppedImageView.layer.render(in: UIGraphicsGetCurrentContext()!)
    let croppedImageWithBackground = UIGraphicsGetImageFromCurrentImageContext()!
    return croppedImageWithBackground
}

It's followed by addMetadataViewToImage, which accounts for almost as much of the hang time:

func addMetadataViewToImage(fkStoredAsset: FKStoredAsset, metadataView: ImageMetadataView) -> Data {
    
    let image = UIImage(data: fkStoredAsset.imageData)!
    let height = image.size.height
    let width = image.size.width
    let frame = CGRect(x: 0, y: 0, width: width, height: height)
    let view = UIView(frame: frame)
    let imageView = UIImageView(frame: frame)
    
    imageView.image = image
    view.insertSubview(imageView, at: 0)
    imageView.center = view.center
    imageView.clipsToBounds = true
    imageView.contentMode = .scaleAspectFill
    view.clipsToBounds = true
    
    let metadataImage = metadataView.makeImage()
    let metadataImageAspect = metadataImage.size.height / metadataImage.size.width
    let requiredMetadataHeight = imageView.bounds.width * metadataImageAspect
    let metadataImageView = UIImageView(frame: CGRect(x: 0, y: 0, width: imageView.bounds.width, height: requiredMetadataHeight))
    metadataImageView.image = metadataImage
    
    let frameWithMetadata = CGRect(x: 0, y: 0, width: imageView.bounds.width, height: imageView.bounds.height + requiredMetadataHeight)
    let croppedImageWithMetadata = UIView(frame: frameWithMetadata)
    croppedImageWithMetadata.insertSubview(imageView, at: 0)
    croppedImageWithMetadata.insertSubview(metadataImageView, at: 1)
    metadataImageView.center = CGPoint(x: croppedImageWithMetadata.center.x, y: croppedImageWithMetadata.bounds.height - requiredMetadataHeight / 2)
    croppedImageWithMetadata.backgroundColor = Colors.specLabel
    UIGraphicsBeginImageContextWithOptions(croppedImageWithMetadata.frame.size, true, 1.0)
    croppedImageWithMetadata.layer.render(in: UIGraphicsGetCurrentContext()!)
    let croppedImage = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    
    let imageData = addGPSMetadatatoImage(image: croppedImage!, latitude: fkStoredAsset.latitude, longitude: fkStoredAsset.longitude)
    
    return imageData
}

But just to reiterate: these same functions don't cause any lag in the non-Metal version, so I have to wonder whether it's the content of the functions or the fact that they now have to be called in a different way.
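
To make "called in a different way" concrete: in the old version the JPEG came straight from the AVCapturePhoto, roughly like this (reconstructed from memory, so treat the details as approximate):

// Old, non-Metal path (approximate): no pixel buffer, no filter render,
// no CameraViewMetal.jpegData step -- the JPEG data comes from the photo itself.
func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
    guard let data = photo.fileDataRepresentation() else {
        print("Error occurred while capturing photo: \(String(describing: error))")
        return
    }

    startSpinnerInGallery?()

    processingQueue.async {
        // ...identical to the Metal version from var latitude = 0.0 onward...
    }
}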

And just to be clear: I switched to Metal so I could apply CIFilters to the camera preview, so simply going back to AVCaptureVideoPreviewLayer isn't an easy option for me.
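
For context, the preview now works along the lines of AVCamFilter: each captured frame gets filtered and drawn into the MTKView. Here is a heavily simplified sketch of that kind of draw loop; the property names are mine, not AVCamFilter's (the real sample splits this between a filter renderer and a preview view), so treat it as illustrative only:

import MetalKit
import CoreImage

// Assumed properties, not actual AVCamFilter names:
// currentPreviewPixelBuffer (latest CVPixelBuffer from the video data output),
// previewFilter (an optional CIFilter), ciContext (a CIContext backed by the
// Metal device), and commandQueue (an MTLCommandQueue).
func draw(in view: MTKView) {
    guard let pixelBuffer = currentPreviewPixelBuffer,
          let drawable = view.currentDrawable,
          let commandBuffer = commandQueue.makeCommandBuffer() else { return }

    var image = CIImage(cvPixelBuffer: pixelBuffer)
    if let filter = previewFilter {
        filter.setValue(image, forKey: kCIInputImageKey)
        image = filter.outputImage ?? image
    }

    // Render the filtered frame straight into the view's drawable.
    ciContext.render(image,
                     to: drawable.texture,
                     commandBuffer: commandBuffer,
                     bounds: CGRect(origin: .zero, size: view.drawableSize),
                     colorSpace: CGColorSpaceCreateDeviceRGB())

    commandBuffer.present(drawable)
    commandBuffer.commit()
}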

I can put together a minimal reproducible project if this code isn't enough to diagnose the problem.

Thank you!
