Curiosity’s Profile | Apple Developer Forums

How can I improve the speed of running a `VNDetectHumanBodyPoseRequest` on a `VNImageRequestHandler` for every `CMSampleBuffer` of an imported video?

Below, the sampleBufferProcessor closure is where the Vision body pose detection occurs.

    /// Transfers the sample data from the AVAssetReaderOutput to the AVAssetWriterInput,
    /// processing via a CMSampleBufferProcessor.
    ///
    /// - Parameters:
    ///   - readerOutput: The source sample data.
    ///   - writerInput: The destination for the sample data.
    ///   - queue: The DispatchQueue.
    ///   - completionHandler: The completion handler to run when the transfer finishes.
    /// - Tag: transferSamplesAsynchronously
    private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                               to writerInput: AVAssetWriterInput,
                                               onQueue queue: DispatchQueue,
                                               sampleBufferProcessor: SampleBufferProcessor,
                                               completionHandler: @escaping () -> Void) {
        /*
         The writerInput continuously invokes this closure until finished or
         cancelled. It throws an NSInternalInconsistencyException if called more
         than once for the same writer.
         */
        writerInput.requestMediaDataWhenReady(on: queue) {
            var isDone = false
            /*
             While the writerInput accepts more data, process the sampleBuffer
             and then transfer the processed sample to the writerInput.
             */
            while writerInput.isReadyForMoreMediaData {
                if self.isCancelled {
                    isDone = true
                    break
                }
                // Get the next sample from the asset reader output.
                guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                    // The asset reader output has no more samples to vend.
                    isDone = true
                    break
                }
                // Process the sample, if requested.
                do {
                    try sampleBufferProcessor?(sampleBuffer)
                } catch {
                    // The `readingAndWritingDidFinish()` function picks up this error.
                    self.sampleTransferError = error
                    isDone = true
                }
                // Append the sample to the asset writer input.
                guard writerInput.append(sampleBuffer) else {
                    /*
                     The writer could not append the sample buffer.
                     The `readingAndWritingDidFinish()` function handles any
                     error information from the asset writer.
                     */
                    isDone = true
                    break
                }
            }
            if isDone {
                /*
                 Calling `markAsFinished()` on the asset writer input does the following:
                 1. Unblocks any other inputs needing more samples.
                 2. Cancels further invocations of this "request media data" callback block.
                 */
                writerInput.markAsFinished()
                // Tell the caller the reader output and writer input finished transferring samples.
                completionHandler()
            }
        }
    }

The processor closure runs body pose detection on every sample buffer so that, later, in the VNDetectHumanBodyPoseRequest completion handler, the VNHumanBodyPoseObservation results are fed into a custom Core ML action classifier.

    private func videoProcessorForActivityClassification() -> SampleBufferProcessor {
        let videoProcessor: SampleBufferProcessor = { sampleBuffer in
            do {
                let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
                try requestHandler.perform([self.detectHumanBodyPoseRequest])
            } catch {
                print("Unable to perform the request: \(error.localizedDescription).")
            }
        }
        return videoProcessor
    }

How could I improve the performance of this pipeline? When I tested it with an hour-long 4K video at 60 FPS, processing took several hours in a Mac Catalyst app running on an M1 Max.
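One commonly suggested way to cut the cost of a pipeline like this is to run body pose detection on only a subset of frames while still appending every frame to the writer, so the output keeps its full 60 FPS. The sketch below assumes (hypothetically) that the action classifier was trained on poses sampled at 30 FPS; the actual rate should be checked against how the model was configured in Create ML. `FrameSampler` and its properties are illustrative names, not part of any Apple API.

```swift
/// Decides which source frames get a Vision body pose request.
/// Hypothetical helper: every frame is still appended to the writer;
/// only the expensive pose detection is skipped on the other frames.
struct FrameSampler {
    let sourceFrameRate: Double    // e.g. 60 for the 4K/60 FPS input
    let analysisFrameRate: Double  // assumed rate the classifier was trained at

    /// Number of source frames per analyzed frame (never less than 1).
    var frameStride: Int {
        max(1, Int((sourceFrameRate / analysisFrameRate).rounded()))
    }

    /// Returns true when the 0-based frame at `index` should be analyzed.
    func shouldAnalyze(frameAt index: Int) -> Bool {
        index % frameStride == 0
    }
}

// Usage: analyze every other frame of a 60 FPS video for a 30 FPS model.
let sampler = FrameSampler(sourceFrameRate: 60, analysisFrameRate: 30)
let analyzed = (0..<10).filter { sampler.shouldAnalyze(frameAt: $0) }
print(analyzed) // [0, 2, 4, 6, 8]
```

In `videoProcessorForActivityClassification()`, a frame counter captured by the closure could gate the `requestHandler.perform` call, while `transferSamplesAsynchronously` keeps appending every buffer unchanged.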

Developer Tools & Services General Performance Core ML Vision AVFoundation

1

0

1.1k

Jan ’22


Say I have an alert:

    @State var showingAlert = false

    var body: some View {
        Text("Hello, world!")
            .alert("Here's an alert with multiple possible buttons.", isPresented: $showingAlert) {
                Button("OK") { }
                Button("Another button that may or may not show") { }
            }
    }

How could I display the second button only when some condition holds? I tried factoring one button out into

    fileprivate func extractedFunc() -> Button<Text> {
        return Button("OK") { }
    }

and this works for conditionally changing a button's content when the number of buttons is fixed, but how could I make the presence of a button itself optional?

UI Frameworks SwiftUI SwiftUI

1

0

528

Jan ’22

Apple sample code "Detecting Human Actions in a Live Video Feed" - accessing the observations associated with an action prediction

I'm having trouble reasoning about and modifying the Detecting Human Actions in a Live Video Feed sample code, since I'm new to Combine.

    // ---- [MLMultiArray?] -- [MLMultiArray?] ----
    // Make an activity prediction from the window.
    .map(predictActionWithWindow)
    // ---- ActionPrediction -- ActionPrediction ----
    // Send the action prediction to the delegate.
    .sink(receiveValue: sendPrediction)

These are the final two operators of the video processing pipeline, where the action prediction occurs. In the implementation of either `private func predictActionWithWindow(_ currentWindow: [MLMultiArray?]) -> ActionPrediction` or `private func sendPrediction(_ actionPrediction: ActionPrediction)`, how might I access the results of a VNDetectHumanBodyPoseRequest that are retrieved and scoped in a function called earlier in the chain? When I did this imperatively, I accessed the results in the VNDetectHumanBodyPoseRequest completion handler, but I'm not sure how data flows in Combine's programming model. I want to associate each prediction with the observation results it's based on, so that I can store the time range of a given prediction label.
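In a Combine chain, each operator only sees what the operator before it emits, so a later `.map` cannot reach back into a function called earlier. A common way around this is to make the upstream values carry the context needed downstream, for example pairing each frame's pose data with its presentation time so the window that reaches the prediction step also knows its time range. The sketch below shows that idea in plain Swift without Combine; `TimedPose`, `TimedPrediction`, `predict(window:classify:)`, and the `[Float]` feature type are hypothetical stand-ins for the sample's `MLMultiArray?` window and `ActionPrediction`.

```swift
/// A frame's pose data tagged with the time it came from.
struct TimedPose {
    let time: Double      // presentation time of the frame, in seconds
    let features: [Float] // stand-in for the frame's pose MLMultiArray
}

/// A prediction that remembers which frames produced it.
struct TimedPrediction {
    let label: String
    let start: Double     // time of the first frame in the window
    let end: Double       // time of the last frame in the window
}

/// Hypothetical stand-in for `predictActionWithWindow`: the Core ML call is
/// replaced by a caller-supplied closure so the sketch runs anywhere.
func predict(window: [TimedPose],
             classify: ([[Float]]) -> String) -> TimedPrediction? {
    guard let first = window.first, let last = window.last else { return nil }
    let label = classify(window.map { $0.features })
    return TimedPrediction(label: label, start: first.time, end: last.time)
}

// Usage: three frames at 30 FPS, classified by a dummy closure.
let window = (0..<3).map { TimedPose(time: Double($0) / 30, features: [0]) }
if let prediction = predict(window: window, classify: { _ in "Playing" }) {
    print(prediction.label, prediction.start, prediction.end)
}
```

In the Combine version, the equivalent change would be to have the upstream publisher emit (time, pose) pairs instead of bare `MLMultiArray?` values, so the timing survives through the windowing operators into `sendPrediction`.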

Programming Languages Swift Swift Combine Core ML Vision

0

825

Jan ’22

General guidelines for improving body pose action classifier performance

I just got an app feature working where the user imports a video file, each frame is fed to a custom action classifier, and then only the frames classified with a certain action are exported. However, testing a one-hour 4K video at 60 FPS is taking an unreasonably long time: it has been processing for 7 hours now on a MacBook Pro with M1 Max running the Mac Catalyst app. Are there any techniques or general guidance that would help improve performance? As much as possible I'd like to preserve the input video quality, especially the frame rate. A one-hour video is a realistic input, since it's a recording of a tennis session (anywhere from 10 minutes to a couple of hours). I made the body pose action classifier with Create ML.
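One option worth trying, offered here as an untested suggestion, is to split the timeline into chunks and process each chunk with its own AVAssetReader (its `timeRange` property limits reading to a range and must be set before `startReading()`), running the readers on separate queues. The sketch below only computes the chunk boundaries in plain seconds; `Chunk`, `makeChunks(duration:length:overlap:)`, and the overlap idea (starting each chunk slightly early so a prediction window can fill before the chunk's own range) are hypothetical.

```swift
/// A half-open span of the source timeline, in seconds.
struct Chunk: Equatable {
    let start: Double
    let end: Double
}

/// Splits [0, duration) into consecutive chunks of at most `length` seconds.
/// Each chunk after the first begins `overlap` seconds early (clamped at 0)
/// so a classifier window can warm up before the chunk's own range.
func makeChunks(duration: Double, length: Double, overlap: Double = 0) -> [Chunk] {
    precondition(length > 0, "chunk length must be positive")
    var chunks: [Chunk] = []
    var nominalStart = 0.0
    while nominalStart < duration {
        let end = min(nominalStart + length, duration)
        chunks.append(Chunk(start: max(0, nominalStart - overlap), end: end))
        nominalStart = end
    }
    return chunks
}

// Usage: a one-hour video split into 10-minute chunks with 2 s of overlap.
let chunks = makeChunks(duration: 3600, length: 600, overlap: 2)
print(chunks.count) // 6
```

Predictions that fall inside the overlap belong to the previous chunk and would be discarded when the per-chunk results are combined.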

Developer Tools & Services General Performance Core ML Vision Create ML

2

0

1.3k

Jan ’22

How do you track when a VNVideoProcessor analysis is finished?

I'm moving to VNVideoProcessor and away from performing Vision requests on individual frames, since it does the same thing more concisely. However, I'm not sure how to detect when analysis of a video is finished. Previously, when reading frames with AVFoundation, I could check with

    // Get the next sample from the asset reader output.
    guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
        // The asset reader output has no more samples to vend.
        isDone = true
        break
    }

What would be an equivalent when using VNVideoProcessor?

Machine Learning & AI General Vision

1

0

752

Jan ’22

What is a good way to track the time ranges associated with a particular action classifier label?

After creating a custom action classifier in Create ML, previewing it (see the bottom of the page) with an input video shows the label associated with each segment of the video. What would be a good way to store the duration for a given label, say, each CMTimeRange of a segment of video frames classified as containing "Jumping Jacks"? I previously found that storing the time ranges of trajectory results was convenient, since each VNTrajectoryObservation vended by Apple has an associated CMTimeRange. However, with my custom action classifier, each VNObservation result's CMTimeRange has a duration that's always 0.

    func completionHandler(request: VNRequest, error: Error?) {
        guard let results = request.results as? [VNHumanBodyPoseObservation] else { return }
        if let result = results.first {
            storeObservation(result)
        }
        do {
            for result in results where try self.getLastTennisActionType(from: [result]) == .playing {
                var fileRelativeTimeRange = result.timeRange
                fileRelativeTimeRange.start = fileRelativeTimeRange.start - self.assetWriterStartTime
                self.timeRangesOfInterest[Int(fileRelativeTimeRange.start.seconds)] = fileRelativeTimeRange
            }
        } catch {
            print("Unable to perform the request: \(error.localizedDescription).")
        }
    }

In this case I'm interested in frames with the label "Playing", and I classify them successfully, but I'm not sure where to go from here to track the duration of video segments made up of consecutive frames with that label.
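One way to recover segment durations is to keep each frame's label together with its timestamp and collapse consecutive runs of the same label into ranges. The sketch assumes labels arrive in presentation-time order and that each frame lasts a known `frameDuration`; `LabeledFrame`, `LabelSegment`, and `segments(from:frameDuration:)` are hypothetical names, and times are plain `Double` seconds rather than CMTime so it runs without Core Media.

```swift
struct LabeledFrame {
    let time: Double   // presentation time in seconds
    let label: String
}

struct LabelSegment: Equatable {
    let label: String
    let start: Double
    var end: Double    // end of the last frame in the run
    var duration: Double { end - start }
}

/// Collapses consecutive frames with the same label into segments.
/// Assumes `frames` is sorted by time; each frame lasts `frameDuration`.
func segments(from frames: [LabeledFrame], frameDuration: Double) -> [LabelSegment] {
    var result: [LabelSegment] = []
    for frame in frames {
        let frameEnd = frame.time + frameDuration
        if var last = result.last, last.label == frame.label {
            // Same label as the previous frame: extend the current run.
            last.end = frameEnd
            result[result.count - 1] = last
        } else {
            // Label changed (or first frame): start a new run.
            result.append(LabelSegment(label: frame.label, start: frame.time, end: frameEnd))
        }
    }
    return result
}

// Usage: keep only the runs for the label of interest.
let frames = ["Other", "Playing", "Playing", "Playing", "Other"].enumerated().map {
    LabeledFrame(time: Double($0.offset), label: $0.element)
}
let playing = segments(from: frames, frameDuration: 1).filter { $0.label == "Playing" }
print(playing.map { ($0.start, $0.duration) })
```

If predictions are made per window rather than per frame, the same function applies with each window's start time and the window length as `frameDuration`.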

Machine Learning & AI Core ML Core ML Core Media Create ML wwdc21-10039

0

744

Dec ’21

Trying to export time ranges from a composition using AVAssetExportSession - "Decode timestamp is earlier than previous sample's decode timestamp"

Modifying guidance given in an answer on AVFoundation + Vision trajectory detection, I'm instead saving time ranges of frames that have a specific ML label from my custom action classifier:

    private lazy var detectHumanBodyPoseRequest: VNDetectHumanBodyPoseRequest = {
        let detectHumanBodyPoseRequest = VNDetectHumanBodyPoseRequest(completionHandler: completionHandler)
        return detectHumanBodyPoseRequest
    }()

    var timeRangesOfInterest: [Int : CMTimeRange] = [:]

    private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, asset completionHandler: @escaping FinishHandler) {
        if isCancelled {
            completionHandler(.success(.cancelled))
            return
        }
        // Handle any error during processing of the video.
        guard sampleTransferError == nil else {
            assetReaderWriter.cancel()
            completionHandler(.failure(sampleTransferError!))
            return
        }
        // Evaluate the result reading the samples.
        let result = assetReaderWriter.readingCompleted()
        if case .failure = result {
            completionHandler(result)
            return
        }
        /*
         Finish writing, and asynchronously evaluate the results from writing
         the samples.
         */
        assetReaderWriter.writingCompleted { result in
            self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.value }) { result in
                completionHandler(result)
            }
        }
    }

    func exportVideoTimeRanges(timeRanges: [CMTimeRange], completion: @escaping (Result<OperationStatus, Error>) -> Void) {
        let inputVideoTrack = self.asset.tracks(withMediaType: .video).first!
        let composition = AVMutableComposition()
        let compositionTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)!
        var insertionPoint: CMTime = .zero
        for timeRange in timeRanges {
            try! compositionTrack.insertTimeRange(timeRange, of: inputVideoTrack, at: insertionPoint)
            insertionPoint = insertionPoint + timeRange.duration
        }
        let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetHighestQuality)!
        try? FileManager.default.removeItem(at: self.outputURL)
        exportSession.outputURL = self.outputURL
        exportSession.outputFileType = .mov
        exportSession.exportAsynchronously {
            var result: Result<OperationStatus, Error>
            switch exportSession.status {
            case .completed:
                result = .success(.completed)
            case .cancelled:
                result = .success(.cancelled)
            case .failed:
                // The `error` property is non-nil in the `.failed` status.
                result = .failure(exportSession.error!)
            default:
                fatalError("Unexpected terminal export session status: \(exportSession.status).")
            }
            print("export finished: \(exportSession.status.rawValue) - \(exportSession.error)")
            completion(result)
        }
    }

This worked fine with results vended by Apple's trajectory detection, but with my custom action classifier, TennisActionClassifier (a Core ML model exported from Create ML), I get the console error

    getSubtractiveDecodeDuration signalled err=-16364 (kMediaSampleTimingGeneratorError_InvalidTimeStamp) (Decode timestamp is earlier than previous sample's decode timestamp.) at MediaSampleTimingGenerator.c:180

Why might this be?
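Two things in this listing may be worth ruling out, though neither is confirmed here as the cause of the error. First, `timeRangesOfInterest.map { $0.value }` iterates a Dictionary, whose order is unspecified, so ranges can be inserted out of chronological order. Second, as noted in the post above about tracking time ranges, the classifier-based observations come back with zero-duration time ranges. The sketch below shows a defensive way to build the insertion plan: drop empty ranges, sort by start, then lay the ranges end to end. It uses `Double` seconds in place of CMTime, and `SourceRange`, `InsertionStep`, and `makeInsertionPlan(_:)` are hypothetical names.

```swift
/// A source time range in seconds (stand-in for CMTimeRange).
struct SourceRange: Equatable {
    let start: Double
    let duration: Double
}

/// One insertTimeRange(_:of:at:)-style step: copy `source` to `insertAt`.
struct InsertionStep: Equatable {
    let source: SourceRange
    let insertAt: Double
}

/// Orders ranges chronologically, drops empty ones, and lays them end to end.
func makeInsertionPlan(_ ranges: [SourceRange]) -> [InsertionStep] {
    let ordered = ranges
        .filter { $0.duration > 0 }      // zero-length ranges contribute no frames
        .sorted { $0.start < $1.start }  // Dictionary iteration order is unspecified
    var plan: [InsertionStep] = []
    var insertionPoint = 0.0
    for range in ordered {
        plan.append(InsertionStep(source: range, insertAt: insertionPoint))
        insertionPoint += range.duration
    }
    return plan
}

// Usage: ranges as they might come out of a Dictionary, in arbitrary order.
let plan = makeInsertionPlan([
    SourceRange(start: 30, duration: 5),
    SourceRange(start: 12, duration: 0),
    SourceRange(start: 10, duration: 4),
])
for step in plan {
    print("insert \(step.source.start)s (+\(step.source.duration)s) at \(step.insertAt)s")
}
```

In the real code, each step would map to one `compositionTrack.insertTimeRange(_:of:at:)` call with CMTime values.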

Machine Learning & AI Core ML Core ML Vision AVFoundation Core Media

0

835

Dec ’21

After exporting an action classifier from Create ML and importing it into Xcode, how do you use it to make predictions?

I followed Apple's guidance in the articles Creating an Action Classifier Model, Gathering Training Videos for an Action Classifier, and Building an Action Classifier Data Source. With the resulting Core ML model file now imported in Xcode, how do I use it to classify video frames? For each video frame I call

    do {
        let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
        try requestHandler.perform([self.detectHumanBodyPoseRequest])
    } catch {
        print("Unable to perform the request: \(error.localizedDescription).")
    }

But it's unclear to me how to use the results of the VNDetectHumanBodyPoseRequest, which come back as type [VNHumanBodyPoseObservation]?. How would I feed the results into my custom classifier, which has an automatically generated model class, TennisActionClassifier.swift? The classifier makes predictions from the body poses in each frame, labeling the action as either playing a rally/point or not playing.
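As far as Apple's action classifier documentation and the "Detecting Human Actions in a Live Video Feed" sample indicate, a Create ML action classifier predicts from a window of consecutive poses rather than from a single observation. Roughly, each observation's `keypointsMultiArray()` is collected into a fixed-size window, the window is combined with `MLMultiArray(concatenating:axis:dataType:)`, and the result is passed to the generated model's prediction method; the exact input name and window size should be checked in the generated `TennisActionClassifier` class. The sketch below shows only the windowing logic in plain Swift; `PoseWindow`, its generic element, and the sizes used are hypothetical.

```swift
/// Accumulates per-frame items into fixed-size, optionally overlapping windows.
/// Hypothetical helper; in an app, `Element` would be each frame's pose array.
struct PoseWindow<Element> {
    let windowSize: Int  // frames per prediction window
    let step: Int        // frames to advance between predictions
    private var buffer: [Element] = []

    init(windowSize: Int, step: Int) {
        precondition(windowSize > 0 && step > 0 && step <= windowSize,
                     "need 0 < step <= windowSize")
        self.windowSize = windowSize
        self.step = step
    }

    /// Adds one frame; returns a full window when one is ready, else nil.
    mutating func append(_ element: Element) -> [Element]? {
        buffer.append(element)
        guard buffer.count == windowSize else { return nil }
        let window = buffer
        buffer.removeFirst(step)  // keep the overlap for the next window
        return window
    }
}

// Usage: windows of 4 frames, advancing 2 frames each time.
var windower = PoseWindow<Int>(windowSize: 4, step: 2)
var windows: [[Int]] = []
for frame in 0..<8 {
    if let window = windower.append(frame) {
        windows.append(window)
    }
}
print(windows)
```

Each returned window is the point at which the concatenated poses would be handed to the classifier; with `step == windowSize` the windows do not overlap.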

Machine Learning & AI Core ML Core ML Create ML wwdc21-10037 wwdc21-10040

0

642

Dec ’21

What would be the most accurate way to classify both playing a tennis rally and the downtime in between rallies?

My goal is to mark, in any tennis video, the timestamps of both the start and the end of each rally/point. I tried trajectory detection, but its "end time" is when the ball bounces rather than when the rally/point ends. I'm not sure which direction to take from here to improve on this. Would action classification of body poses in each frame (two classes, "playing" and "not playing") be the best way to split the video into segments, or is there a different technique?
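If frames are classified one at a time into "playing" and "not playing", the raw labels are likely to flicker now and then, which would split one rally into several short segments. A simple post-processing step, independent of any Apple API, is a sliding majority vote over each frame's neighborhood before segments are extracted. `smoothLabels(_:radius:)` and the radius value are hypothetical and would need tuning on real footage.

```swift
/// Replaces each label with the majority label among the frames within
/// `radius` positions of it (ties keep the original label).
/// true = "playing", false = "not playing".
func smoothLabels(_ labels: [Bool], radius: Int) -> [Bool] {
    guard radius > 0 else { return labels }
    return labels.indices.map { i -> Bool in
        let lower = max(0, i - radius)
        let upper = min(labels.count - 1, i + radius)
        let neighborhood = labels[lower...upper]
        let playing = neighborhood.filter { $0 }.count
        let notPlaying = neighborhood.count - playing
        if playing == notPlaying { return labels[i] }
        return playing > notPlaying
    }
}

// Usage: a single "not playing" frame in the middle of a rally is smoothed away.
let raw: [Bool] = [false, false, true, true, false, true, true, false, false]
print(smoothLabels(raw, radius: 1))
```

A larger radius removes longer glitches but also blurs the exact start and end of each rally by up to `radius` frames.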

Machine Learning & AI Core ML Core ML Vision Create ML

0

544

Dec ’21

Accurately getting timestamps for start and end of tennis rallies

I'm building a feature to automatically edit out all the downtime of a tennis video. I have a partial implementation that stores the start and end times of Vision trajectory detections and writes only those segments to an AVFoundation export session. I've run into a major issue: the returned trajectories end whenever the ball bounces, so each segment is just one tennis shot and nowhere close to an entire rally with multiple bounces. I'm unsure whether I should continue down the trajectory route, perhaps stitching the trajectories together and splitting only at the start and end of a rally. Any general guidance would be appreciated. Is there a different Vision or ML approach that would more accurately model the start and end times of a rally? I considered creating a custom action classifier to classify frames as either "playing tennis" or "inactivity," but I started with Apple's trajectory detection since it was already built and trained. Maybe a custom classifier is needed, but I'm not sure.
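The "stitching" idea can be prototyped independently of Vision: sort the trajectory time ranges and merge any two whose gap is at most a threshold, so consecutive shots within one rally become a single segment while longer pauses between points still split it. The 3-second threshold in the usage line is an assumption to tune against real footage; `Segment` and `mergeIntoRallies(_:maxGap:)` are hypothetical names, with times in `Double` seconds.

```swift
struct Segment: Equatable {
    var start: Double
    var end: Double
}

/// Sorts segments by start and merges neighbors whose gap is at most `maxGap` seconds.
func mergeIntoRallies(_ segments: [Segment], maxGap: Double) -> [Segment] {
    let sorted = segments.sorted { $0.start < $1.start }
    var rallies: [Segment] = []
    for segment in sorted {
        if var last = rallies.last, segment.start - last.end <= maxGap {
            // Close enough to the previous shot: extend the current rally.
            last.end = max(last.end, segment.end)
            rallies[rallies.count - 1] = last
        } else {
            // Gap too long: this shot starts a new rally.
            rallies.append(segment)
        }
    }
    return rallies
}

// Usage: three shots of one rally, a long pause, then another rally.
let shots = [
    Segment(start: 0.0, end: 1.2),
    Segment(start: 1.5, end: 2.8),
    Segment(start: 3.0, end: 4.1),
    Segment(start: 20.0, end: 21.0),
]
print(mergeIntoRallies(shots, maxGap: 3))
```

The same merge also removes overlapping trajectory results, since an overlap is simply a negative gap.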

Machine Learning & AI Core ML Core ML Vision wwdc21-10039 wwdc21-10040

0

617

Dec ’21

Selectively reading sample buffers from specific time ranges and then writing them to an asset writer - why is the AVPlayer stuck loading?

Given an AVAsset, I'm performing a Vision trajectory request on it and would like to write out a video asset that contains only frames with trajectories (filtering out downtime in sports footage where no ball is moving). I'm unsure what a good approach would be, but as a starting point I tried the following pipeline:

1. Copy a sample buffer from the source AVAssetReaderOutput.
2. Perform the trajectory request on a Vision handler parameterized by the sample buffer.
3. For each resulting VNTrajectoryObservation (trajectory detected), use its associated CMTimeRange to configure a new AVAssetReader set to that time range.
4. Append the time-range-constrained sample buffer to one AVAssetWriterInput until the forEach is complete.

In code:

    private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                               to writerInput: AVAssetWriterInput,
                                               onQueue queue: DispatchQueue,
                                               sampleBufferProcessor: SampleBufferProcessor,
                                               completionHandler: @escaping () -> Void) {
        /*
         The writerInput continuously invokes this closure until finished or
         cancelled. It throws an NSInternalInconsistencyException if called more
         than once for the same writer.
         */
        writerInput.requestMediaDataWhenReady(on: queue) {
            var isDone = false
            /*
             While the writerInput accepts more data, process the sampleBuffer
             and then transfer the processed sample to the writerInput.
             */
            while writerInput.isReadyForMoreMediaData {
                if self.isCancelled {
                    isDone = true
                    break
                }
                // Get the next sample from the asset reader output.
                guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                    // The asset reader output has no more samples to vend.
                    isDone = true
                    break
                }
                let visionHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer,
                                                          orientation: self.orientation,
                                                          options: [:])
                do {
                    try visionHandler.perform([self.detectTrajectoryRequest])
                    if let results = self.detectTrajectoryRequest.results {
                        try results.forEach { result in
                            let assetReader = try AVAssetReader(asset: self.asset)
                            assetReader.timeRange = result.timeRange
                            let trackOutput = AVTrackOutputs.firstTrackOutput(ofType: .video,
                                                                              fromTracks: self.asset.tracks,
                                                                              withOutputSettings: nil)
                            assetReader.add(trackOutput)
                            assetReader.startReading()
                            guard let sampleBuffer = trackOutput.copyNextSampleBuffer() else {
                                // The asset reader output has no more samples to vend.
                                isDone = true
                                return
                            }
                            // Append the sample to the asset writer input.
                            guard writerInput.append(sampleBuffer) else {
                                /*
                                 The writer could not append the sample buffer.
                                 The `readingAndWritingDidFinish()` function handles any
                                 error information from the asset writer.
                                 */
                                isDone = true
                                return
                            }
                        }
                    }
                } catch {
                    print(error)
                }
            }
            if isDone {
                /*
                 Calling `markAsFinished()` on the asset writer input does the following:
                 1. Unblocks any other inputs needing more samples.
                 2. Cancels further invocations of this "request media data" callback block.
                 */
                writerInput.markAsFinished()
                // Tell the caller the reader output and writer input finished transferring samples.
                completionHandler()
            }
        }
    }

    private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, completionHandler: @escaping FinishHandler) {
        if isCancelled {
            completionHandler(.success(.cancelled))
            return
        }
        // Handle any error during processing of the video.
        guard sampleTransferError == nil else {
            assetReaderWriter.cancel()
            completionHandler(.failure(sampleTransferError!))
            return
        }
        // Evaluate the result reading the samples.
        let result = assetReaderWriter.readingCompleted()
        if case .failure = result {
            completionHandler(result)
            return
        }
        /*
         Finish writing, and asynchronously evaluate the results from writing
         the samples.
         */
        assetReaderWriter.writingCompleted { result in
            completionHandler(result)
            return
        }
    }

When I run this, I get the following: no error is caught in the first catch clause, none is caught in `private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, completionHandler: @escaping FinishHandler)`, and the completion handler is called. Help with any of the following questions would be appreciated:

- What is causing what appears to be indefinite loading?
- How might I isolate the problem further?
- Am I misusing or misunderstanding how to selectively read from time ranges with AVAssetReader objects?
- Should I forgo the AVAssetReader / AVAssetWriter route entirely and use the time ranges with AVAssetExportSession instead? I don't know how the two approaches compare, or what to consider when choosing between them.
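One thing that stands out, though it is not confirmed here as the cause, is that a new AVAssetReader is opened for every observation and its first sample is appended to the same writer input, so the appended presentation times can jump backward and repeat. A common alternative is a two-pass design: because trajectories are only reported after several frames have been seen, a first pass collects the time ranges, and a second single pass (one reader, or an AVAssetExportSession with a composition) keeps only the samples whose presentation time falls inside them. The sketch below shows that per-sample test in plain Swift; `TimeSpan` and `isInAnySpan(_:spans:)` are hypothetical, and the spans are assumed to be sorted and non-overlapping.

```swift
struct TimeSpan {
    let start: Double  // inclusive, in seconds
    let end: Double    // exclusive, in seconds
}

/// Returns true if `time` lies in one of `spans`, using binary search.
/// Requires `spans` sorted by `start` and non-overlapping (merge them first).
func isInAnySpan(_ time: Double, spans: [TimeSpan]) -> Bool {
    var low = 0
    var high = spans.count - 1
    while low <= high {
        let mid = (low + high) / 2
        if time < spans[mid].start {
            high = mid - 1       // target is in an earlier span, if any
        } else if time >= spans[mid].end {
            low = mid + 1        // target is in a later span, if any
        } else {
            return true
        }
    }
    return false
}

// Usage: decide, per sample buffer presentation time, whether to append it.
let spans = [TimeSpan(start: 2, end: 5), TimeSpan(start: 10, end: 12)]
let kept = [0.0, 2.0, 4.9, 5.0, 11.0, 13.0].filter { isInAnySpan($0, spans: spans) }
print(kept)
```

With this test, samples stay in their original order and each one is appended at most once.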

Programming Languages Swift Swift Vision AVFoundation Core Media

1

0

978

Dec ’21

Strategy for avoiding or removing duplicate Vision trajectory request results

I am saving time ranges from an input video asset where trajectories are found, then exporting only those segments to an output video file. Currently I track these time ranges in a stored property, `var timeRangesOfInterest: [Double : CMTimeRange]`, which is set in the trajectory request's completion handler:

    func completionHandler(request: VNRequest, error: Error?) {
        guard let request = request as? VNDetectTrajectoriesRequest else { return }
        if let results = request.results, results.count > 0 {
            for result in results {
                var timeRange = result.timeRange
                timeRange.start = timeRange.start - self.assetWriterStartTime
                self.timeRangesOfInterest[timeRange.start.seconds] = timeRange
            }
        }
    }

Then these time ranges of interest are used in an export session to export only those segments:

    /*
     Finish writing, and asynchronously evaluate the results from writing
     the samples.
     */
    assetReaderWriter.writingCompleted { result in
        self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.1 }) { result in
            completionHandler(result)
        }
    }

Unfortunately, I'm getting repeated trajectory video segments in the output video. Is this perhaps because trajectory requests return repeated "in progress" trajectory results with slightly different start times? What might be a good strategy for avoiding or removing them? I also noticed that trajectory segments appear out of order in the output.
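Keying the dictionary by start seconds only collapses results whose starts are exactly equal, so "in progress" results that start a few frames apart become separate entries, and Dictionary iteration order is unspecified, which would be consistent with the out-of-order segments. One alternative is to store the ranges in a structure that stays sorted and merges any overlapping or touching range on insertion. The sketch below uses `Double` seconds instead of CMTimeRange; `Interval` and `TimeRangeSet` are hypothetical names.

```swift
struct Interval: Equatable {
    var start: Double
    var end: Double
}

/// A sorted, non-overlapping set of time ranges (seconds). Inserting a range
/// that overlaps or touches existing ones merges them into a single range.
struct TimeRangeSet {
    private(set) var intervals: [Interval] = []

    mutating func insert(start: Double, end: Double) {
        var merged = Interval(start: start, end: end)
        var result: [Interval] = []
        var insertIndex = 0
        for interval in intervals {
            if interval.end < merged.start {
                result.append(interval)   // entirely before: keep as-is
                insertIndex = result.count
            } else if interval.start > merged.end {
                result.append(interval)   // entirely after: keep as-is
            } else {
                // Overlaps or touches: absorb it into the merged range.
                merged.start = min(merged.start, interval.start)
                merged.end = max(merged.end, interval.end)
            }
        }
        result.insert(merged, at: insertIndex)
        intervals = result
    }
}

// Usage: a repeated in-progress result and an out-of-order result.
var collected = TimeRangeSet()
collected.insert(start: 10.0, end: 12.0)
collected.insert(start: 10.2, end: 12.5)  // overlaps the first: merged
collected.insert(start: 3.0, end: 4.0)    // earlier range: placed first
print(collected.intervals)
```

Because `intervals` is always sorted, it can be passed to the export step in chronological order without further processing.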

Media Technologies Audio Vision AVFoundation

0

852

Dec ’21

What is a good way to reset a UIPanGestureRecognizer's view to its original frame once the pan gesture is done?

Say you have a pinch gesture recognizer and a pan gesture recognizer on an image view:

    @IBAction func pinchPiece(_ pinchGestureRecognizer: UIPinchGestureRecognizer) {
        guard pinchGestureRecognizer.state == .began || pinchGestureRecognizer.state == .changed,
              let piece = pinchGestureRecognizer.view else {
            // After pinch releases, zoom back out.
            if pinchGestureRecognizer.state == .ended {
                UIView.animate(withDuration: 0.3, animations: {
                    pinchGestureRecognizer.view?.transform = CGAffineTransform.identity
                })
            }
            return
        }
        adjustAnchor(for: pinchGestureRecognizer)
        let scale = pinchGestureRecognizer.scale
        piece.transform = piece.transform.scaledBy(x: scale, y: scale)
        pinchGestureRecognizer.scale = 1 // Clear scale so that it is the right delta next time.
    }

    @IBAction func panPiece(_ panGestureRecognizer: UIPanGestureRecognizer) {
        guard panGestureRecognizer.state == .began || panGestureRecognizer.state == .changed,
              let piece = panGestureRecognizer.view else {
            return
        }
        let translation = panGestureRecognizer.translation(in: piece.superview)
        piece.center = CGPoint(x: piece.center.x + translation.x, y: piece.center.y + translation.y)
        panGestureRecognizer.setTranslation(.zero, in: piece.superview)
    }

    public func gestureRecognizer(_ gestureRecognizer: UIGestureRecognizer,
                                  shouldRecognizeSimultaneouslyWith otherGestureRecognizer: UIGestureRecognizer) -> Bool {
        true
    }

The pinch gesture's view resets to its original state after the gesture ends, which happens in its else clause. What would be a good way to do the same for the pan gesture recognizer? Ideally I'd like the gesture recognizers to live in an extension of UIImageView, which also means I can't add a stored property to the extension for tracking the initial state of the image view.

UI Frameworks UIKit Swift UIKit

5

0

1.2k

Dec ’21

How do I restrict pan gesture recognizers to when a pinch gesture is occurring?

How do you accept pan gestures only while the user is in the middle of a pinch gesture? In other words, I'd like to avoid delivering one-finger pan gestures.

    @IBAction func pinchPiece(_ pinchGestureRecognizer: UIPinchGestureRecognizer) {
        guard pinchGestureRecognizer.state == .began || pinchGestureRecognizer.state == .changed,
              let piece = pinchGestureRecognizer.view else {
            // After pinch releases, zoom back out.
            if pinchGestureRecognizer.state == .ended {
                UIView.animate(withDuration: 0.3, animations: {
                    pinchGestureRecognizer.view?.transform = CGAffineTransform.identity
                })
            }
            return
        }
        adjustAnchor(for: pinchGestureRecognizer)
        let scale = pinchGestureRecognizer.scale
        piece.transform = piece.transform.scaledBy(x: scale, y: scale)
        pinchGestureRecognizer.scale = 1 // Clear scale so that it is the right delta next time.
    }

    @IBAction func panPiece(_ panGestureRecognizer: UIPanGestureRecognizer) {
        guard panGestureRecognizer.state == .began || panGestureRecognizer.state == .changed,
              let piece = panGestureRecognizer.view else {
            return
        }
        let translation = panGestureRecognizer.translation(in: piece.superview)
        piece.center = CGPoint(x: piece.center.x + translation.x, y: piece.center.y + translation.y)
        panGestureRecognizer.setTranslation(.zero, in: piece.superview)
    }

    public func gestureRecognizer(_ gestureRecognizer: UIGestureRecognizer,
                                  shouldRecognizeSimultaneouslyWith otherGestureRecognizer: UIGestureRecognizer) -> Bool {
        true
    }

UI Frameworks UIKit Swift UIKit

1

0

876

Nov ’21

Trying to modernize Apple's sample code AVReaderWriter and get it working. Why is this `DispatchGroup` work block not being called?

Apple's sample code "AVReaderWriter: Offline Audio / Video Processing" has the following listing in CyanifyOperation.swift (an NSOperation subclass that stylizes imported video and exports it):

    let writingGroup = dispatch_group_create()
    // Transfer data from input file to output file.
    self.transferVideoTracks(videoReaderOutputsAndWriterInputs, group: writingGroup)
    self.transferPassthroughTracks(passthroughReaderOutputsAndWriterInputs, group: writingGroup)
    // Handle completion.
    let queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
    dispatch_group_notify(writingGroup, queue) {
        // `readingAndWritingDidFinish()` is guaranteed to call `finish()` exactly once.
        self.readingAndWritingDidFinish(assetReader, assetWriter: assetWriter)
    }

How would I go about writing this part in modern Swift so that it compiles and works? I've tried writing it as

    let writingGroup = DispatchGroup()
    // Transfer data from input file to output file.
    self.transferVideoTracks(videoReaderOutputsAndWriterInputs: videoReaderOutputsAndWriterInputs, group: writingGroup)
    self.transferPassthroughTracks(passthroughReaderOutputsAndWriterInputs: passthroughReaderOutputsAndWriterInputs, group: writingGroup)
    // Handle completion.
    writingGroup.notify(queue: .global()) {
        // `readingAndWritingDidFinish()` is guaranteed to call `finish()` exactly once.
        self.readingAndWritingDidFinish(assetReader: assetReader, assetWriter: assetWriter)
    }

However, it's taking an extremely long time for `self.readingAndWritingDidFinish(assetReader: assetReader, assetWriter: assetWriter)` to be called, and my UI is stuck in the ProgressViewController with a loading spinner. Is there something I wrote incorrectly, or missed conceptually, in the Swift 5 version?
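A `DispatchGroup`'s `notify(queue:execute:)` block runs only after every `enter()` on the group has been balanced by a `leave()`. If the ported transfer methods enter the group once per track and leave it in each track's completion handler (presumably as the original does), then any track whose completion never fires, for example because `isDone` never becomes true and `markAsFinished()` is never reached, keeps the notify block from running. The sketch below shows the balanced pattern using only Dispatch; `transfer(track:on:completion:)` is a hypothetical stand-in for one track's sample transfer.

```swift
import Dispatch

/// Hypothetical stand-in for one track's transfer: does some async work,
/// then calls `completion` exactly once.
func transfer(track: Int, on queue: DispatchQueue, completion: @escaping () -> Void) {
    queue.async {
        // ... copy samples for `track` here ...
        completion()
    }
}

let group = DispatchGroup()
let workQueue = DispatchQueue(label: "transfer", attributes: .concurrent)
let finished = DispatchSemaphore(value: 0)

for track in 0..<3 {
    group.enter()                          // one enter() per track...
    transfer(track: track, on: workQueue) {
        group.leave()                      // ...balanced by exactly one leave()
    }
}

group.notify(queue: .global()) {
    // Runs only after all three leave() calls; stands in for
    // readingAndWritingDidFinish(assetReader:assetWriter:).
    finished.signal()
}

// Block this script until notify fires (an app would not wait like this).
let outcome = finished.wait(timeout: .now() + 5)
print(outcome == .success ? "notify ran" : "notify never ran")
```

Removing one `leave()` call from this sketch makes the wait time out, which mirrors the stuck spinner; a breakpoint or log in each completion handler is one way to see which track never finishes.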

Programming Languages Swift Swift AVFoundation

1

0

488

Nov ’21
