Curiosity’s Profile | Apple Developer Forums

When making an AVFoundation video copy, how do you only add particular ranges of the original video for which there exist trajectories

I've been looking through Apple's sample code Building a Feature-Rich App for Sports Analysis - https://developer.apple.com/documentation/vision/building_a_feature-rich_app_for_sports_analysis and its associated WWDC video to learn to reason about AVFoundation and VNDetectTrajectoriesRequest - https://developer.apple.com/documentation/vision/vndetecttrajectoriesrequest.

My goal is to allow the user to import videos (this part I have working: the user sees a UIDocumentBrowserViewController - https://developer.apple.com/documentation/uikit/uidocumentbrowserviewcontroller, picks a video file, and then a copy is made), but I only want segments of the original video copied where trajectories are detected from a ball moving. I've tried as best I can to grasp the two parts, at the very least finding where the video copy is made and where the trajectory request is made.

The full video copy happens in CameraViewController.swift (I'm starting with just imported video for now and not reading live from the device's video camera), line 160:

    func startReadingAsset(_ asset: AVAsset) {
        videoRenderView = VideoRenderView(frame: view.bounds)
        setupVideoOutputView(videoRenderView)

        let displayLink = CADisplayLink(target: self, selector: #selector(handleDisplayLink(_:)))
        displayLink.preferredFramesPerSecond = 0
        displayLink.isPaused = true
        displayLink.add(to: RunLoop.current, forMode: .default)

        guard let track = asset.tracks(withMediaType: .video).first else {
            AppError.display(AppError.videoReadingError(reason: "No video tracks found in AVAsset."), inViewController: self)
            return
        }

        let playerItem = AVPlayerItem(asset: asset)
        let player = AVPlayer(playerItem: playerItem)
        let settings = [
            String(kCVPixelBufferPixelFormatTypeKey): kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
        ]
        let output = AVPlayerItemVideoOutput(pixelBufferAttributes: settings)
        playerItem.add(output)
        player.actionAtItemEnd = .pause
        player.play()

        self.displayLink = displayLink
        self.playerItemOutput = output
        self.videoRenderView.player = player

        let affineTransform = track.preferredTransform.inverted()
        let angleInDegrees = atan2(affineTransform.b, affineTransform.a) * CGFloat(180) / CGFloat.pi
        var orientation: UInt32 = 1
        switch angleInDegrees {
        case 0:
            orientation = 1 // Recording button is on the right
        case 180, -180:
            orientation = 3 // abs(180) degree rotation recording button is on the right
        case 90:
            orientation = 8 // 90 degree CW rotation recording button is on the top
        case -90:
            orientation = 6 // 90 degree CCW rotation recording button is on the bottom
        default:
            orientation = 1
        }
        videoFileBufferOrientation = CGImagePropertyOrientation(rawValue: orientation)!
        videoFileFrameDuration = track.minFrameDuration
        displayLink.isPaused = false
    }

    @objc
    private func handleDisplayLink(_ displayLink: CADisplayLink) {
        guard let output = playerItemOutput else {
            return
        }

        videoFileReadingQueue.async {
            let nextTimeStamp = displayLink.timestamp + displayLink.duration
            let itemTime = output.itemTime(forHostTime: nextTimeStamp)
            guard output.hasNewPixelBuffer(forItemTime: itemTime) else {
                return
            }
            guard let pixelBuffer = output.copyPixelBuffer(forItemTime: itemTime, itemTimeForDisplay: nil) else {
                return
            }
            // Create sample buffer from pixel buffer
            var sampleBuffer: CMSampleBuffer?
            var formatDescription: CMVideoFormatDescription?
            CMVideoFormatDescriptionCreateForImageBuffer(allocator: nil, imageBuffer: pixelBuffer, formatDescriptionOut: &formatDescription)
            let duration = self.videoFileFrameDuration
            var timingInfo = CMSampleTimingInfo(duration: duration, presentationTimeStamp: itemTime, decodeTimeStamp: itemTime)
            CMSampleBufferCreateForImageBuffer(allocator: nil,
                                               imageBuffer: pixelBuffer,
                                               dataReady: true,
                                               makeDataReadyCallback: nil,
                                               refcon: nil,
                                               formatDescription: formatDescription!,
                                               sampleTiming: &timingInfo,
                                               sampleBufferOut: &sampleBuffer)
            if let sampleBuffer = sampleBuffer {
                self.outputDelegate?.cameraViewController(self, didReceiveBuffer: sampleBuffer, orientation: self.videoFileBufferOrientation)
                DispatchQueue.main.async {
                    let stateMachine = self.gameManager.stateMachine
                    if stateMachine.currentState is GameManager.SetupCameraState {
                        // Once we received first buffer we are ready to proceed to the next state
                        stateMachine.enter(GameManager.DetectingBoardState.self)
                    }
                }
            }
        }
    }

Line 139, self.outputDelegate?.cameraViewController(self, didReceiveBuffer: sampleBuffer, orientation: self.videoFileBufferOrientation), is where the video sample buffer is passed to the Vision framework subsystem for analyzing trajectories, the second part. This delegate callback is implemented in GameViewController.swift on line 335:

    // Perform the trajectory request in a separate dispatch queue.
    trajectoryQueue.async {
        do {
            try visionHandler.perform([self.detectTrajectoryRequest])
            if let results = self.detectTrajectoryRequest.results {
                DispatchQueue.main.async {
                    self.processTrajectoryObservations(controller, results)
                }
            }
        } catch {
            AppError.display(error, inViewController: self)
        }
    }

Trajectories found are drawn over the video in self.processTrajectoryObservations(controller, results). Where I'm stuck now is modifying this so that instead of drawing the trajectories, the new video only copies the parts of the original video where trajectories were detected in the frame.

Media Technologies Audio Vision AVFoundation

0

1.1k

Apr ’21

Constrain view's top anchor to just at the edge of sensor housing

What might be a good way to constrain a view's top anchor to be just at the edge of a device's Face ID sensor housing if it has one? This view is a product photo that would be clipped too much if it ignored the top safe area inset, but if it was positioned relative to the top safe area margin this wouldn't be ideal either because of the slight gap between the sensor housing and the view (the view is a photo of pants cropped at the waist). What might be a good approach here?

UI Frameworks UIKit UIKit Auto Layout

0

1.1k

Jul ’23

Migrating from value semantics model stored on iCloud to Core Data model

I have an app that currently depends on fetching the model through CloudKit, and is composed of value types. I'm considering adding Core Data support so that record modifications are robust regardless of network conditions. Core Data resources seem to always assume a model layer with reference semantics, so I'm not sure where to begin. Should I keep my top-level model type a struct? Can I? If I move my model to reference semantics, how might I bridge from past model instances that are fetched through CloudKit and then decoded? Thank you in advance.

Programming Languages Swift Swift CloudKit Core Data

0

566

Sep ’21

Strategy for avoiding or removing duplicate Vision trajectory request results

I am saving time ranges from an input video asset where trajectories are found, then exporting only those segments to an output video file. Currently I track these time ranges in a stored property var timeRangesOfInterest: [Double : CMTimeRange], which is set in the trajectory request's completion handler:

    func completionHandler(request: VNRequest, error: Error?) {
        guard let request = request as? VNDetectTrajectoriesRequest else { return }

        if let results = request.results, results.count > 0 {
            for result in results {
                var timeRange = result.timeRange
                timeRange.start = timeRange.start - self.assetWriterStartTime
                self.timeRangesOfInterest[timeRange.start.seconds] = timeRange
            }
        }
    }

Then these time ranges of interest are used in an export session to only export those segments:

    /*
     Finish writing, and asynchronously evaluate the results from writing
     the samples.
     */
    assetReaderWriter.writingCompleted { result in
        self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.1 }) { result in
            completionHandler(result)
        }
    }

Unfortunately, I'm getting repeated trajectory video segments in the outputted video. Is this maybe because trajectory requests return "in progress" repeated trajectory results with slightly different time range start times? What might be a good strategy for avoiding or removing them? I noticed trajectory segments will appear out of order in the output as well.
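
One possible cleanup step, as a minimal sketch rather than a confirmed fix: sort the stored ranges by start time and merge any that overlap before exporting. Swift's Dictionary does not guarantee iteration order, so mapping over timeRangesOfInterest can yield segments in arbitrary order, which may explain the out-of-order output. The function name mergedTimeRanges is hypothetical, and the idea that the duplicates come from overlapping updates of the same trajectory is an assumption.

```swift
import CoreMedia

/// Sorts time ranges by start time and merges any that overlap or touch,
/// so each moment of the source video is exported at most once.
/// (Hypothetical helper; not part of Apple's sample code.)
func mergedTimeRanges(_ ranges: [CMTimeRange]) -> [CMTimeRange] {
    // Dictionary iteration order is unspecified, so sort explicitly.
    let sorted = ranges
        .filter { $0.isValid && !$0.isEmpty }
        .sorted { $0.start < $1.start }

    var merged: [CMTimeRange] = []
    for range in sorted {
        if let last = merged.last, range.start <= last.end {
            // Overlapping or adjacent: extend the previous range instead of appending.
            merged[merged.count - 1] = CMTimeRange(start: last.start,
                                                   end: max(last.end, range.end))
        } else {
            merged.append(range)
        }
    }
    return merged
}
```

With this in place, the export call could pass mergedTimeRanges(Array(timeRangesOfInterest.values)) instead of the raw dictionary values.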

Media Technologies Audio Vision AVFoundation

0

844

Dec ’21

Accurately getting timestamps for start and end of tennis rallies

I'm building a feature to automatically edit out all the downtime of a tennis video. I have a partial implementation that stores the start and end times of Vision trajectory detections and writes only those segments to an AVFoundation export session. I've encountered a major issue, which is that the trajectories returned end whenever the ball bounces, so each segment is just one tennis shot and nowhere close to an entire rally with multiple bounces. I'm unsure whether I should continue down the trajectory route, maybe stitching together the trajectories and somehow only splitting at the start and end of a rally. Any general guidance would be appreciated. Is there a different Vision or ML approach that would more accurately model the start and end time of a rally? I considered creating a custom action classifier to classify frames as either "playing tennis" or "inactivity," but I started with Apple's trajectory detection since it was already built and trained. Maybe a custom classifier would be needed, but I'm not sure.
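
The stitching idea mentioned above could be sketched as follows, assuming that consecutive shots within a rally are separated by only a short pause. The function name coalesce and the maxGap threshold are hypothetical; the right threshold would have to be tuned on real footage, and this heuristic cannot tell a long pause within a rally from the end of a point.

```swift
import CoreMedia

/// Joins time ranges whose gaps are no longer than `maxGap`, so several
/// single-shot trajectories collapse into one rally-length segment.
/// `maxGap` is a hypothetical tuning parameter, not a Vision API value.
func coalesce(_ ranges: [CMTimeRange], maxGap: CMTime) -> [CMTimeRange] {
    let sorted = ranges.sorted { $0.start < $1.start }
    var rallies: [CMTimeRange] = []

    for shot in sorted {
        if let current = rallies.last, shot.start - current.end <= maxGap {
            // The pause since the previous shot is short: same rally.
            rallies[rallies.count - 1] = CMTimeRange(start: current.start,
                                                     end: max(current.end, shot.end))
        } else {
            // Long pause (or first shot): start a new rally.
            rallies.append(shot)
        }
    }
    return rallies
}
```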

Machine Learning & AI Core ML Core ML Vision wwdc21-10039 wwdc21-10040

0

613

Dec ’21

What would be the most accurate way to classify both playing a tennis rally and the downtime in between rallies?

My goal is to mark any tennis video's timestamps of both the start of each rally/point and the end of each rally/point. I tried trajectory detection, but the "end time" is when the ball bounces rather than when the rally/point ends. I'm not quite sure what direction to go from here to improve on this. Would action classification of body poses in each frame (two classes, "playing" and "not playing") be the best way to split the video into segments? A different technique?

Machine Learning & AI Core ML Core ML Vision Create ML

0

537

Dec ’21

After exporting an action classifier from Create ML and importing it into Xcode, how do you use it to make predictions?

I followed Apple's guidance in their articles Creating an Action Classifier Model, Gathering Training Videos for an Action Classifier, and Building an Action Classifier Data Source. With this Core ML model file now imported in Xcode, how do I use it to classify video frames? For each video frame I call:

    do {
        let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
        try requestHandler.perform([self.detectHumanBodyPoseRequest])
    } catch {
        print("Unable to perform the request: \(error.localizedDescription).")
    }

But it's unclear to me how to use the results of the VNDetectHumanBodyPoseRequest, which come back as the type [VNHumanBodyPoseObservation]?. How would I feed the results into my custom classifier, which has an automatically generated model class TennisActionClassifier.swift? The classifier is for making predictions on the frame's body poses, labeling the actions as either playing a rally/point or not playing.
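
A rough sketch of one way this could work, under stated assumptions: an action classifier takes a window of poses spanning many frames rather than a single frame, so the per-frame keypoints are buffered and concatenated into one MLMultiArray. The PoseWindow type is hypothetical, and the window size must match whatever the model was trained with. The keypointsMultiArray() and MLMultiArray(concatenating:axis:dataType:) calls are existing Vision and Core ML APIs.

```swift
import CoreML
import Vision

/// Buffers per-frame pose arrays until a full prediction window is available.
/// (Hypothetical helper type; `size` must match the model's prediction window.)
struct PoseWindow {
    let size: Int
    private(set) var frames: [MLMultiArray] = []

    init(size: Int) {
        self.size = size
    }

    var isFull: Bool { frames.count >= size }

    /// Appends the keypoints of one detected body pose.
    mutating func append(_ observation: VNHumanBodyPoseObservation) throws {
        append(try observation.keypointsMultiArray())
    }

    /// Appends an already extracted keypoints array.
    mutating func append(_ poses: MLMultiArray) {
        frames.append(poses)
    }

    /// Concatenates the buffered frames along the first axis into a single
    /// model input, then clears the buffer for the next window.
    mutating func makeModelInput() -> MLMultiArray {
        let input = MLMultiArray(concatenating: Array(frames.prefix(size)),
                                 axis: 0,
                                 dataType: .float32)
        frames.removeAll()
        return input
    }
}
```

Once the window is full, the input could be passed to the generated class, for example try classifier.prediction(poses: input).label. The poses input name and label output name are assumptions about what Create ML generated, so check the interface shown in TennisActionClassifier.swift.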

Machine Learning & AI Core ML Core ML Create ML wwdc21-10037 wwdc21-10040

0

637

Dec ’21

Trying to export time ranges from a composition using AVAssetExportSession - "Decode timestamp is earlier than previous sample's decode timestamp"

Modifying guidance given in an answer on AVFoundation + Vision trajectory detection, I'm instead saving time ranges of frames that have a specific ML label from my custom action classifier:

    private lazy var detectHumanBodyPoseRequest: VNDetectHumanBodyPoseRequest = {
        let detectHumanBodyPoseRequest = VNDetectHumanBodyPoseRequest(completionHandler: completionHandler)
        return detectHumanBodyPoseRequest
    }()

    var timeRangesOfInterest: [Int : CMTimeRange] = [:]

    private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, asset completionHandler: @escaping FinishHandler) {
        if isCancelled {
            completionHandler(.success(.cancelled))
            return
        }

        // Handle any error during processing of the video.
        guard sampleTransferError == nil else {
            assetReaderWriter.cancel()
            completionHandler(.failure(sampleTransferError!))
            return
        }

        // Evaluate the result reading the samples.
        let result = assetReaderWriter.readingCompleted()
        if case .failure = result {
            completionHandler(result)
            return
        }

        /*
         Finish writing, and asynchronously evaluate the results from writing
         the samples.
         */
        assetReaderWriter.writingCompleted { result in
            self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.value }) { result in
                completionHandler(result)
            }
        }
    }

    func exportVideoTimeRanges(timeRanges: [CMTimeRange], completion: @escaping (Result<OperationStatus, Error>) -> Void) {
        let inputVideoTrack = self.asset.tracks(withMediaType: .video).first!

        let composition = AVMutableComposition()
        let compositionTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)!

        var insertionPoint: CMTime = .zero
        for timeRange in timeRanges {
            try! compositionTrack.insertTimeRange(timeRange, of: inputVideoTrack, at: insertionPoint)
            insertionPoint = insertionPoint + timeRange.duration
        }

        let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetHighestQuality)!
        try? FileManager.default.removeItem(at: self.outputURL)
        exportSession.outputURL = self.outputURL
        exportSession.outputFileType = .mov
        exportSession.exportAsynchronously {
            var result: Result<OperationStatus, Error>
            switch exportSession.status {
            case .completed:
                result = .success(.completed)
            case .cancelled:
                result = .success(.cancelled)
            case .failed:
                // The `error` property is non-nil in the `.failed` status.
                result = .failure(exportSession.error!)
            default:
                fatalError("Unexpected terminal export session status: \(exportSession.status).")
            }
            print("export finished: \(exportSession.status.rawValue) - \(exportSession.error)")
            completion(result)
        }
    }

This worked fine with results vended from Apple's trajectory detection, but using my custom action classifier TennisActionClassifier (Core ML model exported from Create ML), I get the console error getSubtractiveDecodeDuration signalled err=-16364 (kMediaSampleTimingGeneratorError_InvalidTimeStamp) (Decode timestamp is earlier than previous sample's decode timestamp.) at MediaSampleTimingGenerator.c:180. Why might this be?
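
As a diagnostic sketch only, it may help to check the ranges before they are inserted into the composition. The cause of the error above is not confirmed here; one assumption worth testing is that the ranges arrive out of order or overlapping, since the values of a Dictionary come back in unspecified order. The helper firstOrderingViolation is hypothetical.

```swift
import CoreMedia

/// Returns the index of the first range that starts before the previous
/// range ends (out of order or overlapping), or nil if the list is clean.
/// (Hypothetical diagnostic helper.)
func firstOrderingViolation(in ranges: [CMTimeRange]) -> Int? {
    for index in ranges.indices.dropFirst()
    where ranges[index].start < ranges[index - 1].end {
        return index
    }
    return nil
}
```

Calling this on the array passed to exportVideoTimeRanges, and logging the offending index, would show whether ordering is a factor before looking elsewhere.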

Machine Learning & AI Core ML Core ML Vision AVFoundation Core Media

0

828

Dec ’21

What is a good way to track the time ranges associated with a particular action classifier label?

After creating a custom action classifier in Create ML, previewing it (see the bottom of the page) with an input video shows the label associated with a segment of the video. What would be a good way to store the duration for a given label, say, each CMTimeRange of a segment of video frames that are classified as containing "Jumping Jacks"?

I previously found that storing time ranges of trajectory results was convenient, since each VNTrajectoryObservation vended by Apple had an associated CMTimeRange. However, using my custom action classifier instead, each VNObservation result's CMTimeRange has a duration value that's always 0.

    func completionHandler(request: VNRequest, error: Error?) {
        guard let results = request.results as? [VNHumanBodyPoseObservation] else {
            return
        }

        if let result = results.first {
            storeObservation(result)
        }

        do {
            for result in results where try self.getLastTennisActionType(from: [result]) == .playing {
                var fileRelativeTimeRange = result.timeRange
                fileRelativeTimeRange.start = fileRelativeTimeRange.start - self.assetWriterStartTime
                self.timeRangesOfInterest[Int(fileRelativeTimeRange.start.seconds)] = fileRelativeTimeRange
            }
        } catch {
            print("Unable to perform the request: \(error.localizedDescription).")
        }
    }

In this case I'm interested in frames with the label "Playing" and successfully classify them, but I'm not sure where to go from here to track the duration of video segments with consecutive frames that have that label.
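
One way to track such durations, sketched under the assumption that each classified frame can be recorded with its presentation time and predicted label: group runs of consecutive frames with the same label and turn each run into a CMTimeRange. The LabeledFrame type and the timeRanges(of:in:frameDuration:) function are hypothetical names for illustration.

```swift
import CoreMedia

/// One classified frame: its presentation time and predicted label.
/// (Hypothetical type for illustration.)
struct LabeledFrame {
    let time: CMTime
    let label: String
}

/// Groups consecutive frames that carry `label` into time ranges. Each range
/// runs from the first matching frame to the end of the last matching frame
/// (its time plus one `frameDuration`).
func timeRanges(of label: String,
                in frames: [LabeledFrame],
                frameDuration: CMTime) -> [CMTimeRange] {
    var ranges: [CMTimeRange] = []
    var runStart: CMTime?
    var lastMatch: CMTime = .zero

    for frame in frames {
        if frame.label == label {
            // Start a new run if none is open, then extend it.
            if runStart == nil { runStart = frame.time }
            lastMatch = frame.time
        } else if let start = runStart {
            // A different label closes the open run.
            ranges.append(CMTimeRange(start: start, end: lastMatch + frameDuration))
            runStart = nil
        }
    }
    // Close a run that extends to the final frame.
    if let start = runStart {
        ranges.append(CMTimeRange(start: start, end: lastMatch + frameDuration))
    }
    return ranges
}
```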

Machine Learning & AI Core ML Core ML Core Media Create ML wwdc21-10039

0

726

Dec ’21

Apple sample code "Detecting Human Actions in a Live Video Feed" - accessing the observations associated with an action prediction

I'm having trouble reasoning about and modifying the Detecting Human Actions in a Live Video Feed sample code since I'm new to Combine.

    // ---- [MLMultiArray?] -- [MLMultiArray?] ----
    // Make an activity prediction from the window.
    .map(predictActionWithWindow)

    // ---- ActionPrediction -- ActionPrediction ----
    // Send the action prediction to the delegate.
    .sink(receiveValue: sendPrediction)

These are the final two operators of the video processing pipeline, where the action prediction occurs. In either the implementation for private func predictActionWithWindow(_ currentWindow: [MLMultiArray?]) -> ActionPrediction or for private func sendPrediction(_ actionPrediction: ActionPrediction), how might I access the results of a VNDetectHumanBodyPoseRequest that are retrieved and scoped in a function called earlier in the daisy chain? When I did this imperatively, I accessed results in the VNDetectHumanBodyPoseRequest completion handler, but I'm not sure how data flow would work with Combine's programming model. I want to associate predictions with the observation results they're based on so that I can store the time range of a given prediction label.
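
A general Combine pattern that might apply here, shown as a self-contained sketch rather than a change to Apple's sample: values are only visible downstream if an operator passes them along, so each stage can emit a value that bundles the derived result with the source data it came from. PoseFrame, TimedPrediction, and classify are hypothetical stand-ins, and collect(3) is a simplified substitute for the sample's windowing logic.

```swift
import Combine

// Hypothetical stand-ins for the sample's pose and prediction types.
struct PoseFrame {
    let timestamp: Double
    let keypoints: [Double]
}

struct TimedPrediction {
    let label: String
    let timestamps: [Double]   // timestamps of the frames the prediction is based on
}

// Placeholder classifier; a real pipeline would call the Core ML model here.
func classify(_ window: [PoseFrame]) -> String {
    let values = window.flatMap { $0.keypoints }
    let mean = values.isEmpty ? 0 : values.reduce(0, +) / Double(values.count)
    return mean > 0.5 ? "playing" : "not playing"
}

let poseFrames = PassthroughSubject<PoseFrame, Never>()
var predictions: [TimedPrediction] = []

let subscription = poseFrames
    // Gather a fixed-size window of frames.
    .collect(3)
    // Predict from the window, keeping the source timestamps with the label.
    .map { window in
        TimedPrediction(label: classify(window),
                        timestamps: window.map { $0.timestamp })
    }
    // The sink now sees both the label and the frames it was based on.
    .sink { predictions.append($0) }
```

In the sample's pipeline, the equivalent change would be widening the element type of the earlier stages (for example, a tuple or struct that carries the observation next to its MLMultiArray) so the final stages can read it.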

Programming Languages Swift Swift Combine Core ML Vision

0

811

Jan ’22

Gathering Training Videos for an Action Classifier - video quality?

How should I think about video quality (if it's important) when gathering training videos? Does higher video quality of training data make for better predictions, or should it more closely match the common use case (1080p I suppose, thinking about iPhones broadly)?

Machine Learning & AI Core ML Core ML Create ML

0

686

Feb ’22

Progress estimate for a `VNVideoProcessor` operation

What might be a good approach to estimating the progress of a VNVideoProcessor operation? I'd like to show a progress bar that's useful enough, like one based on the progress Apple vends for the photo picker or exports. This would make a world of difference compared to a UIActivityIndicatorView, but I'm not sure how to approach hand-rolling this (or if that would even be a good idea). I filed an API enhancement request for this, FB9888210.
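
A hand-rolled estimate could be sketched like this, assuming the timestamp of the most recently analyzed frame is available (for example from the time range of observations delivered to a request's completion handler, which is an assumption about the setup). The fraction is simply the elapsed media time divided by the duration being analyzed. The function name updateProgress is hypothetical.

```swift
import CoreMedia
import Foundation

/// Maps the timestamp of the most recently analyzed frame onto a Foundation
/// `Progress`, as a fraction of the time range being analyzed.
/// (Hypothetical helper; VNVideoProcessor does not vend progress itself.)
func updateProgress(_ progress: Progress, currentTime: CMTime, in timeRange: CMTimeRange) {
    let total = timeRange.duration.seconds
    guard total.isFinite, total > 0 else { return }

    let elapsed = (currentTime - timeRange.start).seconds
    let fraction = min(max(elapsed / total, 0), 1)   // clamp to 0...1

    progress.totalUnitCount = 1000
    progress.completedUnitCount = Int64(fraction * 1000)
}
```

A UIProgressView can observe the same Progress object through its observedProgress property, provided updates are made on the main thread.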

UI Frameworks UIKit Design UIKit Vision

0

594

Feb ’22

How does the action duration parameter affect performance?

For a Create ML activity classifier, I’m classifying “playing” tennis (the points or rallies) and a second class “not playing” to be the negative class. I’m not sure what to specify for the action duration parameter given how variable a tennis point or rally can be, but I went with 10 seconds since it seems like the average duration for both the “playing” and “not playing” labels. When choosing this parameter however, I’m wondering if it affects performance, both speed of video processing and accuracy. Would the Vision framework return more results with smaller action durations?
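
To make the trade-off concrete, here is the arithmetic as a small sketch. It assumes the classifier consumes one window of poses per prediction, that the window length in frames is the action duration times the frame rate, and that windows do not overlap. Both function names are hypothetical.

```swift
/// Frames of pose data needed for one prediction.
/// (Assumes window size = action duration x frame rate.)
func predictionWindowSize(actionDuration: Double, frameRate: Double) -> Int {
    Int((actionDuration * frameRate).rounded())
}

/// Number of predictions across a video when windows do not overlap.
func predictionCount(videoDuration: Double, actionDuration: Double) -> Int {
    guard actionDuration > 0 else { return 0 }
    return Int(videoDuration / actionDuration)
}
```

Under these assumptions, a 10-second action duration at 30 fps needs 300 frames per prediction and gives 6 predictions per minute of video, while a 2-second duration needs 60 frames and gives 30. The number of per-frame pose observations from Vision stays the same either way; only the number of classifier predictions changes.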

Developer Tools & Services General Performance Core ML Vision Create ML

0

707

Feb ’22

Is it best to crop other people out of training videos for a Create ML activity classifier?

My activity classifier is used in tennis sessions, where there are necessarily multiple people on the court. There is also a decent chance other courts' players will be in the shot, depending on the angle and lens. For my training data, would it be best to crop out adjacent courts?

Machine Learning & AI Core ML Core ML Vision Create ML

0

719

Feb ’22

How do you determine which thread's run loop should receive events, in the context of Combine publishers?

This Mac Catalyst tutorial (https://developer.apple.com/tutorials/mac-catalyst/adding-items-to-the-sidebar) shows the following code snippet:

    recipeCollectionsSubscriber = dataStore.$collections
        .receive(on: RunLoop.main)
        .sink { [weak self] _ in
            guard let self = self else { return }
            let snapshot = self.collectionsSnapshot()
            self.dataSource.apply(snapshot, to: .collections, animatingDifferences: true)
        }

App & System Services General Combine

0

632

May ’22
