Post · Replies · Boosts · Views · Activity

What would be the most accurate way to classify both playing a tennis rally and the downtime in between rallies?
My goal is to mark the timestamps of both the start and the end of each rally/point in any tennis video. I tried trajectory detection, but the "end time" it gives is when the ball bounces rather than when the rally/point actually ends. I'm not quite sure what direction to take from here to improve on this. Would action classification of body poses in each frame (two classes, "playing" and "not playing") be the best way to split the video into segments? Or a different technique?
Replies: 0 · Boosts: 0 · Views: 537 · Activity: Dec ’21
After exporting an action classifier from Create ML and importing it into Xcode, how do you use it to make predictions?
I followed Apple's guidance in their articles Creating an Action Classifier Model, Gathering Training Videos for an Action Classifier, and Building an Action Classifier Data Source. With this Core ML model file now imported in Xcode, how do I use it to classify video frames? For each video frame I call

    do {
        let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
        try requestHandler.perform([self.detectHumanBodyPoseRequest])
    } catch {
        print("Unable to perform the request: \(error.localizedDescription).")
    }

but it's unclear to me how to use the results of the VNDetectHumanBodyPoseRequest, which come back as the type [VNHumanBodyPoseObservation]?. How would I feed the results into my custom classifier, whose automatically generated model class is TennisActionClassifier.swift? The classifier makes predictions on each frame's body poses, labeling the action as either playing a rally/point or not playing.
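For reference, here's the rough shape I'm imagining, assuming the generated model exposes a prediction(poses:) method for an input feature named poses (the exact generated name depends on the model; windowSize and poseWindow are placeholders I made up for the sketch):

    import Vision
    import CoreML

    var poseWindow: [MLMultiArray] = []      // rolling window of per-frame keypoints
    let windowSize = 60                      // frames per prediction; depends on the model

    func handlePoses(_ observations: [VNHumanBodyPoseObservation]) throws {
        guard let observation = observations.first else { return }

        // Each observation can be converted to an MLMultiArray of keypoints.
        poseWindow.append(try observation.keypointsMultiArray())

        guard poseWindow.count == windowSize else { return }

        // Stack the per-frame arrays into the single input the classifier expects.
        let input = MLMultiArray(concatenating: poseWindow, axis: 0, dataType: .float32)

        // The prediction(poses:) method and output.label are assumptions based on
        // Create ML's usual code generation; check the auto-generated model class.
        let classifier = try TennisActionClassifier(configuration: MLModelConfiguration())
        let output = try classifier.prediction(poses: input)
        print("Predicted action: \(output.label)")

        poseWindow.removeAll()
    }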
Replies: 0 · Boosts: 0 · Views: 637 · Activity: Dec ’21
Trying to export time ranges from a composition using AVAssetExportSession - "Decode timestamp is earlier than previous sample's decode timestamp"
Modifying the guidance given in an answer on AVFoundation + Vision trajectory detection, I'm instead saving the time ranges of frames that have a specific ML label from my custom action classifier:

    private lazy var detectHumanBodyPoseRequest: VNDetectHumanBodyPoseRequest = {
        let detectHumanBodyPoseRequest = VNDetectHumanBodyPoseRequest(completionHandler: completionHandler)
        return detectHumanBodyPoseRequest
    }()

    var timeRangesOfInterest: [Int: CMTimeRange] = [:]

    private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, asset completionHandler: @escaping FinishHandler) {
        if isCancelled {
            completionHandler(.success(.cancelled))
            return
        }
        // Handle any error during processing of the video.
        guard sampleTransferError == nil else {
            assetReaderWriter.cancel()
            completionHandler(.failure(sampleTransferError!))
            return
        }
        // Evaluate the result of reading the samples.
        let result = assetReaderWriter.readingCompleted()
        if case .failure = result {
            completionHandler(result)
            return
        }
        /* Finish writing, and asynchronously evaluate
           the results from writing the samples. */
        assetReaderWriter.writingCompleted { result in
            self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.value }) { result in
                completionHandler(result)
            }
        }
    }

    func exportVideoTimeRanges(timeRanges: [CMTimeRange], completion: @escaping (Result<OperationStatus, Error>) -> Void) {
        let inputVideoTrack = self.asset.tracks(withMediaType: .video).first!
        let composition = AVMutableComposition()
        let compositionTrack = composition.addMutableTrack(withMediaType: .video,
                                                           preferredTrackID: kCMPersistentTrackID_Invalid)!
        var insertionPoint: CMTime = .zero
        for timeRange in timeRanges {
            try! compositionTrack.insertTimeRange(timeRange, of: inputVideoTrack, at: insertionPoint)
            insertionPoint = insertionPoint + timeRange.duration
        }

        let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetHighestQuality)!
        try? FileManager.default.removeItem(at: self.outputURL)
        exportSession.outputURL = self.outputURL
        exportSession.outputFileType = .mov
        exportSession.exportAsynchronously {
            var result: Result<OperationStatus, Error>
            switch exportSession.status {
            case .completed:
                result = .success(.completed)
            case .cancelled:
                result = .success(.cancelled)
            case .failed:
                // The `error` property is non-nil in the `.failed` status.
                result = .failure(exportSession.error!)
            default:
                fatalError("Unexpected terminal export session status: \(exportSession.status).")
            }
            print("export finished: \(exportSession.status.rawValue) - \(String(describing: exportSession.error))")
            completion(result)
        }
    }

This worked fine with results vended from Apple's trajectory detection, but using my custom action classifier TennisActionClassifier (a Core ML model exported from Create ML), I get the console error

    getSubtractiveDecodeDuration signalled err=-16364 (kMediaSampleTimingGeneratorError_InvalidTimeStamp) (Decode timestamp is earlier than previous sample's decode timestamp.) at MediaSampleTimingGenerator.c:180

Why might this be?
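One thing I've considered, as a guess: timeRangesOfInterest.map { $0.value } doesn't come back in chronological order (dictionary order is undefined), and overlapping or out-of-order ranges inserted into a composition could plausibly produce exactly this kind of decode-timestamp error. A hedged sketch of sorting and coalescing the ranges before insertion:

    import CoreMedia

    /// Sorts time ranges and merges any that overlap or touch, so the
    /// composition receives strictly increasing, non-overlapping ranges.
    func normalizedTimeRanges(_ ranges: [CMTimeRange]) -> [CMTimeRange] {
        let sorted = ranges.sorted { $0.start < $1.start }
        var merged: [CMTimeRange] = []

        for range in sorted {
            if let last = merged.last, last.end >= range.start {
                // Extend the previous range instead of inserting an overlap.
                merged[merged.count - 1] = CMTimeRange(start: last.start,
                                                       end: max(last.end, range.end))
            } else {
                merged.append(range)
            }
        }
        return merged
    }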
Replies: 0 · Boosts: 0 · Views: 831 · Activity: Dec ’21
What is a good way to track the time ranges associated with a particular action classifier label?
After creating a custom action classifier in Create ML, previewing it (see the bottom of the page) with an input video shows the label associated with each segment of the video. What would be a good way to store the duration for a given label, say, each CMTimeRange of a segment of video frames classified as containing "Jumping Jacks"? I previously found that storing time ranges of trajectory results was convenient, since each VNTrajectoryObservation vended by Apple had an associated CMTimeRange. However, using my custom action classifier instead, each VNObservation result's CMTimeRange has a duration value that's always 0.

    func completionHandler(request: VNRequest, error: Error?) {
        guard let results = request.results as? [VNHumanBodyPoseObservation] else { return }
        if let result = results.first {
            storeObservation(result)
        }
        do {
            for result in results where try self.getLastTennisActionType(from: [result]) == .playing {
                var fileRelativeTimeRange = result.timeRange
                fileRelativeTimeRange.start = fileRelativeTimeRange.start - self.assetWriterStartTime
                self.timeRangesOfInterest[Int(fileRelativeTimeRange.start.seconds)] = fileRelativeTimeRange
            }
        } catch {
            print("Unable to perform the request: \(error.localizedDescription).")
        }
    }

In this case I'm interested in frames with the label "Playing" and can classify them successfully, but I'm not sure where to go from here to track the duration of video segments whose consecutive frames share that label.
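The direction I'm considering, as a rough sketch (currentPlayingRange and completedPlayingRanges are hypothetical properties, not anything from Apple's API): keep a "current" range open and extend its end time while consecutive frames keep the same label, closing it when the label flips:

    import CoreMedia

    var currentPlayingRange: CMTimeRange?          // segment being built
    var completedPlayingRanges: [CMTimeRange] = [] // finished segments

    /// Call once per frame with the frame's timestamp and its classifier label.
    func track(frameTime: CMTime, isPlaying: Bool) {
        if isPlaying {
            if let range = currentPlayingRange {
                // Same label as before: stretch the open segment to this frame.
                currentPlayingRange = CMTimeRange(start: range.start, end: frameTime)
            } else {
                // Label just flipped to "playing": open a new zero-length segment.
                currentPlayingRange = CMTimeRange(start: frameTime, duration: .zero)
            }
        } else if let range = currentPlayingRange {
            // Label flipped back to "not playing": close the segment.
            completedPlayingRanges.append(range)
            currentPlayingRange = nil
        }
    }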
Replies: 0 · Boosts: 0 · Views: 727 · Activity: Dec ’21
How do you track when a VNVideoProcessor analysis is finished?
I'm adopting and transitioning to VNVideoProcessor, away from performing Vision requests on individual frames, since it does the same thing more concisely. However, I'm not sure how to detect when analysis of a video is finished. Previously, when reading frames with AVFoundation, I could check with

    // Get the next sample from the asset reader output.
    guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
        // The asset reader output has no more samples to vend.
        isDone = true
        break
    }

What would be the equivalent when using VNVideoProcessor?
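For context, the approach I'm experimenting with, under the assumption that analyze(_:) is a blocking call that returns (or throws) once the requested time range has been processed: run it on a background queue and treat the return as the "finished" signal.

    import Vision
    import CoreMedia

    func analyzeVideo(at url: URL, completion: @escaping (Error?) -> Void) {
        DispatchQueue.global(qos: .userInitiated).async {
            do {
                let processor = VNVideoProcessor(url: url)
                let request = VNDetectHumanBodyPoseRequest { request, error in
                    // Per-frame results arrive here while analyze(_:) is running.
                }
                try processor.addRequest(request,
                                         processingOptions: VNVideoProcessorRequestProcessingOptions())
                // analyze(_:) blocks until the whole time range has been processed,
                // so returning from it serves as the "analysis finished" signal.
                try processor.analyze(CMTimeRange(start: .zero, duration: .positiveInfinity))
                DispatchQueue.main.async { completion(nil) }
            } catch {
                DispatchQueue.main.async { completion(error) }
            }
        }
    }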
Replies: 1 · Boosts: 0 · Views: 740 · Activity: Jan ’22
Apple sample code "Detecting Human Actions in a Live Video Feed" - accessing the observations associated with an action prediction
I'm having trouble reasoning about and modifying the Detecting Human Actions in a Live Video Feed sample code, since I'm new to Combine.

    // ---- [MLMultiArray?] -- [MLMultiArray?] ----
    // Make an activity prediction from the window.
    .map(predictActionWithWindow)
    // ---- ActionPrediction -- ActionPrediction ----
    // Send the action prediction to the delegate.
    .sink(receiveValue: sendPrediction)

These are the final two operators of the video-processing pipeline, where the action prediction occurs. In the implementation of either private func predictActionWithWindow(_ currentWindow: [MLMultiArray?]) -> ActionPrediction or private func sendPrediction(_ actionPrediction: ActionPrediction), how might I access the results of a VNDetectHumanBodyPoseRequest that's retrieved and scoped in a function called earlier in the chain? When I did this imperatively, I accessed the results in the VNDetectHumanBodyPoseRequest completion handler, but I'm not sure how that data flow works with Combine's programming model. I want to associate predictions with the observation results they're based on, so that I can store the time range of a given prediction label.
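A sketch of the kind of restructuring I'm imagining (not the sample's actual design; PosesWithObservation and the collect(60) windowing stage are placeholders): map each stage to a value that carries both the derived MLMultiArray and the observation it came from, so the final sink still sees the observations.

    import Combine
    import Vision
    import CoreML

    // Hypothetical pairing of the classifier input with the observation it
    // was derived from, so neither goes out of scope mid-pipeline.
    struct PosesWithObservation {
        let multiArray: MLMultiArray?
        let observation: VNHumanBodyPoseObservation
    }

    let observationSubject = PassthroughSubject<VNHumanBodyPoseObservation, Never>()

    let subscription = observationSubject
        // Keep the observation alongside the derived MLMultiArray instead of
        // discarding it, so downstream stages can still reach it.
        .map { observation in
            PosesWithObservation(multiArray: try? observation.keypointsMultiArray(),
                                 observation: observation)
        }
        .collect(60)   // a stand-in for the sample's windowing stage
        .sink { window in
            let multiArrays = window.map { $0.multiArray }
            let observations = window.map { $0.observation }
            // A predictActionWithWindow-style call could run on multiArrays here,
            // and the prediction can be stored with the observations' timestamps.
            print("Predicting from \(multiArrays.count) frames, \(observations.count) observations")
        }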
Replies: 0 · Boosts: 0 · Views: 815 · Activity: Jan ’22
How can you conditionally show a button in a SwiftUI alert?
Say I have an alert:

    @State var showingAlert = false

    var body: some View {
        Text("Hello, world!")
            .alert("Here's an alert with multiple possible buttons.", isPresented: $showingAlert) {
                Button("OK") { }
                Button("Another button that may or may not show") { }
            }
    }

How could I display the second button based only on some condition? I tried factoring out one button into

    fileprivate func extractedFunc() -> Button<Text> {
        return Button("OK") { }
    }

and this would work for conditionally displaying the button content given a fixed number of buttons, but how could the optionality of buttons be taken into account?
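For what it's worth, the closest I've gotten is this sketch, which relies on the alert's actions closure being a ViewBuilder, so a plain if controls whether the second button exists at all (showSecondButton is a made-up flag for the example):

    import SwiftUI

    struct ContentView: View {
        @State private var showingAlert = false
        var showSecondButton = true   // hypothetical condition

        var body: some View {
            Text("Hello, world!")
                .alert("Here's an alert with multiple possible buttons.",
                       isPresented: $showingAlert) {
                    Button("OK") { }
                    // The actions closure is a ViewBuilder, so wrapping the
                    // second button in an `if` makes it optional.
                    if showSecondButton {
                        Button("Another button that may or may not show") { }
                    }
                }
        }
    }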
Replies: 1 · Boosts: 0 · Views: 519 · Activity: Jan ’22
Progress estimate for a `VNVideoProcessor` operation
What might be a good approach to estimating the progress of a VNVideoProcessor operation? I'd like to show a progress bar that's actually useful, like the ones based on the progress Apple vends for the photo picker or for exports. That would make a world of difference compared to a UIActivityIndicatorView, but I'm not sure how to approach handrolling it (or whether that would even be a good idea). I filed an API enhancement request for this, FB9888210.
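One handrolled idea I'm weighing, with the caveat that chunking changes how the analysis is scheduled and the chunk count is an arbitrary assumption: split the asset's duration into N sub-ranges, call analyze(_:) once per chunk, and bump a Progress object as each chunk returns.

    import Vision
    import AVFoundation

    func analyzeWithProgress(url: URL, request: VNRequest, chunks: Int = 20) throws -> Progress {
        let processor = VNVideoProcessor(url: url)
        try processor.addRequest(request, processingOptions: VNVideoProcessorRequestProcessingOptions())

        let duration = AVAsset(url: url).duration
        let progress = Progress(totalUnitCount: Int64(chunks))

        DispatchQueue.global(qos: .userInitiated).async {
            for index in 0..<chunks {
                // Carve the timeline into equal sub-ranges and analyze them in order.
                let start = CMTimeMultiplyByRatio(duration, multiplier: Int32(index), divisor: Int32(chunks))
                let end = CMTimeMultiplyByRatio(duration, multiplier: Int32(index + 1), divisor: Int32(chunks))
                try? processor.analyze(CMTimeRange(start: start, end: end))
                progress.completedUnitCount = Int64(index + 1)   // one chunk finished
            }
        }
        return progress
    }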
Replies: 0 · Boosts: 0 · Views: 594 · Activity: Feb ’22
How does the action duration parameter affect performance?
For a Create ML action classifier, I’m classifying “playing” tennis (the points or rallies), with a second class, “not playing”, as the negative class. I’m not sure what to specify for the action duration parameter given how variable a tennis point or rally can be, but I went with 10 seconds since that seems like the average duration for both the “playing” and “not playing” labels. When choosing this parameter, however, I’m wondering whether it affects performance, both the speed of video processing and the accuracy. Would the Vision framework return more results with smaller action durations?
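My current mental model, for what it's worth (the 30 fps frame rate and clip length below are illustrative assumptions, not values Create ML reports for my model): the action duration times the frame rate fixes the prediction window length, so a shorter duration means each prediction covers fewer frames and a fixed-length video yields more predictions.

    // Illustrative arithmetic only; the frame rate and durations are assumptions.
    let frameRate = 30.0

    let longWindow = 10.0 * frameRate    // 300 pose frames per prediction
    let shortWindow = 2.0 * frameRate    //  60 pose frames per prediction

    // For a 60-second clip, non-overlapping windows give roughly:
    let clipSeconds = 60.0
    let predictionsWithLongWindow = clipSeconds / 10.0   // ≈ 6 predictions
    let predictionsWithShortWindow = clipSeconds / 2.0   // ≈ 30 predictions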
Replies: 0 · Boosts: 0 · Views: 708 · Activity: Feb ’22
How do you determine which thread's run loop receives events, in the context of Combine publishers?
This Mac Catalyst tutorial (https://developer.apple.com/tutorials/mac-catalyst/adding-items-to-the-sidebar) shows the following code snippet:

    recipeCollectionsSubscriber = dataStore.$collections
        .receive(on: RunLoop.main)
        .sink { [weak self] _ in
            guard let self = self else { return }
            let snapshot = self.collectionsSnapshot()
            self.dataSource.apply(snapshot, to: .collections, animatingDifferences: true)
        }
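To make the threading question concrete, here's a small self-contained sketch of the pattern I'm asking about: values can be sent from any thread, but everything downstream of receive(on: RunLoop.main) is delivered on the main run loop.

    import Combine
    import Foundation

    let subject = PassthroughSubject<Int, Never>()

    let subscription = subject
        // Values can be sent from any thread...
        .receive(on: RunLoop.main)
        // ...but everything downstream of receive(on:), including the sink,
        // is delivered on the main run loop.
        .sink { value in
            print("Received \(value) on main thread? \(Thread.isMainThread)")
        }

    DispatchQueue.global().async {
        subject.send(42)   // sent from a background queue, delivered on the main run loop
    }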
Replies: 0 · Boosts: 0 · Views: 632 · Activity: May ’22
What would be a good way to partition instances by a particular matching property, such that each partition is a collection view section?
Say that in this example, the struct

    struct Reminder: Identifiable {
        var id: String = UUID().uuidString
        var title: String
        var dueDate: Date
        var notes: String? = nil
        var isComplete: Bool = false
        var city: String
    }

is modified slightly to include a city string. In the collection view that displays the reminders, I'd like each section to be a unique city, so that if two reminder cells have the same city string they end up in the same section of the collection view. The progress I've made toward this is sorting the reminders array so that reminder cells are grouped together by city:

    func updateSnapshot(reloading ids: [Reminder.ID] = []) {
        var snapshot = Snapshot()
        snapshot.appendSections([0])
        let reminders = self.reminders.sorted { $0.city < $1.city }
        snapshot.appendItems(reminders.map { $0.id })
        if !ids.isEmpty {
            snapshot.reloadItems(ids)
        }
        dataSource.apply(snapshot)
    }

Where I'm stuck is in coming up with a way to make the snapshot represent sections keyed by unique cities, rather than just one flat section of all reminders.
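One direction that seems plausible, sketched under the assumption that the diffable data source's section identifier type can be changed from Int to the city String; Dictionary(grouping:by:) does the partitioning:

    // Assumes the data source is re-declared with String sections, e.g.
    // UICollectionViewDiffableDataSource<String, Reminder.ID>, inside the
    // same view controller that owns `reminders` and `dataSource`.
    func updateSnapshot(reloading ids: [Reminder.ID] = []) {
        var snapshot = NSDiffableDataSourceSnapshot<String, Reminder.ID>()

        // Partition the reminders by their city.
        let remindersByCity = Dictionary(grouping: reminders) { $0.city }

        // One section per unique city, sorted for a stable order.
        for city in remindersByCity.keys.sorted() {
            snapshot.appendSections([city])
            snapshot.appendItems(remindersByCity[city]!.map { $0.id }, toSection: city)
        }

        if !ids.isEmpty {
            snapshot.reloadItems(ids)
        }
        dataSource.apply(snapshot)
    }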
Replies: 1 · Boosts: 0 · Views: 628 · Activity: May ’22