I've been looking through Apple's sample code Building a Feature-Rich App for Sports Analysis - https://developer.apple.com/documentation/vision/building_a_feature-rich_app_for_sports_analysis and its associated WWDC video to learn to reason about AVFoundation and VNDetectTrajectoriesRequest - https://developer.apple.com/documentation/vision/vndetecttrajectoriesrequest. My goal is to allow the user to import videos (this part I have working, the user sees a UIDocumentBrowserViewController - https://developer.apple.com/documentation/uikit/uidocumentbrowserviewcontroller, picks a video file, and then a copy is made), but I only want segments of the original video copied where trajectories are detected from a ball moving.
I've tried as best I can to grasp the two parts, at the very least finding where the video copy is made and where the trajectory request is made.
The full video copy happens in CameraViewController.swift (I'm starting with just imported video for now and not reading live from the device's video camera), line 160:
func startReadingAsset(_ asset: AVAsset) {
videoRenderView = VideoRenderView(frame: view.bounds)
setupVideoOutputView(videoRenderView)
let displayLink = CADisplayLink(target: self, selector: #selector(handleDisplayLink(_:)))
displayLink.preferredFramesPerSecond = 0
displayLink.isPaused = true
displayLink.add(to: RunLoop.current, forMode: .default)
guard let track = asset.tracks(withMediaType: .video).first else {
AppError.display(AppError.videoReadingError(reason: "No video tracks found in AVAsset."), inViewController: self)
return
}
let playerItem = AVPlayerItem(asset: asset)
let player = AVPlayer(playerItem: playerItem)
let settings = [
String(kCVPixelBufferPixelFormatTypeKey): kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
]
let output = AVPlayerItemVideoOutput(pixelBufferAttributes: settings)
playerItem.add(output)
player.actionAtItemEnd = .pause
player.play()
self.displayLink = displayLink
self.playerItemOutput = output
self.videoRenderView.player = player
let affineTransform = track.preferredTransform.inverted()
let angleInDegrees = atan2(affineTransform.b, affineTransform.a) * CGFloat(180) / CGFloat.pi
var orientation: UInt32 = 1
switch angleInDegrees {
case 0:
orientation = 1 // Recording button is on the right
case 180, -180:
orientation = 3 // abs(180) degree rotation recording button is on the right
case 90:
orientation = 8 // 90 degree CW rotation recording button is on the top
case -90:
orientation = 6 // 90 degree CCW rotation recording button is on the bottom
default:
orientation = 1
}
videoFileBufferOrientation = CGImagePropertyOrientation(rawValue: orientation)!
videoFileFrameDuration = track.minFrameDuration
displayLink.isPaused = false
}
@objc
private func handleDisplayLink(_ displayLink: CADisplayLink) {
guard let output = playerItemOutput else {
return
}
videoFileReadingQueue.async {
let nextTimeStamp = displayLink.timestamp + displayLink.duration
let itemTime = output.itemTime(forHostTime: nextTimeStamp)
guard output.hasNewPixelBuffer(forItemTime: itemTime) else {
return
}
guard let pixelBuffer = output.copyPixelBuffer(forItemTime: itemTime, itemTimeForDisplay: nil) else {
return
}
// Create sample buffer from pixel buffer
var sampleBuffer: CMSampleBuffer?
var formatDescription: CMVideoFormatDescription?
CMVideoFormatDescriptionCreateForImageBuffer(allocator: nil, imageBuffer: pixelBuffer, formatDescriptionOut: &formatDescription)
let duration = self.videoFileFrameDuration
var timingInfo = CMSampleTimingInfo(duration: duration, presentationTimeStamp: itemTime, decodeTimeStamp: itemTime)
CMSampleBufferCreateForImageBuffer(allocator: nil,
imageBuffer: pixelBuffer,
dataReady: true,
makeDataReadyCallback: nil,
refcon: nil,
formatDescription: formatDescription!,
sampleTiming: &timingInfo,
sampleBufferOut: &sampleBuffer)
if let sampleBuffer = sampleBuffer {
self.outputDelegate?.cameraViewController(self, didReceiveBuffer: sampleBuffer, orientation: self.videoFileBufferOrientation)
DispatchQueue.main.async {
let stateMachine = self.gameManager.stateMachine
if stateMachine.currentState is GameManager.SetupCameraState {
// Once we received first buffer we are ready to proceed to the next state
stateMachine.enter(GameManager.DetectingBoardState.self)
}
}
}
}
}
Line 139, self.outputDelegate?.cameraViewController(self, didReceiveBuffer: sampleBuffer, orientation: self.videoFileBufferOrientation), is where the video sample buffer is handed off to the Vision framework subsystem for trajectory analysis (the second part). This delegate callback is implemented in GameViewController.swift on line 335:
// Perform the trajectory request in a separate dispatch queue.
trajectoryQueue.async {
do {
try visionHandler.perform([self.detectTrajectoryRequest])
if let results = self.detectTrajectoryRequest.results {
DispatchQueue.main.async {
self.processTrajectoryObservations(controller, results)
}
}
} catch {
AppError.display(error, inViewController: self)
}
}
Trajectories found are drawn over the video in self.processTrajectoryObservations(controller, results).
Where I'm stuck now is modifying this so that, instead of drawing the trajectories, only the parts of the original video where trajectories were detected get copied into the new video.
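My current thought is to stop drawing in processTrajectoryObservations and instead just record each observation's time range for a later assembly step. The stored property and confidence threshold below are my own additions, not from the sample:
import Vision
import CoreMedia
// Hypothetical stored property on GameViewController, not part of Apple's sample.
var timeRangesOfInterest = [CMTimeRange]()
func processTrajectoryObservations(_ controller: CameraViewController,
                                   _ results: [VNTrajectoryObservation]) {
    // Instead of updating trajectoryView, remember where in the asset each
    // trajectory occurred so those segments can be copied out later.
    // The 0.9 confidence threshold is arbitrary.
    for observation in results where observation.confidence > 0.9 {
        timeRangesOfInterest.append(observation.timeRange)
    }
}
What I'm unsure about is the assembly step itself, i.e. how to turn those collected ranges into a new video file.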
I'd like to perform VNDetectHumanBodyPoseRequest requests on a video that the user imports through the system photo picker or document view controller. I started looking at the Building a Feature-Rich App for Sports Analysis - https://developer.apple.com/documentation/vision/building_a_feature-rich_app_for_sports_analysis sample code since it has an example where video is imported from disk and then analyzed. However, my end goal is to filter for frames that contain certain poses, so that all frames without them are edited out / deleted (rather than drawing on frames with detected trajectories, as the sample code does). For pose detection I'm looking at Detecting Human Actions in a Live Video Feed - https://developer.apple.com/documentation/createml/detecting_human_actions_in_a_live_video_feed, but the live video capture part isn't quite relevant to my case.
I'm trying to break this down into smaller problems and have a few questions:
Should a full video file copy be made before analysis?
The Detecting Human Actions in a Live Video Feed - https://developer.apple.com/documentation/createml/detecting_human_actions_in_a_live_video_feed sample code uses a Combine pipeline for analyzing live video frames. Since I'm analyzing imported video, would Combine be overkill or a good fit here?
After I've detected which frames have a particular pose, how (in AVFoundation terms) do I filter for those frames or edit out / delete the frames without that pose?
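To make the second question concrete, the non-Combine version I have in mind is a plain AVAssetReader loop feeding the pose request frame by frame. All of the names below are mine, not from either sample:
import AVFoundation
import Vision
// Sketch of frame-by-frame pose detection on an imported file, without Combine.
func timeRangesContainingPoses(in asset: AVAsset) throws -> [CMTimeRange] {
    guard let track = asset.tracks(withMediaType: .video).first else { return [] }
    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(
        track: track,
        outputSettings: [kCVPixelBufferPixelFormatTypeKey as String:
                         kCVPixelFormatType_420YpCbCr8BiPlanarFullRange])
    reader.add(output)
    reader.startReading()
    var ranges = [CMTimeRange]()
    let request = VNDetectHumanBodyPoseRequest()
    while let sampleBuffer = output.copyNextSampleBuffer() {
        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .up, options: [:])
        try handler.perform([request])
        if let observations = request.results, !observations.isEmpty {
            // Keep this frame's presentation time span; deciding which poses
            // actually "count" would go here.
            let start = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
            let duration = CMSampleBufferGetDuration(sampleBuffer)
            ranges.append(CMTimeRange(start: start, duration: duration))
        }
    }
    return ranges
}
If something like this is reasonable, then question 3 reduces to turning those time ranges into a new asset, which is what I'm still unsure about.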
For example,
Operation A both fetches model data over the network and updates a UICollectionView backed by it.
Operation B filters model data.
What is a good approach to executing B only after A is finished?
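For context, I'm aware of plain operation dependencies, roughly like this (operationA / operationB are placeholders standing in for my real operations):
import Foundation
// Placeholder operations standing in for A and B.
let operationA = BlockOperation {
    // Fetch model data and update the collection view (UI work dispatched to main).
}
let operationB = BlockOperation {
    // Filter the model data fetched by A.
}
// B will not start until A has finished.
operationB.addDependency(operationA)
let queue = OperationQueue()
queue.addOperations([operationA, operationB], waitUntilFinished: false)
That said, since A's network fetch is itself asynchronous, I suspect a plain BlockOperation would be marked finished before the network call completes, which is part of what I'm trying to understand.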
When synchronizing model objects, local CKRecords, and CKRecords in CloudKit during swipe-to-delete, how can I make this as robust as possible? Error handling omitted for the sake of the example.
override func tableView(_ tableView: UITableView, commit editingStyle: UITableViewCell.EditingStyle, forRowAt indexPath: IndexPath) {
if editingStyle == .delete {
let record = self.records[indexPath.row]
privateDatabase.delete(withRecordID: record.recordID) { recordID, error in
self.records.remove(at: indexPath.row)
}
}
}
Since indexPath could change due to other changes in the table view / collection view during the time it takes to delete the record from CloudKit, how could this be improved upon?
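One variation I've considered is capturing the record itself rather than the index path, and hopping back to the main queue before touching the data source. This is my own sketch, not something I've validated:
override func tableView(_ tableView: UITableView,
                        commit editingStyle: UITableViewCell.EditingStyle,
                        forRowAt indexPath: IndexPath) {
    guard editingStyle == .delete else { return }
    let record = records[indexPath.row]
    privateDatabase.delete(withRecordID: record.recordID) { deletedRecordID, error in
        DispatchQueue.main.async {
            // Re-derive the row from the record ID in case rows moved while the
            // CloudKit call was in flight.
            guard error == nil,
                  let index = self.records.firstIndex(where: { $0.recordID == deletedRecordID }) else {
                return
            }
            self.records.remove(at: index)
            tableView.deleteRows(at: [IndexPath(row: index, section: indexPath.section)], with: .automatic)
        }
    }
}
I'm still unsure whether looking the row up again like this is enough, or whether the local deletion should happen optimistically before the CloudKit call.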
What might be a good way to constrain a view's top anchor to be just at the edge of a device's Face ID sensor housing if it has one?
This view is a product photo that would be clipped too much if it ignored the top safe area inset, but positioning it relative to the top safe area margin isn't ideal either because of the slight gap between the sensor housing and the view (the view is a photo of pants cropped at the waist). What might be a good approach here?
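To show what I mean, this is the kind of heuristic I've been sketching, though the 30-point constant is purely a guess on my part since I don't know of a public API that exposes the sensor housing's frame:
// Rough sketch: pin the photo near the sensor housing on notched devices,
// otherwise fall back to the safe area. The 30-point constant is a guess.
let topInset = view.window?.safeAreaInsets.top ?? 0
let probablyHasSensorHousing = topInset > 24   // heuristic, not an official check
photoView.translatesAutoresizingMaskIntoConstraints = false
if probablyHasSensorHousing {
    photoView.topAnchor.constraint(equalTo: view.topAnchor, constant: 30).isActive = true
} else {
    photoView.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor).isActive = true
}
I'd rather not hard-code device metrics like this, so any pointer to a more principled approach is welcome.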
In a SwiftUI scroll view with the page style, is it possible to change the page indicator color?
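For context, by "scroll view with the page style" I mean a TabView using the page tab view style. The only approach I've found is going through the UIKit appearance proxy, which I'm not sure is the intended way:
import SwiftUI
import UIKit
struct PagedView: View {
    var body: some View {
        TabView {
            Color.red
            Color.blue
        }
        .tabViewStyle(PageTabViewStyle())
        .onAppear {
            // UIKit appearance proxy affecting the underlying UIPageControl.
            UIPageControl.appearance().currentPageIndicatorTintColor = .white
            UIPageControl.appearance().pageIndicatorTintColor = UIColor.white.withAlphaComponent(0.4)
        }
    }
}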
I have an app that currently depends on fetching the model through CloudKit, and is composed of value types. I'm considering adding Core Data support so that record modifications are robust regardless of network conditions.
Core Data resources seem to always assume a model layer with reference semantics, so I'm not sure where to begin.
Should I keep my top-level model type a struct? Can I? If I move my model to reference semantics, how might I bridge from past model instances that are fetched through CloudKit and then decoded?
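Concretely, the kind of bridging I imagine is keeping the struct as the working model and mapping it into a managed object only for persistence, something like this (all names here are hypothetical, not my real model):
import CoreData
// Existing value-type model (decoded from a CKRecord today).
struct Item {
    var identifier: UUID
    var title: String
}
// Hypothetical managed-object counterpart defined in the Core Data model.
final class ItemMO: NSManagedObject {
    @NSManaged var identifier: UUID
    @NSManaged var title: String
}
extension Item {
    // Bridge a decoded value into Core Data.
    func makeManagedObject(in context: NSManagedObjectContext) -> ItemMO {
        let object = ItemMO(context: context)
        object.identifier = identifier
        object.title = title
        return object
    }
    // Bridge back out so the rest of the app keeps working with value types.
    init(_ object: ItemMO) {
        self.init(identifier: object.identifier, title: object.title)
    }
}
Is this two-layer approach reasonable, or is it fighting how Core Data wants to be used?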
Thank you in advance.
I observe when an AVPlayer finishes playing in order to present an alert at that end time.
NotificationCenter.default.addObserver(
self,
selector: #selector(presentAlert),
name: .AVPlayerItemDidPlayToEndTime,
object: nil
)
I've had multiple user reports of the alert appearing where it's not intended, such as in the middle of the video after replaying, and on other views. I'm unable to reproduce this myself, but my guess is that it's a threading issue, since the AVPlayerItemDidPlayToEndTime documentation says "the system may post this notification on a thread other than the one used to register the observer."
How then do I make sure the alert is presented on the main thread? Should I dispatch to the main queue from within my presentAlert function, or add the above observer with addObserver(forName:object:queue:using:) instead, passing in the main operation queue?
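For what it's worth, the variant I'm leaning toward is the block-based API with the main queue and the specific item as the object, which I think would also avoid reacting to other players' items. Here endTimeObserver and playerItem are placeholder properties for however I end up holding on to them:
// Keep a reference so the observer can be removed later (e.g. in deinit).
endTimeObserver = NotificationCenter.default.addObserver(
    forName: .AVPlayerItemDidPlayToEndTime,
    object: playerItem,   // scope to this specific item, not every AVPlayerItem in the app
    queue: .main          // deliver on the main queue so presenting UI is safe
) { [weak self] _ in
    self?.presentAlert()
}
Is this the recommended route, or is dispatching to main inside presentAlert just as good?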
I'd like a user's upload operation that's started in the foreground to continue when they leave the app. Apple's article Extending Your App's Background Execution Time has the following code listing:
func sendDataToServer( data : NSData ) {
// Perform the task on a background queue.
DispatchQueue.global().async {
// Request the task assertion and save the ID.
self.backgroundTaskID = UIApplication.shared.beginBackgroundTask(withName: "Finish Network Tasks") {
// End the task if time expires.
UIApplication.shared.endBackgroundTask(self.backgroundTaskID!)
self.backgroundTaskID = UIBackgroundTaskInvalid
}
// Send the data synchronously.
self.sendAppDataToServer( data: data)
// End the task assertion.
UIApplication.shared.endBackgroundTask(self.backgroundTaskID!)
self.backgroundTaskID = UIBackgroundTaskInvalid
}
}
The call to self.sendAppDataToServer(data: data) is unclear. Is this where the upload operation would go, wrapped in DispatchQueue.global().sync { }?
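My reading is that sendAppDataToServer(data:) is just a stand-in for whatever synchronous upload you have. If the upload is asynchronous (for example a URLSession task), I assume the task assertion would instead be ended in the completion handler, roughly like this (uploadURL is a placeholder):
func sendDataToServer(data: Data) {
    // Request the task assertion before kicking off the upload.
    backgroundTaskID = UIApplication.shared.beginBackgroundTask(withName: "Finish Network Tasks") {
        // End the task if the extra time expires.
        UIApplication.shared.endBackgroundTask(self.backgroundTaskID!)
        self.backgroundTaskID = UIBackgroundTaskIdentifier.invalid
    }
    var request = URLRequest(url: uploadURL)   // uploadURL is a placeholder
    request.httpMethod = "POST"
    let task = URLSession.shared.uploadTask(with: request, from: data) { _, _, _ in
        // The upload finished (or failed); release the assertion here rather than
        // immediately after starting the task.
        UIApplication.shared.endBackgroundTask(self.backgroundTaskID!)
        self.backgroundTaskID = UIBackgroundTaskIdentifier.invalid
    }
    task.resume()
}
Or, if the upload should survive suspension entirely, is a background URLSessionConfiguration the better fit than a background task assertion?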
Apple's sample code Identifying Trajectories in Video contains the following delegate callback:
func cameraViewController(_ controller: CameraViewController, didReceiveBuffer buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) {
let visionHandler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
if gameManager.stateMachine.currentState is GameManager.TrackThrowsState {
DispatchQueue.main.async {
// Get the frame of rendered view
let normalizedFrame = CGRect(x: 0, y: 0, width: 1, height: 1)
self.jointSegmentView.frame = controller.viewRectForVisionRect(normalizedFrame)
self.trajectoryView.frame = controller.viewRectForVisionRect(normalizedFrame)
}
// Perform the trajectory request in a separate dispatch queue.
trajectoryQueue.async {
do {
try visionHandler.perform([self.detectTrajectoryRequest])
if let results = self.detectTrajectoryRequest.results {
DispatchQueue.main.async {
self.processTrajectoryObservations(controller, results)
}
}
} catch {
AppError.display(error, inViewController: self)
}
}
}
}
However, instead of drawing UI whenever detectTrajectoryRequest.results exist (https://developer.apple.com/documentation/vision/vndetecttrajectoriesrequest/3675672-results), I'm interested in using the CMTimeRange provided by each result to construct a new video. In effect, this would filter down the original video to only frames with trajectories. How might I accomplish this, perhaps through writing only specific time ranges' frames from one AVFoundation video to a new AVFoundation video?
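The direction I'm imagining, but haven't validated, is accumulating each observation's timeRange and then assembling a composition and export from them. In this sketch, detectedRanges and the output URL are placeholders for values I'd collect elsewhere:
import AVFoundation
// Given the time ranges collected from VNTrajectoryObservation.timeRange,
// build a composition containing only those segments and export it.
func exportTrajectorySegments(from asset: AVAsset,
                              ranges detectedRanges: [CMTimeRange],
                              to outputURL: URL,
                              completion: @escaping (Error?) -> Void) {
    let composition = AVMutableComposition()
    do {
        for range in detectedRanges {
            // Appends each detected segment back-to-back at the end of the composition.
            try composition.insertTimeRange(range, of: asset, at: composition.duration)
        }
    } catch {
        completion(error)
        return
    }
    guard let export = AVAssetExportSession(asset: composition,
                                            presetName: AVAssetExportPresetHighestQuality) else {
        completion(nil)
        return
    }
    export.outputURL = outputURL
    export.outputFileType = .mov
    export.exportAsynchronously {
        completion(export.error)
    }
}
Is a composition plus export session like this the right tool, or should the frames be copied sample by sample with an asset reader and writer instead?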
Apple's sample code "AVReaderWriter: Offline Audio / Video Processing" has the following listing
let writingGroup = dispatch_group_create()
// Transfer data from input file to output file.
self.transferVideoTracks(videoReaderOutputsAndWriterInputs, group: writingGroup)
self.transferPassthroughTracks(passthroughReaderOutputsAndWriterInputs, group: writingGroup)
// Handle completion.
let queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
dispatch_group_notify(writingGroup, queue) {
// `readingAndWritingDidFinish()` is guaranteed to call `finish()` exactly once.
self.readingAndWritingDidFinish(assetReader, assetWriter: assetWriter)
}
in CynanifyOperation.swift (an NSOperation subclass that stylizes imported video and exports it). How would I go about writing this part in modern Swift so that it compiles and works?
I've tried writing this as
let writingGroup = DispatchGroup()
// Transfer data from input file to output file.
self.transferVideoTracks(videoReaderOutputsAndWriterInputs: videoReaderOutputsAndWriterInputs, group: writingGroup)
self.transferPassthroughTracks(passthroughReaderOutputsAndWriterInputs: passthroughReaderOutputsAndWriterInputs, group: writingGroup)
// Handle completion.
writingGroup.notify(queue: .global()) {
// `readingAndWritingDidFinish()` is guaranteed to call `finish()` exactly once.
self.readingAndWritingDidFinish(assetReader: assetReader, assetWriter: assetWriter)
}
However, it's taking an extremely long time for self.readingAndWritingDidFinish(assetReader: assetReader, assetWriter: assetWriter) to be called, and my UI is stuck in the ProgressViewController with a loading spinner. Is there something I wrote incorrectly or missed conceptually in the Swift 5 version?
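One thing I want to confirm is the enter/leave balancing, since notify(queue:) only fires once every enter() has a matching leave(). As I understand it, each transfer function is expected to follow roughly this pattern (my paraphrase with hypothetical names, not the sample's actual code):
// `transferSamples` and `transferQueue` are placeholders for the sample's real helpers.
func transferVideoTracks(_ pairs: [(readerOutput: AVAssetReaderOutput,
                                    writerInput: AVAssetWriterInput)],
                         group: DispatchGroup) {
    for pair in pairs {
        group.enter()
        transferSamples(from: pair.readerOutput, to: pair.writerInput, onQueue: transferQueue) {
            // Runs once this track is fully transferred (or cancelled); it must
            // balance the enter() above, otherwise writingGroup.notify(queue:)
            // never fires.
            group.leave()
        }
    }
}
If readingAndWritingDidFinish is never reached, my current guess is either a missing leave() in one of my translated transfer functions or a requestMediaDataWhenReady callback that never calls its completion handler.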
Say you have a pinch gesture recognizer and pan gesture recognizer on an image view:
@IBAction func pinchPiece(_ pinchGestureRecognizer: UIPinchGestureRecognizer) {
guard pinchGestureRecognizer.state == .began || pinchGestureRecognizer.state == .changed,
let piece = pinchGestureRecognizer.view else {
// After pinch releases, zoom back out.
if pinchGestureRecognizer.state == .ended {
UIView.animate(withDuration: 0.3, animations: {
pinchGestureRecognizer.view?.transform = CGAffineTransform.identity
})
}
return
}
adjustAnchor(for: pinchGestureRecognizer)
let scale = pinchGestureRecognizer.scale
piece.transform = piece.transform.scaledBy(x: scale, y: scale)
pinchGestureRecognizer.scale = 1 // Clear scale so that it is the right delta next time.
}
@IBAction func panPiece(_ panGestureRecognizer: UIPanGestureRecognizer) {
guard panGestureRecognizer.state == .began || panGestureRecognizer.state == .changed,
let piece = panGestureRecognizer.view else {
return
}
let translation = panGestureRecognizer.translation(in: piece.superview)
piece.center = CGPoint(x: piece.center.x + translation.x, y: piece.center.y + translation.y)
panGestureRecognizer.setTranslation(.zero, in: piece.superview)
}
public func gestureRecognizer(_ gestureRecognizer: UIGestureRecognizer,
shouldRecognizeSimultaneouslyWith otherGestureRecognizer: UIGestureRecognizer) -> Bool {
true
}
The pinch gesture's view resets to its original state after the gesture is done, which occurs in its else clause. What would be a good way to do the same for the pan gesture recognizer? Ideally I'd like the gesture recognizers to be in an extension of UIImageView, which would also mean that I can't add a stored property to the extension for tracking the initial state of the image view.
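One idea I've had, which avoids a stored property in the UIImageView extension, is to express the pan as a translation on the view's transform instead of mutating center, so resetting is just animating back to .identity:
@IBAction func panPiece(_ panGestureRecognizer: UIPanGestureRecognizer) {
    guard panGestureRecognizer.state == .began || panGestureRecognizer.state == .changed,
          let piece = panGestureRecognizer.view else {
        // Mirror the pinch behavior: snap back once the gesture ends.
        if panGestureRecognizer.state == .ended {
            UIView.animate(withDuration: 0.3) {
                panGestureRecognizer.view?.transform = .identity
            }
        }
        return
    }
    let translation = panGestureRecognizer.translation(in: piece.superview)
    piece.transform = piece.transform.translatedBy(x: translation.x, y: translation.y)
    panGestureRecognizer.setTranslation(.zero, in: piece.superview)
}
One wrinkle is that the pinch handler's reset to .identity would now also clear the pan translation, since both gestures share the transform, so I'm not sure this is the cleanest route.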
Given an AVAsset, I'm performing a Vision trajectory request on it and would like to write out a video asset that only contains frames with trajectories (filter out downtime in sports footage where there's no ball moving).
I'm unsure what would be a good approach, but as a starting point I tried the following pipeline:
Copy sample buffer from the source AVAssetReaderOutput.
Perform trajectory request on a vision handler parameterized by the sample buffer.
For each resulting VNTrajectoryObservation (trajectory detected), use its associated CMTimeRange to configure a new AVAssetReader set to that time range.
Append the time range constrained sample buffer to one AVAssetWriterInput until the forEach is complete.
In code:
private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
to writerInput: AVAssetWriterInput,
onQueue queue: DispatchQueue,
sampleBufferProcessor: SampleBufferProcessor,
completionHandler: @escaping () -> Void) {
/*
The writerInput continuously invokes this closure until finished or
cancelled. It throws an NSInternalInconsistencyException if called more
than once for the same writer.
*/
writerInput.requestMediaDataWhenReady(on: queue) {
var isDone = false
/*
While the writerInput accepts more data, process the sampleBuffer
and then transfer the processed sample to the writerInput.
*/
while writerInput.isReadyForMoreMediaData {
if self.isCancelled {
isDone = true
break
}
// Get the next sample from the asset reader output.
guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
// The asset reader output has no more samples to vend.
isDone = true
break
}
let visionHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: self.orientation, options: [:])
do {
try visionHandler.perform([self.detectTrajectoryRequest])
if let results = self.detectTrajectoryRequest.results {
try results.forEach { result in
let assetReader = try AVAssetReader(asset: self.asset)
assetReader.timeRange = result.timeRange
let trackOutput = AVTrackOutputs.firstTrackOutput(ofType: .video, fromTracks: self.asset.tracks,
withOutputSettings: nil)
assetReader.add(trackOutput)
assetReader.startReading()
guard let sampleBuffer = trackOutput.copyNextSampleBuffer() else {
// The asset reader output has no more samples to vend.
isDone = true
return
}
// Append the sample to the asset writer input.
guard writerInput.append(sampleBuffer) else {
/*
The writer could not append the sample buffer.
The `readingAndWritingDidFinish()` function handles any
error information from the asset writer.
*/
isDone = true
return
}
}
}
} catch {
print(error)
}
}
if isDone {
/*
Calling `markAsFinished()` on the asset writer input does the
following:
1. Unblocks any other inputs needing more samples.
2. Cancels further invocations of this "request media data"
callback block.
*/
writerInput.markAsFinished()
/*
Tell the caller the reader output and writer input finished
transferring samples.
*/
completionHandler()
}
}
}
private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter,
completionHandler: @escaping FinishHandler) {
if isCancelled {
completionHandler(.success(.cancelled))
return
}
// Handle any error during processing of the video.
guard sampleTransferError == nil else {
assetReaderWriter.cancel()
completionHandler(.failure(sampleTransferError!))
return
}
// Evaluate the result reading the samples.
let result = assetReaderWriter.readingCompleted()
if case .failure = result {
completionHandler(result)
return
}
/*
Finish writing, and asynchronously evaluate the results from writing
the samples.
*/
assetReaderWriter.writingCompleted { result in
completionHandler(result)
return
}
}
When run I get the following: no error is caught in the first catch clause, none is caught in private func readingAndWritingDidFinish(assetReaderWriter: AVAssetReaderWriter, completionHandler: @escaping FinishHandler), and the completion handler is called.
Help with any of the following questions would be appreciated:
What is causing what appears to be indefinite loading?
How might I isolate the problem further?
Am I misusing or misunderstanding how to selectively read from time ranges of AVAssetReader objects?
Should I forego the AVAssetReader / AVAssetWriter route entirely, and use the time ranges with AVAssetExportSession instead? I don't know how the two approaches compare, or what to consider when choosing between them.
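Regarding the last question, the export-session variant I have in mind would set timeRange per detected segment and write each segment to its own file, roughly like this (segmentURL(for:) is a placeholder):
// Sketch of the AVAssetExportSession alternative: one pass per detected range.
func exportSegments(of asset: AVAsset, ranges: [CMTimeRange]) {
    for (index, range) in ranges.enumerated() {
        guard let export = AVAssetExportSession(asset: asset,
                                                presetName: AVAssetExportPresetPassthrough) else { continue }
        export.timeRange = range              // only this segment is read and written
        export.outputURL = segmentURL(for: index)
        export.outputFileType = .mov
        export.exportAsynchronously {
            // Check export.status / export.error per segment here.
        }
    }
}
My rough understanding is that the reader/writer route gives per-sample control (and lets me run Vision in the same pass), while the export-session route is less code but coarser, and I'd still need a composition step to stitch the segments back together; corrections to that mental model would help.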
I am saving time ranges from an input video asset where trajectories are found, then exporting only those segments to an output video file.
Currently I track these time ranges in a stored property var timeRangesOfInterest: [Double : CMTimeRange], which is set in the trajectory request's completion handler
func completionHandler(request: VNRequest, error: Error?) {
guard let request = request as? VNDetectTrajectoriesRequest else { return }
if let results = request.results,
results.count > 0 {
for result in results {
var timeRange = result.timeRange
timeRange.start = timeRange.start - self.assetWriterStartTime
self.timeRangesOfInterest[timeRange.start.seconds] = timeRange
}
}
}
Then these time ranges of interest are used in an export session to only export those segments
/*
Finish writing, and asynchronously evaluate the results from writing
the samples.
*/
assetReaderWriter.writingCompleted { result in
self.exportVideoTimeRanges(timeRanges: self.timeRangesOfInterest.map { $0.1 }) { result in
completionHandler(result)
}
}
Unfortunately, I'm getting repeated trajectory video segments in the output video. Is this maybe because trajectory requests return "in progress" repeated trajectory results with slightly different time range start times? What might be a good strategy for avoiding or removing them? I've also noticed that trajectory segments appear out of order in the output.
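The workaround I'm considering is to sort the collected ranges and merge any that overlap before exporting, something like:
import CoreMedia
// Sort by start time, then coalesce overlapping or touching ranges so that
// "in progress" observations of the same trajectory collapse into one segment.
func mergedTimeRanges(_ ranges: [CMTimeRange]) -> [CMTimeRange] {
    let sorted = ranges.sorted { $0.start < $1.start }
    var merged = [CMTimeRange]()
    for range in sorted {
        if let last = merged.last, last.end >= range.start {
            merged[merged.count - 1] = CMTimeRangeGetUnion(last, otherRange: range)
        } else {
            merged.append(range)
        }
    }
    return merged
}
Sorting before merging should also address the out-of-order segments, since the dictionary I'm storing the ranges in doesn't preserve any order. Is this a reasonable strategy, or is there a better way to consume in-progress trajectory results?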
I'm building a feature to automatically edit out all the downtime of a tennis video. I have a partial implementation that stores the start and end times of Vision trajectory detections and writes only those segments to an AVFoundation export session.
I've encountered a major issue, which is that the returned trajectories end whenever the ball bounces, so each segment is just one tennis shot and nowhere close to an entire rally with multiple bounces. I'm unsure whether I should continue down the trajectory route, maybe stitching together the trajectories and somehow only splitting at the start and end of a rally.
Any general guidance would be appreciated.
Is there a different Vision or ML approach that would more accurately model the start and end time of a rally? I considered creating a custom action classifier to classify frames as either "playing tennis" or "inactivity," but I started with Apple's trajectory detection since it was already built and trained. Maybe a custom classifier is needed, but I'm not sure.
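If I do stay with trajectories, the only stitching idea I have so far is to merge segments whose gap is under some threshold, treating a short gap as "the rally continued":
import CoreMedia
// Merge trajectory segments separated by less than `maximumGap` (e.g. the time
// between a bounce and the next shot) into a single rally-level range.
// The 1.5-second default is an arbitrary guess I'd tune against real footage.
func rallyRanges(from segments: [CMTimeRange],
                 maximumGap: CMTime = CMTime(seconds: 1.5, preferredTimescale: 600)) -> [CMTimeRange] {
    let sorted = segments.sorted { $0.start < $1.start }
    var rallies = [CMTimeRange]()
    for segment in sorted {
        if let last = rallies.last, segment.start - last.end <= maximumGap {
            rallies[rallies.count - 1] = CMTimeRange(start: last.start, end: segment.end)
        } else {
            rallies.append(segment)
        }
    }
    return rallies
}
I'm not confident a fixed gap threshold generalizes across points, which is why I'm wondering whether an action classifier is the more robust route.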