Reply to Latency critical DMA read via PCIe
Thanks for the reply, Kevin. My apologies for the overly qualitative information. The device prototype has only just been set up, and I don't have good statistics yet. For now I would like to make sure that all the proper driver technologies are in place; after that I will start a long test run.

Audio Buffers

Let me give you more detail about the system and the tests carried out so far. I will present only the methods for the D2H path (the one affected by the latency spike); the write path is completely equivalent.

Buffer allocation (in audio device init):

```
OSSharedPtr<IOBufferMemoryDescriptor> m_input_io_ring_buffer; // in ivars
IOBufferMemoryDescriptor::Create(kIOMemoryDirectionIn, buffer_size_bytes, 0x4000, ivars->m_input_io_ring_buffer.attach());
```

Buffer memory mapping (in audio device StartIO):

```
__block OSSharedPtr<IOMemoryDescriptor> input_iomd;
input_iomd->CreateMapping(0, 0, 0, 0, 0, ivars->m_input_memory_map.attach());
```

In all tests I used a buffer of 16384 audio samples. The total size depends on how many channels are interleaved. In particular, I tested systems with 16, 64 and 256 I/O audio channels at 48 kHz in 32-bit integer format.

DMA Buffer Preparation

```
D2HSegmentsN = 1; // Single segment forced (so far)
IODMACommand::Create(ivars->pciDevice, kIODMACommandCreateNoOptions, &dmaSpecification, &dmaCommandD2H);
dmaCommandD2H->PrepareForDMA(kIODMACommandPrepareForDMANoOptions, D2H_memory_buffer_descriptor, 0, virtualD2HSegment.length, &mem_direction_flags, &D2HSegmentsN, physicalD2HSegment);
```

PCIe Device

I followed the same procedure presented in the official Apple video on DMA bus mastering ("Modernize PCI and SCSI drivers with DriverKit").

```
// Enable memory space access and bus mastering for DMA
ivars->pciDevice->ConfigurationRead16(kIOPCIConfigurationOffsetCommand, &commandReg);
commandReg |= (kIOPCICommandBusMaster | kIOPCICommandMemorySpace);
ivars->pciDevice->ConfigurationWrite16(kIOPCIConfigurationOffsetCommand, commandReg);
```

Performed Tests

1. Very first test. No action taken on CPU/DART/PCIe power management (all defaults), 16 channels, a single DMA burst for every audio sample (20.8 µs deadline), that is, 64 bytes per burst (very inefficient). Deadline misses in the read operation were frequent (about 1 per minute). This was predictable, since the baseline read normally takes about 20 to 25 µs, so I abandoned this approach.

2. Burst increased to 8 audio samples (167 µs deadline) with 16 interleaved channels (512 bytes). Operation was more stable (the read baseline is still about 10 to 40 µs). However, roughly once every 30 minutes I noticed a spike in the read time that exceeded the deadline, causing a host underrun (bad).

3. Same burst layout, but with power management and bus constraints applied. In particular:

   ```
   pciDevice->EnablePCIPowerManagement(kPCIPMCSPowerStateD0);
   pciDevice->SetASPMState(kIOPCILinkControlASPMBitsDisabled); // This looks very critical <<<<-------
   RequireMaxBusStall(kIOMaxBusStall25usec);
   ```

   plus, in Info.plist:

   ```
   IOPCITunnelL1Enable         NO
   IOPMPCISleepLinkDisable     NO
   IOPMPCIConfigSpaceVolatile  NO
   IOPCIRetrainLinkWake        YES
   ```

   Things are now much better: read deadline misses occurred only about 3 times in a 12-hour test.

4. Carried away by my enthusiasm, I tried an extreme test with 256 channels. The burst was 8 or 4 samples, which corresponds to 8 KB or 4 KB. The outcome seems very similar to case 3.

However, I'd like to eliminate the possibility of deadline misses entirely, so I investigated the power features further. I ended up adding these requirements before the audio I/O operation starts:

```
ChangePowerState(kIOServicePowerCapabilityOn);
SetPowerOverride(true);
CreatePMAssertion(kIOServicePMAssertionCPUBit | kIOServicePMAssertionForceFullWakeupBit, &ivars->PMAssertionID, false);
```

After this, over several days, I did not notice any relevant event. My question is whether the problem has really been solved completely. I should probably comment out the calls one by one to see which one is the game changer.

Am I doing anything stupid? Are some of these calls redundant (probably yes)? Are there other relevant methods I'm missing, or profiling tools on the host system that I can use to track the system over the long term?

All the measurements cited above were taken by the FPGA itself, so they are reliable in terms of precision.

Concerning your point about 16 KB: I know this is the page size, and I can try asking my DMA engine to produce such a burst. However, if I remember correctly, PCIe allows a burst of 4 KB at most, so I don't know whether this will help. I can try. It is worth studying whether such a large transfer can be issued as a single MRd, or whether splitting it into sub-chunks is unavoidable.

Thank you very much
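Addendum on the burst arithmetic: the burst sizes and deadlines quoted in the tests above follow directly from the channel count, the 32-bit sample format and the 48 kHz sample rate. The snippet below is only a minimal sketch to make those numbers easy to re-check; the helper names `burst_bytes` and `burst_deadline_us` are hypothetical and are not part of the driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes moved by one DMA burst of interleaved PCM.
// channels          - number of interleaved audio channels
// bytes_per_sample  - 4 for the 32-bit integer format used in the tests
// frames_per_burst  - audio samples (frames) transferred per burst
inline uint32_t burst_bytes(uint32_t channels,
                            uint32_t bytes_per_sample,
                            uint32_t frames_per_burst) {
    return channels * bytes_per_sample * frames_per_burst;
}

// Hypothetical helper: time budget for one burst, in microseconds.
// A burst of N frames has to complete before the next N frames are due.
inline double burst_deadline_us(uint32_t frames_per_burst,
                                double sample_rate_hz) {
    assert(sample_rate_hz > 0.0);
    return 1.0e6 * static_cast<double>(frames_per_burst) / sample_rate_hz;
}
```

For example, `burst_bytes(16, 4, 8)` gives 512 bytes and `burst_deadline_us(8, 48000.0)` gives roughly 167 µs, which matches test 2; `burst_bytes(256, 4, 8)` gives the 8 KB of test 4.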
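Addendum on the 4 KB question: as far as I understand it, a single PCIe memory request is limited by the negotiated maximum request size (at most 4096 bytes) and must not cross a 4 KB address boundary, so a 16 KB transfer would have to be split in any case. The snippet below is a rough host-side model of that splitting, not what my FPGA DMA engine actually does; `Chunk` and `split_transfer` are hypothetical names, and the real limit on a given link may be lower than 4096 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one sub-request of a larger DMA transfer.
struct Chunk {
    uint64_t address;  // bus address where the sub-request starts
    uint32_t length;   // sub-request length in bytes
};

// Split [address, address + length) into sub-requests that are no larger
// than max_request_bytes and never cross a 4 KB address boundary.
inline std::vector<Chunk> split_transfer(uint64_t address,
                                         uint64_t length,
                                         uint32_t max_request_bytes) {
    assert(max_request_bytes > 0);
    constexpr uint64_t kBoundary = 4096;  // 4 KB boundary rule
    std::vector<Chunk> chunks;
    while (length > 0) {
        // Bytes left before the next 4 KB boundary.
        const uint64_t to_boundary = kBoundary - (address % kBoundary);
        const uint64_t n = std::min(
            length, std::min<uint64_t>(max_request_bytes, to_boundary));
        chunks.push_back({address, static_cast<uint32_t>(n)});
        address += n;
        length -= n;
    }
    return chunks;
}
```

Under this model, a page-aligned 16 KB transfer with a 4096-byte limit comes out as four 4 KB requests, while a 512-byte burst that starts 256 bytes before a 4 KB boundary is split into two 256-byte requests.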
Topic: App & System Services SubTopic: Drivers Tags:
5d