Every time you render a CIImage with a CIContext, CI does a filter graph analysis to determine the best path for rendering the image (determining intermediates, region of interest, kernel concatenation, etc.). This can be quite CPU-intensive.
If you only have a few simple operations to perform on your image, and you can easily implement them in Metal directly, you are probably better off using that.
However, I would also suggest you file Feedback with the Core Image team and report your findings. We also observe a very heavy CPU load in our apps, caused by Core Image. Maybe they find a way to further optimize the graph analysis – especially for consecutive render calls with the same instructions.