I do a lot of photo editing nowadays, mostly for my travel blog over at One Man, One Map. My workflow is centered around darktable, but instead of exporting to JPEG format I export 16-bit PNGs so I can pass the images through some additional scripts and move the lossy compression step to the very last moment. I've always had the impression that the PNG export might be slower than it has to be, but it became a real issue after I upgraded my desktop PC to a six-core Ryzen 5 1600X CPU and an NVIDIA GTX 950 GPU.

darktable uses OpenMP for multithreading, and OpenCL to offload most image processing routines to the GPU. I had checked that both strategies were actually working, but exporting one of my Nikon D750 RAW images at full 24 megapixel resolution with the usual processing modules applied still took up to twelve seconds. I wouldn't expect my old Intel Core i5-5300U ultrabook to do well in this benchmark, but it processed the same image in about 18 seconds. Exporting to JPEG was much faster, taking less than two seconds on the PC and about eight on the ultrabook.

Since I had never worked with the darktable codebase before, I decided to use other existing means for profiling the application first. darktable has a built-in performance debugging mode which can be enabled by running darktable -d perf in a terminal. Here's a sample output from the ultrabook:

    creating pixelpipe took 0.112 secs (0.240 CPU)
    took 0.023 secs (0.041 CPU) initing base buffer
    took 0.018 secs (0.019 CPU) processed `raw black/white point' on CPU, blended on CPU
    took 0.019 secs (0.027 CPU) processed `white balance' on CPU, blended on CPU
    took 0.016 secs (0.023 CPU) processed `highlight reconstruction' on CPU, blended on CPU
    took 1.025 secs (2.687 CPU) processed `demosaic' on CPU, blended on CPU
    took 0.038 secs (0.089 CPU) processed `exposure' on CPU, blended on CPU
    took 2.283 secs (6.161 CPU) processed `lens correction' on CPU, blended on CPU
    took 0.876 secs (2.300 CPU) processed `perspective correction' on CPU, blended on CPU
    took 0.073 secs (0.139 CPU) processed `base curve' on CPU, blended on CPU
    took 0.065 secs (0.155 CPU) processed `input color profile' on CPU, blended on CPU
    took 0.730 secs (1.915 CPU) processed `shadows and highlights' on CPU, blended on CPU
    took 0.075 secs (0.186 CPU) processed `contrast brightness saturation' on CPU, blended on CPU
    took 0.168 secs (0.413 CPU) processed `sharpen' on CPU, blended on CPU
    took 0.160 secs (0.423 CPU) processed `output color profile' on CPU, blended on CPU
    took 0.048 secs (0.121 CPU) processed `gamma' on CPU, blended on CPU
    pixel pipeline processing took 5.619 secs (14.702 CPU)
    exported to `/home/mr2515/Bilder/2017/ Bunker Burgeis/bearbeitet/_DSC9303_20.png'

According to the second-to-last line, image processing took 5.619 seconds, but in reality it took about ten seconds until the output file was complete. Converting a picture of this size to PNG format shouldn't take an additional five seconds, not even on this slow CPU. Sadly the export step is not being accounted for, so there might be something wrong here, but what?

The next step was to use perf to check what darktable was spending its time on. It is a performance-counter-based profiling tool which has been available on Linux for a couple of years now, but doesn't seem to be very well-known outside of some niches.

There are two different approaches to software profiling: sampling and instrumentation. Sampling is an external method, so the program code and binary don't have to be changed. Some component (in this case the kernel) looks at the value of the instruction pointer of the application at regular intervals, and some other component (in this case the perf command line tool) can take all the debugging information available on the system and try to find out which instruction or code block the instruction pointer was pointing to at a given moment. Since the sampling rate is much lower than the clock rate of the CPU, the output is only a statistical approximation, but it is accurate enough to see if the application is spending a lot of time in the same block of code.

Instrumentation tools like gprof (which requires some help from the compiler), on the other hand, change the program code and have it emit profiling information every time something happens. So these are highly accurate, and I did have the source code at hand, but I didn't see a reason to rebuild the whole binary when sampling should already be able to give me the answer. (perf can do much more than I'm showing here, and can also do some instrumentation, BTW. Head over to Brendan Gregg's page to get a feeling for what's possible.)

I executed perf top -p 16461 (16461 was the process ID of my darktable instance), started a PNG export, and waited for it to finish.
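As a cross-check of the darktable -d perf numbers quoted in this post, the per-step timings can be added up and compared against the reported pipeline total and the observed wall-clock time. The following is only a minimal Python sketch: the regular expressions and the function name `summarize` are my own and not part of darktable, the log lines are assumed to look exactly as quoted here, and the ten seconds of wall-clock time is the rough figure from the text, not a measurement.

```python
import re

# Log lines as quoted in this post; the exact format is an assumption.
LOG = """\
took 0.023 secs (0.041 CPU) initing base buffer
took 0.018 secs (0.019 CPU) processed `raw black/white point' on CPU, blended on CPU
took 0.019 secs (0.027 CPU) processed `white balance' on CPU, blended on CPU
took 0.016 secs (0.023 CPU) processed `highlight reconstruction' on CPU, blended on CPU
took 1.025 secs (2.687 CPU) processed `demosaic' on CPU, blended on CPU
took 0.038 secs (0.089 CPU) processed `exposure' on CPU, blended on CPU
took 2.283 secs (6.161 CPU) processed `lens correction' on CPU, blended on CPU
took 0.876 secs (2.300 CPU) processed `perspective correction' on CPU, blended on CPU
took 0.073 secs (0.139 CPU) processed `base curve' on CPU, blended on CPU
took 0.065 secs (0.155 CPU) processed `input color profile' on CPU, blended on CPU
took 0.730 secs (1.915 CPU) processed `shadows and highlights' on CPU, blended on CPU
took 0.075 secs (0.186 CPU) processed `contrast brightness saturation' on CPU, blended on CPU
took 0.168 secs (0.413 CPU) processed `sharpen' on CPU, blended on CPU
took 0.160 secs (0.423 CPU) processed `output color profile' on CPU, blended on CPU
took 0.048 secs (0.121 CPU) processed `gamma' on CPU, blended on CPU
pixel pipeline processing took 5.619 secs (14.702 CPU)
"""

# Hypothetical patterns matching the two kinds of lines shown above.
STEP = re.compile(r"^took (\d+\.\d+) secs \((\d+\.\d+) CPU\) (.+)$")
TOTAL = re.compile(r"^pixel pipeline processing took (\d+\.\d+) secs")


def summarize(log, wall_clock_secs):
    """Sum the per-step times and report what the log does not account for."""
    steps = []
    pipeline_total = None
    for line in log.splitlines():
        line = line.strip()
        step = STEP.match(line)
        if step:
            steps.append((step.group(3), float(step.group(1))))
            continue
        total = TOTAL.match(line)
        if total:
            pipeline_total = float(total.group(1))
    return {
        "steps": steps,
        "step_sum": sum(secs for _, secs in steps),
        "pipeline_total": pipeline_total,
        "slowest": max(steps, key=lambda item: item[1]),
        # Time between the end of the pixel pipeline and the finished file.
        "unaccounted": wall_clock_secs - pipeline_total,
    }


# 10.0 is the approximate wall-clock time mentioned in the text.
result = summarize(LOG, wall_clock_secs=10.0)
print(f"sum of steps:   {result['step_sum']:.3f} s")
print(f"pipeline total: {result['pipeline_total']:.3f} s")
print(f"slowest step:   {result['slowest'][0]}")
print(f"unaccounted:    {result['unaccounted']:.3f} s")
```

The individual steps add up to almost exactly the reported pipeline total, so the missing seconds are not hiding in any of the listed modules. They have to be spent after the pixel pipeline has finished.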