linux-kernelorg-stable/kernel/time
Thomas Gleixner ec2d0c0462 posix-timers: Provide a mechanism to allocate a given timer ID
Checkpoint/Restore in Userspace (CRIU) needs to reconstruct POSIX timers
with the same timer ID on restore. It uses sys_timer_create() and relies on
the monotonically increasing timer ID provided by this syscall. It creates
and deletes timers until the desired ID is reached. This can loop for a long
time when the checkpointed process had a very sparse timer ID range.
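
The create/delete loop described above can be sketched roughly as follows.
This is an illustration only, not CRIU's actual code: restore_timer_linear()
is a hypothetical helper, and it assumes the raw syscalls are issued with
SIGEV_NONE so that no signal is ever delivered.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create and delete timers until the kernel hands
 * out the ID 'wanted'. On success the timer with that ID is left alive.
 * Returns 0 on success, -1 on failure.
 */
int restore_timer_linear(int wanted)
{
	struct sigevent sev;
	int prev = -1;
	int id;		/* the kernel-side timer ID is a plain int */

	if (wanted < 0)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_NONE;

	for (;;) {
		/* Raw syscall, so no libc timer state is involved */
		if (syscall(SYS_timer_create, CLOCK_MONOTONIC, &sev, &id) != 0)
			return -1;
		if (id == wanted)
			return 0;

		/* Not the wanted ID: drop it and try the next one */
		syscall(SYS_timer_delete, id);

		/* Give up if the ID was skipped or IDs are not increasing */
		if (id > wanted || id <= prev)
			return -1;
		prev = id;
	}
}
```

The number of create/delete pairs grows linearly with the wanted ID, which
is why a sparse ID range makes this approach slow.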

Implementing a new syscall which creates timers with a given timer ID has
been debated, but that is tedious due to the 32/64-bit compat issues of
sigevent_t and of dubious value.

The restore mechanism of CRIU creates the timers in a state where all
threads of the restored process are held on a barrier and cannot issue
syscalls. That means the restorer task has exclusive control.

This makes it possible to address the issue with a prctl(), so that the
restorer thread can do:

   if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON))
      goto linear_mode;
   create_timers_with_explicit_ids();
   prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF);
   
This is backwards compatible because the prctl() fails on older kernels and
CRIU can fall back to the linear timer ID mechanism. CRIU versions which do
not know about the prctl() just work as before.

Implement the prctl() and modify timer_create() so that it copies the
requested timer ID from userspace by utilizing the existing timer_t
pointer, which is used to copy out the allocated timer ID on success.

If the prctl() mode is disabled, which is the default, timer_create() works
as before and does not try to read from the userspace pointer.
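
A minimal userspace sketch of this flow for a single timer is shown below.
It is not CRIU's implementation: restore_timer_with_id() is a hypothetical
helper, and the numeric prctl() values are an assumption based on the uapi
header of kernels carrying this change, so verify them against
<linux/prctl.h> before relying on them.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Assumed values; older headers do not define these constants */
#ifndef PR_TIMER_CREATE_RESTORE_IDS
#define PR_TIMER_CREATE_RESTORE_IDS	77
#define PR_TIMER_CREATE_RESTORE_IDS_OFF	0
#define PR_TIMER_CREATE_RESTORE_IDS_ON	1
#endif

/*
 * Hypothetical helper. Returns:
 *   0  timer created with exactly the ID 'wanted'
 *   1  prctl() not available, caller should fall back to the linear mode
 *  -1  timer_create() failed or returned a different ID
 */
int restore_timer_with_id(int wanted)
{
	struct sigevent sev;
	int id = wanted;	/* in: requested ID, out: allocated ID */
	long ret;

	if (prctl(PR_TIMER_CREATE_RESTORE_IDS,
		  PR_TIMER_CREATE_RESTORE_IDS_ON, 0L, 0L, 0L))
		return 1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_NONE;

	/* Raw syscall: the kernel reads the requested ID from &id */
	ret = syscall(SYS_timer_create, CLOCK_MONOTONIC, &sev, &id);

	/* Always switch back to the default allocation mode */
	prctl(PR_TIMER_CREATE_RESTORE_IDS,
	      PR_TIMER_CREATE_RESTORE_IDS_OFF, 0L, 0L, 0L);

	if (ret != 0)
		return -1;
	if (id != wanted) {
		syscall(SYS_timer_delete, id);
		return -1;
	}
	return 0;
}
```

A caller would treat a return value of 1 as the signal to use the
create/delete method instead.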

There is no problem when a broken or rogue user space application enables
the prctl(). If the user space pointer does not contain a valid ID, then
timer_create() fails. If the data is not initialized, but happens to contain
a valid ID, timer_create() creates a timer with that ID, or fails if the ID
is already in use.
 
As CRIU must use the raw syscall to avoid manipulating the internal state
of the restored process, this has no library dependencies and can be
adopted by CRIU right away.

Recreating two timers with IDs 1000000 and 2000000 takes 1.5 seconds with
the create/delete method. With the prctl() it takes 3 microseconds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Link: https://lore.kernel.org/all/87jz8vz0en.ffs@tglx
2025-03-13 12:07:18 +01:00
Kconfig                   timekeeping: Always check for negative motion  2024-11-02 10:14:31 +01:00
Makefile                  timers: Move *sleep*() and timeout functions into a separate file  2024-10-16 00:36:46 +02:00
alarmtimer.c              alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()  2024-11-07 02:47:07 +01:00
clockevents.c             clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING  2024-10-31 10:41:42 +01:00
clocksource-wdtest.c      clocksource/wdtest: Print time values for short udelay(1)  2025-01-15 19:49:13 +01:00
clocksource.c             clocksource: Remove unnecessary strscpy() size argument  2025-03-13 11:37:44 +01:00
hrtimer.c                 hrtimers: Replace hrtimer_clock_to_base_table with switch-case  2025-02-18 10:12:49 +01:00
itimer.c                  signal: Confine POSIX_TIMERS properly  2024-10-29 11:43:18 +01:00
jiffies.c
namespace.c
ntp.c                     ntp: Remove invalid cast in time offset math  2024-11-28 12:02:38 +01:00
ntp_internal.h            ntp: Make sure RTC is synchronized when time goes backwards  2024-09-10 13:50:40 +02:00
posix-clock.c             posix-clock: Remove duplicate compat ioctl() handler  2025-02-26 16:53:58 +01:00
posix-cpu-timers.c        posix-timers: Cleanup SIG_IGN workaround leftovers  2024-11-07 02:14:45 +01:00
posix-stubs.c
posix-timers.c            posix-timers: Provide a mechanism to allocate a given timer ID  2025-03-13 12:07:18 +01:00
posix-timers.h            posix-timers: Cleanup SIG_IGN workaround leftovers  2024-11-07 02:14:45 +01:00
sched_clock.c             seqlock, treewide: Switch to non-raw seqcount_latch interface  2024-11-05 12:55:35 +01:00
sleep_timeout.c           timers: Switch to use hrtimer_setup_sleeper_on_stack()  2024-11-07 02:47:06 +01:00
test_udelay.c
tick-broadcast-hrtimer.c
tick-broadcast.c          tick/broadcast: Add kernel-doc for function parameters  2025-01-15 19:49:14 +01:00
tick-common.c             tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()  2024-06-10 20:18:13 +02:00
tick-internal.h           clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING  2024-10-31 10:41:42 +01:00
tick-legacy.c
tick-oneshot.c
tick-sched.c              A rather large update for timekeeping and timers:  2024-11-19 16:35:06 -08:00
tick-sched.h
time.c                    time: Fix references to _msecs_to_jiffies() handling of values  2024-10-25 19:50:10 +02:00
time_test.c
timeconst.bc
timeconv.c
timecounter.c
timekeeping.c             timekeeping: Remove unused ktime_get_fast_timestamps()  2025-01-15 19:49:14 +01:00
timekeeping.h
timekeeping_debug.c       timekeeping: Add percpu counter for tracking floor swap events  2024-10-10 10:20:46 +02:00
timekeeping_internal.h    clocksource: Make negative motion detection more robust  2024-12-05 16:03:24 +01:00
timer.c                   treewide: const qualify ctl_tables where applicable  2025-01-28 13:48:37 +01:00
timer_list.c              timer_list: Don't use %pK through printk()  2025-03-13 08:19:19 +01:00
timer_migration.c         timers/migration: Fix off-by-one root mis-connection  2025-02-07 09:02:16 +01:00
timer_migration.h         timer/migration: Fix kernel-doc warnings for union tmigr_state  2025-01-15 19:49:14 +01:00
vsyscall.c                A rather large update for timekeeping and timers:  2024-11-19 16:35:06 -08:00