Gentoo Archives: gentoo-commits

From: Mike Pagano <mpagano@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] proj/linux-patches:5.14 commit in: /
Date: Sun, 21 Nov 2021 21:14:59
Message-Id: 1637529266.8077ca8990e6d4e9b0db60ec1e302f0699ba8d20.mpagano@gentoo
1 commit: 8077ca8990e6d4e9b0db60ec1e302f0699ba8d20
2 Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
3 AuthorDate: Sun Nov 21 21:14:26 2021 +0000
4 Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
5 CommitDate: Sun Nov 21 21:14:26 2021 +0000
6 URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=8077ca89
7
8 Remove BMQ, will add back with fixed patch
9
10 Signed-off-by: Mike Pagano <mpagano <AT> gentoo.org>
11
12 0000_README | 8 -
13 5020_BMQ-and-PDS-io-scheduler-v5.14-r3.patch | 9523 --------------------------
14 5021_BMQ-and-PDS-gentoo-defaults.patch | 13 -
15 3 files changed, 9544 deletions(-)
16
17 diff --git a/0000_README b/0000_README
18 index e8f44666..35f55e4e 100644
19 --- a/0000_README
20 +++ b/0000_README
21 @@ -166,11 +166,3 @@ Desc: UID/GID shifting overlay filesystem for containers
22 Patch: 5010_enable-cpu-optimizations-universal.patch
23 From: https://github.com/graysky2/kernel_compiler_patch
24 Desc: Kernel >= 5.8 patch enables gcc = v9+ optimizations for additional CPUs.
25 -
26 -Patch: 5020_BMQ-and-PDS-io-scheduler-v5.14-r3.patch
27 -From: https://gitlab.com/alfredchen/linux-prjc
28 -Desc: BMQ(BitMap Queue) Scheduler. A new CPU scheduler developed from PDS(incld). Inspired by the scheduler in zircon.
29 -
30 -Patch: 5021_BMQ-and-PDS-gentoo-defaults.patch
31 -From: https://gitweb.gentoo.org/proj/linux-patches.git/
32 -Desc: Set defaults for BMQ. Add archs as people test, default to N
33
34 diff --git a/5020_BMQ-and-PDS-io-scheduler-v5.14-r3.patch b/5020_BMQ-and-PDS-io-scheduler-v5.14-r3.patch
35 deleted file mode 100644
36 index cf68d7ea..00000000
37 --- a/5020_BMQ-and-PDS-io-scheduler-v5.14-r3.patch
38 +++ /dev/null
39 @@ -1,9523 +0,0 @@
40 -diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
41 -index bdb22006f713..d755d7df632f 100644
42 ---- a/Documentation/admin-guide/kernel-parameters.txt
43 -+++ b/Documentation/admin-guide/kernel-parameters.txt
44 -@@ -4947,6 +4947,12 @@
45 -
46 - sbni= [NET] Granch SBNI12 leased line adapter
47 -
48 -+ sched_timeslice=
49 -+ [KNL] Time slice in ms for Project C BMQ/PDS scheduler.
50 -+ Format: integer 2, 4
51 -+ Default: 4
52 -+ See Documentation/scheduler/sched-BMQ.txt
53 -+
54 - sched_verbose [KNL] Enables verbose scheduler debug messages.
55 -
56 - schedstats= [KNL,X86] Enable or disable scheduled statistics.
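[ Illustration, not part of the diff: the sched_timeslice= parameter above only accepts 2 ms as an alternative to the 4 ms default. Below is a minimal, self-contained C sketch of that validation and of the ms-to-ns conversion; the function name is an assumption, while the "<< 20" approximation mirrors the patch's own sched_timeslice() early_param handler in kernel/sched/alt_core.c further down in this diff. ]

#include <stdint.h>

/* Sketch only: accept 2, otherwise fall back to the 4 ms default,
 * then store the slice in (approximate) nanoseconds. */
static uint64_t parse_sched_timeslice(int timeslice_ms)
{
	if (timeslice_ms != 2)			/* Format: integer 2, 4 */
		timeslice_ms = 4;		/* Default: 4 */
	return (uint64_t)timeslice_ms << 20;	/* ms << 20 ~= ms * 1e6 ns */
}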
57 -diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
58 -index 426162009ce9..15ac2d7e47cd 100644
59 ---- a/Documentation/admin-guide/sysctl/kernel.rst
60 -+++ b/Documentation/admin-guide/sysctl/kernel.rst
61 -@@ -1542,3 +1542,13 @@ is 10 seconds.
62 -
63 - The softlockup threshold is (``2 * watchdog_thresh``). Setting this
64 - tunable to zero will disable lockup detection altogether.
65 -+
66 -+yield_type:
67 -+===========
68 -+
69 -+BMQ/PDS CPU scheduler only. This determines what type of yield a call
70 -+to sched_yield() will perform.
71 -+
72 -+ 0 - No yield.
73 -+ 1 - Deboost and requeue task. (default)
74 -+ 2 - Set run queue skip task.
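[ Illustration, not part of the diff: a minimal, self-contained C sketch of the three yield_type behaviours documented above. All names below are assumptions made for illustration; the patch's real implementation lives in kernel/sched/alt_core.c and acts on its own rq/task structures. ]

/* 0: no yield, 1: deboost and requeue the task, 2: set the run queue's
 * skip task so pick-next passes over the caller once. */
struct yield_task { int boost; int requeued_at_tail; int rq_skip; };

static int sched_yield_type = 1;	/* default, as documented above */

static void yield_sketch(struct yield_task *p)
{
	switch (sched_yield_type) {
	case 0:				/* no yield: leave the task untouched */
		break;
	case 1:				/* deboost (raise the boost value) ... */
		p->boost += 1;
		p->requeued_at_tail = 1; /* ... and requeue at the queue tail */
		break;
	case 2:				/* mark it so pick-next skips it once */
		p->rq_skip = 1;
		break;
	}
}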
75 -diff --git a/Documentation/scheduler/sched-BMQ.txt b/Documentation/scheduler/sched-BMQ.txt
76 -new file mode 100644
77 -index 000000000000..05c84eec0f31
78 ---- /dev/null
79 -+++ b/Documentation/scheduler/sched-BMQ.txt
80 -@@ -0,0 +1,110 @@
81 -+ BitMap queue CPU Scheduler
82 -+ --------------------------
83 -+
84 -+CONTENT
85 -+========
86 -+
87 -+ Background
88 -+ Design
89 -+ Overview
90 -+ Task policy
91 -+ Priority management
92 -+ BitMap Queue
93 -+ CPU Assignment and Migration
94 -+
95 -+
96 -+Background
97 -+==========
98 -+
99 -+The BitMap Queue CPU scheduler, referred to as BMQ from here on, is an evolution
100 -+of the previous Priority and Deadline based Skiplist multiple queue scheduler (PDS),
101 -+and is inspired by the Zircon scheduler. Its goal is to keep the scheduler code
102 -+simple while remaining efficient and scalable for interactive tasks such as
103 -+desktop use, movie playback and gaming.
104 -+
105 -+Design
106 -+======
107 -+
108 -+Overview
109 -+--------
110 -+
111 -+BMQ uses a per-CPU run queue design: each (logical) CPU has its own run queue,
112 -+and each CPU is responsible for scheduling the tasks that have been placed into
113 -+its run queue.
114 -+
115 -+The run queue is a set of priority queues. In terms of data structure, these
116 -+queues are FIFO queues for non-rt tasks and priority queues for rt tasks; see
117 -+BitMap Queue below for details. BMQ is optimized for non-rt tasks, since most
118 -+applications are non-rt tasks. Whether a queue is FIFO or priority based, each
119 -+queue is an ordered list of runnable tasks awaiting execution, and the data
120 -+structures are the same. When it is time for a new task to run, the scheduler
121 -+simply looks for the lowest numbered queue that contains a task and runs the
122 -+first task from the head of that queue. The per-CPU idle task is also kept in
123 -+the run queue, so the scheduler can always find a task to run from its own
124 -+run queue.
125 -+
126 -+Each task is assigned the same timeslice (default 4 ms) when it is picked to
127 -+start running. A task is reinserted at the end of the appropriate priority
128 -+queue when it uses up its whole timeslice. When the scheduler selects a new task
129 -+from the priority queue, it sets the CPU's preemption timer for the remainder of
130 -+the previous timeslice. When that timer fires, the scheduler stops execution of
131 -+that task, selects another task and starts over again.
132 -+
133 -+If a task blocks waiting for a shared resource then it's taken out of its
134 -+priority queue and is placed in a wait queue for the shared resource. When it
135 -+is unblocked it will be reinserted in the appropriate priority queue of an
136 -+eligible CPU.
137 -+
138 -+Task policy
139 -+-----------
140 -+
141 -+BMQ supports the DEADLINE, FIFO, RR, NORMAL, BATCH and IDLE task policies, like
142 -+the mainline CFS scheduler. However, BMQ is heavily optimized for non-rt tasks,
143 -+that is, NORMAL/BATCH/IDLE policy tasks. Below are the implementation details of
144 -+each policy.
145 -+
146 -+DEADLINE
147 -+ It is squashed as priority 0 FIFO task.
148 -+
149 -+FIFO/RR
150 -+ All RT tasks share one single priority queue in the BMQ run queue design. The
151 -+complexity of the insert operation is O(n). BMQ is not designed for systems that
152 -+mostly run rt policy tasks.
153 -+
154 -+NORMAL/BATCH/IDLE
155 -+ BATCH and IDLE tasks are treated as the same policy. They compete for CPU with
156 -+NORMAL policy tasks, but they simply never get boosted. To control the priority
157 -+of NORMAL/BATCH/IDLE tasks, simply use the nice level.
158 -+
159 -+ISO
160 -+ ISO policy is not supported in BMQ. Please use nice level -20 NORMAL policy
161 -+task instead.
162 -+
163 -+Priority management
164 -+-------------------
165 -+
166 -+RT tasks have priorities from 0 to 99. For non-rt tasks, there are three different
167 -+factors used to determine the effective priority of a task; the effective
168 -+priority is what determines which queue the task will be placed in.
169 -+
170 -+The first factor is simply the task's static priority, which is assigned from
171 -+the task's nice level, within [-20, 19] from userland's point of view and
172 -+[0, 39] internally.
173 -+
174 -+The second factor is the priority boost. This is a value bounded within
175 -+[-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] that is used to offset the base priority;
176 -+it is modified in the following cases:
177 -+
178 -+* When a thread has used up its entire timeslice, always deboost it by
179 -+increasing its boost value by one.
180 -+* When a thread gives up cpu control (voluntarily or involuntarily) to be
181 -+rescheduled, and its switch-in time (the time since it was last switched in and
182 -+run) is below the threshold based on its priority boost, boost it by decreasing
183 -+its boost value by one, capped at 0 (it won't go negative).
184 -+
185 -+The intent in this system is to ensure that interactive threads are serviced
186 -+quickly. These are usually the threads that interact directly with the user
187 -+and cause user-perceivable latency. These threads usually do little work and
188 -+spend most of their time blocked awaiting another user event. So they get the
189 -+priority boost from unblocking while background threads that do most of the
190 -+processing receive the priority penalty for using their entire timeslice.
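[ Illustration, not part of the diff: the documentation above picks the first task from the lowest numbered non-empty priority queue. Here is a minimal, self-contained C sketch of that lookup; NPRIO, struct simple_rq and the __builtin_ctzll() helper are assumptions for illustration, while the patch itself does the equivalent with find_first_bit() over rq->queue.bitmap in sched_rq_first_task() (see kernel/sched/alt_core.c below). ]

#define NPRIO 64

struct list_node { struct list_node *next; };

struct simple_rq {
	unsigned long long bitmap;	/* bit i set => priority queue i is non-empty */
	struct list_node heads[NPRIO];	/* one FIFO list head per priority level */
};

/* The idle task is always queued, so the bitmap is never empty and
 * pick_next_sketch() always finds something to run. */
static struct list_node *pick_next_sketch(struct simple_rq *rq)
{
	int prio = __builtin_ctzll(rq->bitmap);	/* lowest set bit = best queue */
	return rq->heads[prio].next;		/* first task at its head */
}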
191 -diff --git a/fs/proc/base.c b/fs/proc/base.c
192 -index e5b5f7709d48..284b3c4b7d90 100644
193 ---- a/fs/proc/base.c
194 -+++ b/fs/proc/base.c
195 -@@ -476,7 +476,7 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
196 - seq_puts(m, "0 0 0\n");
197 - else
198 - seq_printf(m, "%llu %llu %lu\n",
199 -- (unsigned long long)task->se.sum_exec_runtime,
200 -+ (unsigned long long)tsk_seruntime(task),
201 - (unsigned long long)task->sched_info.run_delay,
202 - task->sched_info.pcount);
203 -
204 -diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
205 -index 8874f681b056..59eb72bf7d5f 100644
206 ---- a/include/asm-generic/resource.h
207 -+++ b/include/asm-generic/resource.h
208 -@@ -23,7 +23,7 @@
209 - [RLIMIT_LOCKS] = { RLIM_INFINITY, RLIM_INFINITY }, \
210 - [RLIMIT_SIGPENDING] = { 0, 0 }, \
211 - [RLIMIT_MSGQUEUE] = { MQ_BYTES_MAX, MQ_BYTES_MAX }, \
212 -- [RLIMIT_NICE] = { 0, 0 }, \
213 -+ [RLIMIT_NICE] = { 30, 30 }, \
214 - [RLIMIT_RTPRIO] = { 0, 0 }, \
215 - [RLIMIT_RTTIME] = { RLIM_INFINITY, RLIM_INFINITY }, \
216 - }
217 -diff --git a/include/linux/sched.h b/include/linux/sched.h
218 -index ec8d07d88641..b12f660404fd 100644
219 ---- a/include/linux/sched.h
220 -+++ b/include/linux/sched.h
221 -@@ -681,12 +681,18 @@ struct task_struct {
222 - unsigned int ptrace;
223 -
224 - #ifdef CONFIG_SMP
225 -- int on_cpu;
226 - struct __call_single_node wake_entry;
227 -+#endif
228 -+#if defined(CONFIG_SMP) || defined(CONFIG_SCHED_ALT)
229 -+ int on_cpu;
230 -+#endif
231 -+
232 -+#ifdef CONFIG_SMP
233 - #ifdef CONFIG_THREAD_INFO_IN_TASK
234 - /* Current CPU: */
235 - unsigned int cpu;
236 - #endif
237 -+#ifndef CONFIG_SCHED_ALT
238 - unsigned int wakee_flips;
239 - unsigned long wakee_flip_decay_ts;
240 - struct task_struct *last_wakee;
241 -@@ -700,6 +706,7 @@ struct task_struct {
242 - */
243 - int recent_used_cpu;
244 - int wake_cpu;
245 -+#endif /* !CONFIG_SCHED_ALT */
246 - #endif
247 - int on_rq;
248 -
249 -@@ -708,6 +715,20 @@ struct task_struct {
250 - int normal_prio;
251 - unsigned int rt_priority;
252 -
253 -+#ifdef CONFIG_SCHED_ALT
254 -+ u64 last_ran;
255 -+ s64 time_slice;
256 -+ int sq_idx;
257 -+ struct list_head sq_node;
258 -+#ifdef CONFIG_SCHED_BMQ
259 -+ int boost_prio;
260 -+#endif /* CONFIG_SCHED_BMQ */
261 -+#ifdef CONFIG_SCHED_PDS
262 -+ u64 deadline;
263 -+#endif /* CONFIG_SCHED_PDS */
264 -+ /* sched_clock time spent running */
265 -+ u64 sched_time;
266 -+#else /* !CONFIG_SCHED_ALT */
267 - const struct sched_class *sched_class;
268 - struct sched_entity se;
269 - struct sched_rt_entity rt;
270 -@@ -718,6 +739,7 @@ struct task_struct {
271 - unsigned long core_cookie;
272 - unsigned int core_occupation;
273 - #endif
274 -+#endif /* !CONFIG_SCHED_ALT */
275 -
276 - #ifdef CONFIG_CGROUP_SCHED
277 - struct task_group *sched_task_group;
278 -@@ -1417,6 +1439,15 @@ struct task_struct {
279 - */
280 - };
281 -
282 -+#ifdef CONFIG_SCHED_ALT
283 -+#define tsk_seruntime(t) ((t)->sched_time)
284 -+/* replace the uncertian rt_timeout with 0UL */
285 -+#define tsk_rttimeout(t) (0UL)
286 -+#else /* CFS */
287 -+#define tsk_seruntime(t) ((t)->se.sum_exec_runtime)
288 -+#define tsk_rttimeout(t) ((t)->rt.timeout)
289 -+#endif /* !CONFIG_SCHED_ALT */
290 -+
291 - static inline struct pid *task_pid(struct task_struct *task)
292 - {
293 - return task->thread_pid;
294 -diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
295 -index 1aff00b65f3c..216fdf2fe90c 100644
296 ---- a/include/linux/sched/deadline.h
297 -+++ b/include/linux/sched/deadline.h
298 -@@ -1,5 +1,24 @@
299 - /* SPDX-License-Identifier: GPL-2.0 */
300 -
301 -+#ifdef CONFIG_SCHED_ALT
302 -+
303 -+static inline int dl_task(struct task_struct *p)
304 -+{
305 -+ return 0;
306 -+}
307 -+
308 -+#ifdef CONFIG_SCHED_BMQ
309 -+#define __tsk_deadline(p) (0UL)
310 -+#endif
311 -+
312 -+#ifdef CONFIG_SCHED_PDS
313 -+#define __tsk_deadline(p) ((((u64) ((p)->prio))<<56) | (p)->deadline)
314 -+#endif
315 -+
316 -+#else
317 -+
318 -+#define __tsk_deadline(p) ((p)->dl.deadline)
319 -+
320 - /*
321 - * SCHED_DEADLINE tasks has negative priorities, reflecting
322 - * the fact that any of them has higher prio than RT and
323 -@@ -19,6 +38,7 @@ static inline int dl_task(struct task_struct *p)
324 - {
325 - return dl_prio(p->prio);
326 - }
327 -+#endif /* CONFIG_SCHED_ALT */
328 -
329 - static inline bool dl_time_before(u64 a, u64 b)
330 - {
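[ Illustration, not part of the diff: under CONFIG_SCHED_PDS the __tsk_deadline() macro above packs the 8-bit prio into the top byte of a u64, so one integer comparison orders tasks first by prio and then by deadline; that is what rt_mutex_waiter_less() relies on later in this patch. A small stand-alone C restatement, assuming deadline < 2^56: ]

#include <stdint.h>

static inline uint64_t pack_prio_deadline(uint64_t prio, uint64_t deadline)
{
	return (prio << 56) | deadline;	/* prio dominates, deadline breaks ties */
}

/* Example: prio 120 sorts before prio 128 regardless of the deadlines:
 * pack_prio_deadline(120, 9000) < pack_prio_deadline(128, 1000) is true. */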
331 -diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
332 -index ab83d85e1183..6af9ae681116 100644
333 ---- a/include/linux/sched/prio.h
334 -+++ b/include/linux/sched/prio.h
335 -@@ -18,6 +18,32 @@
336 - #define MAX_PRIO (MAX_RT_PRIO + NICE_WIDTH)
337 - #define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)
338 -
339 -+#ifdef CONFIG_SCHED_ALT
340 -+
341 -+/* Undefine MAX_PRIO and DEFAULT_PRIO */
342 -+#undef MAX_PRIO
343 -+#undef DEFAULT_PRIO
344 -+
345 -+/* +/- priority levels from the base priority */
346 -+#ifdef CONFIG_SCHED_BMQ
347 -+#define MAX_PRIORITY_ADJ (7)
348 -+
349 -+#define MIN_NORMAL_PRIO (MAX_RT_PRIO)
350 -+#define MAX_PRIO (MIN_NORMAL_PRIO + NICE_WIDTH)
351 -+#define DEFAULT_PRIO (MIN_NORMAL_PRIO + NICE_WIDTH / 2)
352 -+#endif
353 -+
354 -+#ifdef CONFIG_SCHED_PDS
355 -+#define MAX_PRIORITY_ADJ (0)
356 -+
357 -+#define MIN_NORMAL_PRIO (128)
358 -+#define NORMAL_PRIO_NUM (64)
359 -+#define MAX_PRIO (MIN_NORMAL_PRIO + NORMAL_PRIO_NUM)
360 -+#define DEFAULT_PRIO (MAX_PRIO - NICE_WIDTH / 2)
361 -+#endif
362 -+
363 -+#endif /* CONFIG_SCHED_ALT */
364 -+
365 - /*
366 - * Convert user-nice values [ -20 ... 0 ... 19 ]
367 - * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
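[ Illustration, not part of the diff: combining the BMQ constants above with the removed documentation's description of the priority boost, a minimal C sketch of deriving an effective priority from a static priority plus a clamped boost. The helper names are assumptions; the patch itself keeps the boost in p->boost_prio. ]

#define MAX_PRIORITY_ADJ 7	/* BMQ value from the hunk above */

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Static priority comes from the nice level ([0, 39] internally); the boost
 * is clamped to [-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ]. A lower result means
 * the task is picked sooner. */
static int effective_prio_sketch(int static_prio, int boost)
{
	return static_prio + clamp_int(boost, -MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ);
}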
368 -diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
369 -index e5af028c08b4..0a7565d0d3cf 100644
370 ---- a/include/linux/sched/rt.h
371 -+++ b/include/linux/sched/rt.h
372 -@@ -24,8 +24,10 @@ static inline bool task_is_realtime(struct task_struct *tsk)
373 -
374 - if (policy == SCHED_FIFO || policy == SCHED_RR)
375 - return true;
376 -+#ifndef CONFIG_SCHED_ALT
377 - if (policy == SCHED_DEADLINE)
378 - return true;
379 -+#endif
380 - return false;
381 - }
382 -
383 -diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
384 -index 8f0f778b7c91..991f2280475b 100644
385 ---- a/include/linux/sched/topology.h
386 -+++ b/include/linux/sched/topology.h
387 -@@ -225,7 +225,8 @@ static inline bool cpus_share_cache(int this_cpu, int that_cpu)
388 -
389 - #endif /* !CONFIG_SMP */
390 -
391 --#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
392 -+#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) && \
393 -+ !defined(CONFIG_SCHED_ALT)
394 - extern void rebuild_sched_domains_energy(void);
395 - #else
396 - static inline void rebuild_sched_domains_energy(void)
397 -diff --git a/init/Kconfig b/init/Kconfig
398 -index 55f9f7738ebb..9a9b244d3ca3 100644
399 ---- a/init/Kconfig
400 -+++ b/init/Kconfig
401 -@@ -786,9 +786,39 @@ config GENERIC_SCHED_CLOCK
402 -
403 - menu "Scheduler features"
404 -
405 -+menuconfig SCHED_ALT
406 -+ bool "Alternative CPU Schedulers"
407 -+ default y
408 -+ help
409 -+ This feature enables the alternative CPU schedulers.
410 -+
411 -+if SCHED_ALT
412 -+
413 -+choice
414 -+ prompt "Alternative CPU Scheduler"
415 -+ default SCHED_BMQ
416 -+
417 -+config SCHED_BMQ
418 -+ bool "BMQ CPU scheduler"
419 -+ help
420 -+ The BitMap Queue CPU scheduler for excellent interactivity and
421 -+ responsiveness on the desktop and solid scalability on normal
422 -+ hardware and commodity servers.
423 -+
424 -+config SCHED_PDS
425 -+ bool "PDS CPU scheduler"
426 -+ help
427 -+ The Priority and Deadline based Skip list multiple queue CPU
428 -+ Scheduler.
429 -+
430 -+endchoice
431 -+
432 -+endif
433 -+
434 - config UCLAMP_TASK
435 - bool "Enable utilization clamping for RT/FAIR tasks"
436 - depends on CPU_FREQ_GOV_SCHEDUTIL
437 -+ depends on !SCHED_ALT
438 - help
439 - This feature enables the scheduler to track the clamped utilization
440 - of each CPU based on RUNNABLE tasks scheduled on that CPU.
441 -@@ -874,6 +904,7 @@ config NUMA_BALANCING
442 - depends on ARCH_SUPPORTS_NUMA_BALANCING
443 - depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
444 - depends on SMP && NUMA && MIGRATION
445 -+ depends on !SCHED_ALT
446 - help
447 - This option adds support for automatic NUMA aware memory/task placement.
448 - The mechanism is quite primitive and is based on migrating memory when
449 -@@ -966,6 +997,7 @@ config FAIR_GROUP_SCHED
450 - depends on CGROUP_SCHED
451 - default CGROUP_SCHED
452 -
453 -+if !SCHED_ALT
454 - config CFS_BANDWIDTH
455 - bool "CPU bandwidth provisioning for FAIR_GROUP_SCHED"
456 - depends on FAIR_GROUP_SCHED
457 -@@ -988,6 +1020,7 @@ config RT_GROUP_SCHED
458 - realtime bandwidth for them.
459 - See Documentation/scheduler/sched-rt-group.rst for more information.
460 -
461 -+endif #!SCHED_ALT
462 - endif #CGROUP_SCHED
463 -
464 - config UCLAMP_TASK_GROUP
465 -@@ -1231,6 +1264,7 @@ config CHECKPOINT_RESTORE
466 -
467 - config SCHED_AUTOGROUP
468 - bool "Automatic process group scheduling"
469 -+ depends on !SCHED_ALT
470 - select CGROUPS
471 - select CGROUP_SCHED
472 - select FAIR_GROUP_SCHED
473 -diff --git a/init/init_task.c b/init/init_task.c
474 -index 562f2ef8d157..177b63db4ce0 100644
475 ---- a/init/init_task.c
476 -+++ b/init/init_task.c
477 -@@ -75,9 +75,15 @@ struct task_struct init_task
478 - .stack = init_stack,
479 - .usage = REFCOUNT_INIT(2),
480 - .flags = PF_KTHREAD,
481 -+#ifdef CONFIG_SCHED_ALT
482 -+ .prio = DEFAULT_PRIO + MAX_PRIORITY_ADJ,
483 -+ .static_prio = DEFAULT_PRIO,
484 -+ .normal_prio = DEFAULT_PRIO + MAX_PRIORITY_ADJ,
485 -+#else
486 - .prio = MAX_PRIO - 20,
487 - .static_prio = MAX_PRIO - 20,
488 - .normal_prio = MAX_PRIO - 20,
489 -+#endif
490 - .policy = SCHED_NORMAL,
491 - .cpus_ptr = &init_task.cpus_mask,
492 - .cpus_mask = CPU_MASK_ALL,
493 -@@ -87,6 +93,17 @@ struct task_struct init_task
494 - .restart_block = {
495 - .fn = do_no_restart_syscall,
496 - },
497 -+#ifdef CONFIG_SCHED_ALT
498 -+ .sq_node = LIST_HEAD_INIT(init_task.sq_node),
499 -+#ifdef CONFIG_SCHED_BMQ
500 -+ .boost_prio = 0,
501 -+ .sq_idx = 15,
502 -+#endif
503 -+#ifdef CONFIG_SCHED_PDS
504 -+ .deadline = 0,
505 -+#endif
506 -+ .time_slice = HZ,
507 -+#else
508 - .se = {
509 - .group_node = LIST_HEAD_INIT(init_task.se.group_node),
510 - },
511 -@@ -94,6 +111,7 @@ struct task_struct init_task
512 - .run_list = LIST_HEAD_INIT(init_task.rt.run_list),
513 - .time_slice = RR_TIMESLICE,
514 - },
515 -+#endif
516 - .tasks = LIST_HEAD_INIT(init_task.tasks),
517 - #ifdef CONFIG_SMP
518 - .pushable_tasks = PLIST_NODE_INIT(init_task.pushable_tasks, MAX_PRIO),
519 -diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
520 -index 5876e30c5740..7594d0a31869 100644
521 ---- a/kernel/Kconfig.preempt
522 -+++ b/kernel/Kconfig.preempt
523 -@@ -102,7 +102,7 @@ config PREEMPT_DYNAMIC
524 -
525 - config SCHED_CORE
526 - bool "Core Scheduling for SMT"
527 -- depends on SCHED_SMT
528 -+ depends on SCHED_SMT && !SCHED_ALT
529 - help
530 - This option permits Core Scheduling, a means of coordinated task
531 - selection across SMT siblings. When enabled -- see
532 -diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
533 -index adb5190c4429..8c02bce63146 100644
534 ---- a/kernel/cgroup/cpuset.c
535 -+++ b/kernel/cgroup/cpuset.c
536 -@@ -636,7 +636,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
537 - return ret;
538 - }
539 -
540 --#ifdef CONFIG_SMP
541 -+#if defined(CONFIG_SMP) && !defined(CONFIG_SCHED_ALT)
542 - /*
543 - * Helper routine for generate_sched_domains().
544 - * Do cpusets a, b have overlapping effective cpus_allowed masks?
545 -@@ -1032,7 +1032,7 @@ static void rebuild_sched_domains_locked(void)
546 - /* Have scheduler rebuild the domains */
547 - partition_and_rebuild_sched_domains(ndoms, doms, attr);
548 - }
549 --#else /* !CONFIG_SMP */
550 -+#else /* !CONFIG_SMP || CONFIG_SCHED_ALT */
551 - static void rebuild_sched_domains_locked(void)
552 - {
553 - }
554 -diff --git a/kernel/delayacct.c b/kernel/delayacct.c
555 -index 51530d5b15a8..e542d71bb94b 100644
556 ---- a/kernel/delayacct.c
557 -+++ b/kernel/delayacct.c
558 -@@ -139,7 +139,7 @@ int delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
559 - */
560 - t1 = tsk->sched_info.pcount;
561 - t2 = tsk->sched_info.run_delay;
562 -- t3 = tsk->se.sum_exec_runtime;
563 -+ t3 = tsk_seruntime(tsk);
564 -
565 - d->cpu_count += t1;
566 -
567 -diff --git a/kernel/exit.c b/kernel/exit.c
568 -index 9a89e7f36acb..7fe34c56bd08 100644
569 ---- a/kernel/exit.c
570 -+++ b/kernel/exit.c
571 -@@ -122,7 +122,7 @@ static void __exit_signal(struct task_struct *tsk)
572 - sig->curr_target = next_thread(tsk);
573 - }
574 -
575 -- add_device_randomness((const void*) &tsk->se.sum_exec_runtime,
576 -+ add_device_randomness((const void*) &tsk_seruntime(tsk),
577 - sizeof(unsigned long long));
578 -
579 - /*
580 -@@ -143,7 +143,7 @@ static void __exit_signal(struct task_struct *tsk)
581 - sig->inblock += task_io_get_inblock(tsk);
582 - sig->oublock += task_io_get_oublock(tsk);
583 - task_io_accounting_add(&sig->ioac, &tsk->ioac);
584 -- sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
585 -+ sig->sum_sched_runtime += tsk_seruntime(tsk);
586 - sig->nr_threads--;
587 - __unhash_process(tsk, group_dead);
588 - write_sequnlock(&sig->stats_lock);
589 -diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
590 -index 3a4beb9395c4..98a709628cb3 100644
591 ---- a/kernel/livepatch/transition.c
592 -+++ b/kernel/livepatch/transition.c
593 -@@ -307,7 +307,11 @@ static bool klp_try_switch_task(struct task_struct *task)
594 - */
595 - rq = task_rq_lock(task, &flags);
596 -
597 -+#ifdef CONFIG_SCHED_ALT
598 -+ if (task_running(task) && task != current) {
599 -+#else
600 - if (task_running(rq, task) && task != current) {
601 -+#endif
602 - snprintf(err_buf, STACK_ERR_BUF_SIZE,
603 - "%s: %s:%d is running\n", __func__, task->comm,
604 - task->pid);
605 -diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
606 -index ad0db322ed3b..350b0e506c17 100644
607 ---- a/kernel/locking/rtmutex.c
608 -+++ b/kernel/locking/rtmutex.c
609 -@@ -227,14 +227,18 @@ static __always_inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
610 - * Only use with rt_mutex_waiter_{less,equal}()
611 - */
612 - #define task_to_waiter(p) \
613 -- &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
614 -+ &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = __tsk_deadline(p) }
615 -
616 - static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left,
617 - struct rt_mutex_waiter *right)
618 - {
619 -+#ifdef CONFIG_SCHED_PDS
620 -+ return (left->deadline < right->deadline);
621 -+#else
622 - if (left->prio < right->prio)
623 - return 1;
624 -
625 -+#ifndef CONFIG_SCHED_BMQ
626 - /*
627 - * If both waiters have dl_prio(), we check the deadlines of the
628 - * associated tasks.
629 -@@ -243,16 +247,22 @@ static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left,
630 - */
631 - if (dl_prio(left->prio))
632 - return dl_time_before(left->deadline, right->deadline);
633 -+#endif
634 -
635 - return 0;
636 -+#endif
637 - }
638 -
639 - static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
640 - struct rt_mutex_waiter *right)
641 - {
642 -+#ifdef CONFIG_SCHED_PDS
643 -+ return (left->deadline == right->deadline);
644 -+#else
645 - if (left->prio != right->prio)
646 - return 0;
647 -
648 -+#ifndef CONFIG_SCHED_BMQ
649 - /*
650 - * If both waiters have dl_prio(), we check the deadlines of the
651 - * associated tasks.
652 -@@ -261,8 +271,10 @@ static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
653 - */
654 - if (dl_prio(left->prio))
655 - return left->deadline == right->deadline;
656 -+#endif
657 -
658 - return 1;
659 -+#endif
660 - }
661 -
662 - #define __node_2_waiter(node) \
663 -@@ -654,7 +666,7 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
664 - * the values of the node being removed.
665 - */
666 - waiter->prio = task->prio;
667 -- waiter->deadline = task->dl.deadline;
668 -+ waiter->deadline = __tsk_deadline(task);
669 -
670 - rt_mutex_enqueue(lock, waiter);
671 -
672 -@@ -925,7 +937,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex *lock,
673 - waiter->task = task;
674 - waiter->lock = lock;
675 - waiter->prio = task->prio;
676 -- waiter->deadline = task->dl.deadline;
677 -+ waiter->deadline = __tsk_deadline(task);
678 -
679 - /* Get the top priority waiter on the lock */
680 - if (rt_mutex_has_waiters(lock))
681 -diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
682 -index 978fcfca5871..0425ee149b4d 100644
683 ---- a/kernel/sched/Makefile
684 -+++ b/kernel/sched/Makefile
685 -@@ -22,14 +22,21 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
686 - CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
687 - endif
688 -
689 --obj-y += core.o loadavg.o clock.o cputime.o
690 --obj-y += idle.o fair.o rt.o deadline.o
691 --obj-y += wait.o wait_bit.o swait.o completion.o
692 --
693 --obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
694 -+ifdef CONFIG_SCHED_ALT
695 -+obj-y += alt_core.o
696 -+obj-$(CONFIG_SCHED_DEBUG) += alt_debug.o
697 -+else
698 -+obj-y += core.o
699 -+obj-y += fair.o rt.o deadline.o
700 -+obj-$(CONFIG_SMP) += cpudeadline.o stop_task.o
701 - obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
702 --obj-$(CONFIG_SCHEDSTATS) += stats.o
703 -+endif
704 - obj-$(CONFIG_SCHED_DEBUG) += debug.o
705 -+obj-y += loadavg.o clock.o cputime.o
706 -+obj-y += idle.o
707 -+obj-y += wait.o wait_bit.o swait.o completion.o
708 -+obj-$(CONFIG_SMP) += cpupri.o pelt.o topology.o
709 -+obj-$(CONFIG_SCHEDSTATS) += stats.o
710 - obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
711 - obj-$(CONFIG_CPU_FREQ) += cpufreq.o
712 - obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
713 -diff --git a/kernel/sched/alt_core.c b/kernel/sched/alt_core.c
714 -new file mode 100644
715 -index 000000000000..56aed2b1e42c
716 ---- /dev/null
717 -+++ b/kernel/sched/alt_core.c
718 -@@ -0,0 +1,7341 @@
719 -+/*
720 -+ * kernel/sched/alt_core.c
721 -+ *
722 -+ * Core alternative kernel scheduler code and related syscalls
723 -+ *
724 -+ * Copyright (C) 1991-2002 Linus Torvalds
725 -+ *
726 -+ * 2009-08-13 Brainfuck deadline scheduling policy by Con Kolivas deletes
727 -+ * a whole lot of those previous things.
728 -+ * 2017-09-06 Priority and Deadline based Skip list multiple queue kernel
729 -+ * scheduler by Alfred Chen.
730 -+ * 2019-02-20 BMQ(BitMap Queue) kernel scheduler by Alfred Chen.
731 -+ */
732 -+#define CREATE_TRACE_POINTS
733 -+#include <trace/events/sched.h>
734 -+#undef CREATE_TRACE_POINTS
735 -+
736 -+#include "sched.h"
737 -+
738 -+#include <linux/sched/rt.h>
739 -+
740 -+#include <linux/context_tracking.h>
741 -+#include <linux/compat.h>
742 -+#include <linux/blkdev.h>
743 -+#include <linux/delayacct.h>
744 -+#include <linux/freezer.h>
745 -+#include <linux/init_task.h>
746 -+#include <linux/kprobes.h>
747 -+#include <linux/mmu_context.h>
748 -+#include <linux/nmi.h>
749 -+#include <linux/profile.h>
750 -+#include <linux/rcupdate_wait.h>
751 -+#include <linux/security.h>
752 -+#include <linux/syscalls.h>
753 -+#include <linux/wait_bit.h>
754 -+
755 -+#include <linux/kcov.h>
756 -+#include <linux/scs.h>
757 -+
758 -+#include <asm/switch_to.h>
759 -+
760 -+#include "../workqueue_internal.h"
761 -+#include "../../fs/io-wq.h"
762 -+#include "../smpboot.h"
763 -+
764 -+#include "pelt.h"
765 -+#include "smp.h"
766 -+
767 -+/*
768 -+ * Export tracepoints that act as a bare tracehook (ie: have no trace event
769 -+ * associated with them) to allow external modules to probe them.
770 -+ */
771 -+EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_irq_tp);
772 -+
773 -+#ifdef CONFIG_SCHED_DEBUG
774 -+#define sched_feat(x) (1)
775 -+/*
776 -+ * Print a warning if need_resched is set for the given duration (if
777 -+ * LATENCY_WARN is enabled).
778 -+ *
779 -+ * If sysctl_resched_latency_warn_once is set, only one warning will be shown
780 -+ * per boot.
781 -+ */
782 -+__read_mostly int sysctl_resched_latency_warn_ms = 100;
783 -+__read_mostly int sysctl_resched_latency_warn_once = 1;
784 -+#else
785 -+#define sched_feat(x) (0)
786 -+#endif /* CONFIG_SCHED_DEBUG */
787 -+
788 -+#define ALT_SCHED_VERSION "v5.14-r3"
789 -+
790 -+/* rt_prio(prio) defined in include/linux/sched/rt.h */
791 -+#define rt_task(p) rt_prio((p)->prio)
792 -+#define rt_policy(policy) ((policy) == SCHED_FIFO || (policy) == SCHED_RR)
793 -+#define task_has_rt_policy(p) (rt_policy((p)->policy))
794 -+
795 -+#define STOP_PRIO (MAX_RT_PRIO - 1)
796 -+
797 -+/* Default time slice is 4 in ms, can be set via kernel parameter "sched_timeslice" */
798 -+u64 sched_timeslice_ns __read_mostly = (4 << 20);
799 -+
800 -+static inline void requeue_task(struct task_struct *p, struct rq *rq);
801 -+
802 -+#ifdef CONFIG_SCHED_BMQ
803 -+#include "bmq.h"
804 -+#endif
805 -+#ifdef CONFIG_SCHED_PDS
806 -+#include "pds.h"
807 -+#endif
808 -+
809 -+static int __init sched_timeslice(char *str)
810 -+{
811 -+ int timeslice_ms;
812 -+
813 -+ get_option(&str, &timeslice_ms);
814 -+ if (2 != timeslice_ms)
815 -+ timeslice_ms = 4;
816 -+ sched_timeslice_ns = timeslice_ms << 20;
817 -+ sched_timeslice_imp(timeslice_ms);
818 -+
819 -+ return 0;
820 -+}
821 -+early_param("sched_timeslice", sched_timeslice);
822 -+
823 -+/* Reschedule if less than this many μs left */
824 -+#define RESCHED_NS (100 << 10)
825 -+
826 -+/**
827 -+ * sched_yield_type - Choose what sort of yield sched_yield will perform.
828 -+ * 0: No yield.
829 -+ * 1: Deboost and requeue task. (default)
830 -+ * 2: Set rq skip task.
831 -+ */
832 -+int sched_yield_type __read_mostly = 1;
833 -+
834 -+#ifdef CONFIG_SMP
835 -+static cpumask_t sched_rq_pending_mask ____cacheline_aligned_in_smp;
836 -+
837 -+DEFINE_PER_CPU(cpumask_t [NR_CPU_AFFINITY_LEVELS], sched_cpu_topo_masks);
838 -+DEFINE_PER_CPU(cpumask_t *, sched_cpu_llc_mask);
839 -+DEFINE_PER_CPU(cpumask_t *, sched_cpu_topo_end_mask);
840 -+
841 -+#ifdef CONFIG_SCHED_SMT
842 -+DEFINE_STATIC_KEY_FALSE(sched_smt_present);
843 -+EXPORT_SYMBOL_GPL(sched_smt_present);
844 -+#endif
845 -+
846 -+/*
847 -+ * Keep a unique ID per domain (we use the first CPUs number in the cpumask of
848 -+ * the domain), this allows us to quickly tell if two cpus are in the same cache
849 -+ * domain, see cpus_share_cache().
850 -+ */
851 -+DEFINE_PER_CPU(int, sd_llc_id);
852 -+#endif /* CONFIG_SMP */
853 -+
854 -+static DEFINE_MUTEX(sched_hotcpu_mutex);
855 -+
856 -+DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
857 -+
858 -+#ifndef prepare_arch_switch
859 -+# define prepare_arch_switch(next) do { } while (0)
860 -+#endif
861 -+#ifndef finish_arch_post_lock_switch
862 -+# define finish_arch_post_lock_switch() do { } while (0)
863 -+#endif
864 -+
865 -+#ifdef CONFIG_SCHED_SMT
866 -+static cpumask_t sched_sg_idle_mask ____cacheline_aligned_in_smp;
867 -+#endif
868 -+static cpumask_t sched_rq_watermark[SCHED_BITS] ____cacheline_aligned_in_smp;
869 -+
870 -+/* sched_queue related functions */
871 -+static inline void sched_queue_init(struct sched_queue *q)
872 -+{
873 -+ int i;
874 -+
875 -+ bitmap_zero(q->bitmap, SCHED_BITS);
876 -+ for(i = 0; i < SCHED_BITS; i++)
877 -+ INIT_LIST_HEAD(&q->heads[i]);
878 -+}
879 -+
880 -+/*
881 -+ * Init idle task and put into queue structure of rq
882 -+ * IMPORTANT: may be called multiple times for a single cpu
883 -+ */
884 -+static inline void sched_queue_init_idle(struct sched_queue *q,
885 -+ struct task_struct *idle)
886 -+{
887 -+ idle->sq_idx = IDLE_TASK_SCHED_PRIO;
888 -+ INIT_LIST_HEAD(&q->heads[idle->sq_idx]);
889 -+ list_add(&idle->sq_node, &q->heads[idle->sq_idx]);
890 -+}
891 -+
892 -+/* water mark related functions */
893 -+static inline void update_sched_rq_watermark(struct rq *rq)
894 -+{
895 -+ unsigned long watermark = find_first_bit(rq->queue.bitmap, SCHED_QUEUE_BITS);
896 -+ unsigned long last_wm = rq->watermark;
897 -+ unsigned long i;
898 -+ int cpu;
899 -+
900 -+ if (watermark == last_wm)
901 -+ return;
902 -+
903 -+ rq->watermark = watermark;
904 -+ cpu = cpu_of(rq);
905 -+ if (watermark < last_wm) {
906 -+ for (i = last_wm; i > watermark; i--)
907 -+ cpumask_clear_cpu(cpu, sched_rq_watermark + SCHED_BITS - 1 - i);
908 -+#ifdef CONFIG_SCHED_SMT
909 -+ if (static_branch_likely(&sched_smt_present) &&
910 -+ IDLE_TASK_SCHED_PRIO == last_wm)
911 -+ cpumask_andnot(&sched_sg_idle_mask,
912 -+ &sched_sg_idle_mask, cpu_smt_mask(cpu));
913 -+#endif
914 -+ return;
915 -+ }
916 -+ /* last_wm < watermark */
917 -+ for (i = watermark; i > last_wm; i--)
918 -+ cpumask_set_cpu(cpu, sched_rq_watermark + SCHED_BITS - 1 - i);
919 -+#ifdef CONFIG_SCHED_SMT
920 -+ if (static_branch_likely(&sched_smt_present) &&
921 -+ IDLE_TASK_SCHED_PRIO == watermark) {
922 -+ cpumask_t tmp;
923 -+
924 -+ cpumask_and(&tmp, cpu_smt_mask(cpu), sched_rq_watermark);
925 -+ if (cpumask_equal(&tmp, cpu_smt_mask(cpu)))
926 -+ cpumask_or(&sched_sg_idle_mask,
927 -+ &sched_sg_idle_mask, cpu_smt_mask(cpu));
928 -+ }
929 -+#endif
930 -+}
931 -+
932 -+/*
933 -+ * This routine assume that the idle task always in queue
934 -+ */
935 -+static inline struct task_struct *sched_rq_first_task(struct rq *rq)
936 -+{
937 -+ unsigned long idx = find_first_bit(rq->queue.bitmap, SCHED_QUEUE_BITS);
938 -+ const struct list_head *head = &rq->queue.heads[sched_prio2idx(idx, rq)];
939 -+
940 -+ return list_first_entry(head, struct task_struct, sq_node);
941 -+}
942 -+
943 -+static inline struct task_struct *
944 -+sched_rq_next_task(struct task_struct *p, struct rq *rq)
945 -+{
946 -+ unsigned long idx = p->sq_idx;
947 -+ struct list_head *head = &rq->queue.heads[idx];
948 -+
949 -+ if (list_is_last(&p->sq_node, head)) {
950 -+ idx = find_next_bit(rq->queue.bitmap, SCHED_QUEUE_BITS,
951 -+ sched_idx2prio(idx, rq) + 1);
952 -+ head = &rq->queue.heads[sched_prio2idx(idx, rq)];
953 -+
954 -+ return list_first_entry(head, struct task_struct, sq_node);
955 -+ }
956 -+
957 -+ return list_next_entry(p, sq_node);
958 -+}
959 -+
960 -+static inline struct task_struct *rq_runnable_task(struct rq *rq)
961 -+{
962 -+ struct task_struct *next = sched_rq_first_task(rq);
963 -+
964 -+ if (unlikely(next == rq->skip))
965 -+ next = sched_rq_next_task(next, rq);
966 -+
967 -+ return next;
968 -+}
969 -+
970 -+/*
971 -+ * Serialization rules:
972 -+ *
973 -+ * Lock order:
974 -+ *
975 -+ * p->pi_lock
976 -+ * rq->lock
977 -+ * hrtimer_cpu_base->lock (hrtimer_start() for bandwidth controls)
978 -+ *
979 -+ * rq1->lock
980 -+ * rq2->lock where: rq1 < rq2
981 -+ *
982 -+ * Regular state:
983 -+ *
984 -+ * Normal scheduling state is serialized by rq->lock. __schedule() takes the
985 -+ * local CPU's rq->lock, it optionally removes the task from the runqueue and
986 -+ * always looks at the local rq data structures to find the most eligible task
987 -+ * to run next.
988 -+ *
989 -+ * Task enqueue is also under rq->lock, possibly taken from another CPU.
990 -+ * Wakeups from another LLC domain might use an IPI to transfer the enqueue to
991 -+ * the local CPU to avoid bouncing the runqueue state around [ see
992 -+ * ttwu_queue_wakelist() ]
993 -+ *
994 -+ * Task wakeup, specifically wakeups that involve migration, are horribly
995 -+ * complicated to avoid having to take two rq->locks.
996 -+ *
997 -+ * Special state:
998 -+ *
999 -+ * System-calls and anything external will use task_rq_lock() which acquires
1000 -+ * both p->pi_lock and rq->lock. As a consequence the state they change is
1001 -+ * stable while holding either lock:
1002 -+ *
1003 -+ * - sched_setaffinity()/
1004 -+ * set_cpus_allowed_ptr(): p->cpus_ptr, p->nr_cpus_allowed
1005 -+ * - set_user_nice(): p->se.load, p->*prio
1006 -+ * - __sched_setscheduler(): p->sched_class, p->policy, p->*prio,
1007 -+ * p->se.load, p->rt_priority,
1008 -+ * p->dl.dl_{runtime, deadline, period, flags, bw, density}
1009 -+ * - sched_setnuma(): p->numa_preferred_nid
1010 -+ * - sched_move_task()/
1011 -+ * cpu_cgroup_fork(): p->sched_task_group
1012 -+ * - uclamp_update_active() p->uclamp*
1013 -+ *
1014 -+ * p->state <- TASK_*:
1015 -+ *
1016 -+ * is changed locklessly using set_current_state(), __set_current_state() or
1017 -+ * set_special_state(), see their respective comments, or by
1018 -+ * try_to_wake_up(). This latter uses p->pi_lock to serialize against
1019 -+ * concurrent self.
1020 -+ *
1021 -+ * p->on_rq <- { 0, 1 = TASK_ON_RQ_QUEUED, 2 = TASK_ON_RQ_MIGRATING }:
1022 -+ *
1023 -+ * is set by activate_task() and cleared by deactivate_task(), under
1024 -+ * rq->lock. Non-zero indicates the task is runnable, the special
1025 -+ * ON_RQ_MIGRATING state is used for migration without holding both
1026 -+ * rq->locks. It indicates task_cpu() is not stable, see task_rq_lock().
1027 -+ *
1028 -+ * p->on_cpu <- { 0, 1 }:
1029 -+ *
1030 -+ * is set by prepare_task() and cleared by finish_task() such that it will be
1031 -+ * set before p is scheduled-in and cleared after p is scheduled-out, both
1032 -+ * under rq->lock. Non-zero indicates the task is running on its CPU.
1033 -+ *
1034 -+ * [ The astute reader will observe that it is possible for two tasks on one
1035 -+ * CPU to have ->on_cpu = 1 at the same time. ]
1036 -+ *
1037 -+ * task_cpu(p): is changed by set_task_cpu(), the rules are:
1038 -+ *
1039 -+ * - Don't call set_task_cpu() on a blocked task:
1040 -+ *
1041 -+ * We don't care what CPU we're not running on, this simplifies hotplug,
1042 -+ * the CPU assignment of blocked tasks isn't required to be valid.
1043 -+ *
1044 -+ * - for try_to_wake_up(), called under p->pi_lock:
1045 -+ *
1046 -+ * This allows try_to_wake_up() to only take one rq->lock, see its comment.
1047 -+ *
1048 -+ * - for migration called under rq->lock:
1049 -+ * [ see task_on_rq_migrating() in task_rq_lock() ]
1050 -+ *
1051 -+ * o move_queued_task()
1052 -+ * o detach_task()
1053 -+ *
1054 -+ * - for migration called under double_rq_lock():
1055 -+ *
1056 -+ * o __migrate_swap_task()
1057 -+ * o push_rt_task() / pull_rt_task()
1058 -+ * o push_dl_task() / pull_dl_task()
1059 -+ * o dl_task_offline_migration()
1060 -+ *
1061 -+ */
1062 -+
1063 -+/*
1064 -+ * Context: p->pi_lock
1065 -+ */
1066 -+static inline struct rq
1067 -+*__task_access_lock(struct task_struct *p, raw_spinlock_t **plock)
1068 -+{
1069 -+ struct rq *rq;
1070 -+ for (;;) {
1071 -+ rq = task_rq(p);
1072 -+ if (p->on_cpu || task_on_rq_queued(p)) {
1073 -+ raw_spin_lock(&rq->lock);
1074 -+ if (likely((p->on_cpu || task_on_rq_queued(p))
1075 -+ && rq == task_rq(p))) {
1076 -+ *plock = &rq->lock;
1077 -+ return rq;
1078 -+ }
1079 -+ raw_spin_unlock(&rq->lock);
1080 -+ } else if (task_on_rq_migrating(p)) {
1081 -+ do {
1082 -+ cpu_relax();
1083 -+ } while (unlikely(task_on_rq_migrating(p)));
1084 -+ } else {
1085 -+ *plock = NULL;
1086 -+ return rq;
1087 -+ }
1088 -+ }
1089 -+}
1090 -+
1091 -+static inline void
1092 -+__task_access_unlock(struct task_struct *p, raw_spinlock_t *lock)
1093 -+{
1094 -+ if (NULL != lock)
1095 -+ raw_spin_unlock(lock);
1096 -+}
1097 -+
1098 -+static inline struct rq
1099 -+*task_access_lock_irqsave(struct task_struct *p, raw_spinlock_t **plock,
1100 -+ unsigned long *flags)
1101 -+{
1102 -+ struct rq *rq;
1103 -+ for (;;) {
1104 -+ rq = task_rq(p);
1105 -+ if (p->on_cpu || task_on_rq_queued(p)) {
1106 -+ raw_spin_lock_irqsave(&rq->lock, *flags);
1107 -+ if (likely((p->on_cpu || task_on_rq_queued(p))
1108 -+ && rq == task_rq(p))) {
1109 -+ *plock = &rq->lock;
1110 -+ return rq;
1111 -+ }
1112 -+ raw_spin_unlock_irqrestore(&rq->lock, *flags);
1113 -+ } else if (task_on_rq_migrating(p)) {
1114 -+ do {
1115 -+ cpu_relax();
1116 -+ } while (unlikely(task_on_rq_migrating(p)));
1117 -+ } else {
1118 -+ raw_spin_lock_irqsave(&p->pi_lock, *flags);
1119 -+ if (likely(!p->on_cpu && !p->on_rq &&
1120 -+ rq == task_rq(p))) {
1121 -+ *plock = &p->pi_lock;
1122 -+ return rq;
1123 -+ }
1124 -+ raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
1125 -+ }
1126 -+ }
1127 -+}
1128 -+
1129 -+static inline void
1130 -+task_access_unlock_irqrestore(struct task_struct *p, raw_spinlock_t *lock,
1131 -+ unsigned long *flags)
1132 -+{
1133 -+ raw_spin_unlock_irqrestore(lock, *flags);
1134 -+}
1135 -+
1136 -+/*
1137 -+ * __task_rq_lock - lock the rq @p resides on.
1138 -+ */
1139 -+struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
1140 -+ __acquires(rq->lock)
1141 -+{
1142 -+ struct rq *rq;
1143 -+
1144 -+ lockdep_assert_held(&p->pi_lock);
1145 -+
1146 -+ for (;;) {
1147 -+ rq = task_rq(p);
1148 -+ raw_spin_lock(&rq->lock);
1149 -+ if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
1150 -+ return rq;
1151 -+ raw_spin_unlock(&rq->lock);
1152 -+
1153 -+ while (unlikely(task_on_rq_migrating(p)))
1154 -+ cpu_relax();
1155 -+ }
1156 -+}
1157 -+
1158 -+/*
1159 -+ * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
1160 -+ */
1161 -+struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
1162 -+ __acquires(p->pi_lock)
1163 -+ __acquires(rq->lock)
1164 -+{
1165 -+ struct rq *rq;
1166 -+
1167 -+ for (;;) {
1168 -+ raw_spin_lock_irqsave(&p->pi_lock, rf->flags);
1169 -+ rq = task_rq(p);
1170 -+ raw_spin_lock(&rq->lock);
1171 -+ /*
1172 -+ * move_queued_task() task_rq_lock()
1173 -+ *
1174 -+ * ACQUIRE (rq->lock)
1175 -+ * [S] ->on_rq = MIGRATING [L] rq = task_rq()
1176 -+ * WMB (__set_task_cpu()) ACQUIRE (rq->lock);
1177 -+ * [S] ->cpu = new_cpu [L] task_rq()
1178 -+ * [L] ->on_rq
1179 -+ * RELEASE (rq->lock)
1180 -+ *
1181 -+ * If we observe the old CPU in task_rq_lock(), the acquire of
1182 -+ * the old rq->lock will fully serialize against the stores.
1183 -+ *
1184 -+ * If we observe the new CPU in task_rq_lock(), the address
1185 -+ * dependency headed by '[L] rq = task_rq()' and the acquire
1186 -+ * will pair with the WMB to ensure we then also see migrating.
1187 -+ */
1188 -+ if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
1189 -+ return rq;
1190 -+ }
1191 -+ raw_spin_unlock(&rq->lock);
1192 -+ raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
1193 -+
1194 -+ while (unlikely(task_on_rq_migrating(p)))
1195 -+ cpu_relax();
1196 -+ }
1197 -+}
1198 -+
1199 -+static inline void
1200 -+rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
1201 -+ __acquires(rq->lock)
1202 -+{
1203 -+ raw_spin_lock_irqsave(&rq->lock, rf->flags);
1204 -+}
1205 -+
1206 -+static inline void
1207 -+rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
1208 -+ __releases(rq->lock)
1209 -+{
1210 -+ raw_spin_unlock_irqrestore(&rq->lock, rf->flags);
1211 -+}
1212 -+
1213 -+void raw_spin_rq_lock_nested(struct rq *rq, int subclass)
1214 -+{
1215 -+ raw_spinlock_t *lock;
1216 -+
1217 -+ /* Matches synchronize_rcu() in __sched_core_enable() */
1218 -+ preempt_disable();
1219 -+
1220 -+ for (;;) {
1221 -+ lock = __rq_lockp(rq);
1222 -+ raw_spin_lock_nested(lock, subclass);
1223 -+ if (likely(lock == __rq_lockp(rq))) {
1224 -+ /* preempt_count *MUST* be > 1 */
1225 -+ preempt_enable_no_resched();
1226 -+ return;
1227 -+ }
1228 -+ raw_spin_unlock(lock);
1229 -+ }
1230 -+}
1231 -+
1232 -+void raw_spin_rq_unlock(struct rq *rq)
1233 -+{
1234 -+ raw_spin_unlock(rq_lockp(rq));
1235 -+}
1236 -+
1237 -+/*
1238 -+ * RQ-clock updating methods:
1239 -+ */
1240 -+
1241 -+static void update_rq_clock_task(struct rq *rq, s64 delta)
1242 -+{
1243 -+/*
1244 -+ * In theory, the compile should just see 0 here, and optimize out the call
1245 -+ * to sched_rt_avg_update. But I don't trust it...
1246 -+ */
1247 -+ s64 __maybe_unused steal = 0, irq_delta = 0;
1248 -+
1249 -+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
1250 -+ irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
1251 -+
1252 -+ /*
1253 -+ * Since irq_time is only updated on {soft,}irq_exit, we might run into
1254 -+ * this case when a previous update_rq_clock() happened inside a
1255 -+ * {soft,}irq region.
1256 -+ *
1257 -+ * When this happens, we stop ->clock_task and only update the
1258 -+ * prev_irq_time stamp to account for the part that fit, so that a next
1259 -+ * update will consume the rest. This ensures ->clock_task is
1260 -+ * monotonic.
1261 -+ *
1262 -+ * It does however cause some slight miss-attribution of {soft,}irq
1263 -+ * time, a more accurate solution would be to update the irq_time using
1264 -+ * the current rq->clock timestamp, except that would require using
1265 -+ * atomic ops.
1266 -+ */
1267 -+ if (irq_delta > delta)
1268 -+ irq_delta = delta;
1269 -+
1270 -+ rq->prev_irq_time += irq_delta;
1271 -+ delta -= irq_delta;
1272 -+#endif
1273 -+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
1274 -+ if (static_key_false((&paravirt_steal_rq_enabled))) {
1275 -+ steal = paravirt_steal_clock(cpu_of(rq));
1276 -+ steal -= rq->prev_steal_time_rq;
1277 -+
1278 -+ if (unlikely(steal > delta))
1279 -+ steal = delta;
1280 -+
1281 -+ rq->prev_steal_time_rq += steal;
1282 -+ delta -= steal;
1283 -+ }
1284 -+#endif
1285 -+
1286 -+ rq->clock_task += delta;
1287 -+
1288 -+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
1289 -+ if ((irq_delta + steal))
1290 -+ update_irq_load_avg(rq, irq_delta + steal);
1291 -+#endif
1292 -+}
1293 -+
1294 -+static inline void update_rq_clock(struct rq *rq)
1295 -+{
1296 -+ s64 delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
1297 -+
1298 -+ if (unlikely(delta <= 0))
1299 -+ return;
1300 -+ rq->clock += delta;
1301 -+ update_rq_time_edge(rq);
1302 -+ update_rq_clock_task(rq, delta);
1303 -+}
1304 -+
1305 -+/*
1306 -+ * RQ Load update routine
1307 -+ */
1308 -+#define RQ_LOAD_HISTORY_BITS (sizeof(s32) * 8ULL)
1309 -+#define RQ_UTIL_SHIFT (8)
1310 -+#define RQ_LOAD_HISTORY_TO_UTIL(l) (((l) >> (RQ_LOAD_HISTORY_BITS - 1 - RQ_UTIL_SHIFT)) & 0xff)
1311 -+
1312 -+#define LOAD_BLOCK(t) ((t) >> 17)
1313 -+#define LOAD_HALF_BLOCK(t) ((t) >> 16)
1314 -+#define BLOCK_MASK(t) ((t) & ((0x01 << 18) - 1))
1315 -+#define LOAD_BLOCK_BIT(b) (1UL << (RQ_LOAD_HISTORY_BITS - 1 - (b)))
1316 -+#define CURRENT_LOAD_BIT LOAD_BLOCK_BIT(0)
1317 -+
1318 -+static inline void rq_load_update(struct rq *rq)
1319 -+{
1320 -+ u64 time = rq->clock;
1321 -+ u64 delta = min(LOAD_BLOCK(time) - LOAD_BLOCK(rq->load_stamp),
1322 -+ RQ_LOAD_HISTORY_BITS - 1);
1323 -+ u64 prev = !!(rq->load_history & CURRENT_LOAD_BIT);
1324 -+ u64 curr = !!cpu_rq(rq->cpu)->nr_running;
1325 -+
1326 -+ if (delta) {
1327 -+ rq->load_history = rq->load_history >> delta;
1328 -+
1329 -+ if (delta < RQ_UTIL_SHIFT) {
1330 -+ rq->load_block += (~BLOCK_MASK(rq->load_stamp)) * prev;
1331 -+ if (!!LOAD_HALF_BLOCK(rq->load_block) ^ curr)
1332 -+ rq->load_history ^= LOAD_BLOCK_BIT(delta);
1333 -+ }
1334 -+
1335 -+ rq->load_block = BLOCK_MASK(time) * prev;
1336 -+ } else {
1337 -+ rq->load_block += (time - rq->load_stamp) * prev;
1338 -+ }
1339 -+ if (prev ^ curr)
1340 -+ rq->load_history ^= CURRENT_LOAD_BIT;
1341 -+ rq->load_stamp = time;
1342 -+}
1343 -+
1344 -+unsigned long rq_load_util(struct rq *rq, unsigned long max)
1345 -+{
1346 -+ return RQ_LOAD_HISTORY_TO_UTIL(rq->load_history) * (max >> RQ_UTIL_SHIFT);
1347 -+}
1348 -+
1349 -+#ifdef CONFIG_SMP
1350 -+unsigned long sched_cpu_util(int cpu, unsigned long max)
1351 -+{
1352 -+ return rq_load_util(cpu_rq(cpu), max);
1353 -+}
1354 -+#endif /* CONFIG_SMP */
1355 -+
1356 -+#ifdef CONFIG_CPU_FREQ
1357 -+/**
1358 -+ * cpufreq_update_util - Take a note about CPU utilization changes.
1359 -+ * @rq: Runqueue to carry out the update for.
1360 -+ * @flags: Update reason flags.
1361 -+ *
1362 -+ * This function is called by the scheduler on the CPU whose utilization is
1363 -+ * being updated.
1364 -+ *
1365 -+ * It can only be called from RCU-sched read-side critical sections.
1366 -+ *
1367 -+ * The way cpufreq is currently arranged requires it to evaluate the CPU
1368 -+ * performance state (frequency/voltage) on a regular basis to prevent it from
1369 -+ * being stuck in a completely inadequate performance level for too long.
1370 -+ * That is not guaranteed to happen if the updates are only triggered from CFS
1371 -+ * and DL, though, because they may not be coming in if only RT tasks are
1372 -+ * active all the time (or there are RT tasks only).
1373 -+ *
1374 -+ * As a workaround for that issue, this function is called periodically by the
1375 -+ * RT sched class to trigger extra cpufreq updates to prevent it from stalling,
1376 -+ * but that really is a band-aid. Going forward it should be replaced with
1377 -+ * solutions targeted more specifically at RT tasks.
1378 -+ */
1379 -+static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
1380 -+{
1381 -+ struct update_util_data *data;
1382 -+
1383 -+#ifdef CONFIG_SMP
1384 -+ rq_load_update(rq);
1385 -+#endif
1386 -+ data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
1387 -+ cpu_of(rq)));
1388 -+ if (data)
1389 -+ data->func(data, rq_clock(rq), flags);
1390 -+}
1391 -+#else
1392 -+static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
1393 -+{
1394 -+#ifdef CONFIG_SMP
1395 -+ rq_load_update(rq);
1396 -+#endif
1397 -+}
1398 -+#endif /* CONFIG_CPU_FREQ */
1399 -+
1400 -+#ifdef CONFIG_NO_HZ_FULL
1401 -+/*
1402 -+ * Tick may be needed by tasks in the runqueue depending on their policy and
1403 -+ * requirements. If tick is needed, lets send the target an IPI to kick it out
1404 -+ * of nohz mode if necessary.
1405 -+ */
1406 -+static inline void sched_update_tick_dependency(struct rq *rq)
1407 -+{
1408 -+ int cpu = cpu_of(rq);
1409 -+
1410 -+ if (!tick_nohz_full_cpu(cpu))
1411 -+ return;
1412 -+
1413 -+ if (rq->nr_running < 2)
1414 -+ tick_nohz_dep_clear_cpu(cpu, TICK_DEP_BIT_SCHED);
1415 -+ else
1416 -+ tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
1417 -+}
1418 -+#else /* !CONFIG_NO_HZ_FULL */
1419 -+static inline void sched_update_tick_dependency(struct rq *rq) { }
1420 -+#endif
1421 -+
1422 -+bool sched_task_on_rq(struct task_struct *p)
1423 -+{
1424 -+ return task_on_rq_queued(p);
1425 -+}
1426 -+
1427 -+/*
1428 -+ * Add/Remove/Requeue task to/from the runqueue routines
1429 -+ * Context: rq->lock
1430 -+ */
1431 -+#define __SCHED_DEQUEUE_TASK(p, rq, flags, func) \
1432 -+ psi_dequeue(p, flags & DEQUEUE_SLEEP); \
1433 -+ sched_info_dequeue(rq, p); \
1434 -+ \
1435 -+ list_del(&p->sq_node); \
1436 -+ if (list_empty(&rq->queue.heads[p->sq_idx])) { \
1437 -+ clear_bit(sched_idx2prio(p->sq_idx, rq), \
1438 -+ rq->queue.bitmap); \
1439 -+ func; \
1440 -+ }
1441 -+
1442 -+#define __SCHED_ENQUEUE_TASK(p, rq, flags) \
1443 -+ sched_info_enqueue(rq, p); \
1444 -+ psi_enqueue(p, flags); \
1445 -+ \
1446 -+ p->sq_idx = task_sched_prio_idx(p, rq); \
1447 -+ list_add_tail(&p->sq_node, &rq->queue.heads[p->sq_idx]); \
1448 -+ set_bit(sched_idx2prio(p->sq_idx, rq), rq->queue.bitmap);
1449 -+
1450 -+static inline void dequeue_task(struct task_struct *p, struct rq *rq, int flags)
1451 -+{
1452 -+ lockdep_assert_held(&rq->lock);
1453 -+
1454 -+ /*printk(KERN_INFO "sched: dequeue(%d) %px %016llx\n", cpu_of(rq), p, p->priodl);*/
1455 -+ WARN_ONCE(task_rq(p) != rq, "sched: dequeue task reside on cpu%d from cpu%d\n",
1456 -+ task_cpu(p), cpu_of(rq));
1457 -+
1458 -+ __SCHED_DEQUEUE_TASK(p, rq, flags, update_sched_rq_watermark(rq));
1459 -+ --rq->nr_running;
1460 -+#ifdef CONFIG_SMP
1461 -+ if (1 == rq->nr_running)
1462 -+ cpumask_clear_cpu(cpu_of(rq), &sched_rq_pending_mask);
1463 -+#endif
1464 -+
1465 -+ sched_update_tick_dependency(rq);
1466 -+}
1467 -+
1468 -+static inline void enqueue_task(struct task_struct *p, struct rq *rq, int flags)
1469 -+{
1470 -+ lockdep_assert_held(&rq->lock);
1471 -+
1472 -+ /*printk(KERN_INFO "sched: enqueue(%d) %px %016llx\n", cpu_of(rq), p, p->priodl);*/
1473 -+ WARN_ONCE(task_rq(p) != rq, "sched: enqueue task reside on cpu%d to cpu%d\n",
1474 -+ task_cpu(p), cpu_of(rq));
1475 -+
1476 -+ __SCHED_ENQUEUE_TASK(p, rq, flags);
1477 -+ update_sched_rq_watermark(rq);
1478 -+ ++rq->nr_running;
1479 -+#ifdef CONFIG_SMP
1480 -+ if (2 == rq->nr_running)
1481 -+ cpumask_set_cpu(cpu_of(rq), &sched_rq_pending_mask);
1482 -+#endif
1483 -+
1484 -+ sched_update_tick_dependency(rq);
1485 -+}
1486 -+
1487 -+static inline void requeue_task(struct task_struct *p, struct rq *rq)
1488 -+{
1489 -+ int idx;
1490 -+
1491 -+ lockdep_assert_held(&rq->lock);
1492 -+ /*printk(KERN_INFO "sched: requeue(%d) %px %016llx\n", cpu_of(rq), p, p->priodl);*/
1493 -+ WARN_ONCE(task_rq(p) != rq, "sched: cpu[%d] requeue task reside on cpu%d\n",
1494 -+ cpu_of(rq), task_cpu(p));
1495 -+
1496 -+ idx = task_sched_prio_idx(p, rq);
1497 -+
1498 -+ list_del(&p->sq_node);
1499 -+ list_add_tail(&p->sq_node, &rq->queue.heads[idx]);
1500 -+ if (idx != p->sq_idx) {
1501 -+ if (list_empty(&rq->queue.heads[p->sq_idx]))
1502 -+ clear_bit(sched_idx2prio(p->sq_idx, rq),
1503 -+ rq->queue.bitmap);
1504 -+ p->sq_idx = idx;
1505 -+ set_bit(sched_idx2prio(p->sq_idx, rq), rq->queue.bitmap);
1506 -+ update_sched_rq_watermark(rq);
1507 -+ }
1508 -+}
1509 -+
1510 -+/*
1511 -+ * cmpxchg based fetch_or, macro so it works for different integer types
1512 -+ */
1513 -+#define fetch_or(ptr, mask) \
1514 -+ ({ \
1515 -+ typeof(ptr) _ptr = (ptr); \
1516 -+ typeof(mask) _mask = (mask); \
1517 -+ typeof(*_ptr) _old, _val = *_ptr; \
1518 -+ \
1519 -+ for (;;) { \
1520 -+ _old = cmpxchg(_ptr, _val, _val | _mask); \
1521 -+ if (_old == _val) \
1522 -+ break; \
1523 -+ _val = _old; \
1524 -+ } \
1525 -+ _old; \
1526 -+})
1527 -+
1528 -+#if defined(CONFIG_SMP) && defined(TIF_POLLING_NRFLAG)
1529 -+/*
1530 -+ * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
1531 -+ * this avoids any races wrt polling state changes and thereby avoids
1532 -+ * spurious IPIs.
1533 -+ */
1534 -+static bool set_nr_and_not_polling(struct task_struct *p)
1535 -+{
1536 -+ struct thread_info *ti = task_thread_info(p);
1537 -+ return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
1538 -+}
1539 -+
1540 -+/*
1541 -+ * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
1542 -+ *
1543 -+ * If this returns true, then the idle task promises to call
1544 -+ * sched_ttwu_pending() and reschedule soon.
1545 -+ */
1546 -+static bool set_nr_if_polling(struct task_struct *p)
1547 -+{
1548 -+ struct thread_info *ti = task_thread_info(p);
1549 -+ typeof(ti->flags) old, val = READ_ONCE(ti->flags);
1550 -+
1551 -+ for (;;) {
1552 -+ if (!(val & _TIF_POLLING_NRFLAG))
1553 -+ return false;
1554 -+ if (val & _TIF_NEED_RESCHED)
1555 -+ return true;
1556 -+ old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
1557 -+ if (old == val)
1558 -+ break;
1559 -+ val = old;
1560 -+ }
1561 -+ return true;
1562 -+}
1563 -+
1564 -+#else
1565 -+static bool set_nr_and_not_polling(struct task_struct *p)
1566 -+{
1567 -+ set_tsk_need_resched(p);
1568 -+ return true;
1569 -+}
1570 -+
1571 -+#ifdef CONFIG_SMP
1572 -+static bool set_nr_if_polling(struct task_struct *p)
1573 -+{
1574 -+ return false;
1575 -+}
1576 -+#endif
1577 -+#endif
1578 -+
1579 -+static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
1580 -+{
1581 -+ struct wake_q_node *node = &task->wake_q;
1582 -+
1583 -+ /*
1584 -+ * Atomically grab the task, if ->wake_q is !nil already it means
1585 -+ * it's already queued (either by us or someone else) and will get the
1586 -+ * wakeup due to that.
1587 -+ *
1588 -+ * In order to ensure that a pending wakeup will observe our pending
1589 -+ * state, even in the failed case, an explicit smp_mb() must be used.
1590 -+ */
1591 -+ smp_mb__before_atomic();
1592 -+ if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
1593 -+ return false;
1594 -+
1595 -+ /*
1596 -+ * The head is context local, there can be no concurrency.
1597 -+ */
1598 -+ *head->lastp = node;
1599 -+ head->lastp = &node->next;
1600 -+ return true;
1601 -+}
1602 -+
1603 -+/**
1604 -+ * wake_q_add() - queue a wakeup for 'later' waking.
1605 -+ * @head: the wake_q_head to add @task to
1606 -+ * @task: the task to queue for 'later' wakeup
1607 -+ *
1608 -+ * Queue a task for later wakeup, most likely by the wake_up_q() call in the
1609 -+ * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
1610 -+ * instantly.
1611 -+ *
1612 -+ * This function must be used as-if it were wake_up_process(); IOW the task
1613 -+ * must be ready to be woken at this location.
1614 -+ */
1615 -+void wake_q_add(struct wake_q_head *head, struct task_struct *task)
1616 -+{
1617 -+ if (__wake_q_add(head, task))
1618 -+ get_task_struct(task);
1619 -+}
1620 -+
1621 -+/**
1622 -+ * wake_q_add_safe() - safely queue a wakeup for 'later' waking.
1623 -+ * @head: the wake_q_head to add @task to
1624 -+ * @task: the task to queue for 'later' wakeup
1625 -+ *
1626 -+ * Queue a task for later wakeup, most likely by the wake_up_q() call in the
1627 -+ * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
1628 -+ * instantly.
1629 -+ *
1630 -+ * This function must be used as-if it were wake_up_process(); IOW the task
1631 -+ * must be ready to be woken at this location.
1632 -+ *
1633 -+ * This function is essentially a task-safe equivalent to wake_q_add(). Callers
1634 -+ * that already hold reference to @task can call the 'safe' version and trust
1635 -+ * wake_q to do the right thing depending whether or not the @task is already
1636 -+ * queued for wakeup.
1637 -+ */
1638 -+void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task)
1639 -+{
1640 -+ if (!__wake_q_add(head, task))
1641 -+ put_task_struct(task);
1642 -+}
1643 -+
1644 -+void wake_up_q(struct wake_q_head *head)
1645 -+{
1646 -+ struct wake_q_node *node = head->first;
1647 -+
1648 -+ while (node != WAKE_Q_TAIL) {
1649 -+ struct task_struct *task;
1650 -+
1651 -+ task = container_of(node, struct task_struct, wake_q);
1652 -+ /* task can safely be re-inserted now: */
1653 -+ node = node->next;
1654 -+ task->wake_q.next = NULL;
1655 -+
1656 -+ /*
1657 -+ * wake_up_process() executes a full barrier, which pairs with
1658 -+ * the queueing in wake_q_add() so as not to miss wakeups.
1659 -+ */
1660 -+ wake_up_process(task);
1661 -+ put_task_struct(task);
1662 -+ }
1663 -+}
1664 -+
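wake_q_add()/wake_up_q() above implement deferred wakeups: tasks are chained onto a context-local list while locks are held, and the real wake_up_process() calls happen later, after the locks are dropped. A rough userspace analogue of that pattern is sketched below; the structure and function names are invented for illustration, and the task reference counting of the real code is left out.

#include <stddef.h>
#include <stdio.h>

#define WAKE_Q_TAIL ((struct wake_node *)0x1)   /* sentinel: "queued, end of list" */

struct wake_node {
        struct wake_node *next;
        const char *name;               /* stands in for the task */
};

struct wake_head {
        struct wake_node *first;
        struct wake_node **lastp;
};

static void wake_head_init(struct wake_head *h)
{
        h->first = WAKE_Q_TAIL;
        h->lastp = &h->first;
}

/* Claim the node with a CAS on ->next; if it is already non-NULL the
 * entry is queued somewhere and must not be added twice. */
static int wake_head_add(struct wake_head *h, struct wake_node *n)
{
        struct wake_node *expected = NULL;

        if (!__atomic_compare_exchange_n(&n->next, &expected, WAKE_Q_TAIL, 0,
                                         __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
                return 0;       /* already queued */
        *h->lastp = n;          /* the head is context-local: no concurrency here */
        h->lastp = &n->next;
        return 1;
}

/* Later, outside any locks, walk the list and "wake" each entry. */
static void wake_head_run(struct wake_head *h)
{
        struct wake_node *n = h->first;

        while (n != WAKE_Q_TAIL) {
                struct wake_node *next = n->next;

                n->next = NULL;                 /* node may be reused now */
                printf("waking %s\n", n->name); /* stands in for wake_up_process() */
                n = next;
        }
}

int main(void)
{
        struct wake_node a = { NULL, "task-a" }, b = { NULL, "task-b" };
        struct wake_head h;

        wake_head_init(&h);
        wake_head_add(&h, &a);
        wake_head_add(&h, &b);
        wake_head_add(&h, &a);  /* duplicate add is rejected */
        wake_head_run(&h);
        return 0;
}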
1665 -+/*
1666 -+ * resched_curr - mark rq's current task 'to be rescheduled now'.
1667 -+ *
1668 -+ * On UP this means the setting of the need_resched flag, on SMP it
1669 -+ * might also involve a cross-CPU call to trigger the scheduler on
1670 -+ * the target CPU.
1671 -+ */
1672 -+void resched_curr(struct rq *rq)
1673 -+{
1674 -+ struct task_struct *curr = rq->curr;
1675 -+ int cpu;
1676 -+
1677 -+ lockdep_assert_held(&rq->lock);
1678 -+
1679 -+ if (test_tsk_need_resched(curr))
1680 -+ return;
1681 -+
1682 -+ cpu = cpu_of(rq);
1683 -+ if (cpu == smp_processor_id()) {
1684 -+ set_tsk_need_resched(curr);
1685 -+ set_preempt_need_resched();
1686 -+ return;
1687 -+ }
1688 -+
1689 -+ if (set_nr_and_not_polling(curr))
1690 -+ smp_send_reschedule(cpu);
1691 -+ else
1692 -+ trace_sched_wake_idle_without_ipi(cpu);
1693 -+}
1694 -+
1695 -+void resched_cpu(int cpu)
1696 -+{
1697 -+ struct rq *rq = cpu_rq(cpu);
1698 -+ unsigned long flags;
1699 -+
1700 -+ raw_spin_lock_irqsave(&rq->lock, flags);
1701 -+ if (cpu_online(cpu) || cpu == smp_processor_id())
1702 -+ resched_curr(cpu_rq(cpu));
1703 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
1704 -+}
1705 -+
1706 -+#ifdef CONFIG_SMP
1707 -+#ifdef CONFIG_NO_HZ_COMMON
1708 -+void nohz_balance_enter_idle(int cpu) {}
1709 -+
1710 -+void select_nohz_load_balancer(int stop_tick) {}
1711 -+
1712 -+void set_cpu_sd_state_idle(void) {}
1713 -+
1714 -+/*
1715 -+ * In the semi idle case, use the nearest busy CPU for migrating timers
1716 -+ * from an idle CPU. This is good for power-savings.
1717 -+ *
1718 -+ * We don't do a similar optimization for a completely idle system, as
1719 -+ * selecting an idle CPU will add more delays to the timers than intended
1720 -+ * (as that CPU's timer base may not be up to date wrt jiffies etc).
1721 -+ */
1722 -+int get_nohz_timer_target(void)
1723 -+{
1724 -+ int i, cpu = smp_processor_id(), default_cpu = -1;
1725 -+ struct cpumask *mask;
1726 -+
1727 -+ if (housekeeping_cpu(cpu, HK_FLAG_TIMER)) {
1728 -+ if (!idle_cpu(cpu))
1729 -+ return cpu;
1730 -+ default_cpu = cpu;
1731 -+ }
1732 -+
1733 -+ for (mask = per_cpu(sched_cpu_topo_masks, cpu) + 1;
1734 -+ mask < per_cpu(sched_cpu_topo_end_mask, cpu); mask++)
1735 -+ for_each_cpu_and(i, mask, housekeeping_cpumask(HK_FLAG_TIMER))
1736 -+ if (!idle_cpu(i))
1737 -+ return i;
1738 -+
1739 -+ if (default_cpu == -1)
1740 -+ default_cpu = housekeeping_any_cpu(HK_FLAG_TIMER);
1741 -+ cpu = default_cpu;
1742 -+
1743 -+ return cpu;
1744 -+}
1745 -+
1746 -+/*
1747 -+ * When add_timer_on() enqueues a timer into the timer wheel of an
1748 -+ * idle CPU then this timer might expire before the next timer event
1749 -+ * which is scheduled to wake up that CPU. In case of a completely
1750 -+ * idle system the next event might even be infinite time into the
1751 -+ * future. wake_up_idle_cpu() ensures that the CPU is woken up and
1752 -+ * leaves the inner idle loop so the newly added timer is taken into
1753 -+ * account when the CPU goes back to idle and evaluates the timer
1754 -+ * wheel for the next timer event.
1755 -+ */
1756 -+static inline void wake_up_idle_cpu(int cpu)
1757 -+{
1758 -+ struct rq *rq = cpu_rq(cpu);
1759 -+
1760 -+ if (cpu == smp_processor_id())
1761 -+ return;
1762 -+
1763 -+ if (set_nr_and_not_polling(rq->idle))
1764 -+ smp_send_reschedule(cpu);
1765 -+ else
1766 -+ trace_sched_wake_idle_without_ipi(cpu);
1767 -+}
1768 -+
1769 -+static inline bool wake_up_full_nohz_cpu(int cpu)
1770 -+{
1771 -+ /*
1772 -+ * We just need the target to call irq_exit() and re-evaluate
1773 -+ * the next tick. The nohz full kick at least implies that.
1774 -+ * If needed we can still optimize that later with an
1775 -+ * empty IRQ.
1776 -+ */
1777 -+ if (cpu_is_offline(cpu))
1778 -+ return true; /* Don't try to wake offline CPUs. */
1779 -+ if (tick_nohz_full_cpu(cpu)) {
1780 -+ if (cpu != smp_processor_id() ||
1781 -+ tick_nohz_tick_stopped())
1782 -+ tick_nohz_full_kick_cpu(cpu);
1783 -+ return true;
1784 -+ }
1785 -+
1786 -+ return false;
1787 -+}
1788 -+
1789 -+void wake_up_nohz_cpu(int cpu)
1790 -+{
1791 -+ if (!wake_up_full_nohz_cpu(cpu))
1792 -+ wake_up_idle_cpu(cpu);
1793 -+}
1794 -+
1795 -+static void nohz_csd_func(void *info)
1796 -+{
1797 -+ struct rq *rq = info;
1798 -+ int cpu = cpu_of(rq);
1799 -+ unsigned int flags;
1800 -+
1801 -+ /*
1802 -+ * Release the rq::nohz_csd.
1803 -+ */
1804 -+ flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
1805 -+ WARN_ON(!(flags & NOHZ_KICK_MASK));
1806 -+
1807 -+ rq->idle_balance = idle_cpu(cpu);
1808 -+ if (rq->idle_balance && !need_resched()) {
1809 -+ rq->nohz_idle_balance = flags;
1810 -+ raise_softirq_irqoff(SCHED_SOFTIRQ);
1811 -+ }
1812 -+}
1813 -+
1814 -+#endif /* CONFIG_NO_HZ_COMMON */
1815 -+#endif /* CONFIG_SMP */
1816 -+
1817 -+static inline void check_preempt_curr(struct rq *rq)
1818 -+{
1819 -+ if (sched_rq_first_task(rq) != rq->curr)
1820 -+ resched_curr(rq);
1821 -+}
1822 -+
1823 -+#ifdef CONFIG_SCHED_HRTICK
1824 -+/*
1825 -+ * Use HR-timers to deliver accurate preemption points.
1826 -+ */
1827 -+
1828 -+static void hrtick_clear(struct rq *rq)
1829 -+{
1830 -+ if (hrtimer_active(&rq->hrtick_timer))
1831 -+ hrtimer_cancel(&rq->hrtick_timer);
1832 -+}
1833 -+
1834 -+/*
1835 -+ * High-resolution timer tick.
1836 -+ * Runs from hardirq context with interrupts disabled.
1837 -+ */
1838 -+static enum hrtimer_restart hrtick(struct hrtimer *timer)
1839 -+{
1840 -+ struct rq *rq = container_of(timer, struct rq, hrtick_timer);
1841 -+
1842 -+ WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());
1843 -+
1844 -+ raw_spin_lock(&rq->lock);
1845 -+ resched_curr(rq);
1846 -+ raw_spin_unlock(&rq->lock);
1847 -+
1848 -+ return HRTIMER_NORESTART;
1849 -+}
1850 -+
1851 -+/*
1852 -+ * Use hrtick when:
1853 -+ * - enabled by features
1854 -+ * - hrtimer is actually high res
1855 -+ */
1856 -+static inline int hrtick_enabled(struct rq *rq)
1857 -+{
1858 -+ /**
1859 -+ * Alt schedule FW doesn't support sched_feat yet
1860 -+ if (!sched_feat(HRTICK))
1861 -+ return 0;
1862 -+ */
1863 -+ if (!cpu_active(cpu_of(rq)))
1864 -+ return 0;
1865 -+ return hrtimer_is_hres_active(&rq->hrtick_timer);
1866 -+}
1867 -+
1868 -+#ifdef CONFIG_SMP
1869 -+
1870 -+static void __hrtick_restart(struct rq *rq)
1871 -+{
1872 -+ struct hrtimer *timer = &rq->hrtick_timer;
1873 -+ ktime_t time = rq->hrtick_time;
1874 -+
1875 -+ hrtimer_start(timer, time, HRTIMER_MODE_ABS_PINNED_HARD);
1876 -+}
1877 -+
1878 -+/*
1879 -+ * called from hardirq (IPI) context
1880 -+ */
1881 -+static void __hrtick_start(void *arg)
1882 -+{
1883 -+ struct rq *rq = arg;
1884 -+
1885 -+ raw_spin_lock(&rq->lock);
1886 -+ __hrtick_restart(rq);
1887 -+ raw_spin_unlock(&rq->lock);
1888 -+}
1889 -+
1890 -+/*
1891 -+ * Called to set the hrtick timer state.
1892 -+ *
1893 -+ * called with rq->lock held and irqs disabled
1894 -+ */
1895 -+void hrtick_start(struct rq *rq, u64 delay)
1896 -+{
1897 -+ struct hrtimer *timer = &rq->hrtick_timer;
1898 -+ s64 delta;
1899 -+
1900 -+ /*
1901 -+ * Don't schedule slices shorter than 10000ns, that just
1902 -+ * doesn't make sense and can cause timer DoS.
1903 -+ */
1904 -+ delta = max_t(s64, delay, 10000LL);
1905 -+
1906 -+ rq->hrtick_time = ktime_add_ns(timer->base->get_time(), delta);
1907 -+
1908 -+ if (rq == this_rq())
1909 -+ __hrtick_restart(rq);
1910 -+ else
1911 -+ smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd);
1912 -+}
1913 -+
1914 -+#else
1915 -+/*
1916 -+ * Called to set the hrtick timer state.
1917 -+ *
1918 -+ * called with rq->lock held and irqs disabled
1919 -+ */
1920 -+void hrtick_start(struct rq *rq, u64 delay)
1921 -+{
1922 -+ /*
1923 -+ * Don't schedule slices shorter than 10000ns, that just
1924 -+ * doesn't make sense. Rely on vruntime for fairness.
1925 -+ */
1926 -+ delay = max_t(u64, delay, 10000LL);
1927 -+ hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay),
1928 -+ HRTIMER_MODE_REL_PINNED_HARD);
1929 -+}
1930 -+#endif /* CONFIG_SMP */
1931 -+
1932 -+static void hrtick_rq_init(struct rq *rq)
1933 -+{
1934 -+#ifdef CONFIG_SMP
1935 -+ INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq);
1936 -+#endif
1937 -+
1938 -+ hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
1939 -+ rq->hrtick_timer.function = hrtick;
1940 -+}
1941 -+#else /* CONFIG_SCHED_HRTICK */
1942 -+static inline int hrtick_enabled(struct rq *rq)
1943 -+{
1944 -+ return 0;
1945 -+}
1946 -+
1947 -+static inline void hrtick_clear(struct rq *rq)
1948 -+{
1949 -+}
1950 -+
1951 -+static inline void hrtick_rq_init(struct rq *rq)
1952 -+{
1953 -+}
1954 -+#endif /* CONFIG_SCHED_HRTICK */
1955 -+
1956 -+static inline int __normal_prio(int policy, int rt_prio, int static_prio)
1957 -+{
1958 -+ return rt_policy(policy) ? (MAX_RT_PRIO - 1 - rt_prio) :
1959 -+ static_prio + MAX_PRIORITY_ADJ;
1960 -+}
1961 -+
1962 -+/*
1963 -+ * Calculate the expected normal priority: i.e. priority
1964 -+ * without taking RT-inheritance into account. Might be
1965 -+ * boosted by interactivity modifiers. Changes upon fork,
1966 -+ * setprio syscalls, and whenever the interactivity
1967 -+ * estimator recalculates.
1968 -+ */
1969 -+static inline int normal_prio(struct task_struct *p)
1970 -+{
1971 -+ return __normal_prio(p->policy, p->rt_priority, p->static_prio);
1972 -+}
1973 -+
1974 -+/*
1975 -+ * Calculate the current priority, i.e. the priority
1976 -+ * taken into account by the scheduler. This value might
1977 -+ * be boosted by RT tasks as it will be RT if the task got
1978 -+ * RT-boosted. If not then it returns p->normal_prio.
1979 -+ */
1980 -+static int effective_prio(struct task_struct *p)
1981 -+{
1982 -+ p->normal_prio = normal_prio(p);
1983 -+ /*
1984 -+ * If we are RT tasks or we were boosted to RT priority,
1985 -+ * keep the priority unchanged. Otherwise, update priority
1986 -+ * to the normal priority:
1987 -+ */
1988 -+ if (!rt_prio(p->prio))
1989 -+ return p->normal_prio;
1990 -+ return p->prio;
1991 -+}
1992 -+
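__normal_prio() above collapses policy and priority into one number: real-time tasks map to MAX_RT_PRIO - 1 - rt_priority, everything else to static_prio + MAX_PRIORITY_ADJ, and effective_prio() then keeps an RT-boosted value if one is in place. The standalone sketch below shows just that arithmetic; the constant values are assumptions picked to make the numbers concrete, not necessarily those used by the patch.

#include <stdio.h>

/* Illustrative constants only; MAX_RT_PRIO matches mainline (100),
 * MAX_PRIORITY_ADJ is a placeholder for the BMQ boost range. */
#define MAX_RT_PRIO             100
#define MAX_PRIORITY_ADJ        4
#define NICE_TO_PRIO(nice)      ((nice) + 120)  /* static_prio for a nice value */

enum policy { POLICY_NORMAL, POLICY_FIFO, POLICY_RR };

static int is_rt_policy(enum policy p)
{
        return p == POLICY_FIFO || p == POLICY_RR;
}

/* Mirror of __normal_prio(): RT tasks land below MAX_RT_PRIO (a lower value
 * means a higher priority), normal tasks start from their static priority. */
static int normal_prio(enum policy policy, int rt_prio, int static_prio)
{
        return is_rt_policy(policy) ? (MAX_RT_PRIO - 1 - rt_prio)
                                    : static_prio + MAX_PRIORITY_ADJ;
}

int main(void)
{
        printf("SCHED_FIFO rtprio 50 -> %d\n",
               normal_prio(POLICY_FIFO, 50, 0));                /* 49 */
        printf("SCHED_NORMAL nice 0  -> %d\n",
               normal_prio(POLICY_NORMAL, 0, NICE_TO_PRIO(0))); /* 124 */
        return 0;
}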
1993 -+/*
1994 -+ * activate_task - move a task to the runqueue.
1995 -+ *
1996 -+ * Context: rq->lock
1997 -+ */
1998 -+static void activate_task(struct task_struct *p, struct rq *rq)
1999 -+{
2000 -+ enqueue_task(p, rq, ENQUEUE_WAKEUP);
2001 -+ p->on_rq = TASK_ON_RQ_QUEUED;
2002 -+
2003 -+ /*
2004 -+ * If in_iowait is set, the code below may not trigger any cpufreq
2005 -+ * utilization updates, so do it here explicitly with the IOWAIT flag
2006 -+ * passed.
2007 -+ */
2008 -+ cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT * p->in_iowait);
2009 -+}
2010 -+
2011 -+/*
2012 -+ * deactivate_task - remove a task from the runqueue.
2013 -+ *
2014 -+ * Context: rq->lock
2015 -+ */
2016 -+static inline void deactivate_task(struct task_struct *p, struct rq *rq)
2017 -+{
2018 -+ dequeue_task(p, rq, DEQUEUE_SLEEP);
2019 -+ p->on_rq = 0;
2020 -+ cpufreq_update_util(rq, 0);
2021 -+}
2022 -+
2023 -+static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
2024 -+{
2025 -+#ifdef CONFIG_SMP
2026 -+ /*
2027 -+ * After ->cpu is set up to a new value, task_access_lock(p, ...) can be
2028 -+ * successfully executed on another CPU. We must ensure that updates of
2029 -+ * per-task data have been completed by this moment.
2030 -+ */
2031 -+ smp_wmb();
2032 -+
2033 -+#ifdef CONFIG_THREAD_INFO_IN_TASK
2034 -+ WRITE_ONCE(p->cpu, cpu);
2035 -+#else
2036 -+ WRITE_ONCE(task_thread_info(p)->cpu, cpu);
2037 -+#endif
2038 -+#endif
2039 -+}
2040 -+
2041 -+static inline bool is_migration_disabled(struct task_struct *p)
2042 -+{
2043 -+#ifdef CONFIG_SMP
2044 -+ return p->migration_disabled;
2045 -+#else
2046 -+ return false;
2047 -+#endif
2048 -+}
2049 -+
2050 -+#define SCA_CHECK 0x01
2051 -+
2052 -+#ifdef CONFIG_SMP
2053 -+
2054 -+void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
2055 -+{
2056 -+#ifdef CONFIG_SCHED_DEBUG
2057 -+ unsigned int state = READ_ONCE(p->__state);
2058 -+
2059 -+ /*
2060 -+ * We should never call set_task_cpu() on a blocked task,
2061 -+ * ttwu() will sort out the placement.
2062 -+ */
2063 -+ WARN_ON_ONCE(state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq);
2064 -+
2065 -+#ifdef CONFIG_LOCKDEP
2066 -+ /*
2067 -+ * The caller should hold either p->pi_lock or rq->lock, when changing
2068 -+ * a task's CPU. ->pi_lock for waking tasks, rq->lock for runnable tasks.
2069 -+ *
2070 -+ * sched_move_task() holds both and thus holding either pins the cgroup,
2071 -+ * see task_group().
2072 -+ */
2073 -+ WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
2074 -+ lockdep_is_held(&task_rq(p)->lock)));
2075 -+#endif
2076 -+ /*
2077 -+ * Clearly, migrating tasks to offline CPUs is a fairly daft thing.
2078 -+ */
2079 -+ WARN_ON_ONCE(!cpu_online(new_cpu));
2080 -+
2081 -+ WARN_ON_ONCE(is_migration_disabled(p));
2082 -+#endif
2083 -+ if (task_cpu(p) == new_cpu)
2084 -+ return;
2085 -+ trace_sched_migrate_task(p, new_cpu);
2086 -+ rseq_migrate(p);
2087 -+ perf_event_task_migrate(p);
2088 -+
2089 -+ __set_task_cpu(p, new_cpu);
2090 -+}
2091 -+
2092 -+#define MDF_FORCE_ENABLED 0x80
2093 -+
2094 -+static void
2095 -+__do_set_cpus_ptr(struct task_struct *p, const struct cpumask *new_mask)
2096 -+{
2097 -+ /*
2098 -+ * This here violates the locking rules for affinity, since we're only
2099 -+ * supposed to change these variables while holding both rq->lock and
2100 -+ * p->pi_lock.
2101 -+ *
2102 -+ * HOWEVER, it magically works, because ttwu() is the only code that
2103 -+ * accesses these variables under p->pi_lock and only does so after
2104 -+ * smp_cond_load_acquire(&p->on_cpu, !VAL), and we're in __schedule()
2105 -+ * before finish_task().
2106 -+ *
2107 -+ * XXX do further audits, this smells like something putrid.
2108 -+ */
2109 -+ SCHED_WARN_ON(!p->on_cpu);
2110 -+ p->cpus_ptr = new_mask;
2111 -+}
2112 -+
2113 -+void migrate_disable(void)
2114 -+{
2115 -+ struct task_struct *p = current;
2116 -+ int cpu;
2117 -+
2118 -+ if (p->migration_disabled) {
2119 -+ p->migration_disabled++;
2120 -+ return;
2121 -+ }
2122 -+
2123 -+ preempt_disable();
2124 -+ cpu = smp_processor_id();
2125 -+ if (cpumask_test_cpu(cpu, &p->cpus_mask)) {
2126 -+ cpu_rq(cpu)->nr_pinned++;
2127 -+ p->migration_disabled = 1;
2128 -+ p->migration_flags &= ~MDF_FORCE_ENABLED;
2129 -+
2130 -+ /*
2131 -+ * Violates locking rules! see comment in __do_set_cpus_ptr().
2132 -+ */
2133 -+ if (p->cpus_ptr == &p->cpus_mask)
2134 -+ __do_set_cpus_ptr(p, cpumask_of(cpu));
2135 -+ }
2136 -+ preempt_enable();
2137 -+}
2138 -+EXPORT_SYMBOL_GPL(migrate_disable);
2139 -+
2140 -+void migrate_enable(void)
2141 -+{
2142 -+ struct task_struct *p = current;
2143 -+
2144 -+ if (0 == p->migration_disabled)
2145 -+ return;
2146 -+
2147 -+ if (p->migration_disabled > 1) {
2148 -+ p->migration_disabled--;
2149 -+ return;
2150 -+ }
2151 -+
2152 -+ /*
2153 -+ * Ensure stop_task runs either before or after this, and that
2154 -+ * __set_cpus_allowed_ptr(SCA_MIGRATE_ENABLE) doesn't schedule().
2155 -+ */
2156 -+ preempt_disable();
2157 -+ /*
2158 -+ * Assumption: current should be running on allowed cpu
2159 -+ */
2160 -+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &p->cpus_mask));
2161 -+ if (p->cpus_ptr != &p->cpus_mask)
2162 -+ __do_set_cpus_ptr(p, &p->cpus_mask);
2163 -+ /*
2164 -+ * Mustn't clear migration_disabled() until cpus_ptr points back at the
2165 -+ * regular cpus_mask, otherwise things that race (eg.
2166 -+ * select_fallback_rq) get confused.
2167 -+ */
2168 -+ barrier();
2169 -+ p->migration_disabled = 0;
2170 -+ this_rq()->nr_pinned--;
2171 -+ preempt_enable();
2172 -+}
2173 -+EXPORT_SYMBOL_GPL(migrate_enable);
2174 -+
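migrate_disable()/migrate_enable() above behave like a nesting counter: only the first disable pins the task to the current CPU (bumping rq->nr_pinned and pointing cpus_ptr at a one-CPU mask), and only the matching outermost enable releases the pin. The toy model below illustrates just that bookkeeping, with invented names and no locking:

#include <assert.h>
#include <stdio.h>

/* Toy model of the nesting rules only; names and fields are illustrative. */
struct toy_task {
        int migration_disabled; /* nesting depth */
        int pinned_cpu;         /* -1 = free to migrate */
};

static int toy_nr_pinned;       /* stands in for rq->nr_pinned */

static void toy_migrate_disable(struct toy_task *t, int this_cpu)
{
        if (t->migration_disabled++)
                return;                 /* already pinned, just nest deeper */
        toy_nr_pinned++;
        t->pinned_cpu = this_cpu;       /* cpus_ptr := cpumask_of(this_cpu) */
}

static void toy_migrate_enable(struct toy_task *t)
{
        assert(t->migration_disabled > 0);
        if (--t->migration_disabled)
                return;                 /* inner enable: still pinned */
        t->pinned_cpu = -1;             /* cpus_ptr := &cpus_mask again */
        toy_nr_pinned--;
}

int main(void)
{
        struct toy_task t = { 0, -1 };

        toy_migrate_disable(&t, 2);
        toy_migrate_disable(&t, 2);     /* nested: no extra pinning */
        toy_migrate_enable(&t);         /* still pinned here */
        printf("pinned_cpu=%d nr_pinned=%d\n", t.pinned_cpu, toy_nr_pinned);
        toy_migrate_enable(&t);         /* outermost enable releases the pin */
        printf("pinned_cpu=%d nr_pinned=%d\n", t.pinned_cpu, toy_nr_pinned);
        return 0;
}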
2175 -+static inline bool rq_has_pinned_tasks(struct rq *rq)
2176 -+{
2177 -+ return rq->nr_pinned;
2178 -+}
2179 -+
2180 -+/*
2181 -+ * Per-CPU kthreads are allowed to run on !active && online CPUs, see
2182 -+ * __set_cpus_allowed_ptr() and select_fallback_rq().
2183 -+ */
2184 -+static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
2185 -+{
2186 -+ /* When not in the task's cpumask, no point in looking further. */
2187 -+ if (!cpumask_test_cpu(cpu, p->cpus_ptr))
2188 -+ return false;
2189 -+
2190 -+ /* migrate_disabled() must be allowed to finish. */
2191 -+ if (is_migration_disabled(p))
2192 -+ return cpu_online(cpu);
2193 -+
2194 -+ /* Non kernel threads are not allowed during either online or offline. */
2195 -+ if (!(p->flags & PF_KTHREAD))
2196 -+ return cpu_active(cpu);
2197 -+
2198 -+ /* KTHREAD_IS_PER_CPU is always allowed. */
2199 -+ if (kthread_is_per_cpu(p))
2200 -+ return cpu_online(cpu);
2201 -+
2202 -+ /* Regular kernel threads don't get to stay during offline. */
2203 -+ if (cpu_dying(cpu))
2204 -+ return false;
2205 -+
2206 -+ /* But are allowed during online. */
2207 -+ return cpu_online(cpu);
2208 -+}
2209 -+
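is_cpu_allowed() above is a short decision ladder: the affinity mask is checked first, then the answer depends on whether the task is migration-disabled, a user task, a per-CPU kthread or a regular kthread, and on whether the CPU is online, active or dying. A pure-function restatement of that precedence (illustrative fields only, not taken from the patch) follows:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative inputs; in the kernel these come from the task and cpumasks. */
struct cpu_state { bool online, active, dying; };
struct task_state {
        bool in_affinity_mask;  /* cpumask_test_cpu(cpu, p->cpus_ptr) */
        bool migration_disabled;
        bool is_kthread;        /* p->flags & PF_KTHREAD */
        bool is_per_cpu_kthread;
};

static bool allowed_on(const struct task_state *t, const struct cpu_state *c)
{
        if (!t->in_affinity_mask)
                return false;           /* not in the task's cpumask */
        if (t->migration_disabled)
                return c->online;       /* must be allowed to finish */
        if (!t->is_kthread)
                return c->active;       /* user tasks need an active CPU */
        if (t->is_per_cpu_kthread)
                return c->online;       /* always allowed while online */
        if (c->dying)
                return false;           /* regular kthreads leave early */
        return c->online;
}

int main(void)
{
        struct cpu_state hotplug_out = { .online = true, .active = false, .dying = true };
        struct task_state user = { true, false, false, false };
        struct task_state percpu_kthread = { true, false, true, true };

        printf("user task on dying CPU:       %d\n", allowed_on(&user, &hotplug_out));
        printf("per-CPU kthread on dying CPU: %d\n", allowed_on(&percpu_kthread, &hotplug_out));
        return 0;
}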
2210 -+/*
2211 -+ * This is how migration works:
2212 -+ *
2213 -+ * 1) we invoke migration_cpu_stop() on the target CPU using
2214 -+ * stop_one_cpu().
2215 -+ * 2) stopper starts to run (implicitly forcing the migrated thread
2216 -+ * off the CPU)
2217 -+ * 3) it checks whether the migrated task is still in the wrong runqueue.
2218 -+ * 4) if it's in the wrong runqueue then the migration thread removes
2219 -+ * it and puts it into the right queue.
2220 -+ * 5) stopper completes and stop_one_cpu() returns and the migration
2221 -+ * is done.
2222 -+ */
2223 -+
2224 -+/*
2225 -+ * move_queued_task - move a queued task to new rq.
2226 -+ *
2227 -+ * Returns (locked) new rq. Old rq's lock is released.
2228 -+ */
2229 -+static struct rq *move_queued_task(struct rq *rq, struct task_struct *p, int
2230 -+ new_cpu)
2231 -+{
2232 -+ lockdep_assert_held(&rq->lock);
2233 -+
2234 -+ WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
2235 -+ dequeue_task(p, rq, 0);
2236 -+ set_task_cpu(p, new_cpu);
2237 -+ raw_spin_unlock(&rq->lock);
2238 -+
2239 -+ rq = cpu_rq(new_cpu);
2240 -+
2241 -+ raw_spin_lock(&rq->lock);
2242 -+ BUG_ON(task_cpu(p) != new_cpu);
2243 -+ sched_task_sanity_check(p, rq);
2244 -+ enqueue_task(p, rq, 0);
2245 -+ p->on_rq = TASK_ON_RQ_QUEUED;
2246 -+ check_preempt_curr(rq);
2247 -+
2248 -+ return rq;
2249 -+}
2250 -+
2251 -+struct migration_arg {
2252 -+ struct task_struct *task;
2253 -+ int dest_cpu;
2254 -+};
2255 -+
2256 -+/*
2257 -+ * Move (not current) task off this CPU, onto the destination CPU. We're doing
2258 -+ * this because either it can't run here any more (set_cpus_allowed()
2259 -+ * away from this CPU, or CPU going down), or because we're
2260 -+ * attempting to rebalance this task on exec (sched_exec).
2261 -+ *
2262 -+ * So we race with normal scheduler movements, but that's OK, as long
2263 -+ * as the task is no longer on this CPU.
2264 -+ */
2265 -+static struct rq *__migrate_task(struct rq *rq, struct task_struct *p, int
2266 -+ dest_cpu)
2267 -+{
2268 -+ /* Affinity changed (again). */
2269 -+ if (!is_cpu_allowed(p, dest_cpu))
2270 -+ return rq;
2271 -+
2272 -+ update_rq_clock(rq);
2273 -+ return move_queued_task(rq, p, dest_cpu);
2274 -+}
2275 -+
2276 -+/*
2277 -+ * migration_cpu_stop - this will be executed by a highprio stopper thread
2278 -+ * and performs thread migration by bumping thread off CPU then
2279 -+ * 'pushing' onto another runqueue.
2280 -+ */
2281 -+static int migration_cpu_stop(void *data)
2282 -+{
2283 -+ struct migration_arg *arg = data;
2284 -+ struct task_struct *p = arg->task;
2285 -+ struct rq *rq = this_rq();
2286 -+ unsigned long flags;
2287 -+
2288 -+ /*
2289 -+ * The original target CPU might have gone down and we might
2290 -+ * be on another CPU but it doesn't matter.
2291 -+ */
2292 -+ local_irq_save(flags);
2293 -+ /*
2294 -+ * We need to explicitly wake pending tasks before running
2295 -+ * __migrate_task() such that we will not miss enforcing cpus_ptr
2296 -+ * during wakeups, see set_cpus_allowed_ptr()'s TASK_WAKING test.
2297 -+ */
2298 -+ flush_smp_call_function_from_idle();
2299 -+
2300 -+ raw_spin_lock(&p->pi_lock);
2301 -+ raw_spin_lock(&rq->lock);
2302 -+ /*
2303 -+ * If task_rq(p) != rq, it cannot be migrated here, because we're
2304 -+ * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
2305 -+ * we're holding p->pi_lock.
2306 -+ */
2307 -+ if (task_rq(p) == rq && task_on_rq_queued(p))
2308 -+ rq = __migrate_task(rq, p, arg->dest_cpu);
2309 -+ raw_spin_unlock(&rq->lock);
2310 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
2311 -+
2312 -+ return 0;
2313 -+}
2314 -+
2315 -+static inline void
2316 -+set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
2317 -+{
2318 -+ cpumask_copy(&p->cpus_mask, new_mask);
2319 -+ p->nr_cpus_allowed = cpumask_weight(new_mask);
2320 -+}
2321 -+
2322 -+static void
2323 -+__do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
2324 -+{
2325 -+ lockdep_assert_held(&p->pi_lock);
2326 -+ set_cpus_allowed_common(p, new_mask);
2327 -+}
2328 -+
2329 -+void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
2330 -+{
2331 -+ __do_set_cpus_allowed(p, new_mask);
2332 -+}
2333 -+
2334 -+#endif
2335 -+
2336 -+/**
2337 -+ * task_curr - is this task currently executing on a CPU?
2338 -+ * @p: the task in question.
2339 -+ *
2340 -+ * Return: 1 if the task is currently executing. 0 otherwise.
2341 -+ */
2342 -+inline int task_curr(const struct task_struct *p)
2343 -+{
2344 -+ return cpu_curr(task_cpu(p)) == p;
2345 -+}
2346 -+
2347 -+#ifdef CONFIG_SMP
2348 -+/*
2349 -+ * wait_task_inactive - wait for a thread to unschedule.
2350 -+ *
2351 -+ * If @match_state is nonzero, it's the @p->state value just checked and
2352 -+ * not expected to change. If it changes, i.e. @p might have woken up,
2353 -+ * then return zero. When we succeed in waiting for @p to be off its CPU,
2354 -+ * we return a positive number (its total switch count). If a second call
2355 -+ * a short while later returns the same number, the caller can be sure that
2356 -+ * @p has remained unscheduled the whole time.
2357 -+ *
2358 -+ * The caller must ensure that the task *will* unschedule sometime soon,
2359 -+ * else this function might spin for a *long* time. This function can't
2360 -+ * be called with interrupts off, or it may introduce deadlock with
2361 -+ * smp_call_function() if an IPI is sent by the same process we are
2362 -+ * waiting to become inactive.
2363 -+ */
2364 -+unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
2365 -+{
2366 -+ unsigned long flags;
2367 -+ bool running, on_rq;
2368 -+ unsigned long ncsw;
2369 -+ struct rq *rq;
2370 -+ raw_spinlock_t *lock;
2371 -+
2372 -+ for (;;) {
2373 -+ rq = task_rq(p);
2374 -+
2375 -+ /*
2376 -+ * If the task is actively running on another CPU
2377 -+ * still, just relax and busy-wait without holding
2378 -+ * any locks.
2379 -+ *
2380 -+ * NOTE! Since we don't hold any locks, it's not
2381 -+ * even sure that "rq" stays as the right runqueue!
2382 -+ * But we don't care, since this will return false
2383 -+ * if the runqueue has changed and p is actually now
2384 -+ * running somewhere else!
2385 -+ */
2386 -+ while (task_running(p) && p == rq->curr) {
2387 -+ if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
2388 -+ return 0;
2389 -+ cpu_relax();
2390 -+ }
2391 -+
2392 -+ /*
2393 -+ * Ok, time to look more closely! We need the rq
2394 -+ * lock now, to be *sure*. If we're wrong, we'll
2395 -+ * just go back and repeat.
2396 -+ */
2397 -+ task_access_lock_irqsave(p, &lock, &flags);
2398 -+ trace_sched_wait_task(p);
2399 -+ running = task_running(p);
2400 -+ on_rq = p->on_rq;
2401 -+ ncsw = 0;
2402 -+ if (!match_state || READ_ONCE(p->__state) == match_state)
2403 -+ ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
2404 -+ task_access_unlock_irqrestore(p, lock, &flags);
2405 -+
2406 -+ /*
2407 -+ * If it changed from the expected state, bail out now.
2408 -+ */
2409 -+ if (unlikely(!ncsw))
2410 -+ break;
2411 -+
2412 -+ /*
2413 -+ * Was it really running after all now that we
2414 -+ * checked with the proper locks actually held?
2415 -+ *
2416 -+ * Oops. Go back and try again..
2417 -+ */
2418 -+ if (unlikely(running)) {
2419 -+ cpu_relax();
2420 -+ continue;
2421 -+ }
2422 -+
2423 -+ /*
2424 -+ * It's not enough that it's not actively running,
2425 -+ * it must be off the runqueue _entirely_, and not
2426 -+ * preempted!
2427 -+ *
2428 -+ * So if it was still runnable (but just not actively
2429 -+ * running right now), it's preempted, and we should
2430 -+ * yield - it could be a while.
2431 -+ */
2432 -+ if (unlikely(on_rq)) {
2433 -+ ktime_t to = NSEC_PER_SEC / HZ;
2434 -+
2435 -+ set_current_state(TASK_UNINTERRUPTIBLE);
2436 -+ schedule_hrtimeout(&to, HRTIMER_MODE_REL);
2437 -+ continue;
2438 -+ }
2439 -+
2440 -+ /*
2441 -+ * Ahh, all good. It wasn't running, and it wasn't
2442 -+ * runnable, which means that it will never become
2443 -+ * running in the future either. We're all done!
2444 -+ */
2445 -+ break;
2446 -+ }
2447 -+
2448 -+ return ncsw;
2449 -+}
2450 -+
2451 -+/***
2452 -+ * kick_process - kick a running thread to enter/exit the kernel
2453 -+ * @p: the to-be-kicked thread
2454 -+ *
2455 -+ * Cause a process which is running on another CPU to enter
2456 -+ * kernel-mode, without any delay. (to get signals handled.)
2457 -+ *
2458 -+ * NOTE: this function doesn't have to take the runqueue lock,
2459 -+ * because all it wants to ensure is that the remote task enters
2460 -+ * the kernel. If the IPI races and the task has been migrated
2461 -+ * to another CPU then no harm is done and the purpose has been
2462 -+ * achieved as well.
2463 -+ */
2464 -+void kick_process(struct task_struct *p)
2465 -+{
2466 -+ int cpu;
2467 -+
2468 -+ preempt_disable();
2469 -+ cpu = task_cpu(p);
2470 -+ if ((cpu != smp_processor_id()) && task_curr(p))
2471 -+ smp_send_reschedule(cpu);
2472 -+ preempt_enable();
2473 -+}
2474 -+EXPORT_SYMBOL_GPL(kick_process);
2475 -+
2476 -+/*
2477 -+ * ->cpus_ptr is protected by both rq->lock and p->pi_lock
2478 -+ *
2479 -+ * A few notes on cpu_active vs cpu_online:
2480 -+ *
2481 -+ * - cpu_active must be a subset of cpu_online
2482 -+ *
2483 -+ * - on CPU-up we allow per-CPU kthreads on the online && !active CPU,
2484 -+ * see __set_cpus_allowed_ptr(). At this point the newly online
2485 -+ * CPU isn't yet part of the sched domains, and balancing will not
2486 -+ * see it.
2487 -+ *
2488 -+ * - on cpu-down we clear cpu_active() to mask the sched domains and
2489 -+ * avoid the load balancer to place new tasks on the to be removed
2490 -+ * CPU. Existing tasks will remain running there and will be taken
2491 -+ * off.
2492 -+ *
2493 -+ * This means that fallback selection must not select !active CPUs.
2494 -+ * And can assume that any active CPU must be online. Conversely
2495 -+ * select_task_rq() below may allow selection of !active CPUs in order
2496 -+ * to satisfy the above rules.
2497 -+ */
2498 -+static int select_fallback_rq(int cpu, struct task_struct *p)
2499 -+{
2500 -+ int nid = cpu_to_node(cpu);
2501 -+ const struct cpumask *nodemask = NULL;
2502 -+ enum { cpuset, possible, fail } state = cpuset;
2503 -+ int dest_cpu;
2504 -+
2505 -+ /*
2506 -+ * If the node that the CPU is on has been offlined, cpu_to_node()
2507 -+ * will return -1. There is no CPU on the node, and we should
2508 -+ * select the CPU on the other node.
2509 -+ */
2510 -+ if (nid != -1) {
2511 -+ nodemask = cpumask_of_node(nid);
2512 -+
2513 -+ /* Look for allowed, online CPU in same node. */
2514 -+ for_each_cpu(dest_cpu, nodemask) {
2515 -+ if (!cpu_active(dest_cpu))
2516 -+ continue;
2517 -+ if (cpumask_test_cpu(dest_cpu, p->cpus_ptr))
2518 -+ return dest_cpu;
2519 -+ }
2520 -+ }
2521 -+
2522 -+ for (;;) {
2523 -+ /* Any allowed, online CPU? */
2524 -+ for_each_cpu(dest_cpu, p->cpus_ptr) {
2525 -+ if (!is_cpu_allowed(p, dest_cpu))
2526 -+ continue;
2527 -+ goto out;
2528 -+ }
2529 -+
2530 -+ /* No more Mr. Nice Guy. */
2531 -+ switch (state) {
2532 -+ case cpuset:
2533 -+ if (IS_ENABLED(CONFIG_CPUSETS)) {
2534 -+ cpuset_cpus_allowed_fallback(p);
2535 -+ state = possible;
2536 -+ break;
2537 -+ }
2538 -+ fallthrough;
2539 -+ case possible:
2540 -+ /*
2541 -+ * XXX When called from select_task_rq() we only
2542 -+ * hold p->pi_lock and again violate locking order.
2543 -+ *
2544 -+ * More yuck to audit.
2545 -+ */
2546 -+ do_set_cpus_allowed(p, cpu_possible_mask);
2547 -+ state = fail;
2548 -+ break;
2549 -+
2550 -+ case fail:
2551 -+ BUG();
2552 -+ break;
2553 -+ }
2554 -+ }
2555 -+
2556 -+out:
2557 -+ if (state != cpuset) {
2558 -+ /*
2559 -+ * Don't tell them about moving exiting tasks or
2560 -+ * kernel threads (both mm NULL), since they never
2561 -+ * leave the kernel.
2562 -+ */
2563 -+ if (p->mm && printk_ratelimit()) {
2564 -+ printk_deferred("process %d (%s) no longer affine to cpu%d\n",
2565 -+ task_pid_nr(p), p->comm, cpu);
2566 -+ }
2567 -+ }
2568 -+
2569 -+ return dest_cpu;
2570 -+}
2571 -+
2572 -+static inline int select_task_rq(struct task_struct *p)
2573 -+{
2574 -+ cpumask_t chk_mask, tmp;
2575 -+
2576 -+ if (unlikely(!cpumask_and(&chk_mask, p->cpus_ptr, cpu_active_mask)))
2577 -+ return select_fallback_rq(task_cpu(p), p);
2578 -+
2579 -+ if (
2580 -+#ifdef CONFIG_SCHED_SMT
2581 -+ cpumask_and(&tmp, &chk_mask, &sched_sg_idle_mask) ||
2582 -+#endif
2583 -+ cpumask_and(&tmp, &chk_mask, sched_rq_watermark) ||
2584 -+ cpumask_and(&tmp, &chk_mask,
2585 -+ sched_rq_watermark + SCHED_BITS - task_sched_prio(p)))
2586 -+ return best_mask_cpu(task_cpu(p), &tmp);
2587 -+
2588 -+ return best_mask_cpu(task_cpu(p), &chk_mask);
2589 -+}
2590 -+
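select_fallback_rq() above searches in widening circles: an allowed, active CPU on the same node as the task's last CPU, then any allowed CPU, and only then does it relax the affinity (cpuset fallback, then cpu_possible_mask). The sketch below restates that ordering over plain arrays; the topology and helper names are invented, and the final stage is simplified to "any active CPU":

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

/* Invented topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static int cpu_node(int cpu) { return cpu < 4 ? 0 : 1; }

/*
 * Widening search in the spirit of select_fallback_rq():
 *   1) allowed && active on the same node as @prev_cpu
 *   2) any allowed && active CPU
 *   3) give up on affinity and take any active CPU at all
 */
static int fallback_cpu(int prev_cpu, const bool allowed[NR_CPUS],
                        const bool active[NR_CPUS])
{
        int cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                if (allowed[cpu] && active[cpu] && cpu_node(cpu) == cpu_node(prev_cpu))
                        return cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                if (allowed[cpu] && active[cpu])
                        return cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++)     /* "no more Mr. Nice Guy" */
                if (active[cpu])
                        return cpu;

        return -1;      /* the kernel would BUG() here */
}

int main(void)
{
        bool active[NR_CPUS]  = { 0, 0, 1, 1, 1, 1, 1, 1 };     /* CPUs 0-1 offline */
        bool allowed[NR_CPUS] = { 1, 1, 0, 0, 0, 1, 0, 0 };     /* affinity: 0, 1, 5 */

        /* Task last ran on CPU 1 (node 0): nothing allowed+active on node 0,
         * so the search widens and settles on CPU 5. */
        printf("fallback: CPU %d\n", fallback_cpu(1, allowed, active));
        return 0;
}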
2591 -+void sched_set_stop_task(int cpu, struct task_struct *stop)
2592 -+{
2593 -+ static struct lock_class_key stop_pi_lock;
2594 -+ struct sched_param stop_param = { .sched_priority = STOP_PRIO };
2595 -+ struct sched_param start_param = { .sched_priority = 0 };
2596 -+ struct task_struct *old_stop = cpu_rq(cpu)->stop;
2597 -+
2598 -+ if (stop) {
2599 -+ /*
2600 -+ * Make it appear like a SCHED_FIFO task, it's something
2601 -+ * userspace knows about and won't get confused about.
2602 -+ *
2603 -+ * Also, it will make PI more or less work without too
2604 -+ * much confusion -- but then, stop work should not
2605 -+ * rely on PI working anyway.
2606 -+ */
2607 -+ sched_setscheduler_nocheck(stop, SCHED_FIFO, &stop_param);
2608 -+
2609 -+ /*
2610 -+ * The PI code calls rt_mutex_setprio() with ->pi_lock held to
2611 -+ * adjust the effective priority of a task. As a result,
2612 -+ * rt_mutex_setprio() can trigger (RT) balancing operations,
2613 -+ * which can then trigger wakeups of the stop thread to push
2614 -+ * around the current task.
2615 -+ *
2616 -+ * The stop task itself will never be part of the PI-chain, it
2617 -+ * never blocks, therefore that ->pi_lock recursion is safe.
2618 -+ * Tell lockdep about this by placing the stop->pi_lock in its
2619 -+ * own class.
2620 -+ */
2621 -+ lockdep_set_class(&stop->pi_lock, &stop_pi_lock);
2622 -+ }
2623 -+
2624 -+ cpu_rq(cpu)->stop = stop;
2625 -+
2626 -+ if (old_stop) {
2627 -+ /*
2628 -+ * Reset it back to a normal scheduling policy so that
2629 -+ * it can die in pieces.
2630 -+ */
2631 -+ sched_setscheduler_nocheck(old_stop, SCHED_NORMAL, &start_param);
2632 -+ }
2633 -+}
2634 -+
2635 -+/*
2636 -+ * Change a given task's CPU affinity. Migrate the thread to a
2637 -+ * proper CPU and schedule it away if the CPU it's executing on
2638 -+ * is removed from the allowed bitmask.
2639 -+ *
2640 -+ * NOTE: the caller must have a valid reference to the task, the
2641 -+ * task must not exit() & deallocate itself prematurely. The
2642 -+ * call is not atomic; no spinlocks may be held.
2643 -+ */
2644 -+static int __set_cpus_allowed_ptr(struct task_struct *p,
2645 -+ const struct cpumask *new_mask,
2646 -+ u32 flags)
2647 -+{
2648 -+ const struct cpumask *cpu_valid_mask = cpu_active_mask;
2649 -+ int dest_cpu;
2650 -+ unsigned long irq_flags;
2651 -+ struct rq *rq;
2652 -+ raw_spinlock_t *lock;
2653 -+ int ret = 0;
2654 -+
2655 -+ raw_spin_lock_irqsave(&p->pi_lock, irq_flags);
2656 -+ rq = __task_access_lock(p, &lock);
2657 -+
2658 -+ if (p->flags & PF_KTHREAD || is_migration_disabled(p)) {
2659 -+ /*
2660 -+ * Kernel threads are allowed on online && !active CPUs,
2661 -+ * however, during cpu-hot-unplug, even these might get pushed
2662 -+ * away if not KTHREAD_IS_PER_CPU.
2663 -+ *
2664 -+ * Specifically, migration_disabled() tasks must not fail the
2665 -+ * cpumask_any_and_distribute() pick below, esp. so on
2666 -+ * SCA_MIGRATE_ENABLE, otherwise we'll not call
2667 -+ * set_cpus_allowed_common() and actually reset p->cpus_ptr.
2668 -+ */
2669 -+ cpu_valid_mask = cpu_online_mask;
2670 -+ }
2671 -+
2672 -+ /*
2673 -+ * Must re-check here, to close a race against __kthread_bind(),
2674 -+ * sched_setaffinity() is not guaranteed to observe the flag.
2675 -+ */
2676 -+ if ((flags & SCA_CHECK) && (p->flags & PF_NO_SETAFFINITY)) {
2677 -+ ret = -EINVAL;
2678 -+ goto out;
2679 -+ }
2680 -+
2681 -+ if (cpumask_equal(&p->cpus_mask, new_mask))
2682 -+ goto out;
2683 -+
2684 -+ dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
2685 -+ if (dest_cpu >= nr_cpu_ids) {
2686 -+ ret = -EINVAL;
2687 -+ goto out;
2688 -+ }
2689 -+
2690 -+ __do_set_cpus_allowed(p, new_mask);
2691 -+
2692 -+ /* Can the task run on the task's current CPU? If so, we're done */
2693 -+ if (cpumask_test_cpu(task_cpu(p), new_mask))
2694 -+ goto out;
2695 -+
2696 -+ if (p->migration_disabled) {
2697 -+ if (likely(p->cpus_ptr != &p->cpus_mask))
2698 -+ __do_set_cpus_ptr(p, &p->cpus_mask);
2699 -+ p->migration_disabled = 0;
2700 -+ p->migration_flags |= MDF_FORCE_ENABLED;
2701 -+ /* When p is migrate_disabled, rq->lock should be held */
2702 -+ rq->nr_pinned--;
2703 -+ }
2704 -+
2705 -+ if (task_running(p) || READ_ONCE(p->__state) == TASK_WAKING) {
2706 -+ struct migration_arg arg = { p, dest_cpu };
2707 -+
2708 -+ /* Need help from migration thread: drop lock and wait. */
2709 -+ __task_access_unlock(p, lock);
2710 -+ raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
2711 -+ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
2712 -+ return 0;
2713 -+ }
2714 -+ if (task_on_rq_queued(p)) {
2715 -+ /*
2716 -+ * OK, since we're going to drop the lock immediately
2717 -+ * afterwards anyway.
2718 -+ */
2719 -+ update_rq_clock(rq);
2720 -+ rq = move_queued_task(rq, p, dest_cpu);
2721 -+ lock = &rq->lock;
2722 -+ }
2723 -+
2724 -+out:
2725 -+ __task_access_unlock(p, lock);
2726 -+ raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
2727 -+
2728 -+ return ret;
2729 -+}
2730 -+
2731 -+int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
2732 -+{
2733 -+ return __set_cpus_allowed_ptr(p, new_mask, 0);
2734 -+}
2735 -+EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
2736 -+
2737 -+#else /* CONFIG_SMP */
2738 -+
2739 -+static inline int select_task_rq(struct task_struct *p)
2740 -+{
2741 -+ return 0;
2742 -+}
2743 -+
2744 -+static inline int
2745 -+__set_cpus_allowed_ptr(struct task_struct *p,
2746 -+ const struct cpumask *new_mask,
2747 -+ u32 flags)
2748 -+{
2749 -+ return set_cpus_allowed_ptr(p, new_mask);
2750 -+}
2751 -+
2752 -+static inline bool rq_has_pinned_tasks(struct rq *rq)
2753 -+{
2754 -+ return false;
2755 -+}
2756 -+
2757 -+#endif /* !CONFIG_SMP */
2758 -+
2759 -+static void
2760 -+ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
2761 -+{
2762 -+ struct rq *rq;
2763 -+
2764 -+ if (!schedstat_enabled())
2765 -+ return;
2766 -+
2767 -+ rq = this_rq();
2768 -+
2769 -+#ifdef CONFIG_SMP
2770 -+ if (cpu == rq->cpu)
2771 -+ __schedstat_inc(rq->ttwu_local);
2772 -+ else {
2773 -+ /** Alt schedule FW ToDo:
2774 -+ * How to do ttwu_wake_remote
2775 -+ */
2776 -+ }
2777 -+#endif /* CONFIG_SMP */
2778 -+
2779 -+ __schedstat_inc(rq->ttwu_count);
2780 -+}
2781 -+
2782 -+/*
2783 -+ * Mark the task runnable and perform wakeup-preemption.
2784 -+ */
2785 -+static inline void
2786 -+ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
2787 -+{
2788 -+ check_preempt_curr(rq);
2789 -+ WRITE_ONCE(p->__state, TASK_RUNNING);
2790 -+ trace_sched_wakeup(p);
2791 -+}
2792 -+
2793 -+static inline void
2794 -+ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags)
2795 -+{
2796 -+ if (p->sched_contributes_to_load)
2797 -+ rq->nr_uninterruptible--;
2798 -+
2799 -+ if (
2800 -+#ifdef CONFIG_SMP
2801 -+ !(wake_flags & WF_MIGRATED) &&
2802 -+#endif
2803 -+ p->in_iowait) {
2804 -+ delayacct_blkio_end(p);
2805 -+ atomic_dec(&task_rq(p)->nr_iowait);
2806 -+ }
2807 -+
2808 -+ activate_task(p, rq);
2809 -+ ttwu_do_wakeup(rq, p, 0);
2810 -+}
2811 -+
2812 -+/*
2813 -+ * Consider @p being inside a wait loop:
2814 -+ *
2815 -+ * for (;;) {
2816 -+ * set_current_state(TASK_UNINTERRUPTIBLE);
2817 -+ *
2818 -+ * if (CONDITION)
2819 -+ * break;
2820 -+ *
2821 -+ * schedule();
2822 -+ * }
2823 -+ * __set_current_state(TASK_RUNNING);
2824 -+ *
2825 -+ * between set_current_state() and schedule(). In this case @p is still
2826 -+ * runnable, so all that needs doing is change p->state back to TASK_RUNNING in
2827 -+ * an atomic manner.
2828 -+ *
2829 -+ * By taking task_rq(p)->lock we serialize against schedule(), if @p->on_rq
2830 -+ * then schedule() must still happen and p->state can be changed to
2831 -+ * TASK_RUNNING. Otherwise we lost the race, schedule() has happened, and we
2832 -+ * need to do a full wakeup with enqueue.
2833 -+ *
2834 -+ * Returns: %true when the wakeup is done,
2835 -+ * %false otherwise.
2836 -+ */
2837 -+static int ttwu_runnable(struct task_struct *p, int wake_flags)
2838 -+{
2839 -+ struct rq *rq;
2840 -+ raw_spinlock_t *lock;
2841 -+ int ret = 0;
2842 -+
2843 -+ rq = __task_access_lock(p, &lock);
2844 -+ if (task_on_rq_queued(p)) {
2845 -+ /* check_preempt_curr() may use rq clock */
2846 -+ update_rq_clock(rq);
2847 -+ ttwu_do_wakeup(rq, p, wake_flags);
2848 -+ ret = 1;
2849 -+ }
2850 -+ __task_access_unlock(p, lock);
2851 -+
2852 -+ return ret;
2853 -+}
2854 -+
2855 -+#ifdef CONFIG_SMP
2856 -+void sched_ttwu_pending(void *arg)
2857 -+{
2858 -+ struct llist_node *llist = arg;
2859 -+ struct rq *rq = this_rq();
2860 -+ struct task_struct *p, *t;
2861 -+ struct rq_flags rf;
2862 -+
2863 -+ if (!llist)
2864 -+ return;
2865 -+
2866 -+ /*
2867 -+ * rq::ttwu_pending is a racy indication of outstanding wakeups.
2868 -+ * Races such that false-negatives are possible, since they
2869 -+ * are shorter lived that false-positives would be.
2870 -+ */
2871 -+ WRITE_ONCE(rq->ttwu_pending, 0);
2872 -+
2873 -+ rq_lock_irqsave(rq, &rf);
2874 -+ update_rq_clock(rq);
2875 -+
2876 -+ llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
2877 -+ if (WARN_ON_ONCE(p->on_cpu))
2878 -+ smp_cond_load_acquire(&p->on_cpu, !VAL);
2879 -+
2880 -+ if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
2881 -+ set_task_cpu(p, cpu_of(rq));
2882 -+
2883 -+ ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0);
2884 -+ }
2885 -+
2886 -+ rq_unlock_irqrestore(rq, &rf);
2887 -+}
2888 -+
2889 -+void send_call_function_single_ipi(int cpu)
2890 -+{
2891 -+ struct rq *rq = cpu_rq(cpu);
2892 -+
2893 -+ if (!set_nr_if_polling(rq->idle))
2894 -+ arch_send_call_function_single_ipi(cpu);
2895 -+ else
2896 -+ trace_sched_wake_idle_without_ipi(cpu);
2897 -+}
2898 -+
2899 -+/*
2900 -+ * Queue a task on the target CPU's wake_list and wake the CPU via IPI if
2901 -+ * necessary. The wakee CPU on receipt of the IPI will queue the task
2902 -+ * via sched_ttwu_wakeup() for activation so the wakee incurs the cost
2903 -+ * of the wakeup instead of the waker.
2904 -+ */
2905 -+static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
2906 -+{
2907 -+ struct rq *rq = cpu_rq(cpu);
2908 -+
2909 -+ p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
2910 -+
2911 -+ WRITE_ONCE(rq->ttwu_pending, 1);
2912 -+ __smp_call_single_queue(cpu, &p->wake_entry.llist);
2913 -+}
2914 -+
2915 -+static inline bool ttwu_queue_cond(int cpu, int wake_flags)
2916 -+{
2917 -+ /*
2918 -+ * Do not complicate things with the async wake_list while the CPU is
2919 -+ * in hotplug state.
2920 -+ */
2921 -+ if (!cpu_active(cpu))
2922 -+ return false;
2923 -+
2924 -+ /*
2925 -+ * If the CPU does not share cache, then queue the task on the
2926 -+ * remote rqs wakelist to avoid accessing remote data.
2927 -+ */
2928 -+ if (!cpus_share_cache(smp_processor_id(), cpu))
2929 -+ return true;
2930 -+
2931 -+ /*
2932 -+ * If the task is descheduling and the only running task on the
2933 -+ * CPU then use the wakelist to offload the task activation to
2934 -+ * the soon-to-be-idle CPU as the current CPU is likely busy.
2935 -+ * nr_running is checked to avoid unnecessary task stacking.
2936 -+ */
2937 -+ if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
2938 -+ return true;
2939 -+
2940 -+ return false;
2941 -+}
2942 -+
2943 -+static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
2944 -+{
2945 -+ if (__is_defined(ALT_SCHED_TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
2946 -+ if (WARN_ON_ONCE(cpu == smp_processor_id()))
2947 -+ return false;
2948 -+
2949 -+ sched_clock_cpu(cpu); /* Sync clocks across CPUs */
2950 -+ __ttwu_queue_wakelist(p, cpu, wake_flags);
2951 -+ return true;
2952 -+ }
2953 -+
2954 -+ return false;
2955 -+}
2956 -+
2957 -+void wake_up_if_idle(int cpu)
2958 -+{
2959 -+ struct rq *rq = cpu_rq(cpu);
2960 -+ unsigned long flags;
2961 -+
2962 -+ rcu_read_lock();
2963 -+
2964 -+ if (!is_idle_task(rcu_dereference(rq->curr)))
2965 -+ goto out;
2966 -+
2967 -+ if (set_nr_if_polling(rq->idle)) {
2968 -+ trace_sched_wake_idle_without_ipi(cpu);
2969 -+ } else {
2970 -+ raw_spin_lock_irqsave(&rq->lock, flags);
2971 -+ if (is_idle_task(rq->curr))
2972 -+ smp_send_reschedule(cpu);
2973 -+ /* Else CPU is not idle, do nothing here */
2974 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
2975 -+ }
2976 -+
2977 -+out:
2978 -+ rcu_read_unlock();
2979 -+}
2980 -+
2981 -+bool cpus_share_cache(int this_cpu, int that_cpu)
2982 -+{
2983 -+ return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
2984 -+}
2985 -+#else /* !CONFIG_SMP */
2986 -+
2987 -+static inline bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
2988 -+{
2989 -+ return false;
2990 -+}
2991 -+
2992 -+#endif /* CONFIG_SMP */
2993 -+
2994 -+static inline void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
2995 -+{
2996 -+ struct rq *rq = cpu_rq(cpu);
2997 -+
2998 -+ if (ttwu_queue_wakelist(p, cpu, wake_flags))
2999 -+ return;
3000 -+
3001 -+ raw_spin_lock(&rq->lock);
3002 -+ update_rq_clock(rq);
3003 -+ ttwu_do_activate(rq, p, wake_flags);
3004 -+ raw_spin_unlock(&rq->lock);
3005 -+}
3006 -+
3007 -+/*
3008 -+ * Notes on Program-Order guarantees on SMP systems.
3009 -+ *
3010 -+ * MIGRATION
3011 -+ *
3012 -+ * The basic program-order guarantee on SMP systems is that when a task [t]
3013 -+ * migrates, all its activity on its old CPU [c0] happens-before any subsequent
3014 -+ * execution on its new CPU [c1].
3015 -+ *
3016 -+ * For migration (of runnable tasks) this is provided by the following means:
3017 -+ *
3018 -+ * A) UNLOCK of the rq(c0)->lock scheduling out task t
3019 -+ * B) migration for t is required to synchronize *both* rq(c0)->lock and
3020 -+ * rq(c1)->lock (if not at the same time, then in that order).
3021 -+ * C) LOCK of the rq(c1)->lock scheduling in task
3022 -+ *
3023 -+ * Transitivity guarantees that B happens after A and C after B.
3024 -+ * Note: we only require RCpc transitivity.
3025 -+ * Note: the CPU doing B need not be c0 or c1
3026 -+ *
3027 -+ * Example:
3028 -+ *
3029 -+ * CPU0 CPU1 CPU2
3030 -+ *
3031 -+ * LOCK rq(0)->lock
3032 -+ * sched-out X
3033 -+ * sched-in Y
3034 -+ * UNLOCK rq(0)->lock
3035 -+ *
3036 -+ * LOCK rq(0)->lock // orders against CPU0
3037 -+ * dequeue X
3038 -+ * UNLOCK rq(0)->lock
3039 -+ *
3040 -+ * LOCK rq(1)->lock
3041 -+ * enqueue X
3042 -+ * UNLOCK rq(1)->lock
3043 -+ *
3044 -+ * LOCK rq(1)->lock // orders against CPU2
3045 -+ * sched-out Z
3046 -+ * sched-in X
3047 -+ * UNLOCK rq(1)->lock
3048 -+ *
3049 -+ *
3050 -+ * BLOCKING -- aka. SLEEP + WAKEUP
3051 -+ *
3052 -+ * For blocking we (obviously) need to provide the same guarantee as for
3053 -+ * migration. However the means are completely different as there is no lock
3054 -+ * chain to provide order. Instead we do:
3055 -+ *
3056 -+ * 1) smp_store_release(X->on_cpu, 0) -- finish_task()
3057 -+ * 2) smp_cond_load_acquire(!X->on_cpu) -- try_to_wake_up()
3058 -+ *
3059 -+ * Example:
3060 -+ *
3061 -+ * CPU0 (schedule) CPU1 (try_to_wake_up) CPU2 (schedule)
3062 -+ *
3063 -+ * LOCK rq(0)->lock LOCK X->pi_lock
3064 -+ * dequeue X
3065 -+ * sched-out X
3066 -+ * smp_store_release(X->on_cpu, 0);
3067 -+ *
3068 -+ * smp_cond_load_acquire(&X->on_cpu, !VAL);
3069 -+ * X->state = WAKING
3070 -+ * set_task_cpu(X,2)
3071 -+ *
3072 -+ * LOCK rq(2)->lock
3073 -+ * enqueue X
3074 -+ * X->state = RUNNING
3075 -+ * UNLOCK rq(2)->lock
3076 -+ *
3077 -+ * LOCK rq(2)->lock // orders against CPU1
3078 -+ * sched-out Z
3079 -+ * sched-in X
3080 -+ * UNLOCK rq(2)->lock
3081 -+ *
3082 -+ * UNLOCK X->pi_lock
3083 -+ * UNLOCK rq(0)->lock
3084 -+ *
3085 -+ *
3086 -+ * However, for wakeups there is a second guarantee we must provide, namely we
3087 -+ * must observe the state that led to our wakeup. That is, not only must our
3088 -+ * task observe its own prior state, it must also observe the stores prior to
3089 -+ * its wakeup.
3090 -+ *
3091 -+ * This means that any means of doing remote wakeups must order the CPU doing
3092 -+ * the wakeup against the CPU the task is going to end up running on. This,
3093 -+ * however, is already required for the regular Program-Order guarantee above,
3094 -+ * since the waking CPU is the one issuing the ACQUIRE (smp_cond_load_acquire).
3095 -+ *
3096 -+ */
3097 -+
3098 -+/**
3099 -+ * try_to_wake_up - wake up a thread
3100 -+ * @p: the thread to be awakened
3101 -+ * @state: the mask of task states that can be woken
3102 -+ * @wake_flags: wake modifier flags (WF_*)
3103 -+ *
3104 -+ * Conceptually does:
3105 -+ *
3106 -+ * If (@state & @p->state) @p->state = TASK_RUNNING.
3107 -+ *
3108 -+ * If the task was not queued/runnable, also place it back on a runqueue.
3109 -+ *
3110 -+ * This function is atomic against schedule() which would dequeue the task.
3111 -+ *
3112 -+ * It issues a full memory barrier before accessing @p->state, see the comment
3113 -+ * with set_current_state().
3114 -+ *
3115 -+ * Uses p->pi_lock to serialize against concurrent wake-ups.
3116 -+ *
3117 -+ * Relies on p->pi_lock stabilizing:
3118 -+ * - p->sched_class
3119 -+ * - p->cpus_ptr
3120 -+ * - p->sched_task_group
3121 -+ * in order to do migration, see its use of select_task_rq()/set_task_cpu().
3122 -+ *
3123 -+ * Tries really hard to only take one task_rq(p)->lock for performance.
3124 -+ * Takes rq->lock in:
3125 -+ * - ttwu_runnable() -- old rq, unavoidable, see comment there;
3126 -+ * - ttwu_queue() -- new rq, for enqueue of the task;
3127 -+ * - psi_ttwu_dequeue() -- much sadness :-( accounting will kill us.
3128 -+ *
3129 -+ * As a consequence we race really badly with just about everything. See the
3130 -+ * many memory barriers and their comments for details.
3131 -+ *
3132 -+ * Return: %true if @p->state changes (an actual wakeup was done),
3133 -+ * %false otherwise.
3134 -+ */
3135 -+static int try_to_wake_up(struct task_struct *p, unsigned int state,
3136 -+ int wake_flags)
3137 -+{
3138 -+ unsigned long flags;
3139 -+ int cpu, success = 0;
3140 -+
3141 -+ preempt_disable();
3142 -+ if (p == current) {
3143 -+ /*
3144 -+ * We're waking current, this means 'p->on_rq' and 'task_cpu(p)
3145 -+ * == smp_processor_id()'. Together this means we can special
3146 -+ * case the whole 'p->on_rq && ttwu_runnable()' case below
3147 -+ * without taking any locks.
3148 -+ *
3149 -+ * In particular:
3150 -+ * - we rely on Program-Order guarantees for all the ordering,
3151 -+ * - we're serialized against set_special_state() by virtue of
3152 -+ * it disabling IRQs (this allows not taking ->pi_lock).
3153 -+ */
3154 -+ if (!(READ_ONCE(p->__state) & state))
3155 -+ goto out;
3156 -+
3157 -+ success = 1;
3158 -+ trace_sched_waking(p);
3159 -+ WRITE_ONCE(p->__state, TASK_RUNNING);
3160 -+ trace_sched_wakeup(p);
3161 -+ goto out;
3162 -+ }
3163 -+
3164 -+ /*
3165 -+ * If we are going to wake up a thread waiting for CONDITION we
3166 -+ * need to ensure that CONDITION=1 done by the caller can not be
3167 -+ * reordered with p->state check below. This pairs with smp_store_mb()
3168 -+ * in set_current_state() that the waiting thread does.
3169 -+ */
3170 -+ raw_spin_lock_irqsave(&p->pi_lock, flags);
3171 -+ smp_mb__after_spinlock();
3172 -+ if (!(READ_ONCE(p->__state) & state))
3173 -+ goto unlock;
3174 -+
3175 -+ trace_sched_waking(p);
3176 -+
3177 -+ /* We're going to change ->state: */
3178 -+ success = 1;
3179 -+
3180 -+ /*
3181 -+ * Ensure we load p->on_rq _after_ p->state, otherwise it would
3182 -+ * be possible to, falsely, observe p->on_rq == 0 and get stuck
3183 -+ * in smp_cond_load_acquire() below.
3184 -+ *
3185 -+ * sched_ttwu_pending() try_to_wake_up()
3186 -+ * STORE p->on_rq = 1 LOAD p->state
3187 -+ * UNLOCK rq->lock
3188 -+ *
3189 -+ * __schedule() (switch to task 'p')
3190 -+ * LOCK rq->lock smp_rmb();
3191 -+ * smp_mb__after_spinlock();
3192 -+ * UNLOCK rq->lock
3193 -+ *
3194 -+ * [task p]
3195 -+ * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
3196 -+ *
3197 -+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
3198 -+ * __schedule(). See the comment for smp_mb__after_spinlock().
3199 -+ *
3200 -+ * A similar smp_rmb() lives in try_invoke_on_locked_down_task().
3201 -+ */
3202 -+ smp_rmb();
3203 -+ if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
3204 -+ goto unlock;
3205 -+
3206 -+#ifdef CONFIG_SMP
3207 -+ /*
3208 -+ * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
3209 -+ * possible to, falsely, observe p->on_cpu == 0.
3210 -+ *
3211 -+ * One must be running (->on_cpu == 1) in order to remove oneself
3212 -+ * from the runqueue.
3213 -+ *
3214 -+ * __schedule() (switch to task 'p') try_to_wake_up()
3215 -+ * STORE p->on_cpu = 1 LOAD p->on_rq
3216 -+ * UNLOCK rq->lock
3217 -+ *
3218 -+ * __schedule() (put 'p' to sleep)
3219 -+ * LOCK rq->lock smp_rmb();
3220 -+ * smp_mb__after_spinlock();
3221 -+ * STORE p->on_rq = 0 LOAD p->on_cpu
3222 -+ *
3223 -+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
3224 -+ * __schedule(). See the comment for smp_mb__after_spinlock().
3225 -+ *
3226 -+ * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
3227 -+ * schedule()'s deactivate_task() has 'happened' and p will no longer
3228 -+ * care about its own p->state. See the comment in __schedule().
3229 -+ */
3230 -+ smp_acquire__after_ctrl_dep();
3231 -+
3232 -+ /*
3233 -+ * We're doing the wakeup (@success == 1), they did a dequeue (p->on_rq
3234 -+ * == 0), which means we need to do an enqueue, change p->state to
3235 -+ * TASK_WAKING such that we can unlock p->pi_lock before doing the
3236 -+ * enqueue, such as ttwu_queue_wakelist().
3237 -+ */
3238 -+ WRITE_ONCE(p->__state, TASK_WAKING);
3239 -+
3240 -+ /*
3241 -+ * If the owning (remote) CPU is still in the middle of schedule() with
3242 -+ * this task as prev, consider queueing p on the remote CPU's wake_list
3243 -+ * which potentially sends an IPI instead of spinning on p->on_cpu to
3244 -+ * let the waker make forward progress. This is safe because IRQs are
3245 -+ * disabled and the IPI will deliver after on_cpu is cleared.
3246 -+ *
3247 -+ * Ensure we load task_cpu(p) after p->on_cpu:
3248 -+ *
3249 -+ * set_task_cpu(p, cpu);
3250 -+ * STORE p->cpu = @cpu
3251 -+ * __schedule() (switch to task 'p')
3252 -+ * LOCK rq->lock
3253 -+ * smp_mb__after_spin_lock() smp_cond_load_acquire(&p->on_cpu)
3254 -+ * STORE p->on_cpu = 1 LOAD p->cpu
3255 -+ *
3256 -+ * to ensure we observe the correct CPU on which the task is currently
3257 -+ * scheduling.
3258 -+ */
3259 -+ if (smp_load_acquire(&p->on_cpu) &&
3260 -+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
3261 -+ goto unlock;
3262 -+
3263 -+ /*
3264 -+ * If the owning (remote) CPU is still in the middle of schedule() with
3265 -+ * this task as prev, wait until it's done referencing the task.
3266 -+ *
3267 -+ * Pairs with the smp_store_release() in finish_task().
3268 -+ *
3269 -+ * This ensures that tasks getting woken will be fully ordered against
3270 -+ * their previous state and preserve Program Order.
3271 -+ */
3272 -+ smp_cond_load_acquire(&p->on_cpu, !VAL);
3273 -+
3274 -+ sched_task_ttwu(p);
3275 -+
3276 -+ cpu = select_task_rq(p);
3277 -+
3278 -+ if (cpu != task_cpu(p)) {
3279 -+ if (p->in_iowait) {
3280 -+ delayacct_blkio_end(p);
3281 -+ atomic_dec(&task_rq(p)->nr_iowait);
3282 -+ }
3283 -+
3284 -+ wake_flags |= WF_MIGRATED;
3285 -+ psi_ttwu_dequeue(p);
3286 -+ set_task_cpu(p, cpu);
3287 -+ }
3288 -+#else
3289 -+ cpu = task_cpu(p);
3290 -+#endif /* CONFIG_SMP */
3291 -+
3292 -+ ttwu_queue(p, cpu, wake_flags);
3293 -+unlock:
3294 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
3295 -+out:
3296 -+ if (success)
3297 -+ ttwu_stat(p, task_cpu(p), wake_flags);
3298 -+ preempt_enable();
3299 -+
3300 -+ return success;
3301 -+}
3302 -+
3303 -+/**
3304 -+ * try_invoke_on_locked_down_task - Invoke a function on task in fixed state
3305 -+ * @p: Process for which the function is to be invoked, can be @current.
3306 -+ * @func: Function to invoke.
3307 -+ * @arg: Argument to function.
3308 -+ *
3309 -+ * If the specified task can be quickly locked into a definite state
3310 -+ * (either sleeping or on a given runqueue), arrange to keep it in that
3311 -+ * state while invoking @func(@arg). This function can use ->on_rq and
3312 -+ * task_curr() to work out what the state is, if required. Given that
3313 -+ * @func can be invoked with a runqueue lock held, it had better be quite
3314 -+ * lightweight.
3315 -+ *
3316 -+ * Returns:
3317 -+ * @false if the task slipped out from under the locks.
3318 -+ * @true if the task was locked onto a runqueue or is sleeping.
3319 -+ * However, @func can override this by returning @false.
3320 -+ */
3321 -+bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
3322 -+{
3323 -+ struct rq_flags rf;
3324 -+ bool ret = false;
3325 -+ struct rq *rq;
3326 -+
3327 -+ raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
3328 -+ if (p->on_rq) {
3329 -+ rq = __task_rq_lock(p, &rf);
3330 -+ if (task_rq(p) == rq)
3331 -+ ret = func(p, arg);
3332 -+ __task_rq_unlock(rq, &rf);
3333 -+ } else {
3334 -+ switch (READ_ONCE(p->__state)) {
3335 -+ case TASK_RUNNING:
3336 -+ case TASK_WAKING:
3337 -+ break;
3338 -+ default:
3339 -+ smp_rmb(); // See smp_rmb() comment in try_to_wake_up().
3340 -+ if (!p->on_rq)
3341 -+ ret = func(p, arg);
3342 -+ }
3343 -+ }
3344 -+ raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
3345 -+ return ret;
3346 -+}
3347 -+
3348 -+/**
3349 -+ * wake_up_process - Wake up a specific process
3350 -+ * @p: The process to be woken up.
3351 -+ *
3352 -+ * Attempt to wake up the nominated process and move it to the set of runnable
3353 -+ * processes.
3354 -+ *
3355 -+ * Return: 1 if the process was woken up, 0 if it was already running.
3356 -+ *
3357 -+ * This function executes a full memory barrier before accessing the task state.
3358 -+ */
3359 -+int wake_up_process(struct task_struct *p)
3360 -+{
3361 -+ return try_to_wake_up(p, TASK_NORMAL, 0);
3362 -+}
3363 -+EXPORT_SYMBOL(wake_up_process);
3364 -+
3365 -+int wake_up_state(struct task_struct *p, unsigned int state)
3366 -+{
3367 -+ return try_to_wake_up(p, state, 0);
3368 -+}
3369 -+
3370 -+/*
3371 -+ * Perform scheduler related setup for a newly forked process p.
3372 -+ * p is forked by current.
3373 -+ *
3374 -+ * __sched_fork() is basic setup used by init_idle() too:
3375 -+ */
3376 -+static inline void __sched_fork(unsigned long clone_flags, struct task_struct *p)
3377 -+{
3378 -+ p->on_rq = 0;
3379 -+ p->on_cpu = 0;
3380 -+ p->utime = 0;
3381 -+ p->stime = 0;
3382 -+ p->sched_time = 0;
3383 -+
3384 -+#ifdef CONFIG_PREEMPT_NOTIFIERS
3385 -+ INIT_HLIST_HEAD(&p->preempt_notifiers);
3386 -+#endif
3387 -+
3388 -+#ifdef CONFIG_COMPACTION
3389 -+ p->capture_control = NULL;
3390 -+#endif
3391 -+#ifdef CONFIG_SMP
3392 -+ p->wake_entry.u_flags = CSD_TYPE_TTWU;
3393 -+#endif
3394 -+}
3395 -+
3396 -+/*
3397 -+ * fork()/clone()-time setup:
3398 -+ */
3399 -+int sched_fork(unsigned long clone_flags, struct task_struct *p)
3400 -+{
3401 -+ unsigned long flags;
3402 -+ struct rq *rq;
3403 -+
3404 -+ __sched_fork(clone_flags, p);
3405 -+ /*
3406 -+ * We mark the process as NEW here. This guarantees that
3407 -+ * nobody will actually run it, and a signal or other external
3408 -+ * event cannot wake it up and insert it on the runqueue either.
3409 -+ */
3410 -+ p->__state = TASK_NEW;
3411 -+
3412 -+ /*
3413 -+ * Make sure we do not leak PI boosting priority to the child.
3414 -+ */
3415 -+ p->prio = current->normal_prio;
3416 -+
3417 -+ /*
3418 -+ * Revert to default priority/policy on fork if requested.
3419 -+ */
3420 -+ if (unlikely(p->sched_reset_on_fork)) {
3421 -+ if (task_has_rt_policy(p)) {
3422 -+ p->policy = SCHED_NORMAL;
3423 -+ p->static_prio = NICE_TO_PRIO(0);
3424 -+ p->rt_priority = 0;
3425 -+ } else if (PRIO_TO_NICE(p->static_prio) < 0)
3426 -+ p->static_prio = NICE_TO_PRIO(0);
3427 -+
3428 -+ p->prio = p->normal_prio = p->static_prio;
3429 -+
3430 -+ /*
3431 -+ * We don't need the reset flag anymore after the fork. It has
3432 -+ * fulfilled its duty:
3433 -+ */
3434 -+ p->sched_reset_on_fork = 0;
3435 -+ }
3436 -+
3437 -+ /*
3438 -+ * The child is not yet in the pid-hash so no cgroup attach races,
3439 -+ * and the cgroup is pinned to this child because cgroup_fork()
3440 -+ * is run before sched_fork().
3441 -+ *
3442 -+ * Silence PROVE_RCU.
3443 -+ */
3444 -+ raw_spin_lock_irqsave(&p->pi_lock, flags);
3445 -+ /*
3446 -+ * Share the timeslice between parent and child, thus the
3447 -+ * total amount of pending timeslices in the system doesn't change,
3448 -+ * resulting in more scheduling fairness.
3449 -+ */
3450 -+ rq = this_rq();
3451 -+ raw_spin_lock(&rq->lock);
3452 -+
3453 -+ rq->curr->time_slice /= 2;
3454 -+ p->time_slice = rq->curr->time_slice;
3455 -+#ifdef CONFIG_SCHED_HRTICK
3456 -+ hrtick_start(rq, rq->curr->time_slice);
3457 -+#endif
3458 -+
3459 -+ if (p->time_slice < RESCHED_NS) {
3460 -+ p->time_slice = sched_timeslice_ns;
3461 -+ resched_curr(rq);
3462 -+ }
3463 -+ sched_task_fork(p, rq);
3464 -+ raw_spin_unlock(&rq->lock);
3465 -+
3466 -+ rseq_migrate(p);
3467 -+ /*
3468 -+ * We're setting the CPU for the first time, we don't migrate,
3469 -+ * so use __set_task_cpu().
3470 -+ */
3471 -+ __set_task_cpu(p, cpu_of(rq));
3472 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
3473 -+
3474 -+#ifdef CONFIG_SCHED_INFO
3475 -+ if (unlikely(sched_info_on()))
3476 -+ memset(&p->sched_info, 0, sizeof(p->sched_info));
3477 -+#endif
3478 -+ init_task_preempt_count(p);
3479 -+
3480 -+ return 0;
3481 -+}
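A worked example of the fork-time timeslice split above (illustrative numbers; RESCHED_NS and sched_timeslice_ns are the patch's constants but their values are assumed here):

    /*
     * Parent has 3,000,000 ns of slice left at fork:
     *   rq->curr->time_slice /= 2   -> parent keeps 1,500,000 ns
     *   p->time_slice = 1,500,000   -> child inherits the other half
     * Assuming RESCHED_NS is below 1.5 ms, the "p->time_slice < RESCHED_NS"
     * check does not fire; had the child's share fallen below RESCHED_NS it
     * would be refilled to sched_timeslice_ns and the parent rescheduled
     * via resched_curr().
     */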
3482 -+
3483 -+void sched_post_fork(struct task_struct *p) {}
3484 -+
3485 -+#ifdef CONFIG_SCHEDSTATS
3486 -+
3487 -+DEFINE_STATIC_KEY_FALSE(sched_schedstats);
3488 -+
3489 -+static void set_schedstats(bool enabled)
3490 -+{
3491 -+ if (enabled)
3492 -+ static_branch_enable(&sched_schedstats);
3493 -+ else
3494 -+ static_branch_disable(&sched_schedstats);
3495 -+}
3496 -+
3497 -+void force_schedstat_enabled(void)
3498 -+{
3499 -+ if (!schedstat_enabled()) {
3500 -+ pr_info("kernel profiling enabled schedstats, disable via kernel.sched_schedstats.\n");
3501 -+ static_branch_enable(&sched_schedstats);
3502 -+ }
3503 -+}
3504 -+
3505 -+static int __init setup_schedstats(char *str)
3506 -+{
3507 -+ int ret = 0;
3508 -+ if (!str)
3509 -+ goto out;
3510 -+
3511 -+ if (!strcmp(str, "enable")) {
3512 -+ set_schedstats(true);
3513 -+ ret = 1;
3514 -+ } else if (!strcmp(str, "disable")) {
3515 -+ set_schedstats(false);
3516 -+ ret = 1;
3517 -+ }
3518 -+out:
3519 -+ if (!ret)
3520 -+ pr_warn("Unable to parse schedstats=\n");
3521 -+
3522 -+ return ret;
3523 -+}
3524 -+__setup("schedstats=", setup_schedstats);
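Usage note (illustrative, not part of the patch); the sysctl name is the one mentioned in force_schedstat_enabled() above:

    /*
     * Boot with "schedstats=enable" on the kernel command line, or toggle
     * at runtime with
     *     sysctl kernel.sched_schedstats=1
     * (handled by the CONFIG_PROC_SYSCTL block below).
     */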
3525 -+
3526 -+#ifdef CONFIG_PROC_SYSCTL
3527 -+int sysctl_schedstats(struct ctl_table *table, int write,
3528 -+ void __user *buffer, size_t *lenp, loff_t *ppos)
3529 -+{
3530 -+ struct ctl_table t;
3531 -+ int err;
3532 -+ int state = static_branch_likely(&sched_schedstats);
3533 -+
3534 -+ if (write && !capable(CAP_SYS_ADMIN))
3535 -+ return -EPERM;
3536 -+
3537 -+ t = *table;
3538 -+ t.data = &state;
3539 -+ err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
3540 -+ if (err < 0)
3541 -+ return err;
3542 -+ if (write)
3543 -+ set_schedstats(state);
3544 -+ return err;
3545 -+}
3546 -+#endif /* CONFIG_PROC_SYSCTL */
3547 -+#endif /* CONFIG_SCHEDSTATS */
3548 -+
3549 -+/*
3550 -+ * wake_up_new_task - wake up a newly created task for the first time.
3551 -+ *
3552 -+ * This function will do some initial scheduler statistics housekeeping
3553 -+ * that must be done for every newly created context, then puts the task
3554 -+ * on the runqueue and wakes it.
3555 -+ */
3556 -+void wake_up_new_task(struct task_struct *p)
3557 -+{
3558 -+ unsigned long flags;
3559 -+ struct rq *rq;
3560 -+
3561 -+ raw_spin_lock_irqsave(&p->pi_lock, flags);
3562 -+ WRITE_ONCE(p->__state, TASK_RUNNING);
3563 -+ rq = cpu_rq(select_task_rq(p));
3564 -+#ifdef CONFIG_SMP
3565 -+ rseq_migrate(p);
3566 -+ /*
3567 -+ * Fork balancing, do it here and not earlier because:
3568 -+ * - cpus_ptr can change in the fork path
3569 -+ * - any previously selected CPU might disappear through hotplug
3570 -+ *
3571 -+ * Use __set_task_cpu() to avoid calling sched_class::migrate_task_rq,
3572 -+ * as we're not fully set-up yet.
3573 -+ */
3574 -+ __set_task_cpu(p, cpu_of(rq));
3575 -+#endif
3576 -+
3577 -+ raw_spin_lock(&rq->lock);
3578 -+ update_rq_clock(rq);
3579 -+
3580 -+ activate_task(p, rq);
3581 -+ trace_sched_wakeup_new(p);
3582 -+ check_preempt_curr(rq);
3583 -+
3584 -+ raw_spin_unlock(&rq->lock);
3585 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
3586 -+}
3587 -+
3588 -+#ifdef CONFIG_PREEMPT_NOTIFIERS
3589 -+
3590 -+static DEFINE_STATIC_KEY_FALSE(preempt_notifier_key);
3591 -+
3592 -+void preempt_notifier_inc(void)
3593 -+{
3594 -+ static_branch_inc(&preempt_notifier_key);
3595 -+}
3596 -+EXPORT_SYMBOL_GPL(preempt_notifier_inc);
3597 -+
3598 -+void preempt_notifier_dec(void)
3599 -+{
3600 -+ static_branch_dec(&preempt_notifier_key);
3601 -+}
3602 -+EXPORT_SYMBOL_GPL(preempt_notifier_dec);
3603 -+
3604 -+/**
3605 -+ * preempt_notifier_register - tell me when current is being preempted & rescheduled
3606 -+ * @notifier: notifier struct to register
3607 -+ */
3608 -+void preempt_notifier_register(struct preempt_notifier *notifier)
3609 -+{
3610 -+ if (!static_branch_unlikely(&preempt_notifier_key))
3611 -+ WARN(1, "registering preempt_notifier while notifiers disabled\n");
3612 -+
3613 -+ hlist_add_head(&notifier->link, &current->preempt_notifiers);
3614 -+}
3615 -+EXPORT_SYMBOL_GPL(preempt_notifier_register);
3616 -+
3617 -+/**
3618 -+ * preempt_notifier_unregister - no longer interested in preemption notifications
3619 -+ * @notifier: notifier struct to unregister
3620 -+ *
3621 -+ * This is *not* safe to call from within a preemption notifier.
3622 -+ */
3623 -+void preempt_notifier_unregister(struct preempt_notifier *notifier)
3624 -+{
3625 -+ hlist_del(&notifier->link);
3626 -+}
3627 -+EXPORT_SYMBOL_GPL(preempt_notifier_unregister);
3628 -+
3629 -+static void __fire_sched_in_preempt_notifiers(struct task_struct *curr)
3630 -+{
3631 -+ struct preempt_notifier *notifier;
3632 -+
3633 -+ hlist_for_each_entry(notifier, &curr->preempt_notifiers, link)
3634 -+ notifier->ops->sched_in(notifier, raw_smp_processor_id());
3635 -+}
3636 -+
3637 -+static __always_inline void fire_sched_in_preempt_notifiers(struct task_struct *curr)
3638 -+{
3639 -+ if (static_branch_unlikely(&preempt_notifier_key))
3640 -+ __fire_sched_in_preempt_notifiers(curr);
3641 -+}
3642 -+
3643 -+static void
3644 -+__fire_sched_out_preempt_notifiers(struct task_struct *curr,
3645 -+ struct task_struct *next)
3646 -+{
3647 -+ struct preempt_notifier *notifier;
3648 -+
3649 -+ hlist_for_each_entry(notifier, &curr->preempt_notifiers, link)
3650 -+ notifier->ops->sched_out(notifier, next);
3651 -+}
3652 -+
3653 -+static __always_inline void
3654 -+fire_sched_out_preempt_notifiers(struct task_struct *curr,
3655 -+ struct task_struct *next)
3656 -+{
3657 -+ if (static_branch_unlikely(&preempt_notifier_key))
3658 -+ __fire_sched_out_preempt_notifiers(curr, next);
3659 -+}
3660 -+
3661 -+#else /* !CONFIG_PREEMPT_NOTIFIERS */
3662 -+
3663 -+static inline void fire_sched_in_preempt_notifiers(struct task_struct *curr)
3664 -+{
3665 -+}
3666 -+
3667 -+static inline void
3668 -+fire_sched_out_preempt_notifiers(struct task_struct *curr,
3669 -+ struct task_struct *next)
3670 -+{
3671 -+}
3672 -+
3673 -+#endif /* CONFIG_PREEMPT_NOTIFIERS */
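A minimal registration sketch for the preempt-notifier API above (illustrative; preempt_notifier_init() is assumed to be the usual helper from <linux/preempt.h> and is not part of this patch):

    static void my_sched_in(struct preempt_notifier *pn, int cpu)
    {
            /* current was just scheduled in on @cpu */
    }

    static void my_sched_out(struct preempt_notifier *pn, struct task_struct *next)
    {
            /* current is being scheduled out in favour of @next */
    }

    static struct preempt_notifier_ops my_ops = {
            .sched_in  = my_sched_in,
            .sched_out = my_sched_out,
    };

    static void watch_current(struct preempt_notifier *pn)
    {
            preempt_notifier_inc();              /* enable the static key */
            preempt_notifier_init(pn, &my_ops);  /* assumed helper */
            preempt_notifier_register(pn);       /* attach to current */
    }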
3674 -+
3675 -+static inline void prepare_task(struct task_struct *next)
3676 -+{
3677 -+ /*
3678 -+ * Claim the task as running, we do this before switching to it
3679 -+ * such that any running task will have this set.
3680 -+ *
3681 -+ * See the ttwu() WF_ON_CPU case and its ordering comment.
3682 -+ */
3683 -+ WRITE_ONCE(next->on_cpu, 1);
3684 -+}
3685 -+
3686 -+static inline void finish_task(struct task_struct *prev)
3687 -+{
3688 -+#ifdef CONFIG_SMP
3689 -+ /*
3690 -+ * This must be the very last reference to @prev from this CPU. After
3691 -+ * p->on_cpu is cleared, the task can be moved to a different CPU. We
3692 -+ * must ensure this doesn't happen until the switch is completely
3693 -+ * finished.
3694 -+ *
3695 -+ * In particular, the load of prev->state in finish_task_switch() must
3696 -+ * happen before this.
3697 -+ *
3698 -+ * Pairs with the smp_cond_load_acquire() in try_to_wake_up().
3699 -+ */
3700 -+ smp_store_release(&prev->on_cpu, 0);
3701 -+#else
3702 -+ prev->on_cpu = 0;
3703 -+#endif
3704 -+}
3705 -+
3706 -+#ifdef CONFIG_SMP
3707 -+
3708 -+static void do_balance_callbacks(struct rq *rq, struct callback_head *head)
3709 -+{
3710 -+ void (*func)(struct rq *rq);
3711 -+ struct callback_head *next;
3712 -+
3713 -+ lockdep_assert_held(&rq->lock);
3714 -+
3715 -+ while (head) {
3716 -+ func = (void (*)(struct rq *))head->func;
3717 -+ next = head->next;
3718 -+ head->next = NULL;
3719 -+ head = next;
3720 -+
3721 -+ func(rq);
3722 -+ }
3723 -+}
3724 -+
3725 -+static void balance_push(struct rq *rq);
3726 -+
3727 -+struct callback_head balance_push_callback = {
3728 -+ .next = NULL,
3729 -+ .func = (void (*)(struct callback_head *))balance_push,
3730 -+};
3731 -+
3732 -+static inline struct callback_head *splice_balance_callbacks(struct rq *rq)
3733 -+{
3734 -+ struct callback_head *head = rq->balance_callback;
3735 -+
3736 -+ if (head) {
3737 -+ lockdep_assert_held(&rq->lock);
3738 -+ rq->balance_callback = NULL;
3739 -+ }
3740 -+
3741 -+ return head;
3742 -+}
3743 -+
3744 -+static void __balance_callbacks(struct rq *rq)
3745 -+{
3746 -+ do_balance_callbacks(rq, splice_balance_callbacks(rq));
3747 -+}
3748 -+
3749 -+static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
3750 -+{
3751 -+ unsigned long flags;
3752 -+
3753 -+ if (unlikely(head)) {
3754 -+ raw_spin_lock_irqsave(&rq->lock, flags);
3755 -+ do_balance_callbacks(rq, head);
3756 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
3757 -+ }
3758 -+}
3759 -+
3760 -+#else
3761 -+
3762 -+static inline void __balance_callbacks(struct rq *rq)
3763 -+{
3764 -+}
3765 -+
3766 -+static inline struct callback_head *splice_balance_callbacks(struct rq *rq)
3767 -+{
3768 -+ return NULL;
3769 -+}
3770 -+
3771 -+static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
3772 -+{
3773 -+}
3774 -+
3775 -+#endif
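Illustrative sketch (not part of the patch) of how an entry gets onto rq->balance_callback; it is later spliced off and run by __balance_callbacks() above. The callback function here is hypothetical:

    static void my_balance_fn(struct rq *rq)
    {
            /* runs with rq->lock held, after the schedule path */
    }

    static struct callback_head my_balance_cb = {
            .func = (void (*)(struct callback_head *))my_balance_fn,
    };

    static void queue_my_balance_cb(struct rq *rq)
    {
            lockdep_assert_held(&rq->lock);
            my_balance_cb.next = rq->balance_callback;  /* push onto the list */
            rq->balance_callback = &my_balance_cb;
    }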
3776 -+
3777 -+static inline void
3778 -+prepare_lock_switch(struct rq *rq, struct task_struct *next)
3779 -+{
3780 -+ /*
3781 -+ * Since the runqueue lock will be released by the next
3782 -+ * task (which is an invalid locking op but in the case
3783 -+ * of the scheduler it's an obvious special-case), we
3784 -+ * do an early lockdep release here:
3785 -+ */
3786 -+ spin_release(&rq->lock.dep_map, _THIS_IP_);
3787 -+#ifdef CONFIG_DEBUG_SPINLOCK
3788 -+ /* this is a valid case when another task releases the spinlock */
3789 -+ rq->lock.owner = next;
3790 -+#endif
3791 -+}
3792 -+
3793 -+static inline void finish_lock_switch(struct rq *rq)
3794 -+{
3795 -+ /*
3796 -+ * If we are tracking spinlock dependencies then we have to
3797 -+ * fix up the runqueue lock - which gets 'carried over' from
3798 -+ * prev into current:
3799 -+ */
3800 -+ spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
3801 -+ __balance_callbacks(rq);
3802 -+ raw_spin_unlock_irq(&rq->lock);
3803 -+}
3804 -+
3805 -+/*
3806 -+ * NOP if the arch has not defined these:
3807 -+ */
3808 -+
3809 -+#ifndef prepare_arch_switch
3810 -+# define prepare_arch_switch(next) do { } while (0)
3811 -+#endif
3812 -+
3813 -+#ifndef finish_arch_post_lock_switch
3814 -+# define finish_arch_post_lock_switch() do { } while (0)
3815 -+#endif
3816 -+
3817 -+static inline void kmap_local_sched_out(void)
3818 -+{
3819 -+#ifdef CONFIG_KMAP_LOCAL
3820 -+ if (unlikely(current->kmap_ctrl.idx))
3821 -+ __kmap_local_sched_out();
3822 -+#endif
3823 -+}
3824 -+
3825 -+static inline void kmap_local_sched_in(void)
3826 -+{
3827 -+#ifdef CONFIG_KMAP_LOCAL
3828 -+ if (unlikely(current->kmap_ctrl.idx))
3829 -+ __kmap_local_sched_in();
3830 -+#endif
3831 -+}
3832 -+
3833 -+/**
3834 -+ * prepare_task_switch - prepare to switch tasks
3835 -+ * @rq: the runqueue preparing to switch
3836 -+ * @next: the task we are going to switch to.
3837 -+ *
3838 -+ * This is called with the rq lock held and interrupts off. It must
3839 -+ * be paired with a subsequent finish_task_switch after the context
3840 -+ * switch.
3841 -+ *
3842 -+ * prepare_task_switch sets up locking and calls architecture specific
3843 -+ * hooks.
3844 -+ */
3845 -+static inline void
3846 -+prepare_task_switch(struct rq *rq, struct task_struct *prev,
3847 -+ struct task_struct *next)
3848 -+{
3849 -+ kcov_prepare_switch(prev);
3850 -+ sched_info_switch(rq, prev, next);
3851 -+ perf_event_task_sched_out(prev, next);
3852 -+ rseq_preempt(prev);
3853 -+ fire_sched_out_preempt_notifiers(prev, next);
3854 -+ kmap_local_sched_out();
3855 -+ prepare_task(next);
3856 -+ prepare_arch_switch(next);
3857 -+}
3858 -+
3859 -+/**
3860 -+ * finish_task_switch - clean up after a task-switch
3861 -+ * @rq: runqueue associated with task-switch
3862 -+ * @prev: the thread we just switched away from.
3863 -+ *
3864 -+ * finish_task_switch must be called after the context switch, paired
3865 -+ * with a prepare_task_switch call before the context switch.
3866 -+ * finish_task_switch will reconcile locking set up by prepare_task_switch,
3867 -+ * and do any other architecture-specific cleanup actions.
3868 -+ *
3869 -+ * Note that we may have delayed dropping an mm in context_switch(). If
3870 -+ * so, we finish that here outside of the runqueue lock. (Doing it
3871 -+ * with the lock held can cause deadlocks; see schedule() for
3872 -+ * details.)
3873 -+ *
3874 -+ * The context switch has flipped the stack from under us and restored the
3875 -+ * local variables which were saved when this task called schedule() in the
3876 -+ * past. prev == current is still correct but we need to recalculate this_rq
3877 -+ * because prev may have moved to another CPU.
3878 -+ */
3879 -+static struct rq *finish_task_switch(struct task_struct *prev)
3880 -+ __releases(rq->lock)
3881 -+{
3882 -+ struct rq *rq = this_rq();
3883 -+ struct mm_struct *mm = rq->prev_mm;
3884 -+ long prev_state;
3885 -+
3886 -+ /*
3887 -+ * The previous task will have left us with a preempt_count of 2
3888 -+ * because it left us after:
3889 -+ *
3890 -+ * schedule()
3891 -+ * preempt_disable(); // 1
3892 -+ * __schedule()
3893 -+ * raw_spin_lock_irq(&rq->lock) // 2
3894 -+ *
3895 -+ * Also, see FORK_PREEMPT_COUNT.
3896 -+ */
3897 -+ if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
3898 -+ "corrupted preempt_count: %s/%d/0x%x\n",
3899 -+ current->comm, current->pid, preempt_count()))
3900 -+ preempt_count_set(FORK_PREEMPT_COUNT);
3901 -+
3902 -+ rq->prev_mm = NULL;
3903 -+
3904 -+ /*
3905 -+ * A task struct has one reference for the use as "current".
3906 -+ * If a task dies, then it sets TASK_DEAD in tsk->state and calls
3907 -+ * schedule one last time. The schedule call will never return, and
3908 -+ * the scheduled task must drop that reference.
3909 -+ *
3910 -+ * We must observe prev->state before clearing prev->on_cpu (in
3911 -+ * finish_task), otherwise a concurrent wakeup can get prev
3912 -+ * running on another CPU and we could race with its RUNNING -> DEAD
3913 -+ * transition, resulting in a double drop.
3914 -+ */
3915 -+ prev_state = READ_ONCE(prev->__state);
3916 -+ vtime_task_switch(prev);
3917 -+ perf_event_task_sched_in(prev, current);
3918 -+ finish_task(prev);
3919 -+ tick_nohz_task_switch();
3920 -+ finish_lock_switch(rq);
3921 -+ finish_arch_post_lock_switch();
3922 -+ kcov_finish_switch(current);
3923 -+ /*
3924 -+ * kmap_local_sched_out() is invoked with rq::lock held and
3925 -+ * interrupts disabled. There is no requirement for that, but the
3926 -+ * sched out code does not have an interrupt enabled section.
3927 -+ * Restoring the maps on sched in does not require interrupts being
3928 -+ * disabled either.
3929 -+ */
3930 -+ kmap_local_sched_in();
3931 -+
3932 -+ fire_sched_in_preempt_notifiers(current);
3933 -+ /*
3934 -+ * When switching through a kernel thread, the loop in
3935 -+ * membarrier_{private,global}_expedited() may have observed that
3936 -+ * kernel thread and not issued an IPI. It is therefore possible to
3937 -+ * schedule between user->kernel->user threads without passing though
3938 -+ * switch_mm(). Membarrier requires a barrier after storing to
3939 -+ * rq->curr, before returning to userspace, so provide them here:
3940 -+ *
3941 -+ * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
3942 -+ * provided by mmdrop(),
3943 -+ * - a sync_core for SYNC_CORE.
3944 -+ */
3945 -+ if (mm) {
3946 -+ membarrier_mm_sync_core_before_usermode(mm);
3947 -+ mmdrop(mm);
3948 -+ }
3949 -+ if (unlikely(prev_state == TASK_DEAD)) {
3950 -+ /*
3951 -+ * Remove function-return probe instances associated with this
3952 -+ * task and put them back on the free list.
3953 -+ */
3954 -+ kprobe_flush_task(prev);
3955 -+
3956 -+ /* Task is done with its stack. */
3957 -+ put_task_stack(prev);
3958 -+
3959 -+ put_task_struct_rcu_user(prev);
3960 -+ }
3961 -+
3962 -+ return rq;
3963 -+}
3964 -+
3965 -+/**
3966 -+ * schedule_tail - first thing a freshly forked thread must call.
3967 -+ * @prev: the thread we just switched away from.
3968 -+ */
3969 -+asmlinkage __visible void schedule_tail(struct task_struct *prev)
3970 -+ __releases(rq->lock)
3971 -+{
3972 -+ /*
3973 -+ * New tasks start with FORK_PREEMPT_COUNT, see there and
3974 -+ * finish_task_switch() for details.
3975 -+ *
3976 -+ * finish_task_switch() will drop rq->lock() and lower preempt_count
3977 -+ * and the preempt_enable() will end up enabling preemption (on
3978 -+ * PREEMPT_COUNT kernels).
3979 -+ */
3980 -+
3981 -+ finish_task_switch(prev);
3982 -+ preempt_enable();
3983 -+
3984 -+ if (current->set_child_tid)
3985 -+ put_user(task_pid_vnr(current), current->set_child_tid);
3986 -+
3987 -+ calculate_sigpending();
3988 -+}
3989 -+
3990 -+/*
3991 -+ * context_switch - switch to the new MM and the new thread's register state.
3992 -+ */
3993 -+static __always_inline struct rq *
3994 -+context_switch(struct rq *rq, struct task_struct *prev,
3995 -+ struct task_struct *next)
3996 -+{
3997 -+ prepare_task_switch(rq, prev, next);
3998 -+
3999 -+ /*
4000 -+ * For paravirt, this is coupled with an exit in switch_to to
4001 -+ * combine the page table reload and the switch backend into
4002 -+ * one hypercall.
4003 -+ */
4004 -+ arch_start_context_switch(prev);
4005 -+
4006 -+ /*
4007 -+ * kernel -> kernel lazy + transfer active
4008 -+ * user -> kernel lazy + mmgrab() active
4009 -+ *
4010 -+ * kernel -> user switch + mmdrop() active
4011 -+ * user -> user switch
4012 -+ */
4013 -+ if (!next->mm) { // to kernel
4014 -+ enter_lazy_tlb(prev->active_mm, next);
4015 -+
4016 -+ next->active_mm = prev->active_mm;
4017 -+ if (prev->mm) // from user
4018 -+ mmgrab(prev->active_mm);
4019 -+ else
4020 -+ prev->active_mm = NULL;
4021 -+ } else { // to user
4022 -+ membarrier_switch_mm(rq, prev->active_mm, next->mm);
4023 -+ /*
4024 -+ * sys_membarrier() requires an smp_mb() between setting
4025 -+ * rq->curr / membarrier_switch_mm() and returning to userspace.
4026 -+ *
4027 -+ * The below provides this either through switch_mm(), or in
4028 -+ * case 'prev->active_mm == next->mm' through
4029 -+ * finish_task_switch()'s mmdrop().
4030 -+ */
4031 -+ switch_mm_irqs_off(prev->active_mm, next->mm, next);
4032 -+
4033 -+ if (!prev->mm) { // from kernel
4034 -+ /* will mmdrop() in finish_task_switch(). */
4035 -+ rq->prev_mm = prev->active_mm;
4036 -+ prev->active_mm = NULL;
4037 -+ }
4038 -+ }
4039 -+
4040 -+ prepare_lock_switch(rq, next);
4041 -+
4042 -+ /* Here we just switch the register state and the stack. */
4043 -+ switch_to(prev, next, prev);
4044 -+ barrier();
4045 -+
4046 -+ return finish_task_switch(prev);
4047 -+}
4048 -+
4049 -+/*
4050 -+ * nr_running, nr_uninterruptible and nr_context_switches:
4051 -+ *
4052 -+ * externally visible scheduler statistics: current number of runnable
4053 -+ * threads, total number of context switches performed since bootup.
4054 -+ */
4055 -+unsigned int nr_running(void)
4056 -+{
4057 -+ unsigned int i, sum = 0;
4058 -+
4059 -+ for_each_online_cpu(i)
4060 -+ sum += cpu_rq(i)->nr_running;
4061 -+
4062 -+ return sum;
4063 -+}
4064 -+
4065 -+/*
4066 -+ * Check if only the current task is running on the CPU.
4067 -+ *
4068 -+ * Caution: this function does not check that the caller has disabled
4069 -+ * preemption, thus the result might have a time-of-check-to-time-of-use
4070 -+ * race. The caller is responsible to use it correctly, for example:
4071 -+ *
4072 -+ * - from a non-preemptible section (of course)
4073 -+ *
4074 -+ * - from a thread that is bound to a single CPU
4075 -+ *
4076 -+ * - in a loop with very short iterations (e.g. a polling loop)
4077 -+ */
4078 -+bool single_task_running(void)
4079 -+{
4080 -+ return raw_rq()->nr_running == 1;
4081 -+}
4082 -+EXPORT_SYMBOL(single_task_running);
4083 -+
4084 -+unsigned long long nr_context_switches(void)
4085 -+{
4086 -+ int i;
4087 -+ unsigned long long sum = 0;
4088 -+
4089 -+ for_each_possible_cpu(i)
4090 -+ sum += cpu_rq(i)->nr_switches;
4091 -+
4092 -+ return sum;
4093 -+}
4094 -+
4095 -+/*
4096 -+ * Consumers of these two interfaces, like for example the cpuidle menu
4097 -+ * governor, are using nonsensical data. Preferring shallow idle state selection
4098 -+ * for a CPU that has IO-wait which might not even end up running the task when
4099 -+ * it does become runnable.
4100 -+ */
4101 -+
4102 -+unsigned int nr_iowait_cpu(int cpu)
4103 -+{
4104 -+ return atomic_read(&cpu_rq(cpu)->nr_iowait);
4105 -+}
4106 -+
4107 -+/*
4108 -+ * IO-wait accounting, and how it's mostly bollocks (on SMP).
4109 -+ *
4110 -+ * The idea behind IO-wait accounting is to account the idle time that we could
4111 -+ * have spent running if it were not for IO. That is, if we were to improve the
4112 -+ * storage performance, we'd have a proportional reduction in IO-wait time.
4113 -+ *
4114 -+ * This all works nicely on UP, where, when a task blocks on IO, we account
4115 -+ * idle time as IO-wait, because if the storage were faster, it could've been
4116 -+ * running and we'd not be idle.
4117 -+ *
4118 -+ * This has been extended to SMP, by doing the same for each CPU. This however
4119 -+ * is broken.
4120 -+ *
4121 -+ * Imagine for instance the case where two tasks block on one CPU, only the one
4122 -+ * CPU will have IO-wait accounted, while the other has regular idle. Even
4123 -+ * though, if the storage were faster, both could've run at the same time,
4124 -+ * utilising both CPUs.
4125 -+ *
4126 -+ * This means, that when looking globally, the current IO-wait accounting on
4127 -+ * SMP is a lower bound, by reason of under accounting.
4128 -+ *
4129 -+ * Worse, since the numbers are provided per CPU, they are sometimes
4130 -+ * interpreted per CPU, and that is nonsensical. A blocked task isn't strictly
4131 -+ * associated with any one particular CPU, it can wake to another CPU than it
4132 -+ * blocked on. This means the per CPU IO-wait number is meaningless.
4133 -+ *
4134 -+ * Task CPU affinities can make all that even more 'interesting'.
4135 -+ */
4136 -+
4137 -+unsigned int nr_iowait(void)
4138 -+{
4139 -+ unsigned int i, sum = 0;
4140 -+
4141 -+ for_each_possible_cpu(i)
4142 -+ sum += nr_iowait_cpu(i);
4143 -+
4144 -+ return sum;
4145 -+}
4146 -+
4147 -+#ifdef CONFIG_SMP
4148 -+
4149 -+/*
4150 -+ * sched_exec - execve() is a valuable balancing opportunity, because at
4151 -+ * this point the task has the smallest effective memory and cache
4152 -+ * footprint.
4153 -+ */
4154 -+void sched_exec(void)
4155 -+{
4156 -+ struct task_struct *p = current;
4157 -+ unsigned long flags;
4158 -+ int dest_cpu;
4159 -+
4160 -+ raw_spin_lock_irqsave(&p->pi_lock, flags);
4161 -+ dest_cpu = cpumask_any(p->cpus_ptr);
4162 -+ if (dest_cpu == smp_processor_id())
4163 -+ goto unlock;
4164 -+
4165 -+ if (likely(cpu_active(dest_cpu))) {
4166 -+ struct migration_arg arg = { p, dest_cpu };
4167 -+
4168 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
4169 -+ stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
4170 -+ return;
4171 -+ }
4172 -+unlock:
4173 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
4174 -+}
4175 -+
4176 -+#endif
4177 -+
4178 -+DEFINE_PER_CPU(struct kernel_stat, kstat);
4179 -+DEFINE_PER_CPU(struct kernel_cpustat, kernel_cpustat);
4180 -+
4181 -+EXPORT_PER_CPU_SYMBOL(kstat);
4182 -+EXPORT_PER_CPU_SYMBOL(kernel_cpustat);
4183 -+
4184 -+static inline void update_curr(struct rq *rq, struct task_struct *p)
4185 -+{
4186 -+ s64 ns = rq->clock_task - p->last_ran;
4187 -+
4188 -+ p->sched_time += ns;
4189 -+ cgroup_account_cputime(p, ns);
4190 -+ account_group_exec_runtime(p, ns);
4191 -+
4192 -+ p->time_slice -= ns;
4193 -+ p->last_ran = rq->clock_task;
4194 -+}
4195 -+
4196 -+/*
4197 -+ * Return accounted runtime for the task.
4198 -+ * Return separately the current's pending runtime that has not been
4199 -+ * accounted yet.
4200 -+ */
4201 -+unsigned long long task_sched_runtime(struct task_struct *p)
4202 -+{
4203 -+ unsigned long flags;
4204 -+ struct rq *rq;
4205 -+ raw_spinlock_t *lock;
4206 -+ u64 ns;
4207 -+
4208 -+#if defined(CONFIG_64BIT) && defined(CONFIG_SMP)
4209 -+ /*
4210 -+ * 64-bit doesn't need locks to atomically read a 64-bit value.
4211 -+ * So we have an optimization chance when the task's delta_exec is 0.
4212 -+ * Reading ->on_cpu is racy, but this is ok.
4213 -+ *
4214 -+ * If we race with it leaving CPU, we'll take a lock. So we're correct.
4215 -+ * If we race with it entering CPU, unaccounted time is 0. This is
4216 -+ * indistinguishable from the read occurring a few cycles earlier.
4217 -+ * If we see ->on_cpu without ->on_rq, the task is leaving, and has
4218 -+ * been accounted, so we're correct here as well.
4219 -+ */
4220 -+ if (!p->on_cpu || !task_on_rq_queued(p))
4221 -+ return tsk_seruntime(p);
4222 -+#endif
4223 -+
4224 -+ rq = task_access_lock_irqsave(p, &lock, &flags);
4225 -+ /*
4226 -+ * Must be ->curr _and_ ->on_rq. If dequeued, we would
4227 -+ * project cycles that may never be accounted to this
4228 -+ * thread, breaking clock_gettime().
4229 -+ */
4230 -+ if (p == rq->curr && task_on_rq_queued(p)) {
4231 -+ update_rq_clock(rq);
4232 -+ update_curr(rq, p);
4233 -+ }
4234 -+ ns = tsk_seruntime(p);
4235 -+ task_access_unlock_irqrestore(p, lock, &flags);
4236 -+
4237 -+ return ns;
4238 -+}
4239 -+
4240 -+/* This manages tasks that have run out of timeslice during a scheduler_tick */
4241 -+static inline void scheduler_task_tick(struct rq *rq)
4242 -+{
4243 -+ struct task_struct *p = rq->curr;
4244 -+
4245 -+ if (is_idle_task(p))
4246 -+ return;
4247 -+
4248 -+ update_curr(rq, p);
4249 -+ cpufreq_update_util(rq, 0);
4250 -+
4251 -+ /*
4252 -+ * Tasks that have less than RESCHED_NS of time slice left will be
4253 -+ * rescheduled.
4254 -+ */
4255 -+ if (p->time_slice >= RESCHED_NS)
4256 -+ return;
4257 -+ set_tsk_need_resched(p);
4258 -+ set_preempt_need_resched();
4259 -+}
4260 -+
4261 -+#ifdef CONFIG_SCHED_DEBUG
4262 -+static u64 cpu_resched_latency(struct rq *rq)
4263 -+{
4264 -+ int latency_warn_ms = READ_ONCE(sysctl_resched_latency_warn_ms);
4265 -+ u64 resched_latency, now = rq_clock(rq);
4266 -+ static bool warned_once;
4267 -+
4268 -+ if (sysctl_resched_latency_warn_once && warned_once)
4269 -+ return 0;
4270 -+
4271 -+ if (!need_resched() || !latency_warn_ms)
4272 -+ return 0;
4273 -+
4274 -+ if (system_state == SYSTEM_BOOTING)
4275 -+ return 0;
4276 -+
4277 -+ if (!rq->last_seen_need_resched_ns) {
4278 -+ rq->last_seen_need_resched_ns = now;
4279 -+ rq->ticks_without_resched = 0;
4280 -+ return 0;
4281 -+ }
4282 -+
4283 -+ rq->ticks_without_resched++;
4284 -+ resched_latency = now - rq->last_seen_need_resched_ns;
4285 -+ if (resched_latency <= latency_warn_ms * NSEC_PER_MSEC)
4286 -+ return 0;
4287 -+
4288 -+ warned_once = true;
4289 -+
4290 -+ return resched_latency;
4291 -+}
4292 -+
4293 -+static int __init setup_resched_latency_warn_ms(char *str)
4294 -+{
4295 -+ long val;
4296 -+
4297 -+ if ((kstrtol(str, 0, &val))) {
4298 -+ pr_warn("Unable to set resched_latency_warn_ms\n");
4299 -+ return 1;
4300 -+ }
4301 -+
4302 -+ sysctl_resched_latency_warn_ms = val;
4303 -+ return 1;
4304 -+}
4305 -+__setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms);
4306 -+#else
4307 -+static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
4308 -+#endif /* CONFIG_SCHED_DEBUG */
4309 -+
4310 -+/*
4311 -+ * This function gets called by the timer code, with HZ frequency.
4312 -+ * We call it with interrupts disabled.
4313 -+ */
4314 -+void scheduler_tick(void)
4315 -+{
4316 -+ int cpu __maybe_unused = smp_processor_id();
4317 -+ struct rq *rq = cpu_rq(cpu);
4318 -+ u64 resched_latency;
4319 -+
4320 -+ arch_scale_freq_tick();
4321 -+ sched_clock_tick();
4322 -+
4323 -+ raw_spin_lock(&rq->lock);
4324 -+ update_rq_clock(rq);
4325 -+
4326 -+ scheduler_task_tick(rq);
4327 -+ if (sched_feat(LATENCY_WARN))
4328 -+ resched_latency = cpu_resched_latency(rq);
4329 -+ calc_global_load_tick(rq);
4330 -+
4331 -+ rq->last_tick = rq->clock;
4332 -+ raw_spin_unlock(&rq->lock);
4333 -+
4334 -+ if (sched_feat(LATENCY_WARN) && resched_latency)
4335 -+ resched_latency_warn(cpu, resched_latency);
4336 -+
4337 -+ perf_event_task_tick();
4338 -+}
4339 -+
4340 -+#ifdef CONFIG_SCHED_SMT
4341 -+static inline int active_load_balance_cpu_stop(void *data)
4342 -+{
4343 -+ struct rq *rq = this_rq();
4344 -+ struct task_struct *p = data;
4345 -+ cpumask_t tmp;
4346 -+ unsigned long flags;
4347 -+
4348 -+ local_irq_save(flags);
4349 -+
4350 -+ raw_spin_lock(&p->pi_lock);
4351 -+ raw_spin_lock(&rq->lock);
4352 -+
4353 -+ rq->active_balance = 0;
4354 -+ /* _something_ may have changed the task, double check again */
4355 -+ if (task_on_rq_queued(p) && task_rq(p) == rq &&
4356 -+ cpumask_and(&tmp, p->cpus_ptr, &sched_sg_idle_mask) &&
4357 -+ !is_migration_disabled(p)) {
4358 -+ int cpu = cpu_of(rq);
4359 -+ int dcpu = __best_mask_cpu(&tmp, per_cpu(sched_cpu_llc_mask, cpu));
4360 -+ rq = move_queued_task(rq, p, dcpu);
4361 -+ }
4362 -+
4363 -+ raw_spin_unlock(&rq->lock);
4364 -+ raw_spin_unlock(&p->pi_lock);
4365 -+
4366 -+ local_irq_restore(flags);
4367 -+
4368 -+ return 0;
4369 -+}
4370 -+
4371 -+/* sg_balance_trigger - trigger sibling group balance for @cpu */
4372 -+static inline int sg_balance_trigger(const int cpu)
4373 -+{
4374 -+ struct rq *rq = cpu_rq(cpu);
4375 -+ unsigned long flags;
4376 -+ struct task_struct *curr;
4377 -+ int res;
4378 -+
4379 -+ if (!raw_spin_trylock_irqsave(&rq->lock, flags))
4380 -+ return 0;
4381 -+ curr = rq->curr;
4382 -+ res = (!is_idle_task(curr)) && (1 == rq->nr_running) &&\
4383 -+ cpumask_intersects(curr->cpus_ptr, &sched_sg_idle_mask) &&\
4384 -+ !is_migration_disabled(curr) && (!rq->active_balance);
4385 -+
4386 -+ if (res)
4387 -+ rq->active_balance = 1;
4388 -+
4389 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
4390 -+
4391 -+ if (res)
4392 -+ stop_one_cpu_nowait(cpu, active_load_balance_cpu_stop,
4393 -+ curr, &rq->active_balance_work);
4394 -+ return res;
4395 -+}
4396 -+
4397 -+/*
4398 -+ * sg_balance_check - sibling group balance check for run queue @rq
4399 -+ */
4400 -+static inline void sg_balance_check(struct rq *rq)
4401 -+{
4402 -+ cpumask_t chk;
4403 -+ int cpu = cpu_of(rq);
4404 -+
4405 -+ /* exit when cpu is offline */
4406 -+ if (unlikely(!rq->online))
4407 -+ return;
4408 -+
4409 -+ /*
4410 -+ * Only a cpu in the sibling idle group will do the checking and then
4411 -+ * find potential cpus to which the currently running task can migrate
4412 -+ */
4413 -+ if (cpumask_test_cpu(cpu, &sched_sg_idle_mask) &&
4414 -+ cpumask_andnot(&chk, cpu_online_mask, sched_rq_watermark) &&
4415 -+ cpumask_andnot(&chk, &chk, &sched_rq_pending_mask)) {
4416 -+ int i;
4417 -+
4418 -+ for_each_cpu_wrap(i, &chk, cpu) {
4419 -+ if (cpumask_subset(cpu_smt_mask(i), &chk) &&
4420 -+ sg_balance_trigger(i))
4421 -+ return;
4422 -+ }
4423 -+ }
4424 -+}
4425 -+#endif /* CONFIG_SCHED_SMT */
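A short flow summary for the SMT sibling-group balance path above (descriptive only, not part of the patch):

    /*
     *   sg_balance_check(rq)          - called at the end of __schedule();
     *                                   filters online CPUs against the
     *                                   watermark and pending masks
     *   sg_balance_trigger(cpu)       - sets rq->active_balance and kicks
     *                                   the cpu-stop machinery
     *   active_load_balance_cpu_stop  - runs in stopper context and moves
     *                                   the chosen task towards a CPU in
     *                                   sched_sg_idle_mask
     */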
4426 -+
4427 -+#ifdef CONFIG_NO_HZ_FULL
4428 -+
4429 -+struct tick_work {
4430 -+ int cpu;
4431 -+ atomic_t state;
4432 -+ struct delayed_work work;
4433 -+};
4434 -+/* Values for ->state, see diagram below. */
4435 -+#define TICK_SCHED_REMOTE_OFFLINE 0
4436 -+#define TICK_SCHED_REMOTE_OFFLINING 1
4437 -+#define TICK_SCHED_REMOTE_RUNNING 2
4438 -+
4439 -+/*
4440 -+ * State diagram for ->state:
4441 -+ *
4442 -+ *
4443 -+ * TICK_SCHED_REMOTE_OFFLINE
4444 -+ * | ^
4445 -+ * | |
4446 -+ * | | sched_tick_remote()
4447 -+ * | |
4448 -+ * | |
4449 -+ * +--TICK_SCHED_REMOTE_OFFLINING
4450 -+ * | ^
4451 -+ * | |
4452 -+ * sched_tick_start() | | sched_tick_stop()
4453 -+ * | |
4454 -+ * V |
4455 -+ * TICK_SCHED_REMOTE_RUNNING
4456 -+ *
4457 -+ *
4458 -+ * Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
4459 -+ * and sched_tick_start() are happy to leave the state in RUNNING.
4460 -+ */
4461 -+
4462 -+static struct tick_work __percpu *tick_work_cpu;
4463 -+
4464 -+static void sched_tick_remote(struct work_struct *work)
4465 -+{
4466 -+ struct delayed_work *dwork = to_delayed_work(work);
4467 -+ struct tick_work *twork = container_of(dwork, struct tick_work, work);
4468 -+ int cpu = twork->cpu;
4469 -+ struct rq *rq = cpu_rq(cpu);
4470 -+ struct task_struct *curr;
4471 -+ unsigned long flags;
4472 -+ u64 delta;
4473 -+ int os;
4474 -+
4475 -+ /*
4476 -+ * Handle the tick only if it appears the remote CPU is running in full
4477 -+ * dynticks mode. The check is racy by nature, but missing a tick or
4478 -+ * having one too many is no big deal because the scheduler tick updates
4479 -+ * statistics and checks timeslices in a time-independent way, regardless
4480 -+ * of when exactly it is running.
4481 -+ */
4482 -+ if (!tick_nohz_tick_stopped_cpu(cpu))
4483 -+ goto out_requeue;
4484 -+
4485 -+ raw_spin_lock_irqsave(&rq->lock, flags);
4486 -+ curr = rq->curr;
4487 -+ if (cpu_is_offline(cpu))
4488 -+ goto out_unlock;
4489 -+
4490 -+ update_rq_clock(rq);
4491 -+ if (!is_idle_task(curr)) {
4492 -+ /*
4493 -+ * Make sure the next tick runs within a reasonable
4494 -+ * amount of time.
4495 -+ */
4496 -+ delta = rq_clock_task(rq) - curr->last_ran;
4497 -+ WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
4498 -+ }
4499 -+ scheduler_task_tick(rq);
4500 -+
4501 -+ calc_load_nohz_remote(rq);
4502 -+out_unlock:
4503 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
4504 -+
4505 -+out_requeue:
4506 -+ /*
4507 -+ * Run the remote tick once per second (1Hz). This arbitrary
4508 -+ * frequency is large enough to avoid overload but short enough
4509 -+ * to keep scheduler internal stats reasonably up to date. But
4510 -+ * first update state to reflect hotplug activity if required.
4511 -+ */
4512 -+ os = atomic_fetch_add_unless(&twork->state, -1, TICK_SCHED_REMOTE_RUNNING);
4513 -+ WARN_ON_ONCE(os == TICK_SCHED_REMOTE_OFFLINE);
4514 -+ if (os == TICK_SCHED_REMOTE_RUNNING)
4515 -+ queue_delayed_work(system_unbound_wq, dwork, HZ);
4516 -+}
4517 -+
4518 -+static void sched_tick_start(int cpu)
4519 -+{
4520 -+ int os;
4521 -+ struct tick_work *twork;
4522 -+
4523 -+ if (housekeeping_cpu(cpu, HK_FLAG_TICK))
4524 -+ return;
4525 -+
4526 -+ WARN_ON_ONCE(!tick_work_cpu);
4527 -+
4528 -+ twork = per_cpu_ptr(tick_work_cpu, cpu);
4529 -+ os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_RUNNING);
4530 -+ WARN_ON_ONCE(os == TICK_SCHED_REMOTE_RUNNING);
4531 -+ if (os == TICK_SCHED_REMOTE_OFFLINE) {
4532 -+ twork->cpu = cpu;
4533 -+ INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
4534 -+ queue_delayed_work(system_unbound_wq, &twork->work, HZ);
4535 -+ }
4536 -+}
4537 -+
4538 -+#ifdef CONFIG_HOTPLUG_CPU
4539 -+static void sched_tick_stop(int cpu)
4540 -+{
4541 -+ struct tick_work *twork;
4542 -+
4543 -+ if (housekeeping_cpu(cpu, HK_FLAG_TICK))
4544 -+ return;
4545 -+
4546 -+ WARN_ON_ONCE(!tick_work_cpu);
4547 -+
4548 -+ twork = per_cpu_ptr(tick_work_cpu, cpu);
4549 -+ cancel_delayed_work_sync(&twork->work);
4550 -+}
4551 -+#endif /* CONFIG_HOTPLUG_CPU */
4552 -+
4553 -+int __init sched_tick_offload_init(void)
4554 -+{
4555 -+ tick_work_cpu = alloc_percpu(struct tick_work);
4556 -+ BUG_ON(!tick_work_cpu);
4557 -+ return 0;
4558 -+}
4559 -+
4560 -+#else /* !CONFIG_NO_HZ_FULL */
4561 -+static inline void sched_tick_start(int cpu) { }
4562 -+static inline void sched_tick_stop(int cpu) { }
4563 -+#endif
4564 -+
4565 -+#if defined(CONFIG_PREEMPTION) && (defined(CONFIG_DEBUG_PREEMPT) || \
4566 -+ defined(CONFIG_PREEMPT_TRACER))
4567 -+/*
4568 -+ * If the value passed in is equal to the current preempt count
4569 -+ * then we just disabled preemption. Start timing the latency.
4570 -+ */
4571 -+static inline void preempt_latency_start(int val)
4572 -+{
4573 -+ if (preempt_count() == val) {
4574 -+ unsigned long ip = get_lock_parent_ip();
4575 -+#ifdef CONFIG_DEBUG_PREEMPT
4576 -+ current->preempt_disable_ip = ip;
4577 -+#endif
4578 -+ trace_preempt_off(CALLER_ADDR0, ip);
4579 -+ }
4580 -+}
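A worked example of the check above (illustrative): preempt_disable() ends up in preempt_count_add() below, which calls preempt_latency_start() after __preempt_count_add(); the count equals the value just added only when it was zero beforehand, i.e. when this call is the one that actually disabled preemption.

    /*
     *   count 0 -> preempt_count_add(1) -> count 1 == val  => start timing
     *   count 1 -> preempt_count_add(1) -> count 2 != val  => nested, no restart
     */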
4581 -+
4582 -+void preempt_count_add(int val)
4583 -+{
4584 -+#ifdef CONFIG_DEBUG_PREEMPT
4585 -+ /*
4586 -+ * Underflow?
4587 -+ */
4588 -+ if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
4589 -+ return;
4590 -+#endif
4591 -+ __preempt_count_add(val);
4592 -+#ifdef CONFIG_DEBUG_PREEMPT
4593 -+ /*
4594 -+ * Spinlock count overflowing soon?
4595 -+ */
4596 -+ DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
4597 -+ PREEMPT_MASK - 10);
4598 -+#endif
4599 -+ preempt_latency_start(val);
4600 -+}
4601 -+EXPORT_SYMBOL(preempt_count_add);
4602 -+NOKPROBE_SYMBOL(preempt_count_add);
4603 -+
4604 -+/*
4605 -+ * If the value passed in equals the current preempt count
4606 -+ * then we just enabled preemption. Stop timing the latency.
4607 -+ */
4608 -+static inline void preempt_latency_stop(int val)
4609 -+{
4610 -+ if (preempt_count() == val)
4611 -+ trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
4612 -+}
4613 -+
4614 -+void preempt_count_sub(int val)
4615 -+{
4616 -+#ifdef CONFIG_DEBUG_PREEMPT
4617 -+ /*
4618 -+ * Underflow?
4619 -+ */
4620 -+ if (DEBUG_LOCKS_WARN_ON(val > preempt_count()))
4621 -+ return;
4622 -+ /*
4623 -+ * Is the spinlock portion underflowing?
4624 -+ */
4625 -+ if (DEBUG_LOCKS_WARN_ON((val < PREEMPT_MASK) &&
4626 -+ !(preempt_count() & PREEMPT_MASK)))
4627 -+ return;
4628 -+#endif
4629 -+
4630 -+ preempt_latency_stop(val);
4631 -+ __preempt_count_sub(val);
4632 -+}
4633 -+EXPORT_SYMBOL(preempt_count_sub);
4634 -+NOKPROBE_SYMBOL(preempt_count_sub);
4635 -+
4636 -+#else
4637 -+static inline void preempt_latency_start(int val) { }
4638 -+static inline void preempt_latency_stop(int val) { }
4639 -+#endif
4640 -+
4641 -+static inline unsigned long get_preempt_disable_ip(struct task_struct *p)
4642 -+{
4643 -+#ifdef CONFIG_DEBUG_PREEMPT
4644 -+ return p->preempt_disable_ip;
4645 -+#else
4646 -+ return 0;
4647 -+#endif
4648 -+}
4649 -+
4650 -+/*
4651 -+ * Print scheduling while atomic bug:
4652 -+ */
4653 -+static noinline void __schedule_bug(struct task_struct *prev)
4654 -+{
4655 -+ /* Save this before calling printk(), since that will clobber it */
4656 -+ unsigned long preempt_disable_ip = get_preempt_disable_ip(current);
4657 -+
4658 -+ if (oops_in_progress)
4659 -+ return;
4660 -+
4661 -+ printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
4662 -+ prev->comm, prev->pid, preempt_count());
4663 -+
4664 -+ debug_show_held_locks(prev);
4665 -+ print_modules();
4666 -+ if (irqs_disabled())
4667 -+ print_irqtrace_events(prev);
4668 -+ if (IS_ENABLED(CONFIG_DEBUG_PREEMPT)
4669 -+ && in_atomic_preempt_off()) {
4670 -+ pr_err("Preemption disabled at:");
4671 -+ print_ip_sym(KERN_ERR, preempt_disable_ip);
4672 -+ }
4673 -+ if (panic_on_warn)
4674 -+ panic("scheduling while atomic\n");
4675 -+
4676 -+ dump_stack();
4677 -+ add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
4678 -+}
4679 -+
4680 -+/*
4681 -+ * Various schedule()-time debugging checks and statistics:
4682 -+ */
4683 -+static inline void schedule_debug(struct task_struct *prev, bool preempt)
4684 -+{
4685 -+#ifdef CONFIG_SCHED_STACK_END_CHECK
4686 -+ if (task_stack_end_corrupted(prev))
4687 -+ panic("corrupted stack end detected inside scheduler\n");
4688 -+
4689 -+ if (task_scs_end_corrupted(prev))
4690 -+ panic("corrupted shadow stack detected inside scheduler\n");
4691 -+#endif
4692 -+
4693 -+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
4694 -+ if (!preempt && READ_ONCE(prev->__state) && prev->non_block_count) {
4695 -+ printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
4696 -+ prev->comm, prev->pid, prev->non_block_count);
4697 -+ dump_stack();
4698 -+ add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
4699 -+ }
4700 -+#endif
4701 -+
4702 -+ if (unlikely(in_atomic_preempt_off())) {
4703 -+ __schedule_bug(prev);
4704 -+ preempt_count_set(PREEMPT_DISABLED);
4705 -+ }
4706 -+ rcu_sleep_check();
4707 -+ SCHED_WARN_ON(ct_state() == CONTEXT_USER);
4708 -+
4709 -+ profile_hit(SCHED_PROFILING, __builtin_return_address(0));
4710 -+
4711 -+ schedstat_inc(this_rq()->sched_count);
4712 -+}
4713 -+
4714 -+/*
4715 -+ * Compile time debug macro
4716 -+ * #define ALT_SCHED_DEBUG
4717 -+ */
4718 -+
4719 -+#ifdef ALT_SCHED_DEBUG
4720 -+void alt_sched_debug(void)
4721 -+{
4722 -+ printk(KERN_INFO "sched: pending: 0x%04lx, idle: 0x%04lx, sg_idle: 0x%04lx\n",
4723 -+ sched_rq_pending_mask.bits[0],
4724 -+ sched_rq_watermark[0].bits[0],
4725 -+ sched_sg_idle_mask.bits[0]);
4726 -+}
4727 -+#else
4728 -+inline void alt_sched_debug(void) {}
4729 -+#endif
4730 -+
4731 -+#ifdef CONFIG_SMP
4732 -+
4733 -+#define SCHED_RQ_NR_MIGRATION (32U)
4734 -+/*
4735 -+ * Migrate pending tasks in @rq to @dest_cpu
4736 -+ * Will try to migrate the minimum of half of @rq's nr_running tasks and
4737 -+ * SCHED_RQ_NR_MIGRATION to @dest_cpu
4738 -+ */
4739 -+static inline int
4740 -+migrate_pending_tasks(struct rq *rq, struct rq *dest_rq, const int dest_cpu)
4741 -+{
4742 -+ struct task_struct *p, *skip = rq->curr;
4743 -+ int nr_migrated = 0;
4744 -+ int nr_tries = min(rq->nr_running / 2, SCHED_RQ_NR_MIGRATION);
4745 -+
4746 -+ while (skip != rq->idle && nr_tries &&
4747 -+ (p = sched_rq_next_task(skip, rq)) != rq->idle) {
4748 -+ skip = sched_rq_next_task(p, rq);
4749 -+ if (cpumask_test_cpu(dest_cpu, p->cpus_ptr)) {
4750 -+ __SCHED_DEQUEUE_TASK(p, rq, 0, );
4751 -+ set_task_cpu(p, dest_cpu);
4752 -+ sched_task_sanity_check(p, dest_rq);
4753 -+ __SCHED_ENQUEUE_TASK(p, dest_rq, 0);
4754 -+ nr_migrated++;
4755 -+ }
4756 -+ nr_tries--;
4757 -+ }
4758 -+
4759 -+ return nr_migrated;
4760 -+}
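A quick worked example of the nr_tries cap in migrate_pending_tasks() (numbers illustrative):

    /*
     *   rq->nr_running = 10  -> nr_tries = min(5, 32)  = 5
     *   rq->nr_running = 100 -> nr_tries = min(50, 32) = 32
     * so at most SCHED_RQ_NR_MIGRATION (32) pending tasks are pulled
     * towards @dest_cpu per attempt, and only those whose cpus_ptr
     * allows @dest_cpu.
     */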
4761 -+
4762 -+static inline int take_other_rq_tasks(struct rq *rq, int cpu)
4763 -+{
4764 -+ struct cpumask *topo_mask, *end_mask;
4765 -+
4766 -+ if (unlikely(!rq->online))
4767 -+ return 0;
4768 -+
4769 -+ if (cpumask_empty(&sched_rq_pending_mask))
4770 -+ return 0;
4771 -+
4772 -+ topo_mask = per_cpu(sched_cpu_topo_masks, cpu) + 1;
4773 -+ end_mask = per_cpu(sched_cpu_topo_end_mask, cpu);
4774 -+ do {
4775 -+ int i;
4776 -+ for_each_cpu_and(i, &sched_rq_pending_mask, topo_mask) {
4777 -+ int nr_migrated;
4778 -+ struct rq *src_rq;
4779 -+
4780 -+ src_rq = cpu_rq(i);
4781 -+ if (!do_raw_spin_trylock(&src_rq->lock))
4782 -+ continue;
4783 -+ spin_acquire(&src_rq->lock.dep_map,
4784 -+ SINGLE_DEPTH_NESTING, 1, _RET_IP_);
4785 -+
4786 -+ if ((nr_migrated = migrate_pending_tasks(src_rq, rq, cpu))) {
4787 -+ src_rq->nr_running -= nr_migrated;
4788 -+ if (src_rq->nr_running < 2)
4789 -+ cpumask_clear_cpu(i, &sched_rq_pending_mask);
4790 -+
4791 -+ rq->nr_running += nr_migrated;
4792 -+ if (rq->nr_running > 1)
4793 -+ cpumask_set_cpu(cpu, &sched_rq_pending_mask);
4794 -+
4795 -+ update_sched_rq_watermark(rq);
4796 -+ cpufreq_update_util(rq, 0);
4797 -+
4798 -+ spin_release(&src_rq->lock.dep_map, _RET_IP_);
4799 -+ do_raw_spin_unlock(&src_rq->lock);
4800 -+
4801 -+ return 1;
4802 -+ }
4803 -+
4804 -+ spin_release(&src_rq->lock.dep_map, _RET_IP_);
4805 -+ do_raw_spin_unlock(&src_rq->lock);
4806 -+ }
4807 -+ } while (++topo_mask < end_mask);
4808 -+
4809 -+ return 0;
4810 -+}
4811 -+#endif
4812 -+
4813 -+/*
4814 -+ * Timeslices below RESCHED_NS are considered as good as expired as there's no
4815 -+ * point rescheduling when there's so little time left.
4816 -+ */
4817 -+static inline void check_curr(struct task_struct *p, struct rq *rq)
4818 -+{
4819 -+ if (unlikely(rq->idle == p))
4820 -+ return;
4821 -+
4822 -+ update_curr(rq, p);
4823 -+
4824 -+ if (p->time_slice < RESCHED_NS)
4825 -+ time_slice_expired(p, rq);
4826 -+}
4827 -+
4828 -+static inline struct task_struct *
4829 -+choose_next_task(struct rq *rq, int cpu, struct task_struct *prev)
4830 -+{
4831 -+ struct task_struct *next;
4832 -+
4833 -+ if (unlikely(rq->skip)) {
4834 -+ next = rq_runnable_task(rq);
4835 -+ if (next == rq->idle) {
4836 -+#ifdef CONFIG_SMP
4837 -+ if (!take_other_rq_tasks(rq, cpu)) {
4838 -+#endif
4839 -+ rq->skip = NULL;
4840 -+ schedstat_inc(rq->sched_goidle);
4841 -+ return next;
4842 -+#ifdef CONFIG_SMP
4843 -+ }
4844 -+ next = rq_runnable_task(rq);
4845 -+#endif
4846 -+ }
4847 -+ rq->skip = NULL;
4848 -+#ifdef CONFIG_HIGH_RES_TIMERS
4849 -+ hrtick_start(rq, next->time_slice);
4850 -+#endif
4851 -+ return next;
4852 -+ }
4853 -+
4854 -+ next = sched_rq_first_task(rq);
4855 -+ if (next == rq->idle) {
4856 -+#ifdef CONFIG_SMP
4857 -+ if (!take_other_rq_tasks(rq, cpu)) {
4858 -+#endif
4859 -+ schedstat_inc(rq->sched_goidle);
4860 -+ /*printk(KERN_INFO "sched: choose_next_task(%d) idle %px\n", cpu, next);*/
4861 -+ return next;
4862 -+#ifdef CONFIG_SMP
4863 -+ }
4864 -+ next = sched_rq_first_task(rq);
4865 -+#endif
4866 -+ }
4867 -+#ifdef CONFIG_HIGH_RES_TIMERS
4868 -+ hrtick_start(rq, next->time_slice);
4869 -+#endif
4870 -+ /*printk(KERN_INFO "sched: choose_next_task(%d) next %px\n", cpu,
4871 -+ * next);*/
4872 -+ return next;
4873 -+}
4874 -+
4875 -+/*
4876 -+ * schedule() is the main scheduler function.
4877 -+ *
4878 -+ * The main means of driving the scheduler and thus entering this function are:
4879 -+ *
4880 -+ * 1. Explicit blocking: mutex, semaphore, waitqueue, etc.
4881 -+ *
4882 -+ * 2. TIF_NEED_RESCHED flag is checked on interrupt and userspace return
4883 -+ * paths. For example, see arch/x86/entry_64.S.
4884 -+ *
4885 -+ * To drive preemption between tasks, the scheduler sets the flag in timer
4886 -+ * interrupt handler scheduler_tick().
4887 -+ *
4888 -+ * 3. Wakeups don't really cause entry into schedule(). They add a
4889 -+ * task to the run-queue and that's it.
4890 -+ *
4891 -+ * Now, if the new task added to the run-queue preempts the current
4892 -+ * task, then the wakeup sets TIF_NEED_RESCHED and schedule() gets
4893 -+ * called on the nearest possible occasion:
4894 -+ *
4895 -+ * - If the kernel is preemptible (CONFIG_PREEMPTION=y):
4896 -+ *
4897 -+ * - in syscall or exception context, at the next outermost
4898 -+ * preempt_enable(). (this might be as soon as the wake_up()'s
4899 -+ * spin_unlock()!)
4900 -+ *
4901 -+ * - in IRQ context, return from interrupt-handler to
4902 -+ * preemptible context
4903 -+ *
4904 -+ * - If the kernel is not preemptible (CONFIG_PREEMPTION is not set)
4905 -+ * then at the next:
4906 -+ *
4907 -+ * - cond_resched() call
4908 -+ * - explicit schedule() call
4909 -+ * - return from syscall or exception to user-space
4910 -+ * - return from interrupt-handler to user-space
4911 -+ *
4912 -+ * WARNING: must be called with preemption disabled!
4913 -+ */
4914 -+static void __sched notrace __schedule(bool preempt)
4915 -+{
4916 -+ struct task_struct *prev, *next;
4917 -+ unsigned long *switch_count;
4918 -+ unsigned long prev_state;
4919 -+ struct rq *rq;
4920 -+ int cpu;
4921 -+
4922 -+ cpu = smp_processor_id();
4923 -+ rq = cpu_rq(cpu);
4924 -+ prev = rq->curr;
4925 -+
4926 -+ schedule_debug(prev, preempt);
4927 -+
4928 -+ /* bypassing the sched_feat(HRTICK) check, which Alt schedule FW doesn't support */
4929 -+ hrtick_clear(rq);
4930 -+
4931 -+ local_irq_disable();
4932 -+ rcu_note_context_switch(preempt);
4933 -+
4934 -+ /*
4935 -+ * Make sure that signal_pending_state()->signal_pending() below
4936 -+ * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
4937 -+ * done by the caller to avoid the race with signal_wake_up():
4938 -+ *
4939 -+ * __set_current_state(@state) signal_wake_up()
4940 -+ * schedule() set_tsk_thread_flag(p, TIF_SIGPENDING)
4941 -+ * wake_up_state(p, state)
4942 -+ * LOCK rq->lock LOCK p->pi_state
4943 -+ * smp_mb__after_spinlock() smp_mb__after_spinlock()
4944 -+ * if (signal_pending_state()) if (p->state & @state)
4945 -+ *
4946 -+ * Also, the membarrier system call requires a full memory barrier
4947 -+ * after coming from user-space, before storing to rq->curr.
4948 -+ */
4949 -+ raw_spin_lock(&rq->lock);
4950 -+ smp_mb__after_spinlock();
4951 -+
4952 -+ update_rq_clock(rq);
4953 -+
4954 -+ switch_count = &prev->nivcsw;
4955 -+ /*
4956 -+ * We must load prev->state once (task_struct::state is volatile), such
4957 -+ * that:
4958 -+ *
4959 -+ * - we form a control dependency vs deactivate_task() below.
4960 -+ * - ptrace_{,un}freeze_traced() can change ->state underneath us.
4961 -+ */
4962 -+ prev_state = READ_ONCE(prev->__state);
4963 -+ if (!preempt && prev_state) {
4964 -+ if (signal_pending_state(prev_state, prev)) {
4965 -+ WRITE_ONCE(prev->__state, TASK_RUNNING);
4966 -+ } else {
4967 -+ prev->sched_contributes_to_load =
4968 -+ (prev_state & TASK_UNINTERRUPTIBLE) &&
4969 -+ !(prev_state & TASK_NOLOAD) &&
4970 -+ !(prev->flags & PF_FROZEN);
4971 -+
4972 -+ if (prev->sched_contributes_to_load)
4973 -+ rq->nr_uninterruptible++;
4974 -+
4975 -+ /*
4976 -+ * __schedule() ttwu()
4977 -+ * prev_state = prev->state; if (p->on_rq && ...)
4978 -+ * if (prev_state) goto out;
4979 -+ * p->on_rq = 0; smp_acquire__after_ctrl_dep();
4980 -+ * p->state = TASK_WAKING
4981 -+ *
4982 -+ * Where __schedule() and ttwu() have matching control dependencies.
4983 -+ *
4984 -+ * After this, schedule() must not care about p->state any more.
4985 -+ */
4986 -+ sched_task_deactivate(prev, rq);
4987 -+ deactivate_task(prev, rq);
4988 -+
4989 -+ if (prev->in_iowait) {
4990 -+ atomic_inc(&rq->nr_iowait);
4991 -+ delayacct_blkio_start();
4992 -+ }
4993 -+ }
4994 -+ switch_count = &prev->nvcsw;
4995 -+ }
4996 -+
4997 -+ check_curr(prev, rq);
4998 -+
4999 -+ next = choose_next_task(rq, cpu, prev);
5000 -+ clear_tsk_need_resched(prev);
5001 -+ clear_preempt_need_resched();
5002 -+#ifdef CONFIG_SCHED_DEBUG
5003 -+ rq->last_seen_need_resched_ns = 0;
5004 -+#endif
5005 -+
5006 -+ if (likely(prev != next)) {
5007 -+ next->last_ran = rq->clock_task;
5008 -+ rq->last_ts_switch = rq->clock;
5009 -+
5010 -+ rq->nr_switches++;
5011 -+ /*
5012 -+ * RCU users of rcu_dereference(rq->curr) may not see
5013 -+ * changes to task_struct made by pick_next_task().
5014 -+ */
5015 -+ RCU_INIT_POINTER(rq->curr, next);
5016 -+ /*
5017 -+ * The membarrier system call requires each architecture
5018 -+ * to have a full memory barrier after updating
5019 -+ * rq->curr, before returning to user-space.
5020 -+ *
5021 -+ * Here are the schemes providing that barrier on the
5022 -+ * various architectures:
5023 -+ * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
5024 -+ * switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
5025 -+ * - finish_lock_switch() for weakly-ordered
5026 -+ * architectures where spin_unlock is a full barrier,
5027 -+ * - switch_to() for arm64 (weakly-ordered, spin_unlock
5028 -+ * is a RELEASE barrier),
5029 -+ */
5030 -+ ++*switch_count;
5031 -+
5032 -+ psi_sched_switch(prev, next, !task_on_rq_queued(prev));
5033 -+
5034 -+ trace_sched_switch(preempt, prev, next);
5035 -+
5036 -+ /* Also unlocks the rq: */
5037 -+ rq = context_switch(rq, prev, next);
5038 -+ } else {
5039 -+ __balance_callbacks(rq);
5040 -+ raw_spin_unlock_irq(&rq->lock);
5041 -+ }
5042 -+
5043 -+#ifdef CONFIG_SCHED_SMT
5044 -+ sg_balance_check(rq);
5045 -+#endif
5046 -+}
5047 -+
5048 -+void __noreturn do_task_dead(void)
5049 -+{
5050 -+ /* Causes final put_task_struct in finish_task_switch(): */
5051 -+ set_special_state(TASK_DEAD);
5052 -+
5053 -+ /* Tell freezer to ignore us: */
5054 -+ current->flags |= PF_NOFREEZE;
5055 -+
5056 -+ __schedule(false);
5057 -+ BUG();
5058 -+
5059 -+ /* Avoid "noreturn function does return" - but don't continue if BUG() is a NOP: */
5060 -+ for (;;)
5061 -+ cpu_relax();
5062 -+}
5063 -+
5064 -+static inline void sched_submit_work(struct task_struct *tsk)
5065 -+{
5066 -+ unsigned int task_flags;
5067 -+
5068 -+ if (task_is_running(tsk))
5069 -+ return;
5070 -+
5071 -+ task_flags = tsk->flags;
5072 -+ /*
5073 -+ * If a worker went to sleep, notify and ask workqueue whether
5074 -+ * it wants to wake up a task to maintain concurrency.
5075 -+ * As this function is called inside the schedule() context,
5076 -+ * we disable preemption to avoid it calling schedule() again
5077 -+ * in the possible wakeup of a kworker and because wq_worker_sleeping()
5078 -+ * requires it.
5079 -+ */
5080 -+ if (task_flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
5081 -+ preempt_disable();
5082 -+ if (task_flags & PF_WQ_WORKER)
5083 -+ wq_worker_sleeping(tsk);
5084 -+ else
5085 -+ io_wq_worker_sleeping(tsk);
5086 -+ preempt_enable_no_resched();
5087 -+ }
5088 -+
5089 -+ if (tsk_is_pi_blocked(tsk))
5090 -+ return;
5091 -+
5092 -+ /*
5093 -+ * If we are going to sleep and we have plugged IO queued,
5094 -+ * make sure to submit it to avoid deadlocks.
5095 -+ */
5096 -+ if (blk_needs_flush_plug(tsk))
5097 -+ blk_schedule_flush_plug(tsk);
5098 -+}
5099 -+
5100 -+static void sched_update_worker(struct task_struct *tsk)
5101 -+{
5102 -+ if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
5103 -+ if (tsk->flags & PF_WQ_WORKER)
5104 -+ wq_worker_running(tsk);
5105 -+ else
5106 -+ io_wq_worker_running(tsk);
5107 -+ }
5108 -+}
5109 -+
5110 -+asmlinkage __visible void __sched schedule(void)
5111 -+{
5112 -+ struct task_struct *tsk = current;
5113 -+
5114 -+ sched_submit_work(tsk);
5115 -+ do {
5116 -+ preempt_disable();
5117 -+ __schedule(false);
5118 -+ sched_preempt_enable_no_resched();
5119 -+ } while (need_resched());
5120 -+ sched_update_worker(tsk);
5121 -+}
5122 -+EXPORT_SYMBOL(schedule);
5123 -+
5124 -+/*
5125 -+ * synchronize_rcu_tasks() makes sure that no task is stuck in preempted
5126 -+ * state (have scheduled out non-voluntarily) by making sure that all
5127 -+ * tasks have either left the run queue or have gone into user space.
5128 -+ * As idle tasks do not do either, they must not ever be preempted
5129 -+ * (schedule out non-voluntarily).
5130 -+ *
5131 -+ * schedule_idle() is similar to schedule_preempt_disabled() except that it
5132 -+ * never enables preemption because it does not call sched_submit_work().
5133 -+ */
5134 -+void __sched schedule_idle(void)
5135 -+{
5136 -+ /*
5137 -+ * As this skips calling sched_submit_work(), which the idle task does
5138 -+ * regardless because that function is a nop when the task is in a
5139 -+ * TASK_RUNNING state, make sure this isn't used someplace that the
5140 -+ * current task can be in any other state. Note, idle is always in the
5141 -+ * TASK_RUNNING state.
5142 -+ */
5143 -+ WARN_ON_ONCE(current->__state);
5144 -+ do {
5145 -+ __schedule(false);
5146 -+ } while (need_resched());
5147 -+}
5148 -+
5149 -+#if defined(CONFIG_CONTEXT_TRACKING) && !defined(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK)
5150 -+asmlinkage __visible void __sched schedule_user(void)
5151 -+{
5152 -+ /*
5153 -+ * If we come here after a random call to set_need_resched(),
5154 -+ * or we have been woken up remotely but the IPI has not yet arrived,
5155 -+ * we haven't yet exited the RCU idle mode. Do it here manually until
5156 -+ * we find a better solution.
5157 -+ *
5158 -+ * NB: There are buggy callers of this function. Ideally we
5159 -+ * should warn if prev_state != CONTEXT_USER, but that will trigger
5160 -+ * too frequently to make sense yet.
5161 -+ */
5162 -+ enum ctx_state prev_state = exception_enter();
5163 -+ schedule();
5164 -+ exception_exit(prev_state);
5165 -+}
5166 -+#endif
5167 -+
5168 -+/**
5169 -+ * schedule_preempt_disabled - called with preemption disabled
5170 -+ *
5171 -+ * Returns with preemption disabled. Note: preempt_count must be 1
5172 -+ */
5173 -+void __sched schedule_preempt_disabled(void)
5174 -+{
5175 -+ sched_preempt_enable_no_resched();
5176 -+ schedule();
5177 -+ preempt_disable();
5178 -+}
5179 -+
5180 -+static void __sched notrace preempt_schedule_common(void)
5181 -+{
5182 -+ do {
5183 -+ /*
5184 -+ * Because the function tracer can trace preempt_count_sub()
5185 -+ * and it also uses preempt_enable/disable_notrace(), if
5186 -+ * NEED_RESCHED is set, the preempt_enable_notrace() called
5187 -+ * by the function tracer will call this function again and
5188 -+ * cause infinite recursion.
5189 -+ *
5190 -+ * Preemption must be disabled here before the function
5191 -+ * tracer can trace. Break up preempt_disable() into two
5192 -+ * calls. One to disable preemption without fear of being
5193 -+ * traced. The other to still record the preemption latency,
5194 -+ * which can also be traced by the function tracer.
5195 -+ */
5196 -+ preempt_disable_notrace();
5197 -+ preempt_latency_start(1);
5198 -+ __schedule(true);
5199 -+ preempt_latency_stop(1);
5200 -+ preempt_enable_no_resched_notrace();
5201 -+
5202 -+ /*
5203 -+ * Check again in case we missed a preemption opportunity
5204 -+ * between schedule and now.
5205 -+ */
5206 -+ } while (need_resched());
5207 -+}
5208 -+
5209 -+#ifdef CONFIG_PREEMPTION
5210 -+/*
5211 -+ * This is the entry point to schedule() from in-kernel preemption
5212 -+ * off of preempt_enable.
5213 -+ */
5214 -+asmlinkage __visible void __sched notrace preempt_schedule(void)
5215 -+{
5216 -+ /*
5217 -+ * If there is a non-zero preempt_count or interrupts are disabled,
5218 -+ * we do not want to preempt the current task. Just return..
5219 -+ */
5220 -+ if (likely(!preemptible()))
5221 -+ return;
5222 -+
5223 -+ preempt_schedule_common();
5224 -+}
5225 -+NOKPROBE_SYMBOL(preempt_schedule);
5226 -+EXPORT_SYMBOL(preempt_schedule);
5227 -+
5228 -+#ifdef CONFIG_PREEMPT_DYNAMIC
5229 -+DEFINE_STATIC_CALL(preempt_schedule, __preempt_schedule_func);
5230 -+EXPORT_STATIC_CALL_TRAMP(preempt_schedule);
5231 -+#endif
5232 -+
5233 -+
5234 -+/**
5235 -+ * preempt_schedule_notrace - preempt_schedule called by tracing
5236 -+ *
5237 -+ * The tracing infrastructure uses preempt_enable_notrace to prevent
5238 -+ * recursion and tracing preempt enabling caused by the tracing
5239 -+ * infrastructure itself. But as tracing can happen in areas coming
5240 -+ * from userspace or just about to enter userspace, a preempt enable
5241 -+ * can occur before user_exit() is called. This will cause the scheduler
5242 -+ * to be called when the system is still in usermode.
5243 -+ *
5244 -+ * To prevent this, the preempt_enable_notrace will use this function
5245 -+ * instead of preempt_schedule() to exit user context if needed before
5246 -+ * calling the scheduler.
5247 -+ */
5248 -+asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
5249 -+{
5250 -+ enum ctx_state prev_ctx;
5251 -+
5252 -+ if (likely(!preemptible()))
5253 -+ return;
5254 -+
5255 -+ do {
5256 -+ /*
5257 -+ * Because the function tracer can trace preempt_count_sub()
5258 -+ * and it also uses preempt_enable/disable_notrace(), if
5259 -+ * NEED_RESCHED is set, the preempt_enable_notrace() called
5260 -+ * by the function tracer will call this function again and
5261 -+ * cause infinite recursion.
5262 -+ *
5263 -+ * Preemption must be disabled here before the function
5264 -+ * tracer can trace. Break up preempt_disable() into two
5265 -+ * calls. One to disable preemption without fear of being
5266 -+ * traced. The other to still record the preemption latency,
5267 -+ * which can also be traced by the function tracer.
5268 -+ */
5269 -+ preempt_disable_notrace();
5270 -+ preempt_latency_start(1);
5271 -+ /*
5272 -+ * Needs preempt disabled in case user_exit() is traced
5273 -+ * and the tracer calls preempt_enable_notrace() causing
5274 -+ * an infinite recursion.
5275 -+ */
5276 -+ prev_ctx = exception_enter();
5277 -+ __schedule(true);
5278 -+ exception_exit(prev_ctx);
5279 -+
5280 -+ preempt_latency_stop(1);
5281 -+ preempt_enable_no_resched_notrace();
5282 -+ } while (need_resched());
5283 -+}
5284 -+EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
5285 -+
5286 -+#ifdef CONFIG_PREEMPT_DYNAMIC
5287 -+DEFINE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func);
5288 -+EXPORT_STATIC_CALL_TRAMP(preempt_schedule_notrace);
5289 -+#endif
5290 -+
5291 -+#endif /* CONFIG_PREEMPTION */
5292 -+
5293 -+#ifdef CONFIG_PREEMPT_DYNAMIC
5294 -+
5295 -+#include <linux/entry-common.h>
5296 -+
5297 -+/*
5298 -+ * SC:cond_resched
5299 -+ * SC:might_resched
5300 -+ * SC:preempt_schedule
5301 -+ * SC:preempt_schedule_notrace
5302 -+ * SC:irqentry_exit_cond_resched
5303 -+ *
5304 -+ *
5305 -+ * NONE:
5306 -+ * cond_resched <- __cond_resched
5307 -+ * might_resched <- RET0
5308 -+ * preempt_schedule <- NOP
5309 -+ * preempt_schedule_notrace <- NOP
5310 -+ * irqentry_exit_cond_resched <- NOP
5311 -+ *
5312 -+ * VOLUNTARY:
5313 -+ * cond_resched <- __cond_resched
5314 -+ * might_resched <- __cond_resched
5315 -+ * preempt_schedule <- NOP
5316 -+ * preempt_schedule_notrace <- NOP
5317 -+ * irqentry_exit_cond_resched <- NOP
5318 -+ *
5319 -+ * FULL:
5320 -+ * cond_resched <- RET0
5321 -+ * might_resched <- RET0
5322 -+ * preempt_schedule <- preempt_schedule
5323 -+ * preempt_schedule_notrace <- preempt_schedule_notrace
5324 -+ * irqentry_exit_cond_resched <- irqentry_exit_cond_resched
5325 -+ */
5326 -+
5327 -+enum {
5328 -+ preempt_dynamic_none = 0,
5329 -+ preempt_dynamic_voluntary,
5330 -+ preempt_dynamic_full,
5331 -+};
5332 -+
5333 -+int preempt_dynamic_mode = preempt_dynamic_full;
5334 -+
5335 -+int sched_dynamic_mode(const char *str)
5336 -+{
5337 -+ if (!strcmp(str, "none"))
5338 -+ return preempt_dynamic_none;
5339 -+
5340 -+ if (!strcmp(str, "voluntary"))
5341 -+ return preempt_dynamic_voluntary;
5342 -+
5343 -+ if (!strcmp(str, "full"))
5344 -+ return preempt_dynamic_full;
5345 -+
5346 -+ return -EINVAL;
5347 -+}
5348 -+
5349 -+void sched_dynamic_update(int mode)
5350 -+{
5351 -+ /*
5352 -+ * Avoid {NONE,VOLUNTARY} -> FULL transitions from ever ending up in
5353 -+ * the ZERO state, which is invalid.
5354 -+ */
5355 -+ static_call_update(cond_resched, __cond_resched);
5356 -+ static_call_update(might_resched, __cond_resched);
5357 -+ static_call_update(preempt_schedule, __preempt_schedule_func);
5358 -+ static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
5359 -+ static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
5360 -+
5361 -+ switch (mode) {
5362 -+ case preempt_dynamic_none:
5363 -+ static_call_update(cond_resched, __cond_resched);
5364 -+ static_call_update(might_resched, (void *)&__static_call_return0);
5365 -+ static_call_update(preempt_schedule, NULL);
5366 -+ static_call_update(preempt_schedule_notrace, NULL);
5367 -+ static_call_update(irqentry_exit_cond_resched, NULL);
5368 -+ pr_info("Dynamic Preempt: none\n");
5369 -+ break;
5370 -+
5371 -+ case preempt_dynamic_voluntary:
5372 -+ static_call_update(cond_resched, __cond_resched);
5373 -+ static_call_update(might_resched, __cond_resched);
5374 -+ static_call_update(preempt_schedule, NULL);
5375 -+ static_call_update(preempt_schedule_notrace, NULL);
5376 -+ static_call_update(irqentry_exit_cond_resched, NULL);
5377 -+ pr_info("Dynamic Preempt: voluntary\n");
5378 -+ break;
5379 -+
5380 -+ case preempt_dynamic_full:
5381 -+ static_call_update(cond_resched, (void *)&__static_call_return0);
5382 -+ static_call_update(might_resched, (void *)&__static_call_return0);
5383 -+ static_call_update(preempt_schedule, __preempt_schedule_func);
5384 -+ static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
5385 -+ static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
5386 -+ pr_info("Dynamic Preempt: full\n");
5387 -+ break;
5388 -+ }
5389 -+
5390 -+ preempt_dynamic_mode = mode;
5391 -+}
5392 -+
5393 -+static int __init setup_preempt_mode(char *str)
5394 -+{
5395 -+ int mode = sched_dynamic_mode(str);
5396 -+ if (mode < 0) {
5397 -+ pr_warn("Dynamic Preempt: unsupported mode: %s\n", str);
5398 -+ return 1;
5399 -+ }
5400 -+
5401 -+ sched_dynamic_update(mode);
5402 -+ return 0;
5403 -+}
5404 -+__setup("preempt=", setup_preempt_mode);
5405 -+
5406 -+#endif /* CONFIG_PREEMPT_DYNAMIC */
5407 -+
5408 -+/*
5409 -+ * This is the entry point to schedule() from kernel preemption
5410 -+ * off of irq context.
5411 -+ * Note that this is called and returns with irqs disabled. This
5412 -+ * protects us against recursive calls from irq context.
5413 -+ */
5414 -+asmlinkage __visible void __sched preempt_schedule_irq(void)
5415 -+{
5416 -+ enum ctx_state prev_state;
5417 -+
5418 -+ /* Catch callers which need to be fixed */
5419 -+ BUG_ON(preempt_count() || !irqs_disabled());
5420 -+
5421 -+ prev_state = exception_enter();
5422 -+
5423 -+ do {
5424 -+ preempt_disable();
5425 -+ local_irq_enable();
5426 -+ __schedule(true);
5427 -+ local_irq_disable();
5428 -+ sched_preempt_enable_no_resched();
5429 -+ } while (need_resched());
5430 -+
5431 -+ exception_exit(prev_state);
5432 -+}
5433 -+
5434 -+int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
5435 -+ void *key)
5436 -+{
5437 -+ WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
5438 -+ return try_to_wake_up(curr->private, mode, wake_flags);
5439 -+}
5440 -+EXPORT_SYMBOL(default_wake_function);
5441 -+
5442 -+static inline void check_task_changed(struct task_struct *p, struct rq *rq)
5443 -+{
5444 -+ /* Trigger resched if task sched_prio has been modified. */
5445 -+ if (task_on_rq_queued(p) && task_sched_prio_idx(p, rq) != p->sq_idx) {
5446 -+ requeue_task(p, rq);
5447 -+ check_preempt_curr(rq);
5448 -+ }
5449 -+}
5450 -+
5451 -+static void __setscheduler_prio(struct task_struct *p, int prio)
5452 -+{
5453 -+ p->prio = prio;
5454 -+}
5455 -+
5456 -+#ifdef CONFIG_RT_MUTEXES
5457 -+
5458 -+static inline int __rt_effective_prio(struct task_struct *pi_task, int prio)
5459 -+{
5460 -+ if (pi_task)
5461 -+ prio = min(prio, pi_task->prio);
5462 -+
5463 -+ return prio;
5464 -+}
5465 -+
5466 -+static inline int rt_effective_prio(struct task_struct *p, int prio)
5467 -+{
5468 -+ struct task_struct *pi_task = rt_mutex_get_top_task(p);
5469 -+
5470 -+ return __rt_effective_prio(pi_task, prio);
5471 -+}
5472 -+
5473 -+/*
5474 -+ * rt_mutex_setprio - set the current priority of a task
5475 -+ * @p: task to boost
5476 -+ * @pi_task: donor task
5477 -+ *
5478 -+ * This function changes the 'effective' priority of a task. It does
5479 -+ * not touch ->normal_prio like __setscheduler().
5480 -+ *
5481 -+ * Used by the rt_mutex code to implement priority inheritance
5482 -+ * logic. Call site only calls if the priority of the task changed.
5483 -+ */
5484 -+void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
5485 -+{
5486 -+ int prio;
5487 -+ struct rq *rq;
5488 -+ raw_spinlock_t *lock;
5489 -+
5490 -+ /* XXX used to be waiter->prio, not waiter->task->prio */
5491 -+ prio = __rt_effective_prio(pi_task, p->normal_prio);
5492 -+
5493 -+ /*
5494 -+ * If nothing changed; bail early.
5495 -+ */
5496 -+ if (p->pi_top_task == pi_task && prio == p->prio)
5497 -+ return;
5498 -+
5499 -+ rq = __task_access_lock(p, &lock);
5500 -+ /*
5501 -+ * Set under pi_lock && rq->lock, such that the value can be used under
5502 -+ * either lock.
5503 -+ *
5504 -+ * Note that there are loads of tricks needed to make this pointer cache work
5505 -+ * right. rt_mutex_slowunlock()+rt_mutex_postunlock() work together to
5506 -+ * ensure a task is de-boosted (pi_task is set to NULL) before the
5507 -+ * task is allowed to run again (and can exit). This ensures the pointer
5508 -+ * points to a blocked task -- which guarantees the task is present.
5509 -+ */
5510 -+ p->pi_top_task = pi_task;
5511 -+
5512 -+ /*
5513 -+ * For FIFO/RR we only need to set prio, if that matches we're done.
5514 -+ */
5515 -+ if (prio == p->prio)
5516 -+ goto out_unlock;
5517 -+
5518 -+ /*
5519 -+ * Idle task boosting is a no-no in general. There is one
5520 -+ * exception, when PREEMPT_RT and NOHZ is active:
5521 -+ *
5522 -+ * The idle task calls get_next_timer_interrupt() and holds
5523 -+ * the timer wheel base->lock on the CPU and another CPU wants
5524 -+ * to access the timer (probably to cancel it). We can safely
5525 -+ * ignore the boosting request, as the idle CPU runs this code
5526 -+ * with interrupts disabled and will complete the lock
5527 -+ * protected section without being interrupted. So there is no
5528 -+ * real need to boost.
5529 -+ */
5530 -+ if (unlikely(p == rq->idle)) {
5531 -+ WARN_ON(p != rq->curr);
5532 -+ WARN_ON(p->pi_blocked_on);
5533 -+ goto out_unlock;
5534 -+ }
5535 -+
5536 -+ trace_sched_pi_setprio(p, pi_task);
5537 -+
5538 -+ __setscheduler_prio(p, prio);
5539 -+
5540 -+ check_task_changed(p, rq);
5541 -+out_unlock:
5542 -+ /* Avoid rq from going away on us: */
5543 -+ preempt_disable();
5544 -+
5545 -+ __balance_callbacks(rq);
5546 -+ __task_access_unlock(p, lock);
5547 -+
5548 -+ preempt_enable();
5549 -+}
5550 -+#else
5551 -+static inline int rt_effective_prio(struct task_struct *p, int prio)
5552 -+{
5553 -+ return prio;
5554 -+}
5555 -+#endif
5556 -+
5557 -+void set_user_nice(struct task_struct *p, long nice)
5558 -+{
5559 -+ unsigned long flags;
5560 -+ struct rq *rq;
5561 -+ raw_spinlock_t *lock;
5562 -+
5563 -+ if (task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE)
5564 -+ return;
5565 -+ /*
5566 -+ * We have to be careful, if called from sys_setpriority(),
5567 -+ * the task might be in the middle of scheduling on another CPU.
5568 -+ */
5569 -+ raw_spin_lock_irqsave(&p->pi_lock, flags);
5570 -+ rq = __task_access_lock(p, &lock);
5571 -+
5572 -+ p->static_prio = NICE_TO_PRIO(nice);
5573 -+ /*
5574 -+ * The RT priorities are set via sched_setscheduler(), but we still
5575 -+ * allow the 'normal' nice value to be set - but as expected
5576 -+ * it won't have any effect on scheduling until the task returns
5577 -+ * to SCHED_NORMAL/SCHED_BATCH:
5578 -+ */
5579 -+ if (task_has_rt_policy(p))
5580 -+ goto out_unlock;
5581 -+
5582 -+ p->prio = effective_prio(p);
5583 -+
5584 -+ check_task_changed(p, rq);
5585 -+out_unlock:
5586 -+ __task_access_unlock(p, lock);
5587 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
5588 -+}
5589 -+EXPORT_SYMBOL(set_user_nice);
5590 -+
5591 -+/*
5592 -+ * can_nice - check if a task can reduce its nice value
5593 -+ * @p: task
5594 -+ * @nice: nice value
5595 -+ */
5596 -+int can_nice(const struct task_struct *p, const int nice)
5597 -+{
5598 -+ /* Convert nice value [19,-20] to rlimit style value [1,40] */
5599 -+ int nice_rlim = nice_to_rlimit(nice);
5600 -+
5601 -+ return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
5602 -+ capable(CAP_SYS_NICE));
5603 -+}
5604 -+
5605 -+#ifdef __ARCH_WANT_SYS_NICE
5606 -+
5607 -+/*
5608 -+ * sys_nice - change the priority of the current process.
5609 -+ * @increment: priority increment
5610 -+ *
5611 -+ * sys_setpriority is a more generic, but much slower function that
5612 -+ * does similar things.
5613 -+ */
5614 -+SYSCALL_DEFINE1(nice, int, increment)
5615 -+{
5616 -+ long nice, retval;
5617 -+
5618 -+ /*
5619 -+ * Setpriority might change our priority at the same moment.
5620 -+ * We don't have to worry. Conceptually one call occurs first
5621 -+ * and we have a single winner.
5622 -+ */
5623 -+
5624 -+ increment = clamp(increment, -NICE_WIDTH, NICE_WIDTH);
5625 -+ nice = task_nice(current) + increment;
5626 -+
5627 -+ nice = clamp_val(nice, MIN_NICE, MAX_NICE);
5628 -+ if (increment < 0 && !can_nice(current, nice))
5629 -+ return -EPERM;
5630 -+
5631 -+ retval = security_task_setnice(current, nice);
5632 -+ if (retval)
5633 -+ return retval;
5634 -+
5635 -+ set_user_nice(current, nice);
5636 -+ return 0;
5637 -+}
5638 -+
5639 -+#endif
5640 -+
5641 -+/**
5642 -+ * task_prio - return the priority value of a given task.
5643 -+ * @p: the task in question.
5644 -+ *
5645 -+ * Return: The priority value as seen by users in /proc.
5646 -+ *
5647 -+ * sched policy               return value   kernel prio    user prio/nice
5648 -+ *
5649 -+ * (BMQ)normal, batch, idle   [0 ... 53]     [100 ... 139]  0/[-20 ... 19]/[-7 ... 7]
5650 -+ * (PDS)normal, batch, idle   [0 ... 39]     100            0/[-20 ... 19]
5651 -+ * fifo, rr                   [-1 ... -100]  [99 ... 0]     [0 ... 99]
5652 -+ */
5653 -+int task_prio(const struct task_struct *p)
5654 -+{
5655 -+ return (p->prio < MAX_RT_PRIO) ? p->prio - MAX_RT_PRIO :
5656 -+ task_sched_prio_normal(p, task_rq(p));
5657 -+}
5658 -+
5659 -+/**
5660 -+ * idle_cpu - is a given CPU idle currently?
5661 -+ * @cpu: the processor in question.
5662 -+ *
5663 -+ * Return: 1 if the CPU is currently idle. 0 otherwise.
5664 -+ */
5665 -+int idle_cpu(int cpu)
5666 -+{
5667 -+ struct rq *rq = cpu_rq(cpu);
5668 -+
5669 -+ if (rq->curr != rq->idle)
5670 -+ return 0;
5671 -+
5672 -+ if (rq->nr_running)
5673 -+ return 0;
5674 -+
5675 -+#ifdef CONFIG_SMP
5676 -+ if (rq->ttwu_pending)
5677 -+ return 0;
5678 -+#endif
5679 -+
5680 -+ return 1;
5681 -+}
5682 -+
5683 -+/**
5684 -+ * idle_task - return the idle task for a given CPU.
5685 -+ * @cpu: the processor in question.
5686 -+ *
5687 -+ * Return: The idle task for the cpu @cpu.
5688 -+ */
5689 -+struct task_struct *idle_task(int cpu)
5690 -+{
5691 -+ return cpu_rq(cpu)->idle;
5692 -+}
5693 -+
5694 -+/**
5695 -+ * find_process_by_pid - find a process with a matching PID value.
5696 -+ * @pid: the pid in question.
5697 -+ *
5698 -+ * The task of @pid, if found. %NULL otherwise.
5699 -+ */
5700 -+static inline struct task_struct *find_process_by_pid(pid_t pid)
5701 -+{
5702 -+ return pid ? find_task_by_vpid(pid) : current;
5703 -+}
5704 -+
5705 -+/*
5706 -+ * sched_setparam() passes in -1 for its policy, to let the functions
5707 -+ * it calls know not to change it.
5708 -+ */
5709 -+#define SETPARAM_POLICY -1
5710 -+
5711 -+static void __setscheduler_params(struct task_struct *p,
5712 -+ const struct sched_attr *attr)
5713 -+{
5714 -+ int policy = attr->sched_policy;
5715 -+
5716 -+ if (policy == SETPARAM_POLICY)
5717 -+ policy = p->policy;
5718 -+
5719 -+ p->policy = policy;
5720 -+
5721 -+ /*
5722 -+ * Allow the normal nice value to be set, but it will not have any
5723 -+ * effect on scheduling until the task runs as SCHED_NORMAL/
5724 -+ * SCHED_BATCH.
5725 -+ */
5726 -+ p->static_prio = NICE_TO_PRIO(attr->sched_nice);
5727 -+
5728 -+ /*
5729 -+ * __sched_setscheduler() ensures attr->sched_priority == 0 when
5730 -+ * !rt_policy. Always setting this ensures that things like
5731 -+ * getparam()/getattr() don't report silly values for !rt tasks.
5732 -+ */
5733 -+ p->rt_priority = attr->sched_priority;
5734 -+ p->normal_prio = normal_prio(p);
5735 -+}
5736 -+
5737 -+/*
5738 -+ * check the target process has a UID that matches the current process's
5739 -+ */
5740 -+static bool check_same_owner(struct task_struct *p)
5741 -+{
5742 -+ const struct cred *cred = current_cred(), *pcred;
5743 -+ bool match;
5744 -+
5745 -+ rcu_read_lock();
5746 -+ pcred = __task_cred(p);
5747 -+ match = (uid_eq(cred->euid, pcred->euid) ||
5748 -+ uid_eq(cred->euid, pcred->uid));
5749 -+ rcu_read_unlock();
5750 -+ return match;
5751 -+}
5752 -+
5753 -+static int __sched_setscheduler(struct task_struct *p,
5754 -+ const struct sched_attr *attr,
5755 -+ bool user, bool pi)
5756 -+{
5757 -+ const struct sched_attr dl_squash_attr = {
5758 -+ .size = sizeof(struct sched_attr),
5759 -+ .sched_policy = SCHED_FIFO,
5760 -+ .sched_nice = 0,
5761 -+ .sched_priority = 99,
5762 -+ };
5763 -+ int oldpolicy = -1, policy = attr->sched_policy;
5764 -+ int retval, newprio;
5765 -+ struct callback_head *head;
5766 -+ unsigned long flags;
5767 -+ struct rq *rq;
5768 -+ int reset_on_fork;
5769 -+ raw_spinlock_t *lock;
5770 -+
5771 -+ /* The pi code expects interrupts enabled */
5772 -+ BUG_ON(pi && in_interrupt());
5773 -+
5774 -+ /*
5775 -+ * Alt schedule FW supports SCHED_DEADLINE by squashing it into prio-0 SCHED_FIFO
5776 -+ */
5777 -+ if (unlikely(SCHED_DEADLINE == policy)) {
5778 -+ attr = &dl_squash_attr;
5779 -+ policy = attr->sched_policy;
5780 -+ }
5781 -+recheck:
5782 -+ /* Double check policy once rq lock held */
5783 -+ if (policy < 0) {
5784 -+ reset_on_fork = p->sched_reset_on_fork;
5785 -+ policy = oldpolicy = p->policy;
5786 -+ } else {
5787 -+ reset_on_fork = !!(attr->sched_flags & SCHED_RESET_ON_FORK);
5788 -+
5789 -+ if (policy > SCHED_IDLE)
5790 -+ return -EINVAL;
5791 -+ }
5792 -+
5793 -+ if (attr->sched_flags & ~(SCHED_FLAG_ALL))
5794 -+ return -EINVAL;
5795 -+
5796 -+ /*
5797 -+ * Valid priorities for SCHED_FIFO and SCHED_RR are
5798 -+ * 1..MAX_RT_PRIO-1, valid priority for SCHED_NORMAL and
5799 -+ * SCHED_BATCH and SCHED_IDLE is 0.
5800 -+ */
5801 -+ if (attr->sched_priority < 0 ||
5802 -+ (p->mm && attr->sched_priority > MAX_RT_PRIO - 1) ||
5803 -+ (!p->mm && attr->sched_priority > MAX_RT_PRIO - 1))
5804 -+ return -EINVAL;
5805 -+ if ((SCHED_RR == policy || SCHED_FIFO == policy) !=
5806 -+ (attr->sched_priority != 0))
5807 -+ return -EINVAL;
5808 -+
5809 -+ /*
5810 -+ * Allow unprivileged RT tasks to decrease priority:
5811 -+ */
5812 -+ if (user && !capable(CAP_SYS_NICE)) {
5813 -+ if (SCHED_FIFO == policy || SCHED_RR == policy) {
5814 -+ unsigned long rlim_rtprio =
5815 -+ task_rlimit(p, RLIMIT_RTPRIO);
5816 -+
5817 -+ /* Can't set/change the rt policy */
5818 -+ if (policy != p->policy && !rlim_rtprio)
5819 -+ return -EPERM;
5820 -+
5821 -+ /* Can't increase priority */
5822 -+ if (attr->sched_priority > p->rt_priority &&
5823 -+ attr->sched_priority > rlim_rtprio)
5824 -+ return -EPERM;
5825 -+ }
5826 -+
5827 -+ /* Can't change other user's priorities */
5828 -+ if (!check_same_owner(p))
5829 -+ return -EPERM;
5830 -+
5831 -+ /* Normal users shall not reset the sched_reset_on_fork flag */
5832 -+ if (p->sched_reset_on_fork && !reset_on_fork)
5833 -+ return -EPERM;
5834 -+ }
5835 -+
5836 -+ if (user) {
5837 -+ retval = security_task_setscheduler(p);
5838 -+ if (retval)
5839 -+ return retval;
5840 -+ }
5841 -+
5842 -+ if (pi)
5843 -+ cpuset_read_lock();
5844 -+
5845 -+ /*
5846 -+ * Make sure no PI-waiters arrive (or leave) while we are
5847 -+ * changing the priority of the task:
5848 -+ */
5849 -+ raw_spin_lock_irqsave(&p->pi_lock, flags);
5850 -+
5851 -+ /*
5852 -+ * To be able to change p->policy safely, task_access_lock()
5853 -+ * must be called.
5854 -+ * If task_access_lock() is used here:
5855 -+ * for a task p which is not running, reading rq->stop is
5856 -+ * racy but acceptable, as ->stop doesn't change much.
5857 -+ * An enhancement could be made to read rq->stop safely.
5858 -+ */
5859 -+ rq = __task_access_lock(p, &lock);
5860 -+
5861 -+ /*
5862 -+ * Changing the policy of the stop threads is a very bad idea
5863 -+ */
5864 -+ if (p == rq->stop) {
5865 -+ retval = -EINVAL;
5866 -+ goto unlock;
5867 -+ }
5868 -+
5869 -+ /*
5870 -+ * If not changing anything there's no need to proceed further:
5871 -+ */
5872 -+ if (unlikely(policy == p->policy)) {
5873 -+ if (rt_policy(policy) && attr->sched_priority != p->rt_priority)
5874 -+ goto change;
5875 -+ if (!rt_policy(policy) &&
5876 -+ NICE_TO_PRIO(attr->sched_nice) != p->static_prio)
5877 -+ goto change;
5878 -+
5879 -+ p->sched_reset_on_fork = reset_on_fork;
5880 -+ retval = 0;
5881 -+ goto unlock;
5882 -+ }
5883 -+change:
5884 -+
5885 -+ /* Re-check policy now with rq lock held */
5886 -+ if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
5887 -+ policy = oldpolicy = -1;
5888 -+ __task_access_unlock(p, lock);
5889 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
5890 -+ if (pi)
5891 -+ cpuset_read_unlock();
5892 -+ goto recheck;
5893 -+ }
5894 -+
5895 -+ p->sched_reset_on_fork = reset_on_fork;
5896 -+
5897 -+ newprio = __normal_prio(policy, attr->sched_priority, NICE_TO_PRIO(attr->sched_nice));
5898 -+ if (pi) {
5899 -+ /*
5900 -+ * Take priority boosted tasks into account. If the new
5901 -+ * effective priority is unchanged, we just store the new
5902 -+ * normal parameters and do not touch the scheduler class and
5903 -+ * the runqueue. This will be done when the task deboosts
5904 -+ * itself.
5905 -+ */
5906 -+ newprio = rt_effective_prio(p, newprio);
5907 -+ }
5908 -+
5909 -+ if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {
5910 -+ __setscheduler_params(p, attr);
5911 -+ __setscheduler_prio(p, newprio);
5912 -+ }
5913 -+
5914 -+ check_task_changed(p, rq);
5915 -+
5916 -+ /* Avoid rq from going away on us: */
5917 -+ preempt_disable();
5918 -+ head = splice_balance_callbacks(rq);
5919 -+ __task_access_unlock(p, lock);
5920 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
5921 -+
5922 -+ if (pi) {
5923 -+ cpuset_read_unlock();
5924 -+ rt_mutex_adjust_pi(p);
5925 -+ }
5926 -+
5927 -+ /* Run balance callbacks after we've adjusted the PI chain: */
5928 -+ balance_callbacks(rq, head);
5929 -+ preempt_enable();
5930 -+
5931 -+ return 0;
5932 -+
5933 -+unlock:
5934 -+ __task_access_unlock(p, lock);
5935 -+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
5936 -+ if (pi)
5937 -+ cpuset_read_unlock();
5938 -+ return retval;
5939 -+}
5940 -+
5941 -+static int _sched_setscheduler(struct task_struct *p, int policy,
5942 -+ const struct sched_param *param, bool check)
5943 -+{
5944 -+ struct sched_attr attr = {
5945 -+ .sched_policy = policy,
5946 -+ .sched_priority = param->sched_priority,
5947 -+ .sched_nice = PRIO_TO_NICE(p->static_prio),
5948 -+ };
5949 -+
5950 -+ /* Fixup the legacy SCHED_RESET_ON_FORK hack. */
5951 -+ if ((policy != SETPARAM_POLICY) && (policy & SCHED_RESET_ON_FORK)) {
5952 -+ attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
5953 -+ policy &= ~SCHED_RESET_ON_FORK;
5954 -+ attr.sched_policy = policy;
5955 -+ }
5956 -+
5957 -+ return __sched_setscheduler(p, &attr, check, true);
5958 -+}
5959 -+
5960 -+/**
5961 -+ * sched_setscheduler - change the scheduling policy and/or RT priority of a thread.
5962 -+ * @p: the task in question.
5963 -+ * @policy: new policy.
5964 -+ * @param: structure containing the new RT priority.
5965 -+ *
5966 -+ * Use sched_set_fifo(), read its comment.
5967 -+ *
5968 -+ * Return: 0 on success. An error code otherwise.
5969 -+ *
5970 -+ * NOTE that the task may be already dead.
5971 -+ */
5972 -+int sched_setscheduler(struct task_struct *p, int policy,
5973 -+ const struct sched_param *param)
5974 -+{
5975 -+ return _sched_setscheduler(p, policy, param, true);
5976 -+}
5977 -+
5978 -+int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
5979 -+{
5980 -+ return __sched_setscheduler(p, attr, true, true);
5981 -+}
5982 -+
5983 -+int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
5984 -+{
5985 -+ return __sched_setscheduler(p, attr, false, true);
5986 -+}
5987 -+EXPORT_SYMBOL_GPL(sched_setattr_nocheck);
5988 -+
5989 -+/**
5990 -+ * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
5991 -+ * @p: the task in question.
5992 -+ * @policy: new policy.
5993 -+ * @param: structure containing the new RT priority.
5994 -+ *
5995 -+ * Just like sched_setscheduler, only don't bother checking if the
5996 -+ * current context has permission. For example, this is needed in
5997 -+ * stop_machine(): we create temporary high priority worker threads,
5998 -+ * but our caller might not have that capability.
5999 -+ *
6000 -+ * Return: 0 on success. An error code otherwise.
6001 -+ */
6002 -+int sched_setscheduler_nocheck(struct task_struct *p, int policy,
6003 -+ const struct sched_param *param)
6004 -+{
6005 -+ return _sched_setscheduler(p, policy, param, false);
6006 -+}
6007 -+
6008 -+/*
6009 -+ * SCHED_FIFO is a broken scheduler model; that is, it is fundamentally
6010 -+ * incapable of resource management, which is the one thing an OS really should
6011 -+ * be doing.
6012 -+ *
6013 -+ * This is of course the reason it is limited to privileged users only.
6014 -+ *
6015 -+ * Worse still; it is fundamentally impossible to compose static priority
6016 -+ * workloads. You cannot take two correctly working static prio workloads
6017 -+ * and smash them together and still expect them to work.
6018 -+ *
6019 -+ * For this reason 'all' FIFO tasks the kernel creates are basically at:
6020 -+ *
6021 -+ * MAX_RT_PRIO / 2
6022 -+ *
6023 -+ * The administrator _MUST_ configure the system, the kernel simply doesn't
6024 -+ * know enough information to make a sensible choice.
6025 -+ */
6026 -+void sched_set_fifo(struct task_struct *p)
6027 -+{
6028 -+ struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 };
6029 -+ WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
6030 -+}
6031 -+EXPORT_SYMBOL_GPL(sched_set_fifo);
6032 -+
6033 -+/*
6034 -+ * For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
6035 -+ */
6036 -+void sched_set_fifo_low(struct task_struct *p)
6037 -+{
6038 -+ struct sched_param sp = { .sched_priority = 1 };
6039 -+ WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
6040 -+}
6041 -+EXPORT_SYMBOL_GPL(sched_set_fifo_low);
6042 -+
6043 -+void sched_set_normal(struct task_struct *p, int nice)
6044 -+{
6045 -+ struct sched_attr attr = {
6046 -+ .sched_policy = SCHED_NORMAL,
6047 -+ .sched_nice = nice,
6048 -+ };
6049 -+ WARN_ON_ONCE(sched_setattr_nocheck(p, &attr) != 0);
6050 -+}
6051 -+EXPORT_SYMBOL_GPL(sched_set_normal);
6052 -+
6053 -+static int
6054 -+do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
6055 -+{
6056 -+ struct sched_param lparam;
6057 -+ struct task_struct *p;
6058 -+ int retval;
6059 -+
6060 -+ if (!param || pid < 0)
6061 -+ return -EINVAL;
6062 -+ if (copy_from_user(&lparam, param, sizeof(struct sched_param)))
6063 -+ return -EFAULT;
6064 -+
6065 -+ rcu_read_lock();
6066 -+ retval = -ESRCH;
6067 -+ p = find_process_by_pid(pid);
6068 -+ if (likely(p))
6069 -+ get_task_struct(p);
6070 -+ rcu_read_unlock();
6071 -+
6072 -+ if (likely(p)) {
6073 -+ retval = sched_setscheduler(p, policy, &lparam);
6074 -+ put_task_struct(p);
6075 -+ }
6076 -+
6077 -+ return retval;
6078 -+}
6079 -+
6080 -+/*
6081 -+ * Mimics kernel/events/core.c perf_copy_attr().
6082 -+ */
6083 -+static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *attr)
6084 -+{
6085 -+ u32 size;
6086 -+ int ret;
6087 -+
6088 -+ /* Zero the full structure, so that a short copy will be nice: */
6089 -+ memset(attr, 0, sizeof(*attr));
6090 -+
6091 -+ ret = get_user(size, &uattr->size);
6092 -+ if (ret)
6093 -+ return ret;
6094 -+
6095 -+ /* ABI compatibility quirk: */
6096 -+ if (!size)
6097 -+ size = SCHED_ATTR_SIZE_VER0;
6098 -+
6099 -+ if (size < SCHED_ATTR_SIZE_VER0 || size > PAGE_SIZE)
6100 -+ goto err_size;
6101 -+
6102 -+ ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
6103 -+ if (ret) {
6104 -+ if (ret == -E2BIG)
6105 -+ goto err_size;
6106 -+ return ret;
6107 -+ }
6108 -+
6109 -+ /*
6110 -+ * XXX: Do we want to be lenient like existing syscalls; or do we want
6111 -+ * to be strict and return an error on out-of-bounds values?
6112 -+ */
6113 -+ attr->sched_nice = clamp(attr->sched_nice, -20, 19);
6114 -+
6115 -+ /* sched/core.c uses zero here but we already know ret is zero */
6116 -+ return 0;
6117 -+
6118 -+err_size:
6119 -+ put_user(sizeof(*attr), &uattr->size);
6120 -+ return -E2BIG;
6121 -+}
6122 -+
6123 -+/**
6124 -+ * sys_sched_setscheduler - set/change the scheduler policy and RT priority
6125 -+ * @pid: the pid in question.
6126 -+ * @policy: new policy.
6127 -+ * @param: structure containing the new RT priority.
6128 -+ *
6129 -+ * Return: 0 on success. An error code otherwise.
6130 -+ */
6131 -+SYSCALL_DEFINE3(sched_setscheduler, pid_t, pid, int, policy, struct sched_param __user *, param)
6132 -+{
6133 -+ if (policy < 0)
6134 -+ return -EINVAL;
6135 -+
6136 -+ return do_sched_setscheduler(pid, policy, param);
6137 -+}
6138 -+
6139 -+/**
6140 -+ * sys_sched_setparam - set/change the RT priority of a thread
6141 -+ * @pid: the pid in question.
6142 -+ * @param: structure containing the new RT priority.
6143 -+ *
6144 -+ * Return: 0 on success. An error code otherwise.
6145 -+ */
6146 -+SYSCALL_DEFINE2(sched_setparam, pid_t, pid, struct sched_param __user *, param)
6147 -+{
6148 -+ return do_sched_setscheduler(pid, SETPARAM_POLICY, param);
6149 -+}
6150 -+
6151 -+/**
6152 -+ * sys_sched_setattr - same as above, but with extended sched_attr
6153 -+ * @pid: the pid in question.
6154 -+ * @uattr: structure containing the extended parameters.
6155 -+ */
6156 -+SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
6157 -+ unsigned int, flags)
6158 -+{
6159 -+ struct sched_attr attr;
6160 -+ struct task_struct *p;
6161 -+ int retval;
6162 -+
6163 -+ if (!uattr || pid < 0 || flags)
6164 -+ return -EINVAL;
6165 -+
6166 -+ retval = sched_copy_attr(uattr, &attr);
6167 -+ if (retval)
6168 -+ return retval;
6169 -+
6170 -+ if ((int)attr.sched_policy < 0)
6171 -+ return -EINVAL;
6172 -+
6173 -+ rcu_read_lock();
6174 -+ retval = -ESRCH;
6175 -+ p = find_process_by_pid(pid);
6176 -+ if (likely(p))
6177 -+ get_task_struct(p);
6178 -+ rcu_read_unlock();
6179 -+
6180 -+ if (likely(p)) {
6181 -+ retval = sched_setattr(p, &attr);
6182 -+ put_task_struct(p);
6183 -+ }
6184 -+
6185 -+ return retval;
6186 -+}
6187 -+
6188 -+/**
6189 -+ * sys_sched_getscheduler - get the policy (scheduling class) of a thread
6190 -+ * @pid: the pid in question.
6191 -+ *
6192 -+ * Return: On success, the policy of the thread. Otherwise, a negative error
6193 -+ * code.
6194 -+ */
6195 -+SYSCALL_DEFINE1(sched_getscheduler, pid_t, pid)
6196 -+{
6197 -+ struct task_struct *p;
6198 -+ int retval = -EINVAL;
6199 -+
6200 -+ if (pid < 0)
6201 -+ goto out_nounlock;
6202 -+
6203 -+ retval = -ESRCH;
6204 -+ rcu_read_lock();
6205 -+ p = find_process_by_pid(pid);
6206 -+ if (p) {
6207 -+ retval = security_task_getscheduler(p);
6208 -+ if (!retval)
6209 -+ retval = p->policy;
6210 -+ }
6211 -+ rcu_read_unlock();
6212 -+
6213 -+out_nounlock:
6214 -+ return retval;
6215 -+}
6216 -+
6217 -+/**
6218 -+ * sys_sched_getparam - get the RT priority of a thread
6219 -+ * @pid: the pid in question.
6220 -+ * @param: structure containing the RT priority.
6221 -+ *
6222 -+ * Return: On success, 0 and the RT priority is in @param. Otherwise, an error
6223 -+ * code.
6224 -+ */
6225 -+SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct sched_param __user *, param)
6226 -+{
6227 -+ struct sched_param lp = { .sched_priority = 0 };
6228 -+ struct task_struct *p;
6229 -+ int retval = -EINVAL;
6230 -+
6231 -+ if (!param || pid < 0)
6232 -+ goto out_nounlock;
6233 -+
6234 -+ rcu_read_lock();
6235 -+ p = find_process_by_pid(pid);
6236 -+ retval = -ESRCH;
6237 -+ if (!p)
6238 -+ goto out_unlock;
6239 -+
6240 -+ retval = security_task_getscheduler(p);
6241 -+ if (retval)
6242 -+ goto out_unlock;
6243 -+
6244 -+ if (task_has_rt_policy(p))
6245 -+ lp.sched_priority = p->rt_priority;
6246 -+ rcu_read_unlock();
6247 -+
6248 -+ /*
6249 -+ * This one might sleep, we cannot do it with a spinlock held ...
6250 -+ */
6251 -+ retval = copy_to_user(param, &lp, sizeof(*param)) ? -EFAULT : 0;
6252 -+
6253 -+out_nounlock:
6254 -+ return retval;
6255 -+
6256 -+out_unlock:
6257 -+ rcu_read_unlock();
6258 -+ return retval;
6259 -+}
6260 -+
6261 -+/*
6262 -+ * Copy the kernel-sized attribute structure (which might be larger
6263 -+ * than what user-space knows about) to user-space.
6264 -+ *
6265 -+ * Note that all cases are valid: user-space buffer can be larger or
6266 -+ * smaller than the kernel-space buffer. The usual case is that both
6267 -+ * have the same size.
6268 -+ */
6269 -+static int
6270 -+sched_attr_copy_to_user(struct sched_attr __user *uattr,
6271 -+ struct sched_attr *kattr,
6272 -+ unsigned int usize)
6273 -+{
6274 -+ unsigned int ksize = sizeof(*kattr);
6275 -+
6276 -+ if (!access_ok(uattr, usize))
6277 -+ return -EFAULT;
6278 -+
6279 -+ /*
6280 -+ * sched_getattr() ABI forwards and backwards compatibility:
6281 -+ *
6282 -+ * If usize == ksize then we just copy everything to user-space and all is good.
6283 -+ *
6284 -+ * If usize < ksize then we only copy as much as user-space has space for,
6285 -+ * this keeps ABI compatibility as well. We skip the rest.
6286 -+ *
6287 -+ * If usize > ksize then user-space is using a newer version of the ABI,
6288 -+ * which part the kernel doesn't know about. Just ignore it - tooling can
6289 -+ * detect the kernel's knowledge of attributes from the attr->size value
6290 -+ * which is set to ksize in this case.
6291 -+ */
6292 -+ kattr->size = min(usize, ksize);
6293 -+
6294 -+ if (copy_to_user(uattr, kattr, kattr->size))
6295 -+ return -EFAULT;
6296 -+
6297 -+ return 0;
6298 -+}
6299 -+
6300 -+/**
6301 -+ * sys_sched_getattr - similar to sched_getparam, but with sched_attr
6302 -+ * @pid: the pid in question.
6303 -+ * @uattr: structure containing the extended parameters.
6304 -+ * @usize: sizeof(attr) for fwd/bwd comp.
6305 -+ * @flags: for future extension.
6306 -+ */
6307 -+SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
6308 -+ unsigned int, usize, unsigned int, flags)
6309 -+{
6310 -+ struct sched_attr kattr = { };
6311 -+ struct task_struct *p;
6312 -+ int retval;
6313 -+
6314 -+ if (!uattr || pid < 0 || usize > PAGE_SIZE ||
6315 -+ usize < SCHED_ATTR_SIZE_VER0 || flags)
6316 -+ return -EINVAL;
6317 -+
6318 -+ rcu_read_lock();
6319 -+ p = find_process_by_pid(pid);
6320 -+ retval = -ESRCH;
6321 -+ if (!p)
6322 -+ goto out_unlock;
6323 -+
6324 -+ retval = security_task_getscheduler(p);
6325 -+ if (retval)
6326 -+ goto out_unlock;
6327 -+
6328 -+ kattr.sched_policy = p->policy;
6329 -+ if (p->sched_reset_on_fork)
6330 -+ kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
6331 -+ if (task_has_rt_policy(p))
6332 -+ kattr.sched_priority = p->rt_priority;
6333 -+ else
6334 -+ kattr.sched_nice = task_nice(p);
6335 -+
6336 -+#ifdef CONFIG_UCLAMP_TASK
6337 -+ kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
6338 -+ kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
6339 -+#endif
6340 -+
6341 -+ rcu_read_unlock();
6342 -+
6343 -+ return sched_attr_copy_to_user(uattr, &kattr, usize);
6344 -+
6345 -+out_unlock:
6346 -+ rcu_read_unlock();
6347 -+ return retval;
6348 -+}
6349 -+
6350 -+long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
6351 -+{
6352 -+ cpumask_var_t cpus_allowed, new_mask;
6353 -+ struct task_struct *p;
6354 -+ int retval;
6355 -+
6356 -+ rcu_read_lock();
6357 -+
6358 -+ p = find_process_by_pid(pid);
6359 -+ if (!p) {
6360 -+ rcu_read_unlock();
6361 -+ return -ESRCH;
6362 -+ }
6363 -+
6364 -+ /* Prevent p going away */
6365 -+ get_task_struct(p);
6366 -+ rcu_read_unlock();
6367 -+
6368 -+ if (p->flags & PF_NO_SETAFFINITY) {
6369 -+ retval = -EINVAL;
6370 -+ goto out_put_task;
6371 -+ }
6372 -+ if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
6373 -+ retval = -ENOMEM;
6374 -+ goto out_put_task;
6375 -+ }
6376 -+ if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
6377 -+ retval = -ENOMEM;
6378 -+ goto out_free_cpus_allowed;
6379 -+ }
6380 -+ retval = -EPERM;
6381 -+ if (!check_same_owner(p)) {
6382 -+ rcu_read_lock();
6383 -+ if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
6384 -+ rcu_read_unlock();
6385 -+ goto out_free_new_mask;
6386 -+ }
6387 -+ rcu_read_unlock();
6388 -+ }
6389 -+
6390 -+ retval = security_task_setscheduler(p);
6391 -+ if (retval)
6392 -+ goto out_free_new_mask;
6393 -+
6394 -+ cpuset_cpus_allowed(p, cpus_allowed);
6395 -+ cpumask_and(new_mask, in_mask, cpus_allowed);
6396 -+
6397 -+again:
6398 -+ retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
6399 -+
6400 -+ if (!retval) {
6401 -+ cpuset_cpus_allowed(p, cpus_allowed);
6402 -+ if (!cpumask_subset(new_mask, cpus_allowed)) {
6403 -+ /*
6404 -+ * We must have raced with a concurrent cpuset
6405 -+ * update. Just reset the cpus_allowed to the
6406 -+ * cpuset's cpus_allowed
6407 -+ */
6408 -+ cpumask_copy(new_mask, cpus_allowed);
6409 -+ goto again;
6410 -+ }
6411 -+ }
6412 -+out_free_new_mask:
6413 -+ free_cpumask_var(new_mask);
6414 -+out_free_cpus_allowed:
6415 -+ free_cpumask_var(cpus_allowed);
6416 -+out_put_task:
6417 -+ put_task_struct(p);
6418 -+ return retval;
6419 -+}
6420 -+
6421 -+static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
6422 -+ struct cpumask *new_mask)
6423 -+{
6424 -+ if (len < cpumask_size())
6425 -+ cpumask_clear(new_mask);
6426 -+ else if (len > cpumask_size())
6427 -+ len = cpumask_size();
6428 -+
6429 -+ return copy_from_user(new_mask, user_mask_ptr, len) ? -EFAULT : 0;
6430 -+}
6431 -+
6432 -+/**
6433 -+ * sys_sched_setaffinity - set the CPU affinity of a process
6434 -+ * @pid: pid of the process
6435 -+ * @len: length in bytes of the bitmask pointed to by user_mask_ptr
6436 -+ * @user_mask_ptr: user-space pointer to the new CPU mask
6437 -+ *
6438 -+ * Return: 0 on success. An error code otherwise.
6439 -+ */
6440 -+SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
6441 -+ unsigned long __user *, user_mask_ptr)
6442 -+{
6443 -+ cpumask_var_t new_mask;
6444 -+ int retval;
6445 -+
6446 -+ if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
6447 -+ return -ENOMEM;
6448 -+
6449 -+ retval = get_user_cpu_mask(user_mask_ptr, len, new_mask);
6450 -+ if (retval == 0)
6451 -+ retval = sched_setaffinity(pid, new_mask);
6452 -+ free_cpumask_var(new_mask);
6453 -+ return retval;
6454 -+}
6455 -+
6456 -+long sched_getaffinity(pid_t pid, cpumask_t *mask)
6457 -+{
6458 -+ struct task_struct *p;
6459 -+ raw_spinlock_t *lock;
6460 -+ unsigned long flags;
6461 -+ int retval;
6462 -+
6463 -+ rcu_read_lock();
6464 -+
6465 -+ retval = -ESRCH;
6466 -+ p = find_process_by_pid(pid);
6467 -+ if (!p)
6468 -+ goto out_unlock;
6469 -+
6470 -+ retval = security_task_getscheduler(p);
6471 -+ if (retval)
6472 -+ goto out_unlock;
6473 -+
6474 -+ task_access_lock_irqsave(p, &lock, &flags);
6475 -+ cpumask_and(mask, &p->cpus_mask, cpu_active_mask);
6476 -+ task_access_unlock_irqrestore(p, lock, &flags);
6477 -+
6478 -+out_unlock:
6479 -+ rcu_read_unlock();
6480 -+
6481 -+ return retval;
6482 -+}
6483 -+
6484 -+/**
6485 -+ * sys_sched_getaffinity - get the CPU affinity of a process
6486 -+ * @pid: pid of the process
6487 -+ * @len: length in bytes of the bitmask pointed to by user_mask_ptr
6488 -+ * @user_mask_ptr: user-space pointer to hold the current CPU mask
6489 -+ *
6490 -+ * Return: size of CPU mask copied to user_mask_ptr on success. An
6491 -+ * error code otherwise.
6492 -+ */
6493 -+SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
6494 -+ unsigned long __user *, user_mask_ptr)
6495 -+{
6496 -+ int ret;
6497 -+ cpumask_var_t mask;
6498 -+
6499 -+ if ((len * BITS_PER_BYTE) < nr_cpu_ids)
6500 -+ return -EINVAL;
6501 -+ if (len & (sizeof(unsigned long)-1))
6502 -+ return -EINVAL;
6503 -+
6504 -+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
6505 -+ return -ENOMEM;
6506 -+
6507 -+ ret = sched_getaffinity(pid, mask);
6508 -+ if (ret == 0) {
6509 -+ unsigned int retlen = min_t(size_t, len, cpumask_size());
6510 -+
6511 -+ if (copy_to_user(user_mask_ptr, mask, retlen))
6512 -+ ret = -EFAULT;
6513 -+ else
6514 -+ ret = retlen;
6515 -+ }
6516 -+ free_cpumask_var(mask);
6517 -+
6518 -+ return ret;
6519 -+}
6520 -+
6521 -+static void do_sched_yield(void)
6522 -+{
6523 -+ struct rq *rq;
6524 -+ struct rq_flags rf;
6525 -+
6526 -+ if (!sched_yield_type)
6527 -+ return;
6528 -+
6529 -+ rq = this_rq_lock_irq(&rf);
6530 -+
6531 -+ schedstat_inc(rq->yld_count);
6532 -+
6533 -+ if (1 == sched_yield_type) {
6534 -+ if (!rt_task(current))
6535 -+ do_sched_yield_type_1(current, rq);
6536 -+ } else if (2 == sched_yield_type) {
6537 -+ if (rq->nr_running > 1)
6538 -+ rq->skip = current;
6539 -+ }
6540 -+
6541 -+ preempt_disable();
6542 -+ raw_spin_unlock_irq(&rq->lock);
6543 -+ sched_preempt_enable_no_resched();
6544 -+
6545 -+ schedule();
6546 -+}
6547 -+
6548 -+/**
6549 -+ * sys_sched_yield - yield the current processor to other threads.
6550 -+ *
6551 -+ * This function yields the current CPU to other tasks. If there are no
6552 -+ * other threads running on this CPU then this function will return.
6553 -+ *
6554 -+ * Return: 0.
6555 -+ */
6556 -+SYSCALL_DEFINE0(sched_yield)
6557 -+{
6558 -+ do_sched_yield();
6559 -+ return 0;
6560 -+}
6561 -+
6562 -+#if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
6563 -+int __sched __cond_resched(void)
6564 -+{
6565 -+ if (should_resched(0)) {
6566 -+ preempt_schedule_common();
6567 -+ return 1;
6568 -+ }
6569 -+#ifndef CONFIG_PREEMPT_RCU
6570 -+ rcu_all_qs();
6571 -+#endif
6572 -+ return 0;
6573 -+}
6574 -+EXPORT_SYMBOL(__cond_resched);
6575 -+#endif
6576 -+
6577 -+#ifdef CONFIG_PREEMPT_DYNAMIC
6578 -+DEFINE_STATIC_CALL_RET0(cond_resched, __cond_resched);
6579 -+EXPORT_STATIC_CALL_TRAMP(cond_resched);
6580 -+
6581 -+DEFINE_STATIC_CALL_RET0(might_resched, __cond_resched);
6582 -+EXPORT_STATIC_CALL_TRAMP(might_resched);
6583 -+#endif
6584 -+
6585 -+/*
6586 -+ * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
6587 -+ * call schedule, and on return reacquire the lock.
6588 -+ *
6589 -+ * This works OK both with and without CONFIG_PREEMPTION. We do strange low-level
6590 -+ * operations here to prevent schedule() from being called twice (once via
6591 -+ * spin_unlock(), once by hand).
6592 -+ */
6593 -+int __cond_resched_lock(spinlock_t *lock)
6594 -+{
6595 -+ int resched = should_resched(PREEMPT_LOCK_OFFSET);
6596 -+ int ret = 0;
6597 -+
6598 -+ lockdep_assert_held(lock);
6599 -+
6600 -+ if (spin_needbreak(lock) || resched) {
6601 -+ spin_unlock(lock);
6602 -+ if (resched)
6603 -+ preempt_schedule_common();
6604 -+ else
6605 -+ cpu_relax();
6606 -+ ret = 1;
6607 -+ spin_lock(lock);
6608 -+ }
6609 -+ return ret;
6610 -+}
6611 -+EXPORT_SYMBOL(__cond_resched_lock);
6612 -+
6613 -+int __cond_resched_rwlock_read(rwlock_t *lock)
6614 -+{
6615 -+ int resched = should_resched(PREEMPT_LOCK_OFFSET);
6616 -+ int ret = 0;
6617 -+
6618 -+ lockdep_assert_held_read(lock);
6619 -+
6620 -+ if (rwlock_needbreak(lock) || resched) {
6621 -+ read_unlock(lock);
6622 -+ if (resched)
6623 -+ preempt_schedule_common();
6624 -+ else
6625 -+ cpu_relax();
6626 -+ ret = 1;
6627 -+ read_lock(lock);
6628 -+ }
6629 -+ return ret;
6630 -+}
6631 -+EXPORT_SYMBOL(__cond_resched_rwlock_read);
6632 -+
6633 -+int __cond_resched_rwlock_write(rwlock_t *lock)
6634 -+{
6635 -+ int resched = should_resched(PREEMPT_LOCK_OFFSET);
6636 -+ int ret = 0;
6637 -+
6638 -+ lockdep_assert_held_write(lock);
6639 -+
6640 -+ if (rwlock_needbreak(lock) || resched) {
6641 -+ write_unlock(lock);
6642 -+ if (resched)
6643 -+ preempt_schedule_common();
6644 -+ else
6645 -+ cpu_relax();
6646 -+ ret = 1;
6647 -+ write_lock(lock);
6648 -+ }
6649 -+ return ret;
6650 -+}
6651 -+EXPORT_SYMBOL(__cond_resched_rwlock_write);
6652 -+
6653 -+/**
6654 -+ * yield - yield the current processor to other threads.
6655 -+ *
6656 -+ * Do not ever use this function, there's a 99% chance you're doing it wrong.
6657 -+ *
6658 -+ * The scheduler is at all times free to pick the calling task as the most
6659 -+ * eligible task to run, if removing the yield() call from your code breaks
6660 -+ * it, it's already broken.
6661 -+ *
6662 -+ * Typical broken usage is:
6663 -+ *
6664 -+ * while (!event)
6665 -+ * yield();
6666 -+ *
6667 -+ * where one assumes that yield() will let 'the other' process run that will
6668 -+ * make event true. If the current task is a SCHED_FIFO task that will never
6669 -+ * happen. Never use yield() as a progress guarantee!!
6670 -+ *
6671 -+ * If you want to use yield() to wait for something, use wait_event().
6672 -+ * If you want to use yield() to be 'nice' for others, use cond_resched().
6673 -+ * If you still want to use yield(), do not!
6674 -+ */
6675 -+void __sched yield(void)
6676 -+{
6677 -+ set_current_state(TASK_RUNNING);
6678 -+ do_sched_yield();
6679 -+}
6680 -+EXPORT_SYMBOL(yield);
6681 -+
6682 -+/**
6683 -+ * yield_to - yield the current processor to another thread in
6684 -+ * your thread group, or accelerate that thread toward the
6685 -+ * processor it's on.
6686 -+ * @p: target task
6687 -+ * @preempt: whether task preemption is allowed or not
6688 -+ *
6689 -+ * It's the caller's job to ensure that the target task struct
6690 -+ * can't go away on us before we can do any checks.
6691 -+ *
6692 -+ * In Alt schedule FW, yield_to is not supported.
6693 -+ *
6694 -+ * Return:
6695 -+ * true (>0) if we indeed boosted the target task.
6696 -+ * false (0) if we failed to boost the target.
6697 -+ * -ESRCH if there's no task to yield to.
6698 -+ */
6699 -+int __sched yield_to(struct task_struct *p, bool preempt)
6700 -+{
6701 -+ return 0;
6702 -+}
6703 -+EXPORT_SYMBOL_GPL(yield_to);
6704 -+
6705 -+int io_schedule_prepare(void)
6706 -+{
6707 -+ int old_iowait = current->in_iowait;
6708 -+
6709 -+ current->in_iowait = 1;
6710 -+ blk_schedule_flush_plug(current);
6711 -+
6712 -+ return old_iowait;
6713 -+}
6714 -+
6715 -+void io_schedule_finish(int token)
6716 -+{
6717 -+ current->in_iowait = token;
6718 -+}
6719 -+
6720 -+/*
6721 -+ * This task is about to go to sleep on IO. Increment rq->nr_iowait so
6722 -+ * that process accounting knows that this is a task in IO wait state.
6723 -+ *
6724 -+ * But don't do that if it is a deliberate, throttling IO wait (this task
6725 -+ * has set its backing_dev_info: the queue against which it should throttle)
6726 -+ */
6727 -+
6728 -+long __sched io_schedule_timeout(long timeout)
6729 -+{
6730 -+ int token;
6731 -+ long ret;
6732 -+
6733 -+ token = io_schedule_prepare();
6734 -+ ret = schedule_timeout(timeout);
6735 -+ io_schedule_finish(token);
6736 -+
6737 -+ return ret;
6738 -+}
6739 -+EXPORT_SYMBOL(io_schedule_timeout);
6740 -+
6741 -+void __sched io_schedule(void)
6742 -+{
6743 -+ int token;
6744 -+
6745 -+ token = io_schedule_prepare();
6746 -+ schedule();
6747 -+ io_schedule_finish(token);
6748 -+}
6749 -+EXPORT_SYMBOL(io_schedule);
6750 -+
6751 -+/**
6752 -+ * sys_sched_get_priority_max - return maximum RT priority.
6753 -+ * @policy: scheduling class.
6754 -+ *
6755 -+ * Return: On success, this syscall returns the maximum
6756 -+ * rt_priority that can be used by a given scheduling class.
6757 -+ * On failure, a negative error code is returned.
6758 -+ */
6759 -+SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
6760 -+{
6761 -+ int ret = -EINVAL;
6762 -+
6763 -+ switch (policy) {
6764 -+ case SCHED_FIFO:
6765 -+ case SCHED_RR:
6766 -+ ret = MAX_RT_PRIO - 1;
6767 -+ break;
6768 -+ case SCHED_NORMAL:
6769 -+ case SCHED_BATCH:
6770 -+ case SCHED_IDLE:
6771 -+ ret = 0;
6772 -+ break;
6773 -+ }
6774 -+ return ret;
6775 -+}
6776 -+
6777 -+/**
6778 -+ * sys_sched_get_priority_min - return minimum RT priority.
6779 -+ * @policy: scheduling class.
6780 -+ *
6781 -+ * Return: On success, this syscall returns the minimum
6782 -+ * rt_priority that can be used by a given scheduling class.
6783 -+ * On failure, a negative error code is returned.
6784 -+ */
6785 -+SYSCALL_DEFINE1(sched_get_priority_min, int, policy)
6786 -+{
6787 -+ int ret = -EINVAL;
6788 -+
6789 -+ switch (policy) {
6790 -+ case SCHED_FIFO:
6791 -+ case SCHED_RR:
6792 -+ ret = 1;
6793 -+ break;
6794 -+ case SCHED_NORMAL:
6795 -+ case SCHED_BATCH:
6796 -+ case SCHED_IDLE:
6797 -+ ret = 0;
6798 -+ break;
6799 -+ }
6800 -+ return ret;
6801 -+}
6802 -+
6803 -+static int sched_rr_get_interval(pid_t pid, struct timespec64 *t)
6804 -+{
6805 -+ struct task_struct *p;
6806 -+ int retval;
6807 -+
6808 -+ alt_sched_debug();
6809 -+
6810 -+ if (pid < 0)
6811 -+ return -EINVAL;
6812 -+
6813 -+ retval = -ESRCH;
6814 -+ rcu_read_lock();
6815 -+ p = find_process_by_pid(pid);
6816 -+ if (!p)
6817 -+ goto out_unlock;
6818 -+
6819 -+ retval = security_task_getscheduler(p);
6820 -+ if (retval)
6821 -+ goto out_unlock;
6822 -+ rcu_read_unlock();
6823 -+
6824 -+ *t = ns_to_timespec64(sched_timeslice_ns);
6825 -+ return 0;
6826 -+
6827 -+out_unlock:
6828 -+ rcu_read_unlock();
6829 -+ return retval;
6830 -+}
6831 -+
6832 -+/**
6833 -+ * sys_sched_rr_get_interval - return the default timeslice of a process.
6834 -+ * @pid: pid of the process.
6835 -+ * @interval: userspace pointer to the timeslice value.
6836 -+ *
6837 -+ *
6838 -+ * Return: On success, 0 and the timeslice is in @interval. Otherwise,
6839 -+ * an error code.
6840 -+ */
6841 -+SYSCALL_DEFINE2(sched_rr_get_interval, pid_t, pid,
6842 -+ struct __kernel_timespec __user *, interval)
6843 -+{
6844 -+ struct timespec64 t;
6845 -+ int retval = sched_rr_get_interval(pid, &t);
6846 -+
6847 -+ if (retval == 0)
6848 -+ retval = put_timespec64(&t, interval);
6849 -+
6850 -+ return retval;
6851 -+}
6852 -+
6853 -+#ifdef CONFIG_COMPAT_32BIT_TIME
6854 -+SYSCALL_DEFINE2(sched_rr_get_interval_time32, pid_t, pid,
6855 -+ struct old_timespec32 __user *, interval)
6856 -+{
6857 -+ struct timespec64 t;
6858 -+ int retval = sched_rr_get_interval(pid, &t);
6859 -+
6860 -+ if (retval == 0)
6861 -+ retval = put_old_timespec32(&t, interval);
6862 -+ return retval;
6863 -+}
6864 -+#endif
6865 -+
6866 -+void sched_show_task(struct task_struct *p)
6867 -+{
6868 -+ unsigned long free = 0;
6869 -+ int ppid;
6870 -+
6871 -+ if (!try_get_task_stack(p))
6872 -+ return;
6873 -+
6874 -+ pr_info("task:%-15.15s state:%c", p->comm, task_state_to_char(p));
6875 -+
6876 -+ if (task_is_running(p))
6877 -+ pr_cont(" running task ");
6878 -+#ifdef CONFIG_DEBUG_STACK_USAGE
6879 -+ free = stack_not_used(p);
6880 -+#endif
6881 -+ ppid = 0;
6882 -+ rcu_read_lock();
6883 -+ if (pid_alive(p))
6884 -+ ppid = task_pid_nr(rcu_dereference(p->real_parent));
6885 -+ rcu_read_unlock();
6886 -+ pr_cont(" stack:%5lu pid:%5d ppid:%6d flags:0x%08lx\n",
6887 -+ free, task_pid_nr(p), ppid,
6888 -+ (unsigned long)task_thread_info(p)->flags);
6889 -+
6890 -+ print_worker_info(KERN_INFO, p);
6891 -+ print_stop_info(KERN_INFO, p);
6892 -+ show_stack(p, NULL, KERN_INFO);
6893 -+ put_task_stack(p);
6894 -+}
6895 -+EXPORT_SYMBOL_GPL(sched_show_task);
6896 -+
6897 -+static inline bool
6898 -+state_filter_match(unsigned long state_filter, struct task_struct *p)
6899 -+{
6900 -+ unsigned int state = READ_ONCE(p->__state);
6901 -+
6902 -+ /* no filter, everything matches */
6903 -+ if (!state_filter)
6904 -+ return true;
6905 -+
6906 -+ /* filter, but doesn't match */
6907 -+ if (!(state & state_filter))
6908 -+ return false;
6909 -+
6910 -+ /*
6911 -+ * When looking for TASK_UNINTERRUPTIBLE skip TASK_IDLE (allows
6912 -+ * TASK_KILLABLE).
6913 -+ */
6914 -+ if (state_filter == TASK_UNINTERRUPTIBLE && state == TASK_IDLE)
6915 -+ return false;
6916 -+
6917 -+ return true;
6918 -+}
6919 -+
6920 -+
6921 -+void show_state_filter(unsigned int state_filter)
6922 -+{
6923 -+ struct task_struct *g, *p;
6924 -+
6925 -+ rcu_read_lock();
6926 -+ for_each_process_thread(g, p) {
6927 -+ /*
6928 -+ * reset the NMI-timeout, listing all files on a slow
6929 -+ * console might take a lot of time:
6930 -+ * Also, reset softlockup watchdogs on all CPUs, because
6931 -+ * another CPU might be blocked waiting for us to process
6932 -+ * an IPI.
6933 -+ */
6934 -+ touch_nmi_watchdog();
6935 -+ touch_all_softlockup_watchdogs();
6936 -+ if (state_filter_match(state_filter, p))
6937 -+ sched_show_task(p);
6938 -+ }
6939 -+
6940 -+#ifdef CONFIG_SCHED_DEBUG
6941 -+ /* TODO: Alt schedule FW should support this
6942 -+ if (!state_filter)
6943 -+ sysrq_sched_debug_show();
6944 -+ */
6945 -+#endif
6946 -+ rcu_read_unlock();
6947 -+ /*
6948 -+ * Only show locks if all tasks are dumped:
6949 -+ */
6950 -+ if (!state_filter)
6951 -+ debug_show_all_locks();
6952 -+}
6953 -+
6954 -+void dump_cpu_task(int cpu)
6955 -+{
6956 -+ pr_info("Task dump for CPU %d:\n", cpu);
6957 -+ sched_show_task(cpu_curr(cpu));
6958 -+}
6959 -+
6960 -+/**
6961 -+ * init_idle - set up an idle thread for a given CPU
6962 -+ * @idle: task in question
6963 -+ * @cpu: CPU the idle task belongs to
6964 -+ *
6965 -+ * NOTE: this function does not set the idle thread's NEED_RESCHED
6966 -+ * flag, to make booting more robust.
6967 -+ */
6968 -+void __init init_idle(struct task_struct *idle, int cpu)
6969 -+{
6970 -+ struct rq *rq = cpu_rq(cpu);
6971 -+ unsigned long flags;
6972 -+
6973 -+ __sched_fork(0, idle);
6974 -+
6975 -+ /*
6976 -+ * The idle task doesn't need the kthread struct to function, but it
6977 -+ * is dressed up as a per-CPU kthread and thus needs to play the part
6978 -+ * if we want to avoid special-casing it in code that deals with per-CPU
6979 -+ * kthreads.
6980 -+ */
6981 -+ set_kthread_struct(idle);
6982 -+
6983 -+ raw_spin_lock_irqsave(&idle->pi_lock, flags);
6984 -+ raw_spin_lock(&rq->lock);
6985 -+ update_rq_clock(rq);
6986 -+
6987 -+ idle->last_ran = rq->clock_task;
6988 -+ idle->__state = TASK_RUNNING;
6989 -+ /*
6990 -+ * PF_KTHREAD should already be set at this point; regardless, make it
6991 -+ * look like a proper per-CPU kthread.
6992 -+ */
6993 -+ idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
6994 -+ kthread_set_per_cpu(idle, cpu);
6995 -+
6996 -+ sched_queue_init_idle(&rq->queue, idle);
6997 -+
6998 -+ scs_task_reset(idle);
6999 -+ kasan_unpoison_task_stack(idle);
7000 -+
7001 -+#ifdef CONFIG_SMP
7002 -+ /*
7003 -+ * It's possible that init_idle() gets called multiple times on a task,
7004 -+ * in that case do_set_cpus_allowed() will not do the right thing.
7005 -+ *
7006 -+ * And since this is boot we can forgo the serialisation.
7007 -+ */
7008 -+ set_cpus_allowed_common(idle, cpumask_of(cpu));
7009 -+#endif
7010 -+
7011 -+ /* Silence PROVE_RCU */
7012 -+ rcu_read_lock();
7013 -+ __set_task_cpu(idle, cpu);
7014 -+ rcu_read_unlock();
7015 -+
7016 -+ rq->idle = idle;
7017 -+ rcu_assign_pointer(rq->curr, idle);
7018 -+ idle->on_cpu = 1;
7019 -+
7020 -+ raw_spin_unlock(&rq->lock);
7021 -+ raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
7022 -+
7023 -+ /* Set the preempt count _outside_ the spinlocks! */
7024 -+ init_idle_preempt_count(idle, cpu);
7025 -+
7026 -+ ftrace_graph_init_idle_task(idle, cpu);
7027 -+ vtime_init_idle(idle, cpu);
7028 -+#ifdef CONFIG_SMP
7029 -+ sprintf(idle->comm, "%s/%d", INIT_TASK_COMM, cpu);
7030 -+#endif
7031 -+}
7032 -+
7033 -+#ifdef CONFIG_SMP
7034 -+
7035 -+int cpuset_cpumask_can_shrink(const struct cpumask __maybe_unused *cur,
7036 -+ const struct cpumask __maybe_unused *trial)
7037 -+{
7038 -+ return 1;
7039 -+}
7040 -+
7041 -+int task_can_attach(struct task_struct *p,
7042 -+ const struct cpumask *cs_cpus_allowed)
7043 -+{
7044 -+ int ret = 0;
7045 -+
7046 -+ /*
7047 -+ * Kthreads which disallow setaffinity shouldn't be moved
7048 -+ * to a new cpuset; we don't want to change their CPU
7049 -+ * affinity and isolating such threads by their set of
7050 -+ * allowed nodes is unnecessary. Thus, cpusets are not
7051 -+ * applicable for such threads. This prevents checking for
7052 -+ * success of set_cpus_allowed_ptr() on all attached tasks
7053 -+ * before cpus_mask may be changed.
7054 -+ */
7055 -+ if (p->flags & PF_NO_SETAFFINITY)
7056 -+ ret = -EINVAL;
7057 -+
7058 -+ return ret;
7059 -+}
7060 -+
7061 -+bool sched_smp_initialized __read_mostly;
7062 -+
7063 -+#ifdef CONFIG_HOTPLUG_CPU
7064 -+/*
7065 -+ * Ensures that the idle task is using init_mm right before its CPU goes
7066 -+ * offline.
7067 -+ */
7068 -+void idle_task_exit(void)
7069 -+{
7070 -+ struct mm_struct *mm = current->active_mm;
7071 -+
7072 -+ BUG_ON(current != this_rq()->idle);
7073 -+
7074 -+ if (mm != &init_mm) {
7075 -+ switch_mm(mm, &init_mm, current);
7076 -+ finish_arch_post_lock_switch();
7077 -+ }
7078 -+
7079 -+ /* finish_cpu(), as ran on the BP, will clean up the active_mm state */
7080 -+}
7081 -+
7082 -+static int __balance_push_cpu_stop(void *arg)
7083 -+{
7084 -+ struct task_struct *p = arg;
7085 -+ struct rq *rq = this_rq();
7086 -+ struct rq_flags rf;
7087 -+ int cpu;
7088 -+
7089 -+ raw_spin_lock_irq(&p->pi_lock);
7090 -+ rq_lock(rq, &rf);
7091 -+
7092 -+ update_rq_clock(rq);
7093 -+
7094 -+ if (task_rq(p) == rq && task_on_rq_queued(p)) {
7095 -+ cpu = select_fallback_rq(rq->cpu, p);
7096 -+ rq = __migrate_task(rq, p, cpu);
7097 -+ }
7098 -+
7099 -+ rq_unlock(rq, &rf);
7100 -+ raw_spin_unlock_irq(&p->pi_lock);
7101 -+
7102 -+ put_task_struct(p);
7103 -+
7104 -+ return 0;
7105 -+}
7106 -+
7107 -+static DEFINE_PER_CPU(struct cpu_stop_work, push_work);
7108 -+
7109 -+/*
7110 -+ * This is enabled below SCHED_AP_ACTIVE; when !cpu_active(), but only
7111 -+ * effective when the hotplug motion is down.
7112 -+ */
7113 -+static void balance_push(struct rq *rq)
7114 -+{
7115 -+ struct task_struct *push_task = rq->curr;
7116 -+
7117 -+ lockdep_assert_held(&rq->lock);
7118 -+
7119 -+ /*
7120 -+ * Ensure the thing is persistent until balance_push_set(.on = false);
7121 -+ */
7122 -+ rq->balance_callback = &balance_push_callback;
7123 -+
7124 -+ /*
7125 -+ * Only active while going offline and when invoked on the outgoing
7126 -+ * CPU.
7127 -+ */
7128 -+ if (!cpu_dying(rq->cpu) || rq != this_rq())
7129 -+ return;
7130 -+
7131 -+ /*
7132 -+ * Both the cpu-hotplug and stop task are in this case and are
7133 -+ * required to complete the hotplug process.
7134 -+ */
7135 -+ if (kthread_is_per_cpu(push_task) ||
7136 -+ is_migration_disabled(push_task)) {
7137 -+
7138 -+ /*
7139 -+ * If this is the idle task on the outgoing CPU try to wake
7140 -+ * up the hotplug control thread which might wait for the
7141 -+ * last task to vanish. The rcuwait_active() check is
7142 -+ * accurate here because the waiter is pinned on this CPU
7143 -+ * and can't obviously be running in parallel.
7144 -+ *
7145 -+ * On RT kernels this also has to check whether there are
7146 -+ * pinned and scheduled out tasks on the runqueue. They
7147 -+ * need to leave the migrate disabled section first.
7148 -+ */
7149 -+ if (!rq->nr_running && !rq_has_pinned_tasks(rq) &&
7150 -+ rcuwait_active(&rq->hotplug_wait)) {
7151 -+ raw_spin_unlock(&rq->lock);
7152 -+ rcuwait_wake_up(&rq->hotplug_wait);
7153 -+ raw_spin_lock(&rq->lock);
7154 -+ }
7155 -+ return;
7156 -+ }
7157 -+
7158 -+ get_task_struct(push_task);
7159 -+ /*
7160 -+ * Temporarily drop rq->lock such that we can wake-up the stop task.
7161 -+ * Both preemption and IRQs are still disabled.
7162 -+ */
7163 -+ raw_spin_unlock(&rq->lock);
7164 -+ stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
7165 -+ this_cpu_ptr(&push_work));
7166 -+ /*
7167 -+ * At this point need_resched() is true and we'll take the loop in
7168 -+ * schedule(). The next pick is obviously going to be the stop task
7169 -+ * which kthread_is_per_cpu() and will push this task away.
7170 -+ */
7171 -+ raw_spin_lock(&rq->lock);
7172 -+}
7173 -+
7174 -+static void balance_push_set(int cpu, bool on)
7175 -+{
7176 -+ struct rq *rq = cpu_rq(cpu);
7177 -+ struct rq_flags rf;
7178 -+
7179 -+ rq_lock_irqsave(rq, &rf);
7180 -+ if (on) {
7181 -+ WARN_ON_ONCE(rq->balance_callback);
7182 -+ rq->balance_callback = &balance_push_callback;
7183 -+ } else if (rq->balance_callback == &balance_push_callback) {
7184 -+ rq->balance_callback = NULL;
7185 -+ }
7186 -+ rq_unlock_irqrestore(rq, &rf);
7187 -+}
7188 -+
7189 -+/*
7190 -+ * Invoked from a CPU's hotplug control thread after the CPU has been marked
7191 -+ * inactive. All tasks which are not per CPU kernel threads are either
7192 -+ * pushed off this CPU now via balance_push() or placed on a different CPU
7193 -+ * during wakeup. Wait until the CPU is quiescent.
7194 -+ */
7195 -+static void balance_hotplug_wait(void)
7196 -+{
7197 -+ struct rq *rq = this_rq();
7198 -+
7199 -+ rcuwait_wait_event(&rq->hotplug_wait,
7200 -+ rq->nr_running == 1 && !rq_has_pinned_tasks(rq),
7201 -+ TASK_UNINTERRUPTIBLE);
7202 -+}
7203 -+
7204 -+#else
7205 -+
7206 -+static void balance_push(struct rq *rq)
7207 -+{
7208 -+}
7209 -+
7210 -+static void balance_push_set(int cpu, bool on)
7211 -+{
7212 -+}
7213 -+
7214 -+static inline void balance_hotplug_wait(void)
7215 -+{
7216 -+}
7217 -+#endif /* CONFIG_HOTPLUG_CPU */
7218 -+
7219 -+static void set_rq_offline(struct rq *rq)
7220 -+{
7221 -+ if (rq->online)
7222 -+ rq->online = false;
7223 -+}
7224 -+
7225 -+static void set_rq_online(struct rq *rq)
7226 -+{
7227 -+ if (!rq->online)
7228 -+ rq->online = true;
7229 -+}
7230 -+
7231 -+/*
7232 -+ * used to mark begin/end of suspend/resume:
7233 -+ */
7234 -+static int num_cpus_frozen;
7235 -+
7236 -+/*
7237 -+ * Update cpusets according to cpu_active mask. If cpusets are
7238 -+ * disabled, cpuset_update_active_cpus() becomes a simple wrapper
7239 -+ * around partition_sched_domains().
7240 -+ *
7241 -+ * If we come here as part of a suspend/resume, don't touch cpusets because we
7242 -+ * want to restore it back to its original state upon resume anyway.
7243 -+ */
7244 -+static void cpuset_cpu_active(void)
7245 -+{
7246 -+ if (cpuhp_tasks_frozen) {
7247 -+ /*
7248 -+ * num_cpus_frozen tracks how many CPUs are involved in suspend
7249 -+ * resume sequence. As long as this is not the last online
7250 -+ * operation in the resume sequence, just build a single sched
7251 -+ * domain, ignoring cpusets.
7252 -+ */
7253 -+ partition_sched_domains(1, NULL, NULL);
7254 -+ if (--num_cpus_frozen)
7255 -+ return;
7256 -+ /*
7257 -+ * This is the last CPU online operation. So fall through and
7258 -+ * restore the original sched domains by considering the
7259 -+ * cpuset configurations.
7260 -+ */
7261 -+ cpuset_force_rebuild();
7262 -+ }
7263 -+
7264 -+ cpuset_update_active_cpus();
7265 -+}
7266 -+
7267 -+static int cpuset_cpu_inactive(unsigned int cpu)
7268 -+{
7269 -+ if (!cpuhp_tasks_frozen) {
7270 -+ cpuset_update_active_cpus();
7271 -+ } else {
7272 -+ num_cpus_frozen++;
7273 -+ partition_sched_domains(1, NULL, NULL);
7274 -+ }
7275 -+ return 0;
7276 -+}
7277 -+
7278 -+int sched_cpu_activate(unsigned int cpu)
7279 -+{
7280 -+ struct rq *rq = cpu_rq(cpu);
7281 -+ unsigned long flags;
7282 -+
7283 -+ /*
7284 -+ * Clear the balance_push callback and prepare to schedule
7285 -+ * regular tasks.
7286 -+ */
7287 -+ balance_push_set(cpu, false);
7288 -+
7289 -+#ifdef CONFIG_SCHED_SMT
7290 -+ /*
7291 -+ * When going up, increment the number of cores with SMT present.
7292 -+ */
7293 -+ if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
7294 -+ static_branch_inc_cpuslocked(&sched_smt_present);
7295 -+#endif
7296 -+ set_cpu_active(cpu, true);
7297 -+
7298 -+ if (sched_smp_initialized)
7299 -+ cpuset_cpu_active();
7300 -+
7301 -+ /*
7302 -+ * Put the rq online, if not already. This happens:
7303 -+ *
7304 -+ * 1) In the early boot process, because we build the real domains
7305 -+ * after all cpus have been brought up.
7306 -+ *
7307 -+ * 2) At runtime, if cpuset_cpu_active() fails to rebuild the
7308 -+ * domains.
7309 -+ */
7310 -+ raw_spin_lock_irqsave(&rq->lock, flags);
7311 -+ set_rq_online(rq);
7312 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
7313 -+
7314 -+ return 0;
7315 -+}
7316 -+
7317 -+int sched_cpu_deactivate(unsigned int cpu)
7318 -+{
7319 -+ struct rq *rq = cpu_rq(cpu);
7320 -+ unsigned long flags;
7321 -+ int ret;
7322 -+
7323 -+ set_cpu_active(cpu, false);
7324 -+
7325 -+ /*
7326 -+ * From this point forward, this CPU will refuse to run any task that
7327 -+ * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
7328 -+ * push those tasks away until this gets cleared, see
7329 -+ * sched_cpu_dying().
7330 -+ */
7331 -+ balance_push_set(cpu, true);
7332 -+
7333 -+ /*
7334 -+ * We've cleared cpu_active_mask, wait for all preempt-disabled and RCU
7335 -+ * users of this state to go away such that all new such users will
7336 -+ * observe it.
7337 -+ *
7338 -+ * Specifically, we rely on ttwu to no longer target this CPU, see
7339 -+ * ttwu_queue_cond() and is_cpu_allowed().
7340 -+ *
7341 -+ * Do sync before park smpboot threads to take care the rcu boost case.
7342 -+ */
7343 -+ synchronize_rcu();
7344 -+
7345 -+ raw_spin_lock_irqsave(&rq->lock, flags);
7346 -+ update_rq_clock(rq);
7347 -+ set_rq_offline(rq);
7348 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
7349 -+
7350 -+#ifdef CONFIG_SCHED_SMT
7351 -+ /*
7352 -+ * When going down, decrement the number of cores with SMT present.
7353 -+ */
7354 -+ if (cpumask_weight(cpu_smt_mask(cpu)) == 2) {
7355 -+ static_branch_dec_cpuslocked(&sched_smt_present);
7356 -+ if (!static_branch_likely(&sched_smt_present))
7357 -+ cpumask_clear(&sched_sg_idle_mask);
7358 -+ }
7359 -+#endif
7360 -+
7361 -+ if (!sched_smp_initialized)
7362 -+ return 0;
7363 -+
7364 -+ ret = cpuset_cpu_inactive(cpu);
7365 -+ if (ret) {
7366 -+ balance_push_set(cpu, false);
7367 -+ set_cpu_active(cpu, true);
7368 -+ return ret;
7369 -+ }
7370 -+
7371 -+ return 0;
7372 -+}
7373 -+
7374 -+static void sched_rq_cpu_starting(unsigned int cpu)
7375 -+{
7376 -+ struct rq *rq = cpu_rq(cpu);
7377 -+
7378 -+ rq->calc_load_update = calc_load_update;
7379 -+}
7380 -+
7381 -+int sched_cpu_starting(unsigned int cpu)
7382 -+{
7383 -+ sched_rq_cpu_starting(cpu);
7384 -+ sched_tick_start(cpu);
7385 -+ return 0;
7386 -+}
7387 -+
7388 -+#ifdef CONFIG_HOTPLUG_CPU
7389 -+
7390 -+/*
7391 -+ * Invoked immediately before the stopper thread is invoked to bring the
7392 -+ * CPU down completely. At this point all per CPU kthreads except the
7393 -+ * hotplug thread (current) and the stopper thread (inactive) have been
7394 -+ * either parked or have been unbound from the outgoing CPU. Ensure that
7395 -+ * any of those which might be on the way out are gone.
7396 -+ *
7397 -+ * If after this point a bound task is being woken on this CPU then the
7398 -+ * responsible hotplug callback has failed to do it's job.
7399 -+ * sched_cpu_dying() will catch it with the appropriate fireworks.
7400 -+ */
7401 -+int sched_cpu_wait_empty(unsigned int cpu)
7402 -+{
7403 -+ balance_hotplug_wait();
7404 -+ return 0;
7405 -+}
7406 -+
7407 -+/*
7408 -+ * Since this CPU is going 'away' for a while, fold any nr_active delta we
7409 -+ * might have. Called from the CPU stopper task after ensuring that the
7410 -+ * stopper is the last running task on the CPU, so nr_active count is
7411 -+ * stable. We need to take the teardown thread which is calling this into
7412 -+ * account, so we hand in adjust = 1 to the load calculation.
7413 -+ *
7414 -+ * Also see the comment "Global load-average calculations".
7415 -+ */
7416 -+static void calc_load_migrate(struct rq *rq)
7417 -+{
7418 -+ long delta = calc_load_fold_active(rq, 1);
7419 -+
7420 -+ if (delta)
7421 -+ atomic_long_add(delta, &calc_load_tasks);
7422 -+}
7423 -+
7424 -+static void dump_rq_tasks(struct rq *rq, const char *loglvl)
7425 -+{
7426 -+ struct task_struct *g, *p;
7427 -+ int cpu = cpu_of(rq);
7428 -+
7429 -+ lockdep_assert_held(&rq->lock);
7430 -+
7431 -+ printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
7432 -+ for_each_process_thread(g, p) {
7433 -+ if (task_cpu(p) != cpu)
7434 -+ continue;
7435 -+
7436 -+ if (!task_on_rq_queued(p))
7437 -+ continue;
7438 -+
7439 -+ printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
7440 -+ }
7441 -+}
7442 -+
7443 -+int sched_cpu_dying(unsigned int cpu)
7444 -+{
7445 -+ struct rq *rq = cpu_rq(cpu);
7446 -+ unsigned long flags;
7447 -+
7448 -+ /* Handle pending wakeups and then migrate everything off */
7449 -+ sched_tick_stop(cpu);
7450 -+
7451 -+ raw_spin_lock_irqsave(&rq->lock, flags);
7452 -+ if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
7453 -+ WARN(true, "Dying CPU not properly vacated!");
7454 -+ dump_rq_tasks(rq, KERN_WARNING);
7455 -+ }
7456 -+ raw_spin_unlock_irqrestore(&rq->lock, flags);
7457 -+
7458 -+ calc_load_migrate(rq);
7459 -+ hrtick_clear(rq);
7460 -+ return 0;
7461 -+}
7462 -+#endif
7463 -+
7464 -+#ifdef CONFIG_SMP
7465 -+static void sched_init_topology_cpumask_early(void)
7466 -+{
7467 -+ int cpu;
7468 -+ cpumask_t *tmp;
7469 -+
7470 -+ for_each_possible_cpu(cpu) {
7471 -+ /* init topo masks */
7472 -+ tmp = per_cpu(sched_cpu_topo_masks, cpu);
7473 -+
7474 -+ cpumask_copy(tmp, cpumask_of(cpu));
7475 -+ tmp++;
7476 -+ cpumask_copy(tmp, cpu_possible_mask);
7477 -+ per_cpu(sched_cpu_llc_mask, cpu) = tmp;
7478 -+ per_cpu(sched_cpu_topo_end_mask, cpu) = ++tmp;
7479 -+ /*per_cpu(sd_llc_id, cpu) = cpu;*/
7480 -+ }
7481 -+}
7482 -+
7483 -+#define TOPOLOGY_CPUMASK(name, mask, last)\
7484 -+ if (cpumask_and(topo, topo, mask)) { \
7485 -+ cpumask_copy(topo, mask); \
7486 -+ printk(KERN_INFO "sched: cpu#%02d topo: 0x%08lx - "#name, \
7487 -+ cpu, (topo++)->bits[0]); \
7488 -+ } \
7489 -+ if (!last) \
7490 -+ cpumask_complement(topo, mask)
7491 -+
7492 -+static void sched_init_topology_cpumask(void)
7493 -+{
7494 -+ int cpu;
7495 -+ cpumask_t *topo;
7496 -+
7497 -+ for_each_online_cpu(cpu) {
7498 -+ /* take chance to reset time slice for idle tasks */
7499 -+ cpu_rq(cpu)->idle->time_slice = sched_timeslice_ns;
7500 -+
7501 -+ topo = per_cpu(sched_cpu_topo_masks, cpu) + 1;
7502 -+
7503 -+ cpumask_complement(topo, cpumask_of(cpu));
7504 -+#ifdef CONFIG_SCHED_SMT
7505 -+ TOPOLOGY_CPUMASK(smt, topology_sibling_cpumask(cpu), false);
7506 -+#endif
7507 -+ per_cpu(sd_llc_id, cpu) = cpumask_first(cpu_coregroup_mask(cpu));
7508 -+ per_cpu(sched_cpu_llc_mask, cpu) = topo;
7509 -+ TOPOLOGY_CPUMASK(coregroup, cpu_coregroup_mask(cpu), false);
7510 -+
7511 -+ TOPOLOGY_CPUMASK(core, topology_core_cpumask(cpu), false);
7512 -+
7513 -+ TOPOLOGY_CPUMASK(others, cpu_online_mask, true);
7514 -+
7515 -+ per_cpu(sched_cpu_topo_end_mask, cpu) = topo;
7516 -+ printk(KERN_INFO "sched: cpu#%02d llc_id = %d, llc_mask idx = %d\n",
7517 -+ cpu, per_cpu(sd_llc_id, cpu),
7518 -+ (int) (per_cpu(sched_cpu_llc_mask, cpu) -
7519 -+ per_cpu(sched_cpu_topo_masks, cpu)));
7520 -+ }
7521 -+}
7522 -+#endif
7523 -+
7524 -+void __init sched_init_smp(void)
7525 -+{
7526 -+ /* Move init over to a non-isolated CPU */
7527 -+ if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) < 0)
7528 -+ BUG();
7529 -+ current->flags &= ~PF_NO_SETAFFINITY;
7530 -+
7531 -+ sched_init_topology_cpumask();
7532 -+
7533 -+ sched_smp_initialized = true;
7534 -+}
7535 -+#else
7536 -+void __init sched_init_smp(void)
7537 -+{
7538 -+ cpu_rq(0)->idle->time_slice = sched_timeslice_ns;
7539 -+}
7540 -+#endif /* CONFIG_SMP */
7541 -+
7542 -+int in_sched_functions(unsigned long addr)
7543 -+{
7544 -+ return in_lock_functions(addr) ||
7545 -+ (addr >= (unsigned long)__sched_text_start
7546 -+ && addr < (unsigned long)__sched_text_end);
7547 -+}
7548 -+
7549 -+#ifdef CONFIG_CGROUP_SCHED
7550 -+/* task group related information */
7551 -+struct task_group {
7552 -+ struct cgroup_subsys_state css;
7553 -+
7554 -+ struct rcu_head rcu;
7555 -+ struct list_head list;
7556 -+
7557 -+ struct task_group *parent;
7558 -+ struct list_head siblings;
7559 -+ struct list_head children;
7560 -+#ifdef CONFIG_FAIR_GROUP_SCHED
7561 -+ unsigned long shares;
7562 -+#endif
7563 -+};
7564 -+
7565 -+/*
7566 -+ * Default task group.
7567 -+ * Every task in system belongs to this group at bootup.
7568 -+ */
7569 -+struct task_group root_task_group;
7570 -+LIST_HEAD(task_groups);
7571 -+
7572 -+/* Cacheline aligned slab cache for task_group */
7573 -+static struct kmem_cache *task_group_cache __read_mostly;
7574 -+#endif /* CONFIG_CGROUP_SCHED */
7575 -+
7576 -+void __init sched_init(void)
7577 -+{
7578 -+ int i;
7579 -+ struct rq *rq;
7580 -+
7581 -+ printk(KERN_INFO ALT_SCHED_VERSION_MSG);
7582 -+
7583 -+ wait_bit_init();
7584 -+
7585 -+#ifdef CONFIG_SMP
7586 -+ for (i = 0; i < SCHED_BITS; i++)
7587 -+ cpumask_copy(sched_rq_watermark + i, cpu_present_mask);
7588 -+#endif
7589 -+
7590 -+#ifdef CONFIG_CGROUP_SCHED
7591 -+ task_group_cache = KMEM_CACHE(task_group, 0);
7592 -+
7593 -+ list_add(&root_task_group.list, &task_groups);
7594 -+ INIT_LIST_HEAD(&root_task_group.children);
7595 -+ INIT_LIST_HEAD(&root_task_group.siblings);
7596 -+#endif /* CONFIG_CGROUP_SCHED */
7597 -+ for_each_possible_cpu(i) {
7598 -+ rq = cpu_rq(i);
7599 -+
7600 -+ sched_queue_init(&rq->queue);
7601 -+ rq->watermark = IDLE_TASK_SCHED_PRIO;
7602 -+ rq->skip = NULL;
7603 -+
7604 -+ raw_spin_lock_init(&rq->lock);
7605 -+ rq->nr_running = rq->nr_uninterruptible = 0;
7606 -+ rq->calc_load_active = 0;
7607 -+ rq->calc_load_update = jiffies + LOAD_FREQ;
7608 -+#ifdef CONFIG_SMP
7609 -+ rq->online = false;
7610 -+ rq->cpu = i;
7611 -+
7612 -+#ifdef CONFIG_SCHED_SMT
7613 -+ rq->active_balance = 0;
7614 -+#endif
7615 -+
7616 -+#ifdef CONFIG_NO_HZ_COMMON
7617 -+ INIT_CSD(&rq->nohz_csd, nohz_csd_func, rq);
7618 -+#endif
7619 -+ rq->balance_callback = &balance_push_callback;
7620 -+#ifdef CONFIG_HOTPLUG_CPU
7621 -+ rcuwait_init(&rq->hotplug_wait);
7622 -+#endif
7623 -+#endif /* CONFIG_SMP */
7624 -+ rq->nr_switches = 0;
7625 -+
7626 -+ hrtick_rq_init(rq);
7627 -+ atomic_set(&rq->nr_iowait, 0);
7628 -+ }
7629 -+#ifdef CONFIG_SMP
7630 -+ /* Set rq->online for cpu 0 */
7631 -+ cpu_rq(0)->online = true;
7632 -+#endif
7633 -+ /*
7634 -+ * The boot idle thread does lazy MMU switching as well:
7635 -+ */
7636 -+ mmgrab(&init_mm);
7637 -+ enter_lazy_tlb(&init_mm, current);
7638 -+
7639 -+ /*
7640 -+ * Make us the idle thread. Technically, schedule() should not be
7641 -+ * called from this thread, however somewhere below it might be,
7642 -+ * but because we are the idle thread, we just pick up running again
7643 -+ * when this runqueue becomes "idle".
7644 -+ */
7645 -+ init_idle(current, smp_processor_id());
7646 -+
7647 -+ calc_load_update = jiffies + LOAD_FREQ;
7648 -+
7649 -+#ifdef CONFIG_SMP
7650 -+ idle_thread_set_boot_cpu();
7651 -+ balance_push_set(smp_processor_id(), false);
7652 -+
7653 -+ sched_init_topology_cpumask_early();
7654 -+#endif /* SMP */
7655 -+
7656 -+ psi_init();
7657 -+}
7658 -+
7659 -+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
7660 -+static inline int preempt_count_equals(int preempt_offset)
7661 -+{
7662 -+ int nested = preempt_count() + rcu_preempt_depth();
7663 -+
7664 -+ return (nested == preempt_offset);
7665 -+}
7666 -+
7667 -+void __might_sleep(const char *file, int line, int preempt_offset)
7668 -+{
7669 -+ unsigned int state = get_current_state();
7670 -+ /*
7671 -+ * Blocking primitives will set (and therefore destroy) current->state,
7672 -+ * since we will exit with TASK_RUNNING make sure we enter with it,
7673 -+ * otherwise we will destroy state.
7674 -+ */
7675 -+ WARN_ONCE(state != TASK_RUNNING && current->task_state_change,
7676 -+ "do not call blocking ops when !TASK_RUNNING; "
7677 -+ "state=%x set at [<%p>] %pS\n", state,
7678 -+ (void *)current->task_state_change,
7679 -+ (void *)current->task_state_change);
7680 -+
7681 -+ ___might_sleep(file, line, preempt_offset);
7682 -+}
7683 -+EXPORT_SYMBOL(__might_sleep);
7684 -+
7685 -+void ___might_sleep(const char *file, int line, int preempt_offset)
7686 -+{
7687 -+ /* Ratelimiting timestamp: */
7688 -+ static unsigned long prev_jiffy;
7689 -+
7690 -+ unsigned long preempt_disable_ip;
7691 -+
7692 -+ /* WARN_ON_ONCE() by default, no rate limit required: */
7693 -+ rcu_sleep_check();
7694 -+
7695 -+ if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
7696 -+ !is_idle_task(current) && !current->non_block_count) ||
7697 -+ system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
7698 -+ oops_in_progress)
7699 -+ return;
7700 -+ if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
7701 -+ return;
7702 -+ prev_jiffy = jiffies;
7703 -+
7704 -+ /* Save this before calling printk(), since that will clobber it: */
7705 -+ preempt_disable_ip = get_preempt_disable_ip(current);
7706 -+
7707 -+ printk(KERN_ERR
7708 -+ "BUG: sleeping function called from invalid context at %s:%d\n",
7709 -+ file, line);
7710 -+ printk(KERN_ERR
7711 -+ "in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
7712 -+ in_atomic(), irqs_disabled(), current->non_block_count,
7713 -+ current->pid, current->comm);
7714 -+
7715 -+ if (task_stack_end_corrupted(current))
7716 -+ printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
7717 -+
7718 -+ debug_show_held_locks(current);
7719 -+ if (irqs_disabled())
7720 -+ print_irqtrace_events(current);
7721 -+#ifdef CONFIG_DEBUG_PREEMPT
7722 -+ if (!preempt_count_equals(preempt_offset)) {
7723 -+ pr_err("Preemption disabled at:");
7724 -+ print_ip_sym(KERN_ERR, preempt_disable_ip);
7725 -+ }
7726 -+#endif
7727 -+ dump_stack();
7728 -+ add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
7729 -+}
7730 -+EXPORT_SYMBOL(___might_sleep);
7731 -+
7732 -+void __cant_sleep(const char *file, int line, int preempt_offset)
7733 -+{
7734 -+ static unsigned long prev_jiffy;
7735 -+
7736 -+ if (irqs_disabled())
7737 -+ return;
7738 -+
7739 -+ if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
7740 -+ return;
7741 -+
7742 -+ if (preempt_count() > preempt_offset)
7743 -+ return;
7744 -+
7745 -+ if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
7746 -+ return;
7747 -+ prev_jiffy = jiffies;
7748 -+
7749 -+ printk(KERN_ERR "BUG: assuming atomic context at %s:%d\n", file, line);
7750 -+ printk(KERN_ERR "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
7751 -+ in_atomic(), irqs_disabled(),
7752 -+ current->pid, current->comm);
7753 -+
7754 -+ debug_show_held_locks(current);
7755 -+ dump_stack();
7756 -+ add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
7757 -+}
7758 -+EXPORT_SYMBOL_GPL(__cant_sleep);
7759 -+
7760 -+#ifdef CONFIG_SMP
7761 -+void __cant_migrate(const char *file, int line)
7762 -+{
7763 -+ static unsigned long prev_jiffy;
7764 -+
7765 -+ if (irqs_disabled())
7766 -+ return;
7767 -+
7768 -+ if (is_migration_disabled(current))
7769 -+ return;
7770 -+
7771 -+ if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
7772 -+ return;
7773 -+
7774 -+ if (preempt_count() > 0)
7775 -+ return;
7776 -+
7777 -+ if (current->migration_flags & MDF_FORCE_ENABLED)
7778 -+ return;
7779 -+
7780 -+ if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
7781 -+ return;
7782 -+ prev_jiffy = jiffies;
7783 -+
7784 -+ pr_err("BUG: assuming non migratable context at %s:%d\n", file, line);
7785 -+ pr_err("in_atomic(): %d, irqs_disabled(): %d, migration_disabled() %u pid: %d, name: %s\n",
7786 -+ in_atomic(), irqs_disabled(), is_migration_disabled(current),
7787 -+ current->pid, current->comm);
7788 -+
7789 -+ debug_show_held_locks(current);
7790 -+ dump_stack();
7791 -+ add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
7792 -+}
7793 -+EXPORT_SYMBOL_GPL(__cant_migrate);
7794 -+#endif
7795 -+#endif
7796 -+
7797 -+#ifdef CONFIG_MAGIC_SYSRQ
7798 -+void normalize_rt_tasks(void)
7799 -+{
7800 -+ struct task_struct *g, *p;
7801 -+ struct sched_attr attr = {
7802 -+ .sched_policy = SCHED_NORMAL,
7803 -+ };
7804 -+
7805 -+ read_lock(&tasklist_lock);
7806 -+ for_each_process_thread(g, p) {
7807 -+ /*
7808 -+ * Only normalize user tasks:
7809 -+ */
7810 -+ if (p->flags & PF_KTHREAD)
7811 -+ continue;
7812 -+
7813 -+ if (!rt_task(p)) {
7814 -+ /*
7815 -+ * Renice negative nice level userspace
7816 -+ * tasks back to 0:
7817 -+ */
7818 -+ if (task_nice(p) < 0)
7819 -+ set_user_nice(p, 0);
7820 -+ continue;
7821 -+ }
7822 -+
7823 -+ __sched_setscheduler(p, &attr, false, false);
7824 -+ }
7825 -+ read_unlock(&tasklist_lock);
7826 -+}
7827 -+#endif /* CONFIG_MAGIC_SYSRQ */
7828 -+
7829 -+#if defined(CONFIG_IA64) || defined(CONFIG_KGDB_KDB)
7830 -+/*
7831 -+ * These functions are only useful for the IA64 MCA handling, or kdb.
7832 -+ *
7833 -+ * They can only be called when the whole system has been
7834 -+ * stopped - every CPU needs to be quiescent, and no scheduling
7835 -+ * activity can take place. Using them for anything else would
7836 -+ * be a serious bug, and as a result, they aren't even visible
7837 -+ * under any other configuration.
7838 -+ */
7839 -+
7840 -+/**
7841 -+ * curr_task - return the current task for a given CPU.
7842 -+ * @cpu: the processor in question.
7843 -+ *
7844 -+ * ONLY VALID WHEN THE WHOLE SYSTEM IS STOPPED!
7845 -+ *
7846 -+ * Return: The current task for @cpu.
7847 -+ */
7848 -+struct task_struct *curr_task(int cpu)
7849 -+{
7850 -+ return cpu_curr(cpu);
7851 -+}
7852 -+
7853 -+#endif /* defined(CONFIG_IA64) || defined(CONFIG_KGDB_KDB) */
7854 -+
7855 -+#ifdef CONFIG_IA64
7856 -+/**
7857 -+ * ia64_set_curr_task - set the current task for a given CPU.
7858 -+ * @cpu: the processor in question.
7859 -+ * @p: the task pointer to set.
7860 -+ *
7861 -+ * Description: This function must only be used when non-maskable interrupts
7862 -+ * are serviced on a separate stack. It allows the architecture to switch the
7863 -+ * notion of the current task on a CPU in a non-blocking manner. This function
7864 -+ * must be called with all CPUs synchronised and interrupts disabled, and the
7865 -+ * caller must save the original value of the current task (see
7866 -+ * curr_task() above) and restore that value before reenabling interrupts and
7867 -+ * re-starting the system.
7868 -+ *
7869 -+ * ONLY VALID WHEN THE WHOLE SYSTEM IS STOPPED!
7870 -+ */
7871 -+void ia64_set_curr_task(int cpu, struct task_struct *p)
7872 -+{
7873 -+ cpu_curr(cpu) = p;
7874 -+}
7875 -+
7876 -+#endif
7877 -+
7878 -+#ifdef CONFIG_CGROUP_SCHED
7879 -+static void sched_free_group(struct task_group *tg)
7880 -+{
7881 -+ kmem_cache_free(task_group_cache, tg);
7882 -+}
7883 -+
7884 -+/* allocate runqueue etc for a new task group */
7885 -+struct task_group *sched_create_group(struct task_group *parent)
7886 -+{
7887 -+ struct task_group *tg;
7888 -+
7889 -+ tg = kmem_cache_alloc(task_group_cache, GFP_KERNEL | __GFP_ZERO);
7890 -+ if (!tg)
7891 -+ return ERR_PTR(-ENOMEM);
7892 -+
7893 -+ return tg;
7894 -+}
7895 -+
7896 -+void sched_online_group(struct task_group *tg, struct task_group *parent)
7897 -+{
7898 -+}
7899 -+
7900 -+/* rcu callback to free various structures associated with a task group */
7901 -+static void sched_free_group_rcu(struct rcu_head *rhp)
7902 -+{
7903 -+ /* Now it should be safe to free those cfs_rqs */
7904 -+ sched_free_group(container_of(rhp, struct task_group, rcu));
7905 -+}
7906 -+
7907 -+void sched_destroy_group(struct task_group *tg)
7908 -+{
7909 -+ /* Wait for possible concurrent references to cfs_rqs complete */
7910 -+ call_rcu(&tg->rcu, sched_free_group_rcu);
7911 -+}
7912 -+
7913 -+void sched_offline_group(struct task_group *tg)
7914 -+{
7915 -+}
7916 -+
7917 -+static inline struct task_group *css_tg(struct cgroup_subsys_state *css)
7918 -+{
7919 -+ return css ? container_of(css, struct task_group, css) : NULL;
7920 -+}
7921 -+
7922 -+static struct cgroup_subsys_state *
7923 -+cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
7924 -+{
7925 -+ struct task_group *parent = css_tg(parent_css);
7926 -+ struct task_group *tg;
7927 -+
7928 -+ if (!parent) {
7929 -+ /* This is early initialization for the top cgroup */
7930 -+ return &root_task_group.css;
7931 -+ }
7932 -+
7933 -+ tg = sched_create_group(parent);
7934 -+ if (IS_ERR(tg))
7935 -+ return ERR_PTR(-ENOMEM);
7936 -+ return &tg->css;
7937 -+}
7938 -+
7939 -+/* Expose task group only after completing cgroup initialization */
7940 -+static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
7941 -+{
7942 -+ struct task_group *tg = css_tg(css);
7943 -+ struct task_group *parent = css_tg(css->parent);
7944 -+
7945 -+ if (parent)
7946 -+ sched_online_group(tg, parent);
7947 -+ return 0;
7948 -+}
7949 -+
7950 -+static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)
7951 -+{
7952 -+ struct task_group *tg = css_tg(css);
7953 -+
7954 -+ sched_offline_group(tg);
7955 -+}
7956 -+
7957 -+static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
7958 -+{
7959 -+ struct task_group *tg = css_tg(css);
7960 -+
7961 -+ /*
7962 -+ * Relies on the RCU grace period between css_released() and this.
7963 -+ */
7964 -+ sched_free_group(tg);
7965 -+}
7966 -+
7967 -+static void cpu_cgroup_fork(struct task_struct *task)
7968 -+{
7969 -+}
7970 -+
7971 -+static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
7972 -+{
7973 -+ return 0;
7974 -+}
7975 -+
7976 -+static void cpu_cgroup_attach(struct cgroup_taskset *tset)
7977 -+{
7978 -+}
7979 -+
7980 -+#ifdef CONFIG_FAIR_GROUP_SCHED
7981 -+static DEFINE_MUTEX(shares_mutex);
7982 -+
7983 -+int sched_group_set_shares(struct task_group *tg, unsigned long shares)
7984 -+{
7985 -+ /*
7986 -+ * We can't change the weight of the root cgroup.
7987 -+ */
7988 -+ if (&root_task_group == tg)
7989 -+ return -EINVAL;
7990 -+
7991 -+ shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES));
7992 -+
7993 -+ mutex_lock(&shares_mutex);
7994 -+ if (tg->shares == shares)
7995 -+ goto done;
7996 -+
7997 -+ tg->shares = shares;
7998 -+done:
7999 -+ mutex_unlock(&shares_mutex);
8000 -+ return 0;
8001 -+}
8002 -+
8003 -+static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
8004 -+ struct cftype *cftype, u64 shareval)
8005 -+{
8006 -+ if (shareval > scale_load_down(ULONG_MAX))
8007 -+ shareval = MAX_SHARES;
8008 -+ return sched_group_set_shares(css_tg(css), scale_load(shareval));
8009 -+}
8010 -+
8011 -+static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
8012 -+ struct cftype *cft)
8013 -+{
8014 -+ struct task_group *tg = css_tg(css);
8015 -+
8016 -+ return (u64) scale_load_down(tg->shares);
8017 -+}
8018 -+#endif
8019 -+
8020 -+static struct cftype cpu_legacy_files[] = {
8021 -+#ifdef CONFIG_FAIR_GROUP_SCHED
8022 -+ {
8023 -+ .name = "shares",
8024 -+ .read_u64 = cpu_shares_read_u64,
8025 -+ .write_u64 = cpu_shares_write_u64,
8026 -+ },
8027 -+#endif
8028 -+ { } /* Terminate */
8029 -+};
8030 -+
8031 -+
8032 -+static struct cftype cpu_files[] = {
8033 -+ { } /* terminate */
8034 -+};
8035 -+
8036 -+static int cpu_extra_stat_show(struct seq_file *sf,
8037 -+ struct cgroup_subsys_state *css)
8038 -+{
8039 -+ return 0;
8040 -+}
8041 -+
8042 -+struct cgroup_subsys cpu_cgrp_subsys = {
8043 -+ .css_alloc = cpu_cgroup_css_alloc,
8044 -+ .css_online = cpu_cgroup_css_online,
8045 -+ .css_released = cpu_cgroup_css_released,
8046 -+ .css_free = cpu_cgroup_css_free,
8047 -+ .css_extra_stat_show = cpu_extra_stat_show,
8048 -+ .fork = cpu_cgroup_fork,
8049 -+ .can_attach = cpu_cgroup_can_attach,
8050 -+ .attach = cpu_cgroup_attach,
8051 -+ .legacy_cftypes = cpu_files,
8052 -+ .legacy_cftypes = cpu_legacy_files,
8053 -+ .dfl_cftypes = cpu_files,
8054 -+ .early_init = true,
8055 -+ .threaded = true,
8056 -+};
8057 -+#endif /* CONFIG_CGROUP_SCHED */
8058 -+
8059 -+#undef CREATE_TRACE_POINTS
8060 -diff --git a/kernel/sched/alt_debug.c b/kernel/sched/alt_debug.c
8061 -new file mode 100644
8062 -index 000000000000..1212a031700e
8063 ---- /dev/null
8064 -+++ b/kernel/sched/alt_debug.c
8065 -@@ -0,0 +1,31 @@
8066 -+/*
8067 -+ * kernel/sched/alt_debug.c
8068 -+ *
8069 -+ * Print the alt scheduler debugging details
8070 -+ *
8071 -+ * Author: Alfred Chen
8072 -+ * Date : 2020
8073 -+ */
8074 -+#include "sched.h"
8075 -+
8076 -+/*
8077 -+ * This allows printing both to /proc/sched_debug and
8078 -+ * to the console
8079 -+ */
8080 -+#define SEQ_printf(m, x...) \
8081 -+ do { \
8082 -+ if (m) \
8083 -+ seq_printf(m, x); \
8084 -+ else \
8085 -+ pr_cont(x); \
8086 -+ } while (0)
8087 -+
8088 -+void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
8089 -+ struct seq_file *m)
8090 -+{
8091 -+ SEQ_printf(m, "%s (%d, #threads: %d)\n", p->comm, task_pid_nr_ns(p, ns),
8092 -+ get_nr_threads(p));
8093 -+}
8094 -+
8095 -+void proc_sched_set_task(struct task_struct *p)
8096 -+{}
8097 -diff --git a/kernel/sched/alt_sched.h b/kernel/sched/alt_sched.h
8098 -new file mode 100644
8099 -index 000000000000..289058a09bd5
8100 ---- /dev/null
8101 -+++ b/kernel/sched/alt_sched.h
8102 -@@ -0,0 +1,666 @@
8103 -+#ifndef ALT_SCHED_H
8104 -+#define ALT_SCHED_H
8105 -+
8106 -+#include <linux/sched.h>
8107 -+
8108 -+#include <linux/sched/clock.h>
8109 -+#include <linux/sched/cpufreq.h>
8110 -+#include <linux/sched/cputime.h>
8111 -+#include <linux/sched/debug.h>
8112 -+#include <linux/sched/init.h>
8113 -+#include <linux/sched/isolation.h>
8114 -+#include <linux/sched/loadavg.h>
8115 -+#include <linux/sched/mm.h>
8116 -+#include <linux/sched/nohz.h>
8117 -+#include <linux/sched/signal.h>
8118 -+#include <linux/sched/stat.h>
8119 -+#include <linux/sched/sysctl.h>
8120 -+#include <linux/sched/task.h>
8121 -+#include <linux/sched/topology.h>
8122 -+#include <linux/sched/wake_q.h>
8123 -+
8124 -+#include <uapi/linux/sched/types.h>
8125 -+
8126 -+#include <linux/cgroup.h>
8127 -+#include <linux/cpufreq.h>
8128 -+#include <linux/cpuidle.h>
8129 -+#include <linux/cpuset.h>
8130 -+#include <linux/ctype.h>
8131 -+#include <linux/debugfs.h>
8132 -+#include <linux/kthread.h>
8133 -+#include <linux/livepatch.h>
8134 -+#include <linux/membarrier.h>
8135 -+#include <linux/proc_fs.h>
8136 -+#include <linux/psi.h>
8137 -+#include <linux/slab.h>
8138 -+#include <linux/stop_machine.h>
8139 -+#include <linux/suspend.h>
8140 -+#include <linux/swait.h>
8141 -+#include <linux/syscalls.h>
8142 -+#include <linux/tsacct_kern.h>
8143 -+
8144 -+#include <asm/tlb.h>
8145 -+
8146 -+#ifdef CONFIG_PARAVIRT
8147 -+# include <asm/paravirt.h>
8148 -+#endif
8149 -+
8150 -+#include "cpupri.h"
8151 -+
8152 -+#include <trace/events/sched.h>
8153 -+
8154 -+#ifdef CONFIG_SCHED_BMQ
8155 -+/* bits:
8156 -+ * RT(0-99), (Low prio adj range, nice width, high prio adj range) / 2, cpu idle task */
8157 -+#define SCHED_BITS (MAX_RT_PRIO + NICE_WIDTH / 2 + MAX_PRIORITY_ADJ + 1)
8158 -+#endif
8159 -+
8160 -+#ifdef CONFIG_SCHED_PDS
8161 -+/* bits: RT(0-99), reserved(100-127), NORMAL_PRIO_NUM, cpu idle task */
8162 -+#define SCHED_BITS (MIN_NORMAL_PRIO + NORMAL_PRIO_NUM + 1)
8163 -+#endif /* CONFIG_SCHED_PDS */
8164 -+
8165 -+#define IDLE_TASK_SCHED_PRIO (SCHED_BITS - 1)
8166 -+
8167 -+#ifdef CONFIG_SCHED_DEBUG
8168 -+# define SCHED_WARN_ON(x) WARN_ONCE(x, #x)
8169 -+extern void resched_latency_warn(int cpu, u64 latency);
8170 -+#else
8171 -+# define SCHED_WARN_ON(x) ({ (void)(x), 0; })
8172 -+static inline void resched_latency_warn(int cpu, u64 latency) {}
8173 -+#endif
8174 -+
8175 -+/*
8176 -+ * Increase resolution of nice-level calculations for 64-bit architectures.
8177 -+ * The extra resolution improves shares distribution and load balancing of
8178 -+ * low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
8179 -+ * hierarchies, especially on larger systems. This is not a user-visible change
8180 -+ * and does not change the user-interface for setting shares/weights.
8181 -+ *
8182 -+ * We increase resolution only if we have enough bits to allow this increased
8183 -+ * resolution (i.e. 64-bit). The costs for increasing resolution when 32-bit
8184 -+ * are pretty high and the returns do not justify the increased costs.
8185 -+ *
8186 -+ * Really only required when CONFIG_FAIR_GROUP_SCHED=y is also set, but to
8187 -+ * increase coverage and consistency always enable it on 64-bit platforms.
8188 -+ */
8189 -+#ifdef CONFIG_64BIT
8190 -+# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
8191 -+# define scale_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
8192 -+# define scale_load_down(w) \
8193 -+({ \
8194 -+ unsigned long __w = (w); \
8195 -+ if (__w) \
8196 -+ __w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
8197 -+ __w; \
8198 -+})
8199 -+#else
8200 -+# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
8201 -+# define scale_load(w) (w)
8202 -+# define scale_load_down(w) (w)
8203 -+#endif
8204 -+
8205 -+#ifdef CONFIG_FAIR_GROUP_SCHED
8206 -+#define ROOT_TASK_GROUP_LOAD NICE_0_LOAD
8207 -+
8208 -+/*
8209 -+ * A weight of 0 or 1 can cause arithmetic problems.
8210 -+ * A weight of a cfs_rq is the sum of the weights of the entities
8211 -+ * queued on this cfs_rq, so the weight of an entity should not be
8212 -+ * too large, and neither should the shares value of a task group.
8213 -+ * (The default weight is 1024 - so there's no practical
8214 -+ * limitation from this.)
8215 -+ */
8216 -+#define MIN_SHARES (1UL << 1)
8217 -+#define MAX_SHARES (1UL << 18)
8218 -+#endif
8219 -+
8220 -+/* task_struct::on_rq states: */
8221 -+#define TASK_ON_RQ_QUEUED 1
8222 -+#define TASK_ON_RQ_MIGRATING 2
8223 -+
8224 -+static inline int task_on_rq_queued(struct task_struct *p)
8225 -+{
8226 -+ return p->on_rq == TASK_ON_RQ_QUEUED;
8227 -+}
8228 -+
8229 -+static inline int task_on_rq_migrating(struct task_struct *p)
8230 -+{
8231 -+ return READ_ONCE(p->on_rq) == TASK_ON_RQ_MIGRATING;
8232 -+}
8233 -+
8234 -+/*
8235 -+ * wake flags
8236 -+ */
8237 -+#define WF_SYNC 0x01 /* waker goes to sleep after wakeup */
8238 -+#define WF_FORK 0x02 /* child wakeup after fork */
8239 -+#define WF_MIGRATED 0x04 /* internal use, task got migrated */
8240 -+#define WF_ON_CPU 0x08 /* Wakee is on_rq */
8241 -+
8242 -+#define SCHED_QUEUE_BITS (SCHED_BITS - 1)
8243 -+
8244 -+struct sched_queue {
8245 -+ DECLARE_BITMAP(bitmap, SCHED_QUEUE_BITS);
8246 -+ struct list_head heads[SCHED_BITS];
8247 -+};
8248 -+
8249 -+/*
8250 -+ * This is the main, per-CPU runqueue data structure.
8251 -+ * This data should only be modified by the local cpu.
8252 -+ */
8253 -+struct rq {
8254 -+ /* runqueue lock: */
8255 -+ raw_spinlock_t lock;
8256 -+
8257 -+ struct task_struct __rcu *curr;
8258 -+ struct task_struct *idle, *stop, *skip;
8259 -+ struct mm_struct *prev_mm;
8260 -+
8261 -+ struct sched_queue queue;
8262 -+#ifdef CONFIG_SCHED_PDS
8263 -+ u64 time_edge;
8264 -+#endif
8265 -+ unsigned long watermark;
8266 -+
8267 -+ /* switch count */
8268 -+ u64 nr_switches;
8269 -+
8270 -+ atomic_t nr_iowait;
8271 -+
8272 -+#ifdef CONFIG_SCHED_DEBUG
8273 -+ u64 last_seen_need_resched_ns;
8274 -+ int ticks_without_resched;
8275 -+#endif
8276 -+
8277 -+#ifdef CONFIG_MEMBARRIER
8278 -+ int membarrier_state;
8279 -+#endif
8280 -+
8281 -+#ifdef CONFIG_SMP
8282 -+ int cpu; /* cpu of this runqueue */
8283 -+ bool online;
8284 -+
8285 -+ unsigned int ttwu_pending;
8286 -+ unsigned char nohz_idle_balance;
8287 -+ unsigned char idle_balance;
8288 -+
8289 -+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
8290 -+ struct sched_avg avg_irq;
8291 -+#endif
8292 -+
8293 -+#ifdef CONFIG_SCHED_SMT
8294 -+ int active_balance;
8295 -+ struct cpu_stop_work active_balance_work;
8296 -+#endif
8297 -+ struct callback_head *balance_callback;
8298 -+#ifdef CONFIG_HOTPLUG_CPU
8299 -+ struct rcuwait hotplug_wait;
8300 -+#endif
8301 -+ unsigned int nr_pinned;
8302 -+
8303 -+#endif /* CONFIG_SMP */
8304 -+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
8305 -+ u64 prev_irq_time;
8306 -+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
8307 -+#ifdef CONFIG_PARAVIRT
8308 -+ u64 prev_steal_time;
8309 -+#endif /* CONFIG_PARAVIRT */
8310 -+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
8311 -+ u64 prev_steal_time_rq;
8312 -+#endif /* CONFIG_PARAVIRT_TIME_ACCOUNTING */
8313 -+
8314 -+ /* For general cpu load util */
8315 -+ s32 load_history;
8316 -+ u64 load_block;
8317 -+ u64 load_stamp;
8318 -+
8319 -+ /* calc_load related fields */
8320 -+ unsigned long calc_load_update;
8321 -+ long calc_load_active;
8322 -+
8323 -+ u64 clock, last_tick;
8324 -+ u64 last_ts_switch;
8325 -+ u64 clock_task;
8326 -+
8327 -+ unsigned int nr_running;
8328 -+ unsigned long nr_uninterruptible;
8329 -+
8330 -+#ifdef CONFIG_SCHED_HRTICK
8331 -+#ifdef CONFIG_SMP
8332 -+ call_single_data_t hrtick_csd;
8333 -+#endif
8334 -+ struct hrtimer hrtick_timer;
8335 -+ ktime_t hrtick_time;
8336 -+#endif
8337 -+
8338 -+#ifdef CONFIG_SCHEDSTATS
8339 -+
8340 -+ /* latency stats */
8341 -+ struct sched_info rq_sched_info;
8342 -+ unsigned long long rq_cpu_time;
8343 -+ /* could above be rq->cfs_rq.exec_clock + rq->rt_rq.rt_runtime ? */
8344 -+
8345 -+ /* sys_sched_yield() stats */
8346 -+ unsigned int yld_count;
8347 -+
8348 -+ /* schedule() stats */
8349 -+ unsigned int sched_switch;
8350 -+ unsigned int sched_count;
8351 -+ unsigned int sched_goidle;
8352 -+
8353 -+ /* try_to_wake_up() stats */
8354 -+ unsigned int ttwu_count;
8355 -+ unsigned int ttwu_local;
8356 -+#endif /* CONFIG_SCHEDSTATS */
8357 -+
8358 -+#ifdef CONFIG_CPU_IDLE
8359 -+ /* Must be inspected within a rcu lock section */
8360 -+ struct cpuidle_state *idle_state;
8361 -+#endif
8362 -+
8363 -+#ifdef CONFIG_NO_HZ_COMMON
8364 -+#ifdef CONFIG_SMP
8365 -+ call_single_data_t nohz_csd;
8366 -+#endif
8367 -+ atomic_t nohz_flags;
8368 -+#endif /* CONFIG_NO_HZ_COMMON */
8369 -+};
8370 -+
8371 -+extern unsigned long rq_load_util(struct rq *rq, unsigned long max);
8372 -+
8373 -+extern unsigned long calc_load_update;
8374 -+extern atomic_long_t calc_load_tasks;
8375 -+
8376 -+extern void calc_global_load_tick(struct rq *this_rq);
8377 -+extern long calc_load_fold_active(struct rq *this_rq, long adjust);
8378 -+
8379 -+DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
8380 -+#define cpu_rq(cpu) (&per_cpu(runqueues, (cpu)))
8381 -+#define this_rq() this_cpu_ptr(&runqueues)
8382 -+#define task_rq(p) cpu_rq(task_cpu(p))
8383 -+#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
8384 -+#define raw_rq() raw_cpu_ptr(&runqueues)
8385 -+
8386 -+#ifdef CONFIG_SMP
8387 -+#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
8388 -+void register_sched_domain_sysctl(void);
8389 -+void unregister_sched_domain_sysctl(void);
8390 -+#else
8391 -+static inline void register_sched_domain_sysctl(void)
8392 -+{
8393 -+}
8394 -+static inline void unregister_sched_domain_sysctl(void)
8395 -+{
8396 -+}
8397 -+#endif
8398 -+
8399 -+extern bool sched_smp_initialized;
8400 -+
8401 -+enum {
8402 -+ ITSELF_LEVEL_SPACE_HOLDER,
8403 -+#ifdef CONFIG_SCHED_SMT
8404 -+ SMT_LEVEL_SPACE_HOLDER,
8405 -+#endif
8406 -+ COREGROUP_LEVEL_SPACE_HOLDER,
8407 -+ CORE_LEVEL_SPACE_HOLDER,
8408 -+ OTHER_LEVEL_SPACE_HOLDER,
8409 -+ NR_CPU_AFFINITY_LEVELS
8410 -+};
8411 -+
8412 -+DECLARE_PER_CPU(cpumask_t [NR_CPU_AFFINITY_LEVELS], sched_cpu_topo_masks);
8413 -+DECLARE_PER_CPU(cpumask_t *, sched_cpu_llc_mask);
8414 -+
8415 -+static inline int
8416 -+__best_mask_cpu(const cpumask_t *cpumask, const cpumask_t *mask)
8417 -+{
8418 -+ int cpu;
8419 -+
8420 -+ while ((cpu = cpumask_any_and(cpumask, mask)) >= nr_cpu_ids)
8421 -+ mask++;
8422 -+
8423 -+ return cpu;
8424 -+}
8425 -+
8426 -+static inline int best_mask_cpu(int cpu, const cpumask_t *mask)
8427 -+{
8428 -+ return __best_mask_cpu(mask, per_cpu(sched_cpu_topo_masks, cpu));
8429 -+}
8430 -+
8431 -+extern void flush_smp_call_function_from_idle(void);
8432 -+
8433 -+#else /* !CONFIG_SMP */
8434 -+static inline void flush_smp_call_function_from_idle(void) { }
8435 -+#endif
8436 -+
8437 -+#ifndef arch_scale_freq_tick
8438 -+static __always_inline
8439 -+void arch_scale_freq_tick(void)
8440 -+{
8441 -+}
8442 -+#endif
8443 -+
8444 -+#ifndef arch_scale_freq_capacity
8445 -+static __always_inline
8446 -+unsigned long arch_scale_freq_capacity(int cpu)
8447 -+{
8448 -+ return SCHED_CAPACITY_SCALE;
8449 -+}
8450 -+#endif
8451 -+
8452 -+static inline u64 __rq_clock_broken(struct rq *rq)
8453 -+{
8454 -+ return READ_ONCE(rq->clock);
8455 -+}
8456 -+
8457 -+static inline u64 rq_clock(struct rq *rq)
8458 -+{
8459 -+ /*
8460 -+ * Relax lockdep_assert_held() checking as in VRQ, callers of
8461 -+ * sched_info_xxxx() may not hold rq->lock
8462 -+ * lockdep_assert_held(&rq->lock);
8463 -+ */
8464 -+ return rq->clock;
8465 -+}
8466 -+
8467 -+static inline u64 rq_clock_task(struct rq *rq)
8468 -+{
8469 -+ /*
8470 -+ * Relax lockdep_assert_held() checking as in VRQ, callers of
8471 -+ * sched_info_xxxx() may not hold rq->lock
8472 -+ * lockdep_assert_held(&rq->lock);
8473 -+ */
8474 -+ return rq->clock_task;
8475 -+}
8476 -+
8477 -+/*
8478 -+ * {de,en}queue flags:
8479 -+ *
8480 -+ * DEQUEUE_SLEEP - task is no longer runnable
8481 -+ * ENQUEUE_WAKEUP - task just became runnable
8482 -+ *
8483 -+ */
8484 -+
8485 -+#define DEQUEUE_SLEEP 0x01
8486 -+
8487 -+#define ENQUEUE_WAKEUP 0x01
8488 -+
8489 -+
8490 -+/*
8491 -+ * Below are scheduler APIs which are used in other kernel code.
8492 -+ * They use the dummy rq_flags.
8493 -+ * ToDo : BMQ needs to support these APIs for compatibility with mainline
8494 -+ * scheduler code.
8495 -+ */
8496 -+struct rq_flags {
8497 -+ unsigned long flags;
8498 -+};
8499 -+
8500 -+struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
8501 -+ __acquires(rq->lock);
8502 -+
8503 -+struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
8504 -+ __acquires(p->pi_lock)
8505 -+ __acquires(rq->lock);
8506 -+
8507 -+static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
8508 -+ __releases(rq->lock)
8509 -+{
8510 -+ raw_spin_unlock(&rq->lock);
8511 -+}
8512 -+
8513 -+static inline void
8514 -+task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
8515 -+ __releases(rq->lock)
8516 -+ __releases(p->pi_lock)
8517 -+{
8518 -+ raw_spin_unlock(&rq->lock);
8519 -+ raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
8520 -+}
8521 -+
8522 -+static inline void
8523 -+rq_lock(struct rq *rq, struct rq_flags *rf)
8524 -+ __acquires(rq->lock)
8525 -+{
8526 -+ raw_spin_lock(&rq->lock);
8527 -+}
8528 -+
8529 -+static inline void
8530 -+rq_unlock_irq(struct rq *rq, struct rq_flags *rf)
8531 -+ __releases(rq->lock)
8532 -+{
8533 -+ raw_spin_unlock_irq(&rq->lock);
8534 -+}
8535 -+
8536 -+static inline void
8537 -+rq_unlock(struct rq *rq, struct rq_flags *rf)
8538 -+ __releases(rq->lock)
8539 -+{
8540 -+ raw_spin_unlock(&rq->lock);
8541 -+}
8542 -+
8543 -+static inline struct rq *
8544 -+this_rq_lock_irq(struct rq_flags *rf)
8545 -+ __acquires(rq->lock)
8546 -+{
8547 -+ struct rq *rq;
8548 -+
8549 -+ local_irq_disable();
8550 -+ rq = this_rq();
8551 -+ raw_spin_lock(&rq->lock);
8552 -+
8553 -+ return rq;
8554 -+}
8555 -+
8556 -+extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass);
8557 -+extern void raw_spin_rq_unlock(struct rq *rq);
8558 -+
8559 -+static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
8560 -+{
8561 -+ return &rq->lock;
8562 -+}
8563 -+
8564 -+static inline raw_spinlock_t *rq_lockp(struct rq *rq)
8565 -+{
8566 -+ return __rq_lockp(rq);
8567 -+}
8568 -+
8569 -+static inline void raw_spin_rq_lock(struct rq *rq)
8570 -+{
8571 -+ raw_spin_rq_lock_nested(rq, 0);
8572 -+}
8573 -+
8574 -+static inline void raw_spin_rq_lock_irq(struct rq *rq)
8575 -+{
8576 -+ local_irq_disable();
8577 -+ raw_spin_rq_lock(rq);
8578 -+}
8579 -+
8580 -+static inline void raw_spin_rq_unlock_irq(struct rq *rq)
8581 -+{
8582 -+ raw_spin_rq_unlock(rq);
8583 -+ local_irq_enable();
8584 -+}
8585 -+
8586 -+static inline int task_current(struct rq *rq, struct task_struct *p)
8587 -+{
8588 -+ return rq->curr == p;
8589 -+}
8590 -+
8591 -+static inline bool task_running(struct task_struct *p)
8592 -+{
8593 -+ return p->on_cpu;
8594 -+}
8595 -+
8596 -+extern int task_running_nice(struct task_struct *p);
8597 -+
8598 -+extern struct static_key_false sched_schedstats;
8599 -+
8600 -+#ifdef CONFIG_CPU_IDLE
8601 -+static inline void idle_set_state(struct rq *rq,
8602 -+ struct cpuidle_state *idle_state)
8603 -+{
8604 -+ rq->idle_state = idle_state;
8605 -+}
8606 -+
8607 -+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
8608 -+{
8609 -+ WARN_ON(!rcu_read_lock_held());
8610 -+ return rq->idle_state;
8611 -+}
8612 -+#else
8613 -+static inline void idle_set_state(struct rq *rq,
8614 -+ struct cpuidle_state *idle_state)
8615 -+{
8616 -+}
8617 -+
8618 -+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
8619 -+{
8620 -+ return NULL;
8621 -+}
8622 -+#endif
8623 -+
8624 -+static inline int cpu_of(const struct rq *rq)
8625 -+{
8626 -+#ifdef CONFIG_SMP
8627 -+ return rq->cpu;
8628 -+#else
8629 -+ return 0;
8630 -+#endif
8631 -+}
8632 -+
8633 -+#include "stats.h"
8634 -+
8635 -+#ifdef CONFIG_NO_HZ_COMMON
8636 -+#define NOHZ_BALANCE_KICK_BIT 0
8637 -+#define NOHZ_STATS_KICK_BIT 1
8638 -+
8639 -+#define NOHZ_BALANCE_KICK BIT(NOHZ_BALANCE_KICK_BIT)
8640 -+#define NOHZ_STATS_KICK BIT(NOHZ_STATS_KICK_BIT)
8641 -+
8642 -+#define NOHZ_KICK_MASK (NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)
8643 -+
8644 -+#define nohz_flags(cpu) (&cpu_rq(cpu)->nohz_flags)
8645 -+
8646 -+/* TODO: needed?
8647 -+extern void nohz_balance_exit_idle(struct rq *rq);
8648 -+#else
8649 -+static inline void nohz_balance_exit_idle(struct rq *rq) { }
8650 -+*/
8651 -+#endif
8652 -+
8653 -+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
8654 -+struct irqtime {
8655 -+ u64 total;
8656 -+ u64 tick_delta;
8657 -+ u64 irq_start_time;
8658 -+ struct u64_stats_sync sync;
8659 -+};
8660 -+
8661 -+DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
8662 -+
8663 -+/*
8664 -+ * Returns the irqtime minus the softirq time computed by ksoftirqd.
8665 -+ * Otherwise ksoftirqd's sum_exec_runtime is subtracted its own runtime
8666 -+ * and never moves forward.
8667 -+ */
8668 -+static inline u64 irq_time_read(int cpu)
8669 -+{
8670 -+ struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
8671 -+ unsigned int seq;
8672 -+ u64 total;
8673 -+
8674 -+ do {
8675 -+ seq = __u64_stats_fetch_begin(&irqtime->sync);
8676 -+ total = irqtime->total;
8677 -+ } while (__u64_stats_fetch_retry(&irqtime->sync, seq));
8678 -+
8679 -+ return total;
8680 -+}
8681 -+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
8682 -+
8683 -+#ifdef CONFIG_CPU_FREQ
8684 -+DECLARE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
8685 -+#endif /* CONFIG_CPU_FREQ */
8686 -+
8687 -+#ifdef CONFIG_NO_HZ_FULL
8688 -+extern int __init sched_tick_offload_init(void);
8689 -+#else
8690 -+static inline int sched_tick_offload_init(void) { return 0; }
8691 -+#endif
8692 -+
8693 -+#ifdef arch_scale_freq_capacity
8694 -+#ifndef arch_scale_freq_invariant
8695 -+#define arch_scale_freq_invariant() (true)
8696 -+#endif
8697 -+#else /* arch_scale_freq_capacity */
8698 -+#define arch_scale_freq_invariant() (false)
8699 -+#endif
8700 -+
8701 -+extern void schedule_idle(void);
8702 -+
8703 -+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
8704 -+
8705 -+/*
8706 -+ * !! For sched_setattr_nocheck() (kernel) only !!
8707 -+ *
8708 -+ * This is actually gross. :(
8709 -+ *
8710 -+ * It is used to make schedutil kworker(s) higher priority than SCHED_DEADLINE
8711 -+ * tasks, but still be able to sleep. We need this on platforms that cannot
8712 -+ * atomically change clock frequency. Remove once fast switching will be
8713 -+ * available on such platforms.
8714 -+ *
8715 -+ * SUGOV stands for SchedUtil GOVernor.
8716 -+ */
8717 -+#define SCHED_FLAG_SUGOV 0x10000000
8718 -+
8719 -+#ifdef CONFIG_MEMBARRIER
8720 -+/*
8721 -+ * The scheduler provides memory barriers required by membarrier between:
8722 -+ * - prior user-space memory accesses and store to rq->membarrier_state,
8723 -+ * - store to rq->membarrier_state and following user-space memory accesses.
8724 -+ * In the same way it provides those guarantees around store to rq->curr.
8725 -+ */
8726 -+static inline void membarrier_switch_mm(struct rq *rq,
8727 -+ struct mm_struct *prev_mm,
8728 -+ struct mm_struct *next_mm)
8729 -+{
8730 -+ int membarrier_state;
8731 -+
8732 -+ if (prev_mm == next_mm)
8733 -+ return;
8734 -+
8735 -+ membarrier_state = atomic_read(&next_mm->membarrier_state);
8736 -+ if (READ_ONCE(rq->membarrier_state) == membarrier_state)
8737 -+ return;
8738 -+
8739 -+ WRITE_ONCE(rq->membarrier_state, membarrier_state);
8740 -+}
8741 -+#else
8742 -+static inline void membarrier_switch_mm(struct rq *rq,
8743 -+ struct mm_struct *prev_mm,
8744 -+ struct mm_struct *next_mm)
8745 -+{
8746 -+}
8747 -+#endif
8748 -+
8749 -+#ifdef CONFIG_NUMA
8750 -+extern int sched_numa_find_closest(const struct cpumask *cpus, int cpu);
8751 -+#else
8752 -+static inline int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
8753 -+{
8754 -+ return nr_cpu_ids;
8755 -+}
8756 -+#endif
8757 -+
8758 -+extern void swake_up_all_locked(struct swait_queue_head *q);
8759 -+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
8760 -+
8761 -+#ifdef CONFIG_PREEMPT_DYNAMIC
8762 -+extern int preempt_dynamic_mode;
8763 -+extern int sched_dynamic_mode(const char *str);
8764 -+extern void sched_dynamic_update(int mode);
8765 -+#endif
8766 -+
8767 -+static inline void nohz_run_idle_balance(int cpu) { }
8768 -+#endif /* ALT_SCHED_H */
8769 -diff --git a/kernel/sched/bmq.h b/kernel/sched/bmq.h
8770 -new file mode 100644
8771 -index 000000000000..be3ee4a553ca
8772 ---- /dev/null
8773 -+++ b/kernel/sched/bmq.h
8774 -@@ -0,0 +1,111 @@
8775 -+#define ALT_SCHED_VERSION_MSG "sched/bmq: BMQ CPU Scheduler "ALT_SCHED_VERSION" by Alfred Chen.\n"
8776 -+
8777 -+/*
8778 -+ * BMQ only routines
8779 -+ */
8780 -+#define rq_switch_time(rq) ((rq)->clock - (rq)->last_ts_switch)
8781 -+#define boost_threshold(p) (sched_timeslice_ns >>\
8782 -+ (15 - MAX_PRIORITY_ADJ - (p)->boost_prio))
8783 -+
8784 -+static inline void boost_task(struct task_struct *p)
8785 -+{
8786 -+ int limit;
8787 -+
8788 -+ switch (p->policy) {
8789 -+ case SCHED_NORMAL:
8790 -+ limit = -MAX_PRIORITY_ADJ;
8791 -+ break;
8792 -+ case SCHED_BATCH:
8793 -+ case SCHED_IDLE:
8794 -+ limit = 0;
8795 -+ break;
8796 -+ default:
8797 -+ return;
8798 -+ }
8799 -+
8800 -+ if (p->boost_prio > limit)
8801 -+ p->boost_prio--;
8802 -+}
8803 -+
8804 -+static inline void deboost_task(struct task_struct *p)
8805 -+{
8806 -+ if (p->boost_prio < MAX_PRIORITY_ADJ)
8807 -+ p->boost_prio++;
8808 -+}
8809 -+
8810 -+/*
8811 -+ * Common interfaces
8812 -+ */
8813 -+static inline void sched_timeslice_imp(const int timeslice_ms) {}
8814 -+
8815 -+static inline int
8816 -+task_sched_prio_normal(const struct task_struct *p, const struct rq *rq)
8817 -+{
8818 -+ return p->prio + p->boost_prio - MAX_RT_PRIO;
8819 -+}
8820 -+
8821 -+static inline int task_sched_prio(const struct task_struct *p)
8822 -+{
8823 -+ return (p->prio < MAX_RT_PRIO)? p->prio : MAX_RT_PRIO / 2 + (p->prio + p->boost_prio) / 2;
8824 -+}
8825 -+
8826 -+static inline int
8827 -+task_sched_prio_idx(const struct task_struct *p, const struct rq *rq)
8828 -+{
8829 -+ return task_sched_prio(p);
8830 -+}
8831 -+
8832 -+static inline int sched_prio2idx(int prio, struct rq *rq)
8833 -+{
8834 -+ return prio;
8835 -+}
8836 -+
8837 -+static inline int sched_idx2prio(int idx, struct rq *rq)
8838 -+{
8839 -+ return idx;
8840 -+}
8841 -+
8842 -+static inline void time_slice_expired(struct task_struct *p, struct rq *rq)
8843 -+{
8844 -+ p->time_slice = sched_timeslice_ns;
8845 -+
8846 -+ if (SCHED_FIFO != p->policy && task_on_rq_queued(p)) {
8847 -+ if (SCHED_RR != p->policy)
8848 -+ deboost_task(p);
8849 -+ requeue_task(p, rq);
8850 -+ }
8851 -+}
8852 -+
8853 -+static inline void sched_task_sanity_check(struct task_struct *p, struct rq *rq) {}
8854 -+
8855 -+inline int task_running_nice(struct task_struct *p)
8856 -+{
8857 -+ return (p->prio + p->boost_prio > DEFAULT_PRIO + MAX_PRIORITY_ADJ);
8858 -+}
8859 -+
8860 -+static void sched_task_fork(struct task_struct *p, struct rq *rq)
8861 -+{
8862 -+ p->boost_prio = (p->boost_prio < 0) ?
8863 -+ p->boost_prio + MAX_PRIORITY_ADJ : MAX_PRIORITY_ADJ;
8864 -+}
8865 -+
8866 -+static inline void do_sched_yield_type_1(struct task_struct *p, struct rq *rq)
8867 -+{
8868 -+ p->boost_prio = MAX_PRIORITY_ADJ;
8869 -+}
8870 -+
8871 -+#ifdef CONFIG_SMP
8872 -+static inline void sched_task_ttwu(struct task_struct *p)
8873 -+{
8874 -+ if(this_rq()->clock_task - p->last_ran > sched_timeslice_ns)
8875 -+ boost_task(p);
8876 -+}
8877 -+#endif
8878 -+
8879 -+static inline void sched_task_deactivate(struct task_struct *p, struct rq *rq)
8880 -+{
8881 -+ if (rq_switch_time(rq) < boost_threshold(p))
8882 -+ boost_task(p);
8883 -+}
8884 -+
8885 -+static inline void update_rq_time_edge(struct rq *rq) {}
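For reference, the mapping above folds prio and boost_prio into a single queue
index: realtime tasks keep their priority, everything else lands in the upper
half of the range shifted by the boost. A standalone sketch of
task_sched_prio() with assumed constants (MAX_RT_PRIO = 100 matches the
mainline default; MAX_PRIORITY_ADJ = 4 is illustrative, not taken from this
patch):

#include <stdio.h>

#define MAX_RT_PRIO		100	/* prio 0..99 are realtime */
#define MAX_PRIORITY_ADJ	4	/* assumed boost range */

/* Mirrors task_sched_prio() in bmq.h above. */
static int task_sched_prio(int prio, int boost_prio)
{
	return (prio < MAX_RT_PRIO) ? prio
				    : MAX_RT_PRIO / 2 + (prio + boost_prio) / 2;
}

int main(void)
{
	/* nice 0 corresponds to prio 120; boosting sorts a task earlier. */
	printf("nice 0, boost -4 -> %d\n", task_sched_prio(120, -MAX_PRIORITY_ADJ));
	printf("nice 0, boost  0 -> %d\n", task_sched_prio(120, 0));
	printf("nice 0, boost +4 -> %d\n", task_sched_prio(120, MAX_PRIORITY_ADJ));
	printf("RT prio 10       -> %d\n", task_sched_prio(10, 0));
	return 0;
}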
8886 -diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
8887 -index 57124614363d..f0e9c7543542 100644
8888 ---- a/kernel/sched/cpufreq_schedutil.c
8889 -+++ b/kernel/sched/cpufreq_schedutil.c
8890 -@@ -167,9 +167,14 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
8891 - unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);
8892 -
8893 - sg_cpu->max = max;
8894 -+#ifndef CONFIG_SCHED_ALT
8895 - sg_cpu->bw_dl = cpu_bw_dl(rq);
8896 - sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(rq), max,
8897 - FREQUENCY_UTIL, NULL);
8898 -+#else
8899 -+ sg_cpu->bw_dl = 0;
8900 -+ sg_cpu->util = rq_load_util(rq, max);
8901 -+#endif /* CONFIG_SCHED_ALT */
8902 - }
8903 -
8904 - /**
8905 -@@ -312,8 +317,10 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
8906 - */
8907 - static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
8908 - {
8909 -+#ifndef CONFIG_SCHED_ALT
8910 - if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
8911 - sg_cpu->sg_policy->limits_changed = true;
8912 -+#endif
8913 - }
8914 -
8915 - static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
8916 -@@ -599,6 +606,7 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
8917 - }
8918 -
8919 - ret = sched_setattr_nocheck(thread, &attr);
8920 -+
8921 - if (ret) {
8922 - kthread_stop(thread);
8923 - pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
8924 -@@ -833,7 +841,9 @@ cpufreq_governor_init(schedutil_gov);
8925 - #ifdef CONFIG_ENERGY_MODEL
8926 - static void rebuild_sd_workfn(struct work_struct *work)
8927 - {
8928 -+#ifndef CONFIG_SCHED_ALT
8929 - rebuild_sched_domains_energy();
8930 -+#endif /* CONFIG_SCHED_ALT */
8931 - }
8932 - static DECLARE_WORK(rebuild_sd_work, rebuild_sd_workfn);
8933 -
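With CONFIG_SCHED_ALT the deadline bandwidth term is dropped and sg_cpu->util
comes from rq_load_util() instead of effective_cpu_util(); either way
schedutil still converts utilization into a frequency request with the usual
25% headroom. A simplified sketch of that conversion, mirroring upstream's
get_next_freq()/map_util_freq() (which is not part of this hunk), with
made-up capacity and clock numbers:

#include <stdio.h>

/* next_freq = 1.25 * max_freq * util / max_capacity,
 * written in integer math as freq + (freq >> 2). */
static unsigned long next_freq(unsigned long util, unsigned long max_cap,
			       unsigned long max_freq_khz)
{
	unsigned long freq = max_freq_khz + (max_freq_khz >> 2);

	return freq * util / max_cap;
}

int main(void)
{
	/* A CPU with capacity 1024 and a 3.0 GHz max clock. */
	printf("util  512/1024 -> %lu kHz\n", next_freq(512, 1024, 3000000));
	printf("util 1024/1024 -> %lu kHz\n", next_freq(1024, 1024, 3000000));
	return 0;
}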
8934 -diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
8935 -index 872e481d5098..f920c8b48ec1 100644
8936 ---- a/kernel/sched/cputime.c
8937 -+++ b/kernel/sched/cputime.c
8938 -@@ -123,7 +123,7 @@ void account_user_time(struct task_struct *p, u64 cputime)
8939 - p->utime += cputime;
8940 - account_group_user_time(p, cputime);
8941 -
8942 -- index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;
8943 -+ index = task_running_nice(p) ? CPUTIME_NICE : CPUTIME_USER;
8944 -
8945 - /* Add user time to cpustat. */
8946 - task_group_account_field(p, index, cputime);
8947 -@@ -147,7 +147,7 @@ void account_guest_time(struct task_struct *p, u64 cputime)
8948 - p->gtime += cputime;
8949 -
8950 - /* Add guest time to cpustat. */
8951 -- if (task_nice(p) > 0) {
8952 -+ if (task_running_nice(p)) {
8953 - cpustat[CPUTIME_NICE] += cputime;
8954 - cpustat[CPUTIME_GUEST_NICE] += cputime;
8955 - } else {
8956 -@@ -270,7 +270,7 @@ static inline u64 account_other_time(u64 max)
8957 - #ifdef CONFIG_64BIT
8958 - static inline u64 read_sum_exec_runtime(struct task_struct *t)
8959 - {
8960 -- return t->se.sum_exec_runtime;
8961 -+ return tsk_seruntime(t);
8962 - }
8963 - #else
8964 - static u64 read_sum_exec_runtime(struct task_struct *t)
8965 -@@ -280,7 +280,7 @@ static u64 read_sum_exec_runtime(struct task_struct *t)
8966 - struct rq *rq;
8967 -
8968 - rq = task_rq_lock(t, &rf);
8969 -- ns = t->se.sum_exec_runtime;
8970 -+ ns = tsk_seruntime(t);
8971 - task_rq_unlock(rq, t, &rf);
8972 -
8973 - return ns;
8974 -@@ -612,7 +612,7 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
8975 - void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
8976 - {
8977 - struct task_cputime cputime = {
8978 -- .sum_exec_runtime = p->se.sum_exec_runtime,
8979 -+ .sum_exec_runtime = tsk_seruntime(p),
8980 - };
8981 -
8982 - task_cputime(p, &cputime.utime, &cputime.stime);
8983 -diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
8984 -index 0c5ec2776ddf..e3f4fe3f6e2c 100644
8985 ---- a/kernel/sched/debug.c
8986 -+++ b/kernel/sched/debug.c
8987 -@@ -8,6 +8,7 @@
8988 - */
8989 - #include "sched.h"
8990 -
8991 -+#ifndef CONFIG_SCHED_ALT
8992 - /*
8993 - * This allows printing both to /proc/sched_debug and
8994 - * to the console
8995 -@@ -210,6 +211,7 @@ static const struct file_operations sched_scaling_fops = {
8996 - };
8997 -
8998 - #endif /* SMP */
8999 -+#endif /* !CONFIG_SCHED_ALT */
9000 -
9001 - #ifdef CONFIG_PREEMPT_DYNAMIC
9002 -
9003 -@@ -273,6 +275,7 @@ static const struct file_operations sched_dynamic_fops = {
9004 -
9005 - #endif /* CONFIG_PREEMPT_DYNAMIC */
9006 -
9007 -+#ifndef CONFIG_SCHED_ALT
9008 - __read_mostly bool sched_debug_verbose;
9009 -
9010 - static const struct seq_operations sched_debug_sops;
9011 -@@ -288,6 +291,7 @@ static const struct file_operations sched_debug_fops = {
9012 - .llseek = seq_lseek,
9013 - .release = seq_release,
9014 - };
9015 -+#endif /* !CONFIG_SCHED_ALT */
9016 -
9017 - static struct dentry *debugfs_sched;
9018 -
9019 -@@ -297,12 +301,15 @@ static __init int sched_init_debug(void)
9020 -
9021 - debugfs_sched = debugfs_create_dir("sched", NULL);
9022 -
9023 -+#ifndef CONFIG_SCHED_ALT
9024 - debugfs_create_file("features", 0644, debugfs_sched, NULL, &sched_feat_fops);
9025 - debugfs_create_bool("verbose", 0644, debugfs_sched, &sched_debug_verbose);
9026 -+#endif /* !CONFIG_SCHED_ALT */
9027 - #ifdef CONFIG_PREEMPT_DYNAMIC
9028 - debugfs_create_file("preempt", 0644, debugfs_sched, NULL, &sched_dynamic_fops);
9029 - #endif
9030 -
9031 -+#ifndef CONFIG_SCHED_ALT
9032 - debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
9033 - debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
9034 - debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);
9035 -@@ -330,11 +337,13 @@ static __init int sched_init_debug(void)
9036 - #endif
9037 -
9038 - debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);
9039 -+#endif /* !CONFIG_SCHED_ALT */
9040 -
9041 - return 0;
9042 - }
9043 - late_initcall(sched_init_debug);
9044 -
9045 -+#ifndef CONFIG_SCHED_ALT
9046 - #ifdef CONFIG_SMP
9047 -
9048 - static cpumask_var_t sd_sysctl_cpus;
9049 -@@ -1047,6 +1056,7 @@ void proc_sched_set_task(struct task_struct *p)
9050 - memset(&p->se.statistics, 0, sizeof(p->se.statistics));
9051 - #endif
9052 - }
9053 -+#endif /* !CONFIG_SCHED_ALT */
9054 -
9055 - void resched_latency_warn(int cpu, u64 latency)
9056 - {
9057 -diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
9058 -index 912b47aa99d8..7f6b13883c2a 100644
9059 ---- a/kernel/sched/idle.c
9060 -+++ b/kernel/sched/idle.c
9061 -@@ -403,6 +403,7 @@ void cpu_startup_entry(enum cpuhp_state state)
9062 - do_idle();
9063 - }
9064 -
9065 -+#ifndef CONFIG_SCHED_ALT
9066 - /*
9067 - * idle-task scheduling class.
9068 - */
9069 -@@ -525,3 +526,4 @@ DEFINE_SCHED_CLASS(idle) = {
9070 - .switched_to = switched_to_idle,
9071 - .update_curr = update_curr_idle,
9072 - };
9073 -+#endif
9074 -diff --git a/kernel/sched/pds.h b/kernel/sched/pds.h
9075 -new file mode 100644
9076 -index 000000000000..0f1f0d708b77
9077 ---- /dev/null
9078 -+++ b/kernel/sched/pds.h
9079 -@@ -0,0 +1,127 @@
9080 -+#define ALT_SCHED_VERSION_MSG "sched/pds: PDS CPU Scheduler "ALT_SCHED_VERSION" by Alfred Chen.\n"
9081 -+
9082 -+static int sched_timeslice_shift = 22;
9083 -+
9084 -+#define NORMAL_PRIO_MOD(x) ((x) & (NORMAL_PRIO_NUM - 1))
9085 -+
9086 -+/*
9087 -+ * Common interfaces
9088 -+ */
9089 -+static inline void sched_timeslice_imp(const int timeslice_ms)
9090 -+{
9091 -+ if (2 == timeslice_ms)
9092 -+ sched_timeslice_shift = 21;
9093 -+}
9094 -+
9095 -+static inline int
9096 -+task_sched_prio_normal(const struct task_struct *p, const struct rq *rq)
9097 -+{
9098 -+ s64 delta = p->deadline - rq->time_edge + NORMAL_PRIO_NUM - NICE_WIDTH;
9099 -+
9100 -+ if (WARN_ONCE(delta > NORMAL_PRIO_NUM - 1,
9101 -+ "pds: task_sched_prio_normal() delta %lld\n", delta))
9102 -+ return NORMAL_PRIO_NUM - 1;
9103 -+
9104 -+ return (delta < 0) ? 0 : delta;
9105 -+}
9106 -+
9107 -+static inline int task_sched_prio(const struct task_struct *p)
9108 -+{
9109 -+ return (p->prio < MAX_RT_PRIO) ? p->prio :
9110 -+ MIN_NORMAL_PRIO + task_sched_prio_normal(p, task_rq(p));
9111 -+}
9112 -+
9113 -+static inline int
9114 -+task_sched_prio_idx(const struct task_struct *p, const struct rq *rq)
9115 -+{
9116 -+ return (p->prio < MAX_RT_PRIO) ? p->prio : MIN_NORMAL_PRIO +
9117 -+ NORMAL_PRIO_MOD(task_sched_prio_normal(p, rq) + rq->time_edge);
9118 -+}
9119 -+
9120 -+static inline int sched_prio2idx(int prio, struct rq *rq)
9121 -+{
9122 -+ return (IDLE_TASK_SCHED_PRIO == prio || prio < MAX_RT_PRIO) ? prio :
9123 -+ MIN_NORMAL_PRIO + NORMAL_PRIO_MOD((prio - MIN_NORMAL_PRIO) +
9124 -+ rq->time_edge);
9125 -+}
9126 -+
9127 -+static inline int sched_idx2prio(int idx, struct rq *rq)
9128 -+{
9129 -+ return (idx < MAX_RT_PRIO) ? idx : MIN_NORMAL_PRIO +
9130 -+ NORMAL_PRIO_MOD((idx - MIN_NORMAL_PRIO) + NORMAL_PRIO_NUM -
9131 -+ NORMAL_PRIO_MOD(rq->time_edge));
9132 -+}
9133 -+
9134 -+static inline void sched_renew_deadline(struct task_struct *p, const struct rq *rq)
9135 -+{
9136 -+ if (p->prio >= MAX_RT_PRIO)
9137 -+ p->deadline = (rq->clock >> sched_timeslice_shift) +
9138 -+ p->static_prio - (MAX_PRIO - NICE_WIDTH);
9139 -+}
9140 -+
9141 -+int task_running_nice(struct task_struct *p)
9142 -+{
9143 -+ return (p->prio > DEFAULT_PRIO);
9144 -+}
9145 -+
9146 -+static inline void update_rq_time_edge(struct rq *rq)
9147 -+{
9148 -+ struct list_head head;
9149 -+ u64 old = rq->time_edge;
9150 -+ u64 now = rq->clock >> sched_timeslice_shift;
9151 -+ u64 prio, delta;
9152 -+
9153 -+ if (now == old)
9154 -+ return;
9155 -+
9156 -+ delta = min_t(u64, NORMAL_PRIO_NUM, now - old);
9157 -+ INIT_LIST_HEAD(&head);
9158 -+
9159 -+ for_each_set_bit(prio, &rq->queue.bitmap[2], delta)
9160 -+ list_splice_tail_init(rq->queue.heads + MIN_NORMAL_PRIO +
9161 -+ NORMAL_PRIO_MOD(prio + old), &head);
9162 -+
9163 -+ rq->queue.bitmap[2] = (NORMAL_PRIO_NUM == delta) ? 0UL :
9164 -+ rq->queue.bitmap[2] >> delta;
9165 -+ rq->time_edge = now;
9166 -+ if (!list_empty(&head)) {
9167 -+ u64 idx = MIN_NORMAL_PRIO + NORMAL_PRIO_MOD(now);
9168 -+ struct task_struct *p;
9169 -+
9170 -+ list_for_each_entry(p, &head, sq_node)
9171 -+ p->sq_idx = idx;
9172 -+
9173 -+ list_splice(&head, rq->queue.heads + idx);
9174 -+ rq->queue.bitmap[2] |= 1UL;
9175 -+ }
9176 -+}
9177 -+
9178 -+static inline void time_slice_expired(struct task_struct *p, struct rq *rq)
9179 -+{
9180 -+ p->time_slice = sched_timeslice_ns;
9181 -+ sched_renew_deadline(p, rq);
9182 -+ if (SCHED_FIFO != p->policy && task_on_rq_queued(p))
9183 -+ requeue_task(p, rq);
9184 -+}
9185 -+
9186 -+static inline void sched_task_sanity_check(struct task_struct *p, struct rq *rq)
9187 -+{
9188 -+ u64 max_dl = rq->time_edge + NICE_WIDTH - 1;
9189 -+ if (unlikely(p->deadline > max_dl))
9190 -+ p->deadline = max_dl;
9191 -+}
9192 -+
9193 -+static void sched_task_fork(struct task_struct *p, struct rq *rq)
9194 -+{
9195 -+ sched_renew_deadline(p, rq);
9196 -+}
9197 -+
9198 -+static inline void do_sched_yield_type_1(struct task_struct *p, struct rq *rq)
9199 -+{
9200 -+ time_slice_expired(p, rq);
9201 -+}
9202 -+
9203 -+#ifdef CONFIG_SMP
9204 -+static inline void sched_task_ttwu(struct task_struct *p) {}
9205 -+#endif
9206 -+static inline void sched_task_deactivate(struct task_struct *p, struct rq *rq) {}
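PDS derives a task's queue level from how far its virtual deadline sits past
rq->time_edge, clamped to the normal-priority range. A standalone sketch of
task_sched_prio_normal() with assumed constants (NORMAL_PRIO_NUM = 64 and
NICE_WIDTH = 40 are plausible values, not verified against the full patch):

#include <stdio.h>

#define NORMAL_PRIO_NUM	64	/* assumed number of normal levels */
#define NICE_WIDTH	40	/* nice -20..19 */

/* Mirrors task_sched_prio_normal() in pds.h above; the WARN_ONCE is
 * replaced by a silent clamp for this demo. */
static int sched_prio_normal(long long deadline, long long time_edge)
{
	long long delta = deadline - time_edge + NORMAL_PRIO_NUM - NICE_WIDTH;

	if (delta > NORMAL_PRIO_NUM - 1)
		return NORMAL_PRIO_NUM - 1;
	return (delta < 0) ? 0 : (int)delta;
}

int main(void)
{
	/* Later deadlines map to higher (i.e. lower-priority) levels. */
	printf("deadline == edge      -> level %d\n", sched_prio_normal(100, 100));
	printf("deadline == edge + 20 -> level %d\n", sched_prio_normal(120, 100));
	printf("deadline == edge + 39 -> level %d\n", sched_prio_normal(139, 100));
	return 0;
}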
9207 -diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
9208 -index a554e3bbab2b..3e56f5e6ff5c 100644
9209 ---- a/kernel/sched/pelt.c
9210 -+++ b/kernel/sched/pelt.c
9211 -@@ -270,6 +270,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
9212 - WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
9213 - }
9214 -
9215 -+#ifndef CONFIG_SCHED_ALT
9216 - /*
9217 - * sched_entity:
9218 - *
9219 -@@ -387,8 +388,9 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
9220 -
9221 - return 0;
9222 - }
9223 -+#endif
9224 -
9225 --#ifdef CONFIG_SCHED_THERMAL_PRESSURE
9226 -+#if defined(CONFIG_SCHED_THERMAL_PRESSURE) && !defined(CONFIG_SCHED_ALT)
9227 - /*
9228 - * thermal:
9229 - *
9230 -diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
9231 -index e06071bf3472..adf567df34d4 100644
9232 ---- a/kernel/sched/pelt.h
9233 -+++ b/kernel/sched/pelt.h
9234 -@@ -1,13 +1,15 @@
9235 - #ifdef CONFIG_SMP
9236 - #include "sched-pelt.h"
9237 -
9238 -+#ifndef CONFIG_SCHED_ALT
9239 - int __update_load_avg_blocked_se(u64 now, struct sched_entity *se);
9240 - int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se);
9241 - int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);
9242 - int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
9243 - int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
9244 -+#endif
9245 -
9246 --#ifdef CONFIG_SCHED_THERMAL_PRESSURE
9247 -+#if defined(CONFIG_SCHED_THERMAL_PRESSURE) && !defined(CONFIG_SCHED_ALT)
9248 - int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity);
9249 -
9250 - static inline u64 thermal_load_avg(struct rq *rq)
9251 -@@ -42,6 +44,7 @@ static inline u32 get_pelt_divider(struct sched_avg *avg)
9252 - return LOAD_AVG_MAX - 1024 + avg->period_contrib;
9253 - }
9254 -
9255 -+#ifndef CONFIG_SCHED_ALT
9256 - static inline void cfs_se_util_change(struct sched_avg *avg)
9257 - {
9258 - unsigned int enqueued;
9259 -@@ -153,9 +156,11 @@ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
9260 - return rq_clock_pelt(rq_of(cfs_rq));
9261 - }
9262 - #endif
9263 -+#endif /* CONFIG_SCHED_ALT */
9264 -
9265 - #else
9266 -
9267 -+#ifndef CONFIG_SCHED_ALT
9268 - static inline int
9269 - update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
9270 - {
9271 -@@ -173,6 +178,7 @@ update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
9272 - {
9273 - return 0;
9274 - }
9275 -+#endif
9276 -
9277 - static inline int
9278 - update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
9279 -diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
9280 -index ddefb0419d7a..658c41b15d3c 100644
9281 ---- a/kernel/sched/sched.h
9282 -+++ b/kernel/sched/sched.h
9283 -@@ -2,6 +2,10 @@
9284 - /*
9285 - * Scheduler internal types and methods:
9286 - */
9287 -+#ifdef CONFIG_SCHED_ALT
9288 -+#include "alt_sched.h"
9289 -+#else
9290 -+
9291 - #include <linux/sched.h>
9292 -
9293 - #include <linux/sched/autogroup.h>
9294 -@@ -3038,3 +3042,8 @@ extern int sched_dynamic_mode(const char *str);
9295 - extern void sched_dynamic_update(int mode);
9296 - #endif
9297 -
9298 -+static inline int task_running_nice(struct task_struct *p)
9299 -+{
9300 -+ return (task_nice(p) > 0);
9301 -+}
9302 -+#endif /* !CONFIG_SCHED_ALT */
9303 -diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
9304 -index 3f93fc3b5648..528b71e144e9 100644
9305 ---- a/kernel/sched/stats.c
9306 -+++ b/kernel/sched/stats.c
9307 -@@ -22,8 +22,10 @@ static int show_schedstat(struct seq_file *seq, void *v)
9308 - } else {
9309 - struct rq *rq;
9310 - #ifdef CONFIG_SMP
9311 -+#ifndef CONFIG_SCHED_ALT
9312 - struct sched_domain *sd;
9313 - int dcount = 0;
9314 -+#endif
9315 - #endif
9316 - cpu = (unsigned long)(v - 2);
9317 - rq = cpu_rq(cpu);
9318 -@@ -40,6 +42,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
9319 - seq_printf(seq, "\n");
9320 -
9321 - #ifdef CONFIG_SMP
9322 -+#ifndef CONFIG_SCHED_ALT
9323 - /* domain-specific stats */
9324 - rcu_read_lock();
9325 - for_each_domain(cpu, sd) {
9326 -@@ -68,6 +71,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
9327 - sd->ttwu_move_balance);
9328 - }
9329 - rcu_read_unlock();
9330 -+#endif
9331 - #endif
9332 - }
9333 - return 0;
9334 -diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
9335 -index b77ad49dc14f..be9edf086412 100644
9336 ---- a/kernel/sched/topology.c
9337 -+++ b/kernel/sched/topology.c
9338 -@@ -4,6 +4,7 @@
9339 - */
9340 - #include "sched.h"
9341 -
9342 -+#ifndef CONFIG_SCHED_ALT
9343 - DEFINE_MUTEX(sched_domains_mutex);
9344 -
9345 - /* Protected by sched_domains_mutex: */
9346 -@@ -1382,8 +1383,10 @@ static void asym_cpu_capacity_scan(void)
9347 - */
9348 -
9349 - static int default_relax_domain_level = -1;
9350 -+#endif /* CONFIG_SCHED_ALT */
9351 - int sched_domain_level_max;
9352 -
9353 -+#ifndef CONFIG_SCHED_ALT
9354 - static int __init setup_relax_domain_level(char *str)
9355 - {
9356 - if (kstrtoint(str, 0, &default_relax_domain_level))
9357 -@@ -1617,6 +1620,7 @@ sd_init(struct sched_domain_topology_level *tl,
9358 -
9359 - return sd;
9360 - }
9361 -+#endif /* CONFIG_SCHED_ALT */
9362 -
9363 - /*
9364 - * Topology list, bottom-up.
9365 -@@ -1646,6 +1650,7 @@ void set_sched_topology(struct sched_domain_topology_level *tl)
9366 - sched_domain_topology = tl;
9367 - }
9368 -
9369 -+#ifndef CONFIG_SCHED_ALT
9370 - #ifdef CONFIG_NUMA
9371 -
9372 - static const struct cpumask *sd_numa_mask(int cpu)
9373 -@@ -2451,3 +2456,17 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
9374 - partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
9375 - mutex_unlock(&sched_domains_mutex);
9376 - }
9377 -+#else /* CONFIG_SCHED_ALT */
9378 -+void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
9379 -+ struct sched_domain_attr *dattr_new)
9380 -+{}
9381 -+
9382 -+#ifdef CONFIG_NUMA
9383 -+int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE;
9384 -+
9385 -+int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
9386 -+{
9387 -+ return best_mask_cpu(cpu, cpus);
9388 -+}
9389 -+#endif /* CONFIG_NUMA */
9390 -+#endif
9391 -diff --git a/kernel/sysctl.c b/kernel/sysctl.c
9392 -index 272f4a272f8c..1c9455c8ecf6 100644
9393 ---- a/kernel/sysctl.c
9394 -+++ b/kernel/sysctl.c
9395 -@@ -122,6 +122,10 @@ static unsigned long long_max = LONG_MAX;
9396 - static int one_hundred = 100;
9397 - static int two_hundred = 200;
9398 - static int one_thousand = 1000;
9399 -+#ifdef CONFIG_SCHED_ALT
9400 -+static int __maybe_unused zero = 0;
9401 -+extern int sched_yield_type;
9402 -+#endif
9403 - #ifdef CONFIG_PRINTK
9404 - static int ten_thousand = 10000;
9405 - #endif
9406 -@@ -1730,6 +1734,24 @@ int proc_do_static_key(struct ctl_table *table, int write,
9407 - }
9408 -
9409 - static struct ctl_table kern_table[] = {
9410 -+#ifdef CONFIG_SCHED_ALT
9411 -+/* In ALT, only "sched_schedstats" is supported */
9412 -+#ifdef CONFIG_SCHED_DEBUG
9413 -+#ifdef CONFIG_SMP
9414 -+#ifdef CONFIG_SCHEDSTATS
9415 -+ {
9416 -+ .procname = "sched_schedstats",
9417 -+ .data = NULL,
9418 -+ .maxlen = sizeof(unsigned int),
9419 -+ .mode = 0644,
9420 -+ .proc_handler = sysctl_schedstats,
9421 -+ .extra1 = SYSCTL_ZERO,
9422 -+ .extra2 = SYSCTL_ONE,
9423 -+ },
9424 -+#endif /* CONFIG_SCHEDSTATS */
9425 -+#endif /* CONFIG_SMP */
9426 -+#endif /* CONFIG_SCHED_DEBUG */
9427 -+#else /* !CONFIG_SCHED_ALT */
9428 - {
9429 - .procname = "sched_child_runs_first",
9430 - .data = &sysctl_sched_child_runs_first,
9431 -@@ -1860,6 +1882,7 @@ static struct ctl_table kern_table[] = {
9432 - .extra2 = SYSCTL_ONE,
9433 - },
9434 - #endif
9435 -+#endif /* !CONFIG_SCHED_ALT */
9436 - #ifdef CONFIG_PROVE_LOCKING
9437 - {
9438 - .procname = "prove_locking",
9439 -@@ -2436,6 +2459,17 @@ static struct ctl_table kern_table[] = {
9440 - .proc_handler = proc_dointvec,
9441 - },
9442 - #endif
9443 -+#ifdef CONFIG_SCHED_ALT
9444 -+ {
9445 -+ .procname = "yield_type",
9446 -+ .data = &sched_yield_type,
9447 -+ .maxlen = sizeof (int),
9448 -+ .mode = 0644,
9449 -+ .proc_handler = &proc_dointvec_minmax,
9450 -+ .extra1 = &zero,
9451 -+ .extra2 = &two,
9452 -+ },
9453 -+#endif
9454 - #if defined(CONFIG_S390) && defined(CONFIG_SMP)
9455 - {
9456 - .procname = "spin_retry",
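The yield_type entry above is bounded to 0..2 by proc_dointvec_minmax and
shows up as /proc/sys/kernel/yield_type on a patched kernel. A small sketch
that reads it back from userspace (the path follows from the .procname; it
only exists with CONFIG_SCHED_ALT enabled):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/kernel/yield_type", "r");
	int val;

	if (!f) {
		perror("yield_type not available (CONFIG_SCHED_ALT off?)");
		return 1;
	}
	if (fscanf(f, "%d", &val) == 1)
		printf("yield_type = %d\n", val);
	fclose(f);
	return 0;
}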
9457 -diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
9458 -index 4a66725b1d4a..cb80ed5c1f5c 100644
9459 ---- a/kernel/time/hrtimer.c
9460 -+++ b/kernel/time/hrtimer.c
9461 -@@ -1940,8 +1940,10 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
9462 - int ret = 0;
9463 - u64 slack;
9464 -
9465 -+#ifndef CONFIG_SCHED_ALT
9466 - slack = current->timer_slack_ns;
9467 - if (dl_task(current) || rt_task(current))
9468 -+#endif
9469 - slack = 0;
9470 -
9471 - hrtimer_init_sleeper_on_stack(&t, clockid, mode);
9472 -diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
9473 -index 517be7fd175e..de3afe8e0800 100644
9474 ---- a/kernel/time/posix-cpu-timers.c
9475 -+++ b/kernel/time/posix-cpu-timers.c
9476 -@@ -216,7 +216,7 @@ static void task_sample_cputime(struct task_struct *p, u64 *samples)
9477 - u64 stime, utime;
9478 -
9479 - task_cputime(p, &utime, &stime);
9480 -- store_samples(samples, stime, utime, p->se.sum_exec_runtime);
9481 -+ store_samples(samples, stime, utime, tsk_seruntime(p));
9482 - }
9483 -
9484 - static void proc_sample_cputime_atomic(struct task_cputime_atomic *at,
9485 -@@ -801,6 +801,7 @@ static void collect_posix_cputimers(struct posix_cputimers *pct, u64 *samples,
9486 - }
9487 - }
9488 -
9489 -+#ifndef CONFIG_SCHED_ALT
9490 - static inline void check_dl_overrun(struct task_struct *tsk)
9491 - {
9492 - if (tsk->dl.dl_overrun) {
9493 -@@ -808,6 +809,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
9494 - __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
9495 - }
9496 - }
9497 -+#endif
9498 -
9499 - static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
9500 - {
9501 -@@ -835,8 +837,10 @@ static void check_thread_timers(struct task_struct *tsk,
9502 - u64 samples[CPUCLOCK_MAX];
9503 - unsigned long soft;
9504 -
9505 -+#ifndef CONFIG_SCHED_ALT
9506 - if (dl_task(tsk))
9507 - check_dl_overrun(tsk);
9508 -+#endif
9509 -
9510 - if (expiry_cache_is_inactive(pct))
9511 - return;
9512 -@@ -850,7 +854,7 @@ static void check_thread_timers(struct task_struct *tsk,
9513 - soft = task_rlimit(tsk, RLIMIT_RTTIME);
9514 - if (soft != RLIM_INFINITY) {
9515 - /* Task RT timeout is accounted in jiffies. RTTIME is usec */
9516 -- unsigned long rttime = tsk->rt.timeout * (USEC_PER_SEC / HZ);
9517 -+ unsigned long rttime = tsk_rttimeout(tsk) * (USEC_PER_SEC / HZ);
9518 - unsigned long hard = task_rlimit_max(tsk, RLIMIT_RTTIME);
9519 -
9520 - /* At the hard limit, send SIGKILL. No further action. */
9521 -@@ -1086,8 +1090,10 @@ static inline bool fastpath_timer_check(struct task_struct *tsk)
9522 - return true;
9523 - }
9524 -
9525 -+#ifndef CONFIG_SCHED_ALT
9526 - if (dl_task(tsk) && tsk->dl.dl_overrun)
9527 - return true;
9528 -+#endif
9529 -
9530 - return false;
9531 - }
9532 -diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
9533 -index adf7ef194005..11c8f36e281b 100644
9534 ---- a/kernel/trace/trace_selftest.c
9535 -+++ b/kernel/trace/trace_selftest.c
9536 -@@ -1052,10 +1052,15 @@ static int trace_wakeup_test_thread(void *data)
9537 - {
9538 - /* Make this a -deadline thread */
9539 - static const struct sched_attr attr = {
9540 -+#ifdef CONFIG_SCHED_ALT
9541 -+ /* No deadline on BMQ/PDS, use RR */
9542 -+ .sched_policy = SCHED_RR,
9543 -+#else
9544 - .sched_policy = SCHED_DEADLINE,
9545 - .sched_runtime = 100000ULL,
9546 - .sched_deadline = 10000000ULL,
9547 - .sched_period = 10000000ULL
9548 -+#endif
9549 - };
9550 - struct wakeup_test_data *x = data;
9551 -
9552 ---- a/kernel/sched/alt_core.c 2021-11-18 18:58:14.290182408 -0500
9553 -+++ b/kernel/sched/alt_core.c 2021-11-18 18:58:54.870593883 -0500
9554 -@@ -2762,7 +2762,7 @@ int sched_fork(unsigned long clone_flags
9555 - return 0;
9556 - }
9557 -
9558 --void sched_post_fork(struct task_struct *p) {}
9559 -+void sched_post_fork(struct task_struct *p, struct kernel_clone_args *kargs) {}
9560 -
9561 - #ifdef CONFIG_SCHEDSTATS
9562 -
9563
9564 diff --git a/5021_BMQ-and-PDS-gentoo-defaults.patch b/5021_BMQ-and-PDS-gentoo-defaults.patch
9565 deleted file mode 100644
9566 index d449eec4..00000000
9567 --- a/5021_BMQ-and-PDS-gentoo-defaults.patch
9568 +++ /dev/null
9569 @@ -1,13 +0,0 @@
9570 ---- a/init/Kconfig 2021-04-27 07:38:30.556467045 -0400
9571 -+++ b/init/Kconfig 2021-04-27 07:39:32.956412800 -0400
9572 -@@ -780,8 +780,9 @@ config GENERIC_SCHED_CLOCK
9573 - menu "Scheduler features"
9574 -
9575 - menuconfig SCHED_ALT
9576 -+ depends on X86_64
9577 - bool "Alternative CPU Schedulers"
9578 -- default y
9579 -+ default n
9580 - help
9581 - This feature enables the alternative CPU schedulers.
9582 -