From:	Mike Pagano <mpagano@g.o>
To:	gentoo-commits@l.g.o
Subject:	[gentoo-commits] proj/linux-patches:5.15 commit in: /
Date:	Thu, 27 Jan 2022 12:01:47
Message-Id:	`1643284864.fc237c9da47b7160c956d3b4a2449f93864085fd.mpagano@gentoo`

commit:     fc237c9da47b7160c956d3b4a2449f93864085fd
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Thu Jan 27 12:01:04 2022 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Thu Jan 27 12:01:04 2022 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=fc237c9d

Temporarily remove BMQ, will add back with working patch

Signed-off-by: Mike Pagano <mpagano <AT> gentoo.org>

 0000_README                                  |    8 -
 5020_BMQ-and-PDS-io-scheduler-v5.15-r1.patch | 9788 --------------------------
 5021_BMQ-and-PDS-gentoo-defaults.patch       |   13 -
 3 files changed, 9809 deletions(-)

diff --git a/0000_README b/0000_README
index c0a48663..eddb9032 100644
--- a/0000_README
+++ b/0000_README
@@ -139,14 +139,6 @@ Patch:  4567_distro-Gentoo-Kconfig.patch
 From:   Tom Wijsman <TomWij@g.o>
 Desc:   Add Gentoo Linux support config settings and defaults.

-Patch:  5020_BMQ-and-PDS-io-scheduler-v5.15-r1.patch
-From:   https://gitlab.com/alfredchen/linux-prjc
-Desc:   BMQ(BitMap Queue) Scheduler. A new CPU scheduler developed from PDS(incld). Inspired by the scheduler in zircon.
-
-Patch:  5021_BMQ-and-PDS-gentoo-defaults.patch
-From:   https://gitweb.gentoo.org/proj/linux-patches.git/
-Desc:   Set defaults for BMQ. Add archs as people test, default to N
-
 Patch:  5010_enable-cpu-optimizations-universal.patch
 From:   https://github.com/graysky2/kernel_compiler_patch
 Desc:   Kernel >= 5.15 patch enables gcc = v11.1+ optimizations for additional CPUs.

diff --git a/5020_BMQ-and-PDS-io-scheduler-v5.15-r1.patch b/5020_BMQ-and-PDS-io-scheduler-v5.15-r1.patch
deleted file mode 100644
index 5886349e..00000000
--- a/5020_BMQ-and-PDS-io-scheduler-v5.15-r1.patch
+++ /dev/null
@@ -1,9788 +0,0 @@
-diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
-index 1396fd2d9031..6ccb561c9a54 100644
---- a/Documentation/admin-guide/kernel-parameters.txt
-+++ b/Documentation/admin-guide/kernel-parameters.txt
-@@ -4985,6 +4985,12 @@
- 	sa1100ir	[NET]
- 			See drivers/net/irda/sa1100_ir.c.
-
-+	sched_timeslice=
-+			[KNL] Time slice in ms for Project C BMQ/PDS scheduler.
-+			Format: integer 2, 4
-+			Default: 4
-+			See Documentation/scheduler/sched-BMQ.txt
-+
- 	sched_verbose	[KNL] Enables verbose scheduler debug messages.
-
- 	schedstats=	[KNL,X86] Enable or disable scheduled statistics.
-diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
-index 426162009ce9..15ac2d7e47cd 100644
---- a/Documentation/admin-guide/sysctl/kernel.rst
-+++ b/Documentation/admin-guide/sysctl/kernel.rst
-@@ -1542,3 +1542,13 @@ is 10 seconds.
-
- The softlockup threshold is (``2 * watchdog_thresh``). Setting this
- tunable to zero will disable lockup detection altogether.
-+
-+yield_type:
-+===========
-+
-+BMQ/PDS CPU scheduler only. This determines what type of yield calls
-+to sched_yield will perform.
-+
-+  0 - No yield.
-+  1 - Deboost and requeue task. (default)
-+  2 - Set run queue skip task.
-diff --git a/Documentation/scheduler/sched-BMQ.txt b/Documentation/scheduler/sched-BMQ.txt
-new file mode 100644
-index 000000000000..05c84eec0f31
---- /dev/null
-+++ b/Documentation/scheduler/sched-BMQ.txt
-@@ -0,0 +1,110 @@
-+                         BitMap queue CPU Scheduler
-+                         --------------------------
-+
-+CONTENT
-+========
-+
-+ Background
-+ Design
-+   Overview
-+   Task policy
-+   Priority management
-+   BitMap Queue
-+   CPU Assignment and Migration
-+
-+
-+Background
-+==========
-+
-+BitMap Queue CPU scheduler, referred to as BMQ from here on, is an evolution
-+of previous Priority and Deadline based Skiplist multiple queue scheduler(PDS),
-+and inspired by Zircon scheduler. The goal of it is to keep the scheduler code
-+simple, while efficiency and scalable for interactive tasks, such as desktop,
-+movie playback and gaming etc.
-+
-+Design
-+======
-+
-+Overview
-+--------
-+
-+BMQ use per CPU run queue design, each CPU(logical) has it's own run queue,
-+each CPU is responsible for scheduling the tasks that are putting into it's
-+run queue.
-+
-+The run queue is a set of priority queues. Note that these queues are fifo
-+queue for non-rt tasks or priority queue for rt tasks in data structure. See
-+BitMap Queue below for details. BMQ is optimized for non-rt tasks in the fact
-+that most applications are non-rt tasks. No matter the queue is fifo or
-+priority, In each queue is an ordered list of runnable tasks awaiting execution
-+and the data structures are the same. When it is time for a new task to run,
-+the scheduler simply looks the lowest numbered queueue that contains a task,
-+and runs the first task from the head of that queue. And per CPU idle task is
-+also in the run queue, so the scheduler can always find a task to run on from
-+its run queue.
-+
-+Each task will assigned the same timeslice(default 4ms) when it is picked to
-+start running. Task will be reinserted at the end of the appropriate priority
-+queue when it uses its whole timeslice. When the scheduler selects a new task
-+from the priority queue it sets the CPU's preemption timer for the remainder of
-+the previous timeslice. When that timer fires the scheduler will stop execution
-+on that task, select another task and start over again.
-+
-+If a task blocks waiting for a shared resource then it's taken out of its
-+priority queue and is placed in a wait queue for the shared resource. When it
-+is unblocked it will be reinserted in the appropriate priority queue of an
-+eligible CPU.
-+
-+Task policy
-+-----------
-+
-+BMQ supports DEADLINE, FIFO, RR, NORMAL, BATCH and IDLE task policy like the
-+mainline CFS scheduler. But BMQ is heavy optimized for non-rt task, that's
-+NORMAL/BATCH/IDLE policy tasks. Below is the implementation detail of each
-+policy.
-+
-+DEADLINE
-+	It is squashed as priority 0 FIFO task.
-+
-+FIFO/RR
-+	All RT tasks share one single priority queue in BMQ run queue designed. The
-+complexity of insert operation is O(n). BMQ is not designed for system runs
-+with major rt policy tasks.
-+
-+NORMAL/BATCH/IDLE
-+	BATCH and IDLE tasks are treated as the same policy. They compete CPU with
-+NORMAL policy tasks, but they just don't boost. To control the priority of
-+NORMAL/BATCH/IDLE tasks, simply use nice level.
-+
-+ISO
-+	ISO policy is not supported in BMQ. Please use nice level -20 NORMAL policy
-+task instead.
-+
-+Priority management
-+-------------------
-+
-+RT tasks have priority from 0-99. For non-rt tasks, there are three different
-+factors used to determine the effective priority of a task. The effective
-+priority being what is used to determine which queue it will be in.
-+
-+The first factor is simply the task’s static priority. Which is assigned from
-+task's nice level, within [-20, 19] in userland's point of view and [0, 39]
-+internally.
-+
-+The second factor is the priority boost. This is a value bounded between
-+[-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] used to offset the base priority, it is
-+modified by the following cases:
-+
-+*When a thread has used up its entire timeslice, always deboost its boost by
-+increasing by one.
-+*When a thread gives up cpu control(voluntary or non-voluntary) to reschedule,
-+and its switch-in time(time after last switch and run) below the thredhold
-+based on its priority boost, will boost its boost by decreasing by one buti is
-+capped at 0 (won’t go negative).
-+
-+The intent in this system is to ensure that interactive threads are serviced
-+quickly. These are usually the threads that interact directly with the user
-+and cause user-perceivable latency. These threads usually do little work and
-+spend most of their time blocked awaiting another user event. So they get the
-+priority boost from unblocking while background threads that do most of the
-+processing receive the priority penalty for using their entire timeslice.
-diff --git a/fs/proc/base.c b/fs/proc/base.c
-index 1f394095eb88..2c3d95546908 100644
---- a/fs/proc/base.c
-+++ b/fs/proc/base.c
-@@ -480,7 +480,7 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
- 		seq_puts(m, "0 0 0\n");
- 	else
- 		seq_printf(m, "%llu %llu %lu\n",
--		   (unsigned long long)task->se.sum_exec_runtime,
-+		   (unsigned long long)tsk_seruntime(task),
- 		   (unsigned long long)task->sched_info.run_delay,
- 		   task->sched_info.pcount);
-
-diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
-index 8874f681b056..59eb72bf7d5f 100644
---- a/include/asm-generic/resource.h
-+++ b/include/asm-generic/resource.h
-@@ -23,7 +23,7 @@
- 	[RLIMIT_LOCKS]		= {  RLIM_INFINITY,  RLIM_INFINITY },	\
- 	[RLIMIT_SIGPENDING]	= { 		0,	       0 },	\
- 	[RLIMIT_MSGQUEUE]	= {   MQ_BYTES_MAX,   MQ_BYTES_MAX },	\
--	[RLIMIT_NICE]		= { 0, 0 },				\
-+	[RLIMIT_NICE]		= { 30, 30 },				\
- 	[RLIMIT_RTPRIO]		= { 0, 0 },				\
- 	[RLIMIT_RTTIME]		= {  RLIM_INFINITY,  RLIM_INFINITY },	\
- }
-diff --git a/include/linux/sched.h b/include/linux/sched.h
-index c1a927ddec64..a7eb91d15442 100644
---- a/include/linux/sched.h
-+++ b/include/linux/sched.h
-@@ -748,12 +748,18 @@ struct task_struct {
- 	unsigned int			ptrace;
-
- #ifdef CONFIG_SMP
--	int				on_cpu;
- 	struct __call_single_node	wake_entry;
-+#endif
-+#if defined(CONFIG_SMP) || defined(CONFIG_SCHED_ALT)
-+	int				on_cpu;
-+#endif
-+
-+#ifdef CONFIG_SMP
- #ifdef CONFIG_THREAD_INFO_IN_TASK
- 	/* Current CPU: */
- 	unsigned int			cpu;
- #endif
-+#ifndef CONFIG_SCHED_ALT
- 	unsigned int			wakee_flips;
- 	unsigned long			wakee_flip_decay_ts;
- 	struct task_struct		*last_wakee;
-@@ -767,6 +773,7 @@ struct task_struct {
- 	 */
- 	int				recent_used_cpu;
- 	int				wake_cpu;
-+#endif /* !CONFIG_SCHED_ALT */
- #endif
- 	int				on_rq;
-
-@@ -775,6 +782,20 @@ struct task_struct {
- 	int				normal_prio;
- 	unsigned int			rt_priority;
-
-+#ifdef CONFIG_SCHED_ALT
-+	u64				last_ran;
-+	s64				time_slice;
-+	int				sq_idx;
-+	struct list_head		sq_node;
-+#ifdef CONFIG_SCHED_BMQ
-+	int				boost_prio;
-+#endif /* CONFIG_SCHED_BMQ */
-+#ifdef CONFIG_SCHED_PDS
-+	u64				deadline;
-+#endif /* CONFIG_SCHED_PDS */
-+	/* sched_clock time spent running */
-+	u64				sched_time;
-+#else /* !CONFIG_SCHED_ALT */
- 	const struct sched_class	*sched_class;
- 	struct sched_entity		se;
- 	struct sched_rt_entity		rt;
-@@ -785,6 +806,7 @@ struct task_struct {
- 	unsigned long			core_cookie;
- 	unsigned int			core_occupation;
- #endif
-+#endif /* !CONFIG_SCHED_ALT */
-
- #ifdef CONFIG_CGROUP_SCHED
- 	struct task_group		*sched_task_group;
-@@ -1505,6 +1527,15 @@ struct task_struct {
- 	 */
- };
-
-+#ifdef CONFIG_SCHED_ALT
-+#define tsk_seruntime(t)		((t)->sched_time)
-+/* replace the uncertian rt_timeout with 0UL */
-+#define tsk_rttimeout(t)		(0UL)
-+#else /* CFS */
-+#define tsk_seruntime(t)	((t)->se.sum_exec_runtime)
-+#define tsk_rttimeout(t)	((t)->rt.timeout)
-+#endif /* !CONFIG_SCHED_ALT */
-+
- static inline struct pid *task_pid(struct task_struct *task)
- {
- 	return task->thread_pid;
-diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
-index 1aff00b65f3c..216fdf2fe90c 100644
---- a/include/linux/sched/deadline.h
-+++ b/include/linux/sched/deadline.h
-@@ -1,5 +1,24 @@
- /* SPDX-License-Identifier: GPL-2.0 */
-
-+#ifdef CONFIG_SCHED_ALT
-+
-+static inline int dl_task(struct task_struct *p)
-+{
-+	return 0;
-+}
-+
-+#ifdef CONFIG_SCHED_BMQ
-+#define __tsk_deadline(p)	(0UL)
-+#endif
-+
-+#ifdef CONFIG_SCHED_PDS
-+#define __tsk_deadline(p)	((((u64) ((p)->prio))<<56) | (p)->deadline)
-+#endif
-+
-+#else
-+
-+#define __tsk_deadline(p)	((p)->dl.deadline)
-+
- /*
-  * SCHED_DEADLINE tasks has negative priorities, reflecting
-  * the fact that any of them has higher prio than RT and
-@@ -19,6 +38,7 @@ static inline int dl_task(struct task_struct *p)
- {
- 	return dl_prio(p->prio);
- }
-+#endif /* CONFIG_SCHED_ALT */
-
- static inline bool dl_time_before(u64 a, u64 b)
- {
-diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
-index ab83d85e1183..6af9ae681116 100644
---- a/include/linux/sched/prio.h
-+++ b/include/linux/sched/prio.h
-@@ -18,6 +18,32 @@
- #define MAX_PRIO		(MAX_RT_PRIO + NICE_WIDTH)
- #define DEFAULT_PRIO		(MAX_RT_PRIO + NICE_WIDTH / 2)
-
-+#ifdef CONFIG_SCHED_ALT
-+
-+/* Undefine MAX_PRIO and DEFAULT_PRIO */
-+#undef MAX_PRIO
-+#undef DEFAULT_PRIO
-+
-+/* +/- priority levels from the base priority */
-+#ifdef CONFIG_SCHED_BMQ
-+#define MAX_PRIORITY_ADJ	(7)
-+
-+#define MIN_NORMAL_PRIO		(MAX_RT_PRIO)
-+#define MAX_PRIO		(MIN_NORMAL_PRIO + NICE_WIDTH)
-+#define DEFAULT_PRIO		(MIN_NORMAL_PRIO + NICE_WIDTH / 2)
-+#endif
-+
-+#ifdef CONFIG_SCHED_PDS
-+#define MAX_PRIORITY_ADJ	(0)
-+
-+#define MIN_NORMAL_PRIO		(128)
-+#define NORMAL_PRIO_NUM		(64)
-+#define MAX_PRIO		(MIN_NORMAL_PRIO + NORMAL_PRIO_NUM)
-+#define DEFAULT_PRIO		(MAX_PRIO - NICE_WIDTH / 2)
-+#endif
-+
-+#endif /* CONFIG_SCHED_ALT */
-+
- /*
-  * Convert user-nice values [ -20 ... 0 ... 19 ]
-  * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
-diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
-index e5af028c08b4..0a7565d0d3cf 100644
---- a/include/linux/sched/rt.h
-+++ b/include/linux/sched/rt.h
-@@ -24,8 +24,10 @@ static inline bool task_is_realtime(struct task_struct *tsk)
-
- 	if (policy == SCHED_FIFO || policy == SCHED_RR)
- 		return true;
-+#ifndef CONFIG_SCHED_ALT
- 	if (policy == SCHED_DEADLINE)
- 		return true;
-+#endif
- 	return false;
- }
-
-diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
-index 8f0f778b7c91..991f2280475b 100644
---- a/include/linux/sched/topology.h
-+++ b/include/linux/sched/topology.h
-@@ -225,7 +225,8 @@ static inline bool cpus_share_cache(int this_cpu, int that_cpu)
-
- #endif	/* !CONFIG_SMP */
-
--#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
-+#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) && \
-+	!defined(CONFIG_SCHED_ALT)
- extern void rebuild_sched_domains_energy(void);
- #else
- static inline void rebuild_sched_domains_energy(void)
-diff --git a/init/Kconfig b/init/Kconfig
-index 11f8a845f259..c8e82fcafb9e 100644
---- a/init/Kconfig
-+++ b/init/Kconfig
-@@ -814,9 +814,39 @@ config GENERIC_SCHED_CLOCK
-
- menu "Scheduler features"
-
-+menuconfig SCHED_ALT
-+	bool "Alternative CPU Schedulers"
-+	default y
-+	help
-+	  This feature enable alternative CPU scheduler"
-+
-+if SCHED_ALT
-+
-+choice
-+	prompt "Alternative CPU Scheduler"
-+	default SCHED_BMQ
-+
-+config SCHED_BMQ
-+	bool "BMQ CPU scheduler"
-+	help
-+	  The BitMap Queue CPU scheduler for excellent interactivity and
-+	  responsiveness on the desktop and solid scalability on normal
-+	  hardware and commodity servers.
-+
-+config SCHED_PDS
-+	bool "PDS CPU scheduler"
-+	help
-+	  The Priority and Deadline based Skip list multiple queue CPU
-+	  Scheduler.
-+
-+endchoice
-+
-+endif
-+
- config UCLAMP_TASK
- 	bool "Enable utilization clamping for RT/FAIR tasks"
- 	depends on CPU_FREQ_GOV_SCHEDUTIL
-+	depends on !SCHED_ALT
- 	help
- 	  This feature enables the scheduler to track the clamped utilization
- 	  of each CPU based on RUNNABLE tasks scheduled on that CPU.
-@@ -902,6 +932,7 @@ config NUMA_BALANCING
- 	depends on ARCH_SUPPORTS_NUMA_BALANCING
- 	depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
- 	depends on SMP && NUMA && MIGRATION
-+	depends on !SCHED_ALT
- 	help
- 	  This option adds support for automatic NUMA aware memory/task placement.
- 	  The mechanism is quite primitive and is based on migrating memory when
-@@ -994,6 +1025,7 @@ config FAIR_GROUP_SCHED
- 	depends on CGROUP_SCHED
- 	default CGROUP_SCHED
-
-+if !SCHED_ALT
- config CFS_BANDWIDTH
- 	bool "CPU bandwidth provisioning for FAIR_GROUP_SCHED"
- 	depends on FAIR_GROUP_SCHED
-@@ -1016,6 +1048,7 @@ config RT_GROUP_SCHED
- 	  realtime bandwidth for them.
- 	  See Documentation/scheduler/sched-rt-group.rst for more information.
-
-+endif #!SCHED_ALT
- endif #CGROUP_SCHED
-
- config UCLAMP_TASK_GROUP
-@@ -1259,6 +1292,7 @@ config CHECKPOINT_RESTORE
-
- config SCHED_AUTOGROUP
- 	bool "Automatic process group scheduling"
-+	depends on !SCHED_ALT
- 	select CGROUPS
- 	select CGROUP_SCHED
- 	select FAIR_GROUP_SCHED
-diff --git a/init/init_task.c b/init/init_task.c
-index 2d024066e27b..49f706df0904 100644
---- a/init/init_task.c
-+++ b/init/init_task.c
-@@ -75,9 +75,15 @@ struct task_struct init_task
- 	.stack		= init_stack,
- 	.usage		= REFCOUNT_INIT(2),
- 	.flags		= PF_KTHREAD,
-+#ifdef CONFIG_SCHED_ALT
-+	.prio		= DEFAULT_PRIO + MAX_PRIORITY_ADJ,
-+	.static_prio	= DEFAULT_PRIO,
-+	.normal_prio	= DEFAULT_PRIO + MAX_PRIORITY_ADJ,
-+#else
- 	.prio		= MAX_PRIO - 20,
- 	.static_prio	= MAX_PRIO - 20,
- 	.normal_prio	= MAX_PRIO - 20,
-+#endif
- 	.policy		= SCHED_NORMAL,
- 	.cpus_ptr	= &init_task.cpus_mask,
- 	.user_cpus_ptr	= NULL,
-@@ -88,6 +94,17 @@ struct task_struct init_task
- 	.restart_block	= {
- 		.fn = do_no_restart_syscall,
- 	},
-+#ifdef CONFIG_SCHED_ALT
-+	.sq_node	= LIST_HEAD_INIT(init_task.sq_node),
-+#ifdef CONFIG_SCHED_BMQ
-+	.boost_prio	= 0,
-+	.sq_idx		= 15,
-+#endif
-+#ifdef CONFIG_SCHED_PDS
-+	.deadline	= 0,
-+#endif
-+	.time_slice	= HZ,
-+#else
- 	.se		= {
- 		.group_node 	= LIST_HEAD_INIT(init_task.se.group_node),
- 	},
-@@ -95,6 +112,7 @@ struct task_struct init_task
- 		.run_list	= LIST_HEAD_INIT(init_task.rt.run_list),
- 		.time_slice	= RR_TIMESLICE,
- 	},
-+#endif
- 	.tasks		= LIST_HEAD_INIT(init_task.tasks),
- #ifdef CONFIG_SMP
- 	.pushable_tasks	= PLIST_NODE_INIT(init_task.pushable_tasks, MAX_PRIO),
-diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
-index 5876e30c5740..7594d0a31869 100644
---- a/kernel/Kconfig.preempt
-+++ b/kernel/Kconfig.preempt
-@@ -102,7 +102,7 @@ config PREEMPT_DYNAMIC
-
- config SCHED_CORE
- 	bool "Core Scheduling for SMT"
--	depends on SCHED_SMT
-+	depends on SCHED_SMT && !SCHED_ALT
- 	help
- 	  This option permits Core Scheduling, a means of coordinated task
- 	  selection across SMT siblings. When enabled -- see
-diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
-index 2a9695ccb65f..292112c267b8 100644
---- a/kernel/cgroup/cpuset.c
-+++ b/kernel/cgroup/cpuset.c
-@@ -664,7 +664,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
- 	return ret;
- }
-
--#ifdef CONFIG_SMP
-+#if defined(CONFIG_SMP) && !defined(CONFIG_SCHED_ALT)
- /*
-  * Helper routine for generate_sched_domains().
-  * Do cpusets a, b have overlapping effective cpus_allowed masks?
-@@ -1060,7 +1060,7 @@ static void rebuild_sched_domains_locked(void)
- 	/* Have scheduler rebuild the domains */
- 	partition_and_rebuild_sched_domains(ndoms, doms, attr);
- }
--#else /* !CONFIG_SMP */
-+#else /* !CONFIG_SMP || CONFIG_SCHED_ALT */
- static void rebuild_sched_domains_locked(void)
- {
- }
-diff --git a/kernel/delayacct.c b/kernel/delayacct.c
-index 51530d5b15a8..e542d71bb94b 100644
---- a/kernel/delayacct.c
-+++ b/kernel/delayacct.c
-@@ -139,7 +139,7 @@ int delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
- 	 */
- 	t1 = tsk->sched_info.pcount;
- 	t2 = tsk->sched_info.run_delay;
--	t3 = tsk->se.sum_exec_runtime;
-+	t3 = tsk_seruntime(tsk);
-
- 	d->cpu_count += t1;
-
-diff --git a/kernel/exit.c b/kernel/exit.c
-index 91a43e57a32e..4b157befc10c 100644
---- a/kernel/exit.c
-+++ b/kernel/exit.c
-@@ -122,7 +122,7 @@ static void __exit_signal(struct task_struct *tsk)
- 			sig->curr_target = next_thread(tsk);
- 	}
-
--	add_device_randomness((const void*) &tsk->se.sum_exec_runtime,
-+	add_device_randomness((const void*) &tsk_seruntime(tsk),
- 			      sizeof(unsigned long long));
-
- 	/*
-@@ -143,7 +143,7 @@ static void __exit_signal(struct task_struct *tsk)
- 	sig->inblock += task_io_get_inblock(tsk);
- 	sig->oublock += task_io_get_oublock(tsk);
- 	task_io_accounting_add(&sig->ioac, &tsk->ioac);
--	sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
-+	sig->sum_sched_runtime += tsk_seruntime(tsk);
- 	sig->nr_threads--;
- 	__unhash_process(tsk, group_dead);
- 	write_sequnlock(&sig->stats_lock);
-diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
-index 291b857a6e20..f3480cdb7497 100644
---- a/kernel/livepatch/transition.c
-+++ b/kernel/livepatch/transition.c
-@@ -307,7 +307,11 @@ static bool klp_try_switch_task(struct task_struct *task)
- 	 */
- 	rq = task_rq_lock(task, &flags);
-
-+#ifdef	CONFIG_SCHED_ALT
-+	if (task_running(task) && task != current) {
-+#else
- 	if (task_running(rq, task) && task != current) {
-+#endif
- 		snprintf(err_buf, STACK_ERR_BUF_SIZE,
- 			 "%s: %s:%d is running\n", __func__, task->comm,
- 			 task->pid);
-diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
-index 6bb116c559b4..d4c8168a8270 100644
---- a/kernel/locking/rtmutex.c
-+++ b/kernel/locking/rtmutex.c
-@@ -298,21 +298,25 @@ static __always_inline void
- waiter_update_prio(struct rt_mutex_waiter *waiter, struct task_struct *task)
- {
- 	waiter->prio = __waiter_prio(task);
--	waiter->deadline = task->dl.deadline;
-+	waiter->deadline = __tsk_deadline(task);
- }
-
- /*
-  * Only use with rt_mutex_waiter_{less,equal}()
-  */
- #define task_to_waiter(p)	\
--	&(struct rt_mutex_waiter){ .prio = __waiter_prio(p), .deadline = (p)->dl.deadline }
-+	&(struct rt_mutex_waiter){ .prio = __waiter_prio(p), .deadline = __tsk_deadline(p) }
-
- static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left,
- 						struct rt_mutex_waiter *right)
- {
-+#ifdef CONFIG_SCHED_PDS
-+	return (left->deadline < right->deadline);
-+#else
- 	if (left->prio < right->prio)
- 		return 1;
-
-+#ifndef CONFIG_SCHED_BMQ
- 	/*
- 	 * If both waiters have dl_prio(), we check the deadlines of the
- 	 * associated tasks.
-@@ -321,16 +325,22 @@ static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left,
- 	 */
- 	if (dl_prio(left->prio))
- 		return dl_time_before(left->deadline, right->deadline);
-+#endif
-
- 	return 0;
-+#endif
- }
-
- static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
- 						 struct rt_mutex_waiter *right)
- {
-+#ifdef CONFIG_SCHED_PDS
-+	return (left->deadline == right->deadline);
-+#else
- 	if (left->prio != right->prio)
- 		return 0;
-
-+#ifndef CONFIG_SCHED_BMQ
- 	/*
- 	 * If both waiters have dl_prio(), we check the deadlines of the
- 	 * associated tasks.
-@@ -339,8 +349,10 @@ static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
- 	 */
- 	if (dl_prio(left->prio))
- 		return left->deadline == right->deadline;
-+#endif
-
- 	return 1;
-+#endif
- }
-
- static inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
-diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
-index 978fcfca5871..0425ee149b4d 100644
---- a/kernel/sched/Makefile
-+++ b/kernel/sched/Makefile
-@@ -22,14 +22,21 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
- CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
- endif
-
--obj-y += core.o loadavg.o clock.o cputime.o
--obj-y += idle.o fair.o rt.o deadline.o
--obj-y += wait.o wait_bit.o swait.o completion.o
--
--obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
-+ifdef CONFIG_SCHED_ALT
-+obj-y += alt_core.o
-+obj-$(CONFIG_SCHED_DEBUG) += alt_debug.o
-+else
-+obj-y += core.o
-+obj-y += fair.o rt.o deadline.o
-+obj-$(CONFIG_SMP) += cpudeadline.o stop_task.o
- obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
--obj-$(CONFIG_SCHEDSTATS) += stats.o
-+endif
- obj-$(CONFIG_SCHED_DEBUG) += debug.o
-+obj-y += loadavg.o clock.o cputime.o
-+obj-y += idle.o
-+obj-y += wait.o wait_bit.o swait.o completion.o
-+obj-$(CONFIG_SMP) += cpupri.o pelt.o topology.o
-+obj-$(CONFIG_SCHEDSTATS) += stats.o
- obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
- obj-$(CONFIG_CPU_FREQ) += cpufreq.o
- obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
-diff --git a/kernel/sched/alt_core.c b/kernel/sched/alt_core.c
-new file mode 100644
-index 000000000000..8b0ddbdd24e4
---- /dev/null
-+++ b/kernel/sched/alt_core.c
-@@ -0,0 +1,7627 @@
-+/*
-+ *  kernel/sched/alt_core.c
-+ *
-+ *  Core alternative kernel scheduler code and related syscalls
-+ *
-+ *  Copyright (C) 1991-2002  Linus Torvalds
-+ *
-+ *  2009-08-13	Brainfuck deadline scheduling policy by Con Kolivas deletes
-+ *		a whole lot of those previous things.
-+ *  2017-09-06	Priority and Deadline based Skip list multiple queue kernel
-+ *		scheduler by Alfred Chen.
-+ *  2019-02-20	BMQ(BitMap Queue) kernel scheduler by Alfred Chen.
-+ */
-+#define CREATE_TRACE_POINTS
-+#include <trace/events/sched.h>
-+#undef CREATE_TRACE_POINTS
-+
-+#include "sched.h"
-+
-+#include <linux/sched/rt.h>
-+
-+#include <linux/context_tracking.h>
-+#include <linux/compat.h>
-+#include <linux/blkdev.h>
-+#include <linux/delayacct.h>
-+#include <linux/freezer.h>
-+#include <linux/init_task.h>
-+#include <linux/kprobes.h>
-+#include <linux/mmu_context.h>
-+#include <linux/nmi.h>
-+#include <linux/profile.h>
-+#include <linux/rcupdate_wait.h>
-+#include <linux/security.h>
-+#include <linux/syscalls.h>
-+#include <linux/wait_bit.h>
-+
-+#include <linux/kcov.h>
-+#include <linux/scs.h>
-+
-+#include <asm/switch_to.h>
-+
-+#include "../workqueue_internal.h"
-+#include "../../fs/io-wq.h"
-+#include "../smpboot.h"
-+
-+#include "pelt.h"
-+#include "smp.h"
-+
-+/*
-+ * Export tracepoints that act as a bare tracehook (ie: have no trace event
-+ * associated with them) to allow external modules to probe them.
-+ */
-+EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_irq_tp);
-+
-+#ifdef CONFIG_SCHED_DEBUG
-+#define sched_feat(x)	(1)
-+/*
-+ * Print a warning if need_resched is set for the given duration (if
-+ * LATENCY_WARN is enabled).
-+ *
-+ * If sysctl_resched_latency_warn_once is set, only one warning will be shown
-+ * per boot.
-+ */
-+__read_mostly int sysctl_resched_latency_warn_ms = 100;
-+__read_mostly int sysctl_resched_latency_warn_once = 1;
-+#else
-+#define sched_feat(x)	(0)
-+#endif /* CONFIG_SCHED_DEBUG */
-+
-+#define ALT_SCHED_VERSION "v5.15-r1"
-+
-+/* rt_prio(prio) defined in include/linux/sched/rt.h */
-+#define rt_task(p)		rt_prio((p)->prio)
-+#define rt_policy(policy)	((policy) == SCHED_FIFO || (policy) == SCHED_RR)
-+#define task_has_rt_policy(p)	(rt_policy((p)->policy))
-+
-+#define STOP_PRIO		(MAX_RT_PRIO - 1)
-+
-+/* Default time slice is 4 in ms, can be set via kernel parameter "sched_timeslice" */
-+u64 sched_timeslice_ns __read_mostly = (4 << 20);
-+
-+static inline void requeue_task(struct task_struct *p, struct rq *rq);
-+
-+#ifdef CONFIG_SCHED_BMQ
-+#include "bmq.h"
-+#endif
-+#ifdef CONFIG_SCHED_PDS
-+#include "pds.h"
-+#endif
801

-+

802

-+static int __init sched_timeslice(char *str)

803

-+{

804

-+	int timeslice_ms;

805

-+

806

-+	get_option(&str, &timeslice_ms);

807

-+	if (2 != timeslice_ms)

808

-+		timeslice_ms = 4;

809

-+	sched_timeslice_ns = timeslice_ms << 20;

810

-+	sched_timeslice_imp(timeslice_ms);

811

-+

812

-+	return 0;

813

-+}

814

-+early_param("sched_timeslice", sched_timeslice);

-+
-+/* Reschedule if less than this many μs left */
-+#define RESCHED_NS		(100 << 10)
-+
-+/**
-+ * sched_yield_type - Choose what sort of yield sched_yield will perform.
-+ * 0: No yield.
-+ * 1: Deboost and requeue task. (default)
-+ * 2: Set rq skip task.
-+ */
-+int sched_yield_type __read_mostly = 1;
-+
-+#ifdef CONFIG_SMP
-+static cpumask_t sched_rq_pending_mask ____cacheline_aligned_in_smp;
-+
-+DEFINE_PER_CPU(cpumask_t [NR_CPU_AFFINITY_LEVELS], sched_cpu_topo_masks);
-+DEFINE_PER_CPU(cpumask_t *, sched_cpu_llc_mask);
-+DEFINE_PER_CPU(cpumask_t *, sched_cpu_topo_end_mask);
-+
-+#ifdef CONFIG_SCHED_SMT
-+DEFINE_STATIC_KEY_FALSE(sched_smt_present);
-+EXPORT_SYMBOL_GPL(sched_smt_present);
-+#endif
-+
-+/*
-+ * Keep a unique ID per domain (we use the first CPUs number in the cpumask of
-+ * the domain), this allows us to quickly tell if two cpus are in the same cache
-+ * domain, see cpus_share_cache().
-+ */
-+DEFINE_PER_CPU(int, sd_llc_id);
-+#endif /* CONFIG_SMP */
-+
-+static DEFINE_MUTEX(sched_hotcpu_mutex);
-+
-+DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
-+
-+#ifndef prepare_arch_switch
-+# define prepare_arch_switch(next)	do { } while (0)
-+#endif
-+#ifndef finish_arch_post_lock_switch
-+# define finish_arch_post_lock_switch()	do { } while (0)
-+#endif
-+
-+#ifdef CONFIG_SCHED_SMT
-+static cpumask_t sched_sg_idle_mask ____cacheline_aligned_in_smp;
-+#endif
-+static cpumask_t sched_rq_watermark[SCHED_BITS] ____cacheline_aligned_in_smp;
-+

-+/* sched_queue related functions */
-+static inline void sched_queue_init(struct sched_queue *q)
-+{
-+	int i;
-+
-+	bitmap_zero(q->bitmap, SCHED_BITS);
-+	for(i = 0; i < SCHED_BITS; i++)
-+		INIT_LIST_HEAD(&q->heads[i]);
-+}
-+
-+/*
-+ * Init idle task and put into queue structure of rq
-+ * IMPORTANT: may be called multiple times for a single cpu
-+ */
-+static inline void sched_queue_init_idle(struct sched_queue *q,
-+					 struct task_struct *idle)
-+{
-+	idle->sq_idx = IDLE_TASK_SCHED_PRIO;
-+	INIT_LIST_HEAD(&q->heads[idle->sq_idx]);
-+	list_add(&idle->sq_node, &q->heads[idle->sq_idx]);
-+}
-+

-+/* water mark related functions */
-+static inline void update_sched_rq_watermark(struct rq *rq)
-+{
-+	unsigned long watermark = find_first_bit(rq->queue.bitmap, SCHED_QUEUE_BITS);
-+	unsigned long last_wm = rq->watermark;
-+	unsigned long i;
-+	int cpu;
-+
-+	if (watermark == last_wm)
-+		return;
-+
-+	rq->watermark = watermark;
-+	cpu = cpu_of(rq);
-+	if (watermark < last_wm) {
-+		for (i = last_wm; i > watermark; i--)
-+			cpumask_clear_cpu(cpu, sched_rq_watermark + SCHED_BITS - 1 - i);
-+#ifdef CONFIG_SCHED_SMT
-+		if (static_branch_likely(&sched_smt_present) &&
-+		    IDLE_TASK_SCHED_PRIO == last_wm)
-+			cpumask_andnot(&sched_sg_idle_mask,
-+				       &sched_sg_idle_mask, cpu_smt_mask(cpu));
-+#endif
-+		return;
-+	}
-+	/* last_wm < watermark */
-+	for (i = watermark; i > last_wm; i--)
-+		cpumask_set_cpu(cpu, sched_rq_watermark + SCHED_BITS - 1 - i);
-+#ifdef CONFIG_SCHED_SMT
-+	if (static_branch_likely(&sched_smt_present) &&
-+	    IDLE_TASK_SCHED_PRIO == watermark) {
-+		cpumask_t tmp;
-+
-+		cpumask_and(&tmp, cpu_smt_mask(cpu), sched_rq_watermark);
-+		if (cpumask_equal(&tmp, cpu_smt_mask(cpu)))
-+			cpumask_or(&sched_sg_idle_mask,
-+				   &sched_sg_idle_mask, cpu_smt_mask(cpu));
-+	}
-+#endif
-+}

-+
-+/*
-+ * This routine assume that the idle task always in queue
-+ */
-+static inline struct task_struct *sched_rq_first_task(struct rq *rq)
-+{
-+	unsigned long idx = find_first_bit(rq->queue.bitmap, SCHED_QUEUE_BITS);
-+	const struct list_head *head = &rq->queue.heads[sched_prio2idx(idx, rq)];
-+
-+	return list_first_entry(head, struct task_struct, sq_node);
-+}
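The lookup in sched_rq_first_task() relies on a bitmap with one bit per priority level: the lowest set bit names the most eligible non-empty list. The following is a minimal user-space sketch of that idea, not the patch's data structure. All names (toy_queue, toy_enqueue, toy_dequeue, toy_first_level) and the choice of 64 levels are assumptions for illustration, and per-level counters stand in for the kernel's list heads.

```c
#include <stdint.h>

#define LEVELS 64	/* hypothetical number of priority levels */

struct toy_queue {
	uint64_t bitmap;		/* bit i set => level i is non-empty */
	unsigned int count[LEVELS];	/* tasks queued at level i */
};

static void toy_enqueue(struct toy_queue *q, unsigned int level)
{
	q->count[level]++;
	q->bitmap |= (uint64_t)1 << level;
}

/* Caller must only dequeue from a level that currently holds a task. */
static void toy_dequeue(struct toy_queue *q, unsigned int level)
{
	if (--q->count[level] == 0)
		q->bitmap &= ~((uint64_t)1 << level);
}

/* Lowest set bit = most eligible level; returns LEVELS if empty. */
static unsigned int toy_first_level(const struct toy_queue *q)
{
	unsigned int i;

	for (i = 0; i < LEVELS; i++)
		if (q->bitmap & ((uint64_t)1 << i))
			return i;
	return LEVELS;
}
```

In the patch the empty case cannot occur because the idle task is always queued, which is why sched_rq_first_task() uses the result of find_first_bit() without checking it.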

-+
-+static inline struct task_struct *
-+sched_rq_next_task(struct task_struct *p, struct rq *rq)
-+{
-+	unsigned long idx = p->sq_idx;
-+	struct list_head *head = &rq->queue.heads[idx];
-+
-+	if (list_is_last(&p->sq_node, head)) {
-+		idx = find_next_bit(rq->queue.bitmap, SCHED_QUEUE_BITS,
-+				    sched_idx2prio(idx, rq) + 1);
-+		head = &rq->queue.heads[sched_prio2idx(idx, rq)];
-+
-+		return list_first_entry(head, struct task_struct, sq_node);
-+	}
-+
-+	return list_next_entry(p, sq_node);
-+}
-+
-+static inline struct task_struct *rq_runnable_task(struct rq *rq)
-+{
-+	struct task_struct *next = sched_rq_first_task(rq);
-+
-+	if (unlikely(next == rq->skip))
-+		next = sched_rq_next_task(next, rq);
-+
-+	return next;
-+}

-+
-+/*
-+ * Serialization rules:
-+ *
-+ * Lock order:
-+ *
-+ *   p->pi_lock
-+ *     rq->lock
-+ *       hrtimer_cpu_base->lock (hrtimer_start() for bandwidth controls)
-+ *
-+ *  rq1->lock
-+ *    rq2->lock  where: rq1 < rq2
-+ *
-+ * Regular state:
-+ *
-+ * Normal scheduling state is serialized by rq->lock. __schedule() takes the
-+ * local CPU's rq->lock, it optionally removes the task from the runqueue and
-+ * always looks at the local rq data structures to find the most eligible task
-+ * to run next.
-+ *
-+ * Task enqueue is also under rq->lock, possibly taken from another CPU.
-+ * Wakeups from another LLC domain might use an IPI to transfer the enqueue to
-+ * the local CPU to avoid bouncing the runqueue state around [ see
-+ * ttwu_queue_wakelist() ]
-+ *
-+ * Task wakeup, specifically wakeups that involve migration, are horribly
-+ * complicated to avoid having to take two rq->locks.
-+ *
-+ * Special state:
-+ *
-+ * System-calls and anything external will use task_rq_lock() which acquires
-+ * both p->pi_lock and rq->lock. As a consequence the state they change is
-+ * stable while holding either lock:
-+ *
-+ *  - sched_setaffinity()/
-+ *    set_cpus_allowed_ptr():	p->cpus_ptr, p->nr_cpus_allowed
-+ *  - set_user_nice():		p->se.load, p->*prio
-+ *  - __sched_setscheduler():	p->sched_class, p->policy, p->*prio,
-+ *				p->se.load, p->rt_priority,
-+ *				p->dl.dl_{runtime, deadline, period, flags, bw, density}
-+ *  - sched_setnuma():		p->numa_preferred_nid
-+ *  - sched_move_task()/
-+ *    cpu_cgroup_fork():	p->sched_task_group
-+ *  - uclamp_update_active()	p->uclamp*
-+ *
-+ * p->state <- TASK_*:
-+ *
-+ *   is changed locklessly using set_current_state(), __set_current_state() or
-+ *   set_special_state(), see their respective comments, or by
-+ *   try_to_wake_up(). This latter uses p->pi_lock to serialize against
-+ *   concurrent self.
-+ *
-+ * p->on_rq <- { 0, 1 = TASK_ON_RQ_QUEUED, 2 = TASK_ON_RQ_MIGRATING }:
-+ *
-+ *   is set by activate_task() and cleared by deactivate_task(), under
-+ *   rq->lock. Non-zero indicates the task is runnable, the special
-+ *   ON_RQ_MIGRATING state is used for migration without holding both
-+ *   rq->locks. It indicates task_cpu() is not stable, see task_rq_lock().
-+ *
-+ * p->on_cpu <- { 0, 1 }:
-+ *
-+ *   is set by prepare_task() and cleared by finish_task() such that it will be
-+ *   set before p is scheduled-in and cleared after p is scheduled-out, both
-+ *   under rq->lock. Non-zero indicates the task is running on its CPU.
-+ *
-+ *   [ The astute reader will observe that it is possible for two tasks on one
-+ *     CPU to have ->on_cpu = 1 at the same time. ]
-+ *
-+ * task_cpu(p): is changed by set_task_cpu(), the rules are:
-+ *
-+ *  - Don't call set_task_cpu() on a blocked task:
-+ *
-+ *    We don't care what CPU we're not running on, this simplifies hotplug,
-+ *    the CPU assignment of blocked tasks isn't required to be valid.
-+ *
-+ *  - for try_to_wake_up(), called under p->pi_lock:
-+ *
-+ *    This allows try_to_wake_up() to only take one rq->lock, see its comment.
-+ *
-+ *  - for migration called under rq->lock:
-+ *    [ see task_on_rq_migrating() in task_rq_lock() ]
-+ *
-+ *    o move_queued_task()
-+ *    o detach_task()
-+ *
-+ *  - for migration called under double_rq_lock():
-+ *
-+ *    o __migrate_swap_task()
-+ *    o push_rt_task() / pull_rt_task()
-+ *    o push_dl_task() / pull_dl_task()
-+ *    o dl_task_offline_migration()
-+ *
-+ */

-+
-+/*
-+ * Context: p->pi_lock
-+ */
-+static inline struct rq
-+*__task_access_lock(struct task_struct *p, raw_spinlock_t **plock)
-+{
-+	struct rq *rq;
-+	for (;;) {
-+		rq = task_rq(p);
-+		if (p->on_cpu || task_on_rq_queued(p)) {
-+			raw_spin_lock(&rq->lock);
-+			if (likely((p->on_cpu || task_on_rq_queued(p))
-+				   && rq == task_rq(p))) {
-+				*plock = &rq->lock;
-+				return rq;
-+			}
-+			raw_spin_unlock(&rq->lock);
-+		} else if (task_on_rq_migrating(p)) {
-+			do {
-+				cpu_relax();
-+			} while (unlikely(task_on_rq_migrating(p)));
-+		} else {
-+			*plock = NULL;
-+			return rq;
-+		}
-+	}
-+}
-+
-+static inline void
-+__task_access_unlock(struct task_struct *p, raw_spinlock_t *lock)
-+{
-+	if (NULL != lock)
-+		raw_spin_unlock(lock);
-+}
-+
-+static inline struct rq
-+*task_access_lock_irqsave(struct task_struct *p, raw_spinlock_t **plock,
-+			  unsigned long *flags)
-+{
-+	struct rq *rq;
-+	for (;;) {
-+		rq = task_rq(p);
-+		if (p->on_cpu || task_on_rq_queued(p)) {
-+			raw_spin_lock_irqsave(&rq->lock, *flags);
-+			if (likely((p->on_cpu || task_on_rq_queued(p))
-+				   && rq == task_rq(p))) {
-+				*plock = &rq->lock;
-+				return rq;
-+			}
-+			raw_spin_unlock_irqrestore(&rq->lock, *flags);
-+		} else if (task_on_rq_migrating(p)) {
-+			do {
-+				cpu_relax();
-+			} while (unlikely(task_on_rq_migrating(p)));
-+		} else {
-+			raw_spin_lock_irqsave(&p->pi_lock, *flags);
-+			if (likely(!p->on_cpu && !p->on_rq &&
-+				   rq == task_rq(p))) {
-+				*plock = &p->pi_lock;
-+				return rq;
-+			}
-+			raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
-+		}
-+	}
-+}
-+
-+static inline void
-+task_access_unlock_irqrestore(struct task_struct *p, raw_spinlock_t *lock,
-+			      unsigned long *flags)
-+{
-+	raw_spin_unlock_irqrestore(lock, *flags);
-+}

-+
-+/*
-+ * __task_rq_lock - lock the rq @p resides on.
-+ */
-+struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
-+	__acquires(rq->lock)
-+{
-+	struct rq *rq;
-+
-+	lockdep_assert_held(&p->pi_lock);
-+
-+	for (;;) {
-+		rq = task_rq(p);
-+		raw_spin_lock(&rq->lock);
-+		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
-+			return rq;
-+		raw_spin_unlock(&rq->lock);
-+
-+		while (unlikely(task_on_rq_migrating(p)))
-+			cpu_relax();
-+	}
-+}
-+
-+/*
-+ * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
-+ */
-+struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
-+	__acquires(p->pi_lock)
-+	__acquires(rq->lock)
-+{
-+	struct rq *rq;
-+
-+	for (;;) {
-+		raw_spin_lock_irqsave(&p->pi_lock, rf->flags);
-+		rq = task_rq(p);
-+		raw_spin_lock(&rq->lock);
-+		/*
-+		 *	move_queued_task()		task_rq_lock()
-+		 *
-+		 *	ACQUIRE (rq->lock)
-+		 *	[S] ->on_rq = MIGRATING		[L] rq = task_rq()
-+		 *	WMB (__set_task_cpu())		ACQUIRE (rq->lock);
-+		 *	[S] ->cpu = new_cpu		[L] task_rq()
-+		 *					[L] ->on_rq
-+		 *	RELEASE (rq->lock)
-+		 *
-+		 * If we observe the old CPU in task_rq_lock(), the acquire of
-+		 * the old rq->lock will fully serialize against the stores.
-+		 *
-+		 * If we observe the new CPU in task_rq_lock(), the address
-+		 * dependency headed by '[L] rq = task_rq()' and the acquire
-+		 * will pair with the WMB to ensure we then also see migrating.
-+		 */
-+		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
-+			return rq;
-+		}
-+		raw_spin_unlock(&rq->lock);
-+		raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
-+
-+		while (unlikely(task_on_rq_migrating(p)))
-+			cpu_relax();
-+	}
-+}
-+
-+static inline void
-+rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
-+	__acquires(rq->lock)
-+{
-+	raw_spin_lock_irqsave(&rq->lock, rf->flags);
-+}
-+
-+static inline void
-+rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
-+	__releases(rq->lock)
-+{
-+	raw_spin_unlock_irqrestore(&rq->lock, rf->flags);
-+}
-+
-+void raw_spin_rq_lock_nested(struct rq *rq, int subclass)
-+{
-+	raw_spinlock_t *lock;
-+
-+	/* Matches synchronize_rcu() in __sched_core_enable() */
-+	preempt_disable();
-+
-+	for (;;) {
-+		lock = __rq_lockp(rq);
-+		raw_spin_lock_nested(lock, subclass);
-+		if (likely(lock == __rq_lockp(rq))) {
-+			/* preempt_count *MUST* be > 1 */
-+			preempt_enable_no_resched();
-+			return;
-+		}
-+		raw_spin_unlock(lock);
-+	}
-+}
-+
-+void raw_spin_rq_unlock(struct rq *rq)
-+{
-+	raw_spin_unlock(rq_lockp(rq));
-+}

-+
-+/*
-+ * RQ-clock updating methods:
-+ */
-+
-+static void update_rq_clock_task(struct rq *rq, s64 delta)
-+{
-+/*
-+ * In theory, the compile should just see 0 here, and optimize out the call
-+ * to sched_rt_avg_update. But I don't trust it...
-+ */
-+	s64 __maybe_unused steal = 0, irq_delta = 0;
-+
-+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
-+	irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
-+
-+	/*
-+	 * Since irq_time is only updated on {soft,}irq_exit, we might run into
-+	 * this case when a previous update_rq_clock() happened inside a
-+	 * {soft,}irq region.
-+	 *
-+	 * When this happens, we stop ->clock_task and only update the
-+	 * prev_irq_time stamp to account for the part that fit, so that a next
-+	 * update will consume the rest. This ensures ->clock_task is
-+	 * monotonic.
-+	 *
-+	 * It does however cause some slight miss-attribution of {soft,}irq
-+	 * time, a more accurate solution would be to update the irq_time using
-+	 * the current rq->clock timestamp, except that would require using
-+	 * atomic ops.
-+	 */
-+	if (irq_delta > delta)
-+		irq_delta = delta;
-+
-+	rq->prev_irq_time += irq_delta;
-+	delta -= irq_delta;
-+#endif
-+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
-+	if (static_key_false((&paravirt_steal_rq_enabled))) {
-+		steal = paravirt_steal_clock(cpu_of(rq));
-+		steal -= rq->prev_steal_time_rq;
-+
-+		if (unlikely(steal > delta))
-+			steal = delta;
-+
-+		rq->prev_steal_time_rq += steal;
-+		delta -= steal;
-+	}
-+#endif
-+
-+	rq->clock_task += delta;
-+
-+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
-+	if ((irq_delta + steal))
-+		update_irq_load_avg(rq, irq_delta + steal);
-+#endif
-+}
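The clamping step in update_rq_clock_task() is what keeps ->clock_task monotonic: the irq time subtracted in one update is capped at the elapsed delta, and whatever did not fit is consumed by the next update. The sketch below reproduces only that step in user space. The type toy_rq, the function toy_update_clock_task() and the irq_time_now parameter (a stand-in for irq_time_read()) are hypothetical names, and the steal-time branch is omitted.

```c
#include <stdint.h>

struct toy_rq {
	int64_t clock_task;	/* time credited to tasks */
	int64_t prev_irq_time;	/* irq time already accounted for */
};

/* irq_time_now: hypothetical stand-in for irq_time_read(cpu). */
static void toy_update_clock_task(struct toy_rq *rq, int64_t delta,
				  int64_t irq_time_now)
{
	int64_t irq_delta = irq_time_now - rq->prev_irq_time;

	/*
	 * Never subtract more than the elapsed time; the remainder is
	 * picked up by a later update because prev_irq_time only
	 * advances by the part that was accounted.
	 */
	if (irq_delta > delta)
		irq_delta = delta;

	rq->prev_irq_time += irq_delta;
	rq->clock_task += delta - irq_delta;
}
```

For example, if 100 ns of irq time are pending but only 50 ns have elapsed, clock_task does not move in that update, and the remaining 50 ns of irq time are deducted from the following one.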

-+
-+static inline void update_rq_clock(struct rq *rq)
-+{
-+	s64 delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
-+
-+	if (unlikely(delta <= 0))
-+		return;
-+	rq->clock += delta;
-+	update_rq_time_edge(rq);
-+	update_rq_clock_task(rq, delta);
-+}

-+
-+/*
-+ * RQ Load update routine
-+ */
-+#define RQ_LOAD_HISTORY_BITS		(sizeof(s32) * 8ULL)
-+#define RQ_UTIL_SHIFT			(8)
-+#define RQ_LOAD_HISTORY_TO_UTIL(l)	(((l) >> (RQ_LOAD_HISTORY_BITS - 1 - RQ_UTIL_SHIFT)) & 0xff)
-+
-+#define LOAD_BLOCK(t)		((t) >> 17)
-+#define LOAD_HALF_BLOCK(t)	((t) >> 16)
-+#define BLOCK_MASK(t)		((t) & ((0x01 << 18) - 1))
-+#define LOAD_BLOCK_BIT(b)	(1UL << (RQ_LOAD_HISTORY_BITS - 1 - (b)))
-+#define CURRENT_LOAD_BIT	LOAD_BLOCK_BIT(0)
-+
-+static inline void rq_load_update(struct rq *rq)
-+{
-+	u64 time = rq->clock;
-+	u64 delta = min(LOAD_BLOCK(time) - LOAD_BLOCK(rq->load_stamp),
-+			RQ_LOAD_HISTORY_BITS - 1);
-+	u64 prev = !!(rq->load_history & CURRENT_LOAD_BIT);
-+	u64 curr = !!rq->nr_running;
-+
-+	if (delta) {
-+		rq->load_history = rq->load_history >> delta;
-+
-+		if (delta < RQ_UTIL_SHIFT) {
-+			rq->load_block += (~BLOCK_MASK(rq->load_stamp)) * prev;
-+			if (!!LOAD_HALF_BLOCK(rq->load_block) ^ curr)
-+				rq->load_history ^= LOAD_BLOCK_BIT(delta);
-+		}
-+
-+		rq->load_block = BLOCK_MASK(time) * prev;
-+	} else {
-+		rq->load_block += (time - rq->load_stamp) * prev;
-+	}
-+	if (prev ^ curr)
-+		rq->load_history ^= CURRENT_LOAD_BIT;
-+	rq->load_stamp = time;
-+}
-+
-+unsigned long rq_load_util(struct rq *rq, unsigned long max)
-+{
-+	return RQ_LOAD_HISTORY_TO_UTIL(rq->load_history) * (max >> RQ_UTIL_SHIFT);
-+}
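With RQ_LOAD_HISTORY_BITS equal to 32 and RQ_UTIL_SHIFT equal to 8, RQ_LOAD_HISTORY_TO_UTIL() reduces to (l >> 23) & 0xff: it reads the eight history bits just below the "current block" bit (bit 31) as a number from 0 to 255, and rq_load_util() then scales that by max >> 8. The sketch below works through this arithmetic in user space. The name toy_load_util() and the use of uint32_t for the history word are assumptions made for illustration; the actual type of rq->load_history is not visible in this part of the patch.

```c
#include <stdint.h>

#define HISTORY_BITS	32	/* sizeof(s32) * 8 in the patch */
#define UTIL_SHIFT	8

/*
 * Read history bits 23..30 as an 8-bit value and scale it, giving a
 * result of at most 255 * (max >> 8).
 */
static unsigned long toy_load_util(uint32_t load_history, unsigned long max)
{
	unsigned long util =
		(load_history >> (HISTORY_BITS - 1 - UTIL_SHIFT)) & 0xff;

	return util * (max >> UTIL_SHIFT);
}
```

With max = 1024, a history of 0x7f800000 (all eight recent blocks busy) yields 255 * 4 = 1020, a history of 0x40000000 (only the most recent completed block busy) yields 128 * 4 = 512, and the current-block bit alone (0x80000000) contributes nothing.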

-+
-+#ifdef CONFIG_SMP
-+unsigned long sched_cpu_util(int cpu, unsigned long max)
-+{
-+	return rq_load_util(cpu_rq(cpu), max);
-+}
-+#endif /* CONFIG_SMP */
-+
-+#ifdef CONFIG_CPU_FREQ
-+/**
-+ * cpufreq_update_util - Take a note about CPU utilization changes.
-+ * @rq: Runqueue to carry out the update for.
-+ * @flags: Update reason flags.
-+ *
-+ * This function is called by the scheduler on the CPU whose utilization is
-+ * being updated.
-+ *
-+ * It can only be called from RCU-sched read-side critical sections.
-+ *
-+ * The way cpufreq is currently arranged requires it to evaluate the CPU
-+ * performance state (frequency/voltage) on a regular basis to prevent it from
-+ * being stuck in a completely inadequate performance level for too long.
-+ * That is not guaranteed to happen if the updates are only triggered from CFS
-+ * and DL, though, because they may not be coming in if only RT tasks are
-+ * active all the time (or there are RT tasks only).
-+ *
-+ * As a workaround for that issue, this function is called periodically by the
-+ * RT sched class to trigger extra cpufreq updates to prevent it from stalling,
-+ * but that really is a band-aid.  Going forward it should be replaced with
-+ * solutions targeted more specifically at RT tasks.
-+ */
-+static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
-+{
-+	struct update_util_data *data;
-+
-+#ifdef CONFIG_SMP
-+	rq_load_update(rq);
-+#endif
-+	data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
-+						  cpu_of(rq)));
-+	if (data)
-+		data->func(data, rq_clock(rq), flags);
-+}
-+#else
-+static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
-+{
-+#ifdef CONFIG_SMP
-+	rq_load_update(rq);
-+#endif
-+}
-+#endif /* CONFIG_CPU_FREQ */

-+
-+#ifdef CONFIG_NO_HZ_FULL
-+/*
-+ * Tick may be needed by tasks in the runqueue depending on their policy and
-+ * requirements. If tick is needed, lets send the target an IPI to kick it out
-+ * of nohz mode if necessary.
-+ */
-+static inline void sched_update_tick_dependency(struct rq *rq)
-+{
-+	int cpu = cpu_of(rq);
-+
-+	if (!tick_nohz_full_cpu(cpu))
-+		return;
-+
-+	if (rq->nr_running < 2)
-+		tick_nohz_dep_clear_cpu(cpu, TICK_DEP_BIT_SCHED);
-+	else
-+		tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
-+}
-+#else /* !CONFIG_NO_HZ_FULL */
-+static inline void sched_update_tick_dependency(struct rq *rq) { }
-+#endif
-+
-+bool sched_task_on_rq(struct task_struct *p)
-+{
-+	return task_on_rq_queued(p);
-+}

-+
-+/*
-+ * Add/Remove/Requeue task to/from the runqueue routines
-+ * Context: rq->lock
-+ */
-+#define __SCHED_DEQUEUE_TASK(p, rq, flags, func)		\
-+	psi_dequeue(p, flags & DEQUEUE_SLEEP);			\
-+	sched_info_dequeue(rq, p);				\
-+								\
-+	list_del(&p->sq_node);					\
-+	if (list_empty(&rq->queue.heads[p->sq_idx])) {		\
-+		clear_bit(sched_idx2prio(p->sq_idx, rq),	\
-+			  rq->queue.bitmap);			\
-+		func;						\
-+	}
-+
-+#define __SCHED_ENQUEUE_TASK(p, rq, flags)				\
-+	sched_info_enqueue(rq, p);					\
-+	psi_enqueue(p, flags);						\
-+									\
-+	p->sq_idx = task_sched_prio_idx(p, rq);				\
-+	list_add_tail(&p->sq_node, &rq->queue.heads[p->sq_idx]);	\
-+	set_bit(sched_idx2prio(p->sq_idx, rq), rq->queue.bitmap);
-+
-+static inline void dequeue_task(struct task_struct *p, struct rq *rq, int flags)
-+{
-+	lockdep_assert_held(&rq->lock);
-+
-+	/*printk(KERN_INFO "sched: dequeue(%d) %px %016llx\n", cpu_of(rq), p, p->priodl);*/
-+	WARN_ONCE(task_rq(p) != rq, "sched: dequeue task reside on cpu%d from cpu%d\n",
-+		  task_cpu(p), cpu_of(rq));
-+
-+	__SCHED_DEQUEUE_TASK(p, rq, flags, update_sched_rq_watermark(rq));
-+	--rq->nr_running;
-+#ifdef CONFIG_SMP
-+	if (1 == rq->nr_running)
-+		cpumask_clear_cpu(cpu_of(rq), &sched_rq_pending_mask);
-+#endif
-+
-+	sched_update_tick_dependency(rq);
-+}
-+
-+static inline void enqueue_task(struct task_struct *p, struct rq *rq, int flags)
-+{
-+	lockdep_assert_held(&rq->lock);
-+
-+	/*printk(KERN_INFO "sched: enqueue(%d) %px %016llx\n", cpu_of(rq), p, p->priodl);*/
-+	WARN_ONCE(task_rq(p) != rq, "sched: enqueue task reside on cpu%d to cpu%d\n",
-+		  task_cpu(p), cpu_of(rq));
-+
-+	__SCHED_ENQUEUE_TASK(p, rq, flags);
-+	update_sched_rq_watermark(rq);
-+	++rq->nr_running;
-+#ifdef CONFIG_SMP
-+	if (2 == rq->nr_running)
-+		cpumask_set_cpu(cpu_of(rq), &sched_rq_pending_mask);
-+#endif
-+
-+	sched_update_tick_dependency(rq);
-+}
-+
-+static inline void requeue_task(struct task_struct *p, struct rq *rq)
-+{
-+	int idx;
-+
-+	lockdep_assert_held(&rq->lock);
-+	/*printk(KERN_INFO "sched: requeue(%d) %px %016llx\n", cpu_of(rq), p, p->priodl);*/
-+	WARN_ONCE(task_rq(p) != rq, "sched: cpu[%d] requeue task reside on cpu%d\n",
-+		  cpu_of(rq), task_cpu(p));
-+
-+	idx = task_sched_prio_idx(p, rq);
-+
-+	list_del(&p->sq_node);
-+	list_add_tail(&p->sq_node, &rq->queue.heads[idx]);
-+	if (idx != p->sq_idx) {
-+		if (list_empty(&rq->queue.heads[p->sq_idx]))
-+			clear_bit(sched_idx2prio(p->sq_idx, rq),
-+				  rq->queue.bitmap);
-+		p->sq_idx = idx;
-+		set_bit(sched_idx2prio(p->sq_idx, rq), rq->queue.bitmap);
-+		update_sched_rq_watermark(rq);
-+	}
-+}

-+
-+/*
-+ * cmpxchg based fetch_or, macro so it works for different integer types
-+ */
-+#define fetch_or(ptr, mask)						\
-+	({								\
-+		typeof(ptr) _ptr = (ptr);				\
-+		typeof(mask) _mask = (mask);				\
-+		typeof(*_ptr) _old, _val = *_ptr;			\
-+									\
-+		for (;;) {						\
-+			_old = cmpxchg(_ptr, _val, _val | _mask);	\
-+			if (_old == _val)				\
-+				break;					\
-+			_val = _old;					\
-+		}							\
-+	_old;								\
-+})
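The fetch_or() macro above is a compare-and-swap retry loop: read the value, try to install value | mask, and start again from the freshly observed value if another writer got in between. For readers more familiar with user-space code, the same pattern can be sketched with C11 atomics. This is an illustration under stated assumptions, not the kernel's cmpxchg(): toy_fetch_or() is a hypothetical name, and C11 already offers atomic_fetch_or(), which would normally be used directly.

```c
#include <stdatomic.h>

/*
 * Atomically OR mask into *ptr and return the value seen just before
 * the OR, retrying the compare-and-swap until no other writer
 * intervenes.
 */
static unsigned long toy_fetch_or(_Atomic unsigned long *ptr,
				  unsigned long mask)
{
	unsigned long val = atomic_load(ptr);

	/* On failure, atomic_compare_exchange_weak() reloads val. */
	while (!atomic_compare_exchange_weak(ptr, &val, val | mask))
		;

	return val;
}
```

Returning the old value is what lets a caller such as set_nr_and_not_polling() set one flag and test another in a single atomic step.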

-+
-+#if defined(CONFIG_SMP) && defined(TIF_POLLING_NRFLAG)
-+/*
-+ * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
-+ * this avoids any races wrt polling state changes and thereby avoids
-+ * spurious IPIs.
-+ */
-+static bool set_nr_and_not_polling(struct task_struct *p)
-+{
-+	struct thread_info *ti = task_thread_info(p);
-+	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
-+}
-+
-+/*
-+ * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
-+ *
-+ * If this returns true, then the idle task promises to call
-+ * sched_ttwu_pending() and reschedule soon.
-+ */
-+static bool set_nr_if_polling(struct task_struct *p)
-+{
-+	struct thread_info *ti = task_thread_info(p);
-+	typeof(ti->flags) old, val = READ_ONCE(ti->flags);
-+
-+	for (;;) {
-+		if (!(val & _TIF_POLLING_NRFLAG))
-+			return false;
-+		if (val & _TIF_NEED_RESCHED)
-+			return true;
-+		old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
-+		if (old == val)
-+			break;
-+		val = old;
-+	}
-+	return true;
-+}
-+
-+#else
-+static bool set_nr_and_not_polling(struct task_struct *p)
-+{
-+	set_tsk_need_resched(p);
-+	return true;
-+}
-+
-+#ifdef CONFIG_SMP
-+static bool set_nr_if_polling(struct task_struct *p)
-+{
-+	return false;
-+}
-+#endif
-+#endif

-+
-+static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
-+{
-+	struct wake_q_node *node = &task->wake_q;
-+
-+	/*
-+	 * Atomically grab the task, if ->wake_q is !nil already it means
-+	 * it's already queued (either by us or someone else) and will get the
-+	 * wakeup due to that.
-+	 *
-+	 * In order to ensure that a pending wakeup will observe our pending
-+	 * state, even in the failed case, an explicit smp_mb() must be used.
-+	 */
-+	smp_mb__before_atomic();
-+	if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
-+		return false;
-+
-+	/*
-+	 * The head is context local, there can be no concurrency.
-+	 */
-+	*head->lastp = node;
-+	head->lastp = &node->next;
-+	return true;
-+}
-+
-+/**
-+ * wake_q_add() - queue a wakeup for 'later' waking.
-+ * @head: the wake_q_head to add @task to
-+ * @task: the task to queue for 'later' wakeup
-+ *
-+ * Queue a task for later wakeup, most likely by the wake_up_q() call in the
-+ * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
-+ * instantly.
-+ *
-+ * This function must be used as-if it were wake_up_process(); IOW the task
-+ * must be ready to be woken at this location.
-+ */
-+void wake_q_add(struct wake_q_head *head, struct task_struct *task)
-+{
-+	if (__wake_q_add(head, task))
-+		get_task_struct(task);
-+}
-+
-+/**
-+ * wake_q_add_safe() - safely queue a wakeup for 'later' waking.
-+ * @head: the wake_q_head to add @task to
-+ * @task: the task to queue for 'later' wakeup
-+ *
-+ * Queue a task for later wakeup, most likely by the wake_up_q() call in the
-+ * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
-+ * instantly.
-+ *
-+ * This function must be used as-if it were wake_up_process(); IOW the task
-+ * must be ready to be woken at this location.
-+ *
-+ * This function is essentially a task-safe equivalent to wake_q_add(). Callers
-+ * that already hold reference to @task can call the 'safe' version and trust
-+ * wake_q to do the right thing depending whether or not the @task is already
-+ * queued for wakeup.
-+ */
-+void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task)
-+{
-+	if (!__wake_q_add(head, task))
-+		put_task_struct(task);
-+}
-+
-+void wake_up_q(struct wake_q_head *head)
-+{
-+	struct wake_q_node *node = head->first;
-+
-+	while (node != WAKE_Q_TAIL) {
-+		struct task_struct *task;
-+
-+		task = container_of(node, struct task_struct, wake_q);
-+		/* task can safely be re-inserted now: */
-+		node = node->next;
-+		task->wake_q.next = NULL;
-+
-+		/*
-+		 * wake_up_process() executes a full barrier, which pairs with
-+		 * the queueing in wake_q_add() so as not to miss wakeups.
-+		 */
-+		wake_up_process(task);
-+		put_task_struct(task);
-+	}
-+}
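The wake queue above is an intrusive singly linked list in which node->next == NULL means "not queued", so the end of the list has to be marked by a distinct non-NULL sentinel, and the head keeps a pointer to the last next field for constant-time append. The single-threaded sketch below shows just that list discipline. Every name in it (toy_node, toy_head, TOY_TAIL, toy_head_init, toy_add) is hypothetical, a static sentinel object is assumed in place of WAKE_Q_TAIL, and the atomic claim done with cmpxchg_relaxed() in __wake_q_add() is replaced by a plain check.

```c
#include <stddef.h>

struct toy_node {
	struct toy_node *next;	/* NULL means "not on any list" */
};

/* Sentinel marking the end of the list (the role of WAKE_Q_TAIL). */
static struct toy_node toy_tail_sentinel;
#define TOY_TAIL (&toy_tail_sentinel)

struct toy_head {
	struct toy_node *first;
	struct toy_node **lastp;	/* points at the last next field */
};

static void toy_head_init(struct toy_head *h)
{
	h->first = TOY_TAIL;
	h->lastp = &h->first;
}

/* Returns 1 if queued, 0 if the node was already on some list. */
static int toy_add(struct toy_head *h, struct toy_node *n)
{
	if (n->next != NULL)
		return 0;

	n->next = TOY_TAIL;	/* claim the node */
	*h->lastp = n;		/* link it after the current last node */
	h->lastp = &n->next;
	return 1;
}
```

Because a queued node always has a non-NULL next, a second attempt to add the same node is rejected, which is the property wake_q_add() and wake_q_add_safe() rely on to decide whether to take or drop a task reference.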

-+
-+/*
-+ * resched_curr - mark rq's current task 'to be rescheduled now'.
-+ *
-+ * On UP this means the setting of the need_resched flag, on SMP it
-+ * might also involve a cross-CPU call to trigger the scheduler on
-+ * the target CPU.
-+ */
-+void resched_curr(struct rq *rq)
-+{
-+	struct task_struct *curr = rq->curr;
-+	int cpu;
-+
-+	lockdep_assert_held(&rq->lock);
-+
-+	if (test_tsk_need_resched(curr))
-+		return;
-+
-+	cpu = cpu_of(rq);
-+	if (cpu == smp_processor_id()) {
-+		set_tsk_need_resched(curr);
-+		set_preempt_need_resched();
-+		return;
-+	}
-+
-+	if (set_nr_and_not_polling(curr))
-+		smp_send_reschedule(cpu);
-+	else
-+		trace_sched_wake_idle_without_ipi(cpu);
-+}
-+
-+void resched_cpu(int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+	unsigned long flags;
-+
-+	raw_spin_lock_irqsave(&rq->lock, flags);
-+	if (cpu_online(cpu) || cpu == smp_processor_id())
-+		resched_curr(cpu_rq(cpu));
-+	raw_spin_unlock_irqrestore(&rq->lock, flags);
-+}

1698

-+

1699

-+#ifdef CONFIG_SMP
-+#ifdef CONFIG_NO_HZ_COMMON
-+void nohz_balance_enter_idle(int cpu) {}
-+
-+void select_nohz_load_balancer(int stop_tick) {}
-+
-+void set_cpu_sd_state_idle(void) {}
-+
-+/*
-+ * In the semi idle case, use the nearest busy CPU for migrating timers
-+ * from an idle CPU.  This is good for power-savings.
-+ *
-+ * We don't do similar optimization for completely idle system, as
-+ * selecting an idle CPU will add more delays to the timers than intended
-+ * (as that CPU's timer base may not be uptodate wrt jiffies etc).
-+ */
-+int get_nohz_timer_target(void)
-+{
-+	int i, cpu = smp_processor_id(), default_cpu = -1;
-+	struct cpumask *mask;
-+	const struct cpumask *hk_mask;
-+
-+	if (housekeeping_cpu(cpu, HK_FLAG_TIMER)) {
-+		if (!idle_cpu(cpu))
-+			return cpu;
-+		default_cpu = cpu;
-+	}
-+
-+	hk_mask = housekeeping_cpumask(HK_FLAG_TIMER);
-+
-+	for (mask = per_cpu(sched_cpu_topo_masks, cpu) + 1;
-+	     mask < per_cpu(sched_cpu_topo_end_mask, cpu); mask++)
-+		for_each_cpu_and(i, mask, hk_mask)
-+			if (!idle_cpu(i))
-+				return i;
-+
-+	if (default_cpu == -1)
-+		default_cpu = housekeeping_any_cpu(HK_FLAG_TIMER);
-+	cpu = default_cpu;
-+
-+	return cpu;
-+}
-+
-+/*
-+ * When add_timer_on() enqueues a timer into the timer wheel of an
-+ * idle CPU then this timer might expire before the next timer event
-+ * which is scheduled to wake up that CPU. In case of a completely
-+ * idle system the next event might even be infinite time into the
-+ * future. wake_up_idle_cpu() ensures that the CPU is woken up and
-+ * leaves the inner idle loop so the newly added timer is taken into
-+ * account when the CPU goes back to idle and evaluates the timer
-+ * wheel for the next timer event.
-+ */
-+static inline void wake_up_idle_cpu(int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+
-+	if (cpu == smp_processor_id())
-+		return;
-+
-+	if (set_nr_and_not_polling(rq->idle))
-+		smp_send_reschedule(cpu);
-+	else
-+		trace_sched_wake_idle_without_ipi(cpu);
-+}
-+
-+static inline bool wake_up_full_nohz_cpu(int cpu)
-+{
-+	/*
-+	 * We just need the target to call irq_exit() and re-evaluate
-+	 * the next tick. The nohz full kick at least implies that.
-+	 * If needed we can still optimize that later with an
-+	 * empty IRQ.
-+	 */
-+	if (cpu_is_offline(cpu))
-+		return true;  /* Don't try to wake offline CPUs. */
-+	if (tick_nohz_full_cpu(cpu)) {
-+		if (cpu != smp_processor_id() ||
-+		    tick_nohz_tick_stopped())
-+			tick_nohz_full_kick_cpu(cpu);
-+		return true;
-+	}
-+
-+	return false;
-+}
-+
-+void wake_up_nohz_cpu(int cpu)
-+{
-+	if (!wake_up_full_nohz_cpu(cpu))
-+		wake_up_idle_cpu(cpu);
-+}
-+
-+static void nohz_csd_func(void *info)
-+{
-+	struct rq *rq = info;
-+	int cpu = cpu_of(rq);
-+	unsigned int flags;
-+
-+	/*
-+	 * Release the rq::nohz_csd.
-+	 */
-+	flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
-+	WARN_ON(!(flags & NOHZ_KICK_MASK));
-+
-+	rq->idle_balance = idle_cpu(cpu);
-+	if (rq->idle_balance && !need_resched()) {
-+		rq->nohz_idle_balance = flags;
-+		raise_softirq_irqoff(SCHED_SOFTIRQ);
-+	}
-+}
-+
-+#endif /* CONFIG_NO_HZ_COMMON */
-+#endif /* CONFIG_SMP */
-+
-+static inline void check_preempt_curr(struct rq *rq)
-+{
-+	if (sched_rq_first_task(rq) != rq->curr)
-+		resched_curr(rq);
-+}
-+
-+#ifdef CONFIG_SCHED_HRTICK
-+/*
-+ * Use HR-timers to deliver accurate preemption points.
-+ */
-+
-+static void hrtick_clear(struct rq *rq)
-+{
-+	if (hrtimer_active(&rq->hrtick_timer))
-+		hrtimer_cancel(&rq->hrtick_timer);
-+}
-+
-+/*
-+ * High-resolution timer tick.
-+ * Runs from hardirq context with interrupts disabled.
-+ */
-+static enum hrtimer_restart hrtick(struct hrtimer *timer)
-+{
-+	struct rq *rq = container_of(timer, struct rq, hrtick_timer);
-+
-+	WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());
-+
-+	raw_spin_lock(&rq->lock);
-+	resched_curr(rq);
-+	raw_spin_unlock(&rq->lock);
-+
-+	return HRTIMER_NORESTART;
-+}
-+
-+/*
-+ * Use hrtick when:
-+ *  - enabled by features
-+ *  - hrtimer is actually high res
-+ */
-+static inline int hrtick_enabled(struct rq *rq)
-+{
-+	/**
-+	 * Alt schedule FW doesn't support sched_feat yet
-+	if (!sched_feat(HRTICK))
-+		return 0;
-+	*/
-+	if (!cpu_active(cpu_of(rq)))
-+		return 0;
-+	return hrtimer_is_hres_active(&rq->hrtick_timer);
-+}
-+
-+#ifdef CONFIG_SMP
-+
-+static void __hrtick_restart(struct rq *rq)
-+{
-+	struct hrtimer *timer = &rq->hrtick_timer;
-+	ktime_t time = rq->hrtick_time;
-+
-+	hrtimer_start(timer, time, HRTIMER_MODE_ABS_PINNED_HARD);
-+}
-+
-+/*
-+ * called from hardirq (IPI) context
-+ */
-+static void __hrtick_start(void *arg)
-+{
-+	struct rq *rq = arg;
-+
-+	raw_spin_lock(&rq->lock);
-+	__hrtick_restart(rq);
-+	raw_spin_unlock(&rq->lock);
-+}
-+
-+/*
-+ * Called to set the hrtick timer state.
-+ *
-+ * called with rq->lock held and irqs disabled
-+ */
-+void hrtick_start(struct rq *rq, u64 delay)
-+{
-+	struct hrtimer *timer = &rq->hrtick_timer;
-+	s64 delta;
-+
-+	/*
-+	 * Don't schedule slices shorter than 10000ns, that just
-+	 * doesn't make sense and can cause timer DoS.
-+	 */
-+	delta = max_t(s64, delay, 10000LL);
-+
-+	rq->hrtick_time = ktime_add_ns(timer->base->get_time(), delta);
-+
-+	if (rq == this_rq())
-+		__hrtick_restart(rq);
-+	else
-+		smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd);
-+}
-+
-+#else
-+/*
-+ * Called to set the hrtick timer state.
-+ *
-+ * called with rq->lock held and irqs disabled
-+ */
-+void hrtick_start(struct rq *rq, u64 delay)
-+{
-+	/*
-+	 * Don't schedule slices shorter than 10000ns, that just
-+	 * doesn't make sense. Rely on vruntime for fairness.
-+	 */
-+	delay = max_t(u64, delay, 10000LL);
-+	hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay),
-+		      HRTIMER_MODE_REL_PINNED_HARD);
-+}
-+#endif /* CONFIG_SMP */
-+
-+static void hrtick_rq_init(struct rq *rq)
-+{
-+#ifdef CONFIG_SMP
-+	INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq);
-+#endif
-+
-+	hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
-+	rq->hrtick_timer.function = hrtick;
-+}
-+#else	/* CONFIG_SCHED_HRTICK */
-+static inline int hrtick_enabled(struct rq *rq)
-+{
-+	return 0;
-+}
-+
-+static inline void hrtick_clear(struct rq *rq)
-+{
-+}
-+
-+static inline void hrtick_rq_init(struct rq *rq)
-+{
-+}
-+#endif	/* CONFIG_SCHED_HRTICK */
-+
-+static inline int __normal_prio(int policy, int rt_prio, int static_prio)
-+{
-+	return rt_policy(policy) ? (MAX_RT_PRIO - 1 - rt_prio) :
-+		static_prio + MAX_PRIORITY_ADJ;
-+}
-+
-+/*
-+ * Calculate the expected normal priority: i.e. priority
-+ * without taking RT-inheritance into account. Might be
-+ * boosted by interactivity modifiers. Changes upon fork,
-+ * setprio syscalls, and whenever the interactivity
-+ * estimator recalculates.
-+ */
-+static inline int normal_prio(struct task_struct *p)
-+{
-+	return __normal_prio(p->policy, p->rt_priority, p->static_prio);
-+}
-+
-+/*
-+ * Calculate the current priority, i.e. the priority
-+ * taken into account by the scheduler. This value might
-+ * be boosted by RT tasks as it will be RT if the task got
-+ * RT-boosted. If not then it returns p->normal_prio.
-+ */
-+static int effective_prio(struct task_struct *p)
-+{
-+	p->normal_prio = normal_prio(p);
-+	/*
-+	 * If we are RT tasks or we were boosted to RT priority,
-+	 * keep the priority unchanged. Otherwise, update priority
-+	 * to the normal priority:
-+	 */
-+	if (!rt_prio(p->prio))
-+		return p->normal_prio;
-+	return p->prio;
-+}
-+
-+/*
-+ * activate_task - move a task to the runqueue.
-+ *
-+ * Context: rq->lock
-+ */
-+static void activate_task(struct task_struct *p, struct rq *rq)
-+{
-+	enqueue_task(p, rq, ENQUEUE_WAKEUP);
-+	p->on_rq = TASK_ON_RQ_QUEUED;
-+
-+	/*
-+	 * If in_iowait is set, the code below may not trigger any cpufreq
-+	 * utilization updates, so do it here explicitly with the IOWAIT flag
-+	 * passed.
-+	 */
-+	cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT * p->in_iowait);
-+}
-+
-+/*
-+ * deactivate_task - remove a task from the runqueue.
-+ *
-+ * Context: rq->lock
-+ */
-+static inline void deactivate_task(struct task_struct *p, struct rq *rq)
-+{
-+	dequeue_task(p, rq, DEQUEUE_SLEEP);
-+	p->on_rq = 0;
-+	cpufreq_update_util(rq, 0);
-+}
-+
-+static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
-+{
-+#ifdef CONFIG_SMP
-+	/*
-+	 * After ->cpu is set up to a new value, task_access_lock(p, ...) can be
-+	 * successfully executed on another CPU. We must ensure that updates of
-+	 * per-task data have been completed by this moment.
-+	 */
-+	smp_wmb();
-+
-+#ifdef CONFIG_THREAD_INFO_IN_TASK
-+	WRITE_ONCE(p->cpu, cpu);
-+#else
-+	WRITE_ONCE(task_thread_info(p)->cpu, cpu);
-+#endif
-+#endif
-+}
-+
-+static inline bool is_migration_disabled(struct task_struct *p)
-+{
-+#ifdef CONFIG_SMP
-+	return p->migration_disabled;
-+#else
-+	return false;
-+#endif
-+}
-+
-+#define SCA_CHECK		0x01
-+#define SCA_USER		0x08
-+
-+#ifdef CONFIG_SMP
-+
-+void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
-+{
-+#ifdef CONFIG_SCHED_DEBUG
-+	unsigned int state = READ_ONCE(p->__state);
-+
-+	/*
-+	 * We should never call set_task_cpu() on a blocked task,
-+	 * ttwu() will sort out the placement.
-+	 */
-+	WARN_ON_ONCE(state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq);
-+
-+#ifdef CONFIG_LOCKDEP
-+	/*
-+	 * The caller should hold either p->pi_lock or rq->lock, when changing
-+	 * a task's CPU. ->pi_lock for waking tasks, rq->lock for runnable tasks.
-+	 *
-+	 * sched_move_task() holds both and thus holding either pins the cgroup,
-+	 * see task_group().
-+	 */
-+	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
-+				      lockdep_is_held(&task_rq(p)->lock)));
-+#endif
-+	/*
-+	 * Clearly, migrating tasks to offline CPUs is a fairly daft thing.
-+	 */
-+	WARN_ON_ONCE(!cpu_online(new_cpu));
-+
-+	WARN_ON_ONCE(is_migration_disabled(p));
-+#endif
-+	if (task_cpu(p) == new_cpu)
-+		return;
-+	trace_sched_migrate_task(p, new_cpu);
-+	rseq_migrate(p);
-+	perf_event_task_migrate(p);
-+
-+	__set_task_cpu(p, new_cpu);
-+}
-+
-+#define MDF_FORCE_ENABLED	0x80
-+
-+static void
-+__do_set_cpus_ptr(struct task_struct *p, const struct cpumask *new_mask)
-+{
-+	/*
-+	 * This here violates the locking rules for affinity, since we're only
-+	 * supposed to change these variables while holding both rq->lock and
-+	 * p->pi_lock.
-+	 *
-+	 * HOWEVER, it magically works, because ttwu() is the only code that
-+	 * accesses these variables under p->pi_lock and only does so after
-+	 * smp_cond_load_acquire(&p->on_cpu, !VAL), and we're in __schedule()
-+	 * before finish_task().
-+	 *
-+	 * XXX do further audits, this smells like something putrid.
-+	 */
-+	SCHED_WARN_ON(!p->on_cpu);
-+	p->cpus_ptr = new_mask;
-+}
-+
-+void migrate_disable(void)
-+{
-+	struct task_struct *p = current;
-+	int cpu;
-+
-+	if (p->migration_disabled) {
-+		p->migration_disabled++;
-+		return;
-+	}
-+
-+	preempt_disable();
-+	cpu = smp_processor_id();
-+	if (cpumask_test_cpu(cpu, &p->cpus_mask)) {
-+		cpu_rq(cpu)->nr_pinned++;
-+		p->migration_disabled = 1;
-+		p->migration_flags &= ~MDF_FORCE_ENABLED;
-+
-+		/*
-+		 * Violates locking rules! see comment in __do_set_cpus_ptr().
-+		 */
-+		if (p->cpus_ptr == &p->cpus_mask)
-+			__do_set_cpus_ptr(p, cpumask_of(cpu));
-+	}
-+	preempt_enable();
-+}
-+EXPORT_SYMBOL_GPL(migrate_disable);
-+
-+void migrate_enable(void)
-+{
-+	struct task_struct *p = current;
-+
-+	if (0 == p->migration_disabled)
-+		return;
-+
-+	if (p->migration_disabled > 1) {
-+		p->migration_disabled--;
-+		return;
-+	}
-+
-+	/*
-+	 * Ensure stop_task runs either before or after this, and that
-+	 * __set_cpus_allowed_ptr(SCA_MIGRATE_ENABLE) doesn't schedule().
-+	 */
-+	preempt_disable();
-+	/*
-+	 * Assumption: current should be running on allowed cpu
-+	 */
-+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &p->cpus_mask));
-+	if (p->cpus_ptr != &p->cpus_mask)
-+		__do_set_cpus_ptr(p, &p->cpus_mask);
-+	/*
-+	 * Mustn't clear migration_disabled() until cpus_ptr points back at the
-+	 * regular cpus_mask, otherwise things that race (eg.
-+	 * select_fallback_rq) get confused.
-+	 */
-+	barrier();
-+	p->migration_disabled = 0;
-+	this_rq()->nr_pinned--;
-+	preempt_enable();
-+}
-+EXPORT_SYMBOL_GPL(migrate_enable);
-+
-+static inline bool rq_has_pinned_tasks(struct rq *rq)
-+{
-+	return rq->nr_pinned;
-+}
-+
-+/*
-+ * Per-CPU kthreads are allowed to run on !active && online CPUs, see
-+ * __set_cpus_allowed_ptr() and select_fallback_rq().
-+ */
-+static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
-+{
-+	/* When not in the task's cpumask, no point in looking further. */
-+	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
-+		return false;
-+
-+	/* migrate_disabled() must be allowed to finish. */
-+	if (is_migration_disabled(p))
-+		return cpu_online(cpu);
-+
-+	/* Non kernel threads are not allowed during either online or offline. */
-+	if (!(p->flags & PF_KTHREAD))
-+		return cpu_active(cpu) && task_cpu_possible(cpu, p);
-+
-+	/* KTHREAD_IS_PER_CPU is always allowed. */
-+	if (kthread_is_per_cpu(p))
-+		return cpu_online(cpu);
-+
-+	/* Regular kernel threads don't get to stay during offline. */
-+	if (cpu_dying(cpu))
-+		return false;
-+
-+	/* But are allowed during online. */
-+	return cpu_online(cpu);
-+}
-+
-+/*
-+ * This is how migration works:
-+ *
-+ * 1) we invoke migration_cpu_stop() on the target CPU using
-+ *    stop_one_cpu().
-+ * 2) stopper starts to run (implicitly forcing the migrated thread
-+ *    off the CPU)
-+ * 3) it checks whether the migrated task is still in the wrong runqueue.
-+ * 4) if it's in the wrong runqueue then the migration thread removes
-+ *    it and puts it into the right queue.
-+ * 5) stopper completes and stop_one_cpu() returns and the migration
-+ *    is done.
-+ */
-+
-+/*
-+ * move_queued_task - move a queued task to new rq.
-+ *
-+ * Returns (locked) new rq. Old rq's lock is released.
-+ */
-+static struct rq *move_queued_task(struct rq *rq, struct task_struct *p, int
-+				   new_cpu)
-+{
-+	lockdep_assert_held(&rq->lock);
-+
-+	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
-+	dequeue_task(p, rq, 0);
-+	set_task_cpu(p, new_cpu);
-+	raw_spin_unlock(&rq->lock);
-+
-+	rq = cpu_rq(new_cpu);
-+
-+	raw_spin_lock(&rq->lock);
-+	BUG_ON(task_cpu(p) != new_cpu);
-+	sched_task_sanity_check(p, rq);
-+	enqueue_task(p, rq, 0);
-+	p->on_rq = TASK_ON_RQ_QUEUED;
-+	check_preempt_curr(rq);
-+
-+	return rq;
-+}
-+
-+struct migration_arg {
-+	struct task_struct *task;
-+	int dest_cpu;
-+};
-+
-+/*
-+ * Move (not current) task off this CPU, onto the destination CPU. We're doing
-+ * this because either it can't run here any more (set_cpus_allowed()
-+ * away from this CPU, or CPU going down), or because we're
-+ * attempting to rebalance this task on exec (sched_exec).
-+ *
-+ * So we race with normal scheduler movements, but that's OK, as long
-+ * as the task is no longer on this CPU.
-+ */
-+static struct rq *__migrate_task(struct rq *rq, struct task_struct *p, int
-+				 dest_cpu)
-+{
-+	/* Affinity changed (again). */
-+	if (!is_cpu_allowed(p, dest_cpu))
-+		return rq;
-+
-+	update_rq_clock(rq);
-+	return move_queued_task(rq, p, dest_cpu);
-+}
-+
-+/*
-+ * migration_cpu_stop - this will be executed by a highprio stopper thread
-+ * and performs thread migration by bumping thread off CPU then
-+ * 'pushing' onto another runqueue.
-+ */
-+static int migration_cpu_stop(void *data)
-+{
-+	struct migration_arg *arg = data;
-+	struct task_struct *p = arg->task;
-+	struct rq *rq = this_rq();
-+	unsigned long flags;
-+
-+	/*
-+	 * The original target CPU might have gone down and we might
-+	 * be on another CPU but it doesn't matter.
-+	 */
-+	local_irq_save(flags);
-+	/*
-+	 * We need to explicitly wake pending tasks before running
-+	 * __migrate_task() such that we will not miss enforcing cpus_ptr
-+	 * during wakeups, see set_cpus_allowed_ptr()'s TASK_WAKING test.
-+	 */
-+	flush_smp_call_function_from_idle();
-+
-+	raw_spin_lock(&p->pi_lock);
-+	raw_spin_lock(&rq->lock);
-+	/*
-+	 * If task_rq(p) != rq, it cannot be migrated here, because we're
-+	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
-+	 * we're holding p->pi_lock.
-+	 */
-+	if (task_rq(p) == rq && task_on_rq_queued(p))
-+		rq = __migrate_task(rq, p, arg->dest_cpu);
-+	raw_spin_unlock(&rq->lock);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+
-+	return 0;
-+}
-+
-+static inline void
-+set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
-+{
-+	cpumask_copy(&p->cpus_mask, new_mask);
-+	p->nr_cpus_allowed = cpumask_weight(new_mask);
-+}
-+
-+static void
-+__do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
-+{
-+	lockdep_assert_held(&p->pi_lock);
-+	set_cpus_allowed_common(p, new_mask);
-+}
-+
-+void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
-+{
-+	__do_set_cpus_allowed(p, new_mask);
-+}
-+
-+int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
-+		      int node)
-+{
-+	if (!src->user_cpus_ptr)
-+		return 0;
-+
-+	dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
-+	if (!dst->user_cpus_ptr)
-+		return -ENOMEM;
-+
-+	cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
-+	return 0;
-+}
-+
-+static inline struct cpumask *clear_user_cpus_ptr(struct task_struct *p)
-+{
-+	struct cpumask *user_mask = NULL;
-+
-+	swap(p->user_cpus_ptr, user_mask);
-+
-+	return user_mask;
-+}
-+
-+void release_user_cpus_ptr(struct task_struct *p)
-+{
-+	kfree(clear_user_cpus_ptr(p));
-+}
-+
-+#endif
-+
-+/**
-+ * task_curr - is this task currently executing on a CPU?
-+ * @p: the task in question.
-+ *
-+ * Return: 1 if the task is currently executing. 0 otherwise.
-+ */
-+inline int task_curr(const struct task_struct *p)
-+{
-+	return cpu_curr(task_cpu(p)) == p;
-+}
-+
-+#ifdef CONFIG_SMP
-+/*
-+ * wait_task_inactive - wait for a thread to unschedule.
-+ *
-+ * If @match_state is nonzero, it's the @p->state value just checked and
-+ * not expected to change.  If it changes, i.e. @p might have woken up,
-+ * then return zero.  When we succeed in waiting for @p to be off its CPU,
-+ * we return a positive number (its total switch count).  If a second call
-+ * a short while later returns the same number, the caller can be sure that
-+ * @p has remained unscheduled the whole time.
-+ *
-+ * The caller must ensure that the task *will* unschedule sometime soon,
-+ * else this function might spin for a *long* time. This function can't
-+ * be called with interrupts off, or it may introduce deadlock with
-+ * smp_call_function() if an IPI is sent by the same process we are
-+ * waiting to become inactive.
-+ */
-+unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
-+{
-+	unsigned long flags;
-+	bool running, on_rq;
-+	unsigned long ncsw;
-+	struct rq *rq;
-+	raw_spinlock_t *lock;
-+
-+	for (;;) {
-+		rq = task_rq(p);
-+
-+		/*
-+		 * If the task is actively running on another CPU
-+		 * still, just relax and busy-wait without holding
-+		 * any locks.
-+		 *
-+		 * NOTE! Since we don't hold any locks, it's not
-+		 * even sure that "rq" stays as the right runqueue!
-+		 * But we don't care, since this will return false
-+		 * if the runqueue has changed and p is actually now
-+		 * running somewhere else!
-+		 */
-+		while (task_running(p) && p == rq->curr) {
-+			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
-+				return 0;
-+			cpu_relax();
-+		}
-+
-+		/*
-+		 * Ok, time to look more closely! We need the rq
-+		 * lock now, to be *sure*. If we're wrong, we'll
-+		 * just go back and repeat.
-+		 */
-+		task_access_lock_irqsave(p, &lock, &flags);
-+		trace_sched_wait_task(p);
-+		running = task_running(p);
-+		on_rq = p->on_rq;
-+		ncsw = 0;
-+		if (!match_state || READ_ONCE(p->__state) == match_state)
-+			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
-+		task_access_unlock_irqrestore(p, lock, &flags);
-+
-+		/*
-+		 * If it changed from the expected state, bail out now.
-+		 */
-+		if (unlikely(!ncsw))
-+			break;
-+
-+		/*
-+		 * Was it really running after all now that we
-+		 * checked with the proper locks actually held?
-+		 *
-+		 * Oops. Go back and try again..
-+		 */
-+		if (unlikely(running)) {
-+			cpu_relax();
-+			continue;
-+		}
-+
-+		/*
-+		 * It's not enough that it's not actively running,
-+		 * it must be off the runqueue _entirely_, and not
-+		 * preempted!
-+		 *
-+		 * So if it was still runnable (but just not actively
-+		 * running right now), it's preempted, and we should
-+		 * yield - it could be a while.
-+		 */
-+		if (unlikely(on_rq)) {
-+			ktime_t to = NSEC_PER_SEC / HZ;
-+
-+			set_current_state(TASK_UNINTERRUPTIBLE);
-+			schedule_hrtimeout(&to, HRTIMER_MODE_REL);
-+			continue;
-+		}
-+
-+		/*
-+		 * Ahh, all good. It wasn't running, and it wasn't
-+		 * runnable, which means that it will never become
-+		 * running in the future either. We're all done!
-+		 */
-+		break;
-+	}
-+
-+	return ncsw;
-+}
-+
-+/***
-+ * kick_process - kick a running thread to enter/exit the kernel
-+ * @p: the to-be-kicked thread
-+ *
-+ * Cause a process which is running on another CPU to enter
-+ * kernel-mode, without any delay. (to get signals handled.)
-+ *
-+ * NOTE: this function doesn't have to take the runqueue lock,
-+ * because all it wants to ensure is that the remote task enters
-+ * the kernel. If the IPI races and the task has been migrated
-+ * to another CPU then no harm is done and the purpose has been
-+ * achieved as well.
-+ */
-+void kick_process(struct task_struct *p)
-+{
-+	int cpu;
-+
-+	preempt_disable();
-+	cpu = task_cpu(p);
-+	if ((cpu != smp_processor_id()) && task_curr(p))
-+		smp_send_reschedule(cpu);
-+	preempt_enable();
-+}
-+EXPORT_SYMBOL_GPL(kick_process);
-+
-+/*
-+ * ->cpus_ptr is protected by both rq->lock and p->pi_lock
-+ *
-+ * A few notes on cpu_active vs cpu_online:
-+ *
-+ *  - cpu_active must be a subset of cpu_online
-+ *
-+ *  - on CPU-up we allow per-CPU kthreads on the online && !active CPU,
-+ *    see __set_cpus_allowed_ptr(). At this point the newly online
-+ *    CPU isn't yet part of the sched domains, and balancing will not
-+ *    see it.
-+ *
-+ *  - on cpu-down we clear cpu_active() to mask the sched domains and
-+ *    avoid the load balancer to place new tasks on the to be removed
-+ *    CPU. Existing tasks will remain running there and will be taken
-+ *    off.
-+ *
-+ * This means that fallback selection must not select !active CPUs.
-+ * And can assume that any active CPU must be online. Conversely
-+ * select_task_rq() below may allow selection of !active CPUs in order
-+ * to satisfy the above rules.
-+ */
-+static int select_fallback_rq(int cpu, struct task_struct *p)
-+{
-+	int nid = cpu_to_node(cpu);
-+	const struct cpumask *nodemask = NULL;
-+	enum { cpuset, possible, fail } state = cpuset;
-+	int dest_cpu;
-+
-+	/*
-+	 * If the node that the CPU is on has been offlined, cpu_to_node()
-+	 * will return -1. There is no CPU on the node, and we should
-+	 * select the CPU on the other node.
-+	 */
-+	if (nid != -1) {
-+		nodemask = cpumask_of_node(nid);
-+
-+		/* Look for allowed, online CPU in same node. */
-+		for_each_cpu(dest_cpu, nodemask) {
-+			if (is_cpu_allowed(p, dest_cpu))
-+				return dest_cpu;
-+		}
-+	}
-+
-+	for (;;) {
-+		/* Any allowed, online CPU? */
-+		for_each_cpu(dest_cpu, p->cpus_ptr) {
-+			if (!is_cpu_allowed(p, dest_cpu))
-+				continue;
-+			goto out;
-+		}
-+
-+		/* No more Mr. Nice Guy. */
-+		switch (state) {
-+		case cpuset:
-+			if (cpuset_cpus_allowed_fallback(p)) {
-+				state = possible;
-+				break;
-+			}
-+			fallthrough;
-+		case possible:
-+			/*
-+			 * XXX When called from select_task_rq() we only
-+			 * hold p->pi_lock and again violate locking order.
-+			 *
-+			 * More yuck to audit.
-+			 */
-+			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
-+			state = fail;
-+			break;
-+
-+		case fail:
-+			BUG();
-+			break;
-+		}
-+	}
-+
-+out:
-+	if (state != cpuset) {
-+		/*
-+		 * Don't tell them about moving exiting tasks or
-+		 * kernel threads (both mm NULL), since they never
-+		 * leave kernel.
-+		 */
-+		if (p->mm && printk_ratelimit()) {
-+			printk_deferred("process %d (%s) no longer affine to cpu%d\n",
-+					task_pid_nr(p), p->comm, cpu);
-+		}
-+	}
-+
-+	return dest_cpu;
-+}
-+
-+static inline int select_task_rq(struct task_struct *p)
-+{
-+	cpumask_t chk_mask, tmp;
-+
-+	if (unlikely(!cpumask_and(&chk_mask, p->cpus_ptr, cpu_active_mask)))
-+		return select_fallback_rq(task_cpu(p), p);
-+
-+	if (
-+#ifdef CONFIG_SCHED_SMT
-+	    cpumask_and(&tmp, &chk_mask, &sched_sg_idle_mask) ||
-+#endif
-+	    cpumask_and(&tmp, &chk_mask, sched_rq_watermark) ||
-+	    cpumask_and(&tmp, &chk_mask,
-+			sched_rq_watermark + SCHED_BITS - task_sched_prio(p)))
-+		return best_mask_cpu(task_cpu(p), &tmp);
-+
-+	return best_mask_cpu(task_cpu(p), &chk_mask);
-+}
-+
-+void sched_set_stop_task(int cpu, struct task_struct *stop)
-+{
-+	static struct lock_class_key stop_pi_lock;
-+	struct sched_param stop_param = { .sched_priority = STOP_PRIO };
-+	struct sched_param start_param = { .sched_priority = 0 };
-+	struct task_struct *old_stop = cpu_rq(cpu)->stop;
-+
-+	if (stop) {
-+		/*
-+		 * Make it appear like a SCHED_FIFO task, its something
-+		 * userspace knows about and won't get confused about.
-+		 *
-+		 * Also, it will make PI more or less work without too
-+		 * much confusion -- but then, stop work should not
-+		 * rely on PI working anyway.
-+		 */
-+		sched_setscheduler_nocheck(stop, SCHED_FIFO, &stop_param);
-+
-+		/*
-+		 * The PI code calls rt_mutex_setprio() with ->pi_lock held to
-+		 * adjust the effective priority of a task. As a result,
-+		 * rt_mutex_setprio() can trigger (RT) balancing operations,
-+		 * which can then trigger wakeups of the stop thread to push
-+		 * around the current task.
-+		 *
-+		 * The stop task itself will never be part of the PI-chain, it
-+		 * never blocks, therefore that ->pi_lock recursion is safe.
-+		 * Tell lockdep about this by placing the stop->pi_lock in its
-+		 * own class.
-+		 */
-+		lockdep_set_class(&stop->pi_lock, &stop_pi_lock);
-+	}
-+
-+	cpu_rq(cpu)->stop = stop;
-+
-+	if (old_stop) {
-+		/*
-+		 * Reset it back to a normal scheduling policy so that
-+		 * it can die in pieces.
-+		 */
-+		sched_setscheduler_nocheck(old_stop, SCHED_NORMAL, &start_param);
-+	}
-+}
-+
-+static int affine_move_task(struct rq *rq, struct task_struct *p, int dest_cpu,
-+			    raw_spinlock_t *lock, unsigned long irq_flags)
-+{
-+	/* Can the task run on the task's current CPU? If so, we're done */
-+	if (!cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
-+		if (p->migration_disabled) {
-+			if (likely(p->cpus_ptr != &p->cpus_mask))
-+				__do_set_cpus_ptr(p, &p->cpus_mask);
-+			p->migration_disabled = 0;
-+			p->migration_flags |= MDF_FORCE_ENABLED;
-+			/* When p is migrate_disabled, rq->lock should be held */
-+			rq->nr_pinned--;
-+		}
-+
-+		if (task_running(p) || READ_ONCE(p->__state) == TASK_WAKING) {
-+			struct migration_arg arg = { p, dest_cpu };
-+
-+			/* Need help from migration thread: drop lock and wait. */
-+			__task_access_unlock(p, lock);
-+			raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
-+			stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
-+			return 0;
-+		}
-+		if (task_on_rq_queued(p)) {
-+			/*
-+			 * OK, since we're going to drop the lock immediately
-+			 * afterwards anyway.
-+			 */
-+			update_rq_clock(rq);
-+			rq = move_queued_task(rq, p, dest_cpu);
-+			lock = &rq->lock;
-+		}
-+	}
-+	__task_access_unlock(p, lock);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
-+	return 0;
-+}
-+
-+static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
-+					 const struct cpumask *new_mask,
-+					 u32 flags,
-+					 struct rq *rq,
-+					 raw_spinlock_t *lock,
-+					 unsigned long irq_flags)
-+{
-+	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
-+	const struct cpumask *cpu_valid_mask = cpu_active_mask;
-+	bool kthread = p->flags & PF_KTHREAD;
-+	struct cpumask *user_mask = NULL;
-+	int dest_cpu;
-+	int ret = 0;
-+
-+	if (kthread || is_migration_disabled(p)) {
-+		/*
-+		 * Kernel threads are allowed on online && !active CPUs,
-+		 * however, during cpu-hot-unplug, even these might get pushed
-+		 * away if not KTHREAD_IS_PER_CPU.
-+		 *
-+		 * Specifically, migration_disabled() tasks must not fail the
-+		 * cpumask_any_and_distribute() pick below, esp. so on
-+		 * SCA_MIGRATE_ENABLE, otherwise we'll not call
-+		 * set_cpus_allowed_common() and actually reset p->cpus_ptr.
-+		 */
-+		cpu_valid_mask = cpu_online_mask;
-+	}
-+
-+	if (!kthread && !cpumask_subset(new_mask, cpu_allowed_mask)) {
-+		ret = -EINVAL;
-+		goto out;
-+	}
-+
-+	/*
-+	 * Must re-check here, to close a race against __kthread_bind(),
-+	 * sched_setaffinity() is not guaranteed to observe the flag.
-+	 */
-+	if ((flags & SCA_CHECK) && (p->flags & PF_NO_SETAFFINITY)) {
-+		ret = -EINVAL;
-+		goto out;
-+	}
-+
-+	if (cpumask_equal(&p->cpus_mask, new_mask))
-+		goto out;
-+
-+	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
-+	if (dest_cpu >= nr_cpu_ids) {
-+		ret = -EINVAL;
-+		goto out;
-+	}
-+
-+	__do_set_cpus_allowed(p, new_mask);
-+
-+	if (flags & SCA_USER)
-+		user_mask = clear_user_cpus_ptr(p);
-+
-+	ret = affine_move_task(rq, p, dest_cpu, lock, irq_flags);
-+
-+	kfree(user_mask);
-+
-+	return ret;
-+
-+out:
-+	__task_access_unlock(p, lock);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
-+
-+	return ret;
-+}
-+
-+/*
-+ * Change a given task's CPU affinity. Migrate the thread to a
-+ * proper CPU and schedule it away if the CPU it's executing on
-+ * is removed from the allowed bitmask.
-+ *
-+ * NOTE: the caller must have a valid reference to the task, the
-+ * task must not exit() & deallocate itself prematurely. The
-+ * call is not atomic; no spinlocks may be held.
-+ */
-+static int __set_cpus_allowed_ptr(struct task_struct *p,
-+				  const struct cpumask *new_mask, u32 flags)
-+{
-+	unsigned long irq_flags;
-+	struct rq *rq;
-+	raw_spinlock_t *lock;
-+
-+	raw_spin_lock_irqsave(&p->pi_lock, irq_flags);
-+	rq = __task_access_lock(p, &lock);
-+
-+	return __set_cpus_allowed_ptr_locked(p, new_mask, flags, rq, lock, irq_flags);
-+}
-+
-+int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
-+{
-+	return __set_cpus_allowed_ptr(p, new_mask, 0);
-+}
-+EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
-+
-+/*
-+ * Change a given task's CPU affinity to the intersection of its current
-+ * affinity mask and @subset_mask, writing the resulting mask to @new_mask
-+ * and pointing @p->user_cpus_ptr to a copy of the old mask.
-+ * If the resulting mask is empty, leave the affinity unchanged and return
-+ * -EINVAL.
-+ */
-+static int restrict_cpus_allowed_ptr(struct task_struct *p,
-+				     struct cpumask *new_mask,
-+				     const struct cpumask *subset_mask)
-+{
-+	struct cpumask *user_mask = NULL;
-+	unsigned long irq_flags;
-+	raw_spinlock_t *lock;
-+	struct rq *rq;
-+	int err;
-+
-+	if (!p->user_cpus_ptr) {
-+		user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
-+		if (!user_mask)
-+			return -ENOMEM;
-+	}
-+
-+	raw_spin_lock_irqsave(&p->pi_lock, irq_flags);
-+	rq = __task_access_lock(p, &lock);
-+
-+	if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
-+		err = -EINVAL;
-+		goto err_unlock;
-+	}
-+
-+	/*
-+	 * We're about to butcher the task affinity, so keep track of what
-+	 * the user asked for in case we're able to restore it later on.
-+	 */
-+	if (user_mask) {
-+		cpumask_copy(user_mask, p->cpus_ptr);
-+		p->user_cpus_ptr = user_mask;
-+	}
-+
-+	/*return __set_cpus_allowed_ptr_locked(p, new_mask, 0, rq, &rf);*/
-+	return __set_cpus_allowed_ptr_locked(p, new_mask, 0, rq, lock, irq_flags);
-+
-+err_unlock:
-+	__task_access_unlock(p, lock);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
-+	kfree(user_mask);
-+	return err;
-+}
-+
-+/*
-+ * Restrict the CPU affinity of task @p so that it is a subset of
-+ * task_cpu_possible_mask() and point @p->user_cpu_ptr to a copy of the
-+ * old affinity mask. If the resulting mask is empty, we warn and walk
-+ * up the cpuset hierarchy until we find a suitable mask.
-+ */
-+void force_compatible_cpus_allowed_ptr(struct task_struct *p)
-+{
-+	cpumask_var_t new_mask;
-+	const struct cpumask *override_mask = task_cpu_possible_mask(p);
-+
-+	alloc_cpumask_var(&new_mask, GFP_KERNEL);
-+
-+	/*
-+	 * __migrate_task() can fail silently in the face of concurrent
-+	 * offlining of the chosen destination CPU, so take the hotplug
-+	 * lock to ensure that the migration succeeds.
-+	 */
-+	cpus_read_lock();
-+	if (!cpumask_available(new_mask))
-+		goto out_set_mask;
-+
-+	if (!restrict_cpus_allowed_ptr(p, new_mask, override_mask))
-+		goto out_free_mask;
-+
-+	/*
-+	 * We failed to find a valid subset of the affinity mask for the
-+	 * task, so override it based on its cpuset hierarchy.
-+	 */
-+	cpuset_cpus_allowed(p, new_mask);
-+	override_mask = new_mask;
-+
-+out_set_mask:
-+	if (printk_ratelimit()) {
-+		printk_deferred("Overriding affinity for process %d (%s) to CPUs %*pbl\n",
-+				task_pid_nr(p), p->comm,
-+				cpumask_pr_args(override_mask));
-+	}
-+
-+	WARN_ON(set_cpus_allowed_ptr(p, override_mask));
-+out_free_mask:
-+	cpus_read_unlock();
-+	free_cpumask_var(new_mask);
-+}
-+
-+static int
-+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
-+
-+/*
-+ * Restore the affinity of a task @p which was previously restricted by a
-+ * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
-+ * @p->user_cpus_ptr.
-+ *
-+ * It is the caller's responsibility to serialise this with any calls to
-+ * force_compatible_cpus_allowed_ptr(@p).
-+ */
-+void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
-+{
-+	struct cpumask *user_mask = p->user_cpus_ptr;
-+	unsigned long flags;
-+
-+	/*
-+	 * Try to restore the old affinity mask. If this fails, then
-+	 * we free the mask explicitly to avoid it being inherited across
-+	 * a subsequent fork().
-+	 */
-+	if (!user_mask || !__sched_setaffinity(p, user_mask))
-+		return;
-+
-+	raw_spin_lock_irqsave(&p->pi_lock, flags);
-+	user_mask = clear_user_cpus_ptr(p);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+
-+	kfree(user_mask);
-+}
-+
-+#else /* CONFIG_SMP */
-+
-+static inline int select_task_rq(struct task_struct *p)
-+{
-+	return 0;
-+}
-+
-+static inline int
-+__set_cpus_allowed_ptr(struct task_struct *p,
-+		       const struct cpumask *new_mask, u32 flags)
-+{
-+	return set_cpus_allowed_ptr(p, new_mask);
-+}
-+
-+static inline bool rq_has_pinned_tasks(struct rq *rq)
-+{
-+	return false;
-+}
-+
-+#endif /* !CONFIG_SMP */
-+
-+static void
-+ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
-+{
-+	struct rq *rq;
-+
-+	if (!schedstat_enabled())
-+		return;
-+
-+	rq = this_rq();
-+
-+#ifdef CONFIG_SMP
-+	if (cpu == rq->cpu)
-+		__schedstat_inc(rq->ttwu_local);
-+	else {
-+		/** Alt schedule FW ToDo:
-+		 * How to do ttwu_wake_remote
-+		 */
-+	}
-+#endif /* CONFIG_SMP */
-+
-+	__schedstat_inc(rq->ttwu_count);
-+}
-+
-+/*
-+ * Mark the task runnable and perform wakeup-preemption.
-+ */
-+static inline void
-+ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
-+{
-+	check_preempt_curr(rq);
-+	WRITE_ONCE(p->__state, TASK_RUNNING);
-+	trace_sched_wakeup(p);
-+}
-+
-+static inline void
-+ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags)
-+{
-+	if (p->sched_contributes_to_load)
-+		rq->nr_uninterruptible--;
-+
-+	if (
-+#ifdef CONFIG_SMP
-+	    !(wake_flags & WF_MIGRATED) &&
-+#endif
-+	    p->in_iowait) {
-+		delayacct_blkio_end(p);
-+		atomic_dec(&task_rq(p)->nr_iowait);
-+	}
-+
-+	activate_task(p, rq);
-+	ttwu_do_wakeup(rq, p, 0);
-+}
-+
-+/*
-+ * Consider @p being inside a wait loop:
-+ *
-+ *   for (;;) {
-+ *      set_current_state(TASK_UNINTERRUPTIBLE);
-+ *
-+ *      if (CONDITION)
-+ *         break;
-+ *
-+ *      schedule();
-+ *   }
-+ *   __set_current_state(TASK_RUNNING);
-+ *
-+ * between set_current_state() and schedule(). In this case @p is still
-+ * runnable, so all that needs doing is change p->state back to TASK_RUNNING in
-+ * an atomic manner.
-+ *
-+ * By taking task_rq(p)->lock we serialize against schedule(), if @p->on_rq
-+ * then schedule() must still happen and p->state can be changed to
-+ * TASK_RUNNING. Otherwise we lost the race, schedule() has happened, and we
-+ * need to do a full wakeup with enqueue.
-+ *
-+ * Returns: %true when the wakeup is done,
-+ *          %false otherwise.
-+ */
-+static int ttwu_runnable(struct task_struct *p, int wake_flags)
-+{
-+	struct rq *rq;
-+	raw_spinlock_t *lock;
-+	int ret = 0;
-+
-+	rq = __task_access_lock(p, &lock);
-+	if (task_on_rq_queued(p)) {
-+		/* check_preempt_curr() may use rq clock */
-+		update_rq_clock(rq);
-+		ttwu_do_wakeup(rq, p, wake_flags);
-+		ret = 1;
-+	}
-+	__task_access_unlock(p, lock);
-+
-+	return ret;
-+}
-+
-+#ifdef CONFIG_SMP
-+void sched_ttwu_pending(void *arg)
-+{
-+	struct llist_node *llist = arg;
-+	struct rq *rq = this_rq();
-+	struct task_struct *p, *t;
-+	struct rq_flags rf;
-+
-+	if (!llist)
-+		return;
-+
-+	/*
-+	 * rq::ttwu_pending racy indication of out-standing wakeups.
-+	 * Races such that false-negatives are possible, since they
-+	 * are shorter lived that false-positives would be.
-+	 */
-+	WRITE_ONCE(rq->ttwu_pending, 0);
-+
-+	rq_lock_irqsave(rq, &rf);
-+	update_rq_clock(rq);
-+
-+	llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
-+		if (WARN_ON_ONCE(p->on_cpu))
-+			smp_cond_load_acquire(&p->on_cpu, !VAL);
-+
-+		if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
-+			set_task_cpu(p, cpu_of(rq));
-+
-+		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0);
-+	}
-+
-+	rq_unlock_irqrestore(rq, &rf);
-+}
-+
-+void send_call_function_single_ipi(int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+
-+	if (!set_nr_if_polling(rq->idle))
-+		arch_send_call_function_single_ipi(cpu);
-+	else
-+		trace_sched_wake_idle_without_ipi(cpu);
-+}
-+
-+/*
-+ * Queue a task on the target CPUs wake_list and wake the CPU via IPI if
-+ * necessary. The wakee CPU on receipt of the IPI will queue the task
-+ * via sched_ttwu_wakeup() for activation so the wakee incurs the cost
-+ * of the wakeup instead of the waker.
-+ */
-+static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+
-+	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
-+
-+	WRITE_ONCE(rq->ttwu_pending, 1);
-+	__smp_call_single_queue(cpu, &p->wake_entry.llist);
-+}
-+
-+static inline bool ttwu_queue_cond(int cpu, int wake_flags)
-+{
-+	/*
-+	 * Do not complicate things with the async wake_list while the CPU is
-+	 * in hotplug state.
-+	 */
-+	if (!cpu_active(cpu))
-+		return false;
-+
-+	/*
-+	 * If the CPU does not share cache, then queue the task on the
-+	 * remote rqs wakelist to avoid accessing remote data.
-+	 */
-+	if (!cpus_share_cache(smp_processor_id(), cpu))
-+		return true;
-+
-+	/*
-+	 * If the task is descheduling and the only running task on the
-+	 * CPU then use the wakelist to offload the task activation to
-+	 * the soon-to-be-idle CPU as the current CPU is likely busy.
-+	 * nr_running is checked to avoid unnecessary task stacking.
-+	 */
-+	if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
-+		return true;
-+
-+	return false;
-+}
-+
-+static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
-+{
-+	if (__is_defined(ALT_SCHED_TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
-+		if (WARN_ON_ONCE(cpu == smp_processor_id()))
-+			return false;
-+
-+		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
-+		__ttwu_queue_wakelist(p, cpu, wake_flags);
-+		return true;
-+	}
-+
-+	return false;
-+}
-+
-+void wake_up_if_idle(int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+	unsigned long flags;
-+
-+	rcu_read_lock();
-+
-+	if (!is_idle_task(rcu_dereference(rq->curr)))
-+		goto out;
-+
-+	if (set_nr_if_polling(rq->idle)) {
-+		trace_sched_wake_idle_without_ipi(cpu);
-+	} else {
-+		raw_spin_lock_irqsave(&rq->lock, flags);
-+		if (is_idle_task(rq->curr))
-+			smp_send_reschedule(cpu);
-+		/* Else CPU is not idle, do nothing here */
-+		raw_spin_unlock_irqrestore(&rq->lock, flags);
-+	}
-+
-+out:
-+	rcu_read_unlock();
-+}
-+
-+bool cpus_share_cache(int this_cpu, int that_cpu)
-+{
-+	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
-+}
-+#else /* !CONFIG_SMP */
-+
-+static inline bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
-+{
-+	return false;
-+}
-+
-+#endif /* CONFIG_SMP */
-+
-+static inline void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+
-+	if (ttwu_queue_wakelist(p, cpu, wake_flags))
-+		return;
-+
-+	raw_spin_lock(&rq->lock);
-+	update_rq_clock(rq);
-+	ttwu_do_activate(rq, p, wake_flags);
-+	raw_spin_unlock(&rq->lock);
-+}
-+
-+/*
-+ * Invoked from try_to_wake_up() to check whether the task can be woken up.
-+ *
-+ * The caller holds p::pi_lock if p != current or has preemption
-+ * disabled when p == current.
-+ *
-+ * The rules of PREEMPT_RT saved_state:
-+ *
-+ *   The related locking code always holds p::pi_lock when updating
-+ *   p::saved_state, which means the code is fully serialized in both cases.
-+ *
-+ *   The lock wait and lock wakeups happen via TASK_RTLOCK_WAIT. No other
-+ *   bits set. This allows to distinguish all wakeup scenarios.
-+ */
-+static __always_inline
-+bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
-+{
-+	if (IS_ENABLED(CONFIG_DEBUG_PREEMPT)) {
-+		WARN_ON_ONCE((state & TASK_RTLOCK_WAIT) &&
-+			     state != TASK_RTLOCK_WAIT);
-+	}
-+
-+	if (READ_ONCE(p->__state) & state) {
-+		*success = 1;
-+		return true;
-+	}
-+
-+#ifdef CONFIG_PREEMPT_RT
-+	/*
-+	 * Saved state preserves the task state across blocking on
-+	 * an RT lock.  If the state matches, set p::saved_state to
-+	 * TASK_RUNNING, but do not wake the task because it waits
-+	 * for a lock wakeup. Also indicate success because from
-+	 * the regular waker's point of view this has succeeded.
-+	 *
-+	 * After acquiring the lock the task will restore p::__state
-+	 * from p::saved_state which ensures that the regular
-+	 * wakeup is not lost. The restore will also set
-+	 * p::saved_state to TASK_RUNNING so any further tests will
-+	 * not result in false positives vs. @success
-+	 */
-+	if (p->saved_state & state) {
-+		p->saved_state = TASK_RUNNING;
-+		*success = 1;
-+	}
-+#endif
-+	return false;
-+}
-+
-+/*
-+ * Notes on Program-Order guarantees on SMP systems.
-+ *
-+ *  MIGRATION
-+ *
-+ * The basic program-order guarantee on SMP systems is that when a task [t]
-+ * migrates, all its activity on its old CPU [c0] happens-before any subsequent
-+ * execution on its new CPU [c1].
-+ *
-+ * For migration (of runnable tasks) this is provided by the following means:
-+ *
-+ *  A) UNLOCK of the rq(c0)->lock scheduling out task t
-+ *  B) migration for t is required to synchronize *both* rq(c0)->lock and
-+ *     rq(c1)->lock (if not at the same time, then in that order).
-+ *  C) LOCK of the rq(c1)->lock scheduling in task
-+ *
-+ * Transitivity guarantees that B happens after A and C after B.
-+ * Note: we only require RCpc transitivity.
-+ * Note: the CPU doing B need not be c0 or c1
-+ *
-+ * Example:
-+ *
-+ *   CPU0            CPU1            CPU2
-+ *
-+ *   LOCK rq(0)->lock
-+ *   sched-out X
-+ *   sched-in Y
-+ *   UNLOCK rq(0)->lock
-+ *
-+ *                                   LOCK rq(0)->lock // orders against CPU0
-+ *                                   dequeue X
-+ *                                   UNLOCK rq(0)->lock
-+ *
-+ *                                   LOCK rq(1)->lock
-+ *                                   enqueue X
-+ *                                   UNLOCK rq(1)->lock
-+ *
-+ *                   LOCK rq(1)->lock // orders against CPU2
-+ *                   sched-out Z
-+ *                   sched-in X
-+ *                   UNLOCK rq(1)->lock
-+ *
-+ *
-+ *  BLOCKING -- aka. SLEEP + WAKEUP
-+ *
-+ * For blocking we (obviously) need to provide the same guarantee as for
-+ * migration. However the means are completely different as there is no lock
-+ * chain to provide order. Instead we do:
-+ *
-+ *   1) smp_store_release(X->on_cpu, 0)   -- finish_task()
-+ *   2) smp_cond_load_acquire(!X->on_cpu) -- try_to_wake_up()
-+ *
-+ * Example:
-+ *
-+ *   CPU0 (schedule)  CPU1 (try_to_wake_up) CPU2 (schedule)
-+ *
-+ *   LOCK rq(0)->lock LOCK X->pi_lock
-+ *   dequeue X
-+ *   sched-out X
-+ *   smp_store_release(X->on_cpu, 0);
-+ *
-+ *                    smp_cond_load_acquire(&X->on_cpu, !VAL);
-+ *                    X->state = WAKING
-+ *                    set_task_cpu(X,2)
-+ *
-+ *                    LOCK rq(2)->lock
-+ *                    enqueue X
-+ *                    X->state = RUNNING
-+ *                    UNLOCK rq(2)->lock
-+ *
-+ *                                          LOCK rq(2)->lock // orders against CPU1
-+ *                                          sched-out Z
-+ *                                          sched-in X
-+ *                                          UNLOCK rq(2)->lock
-+ *
-+ *                    UNLOCK X->pi_lock
-+ *   UNLOCK rq(0)->lock
-+ *
-+ *
-+ * However; for wakeups there is a second guarantee we must provide, namely we
-+ * must observe the state that lead to our wakeup. That is, not only must our
-+ * task observe its own prior state, it must also observe the stores prior to
-+ * its wakeup.
-+ *
-+ * This means that any means of doing remote wakeups must order the CPU doing
-+ * the wakeup against the CPU the task is going to end up running on. This,
-+ * however, is already required for the regular Program-Order guarantee above,
-+ * since the waking CPU is the one issueing the ACQUIRE (smp_cond_load_acquire).
-+ *
-+ */
-+
-+/**
-+ * try_to_wake_up - wake up a thread
-+ * @p: the thread to be awakened
-+ * @state: the mask of task states that can be woken
-+ * @wake_flags: wake modifier flags (WF_*)
-+ *
-+ * Conceptually does:
-+ *
-+ *   If (@state & @p->state) @p->state = TASK_RUNNING.
-+ *
-+ * If the task was not queued/runnable, also place it back on a runqueue.
-+ *
-+ * This function is atomic against schedule() which would dequeue the task.
-+ *
-+ * It issues a full memory barrier before accessing @p->state, see the comment
-+ * with set_current_state().
-+ *
-+ * Uses p->pi_lock to serialize against concurrent wake-ups.
-+ *
-+ * Relies on p->pi_lock stabilizing:
-+ *  - p->sched_class
-+ *  - p->cpus_ptr
-+ *  - p->sched_task_group
-+ * in order to do migration, see its use of select_task_rq()/set_task_cpu().
-+ *
-+ * Tries really hard to only take one task_rq(p)->lock for performance.
-+ * Takes rq->lock in:
-+ *  - ttwu_runnable()    -- old rq, unavoidable, see comment there;
-+ *  - ttwu_queue()       -- new rq, for enqueue of the task;
-+ *  - psi_ttwu_dequeue() -- much sadness :-( accounting will kill us.
-+ *
-+ * As a consequence we race really badly with just about everything. See the
-+ * many memory barriers and their comments for details.
-+ *
-+ * Return: %true if @p->state changes (an actual wakeup was done),
-+ *	   %false otherwise.
-+ */
-+static int try_to_wake_up(struct task_struct *p, unsigned int state,
-+			  int wake_flags)
-+{
-+	unsigned long flags;
-+	int cpu, success = 0;
-+
-+	preempt_disable();
-+	if (p == current) {
-+		/*
-+		 * We're waking current, this means 'p->on_rq' and 'task_cpu(p)
-+		 * == smp_processor_id()'. Together this means we can special
-+		 * case the whole 'p->on_rq && ttwu_runnable()' case below
-+		 * without taking any locks.
-+		 *
-+		 * In particular:
-+		 *  - we rely on Program-Order guarantees for all the ordering,
-+		 *  - we're serialized against set_special_state() by virtue of
-+		 *    it disabling IRQs (this allows not taking ->pi_lock).
-+		 */
-+		if (!ttwu_state_match(p, state, &success))
-+			goto out;
-+
-+		trace_sched_waking(p);
-+		WRITE_ONCE(p->__state, TASK_RUNNING);
-+		trace_sched_wakeup(p);
-+		goto out;
-+	}
-+
-+	/*
-+	 * If we are going to wake up a thread waiting for CONDITION we
-+	 * need to ensure that CONDITION=1 done by the caller can not be
-+	 * reordered with p->state check below. This pairs with smp_store_mb()
-+	 * in set_current_state() that the waiting thread does.
-+	 */
-+	raw_spin_lock_irqsave(&p->pi_lock, flags);
-+	smp_mb__after_spinlock();
-+	if (!ttwu_state_match(p, state, &success))
-+		goto unlock;
-+
-+	trace_sched_waking(p);
-+
-+	/*
-+	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
-+	 * be possible to, falsely, observe p->on_rq == 0 and get stuck
-+	 * in smp_cond_load_acquire() below.
-+	 *
-+	 * sched_ttwu_pending()			try_to_wake_up()
-+	 *   STORE p->on_rq = 1			  LOAD p->state
-+	 *   UNLOCK rq->lock
-+	 *
-+	 * __schedule() (switch to task 'p')
-+	 *   LOCK rq->lock			  smp_rmb();
-+	 *   smp_mb__after_spinlock();
-+	 *   UNLOCK rq->lock
-+	 *
-+	 * [task p]
-+	 *   STORE p->state = UNINTERRUPTIBLE	  LOAD p->on_rq
-+	 *
-+	 * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
-+	 * __schedule().  See the comment for smp_mb__after_spinlock().
-+	 *
-+	 * A similar smb_rmb() lives in try_invoke_on_locked_down_task().
-+	 */
-+	smp_rmb();
-+	if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
-+		goto unlock;
-+
-+#ifdef CONFIG_SMP
-+	/*
-+	 * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
-+	 * possible to, falsely, observe p->on_cpu == 0.
-+	 *
-+	 * One must be running (->on_cpu == 1) in order to remove oneself
-+	 * from the runqueue.
-+	 *
-+	 * __schedule() (switch to task 'p')	try_to_wake_up()
-+	 *   STORE p->on_cpu = 1		  LOAD p->on_rq
-+	 *   UNLOCK rq->lock
-+	 *
-+	 * __schedule() (put 'p' to sleep)
-+	 *   LOCK rq->lock			  smp_rmb();
-+	 *   smp_mb__after_spinlock();
-+	 *   STORE p->on_rq = 0			  LOAD p->on_cpu
-+	 *
-+	 * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
-+	 * __schedule().  See the comment for smp_mb__after_spinlock().
-+	 *
-+	 * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
-+	 * schedule()'s deactivate_task() has 'happened' and p will no longer
-+	 * care about it's own p->state. See the comment in __schedule().
-+	 */
-+	smp_acquire__after_ctrl_dep();
-+
-+	/*
-+	 * We're doing the wakeup (@success == 1), they did a dequeue (p->on_rq
-+	 * == 0), which means we need to do an enqueue, change p->state to
-+	 * TASK_WAKING such that we can unlock p->pi_lock before doing the
-+	 * enqueue, such as ttwu_queue_wakelist().
-+	 */
-+	WRITE_ONCE(p->__state, TASK_WAKING);
-+
-+	/*
-+	 * If the owning (remote) CPU is still in the middle of schedule() with
-+	 * this task as prev, considering queueing p on the remote CPUs wake_list
-+	 * which potentially sends an IPI instead of spinning on p->on_cpu to
-+	 * let the waker make forward progress. This is safe because IRQs are
-+	 * disabled and the IPI will deliver after on_cpu is cleared.
-+	 *
-+	 * Ensure we load task_cpu(p) after p->on_cpu:
-+	 *
-+	 * set_task_cpu(p, cpu);
-+	 *   STORE p->cpu = @cpu
-+	 * __schedule() (switch to task 'p')
-+	 *   LOCK rq->lock
-+	 *   smp_mb__after_spin_lock()          smp_cond_load_acquire(&p->on_cpu)
-+	 *   STORE p->on_cpu = 1                LOAD p->cpu
-+	 *
-+	 * to ensure we observe the correct CPU on which the task is currently
-+	 * scheduling.
-+	 */
-+	if (smp_load_acquire(&p->on_cpu) &&
-+	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
-+		goto unlock;
-+
-+	/*

3489

-+	 * If the owning (remote) CPU is still in the middle of schedule() with

3490

-+	 * this task as prev, wait until it's done referencing the task.

3491

-+	 *

3492

-+	 * Pairs with the smp_store_release() in finish_task().

3493

-+	 *

3494

-+	 * This ensures that tasks getting woken will be fully ordered against

3495

-+	 * their previous state and preserve Program Order.

3496

-+	 */

3497

-+	smp_cond_load_acquire(&p->on_cpu, !VAL);

3498

-+

3499

-+	sched_task_ttwu(p);

3500

-+

3501

-+	cpu = select_task_rq(p);

3502

-+

3503

-+	if (cpu != task_cpu(p)) {

3504

-+		if (p->in_iowait) {

3505

-+			delayacct_blkio_end(p);

3506

-+			atomic_dec(&task_rq(p)->nr_iowait);

3507

-+		}

3508

-+

3509

-+		wake_flags |= WF_MIGRATED;

3510

-+		psi_ttwu_dequeue(p);

3511

-+		set_task_cpu(p, cpu);

3512

-+	}

3513

-+#else

3514

-+	cpu = task_cpu(p);

3515

-+#endif /* CONFIG_SMP */

3516

-+

3517

-+	ttwu_queue(p, cpu, wake_flags);

3518

-+unlock:

3519

-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);

3520

-+out:

3521

-+	if (success)

3522

-+		ttwu_stat(p, task_cpu(p), wake_flags);

3523

-+	preempt_enable();

3524

-+

3525

-+	return success;

3526

-+}

3527

-+

3528

-+/**

3529

-+ * try_invoke_on_locked_down_task - Invoke a function on task in fixed state

3530

-+ * @p: Process for which the function is to be invoked, can be @current.

3531

-+ * @func: Function to invoke.

3532

-+ * @arg: Argument to function.

3533

-+ *

3534

-+ * If the specified task can be quickly locked into a definite state

3535

-+ * (either sleeping or on a given runqueue), arrange to keep it in that

3536

-+ * state while invoking @func(@arg).  This function can use ->on_rq and

3537

-+ * task_curr() to work out what the state is, if required.  Given that

3538

-+ * @func can be invoked with a runqueue lock held, it had better be quite

3539

-+ * lightweight.

3540

-+ *

3541

-+ * Returns:

3542

-+ *	@false if the task slipped out from under the locks.

3543

-+ *	@true if the task was locked onto a runqueue or is sleeping.

3544

-+ *		However, @func can override this by returning @false.

3545

-+ */

3546

-+bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)

3547

-+{

3548

-+	struct rq_flags rf;

3549

-+	bool ret = false;

3550

-+	struct rq *rq;

3551

-+

3552

-+	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);

3553

-+	if (p->on_rq) {

3554

-+		rq = __task_rq_lock(p, &rf);

3555

-+		if (task_rq(p) == rq)

3556

-+			ret = func(p, arg);

3557

-+		__task_rq_unlock(rq, &rf);

3558

-+	} else {

3559

-+		switch (READ_ONCE(p->__state)) {

3560

-+		case TASK_RUNNING:

3561

-+		case TASK_WAKING:

3562

-+			break;

3563

-+		default:

3564

-+			smp_rmb(); // See smp_rmb() comment in try_to_wake_up().

3565

-+			if (!p->on_rq)

3566

-+				ret = func(p, arg);

3567

-+		}

3568

-+	}

3569

-+	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);

3570

-+	return ret;

3571

-+}

3572

-+

3573

-+/**

3574

-+ * wake_up_process - Wake up a specific process

3575

-+ * @p: The process to be woken up.

3576

-+ *

3577

-+ * Attempt to wake up the nominated process and move it to the set of runnable

3578

-+ * processes.

3579

-+ *

3580

-+ * Return: 1 if the process was woken up, 0 if it was already running.

3581

-+ *

3582

-+ * This function executes a full memory barrier before accessing the task state.

3583

-+ */

3584

-+int wake_up_process(struct task_struct *p)

3585

-+{

3586

-+	return try_to_wake_up(p, TASK_NORMAL, 0);

3587

-+}

3588

-+EXPORT_SYMBOL(wake_up_process);

3589

-+

3590

-+int wake_up_state(struct task_struct *p, unsigned int state)

3591

-+{

3592

-+	return try_to_wake_up(p, state, 0);

3593

-+}

3594

-+

3595

-+/*

3596

-+ * Perform scheduler related setup for a newly forked process p.

3597

-+ * p is forked by current.

3598

-+ *

3599

-+ * __sched_fork() is basic setup used by init_idle() too:

3600

-+ */

3601

-+static inline void __sched_fork(unsigned long clone_flags, struct task_struct *p)

3602

-+{

3603

-+	p->on_rq			= 0;

3604

-+	p->on_cpu			= 0;

3605

-+	p->utime			= 0;

3606

-+	p->stime			= 0;

3607

-+	p->sched_time			= 0;

3608

-+

3609

-+#ifdef CONFIG_PREEMPT_NOTIFIERS

3610

-+	INIT_HLIST_HEAD(&p->preempt_notifiers);

3611

-+#endif

3612

-+

3613

-+#ifdef CONFIG_COMPACTION

3614

-+	p->capture_control = NULL;

3615

-+#endif

3616

-+#ifdef CONFIG_SMP

3617

-+	p->wake_entry.u_flags = CSD_TYPE_TTWU;

3618

-+#endif

3619

-+}

3620

-+

3621

-+/*

3622

-+ * fork()/clone()-time setup:

3623

-+ */

3624

-+int sched_fork(unsigned long clone_flags, struct task_struct *p)

3625

-+{

3626

-+	__sched_fork(clone_flags, p);

3627

-+	/*

3628

-+	 * We mark the process as NEW here. This guarantees that

3629

-+	 * nobody will actually run it, and a signal or other external

3630

-+	 * event cannot wake it up and insert it on the runqueue either.

3631

-+	 */

3632

-+	p->__state = TASK_NEW;

3633

-+

3634

-+	/*

3635

-+	 * Make sure we do not leak PI boosting priority to the child.

3636

-+	 */

3637

-+	p->prio = current->normal_prio;

3638

-+

3639

-+	/*

3640

-+	 * Revert to default priority/policy on fork if requested.

3641

-+	 */

3642

-+	if (unlikely(p->sched_reset_on_fork)) {

3643

-+		if (task_has_rt_policy(p)) {

3644

-+			p->policy = SCHED_NORMAL;

3645

-+			p->static_prio = NICE_TO_PRIO(0);

3646

-+			p->rt_priority = 0;

3647

-+		} else if (PRIO_TO_NICE(p->static_prio) < 0)

3648

-+			p->static_prio = NICE_TO_PRIO(0);

3649

-+

3650

-+		p->prio = p->normal_prio = p->static_prio;

3651

-+

3652

-+		/*

3653

-+		 * We don't need the reset flag anymore after the fork. It has

3654

-+		 * fulfilled its duty:

3655

-+		 */

3656

-+		p->sched_reset_on_fork = 0;

3657

-+	}

3658

-+

3659

-+#ifdef CONFIG_SCHED_INFO

3660

-+	if (unlikely(sched_info_on()))

3661

-+		memset(&p->sched_info, 0, sizeof(p->sched_info));

3662

-+#endif

3663

-+	init_task_preempt_count(p);

3664

-+

3665

-+	return 0;

3666

-+}

3667

-+

3668

-+void sched_post_fork(struct task_struct *p, struct kernel_clone_args *kargs)

3669

-+{

3670

-+	unsigned long flags;

3671

-+	struct rq *rq;

3672

-+

3673

-+	/*

3674

-+	 * The child is not yet in the pid-hash so no cgroup attach races,

3675

-+	 * and the cgroup is pinned to this child due to cgroup_fork()

3676

-+	 * being run before sched_fork().

3677

-+	 *

3678

-+	 * Silence PROVE_RCU.

3679

-+	 */

3680

-+	raw_spin_lock_irqsave(&p->pi_lock, flags);

3681

-+	/*

3682

-+	 * Share the timeslice between parent and child, thus the

3683

-+	 * total amount of pending timeslices in the system doesn't change,

3684

-+	 * resulting in more scheduling fairness.

3685

-+	 */

3686

-+	rq = this_rq();

3687

-+	raw_spin_lock(&rq->lock);

3688

-+

3689

-+	rq->curr->time_slice /= 2;

3690

-+	p->time_slice = rq->curr->time_slice;

3691

-+#ifdef CONFIG_SCHED_HRTICK

3692

-+	hrtick_start(rq, rq->curr->time_slice);

3693

-+#endif

3694

-+

3695

-+	if (p->time_slice < RESCHED_NS) {

3696

-+		p->time_slice = sched_timeslice_ns;

3697

-+		resched_curr(rq);

3698

-+	}

3699

-+	sched_task_fork(p, rq);

3700

-+	raw_spin_unlock(&rq->lock);

3701

-+

3702

-+	rseq_migrate(p);

3703

-+	/*

3704

-+	 * We're setting the CPU for the first time, we don't migrate,

3705

-+	 * so use __set_task_cpu().

3706

-+	 */

3707

-+	__set_task_cpu(p, smp_processor_id());

3708

-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);

3709

-+}

3710

-+

3711

-+#ifdef CONFIG_SCHEDSTATS

3712

-+

3713

-+DEFINE_STATIC_KEY_FALSE(sched_schedstats);

3714

-+

3715

-+static void set_schedstats(bool enabled)

3716

-+{

3717

-+	if (enabled)

3718

-+		static_branch_enable(&sched_schedstats);

3719

-+	else

3720

-+		static_branch_disable(&sched_schedstats);

3721

-+}

3722

-+

3723

-+void force_schedstat_enabled(void)

3724

-+{

3725

-+	if (!schedstat_enabled()) {

3726

-+		pr_info("kernel profiling enabled schedstats, disable via kernel.sched_schedstats.\n");

3727

-+		static_branch_enable(&sched_schedstats);

3728

-+	}

3729

-+}

3730

-+

3731

-+static int __init setup_schedstats(char *str)

3732

-+{

3733

-+	int ret = 0;

3734

-+	if (!str)

3735

-+		goto out;

3736

-+

3737

-+	if (!strcmp(str, "enable")) {

3738

-+		set_schedstats(true);

3739

-+		ret = 1;

3740

-+	} else if (!strcmp(str, "disable")) {

3741

-+		set_schedstats(false);

3742

-+		ret = 1;

3743

-+	}

3744

-+out:

3745

-+	if (!ret)

3746

-+		pr_warn("Unable to parse schedstats=\n");

3747

-+

3748

-+	return ret;

3749

-+}

3750

-+__setup("schedstats=", setup_schedstats);

3751

-+

3752

-+#ifdef CONFIG_PROC_SYSCTL

3753

-+int sysctl_schedstats(struct ctl_table *table, int write,

3754

-+			 void __user *buffer, size_t *lenp, loff_t *ppos)

3755

-+{

3756

-+	struct ctl_table t;

3757

-+	int err;

3758

-+	int state = static_branch_likely(&sched_schedstats);

3759

-+

3760

-+	if (write && !capable(CAP_SYS_ADMIN))

3761

-+		return -EPERM;

3762

-+

3763

-+	t = *table;

3764

-+	t.data = &state;

3765

-+	err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);

3766

-+	if (err < 0)

3767

-+		return err;

3768

-+	if (write)

3769

-+		set_schedstats(state);

3770

-+	return err;

3771

-+}

3772

-+#endif /* CONFIG_PROC_SYSCTL */

3773

-+#endif /* CONFIG_SCHEDSTATS */

3774

-+

3775

-+/*

3776

-+ * wake_up_new_task - wake up a newly created task for the first time.

3777

-+ *

3778

-+ * This function will do some initial scheduler statistics housekeeping

3779

-+ * that must be done for every newly created context, then puts the task

3780

-+ * on the runqueue and wakes it.

3781

-+ */

3782

-+void wake_up_new_task(struct task_struct *p)

3783

-+{

3784

-+	unsigned long flags;

3785

-+	struct rq *rq;

3786

-+

3787

-+	raw_spin_lock_irqsave(&p->pi_lock, flags);

3788

-+	WRITE_ONCE(p->__state, TASK_RUNNING);

3789

-+	rq = cpu_rq(select_task_rq(p));

3790

-+#ifdef CONFIG_SMP

3791

-+	rseq_migrate(p);

3792

-+	/*

3793

-+	 * Fork balancing, do it here and not earlier because:

3794

-+	 * - cpus_ptr can change in the fork path

3795

-+	 * - any previously selected CPU might disappear through hotplug

3796

-+	 *

3797

-+	 * Use __set_task_cpu() to avoid calling sched_class::migrate_task_rq,

3798

-+	 * as we're not fully set-up yet.

3799

-+	 */

3800

-+	__set_task_cpu(p, cpu_of(rq));

3801

-+#endif

3802

-+

3803

-+	raw_spin_lock(&rq->lock);

3804

-+	update_rq_clock(rq);

3805

-+

3806

-+	activate_task(p, rq);

3807

-+	trace_sched_wakeup_new(p);

3808

-+	check_preempt_curr(rq);

3809

-+

3810

-+	raw_spin_unlock(&rq->lock);

3811

-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);

3812

-+}

3813

-+

3814

-+#ifdef CONFIG_PREEMPT_NOTIFIERS

3815

-+

3816

-+static DEFINE_STATIC_KEY_FALSE(preempt_notifier_key);

3817

-+

3818

-+void preempt_notifier_inc(void)

3819

-+{

3820

-+	static_branch_inc(&preempt_notifier_key);

3821

-+}

3822

-+EXPORT_SYMBOL_GPL(preempt_notifier_inc);

3823

-+

3824

-+void preempt_notifier_dec(void)

3825

-+{

3826

-+	static_branch_dec(&preempt_notifier_key);

3827

-+}

3828

-+EXPORT_SYMBOL_GPL(preempt_notifier_dec);

3829

-+

3830

-+/**

3831

-+ * preempt_notifier_register - tell me when current is being preempted & rescheduled

3832

-+ * @notifier: notifier struct to register

3833

-+ */

3834

-+void preempt_notifier_register(struct preempt_notifier *notifier)

3835

-+{

3836

-+	if (!static_branch_unlikely(&preempt_notifier_key))

3837

-+		WARN(1, "registering preempt_notifier while notifiers disabled\n");

3838

-+

3839

-+	hlist_add_head(&notifier->link, &current->preempt_notifiers);

3840

-+}

3841

-+EXPORT_SYMBOL_GPL(preempt_notifier_register);

3842

-+

3843

-+/**

3844

-+ * preempt_notifier_unregister - no longer interested in preemption notifications

3845

-+ * @notifier: notifier struct to unregister

3846

-+ *

3847

-+ * This is *not* safe to call from within a preemption notifier.

3848

-+ */

3849

-+void preempt_notifier_unregister(struct preempt_notifier *notifier)

3850

-+{

3851

-+	hlist_del(&notifier->link);

3852

-+}

3853

-+EXPORT_SYMBOL_GPL(preempt_notifier_unregister);

3854

-+

3855

-+static void __fire_sched_in_preempt_notifiers(struct task_struct *curr)

3856

-+{

3857

-+	struct preempt_notifier *notifier;

3858

-+

3859

-+	hlist_for_each_entry(notifier, &curr->preempt_notifiers, link)

3860

-+		notifier->ops->sched_in(notifier, raw_smp_processor_id());

3861

-+}

3862

-+

3863

-+static __always_inline void fire_sched_in_preempt_notifiers(struct task_struct *curr)

3864

-+{

3865

-+	if (static_branch_unlikely(&preempt_notifier_key))

3866

-+		__fire_sched_in_preempt_notifiers(curr);

3867

-+}

3868

-+

3869

-+static void

3870

-+__fire_sched_out_preempt_notifiers(struct task_struct *curr,

3871

-+				   struct task_struct *next)

3872

-+{

3873

-+	struct preempt_notifier *notifier;

3874

-+

3875

-+	hlist_for_each_entry(notifier, &curr->preempt_notifiers, link)

3876

-+		notifier->ops->sched_out(notifier, next);

3877

-+}

3878

-+

3879

-+static __always_inline void

3880

-+fire_sched_out_preempt_notifiers(struct task_struct *curr,

3881

-+				 struct task_struct *next)

3882

-+{

3883

-+	if (static_branch_unlikely(&preempt_notifier_key))

3884

-+		__fire_sched_out_preempt_notifiers(curr, next);

3885

-+}

3886

-+

3887

-+#else /* !CONFIG_PREEMPT_NOTIFIERS */

3888

-+

3889

-+static inline void fire_sched_in_preempt_notifiers(struct task_struct *curr)

3890

-+{

3891

-+}

3892

-+

3893

-+static inline void

3894

-+fire_sched_out_preempt_notifiers(struct task_struct *curr,

3895

-+				 struct task_struct *next)

3896

-+{

3897

-+}

3898

-+

3899

-+#endif /* CONFIG_PREEMPT_NOTIFIERS */

3900

-+

3901

-+static inline void prepare_task(struct task_struct *next)

3902

-+{

3903

-+	/*

3904

-+	 * Claim the task as running, we do this before switching to it

3905

-+	 * such that any running task will have this set.

3906

-+	 *

3907

-+	 * See the ttwu() WF_ON_CPU case and its ordering comment.

3908

-+	 */

3909

-+	WRITE_ONCE(next->on_cpu, 1);

3910

-+}

3911

-+

3912

-+static inline void finish_task(struct task_struct *prev)

3913

-+{

3914

-+#ifdef CONFIG_SMP

3915

-+	/*

3916

-+	 * This must be the very last reference to @prev from this CPU. After

3917

-+	 * p->on_cpu is cleared, the task can be moved to a different CPU. We

3918

-+	 * must ensure this doesn't happen until the switch is completely

3919

-+	 * finished.

3920

-+	 *

3921

-+	 * In particular, the load of prev->state in finish_task_switch() must

3922

-+	 * happen before this.

3923

-+	 *

3924

-+	 * Pairs with the smp_cond_load_acquire() in try_to_wake_up().

3925

-+	 */

3926

-+	smp_store_release(&prev->on_cpu, 0);

3927

-+#else

3928

-+	prev->on_cpu = 0;

3929

-+#endif

3930

-+}

3931

-+

3932

-+#ifdef CONFIG_SMP

3933

-+

3934

-+static void do_balance_callbacks(struct rq *rq, struct callback_head *head)

3935

-+{

3936

-+	void (*func)(struct rq *rq);

3937

-+	struct callback_head *next;

3938

-+

3939

-+	lockdep_assert_held(&rq->lock);

3940

-+

3941

-+	while (head) {

3942

-+		func = (void (*)(struct rq *))head->func;

3943

-+		next = head->next;

3944

-+		head->next = NULL;

3945

-+		head = next;

3946

-+

3947

-+		func(rq);

3948

-+	}

3949

-+}

3950

-+

3951

-+static void balance_push(struct rq *rq);

3952

-+

3953

-+struct callback_head balance_push_callback = {

3954

-+	.next = NULL,

3955

-+	.func = (void (*)(struct callback_head *))balance_push,

3956

-+};

3957

-+

3958

-+static inline struct callback_head *splice_balance_callbacks(struct rq *rq)

3959

-+{

3960

-+	struct callback_head *head = rq->balance_callback;

3961

-+

3962

-+	if (head) {

3963

-+		lockdep_assert_held(&rq->lock);

3964

-+		rq->balance_callback = NULL;

3965

-+	}

3966

-+

3967

-+	return head;

3968

-+}

3969

-+

3970

-+static void __balance_callbacks(struct rq *rq)

3971

-+{

3972

-+	do_balance_callbacks(rq, splice_balance_callbacks(rq));

3973

-+}

3974

-+

3975

-+static inline void balance_callbacks(struct rq *rq, struct callback_head *head)

3976

-+{

3977

-+	unsigned long flags;

3978

-+

3979

-+	if (unlikely(head)) {

3980

-+		raw_spin_lock_irqsave(&rq->lock, flags);

3981

-+		do_balance_callbacks(rq, head);

3982

-+		raw_spin_unlock_irqrestore(&rq->lock, flags);

3983

-+	}

3984

-+}

3985

-+

3986

-+#else

3987

-+

3988

-+static inline void __balance_callbacks(struct rq *rq)

3989

-+{

3990

-+}

3991

-+

3992

-+static inline struct callback_head *splice_balance_callbacks(struct rq *rq)

3993

-+{

3994

-+	return NULL;

3995

-+}

3996

-+

3997

-+static inline void balance_callbacks(struct rq *rq, struct callback_head *head)

3998

-+{

3999

-+}

4000

-+

4001

-+#endif

4002

-+

4003

-+static inline void

4004

-+prepare_lock_switch(struct rq *rq, struct task_struct *next)

4005

-+{

4006

-+	/*

4007

-+	 * Since the runqueue lock will be released by the next

4008

-+	 * task (which is an invalid locking op but in the case

4009

-+	 * of the scheduler it's an obvious special-case), so we

4010

-+	 * do an early lockdep release here:

4011

-+	 */

4012

-+	spin_release(&rq->lock.dep_map, _THIS_IP_);

4013

-+#ifdef CONFIG_DEBUG_SPINLOCK

4014

-+	/* this is a valid case when another task releases the spinlock */

4015

-+	rq->lock.owner = next;

4016

-+#endif

4017

-+}

4018

-+

4019

-+static inline void finish_lock_switch(struct rq *rq)

4020

-+{

4021

-+	/*

4022

-+	 * If we are tracking spinlock dependencies then we have to

4023

-+	 * fix up the runqueue lock - which gets 'carried over' from

4024

-+	 * prev into current:

4025

-+	 */

4026

-+	spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);

4027

-+	__balance_callbacks(rq);

4028

-+	raw_spin_unlock_irq(&rq->lock);

4029

-+}

4030

-+

4031

-+/*

4032

-+ * NOP if the arch has not defined these:

4033

-+ */

4034

-+

4035

-+#ifndef prepare_arch_switch

4036

-+# define prepare_arch_switch(next)	do { } while (0)

4037

-+#endif

4038

-+

4039

-+#ifndef finish_arch_post_lock_switch

4040

-+# define finish_arch_post_lock_switch()	do { } while (0)

4041

-+#endif

4042

-+

4043

-+static inline void kmap_local_sched_out(void)

4044

-+{

4045

-+#ifdef CONFIG_KMAP_LOCAL

4046

-+	if (unlikely(current->kmap_ctrl.idx))

4047

-+		__kmap_local_sched_out();

4048

-+#endif

4049

-+}

4050

-+

4051

-+static inline void kmap_local_sched_in(void)

4052

-+{

4053

-+#ifdef CONFIG_KMAP_LOCAL

4054

-+	if (unlikely(current->kmap_ctrl.idx))

4055

-+		__kmap_local_sched_in();

4056

-+#endif

4057

-+}

4058

-+

4059

-+/**

4060

-+ * prepare_task_switch - prepare to switch tasks

4061

-+ * @rq: the runqueue preparing to switch

4062

-+ * @next: the task we are going to switch to.

4063

-+ *

4064

-+ * This is called with the rq lock held and interrupts off. It must

4065

-+ * be paired with a subsequent finish_task_switch after the context

4066

-+ * switch.

4067

-+ *

4068

-+ * prepare_task_switch sets up locking and calls architecture specific

4069

-+ * hooks.

4070

-+ */

4071

-+static inline void

4072

-+prepare_task_switch(struct rq *rq, struct task_struct *prev,

4073

-+		    struct task_struct *next)

4074

-+{

4075

-+	kcov_prepare_switch(prev);

4076

-+	sched_info_switch(rq, prev, next);

4077

-+	perf_event_task_sched_out(prev, next);

4078

-+	rseq_preempt(prev);

4079

-+	fire_sched_out_preempt_notifiers(prev, next);

4080

-+	kmap_local_sched_out();

4081

-+	prepare_task(next);

4082

-+	prepare_arch_switch(next);

4083

-+}

4084

-+

4085

-+/**

4086

-+ * finish_task_switch - clean up after a task-switch

4087

-+ * @rq: runqueue associated with task-switch

4088

-+ * @prev: the thread we just switched away from.

4089

-+ *

4090

-+ * finish_task_switch must be called after the context switch, paired

4091

-+ * with a prepare_task_switch call before the context switch.

4092

-+ * finish_task_switch will reconcile locking set up by prepare_task_switch,

4093

-+ * and do any other architecture-specific cleanup actions.

4094

-+ *

4095

-+ * Note that we may have delayed dropping an mm in context_switch(). If

4096

-+ * so, we finish that here outside of the runqueue lock.  (Doing it

4097

-+ * with the lock held can cause deadlocks; see schedule() for

4098

-+ * details.)

4099

-+ *

4100

-+ * The context switch has flipped the stack from under us and restored the

4101

-+ * local variables which were saved when this task called schedule() in the

4102

-+ * past. prev == current is still correct but we need to recalculate this_rq

4103

-+ * because prev may have moved to another CPU.

4104

-+ */

4105

-+static struct rq *finish_task_switch(struct task_struct *prev)

4106

-+	__releases(rq->lock)

4107

-+{

4108

-+	struct rq *rq = this_rq();

4109

-+	struct mm_struct *mm = rq->prev_mm;

4110

-+	long prev_state;

4111

-+

4112

-+	/*

4113

-+	 * The previous task will have left us with a preempt_count of 2

4114

-+	 * because it left us after:

4115

-+	 *

4116

-+	 *	schedule()

4117

-+	 *	  preempt_disable();			// 1

4118

-+	 *	  __schedule()

4119

-+	 *	    raw_spin_lock_irq(&rq->lock)	// 2

4120

-+	 *

4121

-+	 * Also, see FORK_PREEMPT_COUNT.

4122

-+	 */

4123

-+	if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,

4124

-+		      "corrupted preempt_count: %s/%d/0x%x\n",

4125

-+		      current->comm, current->pid, preempt_count()))

4126

-+		preempt_count_set(FORK_PREEMPT_COUNT);

4127

-+

4128

-+	rq->prev_mm = NULL;

4129

-+

4130

-+	/*

4131

-+	 * A task struct has one reference for the use as "current".

4132

-+	 * If a task dies, then it sets TASK_DEAD in tsk->state and calls

4133

-+	 * schedule one last time. The schedule call will never return, and

4134

-+	 * the scheduled task must drop that reference.

4135

-+	 *

4136

-+	 * We must observe prev->state before clearing prev->on_cpu (in

4137

-+	 * finish_task), otherwise a concurrent wakeup can get prev

4138

-+	 * running on another CPU and we could race with its RUNNING -> DEAD

4139

-+	 * transition, resulting in a double drop.

4140

-+	 */

4141

-+	prev_state = READ_ONCE(prev->__state);

4142

-+	vtime_task_switch(prev);

4143

-+	perf_event_task_sched_in(prev, current);

4144

-+	finish_task(prev);

4145

-+	tick_nohz_task_switch();

4146

-+	finish_lock_switch(rq);

4147

-+	finish_arch_post_lock_switch();

4148

-+	kcov_finish_switch(current);

4149

-+	/*

4150

-+	 * kmap_local_sched_out() is invoked with rq::lock held and

4151

-+	 * interrupts disabled. There is no requirement for that, but the

4152

-+	 * sched out code does not have an interrupt enabled section.

4153

-+	 * Restoring the maps on sched in does not require interrupts being

4154

-+	 * disabled either.

4155

-+	 */

4156

-+	kmap_local_sched_in();

4157

-+

4158

-+	fire_sched_in_preempt_notifiers(current);

4159

-+	/*

4160

-+	 * When switching through a kernel thread, the loop in

4161

-+	 * membarrier_{private,global}_expedited() may have observed that

4162

-+	 * kernel thread and not issued an IPI. It is therefore possible to

4163

-+	 * schedule between user->kernel->user threads without passing through

4164

-+	 * switch_mm(). Membarrier requires a barrier after storing to

4165

-+	 * rq->curr, before returning to userspace, so provide them here:

4166

-+	 *

4167

-+	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly

4168

-+	 *   provided by mmdrop(),

4169

-+	 * - a sync_core for SYNC_CORE.

4170

-+	 */

4171

-+	if (mm) {

4172

-+		membarrier_mm_sync_core_before_usermode(mm);

4173

-+		mmdrop(mm);

4174

-+	}

4175

-+	if (unlikely(prev_state == TASK_DEAD)) {

4176

-+		/*

4177

-+		 * Remove function-return probe instances associated with this

4178

-+		 * task and put them back on the free list.

4179

-+		 */

4180

-+		kprobe_flush_task(prev);

4181

-+

4182

-+		/* Task is done with its stack. */

4183

-+		put_task_stack(prev);

4184

-+

4185

-+		put_task_struct_rcu_user(prev);

4186

-+	}

4187

-+

4188

-+	return rq;

4189

-+}

4190

-+

4191

-+/**

4192

-+ * schedule_tail - first thing a freshly forked thread must call.

4193

-+ * @prev: the thread we just switched away from.

4194

-+ */

4195

-+asmlinkage __visible void schedule_tail(struct task_struct *prev)

4196

-+	__releases(rq->lock)

4197

-+{

4198

-+	/*

4199

-+	 * New tasks start with FORK_PREEMPT_COUNT, see there and

4200

-+	 * finish_task_switch() for details.

4201

-+	 *

4202

-+	 * finish_task_switch() will drop rq->lock() and lower preempt_count

4203

-+	 * and the preempt_enable() will end up enabling preemption (on

4204

-+	 * PREEMPT_COUNT kernels).

4205

-+	 */

4206

-+

4207

-+	finish_task_switch(prev);

4208

-+	preempt_enable();

4209

-+

4210

-+	if (current->set_child_tid)

4211

-+		put_user(task_pid_vnr(current), current->set_child_tid);

4212

-+

4213

-+	calculate_sigpending();

4214

-+}

4215

-+

4216

-+/*

4217

-+ * context_switch - switch to the new MM and the new thread's register state.

4218

-+ */

4219

-+static __always_inline struct rq *

4220

-+context_switch(struct rq *rq, struct task_struct *prev,

4221

-+	       struct task_struct *next)

4222

-+{

4223

-+	prepare_task_switch(rq, prev, next);

4224

-+

4225

-+	/*

4226

-+	 * For paravirt, this is coupled with an exit in switch_to to

4227

-+	 * combine the page table reload and the switch backend into

4228

-+	 * one hypercall.

4229

-+	 */

4230

-+	arch_start_context_switch(prev);

4231

-+

4232

-+	/*

4233

-+	 * kernel -> kernel   lazy + transfer active

4234

-+	 *   user -> kernel   lazy + mmgrab() active

4235

-+	 *

4236

-+	 * kernel ->   user   switch + mmdrop() active

4237

-+	 *   user ->   user   switch

4238

-+	 */

4239

-+	if (!next->mm) {                                // to kernel

4240

-+		enter_lazy_tlb(prev->active_mm, next);

4241

-+

4242

-+		next->active_mm = prev->active_mm;

4243

-+		if (prev->mm)                           // from user

4244

-+			mmgrab(prev->active_mm);

4245

-+		else

4246

-+			prev->active_mm = NULL;

4247

-+	} else {                                        // to user

4248

-+		membarrier_switch_mm(rq, prev->active_mm, next->mm);

4249

-+		/*

4250

-+		 * sys_membarrier() requires an smp_mb() between setting

4251

-+		 * rq->curr / membarrier_switch_mm() and returning to userspace.

4252

-+		 *

4253

-+		 * The below provides this either through switch_mm(), or in

4254

-+		 * case 'prev->active_mm == next->mm' through

4255

-+		 * finish_task_switch()'s mmdrop().

4256

-+		 */

4257

-+		switch_mm_irqs_off(prev->active_mm, next->mm, next);

4258

-+

4259

-+		if (!prev->mm) {                        // from kernel

4260

-+			/* will mmdrop() in finish_task_switch(). */

4261

-+			rq->prev_mm = prev->active_mm;

4262

-+			prev->active_mm = NULL;

4263

-+		}

4264

-+	}

4265

-+

4266

-+	prepare_lock_switch(rq, next);

4267

-+

4268

-+	/* Here we just switch the register state and the stack. */

4269

-+	switch_to(prev, next, prev);

4270

-+	barrier();

4271

-+

4272

-+	return finish_task_switch(prev);

4273

-+}

4274

-+

4275

-+/*

4276

-+ * nr_running, nr_uninterruptible and nr_context_switches:

4277

-+ *

4278

-+ * externally visible scheduler statistics: current number of runnable

4279

-+ * threads, total number of context switches performed since bootup.

4280

-+ */

4281

-+unsigned int nr_running(void)

4282

-+{

4283

-+	unsigned int i, sum = 0;

4284

-+

4285

-+	for_each_online_cpu(i)

4286

-+		sum += cpu_rq(i)->nr_running;

4287

-+

4288

-+	return sum;

4289

-+}

4290

-+

4291

-+/*

4292

-+ * Check if only the current task is running on the CPU.

4293

-+ *

4294

-+ * Caution: this function does not check that the caller has disabled

4295

-+ * preemption, thus the result might have a time-of-check-to-time-of-use

4296

-+ * race.  The caller is responsible to use it correctly, for example:

4297

-+ *

4298

-+ * - from a non-preemptible section (of course)

4299

-+ *

4300

-+ * - from a thread that is bound to a single CPU

4301

-+ *

4302

-+ * - in a loop with very short iterations (e.g. a polling loop)

4303

-+ */

4304

-+bool single_task_running(void)

4305

-+{

4306

-+	return raw_rq()->nr_running == 1;

4307

-+}

4308

-+EXPORT_SYMBOL(single_task_running);

4309

-+

4310

-+unsigned long long nr_context_switches(void)

4311

-+{

4312

-+	int i;

4313

-+	unsigned long long sum = 0;

4314

-+

4315

-+	for_each_possible_cpu(i)

4316

-+		sum += cpu_rq(i)->nr_switches;

4317

-+

4318

-+	return sum;

4319

-+}

4320

-+

4321

-+/*

4322

-+ * Consumers of these two interfaces, like for example the cpuidle menu

4323

-+ * governor, are using nonsensical data. Preferring shallow idle state selection

4324

-+ * for a CPU that has IO-wait which might not even end up running the task when

4325

-+ * it does become runnable.

4326

-+ */

4327

-+

4328

-+unsigned int nr_iowait_cpu(int cpu)

4329

-+{

4330

-+	return atomic_read(&cpu_rq(cpu)->nr_iowait);

4331

-+}

4332

-+

4333

-+/*

4334

-+ * IO-wait accounting, and how it's mostly bollocks (on SMP).

4335

-+ *

4336

-+ * The idea behind IO-wait account is to account the idle time that we could

4337

-+ * have spent running if it were not for IO. That is, if we were to improve the

4338

-+ * storage performance, we'd have a proportional reduction in IO-wait time.

4339

-+ *

4340

-+ * This all works nicely on UP, where, when a task blocks on IO, we account

4341

-+ * idle time as IO-wait, because if the storage were faster, it could've been

4342

-+ * running and we'd not be idle.

4343

-+ *

4344

-+ * This has been extended to SMP, by doing the same for each CPU. This however

4345

-+ * is broken.

4346

-+ *

4347

-+ * Imagine for instance the case where two tasks block on one CPU, only the one

4348

-+ * CPU will have IO-wait accounted, while the other has regular idle. Even

4349

-+ * though, if the storage were faster, both could've run at the same time,

4350

-+ * utilising both CPUs.

4351

-+ *

4352

-+ * This means, that when looking globally, the current IO-wait accounting on

4353

-+ * SMP is a lower bound, by reason of under accounting.

4354

-+ *

4355

-+ * Worse, since the numbers are provided per CPU, they are sometimes

4356

-+ * interpreted per CPU, and that is nonsensical. A blocked task isn't strictly

4357

-+ * associated with any one particular CPU, it can wake to another CPU than it

4358

-+ * blocked on. This means the per CPU IO-wait number is meaningless.

4359

-+ *

4360

-+ * Task CPU affinities can make all that even more 'interesting'.

4361

-+ */

4362

-+

4363

-+unsigned int nr_iowait(void)

4364

-+{

4365

-+	unsigned int i, sum = 0;

4366

-+

4367

-+	for_each_possible_cpu(i)

4368

-+		sum += nr_iowait_cpu(i);

4369

-+

4370

-+	return sum;

4371

-+}

4372

-+

4373

-+#ifdef CONFIG_SMP
-+
-+/*
-+ * sched_exec - execve() is a valuable balancing opportunity, because at
-+ * this point the task has the smallest effective memory and cache
-+ * footprint.
-+ */
-+void sched_exec(void)
-+{
-+	struct task_struct *p = current;
-+	unsigned long flags;
-+	int dest_cpu;
-+
-+	raw_spin_lock_irqsave(&p->pi_lock, flags);
-+	dest_cpu = cpumask_any(p->cpus_ptr);
-+	if (dest_cpu == smp_processor_id())
-+		goto unlock;
-+
-+	if (likely(cpu_active(dest_cpu))) {
-+		struct migration_arg arg = { p, dest_cpu };
-+
-+		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
-+		return;
-+	}
-+unlock:
-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+}
-+
-+#endif
-+
-+DEFINE_PER_CPU(struct kernel_stat, kstat);
-+DEFINE_PER_CPU(struct kernel_cpustat, kernel_cpustat);
-+
-+EXPORT_PER_CPU_SYMBOL(kstat);
-+EXPORT_PER_CPU_SYMBOL(kernel_cpustat);
-+
-+static inline void update_curr(struct rq *rq, struct task_struct *p)
-+{
-+	s64 ns = rq->clock_task - p->last_ran;
-+
-+	p->sched_time += ns;
-+	cgroup_account_cputime(p, ns);
-+	account_group_exec_runtime(p, ns);
-+
-+	p->time_slice -= ns;
-+	p->last_ran = rq->clock_task;
-+}
-+
-+/*
-+ * Return accounted runtime for the task.
-+ * Return separately the current's pending runtime that have not been
-+ * accounted yet.
-+ */
-+unsigned long long task_sched_runtime(struct task_struct *p)
-+{
-+	unsigned long flags;
-+	struct rq *rq;
-+	raw_spinlock_t *lock;
-+	u64 ns;
-+
-+#if defined(CONFIG_64BIT) && defined(CONFIG_SMP)
-+	/*
-+	 * 64-bit doesn't need locks to atomically read a 64-bit value.
-+	 * So we have a optimization chance when the task's delta_exec is 0.
-+	 * Reading ->on_cpu is racy, but this is ok.
-+	 *
-+	 * If we race with it leaving CPU, we'll take a lock. So we're correct.
-+	 * If we race with it entering CPU, unaccounted time is 0. This is
-+	 * indistinguishable from the read occurring a few cycles earlier.
-+	 * If we see ->on_cpu without ->on_rq, the task is leaving, and has
-+	 * been accounted, so we're correct here as well.
-+	 */
-+	if (!p->on_cpu || !task_on_rq_queued(p))
-+		return tsk_seruntime(p);
-+#endif
-+
-+	rq = task_access_lock_irqsave(p, &lock, &flags);
-+	/*
-+	 * Must be ->curr _and_ ->on_rq.  If dequeued, we would
-+	 * project cycles that may never be accounted to this
-+	 * thread, breaking clock_gettime().
-+	 */
-+	if (p == rq->curr && task_on_rq_queued(p)) {
-+		update_rq_clock(rq);
-+		update_curr(rq, p);
-+	}
-+	ns = tsk_seruntime(p);
-+	task_access_unlock_irqrestore(p, lock, &flags);
-+
-+	return ns;
-+}
-+
-+/* This manages tasks that have run out of timeslice during a scheduler_tick */
-+static inline void scheduler_task_tick(struct rq *rq)
-+{
-+	struct task_struct *p = rq->curr;
-+
-+	if (is_idle_task(p))
-+		return;
-+
-+	update_curr(rq, p);
-+	cpufreq_update_util(rq, 0);
-+
-+	/*
-+	 * Tasks have less than RESCHED_NS of time slice left they will be
-+	 * rescheduled.
-+	 */
-+	if (p->time_slice >= RESCHED_NS)
-+		return;
-+	set_tsk_need_resched(p);
-+	set_preempt_need_resched();
-+}
-+
-+#ifdef CONFIG_SCHED_DEBUG
-+static u64 cpu_resched_latency(struct rq *rq)
-+{
-+	int latency_warn_ms = READ_ONCE(sysctl_resched_latency_warn_ms);
-+	u64 resched_latency, now = rq_clock(rq);
-+	static bool warned_once;
-+
-+	if (sysctl_resched_latency_warn_once && warned_once)
-+		return 0;
-+
-+	if (!need_resched() || !latency_warn_ms)
-+		return 0;
-+
-+	if (system_state == SYSTEM_BOOTING)
-+		return 0;
-+
-+	if (!rq->last_seen_need_resched_ns) {
-+		rq->last_seen_need_resched_ns = now;
-+		rq->ticks_without_resched = 0;
-+		return 0;
-+	}
-+
-+	rq->ticks_without_resched++;
-+	resched_latency = now - rq->last_seen_need_resched_ns;
-+	if (resched_latency <= latency_warn_ms * NSEC_PER_MSEC)
-+		return 0;
-+
-+	warned_once = true;
-+
-+	return resched_latency;
-+}
-+
-+static int __init setup_resched_latency_warn_ms(char *str)
-+{
-+	long val;
-+
-+	if ((kstrtol(str, 0, &val))) {
-+		pr_warn("Unable to set resched_latency_warn_ms\n");
-+		return 1;
-+	}
-+
-+	sysctl_resched_latency_warn_ms = val;
-+	return 1;
-+}
-+__setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms);
-+#else
-+static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
-+#endif /* CONFIG_SCHED_DEBUG */
-+
-+/*
-+ * This function gets called by the timer code, with HZ frequency.
-+ * We call it with interrupts disabled.
-+ */
-+void scheduler_tick(void)
-+{
-+	int cpu __maybe_unused = smp_processor_id();
-+	struct rq *rq = cpu_rq(cpu);
-+	u64 resched_latency;
-+
-+	arch_scale_freq_tick();
-+	sched_clock_tick();
-+
-+	raw_spin_lock(&rq->lock);
-+	update_rq_clock(rq);
-+
-+	scheduler_task_tick(rq);
-+	if (sched_feat(LATENCY_WARN))
-+		resched_latency = cpu_resched_latency(rq);
-+	calc_global_load_tick(rq);
-+
-+	rq->last_tick = rq->clock;
-+	raw_spin_unlock(&rq->lock);
-+
-+	if (sched_feat(LATENCY_WARN) && resched_latency)
-+		resched_latency_warn(cpu, resched_latency);
-+
-+	perf_event_task_tick();
-+}
-+
-+#ifdef CONFIG_SCHED_SMT
-+static inline int active_load_balance_cpu_stop(void *data)
-+{
-+	struct rq *rq = this_rq();
-+	struct task_struct *p = data;
-+	cpumask_t tmp;
-+	unsigned long flags;
-+
-+	local_irq_save(flags);
-+
-+	raw_spin_lock(&p->pi_lock);
-+	raw_spin_lock(&rq->lock);
-+
-+	rq->active_balance = 0;
-+	/* _something_ may have changed the task, double check again */
-+	if (task_on_rq_queued(p) && task_rq(p) == rq &&
-+	    cpumask_and(&tmp, p->cpus_ptr, &sched_sg_idle_mask) &&
-+	    !is_migration_disabled(p)) {
-+		int cpu = cpu_of(rq);
-+		int dcpu = __best_mask_cpu(&tmp, per_cpu(sched_cpu_llc_mask, cpu));
-+		rq = move_queued_task(rq, p, dcpu);
-+	}
-+
-+	raw_spin_unlock(&rq->lock);
-+	raw_spin_unlock(&p->pi_lock);
-+
-+	local_irq_restore(flags);
-+
-+	return 0;
-+}
-+
-+/* sg_balance_trigger - trigger slibing group balance for @cpu */
-+static inline int sg_balance_trigger(const int cpu)
-+{
-+	struct rq *rq= cpu_rq(cpu);
-+	unsigned long flags;
-+	struct task_struct *curr;
-+	int res;
-+
-+	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
-+		return 0;
-+	curr = rq->curr;
-+	res = (!is_idle_task(curr)) && (1 == rq->nr_running) &&\
-+	      cpumask_intersects(curr->cpus_ptr, &sched_sg_idle_mask) &&\
-+	      !is_migration_disabled(curr) && (!rq->active_balance);
-+
-+	if (res)
-+		rq->active_balance = 1;
-+
-+	raw_spin_unlock_irqrestore(&rq->lock, flags);
-+
-+	if (res)
-+		stop_one_cpu_nowait(cpu, active_load_balance_cpu_stop,
-+				    curr, &rq->active_balance_work);
-+	return res;
-+}
-+
-+/*
-+ * sg_balance_check - slibing group balance check for run queue @rq
-+ */
-+static inline void sg_balance_check(struct rq *rq)
-+{
-+	cpumask_t chk;
-+	int cpu = cpu_of(rq);
-+
-+	/* exit when cpu is offline */
-+	if (unlikely(!rq->online))
-+		return;
-+
-+	/*
-+	 * Only cpu in slibing idle group will do the checking and then
-+	 * find potential cpus which can migrate the current running task
-+	 */
-+	if (cpumask_test_cpu(cpu, &sched_sg_idle_mask) &&
-+	    cpumask_andnot(&chk, cpu_online_mask, sched_rq_watermark) &&
-+	    cpumask_andnot(&chk, &chk, &sched_rq_pending_mask)) {
-+		int i;
-+
-+		for_each_cpu_wrap(i, &chk, cpu) {
-+			if (cpumask_subset(cpu_smt_mask(i), &chk) &&
-+			    sg_balance_trigger(i))
-+				return;
-+		}
-+	}
-+}
-+#endif /* CONFIG_SCHED_SMT */
-+
-+#ifdef CONFIG_NO_HZ_FULL
-+
-+struct tick_work {
-+	int			cpu;
-+	atomic_t		state;
-+	struct delayed_work	work;
-+};
-+/* Values for ->state, see diagram below. */
-+#define TICK_SCHED_REMOTE_OFFLINE	0
-+#define TICK_SCHED_REMOTE_OFFLINING	1
-+#define TICK_SCHED_REMOTE_RUNNING	2
-+
-+/*
-+ * State diagram for ->state:
-+ *
-+ *
-+ *          TICK_SCHED_REMOTE_OFFLINE
-+ *                    |   ^
-+ *                    |   |
-+ *                    |   | sched_tick_remote()
-+ *                    |   |
-+ *                    |   |
-+ *                    +--TICK_SCHED_REMOTE_OFFLINING
-+ *                    |   ^
-+ *                    |   |
-+ * sched_tick_start() |   | sched_tick_stop()
-+ *                    |   |
-+ *                    V   |
-+ *          TICK_SCHED_REMOTE_RUNNING
-+ *
-+ *
-+ * Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
-+ * and sched_tick_start() are happy to leave the state in RUNNING.
-+ */
-+
-+static struct tick_work __percpu *tick_work_cpu;
-+
-+static void sched_tick_remote(struct work_struct *work)
-+{
-+	struct delayed_work *dwork = to_delayed_work(work);
-+	struct tick_work *twork = container_of(dwork, struct tick_work, work);
-+	int cpu = twork->cpu;
-+	struct rq *rq = cpu_rq(cpu);
-+	struct task_struct *curr;
-+	unsigned long flags;
-+	u64 delta;
-+	int os;
-+
-+	/*
-+	 * Handle the tick only if it appears the remote CPU is running in full
-+	 * dynticks mode. The check is racy by nature, but missing a tick or
-+	 * having one too much is no big deal because the scheduler tick updates
-+	 * statistics and checks timeslices in a time-independent way, regardless
-+	 * of when exactly it is running.
-+	 */
-+	if (!tick_nohz_tick_stopped_cpu(cpu))
-+		goto out_requeue;
-+
-+	raw_spin_lock_irqsave(&rq->lock, flags);
-+	curr = rq->curr;
-+	if (cpu_is_offline(cpu))
-+		goto out_unlock;
-+
-+	update_rq_clock(rq);
-+	if (!is_idle_task(curr)) {
-+		/*
-+		 * Make sure the next tick runs within a reasonable
-+		 * amount of time.
-+		 */
-+		delta = rq_clock_task(rq) - curr->last_ran;
-+		WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
-+	}
-+	scheduler_task_tick(rq);
-+
-+	calc_load_nohz_remote(rq);
-+out_unlock:
-+	raw_spin_unlock_irqrestore(&rq->lock, flags);
-+
-+out_requeue:
-+	/*
-+	 * Run the remote tick once per second (1Hz). This arbitrary
-+	 * frequency is large enough to avoid overload but short enough
-+	 * to keep scheduler internal stats reasonably up to date.  But
-+	 * first update state to reflect hotplug activity if required.
-+	 */
-+	os = atomic_fetch_add_unless(&twork->state, -1, TICK_SCHED_REMOTE_RUNNING);
-+	WARN_ON_ONCE(os == TICK_SCHED_REMOTE_OFFLINE);
-+	if (os == TICK_SCHED_REMOTE_RUNNING)
-+		queue_delayed_work(system_unbound_wq, dwork, HZ);
-+}
-+
-+static void sched_tick_start(int cpu)
-+{
-+	int os;
-+	struct tick_work *twork;
-+
-+	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
-+		return;
-+
-+	WARN_ON_ONCE(!tick_work_cpu);
-+
-+	twork = per_cpu_ptr(tick_work_cpu, cpu);
-+	os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_RUNNING);
-+	WARN_ON_ONCE(os == TICK_SCHED_REMOTE_RUNNING);
-+	if (os == TICK_SCHED_REMOTE_OFFLINE) {
-+		twork->cpu = cpu;
-+		INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
-+		queue_delayed_work(system_unbound_wq, &twork->work, HZ);
-+	}
-+}
-+
-+#ifdef CONFIG_HOTPLUG_CPU
-+static void sched_tick_stop(int cpu)
-+{
-+	struct tick_work *twork;
-+
-+	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
-+		return;
-+
-+	WARN_ON_ONCE(!tick_work_cpu);
-+
-+	twork = per_cpu_ptr(tick_work_cpu, cpu);
-+	cancel_delayed_work_sync(&twork->work);
-+}
-+#endif /* CONFIG_HOTPLUG_CPU */
-+
-+int __init sched_tick_offload_init(void)
-+{
-+	tick_work_cpu = alloc_percpu(struct tick_work);
-+	BUG_ON(!tick_work_cpu);
-+	return 0;
-+}
-+
-+#else /* !CONFIG_NO_HZ_FULL */
-+static inline void sched_tick_start(int cpu) { }
-+static inline void sched_tick_stop(int cpu) { }
-+#endif
-+
-+#if defined(CONFIG_PREEMPTION) && (defined(CONFIG_DEBUG_PREEMPT) || \
-+				defined(CONFIG_PREEMPT_TRACER))
-+/*
-+ * If the value passed in is equal to the current preempt count
-+ * then we just disabled preemption. Start timing the latency.
-+ */
-+static inline void preempt_latency_start(int val)
-+{
-+	if (preempt_count() == val) {
-+		unsigned long ip = get_lock_parent_ip();
-+#ifdef CONFIG_DEBUG_PREEMPT
-+		current->preempt_disable_ip = ip;
-+#endif
-+		trace_preempt_off(CALLER_ADDR0, ip);
-+	}
-+}
-+
-+void preempt_count_add(int val)
-+{
-+#ifdef CONFIG_DEBUG_PREEMPT
-+	/*
-+	 * Underflow?
-+	 */
-+	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
-+		return;
-+#endif
-+	__preempt_count_add(val);
-+#ifdef CONFIG_DEBUG_PREEMPT
-+	/*
-+	 * Spinlock count overflowing soon?
-+	 */
-+	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
-+				PREEMPT_MASK - 10);
-+#endif
-+	preempt_latency_start(val);
-+}
-+EXPORT_SYMBOL(preempt_count_add);
-+NOKPROBE_SYMBOL(preempt_count_add);
-+
-+/*
-+ * If the value passed in equals to the current preempt count
-+ * then we just enabled preemption. Stop timing the latency.
-+ */
-+static inline void preempt_latency_stop(int val)
-+{
-+	if (preempt_count() == val)
-+		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
-+}
-+
-+void preempt_count_sub(int val)
-+{
-+#ifdef CONFIG_DEBUG_PREEMPT
-+	/*
-+	 * Underflow?
-+	 */
-+	if (DEBUG_LOCKS_WARN_ON(val > preempt_count()))
-+		return;
-+	/*
-+	 * Is the spinlock portion underflowing?
-+	 */
-+	if (DEBUG_LOCKS_WARN_ON((val < PREEMPT_MASK) &&
-+			!(preempt_count() & PREEMPT_MASK)))
-+		return;
-+#endif
-+
-+	preempt_latency_stop(val);
-+	__preempt_count_sub(val);
-+}
-+EXPORT_SYMBOL(preempt_count_sub);
-+NOKPROBE_SYMBOL(preempt_count_sub);
-+
-+#else
-+static inline void preempt_latency_start(int val) { }
-+static inline void preempt_latency_stop(int val) { }
-+#endif
-+
-+static inline unsigned long get_preempt_disable_ip(struct task_struct *p)
-+{
-+#ifdef CONFIG_DEBUG_PREEMPT
-+	return p->preempt_disable_ip;
-+#else
-+	return 0;
-+#endif
-+}
-+
-+/*
-+ * Print scheduling while atomic bug:
-+ */
-+static noinline void __schedule_bug(struct task_struct *prev)
-+{
-+	/* Save this before calling printk(), since that will clobber it */
-+	unsigned long preempt_disable_ip = get_preempt_disable_ip(current);
-+
-+	if (oops_in_progress)
-+		return;
-+
-+	printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
-+		prev->comm, prev->pid, preempt_count());
-+
-+	debug_show_held_locks(prev);
-+	print_modules();
-+	if (irqs_disabled())
-+		print_irqtrace_events(prev);
-+	if (IS_ENABLED(CONFIG_DEBUG_PREEMPT)
-+	    && in_atomic_preempt_off()) {
-+		pr_err("Preemption disabled at:");
-+		print_ip_sym(KERN_ERR, preempt_disable_ip);
-+	}
-+	if (panic_on_warn)
-+		panic("scheduling while atomic\n");
-+
-+	dump_stack();
-+	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
-+}
-+
-+/*
-+ * Various schedule()-time debugging checks and statistics:
-+ */
-+static inline void schedule_debug(struct task_struct *prev, bool preempt)
-+{
-+#ifdef CONFIG_SCHED_STACK_END_CHECK
-+	if (task_stack_end_corrupted(prev))
-+		panic("corrupted stack end detected inside scheduler\n");
-+
-+	if (task_scs_end_corrupted(prev))
-+		panic("corrupted shadow stack detected inside scheduler\n");
-+#endif
-+
-+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
-+	if (!preempt && READ_ONCE(prev->__state) && prev->non_block_count) {
-+		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
-+			prev->comm, prev->pid, prev->non_block_count);
-+		dump_stack();
-+		add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
-+	}
-+#endif
-+
-+	if (unlikely(in_atomic_preempt_off())) {
-+		__schedule_bug(prev);
-+		preempt_count_set(PREEMPT_DISABLED);
-+	}
-+	rcu_sleep_check();
-+	SCHED_WARN_ON(ct_state() == CONTEXT_USER);
-+
-+	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
-+
-+	schedstat_inc(this_rq()->sched_count);
-+}
-+
-+/*
-+ * Compile time debug macro
-+ * #define ALT_SCHED_DEBUG
-+ */
-+
-+#ifdef ALT_SCHED_DEBUG
-+void alt_sched_debug(void)
-+{
-+	printk(KERN_INFO "sched: pending: 0x%04lx, idle: 0x%04lx, sg_idle: 0x%04lx\n",
-+	       sched_rq_pending_mask.bits[0],
-+	       sched_rq_watermark[0].bits[0],
-+	       sched_sg_idle_mask.bits[0]);
-+}
-+#else
-+inline void alt_sched_debug(void) {}
-+#endif
-+
-+#ifdef	CONFIG_SMP
-+
-+#define SCHED_RQ_NR_MIGRATION (32U)
-+/*
-+ * Migrate pending tasks in @rq to @dest_cpu
-+ * Will try to migrate mininal of half of @rq nr_running tasks and
-+ * SCHED_RQ_NR_MIGRATION to @dest_cpu
-+ */
-+static inline int
-+migrate_pending_tasks(struct rq *rq, struct rq *dest_rq, const int dest_cpu)
-+{
-+	struct task_struct *p, *skip = rq->curr;
-+	int nr_migrated = 0;
-+	int nr_tries = min(rq->nr_running / 2, SCHED_RQ_NR_MIGRATION);
-+
-+	while (skip != rq->idle && nr_tries &&
-+	       (p = sched_rq_next_task(skip, rq)) != rq->idle) {
-+		skip = sched_rq_next_task(p, rq);
-+		if (cpumask_test_cpu(dest_cpu, p->cpus_ptr)) {
-+			__SCHED_DEQUEUE_TASK(p, rq, 0, );
-+			set_task_cpu(p, dest_cpu);
-+			sched_task_sanity_check(p, dest_rq);
-+			__SCHED_ENQUEUE_TASK(p, dest_rq, 0);
-+			nr_migrated++;
-+		}
-+		nr_tries--;
-+	}
-+
-+	return nr_migrated;
-+}
-+
-+static inline int take_other_rq_tasks(struct rq *rq, int cpu)
-+{
-+	struct cpumask *topo_mask, *end_mask;
-+
-+	if (unlikely(!rq->online))
-+		return 0;
-+
-+	if (cpumask_empty(&sched_rq_pending_mask))
-+		return 0;
-+
-+	topo_mask = per_cpu(sched_cpu_topo_masks, cpu) + 1;
-+	end_mask = per_cpu(sched_cpu_topo_end_mask, cpu);
-+	do {
-+		int i;
-+		for_each_cpu_and(i, &sched_rq_pending_mask, topo_mask) {
-+			int nr_migrated;
-+			struct rq *src_rq;
-+
-+			src_rq = cpu_rq(i);
-+			if (!do_raw_spin_trylock(&src_rq->lock))
-+				continue;
-+			spin_acquire(&src_rq->lock.dep_map,
-+				     SINGLE_DEPTH_NESTING, 1, _RET_IP_);
-+
-+			if ((nr_migrated = migrate_pending_tasks(src_rq, rq, cpu))) {
-+				src_rq->nr_running -= nr_migrated;
-+				if (src_rq->nr_running < 2)
-+					cpumask_clear_cpu(i, &sched_rq_pending_mask);
-+
-+				rq->nr_running += nr_migrated;
-+				if (rq->nr_running > 1)
-+					cpumask_set_cpu(cpu, &sched_rq_pending_mask);
-+
-+				update_sched_rq_watermark(rq);
-+				cpufreq_update_util(rq, 0);
-+
-+				spin_release(&src_rq->lock.dep_map, _RET_IP_);
-+				do_raw_spin_unlock(&src_rq->lock);
-+
-+				return 1;
-+			}
-+
-+			spin_release(&src_rq->lock.dep_map, _RET_IP_);
-+			do_raw_spin_unlock(&src_rq->lock);
-+		}
-+	} while (++topo_mask < end_mask);
-+
-+	return 0;
-+}
-+#endif
-+
-+/*
-+ * Timeslices below RESCHED_NS are considered as good as expired as there's no
-+ * point rescheduling when there's so little time left.
-+ */
-+static inline void check_curr(struct task_struct *p, struct rq *rq)
-+{
-+	if (unlikely(rq->idle == p))
-+		return;
-+
-+	update_curr(rq, p);
-+
-+	if (p->time_slice < RESCHED_NS)
-+		time_slice_expired(p, rq);
-+}
-+
-+static inline struct task_struct *
-+choose_next_task(struct rq *rq, int cpu, struct task_struct *prev)
-+{
-+	struct task_struct *next;
-+
-+	if (unlikely(rq->skip)) {
-+		next = rq_runnable_task(rq);
-+		if (next == rq->idle) {
-+#ifdef	CONFIG_SMP
-+			if (!take_other_rq_tasks(rq, cpu)) {
-+#endif
-+				rq->skip = NULL;
-+				schedstat_inc(rq->sched_goidle);
-+				return next;
-+#ifdef	CONFIG_SMP
-+			}
-+			next = rq_runnable_task(rq);
-+#endif
-+		}
-+		rq->skip = NULL;
-+#ifdef CONFIG_HIGH_RES_TIMERS
-+		hrtick_start(rq, next->time_slice);
-+#endif
-+		return next;
-+	}
-+
-+	next = sched_rq_first_task(rq);
-+	if (next == rq->idle) {
-+#ifdef	CONFIG_SMP
-+		if (!take_other_rq_tasks(rq, cpu)) {
-+#endif
-+			schedstat_inc(rq->sched_goidle);
-+			/*printk(KERN_INFO "sched: choose_next_task(%d) idle %px\n", cpu, next);*/
-+			return next;
-+#ifdef	CONFIG_SMP
-+		}
-+		next = sched_rq_first_task(rq);
-+#endif
-+	}
-+#ifdef CONFIG_HIGH_RES_TIMERS
-+	hrtick_start(rq, next->time_slice);
-+#endif
-+	/*printk(KERN_INFO "sched: choose_next_task(%d) next %px\n", cpu,
-+	 * next);*/
-+	return next;
-+}
-+
-+/*
-+ * Constants for the sched_mode argument of __schedule().
-+ *
-+ * The mode argument allows RT enabled kernels to differentiate a
-+ * preemption from blocking on an 'sleeping' spin/rwlock. Note that
-+ * SM_MASK_PREEMPT for !RT has all bits set, which allows the compiler to
-+ * optimize the AND operation out and just check for zero.
-+ */
-+#define SM_NONE			0x0
-+#define SM_PREEMPT		0x1
-+#define SM_RTLOCK_WAIT		0x2
-+
-+#ifndef CONFIG_PREEMPT_RT
-+# define SM_MASK_PREEMPT	(~0U)
-+#else
-+# define SM_MASK_PREEMPT	SM_PREEMPT
-+#endif
-+
-+/*
-+ * schedule() is the main scheduler function.
-+ *
-+ * The main means of driving the scheduler and thus entering this function are:
-+ *
-+ *   1. Explicit blocking: mutex, semaphore, waitqueue, etc.
-+ *
-+ *   2. TIF_NEED_RESCHED flag is checked on interrupt and userspace return
-+ *      paths. For example, see arch/x86/entry_64.S.
-+ *
-+ *      To drive preemption between tasks, the scheduler sets the flag in timer
-+ *      interrupt handler scheduler_tick().
-+ *
-+ *   3. Wakeups don't really cause entry into schedule(). They add a
-+ *      task to the run-queue and that's it.
-+ *
-+ *      Now, if the new task added to the run-queue preempts the current
-+ *      task, then the wakeup sets TIF_NEED_RESCHED and schedule() gets
-+ *      called on the nearest possible occasion:
-+ *
-+ *       - If the kernel is preemptible (CONFIG_PREEMPTION=y):
-+ *
-+ *         - in syscall or exception context, at the next outmost
-+ *           preempt_enable(). (this might be as soon as the wake_up()'s
-+ *           spin_unlock()!)
-+ *
-+ *         - in IRQ context, return from interrupt-handler to
-+ *           preemptible context
-+ *
-+ *       - If the kernel is not preemptible (CONFIG_PREEMPTION is not set)
-+ *         then at the next:
-+ *
-+ *          - cond_resched() call
-+ *          - explicit schedule() call
-+ *          - return from syscall or exception to user-space
-+ *          - return from interrupt-handler to user-space
-+ *
-+ * WARNING: must be called with preemption disabled!
-+ */
-+static void __sched notrace __schedule(unsigned int sched_mode)
-+{
-+	struct task_struct *prev, *next;
-+	unsigned long *switch_count;
-+	unsigned long prev_state;
-+	struct rq *rq;
-+	int cpu;
-+
-+	cpu = smp_processor_id();
-+	rq = cpu_rq(cpu);
-+	prev = rq->curr;
-+
-+	schedule_debug(prev, !!sched_mode);
-+
-+	/* by passing sched_feat(HRTICK) checking which Alt schedule FW doesn't support */
-+	hrtick_clear(rq);
-+
-+	local_irq_disable();
-+	rcu_note_context_switch(!!sched_mode);
-+
-+	/*
-+	 * Make sure that signal_pending_state()->signal_pending() below
-+	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
-+	 * done by the caller to avoid the race with signal_wake_up():
-+	 *
-+	 * __set_current_state(@state)		signal_wake_up()
-+	 * schedule()				  set_tsk_thread_flag(p, TIF_SIGPENDING)
-+	 *					  wake_up_state(p, state)
-+	 *   LOCK rq->lock			    LOCK p->pi_state
-+	 *   smp_mb__after_spinlock()		    smp_mb__after_spinlock()
-+	 *     if (signal_pending_state())	    if (p->state & @state)
-+	 *
-+	 * Also, the membarrier system call requires a full memory barrier
-+	 * after coming from user-space, before storing to rq->curr.
-+	 */
-+	raw_spin_lock(&rq->lock);
-+	smp_mb__after_spinlock();
-+
-+	update_rq_clock(rq);
-+
-+	switch_count = &prev->nivcsw;
-+	/*
-+	 * We must load prev->state once (task_struct::state is volatile), such
-+	 * that:
-+	 *
-+	 *  - we form a control dependency vs deactivate_task() below.
-+	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
-+	 */
-+	prev_state = READ_ONCE(prev->__state);
-+	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
-+		if (signal_pending_state(prev_state, prev)) {
-+			WRITE_ONCE(prev->__state, TASK_RUNNING);
-+		} else {
-+			prev->sched_contributes_to_load =
-+				(prev_state & TASK_UNINTERRUPTIBLE) &&
-+				!(prev_state & TASK_NOLOAD) &&
-+				!(prev->flags & PF_FROZEN);
-+
-+			if (prev->sched_contributes_to_load)
-+				rq->nr_uninterruptible++;
-+
-+			/*
-+			 * __schedule()			ttwu()
-+			 *   prev_state = prev->state;    if (p->on_rq && ...)
-+			 *   if (prev_state)		    goto out;
-+			 *     p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
-+			 *				  p->state = TASK_WAKING
-+			 *
-+			 * Where __schedule() and ttwu() have matching control dependencies.
-+			 *
-+			 * After this, schedule() must not care about p->state any more.
-+			 */
-+			sched_task_deactivate(prev, rq);
-+			deactivate_task(prev, rq);
-+
-+			if (prev->in_iowait) {
-+				atomic_inc(&rq->nr_iowait);
-+				delayacct_blkio_start();
-+			}
-+		}
-+		switch_count = &prev->nvcsw;
-+	}
-+
-+	check_curr(prev, rq);
-+
-+	next = choose_next_task(rq, cpu, prev);
-+	clear_tsk_need_resched(prev);
-+	clear_preempt_need_resched();
-+#ifdef CONFIG_SCHED_DEBUG
-+	rq->last_seen_need_resched_ns = 0;
-+#endif
-+
-+	if (likely(prev != next)) {
-+		next->last_ran = rq->clock_task;
-+		rq->last_ts_switch = rq->clock;
-+
-+		rq->nr_switches++;
-+		/*
-+		 * RCU users of rcu_dereference(rq->curr) may not see
-+		 * changes to task_struct made by pick_next_task().
-+		 */
-+		RCU_INIT_POINTER(rq->curr, next);
-+		/*
-+		 * The membarrier system call requires each architecture
-+		 * to have a full memory barrier after updating
-+		 * rq->curr, before returning to user-space.
-+		 *
-+		 * Here are the schemes providing that barrier on the
-+		 * various architectures:
-+		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
-+		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
-+		 * - finish_lock_switch() for weakly-ordered
-+		 *   architectures where spin_unlock is a full barrier,
-+		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
-+		 *   is a RELEASE barrier),
-+		 */
-+		++*switch_count;
-+
-+		psi_sched_switch(prev, next, !task_on_rq_queued(prev));
-+
-+		trace_sched_switch(sched_mode & SM_MASK_PREEMPT, prev, next);
-+
-+		/* Also unlocks the rq: */
-+		rq = context_switch(rq, prev, next);
-+	} else {
-+		__balance_callbacks(rq);
-+		raw_spin_unlock_irq(&rq->lock);
-+	}
-+
-+#ifdef CONFIG_SCHED_SMT
-+	sg_balance_check(rq);
-+#endif
-+}
-+
-+void __noreturn do_task_dead(void)
-+{
-+	/* Causes final put_task_struct in finish_task_switch(): */
-+	set_special_state(TASK_DEAD);
-+
-+	/* Tell freezer to ignore us: */
-+	current->flags |= PF_NOFREEZE;
-+
-+	__schedule(SM_NONE);
-+	BUG();
-+
-+	/* Avoid "noreturn function does return" - but don't continue if BUG() is a NOP: */
-+	for (;;)
-+		cpu_relax();
-+}
-+
-+static inline void sched_submit_work(struct task_struct *tsk)

5309

-+{

5310

-+	unsigned int task_flags;

5311

-+

5312

-+	if (task_is_running(tsk))

5313

-+		return;

5314

-+

5315

-+	task_flags = tsk->flags;

5316

-+	/*

5317

-+	 * If a worker went to sleep, notify and ask workqueue whether

5318

-+	 * it wants to wake up a task to maintain concurrency.

5319

-+	 * As this function is called inside the schedule() context,

5320

-+	 * we disable preemption to avoid it calling schedule() again

5321

-+	 * in the possible wakeup of a kworker and because wq_worker_sleeping()

5322

-+	 * requires it.

5323

-+	 */

5324

-+	if (task_flags & (PF_WQ_WORKER | PF_IO_WORKER)) {

5325

-+		preempt_disable();

5326

-+		if (task_flags & PF_WQ_WORKER)

5327

-+			wq_worker_sleeping(tsk);

5328

-+		else

5329

-+			io_wq_worker_sleeping(tsk);

5330

-+		preempt_enable_no_resched();

5331

-+	}

5332

-+

5333

-+	if (tsk_is_pi_blocked(tsk))

5334

-+		return;

5335

-+

5336

-+	/*

5337

-+	 * If we are going to sleep and we have plugged IO queued,

5338

-+	 * make sure to submit it to avoid deadlocks.

5339

-+	 */

5340

-+	if (blk_needs_flush_plug(tsk))

5341

-+		blk_schedule_flush_plug(tsk);

5342

-+}

5343

-+

-+static void sched_update_worker(struct task_struct *tsk)
-+{
-+	if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
-+		if (tsk->flags & PF_WQ_WORKER)
-+			wq_worker_running(tsk);
-+		else
-+			io_wq_worker_running(tsk);
-+	}
-+}
-+
-+asmlinkage __visible void __sched schedule(void)
-+{
-+	struct task_struct *tsk = current;
-+
-+	sched_submit_work(tsk);
-+	do {
-+		preempt_disable();
-+		__schedule(SM_NONE);
-+		sched_preempt_enable_no_resched();
-+	} while (need_resched());
-+	sched_update_worker(tsk);
-+}
-+EXPORT_SYMBOL(schedule);
-+
-+/*
-+ * synchronize_rcu_tasks() makes sure that no task is stuck in preempted
-+ * state (have scheduled out non-voluntarily) by making sure that all
-+ * tasks have either left the run queue or have gone into user space.
-+ * As idle tasks do not do either, they must not ever be preempted
-+ * (schedule out non-voluntarily).
-+ *
-+ * schedule_idle() is similar to schedule_preempt_disable() except that it
-+ * never enables preemption because it does not call sched_submit_work().
-+ */
-+void __sched schedule_idle(void)
-+{
-+	/*
-+	 * As this skips calling sched_submit_work(), which the idle task does
-+	 * regardless because that function is a nop when the task is in a
-+	 * TASK_RUNNING state, make sure this isn't used someplace that the
-+	 * current task can be in any other state. Note, idle is always in the
-+	 * TASK_RUNNING state.
-+	 */
-+	WARN_ON_ONCE(current->__state);
-+	do {
-+		__schedule(SM_NONE);
-+	} while (need_resched());
-+}
-+
-+#if defined(CONFIG_CONTEXT_TRACKING) && !defined(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK)
-+asmlinkage __visible void __sched schedule_user(void)
-+{
-+	/*
-+	 * If we come here after a random call to set_need_resched(),
-+	 * or we have been woken up remotely but the IPI has not yet arrived,
-+	 * we haven't yet exited the RCU idle mode. Do it here manually until
-+	 * we find a better solution.
-+	 *
-+	 * NB: There are buggy callers of this function.  Ideally we
-+	 * should warn if prev_state != CONTEXT_USER, but that will trigger
-+	 * too frequently to make sense yet.
-+	 */
-+	enum ctx_state prev_state = exception_enter();
-+	schedule();
-+	exception_exit(prev_state);
-+}
-+#endif
-+
-+/**
-+ * schedule_preempt_disabled - called with preemption disabled
-+ *
-+ * Returns with preemption disabled. Note: preempt_count must be 1
-+ */
-+void __sched schedule_preempt_disabled(void)
-+{
-+	sched_preempt_enable_no_resched();
-+	schedule();
-+	preempt_disable();
-+}
-+
-+#ifdef CONFIG_PREEMPT_RT
-+void __sched notrace schedule_rtlock(void)
-+{
-+	do {
-+		preempt_disable();
-+		__schedule(SM_RTLOCK_WAIT);
-+		sched_preempt_enable_no_resched();
-+	} while (need_resched());
-+}
-+NOKPROBE_SYMBOL(schedule_rtlock);
-+#endif
-+
-+static void __sched notrace preempt_schedule_common(void)
-+{
-+	do {
-+		/*
-+		 * Because the function tracer can trace preempt_count_sub()
-+		 * and it also uses preempt_enable/disable_notrace(), if
-+		 * NEED_RESCHED is set, the preempt_enable_notrace() called
-+		 * by the function tracer will call this function again and
-+		 * cause infinite recursion.
-+		 *
-+		 * Preemption must be disabled here before the function
-+		 * tracer can trace. Break up preempt_disable() into two
-+		 * calls. One to disable preemption without fear of being
-+		 * traced. The other to still record the preemption latency,
-+		 * which can also be traced by the function tracer.
-+		 */
-+		preempt_disable_notrace();
-+		preempt_latency_start(1);
-+		__schedule(SM_PREEMPT);
-+		preempt_latency_stop(1);
-+		preempt_enable_no_resched_notrace();
-+
-+		/*
-+		 * Check again in case we missed a preemption opportunity
-+		 * between schedule and now.
-+		 */
-+	} while (need_resched());
-+}
-+
-+#ifdef CONFIG_PREEMPTION
-+/*
-+ * This is the entry point to schedule() from in-kernel preemption
-+ * off of preempt_enable.
-+ */
-+asmlinkage __visible void __sched notrace preempt_schedule(void)
-+{
-+	/*
-+	 * If there is a non-zero preempt_count or interrupts are disabled,
-+	 * we do not want to preempt the current task. Just return..
-+	 */
-+	if (likely(!preemptible()))
-+		return;
-+
-+	preempt_schedule_common();
-+}
-+NOKPROBE_SYMBOL(preempt_schedule);
-+EXPORT_SYMBOL(preempt_schedule);
-+
-+#ifdef CONFIG_PREEMPT_DYNAMIC
-+DEFINE_STATIC_CALL(preempt_schedule, __preempt_schedule_func);
-+EXPORT_STATIC_CALL_TRAMP(preempt_schedule);
-+#endif
-+
-+
-+/**
-+ * preempt_schedule_notrace - preempt_schedule called by tracing
-+ *
-+ * The tracing infrastructure uses preempt_enable_notrace to prevent
-+ * recursion and tracing preempt enabling caused by the tracing
-+ * infrastructure itself. But as tracing can happen in areas coming
-+ * from userspace or just about to enter userspace, a preempt enable
-+ * can occur before user_exit() is called. This will cause the scheduler
-+ * to be called when the system is still in usermode.
-+ *
-+ * To prevent this, the preempt_enable_notrace will use this function
-+ * instead of preempt_schedule() to exit user context if needed before
-+ * calling the scheduler.
-+ */
-+asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
-+{
-+	enum ctx_state prev_ctx;
-+
-+	if (likely(!preemptible()))
-+		return;
-+
-+	do {
-+		/*
-+		 * Because the function tracer can trace preempt_count_sub()
-+		 * and it also uses preempt_enable/disable_notrace(), if
-+		 * NEED_RESCHED is set, the preempt_enable_notrace() called
-+		 * by the function tracer will call this function again and
-+		 * cause infinite recursion.
-+		 *
-+		 * Preemption must be disabled here before the function
-+		 * tracer can trace. Break up preempt_disable() into two
-+		 * calls. One to disable preemption without fear of being
-+		 * traced. The other to still record the preemption latency,
-+		 * which can also be traced by the function tracer.
-+		 */
-+		preempt_disable_notrace();
-+		preempt_latency_start(1);
-+		/*
-+		 * Needs preempt disabled in case user_exit() is traced
-+		 * and the tracer calls preempt_enable_notrace() causing
-+		 * an infinite recursion.
-+		 */
-+		prev_ctx = exception_enter();
-+		__schedule(SM_PREEMPT);
-+		exception_exit(prev_ctx);
-+
-+		preempt_latency_stop(1);
-+		preempt_enable_no_resched_notrace();
-+	} while (need_resched());
-+}
-+EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
-+
-+#ifdef CONFIG_PREEMPT_DYNAMIC
-+DEFINE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func);
-+EXPORT_STATIC_CALL_TRAMP(preempt_schedule_notrace);
-+#endif
-+
-+#endif /* CONFIG_PREEMPTION */
-+
-+#ifdef CONFIG_PREEMPT_DYNAMIC
-+
-+#include <linux/entry-common.h>
-+
-+/*
-+ * SC:cond_resched
-+ * SC:might_resched
-+ * SC:preempt_schedule
-+ * SC:preempt_schedule_notrace
-+ * SC:irqentry_exit_cond_resched
-+ *
-+ *
-+ * NONE:
-+ *   cond_resched               <- __cond_resched
-+ *   might_resched              <- RET0
-+ *   preempt_schedule           <- NOP
-+ *   preempt_schedule_notrace   <- NOP
-+ *   irqentry_exit_cond_resched <- NOP
-+ *
-+ * VOLUNTARY:
-+ *   cond_resched               <- __cond_resched
-+ *   might_resched              <- __cond_resched
-+ *   preempt_schedule           <- NOP
-+ *   preempt_schedule_notrace   <- NOP
-+ *   irqentry_exit_cond_resched <- NOP
-+ *
-+ * FULL:
-+ *   cond_resched               <- RET0
-+ *   might_resched              <- RET0
-+ *   preempt_schedule           <- preempt_schedule
-+ *   preempt_schedule_notrace   <- preempt_schedule_notrace
-+ *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
-+ */
-+
-+enum {
-+	preempt_dynamic_none = 0,
-+	preempt_dynamic_voluntary,
-+	preempt_dynamic_full,
-+};
-+
-+int preempt_dynamic_mode = preempt_dynamic_full;
-+
-+int sched_dynamic_mode(const char *str)
-+{
-+	if (!strcmp(str, "none"))
-+		return preempt_dynamic_none;
-+
-+	if (!strcmp(str, "voluntary"))
-+		return preempt_dynamic_voluntary;
-+
-+	if (!strcmp(str, "full"))
-+		return preempt_dynamic_full;
-+
-+	return -EINVAL;
-+}
-+
-+void sched_dynamic_update(int mode)
-+{
-+	/*
-+	 * Avoid {NONE,VOLUNTARY} -> FULL transitions from ever ending up in
-+	 * the ZERO state, which is invalid.
-+	 */
-+	static_call_update(cond_resched, __cond_resched);
-+	static_call_update(might_resched, __cond_resched);
-+	static_call_update(preempt_schedule, __preempt_schedule_func);
-+	static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
-+	static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
-+
-+	switch (mode) {
-+	case preempt_dynamic_none:
-+		static_call_update(cond_resched, __cond_resched);
-+		static_call_update(might_resched, (void *)&__static_call_return0);
-+		static_call_update(preempt_schedule, NULL);
-+		static_call_update(preempt_schedule_notrace, NULL);
-+		static_call_update(irqentry_exit_cond_resched, NULL);
-+		pr_info("Dynamic Preempt: none\n");
-+		break;
-+
-+	case preempt_dynamic_voluntary:
-+		static_call_update(cond_resched, __cond_resched);
-+		static_call_update(might_resched, __cond_resched);
-+		static_call_update(preempt_schedule, NULL);
-+		static_call_update(preempt_schedule_notrace, NULL);
-+		static_call_update(irqentry_exit_cond_resched, NULL);
-+		pr_info("Dynamic Preempt: voluntary\n");
-+		break;
-+
-+	case preempt_dynamic_full:
-+		static_call_update(cond_resched, (void *)&__static_call_return0);
-+		static_call_update(might_resched, (void *)&__static_call_return0);
-+		static_call_update(preempt_schedule, __preempt_schedule_func);
-+		static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
-+		static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
-+		pr_info("Dynamic Preempt: full\n");
-+		break;
-+	}
-+
-+	preempt_dynamic_mode = mode;
-+}
-+
-+static int __init setup_preempt_mode(char *str)
-+{
-+	int mode = sched_dynamic_mode(str);
-+	if (mode < 0) {
-+		pr_warn("Dynamic Preempt: unsupported mode: %s\n", str);
-+		return 1;
-+	}
-+
-+	sched_dynamic_update(mode);
-+	return 0;
-+}
-+__setup("preempt=", setup_preempt_mode);
-+
-+#endif /* CONFIG_PREEMPT_DYNAMIC */
-+
-+/*
-+ * This is the entry point to schedule() from kernel preemption
-+ * off of irq context.
-+ * Note, that this is called and return with irqs disabled. This will
-+ * protect us against recursive calling from irq.
-+ */
-+asmlinkage __visible void __sched preempt_schedule_irq(void)
-+{
-+	enum ctx_state prev_state;
-+
-+	/* Catch callers which need to be fixed */
-+	BUG_ON(preempt_count() || !irqs_disabled());
-+
-+	prev_state = exception_enter();
-+
-+	do {
-+		preempt_disable();
-+		local_irq_enable();
-+		__schedule(SM_PREEMPT);
-+		local_irq_disable();
-+		sched_preempt_enable_no_resched();
-+	} while (need_resched());
-+
-+	exception_exit(prev_state);
-+}
-+
-+int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
-+			  void *key)
-+{
-+	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
-+	return try_to_wake_up(curr->private, mode, wake_flags);
-+}
-+EXPORT_SYMBOL(default_wake_function);
-+
-+static inline void check_task_changed(struct task_struct *p, struct rq *rq)
-+{
-+	/* Trigger resched if task sched_prio has been modified. */
-+	if (task_on_rq_queued(p) && task_sched_prio_idx(p, rq) != p->sq_idx) {
-+		requeue_task(p, rq);
-+		check_preempt_curr(rq);
-+	}
-+}
-+
-+static void __setscheduler_prio(struct task_struct *p, int prio)
-+{
-+	p->prio = prio;
-+}
-+
-+#ifdef CONFIG_RT_MUTEXES
-+
-+static inline int __rt_effective_prio(struct task_struct *pi_task, int prio)
-+{
-+	if (pi_task)
-+		prio = min(prio, pi_task->prio);
-+
-+	return prio;
-+}
-+
-+static inline int rt_effective_prio(struct task_struct *p, int prio)
-+{
-+	struct task_struct *pi_task = rt_mutex_get_top_task(p);
-+
-+	return __rt_effective_prio(pi_task, prio);
-+}
-+
-+/*
-+ * rt_mutex_setprio - set the current priority of a task
-+ * @p: task to boost
-+ * @pi_task: donor task
-+ *
-+ * This function changes the 'effective' priority of a task. It does
-+ * not touch ->normal_prio like __setscheduler().
-+ *
-+ * Used by the rt_mutex code to implement priority inheritance
-+ * logic. Call site only calls if the priority of the task changed.
-+ */
-+void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
-+{
-+	int prio;
-+	struct rq *rq;
-+	raw_spinlock_t *lock;
-+
-+	/* XXX used to be waiter->prio, not waiter->task->prio */
-+	prio = __rt_effective_prio(pi_task, p->normal_prio);
-+
-+	/*
-+	 * If nothing changed; bail early.
-+	 */
-+	if (p->pi_top_task == pi_task && prio == p->prio)
-+		return;
-+
-+	rq = __task_access_lock(p, &lock);
-+	/*
-+	 * Set under pi_lock && rq->lock, such that the value can be used under
-+	 * either lock.
-+	 *
-+	 * Note that there is loads of tricky to make this pointer cache work
-+	 * right. rt_mutex_slowunlock()+rt_mutex_postunlock() work together to
-+	 * ensure a task is de-boosted (pi_task is set to NULL) before the
-+	 * task is allowed to run again (and can exit). This ensures the pointer
-+	 * points to a blocked task -- which guarantees the task is present.
-+	 */
-+	p->pi_top_task = pi_task;
-+
-+	/*
-+	 * For FIFO/RR we only need to set prio, if that matches we're done.
-+	 */
-+	if (prio == p->prio)
-+		goto out_unlock;
-+
-+	/*
-+	 * Idle task boosting is a nono in general. There is one
-+	 * exception, when PREEMPT_RT and NOHZ is active:
-+	 *
-+	 * The idle task calls get_next_timer_interrupt() and holds
-+	 * the timer wheel base->lock on the CPU and another CPU wants
-+	 * to access the timer (probably to cancel it). We can safely
-+	 * ignore the boosting request, as the idle CPU runs this code
-+	 * with interrupts disabled and will complete the lock
-+	 * protected section without being interrupted. So there is no
-+	 * real need to boost.
-+	 */
-+	if (unlikely(p == rq->idle)) {
-+		WARN_ON(p != rq->curr);
-+		WARN_ON(p->pi_blocked_on);
-+		goto out_unlock;
-+	}
-+
-+	trace_sched_pi_setprio(p, pi_task);
-+
-+	__setscheduler_prio(p, prio);
-+
-+	check_task_changed(p, rq);
-+out_unlock:
-+	/* Avoid rq from going away on us: */
-+	preempt_disable();
-+
-+	__balance_callbacks(rq);
-+	__task_access_unlock(p, lock);
-+
-+	preempt_enable();
-+}
-+#else
-+static inline int rt_effective_prio(struct task_struct *p, int prio)
-+{
-+	return prio;
-+}
-+#endif
-+
-+void set_user_nice(struct task_struct *p, long nice)
-+{
-+	unsigned long flags;
-+	struct rq *rq;
-+	raw_spinlock_t *lock;
-+
-+	if (task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE)
-+		return;
-+	/*
-+	 * We have to be careful, if called from sys_setpriority(),
-+	 * the task might be in the middle of scheduling on another CPU.
-+	 */
-+	raw_spin_lock_irqsave(&p->pi_lock, flags);
-+	rq = __task_access_lock(p, &lock);
-+
-+	p->static_prio = NICE_TO_PRIO(nice);
-+	/*
-+	 * The RT priorities are set via sched_setscheduler(), but we still
-+	 * allow the 'normal' nice value to be set - but as expected
-+	 * it won't have any effect on scheduling until the task is
-+	 * not SCHED_NORMAL/SCHED_BATCH:
-+	 */
-+	if (task_has_rt_policy(p))
-+		goto out_unlock;
-+
-+	p->prio = effective_prio(p);
-+
-+	check_task_changed(p, rq);
-+out_unlock:
-+	__task_access_unlock(p, lock);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+}
-+EXPORT_SYMBOL(set_user_nice);
-+
-+/*
-+ * can_nice - check if a task can reduce its nice value
-+ * @p: task
-+ * @nice: nice value
-+ */
-+int can_nice(const struct task_struct *p, const int nice)
-+{
-+	/* Convert nice value [19,-20] to rlimit style value [1,40] */
-+	int nice_rlim = nice_to_rlimit(nice);
-+
-+	return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
-+		capable(CAP_SYS_NICE));
-+}
-+
-+#ifdef __ARCH_WANT_SYS_NICE
-+
-+/*
-+ * sys_nice - change the priority of the current process.
-+ * @increment: priority increment
-+ *
-+ * sys_setpriority is a more generic, but much slower function that
-+ * does similar things.
-+ */
-+SYSCALL_DEFINE1(nice, int, increment)
-+{
-+	long nice, retval;
-+
-+	/*
-+	 * Setpriority might change our priority at the same moment.
-+	 * We don't have to worry. Conceptually one call occurs first
-+	 * and we have a single winner.
-+	 */
-+
-+	increment = clamp(increment, -NICE_WIDTH, NICE_WIDTH);
-+	nice = task_nice(current) + increment;
-+
-+	nice = clamp_val(nice, MIN_NICE, MAX_NICE);
-+	if (increment < 0 && !can_nice(current, nice))
-+		return -EPERM;
-+
-+	retval = security_task_setnice(current, nice);
-+	if (retval)
-+		return retval;
-+
-+	set_user_nice(current, nice);
-+	return 0;
-+}
-+
-+#endif
-+
-+/**
-+ * task_prio - return the priority value of a given task.
-+ * @p: the task in question.
-+ *
-+ * Return: The priority value as seen by users in /proc.
-+ *
-+ * sched policy         return value   kernel prio    user prio/nice
-+ *
-+ * (BMQ)normal, batch, idle[0 ... 53]  [100 ... 139]          0/[-20 ... 19]/[-7 ... 7]
-+ * (PDS)normal, batch, idle[0 ... 39]            100          0/[-20 ... 19]
-+ * fifo, rr             [-1 ... -100]     [99 ... 0]  [0 ... 99]
-+ */
-+int task_prio(const struct task_struct *p)
-+{
-+	return (p->prio < MAX_RT_PRIO) ? p->prio - MAX_RT_PRIO :
-+		task_sched_prio_normal(p, task_rq(p));
-+}
-+
-+/**
-+ * idle_cpu - is a given CPU idle currently?
-+ * @cpu: the processor in question.
-+ *
-+ * Return: 1 if the CPU is currently idle. 0 otherwise.
-+ */
-+int idle_cpu(int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+
-+	if (rq->curr != rq->idle)
-+		return 0;
-+
-+	if (rq->nr_running)
-+		return 0;
-+
-+#ifdef CONFIG_SMP
-+	if (rq->ttwu_pending)
-+		return 0;
-+#endif
-+
-+	return 1;
-+}
-+
-+/**
-+ * idle_task - return the idle task for a given CPU.
-+ * @cpu: the processor in question.
-+ *
-+ * Return: The idle task for the cpu @cpu.
-+ */
-+struct task_struct *idle_task(int cpu)
-+{
-+	return cpu_rq(cpu)->idle;
-+}
-+
-+/**
-+ * find_process_by_pid - find a process with a matching PID value.
-+ * @pid: the pid in question.
-+ *
-+ * The task of @pid, if found. %NULL otherwise.
-+ */
-+static inline struct task_struct *find_process_by_pid(pid_t pid)
-+{
-+	return pid ? find_task_by_vpid(pid) : current;
-+}
-+
-+/*
-+ * sched_setparam() passes in -1 for its policy, to let the functions
-+ * it calls know not to change it.
-+ */
-+#define SETPARAM_POLICY -1
-+
-+static void __setscheduler_params(struct task_struct *p,
-+		const struct sched_attr *attr)
-+{
-+	int policy = attr->sched_policy;
-+
-+	if (policy == SETPARAM_POLICY)
-+		policy = p->policy;
-+
-+	p->policy = policy;
-+
-+	/*
-+	 * allow normal nice value to be set, but will not have any
-+	 * effect on scheduling until the task not SCHED_NORMAL/
-+	 * SCHED_BATCH
-+	 */
-+	p->static_prio = NICE_TO_PRIO(attr->sched_nice);
-+
-+	/*
-+	 * __sched_setscheduler() ensures attr->sched_priority == 0 when
-+	 * !rt_policy. Always setting this ensures that things like
-+	 * getparam()/getattr() don't report silly values for !rt tasks.
-+	 */
-+	p->rt_priority = attr->sched_priority;
-+	p->normal_prio = normal_prio(p);
-+}
-+
-+/*
-+ * check the target process has a UID that matches the current process's
-+ */
-+static bool check_same_owner(struct task_struct *p)
-+{
-+	const struct cred *cred = current_cred(), *pcred;
-+	bool match;
-+
-+	rcu_read_lock();
-+	pcred = __task_cred(p);
-+	match = (uid_eq(cred->euid, pcred->euid) ||
-+		 uid_eq(cred->euid, pcred->uid));
-+	rcu_read_unlock();
-+	return match;
-+}
-+
-+static int __sched_setscheduler(struct task_struct *p,
-+				const struct sched_attr *attr,
-+				bool user, bool pi)
-+{
-+	const struct sched_attr dl_squash_attr = {
-+		.size		= sizeof(struct sched_attr),
-+		.sched_policy	= SCHED_FIFO,
-+		.sched_nice	= 0,
-+		.sched_priority = 99,
-+	};
-+	int oldpolicy = -1, policy = attr->sched_policy;
-+	int retval, newprio;
-+	struct callback_head *head;
-+	unsigned long flags;
-+	struct rq *rq;
-+	int reset_on_fork;
-+	raw_spinlock_t *lock;
-+
-+	/* The pi code expects interrupts enabled */
-+	BUG_ON(pi && in_interrupt());
-+
-+	/*
-+	 * Alt schedule FW supports SCHED_DEADLINE by squash it as prio 0 SCHED_FIFO
-+	 */
-+	if (unlikely(SCHED_DEADLINE == policy)) {
-+		attr = &dl_squash_attr;
-+		policy = attr->sched_policy;
-+	}
-+recheck:
-+	/* Double check policy once rq lock held */
-+	if (policy < 0) {
-+		reset_on_fork = p->sched_reset_on_fork;
-+		policy = oldpolicy = p->policy;
-+	} else {
-+		reset_on_fork = !!(attr->sched_flags & SCHED_RESET_ON_FORK);
-+
-+		if (policy > SCHED_IDLE)
-+			return -EINVAL;
-+	}
-+
-+	if (attr->sched_flags & ~(SCHED_FLAG_ALL))
-+		return -EINVAL;
-+
-+	/*
-+	 * Valid priorities for SCHED_FIFO and SCHED_RR are
-+	 * 1..MAX_RT_PRIO-1, valid priority for SCHED_NORMAL and
-+	 * SCHED_BATCH and SCHED_IDLE is 0.
-+	 */
-+	if (attr->sched_priority < 0 ||
-+	    (p->mm && attr->sched_priority > MAX_RT_PRIO - 1) ||
-+	    (!p->mm && attr->sched_priority > MAX_RT_PRIO - 1))
-+		return -EINVAL;
-+	if ((SCHED_RR == policy || SCHED_FIFO == policy) !=
-+	    (attr->sched_priority != 0))
-+		return -EINVAL;
-+
-+	/*
-+	 * Allow unprivileged RT tasks to decrease priority:
-+	 */
-+	if (user && !capable(CAP_SYS_NICE)) {
-+		if (SCHED_FIFO == policy || SCHED_RR == policy) {
-+			unsigned long rlim_rtprio =
-+					task_rlimit(p, RLIMIT_RTPRIO);
-+
-+			/* Can't set/change the rt policy */
-+			if (policy != p->policy && !rlim_rtprio)
-+				return -EPERM;
-+
-+			/* Can't increase priority */
-+			if (attr->sched_priority > p->rt_priority &&
-+			    attr->sched_priority > rlim_rtprio)
-+				return -EPERM;
-+		}
-+
-+		/* Can't change other user's priorities */
-+		if (!check_same_owner(p))
-+			return -EPERM;
-+
-+		/* Normal users shall not reset the sched_reset_on_fork flag */
-+		if (p->sched_reset_on_fork && !reset_on_fork)
-+			return -EPERM;
-+	}
-+
-+	if (user) {
-+		retval = security_task_setscheduler(p);
-+		if (retval)
-+			return retval;
-+	}
-+
-+	if (pi)
-+		cpuset_read_lock();
-+
-+	/*
-+	 * Make sure no PI-waiters arrive (or leave) while we are
-+	 * changing the priority of the task:
-+	 */
-+	raw_spin_lock_irqsave(&p->pi_lock, flags);
-+
-+	/*
-+	 * To be able to change p->policy safely, task_access_lock()
-+	 * must be called.
-+	 * IF use task_access_lock() here:
-+	 * For the task p which is not running, reading rq->stop is
-+	 * racy but acceptable as ->stop doesn't change much.
-+	 * An enhancemnet can be made to read rq->stop saftly.
-+	 */
-+	rq = __task_access_lock(p, &lock);
-+
-+	/*
-+	 * Changing the policy of the stop threads its a very bad idea
-+	 */
-+	if (p == rq->stop) {
-+		retval = -EINVAL;
-+		goto unlock;
-+	}
-+
-+	/*
-+	 * If not changing anything there's no need to proceed further:
-+	 */
-+	if (unlikely(policy == p->policy)) {
-+		if (rt_policy(policy) && attr->sched_priority != p->rt_priority)
-+			goto change;
-+		if (!rt_policy(policy) &&
-+		    NICE_TO_PRIO(attr->sched_nice) != p->static_prio)
-+			goto change;
-+
-+		p->sched_reset_on_fork = reset_on_fork;
-+		retval = 0;
-+		goto unlock;
-+	}
-+change:
-+
-+	/* Re-check policy now with rq lock held */
-+	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
-+		policy = oldpolicy = -1;
-+		__task_access_unlock(p, lock);
-+		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+		if (pi)
-+			cpuset_read_unlock();
-+		goto recheck;
-+	}
-+
-+	p->sched_reset_on_fork = reset_on_fork;
-+
-+	newprio = __normal_prio(policy, attr->sched_priority, NICE_TO_PRIO(attr->sched_nice));
-+	if (pi) {
-+		/*
-+		 * Take priority boosted tasks into account. If the new
-+		 * effective priority is unchanged, we just store the new
-+		 * normal parameters and do not touch the scheduler class and
-+		 * the runqueue. This will be done when the task deboost
-+		 * itself.
-+		 */
-+		newprio = rt_effective_prio(p, newprio);
-+	}
-+
-+	if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {
-+		__setscheduler_params(p, attr);
-+		__setscheduler_prio(p, newprio);
-+	}
-+
-+	check_task_changed(p, rq);
-+
-+	/* Avoid rq from going away on us: */
-+	preempt_disable();
-+	head = splice_balance_callbacks(rq);
-+	__task_access_unlock(p, lock);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+
-+	if (pi) {
-+		cpuset_read_unlock();
-+		rt_mutex_adjust_pi(p);
-+	}
-+
-+	/* Run balance callbacks after we've adjusted the PI chain: */
-+	balance_callbacks(rq, head);
-+	preempt_enable();
-+
-+	return 0;
-+
-+unlock:
-+	__task_access_unlock(p, lock);
-+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-+	if (pi)
-+		cpuset_read_unlock();
-+	return retval;
-+}
-+
6197

-+static int _sched_setscheduler(struct task_struct *p, int policy,
-+			       const struct sched_param *param, bool check)
-+{
-+	struct sched_attr attr = {
-+		.sched_policy   = policy,
-+		.sched_priority = param->sched_priority,
-+		.sched_nice     = PRIO_TO_NICE(p->static_prio),
-+	};
-+
-+	/* Fixup the legacy SCHED_RESET_ON_FORK hack. */
-+	if ((policy != SETPARAM_POLICY) && (policy & SCHED_RESET_ON_FORK)) {
-+		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
-+		policy &= ~SCHED_RESET_ON_FORK;
-+		attr.sched_policy = policy;
-+	}
-+
-+	return __sched_setscheduler(p, &attr, check, true);
-+}
-+
-+/**
-+ * sched_setscheduler - change the scheduling policy and/or RT priority of a thread.
-+ * @p: the task in question.
-+ * @policy: new policy.
-+ * @param: structure containing the new RT priority.
-+ *
-+ * Use sched_set_fifo(), read its comment.
-+ *
-+ * Return: 0 on success. An error code otherwise.
-+ *
-+ * NOTE that the task may be already dead.
-+ */
-+int sched_setscheduler(struct task_struct *p, int policy,
-+		       const struct sched_param *param)
-+{
-+	return _sched_setscheduler(p, policy, param, true);
-+}
-+
-+int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
-+{
-+	return __sched_setscheduler(p, attr, true, true);
-+}
-+
-+int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
-+{
-+	return __sched_setscheduler(p, attr, false, true);
-+}
-+EXPORT_SYMBOL_GPL(sched_setattr_nocheck);
-+
-+/**
-+ * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
-+ * @p: the task in question.
-+ * @policy: new policy.
-+ * @param: structure containing the new RT priority.
-+ *
-+ * Just like sched_setscheduler, only don't bother checking if the
-+ * current context has permission.  For example, this is needed in
-+ * stop_machine(): we create temporary high priority worker threads,
-+ * but our caller might not have that capability.
-+ *
-+ * Return: 0 on success. An error code otherwise.
-+ */
-+int sched_setscheduler_nocheck(struct task_struct *p, int policy,
-+			       const struct sched_param *param)
-+{
-+	return _sched_setscheduler(p, policy, param, false);
-+}
-+
-+/*
-+ * SCHED_FIFO is a broken scheduler model; that is, it is fundamentally
-+ * incapable of resource management, which is the one thing an OS really should
-+ * be doing.
-+ *
-+ * This is of course the reason it is limited to privileged users only.
-+ *
-+ * Worse still; it is fundamentally impossible to compose static priority
-+ * workloads. You cannot take two correctly working static prio workloads
-+ * and smash them together and still expect them to work.
-+ *
-+ * For this reason 'all' FIFO tasks the kernel creates are basically at:
-+ *
-+ *   MAX_RT_PRIO / 2
-+ *
-+ * The administrator _MUST_ configure the system, the kernel simply doesn't
-+ * know enough information to make a sensible choice.
-+ */
-+void sched_set_fifo(struct task_struct *p)
-+{
-+	struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 };
-+	WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
-+}
-+EXPORT_SYMBOL_GPL(sched_set_fifo);
-+
-+/*
-+ * For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
-+ */
-+void sched_set_fifo_low(struct task_struct *p)
-+{
-+	struct sched_param sp = { .sched_priority = 1 };
-+	WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
-+}
-+EXPORT_SYMBOL_GPL(sched_set_fifo_low);
-+
-+void sched_set_normal(struct task_struct *p, int nice)
-+{
-+	struct sched_attr attr = {
-+		.sched_policy = SCHED_NORMAL,
-+		.sched_nice = nice,
-+	};
-+	WARN_ON_ONCE(sched_setattr_nocheck(p, &attr) != 0);
-+}
-+EXPORT_SYMBOL_GPL(sched_set_normal);
-+
-+static int
-+do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
-+{
-+	struct sched_param lparam;
-+	struct task_struct *p;
-+	int retval;
-+
-+	if (!param || pid < 0)
-+		return -EINVAL;
-+	if (copy_from_user(&lparam, param, sizeof(struct sched_param)))
-+		return -EFAULT;
-+
-+	rcu_read_lock();
-+	retval = -ESRCH;
-+	p = find_process_by_pid(pid);
-+	if (likely(p))
-+		get_task_struct(p);
-+	rcu_read_unlock();
-+
-+	if (likely(p)) {
-+		retval = sched_setscheduler(p, policy, &lparam);
-+		put_task_struct(p);
-+	}
-+
-+	return retval;
-+}
-+
-+/*
-+ * Mimics kernel/events/core.c perf_copy_attr().
-+ */
-+static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *attr)
-+{
-+	u32 size;
-+	int ret;
-+
-+	/* Zero the full structure, so that a short copy will be nice: */
-+	memset(attr, 0, sizeof(*attr));
-+
-+	ret = get_user(size, &uattr->size);
-+	if (ret)
-+		return ret;
-+
-+	/* ABI compatibility quirk: */
-+	if (!size)
-+		size = SCHED_ATTR_SIZE_VER0;
-+
-+	if (size < SCHED_ATTR_SIZE_VER0 || size > PAGE_SIZE)
-+		goto err_size;
-+
-+	ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
-+	if (ret) {
-+		if (ret == -E2BIG)
-+			goto err_size;
-+		return ret;
-+	}
-+
-+	/*
-+	 * XXX: Do we want to be lenient like existing syscalls; or do we want
-+	 * to be strict and return an error on out-of-bounds values?
-+	 */
-+	attr->sched_nice = clamp(attr->sched_nice, -20, 19);
-+
-+	/* sched/core.c uses zero here but we already know ret is zero */
-+	return 0;
-+
-+err_size:
-+	put_user(sizeof(*attr), &uattr->size);
-+	return -E2BIG;
-+}
-+
-+/**
-+ * sys_sched_setscheduler - set/change the scheduler policy and RT priority
-+ * @pid: the pid in question.
-+ * @policy: new policy.
-+ *
-+ * Return: 0 on success. An error code otherwise.
-+ * @param: structure containing the new RT priority.
-+ */
-+SYSCALL_DEFINE3(sched_setscheduler, pid_t, pid, int, policy, struct sched_param __user *, param)
-+{
-+	if (policy < 0)
-+		return -EINVAL;
-+
-+	return do_sched_setscheduler(pid, policy, param);
-+}
-+
-+/**
-+ * sys_sched_setparam - set/change the RT priority of a thread
-+ * @pid: the pid in question.
-+ * @param: structure containing the new RT priority.
-+ *
-+ * Return: 0 on success. An error code otherwise.
-+ */
-+SYSCALL_DEFINE2(sched_setparam, pid_t, pid, struct sched_param __user *, param)
-+{
-+	return do_sched_setscheduler(pid, SETPARAM_POLICY, param);
-+}
-+
-+/**
-+ * sys_sched_setattr - same as above, but with extended sched_attr
-+ * @pid: the pid in question.
-+ * @uattr: structure containing the extended parameters.
-+ */
-+SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
-+			       unsigned int, flags)
-+{
-+	struct sched_attr attr;
-+	struct task_struct *p;
-+	int retval;
-+
-+	if (!uattr || pid < 0 || flags)
-+		return -EINVAL;
-+
-+	retval = sched_copy_attr(uattr, &attr);
-+	if (retval)
-+		return retval;
-+
-+	if ((int)attr.sched_policy < 0)
-+		return -EINVAL;
-+
-+	rcu_read_lock();
-+	retval = -ESRCH;
-+	p = find_process_by_pid(pid);
-+	if (likely(p))
-+		get_task_struct(p);
-+	rcu_read_unlock();
-+
-+	if (likely(p)) {
-+		retval = sched_setattr(p, &attr);
-+		put_task_struct(p);
-+	}
-+
-+	return retval;
-+}
-+
-+/**
-+ * sys_sched_getscheduler - get the policy (scheduling class) of a thread
-+ * @pid: the pid in question.
-+ *
-+ * Return: On success, the policy of the thread. Otherwise, a negative error
-+ * code.
-+ */
-+SYSCALL_DEFINE1(sched_getscheduler, pid_t, pid)
-+{
-+	struct task_struct *p;
-+	int retval = -EINVAL;
-+
-+	if (pid < 0)
-+		goto out_nounlock;
-+
-+	retval = -ESRCH;
-+	rcu_read_lock();
-+	p = find_process_by_pid(pid);
-+	if (p) {
-+		retval = security_task_getscheduler(p);
-+		if (!retval)
-+			retval = p->policy;
-+	}
-+	rcu_read_unlock();
-+
-+out_nounlock:
-+	return retval;
-+}
-+
-+/**
-+ * sys_sched_getscheduler - get the RT priority of a thread
-+ * @pid: the pid in question.
-+ * @param: structure containing the RT priority.
-+ *
-+ * Return: On success, 0 and the RT priority is in @param. Otherwise, an error
-+ * code.
-+ */
-+SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct sched_param __user *, param)
-+{
-+	struct sched_param lp = { .sched_priority = 0 };
-+	struct task_struct *p;
-+	int retval = -EINVAL;
-+
-+	if (!param || pid < 0)
-+		goto out_nounlock;
-+
-+	rcu_read_lock();
-+	p = find_process_by_pid(pid);
-+	retval = -ESRCH;
-+	if (!p)
-+		goto out_unlock;
-+
-+	retval = security_task_getscheduler(p);
-+	if (retval)
-+		goto out_unlock;
-+
-+	if (task_has_rt_policy(p))
-+		lp.sched_priority = p->rt_priority;
-+	rcu_read_unlock();
-+
-+	/*
-+	 * This one might sleep, we cannot do it with a spinlock held ...
-+	 */
-+	retval = copy_to_user(param, &lp, sizeof(*param)) ? -EFAULT : 0;
-+
-+out_nounlock:
-+	return retval;
-+
-+out_unlock:
-+	rcu_read_unlock();
-+	return retval;
-+}
-+
-+/*
-+ * Copy the kernel size attribute structure (which might be larger
-+ * than what user-space knows about) to user-space.
-+ *
-+ * Note that all cases are valid: user-space buffer can be larger or
-+ * smaller than the kernel-space buffer. The usual case is that both
-+ * have the same size.
-+ */
-+static int
-+sched_attr_copy_to_user(struct sched_attr __user *uattr,
-+			struct sched_attr *kattr,
-+			unsigned int usize)
-+{
-+	unsigned int ksize = sizeof(*kattr);
-+
-+	if (!access_ok(uattr, usize))
-+		return -EFAULT;
-+
-+	/*
-+	 * sched_getattr() ABI forwards and backwards compatibility:
-+	 *
-+	 * If usize == ksize then we just copy everything to user-space and all is good.
-+	 *
-+	 * If usize < ksize then we only copy as much as user-space has space for,
-+	 * this keeps ABI compatibility as well. We skip the rest.
-+	 *
-+	 * If usize > ksize then user-space is using a newer version of the ABI,
-+	 * which part the kernel doesn't know about. Just ignore it - tooling can
-+	 * detect the kernel's knowledge of attributes from the attr->size value
-+	 * which is set to ksize in this case.
-+	 */
-+	kattr->size = min(usize, ksize);
-+
-+	if (copy_to_user(uattr, kattr, kattr->size))
-+		return -EFAULT;
-+
-+	return 0;
-+}
-+
-+/**
-+ * sys_sched_getattr - similar to sched_getparam, but with sched_attr
-+ * @pid: the pid in question.
-+ * @uattr: structure containing the extended parameters.
-+ * @usize: sizeof(attr) for fwd/bwd comp.
-+ * @flags: for future extension.
-+ */
-+SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
-+		unsigned int, usize, unsigned int, flags)
-+{
-+	struct sched_attr kattr = { };
-+	struct task_struct *p;
-+	int retval;
-+
-+	if (!uattr || pid < 0 || usize > PAGE_SIZE ||
-+	    usize < SCHED_ATTR_SIZE_VER0 || flags)
-+		return -EINVAL;
-+
-+	rcu_read_lock();
-+	p = find_process_by_pid(pid);
-+	retval = -ESRCH;
-+	if (!p)
-+		goto out_unlock;
-+
-+	retval = security_task_getscheduler(p);
-+	if (retval)
-+		goto out_unlock;
-+
-+	kattr.sched_policy = p->policy;
-+	if (p->sched_reset_on_fork)
-+		kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
-+	if (task_has_rt_policy(p))
-+		kattr.sched_priority = p->rt_priority;
-+	else
-+		kattr.sched_nice = task_nice(p);
-+	kattr.sched_flags &= SCHED_FLAG_ALL;
-+
-+#ifdef CONFIG_UCLAMP_TASK
-+	kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-+	kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
-+#endif
-+
-+	rcu_read_unlock();
-+
-+	return sched_attr_copy_to_user(uattr, &kattr, usize);
-+
-+out_unlock:
-+	rcu_read_unlock();
-+	return retval;
-+}
-+
-+static int
-+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
-+{
-+	int retval;
-+	cpumask_var_t cpus_allowed, new_mask;
-+
-+	if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL))
-+		return -ENOMEM;
-+
-+	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
-+		retval = -ENOMEM;
-+		goto out_free_cpus_allowed;
-+	}
-+
-+	cpuset_cpus_allowed(p, cpus_allowed);
-+	cpumask_and(new_mask, mask, cpus_allowed);
-+again:
-+	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | SCA_USER);
-+	if (retval)
-+		goto out_free_new_mask;
-+
-+	cpuset_cpus_allowed(p, cpus_allowed);
-+	if (!cpumask_subset(new_mask, cpus_allowed)) {
-+		/*
-+		 * We must have raced with a concurrent cpuset
-+		 * update. Just reset the cpus_allowed to the
-+		 * cpuset's cpus_allowed
-+		 */
-+		cpumask_copy(new_mask, cpus_allowed);
-+		goto again;
-+	}
-+
-+out_free_new_mask:
-+	free_cpumask_var(new_mask);
-+out_free_cpus_allowed:
-+	free_cpumask_var(cpus_allowed);
-+	return retval;
-+}
-+
-+long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
-+{
-+	struct task_struct *p;
-+	int retval;
-+
-+	rcu_read_lock();
-+
-+	p = find_process_by_pid(pid);
-+	if (!p) {
-+		rcu_read_unlock();
-+		return -ESRCH;
-+	}
-+
-+	/* Prevent p going away */
-+	get_task_struct(p);
-+	rcu_read_unlock();
-+
-+	if (p->flags & PF_NO_SETAFFINITY) {
-+		retval = -EINVAL;
-+		goto out_put_task;
-+	}
-+
-+	if (!check_same_owner(p)) {
-+		rcu_read_lock();
-+		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
-+			rcu_read_unlock();
-+			retval = -EPERM;
-+			goto out_put_task;
-+		}
-+		rcu_read_unlock();
-+	}
-+
-+	retval = security_task_setscheduler(p);
-+	if (retval)
-+		goto out_put_task;
-+
-+	retval = __sched_setaffinity(p, in_mask);
-+out_put_task:
-+	put_task_struct(p);
-+	return retval;
-+}
-+
-+static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
-+			     struct cpumask *new_mask)
-+{
-+	if (len < cpumask_size())
-+		cpumask_clear(new_mask);
-+	else if (len > cpumask_size())
-+		len = cpumask_size();
-+
-+	return copy_from_user(new_mask, user_mask_ptr, len) ? -EFAULT : 0;
-+}
-+
-+/**
-+ * sys_sched_setaffinity - set the CPU affinity of a process
-+ * @pid: pid of the process
-+ * @len: length in bytes of the bitmask pointed to by user_mask_ptr
-+ * @user_mask_ptr: user-space pointer to the new CPU mask
-+ *
-+ * Return: 0 on success. An error code otherwise.
-+ */
-+SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
-+		unsigned long __user *, user_mask_ptr)
-+{
-+	cpumask_var_t new_mask;
-+	int retval;
-+
-+	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
-+		return -ENOMEM;
-+
-+	retval = get_user_cpu_mask(user_mask_ptr, len, new_mask);
-+	if (retval == 0)
-+		retval = sched_setaffinity(pid, new_mask);
-+	free_cpumask_var(new_mask);
-+	return retval;
-+}
-+
-+long sched_getaffinity(pid_t pid, cpumask_t *mask)
-+{
-+	struct task_struct *p;
-+	raw_spinlock_t *lock;
-+	unsigned long flags;
-+	int retval;
-+
-+	rcu_read_lock();
-+
-+	retval = -ESRCH;
-+	p = find_process_by_pid(pid);
-+	if (!p)
-+		goto out_unlock;
-+
-+	retval = security_task_getscheduler(p);
-+	if (retval)
-+		goto out_unlock;
-+
-+	task_access_lock_irqsave(p, &lock, &flags);
-+	cpumask_and(mask, &p->cpus_mask, cpu_active_mask);
-+	task_access_unlock_irqrestore(p, lock, &flags);
-+
-+out_unlock:
-+	rcu_read_unlock();
-+
-+	return retval;
-+}
-+
-+/**
-+ * sys_sched_getaffinity - get the CPU affinity of a process
-+ * @pid: pid of the process
-+ * @len: length in bytes of the bitmask pointed to by user_mask_ptr
-+ * @user_mask_ptr: user-space pointer to hold the current CPU mask
-+ *
-+ * Return: size of CPU mask copied to user_mask_ptr on success. An
-+ * error code otherwise.
-+ */
-+SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
-+		unsigned long __user *, user_mask_ptr)
-+{
-+	int ret;
-+	cpumask_var_t mask;
-+
-+	if ((len * BITS_PER_BYTE) < nr_cpu_ids)
-+		return -EINVAL;
-+	if (len & (sizeof(unsigned long)-1))
-+		return -EINVAL;
-+
-+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
-+		return -ENOMEM;
-+
-+	ret = sched_getaffinity(pid, mask);
-+	if (ret == 0) {
-+		unsigned int retlen = min_t(size_t, len, cpumask_size());
-+
-+		if (copy_to_user(user_mask_ptr, mask, retlen))
-+			ret = -EFAULT;
-+		else
-+			ret = retlen;
-+	}
-+	free_cpumask_var(mask);
-+
-+	return ret;
-+}
-+
-+static void do_sched_yield(void)
-+{
-+	struct rq *rq;
-+	struct rq_flags rf;
-+
-+	if (!sched_yield_type)
-+		return;
-+
-+	rq = this_rq_lock_irq(&rf);
-+
-+	schedstat_inc(rq->yld_count);
-+
-+	if (1 == sched_yield_type) {
-+		if (!rt_task(current))
-+			do_sched_yield_type_1(current, rq);
-+	} else if (2 == sched_yield_type) {
-+		if (rq->nr_running > 1)
-+			rq->skip = current;
-+	}
-+
-+	preempt_disable();
-+	raw_spin_unlock_irq(&rq->lock);
-+	sched_preempt_enable_no_resched();
-+
-+	schedule();
-+}
-+
-+/**
-+ * sys_sched_yield - yield the current processor to other threads.
-+ *
-+ * This function yields the current CPU to other tasks. If there are no
-+ * other threads running on this CPU then this function will return.
-+ *
-+ * Return: 0.
-+ */
-+SYSCALL_DEFINE0(sched_yield)
-+{
-+	do_sched_yield();
-+	return 0;
-+}
-+
-+#if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
-+int __sched __cond_resched(void)
-+{
-+	if (should_resched(0)) {
-+		preempt_schedule_common();
-+		return 1;
-+	}
-+	/*
-+	 * In preemptible kernels, ->rcu_read_lock_nesting tells the tick
-+	 * whether the current CPU is in an RCU read-side critical section,
-+	 * so the tick can report quiescent states even for CPUs looping
-+	 * in kernel context.  In contrast, in non-preemptible kernels,
-+	 * RCU readers leave no in-memory hints, which means that CPU-bound
-+	 * processes executing in kernel context might never report an
-+	 * RCU quiescent state.  Therefore, the following code causes
-+	 * cond_resched() to report a quiescent state, but only when RCU
-+	 * is in urgent need of one.
-+	 */
-+#ifndef CONFIG_PREEMPT_RCU
-+	rcu_all_qs();
-+#endif
-+	return 0;
-+}
-+EXPORT_SYMBOL(__cond_resched);
-+#endif
-+
-+#ifdef CONFIG_PREEMPT_DYNAMIC
-+DEFINE_STATIC_CALL_RET0(cond_resched, __cond_resched);
-+EXPORT_STATIC_CALL_TRAMP(cond_resched);
-+
-+DEFINE_STATIC_CALL_RET0(might_resched, __cond_resched);
-+EXPORT_STATIC_CALL_TRAMP(might_resched);
-+#endif
-+
-+/*
-+ * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
-+ * call schedule, and on return reacquire the lock.
-+ *
-+ * This works OK both with and without CONFIG_PREEMPTION.  We do strange low-level
-+ * operations here to prevent schedule() from being called twice (once via
-+ * spin_unlock(), once by hand).
-+ */
-+int __cond_resched_lock(spinlock_t *lock)
-+{
-+	int resched = should_resched(PREEMPT_LOCK_OFFSET);
-+	int ret = 0;
-+
-+	lockdep_assert_held(lock);
-+
-+	if (spin_needbreak(lock) || resched) {
-+		spin_unlock(lock);
-+		if (resched)
-+			preempt_schedule_common();
-+		else
-+			cpu_relax();
-+		ret = 1;
-+		spin_lock(lock);
-+	}
-+	return ret;
-+}
-+EXPORT_SYMBOL(__cond_resched_lock);
-+
-+int __cond_resched_rwlock_read(rwlock_t *lock)
-+{
-+	int resched = should_resched(PREEMPT_LOCK_OFFSET);
-+	int ret = 0;
-+
-+	lockdep_assert_held_read(lock);
-+
-+	if (rwlock_needbreak(lock) || resched) {
-+		read_unlock(lock);
-+		if (resched)
-+			preempt_schedule_common();
-+		else
-+			cpu_relax();
-+		ret = 1;
-+		read_lock(lock);
-+	}
-+	return ret;
-+}
-+EXPORT_SYMBOL(__cond_resched_rwlock_read);
-+
-+int __cond_resched_rwlock_write(rwlock_t *lock)
-+{
-+	int resched = should_resched(PREEMPT_LOCK_OFFSET);
-+	int ret = 0;
-+
-+	lockdep_assert_held_write(lock);
-+
-+	if (rwlock_needbreak(lock) || resched) {
-+		write_unlock(lock);
-+		if (resched)
-+			preempt_schedule_common();
-+		else
-+			cpu_relax();
-+		ret = 1;
-+		write_lock(lock);
-+	}
-+	return ret;
-+}
-+EXPORT_SYMBOL(__cond_resched_rwlock_write);
-+
-+/**
-+ * yield - yield the current processor to other threads.
-+ *
-+ * Do not ever use this function, there's a 99% chance you're doing it wrong.
-+ *
-+ * The scheduler is at all times free to pick the calling task as the most
-+ * eligible task to run, if removing the yield() call from your code breaks
-+ * it, it's already broken.
-+ *
-+ * Typical broken usage is:
-+ *
-+ * while (!event)
-+ * 	yield();
-+ *
-+ * where one assumes that yield() will let 'the other' process run that will
-+ * make event true. If the current task is a SCHED_FIFO task that will never
-+ * happen. Never use yield() as a progress guarantee!!
-+ *
-+ * If you want to use yield() to wait for something, use wait_event().
-+ * If you want to use yield() to be 'nice' for others, use cond_resched().
-+ * If you still want to use yield(), do not!
-+ */
-+void __sched yield(void)
-+{
-+	set_current_state(TASK_RUNNING);
-+	do_sched_yield();
-+}
-+EXPORT_SYMBOL(yield);
-+
-+/**
-+ * yield_to - yield the current processor to another thread in
-+ * your thread group, or accelerate that thread toward the
-+ * processor it's on.
-+ * @p: target task
-+ * @preempt: whether task preemption is allowed or not
-+ *
-+ * It's the caller's job to ensure that the target task struct
-+ * can't go away on us before we can do any checks.
-+ *
-+ * In Alt schedule FW, yield_to is not supported.
-+ *
-+ * Return:
-+ *	true (>0) if we indeed boosted the target task.
-+ *	false (0) if we failed to boost the target.
-+ *	-ESRCH if there's no task to yield to.
-+ */
-+int __sched yield_to(struct task_struct *p, bool preempt)
-+{
-+	return 0;
-+}
-+EXPORT_SYMBOL_GPL(yield_to);
-+
-+int io_schedule_prepare(void)
-+{
-+	int old_iowait = current->in_iowait;
-+
-+	current->in_iowait = 1;
-+	blk_schedule_flush_plug(current);
-+
-+	return old_iowait;
-+}
-+
-+void io_schedule_finish(int token)
-+{
-+	current->in_iowait = token;
-+}
-+
-+/*
-+ * This task is about to go to sleep on IO.  Increment rq->nr_iowait so
-+ * that process accounting knows that this is a task in IO wait state.
-+ *
-+ * But don't do that if it is a deliberate, throttling IO wait (this task
-+ * has set its backing_dev_info: the queue against which it should throttle)
-+ */
-+
-+long __sched io_schedule_timeout(long timeout)
-+{
-+	int token;
-+	long ret;
-+
-+	token = io_schedule_prepare();
-+	ret = schedule_timeout(timeout);
-+	io_schedule_finish(token);
-+
-+	return ret;
-+}
-+EXPORT_SYMBOL(io_schedule_timeout);
-+
-+void __sched io_schedule(void)
-+{
-+	int token;
-+
-+	token = io_schedule_prepare();
-+	schedule();
-+	io_schedule_finish(token);
-+}
-+EXPORT_SYMBOL(io_schedule);
-+
-+/**
-+ * sys_sched_get_priority_max - return maximum RT priority.
-+ * @policy: scheduling class.
-+ *
-+ * Return: On success, this syscall returns the maximum
-+ * rt_priority that can be used by a given scheduling class.
-+ * On failure, a negative error code is returned.
-+ */
-+SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
-+{
-+	int ret = -EINVAL;
-+
-+	switch (policy) {
-+	case SCHED_FIFO:
-+	case SCHED_RR:
-+		ret = MAX_RT_PRIO - 1;
-+		break;
-+	case SCHED_NORMAL:
-+	case SCHED_BATCH:
-+	case SCHED_IDLE:
-+		ret = 0;
-+		break;
-+	}
-+	return ret;
-+}
-+
-+/**
-+ * sys_sched_get_priority_min - return minimum RT priority.
-+ * @policy: scheduling class.
-+ *
-+ * Return: On success, this syscall returns the minimum
-+ * rt_priority that can be used by a given scheduling class.
-+ * On failure, a negative error code is returned.
-+ */
-+SYSCALL_DEFINE1(sched_get_priority_min, int, policy)
-+{
-+	int ret = -EINVAL;
-+
-+	switch (policy) {
-+	case SCHED_FIFO:
-+	case SCHED_RR:
-+		ret = 1;
-+		break;
-+	case SCHED_NORMAL:
-+	case SCHED_BATCH:
-+	case SCHED_IDLE:
-+		ret = 0;
-+		break;
-+	}
-+	return ret;
-+}
-+
-+static int sched_rr_get_interval(pid_t pid, struct timespec64 *t)
-+{
-+	struct task_struct *p;
-+	int retval;
-+
-+	alt_sched_debug();
-+
-+	if (pid < 0)
-+		return -EINVAL;
-+
-+	retval = -ESRCH;
-+	rcu_read_lock();
-+	p = find_process_by_pid(pid);
-+	if (!p)
-+		goto out_unlock;
-+
-+	retval = security_task_getscheduler(p);
-+	if (retval)
-+		goto out_unlock;
-+	rcu_read_unlock();
-+
-+	*t = ns_to_timespec64(sched_timeslice_ns);
-+	return 0;
-+
-+out_unlock:
-+	rcu_read_unlock();
-+	return retval;
-+}
-+
-+/**
-+ * sys_sched_rr_get_interval - return the default timeslice of a process.
-+ * @pid: pid of the process.
-+ * @interval: userspace pointer to the timeslice value.
-+ *
-+ *
-+ * Return: On success, 0 and the timeslice is in @interval. Otherwise,
-+ * an error code.
-+ */
-+SYSCALL_DEFINE2(sched_rr_get_interval, pid_t, pid,
-+		struct __kernel_timespec __user *, interval)
-+{
-+	struct timespec64 t;
-+	int retval = sched_rr_get_interval(pid, &t);
-+
-+	if (retval == 0)
-+		retval = put_timespec64(&t, interval);
-+
-+	return retval;
-+}
-+
-+#ifdef CONFIG_COMPAT_32BIT_TIME
-+SYSCALL_DEFINE2(sched_rr_get_interval_time32, pid_t, pid,
-+		struct old_timespec32 __user *, interval)
-+{
-+	struct timespec64 t;
-+	int retval = sched_rr_get_interval(pid, &t);
-+
-+	if (retval == 0)
-+		retval = put_old_timespec32(&t, interval);
-+	return retval;
-+}
-+#endif
-+
-+void sched_show_task(struct task_struct *p)
-+{
-+	unsigned long free = 0;
-+	int ppid;
-+
-+	if (!try_get_task_stack(p))
-+		return;
-+
-+	pr_info("task:%-15.15s state:%c", p->comm, task_state_to_char(p));
-+
-+	if (task_is_running(p))
-+		pr_cont("  running task    ");
-+#ifdef CONFIG_DEBUG_STACK_USAGE
-+	free = stack_not_used(p);
-+#endif
-+	ppid = 0;
-+	rcu_read_lock();
-+	if (pid_alive(p))
-+		ppid = task_pid_nr(rcu_dereference(p->real_parent));
-+	rcu_read_unlock();
-+	pr_cont(" stack:%5lu pid:%5d ppid:%6d flags:0x%08lx\n",
-+		free, task_pid_nr(p), ppid,
-+		(unsigned long)task_thread_info(p)->flags);
-+
-+	print_worker_info(KERN_INFO, p);
-+	print_stop_info(KERN_INFO, p);
-+	show_stack(p, NULL, KERN_INFO);
-+	put_task_stack(p);
-+}
-+EXPORT_SYMBOL_GPL(sched_show_task);
-+
-+static inline bool
-+state_filter_match(unsigned long state_filter, struct task_struct *p)
-+{
-+	unsigned int state = READ_ONCE(p->__state);
-+
-+	/* no filter, everything matches */
-+	if (!state_filter)
-+		return true;
-+
-+	/* filter, but doesn't match */
-+	if (!(state & state_filter))
-+		return false;
-+
-+	/*
-+	 * When looking for TASK_UNINTERRUPTIBLE skip TASK_IDLE (allows
-+	 * TASK_KILLABLE).
-+	 */
-+	if (state_filter == TASK_UNINTERRUPTIBLE && state == TASK_IDLE)
-+		return false;
-+
-+	return true;
-+}
-+
-+
-+void show_state_filter(unsigned int state_filter)
-+{
-+	struct task_struct *g, *p;
-+
-+	rcu_read_lock();
-+	for_each_process_thread(g, p) {
-+		/*
-+		 * reset the NMI-timeout, listing all files on a slow
-+		 * console might take a lot of time:
-+		 * Also, reset softlockup watchdogs on all CPUs, because
-+		 * another CPU might be blocked waiting for us to process
-+		 * an IPI.
-+		 */
-+		touch_nmi_watchdog();
-+		touch_all_softlockup_watchdogs();
-+		if (state_filter_match(state_filter, p))
-+			sched_show_task(p);
-+	}
-+
-+#ifdef CONFIG_SCHED_DEBUG
-+	/* TODO: Alt schedule FW should support this
-+	if (!state_filter)
-+		sysrq_sched_debug_show();
-+	*/
-+#endif
-+	rcu_read_unlock();
-+	/*
-+	 * Only show locks if all tasks are dumped:
-+	 */
-+	if (!state_filter)
-+		debug_show_all_locks();
-+}
-+
-+void dump_cpu_task(int cpu)
-+{
-+	pr_info("Task dump for CPU %d:\n", cpu);
-+	sched_show_task(cpu_curr(cpu));
-+}
-+
-+/**
-+ * init_idle - set up an idle thread for a given CPU
-+ * @idle: task in question
-+ * @cpu: CPU the idle task belongs to
-+ *
-+ * NOTE: this function does not set the idle thread's NEED_RESCHED
-+ * flag, to make booting more robust.
-+ */
-+void __init init_idle(struct task_struct *idle, int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+	unsigned long flags;
-+
-+	__sched_fork(0, idle);
-+
-+	/*
-+	 * The idle task doesn't need the kthread struct to function, but it
-+	 * is dressed up as a per-CPU kthread and thus needs to play the part
-+	 * if we want to avoid special-casing it in code that deals with per-CPU
-+	 * kthreads.
-+	 */
-+	set_kthread_struct(idle);
-+
-+	raw_spin_lock_irqsave(&idle->pi_lock, flags);
-+	raw_spin_lock(&rq->lock);
-+	update_rq_clock(rq);
-+
-+	idle->last_ran = rq->clock_task;
-+	idle->__state = TASK_RUNNING;
-+	/*
-+	 * PF_KTHREAD should already be set at this point; regardless, make it
-+	 * look like a proper per-CPU kthread.
-+	 */
-+	idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
-+	kthread_set_per_cpu(idle, cpu);
-+
-+	sched_queue_init_idle(&rq->queue, idle);
-+
-+	scs_task_reset(idle);
-+	kasan_unpoison_task_stack(idle);
-+
-+#ifdef CONFIG_SMP
-+	/*
-+	 * It's possible that init_idle() gets called multiple times on a task,
-+	 * in that case do_set_cpus_allowed() will not do the right thing.
-+	 *
-+	 * And since this is boot we can forgo the serialisation.
-+	 */
-+	set_cpus_allowed_common(idle, cpumask_of(cpu));
-+#endif
-+
-+	/* Silence PROVE_RCU */
-+	rcu_read_lock();
-+	__set_task_cpu(idle, cpu);
-+	rcu_read_unlock();
-+
-+	rq->idle = idle;
-+	rcu_assign_pointer(rq->curr, idle);
-+	idle->on_cpu = 1;
-+
-+	raw_spin_unlock(&rq->lock);
-+	raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
-+
-+	/* Set the preempt count _outside_ the spinlocks! */
-+	init_idle_preempt_count(idle, cpu);
-+
-+	ftrace_graph_init_idle_task(idle, cpu);
-+	vtime_init_idle(idle, cpu);
-+#ifdef CONFIG_SMP
-+	sprintf(idle->comm, "%s/%d", INIT_TASK_COMM, cpu);
-+#endif
-+}
-+
-+#ifdef CONFIG_SMP
-+
-+int cpuset_cpumask_can_shrink(const struct cpumask __maybe_unused *cur,
-+			      const struct cpumask __maybe_unused *trial)
-+{
-+	return 1;
-+}
-+
-+int task_can_attach(struct task_struct *p,
-+		    const struct cpumask *cs_cpus_allowed)
-+{
-+	int ret = 0;
-+
-+	/*
-+	 * Kthreads which disallow setaffinity shouldn't be moved
-+	 * to a new cpuset; we don't want to change their CPU
-+	 * affinity and isolating such threads by their set of
-+	 * allowed nodes is unnecessary.  Thus, cpusets are not
-+	 * applicable for such threads.  This prevents checking for
-+	 * success of set_cpus_allowed_ptr() on all attached tasks
-+	 * before cpus_mask may be changed.
-+	 */
-+	if (p->flags & PF_NO_SETAFFINITY)
-+		ret = -EINVAL;
-+
-+	return ret;
-+}
-+
-+bool sched_smp_initialized __read_mostly;
-+
-+#ifdef CONFIG_HOTPLUG_CPU
-+/*
-+ * Ensures that the idle task is using init_mm right before its CPU goes
-+ * offline.
-+ */
-+void idle_task_exit(void)
-+{
-+	struct mm_struct *mm = current->active_mm;
-+
-+	BUG_ON(current != this_rq()->idle);
-+
-+	if (mm != &init_mm) {
-+		switch_mm(mm, &init_mm, current);
-+		finish_arch_post_lock_switch();
-+	}
-+
-+	scs_task_reset(current);
-+	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
-+}
-+
-+static int __balance_push_cpu_stop(void *arg)
-+{
-+	struct task_struct *p = arg;
-+	struct rq *rq = this_rq();
-+	struct rq_flags rf;
-+	int cpu;
-+
-+	raw_spin_lock_irq(&p->pi_lock);
-+	rq_lock(rq, &rf);
-+
-+	update_rq_clock(rq);
-+
-+	if (task_rq(p) == rq && task_on_rq_queued(p)) {
-+		cpu = select_fallback_rq(rq->cpu, p);
-+		rq = __migrate_task(rq, p, cpu);
-+	}
-+
-+	rq_unlock(rq, &rf);
-+	raw_spin_unlock_irq(&p->pi_lock);
-+
-+	put_task_struct(p);
-+
-+	return 0;
-+}
-+
-+static DEFINE_PER_CPU(struct cpu_stop_work, push_work);
-+
-+/*
-+ * This is enabled below SCHED_AP_ACTIVE; when !cpu_active(), but only
-+ * effective when the hotplug motion is down.
-+ */
-+static void balance_push(struct rq *rq)
-+{
-+	struct task_struct *push_task = rq->curr;
-+
-+	lockdep_assert_held(&rq->lock);
-+
-+	/*
-+	 * Ensure the thing is persistent until balance_push_set(.on = false);
-+	 */
-+	rq->balance_callback = &balance_push_callback;
-+
-+	/*
-+	 * Only active while going offline and when invoked on the outgoing
-+	 * CPU.
-+	 */
-+	if (!cpu_dying(rq->cpu) || rq != this_rq())
-+		return;
-+
-+	/*
-+	 * Both the cpu-hotplug and stop task are in this case and are
-+	 * required to complete the hotplug process.
-+	 */
-+	if (kthread_is_per_cpu(push_task) ||
-+	    is_migration_disabled(push_task)) {
-+
-+		/*
-+		 * If this is the idle task on the outgoing CPU try to wake
-+		 * up the hotplug control thread which might wait for the
-+		 * last task to vanish. The rcuwait_active() check is
-+		 * accurate here because the waiter is pinned on this CPU
-+		 * and can't obviously be running in parallel.
-+		 *
-+		 * On RT kernels this also has to check whether there are
-+		 * pinned and scheduled out tasks on the runqueue. They
-+		 * need to leave the migrate disabled section first.
-+		 */
-+		if (!rq->nr_running && !rq_has_pinned_tasks(rq) &&
-+		    rcuwait_active(&rq->hotplug_wait)) {
-+			raw_spin_unlock(&rq->lock);
-+			rcuwait_wake_up(&rq->hotplug_wait);
-+			raw_spin_lock(&rq->lock);
-+		}
-+		return;
-+	}
-+
-+	get_task_struct(push_task);
-+	/*
-+	 * Temporarily drop rq->lock such that we can wake-up the stop task.
-+	 * Both preemption and IRQs are still disabled.
-+	 */
-+	raw_spin_unlock(&rq->lock);
-+	stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
-+			    this_cpu_ptr(&push_work));
-+	/*
-+	 * At this point need_resched() is true and we'll take the loop in
-+	 * schedule(). The next pick is obviously going to be the stop task
-+	 * which kthread_is_per_cpu() and will push this task away.
-+	 */
-+	raw_spin_lock(&rq->lock);
-+}
-+
-+static void balance_push_set(int cpu, bool on)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+	struct rq_flags rf;
-+
-+	rq_lock_irqsave(rq, &rf);
-+	if (on) {
-+		WARN_ON_ONCE(rq->balance_callback);
-+		rq->balance_callback = &balance_push_callback;
-+	} else if (rq->balance_callback == &balance_push_callback) {
-+		rq->balance_callback = NULL;
-+	}
-+	rq_unlock_irqrestore(rq, &rf);
-+}
-+
-+/*
-+ * Invoked from a CPUs hotplug control thread after the CPU has been marked
-+ * inactive. All tasks which are not per CPU kernel threads are either
-+ * pushed off this CPU now via balance_push() or placed on a different CPU
-+ * during wakeup. Wait until the CPU is quiescent.
-+ */
-+static void balance_hotplug_wait(void)
-+{
-+	struct rq *rq = this_rq();
-+
-+	rcuwait_wait_event(&rq->hotplug_wait,
-+			   rq->nr_running == 1 && !rq_has_pinned_tasks(rq),
-+			   TASK_UNINTERRUPTIBLE);
-+}
-+
-+#else
-+
-+static void balance_push(struct rq *rq)
-+{
-+}
-+
-+static void balance_push_set(int cpu, bool on)
-+{
-+}
-+
-+static inline void balance_hotplug_wait(void)
-+{
-+}
-+#endif /* CONFIG_HOTPLUG_CPU */
-+
-+static void set_rq_offline(struct rq *rq)
-+{
-+	if (rq->online)
-+		rq->online = false;
-+}
-+
-+static void set_rq_online(struct rq *rq)
-+{
-+	if (!rq->online)
-+		rq->online = true;
-+}
-+
-+/*
-+ * used to mark begin/end of suspend/resume:
-+ */
-+static int num_cpus_frozen;
-+
-+/*
-+ * Update cpusets according to cpu_active mask.  If cpusets are
-+ * disabled, cpuset_update_active_cpus() becomes a simple wrapper
-+ * around partition_sched_domains().
-+ *
-+ * If we come here as part of a suspend/resume, don't touch cpusets because we
-+ * want to restore it back to its original state upon resume anyway.
-+ */
-+static void cpuset_cpu_active(void)
-+{
-+	if (cpuhp_tasks_frozen) {
-+		/*
-+		 * num_cpus_frozen tracks how many CPUs are involved in suspend
-+		 * resume sequence. As long as this is not the last online
-+		 * operation in the resume sequence, just build a single sched
-+		 * domain, ignoring cpusets.
-+		 */
-+		partition_sched_domains(1, NULL, NULL);
-+		if (--num_cpus_frozen)
-+			return;
-+		/*
-+		 * This is the last CPU online operation. So fall through and
-+		 * restore the original sched domains by considering the
-+		 * cpuset configurations.
-+		 */
-+		cpuset_force_rebuild();
-+	}
-+
-+	cpuset_update_active_cpus();
-+}
-+
-+static int cpuset_cpu_inactive(unsigned int cpu)
-+{
-+	if (!cpuhp_tasks_frozen) {
-+		cpuset_update_active_cpus();
-+	} else {
-+		num_cpus_frozen++;
-+		partition_sched_domains(1, NULL, NULL);
-+	}
-+	return 0;
-+}
-+
-+int sched_cpu_activate(unsigned int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+	unsigned long flags;
-+
-+	/*
-+	 * Clear the balance_push callback and prepare to schedule
-+	 * regular tasks.
-+	 */
-+	balance_push_set(cpu, false);
-+
-+#ifdef CONFIG_SCHED_SMT
-+	/*
-+	 * When going up, increment the number of cores with SMT present.
-+	 */
-+	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
-+		static_branch_inc_cpuslocked(&sched_smt_present);
-+#endif
-+	set_cpu_active(cpu, true);
-+
-+	if (sched_smp_initialized)
-+		cpuset_cpu_active();
-+
-+	/*
-+	 * Put the rq online, if not already. This happens:
-+	 *
-+	 * 1) In the early boot process, because we build the real domains
-+	 *    after all cpus have been brought up.
-+	 *
-+	 * 2) At runtime, if cpuset_cpu_active() fails to rebuild the
-+	 *    domains.
-+	 */
-+	raw_spin_lock_irqsave(&rq->lock, flags);
-+	set_rq_online(rq);
-+	raw_spin_unlock_irqrestore(&rq->lock, flags);
-+
-+	return 0;
-+}
-+
-+int sched_cpu_deactivate(unsigned int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+	unsigned long flags;
-+	int ret;
-+
-+	set_cpu_active(cpu, false);
-+
-+	/*
-+	 * From this point forward, this CPU will refuse to run any task that
-+	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
-+	 * push those tasks away until this gets cleared, see
-+	 * sched_cpu_dying().
-+	 */
-+	balance_push_set(cpu, true);
-+
-+	/*
-+	 * We've cleared cpu_active_mask, wait for all preempt-disabled and RCU
-+	 * users of this state to go away such that all new such users will
-+	 * observe it.
-+	 *
-+	 * Specifically, we rely on ttwu to no longer target this CPU, see
-+	 * ttwu_queue_cond() and is_cpu_allowed().
-+	 *
-+	 * Do sync before park smpboot threads to take care the rcu boost case.
-+	 */
-+	synchronize_rcu();
-+
-+	raw_spin_lock_irqsave(&rq->lock, flags);
-+	update_rq_clock(rq);
-+	set_rq_offline(rq);
-+	raw_spin_unlock_irqrestore(&rq->lock, flags);
-+
-+#ifdef CONFIG_SCHED_SMT
-+	/*
-+	 * When going down, decrement the number of cores with SMT present.
-+	 */
-+	if (cpumask_weight(cpu_smt_mask(cpu)) == 2) {
-+		static_branch_dec_cpuslocked(&sched_smt_present);
-+		if (!static_branch_likely(&sched_smt_present))
-+			cpumask_clear(&sched_sg_idle_mask);
-+	}
-+#endif
-+
-+	if (!sched_smp_initialized)
-+		return 0;
-+
-+	ret = cpuset_cpu_inactive(cpu);
-+	if (ret) {
-+		balance_push_set(cpu, false);
-+		set_cpu_active(cpu, true);
-+		return ret;
-+	}
-+
-+	return 0;
-+}
-+
-+static void sched_rq_cpu_starting(unsigned int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+
-+	rq->calc_load_update = calc_load_update;
-+}
-+
-+int sched_cpu_starting(unsigned int cpu)
-+{
-+	sched_rq_cpu_starting(cpu);
-+	sched_tick_start(cpu);
-+	return 0;
-+}
-+
-+#ifdef CONFIG_HOTPLUG_CPU
-+
-+/*
-+ * Invoked immediately before the stopper thread is invoked to bring the
-+ * CPU down completely. At this point all per CPU kthreads except the
-+ * hotplug thread (current) and the stopper thread (inactive) have been
-+ * either parked or have been unbound from the outgoing CPU. Ensure that
-+ * any of those which might be on the way out are gone.
-+ *
-+ * If after this point a bound task is being woken on this CPU then the
-+ * responsible hotplug callback has failed to do it's job.
-+ * sched_cpu_dying() will catch it with the appropriate fireworks.
-+ */
-+int sched_cpu_wait_empty(unsigned int cpu)
-+{
-+	balance_hotplug_wait();
-+	return 0;
-+}
-+
-+/*
-+ * Since this CPU is going 'away' for a while, fold any nr_active delta we
-+ * might have. Called from the CPU stopper task after ensuring that the
-+ * stopper is the last running task on the CPU, so nr_active count is
-+ * stable. We need to take the teardown thread which is calling this into
-+ * account, so we hand in adjust = 1 to the load calculation.
-+ *
-+ * Also see the comment "Global load-average calculations".
-+ */
-+static void calc_load_migrate(struct rq *rq)
-+{
-+	long delta = calc_load_fold_active(rq, 1);
-+
-+	if (delta)
-+		atomic_long_add(delta, &calc_load_tasks);
-+}
-+
-+static void dump_rq_tasks(struct rq *rq, const char *loglvl)
-+{
-+	struct task_struct *g, *p;
-+	int cpu = cpu_of(rq);
-+
-+	lockdep_assert_held(&rq->lock);
-+
-+	printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
-+	for_each_process_thread(g, p) {
-+		if (task_cpu(p) != cpu)
-+			continue;
-+
-+		if (!task_on_rq_queued(p))
-+			continue;
-+
-+		printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
-+	}
-+}
-+
-+int sched_cpu_dying(unsigned int cpu)
-+{
-+	struct rq *rq = cpu_rq(cpu);
-+	unsigned long flags;
-+
-+	/* Handle pending wakeups and then migrate everything off */
-+	sched_tick_stop(cpu);
-+
-+	raw_spin_lock_irqsave(&rq->lock, flags);
-+	if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
-+		WARN(true, "Dying CPU not properly vacated!");
-+		dump_rq_tasks(rq, KERN_WARNING);
-+	}
-+	raw_spin_unlock_irqrestore(&rq->lock, flags);
-+
-+	calc_load_migrate(rq);
-+	hrtick_clear(rq);
-+	return 0;
-+}
-+#endif
-+
-+#ifdef CONFIG_SMP
-+static void sched_init_topology_cpumask_early(void)
-+{
-+	int cpu;
-+	cpumask_t *tmp;
-+
-+	for_each_possible_cpu(cpu) {
-+		/* init topo masks */
-+		tmp = per_cpu(sched_cpu_topo_masks, cpu);
-+
-+		cpumask_copy(tmp, cpumask_of(cpu));
-+		tmp++;
-+		cpumask_copy(tmp, cpu_possible_mask);
-+		per_cpu(sched_cpu_llc_mask, cpu) = tmp;
-+		per_cpu(sched_cpu_topo_end_mask, cpu) = ++tmp;
-+		/*per_cpu(sd_llc_id, cpu) = cpu;*/
-+	}
-+}
-+
-+#define TOPOLOGY_CPUMASK(name, mask, last)\
-+	if (cpumask_and(topo, topo, mask)) {					\
-+		cpumask_copy(topo, mask);					\
-+		printk(KERN_INFO "sched: cpu#%02d topo: 0x%08lx - "#name,	\
-+		       cpu, (topo++)->bits[0]);					\
-+	}									\
-+	if (!last)								\
-+		cpumask_complement(topo, mask)
-+
-+static void sched_init_topology_cpumask(void)
-+{
-+	int cpu;
-+	cpumask_t *topo;
-+
-+	for_each_online_cpu(cpu) {
-+		/* take chance to reset time slice for idle tasks */
-+		cpu_rq(cpu)->idle->time_slice = sched_timeslice_ns;
-+
-+		topo = per_cpu(sched_cpu_topo_masks, cpu) + 1;
-+
-+		cpumask_complement(topo, cpumask_of(cpu));
-+#ifdef CONFIG_SCHED_SMT
-+		TOPOLOGY_CPUMASK(smt, topology_sibling_cpumask(cpu), false);
-+#endif
-+		per_cpu(sd_llc_id, cpu) = cpumask_first(cpu_coregroup_mask(cpu));
-+		per_cpu(sched_cpu_llc_mask, cpu) = topo;
-+		TOPOLOGY_CPUMASK(coregroup, cpu_coregroup_mask(cpu), false);
-+
-+		TOPOLOGY_CPUMASK(core, topology_core_cpumask(cpu), false);
-+
-+		TOPOLOGY_CPUMASK(others, cpu_online_mask, true);
-+
-+		per_cpu(sched_cpu_topo_end_mask, cpu) = topo;
-+		printk(KERN_INFO "sched: cpu#%02d llc_id = %d, llc_mask idx = %d\n",
-+		       cpu, per_cpu(sd_llc_id, cpu),
-+		       (int) (per_cpu(sched_cpu_llc_mask, cpu) -
-+			      per_cpu(sched_cpu_topo_masks, cpu)));
-+	}
-+}
-+#endif
-+
-+void __init sched_init_smp(void)
-+{
-+	/* Move init over to a non-isolated CPU */
-+	if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) < 0)
-+		BUG();
-+	current->flags &= ~PF_NO_SETAFFINITY;
-+
-+	sched_init_topology_cpumask();
-+
-+	sched_smp_initialized = true;
-+}
-+#else
-+void __init sched_init_smp(void)
-+{
-+	cpu_rq(0)->idle->time_slice = sched_timeslice_ns;
-+}
-+#endif /* CONFIG_SMP */
-+
-+int in_sched_functions(unsigned long addr)
-+{
-+	return in_lock_functions(addr) ||
-+		(addr >= (unsigned long)__sched_text_start
-+		&& addr < (unsigned long)__sched_text_end);
-+}
-+
-+#ifdef CONFIG_CGROUP_SCHED
-+/* task group related information */
-+struct task_group {
-+	struct cgroup_subsys_state css;
-+
-+	struct rcu_head rcu;
-+	struct list_head list;
-+
-+	struct task_group *parent;
-+	struct list_head siblings;
-+	struct list_head children;
-+#ifdef CONFIG_FAIR_GROUP_SCHED
-+	unsigned long		shares;
-+#endif
-+};
-+
-+/*
-+ * Default task group.
-+ * Every task in system belongs to this group at bootup.
-+ */
-+struct task_group root_task_group;
-+LIST_HEAD(task_groups);
-+
-+/* Cacheline aligned slab cache for task_group */
-+static struct kmem_cache *task_group_cache __read_mostly;
-+#endif /* CONFIG_CGROUP_SCHED */
-+
-+void __init sched_init(void)
-+{
-+	int i;
-+	struct rq *rq;
-+
-+	printk(KERN_INFO ALT_SCHED_VERSION_MSG);
-+
-+	wait_bit_init();
-+
-+#ifdef CONFIG_SMP
-+	for (i = 0; i < SCHED_BITS; i++)
-+		cpumask_copy(sched_rq_watermark + i, cpu_present_mask);
-+#endif
-+
-+#ifdef CONFIG_CGROUP_SCHED
-+	task_group_cache = KMEM_CACHE(task_group, 0);
-+
-+	list_add(&root_task_group.list, &task_groups);
-+	INIT_LIST_HEAD(&root_task_group.children);
-+	INIT_LIST_HEAD(&root_task_group.siblings);
-+#endif /* CONFIG_CGROUP_SCHED */
-+	for_each_possible_cpu(i) {
-+		rq = cpu_rq(i);
-+
-+		sched_queue_init(&rq->queue);
-+		rq->watermark = IDLE_TASK_SCHED_PRIO;
-+		rq->skip = NULL;
-+
-+		raw_spin_lock_init(&rq->lock);
-+		rq->nr_running = rq->nr_uninterruptible = 0;
-+		rq->calc_load_active = 0;
-+		rq->calc_load_update = jiffies + LOAD_FREQ;
-+#ifdef CONFIG_SMP
-+		rq->online = false;
-+		rq->cpu = i;
-+
-+#ifdef CONFIG_SCHED_SMT
-+		rq->active_balance = 0;
-+#endif
-+
-+#ifdef CONFIG_NO_HZ_COMMON
-+		INIT_CSD(&rq->nohz_csd, nohz_csd_func, rq);
-+#endif
-+		rq->balance_callback = &balance_push_callback;
-+#ifdef CONFIG_HOTPLUG_CPU
-+		rcuwait_init(&rq->hotplug_wait);
-+#endif
-+#endif /* CONFIG_SMP */
-+		rq->nr_switches = 0;
-+
-+		hrtick_rq_init(rq);
-+		atomic_set(&rq->nr_iowait, 0);
-+	}
-+#ifdef CONFIG_SMP
-+	/* Set rq->online for cpu 0 */
-+	cpu_rq(0)->online = true;
-+#endif
-+	/*
-+	 * The boot idle thread does lazy MMU switching as well:
-+	 */
-+	mmgrab(&init_mm);
-+	enter_lazy_tlb(&init_mm, current);
-+
-+	/*
-+	 * Make us the idle thread. Technically, schedule() should not be
-+	 * called from this thread, however somewhere below it might be,
-+	 * but because we are the idle thread, we just pick up running again
-+	 * when this runqueue becomes "idle".
-+	 */
-+	init_idle(current, smp_processor_id());
-+
-+	calc_load_update = jiffies + LOAD_FREQ;
-+
-+#ifdef CONFIG_SMP
-+	idle_thread_set_boot_cpu();
-+	balance_push_set(smp_processor_id(), false);
-+
-+	sched_init_topology_cpumask_early();
-+#endif /* SMP */
-+
-+	psi_init();
-+}
-+
-+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
-+static inline int preempt_count_equals(int preempt_offset)
-+{
-+	int nested = preempt_count() + rcu_preempt_depth();
-+
-+	return (nested == preempt_offset);
-+}
-+
-+void __might_sleep(const char *file, int line, int preempt_offset)
-+{
-+	unsigned int state = get_current_state();
-+	/*
-+	 * Blocking primitives will set (and therefore destroy) current->state,
-+	 * since we will exit with TASK_RUNNING make sure we enter with it,
-+	 * otherwise we will destroy state.
-+	 */
-+	WARN_ONCE(state != TASK_RUNNING && current->task_state_change,
-+			"do not call blocking ops when !TASK_RUNNING; "
-+			"state=%x set at [<%p>] %pS\n", state,
-+			(void *)current->task_state_change,
-+			(void *)current->task_state_change);
-+
-+	___might_sleep(file, line, preempt_offset);
-+}
-+EXPORT_SYMBOL(__might_sleep);
-+
-+void ___might_sleep(const char *file, int line, int preempt_offset)
-+{
-+	/* Ratelimiting timestamp: */
-+	static unsigned long prev_jiffy;
-+
-+	unsigned long preempt_disable_ip;
-+
-+	/* WARN_ON_ONCE() by default, no rate limit required: */
-+	rcu_sleep_check();
-+
-+	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
-+	     !is_idle_task(current) && !current->non_block_count) ||
-+	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
-+	    oops_in_progress)
-+		return;
-+	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
-+		return;
-+	prev_jiffy = jiffies;
-+
-+	/* Save this before calling printk(), since that will clobber it: */
-+	preempt_disable_ip = get_preempt_disable_ip(current);
-+
-+	printk(KERN_ERR
-+		"BUG: sleeping function called from invalid context at %s:%d\n",
-+			file, line);
-+	printk(KERN_ERR
-+		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
-+			in_atomic(), irqs_disabled(), current->non_block_count,
-+			current->pid, current->comm);
-+
-+	if (task_stack_end_corrupted(current))
-+		printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
-+
-+	debug_show_held_locks(current);
-+	if (irqs_disabled())
-+		print_irqtrace_events(current);
-+#ifdef CONFIG_DEBUG_PREEMPT
-+	if (!preempt_count_equals(preempt_offset)) {
-+		pr_err("Preemption disabled at:");
-+		print_ip_sym(KERN_ERR, preempt_disable_ip);
-+	}
-+#endif
-+	dump_stack();
-+	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
-+}
-+EXPORT_SYMBOL(___might_sleep);
-+
-+void __cant_sleep(const char *file, int line, int preempt_offset)
-+{
-+	static unsigned long prev_jiffy;
-+
-+	if (irqs_disabled())
-+		return;
-+
-+	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
-+		return;
-+
-+	if (preempt_count() > preempt_offset)
-+		return;
-+
-+	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
-+		return;
-+	prev_jiffy = jiffies;
-+
-+	printk(KERN_ERR "BUG: assuming atomic context at %s:%d\n", file, line);
-+	printk(KERN_ERR "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
-+			in_atomic(), irqs_disabled(),
-+			current->pid, current->comm);
-+
-+	debug_show_held_locks(current);
-+	dump_stack();
-+	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
-+}
-+EXPORT_SYMBOL_GPL(__cant_sleep);
-+
-+#ifdef CONFIG_SMP
-+void __cant_migrate(const char *file, int line)
-+{
-+	static unsigned long prev_jiffy;
-+
-+	if (irqs_disabled())
-+		return;
-+
-+	if (is_migration_disabled(current))
-+		return;
-+
-+	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
-+		return;
-+
-+	if (preempt_count() > 0)
-+		return;
-+
-+	if (current->migration_flags & MDF_FORCE_ENABLED)
-+		return;
-+
-+	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
-+		return;
-+	prev_jiffy = jiffies;
-+
-+	pr_err("BUG: assuming non migratable context at %s:%d\n", file, line);
-+	pr_err("in_atomic(): %d, irqs_disabled(): %d, migration_disabled() %u pid: %d, name: %s\n",
-+	       in_atomic(), irqs_disabled(), is_migration_disabled(current),
-+	       current->pid, current->comm);
-+
-+	debug_show_held_locks(current);
-+	dump_stack();
-+	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
-+}
-+EXPORT_SYMBOL_GPL(__cant_migrate);
-+#endif
-+#endif
-+
-+#ifdef CONFIG_MAGIC_SYSRQ
-+void normalize_rt_tasks(void)
-+{
-+	struct task_struct *g, *p;
-+	struct sched_attr attr = {
-+		.sched_policy = SCHED_NORMAL,
-+	};
-+
-+	read_lock(&tasklist_lock);
-+	for_each_process_thread(g, p) {
-+		/*
-+		 * Only normalize user tasks:
-+		 */
-+		if (p->flags & PF_KTHREAD)
-+			continue;
-+
-+		if (!rt_task(p)) {
-+			/*
-+			 * Renice negative nice level userspace
-+			 * tasks back to 0:
-+			 */
-+			if (task_nice(p) < 0)
-+				set_user_nice(p, 0);
-+			continue;
-+		}
-+
-+		__sched_setscheduler(p, &attr, false, false);
-+	}
-+	read_unlock(&tasklist_lock);
-+}
-+#endif /* CONFIG_MAGIC_SYSRQ */
-+
-+#if defined(CONFIG_IA64) || defined(CONFIG_KGDB_KDB)
-+/*
-+ * These functions are only useful for the IA64 MCA handling, or kdb.
-+ *
-+ * They can only be called when the whole system has been
-+ * stopped - every CPU needs to be quiescent, and no scheduling
-+ * activity can take place. Using them for anything else would
-+ * be a serious bug, and as a result, they aren't even visible
-+ * under any other configuration.
-+ */
-+
-+/**
-+ * curr_task - return the current task for a given CPU.
-+ * @cpu: the processor in question.
-+ *
-+ * ONLY VALID WHEN THE WHOLE SYSTEM IS STOPPED!
-+ *
-+ * Return: The current task for @cpu.
-+ */
-+struct task_struct *curr_task(int cpu)
-+{
-+	return cpu_curr(cpu);
-+}
-+
-+#endif /* defined(CONFIG_IA64) || defined(CONFIG_KGDB_KDB) */
-+
-+#ifdef CONFIG_IA64
-+/**
-+ * ia64_set_curr_task - set the current task for a given CPU.
-+ * @cpu: the processor in question.
-+ * @p: the task pointer to set.
-+ *
-+ * Description: This function must only be used when non-maskable interrupts
-+ * are serviced on a separate stack.  It allows the architecture to switch the
-+ * notion of the current task on a CPU in a non-blocking manner.  This function
-+ * must be called with all CPU's synchronised, and interrupts disabled, the
-+ * and caller must save the original value of the current task (see
-+ * curr_task() above) and restore that value before reenabling interrupts and
-+ * re-starting the system.
-+ *
-+ * ONLY VALID WHEN THE WHOLE SYSTEM IS STOPPED!
-+ */
-+void ia64_set_curr_task(int cpu, struct task_struct *p)
-+{
-+	cpu_curr(cpu) = p;
-+}
-+
-+#endif
-+
-+#ifdef CONFIG_CGROUP_SCHED

8158

-+static void sched_free_group(struct task_group *tg)

8159

-+{

8160

-+	kmem_cache_free(task_group_cache, tg);

8161

-+}

8162

-+

8163

-+/* allocate runqueue etc for a new task group */

8164

-+struct task_group *sched_create_group(struct task_group *parent)

8165

-+{

8166

-+	struct task_group *tg;

8167

-+

8168

-+	tg = kmem_cache_alloc(task_group_cache, GFP_KERNEL | __GFP_ZERO);

8169

-+	if (!tg)

8170

-+		return ERR_PTR(-ENOMEM);

8171

-+

8172

-+	return tg;

8173

-+}

8174

-+

8175

-+void sched_online_group(struct task_group *tg, struct task_group *parent)

8176

-+{

8177

-+}

8178

-+

8179

-+/* rcu callback to free various structures associated with a task group */

8180

-+static void sched_free_group_rcu(struct rcu_head *rhp)

8181

-+{

8182

-+	/* Now it should be safe to free those cfs_rqs */

8183

-+	sched_free_group(container_of(rhp, struct task_group, rcu));

8184

-+}

8185

-+

8186

-+void sched_destroy_group(struct task_group *tg)

8187

-+{

8188

-+	/* Wait for possible concurrent references to cfs_rqs complete */

8189

-+	call_rcu(&tg->rcu, sched_free_group_rcu);

8190

-+}

8191

-+

8192

-+void sched_offline_group(struct task_group *tg)

8193

-+{

8194

-+}

8195

-+

8196

-+static inline struct task_group *css_tg(struct cgroup_subsys_state *css)

8197

-+{

8198

-+	return css ? container_of(css, struct task_group, css) : NULL;

8199

-+}

8200

-+

8201

-+static struct cgroup_subsys_state *

8202

-+cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)

8203

-+{

8204

-+	struct task_group *parent = css_tg(parent_css);

8205

-+	struct task_group *tg;

8206

-+

8207

-+	if (!parent) {

8208

-+		/* This is early initialization for the top cgroup */

8209

-+		return &root_task_group.css;

8210

-+	}

8211

-+

8212

-+	tg = sched_create_group(parent);

8213

-+	if (IS_ERR(tg))

8214

-+		return ERR_PTR(-ENOMEM);

8215

-+	return &tg->css;

8216

-+}

8217

-+

8218

-+/* Expose task group only after completing cgroup initialization */

8219

-+static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)

8220

-+{

8221

-+	struct task_group *tg = css_tg(css);

8222

-+	struct task_group *parent = css_tg(css->parent);

8223

-+

8224

-+	if (parent)

8225

-+		sched_online_group(tg, parent);

8226

-+	return 0;

8227

-+}

8228

-+

8229

-+static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)

8230

-+{

8231

-+	struct task_group *tg = css_tg(css);

8232

-+

8233

-+	sched_offline_group(tg);

8234

-+}

8235

-+

8236

-+static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)

8237

-+{

8238

-+	struct task_group *tg = css_tg(css);

8239

-+

8240

-+	/*

8241

-+	 * Relies on the RCU grace period between css_released() and this.

8242

-+	 */

8243

-+	sched_free_group(tg);

8244

-+}

8245

-+

8246

-+static void cpu_cgroup_fork(struct task_struct *task)

8247

-+{

8248

-+}

8249

-+

8250

-+static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)

8251

-+{

8252

-+	return 0;

8253

-+}

8254

-+

8255

-+static void cpu_cgroup_attach(struct cgroup_taskset *tset)

8256

-+{

8257

-+}

8258

-+

8259

-+#ifdef CONFIG_FAIR_GROUP_SCHED

8260

-+static DEFINE_MUTEX(shares_mutex);

8261

-+

8262

-+int sched_group_set_shares(struct task_group *tg, unsigned long shares)

8263

-+{

8264

-+	/*

8265

-+	 * We can't change the weight of the root cgroup.

8266

-+	 */

8267

-+	if (&root_task_group == tg)

8268

-+		return -EINVAL;

8269

-+

8270

-+	shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES));

8271

-+

8272

-+	mutex_lock(&shares_mutex);

8273

-+	if (tg->shares == shares)

8274

-+		goto done;

8275

-+

8276

-+	tg->shares = shares;

8277

-+done:

8278

-+	mutex_unlock(&shares_mutex);

8279

-+	return 0;

8280

-+}

8281

-+

8282

-+static int cpu_shares_write_u64(struct cgroup_subsys_state *css,

8283

-+				struct cftype *cftype, u64 shareval)

8284

-+{

8285

-+	if (shareval > scale_load_down(ULONG_MAX))

8286

-+		shareval = MAX_SHARES;

8287

-+	return sched_group_set_shares(css_tg(css), scale_load(shareval));

8288

-+}

8289

-+

8290

-+static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,

8291

-+			       struct cftype *cft)

8292

-+{

8293

-+	struct task_group *tg = css_tg(css);

8294

-+

8295

-+	return (u64) scale_load_down(tg->shares);

8296

-+}

8297

-+#endif

8298

-+

8299

-+static struct cftype cpu_legacy_files[] = {

8300

-+#ifdef CONFIG_FAIR_GROUP_SCHED

8301

-+	{

8302

-+		.name = "shares",

8303

-+		.read_u64 = cpu_shares_read_u64,

8304

-+		.write_u64 = cpu_shares_write_u64,

8305

-+	},

8306

-+#endif

8307

-+	{ }	/* Terminate */

8308

-+};

8309

-+

8310

-+

8311

-+static struct cftype cpu_files[] = {

8312

-+	{ }	/* terminate */

8313

-+};

8314

-+

8315

-+static int cpu_extra_stat_show(struct seq_file *sf,

8316

-+			       struct cgroup_subsys_state *css)

8317

-+{

8318

-+	return 0;

8319

-+}

8320

-+

8321

-+struct cgroup_subsys cpu_cgrp_subsys = {

8322

-+	.css_alloc	= cpu_cgroup_css_alloc,

8323

-+	.css_online	= cpu_cgroup_css_online,

8324

-+	.css_released	= cpu_cgroup_css_released,

8325

-+	.css_free	= cpu_cgroup_css_free,

8326

-+	.css_extra_stat_show = cpu_extra_stat_show,

8327

-+	.fork		= cpu_cgroup_fork,

8328

-+	.can_attach	= cpu_cgroup_can_attach,

8329

-+	.attach		= cpu_cgroup_attach,

8330

-+	.legacy_cftypes	= cpu_files,

8331

-+	.legacy_cftypes	= cpu_legacy_files,

8332

-+	.dfl_cftypes	= cpu_files,

8333

-+	.early_init	= true,

8334

-+	.threaded	= true,

8335

-+};

8336

-+#endif	/* CONFIG_CGROUP_SCHED */

8337

-+

8338

-+#undef CREATE_TRACE_POINTS

8339

-diff --git a/kernel/sched/alt_debug.c b/kernel/sched/alt_debug.c

8340

-new file mode 100644

8341

-index 000000000000..1212a031700e

8342

---- /dev/null

8343

-+++ b/kernel/sched/alt_debug.c

8344

-@@ -0,0 +1,31 @@

8345

-+/*

8346

-+ * kernel/sched/alt_debug.c

8347

-+ *

8348

-+ * Print the alt scheduler debugging details

8349

-+ *

8350

-+ * Author: Alfred Chen

8351

-+ * Date  : 2020

8352

-+ */

8353

-+#include "sched.h"

8354

-+

8355

-+/*

8356

-+ * This allows printing both to /proc/sched_debug and

8357

-+ * to the console

8358

-+ */

8359

-+#define SEQ_printf(m, x...)			\

8360

-+ do {						\

8361

-+	if (m)					\

8362

-+		seq_printf(m, x);		\

8363

-+	else					\

8364

-+		pr_cont(x);			\

8365

-+ } while (0)

8366

-+

8367

-+void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,

8368

-+			  struct seq_file *m)

8369

-+{

8370

-+	SEQ_printf(m, "%s (%d, #threads: %d)\n", p->comm, task_pid_nr_ns(p, ns),

8371

-+						get_nr_threads(p));

8372

-+}

8373

-+

8374

-+void proc_sched_set_task(struct task_struct *p)

8375

-+{}

8376

-diff --git a/kernel/sched/alt_sched.h b/kernel/sched/alt_sched.h

8377

-new file mode 100644

8378

-index 000000000000..289058a09bd5

8379

---- /dev/null

8380

-+++ b/kernel/sched/alt_sched.h

8381

-@@ -0,0 +1,666 @@

8382

-+#ifndef ALT_SCHED_H

8383

-+#define ALT_SCHED_H

8384

-+

8385

-+#include <linux/sched.h>

8386

-+

8387

-+#include <linux/sched/clock.h>

8388

-+#include <linux/sched/cpufreq.h>

8389

-+#include <linux/sched/cputime.h>

8390

-+#include <linux/sched/debug.h>

8391

-+#include <linux/sched/init.h>

8392

-+#include <linux/sched/isolation.h>

8393

-+#include <linux/sched/loadavg.h>

8394

-+#include <linux/sched/mm.h>

8395

-+#include <linux/sched/nohz.h>

8396

-+#include <linux/sched/signal.h>

8397

-+#include <linux/sched/stat.h>

8398

-+#include <linux/sched/sysctl.h>

8399

-+#include <linux/sched/task.h>

8400

-+#include <linux/sched/topology.h>

8401

-+#include <linux/sched/wake_q.h>

8402

-+

8403

-+#include <uapi/linux/sched/types.h>

8404

-+

8405

-+#include <linux/cgroup.h>

8406

-+#include <linux/cpufreq.h>

8407

-+#include <linux/cpuidle.h>

8408

-+#include <linux/cpuset.h>

8409

-+#include <linux/ctype.h>

8410

-+#include <linux/debugfs.h>

8411

-+#include <linux/kthread.h>

8412

-+#include <linux/livepatch.h>

8413

-+#include <linux/membarrier.h>

8414

-+#include <linux/proc_fs.h>

8415

-+#include <linux/psi.h>

8416

-+#include <linux/slab.h>

8417

-+#include <linux/stop_machine.h>

8418

-+#include <linux/suspend.h>

8419

-+#include <linux/swait.h>

8420

-+#include <linux/syscalls.h>

8421

-+#include <linux/tsacct_kern.h>

8422

-+

8423

-+#include <asm/tlb.h>

8424

-+

8425

-+#ifdef CONFIG_PARAVIRT

8426

-+# include <asm/paravirt.h>

8427

-+#endif

8428

-+

8429

-+#include "cpupri.h"

8430

-+

8431

-+#include <trace/events/sched.h>

8432

-+

8433

-+#ifdef CONFIG_SCHED_BMQ

8434

-+/* bits:

8435

-+ * RT(0-99), (Low prio adj range, nice width, high prio adj range) / 2, cpu idle task */

8436

-+#define SCHED_BITS	(MAX_RT_PRIO + NICE_WIDTH / 2 + MAX_PRIORITY_ADJ + 1)

8437

-+#endif

8438

-+

8439

-+#ifdef CONFIG_SCHED_PDS

8440

-+/* bits: RT(0-99), reserved(100-127), NORMAL_PRIO_NUM, cpu idle task */

8441

-+#define SCHED_BITS	(MIN_NORMAL_PRIO + NORMAL_PRIO_NUM + 1)

8442

-+#endif /* CONFIG_SCHED_PDS */

8443

-+

8444

-+#define IDLE_TASK_SCHED_PRIO	(SCHED_BITS - 1)

8445

-+

8446

-+#ifdef CONFIG_SCHED_DEBUG

8447

-+# define SCHED_WARN_ON(x)	WARN_ONCE(x, #x)

8448

-+extern void resched_latency_warn(int cpu, u64 latency);

8449

-+#else

8450

-+# define SCHED_WARN_ON(x)	({ (void)(x), 0; })

8451

-+static inline void resched_latency_warn(int cpu, u64 latency) {}

8452

-+#endif

8453

-+

8454

-+/*

8455

-+ * Increase resolution of nice-level calculations for 64-bit architectures.

8456

-+ * The extra resolution improves shares distribution and load balancing of

8457

-+ * low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup

8458

-+ * hierarchies, especially on larger systems. This is not a user-visible change

8459

-+ * and does not change the user-interface for setting shares/weights.

8460

-+ *

8461

-+ * We increase resolution only if we have enough bits to allow this increased

8462

-+ * resolution (i.e. 64-bit). The costs for increasing resolution when 32-bit

8463

-+ * are pretty high and the returns do not justify the increased costs.

8464

-+ *

8465

-+ * Really only required when CONFIG_FAIR_GROUP_SCHED=y is also set, but to

8466

-+ * increase coverage and consistency always enable it on 64-bit platforms.

8467

-+ */

8468

-+#ifdef CONFIG_64BIT

8469

-+# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)

8470

-+# define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)

8471

-+# define scale_load_down(w) \

8472

-+({ \

8473

-+	unsigned long __w = (w); \

8474

-+	if (__w) \

8475

-+		__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \

8476

-+	__w; \

8477

-+})

8478

-+#else

8479

-+# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)

8480

-+# define scale_load(w)		(w)

8481

-+# define scale_load_down(w)	(w)

8482

-+#endif

8483

-+

8484

-+#ifdef CONFIG_FAIR_GROUP_SCHED

8485

-+#define ROOT_TASK_GROUP_LOAD	NICE_0_LOAD

8486

-+

8487

-+/*

8488

-+ * A weight of 0 or 1 can cause arithmetics problems.

8489

-+ * A weight of a cfs_rq is the sum of weights of which entities

8490

-+ * are queued on this cfs_rq, so a weight of a entity should not be

8491

-+ * too large, so as the shares value of a task group.

8492

-+ * (The default weight is 1024 - so there's no practical

8493

-+ *  limitation from this.)

8494

-+ */

8495

-+#define MIN_SHARES		(1UL <<  1)

8496

-+#define MAX_SHARES		(1UL << 18)

8497

-+#endif

8498

-+

8499

-+/* task_struct::on_rq states: */

8500

-+#define TASK_ON_RQ_QUEUED	1

8501

-+#define TASK_ON_RQ_MIGRATING	2

8502

-+

8503

-+static inline int task_on_rq_queued(struct task_struct *p)

8504

-+{

8505

-+	return p->on_rq == TASK_ON_RQ_QUEUED;

8506

-+}

8507

-+

8508

-+static inline int task_on_rq_migrating(struct task_struct *p)

8509

-+{

8510

-+	return READ_ONCE(p->on_rq) == TASK_ON_RQ_MIGRATING;

8511

-+}

8512

-+

8513

-+/*

8514

-+ * wake flags

8515

-+ */

8516

-+#define WF_SYNC		0x01		/* waker goes to sleep after wakeup */

8517

-+#define WF_FORK		0x02		/* child wakeup after fork */

8518

-+#define WF_MIGRATED	0x04		/* internal use, task got migrated */

8519

-+#define WF_ON_CPU	0x08		/* Wakee is on_rq */

8520

-+

8521

-+#define SCHED_QUEUE_BITS	(SCHED_BITS - 1)

8522

-+

8523

-+struct sched_queue {

8524

-+	DECLARE_BITMAP(bitmap, SCHED_QUEUE_BITS);

8525

-+	struct list_head heads[SCHED_BITS];

8526

-+};

8527

-+

8528

-+/*

8529

-+ * This is the main, per-CPU runqueue data structure.

8530

-+ * This data should only be modified by the local cpu.

8531

-+ */

8532

-+struct rq {

8533

-+	/* runqueue lock: */

8534

-+	raw_spinlock_t lock;

8535

-+

8536

-+	struct task_struct __rcu *curr;

8537

-+	struct task_struct *idle, *stop, *skip;

8538

-+	struct mm_struct *prev_mm;

8539

-+

8540

-+	struct sched_queue	queue;

8541

-+#ifdef CONFIG_SCHED_PDS

8542

-+	u64			time_edge;

8543

-+#endif

8544

-+	unsigned long watermark;

8545

-+

8546

-+	/* switch count */

8547

-+	u64 nr_switches;

8548

-+

8549

-+	atomic_t nr_iowait;

8550

-+

8551

-+#ifdef CONFIG_SCHED_DEBUG

8552

-+	u64 last_seen_need_resched_ns;

8553

-+	int ticks_without_resched;

8554

-+#endif

8555

-+

8556

-+#ifdef CONFIG_MEMBARRIER

8557

-+	int membarrier_state;

8558

-+#endif

8559

-+

8560

-+#ifdef CONFIG_SMP

8561

-+	int cpu;		/* cpu of this runqueue */

8562

-+	bool online;

8563

-+

8564

-+	unsigned int		ttwu_pending;

8565

-+	unsigned char		nohz_idle_balance;

8566

-+	unsigned char		idle_balance;

8567

-+

8568

-+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ

8569

-+	struct sched_avg	avg_irq;

8570

-+#endif

8571

-+

8572

-+#ifdef CONFIG_SCHED_SMT

8573

-+	int active_balance;

8574

-+	struct cpu_stop_work	active_balance_work;

8575

-+#endif

8576

-+	struct callback_head	*balance_callback;

8577

-+#ifdef CONFIG_HOTPLUG_CPU

8578

-+	struct rcuwait		hotplug_wait;

8579

-+#endif

8580

-+	unsigned int		nr_pinned;

8581

-+

8582

-+#endif /* CONFIG_SMP */

8583

-+#ifdef CONFIG_IRQ_TIME_ACCOUNTING

8584

-+	u64 prev_irq_time;

8585

-+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */

8586

-+#ifdef CONFIG_PARAVIRT

8587

-+	u64 prev_steal_time;

8588

-+#endif /* CONFIG_PARAVIRT */

8589

-+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING

8590

-+	u64 prev_steal_time_rq;

8591

-+#endif /* CONFIG_PARAVIRT_TIME_ACCOUNTING */

8592

-+

8593

-+	/* For genenal cpu load util */

8594

-+	s32 load_history;

8595

-+	u64 load_block;

8596

-+	u64 load_stamp;

8597

-+

8598

-+	/* calc_load related fields */

8599

-+	unsigned long calc_load_update;

8600

-+	long calc_load_active;

8601

-+

8602

-+	u64 clock, last_tick;

8603

-+	u64 last_ts_switch;

8604

-+	u64 clock_task;

8605

-+

8606

-+	unsigned int  nr_running;

8607

-+	unsigned long nr_uninterruptible;

8608

-+

8609

-+#ifdef CONFIG_SCHED_HRTICK

8610

-+#ifdef CONFIG_SMP

8611

-+	call_single_data_t hrtick_csd;

8612

-+#endif

8613

-+	struct hrtimer		hrtick_timer;

8614

-+	ktime_t			hrtick_time;

8615

-+#endif

8616

-+

8617

-+#ifdef CONFIG_SCHEDSTATS

8618

-+

8619

-+	/* latency stats */

8620

-+	struct sched_info rq_sched_info;

8621

-+	unsigned long long rq_cpu_time;

8622

-+	/* could above be rq->cfs_rq.exec_clock + rq->rt_rq.rt_runtime ? */

8623

-+

8624

-+	/* sys_sched_yield() stats */

8625

-+	unsigned int yld_count;

8626

-+

8627

-+	/* schedule() stats */

8628

-+	unsigned int sched_switch;

8629

-+	unsigned int sched_count;

8630

-+	unsigned int sched_goidle;

8631

-+

8632

-+	/* try_to_wake_up() stats */

8633

-+	unsigned int ttwu_count;

8634

-+	unsigned int ttwu_local;

8635

-+#endif /* CONFIG_SCHEDSTATS */

8636

-+

8637

-+#ifdef CONFIG_CPU_IDLE

8638

-+	/* Must be inspected within a rcu lock section */

8639

-+	struct cpuidle_state *idle_state;

8640

-+#endif

8641

-+

8642

-+#ifdef CONFIG_NO_HZ_COMMON

8643

-+#ifdef CONFIG_SMP

8644

-+	call_single_data_t	nohz_csd;

8645

-+#endif

8646

-+	atomic_t		nohz_flags;

8647

-+#endif /* CONFIG_NO_HZ_COMMON */

8648

-+};

8649

-+

8650

-+extern unsigned long rq_load_util(struct rq *rq, unsigned long max);

8651

-+

8652

-+extern unsigned long calc_load_update;

8653

-+extern atomic_long_t calc_load_tasks;

8654

-+

8655

-+extern void calc_global_load_tick(struct rq *this_rq);

8656

-+extern long calc_load_fold_active(struct rq *this_rq, long adjust);

8657

-+

8658

-+DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);

8659

-+#define cpu_rq(cpu)		(&per_cpu(runqueues, (cpu)))

8660

-+#define this_rq()		this_cpu_ptr(&runqueues)

8661

-+#define task_rq(p)		cpu_rq(task_cpu(p))

8662

-+#define cpu_curr(cpu)		(cpu_rq(cpu)->curr)

8663

-+#define raw_rq()		raw_cpu_ptr(&runqueues)

8664

-+

8665

-+#ifdef CONFIG_SMP

8666

-+#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)

8667

-+void register_sched_domain_sysctl(void);

8668

-+void unregister_sched_domain_sysctl(void);

8669

-+#else

8670

-+static inline void register_sched_domain_sysctl(void)

8671

-+{

8672

-+}

8673

-+static inline void unregister_sched_domain_sysctl(void)

8674

-+{

8675

-+}

8676

-+#endif

8677

-+

8678

-+extern bool sched_smp_initialized;

8679

-+

8680

-+enum {

8681

-+	ITSELF_LEVEL_SPACE_HOLDER,

8682

-+#ifdef CONFIG_SCHED_SMT

8683

-+	SMT_LEVEL_SPACE_HOLDER,

8684

-+#endif

8685

-+	COREGROUP_LEVEL_SPACE_HOLDER,

8686

-+	CORE_LEVEL_SPACE_HOLDER,

8687

-+	OTHER_LEVEL_SPACE_HOLDER,

8688

-+	NR_CPU_AFFINITY_LEVELS

8689

-+};

8690

-+

8691

-+DECLARE_PER_CPU(cpumask_t [NR_CPU_AFFINITY_LEVELS], sched_cpu_topo_masks);

8692

-+DECLARE_PER_CPU(cpumask_t *, sched_cpu_llc_mask);

8693

-+

8694

-+static inline int

8695

-+__best_mask_cpu(const cpumask_t *cpumask, const cpumask_t *mask)

8696

-+{

8697

-+	int cpu;

8698

-+

8699

-+	while ((cpu = cpumask_any_and(cpumask, mask)) >= nr_cpu_ids)

8700

-+		mask++;

8701

-+

8702

-+	return cpu;

8703

-+}

8704

-+

8705

-+static inline int best_mask_cpu(int cpu, const cpumask_t *mask)

8706

-+{

8707

-+	return __best_mask_cpu(mask, per_cpu(sched_cpu_topo_masks, cpu));

8708

-+}

8709

-+

8710

-+extern void flush_smp_call_function_from_idle(void);

8711

-+

8712

-+#else  /* !CONFIG_SMP */

8713

-+static inline void flush_smp_call_function_from_idle(void) { }

8714

-+#endif

8715

-+

8716

-+#ifndef arch_scale_freq_tick

8717

-+static __always_inline

8718

-+void arch_scale_freq_tick(void)

8719

-+{

8720

-+}

8721

-+#endif

8722

-+

8723

-+#ifndef arch_scale_freq_capacity

8724

-+static __always_inline

8725

-+unsigned long arch_scale_freq_capacity(int cpu)

8726

-+{

8727

-+	return SCHED_CAPACITY_SCALE;

8728

-+}

8729

-+#endif

8730

-+

8731

-+static inline u64 __rq_clock_broken(struct rq *rq)

8732

-+{

8733

-+	return READ_ONCE(rq->clock);

8734

-+}

8735

-+

8736

-+static inline u64 rq_clock(struct rq *rq)

8737

-+{

8738

-+	/*

8739

-+	 * Relax lockdep_assert_held() checking as in VRQ, call to

8740

-+	 * sched_info_xxxx() may not held rq->lock

8741

-+	 * lockdep_assert_held(&rq->lock);

8742

-+	 */

8743

-+	return rq->clock;

8744

-+}

8745

-+

8746

-+static inline u64 rq_clock_task(struct rq *rq)

8747

-+{

8748

-+	/*

8749

-+	 * Relax lockdep_assert_held() checking as in VRQ, call to

8750

-+	 * sched_info_xxxx() may not held rq->lock

8751

-+	 * lockdep_assert_held(&rq->lock);

8752

-+	 */

8753

-+	return rq->clock_task;

8754

-+}

8755

-+

8756

-+/*

8757

-+ * {de,en}queue flags:

8758

-+ *

8759

-+ * DEQUEUE_SLEEP  - task is no longer runnable

8760

-+ * ENQUEUE_WAKEUP - task just became runnable

8761

-+ *

8762

-+ */

8763

-+

8764

-+#define DEQUEUE_SLEEP		0x01

8765

-+

8766

-+#define ENQUEUE_WAKEUP		0x01

8767

-+

8768

-+

8769

-+/*

8770

-+ * Below are scheduler API which using in other kernel code

8771

-+ * It use the dummy rq_flags

8772

-+ * ToDo : BMQ need to support these APIs for compatibility with mainline

8773

-+ * scheduler code.

8774

-+ */

8775

-+struct rq_flags {

8776

-+	unsigned long flags;

8777

-+};

8778

-+

8779

-+struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)

8780

-+	__acquires(rq->lock);

8781

-+

8782

-+struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)

8783

-+	__acquires(p->pi_lock)

8784

-+	__acquires(rq->lock);

8785

-+

8786

-+static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)

8787

-+	__releases(rq->lock)

8788

-+{

8789

-+	raw_spin_unlock(&rq->lock);

8790

-+}

8791

-+

8792

-+static inline void

8793

-+task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)

8794

-+	__releases(rq->lock)

8795

-+	__releases(p->pi_lock)

8796

-+{

8797

-+	raw_spin_unlock(&rq->lock);

8798

-+	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);

8799

-+}

8800

-+

8801

-+static inline void

8802

-+rq_lock(struct rq *rq, struct rq_flags *rf)

8803

-+	__acquires(rq->lock)

8804

-+{

8805

-+	raw_spin_lock(&rq->lock);

8806

-+}

8807

-+

8808

-+static inline void

8809

-+rq_unlock_irq(struct rq *rq, struct rq_flags *rf)

8810

-+	__releases(rq->lock)

8811

-+{

8812

-+	raw_spin_unlock_irq(&rq->lock);

8813

-+}

8814

-+

8815

-+static inline void

8816

-+rq_unlock(struct rq *rq, struct rq_flags *rf)

8817

-+	__releases(rq->lock)

8818

-+{

8819

-+	raw_spin_unlock(&rq->lock);

8820

-+}

8821

-+

8822

-+static inline struct rq *

8823

-+this_rq_lock_irq(struct rq_flags *rf)

8824

-+	__acquires(rq->lock)

8825

-+{

8826

-+	struct rq *rq;

8827

-+

8828

-+	local_irq_disable();

8829

-+	rq = this_rq();

8830

-+	raw_spin_lock(&rq->lock);

8831

-+

8832

-+	return rq;

8833

-+}

8834

-+

8835

-+extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass);

8836

-+extern void raw_spin_rq_unlock(struct rq *rq);

8837

-+

8838

-+static inline raw_spinlock_t *__rq_lockp(struct rq *rq)

8839

-+{

8840

-+	return &rq->lock;

8841

-+}

8842

-+

8843

-+static inline raw_spinlock_t *rq_lockp(struct rq *rq)

8844

-+{

8845

-+	return __rq_lockp(rq);

8846

-+}

8847

-+

8848

-+static inline void raw_spin_rq_lock(struct rq *rq)

8849

-+{

8850

-+	raw_spin_rq_lock_nested(rq, 0);

8851

-+}

8852

-+

8853

-+static inline void raw_spin_rq_lock_irq(struct rq *rq)

8854

-+{

8855

-+	local_irq_disable();

8856

-+	raw_spin_rq_lock(rq);

8857

-+}

8858

-+

8859

-+static inline void raw_spin_rq_unlock_irq(struct rq *rq)

8860

-+{

8861

-+	raw_spin_rq_unlock(rq);

8862

-+	local_irq_enable();

8863

-+}

8864

-+

8865

-+static inline int task_current(struct rq *rq, struct task_struct *p)

8866

-+{

8867

-+	return rq->curr == p;

8868

-+}

8869

-+

8870

-+static inline bool task_running(struct task_struct *p)

8871

-+{

8872

-+	return p->on_cpu;

8873

-+}

8874

-+

8875

-+extern int task_running_nice(struct task_struct *p);

8876

-+

8877

-+extern struct static_key_false sched_schedstats;

8878

-+

8879

-+#ifdef CONFIG_CPU_IDLE

8880

-+static inline void idle_set_state(struct rq *rq,

8881

-+				  struct cpuidle_state *idle_state)

8882

-+{

8883

-+	rq->idle_state = idle_state;

8884

-+}

8885

-+

8886

-+static inline struct cpuidle_state *idle_get_state(struct rq *rq)

8887

-+{

8888

-+	WARN_ON(!rcu_read_lock_held());

8889

-+	return rq->idle_state;

8890

-+}

8891

-+#else

8892

-+static inline void idle_set_state(struct rq *rq,

8893

-+				  struct cpuidle_state *idle_state)

8894

-+{

8895

-+}

8896

-+

8897

-+static inline struct cpuidle_state *idle_get_state(struct rq *rq)

8898

-+{

8899

-+	return NULL;

8900

-+}

8901

-+#endif

8902

-+

8903

-+static inline int cpu_of(const struct rq *rq)

8904

-+{

8905

-+#ifdef CONFIG_SMP

8906

-+	return rq->cpu;

8907

-+#else

8908

-+	return 0;

8909

-+#endif

8910

-+}

8911

-+

8912

-+#include "stats.h"

8913

-+

8914

-+#ifdef CONFIG_NO_HZ_COMMON

8915

-+#define NOHZ_BALANCE_KICK_BIT	0

8916

-+#define NOHZ_STATS_KICK_BIT	1

8917

-+

8918

-+#define NOHZ_BALANCE_KICK	BIT(NOHZ_BALANCE_KICK_BIT)

8919

-+#define NOHZ_STATS_KICK		BIT(NOHZ_STATS_KICK_BIT)

8920

-+

8921

-+#define NOHZ_KICK_MASK	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)

8922

-+

8923

-+#define nohz_flags(cpu)	(&cpu_rq(cpu)->nohz_flags)

8924

-+

8925

-+/* TODO: needed?

8926

-+extern void nohz_balance_exit_idle(struct rq *rq);

8927

-+#else

8928

-+static inline void nohz_balance_exit_idle(struct rq *rq) { }

8929

-+*/

8930

-+#endif

8931

-+

8932

-+#ifdef CONFIG_IRQ_TIME_ACCOUNTING

8933

-+struct irqtime {

8934

-+	u64			total;

8935

-+	u64			tick_delta;

8936

-+	u64			irq_start_time;

8937

-+	struct u64_stats_sync	sync;

8938

-+};

8939

-+

8940

-+DECLARE_PER_CPU(struct irqtime, cpu_irqtime);

8941

-+

8942

-+/*

8943

-+ * Returns the irqtime minus the softirq time computed by ksoftirqd.

8944

-+ * Otherwise ksoftirqd's sum_exec_runtime is substracted its own runtime

8945

-+ * and never move forward.

8946

-+ */

8947

-+static inline u64 irq_time_read(int cpu)

8948

-+{

8949

-+	struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);

8950

-+	unsigned int seq;

8951

-+	u64 total;

8952

-+

8953

-+	do {

8954

-+		seq = __u64_stats_fetch_begin(&irqtime->sync);

8955

-+		total = irqtime->total;

8956

-+	} while (__u64_stats_fetch_retry(&irqtime->sync, seq));

8957

-+

8958

-+	return total;

8959

-+}

8960

-+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */

8961

-+

8962

-+#ifdef CONFIG_CPU_FREQ

8963

-+DECLARE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);

8964

-+#endif /* CONFIG_CPU_FREQ */

8965

-+

8966

-+#ifdef CONFIG_NO_HZ_FULL

8967

-+extern int __init sched_tick_offload_init(void);

8968

-+#else

8969

-+static inline int sched_tick_offload_init(void) { return 0; }

8970

-+#endif

8971

-+

8972

-+#ifdef arch_scale_freq_capacity

8973

-+#ifndef arch_scale_freq_invariant

8974

-+#define arch_scale_freq_invariant()	(true)

8975

-+#endif

8976

-+#else /* arch_scale_freq_capacity */

8977

-+#define arch_scale_freq_invariant()	(false)

8978

-+#endif

8979

-+

8980

-+extern void schedule_idle(void);

8981

-+

8982

-+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)

8983

-+

8984

-+/*

8985

-+ * !! For sched_setattr_nocheck() (kernel) only !!

8986

-+ *

8987

-+ * This is actually gross. :(

8988

-+ *

8989

-+ * It is used to make schedutil kworker(s) higher priority than SCHED_DEADLINE

8990

-+ * tasks, but still be able to sleep. We need this on platforms that cannot

8991

-+ * atomically change clock frequency. Remove once fast switching will be

8992

-+ * available on such platforms.

8993

-+ *

8994

-+ * SUGOV stands for SchedUtil GOVernor.

8995

-+ */

8996

-+#define SCHED_FLAG_SUGOV	0x10000000

8997

-+

8998

-+#ifdef CONFIG_MEMBARRIER

8999

-+/*

9000

-+ * The scheduler provides memory barriers required by membarrier between:

9001

-+ * - prior user-space memory accesses and store to rq->membarrier_state,

9002

-+ * - store to rq->membarrier_state and following user-space memory accesses.

9003

-+ * In the same way it provides those guarantees around store to rq->curr.

9004

-+ */

9005

-+static inline void membarrier_switch_mm(struct rq *rq,

9006

-+					struct mm_struct *prev_mm,

9007

-+					struct mm_struct *next_mm)

9008

-+{

9009

-+	int membarrier_state;

9010

-+

9011

-+	if (prev_mm == next_mm)

9012

-+		return;

9013

-+

9014

-+	membarrier_state = atomic_read(&next_mm->membarrier_state);

9015

-+	if (READ_ONCE(rq->membarrier_state) == membarrier_state)

9016

-+		return;

9017

-+

9018

-+	WRITE_ONCE(rq->membarrier_state, membarrier_state);

9019

-+}

9020

-+#else

9021

-+static inline void membarrier_switch_mm(struct rq *rq,

9022

-+					struct mm_struct *prev_mm,

9023

-+					struct mm_struct *next_mm)

9024

-+{

9025

-+}

9026

-+#endif

9027

-+

9028

-+#ifdef CONFIG_NUMA

9029

-+extern int sched_numa_find_closest(const struct cpumask *cpus, int cpu);

9030

-+#else

9031

-+static inline int sched_numa_find_closest(const struct cpumask *cpus, int cpu)

9032

-+{

9033

-+	return nr_cpu_ids;

9034

-+}

9035

-+#endif

9036

-+

9037

-+extern void swake_up_all_locked(struct swait_queue_head *q);

9038

-+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);

9039

-+

9040

-+#ifdef CONFIG_PREEMPT_DYNAMIC

9041

-+extern int preempt_dynamic_mode;

9042

-+extern int sched_dynamic_mode(const char *str);

9043

-+extern void sched_dynamic_update(int mode);

9044

-+#endif

9045

-+

9046

-+static inline void nohz_run_idle_balance(int cpu) { }

9047

-+#endif /* ALT_SCHED_H */

9048

-diff --git a/kernel/sched/bmq.h b/kernel/sched/bmq.h

9049

-new file mode 100644

9050

-index 000000000000..be3ee4a553ca

9051

---- /dev/null

9052

-+++ b/kernel/sched/bmq.h

9053

-@@ -0,0 +1,111 @@

9054

-+#define ALT_SCHED_VERSION_MSG "sched/bmq: BMQ CPU Scheduler "ALT_SCHED_VERSION" by Alfred Chen.\n"

9055

-+

9056

-+/*

9057

-+ * BMQ only routines

9058

-+ */

9059

-+#define rq_switch_time(rq)	((rq)->clock - (rq)->last_ts_switch)

9060

-+#define boost_threshold(p)	(sched_timeslice_ns >>\

9061

-+				 (15 - MAX_PRIORITY_ADJ -  (p)->boost_prio))

9062

-+

9063

-+static inline void boost_task(struct task_struct *p)

9064

-+{

9065

-+	int limit;

9066

-+

9067

-+	switch (p->policy) {

9068

-+	case SCHED_NORMAL:

9069

-+		limit = -MAX_PRIORITY_ADJ;

9070

-+		break;

9071

-+	case SCHED_BATCH:

9072

-+	case SCHED_IDLE:

9073

-+		limit = 0;

9074

-+		break;

9075

-+	default:

9076

-+		return;

9077

-+	}

9078

-+

9079

-+	if (p->boost_prio > limit)

9080

-+		p->boost_prio--;

9081

-+}

9082

-+

9083

-+static inline void deboost_task(struct task_struct *p)

9084

-+{

9085

-+	if (p->boost_prio < MAX_PRIORITY_ADJ)

9086

-+		p->boost_prio++;

9087

-+}

9088

-+

9089

-+/*

9090

-+ * Common interfaces

9091

-+ */

9092

-+static inline void sched_timeslice_imp(const int timeslice_ms) {}

9093

-+

9094

-+static inline int

9095

-+task_sched_prio_normal(const struct task_struct *p, const struct rq *rq)

9096

-+{

9097

-+	return p->prio + p->boost_prio - MAX_RT_PRIO;

9098

-+}

9099

-+

9100

-+static inline int task_sched_prio(const struct task_struct *p)

9101

-+{

9102

-+	return (p->prio < MAX_RT_PRIO)? p->prio : MAX_RT_PRIO / 2 + (p->prio + p->boost_prio) / 2;

9103

-+}

9104

-+

9105

-+static inline int

9106

-+task_sched_prio_idx(const struct task_struct *p, const struct rq *rq)

9107

-+{

9108

-+	return task_sched_prio(p);

9109

-+}

9110

-+

9111

-+static inline int sched_prio2idx(int prio, struct rq *rq)

9112

-+{

9113

-+	return prio;

9114

-+}

9115

-+

9116

-+static inline int sched_idx2prio(int idx, struct rq *rq)

9117

-+{

9118

-+	return idx;

9119

-+}

9120

-+

9121

-+static inline void time_slice_expired(struct task_struct *p, struct rq *rq)

9122

-+{

9123

-+	p->time_slice = sched_timeslice_ns;

9124

-+

9125

-+	if (SCHED_FIFO != p->policy && task_on_rq_queued(p)) {

9126

-+		if (SCHED_RR != p->policy)

9127

-+			deboost_task(p);

9128

-+		requeue_task(p, rq);

9129

-+	}

9130

-+}

9131

-+

9132

-+static inline void sched_task_sanity_check(struct task_struct *p, struct rq *rq) {}

9133

-+

9134

-+inline int task_running_nice(struct task_struct *p)

9135

-+{

9136

-+	return (p->prio + p->boost_prio > DEFAULT_PRIO + MAX_PRIORITY_ADJ);

9137

-+}

9138

-+

9139

-+static void sched_task_fork(struct task_struct *p, struct rq *rq)

9140

-+{

9141

-+	p->boost_prio = (p->boost_prio < 0) ?

9142

-+		p->boost_prio + MAX_PRIORITY_ADJ : MAX_PRIORITY_ADJ;

9143

-+}

9144

-+

9145

-+static inline void do_sched_yield_type_1(struct task_struct *p, struct rq *rq)

9146

-+{

9147

-+	p->boost_prio = MAX_PRIORITY_ADJ;

9148

-+}

9149

-+

9150

-+#ifdef CONFIG_SMP

9151

-+static inline void sched_task_ttwu(struct task_struct *p)

9152

-+{

9153

-+	if(this_rq()->clock_task - p->last_ran > sched_timeslice_ns)

9154

-+		boost_task(p);

9155

-+}

9156

-+#endif

9157

-+

9158

-+static inline void sched_task_deactivate(struct task_struct *p, struct rq *rq)

9159

-+{

9160

-+	if (rq_switch_time(rq) < boost_threshold(p))

9161

-+		boost_task(p);

9162

-+}

9163

-+

9164

-+static inline void update_rq_time_edge(struct rq *rq) {}

9165

-diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c

9166

-index e7af18857371..3e38816b736e 100644

9167

---- a/kernel/sched/cpufreq_schedutil.c

9168

-+++ b/kernel/sched/cpufreq_schedutil.c

9169

-@@ -167,9 +167,14 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)

9170

- 	unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);

9171

-

9172

- 	sg_cpu->max = max;

9173

-+#ifndef CONFIG_SCHED_ALT

9174

- 	sg_cpu->bw_dl = cpu_bw_dl(rq);

9175

- 	sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(rq), max,

9176

- 					  FREQUENCY_UTIL, NULL);

9177

-+#else

9178

-+	sg_cpu->bw_dl = 0;

9179

-+	sg_cpu->util = rq_load_util(rq, max);

9180

-+#endif /* CONFIG_SCHED_ALT */

9181

- }

9182

-

9183

- /**

9184

-@@ -312,8 +317,10 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }

9185

-  */

9186

- static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)

9187

- {

9188

-+#ifndef CONFIG_SCHED_ALT

9189

- 	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)

9190

- 		sg_cpu->sg_policy->limits_changed = true;

9191

-+#endif

9192

- }

9193

-

9194

- static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,

9195

-@@ -607,6 +614,7 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)

9196

- 	}

9197

-

9198

- 	ret = sched_setattr_nocheck(thread, &attr);

9199

-+

9200

- 	if (ret) {

9201

- 		kthread_stop(thread);

9202

- 		pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);

9203

-@@ -839,7 +847,9 @@ cpufreq_governor_init(schedutil_gov);

9204

- #ifdef CONFIG_ENERGY_MODEL

9205

- static void rebuild_sd_workfn(struct work_struct *work)

9206

- {

9207

-+#ifndef CONFIG_SCHED_ALT

9208

- 	rebuild_sched_domains_energy();

9209

-+#endif /* CONFIG_SCHED_ALT */

9210

- }

9211

- static DECLARE_WORK(rebuild_sd_work, rebuild_sd_workfn);

9212

-

9213

-diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c

9214

-index 872e481d5098..f920c8b48ec1 100644

9215

---- a/kernel/sched/cputime.c

9216

-+++ b/kernel/sched/cputime.c

9217

-@@ -123,7 +123,7 @@ void account_user_time(struct task_struct *p, u64 cputime)

9218

- 	p->utime += cputime;

9219

- 	account_group_user_time(p, cputime);

9220

-

9221

--	index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;

9222

-+	index = task_running_nice(p) ? CPUTIME_NICE : CPUTIME_USER;

9223

-

9224

- 	/* Add user time to cpustat. */

9225

- 	task_group_account_field(p, index, cputime);

9226

-@@ -147,7 +147,7 @@ void account_guest_time(struct task_struct *p, u64 cputime)

9227

- 	p->gtime += cputime;

9228

-

9229

- 	/* Add guest time to cpustat. */

9230

--	if (task_nice(p) > 0) {

9231

-+	if (task_running_nice(p)) {

9232

- 		cpustat[CPUTIME_NICE] += cputime;

9233

- 		cpustat[CPUTIME_GUEST_NICE] += cputime;

9234

- 	} else {

9235

-@@ -270,7 +270,7 @@ static inline u64 account_other_time(u64 max)

9236

- #ifdef CONFIG_64BIT

9237

- static inline u64 read_sum_exec_runtime(struct task_struct *t)

9238

- {

9239

--	return t->se.sum_exec_runtime;

9240

-+	return tsk_seruntime(t);

9241

- }

9242

- #else

9243

- static u64 read_sum_exec_runtime(struct task_struct *t)

9244

-@@ -280,7 +280,7 @@ static u64 read_sum_exec_runtime(struct task_struct *t)

9245

- 	struct rq *rq;

9246

-

9247

- 	rq = task_rq_lock(t, &rf);

9248

--	ns = t->se.sum_exec_runtime;

9249

-+	ns = tsk_seruntime(t);

9250

- 	task_rq_unlock(rq, t, &rf);

9251

-

9252

- 	return ns;

9253

-@@ -612,7 +612,7 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,

9254

- void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)

9255

- {

9256

- 	struct task_cputime cputime = {

9257

--		.sum_exec_runtime = p->se.sum_exec_runtime,

9258

-+		.sum_exec_runtime = tsk_seruntime(p),

9259

- 	};

9260

-

9261

- 	task_cputime(p, &cputime.utime, &cputime.stime);

9262

-diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c

9263

-index 17a653b67006..17ab2fe34d7a 100644

9264

---- a/kernel/sched/debug.c

9265

-+++ b/kernel/sched/debug.c

9266

-@@ -8,6 +8,7 @@

9267

-  */

9268

- #include "sched.h"

9269

-

9270

-+#ifndef CONFIG_SCHED_ALT

9271

- /*

9272

-  * This allows printing both to /proc/sched_debug and

9273

-  * to the console

9274

-@@ -216,6 +217,7 @@ static const struct file_operations sched_scaling_fops = {

9275

- };

9276

-

9277

- #endif /* SMP */

9278

-+#endif /* !CONFIG_SCHED_ALT */

9279

-

9280

- #ifdef CONFIG_PREEMPT_DYNAMIC

9281

-

9282

-@@ -279,6 +281,7 @@ static const struct file_operations sched_dynamic_fops = {

9283

-

9284

- #endif /* CONFIG_PREEMPT_DYNAMIC */

9285

-

9286

-+#ifndef CONFIG_SCHED_ALT

9287

- __read_mostly bool sched_debug_verbose;

9288

-

9289

- static const struct seq_operations sched_debug_sops;

9290

-@@ -294,6 +297,7 @@ static const struct file_operations sched_debug_fops = {

9291

- 	.llseek		= seq_lseek,

9292

- 	.release	= seq_release,

9293

- };

9294

-+#endif /* !CONFIG_SCHED_ALT */

9295

-

9296

- static struct dentry *debugfs_sched;

9297

-

9298

-@@ -303,12 +307,15 @@ static __init int sched_init_debug(void)

9299

-

9300

- 	debugfs_sched = debugfs_create_dir("sched", NULL);

9301

-

9302

-+#ifndef CONFIG_SCHED_ALT

9303

- 	debugfs_create_file("features", 0644, debugfs_sched, NULL, &sched_feat_fops);

9304

- 	debugfs_create_bool("verbose", 0644, debugfs_sched, &sched_debug_verbose);

9305

-+#endif /* !CONFIG_SCHED_ALT */

9306

- #ifdef CONFIG_PREEMPT_DYNAMIC

9307

- 	debugfs_create_file("preempt", 0644, debugfs_sched, NULL, &sched_dynamic_fops);

9308

- #endif

9309

-

9310

-+#ifndef CONFIG_SCHED_ALT

9311

- 	debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);

9312

- 	debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);

9313

- 	debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);

9314

-@@ -336,11 +343,13 @@ static __init int sched_init_debug(void)

9315

- #endif

9316

-

9317

- 	debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);

9318

-+#endif /* !CONFIG_SCHED_ALT */

9319

-

9320

- 	return 0;

9321

- }

9322

- late_initcall(sched_init_debug);

9323

-

9324

-+#ifndef CONFIG_SCHED_ALT

9325

- #ifdef CONFIG_SMP

9326

-

9327

- static cpumask_var_t		sd_sysctl_cpus;

9328

-@@ -1063,6 +1072,7 @@ void proc_sched_set_task(struct task_struct *p)

9329

- 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));

9330

- #endif

9331

- }

9332

-+#endif /* !CONFIG_SCHED_ALT */

9333

-

9334

- void resched_latency_warn(int cpu, u64 latency)

9335

- {

9336

-diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c

9337

-index d17b0a5ce6ac..6ff77fc6b73a 100644

9338

---- a/kernel/sched/idle.c

9339

-+++ b/kernel/sched/idle.c

9340

-@@ -403,6 +403,7 @@ void cpu_startup_entry(enum cpuhp_state state)

9341

- 		do_idle();

9342

- }

9343

-

9344

-+#ifndef CONFIG_SCHED_ALT

9345

- /*

9346

-  * idle-task scheduling class.

9347

-  */

9348

-@@ -525,3 +526,4 @@ DEFINE_SCHED_CLASS(idle) = {

9349

- 	.switched_to		= switched_to_idle,

9350

- 	.update_curr		= update_curr_idle,

9351

- };

9352

-+#endif

9353

-diff --git a/kernel/sched/pds.h b/kernel/sched/pds.h

9354

-new file mode 100644

9355

-index 000000000000..0f1f0d708b77

9356

---- /dev/null

9357

-+++ b/kernel/sched/pds.h

9358

-@@ -0,0 +1,127 @@

9359

-+#define ALT_SCHED_VERSION_MSG "sched/pds: PDS CPU Scheduler "ALT_SCHED_VERSION" by Alfred Chen.\n"

9360

-+

9361

-+static int sched_timeslice_shift = 22;

9362

-+

9363

-+#define NORMAL_PRIO_MOD(x)	((x) & (NORMAL_PRIO_NUM - 1))

9364

-+

9365

-+/*

9366

-+ * Common interfaces

9367

-+ */

9368

-+static inline void sched_timeslice_imp(const int timeslice_ms)

9369

-+{

9370

-+	if (2 == timeslice_ms)

9371

-+		sched_timeslice_shift = 21;

9372

-+}

9373

-+

9374

-+static inline int

9375

-+task_sched_prio_normal(const struct task_struct *p, const struct rq *rq)

9376

-+{

9377

-+	s64 delta = p->deadline - rq->time_edge + NORMAL_PRIO_NUM - NICE_WIDTH;

9378

-+

9379

-+	if (WARN_ONCE(delta > NORMAL_PRIO_NUM - 1,

9380

-+		      "pds: task_sched_prio_normal() delta %lld\n", delta))

9381

-+		return NORMAL_PRIO_NUM - 1;

9382

-+

9383

-+	return (delta < 0) ? 0 : delta;

9384

-+}

9385

-+

9386

-+static inline int task_sched_prio(const struct task_struct *p)

9387

-+{

9388

-+	return (p->prio < MAX_RT_PRIO) ? p->prio :

9389

-+		MIN_NORMAL_PRIO + task_sched_prio_normal(p, task_rq(p));

9390

-+}

9391

-+

9392

-+static inline int

9393

-+task_sched_prio_idx(const struct task_struct *p, const struct rq *rq)

9394

-+{

9395

-+	return (p->prio < MAX_RT_PRIO) ? p->prio : MIN_NORMAL_PRIO +

9396

-+		NORMAL_PRIO_MOD(task_sched_prio_normal(p, rq) + rq->time_edge);

9397

-+}

9398

-+

9399

-+static inline int sched_prio2idx(int prio, struct rq *rq)

9400

-+{

9401

-+	return (IDLE_TASK_SCHED_PRIO == prio || prio < MAX_RT_PRIO) ? prio :

9402

-+		MIN_NORMAL_PRIO + NORMAL_PRIO_MOD((prio - MIN_NORMAL_PRIO) +

9403

-+						  rq->time_edge);

9404

-+}

9405

-+

9406

-+static inline int sched_idx2prio(int idx, struct rq *rq)

9407

-+{

9408

-+	return (idx < MAX_RT_PRIO) ? idx : MIN_NORMAL_PRIO +

9409

-+		NORMAL_PRIO_MOD((idx - MIN_NORMAL_PRIO) + NORMAL_PRIO_NUM -

9410

-+				NORMAL_PRIO_MOD(rq->time_edge));

9411

-+}

9412

-+

9413

-+static inline void sched_renew_deadline(struct task_struct *p, const struct rq *rq)

9414

-+{

9415

-+	if (p->prio >= MAX_RT_PRIO)

9416

-+		p->deadline = (rq->clock >> sched_timeslice_shift) +

9417

-+			p->static_prio - (MAX_PRIO - NICE_WIDTH);

9418

-+}

9419

-+

9420

-+int task_running_nice(struct task_struct *p)

9421

-+{

9422

-+	return (p->prio > DEFAULT_PRIO);

9423

-+}

9424

-+

9425

-+static inline void update_rq_time_edge(struct rq *rq)

9426

-+{

9427

-+	struct list_head head;

9428

-+	u64 old = rq->time_edge;

9429

-+	u64 now = rq->clock >> sched_timeslice_shift;

9430

-+	u64 prio, delta;

9431

-+

9432

-+	if (now == old)

9433

-+		return;

9434

-+

9435

-+	delta = min_t(u64, NORMAL_PRIO_NUM, now - old);

9436

-+	INIT_LIST_HEAD(&head);

9437

-+

9438

-+	for_each_set_bit(prio, &rq->queue.bitmap[2], delta)

9439

-+		list_splice_tail_init(rq->queue.heads + MIN_NORMAL_PRIO +

9440

-+				      NORMAL_PRIO_MOD(prio + old), &head);

9441

-+

9442

-+	rq->queue.bitmap[2] = (NORMAL_PRIO_NUM == delta) ? 0UL :

9443

-+		rq->queue.bitmap[2] >> delta;

9444

-+	rq->time_edge = now;

9445

-+	if (!list_empty(&head)) {

9446

-+		u64 idx = MIN_NORMAL_PRIO + NORMAL_PRIO_MOD(now);

9447

-+		struct task_struct *p;

9448

-+

9449

-+		list_for_each_entry(p, &head, sq_node)

9450

-+			p->sq_idx = idx;

9451

-+

9452

-+		list_splice(&head, rq->queue.heads + idx);

9453

-+		rq->queue.bitmap[2] |= 1UL;

9454

-+	}

9455

-+}

9456

-+

9457

-+static inline void time_slice_expired(struct task_struct *p, struct rq *rq)

9458

-+{

9459

-+	p->time_slice = sched_timeslice_ns;

9460

-+	sched_renew_deadline(p, rq);

9461

-+	if (SCHED_FIFO != p->policy && task_on_rq_queued(p))

9462

-+		requeue_task(p, rq);

9463

-+}

9464

-+

9465

-+static inline void sched_task_sanity_check(struct task_struct *p, struct rq *rq)

9466

-+{

9467

-+	u64 max_dl = rq->time_edge + NICE_WIDTH - 1;

9468

-+	if (unlikely(p->deadline > max_dl))

9469

-+		p->deadline = max_dl;

9470

-+}

9471

-+

9472

-+static void sched_task_fork(struct task_struct *p, struct rq *rq)

9473

-+{

9474

-+	sched_renew_deadline(p, rq);

9475

-+}

9476

-+

9477

-+static inline void do_sched_yield_type_1(struct task_struct *p, struct rq *rq)

9478

-+{

9479

-+	time_slice_expired(p, rq);

9480

-+}

9481

-+

9482

-+#ifdef CONFIG_SMP

9483

-+static inline void sched_task_ttwu(struct task_struct *p) {}

9484

-+#endif

9485

-+static inline void sched_task_deactivate(struct task_struct *p, struct rq *rq) {}

9486

-diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c

9487

-index a554e3bbab2b..3e56f5e6ff5c 100644

9488

---- a/kernel/sched/pelt.c

9489

-+++ b/kernel/sched/pelt.c

9490

-@@ -270,6 +270,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)

9491

- 	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);

9492

- }

9493

-

9494

-+#ifndef CONFIG_SCHED_ALT

9495

- /*

9496

-  * sched_entity:

9497

-  *

9498

-@@ -387,8 +388,9 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)

9499

-

9500

- 	return 0;

9501

- }

9502

-+#endif

9503

-

9504

--#ifdef CONFIG_SCHED_THERMAL_PRESSURE

9505

-+#if defined(CONFIG_SCHED_THERMAL_PRESSURE) && !defined(CONFIG_SCHED_ALT)

9506

- /*

9507

-  * thermal:

9508

-  *

9509

-diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h

9510

-index e06071bf3472..adf567df34d4 100644

9511

---- a/kernel/sched/pelt.h

9512

-+++ b/kernel/sched/pelt.h

9513

-@@ -1,13 +1,15 @@

9514

- #ifdef CONFIG_SMP

9515

- #include "sched-pelt.h"

9516

-

9517

-+#ifndef CONFIG_SCHED_ALT

9518

- int __update_load_avg_blocked_se(u64 now, struct sched_entity *se);

9519

- int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se);

9520

- int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);

9521

- int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);

9522

- int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);

9523

-+#endif

9524

-

9525

--#ifdef CONFIG_SCHED_THERMAL_PRESSURE

9526

-+#if defined(CONFIG_SCHED_THERMAL_PRESSURE) && !defined(CONFIG_SCHED_ALT)

9527

- int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity);

9528

-

9529

- static inline u64 thermal_load_avg(struct rq *rq)

9530

-@@ -42,6 +44,7 @@ static inline u32 get_pelt_divider(struct sched_avg *avg)

9531

- 	return LOAD_AVG_MAX - 1024 + avg->period_contrib;

9532

- }

9533

-

9534

-+#ifndef CONFIG_SCHED_ALT

9535

- static inline void cfs_se_util_change(struct sched_avg *avg)

9536

- {

9537

- 	unsigned int enqueued;

9538

-@@ -153,9 +156,11 @@ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)

9539

- 	return rq_clock_pelt(rq_of(cfs_rq));

9540

- }

9541

- #endif

9542

-+#endif /* CONFIG_SCHED_ALT */

9543

-

9544

- #else

9545

-

9546

-+#ifndef CONFIG_SCHED_ALT

9547

- static inline int

9548

- update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)

9549

- {

9550

-@@ -173,6 +178,7 @@ update_dl_rq_load_avg(u64 now, struct rq *rq, int running)

9551

- {

9552

- 	return 0;

9553

- }

9554

-+#endif

9555

-

9556

- static inline int

9557

- update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)

9558

-diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h

9559

-index 3d3e5793e117..c1d976ef623f 100644

9560

---- a/kernel/sched/sched.h

9561

-+++ b/kernel/sched/sched.h

9562

-@@ -2,6 +2,10 @@

9563

- /*

9564

-  * Scheduler internal types and methods:

9565

-  */

9566

-+#ifdef CONFIG_SCHED_ALT

9567

-+#include "alt_sched.h"

9568

-+#else

9569

-+

9570

- #include <linux/sched.h>

9571

-

9572

- #include <linux/sched/autogroup.h>

9573

-@@ -3064,3 +3068,8 @@ extern int sched_dynamic_mode(const char *str);

9574

- extern void sched_dynamic_update(int mode);

9575

- #endif

9576

-

9577

-+static inline int task_running_nice(struct task_struct *p)

9578

-+{

9579

-+	return (task_nice(p) > 0);

9580

-+}

9581

-+#endif /* !CONFIG_SCHED_ALT */

9582

-diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c

9583

-index 3f93fc3b5648..528b71e144e9 100644

9584

---- a/kernel/sched/stats.c

9585

-+++ b/kernel/sched/stats.c

9586

-@@ -22,8 +22,10 @@ static int show_schedstat(struct seq_file *seq, void *v)

9587

- 	} else {

9588

- 		struct rq *rq;

9589

- #ifdef CONFIG_SMP

9590

-+#ifndef CONFIG_SCHED_ALT

9591

- 		struct sched_domain *sd;

9592

- 		int dcount = 0;

9593

-+#endif

9594

- #endif

9595

- 		cpu = (unsigned long)(v - 2);

9596

- 		rq = cpu_rq(cpu);

9597

-@@ -40,6 +42,7 @@ static int show_schedstat(struct seq_file *seq, void *v)

9598

- 		seq_printf(seq, "\n");

9599

-

9600

- #ifdef CONFIG_SMP

9601

-+#ifndef CONFIG_SCHED_ALT

9602

- 		/* domain-specific stats */

9603

- 		rcu_read_lock();

9604

- 		for_each_domain(cpu, sd) {

9605

-@@ -68,6 +71,7 @@ static int show_schedstat(struct seq_file *seq, void *v)

9606

- 			    sd->ttwu_move_balance);

9607

- 		}

9608

- 		rcu_read_unlock();

9609

-+#endif

9610

- #endif

9611

- 	}

9612

- 	return 0;

9613

-diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c

9614

-index 4e8698e62f07..36c61551252e 100644

9615

---- a/kernel/sched/topology.c

9616

-+++ b/kernel/sched/topology.c

9617

-@@ -4,6 +4,7 @@

9618

-  */

9619

- #include "sched.h"

9620

-

9621

-+#ifndef CONFIG_SCHED_ALT

9622

- DEFINE_MUTEX(sched_domains_mutex);

9623

-

9624

- /* Protected by sched_domains_mutex: */

9625

-@@ -1382,8 +1383,10 @@ static void asym_cpu_capacity_scan(void)

9626

-  */

9627

-

9628

- static int default_relax_domain_level = -1;

9629

-+#endif /* CONFIG_SCHED_ALT */

9630

- int sched_domain_level_max;

9631

-

9632

-+#ifndef CONFIG_SCHED_ALT

9633

- static int __init setup_relax_domain_level(char *str)

9634

- {

9635

- 	if (kstrtoint(str, 0, &default_relax_domain_level))

9636

-@@ -1619,6 +1622,7 @@ sd_init(struct sched_domain_topology_level *tl,

9637

-

9638

- 	return sd;

9639

- }

9640

-+#endif /* CONFIG_SCHED_ALT */

9641

-

9642

- /*

9643

-  * Topology list, bottom-up.

9644

-@@ -1648,6 +1652,7 @@ void set_sched_topology(struct sched_domain_topology_level *tl)

9645

- 	sched_domain_topology = tl;

9646

- }

9647

-

9648

-+#ifndef CONFIG_SCHED_ALT

9649

- #ifdef CONFIG_NUMA

9650

-

9651

- static const struct cpumask *sd_numa_mask(int cpu)

9652

-@@ -2516,3 +2521,17 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],

9653

- 	partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);

9654

- 	mutex_unlock(&sched_domains_mutex);

9655

- }

9656

-+#else /* CONFIG_SCHED_ALT */

9657

-+void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],

9658

-+			     struct sched_domain_attr *dattr_new)

9659

-+{}

9660

-+

9661

-+#ifdef CONFIG_NUMA

9662

-+int __read_mostly		node_reclaim_distance = RECLAIM_DISTANCE;

9663

-+

9664

-+int sched_numa_find_closest(const struct cpumask *cpus, int cpu)

9665

-+{

9666

-+	return best_mask_cpu(cpu, cpus);

9667

-+}

9668

-+#endif /* CONFIG_NUMA */

9669

-+#endif

9670

-diff --git a/kernel/sysctl.c b/kernel/sysctl.c

9671

-index 083be6af29d7..09fc6281d488 100644

9672

---- a/kernel/sysctl.c

9673

-+++ b/kernel/sysctl.c

9674

-@@ -122,6 +122,10 @@ static unsigned long long_max = LONG_MAX;

9675

- static int one_hundred = 100;

9676

- static int two_hundred = 200;

9677

- static int one_thousand = 1000;

9678

-+#ifdef CONFIG_SCHED_ALT

9679

-+static int __maybe_unused zero = 0;

9680

-+extern int sched_yield_type;

9681

-+#endif

9682

- #ifdef CONFIG_PRINTK

9683

- static int ten_thousand = 10000;

9684

- #endif

9685

-@@ -1771,6 +1775,24 @@ int proc_do_static_key(struct ctl_table *table, int write,

9686

- }

9687

-

9688

- static struct ctl_table kern_table[] = {

9689

-+#ifdef CONFIG_SCHED_ALT

9690

-+/* In ALT, only supported "sched_schedstats" */

9691

-+#ifdef CONFIG_SCHED_DEBUG

9692

-+#ifdef CONFIG_SMP

9693

-+#ifdef CONFIG_SCHEDSTATS

9694

-+	{

9695

-+		.procname	= "sched_schedstats",

9696

-+		.data		= NULL,

9697

-+		.maxlen		= sizeof(unsigned int),

9698

-+		.mode		= 0644,

9699

-+		.proc_handler	= sysctl_schedstats,

9700

-+		.extra1		= SYSCTL_ZERO,

9701

-+		.extra2		= SYSCTL_ONE,

9702

-+	},

9703

-+#endif /* CONFIG_SCHEDSTATS */

9704

-+#endif /* CONFIG_SMP */

9705

-+#endif /* CONFIG_SCHED_DEBUG */

9706

-+#else  /* !CONFIG_SCHED_ALT */

9707

- 	{

9708

- 		.procname	= "sched_child_runs_first",

9709

- 		.data		= &sysctl_sched_child_runs_first,

9710

-@@ -1901,6 +1923,7 @@ static struct ctl_table kern_table[] = {

9711

- 		.extra2		= SYSCTL_ONE,

9712

- 	},

9713

- #endif

9714

-+#endif /* !CONFIG_SCHED_ALT */

9715

- #ifdef CONFIG_PROVE_LOCKING

9716

- 	{

9717

- 		.procname	= "prove_locking",

9718

-@@ -2477,6 +2500,17 @@ static struct ctl_table kern_table[] = {

9719

- 		.proc_handler	= proc_dointvec,

9720

- 	},

9721

- #endif

9722

-+#ifdef CONFIG_SCHED_ALT

9723

-+	{

9724

-+		.procname	= "yield_type",

9725

-+		.data		= &sched_yield_type,

9726

-+		.maxlen		= sizeof (int),

9727

-+		.mode		= 0644,

9728

-+		.proc_handler	= &proc_dointvec_minmax,

9729

-+		.extra1		= &zero,

9730

-+		.extra2		= &two,

9731

-+	},

9732

-+#endif

9733

- #if defined(CONFIG_S390) && defined(CONFIG_SMP)

9734

- 	{

9735

- 		.procname	= "spin_retry",

9736

-diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c

9737

-index 0ea8702eb516..a27a0f3a654d 100644

9738

---- a/kernel/time/hrtimer.c

9739

-+++ b/kernel/time/hrtimer.c

9740

-@@ -2088,8 +2088,10 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,

9741

- 	int ret = 0;

9742

- 	u64 slack;

9743

-

9744

-+#ifndef CONFIG_SCHED_ALT

9745

- 	slack = current->timer_slack_ns;

9746

- 	if (dl_task(current) || rt_task(current))

9747

-+#endif

9748

- 		slack = 0;

9749

-

9750

- 	hrtimer_init_sleeper_on_stack(&t, clockid, mode);

9751

-diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c

9752

-index 96b4e7810426..83457e8bb5d2 100644

9753

---- a/kernel/time/posix-cpu-timers.c

9754

-+++ b/kernel/time/posix-cpu-timers.c

9755

-@@ -216,7 +216,7 @@ static void task_sample_cputime(struct task_struct *p, u64 *samples)

9756

- 	u64 stime, utime;

9757

-

9758

- 	task_cputime(p, &utime, &stime);

9759

--	store_samples(samples, stime, utime, p->se.sum_exec_runtime);

9760

-+	store_samples(samples, stime, utime, tsk_seruntime(p));

9761

- }

9762

-

9763

- static void proc_sample_cputime_atomic(struct task_cputime_atomic *at,

9764

-@@ -859,6 +859,7 @@ static void collect_posix_cputimers(struct posix_cputimers *pct, u64 *samples,

9765

- 	}

9766

- }

9767

-

9768

-+#ifndef CONFIG_SCHED_ALT

9769

- static inline void check_dl_overrun(struct task_struct *tsk)

9770

- {

9771

- 	if (tsk->dl.dl_overrun) {

9772

-@@ -866,6 +867,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)

9773

- 		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);

9774

- 	}

9775

- }

9776

-+#endif

9777

-

9778

- static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)

9779

- {

9780

-@@ -893,8 +895,10 @@ static void check_thread_timers(struct task_struct *tsk,

9781

- 	u64 samples[CPUCLOCK_MAX];

9782

- 	unsigned long soft;

9783

-

9784

-+#ifndef CONFIG_SCHED_ALT

9785

- 	if (dl_task(tsk))

9786

- 		check_dl_overrun(tsk);

9787

-+#endif

9788

-

9789

- 	if (expiry_cache_is_inactive(pct))

9790

- 		return;

9791

-@@ -908,7 +912,7 @@ static void check_thread_timers(struct task_struct *tsk,

9792

- 	soft = task_rlimit(tsk, RLIMIT_RTTIME);

9793

- 	if (soft != RLIM_INFINITY) {

9794

- 		/* Task RT timeout is accounted in jiffies. RTTIME is usec */

9795

--		unsigned long rttime = tsk->rt.timeout * (USEC_PER_SEC / HZ);

9796

-+		unsigned long rttime = tsk_rttimeout(tsk) * (USEC_PER_SEC / HZ);

9797

- 		unsigned long hard = task_rlimit_max(tsk, RLIMIT_RTTIME);

9798

-

9799

- 		/* At the hard limit, send SIGKILL. No further action. */

9800

-@@ -1144,8 +1148,10 @@ static inline bool fastpath_timer_check(struct task_struct *tsk)

9801

- 			return true;

9802

- 	}

9803

-

9804

-+#ifndef CONFIG_SCHED_ALT

9805

- 	if (dl_task(tsk) && tsk->dl.dl_overrun)

9806

- 		return true;

9807

-+#endif

9808

-

9809

- 	return false;

9810

- }

9811

-diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c

9812

-index adf7ef194005..11c8f36e281b 100644

9813

---- a/kernel/trace/trace_selftest.c

9814

-+++ b/kernel/trace/trace_selftest.c

9815

-@@ -1052,10 +1052,15 @@ static int trace_wakeup_test_thread(void *data)

9816

- {

9817

- 	/* Make this a -deadline thread */

9818

- 	static const struct sched_attr attr = {

9819

-+#ifdef CONFIG_SCHED_ALT

9820

-+		/* No deadline on BMQ/PDS, use RR */

9821

-+		.sched_policy = SCHED_RR,

9822

-+#else

9823

- 		.sched_policy = SCHED_DEADLINE,

9824

- 		.sched_runtime = 100000ULL,

9825

- 		.sched_deadline = 10000000ULL,

9826

- 		.sched_period = 10000000ULL

9827

-+#endif

9828

- 	};

9829

- 	struct wakeup_test_data *x = data;

9830

-

9831

9832

diff --git a/5021_BMQ-and-PDS-gentoo-defaults.patch b/5021_BMQ-and-PDS-gentoo-defaults.patch

9833

deleted file mode 100644

9834

index d449eec4..00000000

9835

--- a/5021_BMQ-and-PDS-gentoo-defaults.patch

9836

+++ /dev/null

9837

@@ -1,13 +0,0 @@

9838

---- a/init/Kconfig	2021-04-27 07:38:30.556467045 -0400

9839

-+++ b/init/Kconfig	2021-04-27 07:39:32.956412800 -0400

9840

-@@ -780,8 +780,9 @@ config GENERIC_SCHED_CLOCK

9841

- menu "Scheduler features"

9842

-

9843

- menuconfig SCHED_ALT

9844

-+	depends on X86_64

9845

- 	bool "Alternative CPU Schedulers"

9846

--	default y

9847

-+	default n

9848

- 	help

9849

- 	  This feature enable alternative CPU scheduler"

9850

-
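Note on the removed pds.h helpers: sched_prio2idx() and sched_idx2prio() in the deleted patch map a normal task's priority to a queue slot by rotating it with rq->time_edge, and back again. The following is a minimal user-space sketch of that rotation, not the kernel code itself. The constant values for MIN_NORMAL_PRIO and NORMAL_PRIO_NUM are assumptions chosen for illustration (the diff above does not show their definitions), the helper names prio2idx, idx2prio and round_trip_ok are hypothetical, the time edge is passed as a plain integer instead of a struct rq, and the IDLE_TASK_SCHED_PRIO special case is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the diff above. */
#define MAX_RT_PRIO      100
#define MIN_NORMAL_PRIO  128
#define NORMAL_PRIO_NUM  64   /* must be a power of two for the mask below */

/* Same masking trick as the NORMAL_PRIO_MOD() macro in the removed pds.h. */
#define NORMAL_PRIO_MOD(x) ((x) & (uint64_t)(NORMAL_PRIO_NUM - 1))

/* Priority -> queue slot: normal priorities are rotated by the time edge. */
static int prio2idx(int prio, uint64_t time_edge)
{
	if (prio < MAX_RT_PRIO)
		return prio;	/* real-time levels are not rotated */
	return MIN_NORMAL_PRIO +
	       (int)NORMAL_PRIO_MOD((uint64_t)(prio - MIN_NORMAL_PRIO) + time_edge);
}

/* Queue slot -> priority: undo the rotation applied by prio2idx(). */
static int idx2prio(int idx, uint64_t time_edge)
{
	if (idx < MAX_RT_PRIO)
		return idx;
	return MIN_NORMAL_PRIO +
	       (int)NORMAL_PRIO_MOD((uint64_t)(idx - MIN_NORMAL_PRIO) +
				    NORMAL_PRIO_NUM - NORMAL_PRIO_MOD(time_edge));
}

/* Returns 1 if every normal priority survives a prio -> idx -> prio round trip. */
static int round_trip_ok(uint64_t time_edge)
{
	int prio;

	for (prio = MIN_NORMAL_PRIO; prio < MIN_NORMAL_PRIO + NORMAL_PRIO_NUM; prio++) {
		if (idx2prio(prio2idx(prio, time_edge), time_edge) != prio)
			return 0;
	}
	return 1;
}
```

Under these assumptions, calling round_trip_ok() from a small main() with any time edge should return 1, because the two functions add and subtract the same offset modulo NORMAL_PRIO_NUM. That is why update_rq_time_edge() in the removed code can advance the edge without touching tasks in slots that have not yet expired.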

Gentoo Archives: gentoo-commits