Gentoo Archives: gentoo-commits

From: Mike Pagano <mpagano@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] proj/linux-patches:4.4 commit in: /
Date: Sat, 28 Jul 2018 10:37:21
Message-Id: 1532774214.d2c4861cad10ea6344c01b2d2781c3d29a8a2c9b.mpagano@gentoo
1 commit: d2c4861cad10ea6344c01b2d2781c3d29a8a2c9b
2 Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
3 AuthorDate: Sat Jul 28 10:36:54 2018 +0000
4 Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
5 CommitDate: Sat Jul 28 10:36:54 2018 +0000
6 URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=d2c4861c
7
8 Linux patch 4.4.144 and 4.4.145
9
10 0000_README | 8 +
11 1143_linux-4.4.144.patch | 4228 ++++++++++++++++++++++++++++++++++++++++++++++
12 1144_linux-4.4.145.patch | 1006 +++++++++++
13 3 files changed, 5242 insertions(+)
14
15 diff --git a/0000_README b/0000_README
16 index 42e6d1f..5149ed7 100644
17 --- a/0000_README
18 +++ b/0000_README
19 @@ -615,6 +615,14 @@ Patch: 1142_linux-4.4.143.patch
20 From: http://www.kernel.org
21 Desc: Linux 4.4.143
22
23 +Patch: 1143_linux-4.4.144.patch
24 +From: http://www.kernel.org
25 +Desc: Linux 4.4.144
26 +
27 +Patch: 1144_linux-4.4.145.patch
28 +From: http://www.kernel.org
29 +Desc: Linux 4.4.145
30 +
31 Patch: 1500_XATTR_USER_PREFIX.patch
32 From: https://bugs.gentoo.org/show_bug.cgi?id=470644
33 Desc: Support for namespace user.pax.* on tmpfs.
34
35 diff --git a/1143_linux-4.4.144.patch b/1143_linux-4.4.144.patch
36 new file mode 100644
37 index 0000000..d0155cc
38 --- /dev/null
39 +++ b/1143_linux-4.4.144.patch
40 @@ -0,0 +1,4228 @@
41 +diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
42 +index ea6a043f5beb..50f95689ab38 100644
43 +--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
44 ++++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
45 +@@ -276,6 +276,7 @@ What: /sys/devices/system/cpu/vulnerabilities
46 + /sys/devices/system/cpu/vulnerabilities/meltdown
47 + /sys/devices/system/cpu/vulnerabilities/spectre_v1
48 + /sys/devices/system/cpu/vulnerabilities/spectre_v2
49 ++ /sys/devices/system/cpu/vulnerabilities/spec_store_bypass
50 + Date: January 2018
51 + Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
52 + Description: Information about CPU vulnerabilities
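
The new spec_store_bypass entry joins the existing vulnerability files, and all of them can be read from userspace like any other sysfs attribute. A minimal userspace sketch (illustrative only, not part of the patch) that prints the status of each file documented above:

/* Illustrative only: read the sysfs vulnerability files documented above. */
#include <stdio.h>

int main(void)
{
	static const char *files[] = {
		"meltdown", "spectre_v1", "spectre_v2", "spec_store_bypass",
	};
	char path[128], line[256];
	unsigned int i;

	for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
		FILE *fp;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/vulnerabilities/%s", files[i]);
		fp = fopen(path, "r");
		if (!fp) {
			printf("%-17s: not exposed by this kernel\n", files[i]);
			continue;
		}
		if (fgets(line, sizeof(line), fp))
			printf("%-17s: %s", files[i], line);
		fclose(fp);
	}
	return 0;
}

On a kernel with this patch applied, the spec_store_bypass line reports the active mitigation (for example "Mitigation: Speculative Store Bypass disabled via prctl and seccomp"), mirroring the ssb_strings[] table further down in this patch.
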
53 +diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
54 +index e60d0b5809c1..3fd53e193b7f 100644
55 +--- a/Documentation/kernel-parameters.txt
56 ++++ b/Documentation/kernel-parameters.txt
57 +@@ -2460,6 +2460,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
58 + allow data leaks with this option, which is equivalent
59 + to spectre_v2=off.
60 +
61 ++ nospec_store_bypass_disable
62 ++ [HW] Disable all mitigations for the Speculative Store Bypass vulnerability
63 ++
64 + noxsave [BUGS=X86] Disables x86 extended register state save
65 + and restore using xsave. The kernel will fallback to
66 + enabling legacy floating-point and sse state.
67 +@@ -3623,6 +3626,48 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
68 + Not specifying this option is equivalent to
69 + spectre_v2=auto.
70 +
71 ++ spec_store_bypass_disable=
72 ++ [HW] Control Speculative Store Bypass (SSB) Disable mitigation
73 ++ (Speculative Store Bypass vulnerability)
74 ++
75 ++ Certain CPUs are vulnerable to an exploit against a
76 ++ a common industry wide performance optimization known
77 ++ as "Speculative Store Bypass" in which recent stores
78 ++ to the same memory location may not be observed by
79 ++ later loads during speculative execution. The idea
80 ++ is that such stores are unlikely and that they can
81 ++ be detected prior to instruction retirement at the
82 ++ end of a particular speculation execution window.
83 ++
84 ++ In vulnerable processors, the speculatively forwarded
85 ++ store can be used in a cache side channel attack, for
86 ++ example to read memory to which the attacker does not
87 ++ directly have access (e.g. inside sandboxed code).
88 ++
89 ++ This parameter controls whether the Speculative Store
90 ++ Bypass optimization is used.
91 ++
92 ++ on - Unconditionally disable Speculative Store Bypass
93 ++ off - Unconditionally enable Speculative Store Bypass
94 ++ auto - Kernel detects whether the CPU model contains an
95 ++ implementation of Speculative Store Bypass and
96 ++ picks the most appropriate mitigation. If the
97 ++ CPU is not vulnerable, "off" is selected. If the
98 ++ CPU is vulnerable the default mitigation is
99 ++ architecture and Kconfig dependent. See below.
100 ++ prctl - Control Speculative Store Bypass per thread
101 ++ via prctl. Speculative Store Bypass is enabled
102 ++ for a process by default. The state of the control
103 ++ is inherited on fork.
104 ++ seccomp - Same as "prctl" above, but all seccomp threads
105 ++ will disable SSB unless they explicitly opt out.
106 ++
107 ++ Not specifying this option is equivalent to
108 ++ spec_store_bypass_disable=auto.
109 ++
110 ++ Default mitigations:
111 ++ X86: If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
112 ++
113 + spia_io_base= [HW,MTD]
114 + spia_fio_base=
115 + spia_pedr=
116 +diff --git a/Documentation/spec_ctrl.txt b/Documentation/spec_ctrl.txt
117 +new file mode 100644
118 +index 000000000000..32f3d55c54b7
119 +--- /dev/null
120 ++++ b/Documentation/spec_ctrl.txt
121 +@@ -0,0 +1,94 @@
122 ++===================
123 ++Speculation Control
124 ++===================
125 ++
126 ++Quite some CPUs have speculation-related misfeatures which are in
127 ++fact vulnerabilities causing data leaks in various forms even across
128 ++privilege domains.
129 ++
130 ++The kernel provides mitigation for such vulnerabilities in various
131 ++forms. Some of these mitigations are compile-time configurable and some
132 ++can be supplied on the kernel command line.
133 ++
134 ++There is also a class of mitigations which are very expensive, but they can
135 ++be restricted to a certain set of processes or tasks in controlled
136 ++environments. The mechanism to control these mitigations is via
137 ++:manpage:`prctl(2)`.
138 ++
139 ++There are two prctl options which are related to this:
140 ++
141 ++ * PR_GET_SPECULATION_CTRL
142 ++
143 ++ * PR_SET_SPECULATION_CTRL
144 ++
145 ++PR_GET_SPECULATION_CTRL
146 ++-----------------------
147 ++
148 ++PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
149 ++which is selected with arg2 of prctl(2). The return value uses bits 0-3 with
150 ++the following meaning:
151 ++
152 ++==== ===================== ===================================================
153 ++Bit Define Description
154 ++==== ===================== ===================================================
155 ++0 PR_SPEC_PRCTL Mitigation can be controlled per task by
156 ++ PR_SET_SPECULATION_CTRL.
157 ++1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
158 ++ disabled.
159 ++2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
160 ++ enabled.
161 ++3 PR_SPEC_FORCE_DISABLE Same as PR_SPEC_DISABLE, but cannot be undone. A
162 ++ subsequent prctl(..., PR_SPEC_ENABLE) will fail.
163 ++==== ===================== ===================================================
164 ++
165 ++If all bits are 0 the CPU is not affected by the speculation misfeature.
166 ++
167 ++If PR_SPEC_PRCTL is set, then the per-task control of the mitigation is
168 ++available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
169 ++misfeature will fail.
170 ++
171 ++PR_SET_SPECULATION_CTRL
172 ++-----------------------
173 ++
174 ++PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
175 ++is selected by arg2 of :manpage:`prctl(2)` per task. arg3 is used to hand
176 ++in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE or
177 ++PR_SPEC_FORCE_DISABLE.
178 ++
179 ++Common error codes
180 ++------------------
181 ++======= =================================================================
182 ++Value Meaning
183 ++======= =================================================================
184 ++EINVAL The prctl is not implemented by the architecture or unused
185 ++ prctl(2) arguments are not 0.
186 ++
187 ++ENODEV arg2 is selecting a not supported speculation misfeature.
188 ++======= =================================================================
189 ++
190 ++PR_SET_SPECULATION_CTRL error codes
191 ++-----------------------------------
192 ++======= =================================================================
193 ++Value Meaning
194 ++======= =================================================================
195 ++0 Success
196 ++
197 ++ERANGE arg3 is incorrect, i.e. it's neither PR_SPEC_ENABLE nor
198 ++ PR_SPEC_DISABLE nor PR_SPEC_FORCE_DISABLE.
199 ++
200 ++ENXIO Control of the selected speculation misfeature is not possible.
201 ++ See PR_GET_SPECULATION_CTRL.
202 ++
203 ++EPERM Speculation was disabled with PR_SPEC_FORCE_DISABLE and caller
204 ++ tried to enable it again.
205 ++======= =================================================================
206 ++
207 ++Speculation misfeature controls
208 ++-------------------------------
209 ++- PR_SPEC_STORE_BYPASS: Speculative Store Bypass
210 ++
211 ++ Invocations:
212 ++ * prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);
213 ++ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_ENABLE, 0, 0);
214 ++ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0);
215 ++ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_FORCE_DISABLE, 0, 0);
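
To make the interface documented above concrete, here is a minimal userspace sketch (illustrative only, not part of the patch). It queries PR_GET_SPECULATION_CTRL and then disables Speculative Store Bypass for the calling task; the fallback constants mirror the PR_SPEC_* values from include/uapi/linux/prctl.h in case the installed libc headers predate this series.

/* Illustrative only: exercise the speculation control prctl documented above. */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/prctl.h>

#ifndef PR_GET_SPECULATION_CTRL
# define PR_GET_SPECULATION_CTRL 52
# define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_STORE_BYPASS
# define PR_SPEC_STORE_BYPASS   0
# define PR_SPEC_PRCTL          (1UL << 0)
# define PR_SPEC_ENABLE         (1UL << 1)
# define PR_SPEC_DISABLE        (1UL << 2)
# define PR_SPEC_FORCE_DISABLE  (1UL << 3)
#endif

int main(void)
{
	long state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);

	if (state < 0) {
		fprintf(stderr, "PR_GET_SPECULATION_CTRL: %s\n", strerror(errno));
		return 1;	/* EINVAL or ENODEV, see the error table above */
	}

	printf("per-task control: %s, speculation currently %s\n",
	       (state & PR_SPEC_PRCTL) ? "available" : "not available",
	       (state & PR_SPEC_DISABLE) ? "disabled (mitigated)" : "enabled");

	/* Disable SSB for this task; fails with ENXIO when per-task control
	 * is not available (PR_SPEC_PRCTL clear in the result above). */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
		  PR_SPEC_DISABLE, 0, 0) < 0) {
		fprintf(stderr, "PR_SET_SPECULATION_CTRL: %s\n", strerror(errno));
		return 1;
	}
	return 0;
}
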
216 +diff --git a/Makefile b/Makefile
217 +index 54690fee0485..63f3e2438a26 100644
218 +--- a/Makefile
219 ++++ b/Makefile
220 +@@ -1,6 +1,6 @@
221 + VERSION = 4
222 + PATCHLEVEL = 4
223 +-SUBLEVEL = 143
224 ++SUBLEVEL = 144
225 + EXTRAVERSION =
226 + NAME = Blurry Fish Butt
227 +
228 +diff --git a/arch/arc/include/asm/page.h b/arch/arc/include/asm/page.h
229 +index 429957f1c236..8f1145ed0046 100644
230 +--- a/arch/arc/include/asm/page.h
231 ++++ b/arch/arc/include/asm/page.h
232 +@@ -102,7 +102,7 @@ typedef pte_t * pgtable_t;
233 + #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
234 +
235 + /* Default Permissions for stack/heaps pages (Non Executable) */
236 +-#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE)
237 ++#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
238 +
239 + #define WANT_PAGE_VIRTUAL 1
240 +
241 +diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
242 +index e5fec320f158..c07d7b0a4058 100644
243 +--- a/arch/arc/include/asm/pgtable.h
244 ++++ b/arch/arc/include/asm/pgtable.h
245 +@@ -372,7 +372,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
246 +
247 + /* Decode a PTE containing swap "identifier "into constituents */
248 + #define __swp_type(pte_lookalike) (((pte_lookalike).val) & 0x1f)
249 +-#define __swp_offset(pte_lookalike) ((pte_lookalike).val << 13)
250 ++#define __swp_offset(pte_lookalike) ((pte_lookalike).val >> 13)
251 +
252 + /* NOPs, to keep generic kernel happy */
253 + #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
254 +diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
255 +index d03bf0e28b8b..48c27c3fdfdb 100644
256 +--- a/arch/x86/entry/entry_64_compat.S
257 ++++ b/arch/x86/entry/entry_64_compat.S
258 +@@ -79,24 +79,33 @@ ENTRY(entry_SYSENTER_compat)
259 + ASM_CLAC /* Clear AC after saving FLAGS */
260 +
261 + pushq $__USER32_CS /* pt_regs->cs */
262 +- xorq %r8,%r8
263 +- pushq %r8 /* pt_regs->ip = 0 (placeholder) */
264 ++ pushq $0 /* pt_regs->ip = 0 (placeholder) */
265 + pushq %rax /* pt_regs->orig_ax */
266 + pushq %rdi /* pt_regs->di */
267 + pushq %rsi /* pt_regs->si */
268 + pushq %rdx /* pt_regs->dx */
269 + pushq %rcx /* pt_regs->cx */
270 + pushq $-ENOSYS /* pt_regs->ax */
271 +- pushq %r8 /* pt_regs->r8 = 0 */
272 +- pushq %r8 /* pt_regs->r9 = 0 */
273 +- pushq %r8 /* pt_regs->r10 = 0 */
274 +- pushq %r8 /* pt_regs->r11 = 0 */
275 ++ pushq $0 /* pt_regs->r8 = 0 */
276 ++ xorq %r8, %r8 /* nospec r8 */
277 ++ pushq $0 /* pt_regs->r9 = 0 */
278 ++ xorq %r9, %r9 /* nospec r9 */
279 ++ pushq $0 /* pt_regs->r10 = 0 */
280 ++ xorq %r10, %r10 /* nospec r10 */
281 ++ pushq $0 /* pt_regs->r11 = 0 */
282 ++ xorq %r11, %r11 /* nospec r11 */
283 + pushq %rbx /* pt_regs->rbx */
284 ++ xorl %ebx, %ebx /* nospec rbx */
285 + pushq %rbp /* pt_regs->rbp (will be overwritten) */
286 +- pushq %r8 /* pt_regs->r12 = 0 */
287 +- pushq %r8 /* pt_regs->r13 = 0 */
288 +- pushq %r8 /* pt_regs->r14 = 0 */
289 +- pushq %r8 /* pt_regs->r15 = 0 */
290 ++ xorl %ebp, %ebp /* nospec rbp */
291 ++ pushq $0 /* pt_regs->r12 = 0 */
292 ++ xorq %r12, %r12 /* nospec r12 */
293 ++ pushq $0 /* pt_regs->r13 = 0 */
294 ++ xorq %r13, %r13 /* nospec r13 */
295 ++ pushq $0 /* pt_regs->r14 = 0 */
296 ++ xorq %r14, %r14 /* nospec r14 */
297 ++ pushq $0 /* pt_regs->r15 = 0 */
298 ++ xorq %r15, %r15 /* nospec r15 */
299 + cld
300 +
301 + /*
302 +@@ -185,17 +194,26 @@ ENTRY(entry_SYSCALL_compat)
303 + pushq %rdx /* pt_regs->dx */
304 + pushq %rbp /* pt_regs->cx (stashed in bp) */
305 + pushq $-ENOSYS /* pt_regs->ax */
306 +- xorq %r8,%r8
307 +- pushq %r8 /* pt_regs->r8 = 0 */
308 +- pushq %r8 /* pt_regs->r9 = 0 */
309 +- pushq %r8 /* pt_regs->r10 = 0 */
310 +- pushq %r8 /* pt_regs->r11 = 0 */
311 ++ pushq $0 /* pt_regs->r8 = 0 */
312 ++ xorq %r8, %r8 /* nospec r8 */
313 ++ pushq $0 /* pt_regs->r9 = 0 */
314 ++ xorq %r9, %r9 /* nospec r9 */
315 ++ pushq $0 /* pt_regs->r10 = 0 */
316 ++ xorq %r10, %r10 /* nospec r10 */
317 ++ pushq $0 /* pt_regs->r11 = 0 */
318 ++ xorq %r11, %r11 /* nospec r11 */
319 + pushq %rbx /* pt_regs->rbx */
320 ++ xorl %ebx, %ebx /* nospec rbx */
321 + pushq %rbp /* pt_regs->rbp (will be overwritten) */
322 +- pushq %r8 /* pt_regs->r12 = 0 */
323 +- pushq %r8 /* pt_regs->r13 = 0 */
324 +- pushq %r8 /* pt_regs->r14 = 0 */
325 +- pushq %r8 /* pt_regs->r15 = 0 */
326 ++ xorl %ebp, %ebp /* nospec rbp */
327 ++ pushq $0 /* pt_regs->r12 = 0 */
328 ++ xorq %r12, %r12 /* nospec r12 */
329 ++ pushq $0 /* pt_regs->r13 = 0 */
330 ++ xorq %r13, %r13 /* nospec r13 */
331 ++ pushq $0 /* pt_regs->r14 = 0 */
332 ++ xorq %r14, %r14 /* nospec r14 */
333 ++ pushq $0 /* pt_regs->r15 = 0 */
334 ++ xorq %r15, %r15 /* nospec r15 */
335 +
336 + /*
337 + * User mode is traced as though IRQs are on, and SYSENTER
338 +@@ -292,17 +310,26 @@ ENTRY(entry_INT80_compat)
339 + pushq %rdx /* pt_regs->dx */
340 + pushq %rcx /* pt_regs->cx */
341 + pushq $-ENOSYS /* pt_regs->ax */
342 +- xorq %r8,%r8
343 +- pushq %r8 /* pt_regs->r8 = 0 */
344 +- pushq %r8 /* pt_regs->r9 = 0 */
345 +- pushq %r8 /* pt_regs->r10 = 0 */
346 +- pushq %r8 /* pt_regs->r11 = 0 */
347 ++ pushq $0 /* pt_regs->r8 = 0 */
348 ++ xorq %r8, %r8 /* nospec r8 */
349 ++ pushq $0 /* pt_regs->r9 = 0 */
350 ++ xorq %r9, %r9 /* nospec r9 */
351 ++ pushq $0 /* pt_regs->r10 = 0 */
352 ++ xorq %r10, %r10 /* nospec r10 */
353 ++ pushq $0 /* pt_regs->r11 = 0 */
354 ++ xorq %r11, %r11 /* nospec r11 */
355 + pushq %rbx /* pt_regs->rbx */
356 ++ xorl %ebx, %ebx /* nospec rbx */
357 + pushq %rbp /* pt_regs->rbp */
358 ++ xorl %ebp, %ebp /* nospec rbp */
359 + pushq %r12 /* pt_regs->r12 */
360 ++ xorq %r12, %r12 /* nospec r12 */
361 + pushq %r13 /* pt_regs->r13 */
362 ++ xorq %r13, %r13 /* nospec r13 */
363 + pushq %r14 /* pt_regs->r14 */
364 ++ xorq %r14, %r14 /* nospec r14 */
365 + pushq %r15 /* pt_regs->r15 */
366 ++ xorq %r15, %r15 /* nospec r15 */
367 + cld
368 +
369 + /*
370 +diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
371 +index 20370c6db74b..3d1ec41ae09a 100644
372 +--- a/arch/x86/include/asm/apm.h
373 ++++ b/arch/x86/include/asm/apm.h
374 +@@ -6,6 +6,8 @@
375 + #ifndef _ASM_X86_MACH_DEFAULT_APM_H
376 + #define _ASM_X86_MACH_DEFAULT_APM_H
377 +
378 ++#include <asm/nospec-branch.h>
379 ++
380 + #ifdef APM_ZERO_SEGS
381 + # define APM_DO_ZERO_SEGS \
382 + "pushl %%ds\n\t" \
383 +@@ -31,6 +33,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
384 + * N.B. We do NOT need a cld after the BIOS call
385 + * because we always save and restore the flags.
386 + */
387 ++ firmware_restrict_branch_speculation_start();
388 + __asm__ __volatile__(APM_DO_ZERO_SEGS
389 + "pushl %%edi\n\t"
390 + "pushl %%ebp\n\t"
391 +@@ -43,6 +46,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
392 + "=S" (*esi)
393 + : "a" (func), "b" (ebx_in), "c" (ecx_in)
394 + : "memory", "cc");
395 ++ firmware_restrict_branch_speculation_end();
396 + }
397 +
398 + static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in,
399 +@@ -55,6 +59,7 @@ static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in,
400 + * N.B. We do NOT need a cld after the BIOS call
401 + * because we always save and restore the flags.
402 + */
403 ++ firmware_restrict_branch_speculation_start();
404 + __asm__ __volatile__(APM_DO_ZERO_SEGS
405 + "pushl %%edi\n\t"
406 + "pushl %%ebp\n\t"
407 +@@ -67,6 +72,7 @@ static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in,
408 + "=S" (si)
409 + : "a" (func), "b" (ebx_in), "c" (ecx_in)
410 + : "memory", "cc");
411 ++ firmware_restrict_branch_speculation_end();
412 + return error;
413 + }
414 +
415 +diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
416 +index e3a6f66d288c..7f5dcb64cedb 100644
417 +--- a/arch/x86/include/asm/barrier.h
418 ++++ b/arch/x86/include/asm/barrier.h
419 +@@ -40,7 +40,7 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
420 +
421 + asm volatile ("cmp %1,%2; sbb %0,%0;"
422 + :"=r" (mask)
423 +- :"r"(size),"r" (index)
424 ++ :"g"(size),"r" (index)
425 + :"cc");
426 + return mask;
427 + }
428 +diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
429 +index dd0089841a0f..d72c1db64679 100644
430 +--- a/arch/x86/include/asm/cpufeature.h
431 ++++ b/arch/x86/include/asm/cpufeature.h
432 +@@ -28,6 +28,7 @@ enum cpuid_leafs
433 + CPUID_8000_000A_EDX,
434 + CPUID_7_ECX,
435 + CPUID_8000_0007_EBX,
436 ++ CPUID_7_EDX,
437 + };
438 +
439 + #ifdef CONFIG_X86_FEATURE_NAMES
440 +@@ -78,8 +79,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
441 + CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 15, feature_bit) || \
442 + CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) || \
443 + CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) || \
444 ++ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \
445 + REQUIRED_MASK_CHECK || \
446 +- BUILD_BUG_ON_ZERO(NCAPINTS != 18))
447 ++ BUILD_BUG_ON_ZERO(NCAPINTS != 19))
448 +
449 + #define DISABLED_MASK_BIT_SET(feature_bit) \
450 + ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \
451 +@@ -100,8 +102,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
452 + CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 15, feature_bit) || \
453 + CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) || \
454 + CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) || \
455 ++ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \
456 + DISABLED_MASK_CHECK || \
457 +- BUILD_BUG_ON_ZERO(NCAPINTS != 18))
458 ++ BUILD_BUG_ON_ZERO(NCAPINTS != 19))
459 +
460 + #define cpu_has(c, bit) \
461 + (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \
462 +diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
463 +index 205ce70c1d6c..f4b175db70f4 100644
464 +--- a/arch/x86/include/asm/cpufeatures.h
465 ++++ b/arch/x86/include/asm/cpufeatures.h
466 +@@ -12,7 +12,7 @@
467 + /*
468 + * Defines x86 CPU feature bits
469 + */
470 +-#define NCAPINTS 18 /* N 32-bit words worth of info */
471 ++#define NCAPINTS 19 /* N 32-bit words worth of info */
472 + #define NBUGINTS 1 /* N 32-bit bug flags */
473 +
474 + /*
475 +@@ -194,13 +194,28 @@
476 + #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
477 +
478 + #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
479 +-#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */
480 ++#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */
481 ++
482 ++#define X86_FEATURE_RETPOLINE ( 7*32+29) /* "" Generic Retpoline mitigation for Spectre variant 2 */
483 ++#define X86_FEATURE_RETPOLINE_AMD ( 7*32+30) /* "" AMD Retpoline mitigation for Spectre variant 2 */
484 ++
485 ++#define X86_FEATURE_MSR_SPEC_CTRL ( 7*32+16) /* "" MSR SPEC_CTRL is implemented */
486 ++#define X86_FEATURE_SSBD ( 7*32+17) /* Speculative Store Bypass Disable */
487 +
488 +-#define X86_FEATURE_RETPOLINE ( 7*32+29) /* Generic Retpoline mitigation for Spectre variant 2 */
489 +-#define X86_FEATURE_RETPOLINE_AMD ( 7*32+30) /* AMD Retpoline mitigation for Spectre variant 2 */
490 + /* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
491 + #define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */
492 +
493 ++#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled*/
494 ++#define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */
495 ++#define X86_FEATURE_SPEC_STORE_BYPASS_DISABLE ( 7*32+23) /* "" Disable Speculative Store Bypass. */
496 ++#define X86_FEATURE_LS_CFG_SSBD ( 7*32+24) /* "" AMD SSBD implementation */
497 ++
498 ++#define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */
499 ++#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */
500 ++#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */
501 ++#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
502 ++
503 ++
504 + /* Virtualization flags: Linux defined, word 8 */
505 + #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
506 + #define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
507 +@@ -251,6 +266,10 @@
508 +
509 + /* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */
510 + #define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */
511 ++#define X86_FEATURE_AMD_IBPB (13*32+12) /* Indirect Branch Prediction Barrier */
512 ++#define X86_FEATURE_AMD_IBRS (13*32+14) /* Indirect Branch Restricted Speculation */
513 ++#define X86_FEATURE_AMD_STIBP (13*32+15) /* Single Thread Indirect Branch Predictors */
514 ++#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
515 +
516 + /* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */
517 + #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
518 +@@ -285,6 +304,15 @@
519 + #define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */
520 + #define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */
521 +
522 ++
523 ++/* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
524 ++#define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */
525 ++#define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
526 ++#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
527 ++#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
528 ++#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
529 ++#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */
530 ++
531 + /*
532 + * BUG word(s)
533 + */
534 +@@ -302,5 +330,6 @@
535 + #define X86_BUG_CPU_MELTDOWN X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */
536 + #define X86_BUG_SPECTRE_V1 X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */
537 + #define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
538 ++#define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */
539 +
540 + #endif /* _ASM_X86_CPUFEATURES_H */
541 +diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
542 +index 21c5ac15657b..1f8cca459c6c 100644
543 +--- a/arch/x86/include/asm/disabled-features.h
544 ++++ b/arch/x86/include/asm/disabled-features.h
545 +@@ -59,6 +59,7 @@
546 + #define DISABLED_MASK15 0
547 + #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE)
548 + #define DISABLED_MASK17 0
549 +-#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
550 ++#define DISABLED_MASK18 0
551 ++#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
552 +
553 + #endif /* _ASM_X86_DISABLED_FEATURES_H */
554 +diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
555 +index 0010c78c4998..7e5a2ffb6938 100644
556 +--- a/arch/x86/include/asm/efi.h
557 ++++ b/arch/x86/include/asm/efi.h
558 +@@ -3,6 +3,7 @@
559 +
560 + #include <asm/fpu/api.h>
561 + #include <asm/pgtable.h>
562 ++#include <asm/nospec-branch.h>
563 +
564 + /*
565 + * We map the EFI regions needed for runtime services non-contiguously,
566 +@@ -39,8 +40,10 @@ extern unsigned long asmlinkage efi_call_phys(void *, ...);
567 + ({ \
568 + efi_status_t __s; \
569 + kernel_fpu_begin(); \
570 ++ firmware_restrict_branch_speculation_start(); \
571 + __s = ((efi_##f##_t __attribute__((regparm(0)))*) \
572 + efi.systab->runtime->f)(args); \
573 ++ firmware_restrict_branch_speculation_end(); \
574 + kernel_fpu_end(); \
575 + __s; \
576 + })
577 +@@ -49,8 +52,10 @@ extern unsigned long asmlinkage efi_call_phys(void *, ...);
578 + #define __efi_call_virt(f, args...) \
579 + ({ \
580 + kernel_fpu_begin(); \
581 ++ firmware_restrict_branch_speculation_start(); \
582 + ((efi_##f##_t __attribute__((regparm(0)))*) \
583 + efi.systab->runtime->f)(args); \
584 ++ firmware_restrict_branch_speculation_end(); \
585 + kernel_fpu_end(); \
586 + })
587 +
588 +@@ -71,7 +76,9 @@ extern u64 asmlinkage efi_call(void *fp, ...);
589 + efi_sync_low_kernel_mappings(); \
590 + preempt_disable(); \
591 + __kernel_fpu_begin(); \
592 ++ firmware_restrict_branch_speculation_start(); \
593 + __s = efi_call((void *)efi.systab->runtime->f, __VA_ARGS__); \
594 ++ firmware_restrict_branch_speculation_end(); \
595 + __kernel_fpu_end(); \
596 + preempt_enable(); \
597 + __s; \
598 +diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
599 +index 6999f7d01a0d..e13ff5a14633 100644
600 +--- a/arch/x86/include/asm/intel-family.h
601 ++++ b/arch/x86/include/asm/intel-family.h
602 +@@ -12,6 +12,7 @@
603 + */
604 +
605 + #define INTEL_FAM6_CORE_YONAH 0x0E
606 ++
607 + #define INTEL_FAM6_CORE2_MEROM 0x0F
608 + #define INTEL_FAM6_CORE2_MEROM_L 0x16
609 + #define INTEL_FAM6_CORE2_PENRYN 0x17
610 +@@ -20,6 +21,7 @@
611 + #define INTEL_FAM6_NEHALEM 0x1E
612 + #define INTEL_FAM6_NEHALEM_EP 0x1A
613 + #define INTEL_FAM6_NEHALEM_EX 0x2E
614 ++
615 + #define INTEL_FAM6_WESTMERE 0x25
616 + #define INTEL_FAM6_WESTMERE2 0x1F
617 + #define INTEL_FAM6_WESTMERE_EP 0x2C
618 +@@ -36,9 +38,9 @@
619 + #define INTEL_FAM6_HASWELL_GT3E 0x46
620 +
621 + #define INTEL_FAM6_BROADWELL_CORE 0x3D
622 +-#define INTEL_FAM6_BROADWELL_XEON_D 0x56
623 + #define INTEL_FAM6_BROADWELL_GT3E 0x47
624 + #define INTEL_FAM6_BROADWELL_X 0x4F
625 ++#define INTEL_FAM6_BROADWELL_XEON_D 0x56
626 +
627 + #define INTEL_FAM6_SKYLAKE_MOBILE 0x4E
628 + #define INTEL_FAM6_SKYLAKE_DESKTOP 0x5E
629 +@@ -56,13 +58,15 @@
630 + #define INTEL_FAM6_ATOM_SILVERMONT1 0x37 /* BayTrail/BYT / Valleyview */
631 + #define INTEL_FAM6_ATOM_SILVERMONT2 0x4D /* Avaton/Rangely */
632 + #define INTEL_FAM6_ATOM_AIRMONT 0x4C /* CherryTrail / Braswell */
633 +-#define INTEL_FAM6_ATOM_MERRIFIELD1 0x4A /* Tangier */
634 +-#define INTEL_FAM6_ATOM_MERRIFIELD2 0x5A /* Annidale */
635 ++#define INTEL_FAM6_ATOM_MERRIFIELD 0x4A /* Tangier */
636 ++#define INTEL_FAM6_ATOM_MOOREFIELD 0x5A /* Annidale */
637 + #define INTEL_FAM6_ATOM_GOLDMONT 0x5C
638 + #define INTEL_FAM6_ATOM_DENVERTON 0x5F /* Goldmont Microserver */
639 ++#define INTEL_FAM6_ATOM_GEMINI_LAKE 0x7A
640 +
641 + /* Xeon Phi */
642 +
643 + #define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
644 ++#define INTEL_FAM6_XEON_PHI_KNM 0x85 /* Knights Mill */
645 +
646 + #endif /* _ASM_X86_INTEL_FAMILY_H */
647 +diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
648 +index b77f5edb03b0..0056bc945cd1 100644
649 +--- a/arch/x86/include/asm/irqflags.h
650 ++++ b/arch/x86/include/asm/irqflags.h
651 +@@ -8,7 +8,7 @@
652 + * Interrupt control:
653 + */
654 +
655 +-static inline unsigned long native_save_fl(void)
656 ++extern inline unsigned long native_save_fl(void)
657 + {
658 + unsigned long flags;
659 +
660 +diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
661 +index 7680b76adafc..3359dfedc7ee 100644
662 +--- a/arch/x86/include/asm/mmu.h
663 ++++ b/arch/x86/include/asm/mmu.h
664 +@@ -3,12 +3,18 @@
665 +
666 + #include <linux/spinlock.h>
667 + #include <linux/mutex.h>
668 ++#include <linux/atomic.h>
669 +
670 + /*
671 +- * The x86 doesn't have a mmu context, but
672 +- * we put the segment information here.
673 ++ * x86 has arch-specific MMU state beyond what lives in mm_struct.
674 + */
675 + typedef struct {
676 ++ /*
677 ++ * ctx_id uniquely identifies this mm_struct. A ctx_id will never
678 ++ * be reused, and zero is not a valid ctx_id.
679 ++ */
680 ++ u64 ctx_id;
681 ++
682 + #ifdef CONFIG_MODIFY_LDT_SYSCALL
683 + struct ldt_struct *ldt;
684 + #endif
685 +@@ -24,6 +30,11 @@ typedef struct {
686 + atomic_t perf_rdpmc_allowed; /* nonzero if rdpmc is allowed */
687 + } mm_context_t;
688 +
689 ++#define INIT_MM_CONTEXT(mm) \
690 ++ .context = { \
691 ++ .ctx_id = 1, \
692 ++ }
693 ++
694 + void leave_mm(int cpu);
695 +
696 + #endif /* _ASM_X86_MMU_H */
697 +diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
698 +index 9bfc5fd77015..effc12767cbf 100644
699 +--- a/arch/x86/include/asm/mmu_context.h
700 ++++ b/arch/x86/include/asm/mmu_context.h
701 +@@ -11,6 +11,9 @@
702 + #include <asm/tlbflush.h>
703 + #include <asm/paravirt.h>
704 + #include <asm/mpx.h>
705 ++
706 ++extern atomic64_t last_mm_ctx_id;
707 ++
708 + #ifndef CONFIG_PARAVIRT
709 + static inline void paravirt_activate_mm(struct mm_struct *prev,
710 + struct mm_struct *next)
711 +@@ -52,15 +55,15 @@ struct ldt_struct {
712 + /*
713 + * Used for LDT copy/destruction.
714 + */
715 +-int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
716 +-void destroy_context(struct mm_struct *mm);
717 ++int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm);
718 ++void destroy_context_ldt(struct mm_struct *mm);
719 + #else /* CONFIG_MODIFY_LDT_SYSCALL */
720 +-static inline int init_new_context(struct task_struct *tsk,
721 +- struct mm_struct *mm)
722 ++static inline int init_new_context_ldt(struct task_struct *tsk,
723 ++ struct mm_struct *mm)
724 + {
725 + return 0;
726 + }
727 +-static inline void destroy_context(struct mm_struct *mm) {}
728 ++static inline void destroy_context_ldt(struct mm_struct *mm) {}
729 + #endif
730 +
731 + static inline void load_mm_ldt(struct mm_struct *mm)
732 +@@ -102,6 +105,18 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
733 + this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
734 + }
735 +
736 ++static inline int init_new_context(struct task_struct *tsk,
737 ++ struct mm_struct *mm)
738 ++{
739 ++ mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
740 ++ init_new_context_ldt(tsk, mm);
741 ++ return 0;
742 ++}
743 ++static inline void destroy_context(struct mm_struct *mm)
744 ++{
745 ++ destroy_context_ldt(mm);
746 ++}
747 ++
748 + extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
749 + struct task_struct *tsk);
750 +
751 +diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
752 +index b8911aecf035..caa00191e565 100644
753 +--- a/arch/x86/include/asm/msr-index.h
754 ++++ b/arch/x86/include/asm/msr-index.h
755 +@@ -32,6 +32,15 @@
756 + #define EFER_FFXSR (1<<_EFER_FFXSR)
757 +
758 + /* Intel MSRs. Some also available on other CPUs */
759 ++#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
760 ++#define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */
761 ++#define SPEC_CTRL_STIBP (1 << 1) /* Single Thread Indirect Branch Predictors */
762 ++#define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */
763 ++#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
764 ++
765 ++#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */
766 ++#define PRED_CMD_IBPB (1 << 0) /* Indirect Branch Prediction Barrier */
767 ++
768 + #define MSR_IA32_PERFCTR0 0x000000c1
769 + #define MSR_IA32_PERFCTR1 0x000000c2
770 + #define MSR_FSB_FREQ 0x000000cd
771 +@@ -45,6 +54,16 @@
772 + #define SNB_C3_AUTO_UNDEMOTE (1UL << 28)
773 +
774 + #define MSR_MTRRcap 0x000000fe
775 ++
776 ++#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a
777 ++#define ARCH_CAP_RDCL_NO (1 << 0) /* Not susceptible to Meltdown */
778 ++#define ARCH_CAP_IBRS_ALL (1 << 1) /* Enhanced IBRS support */
779 ++#define ARCH_CAP_SSB_NO (1 << 4) /*
780 ++ * Not susceptible to Speculative Store Bypass
781 ++ * attack, so no Speculative Store Bypass
782 ++ * control required.
783 ++ */
784 ++
785 + #define MSR_IA32_BBL_CR_CTL 0x00000119
786 + #define MSR_IA32_BBL_CR_CTL3 0x0000011e
787 +
788 +@@ -132,6 +151,7 @@
789 +
790 + /* DEBUGCTLMSR bits (others vary by model): */
791 + #define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */
792 ++#define DEBUGCTLMSR_BTF_SHIFT 1
793 + #define DEBUGCTLMSR_BTF (1UL << 1) /* single-step on branches */
794 + #define DEBUGCTLMSR_TR (1UL << 6)
795 + #define DEBUGCTLMSR_BTS (1UL << 7)
796 +@@ -308,6 +328,8 @@
797 + #define MSR_AMD64_IBSOPDATA4 0xc001103d
798 + #define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */
799 +
800 ++#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
801 ++
802 + /* Fam 16h MSRs */
803 + #define MSR_F16H_L2I_PERF_CTL 0xc0010230
804 + #define MSR_F16H_L2I_PERF_CTR 0xc0010231
805 +diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
806 +index 8b910416243c..b4c74c24c890 100644
807 +--- a/arch/x86/include/asm/nospec-branch.h
808 ++++ b/arch/x86/include/asm/nospec-branch.h
809 +@@ -6,6 +6,7 @@
810 + #include <asm/alternative.h>
811 + #include <asm/alternative-asm.h>
812 + #include <asm/cpufeatures.h>
813 ++#include <asm/msr-index.h>
814 +
815 + /*
816 + * Fill the CPU return stack buffer.
817 +@@ -171,6 +172,14 @@ enum spectre_v2_mitigation {
818 + SPECTRE_V2_IBRS,
819 + };
820 +
821 ++/* The Speculative Store Bypass disable variants */
822 ++enum ssb_mitigation {
823 ++ SPEC_STORE_BYPASS_NONE,
824 ++ SPEC_STORE_BYPASS_DISABLE,
825 ++ SPEC_STORE_BYPASS_PRCTL,
826 ++ SPEC_STORE_BYPASS_SECCOMP,
827 ++};
828 ++
829 + extern char __indirect_thunk_start[];
830 + extern char __indirect_thunk_end[];
831 +
832 +@@ -194,6 +203,51 @@ static inline void vmexit_fill_RSB(void)
833 + #endif
834 + }
835 +
836 ++static __always_inline
837 ++void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
838 ++{
839 ++ asm volatile(ALTERNATIVE("", "wrmsr", %c[feature])
840 ++ : : "c" (msr),
841 ++ "a" ((u32)val),
842 ++ "d" ((u32)(val >> 32)),
843 ++ [feature] "i" (feature)
844 ++ : "memory");
845 ++}
846 ++
847 ++static inline void indirect_branch_prediction_barrier(void)
848 ++{
849 ++ u64 val = PRED_CMD_IBPB;
850 ++
851 ++ alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
852 ++}
853 ++
854 ++/* The Intel SPEC CTRL MSR base value cache */
855 ++extern u64 x86_spec_ctrl_base;
856 ++
857 ++/*
858 ++ * With retpoline, we must use IBRS to restrict branch prediction
859 ++ * before calling into firmware.
860 ++ *
861 ++ * (Implemented as CPP macros due to header hell.)
862 ++ */
863 ++#define firmware_restrict_branch_speculation_start() \
864 ++do { \
865 ++ u64 val = x86_spec_ctrl_base | SPEC_CTRL_IBRS; \
866 ++ \
867 ++ preempt_disable(); \
868 ++ alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
869 ++ X86_FEATURE_USE_IBRS_FW); \
870 ++} while (0)
871 ++
872 ++#define firmware_restrict_branch_speculation_end() \
873 ++do { \
874 ++ u64 val = x86_spec_ctrl_base; \
875 ++ \
876 ++ alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
877 ++ X86_FEATURE_USE_IBRS_FW); \
878 ++ preempt_enable(); \
879 ++} while (0)
880 ++
881 + #endif /* __ASSEMBLY__ */
882 +
883 + /*
884 +diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
885 +index fac9a5c0abe9..6847d85400a8 100644
886 +--- a/arch/x86/include/asm/required-features.h
887 ++++ b/arch/x86/include/asm/required-features.h
888 +@@ -100,6 +100,7 @@
889 + #define REQUIRED_MASK15 0
890 + #define REQUIRED_MASK16 0
891 + #define REQUIRED_MASK17 0
892 +-#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
893 ++#define REQUIRED_MASK18 0
894 ++#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
895 +
896 + #endif /* _ASM_X86_REQUIRED_FEATURES_H */
897 +diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
898 +new file mode 100644
899 +index 000000000000..ae7c2c5cd7f0
900 +--- /dev/null
901 ++++ b/arch/x86/include/asm/spec-ctrl.h
902 +@@ -0,0 +1,80 @@
903 ++/* SPDX-License-Identifier: GPL-2.0 */
904 ++#ifndef _ASM_X86_SPECCTRL_H_
905 ++#define _ASM_X86_SPECCTRL_H_
906 ++
907 ++#include <linux/thread_info.h>
908 ++#include <asm/nospec-branch.h>
909 ++
910 ++/*
911 ++ * On VMENTER we must preserve whatever view of the SPEC_CTRL MSR
912 ++ * the guest has, while on VMEXIT we restore the host view. This
913 ++ * would be easier if SPEC_CTRL were architecturally maskable or
914 ++ * shadowable for guests but this is not (currently) the case.
915 ++ * Takes the guest view of SPEC_CTRL MSR as a parameter and also
916 ++ * the guest's version of VIRT_SPEC_CTRL, if emulated.
917 ++ */
918 ++extern void x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool guest);
919 ++
920 ++/**
921 ++ * x86_spec_ctrl_set_guest - Set speculation control registers for the guest
922 ++ * @guest_spec_ctrl: The guest content of MSR_SPEC_CTRL
923 ++ * @guest_virt_spec_ctrl: The guest controlled bits of MSR_VIRT_SPEC_CTRL
924 ++ * (may get translated to MSR_AMD64_LS_CFG bits)
925 ++ *
926 ++ * Avoids writing to the MSR if the content/bits are the same
927 ++ */
928 ++static inline
929 ++void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl)
930 ++{
931 ++ x86_virt_spec_ctrl(guest_spec_ctrl, guest_virt_spec_ctrl, true);
932 ++}
933 ++
934 ++/**
935 ++ * x86_spec_ctrl_restore_host - Restore host speculation control registers
936 ++ * @guest_spec_ctrl: The guest content of MSR_SPEC_CTRL
937 ++ * @guest_virt_spec_ctrl: The guest controlled bits of MSR_VIRT_SPEC_CTRL
938 ++ * (may get translated to MSR_AMD64_LS_CFG bits)
939 ++ *
940 ++ * Avoids writing to the MSR if the content/bits are the same
941 ++ */
942 ++static inline
943 ++void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl)
944 ++{
945 ++ x86_virt_spec_ctrl(guest_spec_ctrl, guest_virt_spec_ctrl, false);
946 ++}
947 ++
948 ++/* AMD specific Speculative Store Bypass MSR data */
949 ++extern u64 x86_amd_ls_cfg_base;
950 ++extern u64 x86_amd_ls_cfg_ssbd_mask;
951 ++
952 ++static inline u64 ssbd_tif_to_spec_ctrl(u64 tifn)
953 ++{
954 ++ BUILD_BUG_ON(TIF_SSBD < SPEC_CTRL_SSBD_SHIFT);
955 ++ return (tifn & _TIF_SSBD) >> (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT);
956 ++}
957 ++
958 ++static inline unsigned long ssbd_spec_ctrl_to_tif(u64 spec_ctrl)
959 ++{
960 ++ BUILD_BUG_ON(TIF_SSBD < SPEC_CTRL_SSBD_SHIFT);
961 ++ return (spec_ctrl & SPEC_CTRL_SSBD) << (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT);
962 ++}
963 ++
964 ++static inline u64 ssbd_tif_to_amd_ls_cfg(u64 tifn)
965 ++{
966 ++ return (tifn & _TIF_SSBD) ? x86_amd_ls_cfg_ssbd_mask : 0ULL;
967 ++}
968 ++
969 ++#ifdef CONFIG_SMP
970 ++extern void speculative_store_bypass_ht_init(void);
971 ++#else
972 ++static inline void speculative_store_bypass_ht_init(void) { }
973 ++#endif
974 ++
975 ++extern void speculative_store_bypass_update(unsigned long tif);
976 ++
977 ++static inline void speculative_store_bypass_update_current(void)
978 ++{
979 ++ speculative_store_bypass_update(current_thread_info()->flags);
980 ++}
981 ++
982 ++#endif
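
The conversion helpers in this new header are plain bit relocations between TIF_SSBD (thread flag bit 5, added to thread_info.h below) and SPEC_CTRL_SSBD (bit 2 of MSR_IA32_SPEC_CTRL, defined in msr-index.h above). A standalone sketch of the same arithmetic, with the constants written out (illustrative only, not part of the patch):

/* Illustrative only: the bit relocation performed by ssbd_tif_to_spec_ctrl()
 * and ssbd_spec_ctrl_to_tif(), with the kernel constants spelled out. */
#include <stdio.h>
#include <stdint.h>

#define TIF_SSBD             5   /* thread flag bit, see thread_info.h below */
#define _TIF_SSBD            (1UL << TIF_SSBD)
#define SPEC_CTRL_SSBD_SHIFT 2   /* SSBD bit in MSR_IA32_SPEC_CTRL */
#define SPEC_CTRL_SSBD       (1UL << SPEC_CTRL_SSBD_SHIFT)

static uint64_t tif_to_spec_ctrl(uint64_t tifn)
{
	/* Move bit 5 down to bit 2. */
	return (tifn & _TIF_SSBD) >> (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT);
}

static uint64_t spec_ctrl_to_tif(uint64_t spec_ctrl)
{
	/* Inverse direction: move bit 2 back up to bit 5. */
	return (spec_ctrl & SPEC_CTRL_SSBD) << (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT);
}

int main(void)
{
	printf("TIF flags 0x%llx -> SPEC_CTRL 0x%llx\n",          /* 0x20 -> 0x4 */
	       (unsigned long long)_TIF_SSBD,
	       (unsigned long long)tif_to_spec_ctrl(_TIF_SSBD));
	printf("SPEC_CTRL 0x%llx -> TIF flags 0x%llx\n",          /* 0x4 -> 0x20 */
	       (unsigned long long)SPEC_CTRL_SSBD,
	       (unsigned long long)spec_ctrl_to_tif(SPEC_CTRL_SSBD));
	return 0;
}

This is why the BUILD_BUG_ON() checks above insist that TIF_SSBD is not smaller than SPEC_CTRL_SSBD_SHIFT: the right shift only works if the thread flag sits at or above the MSR bit.
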
983 +diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
984 +index 18c9aaa8c043..a96e88b243ef 100644
985 +--- a/arch/x86/include/asm/thread_info.h
986 ++++ b/arch/x86/include/asm/thread_info.h
987 +@@ -92,6 +92,7 @@ struct thread_info {
988 + #define TIF_SIGPENDING 2 /* signal pending */
989 + #define TIF_NEED_RESCHED 3 /* rescheduling necessary */
990 + #define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/
991 ++#define TIF_SSBD 5 /* Reduced data speculation */
992 + #define TIF_SYSCALL_EMU 6 /* syscall emulation active */
993 + #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
994 + #define TIF_SECCOMP 8 /* secure computing */
995 +@@ -114,8 +115,9 @@ struct thread_info {
996 + #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
997 + #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
998 + #define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
999 +-#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
1000 + #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
1001 ++#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
1002 ++#define _TIF_SSBD (1 << TIF_SSBD)
1003 + #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU)
1004 + #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
1005 + #define _TIF_SECCOMP (1 << TIF_SECCOMP)
1006 +@@ -147,7 +149,7 @@ struct thread_info {
1007 +
1008 + /* flags to check in __switch_to() */
1009 + #define _TIF_WORK_CTXSW \
1010 +- (_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP)
1011 ++ (_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP|_TIF_SSBD)
1012 +
1013 + #define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
1014 + #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
1015 +diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
1016 +index e2a89d2577fb..72cfe3e53af1 100644
1017 +--- a/arch/x86/include/asm/tlbflush.h
1018 ++++ b/arch/x86/include/asm/tlbflush.h
1019 +@@ -68,6 +68,8 @@ static inline void invpcid_flush_all_nonglobals(void)
1020 + struct tlb_state {
1021 + struct mm_struct *active_mm;
1022 + int state;
1023 ++ /* last user mm's ctx id */
1024 ++ u64 last_ctx_id;
1025 +
1026 + /*
1027 + * Access to this CR4 shadow and to H/W CR4 is protected by
1028 +@@ -109,6 +111,16 @@ static inline void cr4_clear_bits(unsigned long mask)
1029 + }
1030 + }
1031 +
1032 ++static inline void cr4_toggle_bits(unsigned long mask)
1033 ++{
1034 ++ unsigned long cr4;
1035 ++
1036 ++ cr4 = this_cpu_read(cpu_tlbstate.cr4);
1037 ++ cr4 ^= mask;
1038 ++ this_cpu_write(cpu_tlbstate.cr4, cr4);
1039 ++ __write_cr4(cr4);
1040 ++}
1041 ++
1042 + /* Read the CR4 shadow. */
1043 + static inline unsigned long cr4_read_shadow(void)
1044 + {
1045 +diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
1046 +index b1b78ffe01d0..7947cee61f61 100644
1047 +--- a/arch/x86/kernel/Makefile
1048 ++++ b/arch/x86/kernel/Makefile
1049 +@@ -41,6 +41,7 @@ obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
1050 + obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
1051 + obj-y += pci-iommu_table.o
1052 + obj-y += resource.o
1053 ++obj-y += irqflags.o
1054 +
1055 + obj-y += process.o
1056 + obj-y += fpu/
1057 +diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
1058 +index f4fb8f5b0be4..9f6151884249 100644
1059 +--- a/arch/x86/kernel/cpu/amd.c
1060 ++++ b/arch/x86/kernel/cpu/amd.c
1061 +@@ -9,6 +9,7 @@
1062 + #include <asm/processor.h>
1063 + #include <asm/apic.h>
1064 + #include <asm/cpu.h>
1065 ++#include <asm/spec-ctrl.h>
1066 + #include <asm/smp.h>
1067 + #include <asm/pci-direct.h>
1068 + #include <asm/delay.h>
1069 +@@ -519,6 +520,26 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
1070 +
1071 + if (cpu_has(c, X86_FEATURE_MWAITX))
1072 + use_mwaitx_delay();
1073 ++
1074 ++ if (c->x86 >= 0x15 && c->x86 <= 0x17) {
1075 ++ unsigned int bit;
1076 ++
1077 ++ switch (c->x86) {
1078 ++ case 0x15: bit = 54; break;
1079 ++ case 0x16: bit = 33; break;
1080 ++ case 0x17: bit = 10; break;
1081 ++ default: return;
1082 ++ }
1083 ++ /*
1084 ++ * Try to cache the base value so further operations can
1085 ++ * avoid RMW. If that faults, do not enable SSBD.
1086 ++ */
1087 ++ if (!rdmsrl_safe(MSR_AMD64_LS_CFG, &x86_amd_ls_cfg_base)) {
1088 ++ setup_force_cpu_cap(X86_FEATURE_LS_CFG_SSBD);
1089 ++ setup_force_cpu_cap(X86_FEATURE_SSBD);
1090 ++ x86_amd_ls_cfg_ssbd_mask = 1ULL << bit;
1091 ++ }
1092 ++ }
1093 + }
1094 +
1095 + static void early_init_amd(struct cpuinfo_x86 *c)
1096 +@@ -692,6 +713,17 @@ static void init_amd_bd(struct cpuinfo_x86 *c)
1097 + }
1098 + }
1099 +
1100 ++static void init_amd_zn(struct cpuinfo_x86 *c)
1101 ++{
1102 ++ set_cpu_cap(c, X86_FEATURE_ZEN);
1103 ++ /*
1104 ++ * Fix erratum 1076: CPB feature bit not being set in CPUID. It affects
1105 ++ * all up to and including B1.
1106 ++ */
1107 ++ if (c->x86_model <= 1 && c->x86_mask <= 1)
1108 ++ set_cpu_cap(c, X86_FEATURE_CPB);
1109 ++}
1110 ++
1111 + static void init_amd(struct cpuinfo_x86 *c)
1112 + {
1113 + u32 dummy;
1114 +@@ -722,6 +754,7 @@ static void init_amd(struct cpuinfo_x86 *c)
1115 + case 0x10: init_amd_gh(c); break;
1116 + case 0x12: init_amd_ln(c); break;
1117 + case 0x15: init_amd_bd(c); break;
1118 ++ case 0x17: init_amd_zn(c); break;
1119 + }
1120 +
1121 + /* Enable workaround for FXSAVE leak */
1122 +@@ -791,8 +824,9 @@ static void init_amd(struct cpuinfo_x86 *c)
1123 + if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM))
1124 + set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
1125 +
1126 +- /* AMD CPUs don't reset SS attributes on SYSRET */
1127 +- set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
1128 ++ /* AMD CPUs don't reset SS attributes on SYSRET, Xen does. */
1129 ++ if (!cpu_has(c, X86_FEATURE_XENPV))
1130 ++ set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
1131 + }
1132 +
1133 + #ifdef CONFIG_X86_32
1134 +diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
1135 +index 2bbc74f8a4a8..12a8867071f3 100644
1136 +--- a/arch/x86/kernel/cpu/bugs.c
1137 ++++ b/arch/x86/kernel/cpu/bugs.c
1138 +@@ -11,8 +11,10 @@
1139 + #include <linux/utsname.h>
1140 + #include <linux/cpu.h>
1141 + #include <linux/module.h>
1142 ++#include <linux/nospec.h>
1143 ++#include <linux/prctl.h>
1144 +
1145 +-#include <asm/nospec-branch.h>
1146 ++#include <asm/spec-ctrl.h>
1147 + #include <asm/cmdline.h>
1148 + #include <asm/bugs.h>
1149 + #include <asm/processor.h>
1150 +@@ -26,6 +28,27 @@
1151 + #include <asm/intel-family.h>
1152 +
1153 + static void __init spectre_v2_select_mitigation(void);
1154 ++static void __init ssb_select_mitigation(void);
1155 ++
1156 ++/*
1157 ++ * Our boot-time value of the SPEC_CTRL MSR. We read it once so that any
1158 ++ * writes to SPEC_CTRL contain whatever reserved bits have been set.
1159 ++ */
1160 ++u64 x86_spec_ctrl_base;
1161 ++EXPORT_SYMBOL_GPL(x86_spec_ctrl_base);
1162 ++
1163 ++/*
1164 ++ * The vendor and possibly platform specific bits which can be modified in
1165 ++ * x86_spec_ctrl_base.
1166 ++ */
1167 ++static u64 x86_spec_ctrl_mask = SPEC_CTRL_IBRS;
1168 ++
1169 ++/*
1170 ++ * AMD specific MSR info for Speculative Store Bypass control.
1171 ++ * x86_amd_ls_cfg_ssbd_mask is initialized in identify_boot_cpu().
1172 ++ */
1173 ++u64 x86_amd_ls_cfg_base;
1174 ++u64 x86_amd_ls_cfg_ssbd_mask;
1175 +
1176 + void __init check_bugs(void)
1177 + {
1178 +@@ -36,9 +59,27 @@ void __init check_bugs(void)
1179 + print_cpu_info(&boot_cpu_data);
1180 + }
1181 +
1182 ++ /*
1183 ++ * Read the SPEC_CTRL MSR to account for reserved bits which may
1184 ++ * have unknown values. AMD64_LS_CFG MSR is cached in the early AMD
1185 ++ * init code as it is not enumerated and depends on the family.
1186 ++ */
1187 ++ if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
1188 ++ rdmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
1189 ++
1190 ++ /* Allow STIBP in MSR_SPEC_CTRL if supported */
1191 ++ if (boot_cpu_has(X86_FEATURE_STIBP))
1192 ++ x86_spec_ctrl_mask |= SPEC_CTRL_STIBP;
1193 ++
1194 + /* Select the proper spectre mitigation before patching alternatives */
1195 + spectre_v2_select_mitigation();
1196 +
1197 ++ /*
1198 ++ * Select proper mitigation for any exposure to the Speculative Store
1199 ++ * Bypass vulnerability.
1200 ++ */
1201 ++ ssb_select_mitigation();
1202 ++
1203 + #ifdef CONFIG_X86_32
1204 + /*
1205 + * Check whether we are able to run this kernel safely on SMP.
1206 +@@ -94,6 +135,73 @@ static const char *spectre_v2_strings[] = {
1207 +
1208 + static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
1209 +
1210 ++void
1211 ++x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
1212 ++{
1213 ++ u64 msrval, guestval, hostval = x86_spec_ctrl_base;
1214 ++ struct thread_info *ti = current_thread_info();
1215 ++
1216 ++ /* Is MSR_SPEC_CTRL implemented ? */
1217 ++ if (static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) {
1218 ++ /*
1219 ++ * Restrict guest_spec_ctrl to supported values. Clear the
1220 ++ * modifiable bits in the host base value and or the
1221 ++ * modifiable bits from the guest value.
1222 ++ */
1223 ++ guestval = hostval & ~x86_spec_ctrl_mask;
1224 ++ guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;
1225 ++
1226 ++ /* SSBD controlled in MSR_SPEC_CTRL */
1227 ++ if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
1228 ++ hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
1229 ++
1230 ++ if (hostval != guestval) {
1231 ++ msrval = setguest ? guestval : hostval;
1232 ++ wrmsrl(MSR_IA32_SPEC_CTRL, msrval);
1233 ++ }
1234 ++ }
1235 ++
1236 ++ /*
1237 ++ * If SSBD is not handled in MSR_SPEC_CTRL on AMD, update
1238 ++ * MSR_AMD64_L2_CFG or MSR_VIRT_SPEC_CTRL if supported.
1239 ++ */
1240 ++ if (!static_cpu_has(X86_FEATURE_LS_CFG_SSBD) &&
1241 ++ !static_cpu_has(X86_FEATURE_VIRT_SSBD))
1242 ++ return;
1243 ++
1244 ++ /*
1245 ++ * If the host has SSBD mitigation enabled, force it in the host's
1246 ++ * virtual MSR value. If its not permanently enabled, evaluate
1247 ++ * current's TIF_SSBD thread flag.
1248 ++ */
1249 ++ if (static_cpu_has(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE))
1250 ++ hostval = SPEC_CTRL_SSBD;
1251 ++ else
1252 ++ hostval = ssbd_tif_to_spec_ctrl(ti->flags);
1253 ++
1254 ++ /* Sanitize the guest value */
1255 ++ guestval = guest_virt_spec_ctrl & SPEC_CTRL_SSBD;
1256 ++
1257 ++ if (hostval != guestval) {
1258 ++ unsigned long tif;
1259 ++
1260 ++ tif = setguest ? ssbd_spec_ctrl_to_tif(guestval) :
1261 ++ ssbd_spec_ctrl_to_tif(hostval);
1262 ++
1263 ++ speculative_store_bypass_update(tif);
1264 ++ }
1265 ++}
1266 ++EXPORT_SYMBOL_GPL(x86_virt_spec_ctrl);
1267 ++
1268 ++static void x86_amd_ssb_disable(void)
1269 ++{
1270 ++ u64 msrval = x86_amd_ls_cfg_base | x86_amd_ls_cfg_ssbd_mask;
1271 ++
1272 ++ if (boot_cpu_has(X86_FEATURE_VIRT_SSBD))
1273 ++ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, SPEC_CTRL_SSBD);
1274 ++ else if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD))
1275 ++ wrmsrl(MSR_AMD64_LS_CFG, msrval);
1276 ++}
1277 +
1278 + #ifdef RETPOLINE
1279 + static bool spectre_v2_bad_module;
1280 +@@ -162,8 +270,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
1281 + if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
1282 + return SPECTRE_V2_CMD_NONE;
1283 + else {
1284 +- ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
1285 +- sizeof(arg));
1286 ++ ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
1287 + if (ret < 0)
1288 + return SPECTRE_V2_CMD_AUTO;
1289 +
1290 +@@ -184,8 +291,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
1291 + cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
1292 + cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
1293 + !IS_ENABLED(CONFIG_RETPOLINE)) {
1294 +- pr_err("%s selected but not compiled in. Switching to AUTO select\n",
1295 +- mitigation_options[i].option);
1296 ++ pr_err("%s selected but not compiled in. Switching to AUTO select\n", mitigation_options[i].option);
1297 + return SPECTRE_V2_CMD_AUTO;
1298 + }
1299 +
1300 +@@ -255,14 +361,14 @@ static void __init spectre_v2_select_mitigation(void)
1301 + goto retpoline_auto;
1302 + break;
1303 + }
1304 +- pr_err("kernel not compiled with retpoline; no mitigation available!");
1305 ++ pr_err("Spectre mitigation: kernel not compiled with retpoline; no mitigation available!");
1306 + return;
1307 +
1308 + retpoline_auto:
1309 + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
1310 + retpoline_amd:
1311 + if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
1312 +- pr_err("LFENCE not serializing. Switching to generic retpoline\n");
1313 ++ pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
1314 + goto retpoline_generic;
1315 + }
1316 + mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
1317 +@@ -280,7 +386,7 @@ retpoline_auto:
1318 + pr_info("%s\n", spectre_v2_strings[mode]);
1319 +
1320 + /*
1321 +- * If neither SMEP or KPTI are available, there is a risk of
1322 ++ * If neither SMEP nor PTI are available, there is a risk of
1323 + * hitting userspace addresses in the RSB after a context switch
1324 + * from a shallow call stack to a deeper one. To prevent this fill
1325 + * the entire RSB, even when using IBRS.
1326 +@@ -294,38 +400,309 @@ retpoline_auto:
1327 + if ((!boot_cpu_has(X86_FEATURE_KAISER) &&
1328 + !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
1329 + setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
1330 +- pr_info("Filling RSB on context switch\n");
1331 ++ pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
1332 ++ }
1333 ++
1334 ++ /* Initialize Indirect Branch Prediction Barrier if supported */
1335 ++ if (boot_cpu_has(X86_FEATURE_IBPB)) {
1336 ++ setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
1337 ++ pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n");
1338 ++ }
1339 ++
1340 ++ /*
1341 ++ * Retpoline means the kernel is safe because it has no indirect
1342 ++ * branches. But firmware isn't, so use IBRS to protect that.
1343 ++ */
1344 ++ if (boot_cpu_has(X86_FEATURE_IBRS)) {
1345 ++ setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
1346 ++ pr_info("Enabling Restricted Speculation for firmware calls\n");
1347 ++ }
1348 ++}
1349 ++
1350 ++#undef pr_fmt
1351 ++#define pr_fmt(fmt) "Speculative Store Bypass: " fmt
1352 ++
1353 ++static enum ssb_mitigation ssb_mode = SPEC_STORE_BYPASS_NONE;
1354 ++
1355 ++/* The kernel command line selection */
1356 ++enum ssb_mitigation_cmd {
1357 ++ SPEC_STORE_BYPASS_CMD_NONE,
1358 ++ SPEC_STORE_BYPASS_CMD_AUTO,
1359 ++ SPEC_STORE_BYPASS_CMD_ON,
1360 ++ SPEC_STORE_BYPASS_CMD_PRCTL,
1361 ++ SPEC_STORE_BYPASS_CMD_SECCOMP,
1362 ++};
1363 ++
1364 ++static const char *ssb_strings[] = {
1365 ++ [SPEC_STORE_BYPASS_NONE] = "Vulnerable",
1366 ++ [SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled",
1367 ++ [SPEC_STORE_BYPASS_PRCTL] = "Mitigation: Speculative Store Bypass disabled via prctl",
1368 ++ [SPEC_STORE_BYPASS_SECCOMP] = "Mitigation: Speculative Store Bypass disabled via prctl and seccomp",
1369 ++};
1370 ++
1371 ++static const struct {
1372 ++ const char *option;
1373 ++ enum ssb_mitigation_cmd cmd;
1374 ++} ssb_mitigation_options[] = {
1375 ++ { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
1376 ++ { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
1377 ++ { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
1378 ++ { "prctl", SPEC_STORE_BYPASS_CMD_PRCTL }, /* Disable Speculative Store Bypass via prctl */
1379 ++ { "seccomp", SPEC_STORE_BYPASS_CMD_SECCOMP }, /* Disable Speculative Store Bypass via prctl and seccomp */
1380 ++};
1381 ++
1382 ++static enum ssb_mitigation_cmd __init ssb_parse_cmdline(void)
1383 ++{
1384 ++ enum ssb_mitigation_cmd cmd = SPEC_STORE_BYPASS_CMD_AUTO;
1385 ++ char arg[20];
1386 ++ int ret, i;
1387 ++
1388 ++ if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable")) {
1389 ++ return SPEC_STORE_BYPASS_CMD_NONE;
1390 ++ } else {
1391 ++ ret = cmdline_find_option(boot_command_line, "spec_store_bypass_disable",
1392 ++ arg, sizeof(arg));
1393 ++ if (ret < 0)
1394 ++ return SPEC_STORE_BYPASS_CMD_AUTO;
1395 ++
1396 ++ for (i = 0; i < ARRAY_SIZE(ssb_mitigation_options); i++) {
1397 ++ if (!match_option(arg, ret, ssb_mitigation_options[i].option))
1398 ++ continue;
1399 ++
1400 ++ cmd = ssb_mitigation_options[i].cmd;
1401 ++ break;
1402 ++ }
1403 ++
1404 ++ if (i >= ARRAY_SIZE(ssb_mitigation_options)) {
1405 ++ pr_err("unknown option (%s). Switching to AUTO select\n", arg);
1406 ++ return SPEC_STORE_BYPASS_CMD_AUTO;
1407 ++ }
1408 ++ }
1409 ++
1410 ++ return cmd;
1411 ++}
1412 ++
1413 ++static enum ssb_mitigation __init __ssb_select_mitigation(void)
1414 ++{
1415 ++ enum ssb_mitigation mode = SPEC_STORE_BYPASS_NONE;
1416 ++ enum ssb_mitigation_cmd cmd;
1417 ++
1418 ++ if (!boot_cpu_has(X86_FEATURE_SSBD))
1419 ++ return mode;
1420 ++
1421 ++ cmd = ssb_parse_cmdline();
1422 ++ if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS) &&
1423 ++ (cmd == SPEC_STORE_BYPASS_CMD_NONE ||
1424 ++ cmd == SPEC_STORE_BYPASS_CMD_AUTO))
1425 ++ return mode;
1426 ++
1427 ++ switch (cmd) {
1428 ++ case SPEC_STORE_BYPASS_CMD_AUTO:
1429 ++ case SPEC_STORE_BYPASS_CMD_SECCOMP:
1430 ++ /*
1431 ++ * Choose prctl+seccomp as the default mode if seccomp is
1432 ++ * enabled.
1433 ++ */
1434 ++ if (IS_ENABLED(CONFIG_SECCOMP))
1435 ++ mode = SPEC_STORE_BYPASS_SECCOMP;
1436 ++ else
1437 ++ mode = SPEC_STORE_BYPASS_PRCTL;
1438 ++ break;
1439 ++ case SPEC_STORE_BYPASS_CMD_ON:
1440 ++ mode = SPEC_STORE_BYPASS_DISABLE;
1441 ++ break;
1442 ++ case SPEC_STORE_BYPASS_CMD_PRCTL:
1443 ++ mode = SPEC_STORE_BYPASS_PRCTL;
1444 ++ break;
1445 ++ case SPEC_STORE_BYPASS_CMD_NONE:
1446 ++ break;
1447 ++ }
1448 ++
1449 ++ /*
1450 ++ * We have three CPU feature flags that are in play here:
1451 ++ * - X86_BUG_SPEC_STORE_BYPASS - CPU is susceptible.
1452 ++ * - X86_FEATURE_SSBD - CPU is able to turn off speculative store bypass
1453 ++ * - X86_FEATURE_SPEC_STORE_BYPASS_DISABLE - engage the mitigation
1454 ++ */
1455 ++ if (mode == SPEC_STORE_BYPASS_DISABLE) {
1456 ++ setup_force_cpu_cap(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE);
1457 ++ /*
1458 ++ * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD uses
1459 ++ * a completely different MSR and bit dependent on family.
1460 ++ */
1461 ++ switch (boot_cpu_data.x86_vendor) {
1462 ++ case X86_VENDOR_INTEL:
1463 ++ x86_spec_ctrl_base |= SPEC_CTRL_SSBD;
1464 ++ x86_spec_ctrl_mask |= SPEC_CTRL_SSBD;
1465 ++ wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
1466 ++ break;
1467 ++ case X86_VENDOR_AMD:
1468 ++ x86_amd_ssb_disable();
1469 ++ break;
1470 ++ }
1471 + }
1472 ++
1473 ++ return mode;
1474 ++}
1475 ++
1476 ++static void ssb_select_mitigation(void)
1477 ++{
1478 ++ ssb_mode = __ssb_select_mitigation();
1479 ++
1480 ++ if (boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
1481 ++ pr_info("%s\n", ssb_strings[ssb_mode]);
1482 + }
1483 +
1484 + #undef pr_fmt
1485 ++#define pr_fmt(fmt) "Speculation prctl: " fmt
1486 ++
1487 ++static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl)
1488 ++{
1489 ++ bool update;
1490 ++
1491 ++ if (ssb_mode != SPEC_STORE_BYPASS_PRCTL &&
1492 ++ ssb_mode != SPEC_STORE_BYPASS_SECCOMP)
1493 ++ return -ENXIO;
1494 ++
1495 ++ switch (ctrl) {
1496 ++ case PR_SPEC_ENABLE:
1497 ++ /* If speculation is force disabled, enable is not allowed */
1498 ++ if (task_spec_ssb_force_disable(task))
1499 ++ return -EPERM;
1500 ++ task_clear_spec_ssb_disable(task);
1501 ++ update = test_and_clear_tsk_thread_flag(task, TIF_SSBD);
1502 ++ break;
1503 ++ case PR_SPEC_DISABLE:
1504 ++ task_set_spec_ssb_disable(task);
1505 ++ update = !test_and_set_tsk_thread_flag(task, TIF_SSBD);
1506 ++ break;
1507 ++ case PR_SPEC_FORCE_DISABLE:
1508 ++ task_set_spec_ssb_disable(task);
1509 ++ task_set_spec_ssb_force_disable(task);
1510 ++ update = !test_and_set_tsk_thread_flag(task, TIF_SSBD);
1511 ++ break;
1512 ++ default:
1513 ++ return -ERANGE;
1514 ++ }
1515 ++
1516 ++ /*
1517 ++ * If being set on non-current task, delay setting the CPU
1518 ++ * mitigation until it is next scheduled.
1519 ++ */
1520 ++ if (task == current && update)
1521 ++ speculative_store_bypass_update_current();
1522 ++
1523 ++ return 0;
1524 ++}
1525 ++
1526 ++int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
1527 ++ unsigned long ctrl)
1528 ++{
1529 ++ switch (which) {
1530 ++ case PR_SPEC_STORE_BYPASS:
1531 ++ return ssb_prctl_set(task, ctrl);
1532 ++ default:
1533 ++ return -ENODEV;
1534 ++ }
1535 ++}
1536 ++
1537 ++#ifdef CONFIG_SECCOMP
1538 ++void arch_seccomp_spec_mitigate(struct task_struct *task)
1539 ++{
1540 ++ if (ssb_mode == SPEC_STORE_BYPASS_SECCOMP)
1541 ++ ssb_prctl_set(task, PR_SPEC_FORCE_DISABLE);
1542 ++}
1543 ++#endif
1544 ++
1545 ++static int ssb_prctl_get(struct task_struct *task)
1546 ++{
1547 ++ switch (ssb_mode) {
1548 ++ case SPEC_STORE_BYPASS_DISABLE:
1549 ++ return PR_SPEC_DISABLE;
1550 ++ case SPEC_STORE_BYPASS_SECCOMP:
1551 ++ case SPEC_STORE_BYPASS_PRCTL:
1552 ++ if (task_spec_ssb_force_disable(task))
1553 ++ return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
1554 ++ if (task_spec_ssb_disable(task))
1555 ++ return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
1556 ++ return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
1557 ++ default:
1558 ++ if (boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
1559 ++ return PR_SPEC_ENABLE;
1560 ++ return PR_SPEC_NOT_AFFECTED;
1561 ++ }
1562 ++}
1563 ++
1564 ++int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
1565 ++{
1566 ++ switch (which) {
1567 ++ case PR_SPEC_STORE_BYPASS:
1568 ++ return ssb_prctl_get(task);
1569 ++ default:
1570 ++ return -ENODEV;
1571 ++ }
1572 ++}
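
For reference, the per-task interface implemented above is driven from userspace through prctl(). A minimal sketch, assuming the PR_SET_SPECULATION_CTRL / PR_GET_SPECULATION_CTRL requests and the PR_SPEC_* constants that the rest of this series adds to the uapi <linux/prctl.h> are visible to userspace headers, could look like this:

#include <stdio.h>
#include <sys/prctl.h>
#include <linux/prctl.h>        /* PR_{SET,GET}_SPECULATION_CTRL, PR_SPEC_* (assumed available) */

int main(void)
{
        int state;

        /* Ask the kernel to disable Speculative Store Bypass for this task. */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                  PR_SPEC_DISABLE, 0, 0)) {
                perror("PR_SET_SPECULATION_CTRL");      /* e.g. ENXIO when ssb_mode is not prctl/seccomp */
                return 1;
        }

        /* Read the state back; this lands in ssb_prctl_get() above. */
        state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);
        printf("spec_store_bypass prctl state: 0x%x\n", (unsigned int)state);
        return 0;
}

PR_SPEC_FORCE_DISABLE works the same way, except that, per ssb_prctl_set() above, it cannot be undone by a later PR_SPEC_ENABLE.
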
1573 ++
1574 ++void x86_spec_ctrl_setup_ap(void)
1575 ++{
1576 ++ if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
1577 ++ wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
1578 ++
1579 ++ if (ssb_mode == SPEC_STORE_BYPASS_DISABLE)
1580 ++ x86_amd_ssb_disable();
1581 ++}
1582 +
1583 + #ifdef CONFIG_SYSFS
1584 +-ssize_t cpu_show_meltdown(struct device *dev,
1585 +- struct device_attribute *attr, char *buf)
1586 ++
1587 ++static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
1588 ++ char *buf, unsigned int bug)
1589 + {
1590 +- if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
1591 ++ if (!boot_cpu_has_bug(bug))
1592 + return sprintf(buf, "Not affected\n");
1593 +- if (boot_cpu_has(X86_FEATURE_KAISER))
1594 +- return sprintf(buf, "Mitigation: PTI\n");
1595 ++
1596 ++ switch (bug) {
1597 ++ case X86_BUG_CPU_MELTDOWN:
1598 ++ if (boot_cpu_has(X86_FEATURE_KAISER))
1599 ++ return sprintf(buf, "Mitigation: PTI\n");
1600 ++
1601 ++ break;
1602 ++
1603 ++ case X86_BUG_SPECTRE_V1:
1604 ++ return sprintf(buf, "Mitigation: __user pointer sanitization\n");
1605 ++
1606 ++ case X86_BUG_SPECTRE_V2:
1607 ++ return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
1608 ++ boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
1609 ++ boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
1610 ++ spectre_v2_module_string());
1611 ++
1612 ++ case X86_BUG_SPEC_STORE_BYPASS:
1613 ++ return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
1614 ++
1615 ++ default:
1616 ++ break;
1617 ++ }
1618 ++
1619 + return sprintf(buf, "Vulnerable\n");
1620 + }
1621 +
1622 +-ssize_t cpu_show_spectre_v1(struct device *dev,
1623 +- struct device_attribute *attr, char *buf)
1624 ++ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
1625 + {
1626 +- if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
1627 +- return sprintf(buf, "Not affected\n");
1628 +- return sprintf(buf, "Mitigation: __user pointer sanitization\n");
1629 ++ return cpu_show_common(dev, attr, buf, X86_BUG_CPU_MELTDOWN);
1630 + }
1631 +
1632 +-ssize_t cpu_show_spectre_v2(struct device *dev,
1633 +- struct device_attribute *attr, char *buf)
1634 ++ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)
1635 + {
1636 +- if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
1637 +- return sprintf(buf, "Not affected\n");
1638 ++ return cpu_show_common(dev, attr, buf, X86_BUG_SPECTRE_V1);
1639 ++}
1640 +
1641 +- return sprintf(buf, "%s%s\n", spectre_v2_strings[spectre_v2_enabled],
1642 +- spectre_v2_module_string());
1643 ++ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf)
1644 ++{
1645 ++ return cpu_show_common(dev, attr, buf, X86_BUG_SPECTRE_V2);
1646 ++}
1647 ++
1648 ++ssize_t cpu_show_spec_store_bypass(struct device *dev, struct device_attribute *attr, char *buf)
1649 ++{
1650 ++ return cpu_show_common(dev, attr, buf, X86_BUG_SPEC_STORE_BYPASS);
1651 + }
1652 + #endif
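
The sysfs plumbing above (cpu_show_spec_store_bypass() here, plus the spec_store_bypass attribute added to drivers/base/cpu.c later in this patch) is what tools read to check the active mitigation. A small reader, assuming the usual sysfs mount point, might be:

#include <stdio.h>

int main(void)
{
        char line[128];
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/spec_store_bypass", "r");

        if (!f) {
                perror("spec_store_bypass");    /* kernel without this patch, or sysfs not mounted */
                return 1;
        }
        if (fgets(line, sizeof(line), f))
                fputs(line, stdout);            /* prints one of the ssb_strings[] entries above */
        fclose(f);
        return 0;
}
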
1653 +diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
1654 +index 736e2843139b..3d21b28f9826 100644
1655 +--- a/arch/x86/kernel/cpu/common.c
1656 ++++ b/arch/x86/kernel/cpu/common.c
1657 +@@ -43,6 +43,8 @@
1658 + #include <asm/pat.h>
1659 + #include <asm/microcode.h>
1660 + #include <asm/microcode_intel.h>
1661 ++#include <asm/intel-family.h>
1662 ++#include <asm/cpu_device_id.h>
1663 +
1664 + #ifdef CONFIG_X86_LOCAL_APIC
1665 + #include <asm/uv/uv.h>
1666 +@@ -674,6 +676,40 @@ static void apply_forced_caps(struct cpuinfo_x86 *c)
1667 + }
1668 + }
1669 +
1670 ++static void init_speculation_control(struct cpuinfo_x86 *c)
1671 ++{
1672 ++ /*
1673 ++ * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
1674 ++ * and they also have a different bit for STIBP support. Also,
1675 ++ * a hypervisor might have set the individual AMD bits even on
1676 ++ * Intel CPUs, for finer-grained selection of what's available.
1677 ++ */
1678 ++ if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
1679 ++ set_cpu_cap(c, X86_FEATURE_IBRS);
1680 ++ set_cpu_cap(c, X86_FEATURE_IBPB);
1681 ++ set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
1682 ++ }
1683 ++
1684 ++ if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
1685 ++ set_cpu_cap(c, X86_FEATURE_STIBP);
1686 ++
1687 ++ if (cpu_has(c, X86_FEATURE_SPEC_CTRL_SSBD))
1688 ++ set_cpu_cap(c, X86_FEATURE_SSBD);
1689 ++
1690 ++ if (cpu_has(c, X86_FEATURE_AMD_IBRS)) {
1691 ++ set_cpu_cap(c, X86_FEATURE_IBRS);
1692 ++ set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
1693 ++ }
1694 ++
1695 ++ if (cpu_has(c, X86_FEATURE_AMD_IBPB))
1696 ++ set_cpu_cap(c, X86_FEATURE_IBPB);
1697 ++
1698 ++ if (cpu_has(c, X86_FEATURE_AMD_STIBP)) {
1699 ++ set_cpu_cap(c, X86_FEATURE_STIBP);
1700 ++ set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
1701 ++ }
1702 ++}
1703 ++
1704 + void get_cpu_cap(struct cpuinfo_x86 *c)
1705 + {
1706 + u32 eax, ebx, ecx, edx;
1707 +@@ -695,6 +731,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
1708 + cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx);
1709 + c->x86_capability[CPUID_7_0_EBX] = ebx;
1710 + c->x86_capability[CPUID_7_ECX] = ecx;
1711 ++ c->x86_capability[CPUID_7_EDX] = edx;
1712 + }
1713 +
1714 + /* Extended state features: level 0x0000000d */
1715 +@@ -765,6 +802,14 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
1716 + c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
1717 +
1718 + init_scattered_cpuid_features(c);
1719 ++ init_speculation_control(c);
1720 ++
1721 ++ /*
1722 ++ * Clear/Set all flags overridden by options, after probe.
1723 ++ * This needs to happen each time we re-probe, which may happen
1724 ++ * several times during CPU initialization.
1725 ++ */
1726 ++ apply_forced_caps(c);
1727 + }
1728 +
1729 + static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
1730 +@@ -793,6 +838,75 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
1731 + #endif
1732 + }
1733 +
1734 ++static const __initconst struct x86_cpu_id cpu_no_speculation[] = {
1735 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW, X86_FEATURE_ANY },
1736 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW, X86_FEATURE_ANY },
1737 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT, X86_FEATURE_ANY },
1738 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PENWELL, X86_FEATURE_ANY },
1739 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PINEVIEW, X86_FEATURE_ANY },
1740 ++ { X86_VENDOR_CENTAUR, 5 },
1741 ++ { X86_VENDOR_INTEL, 5 },
1742 ++ { X86_VENDOR_NSC, 5 },
1743 ++ { X86_VENDOR_ANY, 4 },
1744 ++ {}
1745 ++};
1746 ++
1747 ++static const __initconst struct x86_cpu_id cpu_no_meltdown[] = {
1748 ++ { X86_VENDOR_AMD },
1749 ++ {}
1750 ++};
1751 ++
1752 ++static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = {
1753 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PINEVIEW },
1754 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT },
1755 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PENWELL },
1756 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW },
1757 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW },
1758 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT1 },
1759 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT },
1760 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT2 },
1761 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MERRIFIELD },
1762 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_CORE_YONAH },
1763 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL },
1764 ++ { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM },
1765 ++ { X86_VENDOR_CENTAUR, 5, },
1766 ++ { X86_VENDOR_INTEL, 5, },
1767 ++ { X86_VENDOR_NSC, 5, },
1768 ++ { X86_VENDOR_AMD, 0x12, },
1769 ++ { X86_VENDOR_AMD, 0x11, },
1770 ++ { X86_VENDOR_AMD, 0x10, },
1771 ++ { X86_VENDOR_AMD, 0xf, },
1772 ++ { X86_VENDOR_ANY, 4, },
1773 ++ {}
1774 ++};
1775 ++
1776 ++static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
1777 ++{
1778 ++ u64 ia32_cap = 0;
1779 ++
1780 ++ if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
1781 ++ rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
1782 ++
1783 ++ if (!x86_match_cpu(cpu_no_spec_store_bypass) &&
1784 ++ !(ia32_cap & ARCH_CAP_SSB_NO))
1785 ++ setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
1786 ++
1787 ++ if (x86_match_cpu(cpu_no_speculation))
1788 ++ return;
1789 ++
1790 ++ setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
1791 ++ setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
1792 ++
1793 ++ if (x86_match_cpu(cpu_no_meltdown))
1794 ++ return;
1795 ++
1796 ++ /* Rogue Data Cache Load? No! */
1797 ++ if (ia32_cap & ARCH_CAP_RDCL_NO)
1798 ++ return;
1799 ++
1800 ++ setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
1801 ++}
1802 ++
1803 + /*
1804 + * Do minimum CPU detection early.
1805 + * Fields really needed: vendor, cpuid_level, family, model, mask,
1806 +@@ -839,11 +953,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
1807 +
1808 + setup_force_cpu_cap(X86_FEATURE_ALWAYS);
1809 +
1810 +- if (c->x86_vendor != X86_VENDOR_AMD)
1811 +- setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
1812 +-
1813 +- setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
1814 +- setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
1815 ++ cpu_set_bug_bits(c);
1816 +
1817 + fpu__init_system(c);
1818 +
1819 +@@ -1132,6 +1242,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
1820 + enable_sep_cpu();
1821 + #endif
1822 + mtrr_ap_init();
1823 ++ x86_spec_ctrl_setup_ap();
1824 + }
1825 +
1826 + struct msr_range {
1827 +diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
1828 +index 2584265d4745..3b19d82f7932 100644
1829 +--- a/arch/x86/kernel/cpu/cpu.h
1830 ++++ b/arch/x86/kernel/cpu/cpu.h
1831 +@@ -46,4 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[],
1832 +
1833 + extern void get_cpu_cap(struct cpuinfo_x86 *c);
1834 + extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
1835 ++
1836 ++extern void x86_spec_ctrl_setup_ap(void);
1837 ++
1838 + #endif /* ARCH_X86_CPU_H */
1839 +diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
1840 +index 9299e3bdfad6..4dce22d3cb06 100644
1841 +--- a/arch/x86/kernel/cpu/intel.c
1842 ++++ b/arch/x86/kernel/cpu/intel.c
1843 +@@ -13,6 +13,7 @@
1844 + #include <asm/msr.h>
1845 + #include <asm/bugs.h>
1846 + #include <asm/cpu.h>
1847 ++#include <asm/intel-family.h>
1848 +
1849 + #ifdef CONFIG_X86_64
1850 + #include <linux/topology.h>
1851 +@@ -25,6 +26,62 @@
1852 + #include <asm/apic.h>
1853 + #endif
1854 +
1855 ++/*
1856 ++ * Early microcode releases for the Spectre v2 mitigation were broken.
1857 ++ * Information taken from:
1858 ++ * - https://newsroom.intel.com/wp-content/uploads/sites/11/2018/03/microcode-update-guidance.pdf
1859 ++ * - https://kb.vmware.com/s/article/52345
1860 ++ * - Microcode revisions observed in the wild
1861 ++ * - Release note from 20180108 microcode release
1862 ++ */
1863 ++struct sku_microcode {
1864 ++ u8 model;
1865 ++ u8 stepping;
1866 ++ u32 microcode;
1867 ++};
1868 ++static const struct sku_microcode spectre_bad_microcodes[] = {
1869 ++ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x80 },
1870 ++ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0A, 0x80 },
1871 ++ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
1872 ++ { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
1873 ++ { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x80 },
1874 ++ { INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
1875 ++ { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
1876 ++ { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
1877 ++ { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
1878 ++ { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
1879 ++ { INTEL_FAM6_BROADWELL_XEON_D, 0x03, 0x07000011 },
1880 ++ { INTEL_FAM6_BROADWELL_X, 0x01, 0x0b000025 },
1881 ++ { INTEL_FAM6_HASWELL_ULT, 0x01, 0x21 },
1882 ++ { INTEL_FAM6_HASWELL_GT3E, 0x01, 0x18 },
1883 ++ { INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
1884 ++ { INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
1885 ++ { INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
1886 ++ { INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
1887 ++ /* Observed in the wild */
1888 ++ { INTEL_FAM6_SANDYBRIDGE_X, 0x06, 0x61b },
1889 ++ { INTEL_FAM6_SANDYBRIDGE_X, 0x07, 0x712 },
1890 ++};
1891 ++
1892 ++static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
1893 ++{
1894 ++ int i;
1895 ++
1896 ++ /*
1897 ++ * We know that hypervisors lie to us about the microcode version, so
1898 ++ * we may as well hope that it is running the correct version.
1899 ++ */
1900 ++ if (cpu_has(c, X86_FEATURE_HYPERVISOR))
1901 ++ return false;
1902 ++
1903 ++ for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
1904 ++ if (c->x86_model == spectre_bad_microcodes[i].model &&
1905 ++ c->x86_mask == spectre_bad_microcodes[i].stepping)
1906 ++ return (c->microcode <= spectre_bad_microcodes[i].microcode);
1907 ++ }
1908 ++ return false;
1909 ++}
1910 ++
1911 + static void early_init_intel(struct cpuinfo_x86 *c)
1912 + {
1913 + u64 misc_enable;
1914 +@@ -51,6 +108,22 @@ static void early_init_intel(struct cpuinfo_x86 *c)
1915 + rdmsr(MSR_IA32_UCODE_REV, lower_word, c->microcode);
1916 + }
1917 +
1918 ++ /* Now if any of them are set, check the blacklist and clear the lot */
1919 ++ if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
1920 ++ cpu_has(c, X86_FEATURE_INTEL_STIBP) ||
1921 ++ cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
1922 ++ cpu_has(c, X86_FEATURE_STIBP)) && bad_spectre_microcode(c)) {
1923 ++ pr_warn("Intel Spectre v2 broken microcode detected; disabling Speculation Control\n");
1924 ++ setup_clear_cpu_cap(X86_FEATURE_IBRS);
1925 ++ setup_clear_cpu_cap(X86_FEATURE_IBPB);
1926 ++ setup_clear_cpu_cap(X86_FEATURE_STIBP);
1927 ++ setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL);
1928 ++ setup_clear_cpu_cap(X86_FEATURE_MSR_SPEC_CTRL);
1929 ++ setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP);
1930 ++ setup_clear_cpu_cap(X86_FEATURE_SSBD);
1931 ++ setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL_SSBD);
1932 ++ }
1933 ++
1934 + /*
1935 + * Atom erratum AAE44/AAF40/AAG38/AAH41:
1936 + *
1937 +diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
1938 +index ddc9b8125918..7b8c8c838191 100644
1939 +--- a/arch/x86/kernel/cpu/mcheck/mce.c
1940 ++++ b/arch/x86/kernel/cpu/mcheck/mce.c
1941 +@@ -2294,9 +2294,6 @@ static ssize_t store_int_with_restart(struct device *s,
1942 + if (check_interval == old_check_interval)
1943 + return ret;
1944 +
1945 +- if (check_interval < 1)
1946 +- check_interval = 1;
1947 +-
1948 + mutex_lock(&mce_sysfs_mutex);
1949 + mce_restart();
1950 + mutex_unlock(&mce_sysfs_mutex);
1951 +diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S
1952 +new file mode 100644
1953 +index 000000000000..3817eb748eb4
1954 +--- /dev/null
1955 ++++ b/arch/x86/kernel/irqflags.S
1956 +@@ -0,0 +1,26 @@
1957 ++/* SPDX-License-Identifier: GPL-2.0 */
1958 ++
1959 ++#include <asm/asm.h>
1960 ++#include <asm-generic/export.h>
1961 ++#include <linux/linkage.h>
1962 ++
1963 ++/*
1964 ++ * unsigned long native_save_fl(void)
1965 ++ */
1966 ++ENTRY(native_save_fl)
1967 ++ pushf
1968 ++ pop %_ASM_AX
1969 ++ ret
1970 ++ENDPROC(native_save_fl)
1971 ++EXPORT_SYMBOL(native_save_fl)
1972 ++
1973 ++/*
1974 ++ * void native_restore_fl(unsigned long flags)
1975 ++ * %eax/%rdi: flags
1976 ++ */
1977 ++ENTRY(native_restore_fl)
1978 ++ push %_ASM_ARG1
1979 ++ popf
1980 ++ ret
1981 ++ENDPROC(native_restore_fl)
1982 ++EXPORT_SYMBOL(native_restore_fl)
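
Both symbols stay exported, so callers keep working exactly as with the old C versions; in-tree they back the native arch_local_save_flags()/arch_local_irq_restore() paths. Conceptually they pair up as in the following kernel-context sketch (module/kernel code only, not a standalone program):

/* Sketch of the save/restore pairing provided by the asm helpers above. */
extern unsigned long native_save_fl(void);
extern void native_restore_fl(unsigned long flags);

static void example_flags_roundtrip(void)
{
        unsigned long flags = native_save_fl();     /* pushf; pop %_ASM_AX   */

        /* ... code that must leave EFLAGS exactly as it found it ... */

        native_restore_fl(flags);                   /* push %_ASM_ARG1; popf */
}
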
1983 +diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
1984 +index bc429365b72a..8bc68cfc0d33 100644
1985 +--- a/arch/x86/kernel/ldt.c
1986 ++++ b/arch/x86/kernel/ldt.c
1987 +@@ -119,7 +119,7 @@ static void free_ldt_struct(struct ldt_struct *ldt)
1988 + * we do not have to muck with descriptors here, that is
1989 + * done in switch_mm() as needed.
1990 + */
1991 +-int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
1992 ++int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm)
1993 + {
1994 + struct ldt_struct *new_ldt;
1995 + struct mm_struct *old_mm;
1996 +@@ -160,7 +160,7 @@ out_unlock:
1997 + *
1998 + * 64bit: Don't touch the LDT register - we're already in the next thread.
1999 + */
2000 +-void destroy_context(struct mm_struct *mm)
2001 ++void destroy_context_ldt(struct mm_struct *mm)
2002 + {
2003 + free_ldt_struct(mm->context.ldt);
2004 + mm->context.ldt = NULL;
2005 +diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
2006 +index 7c5c5dc90ffa..e18c8798c3a2 100644
2007 +--- a/arch/x86/kernel/process.c
2008 ++++ b/arch/x86/kernel/process.c
2009 +@@ -31,6 +31,7 @@
2010 + #include <asm/tlbflush.h>
2011 + #include <asm/mce.h>
2012 + #include <asm/vm86.h>
2013 ++#include <asm/spec-ctrl.h>
2014 +
2015 + /*
2016 + * per-CPU TSS segments. Threads are completely 'soft' on Linux,
2017 +@@ -130,11 +131,6 @@ void flush_thread(void)
2018 + fpu__clear(&tsk->thread.fpu);
2019 + }
2020 +
2021 +-static void hard_disable_TSC(void)
2022 +-{
2023 +- cr4_set_bits(X86_CR4_TSD);
2024 +-}
2025 +-
2026 + void disable_TSC(void)
2027 + {
2028 + preempt_disable();
2029 +@@ -143,15 +139,10 @@ void disable_TSC(void)
2030 + * Must flip the CPU state synchronously with
2031 + * TIF_NOTSC in the current running context.
2032 + */
2033 +- hard_disable_TSC();
2034 ++ cr4_set_bits(X86_CR4_TSD);
2035 + preempt_enable();
2036 + }
2037 +
2038 +-static void hard_enable_TSC(void)
2039 +-{
2040 +- cr4_clear_bits(X86_CR4_TSD);
2041 +-}
2042 +-
2043 + static void enable_TSC(void)
2044 + {
2045 + preempt_disable();
2046 +@@ -160,7 +151,7 @@ static void enable_TSC(void)
2047 + * Must flip the CPU state synchronously with
2048 + * TIF_NOTSC in the current running context.
2049 + */
2050 +- hard_enable_TSC();
2051 ++ cr4_clear_bits(X86_CR4_TSD);
2052 + preempt_enable();
2053 + }
2054 +
2055 +@@ -188,48 +179,199 @@ int set_tsc_mode(unsigned int val)
2056 + return 0;
2057 + }
2058 +
2059 +-void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
2060 +- struct tss_struct *tss)
2061 ++static inline void switch_to_bitmap(struct tss_struct *tss,
2062 ++ struct thread_struct *prev,
2063 ++ struct thread_struct *next,
2064 ++ unsigned long tifp, unsigned long tifn)
2065 + {
2066 +- struct thread_struct *prev, *next;
2067 +-
2068 +- prev = &prev_p->thread;
2069 +- next = &next_p->thread;
2070 +-
2071 +- if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^
2072 +- test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) {
2073 +- unsigned long debugctl = get_debugctlmsr();
2074 +-
2075 +- debugctl &= ~DEBUGCTLMSR_BTF;
2076 +- if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP))
2077 +- debugctl |= DEBUGCTLMSR_BTF;
2078 +-
2079 +- update_debugctlmsr(debugctl);
2080 +- }
2081 +-
2082 +- if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
2083 +- test_tsk_thread_flag(next_p, TIF_NOTSC)) {
2084 +- /* prev and next are different */
2085 +- if (test_tsk_thread_flag(next_p, TIF_NOTSC))
2086 +- hard_disable_TSC();
2087 +- else
2088 +- hard_enable_TSC();
2089 +- }
2090 +-
2091 +- if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
2092 ++ if (tifn & _TIF_IO_BITMAP) {
2093 + /*
2094 + * Copy the relevant range of the IO bitmap.
2095 + * Normally this is 128 bytes or less:
2096 + */
2097 + memcpy(tss->io_bitmap, next->io_bitmap_ptr,
2098 + max(prev->io_bitmap_max, next->io_bitmap_max));
2099 +- } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
2100 ++ } else if (tifp & _TIF_IO_BITMAP) {
2101 + /*
2102 + * Clear any possible leftover bits:
2103 + */
2104 + memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
2105 + }
2106 ++}
2107 ++
2108 ++#ifdef CONFIG_SMP
2109 ++
2110 ++struct ssb_state {
2111 ++ struct ssb_state *shared_state;
2112 ++ raw_spinlock_t lock;
2113 ++ unsigned int disable_state;
2114 ++ unsigned long local_state;
2115 ++};
2116 ++
2117 ++#define LSTATE_SSB 0
2118 ++
2119 ++static DEFINE_PER_CPU(struct ssb_state, ssb_state);
2120 ++
2121 ++void speculative_store_bypass_ht_init(void)
2122 ++{
2123 ++ struct ssb_state *st = this_cpu_ptr(&ssb_state);
2124 ++ unsigned int this_cpu = smp_processor_id();
2125 ++ unsigned int cpu;
2126 ++
2127 ++ st->local_state = 0;
2128 ++
2129 ++ /*
2130 ++ * Shared state setup happens once on the first bringup
2131 ++ * of the CPU. It's not destroyed on CPU hotunplug.
2132 ++ */
2133 ++ if (st->shared_state)
2134 ++ return;
2135 ++
2136 ++ raw_spin_lock_init(&st->lock);
2137 ++
2138 ++ /*
2139 ++ * Go over HT siblings and check whether one of them has set up the
2140 ++ * shared state pointer already.
2141 ++ */
2142 ++ for_each_cpu(cpu, topology_sibling_cpumask(this_cpu)) {
2143 ++ if (cpu == this_cpu)
2144 ++ continue;
2145 ++
2146 ++ if (!per_cpu(ssb_state, cpu).shared_state)
2147 ++ continue;
2148 ++
2149 ++ /* Link it to the state of the sibling: */
2150 ++ st->shared_state = per_cpu(ssb_state, cpu).shared_state;
2151 ++ return;
2152 ++ }
2153 ++
2154 ++ /*
2155 ++ * First HT sibling to come up on the core. Link shared state of
2156 ++ * the first HT sibling to itself. The siblings on the same core
2157 ++ * which come up later will see the shared state pointer and link
2158 ++ * themself to the state of this CPU.
2159 ++ */
2160 ++ st->shared_state = st;
2161 ++}
2162 ++
2163 ++/*
2164 ++ * Logic is: the first HT sibling to enable SSBD enables it for both siblings
2165 ++ * in the core, and the last sibling to disable it disables it for the whole
2166 ++ * core. This is how MSR_SPEC_CTRL works in "hardware":
2167 ++ *
2168 ++ * CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL
2169 ++ */
2170 ++static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
2171 ++{
2172 ++ struct ssb_state *st = this_cpu_ptr(&ssb_state);
2173 ++ u64 msr = x86_amd_ls_cfg_base;
2174 ++
2175 ++ if (!static_cpu_has(X86_FEATURE_ZEN)) {
2176 ++ msr |= ssbd_tif_to_amd_ls_cfg(tifn);
2177 ++ wrmsrl(MSR_AMD64_LS_CFG, msr);
2178 ++ return;
2179 ++ }
2180 ++
2181 ++ if (tifn & _TIF_SSBD) {
2182 ++ /*
2183 ++ * Since this can race with prctl(), block reentry on the
2184 ++ * same CPU.
2185 ++ */
2186 ++ if (__test_and_set_bit(LSTATE_SSB, &st->local_state))
2187 ++ return;
2188 ++
2189 ++ msr |= x86_amd_ls_cfg_ssbd_mask;
2190 ++
2191 ++ raw_spin_lock(&st->shared_state->lock);
2192 ++ /* First sibling enables SSBD: */
2193 ++ if (!st->shared_state->disable_state)
2194 ++ wrmsrl(MSR_AMD64_LS_CFG, msr);
2195 ++ st->shared_state->disable_state++;
2196 ++ raw_spin_unlock(&st->shared_state->lock);
2197 ++ } else {
2198 ++ if (!__test_and_clear_bit(LSTATE_SSB, &st->local_state))
2199 ++ return;
2200 ++
2201 ++ raw_spin_lock(&st->shared_state->lock);
2202 ++ st->shared_state->disable_state--;
2203 ++ if (!st->shared_state->disable_state)
2204 ++ wrmsrl(MSR_AMD64_LS_CFG, msr);
2205 ++ raw_spin_unlock(&st->shared_state->lock);
2206 ++ }
2207 ++}
2208 ++#else
2209 ++static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
2210 ++{
2211 ++ u64 msr = x86_amd_ls_cfg_base | ssbd_tif_to_amd_ls_cfg(tifn);
2212 ++
2213 ++ wrmsrl(MSR_AMD64_LS_CFG, msr);
2214 ++}
2215 ++#endif
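
The first-on/last-off accounting in amd_set_core_ssb_state() can be hard to follow at a glance. Here is a toy, single-file model of just that bookkeeping (plain C, no kernel locking, purely illustrative):

#include <stdio.h>

static unsigned int disable_state;      /* models st->shared_state->disable_state */

static void set_core_ssbd(int thread, int on)
{
        if (on) {
                /* First sibling to enable writes the MSR for the whole core. */
                if (disable_state++ == 0)
                        printf("thread %d: write LS_CFG with SSBD set\n", thread);
        } else {
                /* Last sibling to disable clears it again. */
                if (--disable_state == 0)
                        printf("thread %d: write LS_CFG with SSBD clear\n", thread);
        }
}

int main(void)
{
        set_core_ssbd(0, 1);    /* MSR written (first on)  */
        set_core_ssbd(1, 1);    /* no MSR write            */
        set_core_ssbd(0, 0);    /* no MSR write            */
        set_core_ssbd(1, 0);    /* MSR written (last off)  */
        return 0;
}
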
2216 ++
2217 ++static __always_inline void amd_set_ssb_virt_state(unsigned long tifn)
2218 ++{
2219 ++ /*
2220 ++ * SSBD has the same definition in SPEC_CTRL and VIRT_SPEC_CTRL,
2221 ++ * so ssbd_tif_to_spec_ctrl() just works.
2222 ++ */
2223 ++ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn));
2224 ++}
2225 ++
2226 ++static __always_inline void intel_set_ssb_state(unsigned long tifn)
2227 ++{
2228 ++ u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn);
2229 ++
2230 ++ wrmsrl(MSR_IA32_SPEC_CTRL, msr);
2231 ++}
2232 ++
2233 ++static __always_inline void __speculative_store_bypass_update(unsigned long tifn)
2234 ++{
2235 ++ if (static_cpu_has(X86_FEATURE_VIRT_SSBD))
2236 ++ amd_set_ssb_virt_state(tifn);
2237 ++ else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD))
2238 ++ amd_set_core_ssb_state(tifn);
2239 ++ else
2240 ++ intel_set_ssb_state(tifn);
2241 ++}
2242 ++
2243 ++void speculative_store_bypass_update(unsigned long tif)
2244 ++{
2245 ++ preempt_disable();
2246 ++ __speculative_store_bypass_update(tif);
2247 ++ preempt_enable();
2248 ++}
2249 ++
2250 ++void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
2251 ++ struct tss_struct *tss)
2252 ++{
2253 ++ struct thread_struct *prev, *next;
2254 ++ unsigned long tifp, tifn;
2255 ++
2256 ++ prev = &prev_p->thread;
2257 ++ next = &next_p->thread;
2258 ++
2259 ++ tifn = READ_ONCE(task_thread_info(next_p)->flags);
2260 ++ tifp = READ_ONCE(task_thread_info(prev_p)->flags);
2261 ++ switch_to_bitmap(tss, prev, next, tifp, tifn);
2262 ++
2263 + propagate_user_return_notify(prev_p, next_p);
2264 ++
2265 ++ if ((tifp & _TIF_BLOCKSTEP || tifn & _TIF_BLOCKSTEP) &&
2266 ++ arch_has_block_step()) {
2267 ++ unsigned long debugctl, msk;
2268 ++
2269 ++ rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
2270 ++ debugctl &= ~DEBUGCTLMSR_BTF;
2271 ++ msk = tifn & _TIF_BLOCKSTEP;
2272 ++ debugctl |= (msk >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BTF_SHIFT;
2273 ++ wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
2274 ++ }
2275 ++
2276 ++ if ((tifp ^ tifn) & _TIF_NOTSC)
2277 ++ cr4_toggle_bits(X86_CR4_TSD);
2278 ++
2279 ++ if ((tifp ^ tifn) & _TIF_SSBD)
2280 ++ __speculative_store_bypass_update(tifn);
2281 + }
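
The rewritten __switch_to_xtra() leans on the usual TIF bit tricks: XOR-ing the two flag words exposes only the bits that differ between prev and next, and the BLOCKSTEP bit is shifted straight into the DEBUGCTLMSR_BTF position rather than tested with a branch. A toy illustration (made-up bit positions, not the real TIF_* values):

#include <stdio.h>

#define TIF_NOTSC               7               /* illustrative positions only */
#define _TIF_NOTSC              (1UL << TIF_NOTSC)
#define TIF_BLOCKSTEP           8
#define _TIF_BLOCKSTEP          (1UL << TIF_BLOCKSTEP)
#define DEBUGCTLMSR_BTF_SHIFT   1
#define DEBUGCTLMSR_BTF         (1UL << DEBUGCTLMSR_BTF_SHIFT)

int main(void)
{
        unsigned long tifp = _TIF_NOTSC;        /* prev: TSC disabled, no blockstep */
        unsigned long tifn = _TIF_BLOCKSTEP;    /* next: TSC allowed, blockstep on  */
        unsigned long debugctl = 0, msk;

        /* Only a differing bit means CR4.TSD has to be toggled. */
        if ((tifp ^ tifn) & _TIF_NOTSC)
                printf("toggle X86_CR4_TSD\n");

        /* Branchless: move the TIF bit into the MSR bit position. */
        msk = tifn & _TIF_BLOCKSTEP;
        debugctl |= (msk >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BTF_SHIFT;
        printf("DEBUGCTL.BTF = %lu\n", (debugctl & DEBUGCTLMSR_BTF) ? 1UL : 0UL);
        return 0;
}
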
2282 +
2283 + /*
2284 +diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
2285 +index 1f7aefc7b0b4..c017f1c71560 100644
2286 +--- a/arch/x86/kernel/smpboot.c
2287 ++++ b/arch/x86/kernel/smpboot.c
2288 +@@ -75,6 +75,7 @@
2289 + #include <asm/i8259.h>
2290 + #include <asm/realmode.h>
2291 + #include <asm/misc.h>
2292 ++#include <asm/spec-ctrl.h>
2293 +
2294 + /* Number of siblings per CPU package */
2295 + int smp_num_siblings = 1;
2296 +@@ -217,6 +218,8 @@ static void notrace start_secondary(void *unused)
2297 + */
2298 + check_tsc_sync_target();
2299 +
2300 ++ speculative_store_bypass_ht_init();
2301 ++
2302 + /*
2303 + * Lock vector_lock and initialize the vectors on this cpu
2304 + * before setting the cpu online. We must set it online with
2305 +@@ -1209,6 +1212,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
2306 + set_mtrr_aps_delayed_init();
2307 +
2308 + smp_quirk_init_udelay();
2309 ++
2310 ++ speculative_store_bypass_ht_init();
2311 + }
2312 +
2313 + void arch_enable_nonboot_cpus_begin(void)
2314 +diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
2315 +index 42654375b73f..df7827a981dd 100644
2316 +--- a/arch/x86/kvm/svm.c
2317 ++++ b/arch/x86/kvm/svm.c
2318 +@@ -37,7 +37,7 @@
2319 + #include <asm/desc.h>
2320 + #include <asm/debugreg.h>
2321 + #include <asm/kvm_para.h>
2322 +-#include <asm/nospec-branch.h>
2323 ++#include <asm/spec-ctrl.h>
2324 +
2325 + #include <asm/virtext.h>
2326 + #include "trace.h"
2327 +diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
2328 +index 63c44a9bf6bb..18143886b186 100644
2329 +--- a/arch/x86/kvm/vmx.c
2330 ++++ b/arch/x86/kvm/vmx.c
2331 +@@ -48,7 +48,7 @@
2332 + #include <asm/kexec.h>
2333 + #include <asm/apic.h>
2334 + #include <asm/irq_remapping.h>
2335 +-#include <asm/nospec-branch.h>
2336 ++#include <asm/spec-ctrl.h>
2337 +
2338 + #include "trace.h"
2339 + #include "pmu.h"
2340 +diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
2341 +index 7cad01af6dcd..6d683bbb3502 100644
2342 +--- a/arch/x86/mm/tlb.c
2343 ++++ b/arch/x86/mm/tlb.c
2344 +@@ -10,6 +10,7 @@
2345 +
2346 + #include <asm/tlbflush.h>
2347 + #include <asm/mmu_context.h>
2348 ++#include <asm/nospec-branch.h>
2349 + #include <asm/cache.h>
2350 + #include <asm/apic.h>
2351 + #include <asm/uv/uv.h>
2352 +@@ -29,6 +30,8 @@
2353 + * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi
2354 + */
2355 +
2356 ++atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
2357 ++
2358 + struct flush_tlb_info {
2359 + struct mm_struct *flush_mm;
2360 + unsigned long flush_start;
2361 +@@ -104,6 +107,36 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
2362 + unsigned cpu = smp_processor_id();
2363 +
2364 + if (likely(prev != next)) {
2365 ++ u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
2366 ++
2367 ++ /*
2368 ++ * Avoid user/user BTB poisoning by flushing the branch
2369 ++ * predictor when switching between processes. This stops
2370 ++ * one process from doing Spectre-v2 attacks on another.
2371 ++ *
2372 ++ * As an optimization, flush indirect branches only when
2373 ++ * switching into processes that disable dumping. This
2374 ++ * protects high value processes like gpg, without having
2375 ++ * too high performance overhead. IBPB is *expensive*!
2376 ++ *
2377 ++ * This will not flush branches when switching into kernel
2378 ++ * threads. It will also not flush if we switch to idle
2379 ++ * thread and back to the same process. It will flush if we
2380 ++ * switch to a different non-dumpable process.
2381 ++ */
2382 ++ if (tsk && tsk->mm &&
2383 ++ tsk->mm->context.ctx_id != last_ctx_id &&
2384 ++ get_dumpable(tsk->mm) != SUID_DUMP_USER)
2385 ++ indirect_branch_prediction_barrier();
2386 ++
2387 ++ /*
2388 ++ * Record last user mm's context id, so we can avoid
2389 ++ * flushing branch buffer with IBPB if we switch back
2390 ++ * to the same user.
2391 ++ */
2392 ++ if (next != &init_mm)
2393 ++ this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
2394 ++
2395 + this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
2396 + this_cpu_write(cpu_tlbstate.active_mm, next);
2397 + cpumask_set_cpu(cpu, mm_cpumask(next));
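
As the comment explains, the barrier is only issued when switching into tasks that have disabled dumping. A process that handles secrets can therefore opt in with the standard dumpable prctl; this is ordinary Linux API, nothing specific to this backport:

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
        /*
         * With the dumpable flag cleared, get_dumpable() no longer returns
         * SUID_DUMP_USER, so the switch_mm code above issues an IBPB when
         * switching into this process. Side effects: no core dumps, and
         * ptrace from unprivileged peers is refused.
         */
        if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0)) {
                perror("PR_SET_DUMPABLE");
                return 1;
        }

        /* ... work with key material here ... */
        return 0;
}
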
2398 +diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
2399 +index a0ac0f9c307f..f5a8cd96bae4 100644
2400 +--- a/arch/x86/platform/efi/efi_64.c
2401 ++++ b/arch/x86/platform/efi/efi_64.c
2402 +@@ -40,6 +40,7 @@
2403 + #include <asm/fixmap.h>
2404 + #include <asm/realmode.h>
2405 + #include <asm/time.h>
2406 ++#include <asm/nospec-branch.h>
2407 +
2408 + /*
2409 + * We allocate runtime services regions bottom-up, starting from -4G, i.e.
2410 +@@ -347,6 +348,7 @@ extern efi_status_t efi64_thunk(u32, ...);
2411 + \
2412 + efi_sync_low_kernel_mappings(); \
2413 + local_irq_save(flags); \
2414 ++ firmware_restrict_branch_speculation_start(); \
2415 + \
2416 + efi_scratch.prev_cr3 = read_cr3(); \
2417 + write_cr3((unsigned long)efi_scratch.efi_pgt); \
2418 +@@ -357,6 +359,7 @@ extern efi_status_t efi64_thunk(u32, ...);
2419 + \
2420 + write_cr3(efi_scratch.prev_cr3); \
2421 + __flush_tlb_all(); \
2422 ++ firmware_restrict_branch_speculation_end(); \
2423 + local_irq_restore(flags); \
2424 + \
2425 + __s; \
2426 +diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
2427 +index cbef64b508e1..82fd84d5e1aa 100644
2428 +--- a/arch/x86/xen/enlighten.c
2429 ++++ b/arch/x86/xen/enlighten.c
2430 +@@ -460,6 +460,12 @@ static void __init xen_init_cpuid_mask(void)
2431 + cpuid_leaf1_ecx_set_mask = (1 << (X86_FEATURE_MWAIT % 32));
2432 + }
2433 +
2434 ++static void __init xen_init_capabilities(void)
2435 ++{
2436 ++ if (xen_pv_domain())
2437 ++ setup_force_cpu_cap(X86_FEATURE_XENPV);
2438 ++}
2439 ++
2440 + static void xen_set_debugreg(int reg, unsigned long val)
2441 + {
2442 + HYPERVISOR_set_debugreg(reg, val);
2443 +@@ -1587,6 +1593,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
2444 +
2445 + xen_init_irq_ops();
2446 + xen_init_cpuid_mask();
2447 ++ xen_init_capabilities();
2448 +
2449 + #ifdef CONFIG_X86_LOCAL_APIC
2450 + /*
2451 +@@ -1883,14 +1890,6 @@ bool xen_hvm_need_lapic(void)
2452 + }
2453 + EXPORT_SYMBOL_GPL(xen_hvm_need_lapic);
2454 +
2455 +-static void xen_set_cpu_features(struct cpuinfo_x86 *c)
2456 +-{
2457 +- if (xen_pv_domain()) {
2458 +- clear_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
2459 +- set_cpu_cap(c, X86_FEATURE_XENPV);
2460 +- }
2461 +-}
2462 +-
2463 + const struct hypervisor_x86 x86_hyper_xen = {
2464 + .name = "Xen",
2465 + .detect = xen_platform,
2466 +@@ -1898,7 +1897,6 @@ const struct hypervisor_x86 x86_hyper_xen = {
2467 + .init_platform = xen_hvm_guest_init,
2468 + #endif
2469 + .x2apic_available = xen_x2apic_para_available,
2470 +- .set_cpu_features = xen_set_cpu_features,
2471 + };
2472 + EXPORT_SYMBOL(x86_hyper_xen);
2473 +
2474 +diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
2475 +index 3f4ebf0261f2..29e50d1229bc 100644
2476 +--- a/arch/x86/xen/smp.c
2477 ++++ b/arch/x86/xen/smp.c
2478 +@@ -28,6 +28,7 @@
2479 + #include <xen/interface/vcpu.h>
2480 + #include <xen/interface/xenpmu.h>
2481 +
2482 ++#include <asm/spec-ctrl.h>
2483 + #include <asm/xen/interface.h>
2484 + #include <asm/xen/hypercall.h>
2485 +
2486 +@@ -87,6 +88,8 @@ static void cpu_bringup(void)
2487 + cpu_data(cpu).x86_max_cores = 1;
2488 + set_cpu_sibling_map(cpu);
2489 +
2490 ++ speculative_store_bypass_ht_init();
2491 ++
2492 + xen_setup_cpu_clockevents();
2493 +
2494 + notify_cpu_starting(cpu);
2495 +@@ -357,6 +360,8 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
2496 + }
2497 + set_cpu_sibling_map(0);
2498 +
2499 ++ speculative_store_bypass_ht_init();
2500 ++
2501 + xen_pmu_init(0);
2502 +
2503 + if (xen_smp_intr_init(0))
2504 +diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
2505 +index 7f664c416faf..4ecd0de08557 100644
2506 +--- a/arch/x86/xen/suspend.c
2507 ++++ b/arch/x86/xen/suspend.c
2508 +@@ -1,11 +1,14 @@
2509 + #include <linux/types.h>
2510 + #include <linux/tick.h>
2511 ++#include <linux/percpu-defs.h>
2512 +
2513 + #include <xen/xen.h>
2514 + #include <xen/interface/xen.h>
2515 + #include <xen/grant_table.h>
2516 + #include <xen/events.h>
2517 +
2518 ++#include <asm/cpufeatures.h>
2519 ++#include <asm/msr-index.h>
2520 + #include <asm/xen/hypercall.h>
2521 + #include <asm/xen/page.h>
2522 + #include <asm/fixmap.h>
2523 +@@ -68,6 +71,8 @@ static void xen_pv_post_suspend(int suspend_cancelled)
2524 + xen_mm_unpin_all();
2525 + }
2526 +
2527 ++static DEFINE_PER_CPU(u64, spec_ctrl);
2528 ++
2529 + void xen_arch_pre_suspend(void)
2530 + {
2531 + if (xen_pv_domain())
2532 +@@ -84,6 +89,9 @@ void xen_arch_post_suspend(int cancelled)
2533 +
2534 + static void xen_vcpu_notify_restore(void *data)
2535 + {
2536 ++ if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL))
2537 ++ wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(spec_ctrl));
2538 ++
2539 + /* Boot processor notified via generic timekeeping_resume() */
2540 + if (smp_processor_id() == 0)
2541 + return;
2542 +@@ -93,7 +101,15 @@ static void xen_vcpu_notify_restore(void *data)
2543 +
2544 + static void xen_vcpu_notify_suspend(void *data)
2545 + {
2546 ++ u64 tmp;
2547 ++
2548 + tick_suspend_local();
2549 ++
2550 ++ if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
2551 ++ rdmsrl(MSR_IA32_SPEC_CTRL, tmp);
2552 ++ this_cpu_write(spec_ctrl, tmp);
2553 ++ wrmsrl(MSR_IA32_SPEC_CTRL, 0);
2554 ++ }
2555 + }
2556 +
2557 + void xen_arch_resume(void)
2558 +diff --git a/block/blk-core.c b/block/blk-core.c
2559 +index f5f1a55703ae..50d77c90070d 100644
2560 +--- a/block/blk-core.c
2561 ++++ b/block/blk-core.c
2562 +@@ -651,21 +651,17 @@ EXPORT_SYMBOL(blk_alloc_queue);
2563 + int blk_queue_enter(struct request_queue *q, gfp_t gfp)
2564 + {
2565 + while (true) {
2566 +- int ret;
2567 +-
2568 + if (percpu_ref_tryget_live(&q->q_usage_counter))
2569 + return 0;
2570 +
2571 + if (!gfpflags_allow_blocking(gfp))
2572 + return -EBUSY;
2573 +
2574 +- ret = wait_event_interruptible(q->mq_freeze_wq,
2575 +- !atomic_read(&q->mq_freeze_depth) ||
2576 +- blk_queue_dying(q));
2577 ++ wait_event(q->mq_freeze_wq,
2578 ++ !atomic_read(&q->mq_freeze_depth) ||
2579 ++ blk_queue_dying(q));
2580 + if (blk_queue_dying(q))
2581 + return -ENODEV;
2582 +- if (ret)
2583 +- return ret;
2584 + }
2585 + }
2586 +
2587 +diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
2588 +index 3db71afbba93..143edea1076f 100644
2589 +--- a/drivers/base/cpu.c
2590 ++++ b/drivers/base/cpu.c
2591 +@@ -518,14 +518,22 @@ ssize_t __weak cpu_show_spectre_v2(struct device *dev,
2592 + return sprintf(buf, "Not affected\n");
2593 + }
2594 +
2595 ++ssize_t __weak cpu_show_spec_store_bypass(struct device *dev,
2596 ++ struct device_attribute *attr, char *buf)
2597 ++{
2598 ++ return sprintf(buf, "Not affected\n");
2599 ++}
2600 ++
2601 + static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
2602 + static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
2603 + static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
2604 ++static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
2605 +
2606 + static struct attribute *cpu_root_vulnerabilities_attrs[] = {
2607 + &dev_attr_meltdown.attr,
2608 + &dev_attr_spectre_v1.attr,
2609 + &dev_attr_spectre_v2.attr,
2610 ++ &dev_attr_spec_store_bypass.attr,
2611 + NULL
2612 + };
2613 +
2614 +diff --git a/drivers/clk/tegra/clk-tegra30.c b/drivers/clk/tegra/clk-tegra30.c
2615 +index 8c41c6fcb9ee..acf83569f86f 100644
2616 +--- a/drivers/clk/tegra/clk-tegra30.c
2617 ++++ b/drivers/clk/tegra/clk-tegra30.c
2618 +@@ -333,11 +333,11 @@ static struct pdiv_map pllu_p[] = {
2619 + };
2620 +
2621 + static struct tegra_clk_pll_freq_table pll_u_freq_table[] = {
2622 +- { 12000000, 480000000, 960, 12, 0, 12},
2623 +- { 13000000, 480000000, 960, 13, 0, 12},
2624 +- { 16800000, 480000000, 400, 7, 0, 5},
2625 +- { 19200000, 480000000, 200, 4, 0, 3},
2626 +- { 26000000, 480000000, 960, 26, 0, 12},
2627 ++ { 12000000, 480000000, 960, 12, 2, 12 },
2628 ++ { 13000000, 480000000, 960, 13, 2, 12 },
2629 ++ { 16800000, 480000000, 400, 7, 2, 5 },
2630 ++ { 19200000, 480000000, 200, 4, 2, 3 },
2631 ++ { 26000000, 480000000, 960, 26, 2, 12 },
2632 + { 0, 0, 0, 0, 0, 0 },
2633 + };
2634 +
2635 +@@ -1372,6 +1372,7 @@ static struct tegra_clk_init_table init_table[] __initdata = {
2636 + {TEGRA30_CLK_GR2D, TEGRA30_CLK_PLL_C, 300000000, 0},
2637 + {TEGRA30_CLK_GR3D, TEGRA30_CLK_PLL_C, 300000000, 0},
2638 + {TEGRA30_CLK_GR3D2, TEGRA30_CLK_PLL_C, 300000000, 0},
2639 ++ { TEGRA30_CLK_PLL_U, TEGRA30_CLK_CLK_MAX, 480000000, 0 },
2640 + {TEGRA30_CLK_CLK_MAX, TEGRA30_CLK_CLK_MAX, 0, 0}, /* This MUST be the last entry. */
2641 + };
2642 +
2643 +diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c
2644 +index c1aaf0336cf2..5cde3ad1665e 100644
2645 +--- a/drivers/mtd/ubi/attach.c
2646 ++++ b/drivers/mtd/ubi/attach.c
2647 +@@ -174,6 +174,40 @@ static int add_corrupted(struct ubi_attach_info *ai, int pnum, int ec)
2648 + return 0;
2649 + }
2650 +
2651 ++/**
2652 ++ * add_fastmap - add a Fastmap related physical eraseblock.
2653 ++ * @ai: attaching information
2654 ++ * @pnum: physical eraseblock number the VID header came from
2655 ++ * @vid_hdr: the volume identifier header
2656 ++ * @ec: erase counter of the physical eraseblock
2657 ++ *
2658 ++ * This function allocates a 'struct ubi_ainf_peb' object for a Fastmap
2659 ++ * physical eraseblock @pnum and adds it to the 'fastmap' list.
2660 ++ * Such blocks can be Fastmap super and data blocks, either from the most
2661 ++ * recent Fastmap we're attaching from or from old Fastmaps which will
2662 ++ * be erased.
2663 ++ */
2664 ++static int add_fastmap(struct ubi_attach_info *ai, int pnum,
2665 ++ struct ubi_vid_hdr *vid_hdr, int ec)
2666 ++{
2667 ++ struct ubi_ainf_peb *aeb;
2668 ++
2669 ++ aeb = kmem_cache_alloc(ai->aeb_slab_cache, GFP_KERNEL);
2670 ++ if (!aeb)
2671 ++ return -ENOMEM;
2672 ++
2673 ++ aeb->pnum = pnum;
2674 ++ aeb->vol_id = be32_to_cpu(vidh->vol_id);
2675 ++ aeb->sqnum = be64_to_cpu(vidh->sqnum);
2676 ++ aeb->ec = ec;
2677 ++ list_add(&aeb->u.list, &ai->fastmap);
2678 ++
2679 ++ dbg_bld("add to fastmap list: PEB %d, vol_id %d, sqnum: %llu", pnum,
2680 ++ aeb->vol_id, aeb->sqnum);
2681 ++
2682 ++ return 0;
2683 ++}
2684 ++
2685 + /**
2686 + * validate_vid_hdr - check volume identifier header.
2687 + * @ubi: UBI device description object
2688 +@@ -803,13 +837,26 @@ out_unlock:
2689 + return err;
2690 + }
2691 +
2692 ++static bool vol_ignored(int vol_id)
2693 ++{
2694 ++ switch (vol_id) {
2695 ++ case UBI_LAYOUT_VOLUME_ID:
2696 ++ return true;
2697 ++ }
2698 ++
2699 ++#ifdef CONFIG_MTD_UBI_FASTMAP
2700 ++ return ubi_is_fm_vol(vol_id);
2701 ++#else
2702 ++ return false;
2703 ++#endif
2704 ++}
2705 ++
2706 + /**
2707 + * scan_peb - scan and process UBI headers of a PEB.
2708 + * @ubi: UBI device description object
2709 + * @ai: attaching information
2710 + * @pnum: the physical eraseblock number
2711 +- * @vid: The volume ID of the found volume will be stored in this pointer
2712 +- * @sqnum: The sqnum of the found volume will be stored in this pointer
2713 ++ * @fast: true if we're scanning for a Fastmap
2714 + *
2715 + * This function reads UBI headers of PEB @pnum, checks them, and adds
2716 + * information about this PEB to the corresponding list or RB-tree in the
2717 +@@ -817,9 +864,9 @@ out_unlock:
2718 + * successfully handled and a negative error code in case of failure.
2719 + */
2720 + static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai,
2721 +- int pnum, int *vid, unsigned long long *sqnum)
2722 ++ int pnum, bool fast)
2723 + {
2724 +- long long uninitialized_var(ec);
2725 ++ long long ec;
2726 + int err, bitflips = 0, vol_id = -1, ec_err = 0;
2727 +
2728 + dbg_bld("scan PEB %d", pnum);
2729 +@@ -935,6 +982,20 @@ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai,
2730 + */
2731 + ai->maybe_bad_peb_count += 1;
2732 + case UBI_IO_BAD_HDR:
2733 ++ /*
2734 ++ * If we're facing a bad VID header we have to drop *all*
2735 ++ * Fastmap data structures we find. The most recent Fastmap
2736 ++ * could be bad and therefore there is a chance that we attach
2737 ++ * from an old one. On a sane MTD stack a PEB should not go bad
2738 ++ * all of a sudden, but the reality is different.
2739 ++ * So, let's be paranoid and help find the root cause by
2740 ++ * falling back to scanning mode instead of attaching with a
2741 ++ * bad EBA table and causing data corruption which is hard to
2742 ++ * analyze.
2743 ++ */
2744 ++ if (fast)
2745 ++ ai->force_full_scan = 1;
2746 ++
2747 + if (ec_err)
2748 + /*
2749 + * Both headers are corrupted. There is a possibility
2750 +@@ -991,21 +1052,15 @@ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai,
2751 + }
2752 +
2753 + vol_id = be32_to_cpu(vidh->vol_id);
2754 +- if (vid)
2755 +- *vid = vol_id;
2756 +- if (sqnum)
2757 +- *sqnum = be64_to_cpu(vidh->sqnum);
2758 +- if (vol_id > UBI_MAX_VOLUMES && vol_id != UBI_LAYOUT_VOLUME_ID) {
2759 ++ if (vol_id > UBI_MAX_VOLUMES && !vol_ignored(vol_id)) {
2760 + int lnum = be32_to_cpu(vidh->lnum);
2761 +
2762 + /* Unsupported internal volume */
2763 + switch (vidh->compat) {
2764 + case UBI_COMPAT_DELETE:
2765 +- if (vol_id != UBI_FM_SB_VOLUME_ID
2766 +- && vol_id != UBI_FM_DATA_VOLUME_ID) {
2767 +- ubi_msg(ubi, "\"delete\" compatible internal volume %d:%d found, will remove it",
2768 +- vol_id, lnum);
2769 +- }
2770 ++ ubi_msg(ubi, "\"delete\" compatible internal volume %d:%d found, will remove it",
2771 ++ vol_id, lnum);
2772 ++
2773 + err = add_to_list(ai, pnum, vol_id, lnum,
2774 + ec, 1, &ai->erase);
2775 + if (err)
2776 +@@ -1037,7 +1092,12 @@ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai,
2777 + if (ec_err)
2778 + ubi_warn(ubi, "valid VID header but corrupted EC header at PEB %d",
2779 + pnum);
2780 +- err = ubi_add_to_av(ubi, ai, pnum, ec, vidh, bitflips);
2781 ++
2782 ++ if (ubi_is_fm_vol(vol_id))
2783 ++ err = add_fastmap(ai, pnum, vidh, ec);
2784 ++ else
2785 ++ err = ubi_add_to_av(ubi, ai, pnum, ec, vidh, bitflips);
2786 ++
2787 + if (err)
2788 + return err;
2789 +
2790 +@@ -1186,6 +1246,10 @@ static void destroy_ai(struct ubi_attach_info *ai)
2791 + list_del(&aeb->u.list);
2792 + kmem_cache_free(ai->aeb_slab_cache, aeb);
2793 + }
2794 ++ list_for_each_entry_safe(aeb, aeb_tmp, &ai->fastmap, u.list) {
2795 ++ list_del(&aeb->u.list);
2796 ++ kmem_cache_free(ai->aeb_slab_cache, aeb);
2797 ++ }
2798 +
2799 + /* Destroy the volume RB-tree */
2800 + rb = ai->volumes.rb_node;
2801 +@@ -1245,7 +1309,7 @@ static int scan_all(struct ubi_device *ubi, struct ubi_attach_info *ai,
2802 + cond_resched();
2803 +
2804 + dbg_gen("process PEB %d", pnum);
2805 +- err = scan_peb(ubi, ai, pnum, NULL, NULL);
2806 ++ err = scan_peb(ubi, ai, pnum, false);
2807 + if (err < 0)
2808 + goto out_vidh;
2809 + }
2810 +@@ -1311,6 +1375,7 @@ static struct ubi_attach_info *alloc_ai(void)
2811 + INIT_LIST_HEAD(&ai->free);
2812 + INIT_LIST_HEAD(&ai->erase);
2813 + INIT_LIST_HEAD(&ai->alien);
2814 ++ INIT_LIST_HEAD(&ai->fastmap);
2815 + ai->volumes = RB_ROOT;
2816 + ai->aeb_slab_cache = kmem_cache_create("ubi_aeb_slab_cache",
2817 + sizeof(struct ubi_ainf_peb),
2818 +@@ -1337,52 +1402,58 @@ static struct ubi_attach_info *alloc_ai(void)
2819 + */
2820 + static int scan_fast(struct ubi_device *ubi, struct ubi_attach_info **ai)
2821 + {
2822 +- int err, pnum, fm_anchor = -1;
2823 +- unsigned long long max_sqnum = 0;
2824 ++ int err, pnum;
2825 ++ struct ubi_attach_info *scan_ai;
2826 +
2827 + err = -ENOMEM;
2828 +
2829 ++ scan_ai = alloc_ai();
2830 ++ if (!scan_ai)
2831 ++ goto out;
2832 ++
2833 + ech = kzalloc(ubi->ec_hdr_alsize, GFP_KERNEL);
2834 + if (!ech)
2835 +- goto out;
2836 ++ goto out_ai;
2837 +
2838 + vidh = ubi_zalloc_vid_hdr(ubi, GFP_KERNEL);
2839 + if (!vidh)
2840 + goto out_ech;
2841 +
2842 + for (pnum = 0; pnum < UBI_FM_MAX_START; pnum++) {
2843 +- int vol_id = -1;
2844 +- unsigned long long sqnum = -1;
2845 + cond_resched();
2846 +
2847 + dbg_gen("process PEB %d", pnum);
2848 +- err = scan_peb(ubi, *ai, pnum, &vol_id, &sqnum);
2849 ++ err = scan_peb(ubi, scan_ai, pnum, true);
2850 + if (err < 0)
2851 + goto out_vidh;
2852 +-
2853 +- if (vol_id == UBI_FM_SB_VOLUME_ID && sqnum > max_sqnum) {
2854 +- max_sqnum = sqnum;
2855 +- fm_anchor = pnum;
2856 +- }
2857 + }
2858 +
2859 + ubi_free_vid_hdr(ubi, vidh);
2860 + kfree(ech);
2861 +
2862 +- if (fm_anchor < 0)
2863 +- return UBI_NO_FASTMAP;
2864 ++ if (scan_ai->force_full_scan)
2865 ++ err = UBI_NO_FASTMAP;
2866 ++ else
2867 ++ err = ubi_scan_fastmap(ubi, *ai, scan_ai);
2868 +
2869 +- destroy_ai(*ai);
2870 +- *ai = alloc_ai();
2871 +- if (!*ai)
2872 +- return -ENOMEM;
2873 ++ if (err) {
2874 ++ /*
2875 ++ * Didn't attach via fastmap, do a full scan but reuse what
2876 ++ * we've already scanned.
2877 ++ */
2878 ++ destroy_ai(*ai);
2879 ++ *ai = scan_ai;
2880 ++ } else
2881 ++ destroy_ai(scan_ai);
2882 +
2883 +- return ubi_scan_fastmap(ubi, *ai, fm_anchor);
2884 ++ return err;
2885 +
2886 + out_vidh:
2887 + ubi_free_vid_hdr(ubi, vidh);
2888 + out_ech:
2889 + kfree(ech);
2890 ++out_ai:
2891 ++ destroy_ai(scan_ai);
2892 + out:
2893 + return err;
2894 + }
2895 +diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
2896 +index c4a25c858c07..03cf0553ec1b 100644
2897 +--- a/drivers/mtd/ubi/eba.c
2898 ++++ b/drivers/mtd/ubi/eba.c
2899 +@@ -1178,6 +1178,8 @@ int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to,
2900 + struct ubi_volume *vol;
2901 + uint32_t crc;
2902 +
2903 ++ ubi_assert(rwsem_is_locked(&ubi->fm_eba_sem));
2904 ++
2905 + vol_id = be32_to_cpu(vid_hdr->vol_id);
2906 + lnum = be32_to_cpu(vid_hdr->lnum);
2907 +
2908 +@@ -1346,9 +1348,7 @@ int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to,
2909 + }
2910 +
2911 + ubi_assert(vol->eba_tbl[lnum] == from);
2912 +- down_read(&ubi->fm_eba_sem);
2913 + vol->eba_tbl[lnum] = to;
2914 +- up_read(&ubi->fm_eba_sem);
2915 +
2916 + out_unlock_buf:
2917 + mutex_unlock(&ubi->buf_mutex);
2918 +diff --git a/drivers/mtd/ubi/fastmap-wl.c b/drivers/mtd/ubi/fastmap-wl.c
2919 +index ed62f1efe6eb..69dd21679a30 100644
2920 +--- a/drivers/mtd/ubi/fastmap-wl.c
2921 ++++ b/drivers/mtd/ubi/fastmap-wl.c
2922 +@@ -262,6 +262,8 @@ static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
2923 + struct ubi_fm_pool *pool = &ubi->fm_wl_pool;
2924 + int pnum;
2925 +
2926 ++ ubi_assert(rwsem_is_locked(&ubi->fm_eba_sem));
2927 ++
2928 + if (pool->used == pool->size) {
2929 + /* We cannot update the fastmap here because this
2930 + * function is called in atomic context.
2931 +@@ -303,7 +305,7 @@ int ubi_ensure_anchor_pebs(struct ubi_device *ubi)
2932 +
2933 + wrk->anchor = 1;
2934 + wrk->func = &wear_leveling_worker;
2935 +- schedule_ubi_work(ubi, wrk);
2936 ++ __schedule_ubi_work(ubi, wrk);
2937 + return 0;
2938 + }
2939 +
2940 +@@ -344,7 +346,7 @@ int ubi_wl_put_fm_peb(struct ubi_device *ubi, struct ubi_wl_entry *fm_e,
2941 + spin_unlock(&ubi->wl_lock);
2942 +
2943 + vol_id = lnum ? UBI_FM_DATA_VOLUME_ID : UBI_FM_SB_VOLUME_ID;
2944 +- return schedule_erase(ubi, e, vol_id, lnum, torture);
2945 ++ return schedule_erase(ubi, e, vol_id, lnum, torture, true);
2946 + }
2947 +
2948 + /**
2949 +diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
2950 +index bba7dd1b5ebf..72e89b352034 100644
2951 +--- a/drivers/mtd/ubi/fastmap.c
2952 ++++ b/drivers/mtd/ubi/fastmap.c
2953 +@@ -326,6 +326,7 @@ static int update_vol(struct ubi_device *ubi, struct ubi_attach_info *ai,
2954 + aeb->pnum = new_aeb->pnum;
2955 + aeb->copy_flag = new_vh->copy_flag;
2956 + aeb->scrub = new_aeb->scrub;
2957 ++ aeb->sqnum = new_aeb->sqnum;
2958 + kmem_cache_free(ai->aeb_slab_cache, new_aeb);
2959 +
2960 + /* new_aeb is older */
2961 +@@ -850,28 +851,58 @@ fail:
2962 + return ret;
2963 + }
2964 +
2965 ++/**
2966 ++ * find_fm_anchor - find the most recent Fastmap superblock (anchor)
2967 ++ * @ai: UBI attach info to be filled
2968 ++ */
2969 ++static int find_fm_anchor(struct ubi_attach_info *ai)
2970 ++{
2971 ++ int ret = -1;
2972 ++ struct ubi_ainf_peb *aeb;
2973 ++ unsigned long long max_sqnum = 0;
2974 ++
2975 ++ list_for_each_entry(aeb, &ai->fastmap, u.list) {
2976 ++ if (aeb->vol_id == UBI_FM_SB_VOLUME_ID && aeb->sqnum > max_sqnum) {
2977 ++ max_sqnum = aeb->sqnum;
2978 ++ ret = aeb->pnum;
2979 ++ }
2980 ++ }
2981 ++
2982 ++ return ret;
2983 ++}
2984 ++
2985 + /**
2986 + * ubi_scan_fastmap - scan the fastmap.
2987 + * @ubi: UBI device object
2988 + * @ai: UBI attach info to be filled
2989 +- * @fm_anchor: The fastmap starts at this PEB
2990 ++ * @scan_ai: UBI attach info from the first 64 PEBs,
2991 ++ * used to find the most recent Fastmap data structure
2992 + *
2993 + * Returns 0 on success, UBI_NO_FASTMAP if no fastmap was found,
2994 + * UBI_BAD_FASTMAP if one was found but is not usable.
2995 + * < 0 indicates an internal error.
2996 + */
2997 + int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
2998 +- int fm_anchor)
2999 ++ struct ubi_attach_info *scan_ai)
3000 + {
3001 + struct ubi_fm_sb *fmsb, *fmsb2;
3002 + struct ubi_vid_hdr *vh;
3003 + struct ubi_ec_hdr *ech;
3004 + struct ubi_fastmap_layout *fm;
3005 +- int i, used_blocks, pnum, ret = 0;
3006 ++ struct ubi_ainf_peb *tmp_aeb, *aeb;
3007 ++ int i, used_blocks, pnum, fm_anchor, ret = 0;
3008 + size_t fm_size;
3009 + __be32 crc, tmp_crc;
3010 + unsigned long long sqnum = 0;
3011 +
3012 ++ fm_anchor = find_fm_anchor(scan_ai);
3013 ++ if (fm_anchor < 0)
3014 ++ return UBI_NO_FASTMAP;
3015 ++
3016 ++ /* Move all (possible) fastmap blocks into our new attach structure. */
3017 ++ list_for_each_entry_safe(aeb, tmp_aeb, &scan_ai->fastmap, u.list)
3018 ++ list_move_tail(&aeb->u.list, &ai->fastmap);
3019 ++
3020 + down_write(&ubi->fm_protect);
3021 + memset(ubi->fm_buf, 0, ubi->fm_size);
3022 +
3023 +@@ -1484,22 +1515,30 @@ int ubi_update_fastmap(struct ubi_device *ubi)
3024 + struct ubi_wl_entry *tmp_e;
3025 +
3026 + down_write(&ubi->fm_protect);
3027 ++ down_write(&ubi->work_sem);
3028 ++ down_write(&ubi->fm_eba_sem);
3029 +
3030 + ubi_refill_pools(ubi);
3031 +
3032 + if (ubi->ro_mode || ubi->fm_disabled) {
3033 ++ up_write(&ubi->fm_eba_sem);
3034 ++ up_write(&ubi->work_sem);
3035 + up_write(&ubi->fm_protect);
3036 + return 0;
3037 + }
3038 +
3039 + ret = ubi_ensure_anchor_pebs(ubi);
3040 + if (ret) {
3041 ++ up_write(&ubi->fm_eba_sem);
3042 ++ up_write(&ubi->work_sem);
3043 + up_write(&ubi->fm_protect);
3044 + return ret;
3045 + }
3046 +
3047 + new_fm = kzalloc(sizeof(*new_fm), GFP_KERNEL);
3048 + if (!new_fm) {
3049 ++ up_write(&ubi->fm_eba_sem);
3050 ++ up_write(&ubi->work_sem);
3051 + up_write(&ubi->fm_protect);
3052 + return -ENOMEM;
3053 + }
3054 +@@ -1608,16 +1647,14 @@ int ubi_update_fastmap(struct ubi_device *ubi)
3055 + new_fm->e[0] = tmp_e;
3056 + }
3057 +
3058 +- down_write(&ubi->work_sem);
3059 +- down_write(&ubi->fm_eba_sem);
3060 + ret = ubi_write_fastmap(ubi, new_fm);
3061 +- up_write(&ubi->fm_eba_sem);
3062 +- up_write(&ubi->work_sem);
3063 +
3064 + if (ret)
3065 + goto err;
3066 +
3067 + out_unlock:
3068 ++ up_write(&ubi->fm_eba_sem);
3069 ++ up_write(&ubi->work_sem);
3070 + up_write(&ubi->fm_protect);
3071 + kfree(old_fm);
3072 + return ret;
3073 +diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
3074 +index de1ea2e4c37d..05d9ec66437c 100644
3075 +--- a/drivers/mtd/ubi/ubi.h
3076 ++++ b/drivers/mtd/ubi/ubi.h
3077 +@@ -699,6 +699,8 @@ struct ubi_ainf_volume {
3078 + * @erase: list of physical eraseblocks which have to be erased
3079 + * @alien: list of physical eraseblocks which should not be used by UBI (e.g.,
3080 + * those belonging to "preserve"-compatible internal volumes)
3081 ++ * @fastmap: list of physical eraseblocks which relate to fastmap (e.g.,
3082 ++ * eraseblocks of the current and not yet erased old fastmap blocks)
3083 + * @corr_peb_count: count of PEBs in the @corr list
3084 + * @empty_peb_count: count of PEBs which are presumably empty (contain only
3085 + * 0xFF bytes)
3086 +@@ -709,6 +711,8 @@ struct ubi_ainf_volume {
3087 + * @vols_found: number of volumes found
3088 + * @highest_vol_id: highest volume ID
3089 + * @is_empty: flag indicating whether the MTD device is empty or not
3090 ++ * @force_full_scan: flag indicating whether we need to do a full scan and drop
3091 ++ all existing Fastmap data structures
3092 + * @min_ec: lowest erase counter value
3093 + * @max_ec: highest erase counter value
3094 + * @max_sqnum: highest sequence number value
3095 +@@ -727,6 +731,7 @@ struct ubi_attach_info {
3096 + struct list_head free;
3097 + struct list_head erase;
3098 + struct list_head alien;
3099 ++ struct list_head fastmap;
3100 + int corr_peb_count;
3101 + int empty_peb_count;
3102 + int alien_peb_count;
3103 +@@ -735,6 +740,7 @@ struct ubi_attach_info {
3104 + int vols_found;
3105 + int highest_vol_id;
3106 + int is_empty;
3107 ++ int force_full_scan;
3108 + int min_ec;
3109 + int max_ec;
3110 + unsigned long long max_sqnum;
3111 +@@ -907,7 +913,7 @@ int ubi_compare_lebs(struct ubi_device *ubi, const struct ubi_ainf_peb *aeb,
3112 + size_t ubi_calc_fm_size(struct ubi_device *ubi);
3113 + int ubi_update_fastmap(struct ubi_device *ubi);
3114 + int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
3115 +- int fm_anchor);
3116 ++ struct ubi_attach_info *scan_ai);
3117 + #else
3118 + static inline int ubi_update_fastmap(struct ubi_device *ubi) { return 0; }
3119 + #endif
3120 +@@ -1101,4 +1107,42 @@ static inline int idx2vol_id(const struct ubi_device *ubi, int idx)
3121 + return idx;
3122 + }
3123 +
3124 ++/**
3125 ++ * ubi_is_fm_vol - check whether a volume ID is a Fastmap volume.
3126 ++ * @vol_id: volume ID
3127 ++ */
3128 ++static inline bool ubi_is_fm_vol(int vol_id)
3129 ++{
3130 ++ switch (vol_id) {
3131 ++ case UBI_FM_SB_VOLUME_ID:
3132 ++ case UBI_FM_DATA_VOLUME_ID:
3133 ++ return true;
3134 ++ }
3135 ++
3136 ++ return false;
3137 ++}
3138 ++
3139 ++/**
3140 ++ * ubi_find_fm_block - check whether a PEB is part of the current Fastmap.
3141 ++ * @ubi: UBI device description object
3142 ++ * @pnum: physical eraseblock to look for
3143 ++ *
3144 ++ * This function returns a wear leveling object if @pnum relates to the current
3145 ++ * fastmap, @NULL otherwise.
3146 ++ */
3147 ++static inline struct ubi_wl_entry *ubi_find_fm_block(const struct ubi_device *ubi,
3148 ++ int pnum)
3149 ++{
3150 ++ int i;
3151 ++
3152 ++ if (ubi->fm) {
3153 ++ for (i = 0; i < ubi->fm->used_blocks; i++) {
3154 ++ if (ubi->fm->e[i]->pnum == pnum)
3155 ++ return ubi->fm->e[i];
3156 ++ }
3157 ++ }
3158 ++
3159 ++ return NULL;
3160 ++}
3161 ++
3162 + #endif /* !__UBI_UBI_H__ */
3163 +diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
3164 +index ca9746f41ff1..b3c1b8106a68 100644
3165 +--- a/drivers/mtd/ubi/wl.c
3166 ++++ b/drivers/mtd/ubi/wl.c
3167 +@@ -580,7 +580,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
3168 + * failure.
3169 + */
3170 + static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
3171 +- int vol_id, int lnum, int torture)
3172 ++ int vol_id, int lnum, int torture, bool nested)
3173 + {
3174 + struct ubi_work *wl_wrk;
3175 +
3176 +@@ -599,7 +599,10 @@ static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
3177 + wl_wrk->lnum = lnum;
3178 + wl_wrk->torture = torture;
3179 +
3180 +- schedule_ubi_work(ubi, wl_wrk);
3181 ++ if (nested)
3182 ++ __schedule_ubi_work(ubi, wl_wrk);
3183 ++ else
3184 ++ schedule_ubi_work(ubi, wl_wrk);
3185 + return 0;
3186 + }
3187 +
3188 +@@ -658,6 +661,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
3189 + if (!vid_hdr)
3190 + return -ENOMEM;
3191 +
3192 ++ down_read(&ubi->fm_eba_sem);
3193 + mutex_lock(&ubi->move_mutex);
3194 + spin_lock(&ubi->wl_lock);
3195 + ubi_assert(!ubi->move_from && !ubi->move_to);
3196 +@@ -884,6 +888,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
3197 +
3198 + dbg_wl("done");
3199 + mutex_unlock(&ubi->move_mutex);
3200 ++ up_read(&ubi->fm_eba_sem);
3201 + return 0;
3202 +
3203 + /*
3204 +@@ -925,6 +930,7 @@ out_not_moved:
3205 + }
3206 +
3207 + mutex_unlock(&ubi->move_mutex);
3208 ++ up_read(&ubi->fm_eba_sem);
3209 + return 0;
3210 +
3211 + out_error:
3212 +@@ -946,6 +952,7 @@ out_error:
3213 + out_ro:
3214 + ubi_ro_mode(ubi);
3215 + mutex_unlock(&ubi->move_mutex);
3216 ++ up_read(&ubi->fm_eba_sem);
3217 + ubi_assert(err != 0);
3218 + return err < 0 ? err : -EIO;
3219 +
3220 +@@ -953,6 +960,7 @@ out_cancel:
3221 + ubi->wl_scheduled = 0;
3222 + spin_unlock(&ubi->wl_lock);
3223 + mutex_unlock(&ubi->move_mutex);
3224 ++ up_read(&ubi->fm_eba_sem);
3225 + ubi_free_vid_hdr(ubi, vid_hdr);
3226 + return 0;
3227 + }
3228 +@@ -1075,7 +1083,7 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
3229 + int err1;
3230 +
3231 + /* Re-schedule the LEB for erasure */
3232 +- err1 = schedule_erase(ubi, e, vol_id, lnum, 0);
3233 ++ err1 = schedule_erase(ubi, e, vol_id, lnum, 0, false);
3234 + if (err1) {
3235 + wl_entry_destroy(ubi, e);
3236 + err = err1;
3237 +@@ -1256,7 +1264,7 @@ retry:
3238 + }
3239 + spin_unlock(&ubi->wl_lock);
3240 +
3241 +- err = schedule_erase(ubi, e, vol_id, lnum, torture);
3242 ++ err = schedule_erase(ubi, e, vol_id, lnum, torture, false);
3243 + if (err) {
3244 + spin_lock(&ubi->wl_lock);
3245 + wl_tree_add(e, &ubi->used);
3246 +@@ -1500,6 +1508,46 @@ static void shutdown_work(struct ubi_device *ubi)
3247 + }
3248 + }
3249 +
3250 ++/**
3251 ++ * erase_aeb - erase a PEB given in UBI attach info PEB
3252 ++ * @ubi: UBI device description object
3253 ++ * @aeb: UBI attach info PEB
3254 ++ * @sync: If true, erase synchronously. Otherwise schedule for erasure
3255 ++ */
3256 ++static int erase_aeb(struct ubi_device *ubi, struct ubi_ainf_peb *aeb, bool sync)
3257 ++{
3258 ++ struct ubi_wl_entry *e;
3259 ++ int err;
3260 ++
3261 ++ e = kmem_cache_alloc(ubi_wl_entry_slab, GFP_KERNEL);
3262 ++ if (!e)
3263 ++ return -ENOMEM;
3264 ++
3265 ++ e->pnum = aeb->pnum;
3266 ++ e->ec = aeb->ec;
3267 ++ ubi->lookuptbl[e->pnum] = e;
3268 ++
3269 ++ if (sync) {
3270 ++ err = sync_erase(ubi, e, false);
3271 ++ if (err)
3272 ++ goto out_free;
3273 ++
3274 ++ wl_tree_add(e, &ubi->free);
3275 ++ ubi->free_count++;
3276 ++ } else {
3277 ++ err = schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0, false);
3278 ++ if (err)
3279 ++ goto out_free;
3280 ++ }
3281 ++
3282 ++ return 0;
3283 ++
3284 ++out_free:
3285 ++ wl_entry_destroy(ubi, e);
3286 ++
3287 ++ return err;
3288 ++}
3289 ++
3290 + /**
3291 + * ubi_wl_init - initialize the WL sub-system using attaching information.
3292 + * @ubi: UBI device description object
3293 +@@ -1537,17 +1585,9 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
3294 + list_for_each_entry_safe(aeb, tmp, &ai->erase, u.list) {
3295 + cond_resched();
3296 +
3297 +- e = kmem_cache_alloc(ubi_wl_entry_slab, GFP_KERNEL);
3298 +- if (!e)
3299 +- goto out_free;
3300 +-
3301 +- e->pnum = aeb->pnum;
3302 +- e->ec = aeb->ec;
3303 +- ubi->lookuptbl[e->pnum] = e;
3304 +- if (schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0)) {
3305 +- wl_entry_destroy(ubi, e);
3306 ++ err = erase_aeb(ubi, aeb, false);
3307 ++ if (err)
3308 + goto out_free;
3309 +- }
3310 +
3311 + found_pebs++;
3312 + }
3313 +@@ -1598,19 +1638,49 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
3314 + }
3315 + }
3316 +
3317 +- dbg_wl("found %i PEBs", found_pebs);
3318 ++ list_for_each_entry(aeb, &ai->fastmap, u.list) {
3319 ++ cond_resched();
3320 +
3321 +- if (ubi->fm) {
3322 +- ubi_assert(ubi->good_peb_count ==
3323 +- found_pebs + ubi->fm->used_blocks);
3324 ++ e = ubi_find_fm_block(ubi, aeb->pnum);
3325 +
3326 +- for (i = 0; i < ubi->fm->used_blocks; i++) {
3327 +- e = ubi->fm->e[i];
3328 ++ if (e) {
3329 ++ ubi_assert(!ubi->lookuptbl[e->pnum]);
3330 + ubi->lookuptbl[e->pnum] = e;
3331 ++ } else {
3332 ++ bool sync = false;
3333 ++
3334 ++ /*
3335 ++ * Usually old Fastmap PEBs are scheduled for erasure
3336 ++ * and we don't have to care about them but if we face
3337 ++	 * an attacker is leaked... (original upstream wording) — corrected: a power cut before scheduling them we need to
3338 ++ * take care of them here.
3339 ++ */
3340 ++ if (ubi->lookuptbl[aeb->pnum])
3341 ++ continue;
3342 ++
3343 ++ /*
3344 ++ * The fastmap update code might not find a free PEB for
3345 ++ * writing the fastmap anchor to and then reuses the
3346 ++ * current fastmap anchor PEB. When this PEB gets erased
3347 ++ * and a power cut happens before it is written again we
3348 ++ * must make sure that the fastmap attach code doesn't
3349 ++ * find any outdated fastmap anchors, hence we erase the
3350 ++ * outdated fastmap anchor PEBs synchronously here.
3351 ++ */
3352 ++ if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
3353 ++ sync = true;
3354 ++
3355 ++ err = erase_aeb(ubi, aeb, sync);
3356 ++ if (err)
3357 ++ goto out_free;
3358 + }
3359 ++
3360 ++ found_pebs++;
3361 + }
3362 +- else
3363 +- ubi_assert(ubi->good_peb_count == found_pebs);
3364 ++
3365 ++ dbg_wl("found %i PEBs", found_pebs);
3366 ++
3367 ++ ubi_assert(ubi->good_peb_count == found_pebs);
3368 +
3369 + reserved_pebs = WL_RESERVED_PEBS;
3370 + ubi_fastmap_init(ubi, &reserved_pebs);
3371 +diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
3372 +index 1325825d5225..ce3a56bea6e6 100644
3373 +--- a/drivers/net/ethernet/broadcom/tg3.c
3374 ++++ b/drivers/net/ethernet/broadcom/tg3.c
3375 +@@ -9278,6 +9278,15 @@ static int tg3_chip_reset(struct tg3 *tp)
3376 +
3377 + tg3_restore_clk(tp);
3378 +
3379 ++ /* Increase the core clock speed to fix tx timeout issue for 5762
3380 ++ * with 100Mbps link speed.
3381 ++ */
3382 ++ if (tg3_asic_rev(tp) == ASIC_REV_5762) {
3383 ++ val = tr32(TG3_CPMU_CLCK_ORIDE_ENABLE);
3384 ++ tw32(TG3_CPMU_CLCK_ORIDE_ENABLE, val |
3385 ++ TG3_CPMU_MAC_ORIDE_ENABLE);
3386 ++ }
3387 ++
3388 + /* Reprobe ASF enable state. */
3389 + tg3_flag_clear(tp, ENABLE_ASF);
3390 + tp->phy_flags &= ~(TG3_PHYFLG_1G_ON_VAUX_OK |
3391 +diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
3392 +index 8179727d3423..1f2f25a71d18 100644
3393 +--- a/drivers/net/phy/phy_device.c
3394 ++++ b/drivers/net/phy/phy_device.c
3395 +@@ -1265,11 +1265,8 @@ static int gen10g_resume(struct phy_device *phydev)
3396 +
3397 + static int __set_phy_supported(struct phy_device *phydev, u32 max_speed)
3398 + {
3399 +- /* The default values for phydev->supported are provided by the PHY
3400 +- * driver "features" member, we want to reset to sane defaults first
3401 +- * before supporting higher speeds.
3402 +- */
3403 +- phydev->supported &= PHY_DEFAULT_FEATURES;
3404 ++ phydev->supported &= ~(PHY_1000BT_FEATURES | PHY_100BT_FEATURES |
3405 ++ PHY_10BT_FEATURES);
3406 +
3407 + switch (max_speed) {
3408 + default:
3409 +diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
3410 +index da7bae991552..d877ff124365 100644
3411 +--- a/drivers/ptp/ptp_chardev.c
3412 ++++ b/drivers/ptp/ptp_chardev.c
3413 +@@ -88,6 +88,7 @@ int ptp_set_pinfunc(struct ptp_clock *ptp, unsigned int pin,
3414 + case PTP_PF_PHYSYNC:
3415 + if (chan != 0)
3416 + return -EINVAL;
3417 ++ break;
3418 + default:
3419 + return -EINVAL;
3420 + }
3421 +diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
3422 +index f2e9f59c90d6..2d837b6bd495 100644
3423 +--- a/drivers/usb/host/xhci.c
3424 ++++ b/drivers/usb/host/xhci.c
3425 +@@ -887,6 +887,41 @@ static void xhci_disable_port_wake_on_bits(struct xhci_hcd *xhci)
3426 + spin_unlock_irqrestore(&xhci->lock, flags);
3427 + }
3428 +
3429 ++static bool xhci_pending_portevent(struct xhci_hcd *xhci)
3430 ++{
3431 ++ __le32 __iomem **port_array;
3432 ++ int port_index;
3433 ++ u32 status;
3434 ++ u32 portsc;
3435 ++
3436 ++ status = readl(&xhci->op_regs->status);
3437 ++ if (status & STS_EINT)
3438 ++ return true;
3439 ++ /*
3440 ++ * Checking STS_EINT is not enough as there is a lag between a change
3441 ++ * bit being set and the Port Status Change Event that it generated
3442 ++ * being written to the Event Ring. See note in xhci 1.1 section 4.19.2.
3443 ++ */
3444 ++
3445 ++ port_index = xhci->num_usb2_ports;
3446 ++ port_array = xhci->usb2_ports;
3447 ++ while (port_index--) {
3448 ++ portsc = readl(port_array[port_index]);
3449 ++ if (portsc & PORT_CHANGE_MASK ||
3450 ++ (portsc & PORT_PLS_MASK) == XDEV_RESUME)
3451 ++ return true;
3452 ++ }
3453 ++ port_index = xhci->num_usb3_ports;
3454 ++ port_array = xhci->usb3_ports;
3455 ++ while (port_index--) {
3456 ++ portsc = readl(port_array[port_index]);
3457 ++ if (portsc & PORT_CHANGE_MASK ||
3458 ++ (portsc & PORT_PLS_MASK) == XDEV_RESUME)
3459 ++ return true;
3460 ++ }
3461 ++ return false;
3462 ++}
3463 ++
3464 + /*
3465 + * Stop HC (not bus-specific)
3466 + *
3467 +@@ -983,7 +1018,7 @@ EXPORT_SYMBOL_GPL(xhci_suspend);
3468 + */
3469 + int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
3470 + {
3471 +- u32 command, temp = 0, status;
3472 ++ u32 command, temp = 0;
3473 + struct usb_hcd *hcd = xhci_to_hcd(xhci);
3474 + struct usb_hcd *secondary_hcd;
3475 + int retval = 0;
3476 +@@ -1105,8 +1140,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
3477 + done:
3478 + if (retval == 0) {
3479 + /* Resume root hubs only when have pending events. */
3480 +- status = readl(&xhci->op_regs->status);
3481 +- if (status & STS_EINT) {
3482 ++ if (xhci_pending_portevent(xhci)) {
3483 + usb_hcd_resume_root_hub(xhci->shared_hcd);
3484 + usb_hcd_resume_root_hub(hcd);
3485 + }
3486 +diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
3487 +index 1715705acc59..84d8871755b7 100644
3488 +--- a/drivers/usb/host/xhci.h
3489 ++++ b/drivers/usb/host/xhci.h
3490 +@@ -382,6 +382,10 @@ struct xhci_op_regs {
3491 + #define PORT_PLC (1 << 22)
3492 + /* port configure error change - port failed to configure its link partner */
3493 + #define PORT_CEC (1 << 23)
3494 ++#define PORT_CHANGE_MASK (PORT_CSC | PORT_PEC | PORT_WRC | PORT_OCC | \
3495 ++ PORT_RC | PORT_PLC | PORT_CEC)
3496 ++
3497 ++
3498 + /* Cold Attach Status - xHC can set this bit to report device attached during
3499 + * Sx state. Warm port reset should be perfomed to clear this bit and move port
3500 + * to connected state.
3501 +diff --git a/fs/fat/inode.c b/fs/fat/inode.c
3502 +index cf644d52c0cf..c81cfb79a339 100644
3503 +--- a/fs/fat/inode.c
3504 ++++ b/fs/fat/inode.c
3505 +@@ -613,13 +613,21 @@ static void fat_set_state(struct super_block *sb,
3506 + brelse(bh);
3507 + }
3508 +
3509 ++static void fat_reset_iocharset(struct fat_mount_options *opts)
3510 ++{
3511 ++ if (opts->iocharset != fat_default_iocharset) {
3512 ++ /* Note: opts->iocharset can be NULL here */
3513 ++ kfree(opts->iocharset);
3514 ++ opts->iocharset = fat_default_iocharset;
3515 ++ }
3516 ++}
3517 ++
3518 + static void delayed_free(struct rcu_head *p)
3519 + {
3520 + struct msdos_sb_info *sbi = container_of(p, struct msdos_sb_info, rcu);
3521 + unload_nls(sbi->nls_disk);
3522 + unload_nls(sbi->nls_io);
3523 +- if (sbi->options.iocharset != fat_default_iocharset)
3524 +- kfree(sbi->options.iocharset);
3525 ++ fat_reset_iocharset(&sbi->options);
3526 + kfree(sbi);
3527 + }
3528 +
3529 +@@ -1034,7 +1042,7 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
3530 + opts->fs_fmask = opts->fs_dmask = current_umask();
3531 + opts->allow_utime = -1;
3532 + opts->codepage = fat_default_codepage;
3533 +- opts->iocharset = fat_default_iocharset;
3534 ++ fat_reset_iocharset(opts);
3535 + if (is_vfat) {
3536 + opts->shortname = VFAT_SFN_DISPLAY_WINNT|VFAT_SFN_CREATE_WIN95;
3537 + opts->rodir = 0;
3538 +@@ -1184,8 +1192,7 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
3539 +
3540 + /* vfat specific */
3541 + case Opt_charset:
3542 +- if (opts->iocharset != fat_default_iocharset)
3543 +- kfree(opts->iocharset);
3544 ++ fat_reset_iocharset(opts);
3545 + iocharset = match_strdup(&args[0]);
3546 + if (!iocharset)
3547 + return -ENOMEM;
3548 +@@ -1776,8 +1783,7 @@ out_fail:
3549 + iput(fat_inode);
3550 + unload_nls(sbi->nls_io);
3551 + unload_nls(sbi->nls_disk);
3552 +- if (sbi->options.iocharset != fat_default_iocharset)
3553 +- kfree(sbi->options.iocharset);
3554 ++ fat_reset_iocharset(&sbi->options);
3555 + sb->s_fs_info = NULL;
3556 + kfree(sbi);
3557 + return error;
3558 +diff --git a/fs/proc/array.c b/fs/proc/array.c
3559 +index b6c00ce0e29e..cb71cbae606d 100644
3560 +--- a/fs/proc/array.c
3561 ++++ b/fs/proc/array.c
3562 +@@ -79,6 +79,7 @@
3563 + #include <linux/delayacct.h>
3564 + #include <linux/seq_file.h>
3565 + #include <linux/pid_namespace.h>
3566 ++#include <linux/prctl.h>
3567 + #include <linux/ptrace.h>
3568 + #include <linux/tracehook.h>
3569 + #include <linux/string_helpers.h>
3570 +@@ -332,6 +333,31 @@ static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
3571 + #ifdef CONFIG_SECCOMP
3572 + seq_printf(m, "Seccomp:\t%d\n", p->seccomp.mode);
3573 + #endif
3574 ++ seq_printf(m, "\nSpeculation_Store_Bypass:\t");
3575 ++ switch (arch_prctl_spec_ctrl_get(p, PR_SPEC_STORE_BYPASS)) {
3576 ++ case -EINVAL:
3577 ++ seq_printf(m, "unknown");
3578 ++ break;
3579 ++ case PR_SPEC_NOT_AFFECTED:
3580 ++ seq_printf(m, "not vulnerable");
3581 ++ break;
3582 ++ case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE:
3583 ++ seq_printf(m, "thread force mitigated");
3584 ++ break;
3585 ++ case PR_SPEC_PRCTL | PR_SPEC_DISABLE:
3586 ++ seq_printf(m, "thread mitigated");
3587 ++ break;
3588 ++ case PR_SPEC_PRCTL | PR_SPEC_ENABLE:
3589 ++ seq_printf(m, "thread vulnerable");
3590 ++ break;
3591 ++ case PR_SPEC_DISABLE:
3592 ++ seq_printf(m, "globally mitigated");
3593 ++ break;
3594 ++ default:
3595 ++ seq_printf(m, "vulnerable");
3596 ++ break;
3597 ++ }
3598 ++ seq_putc(m, '\n');
3599 + }
3600 +
3601 + static inline void task_context_switch_counts(struct seq_file *m,
3602 +diff --git a/include/linux/cpu.h b/include/linux/cpu.h
3603 +index 7e04bcd9af8e..2f9d12022100 100644
3604 +--- a/include/linux/cpu.h
3605 ++++ b/include/linux/cpu.h
3606 +@@ -46,6 +46,8 @@ extern ssize_t cpu_show_spectre_v1(struct device *dev,
3607 + struct device_attribute *attr, char *buf);
3608 + extern ssize_t cpu_show_spectre_v2(struct device *dev,
3609 + struct device_attribute *attr, char *buf);
3610 ++extern ssize_t cpu_show_spec_store_bypass(struct device *dev,
3611 ++ struct device_attribute *attr, char *buf);
3612 +
3613 + extern __printf(4, 5)
3614 + struct device *cpu_device_create(struct device *parent, void *drvdata,
3615 +diff --git a/include/linux/nospec.h b/include/linux/nospec.h
3616 +index e791ebc65c9c..0c5ef54fd416 100644
3617 +--- a/include/linux/nospec.h
3618 ++++ b/include/linux/nospec.h
3619 +@@ -7,6 +7,8 @@
3620 + #define _LINUX_NOSPEC_H
3621 + #include <asm/barrier.h>
3622 +
3623 ++struct task_struct;
3624 ++
3625 + /**
3626 + * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise
3627 + * @index: array element index
3628 +@@ -55,4 +57,12 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
3629 + \
3630 + (typeof(_i)) (_i & _mask); \
3631 + })
3632 ++
3633 ++/* Speculation control prctl */
3634 ++int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which);
3635 ++int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
3636 ++ unsigned long ctrl);
3637 ++/* Speculation control for seccomp enforced mitigation */
3638 ++void arch_seccomp_spec_mitigate(struct task_struct *task);
3639 ++
3640 + #endif /* _LINUX_NOSPEC_H */
3641 +diff --git a/include/linux/sched.h b/include/linux/sched.h
3642 +index 90bea398e5e0..725498cc5d30 100644
3643 +--- a/include/linux/sched.h
3644 ++++ b/include/linux/sched.h
3645 +@@ -2167,6 +2167,8 @@ static inline void memalloc_noio_restore(unsigned int flags)
3646 + #define PFA_NO_NEW_PRIVS 0 /* May not gain new privileges. */
3647 + #define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */
3648 + #define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */
3649 ++#define PFA_SPEC_SSB_DISABLE 4 /* Speculative Store Bypass disabled */
3650 ++#define PFA_SPEC_SSB_FORCE_DISABLE 5 /* Speculative Store Bypass force disabled*/
3651 +
3652 +
3653 + #define TASK_PFA_TEST(name, func) \
3654 +@@ -2190,6 +2192,13 @@ TASK_PFA_TEST(SPREAD_SLAB, spread_slab)
3655 + TASK_PFA_SET(SPREAD_SLAB, spread_slab)
3656 + TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
3657 +
3658 ++TASK_PFA_TEST(SPEC_SSB_DISABLE, spec_ssb_disable)
3659 ++TASK_PFA_SET(SPEC_SSB_DISABLE, spec_ssb_disable)
3660 ++TASK_PFA_CLEAR(SPEC_SSB_DISABLE, spec_ssb_disable)
3661 ++
3662 ++TASK_PFA_TEST(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable)
3663 ++TASK_PFA_SET(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable)
3664 ++
3665 + /*
3666 + * task->jobctl flags
3667 + */
3668 +diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
3669 +index 2296e6b2f690..5a53d34bba26 100644
3670 +--- a/include/linux/seccomp.h
3671 ++++ b/include/linux/seccomp.h
3672 +@@ -3,7 +3,8 @@
3673 +
3674 + #include <uapi/linux/seccomp.h>
3675 +
3676 +-#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
3677 ++#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
3678 ++ SECCOMP_FILTER_FLAG_SPEC_ALLOW)
3679 +
3680 + #ifdef CONFIG_SECCOMP
3681 +
3682 +diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
3683 +index a6da214d0584..c28bd8be290a 100644
3684 +--- a/include/linux/skbuff.h
3685 ++++ b/include/linux/skbuff.h
3686 +@@ -514,6 +514,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
3687 + * @hash: the packet hash
3688 + * @queue_mapping: Queue mapping for multiqueue devices
3689 + * @xmit_more: More SKBs are pending for this queue
3690 ++ * @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves
3691 + * @ndisc_nodetype: router type (from link layer)
3692 + * @ooo_okay: allow the mapping of a socket to a queue to be changed
3693 + * @l4_hash: indicate hash is a canonical 4-tuple hash over transport
3694 +@@ -594,8 +595,8 @@ struct sk_buff {
3695 + fclone:2,
3696 + peeked:1,
3697 + head_frag:1,
3698 +- xmit_more:1;
3699 +- /* one bit hole */
3700 ++ xmit_more:1,
3701 ++ pfmemalloc:1;
3702 + kmemcheck_bitfield_end(flags1);
3703 +
3704 + /* fields enclosed in headers_start/headers_end are copied
3705 +@@ -615,19 +616,18 @@ struct sk_buff {
3706 +
3707 + __u8 __pkt_type_offset[0];
3708 + __u8 pkt_type:3;
3709 +- __u8 pfmemalloc:1;
3710 + __u8 ignore_df:1;
3711 + __u8 nfctinfo:3;
3712 +-
3713 + __u8 nf_trace:1;
3714 ++
3715 + __u8 ip_summed:2;
3716 + __u8 ooo_okay:1;
3717 + __u8 l4_hash:1;
3718 + __u8 sw_hash:1;
3719 + __u8 wifi_acked_valid:1;
3720 + __u8 wifi_acked:1;
3721 +-
3722 + __u8 no_fcs:1;
3723 ++
3724 + /* Indicates the inner headers are valid in the skbuff. */
3725 + __u8 encapsulation:1;
3726 + __u8 encap_hdr_csum:1;
3727 +@@ -635,11 +635,11 @@ struct sk_buff {
3728 + __u8 csum_complete_sw:1;
3729 + __u8 csum_level:2;
3730 + __u8 csum_bad:1;
3731 +-
3732 + #ifdef CONFIG_IPV6_NDISC_NODETYPE
3733 + __u8 ndisc_nodetype:2;
3734 + #endif
3735 + __u8 ipvs_property:1;
3736 ++
3737 + __u8 inner_protocol_type:1;
3738 + __u8 remcsum_offload:1;
3739 + /* 3 or 5 bit hole */
3740 +diff --git a/include/net/ipv6.h b/include/net/ipv6.h
3741 +index 84f0d0602433..0e01d570fa22 100644
3742 +--- a/include/net/ipv6.h
3743 ++++ b/include/net/ipv6.h
3744 +@@ -762,7 +762,7 @@ static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
3745 + * to minimize possbility that any useful information to an
3746 + * attacker is leaked. Only lower 20 bits are relevant.
3747 + */
3748 +- rol32(hash, 16);
3749 ++ hash = rol32(hash, 16);
3750 +
3751 + flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
3752 +
3753 +diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
3754 +index a8d0759a9e40..64776b72e1eb 100644
3755 +--- a/include/uapi/linux/prctl.h
3756 ++++ b/include/uapi/linux/prctl.h
3757 +@@ -197,4 +197,16 @@ struct prctl_mm_map {
3758 + # define PR_CAP_AMBIENT_LOWER 3
3759 + # define PR_CAP_AMBIENT_CLEAR_ALL 4
3760 +
3761 ++/* Per task speculation control */
3762 ++#define PR_GET_SPECULATION_CTRL 52
3763 ++#define PR_SET_SPECULATION_CTRL 53
3764 ++/* Speculation control variants */
3765 ++# define PR_SPEC_STORE_BYPASS 0
3766 ++/* Return and control values for PR_SET/GET_SPECULATION_CTRL */
3767 ++# define PR_SPEC_NOT_AFFECTED 0
3768 ++# define PR_SPEC_PRCTL (1UL << 0)
3769 ++# define PR_SPEC_ENABLE (1UL << 1)
3770 ++# define PR_SPEC_DISABLE (1UL << 2)
3771 ++# define PR_SPEC_FORCE_DISABLE (1UL << 3)
3772 ++
3773 + #endif /* _LINUX_PRCTL_H */
3774 +diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
3775 +index 0f238a43ff1e..e4acb615792b 100644
3776 +--- a/include/uapi/linux/seccomp.h
3777 ++++ b/include/uapi/linux/seccomp.h
3778 +@@ -15,7 +15,9 @@
3779 + #define SECCOMP_SET_MODE_FILTER 1
3780 +
3781 + /* Valid flags for SECCOMP_SET_MODE_FILTER */
3782 +-#define SECCOMP_FILTER_FLAG_TSYNC 1
3783 ++#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
3784 ++/* In v4.14+ SECCOMP_FILTER_FLAG_LOG is (1UL << 1) */
3785 ++#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
3786 +
3787 + /*
3788 + * All BPF programs must return a 32-bit value.
3789 +diff --git a/kernel/seccomp.c b/kernel/seccomp.c
3790 +index efd384f3f852..9a9203b15cde 100644
3791 +--- a/kernel/seccomp.c
3792 ++++ b/kernel/seccomp.c
3793 +@@ -16,6 +16,8 @@
3794 + #include <linux/atomic.h>
3795 + #include <linux/audit.h>
3796 + #include <linux/compat.h>
3797 ++#include <linux/nospec.h>
3798 ++#include <linux/prctl.h>
3799 + #include <linux/sched.h>
3800 + #include <linux/seccomp.h>
3801 + #include <linux/slab.h>
3802 +@@ -214,8 +216,11 @@ static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode)
3803 + return true;
3804 + }
3805 +
3806 ++void __weak arch_seccomp_spec_mitigate(struct task_struct *task) { }
3807 ++
3808 + static inline void seccomp_assign_mode(struct task_struct *task,
3809 +- unsigned long seccomp_mode)
3810 ++ unsigned long seccomp_mode,
3811 ++ unsigned long flags)
3812 + {
3813 + assert_spin_locked(&task->sighand->siglock);
3814 +
3815 +@@ -225,6 +230,9 @@ static inline void seccomp_assign_mode(struct task_struct *task,
3816 + * filter) is set.
3817 + */
3818 + smp_mb__before_atomic();
3819 ++ /* Assume default seccomp processes want spec flaw mitigation. */
3820 ++ if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0)
3821 ++ arch_seccomp_spec_mitigate(task);
3822 + set_tsk_thread_flag(task, TIF_SECCOMP);
3823 + }
3824 +
3825 +@@ -292,7 +300,7 @@ static inline pid_t seccomp_can_sync_threads(void)
3826 + * without dropping the locks.
3827 + *
3828 + */
3829 +-static inline void seccomp_sync_threads(void)
3830 ++static inline void seccomp_sync_threads(unsigned long flags)
3831 + {
3832 + struct task_struct *thread, *caller;
3833 +
3834 +@@ -333,7 +341,8 @@ static inline void seccomp_sync_threads(void)
3835 + * allow one thread to transition the other.
3836 + */
3837 + if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
3838 +- seccomp_assign_mode(thread, SECCOMP_MODE_FILTER);
3839 ++ seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
3840 ++ flags);
3841 + }
3842 + }
3843 +
3844 +@@ -452,7 +461,7 @@ static long seccomp_attach_filter(unsigned int flags,
3845 +
3846 + /* Now that the new filter is in place, synchronize to all threads. */
3847 + if (flags & SECCOMP_FILTER_FLAG_TSYNC)
3848 +- seccomp_sync_threads();
3849 ++ seccomp_sync_threads(flags);
3850 +
3851 + return 0;
3852 + }
3853 +@@ -747,7 +756,7 @@ static long seccomp_set_mode_strict(void)
3854 + #ifdef TIF_NOTSC
3855 + disable_TSC();
3856 + #endif
3857 +- seccomp_assign_mode(current, seccomp_mode);
3858 ++ seccomp_assign_mode(current, seccomp_mode, 0);
3859 + ret = 0;
3860 +
3861 + out:
3862 +@@ -805,7 +814,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
3863 + /* Do not free the successfully attached filter. */
3864 + prepared = NULL;
3865 +
3866 +- seccomp_assign_mode(current, seccomp_mode);
3867 ++ seccomp_assign_mode(current, seccomp_mode, flags);
3868 + out:
3869 + spin_unlock_irq(&current->sighand->siglock);
3870 + if (flags & SECCOMP_FILTER_FLAG_TSYNC)
3871 +diff --git a/kernel/sys.c b/kernel/sys.c
3872 +index 6624919ef0e7..f718742e55e6 100644
3873 +--- a/kernel/sys.c
3874 ++++ b/kernel/sys.c
3875 +@@ -2075,6 +2075,17 @@ static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
3876 + }
3877 + #endif
3878 +
3879 ++int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
3880 ++{
3881 ++ return -EINVAL;
3882 ++}
3883 ++
3884 ++int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
3885 ++ unsigned long ctrl)
3886 ++{
3887 ++ return -EINVAL;
3888 ++}
3889 ++
3890 + SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
3891 + unsigned long, arg4, unsigned long, arg5)
3892 + {
3893 +@@ -2269,6 +2280,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
3894 + case PR_GET_FP_MODE:
3895 + error = GET_FP_MODE(me);
3896 + break;
3897 ++ case PR_GET_SPECULATION_CTRL:
3898 ++ if (arg3 || arg4 || arg5)
3899 ++ return -EINVAL;
3900 ++ error = arch_prctl_spec_ctrl_get(me, arg2);
3901 ++ break;
3902 ++ case PR_SET_SPECULATION_CTRL:
3903 ++ if (arg4 || arg5)
3904 ++ return -EINVAL;
3905 ++ error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
3906 ++ break;
3907 + default:
3908 + error = -EINVAL;
3909 + break;
3910 +diff --git a/lib/rhashtable.c b/lib/rhashtable.c
3911 +index 51282f579760..37ea94b636a3 100644
3912 +--- a/lib/rhashtable.c
3913 ++++ b/lib/rhashtable.c
3914 +@@ -670,8 +670,16 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
3915 +
3916 + static size_t rounded_hashtable_size(const struct rhashtable_params *params)
3917 + {
3918 +- return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
3919 +- (unsigned long)params->min_size);
3920 ++ size_t retsize;
3921 ++
3922 ++ if (params->nelem_hint)
3923 ++ retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
3924 ++ (unsigned long)params->min_size);
3925 ++ else
3926 ++ retsize = max(HASH_DEFAULT_SIZE,
3927 ++ (unsigned long)params->min_size);
3928 ++
3929 ++ return retsize;
3930 + }
3931 +
3932 + static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
3933 +@@ -728,8 +736,6 @@ int rhashtable_init(struct rhashtable *ht,
3934 + struct bucket_table *tbl;
3935 + size_t size;
3936 +
3937 +- size = HASH_DEFAULT_SIZE;
3938 +-
3939 + if ((!params->key_len && !params->obj_hashfn) ||
3940 + (params->obj_hashfn && !params->obj_cmpfn))
3941 + return -EINVAL;
3942 +@@ -756,8 +762,7 @@ int rhashtable_init(struct rhashtable *ht,
3943 +
3944 + ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
3945 +
3946 +- if (params->nelem_hint)
3947 +- size = rounded_hashtable_size(&ht->p);
3948 ++ size = rounded_hashtable_size(&ht->p);
3949 +
3950 + /* The maximum (not average) chain length grows with the
3951 + * size of the hash table, at a rate of (log N)/(log log N).
3952 +diff --git a/mm/memcontrol.c b/mm/memcontrol.c
3953 +index 55a9facb8e8d..9a8e688724b1 100644
3954 +--- a/mm/memcontrol.c
3955 ++++ b/mm/memcontrol.c
3956 +@@ -996,7 +996,7 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
3957 + int nid, zid;
3958 + int i;
3959 +
3960 +- while ((memcg = parent_mem_cgroup(memcg))) {
3961 ++ for (; memcg; memcg = parent_mem_cgroup(memcg)) {
3962 + for_each_node(nid) {
3963 + for (zid = 0; zid < MAX_NR_ZONES; zid++) {
3964 + mz = &memcg->nodeinfo[nid]->zoneinfo[zid];
3965 +diff --git a/net/core/skbuff.c b/net/core/skbuff.c
3966 +index fa02c680eebc..55be076706e5 100644
3967 +--- a/net/core/skbuff.c
3968 ++++ b/net/core/skbuff.c
3969 +@@ -828,6 +828,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
3970 + n->cloned = 1;
3971 + n->nohdr = 0;
3972 + n->peeked = 0;
3973 ++ C(pfmemalloc);
3974 + n->destructor = NULL;
3975 + C(tail);
3976 + C(end);
3977 +diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
3978 +index c9e68ff48a72..8f05816a8be2 100644
3979 +--- a/net/ipv4/fib_frontend.c
3980 ++++ b/net/ipv4/fib_frontend.c
3981 +@@ -297,6 +297,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb)
3982 + if (!ipv4_is_zeronet(ip_hdr(skb)->saddr)) {
3983 + struct flowi4 fl4 = {
3984 + .flowi4_iif = LOOPBACK_IFINDEX,
3985 ++ .flowi4_oif = l3mdev_master_ifindex_rcu(dev),
3986 + .daddr = ip_hdr(skb)->saddr,
3987 + .flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
3988 + .flowi4_scope = scope,
3989 +diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
3990 +index 75abf978ef30..da90c74d12ef 100644
3991 +--- a/net/ipv4/sysctl_net_ipv4.c
3992 ++++ b/net/ipv4/sysctl_net_ipv4.c
3993 +@@ -141,8 +141,9 @@ static int ipv4_ping_group_range(struct ctl_table *table, int write,
3994 + if (write && ret == 0) {
3995 + low = make_kgid(user_ns, urange[0]);
3996 + high = make_kgid(user_ns, urange[1]);
3997 +- if (!gid_valid(low) || !gid_valid(high) ||
3998 +- (urange[1] < urange[0]) || gid_lt(high, low)) {
3999 ++ if (!gid_valid(low) || !gid_valid(high))
4000 ++ return -EINVAL;
4001 ++ if (urange[1] < urange[0] || gid_lt(high, low)) {
4002 + low = make_kgid(&init_user_ns, 1);
4003 + high = make_kgid(&init_user_ns, 0);
4004 + }
4005 +diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c
4006 +index 16f8124b1150..59111cadaec2 100644
4007 +--- a/sound/core/rawmidi.c
4008 ++++ b/sound/core/rawmidi.c
4009 +@@ -635,7 +635,7 @@ static int snd_rawmidi_info_select_user(struct snd_card *card,
4010 + int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream,
4011 + struct snd_rawmidi_params * params)
4012 + {
4013 +- char *newbuf;
4014 ++ char *newbuf, *oldbuf;
4015 + struct snd_rawmidi_runtime *runtime = substream->runtime;
4016 +
4017 + if (substream->append && substream->use_count > 1)
4018 +@@ -648,13 +648,17 @@ int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream,
4019 + return -EINVAL;
4020 + }
4021 + if (params->buffer_size != runtime->buffer_size) {
4022 +- newbuf = krealloc(runtime->buffer, params->buffer_size,
4023 +- GFP_KERNEL);
4024 ++ newbuf = kmalloc(params->buffer_size, GFP_KERNEL);
4025 + if (!newbuf)
4026 + return -ENOMEM;
4027 ++ spin_lock_irq(&runtime->lock);
4028 ++ oldbuf = runtime->buffer;
4029 + runtime->buffer = newbuf;
4030 + runtime->buffer_size = params->buffer_size;
4031 + runtime->avail = runtime->buffer_size;
4032 ++ runtime->appl_ptr = runtime->hw_ptr = 0;
4033 ++ spin_unlock_irq(&runtime->lock);
4034 ++ kfree(oldbuf);
4035 + }
4036 + runtime->avail_min = params->avail_min;
4037 + substream->active_sensing = !params->no_active_sensing;
4038 +@@ -665,7 +669,7 @@ EXPORT_SYMBOL(snd_rawmidi_output_params);
4039 + int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream,
4040 + struct snd_rawmidi_params * params)
4041 + {
4042 +- char *newbuf;
4043 ++ char *newbuf, *oldbuf;
4044 + struct snd_rawmidi_runtime *runtime = substream->runtime;
4045 +
4046 + snd_rawmidi_drain_input(substream);
4047 +@@ -676,12 +680,16 @@ int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream,
4048 + return -EINVAL;
4049 + }
4050 + if (params->buffer_size != runtime->buffer_size) {
4051 +- newbuf = krealloc(runtime->buffer, params->buffer_size,
4052 +- GFP_KERNEL);
4053 ++ newbuf = kmalloc(params->buffer_size, GFP_KERNEL);
4054 + if (!newbuf)
4055 + return -ENOMEM;
4056 ++ spin_lock_irq(&runtime->lock);
4057 ++ oldbuf = runtime->buffer;
4058 + runtime->buffer = newbuf;
4059 + runtime->buffer_size = params->buffer_size;
4060 ++ runtime->appl_ptr = runtime->hw_ptr = 0;
4061 ++ spin_unlock_irq(&runtime->lock);
4062 ++ kfree(oldbuf);
4063 + }
4064 + runtime->avail_min = params->avail_min;
4065 + return 0;
4066 +diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
4067 +index 882fe83a3554..b3f345433ec7 100644
4068 +--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
4069 ++++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
4070 +@@ -1476,15 +1476,19 @@ TEST_F(TRACE_syscall, syscall_dropped)
4071 + #define SECCOMP_SET_MODE_FILTER 1
4072 + #endif
4073 +
4074 +-#ifndef SECCOMP_FLAG_FILTER_TSYNC
4075 +-#define SECCOMP_FLAG_FILTER_TSYNC 1
4076 ++#ifndef SECCOMP_FILTER_FLAG_TSYNC
4077 ++#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
4078 ++#endif
4079 ++
4080 ++#ifndef SECCOMP_FILTER_FLAG_SPEC_ALLOW
4081 ++#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
4082 + #endif
4083 +
4084 + #ifndef seccomp
4085 +-int seccomp(unsigned int op, unsigned int flags, struct sock_fprog *filter)
4086 ++int seccomp(unsigned int op, unsigned int flags, void *args)
4087 + {
4088 + errno = 0;
4089 +- return syscall(__NR_seccomp, op, flags, filter);
4090 ++ return syscall(__NR_seccomp, op, flags, args);
4091 + }
4092 + #endif
4093 +
4094 +@@ -1576,6 +1580,78 @@ TEST(seccomp_syscall_mode_lock)
4095 + }
4096 + }
4097 +
4098 ++/*
4099 ++ * Test detection of known and unknown filter flags. Userspace needs to be able
4100 ++ * to check if a filter flag is supported by the current kernel and a good way
4101 ++ * of doing that is by attempting to enter filter mode, with the flag bit in
4102 ++ * question set, and a NULL pointer for the _args_ parameter. EFAULT indicates
4103 ++ * that the flag is valid and EINVAL indicates that the flag is invalid.
4104 ++ */
4105 ++TEST(detect_seccomp_filter_flags)
4106 ++{
4107 ++ unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC,
4108 ++ SECCOMP_FILTER_FLAG_SPEC_ALLOW };
4109 ++ unsigned int flag, all_flags;
4110 ++ int i;
4111 ++ long ret;
4112 ++
4113 ++ /* Test detection of known-good filter flags */
4114 ++ for (i = 0, all_flags = 0; i < ARRAY_SIZE(flags); i++) {
4115 ++ int bits = 0;
4116 ++
4117 ++ flag = flags[i];
4118 ++ /* Make sure the flag is a single bit! */
4119 ++ while (flag) {
4120 ++ if (flag & 0x1)
4121 ++ bits ++;
4122 ++ flag >>= 1;
4123 ++ }
4124 ++ ASSERT_EQ(1, bits);
4125 ++ flag = flags[i];
4126 ++
4127 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
4128 ++ ASSERT_NE(ENOSYS, errno) {
4129 ++ TH_LOG("Kernel does not support seccomp syscall!");
4130 ++ }
4131 ++ EXPECT_EQ(-1, ret);
4132 ++ EXPECT_EQ(EFAULT, errno) {
4133 ++ TH_LOG("Failed to detect that a known-good filter flag (0x%X) is supported!",
4134 ++ flag);
4135 ++ }
4136 ++
4137 ++ all_flags |= flag;
4138 ++ }
4139 ++
4140 ++ /* Test detection of all known-good filter flags */
4141 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, all_flags, NULL);
4142 ++ EXPECT_EQ(-1, ret);
4143 ++ EXPECT_EQ(EFAULT, errno) {
4144 ++ TH_LOG("Failed to detect that all known-good filter flags (0x%X) are supported!",
4145 ++ all_flags);
4146 ++ }
4147 ++
4148 ++ /* Test detection of an unknown filter flag */
4149 ++ flag = -1;
4150 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
4151 ++ EXPECT_EQ(-1, ret);
4152 ++ EXPECT_EQ(EINVAL, errno) {
4153 ++ TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported!",
4154 ++ flag);
4155 ++ }
4156 ++
4157 ++ /*
4158 ++ * Test detection of an unknown filter flag that may simply need to be
4159 ++ * added to this test
4160 ++ */
4161 ++ flag = flags[ARRAY_SIZE(flags) - 1] << 1;
4162 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
4163 ++ EXPECT_EQ(-1, ret);
4164 ++ EXPECT_EQ(EINVAL, errno) {
4165 ++ TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported! Does a new flag need to be added to this test?",
4166 ++ flag);
4167 ++ }
4168 ++}
4169 ++
4170 + TEST(TSYNC_first)
4171 + {
4172 + struct sock_filter filter[] = {
4173 +@@ -1592,7 +1668,7 @@ TEST(TSYNC_first)
4174 + TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
4175 + }
4176 +
4177 +- ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
4178 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
4179 + &prog);
4180 + ASSERT_NE(ENOSYS, errno) {
4181 + TH_LOG("Kernel does not support seccomp syscall!");
4182 +@@ -1810,7 +1886,7 @@ TEST_F(TSYNC, two_siblings_with_ancestor)
4183 + self->sibling_count++;
4184 + }
4185 +
4186 +- ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
4187 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
4188 + &self->apply_prog);
4189 + ASSERT_EQ(0, ret) {
4190 + TH_LOG("Could install filter on all threads!");
4191 +@@ -1871,7 +1947,7 @@ TEST_F(TSYNC, two_siblings_with_no_filter)
4192 + TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
4193 + }
4194 +
4195 +- ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
4196 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
4197 + &self->apply_prog);
4198 + ASSERT_NE(ENOSYS, errno) {
4199 + TH_LOG("Kernel does not support seccomp syscall!");
4200 +@@ -1919,7 +1995,7 @@ TEST_F(TSYNC, two_siblings_with_one_divergence)
4201 + self->sibling_count++;
4202 + }
4203 +
4204 +- ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
4205 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
4206 + &self->apply_prog);
4207 + ASSERT_EQ(self->sibling[0].system_tid, ret) {
4208 + TH_LOG("Did not fail on diverged sibling.");
4209 +@@ -1971,7 +2047,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
4210 + TH_LOG("Kernel does not support SECCOMP_SET_MODE_FILTER!");
4211 + }
4212 +
4213 +- ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
4214 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
4215 + &self->apply_prog);
4216 + ASSERT_EQ(ret, self->sibling[0].system_tid) {
4217 + TH_LOG("Did not fail on diverged sibling.");
4218 +@@ -2000,7 +2076,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
4219 + /* Switch to the remaining sibling */
4220 + sib = !sib;
4221 +
4222 +- ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
4223 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
4224 + &self->apply_prog);
4225 + ASSERT_EQ(0, ret) {
4226 + TH_LOG("Expected the remaining sibling to sync");
4227 +@@ -2023,7 +2099,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter)
4228 + while (!kill(self->sibling[sib].system_tid, 0))
4229 + sleep(0.1);
4230 +
4231 +- ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC,
4232 ++ ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
4233 + &self->apply_prog);
4234 + ASSERT_EQ(0, ret); /* just us chickens */
4235 + }
4236 +diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
4237 +index 49001fa84ead..1203829316b2 100644
4238 +--- a/virt/kvm/eventfd.c
4239 ++++ b/virt/kvm/eventfd.c
4240 +@@ -119,8 +119,12 @@ irqfd_shutdown(struct work_struct *work)
4241 + {
4242 + struct kvm_kernel_irqfd *irqfd =
4243 + container_of(work, struct kvm_kernel_irqfd, shutdown);
4244 ++ struct kvm *kvm = irqfd->kvm;
4245 + u64 cnt;
4246 +
4247 ++ /* Make sure irqfd has been initalized in assign path. */
4248 ++ synchronize_srcu(&kvm->irq_srcu);
4249 ++
4250 + /*
4251 + * Synchronize with the wait-queue and unhook ourselves to prevent
4252 + * further events.
4253 +@@ -387,7 +391,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
4254 +
4255 + idx = srcu_read_lock(&kvm->irq_srcu);
4256 + irqfd_update(kvm, irqfd);
4257 +- srcu_read_unlock(&kvm->irq_srcu, idx);
4258 +
4259 + list_add_tail(&irqfd->list, &kvm->irqfds.items);
4260 +
4261 +@@ -419,6 +422,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
4262 + irqfd->consumer.token, ret);
4263 + #endif
4264 +
4265 ++ srcu_read_unlock(&kvm->irq_srcu, idx);
4266 + return 0;
4267 +
4268 + fail:
4269
4270 diff --git a/1144_linux-4.4.145.patch b/1144_linux-4.4.145.patch
4271 new file mode 100644
4272 index 0000000..f7b3f94
4273 --- /dev/null
4274 +++ b/1144_linux-4.4.145.patch
4275 @@ -0,0 +1,1006 @@
4276 +diff --git a/Makefile b/Makefile
4277 +index 63f3e2438a26..be31491a2d67 100644
4278 +--- a/Makefile
4279 ++++ b/Makefile
4280 +@@ -1,6 +1,6 @@
4281 + VERSION = 4
4282 + PATCHLEVEL = 4
4283 +-SUBLEVEL = 144
4284 ++SUBLEVEL = 145
4285 + EXTRAVERSION =
4286 + NAME = Blurry Fish Butt
4287 +
4288 +@@ -624,6 +624,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning,frame-address,)
4289 + KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
4290 + KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
4291 + KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context)
4292 ++KBUILD_CFLAGS += $(call cc-disable-warning, attribute-alias)
4293 +
4294 + ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
4295 + KBUILD_CFLAGS += -Os
4296 +diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
4297 +index 35c9db857ebe..cd8b589111ba 100644
4298 +--- a/arch/arm/include/asm/uaccess.h
4299 ++++ b/arch/arm/include/asm/uaccess.h
4300 +@@ -251,7 +251,7 @@ extern int __put_user_8(void *, unsigned long long);
4301 + ({ \
4302 + unsigned long __limit = current_thread_info()->addr_limit - 1; \
4303 + const typeof(*(p)) __user *__tmp_p = (p); \
4304 +- register const typeof(*(p)) __r2 asm("r2") = (x); \
4305 ++ register typeof(*(p)) __r2 asm("r2") = (x); \
4306 + register const typeof(*(p)) __user *__p asm("r0") = __tmp_p; \
4307 + register unsigned long __l asm("r1") = __limit; \
4308 + register int __e asm("r0"); \
4309 +diff --git a/arch/mips/ath79/common.c b/arch/mips/ath79/common.c
4310 +index 8ae4067a5eda..40ecb6e700cd 100644
4311 +--- a/arch/mips/ath79/common.c
4312 ++++ b/arch/mips/ath79/common.c
4313 +@@ -58,7 +58,7 @@ EXPORT_SYMBOL_GPL(ath79_ddr_ctrl_init);
4314 +
4315 + void ath79_ddr_wb_flush(u32 reg)
4316 + {
4317 +- void __iomem *flush_reg = ath79_ddr_wb_flush_base + reg;
4318 ++ void __iomem *flush_reg = ath79_ddr_wb_flush_base + (reg * 4);
4319 +
4320 + /* Flush the DDR write buffer. */
4321 + __raw_writel(0x1, flush_reg);
4322 +diff --git a/drivers/base/dd.c b/drivers/base/dd.c
4323 +index a641cf3ccad6..1dffb018a7fe 100644
4324 +--- a/drivers/base/dd.c
4325 ++++ b/drivers/base/dd.c
4326 +@@ -304,14 +304,6 @@ static int really_probe(struct device *dev, struct device_driver *drv)
4327 + goto probe_failed;
4328 + }
4329 +
4330 +- /*
4331 +- * Ensure devices are listed in devices_kset in correct order
4332 +- * It's important to move Dev to the end of devices_kset before
4333 +- * calling .probe, because it could be recursive and parent Dev
4334 +- * should always go first
4335 +- */
4336 +- devices_kset_move_last(dev);
4337 +-
4338 + if (dev->bus->probe) {
4339 + ret = dev->bus->probe(dev);
4340 + if (ret)
4341 +diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
4342 +index 51670b322409..700b98d9c250 100644
4343 +--- a/drivers/net/can/xilinx_can.c
4344 ++++ b/drivers/net/can/xilinx_can.c
4345 +@@ -2,6 +2,7 @@
4346 + *
4347 + * Copyright (C) 2012 - 2014 Xilinx, Inc.
4348 + * Copyright (C) 2009 PetaLogix. All rights reserved.
4349 ++ * Copyright (C) 2017 Sandvik Mining and Construction Oy
4350 + *
4351 + * Description:
4352 + * This driver is developed for Axi CAN IP and for Zynq CANPS Controller.
4353 +@@ -25,8 +26,10 @@
4354 + #include <linux/module.h>
4355 + #include <linux/netdevice.h>
4356 + #include <linux/of.h>
4357 ++#include <linux/of_device.h>
4358 + #include <linux/platform_device.h>
4359 + #include <linux/skbuff.h>
4360 ++#include <linux/spinlock.h>
4361 + #include <linux/string.h>
4362 + #include <linux/types.h>
4363 + #include <linux/can/dev.h>
4364 +@@ -100,7 +103,7 @@ enum xcan_reg {
4365 + #define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\
4366 + XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \
4367 + XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \
4368 +- XCAN_IXR_ARBLST_MASK | XCAN_IXR_RXOK_MASK)
4369 ++ XCAN_IXR_RXOFLW_MASK | XCAN_IXR_ARBLST_MASK)
4370 +
4371 + /* CAN register bit shift - XCAN_<REG>_<BIT>_SHIFT */
4372 + #define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */
4373 +@@ -117,6 +120,7 @@ enum xcan_reg {
4374 + /**
4375 + * struct xcan_priv - This definition define CAN driver instance
4376 + * @can: CAN private data structure.
4377 ++ * @tx_lock: Lock for synchronizing TX interrupt handling
4378 + * @tx_head: Tx CAN packets ready to send on the queue
4379 + * @tx_tail: Tx CAN packets successfully sended on the queue
4380 + * @tx_max: Maximum number packets the driver can send
4381 +@@ -131,6 +135,7 @@ enum xcan_reg {
4382 + */
4383 + struct xcan_priv {
4384 + struct can_priv can;
4385 ++ spinlock_t tx_lock;
4386 + unsigned int tx_head;
4387 + unsigned int tx_tail;
4388 + unsigned int tx_max;
4389 +@@ -158,6 +163,11 @@ static const struct can_bittiming_const xcan_bittiming_const = {
4390 + .brp_inc = 1,
4391 + };
4392 +
4393 ++#define XCAN_CAP_WATERMARK 0x0001
4394 ++struct xcan_devtype_data {
4395 ++ unsigned int caps;
4396 ++};
4397 ++
4398 + /**
4399 + * xcan_write_reg_le - Write a value to the device register little endian
4400 + * @priv: Driver private data structure
4401 +@@ -237,6 +247,10 @@ static int set_reset_mode(struct net_device *ndev)
4402 + usleep_range(500, 10000);
4403 + }
4404 +
4405 ++ /* reset clears FIFOs */
4406 ++ priv->tx_head = 0;
4407 ++ priv->tx_tail = 0;
4408 ++
4409 + return 0;
4410 + }
4411 +
4412 +@@ -391,6 +405,7 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
4413 + struct net_device_stats *stats = &ndev->stats;
4414 + struct can_frame *cf = (struct can_frame *)skb->data;
4415 + u32 id, dlc, data[2] = {0, 0};
4416 ++ unsigned long flags;
4417 +
4418 + if (can_dropped_invalid_skb(ndev, skb))
4419 + return NETDEV_TX_OK;
4420 +@@ -438,6 +453,9 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
4421 + data[1] = be32_to_cpup((__be32 *)(cf->data + 4));
4422 +
4423 + can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max);
4424 ++
4425 ++ spin_lock_irqsave(&priv->tx_lock, flags);
4426 ++
4427 + priv->tx_head++;
4428 +
4429 + /* Write the Frame to Xilinx CAN TX FIFO */
4430 +@@ -453,10 +471,16 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
4431 + stats->tx_bytes += cf->can_dlc;
4432 + }
4433 +
4434 ++ /* Clear TX-FIFO-empty interrupt for xcan_tx_interrupt() */
4435 ++ if (priv->tx_max > 1)
4436 ++ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXFEMP_MASK);
4437 ++
4438 + /* Check if the TX buffer is full */
4439 + if ((priv->tx_head - priv->tx_tail) == priv->tx_max)
4440 + netif_stop_queue(ndev);
4441 +
4442 ++ spin_unlock_irqrestore(&priv->tx_lock, flags);
4443 ++
4444 + return NETDEV_TX_OK;
4445 + }
4446 +
4447 +@@ -528,6 +552,123 @@ static int xcan_rx(struct net_device *ndev)
4448 + return 1;
4449 + }
4450 +
4451 ++/**
4452 ++ * xcan_current_error_state - Get current error state from HW
4453 ++ * @ndev: Pointer to net_device structure
4454 ++ *
4455 ++ * Checks the current CAN error state from the HW. Note that this
4456 ++ * only checks for ERROR_PASSIVE and ERROR_WARNING.
4457 ++ *
4458 ++ * Return:
4459 ++ * ERROR_PASSIVE or ERROR_WARNING if either is active, ERROR_ACTIVE
4460 ++ * otherwise.
4461 ++ */
4462 ++static enum can_state xcan_current_error_state(struct net_device *ndev)
4463 ++{
4464 ++ struct xcan_priv *priv = netdev_priv(ndev);
4465 ++ u32 status = priv->read_reg(priv, XCAN_SR_OFFSET);
4466 ++
4467 ++ if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK)
4468 ++ return CAN_STATE_ERROR_PASSIVE;
4469 ++ else if (status & XCAN_SR_ERRWRN_MASK)
4470 ++ return CAN_STATE_ERROR_WARNING;
4471 ++ else
4472 ++ return CAN_STATE_ERROR_ACTIVE;
4473 ++}
4474 ++
4475 ++/**
4476 ++ * xcan_set_error_state - Set new CAN error state
4477 ++ * @ndev: Pointer to net_device structure
4478 ++ * @new_state: The new CAN state to be set
4479 ++ * @cf: Error frame to be populated or NULL
4480 ++ *
4481 ++ * Set new CAN error state for the device, updating statistics and
4482 ++ * populating the error frame if given.
4483 ++ */
4484 ++static void xcan_set_error_state(struct net_device *ndev,
4485 ++ enum can_state new_state,
4486 ++ struct can_frame *cf)
4487 ++{
4488 ++ struct xcan_priv *priv = netdev_priv(ndev);
4489 ++ u32 ecr = priv->read_reg(priv, XCAN_ECR_OFFSET);
4490 ++ u32 txerr = ecr & XCAN_ECR_TEC_MASK;
4491 ++ u32 rxerr = (ecr & XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT;
4492 ++
4493 ++ priv->can.state = new_state;
4494 ++
4495 ++ if (cf) {
4496 ++ cf->can_id |= CAN_ERR_CRTL;
4497 ++ cf->data[6] = txerr;
4498 ++ cf->data[7] = rxerr;
4499 ++ }
4500 ++
4501 ++ switch (new_state) {
4502 ++ case CAN_STATE_ERROR_PASSIVE:
4503 ++ priv->can.can_stats.error_passive++;
4504 ++ if (cf)
4505 ++ cf->data[1] = (rxerr > 127) ?
4506 ++ CAN_ERR_CRTL_RX_PASSIVE :
4507 ++ CAN_ERR_CRTL_TX_PASSIVE;
4508 ++ break;
4509 ++ case CAN_STATE_ERROR_WARNING:
4510 ++ priv->can.can_stats.error_warning++;
4511 ++ if (cf)
4512 ++ cf->data[1] |= (txerr > rxerr) ?
4513 ++ CAN_ERR_CRTL_TX_WARNING :
4514 ++ CAN_ERR_CRTL_RX_WARNING;
4515 ++ break;
4516 ++ case CAN_STATE_ERROR_ACTIVE:
4517 ++ if (cf)
4518 ++ cf->data[1] |= CAN_ERR_CRTL_ACTIVE;
4519 ++ break;
4520 ++ default:
4521 ++ /* non-ERROR states are handled elsewhere */
4522 ++ WARN_ON(1);
4523 ++ break;
4524 ++ }
4525 ++}
4526 ++
4527 ++/**
4528 ++ * xcan_update_error_state_after_rxtx - Update CAN error state after RX/TX
4529 ++ * @ndev: Pointer to net_device structure
4530 ++ *
4531 ++ * If the device is in a ERROR-WARNING or ERROR-PASSIVE state, check if
4532 ++ * the performed RX/TX has caused it to drop to a lesser state and set
4533 ++ * the interface state accordingly.
4534 ++ */
4535 ++static void xcan_update_error_state_after_rxtx(struct net_device *ndev)
4536 ++{
4537 ++ struct xcan_priv *priv = netdev_priv(ndev);
4538 ++ enum can_state old_state = priv->can.state;
4539 ++ enum can_state new_state;
4540 ++
4541 ++ /* changing error state due to successful frame RX/TX can only
4542 ++ * occur from these states
4543 ++ */
4544 ++ if (old_state != CAN_STATE_ERROR_WARNING &&
4545 ++ old_state != CAN_STATE_ERROR_PASSIVE)
4546 ++ return;
4547 ++
4548 ++ new_state = xcan_current_error_state(ndev);
4549 ++
4550 ++ if (new_state != old_state) {
4551 ++ struct sk_buff *skb;
4552 ++ struct can_frame *cf;
4553 ++
4554 ++ skb = alloc_can_err_skb(ndev, &cf);
4555 ++
4556 ++ xcan_set_error_state(ndev, new_state, skb ? cf : NULL);
4557 ++
4558 ++ if (skb) {
4559 ++ struct net_device_stats *stats = &ndev->stats;
4560 ++
4561 ++ stats->rx_packets++;
4562 ++ stats->rx_bytes += cf->can_dlc;
4563 ++ netif_rx(skb);
4564 ++ }
4565 ++ }
4566 ++}
4567 ++
4568 + /**
4569 + * xcan_err_interrupt - error frame Isr
4570 + * @ndev: net_device pointer
4571 +@@ -543,16 +684,12 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr)
4572 + struct net_device_stats *stats = &ndev->stats;
4573 + struct can_frame *cf;
4574 + struct sk_buff *skb;
4575 +- u32 err_status, status, txerr = 0, rxerr = 0;
4576 ++ u32 err_status;
4577 +
4578 + skb = alloc_can_err_skb(ndev, &cf);
4579 +
4580 + err_status = priv->read_reg(priv, XCAN_ESR_OFFSET);
4581 + priv->write_reg(priv, XCAN_ESR_OFFSET, err_status);
4582 +- txerr = priv->read_reg(priv, XCAN_ECR_OFFSET) & XCAN_ECR_TEC_MASK;
4583 +- rxerr = ((priv->read_reg(priv, XCAN_ECR_OFFSET) &
4584 +- XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT);
4585 +- status = priv->read_reg(priv, XCAN_SR_OFFSET);
4586 +
4587 + if (isr & XCAN_IXR_BSOFF_MASK) {
4588 + priv->can.state = CAN_STATE_BUS_OFF;
4589 +@@ -562,28 +699,10 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr)
4590 + can_bus_off(ndev);
4591 + if (skb)
4592 + cf->can_id |= CAN_ERR_BUSOFF;
4593 +- } else if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK) {
4594 +- priv->can.state = CAN_STATE_ERROR_PASSIVE;
4595 +- priv->can.can_stats.error_passive++;
4596 +- if (skb) {
4597 +- cf->can_id |= CAN_ERR_CRTL;
4598 +- cf->data[1] = (rxerr > 127) ?
4599 +- CAN_ERR_CRTL_RX_PASSIVE :
4600 +- CAN_ERR_CRTL_TX_PASSIVE;
4601 +- cf->data[6] = txerr;
4602 +- cf->data[7] = rxerr;
4603 +- }
4604 +- } else if (status & XCAN_SR_ERRWRN_MASK) {
4605 +- priv->can.state = CAN_STATE_ERROR_WARNING;
4606 +- priv->can.can_stats.error_warning++;
4607 +- if (skb) {
4608 +- cf->can_id |= CAN_ERR_CRTL;
4609 +- cf->data[1] |= (txerr > rxerr) ?
4610 +- CAN_ERR_CRTL_TX_WARNING :
4611 +- CAN_ERR_CRTL_RX_WARNING;
4612 +- cf->data[6] = txerr;
4613 +- cf->data[7] = rxerr;
4614 +- }
4615 ++ } else {
4616 ++ enum can_state new_state = xcan_current_error_state(ndev);
4617 ++
4618 ++ xcan_set_error_state(ndev, new_state, skb ? cf : NULL);
4619 + }
4620 +
4621 + /* Check for Arbitration lost interrupt */
4622 +@@ -599,7 +718,6 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr)
4623 + if (isr & XCAN_IXR_RXOFLW_MASK) {
4624 + stats->rx_over_errors++;
4625 + stats->rx_errors++;
4626 +- priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK);
4627 + if (skb) {
4628 + cf->can_id |= CAN_ERR_CRTL;
4629 + cf->data[1] |= CAN_ERR_CRTL_RX_OVERFLOW;
4630 +@@ -708,26 +826,20 @@ static int xcan_rx_poll(struct napi_struct *napi, int quota)
4631 +
4632 + isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
4633 + while ((isr & XCAN_IXR_RXNEMP_MASK) && (work_done < quota)) {
4634 +- if (isr & XCAN_IXR_RXOK_MASK) {
4635 +- priv->write_reg(priv, XCAN_ICR_OFFSET,
4636 +- XCAN_IXR_RXOK_MASK);
4637 +- work_done += xcan_rx(ndev);
4638 +- } else {
4639 +- priv->write_reg(priv, XCAN_ICR_OFFSET,
4640 +- XCAN_IXR_RXNEMP_MASK);
4641 +- break;
4642 +- }
4643 ++ work_done += xcan_rx(ndev);
4644 + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_RXNEMP_MASK);
4645 + isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
4646 + }
4647 +
4648 +- if (work_done)
4649 ++ if (work_done) {
4650 + can_led_event(ndev, CAN_LED_EVENT_RX);
4651 ++ xcan_update_error_state_after_rxtx(ndev);
4652 ++ }
4653 +
4654 + if (work_done < quota) {
4655 + napi_complete(napi);
4656 + ier = priv->read_reg(priv, XCAN_IER_OFFSET);
4657 +- ier |= (XCAN_IXR_RXOK_MASK | XCAN_IXR_RXNEMP_MASK);
4658 ++ ier |= XCAN_IXR_RXNEMP_MASK;
4659 + priv->write_reg(priv, XCAN_IER_OFFSET, ier);
4660 + }
4661 + return work_done;
4662 +@@ -742,18 +854,71 @@ static void xcan_tx_interrupt(struct net_device *ndev, u32 isr)
4663 + {
4664 + struct xcan_priv *priv = netdev_priv(ndev);
4665 + struct net_device_stats *stats = &ndev->stats;
4666 ++ unsigned int frames_in_fifo;
4667 ++ int frames_sent = 1; /* TXOK => at least 1 frame was sent */
4668 ++ unsigned long flags;
4669 ++ int retries = 0;
4670 ++
4671 ++ /* Synchronize with xmit as we need to know the exact number
4672 ++ * of frames in the FIFO to stay in sync due to the TXFEMP
4673 ++ * handling.
4674 ++ * This also prevents a race between netif_wake_queue() and
4675 ++ * netif_stop_queue().
4676 ++ */
4677 ++ spin_lock_irqsave(&priv->tx_lock, flags);
4678 +
4679 +- while ((priv->tx_head - priv->tx_tail > 0) &&
4680 +- (isr & XCAN_IXR_TXOK_MASK)) {
4681 ++ frames_in_fifo = priv->tx_head - priv->tx_tail;
4682 ++
4683 ++ if (WARN_ON_ONCE(frames_in_fifo == 0)) {
4684 ++ /* clear TXOK anyway to avoid getting back here */
4685 + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK);
4686 ++ spin_unlock_irqrestore(&priv->tx_lock, flags);
4687 ++ return;
4688 ++ }
4689 ++
4690 ++ /* Check if 2 frames were sent (TXOK only means that at least 1
4691 ++ * frame was sent).
4692 ++ */
4693 ++ if (frames_in_fifo > 1) {
4694 ++ WARN_ON(frames_in_fifo > priv->tx_max);
4695 ++
4696 ++ /* Synchronize TXOK and isr so that after the loop:
4697 ++ * (1) isr variable is up-to-date at least up to TXOK clear
4698 ++ * time. This avoids us clearing a TXOK of a second frame
4699 ++ * but not noticing that the FIFO is now empty and thus
4700 ++ * marking only a single frame as sent.
4701 ++ * (2) No TXOK is left. Having one could mean leaving a
4702 ++ * stray TXOK as we might process the associated frame
4703 ++ * via TXFEMP handling as we read TXFEMP *after* TXOK
4704 ++ * clear to satisfy (1).
4705 ++ */
4706 ++ while ((isr & XCAN_IXR_TXOK_MASK) && !WARN_ON(++retries == 100)) {
4707 ++ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK);
4708 ++ isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
4709 ++ }
4710 ++
4711 ++ if (isr & XCAN_IXR_TXFEMP_MASK) {
4712 ++ /* nothing in FIFO anymore */
4713 ++ frames_sent = frames_in_fifo;
4714 ++ }
4715 ++ } else {
4716 ++ /* single frame in fifo, just clear TXOK */
4717 ++ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK);
4718 ++ }
4719 ++
4720 ++ while (frames_sent--) {
4721 + can_get_echo_skb(ndev, priv->tx_tail %
4722 + priv->tx_max);
4723 + priv->tx_tail++;
4724 + stats->tx_packets++;
4725 +- isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
4726 + }
4727 +- can_led_event(ndev, CAN_LED_EVENT_TX);
4728 ++
4729 + netif_wake_queue(ndev);
4730 ++
4731 ++ spin_unlock_irqrestore(&priv->tx_lock, flags);
4732 ++
4733 ++ can_led_event(ndev, CAN_LED_EVENT_TX);
4734 ++ xcan_update_error_state_after_rxtx(ndev);
4735 + }
4736 +
4737 + /**
4738 +@@ -772,6 +937,7 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id)
4739 + struct net_device *ndev = (struct net_device *)dev_id;
4740 + struct xcan_priv *priv = netdev_priv(ndev);
4741 + u32 isr, ier;
4742 ++ u32 isr_errors;
4743 +
4744 + /* Get the interrupt status from Xilinx CAN */
4745 + isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
4746 +@@ -790,18 +956,17 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id)
4747 + xcan_tx_interrupt(ndev, isr);
4748 +
4749 + /* Check for the type of error interrupt and Processing it */
4750 +- if (isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK |
4751 +- XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK)) {
4752 +- priv->write_reg(priv, XCAN_ICR_OFFSET, (XCAN_IXR_ERROR_MASK |
4753 +- XCAN_IXR_RXOFLW_MASK | XCAN_IXR_BSOFF_MASK |
4754 +- XCAN_IXR_ARBLST_MASK));
4755 ++ isr_errors = isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK |
4756 ++ XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK);
4757 ++ if (isr_errors) {
4758 ++ priv->write_reg(priv, XCAN_ICR_OFFSET, isr_errors);
4759 + xcan_err_interrupt(ndev, isr);
4760 + }
4761 +
4762 + /* Check for the type of receive interrupt and Processing it */
4763 +- if (isr & (XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK)) {
4764 ++ if (isr & XCAN_IXR_RXNEMP_MASK) {
4765 + ier = priv->read_reg(priv, XCAN_IER_OFFSET);
4766 +- ier &= ~(XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK);
4767 ++ ier &= ~XCAN_IXR_RXNEMP_MASK;
4768 + priv->write_reg(priv, XCAN_IER_OFFSET, ier);
4769 + napi_schedule(&priv->napi);
4770 + }
4771 +@@ -1030,6 +1195,18 @@ static int __maybe_unused xcan_resume(struct device *dev)
4772 +
4773 + static SIMPLE_DEV_PM_OPS(xcan_dev_pm_ops, xcan_suspend, xcan_resume);
4774 +
4775 ++static const struct xcan_devtype_data xcan_zynq_data = {
4776 ++ .caps = XCAN_CAP_WATERMARK,
4777 ++};
4778 ++
4779 ++/* Match table for OF platform binding */
4780 ++static const struct of_device_id xcan_of_match[] = {
4781 ++ { .compatible = "xlnx,zynq-can-1.0", .data = &xcan_zynq_data },
4782 ++ { .compatible = "xlnx,axi-can-1.00.a", },
4783 ++ { /* end of list */ },
4784 ++};
4785 ++MODULE_DEVICE_TABLE(of, xcan_of_match);
4786 ++
4787 + /**
4788 + * xcan_probe - Platform registration call
4789 + * @pdev: Handle to the platform device structure
4790 +@@ -1044,8 +1221,10 @@ static int xcan_probe(struct platform_device *pdev)
4791 + struct resource *res; /* IO mem resources */
4792 + struct net_device *ndev;
4793 + struct xcan_priv *priv;
4794 ++ const struct of_device_id *of_id;
4795 ++ int caps = 0;
4796 + void __iomem *addr;
4797 +- int ret, rx_max, tx_max;
4798 ++ int ret, rx_max, tx_max, tx_fifo_depth;
4799 +
4800 + /* Get the virtual base address for the device */
4801 + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
4802 +@@ -1055,7 +1234,8 @@ static int xcan_probe(struct platform_device *pdev)
4803 + goto err;
4804 + }
4805 +
4806 +- ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", &tx_max);
4807 ++ ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth",
4808 ++ &tx_fifo_depth);
4809 + if (ret < 0)
4810 + goto err;
4811 +
4812 +@@ -1063,6 +1243,30 @@ static int xcan_probe(struct platform_device *pdev)
4813 + if (ret < 0)
4814 + goto err;
4815 +
4816 ++ of_id = of_match_device(xcan_of_match, &pdev->dev);
4817 ++ if (of_id) {
4818 ++ const struct xcan_devtype_data *devtype_data = of_id->data;
4819 ++
4820 ++ if (devtype_data)
4821 ++ caps = devtype_data->caps;
4822 ++ }
4823 ++
4824 ++ /* There is no way to directly figure out how many frames have been
4825 ++ * sent when the TXOK interrupt is processed. If watermark programming
4826 ++ * is supported, we can have 2 frames in the FIFO and use TXFEMP
4827 ++ * to determine if 1 or 2 frames have been sent.
4828 ++ * Theoretically we should be able to use TXFWMEMP to determine up
4829 ++ * to 3 frames, but it seems that after putting a second frame in the
4830 ++ * FIFO, with watermark at 2 frames, it can happen that TXFWMEMP (less
4831 ++ * than 2 frames in FIFO) is set anyway with no TXOK (a frame was
4832 ++ * sent), which is not a sensible state - possibly TXFWMEMP is not
4833 ++ * completely synchronized with the rest of the bits?
4834 ++ */
4835 ++ if (caps & XCAN_CAP_WATERMARK)
4836 ++ tx_max = min(tx_fifo_depth, 2);
4837 ++ else
4838 ++ tx_max = 1;
4839 ++
4840 + /* Create a CAN device instance */
4841 + ndev = alloc_candev(sizeof(struct xcan_priv), tx_max);
4842 + if (!ndev)
4843 +@@ -1077,6 +1281,7 @@ static int xcan_probe(struct platform_device *pdev)
4844 + CAN_CTRLMODE_BERR_REPORTING;
4845 + priv->reg_base = addr;
4846 + priv->tx_max = tx_max;
4847 ++ spin_lock_init(&priv->tx_lock);
4848 +
4849 + /* Get IRQ for the device */
4850 + ndev->irq = platform_get_irq(pdev, 0);
4851 +@@ -1144,9 +1349,9 @@ static int xcan_probe(struct platform_device *pdev)
4852 + devm_can_led_init(ndev);
4853 + clk_disable_unprepare(priv->bus_clk);
4854 + clk_disable_unprepare(priv->can_clk);
4855 +- netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n",
4856 ++ netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth: actual %d, using %d\n",
4857 + priv->reg_base, ndev->irq, priv->can.clock.freq,
4858 +- priv->tx_max);
4859 ++ tx_fifo_depth, priv->tx_max);
4860 +
4861 + return 0;
4862 +
4863 +@@ -1182,14 +1387,6 @@ static int xcan_remove(struct platform_device *pdev)
4864 + return 0;
4865 + }
4866 +
4867 +-/* Match table for OF platform binding */
4868 +-static const struct of_device_id xcan_of_match[] = {
4869 +- { .compatible = "xlnx,zynq-can-1.0", },
4870 +- { .compatible = "xlnx,axi-can-1.00.a", },
4871 +- { /* end of list */ },
4872 +-};
4873 +-MODULE_DEVICE_TABLE(of, xcan_of_match);
4874 +-
4875 + static struct platform_driver xcan_driver = {
4876 + .probe = xcan_probe,
4877 + .remove = xcan_remove,
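
Note on the xilinx_can TX rework above: a TXOK interrupt only proves that at least one frame left the FIFO, so with the watermark capability the driver queues at most two frames and re-reads the FIFO-empty flag after acknowledging TXOK to decide whether one or both completed. A rough user-space sketch of that accounting follows; the function and values are hypothetical, not taken from the driver.

/*
 * Illustrative sketch (not from the patch): decide how many queued CAN
 * frames completed, given that a TX-done flag only guarantees "at least
 * one" and a FIFO-empty flag, sampled after acknowledging TX-done,
 * guarantees "all of them".
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned int frames_completed(unsigned int frames_in_fifo,
                                     bool fifo_empty_after_ack)
{
	unsigned int sent = 1;             /* TX-done => at least one frame */

	if (frames_in_fifo > 1 && fifo_empty_after_ack)
		sent = frames_in_fifo;     /* FIFO drained => all of them */

	return sent;
}

int main(void)
{
	printf("2 queued, FIFO empty  -> %u sent\n", frames_completed(2, true));
	printf("2 queued, FIFO !empty -> %u sent\n", frames_completed(2, false));
	printf("1 queued              -> %u sent\n", frames_completed(1, false));
	return 0;
}

Keeping tx_max at 1 on cores without the watermark capability sidesteps the ambiguity entirely, which is what the probe change above does.
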
4878 +diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
4879 +index e3080fbd9d00..7911dc3da98e 100644
4880 +--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
4881 ++++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
4882 +@@ -2891,7 +2891,7 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4_dev *dev, int slave,
4883 + u32 srqn = qp_get_srqn(qpc) & 0xffffff;
4884 + int use_srq = (qp_get_srqn(qpc) >> 24) & 1;
4885 + struct res_srq *srq;
4886 +- int local_qpn = be32_to_cpu(qpc->local_qpn) & 0xffffff;
4887 ++ int local_qpn = vhcr->in_modifier & 0xffffff;
4888 +
4889 + err = adjust_qp_sched_queue(dev, slave, qpc, inbox);
4890 + if (err)
4891 +diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
4892 +index 7ed30d0b5273..a501f3ba6a3f 100644
4893 +--- a/drivers/usb/class/cdc-acm.c
4894 ++++ b/drivers/usb/class/cdc-acm.c
4895 +@@ -1771,6 +1771,9 @@ static const struct usb_device_id acm_ids[] = {
4896 + { USB_DEVICE(0x09d8, 0x0320), /* Elatec GmbH TWN3 */
4897 + .driver_info = NO_UNION_NORMAL, /* has misplaced union descriptor */
4898 + },
4899 ++ { USB_DEVICE(0x0ca6, 0xa050), /* Castles VEGA3000 */
4900 ++ .driver_info = NO_UNION_NORMAL, /* reports zero length descriptor */
4901 ++ },
4902 +
4903 + { USB_DEVICE(0x2912, 0x0001), /* ATOL FPrint */
4904 + .driver_info = CLEAR_HALT_CONDITIONS,
4905 +diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
4906 +index 4d86da0df131..93756664592a 100644
4907 +--- a/drivers/usb/core/hub.c
4908 ++++ b/drivers/usb/core/hub.c
4909 +@@ -1123,10 +1123,14 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
4910 +
4911 + if (!udev || udev->state == USB_STATE_NOTATTACHED) {
4912 + /* Tell hub_wq to disconnect the device or
4913 +- * check for a new connection
4914 ++ * check for a new connection or over current condition.
4915 ++ * Based on USB2.0 Spec Section 11.12.5,
4916 ++ * C_PORT_OVER_CURRENT could be set while
4917 ++ * PORT_OVER_CURRENT is not. So check for either of them.
4918 + */
4919 + if (udev || (portstatus & USB_PORT_STAT_CONNECTION) ||
4920 +- (portstatus & USB_PORT_STAT_OVERCURRENT))
4921 ++ (portstatus & USB_PORT_STAT_OVERCURRENT) ||
4922 ++ (portchange & USB_PORT_STAT_C_OVERCURRENT))
4923 + set_bit(port1, hub->change_bits);
4924 +
4925 + } else if (portstatus & USB_PORT_STAT_ENABLE) {
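
Note on the hub_activate() change above: per USB 2.0 section 11.12.5 the latched C_PORT_OVER_CURRENT change bit can remain set after the live PORT_OVER_CURRENT status bit has already cleared, so the port is flagged when either bit is set. A minimal sketch of that either-bit test, with hypothetical names and bit values:

/*
 * Illustrative sketch (not from the patch): flag a hub port for service
 * when either the over-current status bit or the latched over-current
 * change bit is set.
 */
#include <stdbool.h>
#include <stdio.h>

#define HUB_PORT_STAT_OVERCURRENT    0x0008u  /* hypothetical status bit */
#define HUB_PORT_STAT_C_OVERCURRENT  0x0008u  /* change word, same slot  */

static bool port_needs_attention(unsigned short portstatus,
                                 unsigned short portchange)
{
	return (portstatus & HUB_PORT_STAT_OVERCURRENT) ||
	       (portchange & HUB_PORT_STAT_C_OVERCURRENT);
}

int main(void)
{
	/* status already cleared, but the change bit still records the event */
	printf("%d\n", port_needs_attention(0x0000, HUB_PORT_STAT_C_OVERCURRENT));
	return 0;
}
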
4926 +diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
4927 +index 4191feb765b1..4800bb22cdd6 100644
4928 +--- a/drivers/usb/gadget/function/f_fs.c
4929 ++++ b/drivers/usb/gadget/function/f_fs.c
4930 +@@ -3037,7 +3037,7 @@ static int ffs_func_setup(struct usb_function *f,
4931 + __ffs_event_add(ffs, FUNCTIONFS_SETUP);
4932 + spin_unlock_irqrestore(&ffs->ev.waitq.lock, flags);
4933 +
4934 +- return USB_GADGET_DELAYED_STATUS;
4935 ++ return creq->wLength == 0 ? USB_GADGET_DELAYED_STATUS : 0;
4936 + }
4937 +
4938 + static void ffs_func_suspend(struct usb_function *f)
4939 +diff --git a/include/net/tcp.h b/include/net/tcp.h
4940 +index a3696b778757..65babd8a682d 100644
4941 +--- a/include/net/tcp.h
4942 ++++ b/include/net/tcp.h
4943 +@@ -376,6 +376,7 @@ ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos,
4944 + struct pipe_inode_info *pipe, size_t len,
4945 + unsigned int flags);
4946 +
4947 ++void tcp_enter_quickack_mode(struct sock *sk);
4948 + static inline void tcp_dec_quickack_mode(struct sock *sk,
4949 + const unsigned int pkts)
4950 + {
4951 +@@ -559,6 +560,7 @@ void tcp_send_fin(struct sock *sk);
4952 + void tcp_send_active_reset(struct sock *sk, gfp_t priority);
4953 + int tcp_send_synack(struct sock *);
4954 + void tcp_push_one(struct sock *, unsigned int mss_now);
4955 ++void __tcp_send_ack(struct sock *sk, u32 rcv_nxt);
4956 + void tcp_send_ack(struct sock *sk);
4957 + void tcp_send_delayed_ack(struct sock *sk);
4958 + void tcp_send_loss_probe(struct sock *sk);
4959 +diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
4960 +index 2017ffa5197a..96c9c0f0905a 100644
4961 +--- a/net/core/rtnetlink.c
4962 ++++ b/net/core/rtnetlink.c
4963 +@@ -2087,9 +2087,12 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm)
4964 + return err;
4965 + }
4966 +
4967 +- dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
4968 +-
4969 +- __dev_notify_flags(dev, old_flags, ~0U);
4970 ++ if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) {
4971 ++ __dev_notify_flags(dev, old_flags, 0U);
4972 ++ } else {
4973 ++ dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
4974 ++ __dev_notify_flags(dev, old_flags, ~0U);
4975 ++ }
4976 + return 0;
4977 + }
4978 + EXPORT_SYMBOL(rtnl_configure_link);
4979 +diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
4980 +index 10286432f684..c11bb6d2d00a 100644
4981 +--- a/net/ipv4/ip_output.c
4982 ++++ b/net/ipv4/ip_output.c
4983 +@@ -480,6 +480,8 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from)
4984 + to->dev = from->dev;
4985 + to->mark = from->mark;
4986 +
4987 ++ skb_copy_hash(to, from);
4988 ++
4989 + /* Copy the flags to each fragment. */
4990 + IPCB(to)->flags = IPCB(from)->flags;
4991 +
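
Note on the ip_copy_metadata() hunk above: it copies the already-computed flow hash to each fragment, so all fragments of one packet are steered consistently instead of being rehashed (or left unhashed) individually. A toy illustration of the idea, with hypothetical types:

/*
 * Illustrative sketch (not from the patch): when a packet is split into
 * fragments, propagate the precomputed flow hash to every fragment so
 * they are all steered to the same queue.
 */
#include <stdint.h>
#include <stdio.h>

struct pkt {
	uint32_t flow_hash;  /* computed once for the full packet */
	size_t   len;
};

static void copy_metadata(struct pkt *to, const struct pkt *from)
{
	to->flow_hash = from->flow_hash;  /* same hash => same queue */
}

int main(void)
{
	struct pkt full = { .flow_hash = 0xdeadbeef, .len = 3000 };
	struct pkt frag = { .len = 1500 };

	copy_metadata(&frag, &full);
	printf("fragment hash: 0x%x\n", (unsigned)frag.flow_hash);
	return 0;
}
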
4992 +diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
4993 +index ce9a7fbb7c5f..88426a6a7a85 100644
4994 +--- a/net/ipv4/ip_sockglue.c
4995 ++++ b/net/ipv4/ip_sockglue.c
4996 +@@ -135,15 +135,18 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb)
4997 + {
4998 + struct sockaddr_in sin;
4999 + const struct iphdr *iph = ip_hdr(skb);
5000 +- __be16 *ports = (__be16 *)skb_transport_header(skb);
5001 ++ __be16 *ports;
5002 ++ int end;
5003 +
5004 +- if (skb_transport_offset(skb) + 4 > skb->len)
5005 ++ end = skb_transport_offset(skb) + 4;
5006 ++ if (end > 0 && !pskb_may_pull(skb, end))
5007 + return;
5008 +
5009 + /* All current transport protocols have the port numbers in the
5010 + * first four bytes of the transport header and this function is
5011 + * written with this assumption in mind.
5012 + */
5013 ++ ports = (__be16 *)skb_transport_header(skb);
5014 +
5015 + sin.sin_family = AF_INET;
5016 + sin.sin_addr.s_addr = iph->daddr;
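
Note on the ip_cmsg_recv_dstaddr() hunk above: the port read is deferred until pskb_may_pull() has confirmed that the four bytes after the transport offset are actually present, rather than only comparing offsets against skb->len. A rough user-space analogue of that validate-then-read pattern, with hypothetical names and buffer layout:

/*
 * Illustrative sketch (not from the patch): only read the 16-bit source
 * and destination ports after checking that the four bytes following
 * the transport offset are really available in the buffer.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int read_ports(const uint8_t *buf, size_t buf_len, size_t transport_off,
                      uint16_t *sport, uint16_t *dport)
{
	/* reject truncated input before touching the port bytes */
	if (buf_len < 4 || transport_off > buf_len - 4)
		return -1;

	memcpy(sport, buf + transport_off, 2);
	memcpy(dport, buf + transport_off + 2, 2);
	return 0;
}

int main(void)
{
	uint8_t pkt[8] = { 0 };
	uint16_t sport, dport;

	if (read_ports(pkt, sizeof(pkt), 4, &sport, &dport) == 0)
		printf("ports present at offset 4\n");
	if (read_ports(pkt, sizeof(pkt), 6, &sport, &dport) != 0)
		printf("truncated packet rejected\n");
	return 0;
}
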
5017 +diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
5018 +index 55d7da1d2ce9..e63b764e55ea 100644
5019 +--- a/net/ipv4/tcp_dctcp.c
5020 ++++ b/net/ipv4/tcp_dctcp.c
5021 +@@ -131,23 +131,14 @@ static void dctcp_ce_state_0_to_1(struct sock *sk)
5022 + struct dctcp *ca = inet_csk_ca(sk);
5023 + struct tcp_sock *tp = tcp_sk(sk);
5024 +
5025 +- /* State has changed from CE=0 to CE=1 and delayed
5026 +- * ACK has not sent yet.
5027 +- */
5028 +- if (!ca->ce_state && ca->delayed_ack_reserved) {
5029 +- u32 tmp_rcv_nxt;
5030 +-
5031 +- /* Save current rcv_nxt. */
5032 +- tmp_rcv_nxt = tp->rcv_nxt;
5033 +-
5034 +- /* Generate previous ack with CE=0. */
5035 +- tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
5036 +- tp->rcv_nxt = ca->prior_rcv_nxt;
5037 +-
5038 +- tcp_send_ack(sk);
5039 +-
5040 +- /* Recover current rcv_nxt. */
5041 +- tp->rcv_nxt = tmp_rcv_nxt;
5042 ++ if (!ca->ce_state) {
5043 ++ /* State has changed from CE=0 to CE=1, force an immediate
5044 ++ * ACK to reflect the new CE state. If an ACK was delayed,
5045 ++ * send that first to reflect the prior CE state.
5046 ++ */
5047 ++ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
5048 ++ __tcp_send_ack(sk, ca->prior_rcv_nxt);
5049 ++ tcp_enter_quickack_mode(sk);
5050 + }
5051 +
5052 + ca->prior_rcv_nxt = tp->rcv_nxt;
5053 +@@ -161,23 +152,14 @@ static void dctcp_ce_state_1_to_0(struct sock *sk)
5054 + struct dctcp *ca = inet_csk_ca(sk);
5055 + struct tcp_sock *tp = tcp_sk(sk);
5056 +
5057 +- /* State has changed from CE=1 to CE=0 and delayed
5058 +- * ACK has not sent yet.
5059 +- */
5060 +- if (ca->ce_state && ca->delayed_ack_reserved) {
5061 +- u32 tmp_rcv_nxt;
5062 +-
5063 +- /* Save current rcv_nxt. */
5064 +- tmp_rcv_nxt = tp->rcv_nxt;
5065 +-
5066 +- /* Generate previous ack with CE=1. */
5067 +- tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
5068 +- tp->rcv_nxt = ca->prior_rcv_nxt;
5069 +-
5070 +- tcp_send_ack(sk);
5071 +-
5072 +- /* Recover current rcv_nxt. */
5073 +- tp->rcv_nxt = tmp_rcv_nxt;
5074 ++ if (ca->ce_state) {
5075 ++ /* State has changed from CE=1 to CE=0, force an immediate
5076 ++ * ACK to reflect the new CE state. If an ACK was delayed,
5077 ++ * send that first to reflect the prior CE state.
5078 ++ */
5079 ++ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
5080 ++ __tcp_send_ack(sk, ca->prior_rcv_nxt);
5081 ++ tcp_enter_quickack_mode(sk);
5082 + }
5083 +
5084 + ca->prior_rcv_nxt = tp->rcv_nxt;
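
Note on the DCTCP hunks above: the old save/restore of rcv_nxt is replaced by a simpler rule. On a CE flip, any pending delayed ACK is flushed first at prior_rcv_nxt (so it still reflects the old CE state), and then quick ACKs are forced so the new state is signalled promptly. A toy model of that ordering, with hypothetical names and the quick-ACK step reduced to sending the new-state ACK immediately:

/*
 * Illustrative sketch (not from the patch): on a CE bit flip, first emit
 * any delayed ACK for data received under the old CE state (up to
 * prior_rcv_nxt), then acknowledge the new state without delay.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_dctcp {
	bool     ce_state;
	uint32_t prior_rcv_nxt;
};

static void send_ack(uint32_t ack_seq, bool ce)
{
	printf("ACK %u (CE=%d)\n", (unsigned)ack_seq, ce);
}

static void ce_event(struct toy_dctcp *ca, bool new_ce,
                     bool ack_pending, uint32_t rcv_nxt)
{
	if (ca->ce_state != new_ce) {
		if (ack_pending)
			send_ack(ca->prior_rcv_nxt, ca->ce_state); /* old state first */
		send_ack(rcv_nxt, new_ce);                         /* then the new one */
		ca->ce_state = new_ce;
	}
	ca->prior_rcv_nxt = rcv_nxt;
}

int main(void)
{
	struct toy_dctcp ca = { .ce_state = false, .prior_rcv_nxt = 1000 };

	ce_event(&ca, true, true, 2000);   /* CE 0->1 with a delayed ACK pending */
	ce_event(&ca, false, false, 3000); /* CE 1->0, nothing pending */
	return 0;
}
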
5085 +diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
5086 +index 4350ee058441..5c645069a09a 100644
5087 +--- a/net/ipv4/tcp_input.c
5088 ++++ b/net/ipv4/tcp_input.c
5089 +@@ -187,13 +187,14 @@ static void tcp_incr_quickack(struct sock *sk)
5090 + icsk->icsk_ack.quick = min(quickacks, TCP_MAX_QUICKACKS);
5091 + }
5092 +
5093 +-static void tcp_enter_quickack_mode(struct sock *sk)
5094 ++void tcp_enter_quickack_mode(struct sock *sk)
5095 + {
5096 + struct inet_connection_sock *icsk = inet_csk(sk);
5097 + tcp_incr_quickack(sk);
5098 + icsk->icsk_ack.pingpong = 0;
5099 + icsk->icsk_ack.ato = TCP_ATO_MIN;
5100 + }
5101 ++EXPORT_SYMBOL(tcp_enter_quickack_mode);
5102 +
5103 + /* Send ACKs quickly, if "quick" count is not exhausted
5104 + * and the session is not interactive.
5105 +@@ -4788,6 +4789,7 @@ restart:
5106 + static void tcp_collapse_ofo_queue(struct sock *sk)
5107 + {
5108 + struct tcp_sock *tp = tcp_sk(sk);
5109 ++ u32 range_truesize, sum_tiny = 0;
5110 + struct sk_buff *skb = skb_peek(&tp->out_of_order_queue);
5111 + struct sk_buff *head;
5112 + u32 start, end;
5113 +@@ -4797,6 +4799,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
5114 +
5115 + start = TCP_SKB_CB(skb)->seq;
5116 + end = TCP_SKB_CB(skb)->end_seq;
5117 ++ range_truesize = skb->truesize;
5118 + head = skb;
5119 +
5120 + for (;;) {
5121 +@@ -4811,14 +4814,24 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
5122 + if (!skb ||
5123 + after(TCP_SKB_CB(skb)->seq, end) ||
5124 + before(TCP_SKB_CB(skb)->end_seq, start)) {
5125 +- tcp_collapse(sk, &tp->out_of_order_queue,
5126 +- head, skb, start, end);
5127 ++ /* Do not attempt collapsing tiny skbs */
5128 ++ if (range_truesize != head->truesize ||
5129 ++ end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM)) {
5130 ++ tcp_collapse(sk, &tp->out_of_order_queue,
5131 ++ head, skb, start, end);
5132 ++ } else {
5133 ++ sum_tiny += range_truesize;
5134 ++ if (sum_tiny > sk->sk_rcvbuf >> 3)
5135 ++ return;
5136 ++ }
5137 ++
5138 + head = skb;
5139 + if (!skb)
5140 + break;
5141 + /* Start new segment */
5142 + start = TCP_SKB_CB(skb)->seq;
5143 + end = TCP_SKB_CB(skb)->end_seq;
5144 ++ range_truesize = skb->truesize;
5145 + } else {
5146 + if (before(TCP_SKB_CB(skb)->seq, start))
5147 + start = TCP_SKB_CB(skb)->seq;
5148 +@@ -4874,6 +4887,9 @@ static int tcp_prune_queue(struct sock *sk)
5149 + else if (tcp_under_memory_pressure(sk))
5150 + tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);
5151 +
5152 ++ if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
5153 ++ return 0;
5154 ++
5155 + tcp_collapse_ofo_queue(sk);
5156 + if (!skb_queue_empty(&sk->sk_receive_queue))
5157 + tcp_collapse(sk, &sk->sk_receive_queue,
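
Note on the tcp_collapse_ofo_queue() change above: ranges that consist of a single small skb (range_truesize equal to the head's truesize and shorter than one memory quantum) are skipped, and collapsing stops once the skipped ranges account for more than an eighth of the receive buffer; tcp_prune_queue() additionally returns early when memory is already back under sk_rcvbuf. A condensed sketch of the skip/give-up heuristic, with hypothetical structures and a stand-in quantum size:

/*
 * Illustrative sketch (not from the patch): decide whether an
 * out-of-order range is worth collapsing. Tiny single-skb ranges are
 * skipped, and their accumulated true size is capped at an eighth of
 * the receive buffer before giving up entirely.
 */
#include <stdbool.h>
#include <stdio.h>

#define MEM_QUANTUM 4096u  /* stand-in for SKB_WITH_OVERHEAD(SK_MEM_QUANTUM) */

struct range {
	unsigned int truesize;      /* memory charged for the whole range */
	unsigned int head_truesize; /* memory charged for its first skb   */
	unsigned int bytes;         /* end - start in sequence space      */
};

/* Returns true to keep scanning, false to stop pruning altogether. */
static bool maybe_collapse(const struct range *r, unsigned int rcvbuf,
                           unsigned int *sum_tiny)
{
	if (r->truesize != r->head_truesize || r->bytes >= MEM_QUANTUM) {
		printf("collapse range of %u bytes\n", r->bytes);
		return true;
	}
	*sum_tiny += r->truesize;       /* tiny range, skip it */
	return *sum_tiny <= rcvbuf / 8; /* too many skips: give up */
}

int main(void)
{
	unsigned int sum_tiny = 0;
	struct range big  = { .truesize = 16384, .head_truesize = 2048, .bytes = 8192 };
	struct range tiny = { .truesize = 512,   .head_truesize = 512,  .bytes = 16 };

	maybe_collapse(&big, 65536, &sum_tiny);
	maybe_collapse(&tiny, 65536, &sum_tiny);
	return 0;
}
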
5158 +diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
5159 +index 2854db094864..6fa749ce231f 100644
5160 +--- a/net/ipv4/tcp_output.c
5161 ++++ b/net/ipv4/tcp_output.c
5162 +@@ -177,8 +177,13 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
5163 + }
5164 +
5165 + /* Account for an ACK we sent. */
5166 +-static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts)
5167 ++static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts,
5168 ++ u32 rcv_nxt)
5169 + {
5170 ++ struct tcp_sock *tp = tcp_sk(sk);
5171 ++
5172 ++ if (unlikely(rcv_nxt != tp->rcv_nxt))
5173 ++ return; /* Special ACK sent by DCTCP to reflect ECN */
5174 + tcp_dec_quickack_mode(sk, pkts);
5175 + inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
5176 + }
5177 +@@ -901,8 +906,8 @@ out:
5178 + * We are working here with either a clone of the original
5179 + * SKB, or a fresh unique copy made by the retransmit engine.
5180 + */
5181 +-static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
5182 +- gfp_t gfp_mask)
5183 ++static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
5184 ++ int clone_it, gfp_t gfp_mask, u32 rcv_nxt)
5185 + {
5186 + const struct inet_connection_sock *icsk = inet_csk(sk);
5187 + struct inet_sock *inet;
5188 +@@ -962,7 +967,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
5189 + th->source = inet->inet_sport;
5190 + th->dest = inet->inet_dport;
5191 + th->seq = htonl(tcb->seq);
5192 +- th->ack_seq = htonl(tp->rcv_nxt);
5193 ++ th->ack_seq = htonl(rcv_nxt);
5194 + *(((__be16 *)th) + 6) = htons(((tcp_header_size >> 2) << 12) |
5195 + tcb->tcp_flags);
5196 +
5197 +@@ -1005,7 +1010,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
5198 + icsk->icsk_af_ops->send_check(sk, skb);
5199 +
5200 + if (likely(tcb->tcp_flags & TCPHDR_ACK))
5201 +- tcp_event_ack_sent(sk, tcp_skb_pcount(skb));
5202 ++ tcp_event_ack_sent(sk, tcp_skb_pcount(skb), rcv_nxt);
5203 +
5204 + if (skb->len != tcp_header_size)
5205 + tcp_event_data_sent(tp, sk);
5206 +@@ -1036,6 +1041,13 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
5207 + return net_xmit_eval(err);
5208 + }
5209 +
5210 ++static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
5211 ++ gfp_t gfp_mask)
5212 ++{
5213 ++ return __tcp_transmit_skb(sk, skb, clone_it, gfp_mask,
5214 ++ tcp_sk(sk)->rcv_nxt);
5215 ++}
5216 ++
5217 + /* This routine just queues the buffer for sending.
5218 + *
5219 + * NOTE: probe0 timer is not checked, do not forget tcp_push_pending_frames,
5220 +@@ -3354,7 +3366,7 @@ void tcp_send_delayed_ack(struct sock *sk)
5221 + }
5222 +
5223 + /* This routine sends an ack and also updates the window. */
5224 +-void tcp_send_ack(struct sock *sk)
5225 ++void __tcp_send_ack(struct sock *sk, u32 rcv_nxt)
5226 + {
5227 + struct sk_buff *buff;
5228 +
5229 +@@ -3391,9 +3403,14 @@ void tcp_send_ack(struct sock *sk)
5230 +
5231 + /* Send it off, this clears delayed acks for us. */
5232 + skb_mstamp_get(&buff->skb_mstamp);
5233 +- tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC));
5234 ++ __tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC), rcv_nxt);
5235 ++}
5236 ++EXPORT_SYMBOL_GPL(__tcp_send_ack);
5237 ++
5238 ++void tcp_send_ack(struct sock *sk)
5239 ++{
5240 ++ __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt);
5241 + }
5242 +-EXPORT_SYMBOL_GPL(tcp_send_ack);
5243 +
5244 + /* This routine sends a packet with an out of date sequence
5245 + * number. It assumes the other end will try to ack it.
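
Note on the tcp_output.c changes above: an explicit rcv_nxt is threaded through __tcp_transmit_skb()/__tcp_send_ack() so DCTCP can acknowledge old data, while tcp_event_ack_sent() ignores such special ACKs so they do not clear the delayed-ACK machinery; tcp_send_ack() remains a thin wrapper passing the current rcv_nxt. A small stand-alone sketch of that wrapper-plus-explicit-parameter pattern, with all names hypothetical:

/*
 * Illustrative sketch (not from the patch): the low-level send routine
 * takes an explicit ack sequence, the old entry point defaults to the
 * current rcv_nxt, and bookkeeping is skipped for ACKs of old data.
 */
#include <stdint.h>
#include <stdio.h>

struct toy_sock {
	uint32_t rcv_nxt;     /* next sequence we expect to receive */
	int      delayed_ack; /* pending delayed-ACK state          */
};

static void ack_sent(struct toy_sock *sk, uint32_t acked)
{
	if (acked != sk->rcv_nxt)
		return;          /* special ACK for old data: keep the timer */
	sk->delayed_ack = 0;     /* normal ACK clears the delayed-ACK state  */
}

static void send_ack_seq(struct toy_sock *sk, uint32_t ack_seq)
{
	printf("ACK %u\n", (unsigned)ack_seq);
	ack_sent(sk, ack_seq);
}

static void send_ack(struct toy_sock *sk)
{
	send_ack_seq(sk, sk->rcv_nxt);   /* default: ack everything seen */
}

int main(void)
{
	struct toy_sock sk = { .rcv_nxt = 2000, .delayed_ack = 1 };

	send_ack_seq(&sk, 1000);         /* ack old data, timer stays    */
	printf("delayed_ack=%d\n", sk.delayed_ack);
	send_ack(&sk);                   /* normal ack clears it         */
	printf("delayed_ack=%d\n", sk.delayed_ack);
	return 0;
}
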
5246 +diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
5247 +index cae37bfd12ab..9f6e57ded338 100644
5248 +--- a/net/ipv6/datagram.c
5249 ++++ b/net/ipv6/datagram.c
5250 +@@ -657,13 +657,16 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg,
5251 + }
5252 + if (np->rxopt.bits.rxorigdstaddr) {
5253 + struct sockaddr_in6 sin6;
5254 +- __be16 *ports = (__be16 *) skb_transport_header(skb);
5255 ++ __be16 *ports;
5256 ++ int end;
5257 +
5258 +- if (skb_transport_offset(skb) + 4 <= skb->len) {
5259 ++ end = skb_transport_offset(skb) + 4;
5260 ++ if (end <= 0 || pskb_may_pull(skb, end)) {
5261 + /* All current transport protocols have the port numbers in the
5262 + * first four bytes of the transport header and this function is
5263 + * written with this assumption in mind.
5264 + */
5265 ++ ports = (__be16 *)skb_transport_header(skb);
5266 +
5267 + sin6.sin6_family = AF_INET6;
5268 + sin6.sin6_addr = ipv6_hdr(skb)->daddr;
5269 +diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
5270 +index 74786783834b..0feede45bd28 100644
5271 +--- a/net/ipv6/ip6_output.c
5272 ++++ b/net/ipv6/ip6_output.c
5273 +@@ -559,6 +559,8 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from)
5274 + to->dev = from->dev;
5275 + to->mark = from->mark;
5276 +
5277 ++ skb_copy_hash(to, from);
5278 ++
5279 + #ifdef CONFIG_NET_SCHED
5280 + to->tc_index = from->tc_index;
5281 + #endif