Architectural Support for System Security

Hardware Features, Usage and Scenarios

Even performance counters (hardware performance monitors) can be used for security.

Security: Why Hardware?

Security is a negative goal

  • how to make a program not do something?
  • e.g., never execute code supplied by the user, never leak a secret from memory, etc.

Hardware features based security:

  • fixed and robust (hopefully)
  • more efficient (most of the time): checks run in parallel with normal execution, reducing CPU overhead

Features designed for Security

SMEP & SMAP

Return-to-user Attack

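The attack exploits the fact that user-space processes cannot access kernel space while the kernel can access user space: the attacker redirects kernel control flow or data flow to user-space code or data, so that user-space code runs with ring-0 privilege and escalates privileges.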

SMEP

Supervisor Mode Execution Prevention

  • allows pages to be protected from supervisor-mode instruction fetches
  • if SMEP = 1, OS cannot fetch instructions from applications

In other words, user pages are protected from instruction fetches made while the CPU is in supervisor mode.

Prevent Return-to-user Attack: the CPU will prevent the OS from executing user-level instructions

SMAP

Supervisor Mode Access Prevention

  • allows pages to be protected from supervisor-mode data accesses
  • if SMAP = 1, OS cannot access data at linear addresses of application
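The SMEP/SMAP rules can be captured in a small decision function. This is a simplified sketch rather than the full rule set in Intel's manual: it models only the U/S bit of the page, the current privilege level, the two enable bits, and the EFLAGS.AC override that the kernel sets with `stac` around deliberate user-memory copies. `Page` and `access_allowed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Page:
    user: bool  # U/S bit of the PTE: True = user-accessible page


def access_allowed(page: Page, cpl: int, kind: str,
                   smep: bool, smap: bool, ac: bool = False) -> bool:
    """Simplified SMEP/SMAP check; kind is 'fetch', 'read' or 'write'."""
    supervisor = cpl < 3
    if not supervisor:
        # User mode may only touch user pages.
        return page.user
    if page.user:
        if kind == "fetch" and smep:
            return False  # SMEP: kernel may not execute user code
        if kind in ("read", "write") and smap and not ac:
            return False  # SMAP: kernel data access to a user page with AC clear
    return True
```

A kernel routine that must read user memory brackets the access with `stac`/`clac`, which is why the `ac=True` case is allowed even with SMAP enabled.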

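In early designs the kernel and user space shared one page table, so the kernel could freely touch user memory; SMAP prevents the kernel from accessing user-space memory.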

ret2dir Attacks

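Return-to-direct-mapped memory attack: in short, the kernel has a region (physmap) that directly maps part or all of physical memory. Because user-space pages are also mapped into physmap, the kernel can reach them through physmap addresses, which lets an attacker access user data from the kernel address space and thereby sidestep SMEP/SMAP.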

physmap occupies 0xffff888000000000 - 0xffffc87fffffffff, a 64 TB region; physical memory is mapped linearly starting at an address inside this region.

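The kernel has two main memory allocation interfaces, kmalloc and vmalloc:

  • vmalloc allocates memory in multiples of the page size; the virtual addresses are contiguous, but the physical pages need not be
  • kmalloc allocates at byte granularity; the memory is both virtually and physically contiguous, and it is allocated out of physmap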

Since physmap is a direct (linear) mapping of RAM, the physmap base can be derived from the address of a kmalloc'd object.
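Because physmap is linear, translating between a physical address and its physmap alias is a single addition. The sketch assumes the non-KASLR base from the range above; with KASLR the base is randomized, which is why leaking one kmalloc'd address matters. Both function names are hypothetical.

```python
PAGE_OFFSET = 0xFFFF888000000000  # physmap base with KASLR disabled (assumption)
PHYSMAP_SIZE = 64 << 40           # 64 TB direct map


def physmap_alias(phys_addr: int, page_offset: int = PAGE_OFFSET) -> int:
    """Kernel virtual address at which physmap maps a given physical address."""
    if not 0 <= phys_addr < PHYSMAP_SIZE:
        raise ValueError("physical address outside the direct map")
    return page_offset + phys_addr


def physmap_base_from_leak(leaked_kaddr: int, known_phys: int) -> int:
    """Recover the physmap base from one kmalloc'd address whose physical
    address is known: base = virtual address - physical address."""
    return leaked_kaddr - known_phys
```

Once the base is known, any user page whose physical address the attacker can learn has a kernel-space alias at `physmap_alias(phys)`.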

ARM's Similar Functionalities

  • PAN: Privileged Access Never
  • PXN: Privileged eXecute Never
  • UAO: User Access Override

Using SMAP for Intra-process Isolation

  • Scenario: information hiding
  • Observation: SMAP prevents the kernel from accessing user memory
  • Idea: use SMAP to hide data from the rest of the process
  • Solution: put critical part in ring-3 and rest of the process in ring-0
  • Challenge: how to securely run user code in ring-0?

MPX & MPK

Bounds Error of Software: C/C++ programs are prone to bounds errors.

  • not type-safe language
  • buffer overflow bugs

MPX

Memory Protection Extensions

Intel introduced MPX with Skylake (it has since been deprecated, and Linux and GCC have dropped support)

Programmer can create and enforce bounds

  • each bound is a pair of 64-bit addresses marking the beginning and the end of a range
  • New instructions are introduced to efficiently compare a given value against the bounds, raising an exception when the value does not fall within the permitted range

Instructions:

  • bndmov: Fetch the bounds information (upper and lower) out of memory and put it in a bounds register (MPX has dedicated registers for holding bounds)
  • bndcl: Check the lower bounds against an argument(%rax)
  • bndcu: Check the upper bounds against an argument (%rax)
  • bnd retq: Not a "true" Intel MPX instruction
    • The bnd here is a prefix to a normal retq instruction
    • It just lets the processor know that this is Intel MPX-instrumented code
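The check instructions boil down to two comparisons. The sketch below imitates them in Python; `bndmk` mirrors MPX's BNDMK instruction (which creates bounds and is not listed above), and `BoundRangeError` stands in for the #BR exception. None of these names are a real API.

```python
from dataclasses import dataclass


class BoundRangeError(Exception):
    """Stands in for the #BR exception raised by a failed MPX bound check."""


@dataclass
class Bounds:
    lower: int
    upper: int  # inclusive, as produced by BNDMK


def bndmk(base: int, size: int) -> Bounds:
    # BNDMK creates the bounds [base, base + size - 1].
    return Bounds(base, base + size - 1)


def bndcl(b: Bounds, addr: int) -> None:
    if addr < b.lower:
        raise BoundRangeError(f"{addr:#x} below lower bound {b.lower:#x}")


def bndcu(b: Bounds, addr: int) -> None:
    if addr > b.upper:
        raise BoundRangeError(f"{addr:#x} above upper bound {b.upper:#x}")


def checked_store(memory: bytearray, b: Bounds, addr: int, value: int) -> None:
    # Instrumented code checks the first and last byte touched by the access;
    # for a 1-byte store both are `addr`.
    bndcl(b, addr)
    bndcu(b, addr)
    memory[addr] = value
```

A compiler using MPX would emit the bndcl/bndcu pair before each checked access, much like `checked_store` does here.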

Bounds Tables

For efficiency, four bounds can be stored in dedicated registers

  • Registers: bnd0 to bnd3
  • When more bounds are required, they are stored in memory, and the bound registers serve as a caching mechanism
  • Bounds tables are a two-level radix tree, indexed by the virtual address of the pointer for which you want to load/store the bounds
  • The BNDLDX/BNDSTX instructions essentially take a pointer value and move the bounds information between a bounds register & bounds tables
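The two-level lookup behind BNDLDX/BNDSTX can be sketched as index arithmetic. The layout is an assumption based on Intel's documented 64-bit-mode format as we understand it: a 28-bit bound-directory index from bits 47:20 of the address where the pointer is stored, a 17-bit bound-table index from bits 19:3, 8-byte directory entries and 32-byte table entries.

```python
BD_ENTRY_SIZE = 8   # bytes per bound-directory entry (64-bit mode)
BT_ENTRY_SIZE = 32  # lower bound, upper bound, pointer value, reserved

BD_SIZE = (1 << 28) * BD_ENTRY_SIZE  # whole directory
BT_SIZE = (1 << 17) * BT_ENTRY_SIZE  # one bound table


def bd_index(ptr_location: int) -> int:
    # Bits 47:20 of the pointer's storage address select the directory entry.
    return (ptr_location >> 20) & ((1 << 28) - 1)


def bt_index(ptr_location: int) -> int:
    # Bits 19:3 select the entry inside the bound table.
    return (ptr_location >> 3) & ((1 << 17) - 1)
```

Under these assumptions every 8-byte pointer slot gets a 32-byte table entry, so pointer-dense memory can need four times its own size in bound tables, one reason for the large memory overhead.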

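In the worst case the memory overhead reaches 500%, which is very expensive

Bounds-checking many pointers at the same time degrades performance

It is enabled by setting compiler flags at build time (e.g., GCC's -mmpx -fcheck-pointer-bounds)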

MPK

Memory Protection Keys

  • with MPK, every page belongs to one of 16 domains, a domain is determined by 4 bits in every page-table entry(referred to as the protection key)
  • for every domain, there are two bits in a special register (PKRU) that denote whether pages associated with that key can be read or written
  • kernel and application
    • only the kernel can change the key of a page
    • Application can read and write the pkru register using the rdpkru and wrpkru instructions respectively

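All of memory is divided into 16 domains whose IDs are written into the page-table entries, and PKRU controls the read/write permissions of each domain

The original goal is fine-grained memory permission management inside a single process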

  • Isolation can be enabled using MPK by placing the sensitive data in pages that have a particular protection key, forming the sensitive domain.
  • An appropriate instrumentation enables reads and/or writes to the data by clearing the access-disable and write-disable bits, respectively, using wrpkru
    • As long as these bits are unset, the sensitive domain is accessible
    • By setting the bits back, the sensitive domain is disabled, making only the non-sensitive domain available
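The PKRU layout is simple enough to model directly: for protection key i, bit 2i is Access-Disable (AD) and bit 2i+1 is Write-Disable (WD). The helpers below compute the values a program would write with `wrpkru`; they model only the permission logic, not the instruction itself, and the helper names are hypothetical.

```python
def ad_bit(key: int) -> int:
    return 1 << (2 * key)  # Access-Disable bit for this key


def wd_bit(key: int) -> int:
    return 1 << (2 * key + 1)  # Write-Disable bit for this key


def pkru_allows(pkru: int, key: int, write: bool) -> bool:
    """Would a user-mode data access to a page tagged `key` be allowed?"""
    if pkru & ad_bit(key):
        return False
    if write and pkru & wd_bit(key):
        return False
    return True


def with_access(pkru: int, key: int, write: bool) -> int:
    """PKRU value that opens domain `key` (clears AD, and WD if writing)."""
    pkru &= ~ad_bit(key)
    if write:
        pkru &= ~wd_bit(key)
    return pkru


def without_access(pkru: int, key: int) -> int:
    """PKRU value that closes domain `key` again (sets AD and WD)."""
    return pkru | ad_bit(key) | wd_bit(key)
```

On Linux with glibc, real programs would use `pkey_alloc`, `pkey_mprotect` and `pkey_set` rather than raw `wrpkru`; the toy only shows why opening and closing a domain is a register write instead of a page-table update.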

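Software already has a similar mechanism, mprotect: applications can already change the permissions of pages. MPK's advantage is cost. mprotect is a system call; changing permissions means modifying page tables and flushing the TLB, other cores must be interrupted to flush their TLBs as well, and the next memory access misses in the TLB. With MPK only a few instructions are needed, so the overhead is much smaller.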

应用场景:

  • use case 1: protect critical data with one address space
    • Handling of sensitive cryptographic data
    • Only enable access to private key during encryption
  • use case 2: prevent data corruption
    • In-memory database prevents writes most of the time
    • Only enable changing data when needs to change
    • Changing protection on gigabytes using mprotect() is too slow

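The goal is to protect critical data so that only specific code can access it, or to keep specific data from being corrupted: in an in-memory database most data lives in memory rather than on disk, and if every part of the program can write it, bugs easily corrupt it. MPK has also been applied to microkernels, whose performance suffers because calls between user-mode components are expensive.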

ARM Pointer Authentication

How can we ensure that a pointer has not been modified?

ARM64 only uses 40 of the 64 pointer bits

  • On an ARM64 Linux system using three-level page tables, only the bottom 40 bits are used, while the remaining 24 are equal to the highest significant bit
  • the 40-bit address is sign-extended to 64 bits
  • those uppermost bits could be put to other uses, including holding an authentication code

use the 24 bits for security!

Idea: combine the pointer with a modifier (tag) and a secret key to compute a cryptographic code, and store it in the upper 24 bits.
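Signing and authenticating can be simulated to show the data flow of PAC, AUT and PAC stripping. This is only a sketch: real hardware uses a dedicated block cipher (QARMA or an implementation-defined algorithm), and the PAC's exact bit positions depend on the VA size and on top-byte-ignore. Here a truncated HMAC-SHA256 stands in for the cipher, pointers are assumed to be user-space (upper bits zero), and a failed AUT sets bit 62 so the pointer becomes non-canonical, imitating how a corrupted pointer faults on its next use. All function names are hypothetical.

```python
import hashlib
import hmac

VA_BITS = 40               # address bits in use (per the text above)
PAC_BITS = 64 - VA_BITS    # the 24 spare bits hold the PAC
VA_MASK = (1 << VA_BITS) - 1


def compute_pac(ptr: int, modifier: int, key: bytes) -> int:
    # Stand-in for the hardware cipher: HMAC-SHA256 truncated to PAC_BITS.
    msg = (ptr & VA_MASK).to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << PAC_BITS) - 1)


def pac(ptr: int, modifier: int, key: bytes) -> int:
    """Like PACIA: place the PAC in the upper bits of a user-space pointer."""
    return (compute_pac(ptr, modifier, key) << VA_BITS) | (ptr & VA_MASK)


def aut(signed: int, modifier: int, key: bytes) -> int:
    """Like AUTIA: return the plain pointer, or a deliberately corrupted one."""
    ptr = signed & VA_MASK
    if signed >> VA_BITS == compute_pac(ptr, modifier, key):
        return ptr
    return ptr | (1 << 62)  # non-canonical: a later branch/load through it faults


def xpac(signed: int) -> int:
    """Like XPACI: strip the PAC without checking it."""
    return signed & VA_MASK
```

When signing return addresses the modifier is typically the stack pointer, so a signed return address replayed in a different stack frame fails authentication.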

Key Management

PA defines five keys: four keys for the PAC* and AUT* instructions (the combinations of instruction/data and A/B keys), and one key for use with the general-purpose PACGA instruction

Key storage:

  • Stored in internal registers and are not accessible by EL0(user mode)
  • The software (EL1, EL2 and EL3) is required to switch keys between exception levels
  • Higher privilege levels control the keys for the lower privilege level

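In short: pointers are signed, the code is stored in the upper 24 bits, and one extra instruction at each end of a function protects the stack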

New instructions

PAC value creation:

  • Write the PAC value to the uppermost bits of a destination register alongside an address pointer value

Authentication:

  • Validate a PAC and update the destination register with a correct or corrupt address pointer
  • if the authentication fails, an indirect branch or load that uses the authenticated, and corrupt, address will cause an exception

PAC stripping:

  • Remove a PAC value from the specified register

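The software approach to stack protection places a random value (a canary) next to the saved return address and checks before returning whether it has been tampered with. The hardware approach only needs one PAC instruction at function entry and one AUT instruction before the return (for example PACIASP and AUTIASP), which performs better.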

Target: Memory Safety

Memory safety violation dominates:

  • Microsoft, Google,etc

software solutions:

  • ASan: AddressSanitizer
  • HWASan: hardware-assisted AddressSanitizer
  • Cons: costly

Hardware solution: tagged memory

ARM MTE

Memory Tagging Extension

Memory safety errors are either spatial (out-of-bounds accesses) or temporal (accessing memory through a pointer that has already been freed)

A new memory type: Normal Tagged Memory

loads and stores to this new memory type perform an access where the tag present in the top byte of the address register is compared with the tag stored in memory

A mismatch between the tag in the address and the tag memory can be configured to cause a synchronous exception or to be asynchronously reported

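Every 16 bytes of memory (a granule) carries a 4-bit tag, and each pointer carries a tag too. Adjacent allocations must get different tags. malloc/free have to update the tags, which makes malloc more expensive because all tags of the allocation must be initialized (although this can be done asynchronously).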
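The tag-check rules can be illustrated with a toy bump allocator. Assumptions: 4-bit tags per 16-byte granule, the pointer's tag in bits 59:56, tags chosen deterministically instead of randomly (real allocators use the IRG instruction), and a mismatch raises an exception as in synchronous mode. `TaggedHeap` is a hypothetical name.

```python
GRANULE = 16     # bytes covered by one allocation tag
TAG_SHIFT = 56   # 4-bit logical tag in bits 59:56 of the pointer
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TagCheckFault(Exception):
    pass


class TaggedHeap:
    """Toy bump allocator over tagged memory: one 4-bit tag per granule."""

    def __init__(self, size: int):
        self.tags = [0] * (size // GRANULE)  # the "tag memory"
        self.top = 0                          # next free address
        self.sizes = {}                       # address -> number of granules

    def _pick_tag(self, first: int, last: int, avoid: int = 0) -> int:
        # Pick a non-zero tag that differs from both neighbours and `avoid`.
        banned = {0, avoid}
        for i in (first - 1, last + 1):
            if 0 <= i < len(self.tags):
                banned.add(self.tags[i])
        return next(t for t in range(16) if t not in banned)

    def _set_tags(self, addr: int, n: int, tag: int) -> None:
        first = addr // GRANULE
        for i in range(first, first + n):
            self.tags[i] = tag

    def malloc(self, size: int) -> int:
        n = -(-size // GRANULE)  # round up to whole granules
        addr = self.top
        first = addr // GRANULE
        tag = self._pick_tag(first, first + n - 1)
        self._set_tags(addr, n, tag)
        self.top += n * GRANULE
        self.sizes[addr] = n
        return (tag << TAG_SHIFT) | addr

    def load(self, ptr: int) -> int:
        addr = ptr & ADDR_MASK
        if self.tags[addr // GRANULE] != ptr >> TAG_SHIFT:
            raise TagCheckFault(f"tag mismatch at {addr:#x}")
        return addr  # a real load would return the data

    def free(self, ptr: int) -> None:
        self.load(ptr)  # freeing through a stale pointer also faults
        addr = ptr & ADDR_MASK
        n = self.sizes.pop(addr)
        first = addr // GRANULE
        # Retag so dangling pointers carrying the old tag no longer match.
        new_tag = self._pick_tag(first, first + n - 1, avoid=ptr >> TAG_SHIFT)
        self._set_tags(addr, n, new_tag)
```

With random 4-bit tags a stray pointer carries the right tag with probability 1/16, so MTE detects such bugs probabilistically; the neighbour rule makes overflows into an adjacent allocation deterministic.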

Combining MTE and PA

MTE and PA both use the spare upper bits of a pointer:

  • a tag for memory tagging
  • a PAC for pointer authentication

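The two can be used at the same time. The PAC size is variable and depends on the size of the virtual address space; when both are enabled, PA becomes somewhat weaker, because the top byte holds the memory tag and the PAC gets fewer bits.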

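What else could these spare bits be used for? PUMP attaches an equal-length tag to every memory word, and the tag of each memory location can itself be a pointer.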

Intel CET

Control-flow Enforcement Technology

Two major techs:

  • Shadow stack
  • Indirect branch tracking

Control-flow hijacking attacks work by changing a program's control flow, in two main ways:

Code injection attacks

The attacker injects a piece of malicious code into memory, overwrites the return address, and makes execution jump to the malicious code

  • inject malicious code in buffer
  • Overwrite return address to buffer
  • Once return, the malicious code runs

Solutions:

  • StackGuard, FormatGuard
  • make data section non-executable

New Attacks: Code-reuse Attack

  • return-to-libc & return-oriented programming

Code Reuse Attack

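Instead of injecting new code, the attacker jumps into code that already exists: they find several code fragments (gadgets) and chain them together by placing a series of addresses where the return address is stored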

Return-oriented Programming

  • Find code gadgets in the existing code base
  • push addresses of gadgets on the stack
  • leverage the 'ret' at the end of each gadget to connect the gadgets
  • No code injection

Solutions:

  • return-less kernels
  • Heuristic means

New: Jump-oriented attacks

  • Use a gadget as a dispatcher that chains gadgets ending in indirect jumps instead of ret

CFI

Control-Flow Integrity

General Solution to enforce CFI

  • Some need binary re-writing or source re-compiling
  • Some need application/OS/Hardware re-designing
  • Some have large overhead

Challenges:

  • Non-intrusive general attack detection
  • Apply to existing applications on commodity hardware

Shadow Stack

A shadow stack is a second stack for the program

  • Used exclusively for control transfer operations
  • Is separate from the data stack
  • Can be enabled for operation individually in user mode or supervisor mode

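The program gets a shadow stack that records only the call trace, kept apart from data, so a stack overflow can no longer overwrite the return addresses that RET actually uses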

Shadow Stack Mode

CALL instruction

  • Pushes the return address on both the data and shadow stack

RET instruction

  • Pops the return address from both stacks and compare them
  • If the return addresses from two stacks do not match, the processor signals a control protection exception

Note that the shadow stack only holds the return addresses and not parameters passed to the call instruction
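The CALL/RET behavior in shadow-stack mode can be modeled with two Python lists. This is a toy: it ignores SSP management and far transfers, and simply assumes that ordinary stores cannot touch the shadow stack (the page-level protection that enforces this is described in the next part). The class names are hypothetical.

```python
class ControlProtectionError(Exception):
    """Stands in for the #CP exception raised on a return-address mismatch."""


class CetMachine:
    def __init__(self):
        self.data_stack = []    # return addresses mixed with locals and buffers
        self.shadow_stack = []  # return addresses only

    def call(self, return_addr: int) -> None:
        # CALL pushes the return address on both stacks.
        self.data_stack.append(return_addr)
        self.shadow_stack.append(return_addr)

    def ret(self) -> int:
        # RET pops both copies and compares them.
        ra = self.data_stack.pop()
        shadow_ra = self.shadow_stack.pop()
        if ra != shadow_ra:
            raise ControlProtectionError(f"ret to {ra:#x}, expected {shadow_ra:#x}")
        return ra
```

Overwriting `data_stack[-1]`, as a stack buffer overflow would, changes only one of the two copies, so the next `ret` raises instead of jumping to the attacker's address.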

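Maintaining two stacks in software is expensive. The shadow stack can be maintained in user mode or in kernel mode. In user mode, every call and return must also record to a separate location; in kernel mode, the shadow stack lives in kernel memory, which is safer, but every call and return requires a system call. Hence the idea of doing it in hardware.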

Protecting the Shadow Stack

The shadow stack is protected by page table

  • Page tables support a new attribute: mark page as "Shadow Stack". Such pages still belong to user mode but cannot be written by ordinary instructions

Control transfers are allowed to store return addresses to the shadow stack

  • Like near call, far call, call to interrupt/exception handlers, etc.
  • However stores from instructions like MOV, XSAVE, etc. will not be allowed

When control transfer instructions attempt to read from the shadow stack

  • Access will fault if the underlying page is not marked as a "Shadow Stack" page

Detects and prevents conditions that cause an overflow or underflow of the shadow stack or any malicious attempts to redirect the processor to consume data from addresses that are not shadow stack addresses

Indirect Branch Tracking

New instruction: ENDBRANCH (ENDBR64/ENDBR32), checked on indirect jumps and calls

  • mark valid indirect call/jmp targets in the program: the target of an indirect jmp must be an ENDBRANCH
  • Becomes a NOP on legacy processors that do not support it, preserving compatibility
  • On processors that support CET the ENDBRANCH is still a NOP and is primarily used as a marker by the processor pipeline to detect control flow violations

WAIT_FOR_ENDBRANCH State

The CPU implements a state machine that tracks indirect jmp and call

  • When one of these instructions is seen, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state

  • In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH

  • If an ENDBRANCH is not seen the processor causes a control protection fault, else the state machine moves back to IDLE state
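The state machine can be simulated over a trace of instruction names. The mnemonic strings such as `"jmp_indirect"` are hypothetical labels, and the sketch ignores details such as the NOTRACK prefix.

```python
class ControlProtectionError(Exception):
    pass


IDLE = "IDLE"
WAIT_FOR_ENDBRANCH = "WAIT_FOR_ENDBRANCH"
INDIRECT = {"jmp_indirect", "call_indirect"}


def run(trace: list) -> str:
    """Feed a sequence of instruction names through the IBT state machine."""
    state = IDLE
    for insn in trace:
        if state == WAIT_FOR_ENDBRANCH:
            if insn != "endbr64":
                raise ControlProtectionError(f"indirect branch landed on {insn}")
            state = IDLE  # valid landing pad
        elif insn in INDIRECT:
            state = WAIT_FOR_ENDBRANCH
    return state
```

An `endbr64` reached by straight-line execution is just a NOP, which is why the last test case below stays in IDLE.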

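The WAIT_FOR_ENDBRANCH state exists for this instruction: an indirect jmp/call enters it. If an interrupt arrives between the jmp and its target, the state must be saved and restored when the interrupted code resumes.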

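ARM has a similar instruction, BTI (Branch Target Identification): the target of a BR must be a BTI instruction, which marks an allowed landing site. The drawback is that there are still many BTI landing sites while only one target is correct for a given branch, so finer-grained CFI is needed; that part is easier to implement in software.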

Isolated Execution Environment

Can we minimize the damage a single bug can cause?

Background: HeartBleed Attack

In-application memory disclosure attack

  • one over-read bug discloses the whole memory data

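The TLS heartbeat implementation did not validate its input properly: a missing bounds check let it read more data than it should. One end of a connection can send the peer a heartbeat request carrying up to 64 KB of payload, and the peer echoes the payload back unchanged to complete the check. A malicious client can declare a long payload while actually sending little or none. The server never compares the declared length with the actual payload size; it simply memcpys that many bytes starting at the request payload, so what actually gets copied is the memory lying right after the request data.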
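The over-read can be reproduced with a flat byte buffer standing in for the server's heap. The layout, with a secret right after the request payload, is an assumption for illustration; the fixed version follows RFC 6520's rule that a heartbeat whose declared payload_length is too large is silently discarded. Function names are hypothetical.

```python
def heartbeat_buggy(memory: bytearray, payload_off: int, claimed_len: int) -> bytes:
    # Trusts the length field: echoes claimed_len bytes starting at the payload.
    return bytes(memory[payload_off:payload_off + claimed_len])


def heartbeat_fixed(memory: bytearray, payload_off: int, claimed_len: int,
                    actual_len: int):
    # Discard requests whose declared length exceeds the bytes actually received.
    if claimed_len > actual_len:
        return None
    return bytes(memory[payload_off:payload_off + claimed_len])


# Heap layout: 2-byte header, 2-byte payload "hi", then unrelated secret data.
heap = bytearray(b"HB" + b"hi" + b"SECRET_KEY=1234")
```

Calling `heartbeat_buggy(heap, 2, 17)` echoes back `b"hiSECRET_KEY=1234"`: one missing comparison discloses whatever happens to follow the request in memory.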

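Idea for a solution: run the application's code as if in two virtual machines, one executing the normal code and the other executing the cryptographic code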

Virtual Machine

Virtualization provides VMX root and VMX non-root mode; switching between them happens through VM entry and VM exit

VM Entry:

  • Transition from VMM to Guest
  • Enters VMX non-root operation
  • Loads Guest state from VMCS
  • VMLAUNCH used on initial entry
  • VMRESUME used on subsequent entries

VM Exit:

  • A VM exit (triggered by events such as sensitive instructions, EPT violations or interrupts, not by a dedicated instruction) transitions from Guest to VMM
  • Enters VMX root operation
  • Saves Guest state in VMCS
  • Loads VMM state from VMCS

This adds one more page table to address translation: the Extended Page Table (EPT)

  • Translates guest physical addresses to host physical addresses; both levels of translation are performed by hardware

    Guest Virtual Address(GVA)---Guest page table--->Guest Physical Address(GPA) ---EPT--->Host Physical Address(HPA)

  • EPT is manipulated and maintained by hypervisor

    • Hypervisor controls how the guest accesses physical addresses
    • any EPT violation triggers VMExit to hypervisor
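The two-stage translation can be sketched with flat dictionaries keyed by page number. Real guest page tables and EPTs are multi-level radix trees, and the guest's own page-table reads also go through the EPT; both details are flattened away here, and the names are hypothetical.

```python
PAGE = 4096


class EPTViolation(Exception):
    """A GPA with no EPT mapping; on real hardware this causes a VM exit."""


def translate(gva: int, guest_pt: dict, ept: dict) -> int:
    """GVA -> GPA via the guest page table, then GPA -> HPA via the EPT.
    Both tables are modeled as {page number: page number} dicts."""
    offset = gva % PAGE
    gpa_page = guest_pt[gva // PAGE]  # a missing entry would be a guest page fault
    if gpa_page not in ept:
        raise EPTViolation(f"GPA {gpa_page * PAGE + offset:#x} not mapped by EPT")
    return ept[gpa_page] * PAGE + offset
```

The guest controls `guest_pt`, but only the hypervisor controls `ept`, so removing a GPA from the EPT is enough to make that memory unreachable from the guest.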

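So there are effectively two translation roots: the guest CR3, pointing to the guest page table, and the EPT pointer (EPTP, held in the VMCS), pointing to the EPT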

How can two parts of one process's code run as if in two VMs? Keep two EPTs for a single VM: a Main EPT and a Secret EPT

Memory Isolation using EPT Mechanism

Leverage EPT mechanism to shadow secret memory

  • Data segment: secret memory is removed from main EPT
  • Code segment: sensitive functions only exist in secret EPT

Critical data and code are mapped only in the secret EPT; the problem then becomes how to switch EPTs efficiently

Problem: context switches are expensive:

  • Every EPT switch is intervened by hypervisor
  • VMExit takes much more time than function call

Solution: the VMFUNC feature lets the guest switch EPTs without the hypervisor

VM Function (VMFUNC) 101

VMFUNC allows a VM to be configured with several EPTs and to switch among them in non-root mode

VM Functions: Intel virtualization extension

  • Non-root guest VMs can directly invoke some functions without VMExit

VM Function 0: EPTP Switching

  • Software in guest VM can directly load a new EPT pointer

VMFUNC provides hypervisor-level functionality at roughly the cost of a system call

Using VMFUNC for Efficiency

Separate control plane from data plane

  • control plane: hypervisor pre-configure the EPT used by different compartments
  • data plane: application can directly switch EPT without hypervisor intervention

EPTP switching invocation: VMFUNC opcode (EAX=0, ECX=EPTP_index)
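The control-plane/data-plane split can be modeled as follows: the hypervisor fills an EPTP list once, and the guest then switches EPTs with leaf 0 of VMFUNC (EAX=0, ECX=index). On real hardware the EPTP list is a 4 KB page of up to 512 entries and an invalid request causes a VM exit; here EPTs are reduced to dicts from GPA page to a label, and `Vcpu` is a hypothetical name.

```python
class VMExit(Exception):
    """Control returns to the hypervisor (invalid VMFUNC or EPT violation)."""


class Vcpu:
    def __init__(self, eptp_list):
        self.eptp_list = eptp_list  # filled by the hypervisor (control plane)
        self.current = eptp_list[0]

    def vmfunc(self, eax: int, ecx: int) -> None:
        # Leaf 0 = EPTP switching; ECX indexes the hypervisor's EPTP list.
        valid = eax == 0 and 0 <= ecx < len(self.eptp_list) and self.eptp_list[ecx] is not None
        if not valid:
            raise VMExit(f"invalid VMFUNC eax={eax} ecx={ecx}")
        self.current = self.eptp_list[ecx]  # no hypervisor involvement

    def read(self, gpa_page: int) -> str:
        if gpa_page not in self.current:
            raise VMExit(f"EPT violation on GPA page {gpa_page:#x}")
        return self.current[gpa_page]


main_ept = {0x1: "app data"}
secret_ept = {0x1: "app data", 0x2: "private key"}
```

The hypervisor is involved only when the EPTP list is set up; every later switch is a single guest instruction.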

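After a VM switches EPTs, the hypervisor does not know that the switch happened, which can lead to errors, so this missing information has to be compensated for. In addition, since VMFUNC can be executed in user mode, malicious attackers must be prevented from invoking VMFUNC arbitrarily.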

Security Problem of VMFUNC

What if attackers directly switch EPT?

  • Since EPT switching is not checked by hypervisor

Recall: the code segment of the secret compartment

  • It only contains trusted sensitive functions
  • The legal entrances to the secret compartment are fixed; only these locations may invoke VMFUNC
  • Invalid VMFUNC invocation causes EPT violation

Secret Compartment is not self-contained

  • main compartment may invoke sensitive functions
  • Secret compartment may invoke normal functions
  • Different compartments have different context
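  • The main compartment switches to the secret compartment through a trampoline to run sensitive code, then switches back
  • The secret compartment switches to the main compartment through a springboard to make library calls (lib_call), then switches back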
  • Context switch is done using VMFUNC

Application Decomposition in SeCage

A hybrid approach to decomposing application

  • Dynamic approach to extracting the secret closure
  • Automatic decomposition during compilation time
  • Static approach to getting the complete set of potential secret-data functions, used to avoid corner cases during runtime

Features for Isolation

ARM TrustZone

Two Modes

  • Normal world(REE, rich execution environment) and secure world(TEE, trusted execution environment)
  • isolated with each other
  • SMC instruction to switch

TrustZone can be viewed as two virtual machines; the difference is that the SMC handler does far less than a hypervisor and its logic is simpler

Different levels of trust

  • Secure Domain (tamper-proof, isolated): high security, limited functions
  • Trusted Domain (TrustZone and TEE)
  • Protected Domain (hypervisor): secure, but more complex
  • Rich Domain (Android or Linux): not secure, but flexible

TrustZone Usage: in Phones

TEE has become standard for biometric

  • TEE for fingerprint registration, storage and attestation
  • Keep secure even if the phone is rooted

TrustZone Usage: in Vehicle

Secure Authentication:

  • start through fingerprint
  • secure payment for digital content, oil, etc.

Secure connection

  • Internet: Through SoftSIM to switch between carriers
  • Connection with smartphone for unlocking and remote controlling

Isolation with Entertainment

  • Use TEE for secure authentication and connection

TrustZone Usage: in Drones

Secure Control Policies

  • No-fly zone: using GPS to restrict fly zone through TEE
  • Owner authentication: using biometrics on remote controller
  • Other fly-policies: return to specific spot under certain conditions

Secure Enforcement

  • Enforce policies through secure boot/secure storage
  • Tamper-resistant even under physical attacks

Current Eco-system of TEE

Fragmentation of TEE

  • From chip vendors: Qualcomm, Spreadtrum
  • From phone vendors: Apple, Huawei
  • TEE OS vendors: TrustKernel, Trustonic, Google, Linaro
  • Many other implementations based on OP-TEE

Trusted applications:

  • must be ported to each TEE OS
  • have to trust the underlying TEE OS

TrustZone-based Real-time Kernel Protection

Event-driven monitor

  • Monitor the normal world critical events

Memory protection

  • Protect critical parts of the normal world memory

Goals

  • Prevent unauthorized privileged code on the target system
  • Prevent kernel data access by user level processes

Intel SGX

Why Intel SGX?

Motivation: untrusted privileged software

  • protect application from untrusted OS

What if the OS directly accesses the application's memory?

  • Data are encrypted in memory
  • Data can only be accessed by the app within CPU boundary
  • The TCB contains only the CPU and the app, not the OS

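SGX was the first to bring memory encryption to commercial processors. Stealing data by physical means (sniffing the memory bus, pulling out NVRAM to read it) becomes very hard; an attacker would have to read the data directly from inside the CPU.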

How can Memory Always be Encrypted?

Question: data will eventually be decrypted when using

  • Then, what if an attacker steals data while it is being used?

Solution: only decrypt data inside CPU(in cache)

  • The attacker now has to steal data directly from CPU

Counter-mode Encryption

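There are two caches, a data cache and a counter cache. Data is not encrypted directly; counters are. Each cache line has its own counter, and the key encrypts the line's counter (typically together with the line's address) to produce a pad. The pad is then XORed with the data to form the final ciphertext; XOR is cheap, and the pad can be computed before the data arrives.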

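Why is this secure? Every write to memory increments the line's counter, so the pad keeps changing and is never reused for the same line, even when the same data is written again.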
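The counter-mode scheme can be sketched in a few lines. Assumptions: HMAC-SHA256 stands in for the AES engine, the pad is derived from the line address together with its counter, counters are kept in a dict instead of a counter cache, and there is no integrity check. `EncryptedMemory` is a hypothetical name.

```python
import hashlib
import hmac

LINE = 64  # cache-line size in bytes


def pad(key: bytes, addr: int, counter: int) -> bytes:
    # Stand-in PRF: the pad depends only on (addr, counter), not on the data,
    # so it can be computed while the ciphertext is still being fetched.
    out = b""
    block = 0
    while len(out) < LINE:
        msg = addr.to_bytes(8, "little") + counter.to_bytes(8, "little") + bytes([block])
        out += hmac.new(key, msg, hashlib.sha256).digest()
        block += 1
    return out[:LINE]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class EncryptedMemory:
    def __init__(self, key: bytes):
        self.key = key
        self.dram = {}      # addr -> ciphertext line (what a bus snooper sees)
        self.counters = {}  # addr -> per-line counter

    def write(self, addr: int, line: bytes) -> None:
        ctr = self.counters.get(addr, 0) + 1  # bump the counter on every write
        self.counters[addr] = ctr
        self.dram[addr] = xor(line, pad(self.key, addr, ctr))

    def read(self, addr: int) -> bytes:
        return xor(self.dram[addr], pad(self.key, addr, self.counters[addr]))
```

Writing the same line twice yields two different ciphertexts, because the counter, and therefore the pad, has changed in between.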

Merkle Tree for Data Integrity

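Hash every data block and counter, hash those hashes again, and so on up to a single root of the hash tree. The root is kept inside the CPU, where attackers cannot modify it.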

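Performance is poor: every write requires several hash updates, so the tree cannot be too deep and the protected memory cannot be too large: 128 MB, later improved to 256 MB.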
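A Merkle tree over memory blocks, and the verification of one block against the root, can be sketched with hashlib. The sketch assumes a power-of-two number of leaves and keeps the whole tree in a Python list; the only value that must be trusted is the root, which hardware would hold on-chip. Function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all levels, from leaf hashes up to [root]; len(leaves) must be a power of two."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify(leaf: bytes, index: int, levels, root: bytes) -> bool:
    """Recompute the path from one leaf to the root using sibling hashes
    (which may live in untrusted memory) and compare with the trusted root."""
    node = h(leaf)
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

With 8 leaves every check touches log2(8) = 3 hashes; each doubling of protected memory adds one more level, which is why the tree, and hence the protected region, is kept small.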

Process View

  • With its own code and data
  • Providing Confidentiality & Integrity
  • Controlled entry points
  • Multi-thread support
  • Full access to app memory and processor performance

protected execution environment embedded in a process

SGX Execution Flow

  • App built with trusted and untrusted parts
  • App runs and creates the enclave, which is placed in trusted memory
  • Trusted function is called and execution transitions to the enclave; the call must go through a call gate that restricts where execution may enter
  • Enclave sees all process data in clear; external access to enclave data is denied
  • Trusted function returns; enclave data remains in trusted memory
  • Application continues normal execution

How is SGX used?

Software Architectures of SGX

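  • Code Snippet: only the app's trusted part is put into enclaves
  • Application: put the whole app plus the LibC interface into SGX. The advantage is that the app needs no modification; the drawback is that security is harder to guarantee: are the parameters LibC passes to the outside plaintext or ciphertext?
  • Container: LibC is also included, and only system calls leave the enclave. But what if the OS is also malicious?
  • LibOS: a LibOS is included as well, packaging commonly used system calls into an OS inside the enclave; the interface to the outside is at virtual-machine level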

AMD SME & INTEL TME

AMD x86 Memory Encryption Technologies

Two Technologies:

  • AMD Secure Memory Encryption(SME)
  • AMD Secure Encrypted Virtualization(SEV)

Features

  • Hardware AES engine located in the memory controller performs inline encryption and decryption of DRAM
  • Minimal performance impact: Extra latency only taken for encrypted pages
  • No application changes required
  • Encryption keys are managed by the AMD Secure Processor and are hardware isolated. Not known to any software on the CPU

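Setting bit 47 of a page-table entry to 1 marks the page as encrypted, and 0 leaves it unencrypted; this is completely transparent to software. Because it relies on the OS to set the bit, it protects against hardware attacks but not against software attacks.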
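Marking a page as encrypted is just a PTE bit (the C-bit). The sketch assumes bit 47 as in the text; software should read the actual position from CPUID function 0x8000001F (EBX[5:0]) rather than hard-code it. Function names are hypothetical.

```python
C_BIT = 47  # assumption: position as stated above; query CPUID on real hardware


def encrypt_mapping(pte: int) -> int:
    return pte | (1 << C_BIT)


def plain_mapping(pte: int) -> int:
    return pte & ~(1 << C_BIT)


def is_encrypted(pte: int) -> bool:
    return bool(pte >> C_BIT & 1)
```

The OS only flips this bit when building the PTE; the AES engine in the memory controller does the rest, which is why applications need no changes.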

Comparing with Intel SGX

The SME approach is different: it will not protect memory from an attacker who has compromised the kernel. It is intended to protect against cold-boot attacks, snooping on the memory bus, and the disclosure of transient data stored in persistent-memory arrays

Intel MKTME: Multi-Key TME

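Multiple keys can be configured: hardware-generated ephemeral keys as well as software-provided keys. Software-provided keys suit NVRAM whose contents should remain readable after a reboot (with purely hardware-generated keys, as in SGX, the key is gone after a reboot and the data can no longer be decrypted). Multi-Key Total Memory Encryption (MKTME):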

  • A fixed number of encryption keys are supported
  • This functionality is available on a per-page basis

Uses the hardware-generated ephemeral key

  • Inaccessible by software or external interfaces

MKTME also supports software-provided keys

  • E.g., a hypervisor can manage the keys to transparently provide memory encryption support for legacy OSes
  • OS can also use MKTME to provide support in native and virtualized environment

Different VMs can have memory regions with several KeyIDs, and they interact through memory regions that share the same KeyID

AMD SEV

Threat Model of Public Cloud

Isolation between co-resident VMs provided by hypervisor sometimes breaks down:

  • QEMU "VENOM", VirtualBox bug, etc.

Cloud vendors and the hypervisors they provide cannot be trusted

  • Hypervisor has full access to guest secrets in memory
  • Not ideal for cloud users

AMD SEV assumes no side channel attacks or integrity compromise

Design of SEV

SEV adds an encryption engine in memory controller for encryption

  • Encryption engine encrypts data using corresponding key
  • Encryption key is selected by secure processor

SEV adds a secure processor for key management

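Data in DRAM is encrypted and protected by keys held inside the SoC. Once the guest owner has encrypted its VM, the VM can only run under SEV, in encrypted form; the hypervisor can only steal ciphertext.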

Limitation of AMD SEV

Vulnerable to side channel attacks

  • Cache side channel, TLB side channel, etc.

No guarantee of integrity

  • Vulnerable to nested (extended) page table remapping attacks
  • Vulnerable to physical rewriting of DRAM

Limited number of encryption keys

  • Encryption key is associated with ASID
  • Number of ASID is limited in secure processor

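Because the number of encryption keys is limited, so is the number of VMs that can run. To address these limitations AMD proposed SEV-SNP (Secure Nested Paging), and one of its key data structures is the RMP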

RMP: Reverse Map Table

Memory integrity is enforced using a new DRAM structure called the Reverse Map Table (RMP)

There is 1 RMP for the entire system, it is created by software during boot

Basic properties:

  • RMP contains 1 entry for every 4 KB page of assignable memory
  • RMP is indexed by System Physical Address (SPA)
  • RMP entries may only be manipulated via new x86 instructions

The RMP indicates page ownership and dictates write-ability. Examples:

  • A page assigned to a guest is only writeable by that guest
  • A page assigned to the hypervisor cannot be used as a private (encrypted) guest page
  • A page used by AMD firmware cannot be written by any x86 software

The RMP records the reverse mapping, from each system physical page to its owner and the guest physical address it belongs to, i.e., page ownership

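A new instruction, PVALIDATE, lets the guest validate every page added to its address space, which marks the page as validated in the RMP. If the hypervisor later changes the mapping without the guest knowing, the guest's next access to that memory faults, so hypervisor tampering with the page mappings is detected.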
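The RMP check can be illustrated with a heavily simplified model. Assumptions: one RMP entry per 4 KB page holding only the assigned flag, owner, guest physical page and validated flag (real entries hold more), a nested page table that the hypervisor can rewrite freely, and a single exception type for every failed check (on real hardware the fault type depends on who performs the access). `System` and its methods are hypothetical.

```python
from dataclasses import dataclass


class NestedPageFault(Exception):
    """Stands in for the fault raised when an RMP check fails."""


@dataclass
class RmpEntry:
    assigned: bool = False
    asid: int = 0        # owning guest
    gpa: int = 0         # guest physical page this system page belongs to
    validated: bool = False


class System:
    def __init__(self, num_pages: int):
        self.rmp = [RmpEntry() for _ in range(num_pages)]  # indexed by SPA page
        self.npt = {}  # (asid, gpa) -> spa, controlled by the hypervisor

    def assign(self, spa: int, asid: int, gpa: int) -> None:
        # Hypervisor side: give a page to a guest (starts out unvalidated).
        self.rmp[spa] = RmpEntry(assigned=True, asid=asid, gpa=gpa)
        self.npt[(asid, gpa)] = spa

    def pvalidate(self, asid: int, gpa: int) -> None:
        # Guest side: accept the page currently mapped at gpa.
        entry = self.rmp[self.npt[(asid, gpa)]]
        if entry.assigned and entry.asid == asid and entry.gpa == gpa:
            entry.validated = True

    def guest_access(self, asid: int, gpa: int) -> int:
        spa = self.npt[(asid, gpa)]
        e = self.rmp[spa]
        if not (e.assigned and e.asid == asid and e.gpa == gpa and e.validated):
            raise NestedPageFault(f"RMP check failed for GPA {gpa:#x}")
        return spa

    def hypervisor_write(self, spa: int) -> None:
        if self.rmp[spa].assigned:
            raise NestedPageFault(f"SPA page {spa:#x} is owned by a guest")
```

If the hypervisor remaps guest page 0x10 onto a system page whose RMP entry records guest page 0x11, the nested page table accepts the change, but the RMP check does not, so the guest faults instead of silently reading the wrong page.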

Why TEE Virtualization?

Can TrustZone be virtualized so that multiple trusted OSes and their apps run inside it?

  • Before 2012: a fixed piece of code by vendors
  • 2012-2017: some pre-installed trusted apps (TAs) by vendors
  • 2017-now: Support dynamic installation of third party TAs

Why multiple isolated TEEs are needed?

  • More and more CVEs of TEE OS and TAs are disclosed
  • A compromised TEE may breach the entire system
  • App vendors (e.g., mobile payment) may have to compensate users for faults of the TEE OS, thus they prefer to run on TEEs they trust

CVE Example: The Boomerang Attack

A time service running in the secure world.

  • Writing current time to a memory address (as parameter)

The bug: no check on the address→arbitrary memory writes to REE

  • Recall that TEE has higher privilege than REE
  • Similar bugs exist in QualComm, Trustonic, SierrawareTEE, Huawei, OP-TEE

Remedy: lower the privilege of the TEE

TEEv: Enabling Multiple Virtualized TEEs

Run multiple TEEs on one CPU; these vTEEs can come from different vendors

interaction between vTEEs & vTEE/REE

  • secure communication channel by TEE-visor
    • TEE-visor manages the shared memory pages between vTEEs and between vTEEs and the REE
    • Memory pages in one context need to be explicitly shared with the other context
  • Defend Boomerang attack

PMP

Hardware Property: PMP

Physical Memory Protection: an isolation mechanism of the RISC-V platform

Secure monitor only ensures memory isolation when creating an enclave

  • Keystone use PMP to ensure memory isolation during execution

N (typically 8) groups of PMP registers

  • Each group configures access permission to a specific piece of continuous physical memory

Hardware check during memory access

  • Hardware looks up the first PMP register group whose memory region contains the destination address (from 0 to N)

  • Check access permission according to first found PMP register

Each enclave is assigned a group of PMP registers, indicating the memory region allocated to the enclave

By default the secure monitor assigns pmpN to the OS, so an OS memory access succeeds only after the address has passed all the enclaves' checks

After enclave creation, the physical memory is divided into several independent memory region, each belongs to one enclave

The total number of enclaves is limited, because the number of PMP registers is limited
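The first-match rule can be sketched as below. Real PMP entries are encoded in the pmpcfg/pmpaddr CSRs using TOR or NAPOT address matching, and M-mode bypasses unlocked entries; here entries are plain (base, size, permissions) records checked for an S/U-mode access, and the example layout is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PmpEntry:
    base: int
    size: int
    perms: str  # subset of "rwx"
    owner: str  # for illustration only; hardware does not record this


def pmp_check(entries, addr: int, access: str) -> bool:
    """Return True if an S/U-mode access is allowed.
    The lowest-numbered matching entry decides; no match means deny."""
    for e in entries:  # pmp0, pmp1, ... in priority order
        if e.base <= addr < e.base + e.size:
            return access in e.perms
    return False


# While the OS runs: enclave regions come first with no permissions,
# and the last entry (pmpN) grants the OS everything else.
layout = [
    PmpEntry(0x8000_0000, 0x10_0000, "", "enclave A"),
    PmpEntry(0x8010_0000, 0x10_0000, "", "enclave B"),
    PmpEntry(0, 1 << 34, "rwx", "OS"),
]
```

When switching into an enclave, the monitor would rewrite these entries so that only that enclave's region is accessible, which is why every enclave consumes PMP entries and their number caps the number of enclaves.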

Limitations of PMP

Vulnerable to physical attacks

  • Bus snooping, cold boot attack, etc.

No support for dynamically allocating new memory to an enclave

  • Enclave's memory region can only be set during enclave creation

  • This is limited by hardware PMP's design

Limited number of enclave supported simultaneously

Motivation of sPMP

For IoT devices (MMU-less), it is desirable to enable the S-mode OS to limit the physical addresses accessible by U-mode software

The original PMP is configured in machine mode (M-mode), a RISC-V-specific privilege level at the very bottom of the software stack

M-mode PMP virtualization is non-secure, S-mode virtualization for scalable enclaves

Penglai

Penglai implements a secure monitor in machine mode that handles enclave management, including enclave creation. User mode hosts enclave apps and enclave services such as a file system; the main work lies in the secure communication channel.

Fine-grained Memory Isolation

Naive way

1-bit tag for memory isolation

  • Secure monitor reserves a bitmap in DRAM and protects it via PMP

  • Each bit in bitmap corresponds to one physical page and indicate whether the page is enclave page

  • CPU checks corresponding bit in bitmap before accessing certain physical page to prohibit kernel from accessing enclave memory
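The naive bitmap check can be sketched to make its cost visible: every kernel access to a physical page first needs a lookup in the bitmap. One bit per 4 KB page is assumed, and the names are hypothetical.

```python
PAGE_SHIFT = 12


class EnclaveBitmap:
    """One bit per physical page: 1 = enclave page (the bitmap itself is PMP-protected)."""

    def __init__(self, num_pages: int):
        self.bits = bytearray((num_pages + 7) // 8)

    def mark(self, ppn: int, enclave: bool) -> None:
        if enclave:
            self.bits[ppn >> 3] |= 1 << (ppn & 7)
        else:
            self.bits[ppn >> 3] &= ~(1 << (ppn & 7)) & 0xFF

    def is_enclave(self, ppn: int) -> bool:
        return bool(self.bits[ppn >> 3] >> (ppn & 7) & 1)


def kernel_may_access(bitmap: EnclaveBitmap, paddr: int) -> bool:
    # This lookup is the extra memory access per kernel access.
    return not bitmap.is_enclave(paddr >> PAGE_SHIFT)
```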

Cons (a large impact on performance and significant hardware changes):

  • Too much modification to hardware

  • CPU extension introduces one extra memory access for querying bitmap

  • The overhead can be alleviated with a tag cache but not eliminated, and the tag cache introduces more hardware modification

Hardware Solution

  • All unsecure page tables are stored in a reserved memory region (PT_AREA). New hardware feature is added in page table walker (PTW)

  • PT_AREA is isolated from kernel by PMP

  • Kernel is still in charge of memory mappings but can not write PT_AREA directly

  • Secure monitor helps kernel set page table entry and check malicious mappings

  • Minor modification to hardware (only some comparing logic in page table walker)

  • No extra memory access overhead during application execution

It achieves:

  • G1: Non-enclaves cannot access secure pages
  • G2: Fine-grained memory isolation without static partitioning

Temporal Cache Partitioning

Penglai uses cache partition mechanism to alleviate side channel

Partition cache when current CPU issues certain instruction

  • CPU can still read/write all cache lines but can only evict cache lines allocated to it

Cancel the partition via certain instruction

Most of time the whole cache is shared among CPUs

Fast IPC

  • Secure monitor allows an enclave to register itself as a server with certain name

  • Then secure monitor will bind the server enclave with its name

  • Other enclaves can request secure monitor for handle of server enclave with certain name

  • Then it can call server enclave with the handle

  • Penglai supports both host-enclave IPC and enclave-enclave IPC

  • Penglai supports fast ownership transfer between host and enclave via unmapping pages in PT_AREA, marking enclave pages and remapping them in enclave's page table

  • Penglai supports fast ownership transfer between enclaves via unmapping and remapping pages in each enclave's page table

  • When enclave call is finished, pages' ownership transfer can also happen in the opposite direction

Features NOT for Security

Transactional Memory 101

Originally intended for databases and other concurrent software

Hardware TM to mass market

  • Intel's restricted transactional memory (RTM)
  • IBM Blue Gene/Q
  • AMD advanced synchronization family (ASF proposal)

Generally provides:

  • Opportunistic concurrency
  • Strong atomicity: read set & write set
  • Semantic of both all-or-nothing and before-or-after

Real-world best-effort TM

  • Limited read/write set
  • System events may abort a TX

Using HTM for Data Protection

Idea: leverage the strong atomicity guarantee provided by HTM to defeat illegal concurrent accesses to the memory space that contains sensitive data

  • Each private-key computation is performed as an atomic transaction

During the transaction

  • Private key is first decrypted into plaintext,
  • It is then used to decrypt or sign messages
  • If the transaction is interrupted, the abort handler clears all updated but uncommitted data in the transaction
  • Before committing the computation result, all sensitive data are carefully cleared

Intel CAT

The Noisy Neighbor Problem

"noisy neighbor" on core zero over-utilizes shared resources in the platform, causing performance inversion

Though the priority app on core one is higher priority, it runs slower than expected

Software Controlled Cache Allocation

The basic mechanisms of CAT include:

  • The ability to enumerate the CAT capability and the associated LLC allocation support via CPUID
  • Interfaces for the OS/hypervisor to group applications into classes of service (CLOS) and indicate the amount of last-level cache available to each CLOS
  • These interfaces are based on MSRs: Model-Specific Registers
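CAT expresses each class of service's share of the last-level cache as a capacity bitmask (CBM) of cache ways. The sketch checks the contiguity rule that Intel's CAT documentation places on CBMs and converts a mask into bytes; actual programming goes through the IA32_L3_QOS_MASK_n and IA32_PQR_ASSOC MSRs or Linux's resctrl filesystem. The way count and cache size below are made-up example values.

```python
def is_contiguous(mask: int) -> bool:
    """A valid CBM is a single contiguous run of 1 bits."""
    if mask == 0:
        return False
    low = mask & -mask              # lowest set bit
    return (mask + low) & mask == 0  # adding it must carry through the whole run


def capacity_bytes(mask: int, num_ways: int, llc_bytes: int) -> int:
    if not is_contiguous(mask) or mask >> num_ways:
        raise ValueError("invalid CBM")
    return bin(mask).count("1") * llc_bytes // num_ways


# Example: 12-way, 12 MiB LLC. The priority app (CLOS 0) gets 10 ways,
# the noisy neighbour (CLOS 1) is confined to 2 ways.
clos_masks = {0: 0xFFC, 1: 0x003}
```

Confining the noisy neighbour to two ways bounds how much of the LLC it can evict, which addresses the performance inversion described above.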

PMU

Monitor Control Flow by Existing PMU

PEBS: Precise Event-Based Sampling

  • Save samples in memory region for batching
  • Atomic-freeze: record exact IP address precisely

BTS: Branch Trace Store

  • Capture all control transfer events

  • Also save exact IP in memory region

LBR: Last Branch Record

  • Save samples in register stack, only 16 pairs

Event Filtering

  • E.g. "do not capture near return branches"
  • Only available in LBR, not BTS

Conditional Counting

  • E.g. "only counting when at user mode"

Main idea

Leverage PMU for CFI Monitoring

  • Using already existing hardware
  • No need to modify software

Two Phases

  • Offline phase: Get all the legal targets for each branch source

  • Online phase: Monitor all branches and detect malicious ones

Branch Types

Direct Branches

  • Direct call
  • Direct jump

Indirect Branches

  • return
  • indirect call
  • indirect jump

Target Address Sets:

Target Sets for indirect branches

  • ret_set: all the addresses next to a call
  • call_set: all the first addresses of a function
  • train_sets: all the target addresses that once happened
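The offline and online phases can be sketched as set construction followed by a per-branch classification. The three-way policy (accept trained edges, send type-correct but unseen edges to a slower check, flag everything else) is an assumption for illustration, loosely following the fast/slow path split described for FlowGuard below; the input format is hypothetical.

```python
def build_target_sets(call_sites, function_entries, training_pairs):
    """Offline phase. call_sites: (address, instruction length) of each call;
    function_entries: first address of each function; training_pairs: (src, dst)."""
    ret_set = {site + length for site, length in call_sites}  # address after each call
    call_set = set(function_entries)
    train_set = set(training_pairs)
    return ret_set, call_set, train_set


def classify(kind, src, dst, ret_set, call_set, train_set):
    """Online phase: judge one sampled indirect branch (e.g. from LBR/BTS)."""
    if (src, dst) in train_set:
        return "trained"     # seen during training: fast path
    if kind == "ret":
        legal = ret_set
    elif kind == "call":
        legal = call_set
    else:
        legal = set()        # indirect jmp: only trained edges are accepted here
    if dst in legal:
        return "plausible"   # type-correct but unseen: needs the slow path
    return "violation"       # e.g. a return into the middle of a function
```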

INTEL PT

Intel Processor Tracing (IPT)

Privileged agent configures IPT per core

  • Define memory location and size for tracing
  • 3 filtering mechanisms: CPL, CR3, IP range

Efficiently captures various information

  • Control flow, timing, mode change, etc.

Challenges: Fast Trace VS. Slow Decode

Performance overhead is shifted from tracing to decoding, decoding is several orders of magnitude slower than tracing

FlowGuard

FlowGuard: transparent, efficient and precise CFI

  • Transparent: no source code needed, no hardware change
  • Precise: enforce fine-grained CFI with dynamic information
  • Efficient: reconstruct CFG and separate fast and slow paths

Evaluation results

  • Apply FlowGuard to real machine with server workloads
  • Prevents a variety of real code-reuse attacks
  • Less than 8% performance overhead for normal use cases

Usage of Microcode

  • Customizable RDTSC Precision

  • Microcode-Assisted Address Sanitizer

  • Microcoded Instruction Set Randomization

  • Microcode-Assisted Instrumentation

  • Authenticated Microcode Updates

  • μEnclave

Conclusion

  • Hardware VS. software
  • User-mode VS. kernel-mode
  • Integrity VS. privacy
  • Heterogeneous VS. homogeneous
  • Encryption VS. isolation
  • Side channel attacks & physical attacks

Architectural Support for System Security
https://mundi-xu.github.io/2021/11/30/Architectural-Support-for-System-Security/
Author: 寒雨
Posted on: November 30, 2021
Licensed under