Ur

Seminars/Lectures

December 02 Nathan Beckmann

From CMU.

Pushing the limits of online and offline caching.

Marry thery and practice

  • computing tight bounds on OPT even when is NP-hard.

E.G. Mysql, cache can be 100x faster.

Caches: Think smarter, not larger: a better cache policy. LRU, ++.

Caching: theory questions; practical questions; online & offline policies. ==> an intersection of these.

Beckmann, NSDI’18. LHD, AdaptSize, Hyperbolic, GDSF, LRU. Figure.

Caches have performance Cliffs.

LRU: cliffs. 31GB/32GB -> miss 100% until 32/32GB.

Talus: random sampling.

Metric: Hit density: (hit probability)/(expected lifetime)

Least Hit density eviction.

Open problems:

  • metrics that really matters.
  • object features.

OPT:

  • if objects have different sizes, then NP-hard. (adapting (farach-colton and liberatore, ‘2000))
  • if writebacks cost > 0 , then Max-SNP-hard. (hard to approximate, reduction from hypergraph matching, )

OPT in practices:

NP-hardness is ‘artificial’, and real traces are ‘easy’.

==> FOO: computing OPT in real traces efficiently. Map LP to mini-cost flow.O(N^2log^2(N))

==> PFOO: O(Nlog^2N)

  • coupon collector problem.

Write back matters.

Write-back aware caching.

OPT: intractable, (not uncomputable).

Nov 18 Shaoshan Liu

From PerceptIn

R-CNN

Object Tracking: MDP POMDP (deep learning)

Stereo (content CNN , ELAS) and Optical Flow (& FlowNet)

traffic prediction.

lane routing

behavioral decisions: for neighbor vehicle.

motion planning. path planning.

feedback control.

A heavyweight approach

  • A robust OS. Linux?WIndows? No. but A middleware that combines each component together.

-> Robotic OS. ROS: Topic and Topic.

  • Problems of ROS:
    • not efficient: communication
    • not secure: ?
    • not scalable: central control node
    • not reliable: crash

Cloud: spark* in autonomous driving

HD Map: Raw data -> point cloud -> alignment -> 2d reflectance map -> HD Map labeling -> high precision map

==> very expensive; e.g. Google map update once a year.

–> but we need update at least once a week. ==> who pays for it?

Problem: heavyweight

Lightweight approach to automonous driving –> we are doing

  • Computer vision based sensor fusion
  • GNSS
  • radar
  • sonar
  • Chasis

** Edge computing

Edge Client

KITTI data set (but we still need benchmarks).

RTOS -> VX QNX, space machines.

Middleware -> ROS: not stable.

** Potential cyber attacks on Automated Vehicles. IEEE journal **

Bundle adjustment

ROS replacement

Nanomsg + ROS reimplementation

nanomsg: 50% more performance.

Nov 15 Ruiqin

Ruiqin: Value-set Analysis.

CC’04.

Page 10. Read examples.

Give examples; what is a a-loc.

“i don’t have a complete example” – Sree

Micheal:

Nov 01 Micheal

locks: when application can fails, others cannot proceed.

  • preemption at unexpected times;
  • priority inversion;

Hardware primitives: for load and stores.

  • compare and swap(a,e,n);
  • load and store conditional; do load and tagged L1 cache; store when tag still valid;

  • write with same old value. ==> CAS cannot fail; SC will fail.

A hardware swap: always a success; unconditional.

sequential consistency: global visible way; but no hardware behaves that way;

  • store release;
  • load acquire;

ABA problem

ABA problem: same address got freed and reallocated to the same address; but CAS regard as same before;

  • counted pointers;

Correctness of non-blocking structures

  • safety: bad things never happen; e.g. dead-lock freedom;
  • liveness: good things eventually happen; e.g. starvation problem.

dead-lock free is necessary for starvation free; but not sufficient.

Linearizability (Herlihy & Wing 1987)

Exe hisotry H is linearizable, n threads, each sequential, exe in parallel; the result is the same as never overlapped (but different order is allowed).

Linearization point: points that takes effect and visible to others;

Liveness

wait-free (starvation-free): there is a bound of instructions to finish; e.g. fetch_and_add based counter;

lock-free (=livelock-free): a bounded number of instructions for others to finish; itself might be starved; e.g. CAS-based counter;

obstruction-free (=merely nonblocking): my ops is guaranteed to complete in a bounded number of steps if I get to run all myself; external monitor/scheduler will take control if passed limited steps; e.g. deqeue of Herlihy.

Treiber stack

Replicated counter example: array of counters, one counter for each thread to update;

scan array and add to total counts;

BUT no static linearization point: the each count being added might not be exist simultaneously at any time point;

Micheal & Scott queue [1996]

A -> B -> D Insert C and delete B at same time; C got lost

Harris’s insight: tag next pointer (D); lazy xxx using gc

Micheal’s idea: no gc

Favorate non-block data structure: Extensible hash table by S &Shavit

Oct 18 Jie Checked C

compatible with legacy code.

web servers do not have much use of alloca/dealloca;

use after free: elevation from application to kernel. CVE examples.

Existing works

MMU page invalidate on free. (Adve: 2006 DSN, etc.)

invalidating dangling: tracking point-to relations; (DangNULL; FreeSentry; DangSan)

Dynamic checking on pointer dereference

  • Key-lock based dynamic checking: CETS @ ISMM’10
  • Tumstones.

Checked C solution

Carry metadata with pointers: ‘safe-ptr’

Time of Check to Time of Use: multi-threading: probabilistic use-after-free bugs.

  • retire immediately, but free it later.

Sep 27 from Wenhao Group

  • Tigris: 3D perception.
  • Point Cloud. -> KD search.
  • Energy efficient & real-time
  • registration pipeline. Huge design space. Frontier points in design space. Bottleneck: KD-tree search. –> optimize it.
  • KD-tree search: sequential nature. –> two-stage KD. –> leaf nodes (unordered set, can be paralled).
  • QLP: query level para; NLP: node-level para.

??

Sep 20 ONR/Adve

  • machine learning to detect vulnerabilities in LLVM IR. Few millions of functions in BCDB.

  • reduce code size improves security: in Geogia Tech. ROP/ret-libc/ Is less really more.

  • SafeCode:

  • TU overhead.

  • Souper (Sclang)

Sandeep.

X86-64 Semantics: formal user level

githb: kframework/ - 3155 instructions - 31353736. - K framework. - BVL: 2016. Strata BVL Semantics.

?- Chelly related work?

ONR:VENKMAN.

  • Retpolines: jmp -> return: BTB -> RSB.

  • BCDB: Bit code database; one module per function;

    • software multiplexing:
      • Busybox: manually multiplexing
      • Allmux: automatically, OOPSLA 2018
  • SoC mobile.

Binocular Depth sensing.

KVell @ SOSP’19: CPU is bottleneck. ==> a new paradigm for persistent KVs to use new SSDs;

Log-Structred merge tree: LSM. B Tree: good for read intensive load; in MongoDB

Solution: multi-threading -> each thread for subset of keys.

Sep 13 Professors Intro

Chen: Memory problems, Locality. - is there a difference of locality between languages? What is the best language to leverage/create best locality for algorithms.

Micheal: - Persistent memory. Denser, cheaper, –> fufuture: use pm to replace disk? - OS structure: fast devices, OS interaction, context switch expensive. - IO device direct with APP, but cannot be monitored by OS. - Communicate just enough to stay fair. Data structure - Hodor: shared library. PKU, MPK. - Move entire os to a library: libOS. - a wall in Rust and C for OS kernel;

John: build systems that is resilient to attack. - harden it. - isolate malicious - security metrics - IskiOS, Silhouette, Apparition, PrivAnalyzer, - what is security: secure state defined by user.

Sandya: - The hardware-software interface - shared mem in hw and sw: DDCache. - adaptive cache granularity

Yuhao: - sw-hw co-design - point cloud, 2d, 3d, no images transfered, just point cloud.

Sree: - heterogenouts - domain specific langs - automate all the things - graph analytics - on CPUs? - accelerating general purpose data structures. - matrices. - Thur 3:30 - 5pm; 2506.

Aug 30 Isaac

Sparse objects; a fast ccall or a traditional ccall; recognize the boundaries; encapsulate different data into same object type?

Continuous data; but mini protection for code: legal ccall; but all other illegal jumps are allowed, but will trap.

Aug 30 Jie & Ethan

Who is my audience, and what do I need to say in order to let them understand my research.

SMAP: Supervisor Mode Access Prevention from Intel.

Ethan: reading notes; should control words condense.

July 26 Edward: Yufei Du. ARM project. Silhouette.

  • Threat model
  • Why do we need separate domains if only one application?
  • C language for embedded system? Still true?

-O3 opt: caches. Why use O3?

#July 19 Lele: CheriABI

July 15, Xiaowan: efficient and protected address translation in memory management

  • in core: TLB, PTW, page table walker.
  • Eurosys’16: sharing address translation info. –> share them all: -> single address space.
    • sharing pagetable and TLB.
    • android, zygote-preloaded shared lib;
    • 88 to 107 shared lib loaded; used 24-68.
    • instruction traces: 98% from shared; 70% from zygote;
    • page traces: 46% pages from shared lib
    • translation: 35% duplicate translation.
  • Android process model:
    • all are children of zygote
  • TLB: global bit, overwrites the ASID in TLB; primarily used for kernel pages.
    • zygote, set global bit
    • to avoid kernel service to change shared lib: domain field.
      • domain protection model
  • zygote fork: x2.1 speedup
  • application: launch: 10%
  • exe: 38% fewer page faults. 35% fewer pt pages
  • android ipc: last level TLB stall. reduced.

  • super page promotion:

    • for code: linux vs freebsd
    • relaxing the policy of automatic superpage promotion.
    • up to 8.5%
    • memory consumption. sometimes can load large page, but most not used.
  • padding residual code:

    • 18% improve

Side-channel attacks

  • os removed from TCB –> still have side channel attacks.
  • apparition
  • page table side channel

    • direct map: virtual - physicall for kernel to access.
    • lazy allocation in malloc disabled.
  • last level cache side channel

    • apparition VM, prevent OS from reconfiguring the cache partition.
  • bound check bypass attacks

    • spectre
    • separate addr space
    • MPX SFI with lfence
    • Bit masking sfi

July 12, Chen: timescale function

  • APF: allocation per fetch.
  • lifetime: average time between two page faults.

Timescale functions:

  • thread local reserve.
  • central reserve.
  • empty page pool.

sree: jemalloc.

memory reuse in a window.

June 28, 2019: MPK/PKU intra-address space isolation

i/o op as libraries, out of kernel.

Hodor loader: assign key to diff domains.

-challenge 1: pkru register mod instructions: not privileged. - sol; scan binary executable pages, undecidable, not disassemb. in loader. use debug registers: watch points: pkru instructions at runtime. 4 registers: 4 watch points: as the cache for illegal sequences. P15.

  • use trampoline to switch between page tables.

  • questions:

    • the slow down of > 4 debug registers.

    • how many isolated domains at maximum? > 16 domains

    • how to avoid hijacking the trampoline? Not Atomic. A final check on registers.

    • implicit pkru instructions?

Sandiya: use the 5 minutes QA to get yourself back to highlight your work.

Jun 21, 2019 PrivAnalyzer.

bounded model checking: ROSA, in Maude (term rewriting logic). limit the number of times each process can call a sys call. INSECURE state? – undecidable or not; DAC undecidable. – number of system call the process can use. –

AutoPriv: Src – static analysis

ChronoPriv: binary – dyn priv analysis – provides input to model checker. processes + privileges + user ids.

credentials?

linux privileges: permitted priv set effective priv set.

Evaluation: /dev/mem, open r/w, priv port. send sigkill

  • security analysis passed, ping, su, thttpd – root has everything.

  • solution:

    • saved id has root
    • different users for diff files, instead of all are root as owner.
Created Dec 2, 2019 // Last Updated Dec 4, 2019

If you could revise
the fundmental principles of
computer system design
to improve security...

... what would you change?