Seminars/Lectures
From CMU.
Pushing the limits of online and offline caching.
Marry thery and practice
E.G. Mysql, cache can be 100x faster.
Caches: Think smarter, not larger: a better cache policy. LRU, ++.
Caching: theory questions; practical questions; online & offline policies. ==> an intersection of these.
Beckmann, NSDI’18. LHD, AdaptSize, Hyperbolic, GDSF, LRU. Figure.
Caches have performance Cliffs.
LRU: cliffs. 31GB/32GB -> miss 100% until 32/32GB.
Talus: random sampling.
Metric: Hit density: (hit probability)/(expected lifetime)
Least Hit density eviction.
Open problems:
OPT:
OPT in practices:
NP-hardness is ‘artificial’, and real traces are ‘easy’.
==> FOO: computing OPT in real traces efficiently. Map LP to mini-cost flow.O(N^2log^2(N))
==> PFOO: O(Nlog^2N)
Write-back aware caching.
OPT: intractable, (not uncomputable).
From PerceptIn
R-CNN
Object Tracking: MDP POMDP (deep learning)
Stereo (content CNN , ELAS) and Optical Flow (& FlowNet)
traffic prediction.
lane routing
behavioral decisions: for neighbor vehicle.
motion planning. path planning.
feedback control.
-> Robotic OS. ROS: Topic and Topic.
Cloud: spark* in autonomous driving
HD Map: Raw data -> point cloud -> alignment -> 2d reflectance map -> HD Map labeling -> high precision map
==> very expensive; e.g. Google map update once a year.
–> but we need update at least once a week. ==> who pays for it?
Problem: heavyweight
** Edge computing
KITTI data set (but we still need benchmarks).
RTOS -> VX QNX, space machines.
Middleware -> ROS: not stable.
** Potential cyber attacks on Automated Vehicles. IEEE journal **
Nanomsg + ROS reimplementation
nanomsg: 50% more performance.
Ruiqin: Value-set Analysis.
CC’04.
Page 10. Read examples.
Give examples; what is a a-loc.
“i don’t have a complete example” – Sree
Micheal:
locks: when application can fails, others cannot proceed.
Hardware primitives: for load and stores.
load and store conditional; do load and tagged L1 cache; store when tag still valid;
write with same old value. ==> CAS cannot fail; SC will fail.
A hardware swap: always a success; unconditional.
sequential consistency: global visible way; but no hardware behaves that way;
ABA problem: same address got freed and reallocated to the same address; but CAS regard as same before;
dead-lock free is necessary for starvation free; but not sufficient.
Linearizability (Herlihy & Wing 1987)
Exe hisotry H is linearizable, n threads, each sequential, exe in parallel; the result is the same as never overlapped (but different order is allowed).
Linearization point: points that takes effect and visible to others;
wait-free (starvation-free): there is a bound of instructions to finish; e.g. fetch_and_add based counter;
lock-free (=livelock-free): a bounded number of instructions for others to finish; itself might be starved; e.g. CAS-based counter;
obstruction-free (=merely nonblocking): my ops is guaranteed to complete in a bounded number of steps if I get to run all myself; external monitor/scheduler will take control if passed limited steps; e.g. deqeue of Herlihy.
Treiber stack
Replicated counter example: array of counters, one counter for each thread to update;
scan array and add to total counts;
BUT no static linearization point: the each count being added might not be exist simultaneously at any time point;
Micheal & Scott queue [1996]
A -> B -> D Insert C and delete B at same time; C got lost
Harris’s insight: tag next pointer (D); lazy xxx using gc
Micheal’s idea: no gc
Favorate non-block data structure: Extensible hash table by S &Shavit
compatible with legacy code.
web servers do not have much use of alloca/dealloca;
use after free: elevation from application to kernel. CVE examples.
??
machine learning to detect vulnerabilities in LLVM IR. Few millions of functions in BCDB.
reduce code size improves security: in Geogia Tech. ROP/ret-libc/ Is less really more.
SafeCode:
TU overhead.
Souper (Sclang)
Sandeep.
X86-64 Semantics: formal user level
githb: kframework/ - 3155 instructions - 3135⁄3736. - K framework. - BVL: 2016. Strata BVL Semantics.
?- Chelly related work?
ONR:VENKMAN.
Retpolines: jmp -> return: BTB -> RSB.
BCDB: Bit code database; one module per function;
SoC mobile.
Binocular Depth sensing.
KVell @ SOSP’19: CPU is bottleneck. ==> a new paradigm for persistent KVs to use new SSDs;
Log-Structred merge tree: LSM. B Tree: good for read intensive load; in MongoDB
Solution: multi-threading -> each thread for subset of keys.
Chen: Memory problems, Locality. - is there a difference of locality between languages? What is the best language to leverage/create best locality for algorithms.
Micheal: - Persistent memory. Denser, cheaper, –> fufuture: use pm to replace disk? - OS structure: fast devices, OS interaction, context switch expensive. - IO device direct with APP, but cannot be monitored by OS. - Communicate just enough to stay fair. Data structure - Hodor: shared library. PKU, MPK. - Move entire os to a library: libOS. - a wall in Rust and C for OS kernel;
John: build systems that is resilient to attack. - harden it. - isolate malicious - security metrics - IskiOS, Silhouette, Apparition, PrivAnalyzer, - what is security: secure state defined by user.
Sandya: - The hardware-software interface - shared mem in hw and sw: DDCache. - adaptive cache granularity
Yuhao: - sw-hw co-design - point cloud, 2d, 3d, no images transfered, just point cloud.
Sree: - heterogenouts - domain specific langs - automate all the things - graph analytics - on CPUs? - accelerating general purpose data structures. - matrices. - Thur 3:30 - 5pm; 2506.
Sparse objects; a fast ccall or a traditional ccall; recognize the boundaries; encapsulate different data into same object type?
Continuous data; but mini protection for code: legal ccall; but all other illegal jumps are allowed, but will trap.
Who is my audience, and what do I need to say in order to let them understand my research.
SMAP: Supervisor Mode Access Prevention from Intel.
Ethan: reading notes; should control words condense.
-O3 opt: caches. Why use O3?
#July 19 Lele: CheriABI
zygote
android ipc: last level TLB stall. reduced.
super page promotion:
padding residual code:
page table side channel
last level cache side channel
bound check bypass attacks
Timescale functions:
sree: jemalloc.
memory reuse in a window.
i/o op as libraries, out of kernel.
Hodor loader: assign key to diff domains.
-challenge 1: pkru register mod instructions: not privileged. - sol; scan binary executable pages, undecidable, not disassemb. in loader. use debug registers: watch points: pkru instructions at runtime. 4 registers: 4 watch points: as the cache for illegal sequences. P15.
use trampoline to switch between page tables.
questions:
the slow down of > 4 debug registers.
how many isolated domains at maximum? > 16 domains
how to avoid hijacking the trampoline? Not Atomic. A final check on registers.
implicit pkru instructions?
Sandiya: use the 5 minutes QA to get yourself back to highlight your work.
bounded model checking: ROSA, in Maude (term rewriting logic). limit the number of times each process can call a sys call. INSECURE state? – undecidable or not; DAC undecidable. – number of system call the process can use. –
AutoPriv: Src – static analysis
ChronoPriv: binary – dyn priv analysis – provides input to model checker. processes + privileges + user ids.
credentials?
linux privileges: permitted priv set effective priv set.
Evaluation: /dev/mem, open r/w, priv port. send sigkill
security analysis passed, ping, su, thttpd – root has everything.
solution:
If you could revise
the fundmental principles of
computer system design
to improve security...
... what would you change?