# Parallel Algorithm
## Intro
- Parallelism:
    - Machine parallelism: processors, pipelining, Very-Long-Instruction-Word (VLIW), etc.
    - Parallel algorithms: today's topic
- Description:
    - PRAM: Parallel Random Access Machine
    - WD: Work-Depth model
- PRAM: multiple processors, shared memory, parallel read/write, concurrent execution, etc.
- Avoid conflicts:
    - EREW: Exclusive Read, Exclusive Write
    - CREW: Concurrent Read, Exclusive Write
    - CRCW: Concurrent Read, Concurrent Write, with three resolution rules: arbitrary (any one writer succeeds), common (legal only when all writers attempt the same value), and priority (the processor with the smallest number wins)
- Example: summing \(n\) numbers by adding pairs layer by layer in a balanced binary tree.
The problems with this PRAM-style presentation are obvious:
1. It does not reveal how the algorithm will run on PRAMs with a different number of processors.
2. Fully specifying the allocation of instructions to processors requires a level of detail that might be unnecessary.
- WD (Work-Depth): describe the algorithm layer by layer, stating which operations happen in parallel at each time step without assigning them to processors.
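As a concrete (serial) sketch of this layer-by-layer view, the summation example can be simulated as follows; `parallel_sum` is a hypothetical name, and each list comprehension stands for one parallel layer:

```python
import math

def parallel_sum(A):
    """WD-style summation: layer h performs len(B) // 2 additions that
    would all run concurrently on a PRAM (simulated serially here)."""
    n = len(A)                    # assume n is a power of two
    B = list(A)
    for _ in range(int(math.log2(n))):
        # one layer: every adjacent pair is added "in parallel"
        B = [B[2 * i] + B[2 * i + 1] for i in range(len(B) // 2)]
    return B[0]
```

Note that the sketch only says what each layer does, not which processor does it; that is exactly the WD point of view.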
## Measuring the Performance of Parallel Algorithms
- Work load: the total number of operations, \(W(n)\)
- Worst-case running time: \(T(n)\)
- The following statements are equivalent (on a PRAM):
    - \(P(n) = W(n)/T(n)\) processors and \(T(n)\) time
    - \(W(n)/p\) time using any number \(p \le W(n)/T(n)\) of processors
    - \(O(W(n)/p + T(n))\) time using any number \(p\) of processors
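A quick numeric check of the \(O(W(n)/p + T(n))\) scheduling bound, a sketch only: the layer sizes below are assumed to come from the \(n = 16\) summation tree, and with \(p\) processors a layer of \(w_i\) operations takes \(\lceil w_i/p \rceil\) steps:

```python
import math

# Work per layer of the summation tree for n = 16: 8 + 4 + 2 + 1 additions
layer_work = [8, 4, 2, 1]
W = sum(layer_work)          # total work W(n) = 15
T = len(layer_work)          # depth T(n) = 4

def schedule_time(p):
    # with p processors, a layer of w operations takes ceil(w / p) steps
    return sum(math.ceil(w / p) for w in layer_work)

for p in (1, 2, 4, 8):
    assert schedule_time(p) <= W / p + T   # the claimed bound holds
```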
- Let's take a look back at the previous WD example of summation.
- Work load:
    \(W(n) = n + n/2 + n/4 + \dots + n/2^k + 1 = O(n)\), where \(k = \log n\)
WD-presentation Sufficiency Theorem
An algorithm in the WD mode can be implemented by any \(P(n)\) processors within \(O(W(n)/P(n) + T(n))\) time, using the same concurrent-write convention as in the WD presentation.
Prefix Computations¶
Illustration
- Remember: when constructing the whole tree, build it step by step (layer by layer).
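A serial sketch of the balanced-binary-tree prefix computation, assuming \(n\) is a power of two; the function name and layer arrays `B`, `C` are my own labels for the upward (sums) and downward (prefixes) passes:

```python
import math

def prefix_sums(A):
    """Balanced-tree prefix computation: O(log n) layers, O(n) work.
    Upward pass:  B[h][i] = sum of the leaves under node (h, i).
    Downward pass: C[h][i] = prefix sum ending at that node's last leaf."""
    n = len(A)                      # assume n is a power of two
    logn = int(math.log2(n))
    B = [list(A)]
    for h in range(1, logn + 1):    # upward: pairwise sums, layer by layer
        prev = B[h - 1]
        B.append([prev[2 * i] + prev[2 * i + 1]
                  for i in range(len(prev) // 2)])
    C = {logn: B[logn][:]}          # root prefix = total sum
    for h in range(logn - 1, -1, -1):
        parent = C[h + 1]
        C[h] = [B[h][0] if i == 0
                else parent[i // 2] if i % 2 == 1         # right child
                else parent[i // 2 - 1] + B[h][i]         # left child
                for i in range(len(B[h]))]
    return C[0]
```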
## Merging
- Merge two non-decreasing arrays \(A(1), A(2), \dots, A(n)\) and \(B(1), B(2), \dots, B(m)\) into another non-decreasing array \(C(1), C(2), \dots, C(n+m)\).
- To solve this problem in parallel, we need ranks: \(\mathrm{RANK}(x, B)\) is the number of elements of \(B\) smaller than \(x\). Once every element knows its rank, it can be written directly to its final position in \(C\), without altering the rest of \(C\).
Illustration
- It's OK to compute the ranks by binary search or by serial ranking, but they either increase the work load or the time consumed.
- Parallel Ranking (assume \(n = m\)):
    1. Partitioning: select every \((\log n)\)-th element, and rank each selected element in the other array by binary search.
    2. Actual ranking: the selected ranks split the problem into \(O(n/\log n)\) groups of size \(O(\log n)\); rank within each group serially.

Tip: use binary search in the first stage.
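A serial sketch of ranking-based merging, using binary search for the ranks; `parallel_merge` is a hypothetical name, and ties are broken by giving elements of `A` priority:

```python
from bisect import bisect_left, bisect_right

def parallel_merge(A, B):
    """Merge by ranking: A[i] lands at position i + RANK(A[i], B).
    Each placement is independent, so all could run concurrently."""
    C = [None] * (len(A) + len(B))
    for i, x in enumerate(A):           # each iteration is independent
        C[i + bisect_left(B, x)] = x    # RANK = #{b in B : b < x}
    for j, y in enumerate(B):
        C[j + bisect_right(A, y)] = y   # ties count a <= y, so A goes first
    return C
```

Because no two elements map to the same slot, the writes never conflict, which is what makes the parallel version legal even under EREW.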
## Maximum Finding
- Simple solution: replace "+" by "max" in the summation algorithm \(\Rightarrow T(n) = O(\log n)\), \(W(n) = O(n)\).
- Or we can compare all pairs of elements in parallel. The write conflicts are handled by the common CRCW rule: every comparison that element \(i\) loses writes the same value into a "loser" flag for \(i\), so the concurrent writes are legal. This gives \(T(n) = O(1)\) but \(W(n) = O(n^2)\).
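A serial simulation of the all-pairs idea; the "loser" flags model the common CRCW rule, since every concurrent writer stores the same value `True`:

```python
def crcw_max(A):
    """All-pairs maximum: n(n-1)/2 independent comparisons, O(1) parallel
    time but O(n^2) work. Every losing element gets its flag set to True;
    all writers agree on the value, as the 'common' CRCW rule requires."""
    n = len(A)
    loser = [False] * n
    for i in range(n):                  # all pairs could run concurrently
        for j in range(i + 1, n):
            if A[i] < A[j]:
                loser[i] = True
            else:
                loser[j] = True
    return A[loser.index(False)]        # the only element that never lost
```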
- Then here comes the question: is it possible to achieve \(O(1)\) time with only \(O(n)\) work?
Solution: random sampling.
Theorem: The algorithm finds the maximum among \(n\) elements. With very high probability it runs in \(O(1)\) time and \(O(n)\) work. The probability of not finishing within these time and work bounds is \(O(1/n^c)\) for some positive constant \(c\).
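The actual \(O(1)\)-time PRAM algorithm is more involved; the sketch below only illustrates the sampling intuition (the maximum of a random sample beats almost all elements with high probability, so very few candidates survive each round), and `sample_max` with its square-root sample size is my own simplification:

```python
import math
import random

def sample_max(A):
    """Sampling intuition only, not the real O(1)-time PRAM algorithm:
    find the max of a random sample, discard everything it beats, repeat.
    With high probability the candidate set collapses in a few rounds."""
    candidates = list(A)
    while len(candidates) > 1:
        k = max(1, math.isqrt(len(candidates)))
        m = max(random.sample(candidates, k))   # stand-in for all-pairs max
        # keep only elements that beat the sample maximum (whp very few)
        candidates = [x for x in candidates if x > m] or [m]
    return candidates[0]
```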