# CS 127 Spring 2018 - Homework Zero

Due on **Thursday January 25 at midnight** but I recommend you do this homework even before the first lecture.

Please write the names of yourself and your collaborators below: (this page does not contain any solutions, and so is not viewed by the people grading your problems on gradescope.)

__Collaboration policy for homework zero:__ You can work on this problem set and submit it in **groups of one or two students**. In addition you can discuss these problems with other students that are considering taking this course. However, each pair needs to write up their solution on their own, and you need to write down the names of your collaborators below.


__Submitters:__ (List your names and HUIDs here)

__Collaborators:__ (List here anyone you discussed problems or ideas for solutions with)

If submitting late, for every pair member add a sentence of the form __"Submitting X days late, Y has used (including this time) Z days out of the 6 total late days"__

For any questions or clarifications, please see the Piazza board. See the [course syllabus](https://www.boazbarak.org/cs127/syllabus/) for policies.

This  homework needs to be  submitted as a PDF document. The PDF can be  generated using either markdown or LaTeX. The Markdown source for this homework is posted online.

\newpage

## Probability questions

As we'll see in the first lecture, much of cryptography relies on probability theory, and so basic knowledge of probability will be essential. The [CS 121 probability review](https://www.introtcs.org/public/lec_15_probability.pdf) is one source for the probability theory we will need.
During the course  I will assume you are familiar with (or can pick on your own)  all notions presented there. However, if you find these questions unfamiliar or difficult you should not despair!
There are plenty of sources on probability on the web, and in particular Harvard STAT 110 and its textbook are of course  wonderful resources.
If any of the notation is unfamiliar, looking at the lecture notes might help, and otherwise feel free to ask questions on Piazza, even before the semester starts!


__Notation:__   While often in probability theory people use the name "random variable" for a distribution over the set $\mathbb{R}$ of real numbers, it will be convenient for us to generalize this to arbitrary sets, and hence we will use the following notation. We define a _random variable_ or _distribution_  $X$ over a finite set $S$ to  correspond to the probabilistic experiment where we draw an element $x$ from $S$ with some probability, which we denote by $\Pr[ X=x]$.    All we need from these probabilities is that they are non-negative and sum up to one. (One can also consider distributions over infinite sets, though almost always in this course we will restrict ourselves to the finite case.) We use $x \leftarrow_R X$ as shorthand for saying that $x$ is drawn accoring to the distribution $X$. If $f:S \rightarrow T$ is a function, then the random variable $f(X)$ corresponds to the probabilistic experiment where we draw $x \leftarrow_R X$ and output $f(x)$.

__Question 0.__ Join the course Piazza board at [https://piazza.com/harvard/spring2018/cs127](https://piazza.com/harvard/spring2018/cs127).


\newpage

In this question we will study the  notion known as [Total Variation](https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures) or statistical distance. It is a basic notion of distance between probability distribution, and its computational analog is fundamental for cryptography.


__Question 1.1:__ If $X$ and $Y$ are two distributions over the same set $S$, we define $\Delta(X,Y)$ (also known as the _statistical_ or _total variation_ distance of $X$ and $Y$) to be $\sum_{x\in S} | \Pr[X=x]- \Pr[Y=x] |$. Prove that for every function $f:S \rightarrow [0,1]$,  $| \mathbb{E}[f(X)] - \mathbb{E}[f(Y)] | \leq \Delta(X,Y)$.

__Solution 1.1:__


__Question 1.2:__ Prove that the statistical distance satisfies the _triangle inequality_: For every three distributions $X,Y,Z$ over the same set $S$, $\Delta(X,Z) \leq \Delta(X,Y)+ \Delta(Y,Z)$.

__Solution 1.2:__

\newpage


We will now use the notion of statistical distance to study one of the most basic questions in probability theory: if we are given a coin that is either completely unbiased, or has bias $\epsilon>0$ towards "heads", how many tosses will it take for us to distinguish between the two cases.

__Question 2:__ Prove that this can be done in at most $O(1/\epsilon^2)$ coin tosses. Specifically prove that if $k>100/\epsilon^2$, then there exists some function $f:\{0,1\}^k \rightarrow \{0,1\}$ such that if $X$ is the uniform distribution over $\{0,1\}^k$ and $Y$ is the distribution obtained by tossing $k$ independent coins, each equalling $1$ with probability $1/2+\epsilon$ and equalling $0$ with probability $1/2-\epsilon$, then $\Pr[ f(X)=0 ] > 0.9$ and $\Pr[ f(Y) = 1 ] > 0.9$, and hence $f$ can distinguish between $X$ and $Y$.^[__Hint:__ Use the Chernoff bound.]

__Solution 3:__

\newpage


We now study the converse problem: showing a _lower bound_ on the number of coin tosses needed.

__Question 3.1:__ (This problem involves some  notation, so take your time reading it carefully and parsing what it means.) For every $k \in \mathbb{N}$, $0 \leq i \leq k$, and $\epsilon>0$, let $X^{k,\epsilon}_i$ be the following distribution over $\{0,1\}^k$: the first $i$ bits of $X^{k,\epsilon}_i$ are chosen independently and uniformly at random, and the last $k-i$ bits are chosen independently at random with but each is equal to $1$ with probability $1/2+\epsilon$ and equal to $0$ with probability $1/2-\epsilon$. Prove that for very $k$, $i<k$ and $\epsilon$, $\Delta(X^{k,\epsilon}_i,X^{k,\epsilon}_{i+1}) \leq 10\epsilon$.

__Solution 3.1:__


__Question 3.2:__ Prove that, in the notation of Question 2, $\Delta(X,Y) \leq 10k\epsilon$. Show that this implies if $k< 1/(100\epsilon)$, there does not exists a function $f:\{0,1\}^k \rightarrow \{0,1\}$ such that $\Pr[ f(X)=0 ] > 0.9$ and $\Pr[ f(Y) = 1 ] > 0.9$. Use your solution for Question 3.1 (this is a special case  of the [Hybrid Argument](https://en.wikipedia.org/wiki/Hybrid_argument_(Cryptography)) which is used time and again in cryptography).

__Solution 3.2:__

\newpage

__Question 4 (bonus problem: optional and more challenging):__ Prove that in fact $\Delta(X,Y) \leq O(\sqrt{k}\epsilon)$.^[__Hint:__  One way to prove this is to use the notion of KL divergence which is another notion of distance between the distributions that satisfies the triangle inequality. You can show that the KL divergence of $X^{k,\epsilon}_i$ and $X^{k,\epsilon}_{i+1}$ is $O(\epsilon^2)$ and then use the Pinsker Inequality that shows that the statistical distance between two distributions is at most the square root of their KL divergence. You can read about both KL divergence and the Pinsker Inequality in Wikipedia as well in several other sources.] Conclude that the method of Question 2 is essentially _optimal_ in the sense that  there exist some absolute constant $\delta$ (independent of $\epsilon$)  such that  for every $\epsilon$ and distribution $X,Y$ as in Question 2, if $k< \delta/\epsilon^2$ then there does not exists a function $f:\{0,1\}^k \rightarrow \{0,1\}$ such that $\Pr[ f(X)=0 ] > 0.9$ and $\Pr[ f(Y) = 1 ] > 0.9$.

__Solution 4:__

\newpage


## And now for a little crypto


__Question 5 (very important! but no grades :) ):__ Read the lecture notes for [lecture 1: introduction](https://www.intensecrypto.org/public/lec_01_introduction.pdf).

__Solution 5:__ Write here "I solemnly swear that I read every word of the lecture notes". (Just kidding, I trust you, and in any case I expect you to read every word _twice_.)


__Question 6:__ Prove that an encryption scheme $(E,D)$ with messages of length $\ell$ and keys of length $n$ is _perfectly secret_ if and only  for every $x,x' \in \{0,1\}^\ell$, $\Delta(Y^x,Y^{x'})=0$, where $Y^x$ is the distribution  obtained by choosing a random $k \leftarrow_R \{0,1\}^n$ and outputting $E_k(x)$.

__Solution 6:__