A bit pattern of length $n$ is (here) just a vector consisting of $0$'s and $1$'s of length $n$. These are the messages that are actually sent and can be disturbed.

A code of length $n$ is just a set of such vectors, each of which stands for a message. In the case of a linear code, such as the Hamming example, you have a generator matrix $G$ and a message $m$ of length $4$ (so a 4-bit row vector), and you compute $mG$ (the message multiplied on the right by $G$) to get the code word. Because $G$ is a $4 \times 7$ matrix in the systematic form $[I_4 \mid P]$, the result is a $7$-bit vector whose first $4$ bits are the message vector; the last $3$ bits are called the *check bits* or *parity bits*.
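As a minimal sketch of the encoding step, here is $mG$ computed modulo 2 in Python. The matrix `G` below is an assumption: it is one possible systematic generator matrix of a $(7,4)$ Hamming code, not necessarily the one from the question, and `encode` is just an illustrative helper name.

```python
# Assumed systematic generator matrix G = [I_4 | P] of a (7,4) Hamming code.
# This particular P is an illustrative choice, not taken from the question.
G = [
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(message):
    """Return the code word m*G over GF(2) for a 4-bit message m."""
    # zip(*G) iterates over the columns of G; each output bit is a dot product mod 2.
    return [sum(m * g for m, g in zip(message, column)) % 2 for column in zip(*G)]

codeword = encode([1, 0, 1, 1])
print(codeword)      # [1, 0, 1, 1, 0, 1, 0]
print(codeword[:4])  # [1, 0, 1, 1], the message itself
print(codeword[4:])  # [0, 1, 0], the check bits
```

The first four bits reproduce the message because the left block of `G` is the identity matrix.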

Now the Hamming code has $16$ code words, because we have $2^4 = 16$ messages to send. So the code words are **not** just the rows of the matrix $G$, but also all their linear combinations (that is, all sums of subsets of the rows, taken modulo 2). The transmission can flip bits randomly, and the receiver receives some $7$-bit vector that might or might not be a code word (it could be any of the $2^7=128$ vectors).

Now if you take a code word and flip exactly *one* bit, the result is one of $7$ vectors (depending on the position where the flip occurred). Being perfect (for $t=1$, as we have for Hamming) means two things. First, all code words together with all 1-flips of code words form exactly the set of all possible receivable 7-bit words. Second, the flip is uniquely reconstructible: no two distinct code words are $2$ or fewer flips apart, so if you change a code word $w_1$ by one flip into $w_1'$, you can be certain that there is no other code word $w_2$ that a single flip also changes into $w_1'$. In other words, the (closed) Hamming balls of radius $1$ around the code words form a disjoint cover of the $2^7$ vectors of length $7$. This is plausible, as each such ball has $1+7=8$ vectors and $16 \times 8 = 2^7$.
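The counting argument can also be checked by brute force. The sketch below assumes one particular systematic generator matrix `G` of a $(7,4)$ Hamming code (an illustrative choice, not necessarily the one from the question) and the helper names `encode` and `ball` are made up for this example. It builds all $16$ code words, forms the radius-1 ball around each, and compares the sizes.

```python
from itertools import product

# Assumed systematic generator matrix of a (7,4) Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(message):
    """Return the code word m*G over GF(2)."""
    return [sum(m * g for m, g in zip(message, column)) % 2 for column in zip(*G)]

# All 2^4 messages give all code words.
codewords = {tuple(encode(m)) for m in product([0, 1], repeat=4)}

def ball(word):
    """Closed Hamming ball of radius 1: the word itself plus its 7 single-bit flips."""
    result = {word}
    for i in range(len(word)):
        flipped = list(word)
        flipped[i] ^= 1
        result.add(tuple(flipped))
    return result

balls = [ball(c) for c in codewords]
covered = set().union(*balls)          # every vector lying in at least one ball
total = sum(len(b) for b in balls)     # ball sizes added up, counting overlaps twice

print(len(codewords), total, len(covered))  # 16 128 128
```

Since the sum of the ball sizes equals the size of their union, no vector lies in two balls, and since the union has $128$ elements, every 7-bit vector is covered.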

So when we use a Hamming code we can correct any single error. With $2$ errors we can only detect that something went wrong (the received vector is not a code word), and if we try to correct it anyway we get a wrong decode. With three errors one code word can be transformed into another, which you can already see in the first $2$ rows of $G$ (flip bits 1, 2 and 6, say).
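To see the three cases side by side, here is a small sketch using nearest-code-word decoding. The matrix `G` is an assumed systematic generator matrix of a $(7,4)$ Hamming code, chosen so that its first two rows differ in positions 1, 2 and 6; it need not be the matrix from the question. The helpers `decode` and `flip` are hypothetical names for this illustration, and brute-force search over all code words is only sensible for a code this small.

```python
from itertools import product

# Assumed systematic generator matrix (illustrative choice); rows 1 and 2
# differ exactly in positions 1, 2 and 6.
G = [
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(message):
    """Return the code word m*G over GF(2)."""
    return tuple(sum(m * g for m, g in zip(message, column)) % 2 for column in zip(*G))

codewords = [encode(m) for m in product([0, 1], repeat=4)]

def distance(a, b):
    """Hamming distance: number of positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(received):
    """Nearest-code-word decoding by brute force over all 16 code words."""
    return min(codewords, key=lambda c: distance(c, received))

def flip(word, positions):
    """Flip the bits at the given 1-based positions."""
    out = list(word)
    for p in positions:
        out[p - 1] ^= 1
    return tuple(out)

sent = tuple(G[0])             # code word of the message 1000
one = flip(sent, [3])          # one error
two = flip(sent, [3, 5])       # two errors
three = flip(sent, [1, 2, 6])  # three errors

print(decode(one) == sent)                    # True: single error corrected
print(two in codewords, decode(two) == sent)  # False False: detected, but miscorrected
print(three == tuple(G[1]))                   # True: landed on another code word
```

In the three-error case the receiver sees a valid code word and has no way to notice that anything went wrong.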

Normally you'd use a *check matrix* $H$ to decode a linear code: if you multiply this matrix by a code word you get $0$, and if a single error occurs you get a nonzero vector, the *syndrome*, which equals the column of $H$ at the flipped position and so tells you which position to correct (syndrome decoding).
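A minimal sketch of syndrome decoding, under the assumption that the generator matrix is the systematic $G = [I_4 \mid P]$ with the rows of $P$ equal to $111, 101, 110, 011$ (an illustrative choice, not necessarily the one from the question), so that $H = [P^T \mid I_3]$. The function names `syndrome` and `correct` are made up for this example.

```python
# Assumed check matrix H = [P^T | I_3], matching the illustrative G = [I_4 | P]
# whose P has rows 111, 101, 110, 011.
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

# The 7 columns of H are exactly the 7 nonzero 3-bit vectors, each occurring once.
columns = [tuple(col) for col in zip(*H)]

def syndrome(word):
    """Return H * word over GF(2) as a 3-bit tuple."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

def correct(received):
    """Correct at most one flipped bit using the syndrome."""
    s = syndrome(received)
    if not any(s):
        return list(received)      # zero syndrome: already a code word
    position = columns.index(s)    # the column equal to the syndrome marks the error
    fixed = list(received)
    fixed[position] ^= 1
    return fixed

codeword = [1, 0, 1, 1, 0, 1, 0]   # code word of the message 1011 under the assumed G
received = codeword.copy()
received[2] ^= 1                   # flip the third bit

print(syndrome(codeword))              # (0, 0, 0)
print(syndrome(received))              # (1, 1, 0), the third column of H
print(correct(received) == codeword)   # True
```

The lookup `columns.index(s)` always succeeds here because every nonzero syndrome appears as a column of this `H`; with two or more errors it still returns some position, but the "correction" then yields the wrong code word.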