The basic theory of encoding speech (or any sound) digitally was laid down over 50 years ago. You might do a web search on Claude Shannon from Bell Labs. The theory goes that, to record frequencies up to some limit, you need to take digital samples at no less than twice that frequency. So, in order to record telephone speech up to 4 kHz, you need to take samples at 8 kHz. Each sample can have one or more bits, with more bits translating to higher quality. 8-bit samples were deemed good enough for voice, which gives 8,000 samples per second times 8 bits, or the standard 64 kbps for a telephone conversation.
(Music is usually sampled at 44.1 kHz or faster, with 16 bits or more.)
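To make the arithmetic concrete, here is a small sketch of the sampling rule and the resulting bit rates. The function names are made up for illustration, and the two-channel (stereo) figure for music is my assumption; the text above only gives the sample rate and bit depth.

```python
def min_sample_rate(max_frequency_hz):
    """Lowest sample rate that captures frequencies up to max_frequency_hz."""
    return 2 * max_frequency_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of an uncompressed sample stream, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Telephone speech: frequencies up to 4 kHz, 8 bits per sample.
telephone_rate = min_sample_rate(4_000)
telephone_bps = pcm_bit_rate(telephone_rate, 8)

# Music: 44.1 kHz, 16 bits per sample, assuming two channels (stereo).
music_bps = pcm_bit_rate(44_100, 16, channels=2)

print(telephone_rate, telephone_bps, music_bps)
```

The telephone case works out to 8,000 samples per second and 64,000 bits per second, matching the 64 kbps figure.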
When the early work was done, compressing a bit stream wasn't practical. A digital circuit actually had to have enough bandwidth to carry the full 64 kbps. For a long time, it was more economical to just send analog signals rather than try to do digital. Today, of course, it's cheap to send gigabits over wire or fiber. But radio spectrum is scarce, so modern cell phone systems need a way to use less bandwidth for each call. Using GSM techniques, about 13 kbps can carry a compressed signal with roughly the same quality as uncompressed 64 kbps. The encoding used with CDMA systems is a bit more advanced, and can compress even further.
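As a rough illustration of what that saving means, the sketch below compares how many calls fit into a fixed bit budget at each rate. The 1 Mbps link is a hypothetical number picked for the example, and the calculation ignores signalling and error-correction overhead, so treat the results as ballpark figures only.

```python
UNCOMPRESSED_BPS = 64_000  # standard telephone rate from above
GSM_BPS = 13_000           # compressed rate quoted above

# Hypothetical capacity, chosen only to make the comparison concrete.
LINK_BPS = 1_000_000


def calls_per_link(link_bps, call_bps):
    """Whole number of calls that fit in the link, ignoring all overhead."""
    return link_bps // call_bps


compression_ratio = UNCOMPRESSED_BPS / GSM_BPS
uncompressed_calls = calls_per_link(LINK_BPS, UNCOMPRESSED_BPS)
gsm_calls = calls_per_link(LINK_BPS, GSM_BPS)

print(f"compression ratio: {compression_ratio:.2f}")
print(f"calls at 64 kbps: {uncompressed_calls}, calls at 13 kbps: {gsm_calls}")
```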
I don't have references to GSM encoding techniques, but some Google searching should reveal them. Expect to find several different techniques. GSM standards were set many years ago, when processing power inside a cell phone was limited. Today, more advanced encoding techniques are possible.
One interesting factor to consider is the time delay imposed by encoding. An uncompressed bit stream can be converted back to audio almost immediately upon receipt. GSM and CDMA techniques impose some delay, because bits have to be received and processed in groups before audio can be regenerated. That delay between when one party speaks and the other hears it is at least part of why cell phone calls sound bad. You can demonstrate this by calling yourself (from cell to landline) and listening to the delay. Perhaps future encoding schemes will improve the latency and give us better quality, in addition to being spectrum efficient.
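Here is a minimal sketch of where that grouping delay comes from. It assumes 20 ms frames (160 samples at 8 kHz), which is the figure usually given for the original GSM full-rate codec; check the GSM specs before relying on it. The function names are made up for illustration. The encoder cannot start on a frame until the last sample in it has arrived, so the frame length is a floor on the added delay, before any processing or transmission time.

```python
SAMPLE_RATE_HZ = 8_000
SAMPLES_PER_FRAME = 160  # assumption: 20 ms frames
CODEC_BPS = 13_000       # compressed rate quoted earlier


def frame_delay_ms(samples_per_frame, sample_rate_hz):
    """Time spent waiting for one full frame of samples, in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz


def bits_per_frame(codec_bps, samples_per_frame, sample_rate_hz):
    """Compressed bits produced for each frame at the given codec rate."""
    return codec_bps * samples_per_frame // sample_rate_hz


delay = frame_delay_ms(SAMPLES_PER_FRAME, SAMPLE_RATE_HZ)
frame_bits = bits_per_frame(CODEC_BPS, SAMPLES_PER_FRAME, SAMPLE_RATE_HZ)

# An uncompressed stream only has to wait for a single sample.
sample_delay = frame_delay_ms(1, SAMPLE_RATE_HZ)

print(delay, frame_bits, sample_delay)
```

Under these assumptions the codec waits 20 ms per frame, against 0.125 ms per sample for the uncompressed stream. The delay you actually hear on a call is larger, since the network adds its own.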
Also, there are many approaches to getting the most out of a given radio spectrum. Encoding efficiency is only one. Some of the more exciting ones involve spatial diversity: telling apart signals from different cell phones that use the same frequency at the same time, based only on their location. Powerful directional antennas (at the cell tower site) can distinguish between two users, allowing one frequency to be reused many times in the same neighborhood.