Implementing Telnet
Telnet Basics
Telnet was designed to mimic character-based terminals. The majority of the data sent over a telnet connection is pure character data. Each byte is generally represented as a single character (although these days other encodings, such as UTF-8, are usable on telnet). The vast majority of telnet servers or clients can work with software that doesn't use the telnet protocol and just sends raw character data.
However, telnet does include a framework for feature negotiation and non-character-data communication. It is these features that take up most of the work of a complete and correct telnet implementation.
Protocol Details
Telnet uses the value 255 (the largest value that can be held in an unsigned byte) to control its protocol. This byte is called the IAC byte in the telnet protocol.
Every telnet request outside of the normal character data stream begins with an IAC byte. This means that the protocol handlers only ever need to check for this byte to detect when a telnet request starts, and no other multi-byte combinations are needed. This makes telnet fairly easy to implement compared to some other protocols.
There are four main classes of telnet request: IAC escaping, feature negotiation, sub-requests, and "other" requests.
IAC Escaping
A telnet IAC escape sequence is simply IAC IAC. This means that the value 255 (just one) should be inserted into the character stream. This allows character streams with binary data to send bytes with the value 255.
This request is VERY important! It is absolutely essential that all normal character stream bytes with the value of 255 be sent as an IAC IAC sequence. If this is not done then miscellaneous output from the server or input from the client could be interpreted as telnet codes. This is a security risk. For example, on a server that relays client input to other clients, one client could send a telnet sequence that causes display malfunctions on every client that receives it. An example of this kind of software would be a MUD server. Always escape IAC bytes in the normal character stream.
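The escaping rule above can be sketched in a few lines of Python. This is only an illustration; the function name escape_iac is an assumption made for this example and does not come from any particular telnet library.

```python
IAC = 255  # the telnet "Interpret As Command" byte


def escape_iac(data: bytes) -> bytes:
    """Double every byte of value 255 so the receiver treats it as data."""
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))


# Escape outgoing character data just before writing it to the socket.
outgoing = escape_iac(b"price: \xff units")
```

Escaping should be applied only to character data (and sub-request payloads), never to the telnet requests you generate yourself, or their leading IAC bytes would be doubled as well.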
Feature Negotiation
Telnet feature negotiation is used to enable and disable various protocol features.
One of the more visible examples would be password prompts. In the traditional arrangement, a client does not echo the characters you type - instead, it simply sends them to the server, and the server is responsible for sending them back to be displayed on your screen. For many modern applications, this behavior is turned off due to the obvious performance problems, and the client displays what you type itself. For password entries, however, the server re-enables this behavior, causing your client to stop printing what you type, and then deliberately does not send the characters back. This results in you typing in text that is sent to the server but is not displayed on your screen, protecting the privacy of your password. Another example would be any non-line-based application interface, such as a text editor, where keys are interpreted as commands and are not to be displayed.
All telnet feature negotiations are three bytes in length: the IAC byte, followed by a mode, and then a feature code. The mode is one of DO, DONT, WILL, or WONT. The feature codes are numbers that represent the various features available, such as ECHO, MCCP, ZMP, or many others.
The modes have specific meanings. When DO is received, it means that the other end of the link is requesting that this end enables a particular feature. Likewise, DONT requests that the feature be turned off. WILL is one of two things - it is either an offer to enable the feature (i.e., "I can do this, do you want me to?") or a confirmation that a feature will be used after it has received a DO request. WONT is a refusal to enable the feature.
Just to make things slightly confusing, each feature has its own idea of which end of the connection may request, whether it should be requested first or offered first, and so on.
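As a rough illustration of the three-byte format, here is a Python sketch that builds negotiation requests and declines features it does not support. The byte values are the standard telnet ones; the function names negotiation and refuse are hypothetical, and the refusal policy shown is just one simple assumption about how an implementation might respond.

```python
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251
ECHO = 1  # feature code for the ECHO option


def negotiation(mode: int, feature: int) -> bytes:
    """Build a three-byte negotiation: IAC, a mode, then a feature code."""
    if mode not in (DO, DONT, WILL, WONT):
        raise ValueError("mode must be DO, DONT, WILL, or WONT")
    return bytes([IAC, mode, feature])


def refuse(mode: int, feature: int) -> bytes:
    """Reply that declines a request or offer for an unsupported feature."""
    if mode == DO:      # asked to enable something we cannot do
        return negotiation(WONT, feature)
    if mode == WILL:    # offered something we do not want
        return negotiation(DONT, feature)
    return b""          # DONT / WONT for a feature that is already off


# A server announcing that it will handle echoing (e.g., at a password prompt).
will_echo = negotiation(WILL, ECHO)
```

Replying only to DO and WILL, and staying silent on DONT and WONT for features that are already disabled, avoids the two ends acknowledging each other forever.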
Sub-Requests
Sub-requests are possibly the most complicated feature. A sub-request is, essentially, a block of data. The meaning of this data is entirely arbitrary. You don't even know how long this data is until you have received all of it.
A sub-request begins with three bytes: IAC SB followed by a feature code. SB is the value 250. The feature code tells you how to interpret the sub-request data. All of the following bytes are the data, until you receive IAC SE. SE is the value 240.
It must be noted that since the sub-request is arbitrary data, and that data could include the byte sequence 255 240, the IAC byte must be escaped using the IAC IAC routine. It is this small issue that makes implementing sub-requests more difficult than the other telnet requests.
It is vital that implementations be very careful with sub-request data. First, because you cannot know the length of the sub-request until all of the data comes in, you must buffer it. If you do not put an upper bound on the buffer size, buggy or malicious software could cause your code to exhaust all available memory buffering up an infinitely long sub-request.
Second, it is possible for a sub-request to have invalid data due to either buggy or malicious software. Always check the data to make sure that it fits the specifications for the feature it is part of. For example, the NAWS feature expects four bytes of data in its sub-request - two bytes for the width (the column count) followed by two bytes for the height (the row count). You must check that the data you receive with a NAWS sub-request is exactly four bytes long. (And remember, IAC IAC sequences are really only one byte of data with the value 255.)
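To make the framing and validation concrete, here is a small Python sketch that builds a sub-request and validates a NAWS payload. The helper names build_subrequest and parse_naws are assumptions for this example. The payload handed to parse_naws is assumed to have had its feature code removed and its IAC IAC sequences already collapsed.

```python
import struct

IAC, SB, SE = 255, 250, 240
NAWS = 31  # feature code for Negotiate About Window Size


def build_subrequest(feature: int, payload: bytes) -> bytes:
    """Frame a payload as IAC SB <feature> ... IAC SE, escaping 255 bytes."""
    escaped = payload.replace(b"\xff", b"\xff\xff")
    return bytes([IAC, SB, feature]) + escaped + bytes([IAC, SE])


def parse_naws(payload: bytes) -> tuple:
    """Validate an unescaped NAWS payload and return (columns, rows)."""
    if len(payload) != 4:
        raise ValueError("NAWS payload must be exactly 4 bytes")
    # Two big-endian 16-bit values: width first, then height.
    return struct.unpack("!HH", payload)


# An 80x24 window, as a client might report it.
message = build_subrequest(NAWS, struct.pack("!HH", 80, 24))
```

Note that a window 255 columns wide produces a payload byte of 255, which is exactly the case where forgetting the IAC IAC escape would corrupt the sub-request.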
Other Requests
All other requests are two-byte values. If you receive an IAC byte followed by another byte that you do not expect then they should be ignored. Check the telnet specification for all of the other available telnet requests. Some of them are rather useful to MUD developers.
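For reference, the two-byte requests defined by the base telnet specification can be kept in a small lookup table. The Python sketch below uses the standard byte values; the table and function names are made up for this example, and anything not in the table is simply reported as unknown so the caller can ignore it.

```python
IAC = 255

# Second byte of the standard two-byte telnet requests.
COMMAND_NAMES = {
    241: "NOP",  # no operation
    242: "DM",   # data mark
    243: "BRK",  # break
    244: "IP",   # interrupt process
    245: "AO",   # abort output
    246: "AYT",  # are you there
    247: "EC",   # erase character
    248: "EL",   # erase line
    249: "GA",   # go ahead
}


def describe_command(byte: int) -> str:
    """Name the request that follows an IAC byte, or mark it as unknown."""
    return COMMAND_NAMES.get(byte, "unknown")
```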
Network Anarchy
One of the most frustrating and yet common problems I experience when looking at other programmers' network code is a complete lack of understanding on the programmers' parts as to how networking works. It is vital to understand the fundamentals of networking, particularly the TCP connections that telnet works over, before you attempt to write stable, secure network code.
The most important part of TCP to keep in mind is the packets. All data sent over the network is encoded in these packets. On the actual network, packets can get lost, get rearranged into different orders, and get corrupted. Protocols like UDP require that the programmer handle these issues. TCP takes care of them for you.
However, TCP does not offer any guarantees as to how or when the packets will be available. Take, for example, a client that is sending a single line of text to the server to be interpreted as a command. Many, many buggy telnet servers will make a single read() or recv() system call and always assume that lines are sent whole. However, that line may be broken into two packets. If only one packet has arrived at the time the software reads in from the socket, it will only get half a line. Buggy servers will either then try to execute the incomplete command (with potentially disastrous results) or throw it out as invalid, only to then exhibit the same problem with the second half of the command when its packet comes in. Alternatively, the client might send several lines of text which may either be combined into a single packet or the multiple packets might all arrive and the server could read them all at once, possibly interpreting them as a single command.
The same problem can affect telnet requests as well. Take a simple three-byte request like IAC WILL ECHO. It is possible that only the first two bytes will arrive in a packet and the server will read those before the next packet with the third byte arrives. If the software tries to evaluate the request in this case it will (hopefully, at least) assume an invalid request was sent.
This simple fact is not understood by far too many MUD developers, unfortunately. I have seen some of them obstreperously argue against me on this. They assume that since they have never seen this problem that it doesn't exist. They are merely lucky. In most cases, whenever a client sends a chunk of data, it will be sent as a single, whole packet, and this will be received by the server as a single, whole packet. When large amounts of data, more than can fit in a single packet, are sent across a network with high latency, congestion, or packet loss, these once-working applications will suddenly start misbehaving wildly.
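The fix for the line problem is to keep leftover bytes between reads and only hand complete lines to the command interpreter. Here is a minimal Python sketch, assuming lines end in a newline with an optional carriage return before it; the class name LineBuffer and its max_length limit are inventions for this example.

```python
class LineBuffer:
    """Collects received chunks and yields only complete lines."""

    def __init__(self, max_length: int = 4096):
        self._pending = bytearray()
        self._max_length = max_length  # assumed cap on an unfinished line

    def feed(self, chunk: bytes) -> list:
        """Add one recv() result; return the lines completed so far."""
        self._pending.extend(chunk)
        # Everything after the last newline is an incomplete line.
        *lines, rest = bytes(self._pending).split(b"\n")
        if len(rest) > self._max_length:
            raise ValueError("line too long")
        self._pending = bytearray(rest)
        return [line.rstrip(b"\r") for line in lines]


buf = LineBuffer()
first = buf.feed(b"look no")                   # half a line: nothing to run yet
second = buf.feed(b"rth\r\nsay hi\r\nge")      # two lines complete, one pending
third = buf.feed(b"t all\n")                   # the pending line completes
```

The same chunks could arrive split at any other byte boundary and the lines returned would be identical, which is the property the buggy servers described above lack.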
Would it not be far better to have a correct telnet implementation that can handle these real life scenarios from the very beginning instead of leaving the stability and security of your client or server application up to pure luck?
Other Considerations
This document, while focusing on telnet, also addresses the problems found using protocols implemented on top of telnet, such as ANSI terminal control codes (e.g., color) and protocols such as MXP and MCP. While handling these can be rather tricky compared to the simplicity of telnet, the techniques provided are the same as what you would use to correctly decode ANSI sequences or MXP and MCP commands.
Implementing Telnet Correctly
Many years ago I had a question: how do I implement telnet correctly? I too had started out with a very simple implementation of telnet that did not handle the scenarios above correctly. I was "lucky" in that my test environment had occasional packet loss due to low quality hardware, and I was noticing these problems fairly early on.
Fortunately, the Open Source software movement has been very successful. There are billions of lines of code available for developers to study and learn from. I decided to study (but not copy, due to copyright concerns, which you should always take very seriously!) the GNU telnet server implementation. What I found was a simple state machine. Some time later, when studying some other telnet implementations, I found that this mechanism was used in them, as well. Another approach, albeit one I'm less fond of, is the stack analyzer approach.
Absolute Wrong Way
I have seen more code than I ever cared to that handles telnet absolutely wrong. Just to make sure it's quite clear what the wrong way is, here's a description of the code I have found in many MUD servers and clients.
The method used by much software is to simply read in a large chunk of data (say, 1024 to 4096 bytes) and process it all in one go. That processing is usually a variation of the stack method mentioned below.
This simply does not work! If a telnet request sequence (or an ANSI terminal control sequence) happens to be broken across two recv calls then it simply fails to be processed correctly. That isn't acceptable at all, under any circumstance, when simply coding the telnet layer of the software properly would completely avoid the problem!
Another problem I see in many implementations, even those that use either of the below approaches, is a failure to handle the IAC IAC escape sequence. If those are not handled properly then certain (albeit likely rare for most MUDs) data sequences will not be interpreted correctly, causing various problems. Again, it's not a good idea to accept erroneous behavior when correct behavior can be had by simply using correct code.
Stack Analyzer
The stack analyzer approach works by buffering all received bytes. Every time it receives data the software appends the data to the end of its received buffer. It then analyzes the front of this buffer to see if it comprises a valid telnet command.
The analyzer generally works by using a number of nested if/else clauses. It checks if the first byte is IAC. If so, it checks the next byte to see what kind of telnet request it is, and so on. If it hits the end of the buffer before finding a complete telnet request, it assumes that it has only received a partial request, and simply stops. When more data is received, it tries again.
If the front of the buffer does not start with an IAC byte, it is considered normal character data. All of the buffer up until the first IAC byte, or the end of the buffer, is removed from the buffer and sent on its merry way, while the remaining data in the buffer is moved up to the front. The analyzer then starts over.
It's important to note that the character data may need to be placed in its own buffer. For example, on a server that only accepts commands as lines of text, it has to buffer up the character data until it receives a newline. For a client, the data must be put into the output buffer.
This mechanism is unfortunately complex. The nested if/else clauses can result in horrifically difficult to read code, and re-analyzing the buffer over and over is inefficient. The mechanism certainly works, but it suffers from potential efficiency problems that the state machine approach does not.
This method of parsing telnet is very similar to how many programming language compilers work. They break the data stream into tokens (like special symbols, variable names, strings, numbers, and so on) and then analyze the available tokens to find valid patterns to interpret. Although the technique is about the best you can get for a compiler, it's not necessarily the best for parsing a network stream - compilers put a lot of effort into error handling to help the programmer. Your telnet code's job is not to help other people debug their poorly written programs, but to efficiently handle valid telnet data, and therefore should be optimized for that use-case.
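For comparison, here is what a stack analyzer might look like as a Python sketch. The class name, the event tuples it returns, and the method names are all assumptions made for illustration. It also omits the upper bound on buffer size that a real implementation needs.

```python
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240


class StackAnalyzer:
    """Buffers all input and repeatedly re-parses the front of the buffer."""

    def __init__(self):
        self.buffer = bytearray()

    def _find_subrequest_end(self) -> int:
        """Index of the IAC that starts IAC SE, or -1 if not yet received."""
        i = 2  # skip the leading IAC SB
        while i + 1 < len(self.buffer):
            if self.buffer[i] != IAC:
                i += 1
            elif self.buffer[i + 1] == SE:
                return i
            else:
                i += 2  # skip an IAC IAC escape (or a stray IAC pair)
        return -1

    def feed(self, chunk: bytes) -> list:
        self.buffer.extend(chunk)
        events = []
        buf = self.buffer
        while buf:
            if buf[0] != IAC:
                # Plain character data up to the next IAC or end of buffer.
                end = buf.find(IAC)
                if end < 0:
                    end = len(buf)
                events.append(("text", bytes(buf[:end])))
                del buf[:end]
            elif len(buf) < 2:
                break  # partial request: wait for more data
            elif buf[1] == IAC:
                events.append(("text", b"\xff"))
                del buf[:2]
            elif buf[1] in (DO, DONT, WILL, WONT):
                if len(buf) < 3:
                    break
                events.append(("negotiate", buf[1], buf[2]))
                del buf[:3]
            elif buf[1] == SB:
                end = self._find_subrequest_end()
                if end < 0:
                    break
                data = bytes(buf[2:end]).replace(b"\xff\xff", b"\xff")
                events.append(("subrequest", data))
                del buf[:end + 2]
            else:
                events.append(("command", buf[1]))
                del buf[:2]
        return events
```

Even in this small form the inefficiency is visible: every call that arrives in the middle of a sub-request rescans the whole pending sub-request from the start.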
State Machine
The state machine approach relies on remembering the state of the received data stream. It does not require a buffer for the normal request data, although it does require a buffer for sub-request data and the normal character stream if it requires further processing, such as waiting for full lines of text.
The code works by scanning over the received data character-by-character. Then, depending on the current state, it handles the character in various ways. This is usually implemented as a C switch construct inside of a loop.
The system begins in the Text state. All characters seen in the Text state are copied into the character buffer or handled immediately for display/processing, depending on your application's need. As soon as a single IAC byte is seen, however, the state is changed to Iac.
Now, in the Iac state, characters are handled differently. Bytes like SB and DO are checked for. If an unexpected byte value is found while in the Iac state then the code assumes an unknown two-byte telnet request has come in, ignores it, and sets the state back to Text. If an IAC byte is seen, it then handles the value 255 as a normal character input and the state is set back to Text. That is, it turns an IAC IAC escape sequence into a character value of 255, which is the correct behavior.
When a negotiation code (DO, DONT, WILL, WONT) is seen in the Iac state, the state is set to Do, Dont, Will, or Wont as appropriate. When a byte is then seen in one of those states it is considered a feature code and handled appropriately and the state is set back to Text. So, IAC WILL ECHO puts the handler in the Iac state, then the Will state, then processes the "will echo" command and resets to the Text state, as it should.
The only difficult state to handle is when an SB byte is received during the Iac state. The state is then set to Subrequest. All bytes then seen while in this state are added to the sub-request buffer, except for IAC bytes. When an IAC byte is seen, the state changes to SubrequestIac. In this state, only one of two bytes is expected to be seen. If another IAC is seen, a single byte of value 255 is put on the sub-request buffer and the state changes back to Subrequest, thereby properly handling an IAC IAC escape sequence in the sub-request. If an SE byte is seen, the state is switched back to Text, and the sub-request buffer is evaluated; the first byte in this buffer will be the feature code of the sub-request, which informs the software as to what the data is supposed to mean. If any other byte value is seen in the SubrequestIac state, it should be considered an error, the state set back to Text, and the sub-request buffer cleared.
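The states described above translate fairly directly into code. What follows is a Python sketch rather than a reference implementation: the class name, the event tuples, and the MAX_SUBREQUEST limit are assumptions, and the separate Do, Dont, Will, and Wont states are collapsed into a single Negotiate state that remembers which mode byte was seen.

```python
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240
MAX_SUBREQUEST = 1024  # assumed upper bound on buffered sub-request data


class TelnetStateMachine:
    def __init__(self):
        self.state = "Text"
        self.mode = 0              # DO/DONT/WILL/WONT awaiting its feature code
        self.subrequest = bytearray()
        self.overflowed = False    # set when a sub-request exceeds the bound

    def _store(self, byte: int) -> None:
        if len(self.subrequest) < MAX_SUBREQUEST:
            self.subrequest.append(byte)
        else:
            self.overflowed = True

    def feed(self, chunk: bytes) -> list:
        events, text = [], bytearray()

        def flush_text():
            if text:
                events.append(("text", bytes(text)))
                text.clear()

        for byte in chunk:
            if self.state == "Text":
                if byte == IAC:
                    self.state = "Iac"
                else:
                    text.append(byte)
            elif self.state == "Iac":
                if byte == IAC:                      # IAC IAC: literal 255
                    text.append(IAC)
                    self.state = "Text"
                elif byte in (DO, DONT, WILL, WONT):
                    self.mode = byte
                    self.state = "Negotiate"
                elif byte == SB:
                    self.subrequest.clear()
                    self.overflowed = False
                    self.state = "Subrequest"
                else:                                # other two-byte request
                    flush_text()
                    events.append(("command", byte))
                    self.state = "Text"
            elif self.state == "Negotiate":
                flush_text()
                events.append(("negotiate", self.mode, byte))
                self.state = "Text"
            elif self.state == "Subrequest":
                if byte == IAC:
                    self.state = "SubrequestIac"
                else:
                    self._store(byte)
            elif self.state == "SubrequestIac":
                if byte == IAC:                      # escaped 255 in the data
                    self._store(IAC)
                    self.state = "Subrequest"
                else:
                    if byte == SE and not self.overflowed:
                        flush_text()
                        events.append(("subrequest", bytes(self.subrequest)))
                    # Any other byte is an error: discard the sub-request.
                    self.subrequest.clear()
                    self.state = "Text"
        flush_text()
        return events
```

Because self.state survives between calls to feed, a request split across any number of reads produces the same events as one that arrives whole. Oversized sub-requests are dropped rather than truncated here, which is one possible policy; rejecting the connection is another.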
This description certainly sounds complex, I know. It is really very simple in code, however. You can check out AweMUD for a C++ telnet implementation example (BSD licensed, so feel free to use the code so long as you adhere to the license terms), PyCL for a Python implementation (same license), or the GNU or BSD telnet and telnetd implementations for C-based examples.
This method works because the state is stored between two data receptions. The implementation doesn't care if it sees two bytes right next to each other at the same time from a single reception or if the first byte is seen two minutes before the second. Either way, the state will be remembered and the second byte will be handled appropriately.
Additionally, because this approach to telnet does not require a buffer for normal data operation, and it does not require analyzing the same data multiple times as does the Stack Analyzer approach, it makes better use of memory and CPU resources. For applications that do not use any sub-requests and can handle character input immediately there is no need for any buffers between processing at all.
Conclusion
Implementing a complete and correct telnet layer for your software is not particularly difficult. It does require a little more than just interpreting chunks of data as soon as they come in, however.
A complete and accurate telnet implementation will reduce the chance of bugs or security flaws in your application.

