Security takes a layered approach to reduce the risk to our organization. Input validation is the perfect example of one of these layers. In most cases, input validation is 1 factor in a multi-pronged approach to protecting against common vulnerabilities.
Take any course on secure development and they will, or should, mention input validation as a mitigating control for so many vulnerabilities. You might notice that it always comes with a but. Use input validation, but also use output encoding. Use input validation, but also use parameterized queries.
The reason for this is that input validation that covers every potential vulnerability is hard, if not impossible. Even input validation for a specific vulnerability can be complex and difficult. We have to walk this line of good validation, but also a useable application.
I always recommend not trying to look for specific vulnerabilities or attack strings within an input validation function. That turns into a blacklist of attacks that is impossible to keep up with. In addition, you may be looking for attack strings that don’t matter to that input.
One of the challenges is not knowing where the input will actually be used. Is the data going to the database? Is the data going to a database and then going to the browser? If the data is going to the browser, what context will it be in? If you don’t know how that data will be used, how do you know what attack strings to look for?
Instead of focusing on vulnerabilities, you should spend the time to define what good data looks like for that field. What type of data is expected? Is it an integer, a date, a string? What are the valid range of values? How long should the data be? I will discuss these in just a moment.
These limitations will not stop all attacks. There is no argument there. They are just going to help make sure the data is acceptable or the field. It is possible that a cross-site scripting attack string is valid input. That can be ok. You have limited the potential attacks. Then there is the but, remember? In this case, you would have output encoding to protect the app from cross-site scripting.
Let’s look at some of these items in more detail.
Type
One of the easiest checks to implement is checking the data type of the input field. If you are expecting a date data type and the input received is not a valid date then it is invalid and should be rejected. Numbers are also fairly simple to validate to ensure they match the type expected. Many of the most popular frameworks provide methods to determine if a string matches a specific data type. Use these features to validate this type of data. Unfortunately, when the data type is a free form string then the check isn’t as useful. This is why we have other checks that we will perform.
Range
You have determined that the input is of the correct type, say a number or a date. Now it is time to validate that it falls within the correct range of acceptable values. Imagine that you have an age field on a form. Depending on your business cases, lets say that a valid age is 0-150. It is a simple step to ensure that the value entered falls into this valid range. Another example is a date field. What if the date represents a signed date and can’t be in the future. In this case, it is important to ensure that the value entered is not later than the current day.
Length
This is really more for the free form text fields to validate that the input doesn’t exceed the expected length for the field. In many cases, this lines up with the database or back end storage. For example, in SQL Server you may specify a first name field that is 100 characters or a state field that only allows two characters for the abbreviation. Limiting down the length of the input can significantly inhibit some of the attacks that are out there. Two characters can be tough to exploit, while possible depending on the application design.
Format
Some fields may have a specific format they must match. For example, a social security number may have to match DDD-DD-DDDD. An account number may have a specific format of DDDDDSSS. There are a lot of options for specific formats. If you verify that the data matches the expected format, this can drastically cut down on potential vulnerabilities because many attacks will not be possible when matching the format.
White Lists
White lists can be a very effective control during input validation. Depending on the type of data that is being accepted it may be possible to indicate that only alphabetical characters are acceptable. In other cases you can open the white list up to other special characters but then deny everything else. It is important to remember that even white lists can allow malicious input depending on the values required to be accepted.
Black Lists
Black lists are another control that can be used when there are specifically known bad characters that shouldn’t be allowed. Like white lists, this doesn’t mean that attack payloads can’t get through, but it should make them more difficult.
Regular Expressions
Regular expressions help with white and black lists as well as pattern matching. With the former it is just a matter of defining what the acceptable characters are. Pattern matching is a little different and really aligns more with the range item discussed earlier. Pattern matching is great for free form fields that have a specific pattern. A social security number or an email address. These fields have an exact format that can be matched, allowing you to determine if it is right or not.
Test Your Validation
Make sure that you test the validation routines to ensure they are working as expected. If the field shouldn’t allow negative numbers, make sure that it rejects negative numbers. If the field should only allow email addresses then ensure that it rejects any pattern that doesn’t match a valid email address.
Validate Server Side
Maybe this should have been first? Client side validation is a great feature for the end user. They get immediate feedback to what is wrong with their data without having that round trip to the server and back. THAT IS EASILY BYPASSED!! It is imperative that the validation is also checked on the server. It is too easy to bypass client side validation so this must be done at the server where the user cannot intercept it and bypass it.
Don’t Try To Solve All Problems
Don’t try to solve all output issues with input validation. It is common for someone to try and block cross site scripting with input validation, it is what WAFs do right? But is that the goal? What if that input isn’t even sent to a browser? Are you also going to try and block SQL Injection, LDAP Injection, Command Injection, XML Injection, Parameter Manipulation, etc. with input validation? Now we are getting back to an overly complex solution when there are other solutions for these issues. These types of items shouldn’t be ignored and that is why we have the regular expressions and white lists to help decrease the chance that these payloads make it into the system. We also have output encoding and parameterized queries that help with these additional issues. Don’t spend all of your time focusing on input validation and forget about where this data is going and protecting it there.
Input validation is only half of the solution, the other half is implemented when the data is transmitted from the application to another system (ie. database, web browser, command shell). Just like we don’t want to rely solely on output encoding, we don’t want to do that with input validation. Assess the application and its needs to determine what solution is going to provide the best results. If you are not doing any input validation, start as soon as possible. It is your first line of defense. Start small with the Type/Range/Length checks. When those are working as expected then you can start working into the more advanced input validation techniques. Make sure the output encoding routines are also inplace.
Don’t overwhelm yourself with validating every possible attack. This is one cog in the system. There should also be other controls like monitoring and auditing that catch things as well. Implementing something for input validation is better than nothing, but don’t let it be a false sense of security. Just because you check the length of a string and limit it to 25 characters, don’t think a payload can’t be sent and exploited. It is, however, raising the bar (albeit just a little).
Take the time to assess what you are doing for input validation and work with the business to determine what valid inputs are for the different fields. You can’t do fine validation if you don’t understand the data being received. Don’t forget that data comes in many different forms or encodings. Convert all data to a specific encoding before you perform any validation on it.