Safe Database Insert & Update with PHP

October 6, 2016

A long-time characteristic of the sites I build is that visitors do not provide input that goes directly into a database. Such access has always been the province of SiteCommander and the protections it affords. My easy days are over, as customer-managed profiles reared their head.

Let me offer a warning here. No matter what else, do not even contemplate user-managed anything unless the site is encrypted by means of an SSL certificate. Profiles involve passwords and "personally identifiable information," which require as much protection as you can provide.

The key goal is to protect against SQL injection attacks. Historically my code has used the mysqli_real_escape_string() function to process each field but it was messy. I needed to know which fields were strings, numbers, Booleans or time so I could build the field_name=value strings in the query appropriately. I did this using another one of my classes, query_builder, and its methods whereby I told it which fields were which type. In other words, I had to list the field names manually and update the query building if fields were added or removed. It was also necessary to deal with the peculiar way HTML deals with checkboxes and my query builder dealt with that, again manually. Query construction was more complicated than it needed to be.

I wanted a much simpler way to perform record inserts and updates, one that would serve not only user input but also provide an ultimate replacement to my old methods in SiteCommander. Based on a new PHP class that I would write, this is what I imagined:

$dbw = new db_writer('table_name');
$saved = $dbw->record_insert($record);
if ($saved) {
    // Success!
} else {
    // Trouble
     echo $dbw->error_message;
}
unset($dbw);

There is an analogous method for updates:

$saved = $dbw->record_update($record, $fields);

$fields is an optional argument that contains an array of field names. If provided, only those fields are updated. If absent, all fields in the provided record are updated.

How I Do Records

In order to fully understand my db_writer class, it's necessary to understand how I handle database records in my PHP code. Not surprisingly, they are PHP associative arrays where the indices are the field names in the table. This structure is well-known to PHP programmers because it is the same structure returned by calls to fetch records associatively:

$record = mysqli_fetch_assoc($link);
$record = $pdo_stmt->fetch(PDO::FETCH_ASSOC);

It is also important to know that every table in all databases that I create begins with an auto-increment field named "id." Again, this is not surprising. It simply means that my new class must be capable of dealing with that field appropriately.

Finally, the PHP representation of the record may contain fields that are not actually in the table. There are two reasons for this. First, $_POST will return fields that are not actually data, such as the submit button on the data entry form. There may also be hidden fields that are not actually part of a data record, a programming convenience for me. My new class must only deal with fields that are in the schema for the table and ignore everything else.

Challenges

Here's one of the most important things to notice about the method record_insert - there's no mention of fields. The implication is clear. The class must figure out everything it needs to know about the table structure and act accordingly. And even though record_update can be called with a field list, both methods must know the type of all fields in the table.

As mentioned, I represent database records in PHP by an associative array where the indices are the MySQL field names for the given table. In the same vein, all form elements have their HTML ID set to the MySQL field name for the table. This means that $_POST from one of my forms has the same structure and, with one notable exception, can be used as a database record.

The latter point is very important. It makes a lot of sense for the record_insert method to handle the $_POST array directly, which in turn means special handling must be applied for checkbox fields. Unlike other form field types in HTML, a checkbox is present in $_POST if the box is checked but it is absent when the box is not checked. While other fields always provide a value that can simply be written to the database, logic needs to deal with unchecked boxes.

The last challenge is knowing the type of every field. There isn't any way to get that from posted HTML fields, which provide only the value. My class thus needs a schema for the table in question. Knowing the type is necessary so that parameter binding into the SQL statement can be performed properly.

The "Simple" Schema

The schema needed by the class does not need every detail of field definitions. For MYSQLI, it only needs to distinguish between strings and non-strings. For PDO, which I decided to use, the simple schema needs to identify strings, numbers, and Booleans. Building the simple schema is relatively straightforward but it also the key that makes my new class work. Here's the code.

	// Read the table schema and make a "simple schema" consisting of 
	// an array containing PDO binding types indexed by field name.
	// Timestamp fields are ignored.
	// A list of booleans is also collected
	private function get_simple_schema() {
		$this->simple_schema = array();
		$this->booleans = array();
		$query = "SHOW COLUMNS FROM {$this->table_name}";
		$result = $this->pdo->query($query);
		while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
			$sql_type = strtolower($row['Type']);
			$sql_field = strtolower($row['Field']);
			if ($sql_type != 'timestamp') {
				$this->simple_schema[$sql_field] = $this->get_simple_type($sql_type);
				if ($this->simple_schema[$sql_field] == self::SIMPLE_TYPE_BOOLEAN) {
					$this->booleans[] = $sql_field;
				}
			}
		}
	}

	// Convert an actual SQL type to a simplified type used for PDO binding
	private function get_simple_type($sql_type) {
		$type = $sql_type;
		$pos = strpos($type, '(');
		if ($pos !== false) { $type = substr($type, 0, $pos); }
		if (in_array($type, $this->sql_numeric_types)) {
			$simple_type = self::SIMPLE_TYPE_NUMBER;
		} else if ($sql_type == 'tinyint(1)' or in_array($type, array('bool', 'boolean'))) {
			$simple_type = self::SIMPLE_TYPE_BOOLEAN;
		} else {
			$simple_type = self::SIMPLE_TYPE_STRING;
		}
		return $simple_type;
	}

The get_simple_type method converts a SQL type into one of my simple types. Later these simple types are used in the binding process and at that time are converted to the necessary PDO types. The simple schema is an associative array indexed by field name and having a value of the simple type.

Note that the simple schema method also collects a list of Boolean fields. This is important for later.

I'm getting ahead of myself by discussing the simple schema first. However, it is the essential ingredient in my class and needs to be understood first.

The db_writer Class

Now let's take a look at the top of the class definition.

class db_writer {
	// CONSTANTS
	const SIMPLE_TYPE_NUMBER  = 'number';
	const SIMPLE_TYPE_STRING  = 'string';
	const SIMPLE_TYPE_BOOLEAN = 'boolean';
	const USE_POSTED_DATA = NULL;
	
	// PROPERTIES
	private $table_name;
	
	private $sql_numeric_types;
	private $pdo_types;
	private $simple_schema;
	private $booleans;
	private $record;

	private $pdo;
	private $stmt;
	public  $pdo_error;
	
	// CONSTRUCTOR - initializes
	public function __construct($table, $my_db = false) {
		$this->table_name = $table;
		$this->sql_numeric_types = array('int', 'float', 'double', 'decimal');
		$this->pdo_types = array(
			self::SIMPLE_TYPE_NUMBER  => PDO::PARAM_INT,
			self::SIMPLE_TYPE_STRING  => PDO::PARAM_STR,
			self::SIMPLE_TYPE_BOOLEAN => PDO::PARAM_BOOL
		);
		$this->pdo = pdo_connect($my_db);
		$this->get_simple_schema();
	}

A few things in the class definition bear explanation.

First, an array of SQL types for numeric values is defined. In my experience, the ones you see listed in the code are the only ones returned by the SHOW COLUMNS SQL statement but I wanted to allow for future changes. As you can see in the simple schema methods, the array is used to convert all numeric types to my simple type of number.

Second, the pdo_types property is an associative array indexed by my simple schema types and having values of the PDO constant defining binding types.

Finally, I have core functions for connecting to a database, one called db_connect() when using MYSQLI and one called pdo_connect() for PDO. These functions get the connection information from a global array in my system that contains the host, database name, user name, and password. My system allows other databases to be opened, which is why the constructor includes the optional $my_db argument. You'll need to modify as necessary to work with your environment, with the result being a PDO connection in $this->pdo.

Inserting a Record

Let's take a look at the insert method.

	// Insert query that makes sure the inputs are properly sanitized.
	// The source data is assumed to be a fully-formed record. Its id and any timestamp fields are ignored.
	public function record_insert($source_record) {
		if ($source_record == self::USE_POSTED_DATA) {
			$this->record = $_POST;
			$this->process_booleans();
		} else {
			$this->record = $source_record;
		}
		// Do the binding
		$column_list = array();
		$value_list = array();
		$binding_type = array();
		foreach ($this->record as $field_name => $value) {
			// Select any field in the simple schema except for ID
			if (array_key_exists($field_name, $this->simple_schema) and $field_name != 'id') {
				$column_list[] = $field_name;
				$value_list[] = ":$field_name";
				$binding_type[$field_name] = $this->pdo_types[$this->simple_schema[$field_name]];
			}
		}
		$columns = implode(', ', $column_list);
		$values = implode(', ', $value_list);
		$sql = "INSERT INTO {$this->table_name} ($columns) VALUES ($values)";
		$this->stmt = $this->pdo->prepare($sql);
		foreach ($column_list as $column) {
			$this->stmt->bindValue(":$column", $this->record[$column], $binding_type[$column]);
		}
		if ($this->stmt->execute()) {
			return true;
		} else {
			$this->save_error_message($this->stmt->errorInfo());
			return false;
		}
	}

Sometimes the array that contains the data record happens to be $_POST. I needed a way to tell the insert and update methods that this was the case. After some quick research on methods to detect $_POST automatically, I decided on a simpler, direct technique. I defined the class constant USE_POSTED_DATA and use it in place of the data record when calling the insert or update methods. Here's what a call looks like when using the constant.

$saved = $dbw->record_insert(db_writer::USE_POSTED_DATA);

If the passed value of $source_record is the class constant USE_POSTED_DATA, the method assumes the record is contained in $_POST and duplicates it in the internal variable $this->record. If an ordinary array is passed, it is simply copied. For posted data, the private method to process Booleans is invoked and here's what that code looks like.

	// Process the booleans - posted data does not return true or false but rather present or not present
	private function process_booleans() {
		foreach ($this->booleans as $boolean) {
			$this->record[$boolean] = (array_key_exists($boolean, $this->record)) ? true : false ;
		}
	}

And now you can see why I collected a separate list of Booleans in the simple schema code. This method uses that list to make sure that all Booleans are represented in the record. Presence of the field in $_POST means true while absence means false.

The rest of the insert method has three parts. The first part builds the internal lists needed to construct the PDO query statement. A key to this part is that only fields contained in the simple schema are processed, so hidden or irrelevant (e.g., buttons) fields are ignored. For each field that will be used, its name is added to the column list, its placeholder is added to the value list, and the PDO binding type is added to the binding list. For placeholders I use the name of the fields prefixed with a colon. The binding type list is indexed by field name and is set to the correct PDO type by referencing the simple schema.

The second part of the method constructs the query as a PDO statement. This code is explained well in the PHP documentation for PDO statements.

The third and final part of the method executes the query and saves error information if there is a failure. There is a small helper method in the class for handling the error messaging.

	// Get and Format an error message
	// $error_info is the array returned by PDO for errors
	private function save_error_message($error_info) {
		$this->pdo_error = "[{$error_info[0]}:{$error_info[1]}] {$error_info[2]}";
		$this->stmt->debugDumpParams(); // debugging
	}

The pdo_error property is public and can be accessed easily by the caller. My code also emits PDO's debug information about the query. This can be a bit ugly if a user sees it; the statement should probably be turned off when not debugging, either manually or by adding a Boolean property to the class to control what happens when debug is on. I actually prefer leaving it on because it is such a shocking display that my clients call right away when they see it, allowing me to address the problem quickly. (The truth is that since I started using this class in production, I've never seen the dump. Database problems, mostly availability, are caught well before any type of insert is performed.)

Updating a Record

The update code is very similar.

	/**
	 * Update query that makes sure the inputs are properly sanitized
	 * The source record is assumed to be a fully-formed record, probably read from the database and then altered.
	 * The ID field must be present, as it will if read from the database.
	 * The field names array identifies which fields are to be updated. Only the named fields are updated.
	 * If $field_names is NULL, update all the fields contained in the source record and the simple schema except for IDs.
	 */
	public function record_update($source_record, $field_names = NULL) {
		if ($source_record == self::USE_POSTED_DATA) {
			$this->record = $_POST;
			$this->process_booleans();
		} else {
			$this->record = $source_record;
		}
		// Get the list of fields needed for the update
		$field_list = array();
		if (is_null($field_names)) {
			foreach ($this->record as $field_name => $value) {
				if (array_key_exists($field_name, $this->simple_schema) and $field_name != 'id') {
					$field_list[] = $field_name;
				}
			}
		} else {
			// Modify this to make sure all passed field names 
			// are actually in the schema and the record!
			$field_list = $field_names;
		}
		// Create the binding information
		$nv_pairs = array();
		$binding_type = array();
		foreach ($field_list as $name) {
			$nv_pairs[] = "$name = :$name";
			$binding_type[$name] = $this->pdo_types[$this->simple_schema[$name]];
		}
		// Create the statement
		$pairs = implode(', ', $nv_pairs);
		$sql = "UPDATE {$this->table_name} SET $pairs WHERE id={$this->record['id']}";
		$this->stmt = $this->pdo->prepare($sql);
		foreach ($field_list as $name) {
			$this->stmt->bindValue(":$name", $this->record[$name], $binding_type[$name]);
		}
		if ($this->stmt->execute()) {
			return true;
		} else {
			$this->save_error_message($this->stmt->errorInfo());
			return false;
		}
	}

The key difference is that the update method can take a list of fields as an argument and only update those fields. In the version of the code above, I do need to make one change. Rather than just taking the field list as provided, I need to run it through the simple schema to make sure I only include fields that are actually in the table. I'll admit that I only noticed that oversight when I wrote this article. I decided to publish as is, letting you make the fix as an exercise.

Conclusion

Before I wrote this class, each one of my form handlers had a custom function to build the query. Each form handler called its query builder and then ran the query. In a simplified fragment, this is what my record_save code looked like.

    $mode = ($id == 0) ? 'add' : 'edit' ;
    $db = db_connect();
    $query = create_query($mode, $db, $id);
    if (mysqli_query($db, $query)) {

Now that fragment looks like this.

	$dbw = new db_writer(TB_NAME);
	if ($id == 0) { // adding
		$saved = $dbw->record_insert(db_writer::USE_POSTED_DATA);
	} else {
		$_POST['id'] = $id; // For db_writer, the ID must be in the record
		$saved = $dbw->record_update(db_writer::USE_POSTED_DATA);
	}
	if ($saved) {

At first blush this looks like more code and slightly more complex. That's because the original code determined mode (add or edit) and passed that to the query builder, which determined the query type (insert or update, respectively). db_writer eliminates the need for the query builder, so the code must figure out whether to call insert or update. There's one other little gotcha, which is that the record ID is not in posted data and therefore must be added for updates.

Eliminating the query builder is significant. Here's an actual sample of one such function.

// Create the appropriate query
function create_query($mode, $db, $id = 0) {
  $aqb = new admin_query_builder(array(
  	'query_type' => $mode, // 'add' or 'edit'
  	'db' => $db,
  	'table_name' => TB_NAME,
  	'id' => $id
  ));
  // Don't let $field_list break across lines; use concatenation...
  $field_list = 'brand,family,model,year,serial,serialnr,inlay,casetype,finish,country,notes,buyer,seller,ebaynr,saledate,price,story';
  $aqb->add_fields($field_list, SC_QB_TYPE_NORMAL);
  $field_list = 'enabled';
  $aqb->add_fields($field_list, SC_QB_TYPE_BOOLEAN);
  $query = $aqb->get_query();
  return $query;
}

Here you can clearly see the manual field lists and the need to identify Boolean fields. This particular query builder discriminates strings from numbers in the fields from the "normal" list. In my systems I have dozens of these things, all of which need to be kept up whenever there is a database change.

For a long time, I was worried about PHP run time and overhead. While I still think like that, I've taken to building classes like this one and several of much greater complexity. After having instrumented all the pages in my systems to get an approximate execution time. I've noticed two things. First, the overhead I've been adding is a fraction of the total Web page rendering time, so it's negligible from a user perspective, almost invisible. Second, some of the added overhead is not additive at all because it replaces clumsier code that had its own overhead.

A good example is the create_query call above with its use of the admin_query_builder class. I'm not going to show you that code but remember my earlier mention of mysqli_real_escape_string() calls. Those are embedded in the query builder and further into individual methods in the class for strings, numbers, and Booleans. I'm sure my old calls were less efficient than the bindValue() calls to PDO. In other words, while I did add some overhead I also stripped out some inefficiencies.

Even if the new class does add overhead, using it makes my programming work easier, both when writing new code or maintaining existing systems. I'm more productive as a result. The bottom line is that this is a dramatic improvement for me and for the quality of the systems I deliver to my clients.

I'm very happy to share this code and hope it will benefit others. That said, it is offered without warranty of any kind. You are welcome to use at your own risk. Please test carefully! 

 

Tags: MySQL, PDO, PHP, Programming

A total of 8 related articles were found. See them all...