Here is a brief introduction to the topic of joining. For further explanation, I would recommend getting a book on SQL — any basic introduction book would be fine.
(Note, we’re working on an example data set which is similar to what would be
given to you from a simple database query. We’re saying that this data is
loaded into the variables $posts and $authors. This data can be found at
http://uwwebpub.com/php; click on “Get Data File”).
It is much “cheaper” (and easier) to join data straight from the database: doing this manually requires that we fetch all the entities from the database. For our small data set, that’s okay, but it’s not a good idea to be fetching more than a couple hundred entities at once. It turns out that most of the joins we’ll see are versions of hasOne and belongsTo ((2), below).
An “entity” is a single instance of a prototypical thing. E.g., $posts[0]
(from our data.php file) is a post entity of the prototypical post. Each of
the data about an entity (e.g. its id, its title, its author_id, etc.) go by
the term data fields. When we’re talking about databases, though, they are
more commonly referred to as columns because we typically have a table for
each one of our kind of entities. Our tables are structured like:
+---+----------------+-----------------+-----------+ |id |title |created |date | --->... +---+----------------+-----------------+-----------+ |0 |My First Entry |1195073683 |2007-11-13 | |1 |My Second Entry |1195073623 |2007-11-14 | |3 |Another Entry |1195072623 |2007-10-11 | |4 |YAE |1095072623 |2004-10-11 | +---+----------------+-----------------+-----------+
And the “entity” relationship is that these two tables should be “glued together” on the ‘post.author_id = author.id’ condition (called a predicate):
+----------------+ +----------------+ | Posts | | Authors | +----------------+ +----------------+ | id | /+ id | | title | / | title | | created | / | email | | date | / +----------------+ | author_id +- | text | +----------------+
(Note that the notation is simply $table_name.$column_name, as you probably
figured out.)
There are three kinds of relationships between entities:
hasMany and belongsTo
e.g., an author hasMany posts, and a post belongsTo an author.
hasOne and belongsTo
e.g., a post hasOne author, and an author belongsTo many posts. This is best thought of as the “other side” of (1).
hasAndBelongsToMany
This one is trickier to store (we might talk about it on the last day of PHP II): It involves having a separate data store called a join table. If we were to introduce categories into our data, then a post might belongTo many categories, and many categories might belongTo a post.
Joining entities involves iterating over two sets of entities and “gluing” pairs of them together based on some sort of condition. This condition is usually the two entities having a data field in common. In our example, we want to glue together our $posts and our $authors, but the question is in which direction do we want to glue them together? Should we glue authors to posts, or should we glue posts to authors.
In the first case, gluing authors to posts, we’ll be replacing the ‘author_id’ field with the corresponding author entity. This is called a left join. Joining the first post, $posts[0], with $authors,
array( 'id' => 0, 'title' => 'My First Entry', 'created' => 1195073683, 'date' => '2007-11-13', // Notice that we no longer have // an 'author_id' field and instead // we have an entire 'author' field: 'author' => array( 'id' => 1, 'title' => 'Mark Twain', 'email' => 'marktwain@yahoo.com' ) 'text' => 'My First Post' ),
In the second case, we’ll be gluing posts to authors. This is a right join.
(The direction is so because we’re selecting from “posts and authors”; if we
were selecting from “authors and posts”, this would be a left join and the
previous example would be a right join.) Remember that an author hasMany posts
and a post belongsTo an author. In this case, we would have to add a field in
our author called maybe “posts” that would be an array of post entities
(i.e., an array of arrays). If we were to join $posts with $authors[0]
(“Mark Twain”), we would have:
array( 'id' => 1, 'title' => 'Mark Twain', 'email' => 'marktwain@yahoo.com' // The 'posts' field is an array of the // posts that had an 'author_id' field // equal to this author's 'id' field: 'posts' => array( array( 'id' => 0, 'title' => 'My First Entry', 'created' => 1195073683, 'date' => '2007-11-13', 'author_id' => 1, 'text' => 'My First Post' ), array( 'id' => 3, 'title' => 'Another Entry', 'created' => 1195072623, 'date' => '2007-10-11', 'author_id' => 1, 'text' => 'This is another entry' ), ) )
We’d like to be able to call
$posts_with_author_info = join_tables( $posts,$authors, 'posts.author_id=authors.id', 'one' );
and
$authors_with_all_posts_by_them = join_tables( $authors,$posts, 'authors.id=posts.author_id', 'many' );
Where the first argument to is the “left” table, the second argument is the “right” table, and the third argument is the condition that tells us whether or not a post is related to an author. I’ll refer to the third argument as the predicate.
The fourth argument is either ‘one’ or ‘many’ and indicates whether or not we’re looking for a single entity from the right table or many entities in the right column to pair with the left column.
Notice that we’ll craft the function to always do a left-join (that is, we’ll be adding information to the “left” table, which is the table given in the first argument). Since our second example join was originally posed in terms of a right-join, we’re swapping the order of the arguments and the order of the predicate.
Succinctly, posts right-joined with authors is the same as authors left-joined with posts.
We’re going to be iterating over arrays with keys and values, not tables with dotted column names. Our predicate is of the form:
left_table.left_column=right_table.right_column
We’re following a good-sense convention that column names match up with the entities’ array keys. So we need to extract the array keys from a predicate so we can compare two entity arrays:
function extract_keys($predicate) { // $predicate is something like // 'posts.author_id=authors.id' $full_column_names = explode('=',$predicate); if ( count($full_column_names) != 2 ) { die('Invalid predicate'); } // Book-keeping: $left = trim($full_column_names[0]); $right = trim($full_column_names[1]); // trim($a) simply removes any whitespace before or // or after $a. This allows us to put spaces around // the equals sign and still be okay // $left is something like 'posts.author_id' // $right is something like 'authors.id' $left_a = explode('.',$left); $right_a = explode('.',$right); if ( count($left_a) != 2 or count($right_a) != 2 ) { die('Invalid predicate'); } // Now the table names are available at key 0 // and the column names are available at key 1. // We're interested in the column names, so // return an array with both the column names: return( array($left_a[1],$right_a[1]) ); }
We’ll also need to extract the table names. This is a very similar function, so I’ll make it brief:
function extract_tables($predicate) { $full_column_names = explode('=',$predicate); if ( count($full_column_names) != 2 ) die('Invalid predicate'); $left_a = explode('.',trim($full_column_names[0])); $right_a = explode('.',trim($full_column_names[1])); if ( count($left_a) != 2 or count($right_a) != 2 ) die('Invalid predicate'); return( array($left_a[0],$right_a[0]) ); }
And finally we’ll need a function to iterate over the “right” table and return the matching entity or entities. In the case of a ‘one’ relationship, we return a single entity — the first (and hopefully only) one that matches the predicate. In the case of a ‘many’ relationship, we return an array of the entities that match:
function match($authors,$author_key,$posts_author_key,$type) { $tor = array(); // we might eventually return this foreach ( $authors as $author ) { if ( $author[$author_key] == $posts_author_key ) { if ( $type == 'one' ) { return( $author ); // the function "short-circuits" as soon // as it returns a value; it stops executing } else if ( $type == 'many' ) { $tor[] = $author; } else { die('Improper relationship type!'); } } } if ( count($tor) == 0 ) { return( null ); } return( $tor ); }
function join_tables($posts,$authors,$predicate,$type) { list($left_key,$right_key) = extract_keys($posts,$authors,$predicate); // `list` is a handy way to assign variables to // array values. More information at // <http://php.net/list> list($left_table,$right_table) = extract_tables($posts,$authors,$predicate); // $left_key is something like 'author_id' // $right_key is something like 'id' // $left_table is something like 'posts' // $right_table is something like 'authors' $tor = array(); // what we're going to return later foreach ( $posts as $post ) { $post[$right_table] = match( $authors, $right_key, $post[$left_key], $type ); unset($post[$left_key]); // We unset because we no longer need // $post['author_id'] -- we were keeping it // to join on it, but that's what we just did! $tor[] = $post; } return $tor; }
Returning to our original formulation, we have:
$posts_with_author_info = join_tables( $posts,$authors, 'posts.author_id=authors.id', 'one' );
… when print_r-ed, we get:
Array
(
[0] => Array
(
[id] => 0
[title] => My First Entry
[created] => 1195073683
[date] => 2007-11-13
[text] => My First Post
[authors] => Array
(
[id] => 1
[title] => Mark Twain
[email] => marktwain@yahoo.com
)
)
[1] => Array
(
[id] => 1
[title] => My Second Entry
[created] => 1195073623
[date] => 2007-11-14
[text] => My Second Post
[authors] => Array
(
[id] => 2
[title] => Archimedes
[email] =>
)
)
[2] => Array
(
[id] => 3
[title] => Another Entry
[created] => 1195072623
[date] => 2007-10-11
[text] => This is another entry
[authors] => Array
(
[id] => 1
[title] => Mark Twain
[email] => marktwain@yahoo.com
)
)
[3] => Array
(
[id] => 4
[title] => YAE
[created] => 1095072623
[date] => 2004-10-11
[text] => This is perhaps an old entry
[authors] =>
)
)
…and…
$authors_with_all_posts_by_them = join_tables( $authors,$posts, 'authors.id=posts.author_id', 'many' );
… when print_r-ed, we get:
Array
(
[0] => Array
(
[title] => Mark Twain
[email] => marktwain@yahoo.com
[posts] => Array
(
[0] => Array
(
[id] => 0
[title] => My First Entry
[created] => 1195073683
[date] => 2007-11-13
[author_id] => 1
[text] => My First Post
)
[1] => Array
(
[id] => 3
[title] => Another Entry
[created] => 1195072623
[date] => 2007-10-11
[author_id] => 1
[text] => This is another entry
)
)
)
[1] => Array
(
[title] => Archimedes
[email] =>
[posts] => Array
(
[0] => Array
(
[id] => 1
[title] => My Second Entry
[created] => 1195073623
[date] => 2007-11-14
[author_id] => 2
[text] => My Second Post
)
)
)
)
The above examples were written with posts and authors in mind, but really they’ll work for any tables.
We didn’t cover “un-pluralizing” the table-names for ‘one’ relationship types.
It would be nice if we could “left-join posts,authors” and be able to refer
to, e.g, $post[0]['author'], but in our implementation we still refer to
$post[0]['authors']. Removing an ‘s’ isn’t hard to do, but to do it in
general (e.g. turning “people” into “person”) is difficult. Frameworks like
Ruby on Rails actually do handle this. See
http://nubyonrails.com/tools/pluralize (that’s not a typo) for more
information.
This implementation isn’t by any means the most-efficient, but it was written for clarity. And since joins aren’t really something we do with PHP, concerns about efficiency have little relevance here.
Author: Ryan Timmons
Last Modified: 06 August 2008 15:23:13 PDT
URL: http://uwwebpub.com/