PHP Parse Title Description Keywords From A Website
Posted by in PHP April 17, 2011 8 Comments

I’m working on a Website Directory project that needs to fetch and display the Title, Description and Keywords of any website, along with geographical information looked up from the website’s IP address: Hostname, Country, Region, City, Postal Code, Latitude, Longitude, ISP and Organization.

In a previous article, I shared how I get the geographical location from an IP address; you can read more at: http://4rapiddev.com/internet/free-online-tools-get-ip-address-location-organization-isp-hostname-country/. Today, I will show how I parse the Title, Description and Keywords from a website using a PHP script.

The main ideas of the PHP script are:

  • Take a URL as input
  • Get HTML content of the URL by using file_get_contents
  • Parse Title, Description and Keywords from the content by using preg_match and preg_match_all PHP functions
  • Return an Array which includes Title, Description and Keywords as Array Items
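
As a minimal illustration of the parsing step in the list above, the sketch below runs preg_match and preg_match_all against a small HTML string instead of a downloaded page. The sample markup and variable names are assumptions made for the example, and the patterns are simplified: they expect double-quoted attributes with name before content.

```php
<?php
// Sample HTML held in a string so the example runs without network access.
$html = '<html><head><title>Example Page</title>'
      . '<meta name="description" content="A short summary.">'
      . '<meta name="keywords" content="php,regex"></head><body></body></html>';

// preg_match returns 1 on a match; $m[1] holds the first capture group.
$title = preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $m) ? trim($m[1]) : null;

// preg_match_all collects every <meta name="..." content="..."> pair:
// $pairs[1] holds the names, $pairs[2] the matching content values.
preg_match_all('/<meta\s+name="([^"]*)"\s+content="([^"]*)"/si', $html, $pairs);
$meta = array_combine(array_map('strtolower', $pairs[1]), $pairs[2]);

echo $title, "\n";               // Example Page
echo $meta['description'], "\n"; // A short summary.
echo $meta['keywords'], "\n";    // php,regex
```

Real pages vary much more than this sample, so a production pattern has to be more forgiving about quoting and attribute order.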

PHP script

<?php
function getUrlData($url)
{
	$result = false;
	$contents = getUrlContents($url);
 
	if (isset($contents) && is_string($contents))
	{
		$title = null;
		$metaTags = null;
 
		// Allow attributes on <title> and a ">" inside the title text.
		preg_match('/<title[^>]*>(.*?)<\/title>/si', $contents, $match );
 
		if (isset($match) && is_array($match) && count($match) > 0)
		{
			$title = strip_tags($match[1]);
		}
 
		preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' .'[lang="]*[^>"]*["]*'.'[\s]*content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
		if (isset($match) && is_array($match) && count($match) == 3)
		{
			$originals = $match[0];
			$names = $match[1];
			$values = $match[2];
 
			if (count($originals) == count($names) && count($names) == count($values))
			{
				$metaTags = array();
 
				for ($i=0, $limiti=count($names); $i < $limiti; $i++)
				{
					$metaname=strtolower($names[$i]);
					$metaname=str_replace("'",'',$metaname);
					$metaname=str_replace("/",'',$metaname);
					$metaTags[$metaname] = array (
					'html' => htmlentities($originals[$i]),
					'value' => $values[$i]
					);
				}
			}
		}
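		// Fall back to content-before-name attribute order.
		// empty() also covers null, which sizeof()/count() rejects in PHP 8.
		if(empty($metaTags)) {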
			preg_match_all('/<[\s]*meta[\s]*content="?' . '([^>"]*)"?[\s]*' .'[lang="]*[^>"]*["]*'.'[\s]*name="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
 
			if (isset($match) && is_array($match) && count($match) == 3)
			{
				$originals = $match[0];
				$names = $match[2];
				$values = $match[1];
 
				if (count($originals) == count($names) && count($names) == count($values))
				{
					$metaTags = array();
 
					for ($i=0, $limiti=count($names); $i < $limiti; $i++)
					{
						$metaname=strtolower($names[$i]);
						$metaname=str_replace("'",'',$metaname);
						$metaname=str_replace("/",'',$metaname);
						$metaTags[$metaname] = array (
							'html' => htmlentities($originals[$i]),
							'value' => $values[$i]
						);
					}
				}
			}
		}
 
		$result = array (
			'title' => $title,
			'metaTags' => $metaTags
		);
	}
 
	return $result;
}
 
function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
	$result = false;
	$contents = file_get_contents($url);
 
	if (isset($contents) && is_string($contents))
	{
		preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
 
		if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
		{
			if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
			{
				return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
			}
 
			$result = false;
		}
		else
		{
			$result = $contents;
		}
	}
 
	// Return $result, not $contents, so that hitting the redirection limit yields false.
	return $result;
}
?>
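
The script calls file_get_contents with default settings, so a slow host can block the request for a long time. The sketch below is one possible way to pass a timeout, a User-Agent and a redirect cap through a stream context. The function name fetchHtml, the User-Agent string and the default values are assumptions for illustration and are not part of the original script.

```php
<?php
// Hypothetical wrapper around file_get_contents(); name and defaults are illustrative.
function fetchHtml($url, $timeoutSeconds = 10)
{
    // 'http' context options apply to http:// and https:// URLs only.
    $context = stream_context_create(array(
        'http' => array(
            'timeout'       => $timeoutSeconds,             // give up on slow hosts
            'user_agent'    => 'MetaFetcher/0.1 (example)', // placeholder value
            'max_redirects' => 5,                           // cap HTTP-level redirects
        ),
    ));

    // @ silences the warning file_get_contents() raises on failure;
    // the false return value is checked instead.
    $contents = @file_get_contents($url, false, $context);

    // Normalize failure to null so callers have a single case to test.
    return is_string($contents) ? $contents : null;
}
```

A call such as fetchHtml($url) could stand in for file_get_contents($url) inside getUrlContents, with a null result treated as a failed download. Note that these options only cover HTTP redirects; the meta refresh handling in getUrlContents is still needed for HTML-level redirects.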

Usage

<?php
 
	$result = getUrlData("http://4rapiddev.com/php/php-parse-title-description-keywords-from-a-website/");
 
	// getUrlData() returns false on failure and a tag may be missing,
	// so test with empty() to avoid undefined index warnings.
	if(empty($result['title'])) {
		$title="No Data Available";
	} else {
		$title=$result['title'];
	}
	if(empty($result['metaTags']['description']['value'])) {
		$description="No Data Available";
	} else {
		$description=$result['metaTags']['description']['value'];
	}
	if(empty($result['metaTags']['keywords']['value'])) {
		$keywords="No Data Available";
	} else {
		$keywords=$result['metaTags']['keywords']['value'];
	}
 
	echo "Title: " . $title . "<br>";
	echo "Description: " . $description . "<br>";
	echo "Keywords: " . $keywords . "<br>";
?>

Output

Title: PHP Parse Title Description Keywords From A Website | 4 Rapid Development
Description: I'm working on a Website Directory project which need to get and display Title, Description and Keywords of any website and some information regarding Geographical Location such as: Hostname, Country, Region, City, Postal Code, Latitude, Longitude, ISP, Organization via the website IP Address.
Keywords: file_get_contents,preg_match,preg_match_all,php
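
The usage script repeats the same check for each field. One way to fold that into a single place is a small helper like the sketch below. The name metaValueOrDefault is hypothetical, and the sample array only mimics the structure returned by getUrlData; it is not real output.

```php
<?php
// Hypothetical helper: returns the 'value' of a named meta tag, or a fallback.
function metaValueOrDefault($result, $name, $default = 'No Data Available')
{
    // isset() is safe here even when 'metaTags' is null or the tag is missing.
    if (is_array($result)
        && isset($result['metaTags'][$name]['value'])
        && $result['metaTags'][$name]['value'] !== '') {
        return $result['metaTags'][$name]['value'];
    }
    return $default;
}

// Hand-written sample in the shape getUrlData() returns (not real output).
$sample = array(
    'title'    => 'Demo',
    'metaTags' => array(
        'description' => array('html' => '', 'value' => 'Hello'),
    ),
);

echo metaValueOrDefault($sample, 'description'), "\n"; // Hello
echo metaValueOrDefault($sample, 'keywords'), "\n";    // No Data Available
echo metaValueOrDefault(false, 'keywords'), "\n";      // No Data Available
```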


(*) I copied the PHP script from somewhere, but I have completely forgotten where. Thank you to whoever created this script; I appreciate it.

Hoan Huynh is the founder and head of 4rapiddev.com. Reach him at hoan@4rapiddev.com
  • david

    A way to make the code shorter is to use the DOM functions

    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTMLFile("http://ictdag.be");

    $title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
    $meta = $doc->getElementsByTagName('meta');
    $meta_values = array();
    $meta_attributes = array('description','keywords');
    $is_content_tag = '';

    foreach($meta as $tag)
    {
        foreach($tag->attributes as $index=>$attr)
        {
            if($attr->name == 'name' && in_array($attr->value,$meta_attributes))
            {
                $is_content_tag = $attr->value;
                continue;
            }

            if($is_content_tag && $attr->name == 'content')
            {
                $meta_values[$is_content_tag] = $attr->value;
                $is_content_tag = '';
            }
        }
    }

    var_dump($title);
    var_dump($meta_values);

    • hoanhuynh

      Perfect!

      It works with fewer lines of code.

      Much appreciate your sharing David oi.

  • Ümit

    $result = getUrlData("http://4rapiddev.com/php/php-parse-title-description-keywords-from-a-website/");

    function searchText($start, $finish, $result)
    {
        @preg_match_all('/' . preg_quote($start, '/') .
            '(.*?)' . preg_quote($finish, '/') . '/i', $result, $m);
        return @$m[1];
    }

    example:
    print_r(searchText("", "", $result));

  • http://www.ingbase.com Mayur

    This is very helpful to get started with what i am trying to do.

    What I want to do is accept a list of URLs in a text area; when the form is submitted, the backend will crawl each URL and display the keywords, title and description for each URL in order.

    This will then be stored in a database, i can do the database part once i am able to get all the data for the listed URL. I am just trying to work out how to get data for each URL from text area.

    Any help on this?

  • http://www.bing.com/ Melloney

    Essays like this are so important to broadening people’s horizons.

  • http://www.shackel.co.nz/vehicles.aspx New Cars for Sale Wellington

    You really make it seem really easy along with your presentation however I find this matter to be really one thing which I feel I would by no means understand. It sort of feels too complicated and very huge for me. I’m looking ahead in your subsequent post, I’ll try to get the dangle of it!

  • http://sealsystem.sourceforge.net/phpBB2/profile.php?mode=viewprofile&u=355295 Need Free Electric

    Thank you a bunch for sharing this with all folks you actually understand what you’re speaking about! Bookmarked. Please additionally talk over with my website =). We can have a hyperlink exchange arrangement among us

  • http://www.naturalgasagent.com energy saving

    It appears you comprehend tons concerning this.