
PHP class for working with text

On one of my jobs I needed to fetch a page's HTML and solve several text-processing problems. I won't go into the details of the job itself; by collecting the functions I needed and writing a few of my own, I completed the task. I have put some of those functions into a class and decided to share them with my readers.

Problems that were solved in this class

  1. stripping all tags from the code;
  2. removing extra characters and words that carry little weight in further analysis of the text;
  3. deleting sections of code that take no part in page indexing and do not affect search engine results.
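
The three steps above can be sketched in one function built on PHP built-ins. This is a minimal sketch and not the class from this article: the name extract_indexable_words and its $stopWords parameter are hypothetical, and it assumes UTF-8 input and the mbstring extension. Unlike strip_tags(), it replaces each tag with a space so that words from neighbouring elements are not glued together.

```php
<?php
// Hypothetical helper, not part of the class in this article.
function extract_indexable_words(string $html, array $stopWords = []): array
{
    // Step 3: drop sections that search engines do not index.
    $html = preg_replace('~<noindex\b[^>]*>.*?</noindex>~si', '', $html);
    $html = preg_replace('~<(script|style)\b[^>]*>.*?</\1>~si', '', $html);
    $html = preg_replace('~<!--.*?-->~s', '', $html);

    // Step 1: replace every remaining tag with a space, then decode entities.
    $text = preg_replace('~<[^>]*>~', ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Step 2: split on anything that is not a letter or digit,
    // then drop the stop words (compared case-insensitively).
    $words = preg_split('~[^\p{L}\p{N}]+~u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $stop = array_map('mb_strtolower', $stopWords);

    return array_values(array_filter($words, function ($word) use ($stop) {
        return !in_array(mb_strtolower($word), $stop, true);
    }));
}

$html = '<p>Hello, <b>world</b>!</p><noindex>hidden</noindex>'
      . '<script>var a = 1;</script><!-- note --><p>This is &amp; test</p>';

print_r(extract_indexable_words($html, ['this', 'is']));
// the resulting array holds "Hello", "world" and "test"
```

Calling count() on the returned array gives the word count.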

Class for cleaning html code and working with text

class textOperation{
	
	function wp_strip_all_tags($string, $remove_breaks = false) { // strips all tags, including script and style contents
		$string = preg_replace( '@<(script|style)[^>]*?>.*?</\\1>@si', '', $string );
		$string = strip_tags($string);

		if ( $remove_breaks )
			$string = preg_replace('/[\r\n\t ]+/', ' ', $string);

		return trim($string);
	}
	
	function cleanTextHtml($text){
	$search = array ("'<script[^>]*?>.*?</script>'si",  // strip JavaScript
					 "'<[\/\!]*?[^<>]*?>'si",           // strip HTML tags
					 "'([\r\n])[\s]+'",                 // collapse whitespace after a line break
					 "'&(quot|#34);'i",                 // replace HTML entities
					 "'&(amp|#38);'i", 
					 "'&(lt|#60);'i", 
					 "'&(gt|#62);'i", 
					 "'&(nbsp|#160);'i", 
					 "'&(iexcl|#161);'i", 
					 "'&(cent|#162);'i", 
					 "'&(pound|#163);'i", 
					 "'&(copy|#169);'i" 
	); 

	$replace = array ("", 
					  "", 
					  "\\1", 
					  "\"", 
					  "&", 
					  "<", 
					  ">", 
					  " ", 
					  "¡",  // UTF-8 literals; chr(161) and the like emit single bytes that are invalid in UTF-8 text
					  "¢", 
					  "£", 
					  "©" 
	); 

	$text = preg_replace($search, $replace, $text); 	
	return $text;
	}
	
	function delsimbpl($text){
		$del_symbols = array(",", ".", ";", ":", "\"", "#", "\$", "%", "^",
                         "!", "@", "`", "~", "*", "-", "=", "+", "\\",
                         "|", "/", ">", "<", "(", ")", "&", "?", "?", "\t",
                         "\r", "\n", "{","}","[","]", "'", "“", "”", "•",
                         " how ", " for ", " what ", " or ", " This is ", " этих ",
                         " всех ", " вас ", " они ", " оно ", " еще ", " when ",
                         " where ", " эта ", " лишь ", " уже ", " вам ", " нет ",
                         " если ", " надо ", " all ", " So ", " его ", " than ",
                         " at ", " даже ", " мне ", " есть ", " once ", " два ", " in ", " не ",
                         " 0 ", " 1 ", " 2 ", " 3 ", " 4 ", " 5 ", " 6 ", " 7 ", " 8 ", " 9 "
                         );
		$text = str_replace($del_symbols, ' ', $text);
		return $text;
	}
	
	function wordcount($text){
		$text = $this->clearnBadHtml($text);
		$text = $this->cleanTextHtml($text);
		$text = $this->delsimbpl($text);
		$array = explode(' ', $text);
		// Drop the empty strings left by consecutive spaces before counting.
		$array = $this->RemoveEmpty($array);
		return count($array);
	}
	
	function clearnBadHtml($code){ // remove noindex blocks, styles, scripts, <link> tags and <!-- --> comments
		$res = preg_replace("|<noindex>(.*?)</noindex>|si",'',$code);
		$res = preg_replace("|<style(.*?)>(.*?)</style>|si",'',$res);
		$res = preg_replace("|<script(.*?)>(.*?)</script>|si",'',$res);
		$res = preg_replace("|<link(.*?) />|si",'',$res);
		// Collapse doubled line breaks to one so adjacent words are not glued together.
		$res = preg_replace("|\n\n|si","\n",$res);
		$res = preg_replace("|\r\r|si","\r",$res);
		//$res = preg_replace("|<a(.*?)rel=(.*?)nofollow(.*?)</a>|si",'',$res); // does not always work
		$res = preg_replace("|<!--(.*?)-->|si",'',$res);
		// Literal removal of stray tags; as regex patterns these strings would be parsed with < and > as delimiters.
		$res = str_replace(array("<noindex>","</noindex>"),'',$res);
		return $res;
	}
	
	function RemoveEmpty($array)
	{
		$Result = array();
		foreach ($array as $key => $value) {
			if ($value != '')
				$Result[] = $value;
		}
		return $Result;
	}
}
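
The delsimbpl() method matches stop words only when they are padded with spaces on both sides, so a word at the very start or end of the text is kept, and so is one written in a different case. One possible alternative is a regular expression that treats letters and digits as word characters. The sketch below is an assumption of mine and not part of the class: the function name remove_stop_words and its parameters are hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper (not part of textOperation): removes stop words as whole words only.
function remove_stop_words(string $text, array $stopWords): string
{
    if ($stopWords === []) {
        return trim($text);
    }
    // Escape each word so it is matched literally inside the pattern.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '~');
    }, $stopWords);
    // The lookarounds require that no letter or digit touches the match on either side.
    $pattern = '~(?<![\p{L}\p{N}])(?:' . implode('|', $quoted) . ')(?![\p{L}\p{N}])~iu';
    $text = preg_replace($pattern, ' ', $text);
    // Collapse the runs of whitespace left behind.
    return trim(preg_replace('~\s+~u', ' ', $text));
}

echo remove_stop_words('How to count words for this page', ['how', 'for']);
// prints "to count words this page"
```

Because of the lookarounds, a short stop word such as "не" is removed only as a separate word and stays untouched inside a longer word such as "небо".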

Below is an example of using the class to strip the markup from a piece of code and leave only the text.

$obj = new textOperation;
$text = '	<ul class="sub-menu">
		<li id="menu-item-2662" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2662"><a href="https://wp-admin.com.ua/uroki-frilansa/oblasti-frilansa/"><span>Freelancing areas</span></a></li>
	</ul>
</li>
	<li id="menu-item-2671" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2671"><a href="https://wp-admin.com.ua/uroki-frilansa/frilans-i-fultaym-sushhestvuyut-vmeste/"><span>Freelancing and full-time work together</span></a></li>
	<li id="menu-item-2669" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2669"><a href="https://wp-admin.com.ua/uroki-frilansa/test-smogu-li-stat-frilanserom/"><span>Can I become a freelancer test?</span></a></li>
	<li id="menu-item-2661" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2661"><a href="https://wp-admin.com.ua/uroki-frilansa/mogu-li-ya-stat-frilanserom/"><span>Can I become a freelancer?</span></a></li>
</ul>';
echo $obj->wp_strip_all_tags($text);
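
The second argument, $remove_breaks, is not used in the call above. To show its effect without the rest of the class, here is a small sketch with a standalone copy of the same logic under the hypothetical name strip_all_tags_sketch, applied to a shortened menu fragment that I made up for the illustration.

```php
<?php
// Standalone copy of the wp_strip_all_tags() logic so the snippet runs without the class.
function strip_all_tags_sketch(string $string, bool $removeBreaks = false): string
{
    // Remove script and style elements together with their contents.
    $string = preg_replace('@<(script|style)[^>]*?>.*?</\\1>@si', '', $string);
    $string = strip_tags($string);

    if ($removeBreaks) {
        // Collapse line breaks, tabs and repeated spaces into single spaces.
        $string = preg_replace('/[\r\n\t ]+/', ' ', $string);
    }

    return trim($string);
}

$html = "<ul>\n\t<li><a href=\"#\"><span>Freelancing areas</span></a></li>\n"
      . "\t<li><span>Can I become a freelancer?</span></li>\n</ul>"
      . "<script>var x = 1;</script>";

echo strip_all_tags_sketch($html, true);
// prints "Freelancing areas Can I become a freelancer?"
```

With the default $removeBreaks = false, the line break and tab between the two menu items stay in the result.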

That is my short memo for programmers. I think many of you will find it useful in your own projects. Good luck with your website development.



Nikolaenko Maxim

Director of the ProGrafika web studio. I do website development, design, and promotion. I am always glad to see new blog readers and good clients.

