Blocking indexing of different file types with robots.txt
Recently, on an English-language forum, I found a list of robots.txt directives that block indexing of files by extension and of various paths on a site. I decided it might be useful to someone in three cases.
- If you do not want hackers to find, through search engines, the sites that you programmed yourself.
- To prevent indexing of duplicate (non-canonical) pages: pages that are similar to one another and are not counted by search engines, yet can lower the site in search results. Only the developers of search engines and analytics systems can really judge this, if they do so at all.
- When developing a closed site, it is also advisable to ban indexing; in that case you can simply ban indexing of the whole site.
So, maybe if instead of using

User-agent: Googlebot-Image
Disallow: /

you tried:

User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot
Disallow: /images/
Disallow: /img/
Disallow: /icons/
Disallow: /icons/small/
Disallow: /gallery/
Disallow: /graphics/
Disallow: /gfx/
Disallow: /buttons/
Disallow: /thumbs/
Disallow: /thumbnails/
Disallow: /*.pdf$
Disallow: /*.ico$
Disallow: /*.tif$
Disallow: /*.pict$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.doc$
Disallow: /*.xls$
Disallow: /*.pps$
Disallow: /*.ppt$
Disallow: /*.eml$
Disallow: /*.url$
Disallow: /*.log$
Disallow: /*.txt$
Disallow: /*.js$
Disallow: /*.pac$
Disallow: /*.css$
Disallow: /*.csv$
Disallow: /*.ext$
Disallow: /*.class$
Disallow: /*.cls$
Disallow: /*.jar$
Disallow: /*.java$
Disallow: /*.c$
Disallow: /*.htx$
Disallow: /*.idc$
Disallow: /*.qry$
Disallow: /*.wo$
Disallow: /*.woa$
Disallow: /*.wos$
Disallow: /*.lp$
Disallow: /*.ls$
Disallow: /*.lsp$
Disallow: /*.au$
Disallow: /*.mid$
Disallow: /*.wav$
Disallow: /*.avi$
Disallow: /*.dat$
Disallow: /*.mov$
Disallow: /*.mpeg$
Disallow: /*.mpg$
Disallow: /*.dir$
Disallow: /*.dcr$
Disallow: /*.dxr$
Disallow: /*.aam$
Disallow: /*.aas$
Disallow: /*.aab$
Disallow: /*.fh$
Disallow: /*.spl$
Disallow: /*.swf$
Disallow: /*.fla$
Disallow: /*.ipx$
Disallow: /*.bin$
Disallow: /*.hqx$
Disallow: /*.sea$
Disallow: /*.sit$
Disallow: /*.dmg$
Disallow: /*.conf$
Disallow: /*.plist$
Disallow: /*.cab$
Disallow: /*.dll$
Disallow: /*.exe$
Disallow: /*.zip$
Disallow: /*.tar$
Disallow: /*.gz$
Disallow: /*.gzip$
Disallow: /*?
Disallow: /*.t$
Disallow: /*.cgi$
Disallow: /*.pl$
Disallow: /*.plx$
Disallow: /*.pm$
Disallow: /*.py$
Disallow: /*.pyc$
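A list this long is easier to maintain if it is generated rather than typed by hand. The following is only a minimal sketch of that idea: the function name `build_rules` and the short sample lists are my own assumptions for illustration, not part of the forum post.

```python
def build_rules(user_agent, directories, extensions):
    """Return robots.txt text that blocks the given directories and extensions."""
    lines = [f"User-agent: {user_agent}"]
    # Directory rules are plain path prefixes, e.g. Disallow: /images/
    lines += [f"Disallow: /{d.strip('/')}/" for d in directories]
    # Extension rules use the * wildcard and the $ end-of-address anchor,
    # in the same form as the listing above, e.g. Disallow: /*.pdf$
    lines += [f"Disallow: /*.{ext.lstrip('.')}$" for ext in extensions]
    return "\n".join(lines) + "\n"


# Hypothetical short sample; extend the two lists as needed.
robots_txt = build_rules("Googlebot", ["images", "img"], ["pdf", "png", "zip"])
print(robots_txt)
```

The output can be pasted into robots.txt as one group for the chosen crawler.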
A complete ban on indexing the whole site:

User-agent: *
Disallow: /
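The complete ban can be checked locally before it is deployed. This is a small sketch using Python's standard `urllib.robotparser` module; the host `example.com` and the bot name `SomeOtherBot` are placeholders I made up for the check.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Feed the two directives directly instead of downloading a robots.txt file.
parser.parse(["User-Agent: *", "Disallow: /"])

# can_fetch() answers: may this crawler request this address?
allowed_home = parser.can_fetch("Googlebot", "http://example.com/")
allowed_page = parser.can_fetch("SomeOtherBot", "http://example.com/any/page.html")
print(allowed_home, allowed_page)
```

Both checks come back negative, because the `*` group applies to every crawler and `/` is a prefix of every path.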
To block only files with the .php extension for Googlebot:

User-agent: Googlebot
Disallow: /*.php$
Hello. Could you please tell me what the line Disallow: /*? means? What does it prohibit? Pages without an extension?
I think the correct way to read it is: do not index internal pages of the site whose address contains a GET query string.
For example, this link will be indexed:
http://wp-admin.com.ua/zapret-indeksatsii-raznyih-tipov-faylov-robots-txt/#comment-728559463
But this link will not be:
http://wp-admin.com.ua/zapret-indeksatsii-raznyih-tipov-faylov-robots-txt?zapros=123
Here * means any number of any characters between the first (root) slash of the address and the question mark. Roughly speaking, the rule covers every page that has a question mark in its address.
If anything is unclear, write.
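The reading given above can be tried out with a short script. This is only a sketch under the assumption that * means "any characters" and a trailing $ means "end of the address", as described in this thread; the helper names `rule_to_regex` and `is_blocked` are my own. As far as I know, Python's standard `urllib.robotparser` does not interpret these wildcards, which is why the matching is written by hand here.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a Disallow pattern with * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule, url):
    """Match a rule against the path and query of a URL; the #fragment is ignored."""
    parts = urlsplit(url)
    # Note: a bare trailing "?" with an empty query is dropped by urlsplit.
    target = parts.path + ("?" + parts.query if parts.query else "")
    return rule_to_regex(rule).match(target) is not None


base = "http://wp-admin.com.ua/zapret-indeksatsii-raznyih-tipov-faylov-robots-txt"
print(is_blocked("/*?", base + "/#comment-728559463"))  # no query string
print(is_blocked("/*?", base + "?zapros=123"))          # has a query string
print(is_blocked("/*.php$", "http://example.com/index.php"))
```

The first address is not matched by `/*?` because the part after # is never sent to the server, while the second one is matched because of the question mark. The last call shows the same helper applied to the `/*.php$` rule; `example.com` is a placeholder host.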