SEO - Technical Requirements
Google Search technical requirements
Google Search Essentials specifies only three technical requirements that a page must meet to be eligible for indexing in Google Search:
- Googlebot is crawling and not blocked.
- The page works, i.e. Google receives an HTTP 200 (success) status code.
- The content should be indexable.
Googlebot is crawling and not blocked (it can find and access the page)
Your website's pages should be accessible to the public and should not block Googlebot (Google's crawler) from crawling them. If a page is private, meaning someone must log in to see it, Google cannot crawl it. The same is true of any similar access-control mechanism: if Googlebot cannot get past it, the page will not be crawled or indexed. To understand this better, let's take an example. Suppose a friend of yours lives in a building with full security and surveillance. To visit his flat, you must first enter the building premises through the main gate. At the entrance, the watchman stops you and asks you to enter your details in his register. You then go to the lobby, where you find a list of flat numbers and the names of their owners. There you find your friend's name next to the floor and flat number where he lives.
Finally, you go to the flat and meet your friend. Googlebot enters your website in the same way: it identifies itself to the watchman, just as you wrote your details in the register. Here the watchman is your website's robots.txt file, whose rules say which crawlers (identified by their user-agent name) may visit which parts of the site. Once robots.txt lets Googlebot in, it can use the list in the lobby. That list is your sitemap, which lists the pages of your website, such as where your About page and your Privacy page are.
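The "list in the lobby" is an XML file defined by the sitemaps.org protocol. The snippet below is a minimal sketch that builds one with Python's standard library; the function name `build_sitemap` is hypothetical, and the page paths (`/about`, `/privacy`) on the example domain https://www.adcampaings.com used in this article are assumptions for illustration. Real sitemaps can also carry optional tags such as `<lastmod>`, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# XML namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls: Iterable[str]) -> str:
    """Return a minimal sitemap.xml document with one <url><loc> per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url_element = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" automatically.
        ET.SubElement(url_element, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical page paths on the example domain used in this article.
    print(build_sitemap([
        "https://www.adcampaings.com/",
        "https://www.adcampaings.com/about",
        "https://www.adcampaings.com/privacy",
    ]))
```

The generated file is usually published at the site root (for example `/sitemap.xml`) so that crawlers can find it.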
After getting past robots.txt, Googlebot visits the pages it finds and reads their content; that is what crawling a website means. Any page or file that requires a password, or that is otherwise locked or private, is not accessible to Googlebot.
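The watchman check can be approximated locally with Python's standard `urllib.robotparser` module. This is a rough sketch, not how Google evaluates rules: the robots.txt content (including the private `/members/` area) and the helper names are hypothetical, and Python's parser applies the first matching rule, whereas Google uses the most specific one, so results can differ when Allow and Disallow lines overlap. `site_maps()` requires Python 3.8 or later.

```python
from typing import List
from urllib import robotparser

# Hypothetical robots.txt: blocks a private /members/ area and points to a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/

Sitemap: https://www.adcampaings.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """True if the robots.txt rules let the Googlebot user agent crawl url."""
    return load_rules(robots_txt).can_fetch("Googlebot", url)


def sitemap_urls(robots_txt: str) -> List[str]:
    """Sitemap URLs declared with 'Sitemap:' lines, or an empty list."""
    return load_rules(robots_txt).site_maps() or []


if __name__ == "__main__":
    print(googlebot_allowed(ROBOTS_TXT, "https://www.adcampaings.com/about"))  # True
    print(googlebot_allowed(ROBOTS_TXT, "https://www.adcampaings.com/members/profile"))  # False
    print(sitemap_urls(ROBOTS_TXT))  # ['https://www.adcampaings.com/sitemap.xml']
```

The `Sitemap:` line is the robots.txt equivalent of the watchman pointing you to the list in the lobby. Remember that robots.txt only controls crawling; a login wall is a separate barrier that it does not describe.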
How can you check whether Googlebot can crawl and access your page?
Use the Page Indexing report and the Crawl Stats report in Search Console. To test whether a specific page has been indexed, use the URL Inspection tool.
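NOTE: Blocking Googlebot with a robots.txt file prevents crawling, but the page's URL might still appear in search results. To tell Google not to index a page, use a noindex rule (a robots meta tag or an X-Robots-Tag HTTP header) and allow Google to crawl the URL, because Googlebot can only see the noindex rule on a page it is allowed to crawl.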
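A noindex rule can live in the HTML as `<meta name="robots" content="noindex">` (or `name="googlebot"`) or in an `X-Robots-Tag` response header. The sketch below checks both using Python's built-in `html.parser`; the function and class names are hypothetical, and as a simplification it does not handle header values scoped to a user agent, such as `googlebot: noindex`.

```python
from html.parser import HTMLParser
from typing import Set

# Values that tell Google not to index a page ("none" means noindex, nofollow).
NOINDEX_VALUES = {"noindex", "none"}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the HTML or a plain X-Robots-Tag header value contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    header_directives = {d.strip().lower() for d in x_robots_tag.split(",")}
    return bool(NOINDEX_VALUES & (parser.directives | header_directives))


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))  # True
```

If this check returns True for a page you do want in Google, remove the rule; if the page is also blocked in robots.txt, Googlebot will never see the rule in the first place.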
The page works, i.e. Google receives an HTTP 200 (success) status code.
Before looking at HTTP 200, let's first discuss an error you have probably seen many times on web pages: the 404 error.
Have you ever wondered what a 404 error is? Let me explain with an example.
Suppose your website is https://www.adcampaings.com and its AboutMe page is dead (it no longer exists). When Googlebot requests that page, your server responds with an HTTP 404 (Not Found) status code, so Google will not index it. If the AboutMe page works properly, your server responds with HTTP 200 (success), and Google can crawl the page and consider it for indexing.
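You can check which status code a page returns with a small script. This is a minimal sketch using Python's standard library; `fetch_status` and `describe_status` are hypothetical helper names, and the result may differ from what Googlebot sees if your server treats different user agents differently. Note that `urllib` follows redirects automatically, so a redirect that ends on a working page is reported as 200.

```python
from http import HTTPStatus
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request url and return the HTTP status code of the final response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx and 5xx responses as HTTPError; keep the code.
        return err.code


def describe_status(code: int) -> str:
    """Explain what a status code means for the 'page works' requirement."""
    if code == HTTPStatus.OK:
        return "200 OK: the page works and can be considered for indexing"
    if code == HTTPStatus.NOT_FOUND:
        return "404 Not Found: dead page, Google will not index it"
    if code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN):
        return f"{code}: access is restricted (for example, login required)"
    return f"{code}: not a 200 response, so the requirement is not met"
```

For example, `describe_status(fetch_status("https://www.adcampaings.com/AboutMe"))` would report whether the AboutMe page from the example above returns 200 or 404. Network failures (DNS errors, timeouts) raise `urllib.error.URLError`, which this sketch deliberately lets propagate.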
The content should be indexable.
Once Googlebot can access a working page, it checks whether the page has indexable content based on two criteria:
- The content should be in a file type that Google Search supports.
- The content should not violate Google's spam policies.
Below are the most common file types that Google Search supports:
Adobe Portable Document Format (.pdf)
Adobe PostScript (.ps)
Comma-Separated Values (.csv)
Electronic Publication (.epub)
Google Earth (.kml, .kmz)
GPS eXchange Format (.gpx)
Hancom Hanword (.hwp)
HTML (.htm, .html, other file extensions)
Microsoft Excel (.xls, .xlsx)
Microsoft PowerPoint (.ppt, .pptx)
Microsoft Word (.doc, .docx)
OpenOffice presentation (.odp)
OpenOffice spreadsheet (.ods)
OpenOffice text (.odt)
Rich Text Format (.rtf)
Scalable Vector Graphics (.svg)
TeX/LaTeX (.tex)
Text (.txt, .text, other file extensions), including source code in common programming languages, such as:
Basic source code (.bas)
C/C++ source code (.c, .cc, .cpp, .cxx, .h, .hpp)
C# source code (.cs)
Java source code (.java)
Perl source code (.pl)
Python source code (.py)
Wireless Markup Language (.wml, .wap)
XML (.xml)
Google can also index the following media formats:
Image formats: BMP, GIF, JPEG, PNG, WebP and SVG
Video formats: 3GP, 3G2, ASF, AVI, DivX, M2V, M3U, M3U8, M4V, MKV, MOV, MP4, MPEG, OGV, QVT, RAM, RM, VOB, WebM, WMV and XAP
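To audit a list of URLs against the formats above, a rough extension-based check can help. This is a sketch with hypothetical names: the extension sets are transcribed from the lists above, mapping media format names to file extensions (for example JPEG to `.jpeg` and `.jpg`) is an assumption, and because HTML and text files may use other extensions (and HTML pages often have none), an "unknown" result does not mean a file is unsupported.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions transcribed from the lists above (format-to-extension mapping assumed).
DOCUMENT_EXTENSIONS = {
    ".pdf", ".ps", ".csv", ".epub", ".kml", ".kmz", ".gpx", ".hwp",
    ".htm", ".html", ".xls", ".xlsx", ".ppt", ".pptx", ".doc", ".docx",
    ".odp", ".ods", ".odt", ".rtf", ".svg", ".tex", ".txt", ".text",
    ".bas", ".c", ".cc", ".cpp", ".cxx", ".h", ".hpp", ".cs", ".java",
    ".pl", ".py", ".wml", ".wap", ".xml",
}
IMAGE_EXTENSIONS = {".bmp", ".gif", ".jpeg", ".jpg", ".png", ".webp", ".svg"}
VIDEO_EXTENSIONS = {
    ".3gp", ".3g2", ".asf", ".avi", ".divx", ".m2v", ".m3u", ".m3u8",
    ".m4v", ".mkv", ".mov", ".mp4", ".mpeg", ".ogv", ".qvt", ".ram",
    ".rm", ".vob", ".webm", ".wmv", ".xap",
}


def classify_file(url: str) -> str:
    """Return 'document', 'image', 'video', or 'unknown' from the URL's extension."""
    extension = PurePosixPath(urlparse(url).path).suffix.lower()
    if extension in DOCUMENT_EXTENSIONS:
        return "document"
    if extension in IMAGE_EXTENSIONS:
        return "image"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    # No extension (common for HTML pages) or one not in the lists above.
    return "unknown"


if __name__ == "__main__":
    print(classify_file("https://www.adcampaings.com/files/report.PDF"))  # document
```

SVG appears in both the document and image lists; this sketch checks documents first, so `.svg` is reported as "document".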