SharePoint Hybrid (OCR) Search
The idea is to enable SharePoint search to perform optical character recognition (OCR) on images, including scanned PDF documents, when they are crawled by the SharePoint hybrid crawler. This would make it possible to search for text inside images and scanned PDF documents and to find these documents more easily. Today, text inside such images and documents is not searchable.

4 comments
-
Mr Nigel commented
I presume this request is not related to searching scanned PDF/A documents, as we have successfully uploaded 500,000K plus while digitizing our paper archive and are able to search their content with no problem.
-
Will Young commented
Photography and news shops would benefit greatly from image and facial recognition for cataloging and searching image libraries. The iPhone "Photos" app does this remarkably well. This capability would be huge for photo-based organizations or departments.
-
Dan Gøran Lunde commented
OCR can be supported via content enrichment. The pre-built CEWS Pipeline Toolkit already includes integrations with OCR vendors.
-
Lawrence Dwight commented
Given the need to classify and protect data under GDPR and other data privacy laws and regulations, it is becoming increasingly important to enable automatic scanning, including with the Azure Information Protection scanner.