Enable crawling of files with no extension in the URL when the Content-Type is properly set
[Content Source Type: Web site]
Files whose URL has no extension (for example "http://xxxx.com/files/12345") should be crawled successfully when the Content-Type in the HTTP response header is set correctly.
When we crawled docx, xlsx, pptx, and pdf files whose URLs include no extension, the crawler failed to identify the file format and wrote the following message to the crawl log:
"The filtering process could not load the item. This is possibly caused by an unrecognized item format or item corruption."
The same result is obtained even when the Content-Type in the HTTP response header is properly set.
In the diagnostic logging, we can see that SharePoint's crawler obtains the file's Content-Type as "DocFormat".
Why is the Content-Type not used to identify the file format?
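To make the request concrete, here is a minimal Python sketch of the fallback we would expect: use the extension from the URL when there is one, otherwise derive the format from the Content-Type header. The function name `resolve_extension` and the mapping table are hypothetical and only illustrate the idea; they do not describe how SharePoint's crawler is actually implemented.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping for illustration only (not SharePoint's own table).
# The keys are the standard media types for the four formats we tested.
CONTENT_TYPE_TO_EXTENSION = {
    "application/pdf": ".pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
}


def resolve_extension(url, content_type):
    """Return the extension to use for format detection, or None if unknown."""
    # Prefer the extension in the URL path when one is present.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    if extension:
        return extension

    # No extension in the URL: fall back to the Content-Type header.
    if not content_type:
        return None
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_TO_EXTENSION.get(media_type)
```

With this fallback, a URL such as "http://xxxx.com/files/12345" served with Content-Type "application/pdf" would be treated as a pdf file instead of failing with an unrecognized item format.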