The information trapped in text files, PDFs, and other digital content is a valuable information asset that can be very difficult to discover and use. Apache Tika is an open source toolkit that makes it easy for search engines, content management systems and other applications to detect and extract content from digital documents in all major file formats. Tika in Action is a hands-on guide for developers working with search engines, content management systems and other similar applications who want to exploit the information locked in digital documents. It introduces the world of mining text and binary documents as well as other information sources. The book shows where Tika fits within this landscape and how readers can use Tika to build and extend applications. The book's many case studies give real-world experience from domains ranging from search engines to digital asset management and scientific data processing.