Ubuntu – grep for text in *.odt or *.doc files?

grep libreoffice

How can I search for *.odt or *.doc files that contain certain text in Ubuntu?

I use grep -rl <text to search for>, but this only works for text files.

Note: a solution that uses grep (such as searchmonkey) will not work because the *.doc or *.odt files have a special format.

From How to search for strings inside files in a folder?

  • Recoll wants to index my home directory, but I want to search *.odt
    files in specific directories; I couldn't figure out how to do that with
    this tool.
  • Searchmonkey seems to be a GUI for grep, and as I mentioned, grep
    doesn't work on *.doc or *.odt files.
  • Regexxer also has the same problem.

From Searching through ODT documents without opening them?

  • As with Recoll, I couldn't figure out how to search *.odt files in specific directories with this tool.

Best Answer

  • catdoc appears to work recursively for .doc files in 16.04: https://superuser.com/questions/330242/how-to-recursively-find-a-doc-file-that-contains-a-specific-word

    There's no mention of .docx so you'll need to figure that one out yourself.

    For .ods or .odt files, you could use the following script, courtesy of kaibob @ ubuntuforums.org:

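    #!/bin/bash
    # Usage: libre-search <text to search for>

    # Check the argument once, before searching.
    [ "$1" ] || { echo "You forgot search string!" >&2 ; exit 1 ; }

    find . -type f -name "*.od*" | while IFS= read -r i ; do
       # -c sends the archive contents to stdout; grep -q only sets the exit status.
       if unzip -ca "$i" 2>/dev/null | grep -iq -- "$*" ; then
          echo "string found in $i"
       fi
    done | nl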
    

    Let's say you save it as "libre-search" and make it executable with chmod +x libre-search.

    Then, running ./libre-search your_string from the directory you want to search should list the files under that directory that contain your_string.

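    unzip -ca "$i" prints the contents of the archive to stdout (-c), converting text files as needed (-a); 2>/dev/null discards error messages, for example from files that are not valid zip archives.
    grep -iq makes the search case-insensitive (-i) and suppresses grep's own output (-q), so only its exit status is used.
    nl numbers the output lines.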
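
Since the question asks about searching specific directories, here is a minimal sketch of a variant that takes the directories as arguments. The function names doc_grep and extract_text are made up for this example. It assumes unzip is installed, reads only content.xml from OpenDocument files and word/document.xml from .docx files (both formats are zip archives), and uses catdoc for .doc files only if it is available. Treat it as a starting point rather than a tested tool.

```shell
# Print the text of one document on stdout (nothing for unsupported types).
extract_text() {
    case "$1" in
        *.odt|*.ods|*.odp)
            # OpenDocument body text lives in content.xml; replace XML tags with spaces.
            unzip -p "$1" content.xml 2>/dev/null | sed 's/<[^>]*>/ /g' ;;
        *.docx)
            # .docx body text lives in word/document.xml.
            unzip -p "$1" word/document.xml 2>/dev/null | sed 's/<[^>]*>/ /g' ;;
        *.doc)
            # Legacy Word files need catdoc; skipped silently if it is missing.
            if command -v catdoc >/dev/null 2>&1; then catdoc "$1" 2>/dev/null; fi ;;
    esac
}

# doc_grep STRING [DIR...]: print the path of every document containing STRING.
# The body runs in a subshell, so its variables do not leak into the caller.
doc_grep() (
    if [ "$#" -lt 1 ] || [ -z "$1" ]; then
        echo "usage: doc_grep STRING [DIR...]" >&2
        exit 1
    fi
    pattern=$1
    shift
    # Search the current directory when no directory is given.
    [ "$#" -gt 0 ] || set -- .
    # Note: file names containing newlines are not handled by this loop.
    find "$@" -type f \( -name '*.od[tsp]' -o -name '*.doc' -o -name '*.docx' \) |
        while IFS= read -r file; do
            if extract_text "$file" | grep -qi -e "$pattern"; then
                printf '%s\n' "$file"
            fi
        done
)
```

After sourcing the file (for example from ~/.bashrc), a call such as doc_grep "your_string" ~/Documents/reports ~/work would list matching files in just those two directories. Replacing tags with spaces means a word split across formatting spans will not match; the original script has the same limitation, because it greps the raw XML.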