The paper I am writing needs some background on how digital imaging has been used in phenology. To gather it, I decided to carry out a systematic literature review (SLR). I am now at the step of the SLR process where I have to "Define or elicit search string". I chose to do this by counting the most frequent words in my initial set of papers (called the "Quasi Gold Standard" in Pablo's paper).
Linux Journal got me started with this interesting article by Dave Taylor. The article does most of the work, but it lacks the step that converts all my PDF files into text. So here is a revised version of Dave's really cool one-liner:
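```shell
# Convert every PDF under the current directory to text, split it into lowercase words, then count and rank them.
find . -name '*.pdf' -exec pdftotext '{}' - \; | tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -E '^[a-z]+$' | sort | uniq -c | sort -rn > output.txt
```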
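To see what the text-processing stages of the pipeline do without any PDFs at hand, they can be wrapped in a small shell function that reads plain text on standard input. This is a minimal sketch assuming bash or a POSIX shell with the usual `tr`, `grep`, `sort` and `uniq`; the name `word_freq` is hypothetical. It splits on any whitespace (spaces, tabs, newlines) and keeps only purely alphabetic words, so tokens that become empty after stripping punctuation are not counted.

```shell
# word_freq: hypothetical helper, not part of the original one-liner.
# Reads plain text on stdin and prints "count word" lines, most frequent first.
word_freq() {
  tr -s '[:space:]' '\n' |        # split on any whitespace, one word per line
    tr '[:upper:]' '[:lower:]' |  # fold to lowercase
    tr -d '[:punct:]' |           # strip punctuation (a token like "--" becomes empty)
    grep -E '^[a-z]+$' |          # keep purely alphabetic words, dropping empty lines
    sort | uniq -c | sort -rn     # count each word, most frequent first
}

# Small sample: "phenology" appears three times in different forms.
printf 'Phenology, digital imaging -- phenology CAMERAS.\nPhenology!\n' | word_freq
```

Here "Phenology,", "phenology" and "Phenology!" all collapse to the same word, so `phenology` comes out on top with a count of 3, followed by `cameras`, `digital` and `imaging` with a count of 1 each (the order among tied words depends on `sort`). Feeding `pdftotext` output into `word_freq` gives the same kind of ranking as the full one-liner.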
The main addition is the initial part, which finds all the PDF files and runs `pdftotext` on each of them. Hope this is useful :)
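If the ranked list is meant to seed a search string, one possible next step is to take the highest-ranked words from `output.txt` and join them into a boolean OR query. This is only a sketch under that assumption: the helper name `top_terms` and the choice of plain OR-joining are hypothetical, and in practice you would still pick and combine the terms by hand (dropping words such as "the" or "and").

```shell
# top_terms FILE [N]: hypothetical helper, not part of the original post.
# Reads a word-count file such as output.txt ("count word" per line,
# most frequent first) and prints its first N words (default 10) joined by " OR ".
top_terms() {
  awk -v n="${2:-10}" '
    NR <= n { printf "%s%s", (NR > 1 ? " OR " : ""), $2 }  # $2 is the word, $1 the count
    END     { print "" }                                   # finish with a newline
  ' "$1"
}
```

For example, after running the one-liner, `top_terms output.txt 5` prints the five most frequent words as a single line of the form `word1 OR word2 OR ...`, which can then be edited into a proper search string.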