
Remove/Extract Specific Pages From pdf File Using Terminal

Posted: Sat Feb 27, 2016 11:13 am
by Eli
Sometimes you need to extract or remove specific pages from a PDF document. On Ubuntu this is easy to do from the command line with pdftk. The example below keeps only pages 10-90 of a 100-page document, File_1.pdf, and writes them to a new document, File_2.pdf:

Code:

pdftk File_1.pdf cat 10-90 output File_2.pdf
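
To double-check the result, you can ask pdftk for the page count of the new file. This is only a rough sketch: pdf_page_count is a helper name I made up, and it assumes the dump_data output contains a "NumberOfPages:" line, which is how pdftk normally reports it.

```shell
# Print the number of pages in a PDF, read from pdftk's metadata dump.
# pdf_page_count is a hypothetical helper name, not a pdftk command.
pdf_page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ { print $2 }'
}

# Usage (assumes File_1.pdf exists and has at least 90 pages):
#   pdftk File_1.pdf cat 10-90 output File_2.pdf
#   pdf_page_count File_2.pdf
# Pages 10 through 90 inclusive should give 90 - 10 + 1 = 81 pages.
```

If the number is not what you expect, the page range is usually off by one at either end.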

Re: Remove/Extract Specific Pages From pdf File Using Terminal

Posted: Fri Dec 09, 2016 7:58 am
by Eli
pdftk (PDF Toolkit) can also remove a single page from a PDF document. Suppose you have a 10-page document and want to remove only page 7. Run the following command:

Code:

pdftk document.pdf cat 1-6 8-end output new_document.pdf
Here document.pdf is the ten-page input file and new_document.pdf is the output file with the seventh page removed.
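
If you do this often, the page ranges can be worked out by a small function instead of by hand. The sketch below is my own illustration, not something that ships with pdftk: page_ranges_without is a made-up name, and it assumes you pass it the page to drop and the total page count (at least 2).

```shell
# Print the pdftk page ranges that keep every page except one.
# page_ranges_without is a hypothetical helper: $1 = page to drop, $2 = total pages.
# Assumes the document has at least two pages.
page_ranges_without() {
    page=$1
    total=$2
    if [ "$page" -le 1 ]; then
        # Dropping the first page: keep everything from page 2 on.
        echo "2-$total"
    elif [ "$page" -ge "$total" ]; then
        # Dropping the last page: keep everything before it.
        echo "1-$((total - 1))"
    else
        echo "1-$((page - 1)) $((page + 1))-$total"
    fi
}

# Usage: leave $(...) unquoted so the two ranges become separate arguments.
#   pdftk document.pdf cat $(page_ranges_without 7 10) output new_document.pdf
```

The first and last page are handled separately because a range such as 1-0 or 11-10 would not describe the pages you want to keep.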

You can also process many files at once, for example by copying every PDF file in the current directory into a new directory with a certain page removed from each. The example below creates a directory named Trimmed and writes a copy of each PDF file there with its first page stripped out:

Code:

mkdir -p Trimmed
for i in *.pdf; do pdftk "$i" cat 2-end output "Trimmed/$i"; done
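
A slightly more careful version of the same loop is sketched below. It skips files with fewer than two pages (there is nothing left after removing page 1) and does nothing if the directory has no PDF files. The function name trim_first_pages is just something I picked, and the page count is read from the "NumberOfPages:" line of pdftk dump_data.

```shell
# Copy every PDF in the current directory to an output directory,
# with the first page removed. trim_first_pages is a hypothetical helper name.
trim_first_pages() {
    outdir=${1:-Trimmed}
    mkdir -p "$outdir"
    for f in *.pdf; do
        # With no PDFs present the glob stays literal, so skip it.
        [ -e "$f" ] || continue
        pages=$(pdftk "$f" dump_data | awk '/^NumberOfPages:/ { print $2 }')
        if [ "${pages:-0}" -lt 2 ]; then
            echo "skipping $f: fewer than two pages" >&2
            continue
        fi
        pdftk "$f" cat 2-end output "$outdir/$f"
    done
}

# Usage:
#   trim_first_pages            # writes to ./Trimmed
#   trim_first_pages Shortened  # writes to ./Shortened
```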

Re: Remove/Extract Specific Pages From pdf File Using Terminal

Posted: Fri Jan 04, 2019 11:10 am
by Eli
Install pdftk in Ubuntu 18.04

pdftk is not available by default in Ubuntu 18.04 (Bionic) because the pdftk package in Ubuntu (and its upstream Debian package) depends on the deprecated GCJ runtime.

You can install pdftk from a PPA as follows:

  sudo add-apt-repository ppa:malteworld/ppa
  sudo apt update
  sudo apt install pdftk
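
After installing, it is worth confirming that the command is actually on your PATH before running any of the examples in this thread. A minimal sketch, where require_tool is a helper name of my own:

```shell
# Return success if the named command is on PATH, otherwise print a hint.
# require_tool is a hypothetical helper name used only for this sketch.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found; install it first" >&2
    return 1
}

# Usage:
#   require_tool pdftk && pdftk --version
```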


See more.

Re: Remove/Extract Specific Pages From pdf File Using Terminal

Posted: Tue Dec 17, 2019 12:21 am
by Eli
Here is how to insert the pages of one PDF file into another, larger PDF file. The command below inserts all pages of Insert_pages.pdf into Big.pdf after page 99 and writes the result to a new file, Bigger.pdf:

  pdftk A=Big.pdf B=Insert_pages.pdf cat A1-99 B A100-end output Bigger.pdf
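
The same command can be wrapped so that the insertion point is a parameter. This is only a sketch: insert_pages_after is a name I made up, and it assumes the page number is at least 1 and smaller than the number of pages in the big file.

```shell
# Insert all pages of one PDF into another after a given page.
# insert_pages_after is a hypothetical helper:
#   $1 = big file, $2 = file to insert, $3 = page to insert after, $4 = output file.
insert_pages_after() {
    big=$1
    insert=$2
    page=$3
    out=$4
    # A and B are pdftk handles for the two inputs; "B" alone means all of B.
    pdftk A="$big" B="$insert" cat "A1-$page" B "A$((page + 1))-end" output "$out"
}

# Usage, equivalent to the command above:
#   insert_pages_after Big.pdf Insert_pages.pdf 99 Bigger.pdf
```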


Re: Remove/Extract Specific Pages From pdf File Using Terminal

Posted: Wed Jun 08, 2022 11:29 am
by Eli
Removing specific pages from a document often goes together with checking its word count. On Unix/Linux you can count the words in a PDF document as follows.

Count all words, including page numbers, header text, etc.:

  pdftotext yourfile.pdf - | wc -w


Or

count only words that start with a character in [A-Za-z], or in [A-Za-z0-9], respectively. First build the list of unique matching words:

  pdftotext yourfile.pdf - | tr " " "\n" | sort | uniq | grep "^[A-Za-z]" > words

  pdftotext yourfile.pdf - | tr " " "\n" | sort | uniq | grep "^[A-Za-z0-9]" > words


Then, with the word list saved in the file "words", match it against the output of pdftotext and count the matching tokens (-x makes grep match whole tokens only, not substrings):

  pdftotext yourfile.pdf - | tr " " "\n" | grep -Fxf words | wc -l
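
If you only need the number and not the word list itself, the count above can probably be done in one pass without the intermediate file. The sketch below is my own shortcut, not taken from the reference: count_letter_words is a made-up name, it reads text on standard input, and it splits on any whitespace rather than only on spaces.

```shell
# Count whitespace-separated tokens on stdin that start with an ASCII letter.
# count_letter_words is a hypothetical helper name.
count_letter_words() {
    # Put one token per line, then count the lines that begin with a letter.
    tr -s '[:space:]' '\n' | grep -c '^[A-Za-z]'
}

# Usage:
#   pdftotext yourfile.pdf - | count_letter_words
```

Change the pattern to '^[A-Za-z0-9]' to also count tokens that start with a digit.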


See a reference.

See also an online tool here for counting words in LaTeX documents.