Remove/Extract Specific Pages From pdf File Using Terminal

Eli
Senior Expert Member
Reactions: 183
Posts: 5326
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times

#1

Sometimes you may need to extract or remove specific pages from a PDF document. You can do this without much hassle from the Ubuntu command line using pdftk. The following example keeps only pages 10-90 of the original 100-page document, File_1.pdf, and writes them to a new document, File_2.pdf:

Code:

pdftk File_1.pdf cat 10-90 output File_2.pdf
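If you are not sure how many pages the input has, it can help to check before extracting. The sketch below is only an illustration: parse_page_count and extract_range are hypothetical helper names, and it assumes the NumberOfPages line that pdftk prints with dump_data.

```shell
# Read `pdftk FILE dump_data` output on stdin and print the page count.
parse_page_count() {
    awk '/^NumberOfPages:/ { print $2 }'
}

# extract_range INPUT FIRST LAST OUTPUT (hypothetical helper)
# Checks FIRST-LAST against the real page count before calling pdftk.
extract_range() {
    total=$(pdftk "$1" dump_data | parse_page_count)
    if [ -z "$total" ]; then
        echo "could not read the page count of $1" >&2
        return 1
    fi
    if [ "$2" -lt 1 ] || [ "$3" -gt "$total" ] || [ "$2" -gt "$3" ]; then
        echo "range $2-$3 does not fit in pages 1-$total" >&2
        return 1
    fi
    pdftk "$1" cat "$2-$3" output "$4"
}

# Example: extract_range File_1.pdf 10 90 File_2.pdf
```

Calling extract_range File_1.pdf 10 90 File_2.pdf should then behave like the command above, but refuse a range that does not exist in the file.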
TSSFL -- A Creative Journey Towards Infinite Possibilities!
Eli

#2

Here is another trick: removing a single page from a PDF document with pdftk (PDF Toolkit). Suppose you have a PDF document with 10 pages and you want to remove only page 7. Run the following command:

Code:

pdftk document.pdf cat 1-6 8-end output new_document.pdf

Here document.pdf is the ten-page input file and new_document.pdf is the output file with the seventh page removed.
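The same pattern works for any page number. Below is a rough sketch that builds the page ranges for you; ranges_without_page is a hypothetical helper, not part of pdftk, and it assumes the page to drop is not the last page of the document (for the last page you would need the page count).

```shell
# Print the pdftk page ranges that keep everything except page N.
# ranges_without_page is a hypothetical helper, not a pdftk feature.
# Assumption: N is not the last page of the document.
ranges_without_page() {
    n=$1
    if [ "$n" -le 1 ]; then
        echo "2-end"
    else
        echo "1-$((n - 1)) $((n + 1))-end"
    fi
}

# Example (the unquoted substitution is split into two arguments on purpose):
# pdftk document.pdf cat $(ranges_without_page 7) output new_document.pdf
```

For page 7 the helper prints 1-6 8-end, which is exactly the range used in the command above.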

You can go further, for example by taking every PDF file in the current directory and writing a copy of each, with a certain page removed, to a new directory. The example below creates a directory named Trimmed and writes a copy of every PDF file there with only the first page removed:

Code:

mkdir -p Trimmed
for i in *.pdf ; do pdftk "$i" cat 2-end output "Trimmed/$i" ; done
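A slightly more defensive version of that loop might look like the sketch below. The function name trim_first_pages and the PDFTK variable are my own assumptions for illustration (pdftk itself does not read PDFTK); setting PDFTK=echo only prints the arguments each call would receive, so you can preview the run before touching any file.

```shell
# trim_first_pages [OUTDIR] (hypothetical helper)
# Writes a copy of every PDF in the current directory, minus page 1, to OUTDIR.
# PDFTK is an assumed override, e.g. PDFTK=echo for a dry run.
trim_first_pages() {
    outdir=${1:-Trimmed}
    mkdir -p "$outdir" || return 1
    for f in *.pdf; do
        [ -f "$f" ] || continue    # no match: the glob stays literal, skip it
        ${PDFTK:-pdftk} "$f" cat 2-end output "$outdir/$f" \
            || echo "failed: $f" >&2
    done
}

# Example: trim_first_pages Trimmed
```

A file with only one page has no page 2, so pdftk is expected to fail on it; the loop reports that file and carries on with the rest.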
Eli

#3

Install pdftk in Ubuntu 18.04

pdftk is not available by default in Ubuntu 18.04 (Bionic) because the pdftk package in Ubuntu (and its upstream Debian package) depends on the deprecated GCJ runtime.

You can install pdftk from a PPA as follows:

Code:

sudo add-apt-repository ppa:malteworld/ppa
sudo apt update
sudo apt install pdftk
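After installing, a quick check like the one below can confirm that the shell actually finds pdftk. This is just a sketch; have_cmd is a hypothetical helper name.

```shell
# Return success if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftk; then
    pdftk --version | head -n 3    # show the first lines of the version banner
else
    echo "pdftk not found; install it first" >&2
fi
```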

Eli

#4

Here is how to insert a range of PDF pages into another, larger PDF file. The command below inserts all pages of "Insert_pages.pdf" into "Big.pdf" after page 99 and writes the result to a new PDF file, "Bigger.pdf":

Code:

pdftk A=Big.pdf B=Insert_pages.pdf cat A1-99 B A100-end output Bigger.pdf
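To insert after a different page, only the two numbers around B change. Here is a small sketch that builds the page specification; insert_spec is a hypothetical helper, and it assumes the page number is at least 1 and smaller than the page count of Big.pdf.

```shell
# Print the `cat` arguments that place all of B after page N of A.
# insert_spec is a hypothetical helper; assumes 1 <= N < pages in A.
insert_spec() {
    n=$1
    echo "A1-$n B A$((n + 1))-end"
}

# Example (the unquoted substitution is split into three arguments on purpose):
# pdftk A=Big.pdf B=Insert_pages.pdf cat $(insert_spec 99) output Bigger.pdf
```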

Eli

#5

Removing specific pages from a document often goes hand in hand with counting words. Counting words in a PDF document under Unix/Linux can be done as follows:

Count all words, including page numbers, header text, etc.:

Code:

pdftotext yourfile.pdf - | wc -w


Or, to count only words starting with a character from [A-Za-z], or from [A-Za-z0-9], respectively, first build a list of the distinct matching words:

Code:

pdftotext yourfile.pdf - | tr " " "\n" | sort | uniq | grep "^[A-Za-z]" > words

pdftotext yourfile.pdf - | tr " " "\n" | sort | uniq | grep "^[A-Za-z0-9]" > words


Then, once you have the word list in the file "words", match it against the output of pdftotext and count the matching lines:

Code:

pdftotext yourfile.pdf - | tr " " "\n" | grep -Ff words | wc -l
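Note that grep -F matches fixed strings anywhere in a line, so a token such as "(word" is also counted when "word" is in the list. If you only want tokens that really start with a letter, a single pass may be enough. The sketch below is an alternative under that assumption, not the method from the reference; count_alpha_words is a hypothetical name.

```shell
# Count whitespace-separated tokens on stdin that start with an ASCII letter.
# count_alpha_words is a hypothetical helper name.
count_alpha_words() {
    tr -s '[:space:]' '\n' | grep -c '^[A-Za-z]'
}

# Example: pdftotext yourfile.pdf - | count_alpha_words
```

Unlike the commands above, this splits on tabs and newlines as well as spaces, so the number can differ slightly from the two-step count.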


See a reference.

See also an online tool here for counting words in LaTeX documents.