Remove/Extract Specific Pages From pdf File Using Terminal
Posted: Sat Feb 27, 2016 11:13 am
by Eli
Sometimes you need to extract or remove specific pages from a PDF document. You can do this without much hassle from the Ubuntu command line with pdftk. The following example keeps only pages 10-90 of the original 100-page document, File_1.pdf, and writes them to a new document, File_2.pdf:
Code:
pdftk File_1.pdf cat 10-90 output File_2.pdf
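Before choosing a range it can help to check how many pages the input file has. Below is a minimal sketch, assuming the output of pdftk's dump_data operation contains a line of the form "NumberOfPages: N"; the helper name page_count is made up for illustration and is not part of pdftk.

```shell
# Print the page count found in `pdftk <file> dump_data` output read from stdin.
# page_count is a hypothetical helper name, not a pdftk feature.
page_count() {
    awk '/^NumberOfPages:/ { print $2; exit }'
}

# Usage (needs pdftk and an existing File_1.pdf):
#   pdftk File_1.pdf dump_data | page_count
```

If the reported count is smaller than the upper end of your range, adjust the range before running the cat command.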
Re: Remove/Extract Specific Pages From pdf File Using Terminal
Posted: Fri Dec 09, 2016 7:58 am
by Eli
Here is another trick: removing a single page from a PDF document with pdftk (PDF Toolkit). Suppose you have a 10-page PDF document and want to remove only page 7. Run the following command:
Code:
pdftk document.pdf cat 1-6 8-end output new_document.pdf
Here document.pdf is the ten-page input file and new_document.pdf is the output file with the seventh page removed.
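The same idea can be generalized to any page number. The sketch below is only an illustration: ranges_without_page is a hypothetical helper that prints the page ranges to pass to pdftk, given the page to drop and the total page count. It handles the first and last page separately, since a range such as 11-end would point past the end of a 10-page file.

```shell
# Print the pdftk page ranges that keep every page except page $1
# of a document with $2 pages. ranges_without_page is a hypothetical helper.
ranges_without_page() {
    page=$1
    total=$2
    if [ "$page" -lt 1 ] || [ "$page" -gt "$total" ] || [ "$total" -lt 2 ]; then
        echo "page $page cannot be removed from a $total-page document" >&2
        return 1
    fi
    if [ "$page" -eq 1 ]; then
        echo "2-end"                                  # drop the first page
    elif [ "$page" -eq "$total" ]; then
        echo "1-$((total - 1))"                       # drop the last page
    else
        echo "1-$((page - 1)) $((page + 1))-end"      # drop a middle page
    fi
}

# Usage (the expansion is left unquoted on purpose so pdftk sees two ranges):
#   pdftk document.pdf cat $(ranges_without_page 7 10) output new_document.pdf
```

For page 7 of a 10-page document the helper prints the same ranges as the command above.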
You can go further, for example by processing every PDF file in the current directory and writing a copy of each, with a certain page removed, to a new directory. The example below creates a directory named Trimmed and copies every PDF file into it with only the first page stripped out:
Code:
mkdir Trimmed
for i in *.pdf ; do pdftk "$i" cat 2-end output "Trimmed/$i" ; done
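A slightly more defensive version of this loop might look like the sketch below. It is an assumption-laden variant, not a replacement: trim_first_pages is a hypothetical function name, and the PDFTK variable is an invented override hook so that the commands can be previewed with PDFTK=echo before any file is written.

```shell
# Copy every PDF in the current directory into Trimmed/ without its first page.
# trim_first_pages and the PDFTK override are hypothetical, for illustration.
trim_first_pages() {
    mkdir -p Trimmed || return 1
    for f in *.pdf; do
        [ -e "$f" ] || continue          # no PDFs here: the glob stayed literal
        "${PDFTK:-pdftk}" "$f" cat 2-end output "Trimmed/$f" ||
            echo "skipped: $f" >&2       # e.g. a single-page or damaged file
    done
}

# Preview the commands without touching any PDF:
#   PDFTK=echo trim_first_pages
# Run for real (needs pdftk):
#   trim_first_pages
```

Reporting a failure and moving on means one problematic file does not stop the rest of the batch.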
Re: Remove/Extract Specific Pages From pdf File Using Terminal
Posted: Fri Jan 04, 2019 11:10 am
by Eli
Install pdftk in Ubuntu 18.04
pdftk is not available by default in Ubuntu 18.04 (Bionic) because the pdftk package in Ubuntu (and its upstream Debian package) depends on the deprecated GCJ runtime.
You can install pdftk from a PPA as follows:
sudo add-apt-repository ppa:malteworld/ppa
sudo apt update
sudo apt install pdftk
See more.
Re: Remove/Extract Specific Pages From pdf File Using Terminal
Posted: Tue Dec 17, 2019 12:21 am
by Eli
Here is how to insert a range of PDF pages into another, larger PDF file. The command below inserts all pages of "Insert_pages.pdf" into "Big.pdf" after page 99 and writes the result to a new PDF file, "Bigger.pdf":
pdftk A=Big.pdf B=Insert_pages.pdf cat A1-99 B A100-end output Bigger.pdf
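To insert after a different page, only the numbers in the cat arguments change. The sketch below builds them from a page number; insert_after_spec is a hypothetical helper, and it assumes the given page number is smaller than the page count of the file behind handle A (otherwise the second range would start past the end of the file).

```shell
# Print the `cat` arguments that place all pages of handle B after page $1
# of handle A. insert_after_spec is a hypothetical helper name.
insert_after_spec() {
    n=$1
    if [ "$n" -lt 1 ]; then
        echo "B A"                         # insert before the first page
    else
        echo "A1-$n B A$((n + 1))-end"     # split A around the insertion point
    fi
}

# Usage (unquoted expansion so pdftk receives three separate arguments):
#   pdftk A=Big.pdf B=Insert_pages.pdf cat $(insert_after_spec 99) output Bigger.pdf
```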
Re: Remove/Extract Specific Pages From pdf File Using Terminal
Posted: Wed Jun 08, 2022 11:29 am
by Eli
Removing specific pages from a document often goes hand in hand with counting words. Under Unix/Linux you can count the words in a PDF document as follows:
Count all words, including page numbers, header text, etc.:
pdftotext yourfile.pdf - | wc -w
Or build a list of the distinct words starting with a character in [A-Za-z], or in [A-Za-z0-9], respectively (run one of the two; each writes the list to the file "words"):
pdftotext yourfile.pdf - | tr " " "\n" | sort | uniq | grep "^[A-Za-z]" > words
pdftotext yourfile.pdf - | tr " " "\n" | sort | uniq | grep "^[A-Za-z0-9]" > words
Once you have the word list in "words", match it against the output of pdftotext and count the matching tokens (-x makes grep accept only whole-token matches, not substrings):
pdftotext yourfile.pdf - | tr " " "\n" | grep -xFf words | wc -l
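If only the final count is needed, the intermediate "words" file can be avoided. The sketch below is an alternative single-pass approach, not the method described above: count_letter_words is a hypothetical helper that splits its input on any whitespace (tabs and newlines as well as spaces) and counts the tokens that start with an ASCII letter.

```shell
# Count whitespace-separated tokens on stdin that start with an ASCII letter.
# count_letter_words is a hypothetical helper name.
count_letter_words() {
    tr -s '[:space:]' '\n' | grep -c '^[A-Za-z]'
}

# Usage (needs pdftotext):
#   pdftotext yourfile.pdf - | count_letter_words
```

Tokens such as "42" or "(bar)" are not counted because they do not begin with a letter; change the bracket expression to [A-Za-z0-9] to include tokens that start with a digit.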
See a reference.
See also an online tool here for counting words in LaTeX documents.