5 Processing of LaTeX Main Files

Given graphics in formats includable in TEX files, which may require the preprocessing described in Chapter 4, this chapter describes the conversion of LaTeX main files into target files in detail. The most important target file format is pdf; conversion into this format is described in Section 5.1. Note that pdf also occurs as a source format for included pictures and as a format of intermediate files. Specific to LaTeX is the dvi format, which is supported mainly for historical reasons.

Almost independent of the format created, the inclusion of bibliographies, indices and glossaries requires additional conversions done by several auxiliary programs. Bibliographies are described in Section 5.2, indices in Section 5.3 and glossaries in Section 5.4. Different only at first sight, but quite analogous behind the scenes, is the inclusion of the results of evaluating code in Python and other languages, described in Section 5.5. Here, an auxiliary program essentially invokes the language interpreter.

Sections 5.6 and 5.7 describe running and rerunning auxiliary programs like makeindex and the LaTeX engine, respectively. Rerunning the LaTeX engine may be necessary if certain lists are present, like a table of contents, a list of figures or a list of tables. Section 5.6 clarifies the exchange of information between the LaTeX engines and auxiliary programs, whereas Section 5.7 essentially describes the exchange of information between individual runs of the LaTeX engine.

Section 5.8 is special in that it is not concerned with conversion but with checking reproducibility. This LaTeX builder has a built-in build algorithm, but one can also use latexmk as a build tool in a way that invokes all tools with the parameters given by the configuration. Note that latexmk has a different build algorithm, but the results should be the same. This is mainly to integrate document development more seamlessly. For details on motivation and implementation see Section 5.9.

Besides the output formats traditional for LaTeX, pdf and dvi, describing e.g. books, Section 5.10 describes the creation of html, Section 5.11 the creation of odt and Section 5.12 the creation of MS Word formats like docx. Finally, pure text can also be generated, as described in Section 5.13.

5.1 Transforming LaTeX files into PDF files

The next step is to create a PDF file from the TEX files. LaTeX distinguishes master TEX files from TEX files intended to be input from elsewhere. Ignoring comments and the like, master TEX files roughly have the form

\RequirePackage[l2tabu, orthodox]{nag} % optional 
\documentclass{...} 
 
\begin{document} 
... 
\end{document}

The core of converting a TEX file into a PDF file is running a LaTeX engine latex2pdf on a master TEX file xxx.tex. The LaTeX engine latex2pdf is configurable via the parameter latex2pdfCommand. Possible values are lualatex, xelatex and pdflatex, where the first is the default, for which this software is also tested. It is also possible to pass parameters to the LaTeX engine. Besides conversion into pdf format, all engines offer conversion into an older format: lualatex and pdflatex create the dvi format via the option --output-format, whereas xelatex creates xdv, a generalization of dvi, with the option --no-pdf.
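The choice of engine and target format translates into a command line roughly as follows. This is a minimal sketch, not the implementation of this software. The function name engine_args is hypothetical, only the options named above are used (--output-format=dvi for lualatex and pdflatex, --no-pdf for xelatex), and any further parameters passed to the engine are left out.

```python
# Hypothetical sketch: command line for running latex2pdf on a master TEX file.
ENGINES = ("lualatex", "xelatex", "pdflatex")


def engine_args(main_tex, engine="lualatex", target="pdf"):
    """Return the argument list for the given engine and target format."""
    if engine not in ENGINES:
        raise ValueError("unsupported engine: " + engine)
    args = [engine]
    if target == "dvi":
        # lualatex and pdflatex switch to dvi output via --output-format
        if engine == "xelatex":
            raise ValueError("xelatex creates xdv, not dvi")
        args.append("--output-format=dvi")
    elif target == "xdv":
        # xelatex writes xdv instead of pdf when given --no-pdf
        if engine != "xelatex":
            raise ValueError("only xelatex creates xdv")
        args.append("--no-pdf")
    elif target != "pdf":
        raise ValueError("unknown target format: " + target)
    args.append(main_tex)
    return args


print(engine_args("xxx.tex"))                    # default engine, pdf output
print(engine_args("xxx.tex", "xelatex", "xdv"))
```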

In fact, the engine latex2pdf does much more than converting TEX files into PDF files. Figure 5.1 shows for latex2pdf set, e.g., to lualatex that besides the PDF file also a LOG file and an AUX file are created. The LOG file contains logging information on the run of the conversion, and the AUX file transports information from one run to the next: it is written in one run and read in the next. Thus, conversion works without the AUX file, but reads it if present. This is why it is depicted on the input side with dashed lines.

Optionally, an FLS file is created, containing the paths of the files the converted LaTeX file depends on. In addition, a file with ending synctex.gz or synctex may be created, containing information for mapping locations in the created PDF file to the corresponding locations in the input files. This supports backward search: clicking on a place in the PDF viewer opens an editor at the corresponding place in the source file.
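With TeX Live engines, the FLS file is typically requested with the option -recorder, and the synctex file with -synctex=1 (compressed) or -synctex=-1 (uncompressed). The FLS file is a plain list of lines starting with PWD, INPUT or OUTPUT. A minimal sketch of extracting the dependencies might look as follows. It assumes POSIX paths, and the function name fls_dependencies is hypothetical.

```python
import posixpath


def fls_dependencies(fls_text):
    """Return the INPUT paths of an FLS file in first-seen order, without duplicates.

    Relative paths are resolved against the directory given by the PWD line.
    Files like xxx.aux may appear both as INPUT and as OUTPUT.
    """
    pwd = ""
    seen = set()
    inputs = []
    for line in fls_text.splitlines():
        kind, _, path = line.partition(" ")
        if kind == "PWD":
            pwd = path
        elif kind == "INPUT":
            if pwd and not posixpath.isabs(path):
                path = posixpath.join(pwd, path)
            path = posixpath.normpath(path)
            if path not in seen:
                seen.add(path)
                inputs.append(path)
    return inputs


sample = """PWD /home/user/doc
INPUT /usr/share/texlive/texmf-dist/tex/latex/base/article.cls
INPUT ./xxx.tex
OUTPUT xxx.log
INPUT ./xxx.tex
"""
print(fls_dependencies(sample))
```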

What is actually in the AUX file depends on the packages used. Among other information, it contains the citations and the location of the bibliography file with ending bib. This information cannot be used directly in the next latex2pdf run to create the bibliography, because the entries referenced in the document must be extracted from the BIB file and sorted. This is done by invoking bibtex between two latex2pdf runs. Based on the AUX file, bibtex creates a BBL file containing the bibliography, which is read in the next latex2pdf run. For details see Section 5.2.

As an alternative to bibtex, a bibliography can be created with the package biblatex in conjunction with the auxiliary program biber. Running a LaTeX engine with the package biblatex loaded creates a bcf file, which is read by biber. At the time of this writing, this software does not support that option. Nevertheless, for the sake of completeness, we added this data path to Figure 5.1.

If an index is demanded, latex2pdf in addition creates an idx file. Like the citations, it cannot be used directly to create an index in the next latex2pdf run, because the index entries must be collected and sorted first. This is done by invoking makeindex between the two latex2pdf runs. Based on the idx file, makeindex creates an ind file containing the index, which is read in the next latex2pdf run. For details see Section 5.3.

If more than one index is demanded, we suggest using splitindex, which creates one ind file per index, instead of makeindex.

A more modern technique to create an index is via xindy, but at the time of this writing, this software does not support xindy yet.

If a glossary is demanded, this can be read off the aux file. In addition, a glo file containing the glossary entries is created, as well as a file with style information. Depending on the configuration, this is an ist file or an xdy file. Like the idx file for the index, the glo file cannot be used directly to create a glossary in the next latex2pdf run, because the glossary entries must be collected and sorted first. This is done by invoking makeglossaries between the two latex2pdf runs. Based on the glo file, makeglossaries creates a gls file containing the glossary, which is read in the next latex2pdf run. For details see Section 5.4.

Besides makeglossaries, there is a more modern tool, bib2gls, which at the time of this writing is not yet supported by this software.

The package pythontex allows including Python code, or code in related languages, in the TEX file and evaluating it. The first latex2pdf run creates a pytxcode file, which essentially contains the code parts of the LaTeX file. Invoking pythontex creates, by default, a folder pythontex-files-xxx with material in which the code is already evaluated. In the next latex2pdf run, this material is included in the document. The package pythontex comes with a second command line utility, depythontex, which eliminates all Python code from the original TEX file. Optionally, latex2pdf also creates a depytx file with all information needed to replace the Python code in the original TEX file by the evaluated material from pythontex-files-xxx. Replacement is done by depythontex, which by default sends the result to stdout, but there is an option to write into another LaTeX file. Converting this new LaTeX file yields the same result as converting the original one. Depythonization is needed, e.g., for papers when the publisher does not accept included code. For details see Section 5.5.

In addition, if a table of contents, a list of figures, a list of tables or a list of listings is required, a TOC file, a LOF file, a LOT file or a LOL file is created, respectively, collecting the corresponding information. Also, if hyper-references are built, an out file containing bookmarks is created. If such a file is present, it is read in the second run of latex2pdf and used to create the table of contents, the list of figures, of tables or of listings, or the bookmarks.

To summarize, if a table of contents, a list of figures, a list of tables, a list of listings, a bibliography, an index or a glossary is present, or if code must be replaced by its evaluation, a second LaTeX run is required to make that material appear in the PDF output.

If a table of contents and at the same time a bibliography, an index or a glossary is present, even two further LaTeX runs are required: after the first one, the bibliography, the index or the glossary occurs in the PDF file but not yet in the table of contents. This happens only after the second additional LaTeX run. As described in Sections 5.6 and 5.7, further runs of auxiliary programs may be necessary, each followed by an invocation of the LaTeX engine latex2pdf. These are mainly runs creating indices or glossaries, but under certain circumstances also runs creating bibliographies or inserting the results of evaluated code.

Figure 5.1: Conversion of a TEX file into a PDF, DVI, XDV file

5.2 Bibliographies

For each occurrence of a command \cite in the TEX file, referring to a document with a given key, latex2pdf writes a corresponding entry \citation with that key into an AUX file. Note that if the LaTeX main file includes other TEX files with \include and the \cite-command is invoked in an included TEX file, the \citation commands go into the AUX file of that TEX file. Moreover, a \bibliography-command in the TEX file writes a link to the BIB files containing the bibliography data into the (top level) AUX file as \bibdata. Note that \bibliography accepts a list of BIB files, not only a single one, as the singular name may suggest¹. Each key given by \cite commands must refer to exactly one key in the BIB files. Last but not least, a \bibliographystyle-command in the TEX file writes a link to the bibliography style file into the AUX file as \bibstyle. The style file determines the appearance of the bibliography, including the labels and the ordering. Typically, the style file comes from the TeX distribution rather than from the user. Its ending is bst.

To create a bibliography, the command given by bibtexCommand must be run after the LaTeX run. The default command is the traditional bibtex, but more modern alternatives like bibtexu and bibtex8, which support UTF-8 and other encodings, are also supported. Among the tools which are not supported are biber and mlbibtex.

We run bibtexCommand if the top level AUX file contains \bibdata or \bibstyle, i.e. if \bibliography or \bibliographystyle occurs in the TEX file. If there is no \cite-command, bibtex yields an error. If neither a \bibliography-command nor a \bibliographystyle-command is present, then the presence of \cite yields an error when running the LaTeX engine. So, an error occurs unless either all three ingredients are present or none of them.
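The rule for invoking bibtexCommand and the observation on the three ingredients can be sketched as follows. This is a simplified illustration, not the code of this software. It assumes that the AUX texts of files included with \include have been concatenated with the top level one. The function names are hypothetical, and the regular expressions ignore nested braces.

```python
import re

# Hypothetical sketch; patterns for the three AUX ingredients
CITATION = re.compile(r"\\citation\{([^}]*)\}")
BIBDATA = re.compile(r"\\bibdata\{([^}]*)\}")
BIBSTYLE = re.compile(r"\\bibstyle\{([^}]*)\}")


def bib_ingredients(aux_text):
    """Return (citation keys, BIB files, bst style or None) found in AUX text."""
    keys = [k.strip() for group in CITATION.findall(aux_text)
            for k in group.split(",")]
    data = BIBDATA.search(aux_text)
    style = BIBSTYLE.search(aux_text)
    bib_files = data.group(1).split(",") if data else []
    return keys, bib_files, (style.group(1) if style else None)


def bibtex_decision(aux_text):
    """Return (run bibtex?, are all three ingredients present or none?)."""
    keys, bib_files, style = bib_ingredients(aux_text)
    present = [bool(keys), bool(bib_files), style is not None]
    run = present[1] or present[2]        # \bibdata or \bibstyle present
    consistent = all(present) or not any(present)
    return run, consistent


aux = r"""\relax
\citation{knuth84}
\citation{knuth84,lamport94}
\bibstyle{plain}
\bibdata{refs,more}
"""
print(bib_ingredients(aux))
print(bibtex_decision(aux))
```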

Essentially, bibtex extracts the citations from the AUX files and unifies them, i.e. a citation is listed once even if it is used more than once. Then it retrieves the corresponding entries from the specified BIB files, sorts and formats these entries according to the bst file and writes everything into a bbl file, which can be included in the next run of latex2pdf. Formatting includes associating a label with each key, and sorting is typically based on the label. The bbl file essentially consists of a thebibliography environment listing the \bibitems. These relate each key to the label given by the BST file and show the text of the bibliography entry.
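The unification and the structure of the resulting bbl file can be illustrated by a toy version of this step. It is a sketch under strong assumptions. Entries are given as ready-made text, and sorting is by key rather than by the rules of a bst file. Also, \bibitem is written without an optional label, so that LaTeX numbers the entries as with numeric styles. Like bibtex, the sketch reports missing entries instead of failing. The function name toy_bbl is hypothetical.

```python
def toy_bbl(citations, entries):
    """Unify and sort citation keys and emit a thebibliography environment.

    citations: keys in citation order, possibly repeated
    entries:   mapping from key to the formatted text of the entry
    Returns the bbl text and the list of keys without a database entry.
    """
    keys = sorted(set(citations))           # unify, then sort (by key, for simplicity)
    missing = [k for k in keys if k not in entries]
    found = [k for k in keys if k in entries]
    # The argument is a sample label as wide as the widest numeric label.
    lines = ["\\begin{thebibliography}{" + str(len(found)) + "}"]
    for key in found:
        lines.append("\\bibitem{" + key + "}")
        lines.append(entries[key])
    lines.append("\\end{thebibliography}")
    return "\n".join(lines) + "\n", missing


bbl, missing = toy_bbl(["lamport94", "knuth84", "lamport94", "xyz"],
                       {"knuth84": "Text of entry knuth84.",
                        "lamport94": "Text of entry lamport94."})
print(bbl, missing)
```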

Note that after a bibtex run, two LaTeX runs are required. The first one just puts the bibliography found in the bbl file xxx.bbl into the PDF file at the place of \bibliography, as \input{xxx.bbl} would do (which shows why \bibliography is singular, although a list of BIB files may serve as source). It also writes the labels of the citations into the AUX file as \bibcite-commands. The second run places the labels of the citations found in the AUX file at the citations given by \cite. The package tocbibind described in [WP10] then writes the headline of the bibliography into the table of contents, if the option numbib is given.

This software presupposes that bibtex reads the AUX file and creates a bbl file and also a blg file with logging output, as illustrated by Figure 5.2. From the BLG file, this software may determine whether bibtex emitted errors or warnings.

Figure 5.2: Conversion of an AUX file into a BBL file using bibliographies
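As an illustration of such a check, the following sketch reads the summary lines that classic bibtex writes at the end of the blg file, such as (There was 1 warning) or (There were 2 error messages). The patterns are assumptions for illustration and not necessarily those this software uses. The function name blg_counts is hypothetical.

```python
import re

# Hypothetical patterns for the closing summary lines of a bibtex blg file
ERRORS = re.compile(r"^\(There (?:was|were) (\d+) error messages?\)", re.MULTILINE)
WARNINGS = re.compile(r"^\(There (?:was|were) (\d+) warnings?\)", re.MULTILINE)


def blg_counts(blg_text):
    """Return (number of errors, number of warnings) reported in a blg file."""
    err = ERRORS.search(blg_text)
    warn = WARNINGS.search(blg_text)
    return (int(err.group(1)) if err else 0,
            int(warn.group(1)) if warn else 0)


blg = """Database file #1: refs.bib
Warning--I didn't find a database entry for "xyz"
(There was 1 warning)
"""
print(blg_counts(blg))
```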

Vital information on bibtex is found in [Pat88] and in [Mar09]. Also, [Grä96], Chapter 10 is worth reading in this context.

Note that in the master AUX file one also finds entries \bibcite relating the keys of the bibliography entries to the labels to be inserted for the \cite commands. However, it is the LaTeX engine that extracts these mappings from the \bibitem entries in the BBL file written by bibtex.

5.3 Indices

If an index is wanted, the command \makeindex must be issued before any index entry is requested. Given a LaTeX main file xxx.tex, this command does nothing but open xxx.idx for writing. The idx file collects all index information extracted from xxx.tex.

Let us first assume that only a single index is wanted. For each occurrence of a command \index in the TEX file, specifying an index entry, latex2pdf writes a corresponding entry \indexentry into the idx file, which relates the entry to the page number where it occurred.

For example \index{ant-task} occurring on page 3 creates an entry

\indexentry{ant-task}{3}

in the idx file. Caution: if the idx file is not open, \index has no effect. In particular, no index is created, and no warning is given.

To create an index, the command given by makeIndexCommand, described in Table 6.6 on page 271, must be run after the LaTeX run. The default command, and the only one currently supported by this software, is the traditional makeindex. Similar, but based on Unicode, are upmendex and xindex. Whereas xindy is still used in the context of glossaries, it seems widely abandoned for pure index creation, for which it was originally designed. The manual is [Sch14].

At the time of this writing, upmendex, which is described in [Tan24], does not yet fully reach the quality of output of makeindex, and above all not its quality of logging errors and warnings, as described in [Mös98], Section 5. Even worse, xindex, described in [Vos24], shows poor quality and seemingly does not log errors and messages at all. Thus makeindex is still the preferred choice, as described in Section 10. However, it is easy to adapt patternErrMakeIndex and patternWarnMakeIndex, also described in Table 6.6, to upmendex, or even to choose them so that they apply to both makeindex and upmendex.

Note that entries in the idx file can occur more than once, even with the same page number. The task of makeindex and related tools is to sort the index entries given in the idx file and, within each entry, to sort the page numbers, unifying equal page numbers and simplifying by using ranges. The result is written into an ind file, which essentially consists of a theindex environment listing \items and \subitems.
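The core of this task can be sketched in a few lines. This toy version is not makeindex. It handles only flat entries of the form \indexentry{key}{page} with arabic page numbers, and ignores subentries (!), encapsulators (|) and letter groups. It sorts keys case-insensitively and forms a range for three or more consecutive pages. The range threshold and the delimiter -- are assumptions chosen to resemble makeindex defaults. The function names are hypothetical.

```python
import re
from collections import defaultdict

ENTRY = re.compile(r"^\\indexentry\{([^}]*)\}\{(\d+)\}", re.MULTILINE)


def compress(pages, min_run=3):
    """Turn sorted distinct page numbers into strings, merging long runs into ranges."""
    out, i = [], 0
    while i < len(pages):
        j = i
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            out.append(f"{pages[i]}--{pages[j]}")
            i = j + 1
        else:
            out.append(str(pages[i]))
            i += 1
    return out


def toy_makeindex(idx_text):
    """Collect, unify and sort index entries and emit a theindex environment."""
    pages = defaultdict(set)                  # a set unifies repeated page numbers
    for key, page in ENTRY.findall(idx_text):
        pages[key].add(int(page))
    lines = ["\\begin{theindex}"]
    for key in sorted(pages, key=str.lower):
        lines.append("  \\item " + key + ", " + ", ".join(compress(sorted(pages[key]))))
    lines.append("\\end{theindex}")
    return "\n".join(lines) + "\n"


idx = r"""\indexentry{ant-task}{3}
\indexentry{ant-task}{3}
\indexentry{Maven}{4}
\indexentry{ant-task}{5}
\indexentry{ant-task}{4}
\indexentry{ant-task}{9}
"""
print(toy_makeindex(idx))
```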

The behavior of the various makeIndexCommand tools differs if the idx file is empty: upmendex does not create an ind file at all, makeindex creates an empty one, and only xindex creates an ind file with an empty theindex environment.

Then the makeindex command is applied to the idx file. It sorts the keywords, collects the corresponding page numbers for each keyword, sorts them and writes the result into an ind file. In the next run of latex2pdf, the \printindex-command in the TEX file includes the index much like \input{xxx.ind}. The most basic package providing this command is makeidx, described in [BLC+14]. In addition, makeidx provides the command \see for cross-references within an index. The package tocbibind described in [WP10] then writes the headline of the index into the table of contents, if the option numindex is given.

The same document [BLC+14] also describes the package showidx, which prints index entries in the margin of the document. This is for debugging only.

This software presupposes that makeindex converts the idx file into an ind file containing the index and also creates an ilg file with logging output, as shown in Figure 5.3. From the ilg file, this software may determine whether makeindex emitted errors or warnings.

Figure 5.3: Conversion of an IDX file into an IND file

The main restriction of the package makeidx is that only a single index can be created. The reason is that latex2pdf creates a single idx file, and makeindex creates a single ind file from it, representing a single index.

To overcome this restriction, replace the package makeidx and the program makeindex with the package splitidx and the program splitindex, both described in [Koh16].

With the package splitidx instead of makeidx, the commands \index and \printindex can still be used, and both work as with the package makeidx. Thus, the tool makeindex can also be combined with the package splitidx. This shows that, besides supporting multiple indices, splitidx is a full replacement of makeidx if a single index is required.

To support multiple indices, splitidx offers the command \newindex[…]{…} to define a new index with a given identifier and an optional headline. Besides the indexing command \index, there is the command \sindex[…]{…} with an optional index identifier.

The command \printindex also has additional variants, among them one writing all indices and one writing the index with a given identifier, \printindex[…]. Note that there is a special identifier, idx, referring to the main index. So \sindex[idx]{…} essentially behaves like \sindex{…}.

The option split of splitidx makes latex2pdf create idx files xxx-y.idx directly. Here, y represents the identifier of an individual index, which is idx for entries created by \sindex[idx]{…}, \sindex{…} and \index{…}. These idx files can be transformed individually with makeindex into ind files, creating log files ilg, as illustrated in Figure 5.4. Since latex2pdf can keep open only up to 16 output streams at once, not all of which can be used to create a file xxx-y.idx, this approach allows only a limited number of indices. It is thus neither recommended nor supported by this software.

Figure 5.4: Not supported: Conversion of IDX files into IND files

Instead, splitidx is supported without the option split. Then latex2pdf creates a single idx file, but \sindex[y]{…} creates lines \indexentry[y]{…} in the idx file, which allow identifying the index y. For example, \newindex[Packages]{pkg} defines a new index of LaTeX packages, and \sindex[pkg]{splitidx} on page 3 indicates that this index shall contain an entry splitidx referring to page 3. The corresponding entry in the idx file is as follows:

  \indexentry[pkg]{splitidx}{3}

Note that both \sindex{…} and \index{…} create entries \indexentry{…} as with a single index.

The program splitindex splits up the single file xxx.idx into several idx files xxx-y.idx. Besides the lines \indexentry[idx]{…}, the lines \indexentry{…} also go into xxx-idx.idx. Then splitindex applies makeindex to each of these idx files separately, creating files xxx-y.ind and corresponding ilg files, as illustrated in Figure 5.5.

Note that splitindex itself does not create any kind of log file. Strictly speaking, there are at least two variants of splitindex, implemented in different languages and with slightly different behavior. At the time of this writing, only the main variant in Perl is supported, but it may be of interest to generalize this to the variant in the Lua language.

Figure 5.5: Conversion of an IDX file into IND files
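The splitting described above can be sketched as follows. The sketch assumes that, as with the default settings of splitindex, the identifier is removed from each line before the line is written to xxx-y.idx, so that makeindex sees plain \indexentry lines. The function name split_idx is hypothetical, and invoking makeindex on the resulting files is omitted.

```python
import re
from collections import defaultdict

# \indexentry, an optional [identifier], then the rest of the line
LINE = re.compile(r"^(\\indexentry)(?:\[([^\]]*)\])?(.*)$")


def split_idx(idx_text, base):
    """Map file names base-y.idx to their contents; lines without identifier go to idx."""
    files = defaultdict(list)
    for line in idx_text.splitlines():
        m = LINE.match(line)
        if m is None:
            continue                           # not an index entry
        ident = m.group(2) or "idx"           # \indexentry{...} belongs to the main index
        files[f"{base}-{ident}.idx"].append(m.group(1) + m.group(3))
    return {name: "\n".join(lines) + "\n" for name, lines in files.items()}


idx = r"""\indexentry[pkg]{splitidx}{3}
\indexentry{ant-task}{3}
\indexentry[idx]{Maven}{4}
"""
for name, content in split_idx(idx, "xxx").items():
    print(name)
    print(content, end="")
```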

The package splitidx is intended to be used in conjunction with the program splitindex. However, it can also create a single index, and if so, it is better to use it in conjunction with makeindex or the like. Based on the idx file, this software decides whether there is a line specifying an index explicitly, like so,

  \indexentry[pkg]{splitidx}{3}

or whether no line specifies an index explicitly, i.e. all lines look like so

  \indexentry{splitidx}{3}

In the first case splitindex is invoked, in the second makeindex is invoked.
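The decision can be sketched as a single regular expression search over the idx file. The function name choose_index_tool is hypothetical.

```python
import re

# A line starting with \indexentry[...] names an index explicitly.
EXPLICIT = re.compile(r"^\\indexentry\[[^\]]*\]", re.MULTILINE)


def choose_index_tool(idx_text):
    """Return 'splitindex' if some entry names an index explicitly, else 'makeindex'."""
    return "splitindex" if EXPLICIT.search(idx_text) else "makeindex"


print(choose_index_tool(r"\indexentry[pkg]{splitidx}{3}"))
print(choose_index_tool(r"\indexentry{splitidx}{3}"))
```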

For further packages supporting multiple indices, which are not intended to be used with this software, see Chapter 8.

It is possible to configure the makeindex command and to pass arbitrary options. CAUTION: for the usual makeindex command, the options -o, specifying an output file, and -t (transcript), specifying the logging file, are not allowed. The former breaks the expectation to find the sorted index in the file xxx.ind, and the latter bypasses the detection of errors and warnings by this software. Also, specifying a style file via the option -s is not recommended: this option is used to create a glossary, so specifying it breaks glossary creation as described in Section 5.4.
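A check of the configured makeindex options along these lines could reject -o and -t and flag -s. This is only a sketch; the function name is hypothetical. Options are split with shlex as a shell would split them, so an argument such as the file name following -s is simply skipped.

```python
import shlex

REJECTED = {"-o", "-t"}    # -o: index must stay in xxx.ind; -t: transcript must stay in xxx.ilg
DISCOURAGED = {"-s"}       # -s: style files are needed for glossary creation


def check_makeindex_options(options):
    """Return (rejected options, discouraged options) found in an option string."""
    rejected, discouraged = [], []
    for opt in shlex.split(options):
        if opt in REJECTED:
            rejected.append(opt)
        elif opt in DISCOURAGED:
            discouraged.append(opt)
    return rejected, discouraged


print(check_makeindex_options("-q -l"))
print(check_makeindex_options('-q -o out.ind -s "my style.ist"'))
```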

Information on the makeindex program can be found in [Mös98] and in [Lam87]. Also, there is a site [LRZ] describing all available options for makeindex.

As indicated above, the program splitindex invokes makeindex. Its options are described in [Koh16], Section 3.10. Since the long option names are not understood in all environments, only the short options are recommended.

Since splitindex must satisfy the interface given by Figure 5.5, the option --help and its shortcut -h are not allowed. The same holds for the option --version and its shortcut -V. The option --makeindex <makeindex>, resp. -m <makeindex>, is set to the makeindex command used for single indices. Thus, it may not be given explicitly but is specified implicitly. Also, the option --identify <regex>, resp. -i <regex>, must be set implicitly because it must be the same expression as used to ***** Then splitindex.tlu is not allowed, because this has another expression.

The only allowable option seems to be -v, the shortcut for --verbose.

Then follows the name of the index file to be processed, without suffix.

Since splitindex invokes makeindex, the option -- coming after the file name indicates that all following options are passed on to makeindex.
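Putting these rules together, the command line for splitindex might be assembled as in the following sketch. The function name is hypothetical. The option -m is filled in with the makeindex command as described above, whereas the identify expression is left out because it is not spelled out in this section.

```python
import shlex

# Options which are not allowed, or which this software sets itself, per the text above
NOT_ALLOWED = {"-h", "--help", "-V", "--version",
               "-m", "--makeindex", "-i", "--identify"}


def splitindex_args(base, options="", makeindex_options="", makeindex="makeindex"):
    """Assemble: splitindex -m <makeindex> [options] <base> [-- <makeindex options>]."""
    opts = shlex.split(options)
    bad = [o for o in opts if o in NOT_ALLOWED]
    if bad:
        raise ValueError("options not allowed for splitindex: " + " ".join(bad))
    args = ["splitindex", "-m", makeindex, *opts, base]
    passed_on = shlex.split(makeindex_options)
    if passed_on:
        args += ["--", *passed_on]         # everything after -- goes to makeindex
    return args


print(splitindex_args("xxx", makeindex_options="-q"))
```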

5.4 Glossaries

CAUTION: the method described here has at least two severe bugs. First, the number of reruns of the LaTeX engine and of makeglossaries is not guaranteed to be correct, as a consequence of a bug in rerunfilecheck and of the fact that it does not fit current versions of makeglossaries. Second, glossary entries that are not referenced directly in the document, but must be included because they are used in the explanations of included entries, are not treated properly.

As a consequence, this document, or more precisely its glossary, could not always be reproduced, so the author excluded the glossary until the problem is fixed.

In addition, there is a conceptual weakness: a glossary database should be centralized, and should thus neither be included in a LaTeX document nor even be written in LaTeX. All weaknesses, bugs and conceptual shortcomings are overcome by the package glossaries-extra in conjunction with the auxiliary program bib2gls, which will replace glossaries and makeglossaries. For the time being, use glossaries with caution.

Creating glossaries requires the package glossaries described in [Tal24b]. By default, the package glossaries creates a single “main glossary”, which can be switched off by specifying the option nomain described in Section 2.6. In this case, at least one more specific glossary type with a corresponding headline must be specified. As specified in [Tal24b], Section 2.6, glossaries offers acronyms, symbols, numbers and index. To avoid collisions with indexing as described in Section 5.3, this software does not allow the latter. Moreover, the package glossaries even supports user-defined glossary types, but this software does not, mainly to keep the internal build in line with the build using latexmk. For details see Section 8.4.

Also, the package glossaries offers sorting and unifying either via makeindex, as for indices, or via xindy, and it also offers doing without external programs. In contrast, this software supports only the variants using external programs, i.e. makeindex and xindy.

Just as there is a LaTeX command \makeindex for creating indices, there is a LaTeX command \makeglossaries for creating a glossary. Unlike \makeindex, however, the latter is not built in but provided by the package glossaries. If xxx.tex is the LaTeX main file, \makeglossaries opens the glo file xxx.glo, which receives the glossary entries, for writing. Just as the built-in command \index writes entries into the idx file defining the index, the command \gls defined by the package glossaries writes an entry into the glo file. Note that xxx.glo typically contains entries more than once and that the entries are not sorted.

To perform sorting, formatting and typically also unification, the package glossaries offers three mechanisms. This software supports two of them: via the shell command makeindex, which is also used for indices, and via the shell command xindy. Using makeindex is the default but can also be activated explicitly through \usepackage[makeindex]{glossaries}. Using xindy instead of makeindex is triggered through \usepackage[xindy]{glossaries}. Accordingly, for the option makeindex the AUX file receives the lines

\providecommand\@istfilename[1]{} 
\@istfilename{manualLMP.ist}

whereas for the option xindy, there are the lines

\providecommand\@istfilename[1]{} 
\@istfilename{manualLMP.xdy} 
... 
\providecommand\@xdylanguage[2]{} 
\@xdylanguage{main}{english} 
\providecommand\@gls@codepage[2]{} 
\@gls@codepage{main}{}

This software invokes neither makeindex nor xindy directly. Instead, it invokes the shell command makeglossaries with the main file name without ending. In turn, makeglossaries determines from the AUX file whether to invoke makeindex or xindy. Accordingly, the style definition is written to an ist file xxx.ist or to an xdy file xxx.xdy, if makeindex or xindy, respectively, is specified as package option.
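Determining the sorting tool from the AUX file can be sketched by looking at the extension announced in \@istfilename. This is an illustration only, not the logic of makeglossaries itself. The function name is hypothetical, and returning None when no such line exists is an assumption of this sketch.

```python
import re

# Matches \@istfilename{...}, but not \providecommand\@istfilename[1]{}
ISTFILE = re.compile(r"\\@istfilename\{([^}]*)\}")


def glossary_sorter(aux_text):
    """Return 'xindy' or 'makeindex' depending on the style file, None if absent."""
    m = ISTFILE.search(aux_text)
    if m is None:
        return None
    return "xindy" if m.group(1).endswith(".xdy") else "makeindex"


aux = r"""\providecommand\@istfilename[1]{}
\@istfilename{manualLMP.xdy}
"""
print(glossary_sorter(aux))
```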

Seemingly, makeglossaries relies on the AUX file to determine whether to invoke makeindex or xindy for sorting and unification. Then it invokes the corresponding command and writes a log file with ending glg. Into this file it redirects the logging output of makeindex or xindy and adds its own output, so that a glg file may be written even if, e.g., makeindex is invoked and does not write one. In any case, if the glg file is written, makeglossaries writes text matching

(^\*\*\* unable to execute: )

into the glg file if an error occurs, no matter whether makeindex or xindy is invoked. Possibly, there are cases where an error causes no glg file to be written. If no error occurs, a glg file is written, and if warnings are emitted, they come either from makeindex or from xindy. Thus warnings may be detected with the patterns for makeindex and for xindy.
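Applying this pattern can be sketched as follows. Treating a missing glg file as suspicious reflects the remark above that an error may prevent the glg file from being written. The function name and the returned strings are hypothetical, and the sample lines are made up to match the pattern.

```python
import re

# The pattern given above, applied line by line
GLG_ERROR = re.compile(r"(^\*\*\* unable to execute: )", re.MULTILINE)


def glg_status(glg_text):
    """Classify the outcome of makeglossaries from the content of the glg file."""
    if glg_text is None:
        return "no glg file"         # possibly an error, see above
    return "error" if GLG_ERROR.search(glg_text) else "no error"


print(glg_status("*** unable to execute: makeindex\n"))
print(glg_status(None))
```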

The style list, which is the default, is set in the form

\usepackage[style=list]{glossaries}

where [Tal24b], Section 13 lists the predefined styles. So the style determines the content of the style definition, whereas the options makeindex and xindy specify the form in which the style is encoded and thus the ending of the style file, which is either ist or xdy.

Sorting the glo file, as said above, is currently only supported via the command makeglossaries. The allowed options are essentially those making sense for makeindex and those making sense for xindy. If the shell command makeglossaries invokes makeindex, only the options applying to makeindex are passed. They are supplemented by the options -s, -t and -o, specifying the ist file, the glg file (the transcript file) and the gls file, respectively. The gls file is the result of sorting, i.e. the output file, and contains the entries of the glo file sorted, formatted and unified. So for a TEX main file xxx.tex, the program makeglossaries invokes

makeindex  -s "xxx.ist" -t "xxx.glg" -o "xxx.gls" "xxx.glo"

Accordingly, if the shell command makeglossaries invokes xindy, only the options applying to xindy are passed, supplemented by the options -M, -t and -o. This is illustrated in Figure 5.6.

Figure 5.6: Conversion of a glo file into a gls file using makeglossaries
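The makeindex call that makeglossaries issues for a glossary can be reconstructed from the base name. This sketch is hypothetical in its name and in where it puts user options (before the generated ones). It rejects -s, -t and -o because makeglossaries supplies these itself.

```python
import shlex

RESERVED = {"-s", "-t", "-o"}   # supplied by makeglossaries itself


def glossary_makeindex_args(base, options=""):
    """Return the makeindex call for base.glo as described for makeglossaries."""
    opts = shlex.split(options)
    clash = [o for o in opts if o in RESERVED]
    if clash:
        raise ValueError("reserved options: " + " ".join(clash))
    return ["makeindex", *opts,
            "-s", base + ".ist", "-t", base + ".glg",
            "-o", base + ".gls", base + ".glo"]


print(shlex.join(glossary_makeindex_args("xxx")))
```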

5.5 Including code via pythontex

The package pythontex, described in [Poo21], originally allowed including Python code in a LaTeX document. Later on, further languages were added, most notably Octave or Matlab, and the user can easily extend it to further languages, as sketched in [Poo21], Section 7. Of course, to that end, the interpreter for the desired language must be installed. The meaning of the term “including” used above ranges from mere listing to pure execution and also comprises inserting the results of execution. Another field of application is creating figures.

Note that, like the package splitidx, pythontex comes with corresponding auxiliary programs, in this case pythontex and depythontex. Consequently, [Poo21] covers not only the package but also the corresponding command line tools. Since [Poo21] is quite detailed, there is also an introduction [Poo] and a gallery [Poo17]. For background on the intentions of the package pythontex, consult [Poo15]. Information required to integrate pythontex into this software partially goes far beyond the official documentation and is collected in [Rei22]. It may also be of interest to the user for debugging.

Running the LaTeX engine on a file xxx.tex with the package pythontex loaded yields a file xxx.pytxcode and, if the package is loaded with the option depythontex, also a file xxx.depytx. If the file xxx.pytxcode is present, this software invokes the command line tool pythontex (which has the same name as the package) on xxx.pytxcode (without ending). The tool converts it into a variety of output files which, without further configuration, are all in the folder pythontex-files-xxx, as shown in Figure 5.7 and described in more detail in [Rei22], Section 3. Note that this software uses the wrapper pythontexW of pythontex, described in Section 3.5.7, instead of pythontex itself. The figure reflects this.

Running the LaTeX engine again includes all output files *.stdout in the PDF file, or in whatever output file is created.

An important remark is that lualatex is the preferred engine, because files *.stdout can cause heavy memory usage, and currently lualatex is the only engine allocating memory dynamically.

As one can see, pythontex cooperates with lualatex in the same way as bibtex and the other auxiliary programs do. Although pythontex, at the time of this writing in version 0.18, is quite mature, it does not write a log file and reports errors and warnings only on standard output or error output. This is unlike all the other auxiliary programs considered here. As a consequence, warnings in particular are difficult to detect and cannot be detected in a uniform way. Thus, the author wrote a small wrapper called pythontexW, which must be placed where it can be found, e.g. in the folder of pythontex.

Similarly, depythontex behaves in a non-standard way. Firstly, by default it does not write a result file but writes to standard output. This can be changed using the option --output, or -o for short. Also, depythontex switches into interactive mode if the output file is already present. To avoid this, the option --overwrite is required, whereas overwriting without asking is the standard behavior of all other auxiliary programs. Like pythontex, depythontex does not write a log file but just prints its errors and warnings. Thus, the author wrote a small wrapper called depythontexW, described in Section 3.5.7, which must be placed where it can be found, e.g. in the folder of depythontex.

Unfortunately, the original author of pythontex shifted his focus away from LaTeX towards Markdown, so development slowed down nearly to zero. In the meantime, Python 2 fell out of use, and a link called python, pointing to either python2 or python3 as (de)pythontex requires, is often no longer provided. So the user must add this link manually.

The package pythontex and the according auxiliary programs are highly configurable, more than this software allows.

In particular, the commands \setpythontexoutputdir, setting the output directory, and \setpythontexworkingdir, setting the working directory, must not be used in the LaTeX document. This software assumes the defaults: the working directory is the directory containing the LaTeX main file xxx.tex, and the output directory is located in the working directory and named pythontex-files-xxx.

Further, the package pythontex can be configured with package options when loading it. Since this software is designed for reproducibility, it is most appropriate to specify runall=true, meaning that the auxiliary program pythontex executes the python code in the document even if no python code has been modified. It is also appropriate to specify rerun=always. Note that the defaults are runall=false and rerun=errors. This default behavior makes sense to speed up creation of the document, but it differs from the behavior of all other auxiliary programs and causes the check for updated output files to fail. Moreover, reproducibility is not as easily shown.

The package documentation [Poo21] suggests that it makes a difference whether runall=true/false or rerun=always/errors is used if external sources are modified. However, as shown in [Rei22], Section 2.1, the package translates the package option runall=true/false into the key value pair rerun=always/errors, and this is the only information pythontex obtains from the package. So there is no difference.

Also, the auxiliary program pythontex itself can be configured via command line arguments. For the package options runall and rerun, there are corresponding command line options --runall and --rerun with the same scope. Whereas the package merges the options runall and rerun silently, the auxiliary program pythontex emits an error if both are given. Essentially, one can forget about runall and stick to rerun.

Strangely enough, according to [Poo21], Section 4.1, package options override command line options. This software shall invoke pythontex with the option --rerun=always, which is thus specified as the default. This is not sufficient to force an unconditional update. Instead, this software relies on an undocumented feature of the auxiliary program pythontex which is unlikely to change: if one of the expected output files is missing, pythontex recreates all output files, independent of command line options and package options. Thus, this software deletes one output file, if present, before executing pythontex.
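The following Python sketch illustrates the deletion trick just described. It assumes the default output directory pythontex-files-xxx required earlier in this section, and that the command is run in the directory of the LaTeX main file. The function name prepare_pythontex_run is hypothetical. The text does not say which output file this software deletes, so the sketch simply removes the first regular file in sorted order. It only builds the command line and does not execute it; in practice the wrapper pythontexW would be invoked.

```python
from pathlib import Path
from typing import List


def prepare_pythontex_run(tex_file: Path) -> List[str]:
    """Delete one pythontex output file, if any, and return the command line."""
    # default output directory, assumed as described in this section
    out_dir = tex_file.parent / f"pythontex-files-{tex_file.stem}"
    if out_dir.is_dir():
        for candidate in sorted(out_dir.iterdir()):
            if candidate.is_file():
                # one missing output file makes pythontex recreate all of them
                candidate.unlink()
                break
    # --rerun=always is the default option used by this software
    return ["pythontex", "--rerun=always", tex_file.name]
```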

When this software invokes pythontex, the exit code must not be changed via --error-exit-code; if this option is specified at all, then with value true. Likewise, the options --interactive, -h, --help and --version are not allowed. Currently, this software does not check for options which are not allowed. Fortunately, the help and version options have no counterpart in the package configuration.
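Since this software does not yet check for disallowed options, here is a hypothetical Python sketch of such a check. The names DISALLOWED and check_pythontex_options are illustrative only. The option names are those listed above. The sketch accepts --error-exit-code both as --error-exit-code=VALUE and with the value as the next argument.

```python
from typing import List

# Options listed above as not allowed when this software invokes pythontex
DISALLOWED = {"--interactive", "-h", "--help", "--version"}


def check_pythontex_options(args: List[str]) -> List[str]:
    """Return a description of every problem found in the given options."""
    problems: List[str] = []
    for i, arg in enumerate(args):
        name, _, value = arg.partition("=")
        if name in DISALLOWED:
            problems.append(f"option {name} not allowed")
        elif name == "--error-exit-code":
            # value given as --error-exit-code=VALUE or as the next argument
            if not value and i + 1 < len(args):
                value = args[i + 1]
            if value != "true":
                problems.append("--error-exit-code must be true")
    return problems
```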

If we place some code, e.g. python code, inline using \pyc

  \usepackage[depythontex]{pythontex} 
  ... 
  \pyc|print(rf'Python inside latex says: "Hello World; 1+1={1+1}"')|

the code is really evaluated, and the resulting string is included at the proper place, as illustrated by the following text, which is created by python:

Python inside latex says: "Hello World; 1+1=2".

Note that the typewriter font is not created by python; it is set explicitly to highlight the string created by python. But it is python which evaluates the little computation and prints the string.

Since pythontex is written in python, including python code in the LaTeX document uses the python interpreter which is already installed as a prerequisite of pythontex. To use another language, the corresponding interpreter must be installed in addition to python.


Figure 5.7: Conversion of a pytxcode file using pythontex

Figure 5.8 shows the files converted by depythontex. As with pythontex, this software uses the wrapper depythontexW instead of depythontex itself. This is reflected in the figure.


Figure 5.8: Conversion of a depytx file using depythontex

5.6 Running and rerunning auxiliary programs

After describing the interface between the LaTeX engine and the auxiliary programs in Section 5.6.1 and when to run an auxiliary program in Section 5.6.2, Section 5.6.3 explains why we don’t use the package rerunfilecheck to determine when to (re-)run auxiliary programs.

5.6.1 The interface between LaTeX and auxiliary programs

Auxiliary programs perform tasks which LaTeX cannot carry out at all, or only with poor performance, for example creating bibliographies, which involves sorting, or executing program code.

The interface between the LaTeX engine and an auxiliary program is always implemented via files: In the first run, the LaTeX engine writes a file or files specific for the auxiliary program or at least writes entries specific for the auxiliary program in a standard file or even both. Then the auxiliary program is run which creates other files which in turn must be read back, in a second run of the LaTeX engine. So the run of an auxiliary program is always enclosed between two runs of a LaTeX engine.

Typically, the LaTeX run needs a LaTeX package associated with the auxiliary program which performs reading and writing. An exception is bibtex and friends, for which LaTeX engines support communication out of the box. An example with more complicated communication is makeglossaries with the associated package glossaries. The package writes lines into the AUX file and typically writes the main glossary into a glo file. The tool makeglossaries, which is invoked with the base name, i.e. without file ending, reads the AUX file and determines which other files to read, typically including the glo file. It then writes the result into the GLS file. This is read back by the package glossaries in the next run of the LaTeX engine.

5.6.2 When to run an auxiliary program

After the first run of the LaTeX engine, one must decide which auxiliary programs to run. For each auxiliary program, there is a specific file it reads, or at least specific entries in a general file, typically the AUX file. If this file or these entries exist, the auxiliary program must be run. Afterwards, the LaTeX engine must be rerun to read in the data created by the auxiliary program. As discussed for each auxiliary program separately in Section 5.6.3, this file or these entries may change with each run of the LaTeX engine, and as a result, the auxiliary program must be rerun as well. So the LaTeX engine and the auxiliary program may have to be run alternately.

Instead of checking whether the relevant data really changed, only the number of relevant lines and a hash over them are taken into account. This bears a minimal risk of not rerunning the auxiliary program although this is needed. Note that the package rerunfilecheck is also based on hashes and bears the same risk.
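A minimal Python sketch of such a fingerprint follows. It assumes that the relevant lines have already been extracted, and records the number of lines together with a hash over them. The choice of SHA-256 and the function names are assumptions; the text does not say which hash this software uses.

```python
import hashlib
from typing import List, Tuple

Fingerprint = Tuple[int, str]


def fingerprint(lines: List[str]) -> Fingerprint:
    """Number of relevant lines together with a SHA-256 hash over them."""
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    return len(lines), digest.hexdigest()


def needs_rerun(old: Fingerprint, new: Fingerprint) -> bool:
    """Rerun the auxiliary program if the line count or the hash changed."""
    return old != new
```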

It is an interesting detail that the decision whether an auxiliary program must be run at all, i.e. for the first time, is based just on the existence of a specific file or of a specific line in a file. It does not depend on all pieces of information read by the auxiliary program. Nevertheless, if it is decided that the auxiliary program must be run, the LaTeX engine must be run afterwards as well, and so the information may change. So one must be prepared for a rerun check. For this, all the information in the file(s) relevant for the auxiliary program must be hashed.

From the second run of the LaTeX engine on, only those auxiliary programs for which a hash is present must be checked for the rerun condition.
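The decision logic of this subsection can be sketched in Python. Everything here is hypothetical: the LaTeX run, the trigger checks, the fingerprinting and the auxiliary programs are passed in as callables. The parameter max_runs is an assumed safeguard against endless alternation. Reruns of the LaTeX engine that are not caused by auxiliary programs, discussed in Section 5.7, are left out.

```python
from typing import Callable, Dict, List, Tuple

Fingerprint = Tuple[int, str]


def build(run_latex: Callable[[], None],
          triggers: Dict[str, Callable[[], bool]],
          fingerprints: Dict[str, Callable[[], Fingerprint]],
          run_aux: Callable[[str], None],
          max_runs: int = 10) -> int:
    """Alternate LaTeX and auxiliary programs; return the number of LaTeX runs."""
    run_latex()
    runs = 1
    # After the first run, mere presence of the trigger file/entry decides.
    hashes: Dict[str, Fingerprint] = {
        name: fingerprints[name]() for name, present in triggers.items() if present()
    }
    pending: List[str] = list(hashes)
    while pending and runs < max_runs:
        for name in pending:
            run_aux(name)
        run_latex()
        runs += 1
        # From the second run on, only programs with a stored hash are checked.
        pending = []
        for name, old in hashes.items():
            new = fingerprints[name]()
            if new != old:
                hashes[name] = new
                pending.append(name)
    return runs
```

Suppose the entries relevant for bibtex differ after the first and the second LaTeX run but are stable afterwards. In that case the sketch runs bibtex twice and the LaTeX engine three times.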

After these quite abstract considerations, let us apply these to the concrete auxiliary programs supported.

5.6.3 Why rerunfilecheck is not used for auxiliary programs

As described in Section 2.1, the package rerunfilecheck is used to check whether the LaTeX engine must be rerun, and its authors also intended it to check whether auxiliary programs need to be rerun. While this works satisfactorily for a single index, it fails for multiple indices. Likewise, support for glossaries is buggy and works only for a single glossary, which in addition must be the main glossary. In contrast, the package glossaries supports multiple glossaries, with and without a main glossary, and even allows user-defined glossaries. It is awkward to implement a rerun check for all this functionality with rerunfilecheck.

It may be surprising that there are situations where even bibliography processors need to be rerun, among these backlinks and citations in headlines and glossaries. The package rerunfilecheck does not take this into account. Similarly, even pythontex may need a rerun, e.g. if code is executed in headlines or in captions of floating objects. This may insert additional invocations and change the invocation order, which may lead to different results.

While many auxiliary programs depend only on a subset of the entries in their source file, rerunfilecheck can take files into account only as a whole. As a consequence, rerunfilecheck could trigger a useless rerun even though the relevant entries did not change, simply because irrelevant entries in the relevant file changed.

Taking all these aspects into account, we decided to provide an internal algorithm for the rerun check of auxiliary programs, which is based on the ideas of rerunfilecheck but avoids all its shortcomings.

Note also that besides the question whether to rerun an auxiliary program, there is the question whether to run it at all, i.e. for the first time. Since the package rerunfilecheck interprets a newly occurring file as a changed file, this case is addressed implicitly.

Unfortunately, not all packages associated with auxiliary tools give a hint if the auxiliary program must be run.

As described in Section 5.1, running a LaTeX engine as latex2pdf may detect the presence of a bibliography, an index and/or a glossary and write raw files describing them. After that, an intermediate step is required: sorting, unifying and formatting the entries. This is always done by an external program, which we call an auxiliary program. Similarly, the presence of code to be interpreted may be detected. This code is also written into a separate file, and an external program, pythontex, must be run to execute the code in sequence and, in many cases, to determine the results of the invocations.

In the next step, the LaTeX processor must read in the results of the auxiliary programs to typeset bibliography, indices and glossaries and to insert the results of code invocations. Also, except for code invocations, all these pieces typically go into the table of contents. If code is invoked in a headline or in a caption, the result of the code invocation also goes into the TOC and into the corresponding list, e.g. the list of figures (LOF). So in any case, the LaTeX processor must be rerun after an auxiliary program.

Obviously, a run of the LaTeX processor may change page numbers and thus invalidate the index or the glossary. So, if the LaTeX processor changes the input file of the auxiliary program creating the index or the glossary, that auxiliary program must be rerun. After that, the LaTeX processor must be run again.

What is less obvious is that bibliographies may also be invalidated, e.g. because of a backlink or because a bibliographic reference occurs in a glossary. Even code may be invalidated by a run of the LaTeX processor if some code occurs in a floating object, e.g. in a caption, or in a glossary. Code invocations may then change their order, and additional code may show up only in later runs of the LaTeX processor. So in this case, too, the corresponding auxiliary program pythontex must be rerun after the run of the LaTeX processor.

Summarizing, a run of the LaTeX processor may trigger an invocation of each auxiliary program. This must be done if the corresponding raw file changes. Note that various auxiliary programs share the AUX file to get their information, so only the aspects relevant for the specific auxiliary program shall be taken into account. What makes things a bit more complicated is that including TEX files, e.g. via \include, yields included AUX files, which must be taken into account as well.
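The following Python sketch shows how the relevant entries could be collected while following included AUX files. It covers bibtex, which reads the entries \citation, \bibdata and \bibstyle. It relies on the fact that LaTeX records the AUX file of each \include'd TEX file as a line \@input{...} in the main AUX file. The function name and the restriction to bibtex are assumptions; other auxiliary programs need other prefixes.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Line written into the main AUX file for each \include'd TEX file
INPUT_RE = re.compile(r"\\@input\{([^}]*)\}")
# Entries bibtex reads from AUX files; other programs need other prefixes
BIBTEX_PREFIXES: Tuple[str, ...] = ("\\citation", "\\bibdata", "\\bibstyle")


def relevant_lines(aux: Path, prefixes: Tuple[str, ...] = BIBTEX_PREFIXES,
                   root: Optional[Path] = None) -> List[str]:
    """Collect the lines relevant for one auxiliary program, in order."""
    root = root if root is not None else aux.parent
    result: List[str] = []
    if not aux.is_file():
        return result  # included AUX file not (yet) written
    text = aux.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        match = INPUT_RE.match(line)
        if match:
            # paths are relative to the working directory, assumed to be root
            result.extend(relevant_lines(root / match.group(1), prefixes, root))
        elif line.startswith(prefixes):
            result.append(line)
    return result
```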

A completely reliable rerun check would require storing large parts of the text files. Thus, we follow the approach of the package rerunfilecheck and detect only changes in the number of relevant lines and in their hash. In extremely rare cases, this software may fail to rerun a program although this is needed, namely if the contents change while neither the number of relevant lines nor their hash changes.

Note that we only use the concept of rerunfilecheck to decide on running and rerunning auxiliary programs; we do not use the package rerunfilecheck itself for this task. Supporting all relevant auxiliary programs and also included AUX files would require considerable extensions of rerunfilecheck and would introduce considerable dependencies. So, as described in Section 5.7, rerunfilecheck is used to control rerunning the LaTeX processor as far as auxiliary programs are not involved. Detecting auxiliary programs to be rerun, in contrast, is done internally by an algorithm inspired by the package rerunfilecheck.

5.7 Rerunning the LaTeX processor

CAUTION rework needed

FIXME: a word on change in toc, lof, lot and lol.

As indicated in the previous sections, latex2pdf must be rerun if an auxiliary program like bibtex, makeindex or makeglossaries has been run.

Likewise, if a toc file, a lof file, a lot file or a lol file has been created in the first latex2pdf run, another run is needed to read in these files to create a table of contents, a list of figures, a list of tables or a list of listings, respectively. Note that in none of these cases does the LOG file allow detecting, by matching a fixed pattern, that latex2pdf has to be rerun.

After the second run of latex2pdf, the table of contents, the list of figures, the list of tables and the list of listings are included, and sections with the bibliography, the index and the glossary are inserted. It takes a third run of latex2pdf to include the bibliography, the index and the glossary into the table of contents. Also, it takes that third run to replace the citations with the proper labels given in the bibliography.

Inserting the table of contents, the list of figures, the list of tables and the list of listings may shift the subsequent text, which may require another run of latex2pdf to get the page numbers right. As described in Section 5.6, intermediate runs of auxiliary programs like makeindex may be required, and these also require another run of latex2pdf to get the page numbers right.

The package rerunfilecheck detects file changes via a hash with high reliability and writes a corresponding message into the LOG file. This is offered for pure rerun control of latex2pdf based on the TOC, LOL, LOF and LOT files, and also on the OUT file written by the package hyperref. It also partially supports detecting the need to rerun auxiliary programs. For the sake of uniformity, however, we refrain from using this and rely on an internal algorithm, which is also based on hashes.

Only for rerunning latex2pdf alone do we rely on the package rerunfilecheck: this software just reruns latex2pdf if it detects the pattern of the warning written by rerunfilecheck into the LOG file.

Note that there are several packages which require additional runs, such as the package longtable, which may vary the dimensions of tables. This software presupposes that all these reruns can be detected by matching a fixed pattern in the LOG file. Since packages are changed frequently and new packages are written, the pattern cannot be fixed once and for all; thus it is configurable.
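A LOG-file check of this kind can be sketched in Python with a regular expression. The pattern below is a hypothetical default covering the warnings of rerunfilecheck, of the LaTeX kernel (“Rerun to get cross-references right”) and of longtable. In this software the pattern is configurable, and its actual default is not reproduced here. The function name is also hypothetical.

```python
import re

# Hypothetical default pattern; configurable in this software
RERUN_PATTERN = re.compile(r"Rerun to get|Rerun LaTeX|Table widths have changed")


def needs_latex_rerun(log_text: str, pattern: re.Pattern = RERUN_PATTERN) -> bool:
    """True if the LOG file contains a warning asking for another LaTeX run."""
    return pattern.search(log_text) is not None
```

A production pattern must also cope with TeX breaking long LOG lines, by default after 79 characters.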

Note that, if a package requires running other programs between two runs of latex2pdf, this may require a change in this software.

5.8 Checking reproducibility

There are use cases where it is extremely important that the artifacts are really reproducible. One is when we have to deliver the sources and the receiver has to reconstruct the artifacts. Another obvious use case is integration testing of this software: each artifact created must remain equivalent to a confirmed version even though this software has changed. Details are given in Section 10.

Currently, reproducibility checks are supported for PDF files only. The problem with PDF files is that, besides the visible contents, they also contain metadata (see [PDF08] or [ISO20], Section 14.3 in each), which depends on the run of the conversion. For example, the timestamp and the timezone of the conversion go into the metadata, and other values are derived from them.

There are two strategies to deal with the problem:

Make the build process reproducible. The advantage of this approach is that diffing is quite simple, fast and reproducible: it is byte by byte, as provided by the command diff. This is easily done with a fixed installation but tends to break with updates of tools.
Use diff tools implementing a weaker notion of equivalence, in a sense visual equivalence to some degree. One approach is the script vmdiff described in Section 3.5.6, which combines visual equivalence with equivalence of part of the metadata.
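For the first strategy, a byte-by-byte comparison like the one done by diff (or cmp) can be sketched in Python as follows. The function name is hypothetical. In this software, the actual tool is given by the setting diffPdfCommand.

```python
from pathlib import Path


def files_identical(a: Path, b: Path, chunk_size: int = 1 << 16) -> bool:
    """Byte-by-byte comparison of two files, like cmp(1)."""
    if a.stat().st_size != b.stat().st_size:
        return False  # different length: cannot be identical
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted at the same time
                return True
```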

Since the first one works very well, it is the one we describe here, but it is always possible to configure a diff tool with a weaker equivalence check.

The first question is whether reproducibility is requested. It is requested if the LaTeX main file contains a magic comment requiring it, as described in Section 3.1.1.2. If no such magic comment is present, it is requested if the setting chkDiff specifies so. Settings mentioned in this section without explicit reference are described in Table 6.13 on page 288 in Section 6.13.

Dates and times, both visible ones and those in the metadata of a PDF document, are given relative to a timezone. Hence, for reproducible builds, compilers must run with a fixed timezone. Reproducibility shall not break if the timezone changes or if the country running the build switches between daylight saving time and standard time. Therefore, we chose a uniform timezone, namely UTC.

If a LaTeX main file is already under reproducibility control, then there is a corresponding original PDF file in diffDirectory or in one of its subfolders. It is compared with the newly created PDF file, which occurs in a subfolder of the TEX source directory texSrcDirectory described in Table 6.1 on page 247. The PDF file for comparison has the same path relative to diffDirectory as the created PDF file has relative to texSrcDirectory.
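The path correspondence just described amounts to one relative-path computation. Here is a minimal Python sketch; the function name is hypothetical.

```python
from pathlib import Path


def original_pdf(created: Path, tex_src_dir: Path, diff_dir: Path) -> Path:
    """Counterpart in diffDirectory: same path relative to the root directory."""
    # relative_to raises ValueError if created is not below tex_src_dir
    return diff_dir / created.relative_to(tex_src_dir)
```

For example, take the hypothetical directories src/tex and diff. The created file src/tex/sub/xxx.pdf is then compared with diff/sub/xxx.pdf.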

First, pdfMetainfoCommand is used to extract the metadata entry CreationDate from the original PDF file. This comprises date, time and timezone, where the timezone is UTC.

The compilation creating the new PDF file is run in an environment with that timezone and that creation time. In addition, an environment variable is set which forces the timestamp to affect not only the metadata but also visual data of the PDF file to be created, e.g. the date typically shown on the front page. Note that if the PDF file is created from TEX files via DVI/XDV files, both the LaTeX engine and the converter to PDF need the appropriate environment.
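The text does not name the environment variables involved. The following Python sketch assumes the variables honored by TeX Live engines: SOURCE_DATE_EPOCH for the timestamp, and FORCE_SOURCE_DATE=1 to apply it also to TeX's date and time primitives. It also sets TZ=UTC. It further assumes that CreationDate has already been extracted in ISO 8601 form with a trailing Z. The function name build_env is hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import Dict, Optional


def build_env(creation_date: Optional[str]) -> Dict[str, str]:
    """Environment for compiling a PDF file under reproducibility control."""
    env = dict(os.environ)
    env["TZ"] = "UTC"  # fixed timezone in any case
    # drop inherited values so they cannot leak into the build
    env.pop("SOURCE_DATE_EPOCH", None)
    env.pop("FORCE_SOURCE_DATE", None)
    if creation_date is not None:
        # creation_date like "2020-01-01T00:01:02Z", i.e. UTC
        stamp = datetime.strptime(creation_date, "%Y-%m-%dT%H:%M:%SZ")
        epoch = int(stamp.replace(tzinfo=timezone.utc).timestamp())
        env["SOURCE_DATE_EPOCH"] = str(epoch)  # timestamp for the metadata
        env["FORCE_SOURCE_DATE"] = "1"  # TeX's \year, \month, \day, \time follow it
    return env
```

Such a dictionary would be passed as env to subprocess.run for the LaTeX engine. For the DVI/XDV route, it would also be passed to the converter to PDF.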

After creating the new PDF file in this environment, its coincidence with the original PDF file is checked using the tool given by the setting diffPdfCommand described in Table 6.13. If the actual artifact does not coincide with the predefined one according to the chosen diff tool, a build exception is thrown as specified in Table 7.7.

If a LaTeX main file is not yet under reproducibility control, then no original PDF file exists. In this case, the environment for compilation only ensures the timezone UTC. Then the created PDF file is copied to the proper place in diffDirectory; that is all it takes to put a document under reproducibility control.

Finally, suppose a LaTeX main file is under reproducibility control but is to be changed in a way that also affects the corresponding PDF file. Then the original PDF file is simply deleted before compilation, and the workflow is the same as for putting a document under reproducibility control.
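The cases just described can be summarized in a Python sketch. The compilation, the extraction of CreationDate and the diff tool are passed in as hypothetical callables. Raising RuntimeError merely stands in for the build exception of Table 7.7.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def check_reproducibility(created: Path, original: Path,
                          compile_pdf: Callable[[Optional[str]], None],
                          read_creation_date: Callable[[Path], str],
                          identical: Callable[[Path, Path], bool]) -> None:
    """Compile and compare, or put the document under reproducibility control."""
    if original.is_file():
        # under control: reuse the original timestamp, then compare
        compile_pdf(read_creation_date(original))
        if not identical(original, created):
            raise RuntimeError(f"{created} differs from {original}")
    else:
        # not (or no longer) under control: only the timezone UTC is fixed
        compile_pdf(None)
        original.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(created, original)
```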

Reproducibility is affected, and even supported, by various injections as defined in Section 3.5. First, the generic header described in Section 3.5.2 affects metadata, above all because it loads the package hyperref. Part of this metadata is overwritten by another header described in Section 3.5.4 to improve security and privacy, but enough metadata remains to maintain reproducibility. Reproducibility is achieved both with the full set of metadata and with somewhat reduced metadata. The only piece of information needed for reproducibility is CreationDate, and this is preserved by the headers. Removing it would also have severe consequences, so we can assume it is preserved. On the other hand, removing metadata may even stabilize reproducibility. This is true for the banner identifying the LaTeX compiler and its version, which breaks reproducibility with every version change. Details on reproducibility with a focus on metadata are given in [Rei23b], Section 4.

Obviously, reproducibility checks cause work in three situations. The first is putting a document under check, i.e. in the end phase of document development as defined in Section 3.6. The second is a change of the source document, i.e. when document development is entered again. The third is an unintended change of the output PDF although the sources did not change in an obvious way; this again triggers document development to search for the cause of the change in the sources.

This LaTeX builder is not the tool for document development. Instead, Section 3.6.2 suggests using latexmk for this purpose and describes how latexmk is integrated into this LaTeX builder. This builder writes a config file .latexmkrc reflecting the settings of this software, at least to some extent. The config file .latexmkrc is again written as an injection and is described in Section 3.5.1. It supports reproducibility checks, including reading magic comments, checking the existence of the original PDF file and reading its timestamp if the PDF file is present. Creation of the new PDF file takes timestamp and timezone into account.

Two further injections may be helpful in the context of reproducibility checks, both described in Section 3.5.6. The first is ntlatex, which creates a PDF file. The second is vmdiff, a weaker variant of a diff tool as described above, which checks for visual equality and equality of metadata.

For updating metadata only, we suggest the following technique. Keep the original PDF file in diffDirectory and check with vmdiff that the PDF file remains visually the same and that the correct metadata is updated. Of course, a new timestamp is wanted. So in a second step, the original PDF file is deleted, compilation is repeated, e.g. by ntlatex, and the result is copied into diffDirectory.

There are rare occasions where the timestamp shall be set explicitly. This is not possible directly, as the timestamp is read off from the original PDF file. We suggest using exiftool to modify the CreationDate of the original PDF file in diffDirectory before compilation. This is done by something like

  exiftool -PDF:CreateDate=2020-01-01T00:01:02Z xxx.pdf

Here, PDF:CreateDate is in fact the name of the tag to be written. Note that the timezone must be UTC, represented by Z, which signifies zero offset from UTC. The attentive reader may wonder why the tag is PDF:CreateDate instead of CreationDate; one may check with pdfinfo that CreationDate is really modified. Note also that exiftool keeps the original PDF file as xxx.pdf_original.

Two important details are not so obvious:

Not only the given metadata is changed but also all metadata depending on it, in this case the trailer ID. This is to keep the PDF file consistent.
The metadata is not really overwritten, but it is hidden by new metadata. In fact, exiftool uses incremental update specified for the PDF format, adding a layer describing the modification. All modifications done can also be undone by
```
        exiftool -PDF-update:All= xxx.pdf
```
unless the PDF file has been linearized. LaTeX to PDF compilers always create linearized PDF files and never update incrementally.

It is important to know that changing metadata is done by an incremental update. A PDF file with a modified timestamp and timezone differs from a PDF file compiled directly with the given timestamp and timezone; the directly compiled one is shorter. So updating the timestamp of the PDF file in diffDirectory does not yield a PDF file which is reproduced: compilation leads to another PDF file, and only the updated timestamp is reproduced. This compiled PDF file, however, is reproducible. So copying it into diffDirectory solves the problem: the next compilation yields a PDF file with the correct timestamp and timezone, and it coincides with the PDF file in diffDirectory.
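An incremental update only appends to a PDF file, so the original bytes remain a prefix of the updated file. The following Python sketch checks this for the backup xxx.pdf_original written by exiftool and the modified xxx.pdf. The function name is hypothetical, and the check assumes that exiftool appended its update as the PDF incremental-update mechanism prescribes.

```python
from pathlib import Path


def is_incremental_update(original: Path, updated: Path) -> bool:
    """True if updated consists of all bytes of original plus appended data."""
    old = original.read_bytes()
    new = updated.read_bytes()
    # an incremental update appends new objects, xref and trailer after %%EOF
    return len(new) > len(old) and new.startswith(old)
```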

When putting a document under reproducibility control with a predefined timestamp, there is initially no original PDF file. One could place any PDF file in diffDirectory and overwrite its timestamp and timezone with exiftool. Its content is immaterial.

5.9 Alternative build process with latexmk


This section is on running the build process of LaTeX main files with latexmk or an equivalent tool. Currently, only PDF files can be created that way. Although the functionality is readily explained, the intention is not so obvious. Section 3.6.2 describes the role of latexmk as a build tool in the course of document development, whereas this LaTeX builder is for the final, quality-checked build. So the two tools seem to be complementary. Section 3.5.1 describes how this LaTeX builder can write its own configuration as a config file .latexmkrc for latexmk. Builds with latexmk are thus in line with the final builds done internally by this LaTeX builder.

So running latexmk from within this LaTeX builder seems superfluous at first sight. A closer look at .latexmkrc reveals that it is just a Perl script, which is very flexible in realizing new or special functionality, whereas this LaTeX builder is tied to a quite rigid configuration in the pom. So, for example, if building a document requires tools which are not supported by this LaTeX builder, their invocation can be implemented directly in .latexmkrc. This LaTeX builder writes a single .latexmkrc in the root directory texSrcDirectory, which must be made available in each subfolder by adding a link. Hence, the .latexmkrc written by this LaTeX builder may be replaced by a hand-crafted config file for each folder separately.

Another advantage of being able to run latexmk from within this builder is the following. It is conceivable that the artifacts created in the course of document development using latexmk cannot be reproduced by this builder, most likely because .latexmkrc does not reimplement the internal functionality properly. Invoking latexmk in a final build reduces this risk to a minimum.

There are further motivations for integrating latexmk into this builder, in particular for individual files. In some cases, the build process of latexmk works, but not the internal build process of this builder. Integrating latexmk offers the strengths of latexmk. Note that there are also cases where the built-in build process of this builder is more powerful than that of latexmk. Another reason for integrating latexmk here is the use case of source distribution. The document(s) may be passed to someone as sources, not as targets like PDF. It is not clear that the “customer” uses this LaTeX builder, but maybe (s)he uses latexmk. In this case it makes sense to check whether the document can be built with latexmk alone.

Having explained this, the question arises why this LaTeX builder does not rely on latexmk in general but invokes LaTeX engines and other converters directly. One reason is that this LaTeX builder does not only invoke converters. It also checks return values and, depending on the converter, log files, emitting errors and warnings if appropriate. So, when delegating to latexmk, the user can no longer check that the build process passed without warning or error. A second aspect is that the build algorithms differ. latexmk compiles the LaTeX main file, then detects which files are missing and then tries to build these based on rules. The basic idea behind this is “backward discovery” of dependencies. In contrast, this LaTeX builder first builds the graphic files globally (which latexmk detects only afterwards) before compiling each LaTeX main file. So this LaTeX builder combines “forward discovery” and backward discovery. Pure backward discovery is more elegant, but the LaTeX compiler stops at each graphic file not yet present; this file is then created, and compilation of the LaTeX main file is rerun. This may result in excessive reruns of the LaTeX engine if there are many created graphics in the document.

So there are strong reasons to avoid latexmk, but there are also reasons to allow it in special cases. The parameter $latexmkUsage described in Table 6.1 on page 247 allows a graded use of latexmk: not at all, fully, or as a backend, where latexmk is invoked after the graphic files have been created by an internal process. As a rule, latexmk shall be used as much as required and as little as possible.

This also shows that it is useful to be able to activate latexmk for individual LaTeX main files, which is realized with the magic comment latexmk. It can take the form latexmk=false, latexmk=true or just latexmk, which is the short form of the latter. Magic comments are described in Section 3.1.1.2. In general, they override settings. Here, the situation is a bit more complicated: whereas $latexmkUsage allows three levels of usage, the magic comment can only choose whether to use latexmk or not. If latexmk shall be used due to the magic comment, then it is used to compile the TEX file in any case, but it compiles graphic files only if $latexmkUsage takes the value NotAtAll. If latexmk shall not be used due to the magic comment, then it will never compile the TEX file itself. In that case, if $latexmkUsage takes the value Fully, all required graphic files must already be available for some reason, e.g. because there are none to be compiled.

By the way, invoking latexmk from within this software is the same as invoking it manually. Both are based on .latexmkrc. The features supported are described in Section 3.5.1. Among those are the supported targets, reading magic comments independently of the internal implementation, and support for reproducibility checks.

5.10 Creating hypertext

To create HTML and XHTML from TEX files (more precisely, from LaTeX files), the command given by the setting tex4htCommand is used. Together with its parameters, it is described in Section 6.10. This may be htlatex, the default, which is based on latex, or htxelatex, which is based on xelatex.

Figure 5.9 shows the steps htlatex performs: From the input LaTeX file xxx.tex another LaTeX file yyy.tex is created which arises from xxx.tex by adding

\usepackage[...]{tex4ht}.

Then htlatex runs latex on yyy.tex which results in yyy.dvi. Note that this is in contrast to lualatex which would create some yyy.pdf unless otherwise specified.

Then the converter tex4ht comes into play, which creates several html files, among them xxx.html. The other files, yyy.idv and yyy.lg, are further processed by t4ht, which creates the stylesheet xxx.css and graphic files.

Let us make this more precise. The output of latex is a standard dvi file interleaved with special instructions for the post-processor tex4ht to use. Note that tex4ht is the name both of the post-processor and of the LaTeX-package. The special instructions come from implicit and explicit requests made in the source file through commands for TeX4ht.

The utility tex4ht translates the dvi-code into standard text, while obeying the requests it gets from the special instructions. The special instructions may request the creation of files, insertion of html code, filtering of pictures, and so forth. In the extreme case that the source code contains no commands of TeX4ht, tex4ht gets pure dvi-code and it outputs (almost) plain text with no hypertext elements in it.

The special (\special) instructions seeded in the dvi-code are not understood by dvi processors other than those of TeX4ht.

t4ht: This is an interpreter for executing the requests made in the xxx.lg script.

xxx.idv: This is a dvi file extracted from xxx.dvi; it contains the pictures needed in the html files.

xxx.lg: This is a log file listing the pictures of xxx.idv, the png files that should be created, CSS information, and user directives introduced through the “\Needs” command.


Figure 5.9: Conversion of a TEX file into an xml file

(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/tex4ht.4ht 
version 2009-01-07-07:11 
-------------------------------------- 
Note --- for additional information, use the command line option `info' 
-------------------------------------- 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
 
Note: to remove the <?xml version=...?> processing instruction 
use the command line option `no-VERSION' 
 
Note: to remove the DOCTYPE declaration 
use the command line option `no-DOCTYPE' 
) 
 
-------------------------------------- 
Note: for marking of the base font, use the command line option `fonts+' 
Note: for non active _, use the command line option `no_' 
Note: for _ of catcode 13, use the command line option `_13' 
Note: for non active ^, use the command line option `no^' 
Note: for ^ of catcode 13, use the command line option `^13' 
-------------------------------------- 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
-------------------------------------- 
Note: For section filenames that reflect on their titles 
use the command line option `sec filename' 
 
Note: for alternative charset, use the command line option `charset=...' 
 
Note: to ignore CSS font decoration, use the `NoFonts' command line option 
 
Note: for jpg bitmaps of pictures, 
use the `jpg' command line option. 
(Character bitmaps are controled only by `g' 
records of tex4ht.env and `-g' switches of tex4ht.c) 
 
Note: for gif bitmaps of pictures, use the `gif' command line option. 
(Character bitmaps are controled only by `g' 
records of tex4ht.env and `-g' switches of tex4ht.c) 
 
Note: for content and toc in 2 frames, 
use the command line option `frames' 
 
Note: for content, toc, and footnotes in 3 frames, 
use the command line option `frames-fn' 
 
Note --- for file extension name xht, use the command line option `xht' 
-------------------------------------- 
TeX4ht package options: xhtml,uni-html4,2,pic-tabular,html 
-------------------------------------- 
Note: to ignore CSS code, use the command line option `-css 
 
Note: for inline CSS code, use the command line option `css-in' 
 
Note: for pop ups on mouse over, use the command line option `mouseover' 
 
Note: for addressing images in a subdirectory, 
use the command line option `imgdir:.../' 
) 
 
Note --- for back links to toc, use the command line option `sections+' 
 
Note --- for linear crosslinks of pages, use the command line option `next' 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/latex.4ht 
version 2009-05-21-09:32 
-------------------------------------- 
Note --- for links into captions, instead of float heads, use the command l 
ine option `refcaption' 
-------------------------------------- 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
-------------------------------------- 
Note --- For mini tocs immediately aftter the header 
use the command line option `minitoc<' 
 
Note --- for enumerated list elements with valued data, 
use the command line option `enumerate+' 
 
Note --- for enumerated list elements li's with value attributes, use the c 
ommand line option `enumerate-' 
 
Note --- for CSS2 code, use the command line option `css2' 
 
Note --- for bitmap fbox'es, use the command line option `pic-fbox' 
 
Note --- for bitmap framebox'es, use the command line option `pic-framebox' 
 
Note --- for inline footnotes use command line option `fn-in' 
 
Note --- for tracing of latex font commands, 
use the command line option `fonts' 
-------------------------------------- 
-------------------------------------- 
Note --- for width specifications of tabular p entries, 
use the `p-width' command line option 
or a configuration similar to 
\Configure{HColWidth}{\HCode{style="width:\HColWidth"}} 
-------------------------------------- 
) 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4-math.4ht 
version 2009-05-18-23:01 
-------------------------------------- 
Note --- for pictorial eqnarray, use the command line option `pic-eqnarray' 
 
Note --- for pictorial array, use the command line option `pic-array' 
 
Note --- for pictorial $...$ environments, 
use the command line option `pic-m' (not recommended!!) 
 
Note --- for pictorial $...$ and $$...$$ environments with latex alt, 
use the command line option `pic-m+' (not safe!!) 
 
Note --- for pictorial array, use the command line option `pic-array' 
) 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/unicode.4ht 
version 2010-12-18-17:40 
) 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4-uni.4ht)) 
 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
-------------------------------------- 
Note --- for tocs without * entries, use command line option `notoc*' 
 
Note --- for tocs without * entries, use command line option `notoc*' 
 
Note --- to eliminate mini tables of contents, 
use the command line option `nominitoc' 
 
Note --- for frames-like object-based table of contents, 
use the command line option `obj-toc' 
 
Note --- for files named derived from section titles, 
use the command line option `sec filename' 
 
Note --- for i-columns index, 
use the command line option `index=i' (e.g., index=2) 
-------------------------------------- 
) 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
 
Note --- if included graphics are of degraded quality, 
try the command line options `graphics-num' or `graphics-'. 
The `num' should provide the density of pixels in the bitmaps (e.g., 110). 
 
Note --- for key dimensions try the option `Gin-dim'; 
for key dimensions when bounding box is unavailable 
try `Gin-dim+'; neither is recommended 
) 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
Note --- for URL encoding within href use the command line option `url-enc' 
) 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
 
Note --- for pictorial longtable, 
use the command line option `pic-longtable' 
) 
 
(/usr/local/texlive/2014/texmf-dist/tex/generic/tex4ht/html4.4ht 
 
Note --- to ensure proper alignments use fixed size fonts (see listings.dtx 
) 
)
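Since this log is long, the option names can be collected mechanically. The following Python sketch assumes that every option appears quoted TeX-style as `name'. It therefore also picks up a few tokens that are not options (such as `g' in the notes on character bitmaps) and misses the unterminated `-css entry. The function name quoted_options is hypothetical.

```python
import re

# Tokens quoted TeX-style as `name' on a single line of the log.
QUOTED = re.compile(r"`([^'`\n]+)'")


def quoted_options(log_text: str) -> list[str]:
    """Return the distinct `...' tokens in order of first appearance."""
    seen: list[str] = []
    for token in QUOTED.findall(log_text):
        if token not in seen:
            seen.append(token)
    return seen
```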

Calling tex4ht with an improper command line yields the following usage message:

---------------------------- 
tex4ht.c (2012-07-25-19:36 kpathsea) 
tex4ht 
--- error --- improper command line 
tex4ht [-f<path-separator-ch>]in file[.dvi] 
   [-.<ext>]            replacement to default file extension name .dvi 
   [-c<tag name>]       choose named segment in env file 
   [-e<env file>] 
   [-f<path-separator-ch>]        remove path from the file name 
   [-F<ch-code>]        replacement for missing font characters; 0--255; default 0 
   [-g<bitmap file-ext>] 
   [-h(e|f|F|g|s|v|V)]  trace: e-errors/warnings, f-htf, F-htf search 
                            g-groups, s-specials, v-env, V-env search 
   [-i<htf-font-dir>] 
   [-l<bookkeeping file>] 
   [-P(*|<filter>)]     permission for system calls: *-always, filter 
   [-S<image-script>] 
   [-s<css file-ext>]   default: -s4cs; multiple entries allowed 
   [-t<tfm-font-dir>] 
   [-u10]               base 10 for unicode characters 
   [-utf8]              utf-8 encoding for unicode characters 
   [-v<idv version>]    replacement for the given dvi version 
   [-xs]           ms-dos file names for automatically generated gifs
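A command line for tex4ht can be assembled from the switches listed in this usage message. The Python sketch below uses only -f, -e, -s and -utf8 as documented there. The function name tex4ht_argv is hypothetical, and which switches the builder actually passes is a matter of its configuration (see Section 6.10), not of this sketch.

```python
def tex4ht_argv(dvi_stem, utf8=False, env_file=None, css_ext=None):
    """Build an argument vector for tex4ht from switches in its usage message."""
    # [-f<path-separator-ch>]in file: '/' is the path separator; the path is removed.
    argv = ["tex4ht", "-f/" + dvi_stem]
    if env_file is not None:
        argv.append("-e" + env_file)  # [-e<env file>]
    if css_ext is not None:
        argv.append("-s" + css_ext)   # [-s<css file-ext>], default -s4cs
    if utf8:
        argv.append("-utf8")          # utf-8 encoding for unicode characters
    return argv
```

Once yyy.dvi exists, the resulting list can be handed to subprocess.run(argv, check=True).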

Likewise, t4ht prints the following usage message:

-------------------------------------------------------------------- 
t4ht [-f<dir char>]filename ... 
  -b     ignore -d -m -M for bitmaps 
  -c...  choose named segment in env file 
  -d...  directory for output files       (default:  current) 
  -e...  location of tex4ht.env 
  -i     debugging info 
  -g     ignore errors in system calls 
  -m...  chmod ... of new output files (reused bitmaps excluded) 
  -p     don't convert pictures           (default:  convert) 
  -r     replace bitmaps of all glyphs    (default:  reuse old ones) 
  -M...  chmod ... of all output files 
  -Q     quit, if tex4ht.c had problems 
  -S...  permission for system calls: *-always, filter 
  -X...  content for field %%3 in X scripts 
  -....  content for field %%2 in . scripts 
 
Example: 
   t4ht name -d/WWW/temp/ -etex4ht-32.env -m644 
--------------------------------------------------------------------
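In the same spirit, the example invocation above can be reproduced from the usage message. The sketch adds a guard that skips the call when t4ht is not on the PATH, so it can be tried without a TeX installation. The names t4ht_argv and run_if_available are hypothetical.

```python
import shutil
import subprocess


def t4ht_argv(name, out_dir=None, env_file=None, mode=None, convert_pictures=True):
    """Argument vector for t4ht, following the switches in its usage message."""
    argv = ["t4ht", name]
    if out_dir is not None:
        argv.append("-d" + out_dir)   # directory for output files (default: current)
    if env_file is not None:
        argv.append("-e" + env_file)  # location of tex4ht.env
    if mode is not None:
        argv.append("-m" + mode)      # chmod of new output files
    if not convert_pictures:
        argv.append("-p")             # don't convert pictures
    return argv


def run_if_available(argv):
    """Run argv if its program is on PATH; return False (dry run) otherwise."""
    if shutil.which(argv[0]) is None:
        return False
    subprocess.run(argv, check=True)
    return True
```

For instance, t4ht_argv("name", out_dir="/WWW/temp/", env_file="tex4ht-32.env", mode="644") yields exactly the example command line shown in the usage message.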

5.11 Creating odt files

5.12 Creating MS word files

The best way to convert LaTeX files into MS Word files is via odt files. The conversion from LaTeX to odt is described in Section 5.11. The last step can be done by odt2doc, which can create the doc and docx formats among many others. Figure 5.10 illustrates the process.


Figure 5.10: Conversion of a TEX file into a docx file

5.13 Creating plain text files

Why create plain text from LaTeX files? Plain text may be the minimal format a recipient can work with. Another common application is word counting, in particular when writing a paper for a journal.

Plain text files can be created from LaTeX files just by stripping off the TeX commands. The disadvantage is that references, bibliography, index, glossary, table of contents, list of figures, list of tables, … and symbols get lost. Thus, the first step is the complete creation of a PDF file as described in Section 5.1, except that warnings such as bad boxes are not displayed. This yields a pdf file with correct numbering and links, possibly with overfull boxes and the like. As a final step, the pdf file is converted into a text file with ending txt, by default using pdftotext. Figure 5.11 illustrates the translation process.


Figure 5.11: Conversion of a TEX file into a txt file
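The final step amounts to a single call of pdftotext with the pdf file and the target file. The sketch below assumes that the pdf file has already been created as described in Section 5.1 and derives the txt file name from the pdf file name. The function names are hypothetical, and any further options the builder may pass are not shown.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_argv(pdf: str) -> list[str]:
    """Command converting xxx.pdf into xxx.txt (same stem, ending txt)."""
    pdf_path = Path(pdf)
    txt_path = pdf_path.with_suffix(".txt")
    return ["pdftotext", str(pdf_path), str(txt_path)]


def pdf_to_text(pdf: str) -> Path:
    """Run pdftotext on an already built pdf file; raise if the tool is missing."""
    argv = pdftotext_argv(pdf)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    subprocess.run(argv, check=True)
    return Path(argv[2])
```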

Note that pdftotext keeps the page numbers and marks the end of each page with a form feed character (have a look at the end of the file to see it), so that page numbers can be identified as such. Thus references, index, glossary, table of contents and the like that refer to page numbers still carry valuable information. Symbols available in UTF-8 encoding are preserved as well. In contrast, heavily stacked formulae become unreadable, because pdftotext renders them line by line and drops fraction bars completely. Formulae with complex subformulae under a root become unreadable, too, because the root operator is reduced to a bare root symbol. The same holds for integrals and the like.
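Because page ends are marked with form feeds, the output of pdftotext is easy to post-process, for instance for the word count mentioned above. This Python sketch splits the text into pages and counts whitespace-separated words per page. It assumes the usual trailing form feed after the last page, and the counts include page numbers and running heads, so they are only an approximation. The function names are hypothetical.

```python
def split_pages(text: str) -> list[str]:
    """Split pdftotext output into pages at form feed characters."""
    pages = text.split("\f")
    # The last page is normally also terminated by a form feed,
    # which leaves an empty string at the end of the list.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def word_counts(text: str) -> list[int]:
    """Number of whitespace-separated words on every page."""
    return [len(page.split()) for page in split_pages(text)]
```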

Of figures, the captions are kept, of course, but so is the text set by LaTeX inside them, rendered line by line. What gets lost are the PostScript/PDF parts, i.e. the plain graphics.
