Exporting numbered citations in html with unsorted numbered bibliography

| categories: org-mode | tags:

In this post we illustrated a simple export of org-ref citations to html. This was a simple export that simply replaced each citation with a hyperlink as defined in the export function for each type of link. Today we look at formatting in text citations with superscripted numbers, and having an unsorted (i.e. in order of citation) numbered bibliography. This will take one pass to get the citations and calculate replacements and the bibliography, and one pass to replace them and insert the bibliography.

This text is just some text with somewhat random citations in it for seeing it work. You might like my two data sharing articles 1,2. We illustrate the use of org-mode in publishing computational work 3,4,5, experimental 6and mixed computational and experimental work 7,8. This example will correctly number multiple references to a citation, e.g. 1and 2.

This post is somewhat long, and the way I worked it out is at the end (Long appendix illustrating how we got the code to work). The short version is that we do some preprocessing to get the citations in the document, calculate replacement values for them and the bibliography, replace them in the org-buffer before the export in the backend (html) format we want, and then conclude with the export. This is proof of concept work.

The main issues you can see are:

  1. Our formatting code is very rudimentary, and relies on reftex. It is not as good as bibtex, or presumably some citation processor. Major improvements would require abandoning the reftex approach to use something that builds up the bibliography entry, allows modification of author names, and accomodates missing information gracefully.
  2. The bibliography contents reflect the contents of my bibtex file, which is LaTeX compatible. We could clean it up more, by either post-processing to remove some things like escaped &, or by breaking compatibility with LaTeX.
  3. The intext citations could use some fine tuning on spaces, e.g. to remove trailing spaces after words, or to move superscripts to the right of punctuation, or to adjust spaces after some citations.
  4. Changing the bibliography style for each entry amounts to changing a variable for the bibliography. We have to modify a function to change the intext citation style, e.g. to brackets, or (author year).
  5. I stuck with only cite links here, and only articles and books. It would not get a citenum format correct, e.g. it should not be superscripted in this case, or a citeauthor format correct. That would require some code in the replacement section that knows how to replace different types of citations.

The org-ref-unsrt-html-processor function could be broken up more, and could take some parameters to fine-tune some of these things, and generalize some things like getting the citation elements for the buffer. Overall, I think this shows that citations in org-mode with org-ref are actually pretty flexible. It is not as good as bibtex/LaTeX, and won't be for an unforseeably long time unless someone really needs high quality citations in a format other than LaTeX. Note for LaTeX export, we don't have to do any preprocessing at all. If you wanted to try Word export, you might make a pandoc processor that replaces everything in pandoc citation syntax, and then use pandoc for the conversion. If you didn't care to use the bibtex database for anything else, you could just use backend specific markup to make it exactly right for your output. I did this in reference 8where you can see the chemical formulas are properly subscripted.

If you would like to see the bibtex file used for this you can get it here: numbered.bib

Bibliography

  1. Kitchin, Examples of Effective Data Sharing in Scientific Publishing, ACS Catalysis, 5(6), 3894-3899 (2015). link. doi.
  2. John Kitchin, Data Sharing in Surface Science, Surface Science , (0), - (2015). link. doi.
  3. Zhongnan Xu & John R Kitchin, Tuning Oxide Activity Through Modification of the Crystal and Electronic Structure: From Strain To Potential Polymorphs, Phys. Chem. Chem. Phys., 17(), 28943-28949 (2015). link. doi.
  4. Prateek Mehta, Paul Salvador & John Kitchin, Identifying Potential BO2 Oxide Polymorphs for Epitaxial Growth Candidates, ACS Appl. Mater. Interfaces, 6(5), 3630-3639 (2014). link. doi.
  5. Curnan & Kitchin, Effects of Concentration, Crystal Structure, Magnetism, and Electronic Structure Method on First-Principles Oxygen Vacancy Formation Energy Trends in Perovskites, The Journal of Physical Chemistry C, 118(49), 28776-28790 (2014). link. doi.
  6. Hallenbeck & Kitchin, Effects of O2 and SO2 on the Capture Capacity of a Primary-Amine Based Polymeric CO2 Sorbent, Industrial & Engineering Chemistry Research, 52(31), 10788-10794 (2013). link. doi.
  7. Spencer Miller, Vladimir Pushkarev, Andrew, Gellman & John Kitchin, Simulating Temperature Programmed Desorption of Oxygen on Pt(111) Using DFT Derived Coverage Dependent Desorption Barriers, Topics in Catalysis, 57(1-4), 106-117 (2014). link. doi.
  8. Jacob Boes, Gamze Gumuslu, James Miller, Andrew, Gellman & John Kitchin, Estimating Bulk-Composition-Dependent H2 Adsorption Energies on CuxPd1-x Alloy (111) Surfaces, ACS Catalysis, 5(), 1020-1026 (2015). link. doi.

1 The working code

Here is a function to process the org file prior parsing during the export process. This function goes into org-export-before-parsing-hook, and takes one argument, the backend. We simply replace all the citation links with formatted HTML snippets or blocks. If the snippets get longer than a line, it will break.

We use org-ref-reftex-format-citation to generate the bibliography, which uses reftex to format a string with escape characters in it.

(setq org-ref-bibliography-entry-format
      '(("article" . "<li><a name=\"\%k\"></a>%a, %t, <i>%j</i>, <b>%v(%n)</b>, %p (%y). <a href=\"%U\">link</a>. <a href=\"https://doi.org/%D\">doi</a>.</li>")
        ("book" . "<li><a name=\"\%k\"></a>%a, %t, %u (%y).</li>")))

(defun org-ref-unsrt-latex-processor () nil)
(defun org-ref-unsrt-html-processor ()
  "Citation processor function for the unsrt style with html output."
  (let (links
        unique-keys numbered-keys
        replacements
        bibliography-link
        bibliographystyle-link
        bibliography)
    ;; step 1 - get the citation links
    (setq links (loop for link in (org-element-map
                                      (org-element-parse-buffer) 'link 'identity)
                      if (-contains?
                          org-ref-cite-types
                          (org-element-property :type link))
                      collect link))

    ;; list of unique numbered keys. '((key number))
    (setq unique-keys (loop for i from 1
                            for key in (org-ref-get-bibtex-keys)
                            collect (list key (number-to-string i))))


    ;; (start end replacement-text)
    (setq replacements
          (loop for link in links
                collect
                (let ((path (org-element-property :path link)))
                  (loop for (key number) in unique-keys
                        do
                        (setq
                         path
                         (replace-regexp-in-string
                          key (format "<a href=\"#%s\">%s</a>" key number)
                          path)))
                  (list (org-element-property :begin link)
                        (org-element-property :end link)
                        (format "@@html:<sup>%s</sup>@@" path)))))

    ;; construct the bibliography string
    (setq bibliography
          (concat "#+begin_html
<h1>Bibliography</h1><ol>"
                  (mapconcat
                   'identity
                   (loop for (key number) in unique-keys
                         collect
                         (let* ((result (org-ref-get-bibtex-key-and-file key))
                                (bibfile (cdr result))
                                (entry (save-excursion
                                         (with-temp-buffer
                                           (insert-file-contents bibfile)
                                           (bibtex-set-dialect
                                            (parsebib-find-bibtex-dialect) t)
                                           (bibtex-search-entry key)
                                           (bibtex-parse-entry t)))))
                           ;; remove escaped & in the strings
                           (replace-regexp-in-string "\\\\&" "&"
                                           (org-ref-reftex-format-citation
                                            entry
                                            (cdr (assoc (cdr (assoc "=type=" entry))
                                                        org-ref-bibliography-entry-format))))))
                   "")
                  "</ol>
#+end_html"))

    ;; now, we need to replace each citation. We do that in reverse order so the
    ;; positions do not change.
    (loop for (start end replacement) in (reverse replacements)
          do
          (setf (buffer-substring start end) replacement))

    ;; Eliminate bibliography style links
    (loop for link in (org-element-map
                          (org-element-parse-buffer) 'link 'identity)
          if (string= "bibliographystyle"
                      (org-element-property :type link))
          do
          (setf (buffer-substring (org-element-property :begin link)
                                  (org-element-property :end link))
                ""))

    ;; replace the bibliography link with the bibliography text
    (setq bibliography-link (loop for link in (org-element-map
                                                  (org-element-parse-buffer) 'link 'identity)
                                  if (string= "bibliography"
                                              (org-element-property :type link))
                                  collect link))
    (if (> (length bibliography-link) 1)
        (error "Only one bibliography link allowed"))

    (setq bibliography-link (car bibliography-link))
    (setf (buffer-substring (org-element-property :begin bibliography-link)
                            (org-element-property :end bibliography-link))
          bibliography)))


(defun org-ref-citation-processor (backend)
  "Figure out what to call and call it"
  (let (bibliographystyle)
    (setq
     bibliographystyle
     (org-element-property
      :path (car
             (loop for link in
                   (org-element-map
                       (org-element-parse-buffer) 'link 'identity)
                   if (string= "bibliographystyle"
                               (org-element-property :type link))
                   collect link))))
    (funcall (intern (format "org-ref-%s-%s-processor" bibliographystyle backend)))))

(add-hook 'org-export-before-parsing-hook 'org-ref-citation-processor)

(browse-url (org-html-export-to-html))
#<process open ./blog.html>

2 Long appendix illustrating how we got the code to work

The first thing we need is a list of all the citation links, in the order cited. Here they are.

(mapcar
 (lambda (link) (org-element-property :path link))
 (loop for link in (org-element-map (org-element-parse-buffer) 'link 'identity)
       if (-contains? org-ref-cite-types (org-element-property :type link))
       collect link))
kitchin-2015-examp,kitchin-2015-data-surfac-scien xu-2015-tunin-oxide,mehta-2014-ident-poten,curnan-2014-effec-concen hallenbeck-2013-effec-o2 miller-2014-simul-temper,boes-2015-estim-bulk kitchin-2015-examp kitchin-2015-data-surfac-scien boes-2015-estim-bulk

Now, we need to compute replacements for each citation link, and construct the bibliography. We will make a numbered, unsorted bibliography, and we want to replace each citation with the corresponding numbers, hyperlinked to the entry.

We start with a list of the keys in the order cited, and a number we will use for each one.

(loop for i from 1
      for key in (org-ref-get-bibtex-keys)
      collect (list key i))
kitchin-2015-examp 1
kitchin-2015-data-surfac-scien 2
xu-2015-tunin-oxide 3
mehta-2014-ident-poten 4
curnan-2014-effec-concen 5
hallenbeck-2013-effec-o2 6
miller-2014-simul-temper 7
boes-2015-estim-bulk 8

Now, we need to compute replacements for each cite link. This will be replacing each key with the number above. We will return a list of ((start end) . "replacement text") that we can use to replace each link. For fun, we make these superscripted html.

(let ((links (loop for link in (org-element-map (org-element-parse-buffer) 'link 'identity)
                   if (-contains? org-ref-cite-types (org-element-property :type link))
                   collect link))
      (replacements (loop for i from 1
                          for key in (org-ref-get-bibtex-keys)
                          collect (list key (number-to-string i)))))
  (loop for link in links
        collect (let ((path (org-element-property :path link)))
                  (dolist (repl replacements)
                    (setq path (replace-regexp-in-string (car repl) (nth 1 repl) path)))
                  (list (org-element-property :begin link)
                        (org-element-property :end link)
                        (format "<sup>%s</sup>" path)))))
950 1004 <sup>1,2</sup>
1073 1145 <sup>3,4,5</sup>
1160 1190 <sup>6</sup>
1236 1286 <sup>7,8</sup>
1364 1388 <sup>1</sup>
1392 1427 <sup>2</sup>
4091 4117 <sup>8</sup>

We also need to compute the bibliography for each key. We will use org-ref-reftex-format-citation to do this. For that we need the parsed bibtex entries, and a format string. org-ref provides most of this.

(setq org-ref-bibliography-entry-format
      '(("article" . "<li>%a, %t, <i>%j</i>, <b>%v(%n)</b>, %p (%y). <a href=\"%U\">link</a>. <a href=\"https://doi.org/%D\">doi</a>.</li>")
        ("book" . "<li>%a, %t, %u (%y).</li>")))

(concat "<h1>Bibliography</h1><br><ol>"
        (mapconcat
         'identity
         (loop for key in (org-ref-get-bibtex-keys)
               collect
               (let* ((result (org-ref-get-bibtex-key-and-file key))
                      (bibfile (cdr result))
                      (entry (save-excursion
                               (with-temp-buffer
                                 (insert-file-contents bibfile)
                                 (bibtex-set-dialect (parsebib-find-bibtex-dialect) t)
                                 (bibtex-search-entry key)
                                 (bibtex-parse-entry)))))
                 (org-ref-reftex-format-citation
                  entry
                  (cdr (assoc (cdr (assoc "=type=" entry))
                              org-ref-bibliography-entry-format)))))
         "")
        "</ol>")

Bibliography


  1. Kitchin, Examples of Effective Data Sharing in Scientific Publishing, {ACS Catalysis}, 5(6), 3894-3899 (2015). link. doi.
  2. "John Kitchin", Data Sharing in Surface Science, "Surface Science ", (0), - (2015). link. doi.
  3. Zhongnan Xu \& John R Kitchin, Tuning Oxide Activity Through Modification of the Crystal and Electronic Structure: From Strain To Potential Polymorphs, {Phys. Chem. Chem. Phys.}, 17(), 28943-28949 (2015). link. doi.
  4. Prateek Mehta, Paul Salvador \& John Kitchin, Identifying Potential BO2 Oxide Polymorphs for Epitaxial Growth Candidates, {ACS Appl. Mater. Interfaces}, 6(5), 3630-3639 (2014). link. doi.
  5. Curnan \& Kitchin, Effects of Concentration, Crystal Structure, Magnetism, and Electronic Structure Method on First-Principles Oxygen Vacancy Formation Energy Trends in Perovskites, {The Journal of Physical Chemistry C}, 118(49), 28776-28790 (2014). link. doi.
  6. "Hallenbeck \& Kitchin, Effects of O2 and SO2 on the Capture Capacity of a Primary-Amine Based Polymeric CO2 Sorbent, "Industrial \& Engineering Chemistry Research", 52(31), 10788-10794 (2013). link. doi.
  7. Spencer Miller, Vladimir Pushkarev, Andrew, Gellman \& John Kitchin, Simulating Temperature Programmed Desorption of Oxygen on Pt(111) Using DFT Derived Coverage Dependent Desorption Barriers, {Topics in Catalysis}, 57(1-4), 106-117 (2014). link. doi.
  8. Jacob Boes, Gamze Gumuslu, James Miller, Andrew, Gellman \& John Kitchin, Estimating Bulk-Composition-Dependent H2 Adsorption Energies on CuxPd1-x Alloy (111) Surfaces, {ACS Catalysis}, 5(), 1020-1026 (2015). link. doi.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter