Assorted tips for authors of papers, especially scientific papers.
Erez Zadok
Updated 2021-02-01

This list is no replacement for good English skills, or for reading such
classics as Strunk and White's "The Elements of Style."  Below I list the
most common mistakes that authors make, collected after years of authoring
and shepherding papers.
------------------------------------------------------------------------------
Don't use quotes to imply an alternate meaning in scientific prose, which is
supposed to be precise and unambiguous.  The quotes cause the reader to
question what you *really* mean, whether you know what you're saying, etc.
Quite often, just removing the quotes is enough; in some cases, you may have
to rewrite the text slightly.  For example:

    Bad: The system finds out the ``correct'' parameter to set.
    Good: The system finds out the correct parameter to set.
    Better: The system finds out an optimal parameter to set, using the XYZ
            algorithm.

The most appropriate use of quotes in scientific papers is when quoting
something literally, for example:

    The word ``dog'' has three letters.
------------------------------------------------------------------------------
Don't mix your tenses in a scientific paper.  At its core, a paper is written
in past tense, because the work has already been done and completed.  So by
default, everything should be in past tense.  Use present tense to describe a
system's operation, because what was true before is still true now.  Reserve
future tense for describing future work.  Tenses are often confused in Design
and Evaluation sections; worse, some authors flip back and forth between
tenses, which is annoying and causes the reader to lose concentration.
Examples:

    Bad: We design our system using three principles.
    Good: We designed our system using three principles.
    Bad: Our XYZ system will hash the items for faster lookups.
    Good: Our XYZ system hashes the items for faster lookups.
    Bad: We examine the performance of our system.
    Good: We examined the performance of our system.
    Bad: XYZ had a 10% overhead due to hash table lookups.
    Good: XYZ has a 10% overhead due to hash table lookups.
    Also OK: In our experiments, XYZ had a 10% overhead due to hash table
             lookups.
------------------------------------------------------------------------------
Citation marks such as "[2]" are considered parenthetical statements in
English.  As such, they are extraneous to the prose.  That means that one
should be able to remove all parenthetical statements from a paper, and
what's left should be perfectly valid English.  A citation mark used as a
noun, as in "In [17], ...", fails this test.  Some picky reviewers may
complain about such style, and it is also harder for some readers to follow.
Therefore, it is generally better not to use citation marks as nouns.

    Bad/Awkward: In [17], the authors show that...
    Better: John Doe et al. [17] show that...
    or
    Better: The XYZ system [17] showed that...
------------------------------------------------------------------------------
Avoid using too much parenthesized text (or footnotes, if any).  Parenthetical
statements can be removed from the prose, and what's left should be valid
prose that's well understood.  I often find people use parentheses for text
that's too important to be parenthesized, and should instead be merged with
the main paragraph or sentences in which it appears.

Also, parentheses (or footnotes) are distracting to the smooth flow of the
paper.  They cause the reader to execute a mental "subroutine call" of sorts,
losing some concentration in the process; by the time the reader gets back to
the text past the parenthesis/footnote, some of their attention has been
lost.

Sparse use of parentheses/footnotes is perfectly fine, but too much can make
a paper harder to read and understand.

There's no easy way to fix those.  You'll have to go over every parenthesis
and decide what to do.  If it's not needed, remove it.
If it's needed, then find a way to fold it into the main body of text more
smoothly; perhaps split long sentences into two or three sentences, or turn
long parenthetical text into new sentences.
------------------------------------------------------------------------------
Use various hyphens correctly.  There are three kinds: a regular/short
hyphen, a slightly longer one called an "en-dash", and a still longer one
called an "em-dash".  In LaTeX, a hyphen is one dash sign, "-"; an en-dash is
two consecutive dashes, "--"; and an em-dash is three consecutive dashes,
"---".

A regular/short hyphen separates hyphenated words, as in "CPU-bound
workload".

A number range uses an en-dash, as in "3--5% overhead".  In this paper you've
used a regular dash instead of an en-dash for number ranges.

Phrases set off within a sentence use an em-dash; in this paper, however,
you've used en-dashes instead of em-dashes to separate phrases of a sentence.
Also, when you use an em-dash or en-dash, it does not need to be surrounded
by spaces.

    Bad: We model workloads with 12 threads -- our architecture is...
    Good: We model workloads with 12 threads---our architecture is...

More important, however, is that in this paper you've used em-dashes far too
much and in many cases inappropriately.  Most of the time where you've used
them, you should just start a new sentence or use a semicolon to separate two
related sentences.  In some cases you need to use a colon, which often
separates a phrase from another that explains or expounds upon it.
------------------------------------------------------------------------------
Avoid using passive voice as much as possible in scientific prose.  Instead,
use active voice.  Passive voice is less assertive: scientific writing needs
to be assertive, certain, and definitive.  Passive voice also tends to be
more verbose, taking up more space in a paper.  But the most serious problem
is that passive voice is ambiguous, which is bad for a scientific paper.
    Bad: Algorithm X was implemented in system Y.

That's bad because it's not clear *who* implemented the algorithm.  Was it
you, the author of THIS paper, or was it someone else in another work that
you're referring to?  Instead, you should just convert it to active voice:

    Good: We implemented algorithm X in system Y.

You have a bunch of these in your paper.  I tried to highlight as many of
those as I could.

Note: I often use the special red marker "MKA" near phrases that I think you
should "MaKe it Active".
------------------------------------------------------------------------------
Overfull lines.  Some LaTeX styles (which I assume are given by the
publishing organization) cause a number of overfull lines: text that spills
too far past the right edge of the text column.  They look bad, especially in
a camera-ready copy (CRC).  Try to fix them because they make the document
look unpolished.  To fix this, there may be something in the LaTeX style
you're using that would help.  It could also be related to the version of
LaTeX you're using.  Alternatively, you can fix this by careful insertion of
hyphens, and some minor rewording of sentences so that long/unbreakable words
don't end up spilling over.
------------------------------------------------------------------------------
Overuse of the word "while".  The word "while" literally means "at the same
time", but it's often confused with "although" or "whereas".  Use "while"
only if you really mean "at the same time".  Use "although" when you compare
two opposing phrases/facts.  And use "whereas" when you compare two items
that may not be opposing.  Sometimes, "although" and "whereas" can be
interchangeable, but they cannot be interchanged with "while".

    Good: While the CPU was multiplying the matrix, the disk controller
          retrieved the block.
    Good: Whereas the disk was slow, our overhead was still relatively low.
    Good: Although system X proposed the use of DHTs, it was never
          implemented.
------------------------------------------------------------------------------
Wrong use of "since".  The word "since" is a temporal one, having to do with
actions that took place starting at a point in time.  This word is often
confused with "as" or "because".  Use "since" only if you mean it in a
temporal sense.  Use "because" (even at the beginning of a sentence) if
you're stating cause and effect.  And use "as" if you're stating a looser
association of cause and effect.

    Bad: Since we sort our buckets, our lookup times are fast.
    Good: Because we sort our buckets, our lookup times are fast.
    Good: As our system uses hash tables, lookups are cheap.
    Good: Since 1984, the NFS protocol has evolved substantially.

Note: these preferences may depend on whether you follow American or British
style.
------------------------------------------------------------------------------
Precise use of "only" and other restrictive/expansive words.  The word "only"
(and similar) should be as close as possible to that which it restricts.
Otherwise your prose is imprecise: scientific text should be accurate.
Often, people put the word "only" too far before the words it restricts.
Consider these three sentences:

    1. We only read the data blocks of file F.
    2. We read only the data blocks of file F.
    3. We read the data blocks of only file F.

These three sentences have entirely different meanings.  In sentence 1,
you're restricting the verb "read".  This could mean, for example, that you
only perform the "read" action, and not, say, a "write" or "modify" action on
the blocks.  In sentence 2, you restrict "the data blocks", as opposed to,
say, "meta-data blocks".  In sentence 3, you're saying that read actions are
applied to just a specific file F, and not some other file X.
------------------------------------------------------------------------------
Understated prose.  Scientific prose is supposed to be understated.
This means your text should not exaggerate your claims, and if anything,
should downplay your contributions.  Instead of exaggerated claims, present
the figures and let the numbers and facts speak for themselves.  For that
reason, you should avoid words such as "very" or "extremely" in any
scientific paper.  In many cases, these words should just be removed; in
other cases, you can choose more subdued words.

    Bad: Our overhead is very low.
    Good: Our overhead is 3.7% and is therefore relatively small.

Moreover, don't ever use exclamation points in scientific papers, nor
boldface, underlined, or all-caps words to emphasize text.  On rare
occasions, you may use italics to emphasize a word, but only if you fear that
the reader may miss an important claim.  The use of italics is OK if, for
example, you're making a claim that is counter-intuitive (and you can explain
it precisely).  For example:

    Bad: Our system ACTUALLY increases throughput!
    Good: Although our system encrypts every packet, overall end-to-end
          throughput actually \emph{increased}.  We investigated this
          unexpected behavior and found it to be due to a specific
          interaction of TCP buffer caches on the client and server, as we
          detail in the next section.
------------------------------------------------------------------------------
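LaTeX sketch of the quoting, citation, dash, and overfull-line tips.  The
small document below is only an illustration of how several of the tips above
might look in LaTeX source, not a definitive recipe.  The citation key
"doe17", the placeholder bibliography entry, and the sample words given to
\hyphenation and \- are all hypothetical.  Loading microtype and setting
\emergencystretch are assumptions about what may reduce overfull lines;
check whether your publisher's style allows them.

```latex
\documentclass{article}

% Assumption: microtype is permitted by the publisher's style. It often
% reduces the number of overfull lines, but it is not guaranteed to.
\usepackage{microtype}

% Assumption: a small amount of extra stretch is acceptable in this style.
\setlength{\emergencystretch}{1em}

% Tell LaTeX where a long word may be broken (hypothetical sample word).
\hyphenation{de-du-pli-ca-tion}

\begin{document}

% Quotes only for literal quotation: two backticks open, two apostrophes close.
The word ``dog'' has three letters.

% Citation mark as a parenthetical, not a noun. The tie (~) keeps the
% mark on the same line as the text before it.
John Doe et al.~\cite{doe17} show that hashing speeds up lookups.

% Hyphen (-) for compound words, en-dash (--) for number ranges.
A CPU-bound workload had a 3--5\% overhead.

% Em-dash (---) between phrases, with no surrounding spaces.
We model workloads with 12 threads---our architecture is...

% A discretionary hyphen (\-) marks an allowed break point inside one
% stubborn word that would otherwise spill past the right margin.
Our system tunes each hyper\-parameter separately.

% Italics, used sparingly, for a counter-intuitive claim.
Overall end-to-end throughput actually \emph{increased}.

\begin{thebibliography}{9}
% Placeholder entry for illustration only; not a real reference.
\bibitem{doe17} J.~Doe et al. Placeholder entry.
\end{thebibliography}

\end{document}
```

If overfull lines remain after such settings, rewording the offending
sentence is usually the more reliable fix.
------------------------------------------------------------------------------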