  1. GCG, the old bioinformatics package, was named after the authors kept high-fiving each other, shouting “good code guys!”. (GCG is a software package for the analyses of gene and protein sequences.)

  2. Bowtie is named so because “it is almost impossible to tie”, referring to code to avoid a “race condition” when using multiple processors.

  3. TopHat is so named because it was the first spliced RNA-Seq aligner, and when it worked the first time, the authors shouted “Top that!”.

  4. Velvet is so named because @dzerbino wore velvet gloves when coding it (via @pathogenomenick)

  5. The Tuxedo suite is so named because only the ‘privileged’ know how to use it! #bioinformaticsfun (via @harshinamdar)

  6. @BenLangmead wrote Bowtie while wearing a tuxedo but he did all the testing in zip-up onesie batman pajamas (via @coletrapnell)

  7. Heng Li writes all his code in x86 assembly language, and uses a C decompiler before releasing it. @lh3lh3 (via @torstenseemann)

  8. The SRA (short read archive) is the best known of the archives, and not many people know or use the MRA (medium read archive), the KLRA (kinda long read archive) and the LRA (long read archive). (SRA: sequence read archive)

  9. EBI (FBI) actually stands for “European bureau of investigation”. It’s a front of the EU secret service, collecting genomic info (via @klmr)

  10. Illumina is short for Illuminati, the shadowy organisation that controls sequencing worldwide. (via @neilfws)

  11. The HMMer package was so named when someone asked how it worked, and the developers said Hmmmm… errr…. (via @mgollery)

  12. Hidden Markov Models are like the recipe for Kentucky Fried Chicken.  There are only three people in the world who understand small parts of how HMMs work, and only when they get together do they know the full picture.


  1. BLAST is so fast, the authors had to deliberately slow down the code so it doesn’t overheat the servers.

  2. The HGAP assembler is actually an elaborate front-end hiding three thousand slave laborers all running GAP4 (via @IanGoodhead)

  3. The @PacBio machines are so large because inside each one is an Illumina machine + a bioinformatician running assemblies (via @gedankenstuecke)

  4. NCBI’s bacterial annotation takes 6 weeks because it’s done manually by work experience students pasting ORFs into web BLAST (via @torstenseemann)

  5. The p in p-value actually stands for p-otentially interesting! (via @jessenleon)

  6. The e in e-value stands for excellent, as in “that’s an excellent BLAST hit”

  7. The EBI is an elaborate front-end to NCBI services. (These days the EBI keeps getting better, and China now has more and better data platforms of its own.)

  8. Europe PubMed Central has only ever been accessed by people accidentally clicking on links.  100% of visitors immediately bounce to pubmed.com.


  1. The number of replicates needed for your RNA-seq experiment equals the impact factor of the journal you want to publish in (via @torstenseemann)

  2. 99.5% of people who cite Altschul et al. have never read the paper. (This is the paper that introduced BLAST.)

  3. Over 1 billion people have searched the NCBI protein database for their own name.

  4. The word ELVIS appears 35 times in human peps (GRCh38). ELVISLIVES appears 0 times. The King (Elvis) has left the genome #slowday (via @rdemes)

  5. A single anonymous donor, RP11, accounts for 72 percent of the human reference genome (via CanGenom)

  6. There are now more journals than papers.

  7. It has been calculated that there are twice as many data formats as there are Bioinformaticians (via @mgollery)

  8. FASTA 80 character line wrapping was invented to standardise data sharing using MS Word (via @IanGoodhead)

  9. Nine out of ten Bioinformaticians prefer Excel (via @CIgenomics)


  1. BGI exclusively publish in Nature journals because their papers are first rejected by Gigascience.

  2. BGI actually only have one HiSeq, but it is made to look like hundreds by a setup of mirrors, like that bit in Enter the Dragon (via @froggleston) (These days we all use BGI-series instruments.)

  3. If you stand in front of a mirror and say HiSeq 3 times, an Illumina staff member will show up holding the HiSeq X Ten system (via @nazeetafatima)

  4. Illumina reads are short because, before the development of BaseSpace, they were delivered via Twitter (via @RoyChaudhuri)

  5. Base qualities are called Phred scores in honour of Fred Sanger, who developed DNA sequencing. #101bioinfofunfacts (via @torstenseemann)


  1. CriMap was called CriMap because users do an awful lot of crying before they get a half decent map. (via @dj_de_koning)

  2. If you amass the de-bugging tears of a bioinformatician it is enough to fill an Olympic size swimming pool annually (via @paulhoskisson)

  3. The majority of bioinformaticians can’t pronounce de Bruijn properly

  4. The consumption rate of coffee (+ beer) among Bioinformaticians from around the world is increasing every year. TRUE FACT! (via @NazeefaFatima)

  5. In a recent public survey of the 100 most desirable jobs, bioinformatician was a close second to astronaut (via @dynomics)

  6. Pet Bioinformaticians are paid with cuddling (via @riccombeni)

  7. Spike-ins are like gold (via @nomad421)

  8. It’s easy! You only have to download this database in which all the genes have only one ID and you can retrieve the IDs in the most important databases (via @jorjial)

  9. If you’ve never shown the NIH sequencing costs plot in a talk/lecture you’re not a real bioinformatician (via @AliciaOshlack)

  1. https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data

