Need Randomness? Harness the Power of Shuf!
In the world of data manipulation and scripting, sometimes you need a touch of randomness. Whether it’s shuffling a playlist, selecting a random winner from a list, or generating test data, the shuf command-line utility is your go-to tool. This unassuming program, part of the GNU Core Utilities, provides a simple yet powerful way to generate random permutations of your input. Let’s dive into how shuf can revolutionize your workflow.
Overview

shuf, short for “shuffle,” is a command-line tool that generates random permutations of its input. It reads lines from a specified file or from standard input and writes them out in random order. Its appeal lies in its simplicity and versatility: it is easy to use, yet it applies to a wide range of tasks that need randomness. Because it reads standard input and writes standard output, it slots naturally into data-processing pipelines.
Installation

Since shuf is part of the GNU Core Utilities, it’s typically pre-installed on most Linux distributions. However, if you find that it’s missing or you are using a different operating system, you can install it using your system’s package manager.
Installing on Debian/Ubuntu:
sudo apt update
sudo apt install coreutils
Installing on Fedora/CentOS/RHEL:
sudo dnf install coreutils
Installing on macOS (using Homebrew):
brew install coreutils
After installation, you can verify that shuf is correctly installed by checking its version:
shuf --version
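This command prints the version banner of the coreutils package that provides shuf. On macOS, if shuf is not found after installing with Homebrew, try gshuf, the g-prefixed name Homebrew uses for GNU tools.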
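Scripts that must run on both Linux and macOS can look for either name before using it. The following is a minimal sketch. It assumes GNU shuf is installed as shuf or gshuf. The variable name SHUF is illustrative, not something coreutils defines.

```shell
#!/usr/bin/env bash
# Pick whichever name GNU shuf is installed under: "shuf" on Linux,
# or "gshuf" on some macOS/Homebrew setups. SHUF is a hypothetical variable name.
if command -v shuf >/dev/null 2>&1; then
  SHUF=shuf
elif command -v gshuf >/dev/null 2>&1; then
  SHUF=gshuf
else
  echo "shuf not found: install coreutils" >&2
  exit 1
fi

# Show the first line of the version banner.
"$SHUF" --version | head -n 1
```

Later commands in the script can then call "$SHUF" instead of hard-coding one name.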
Usage

shuf offers a variety of options to customize its behavior. Let’s explore some common use cases with practical examples.
Shuffling Lines from a File
The most basic usage is to shuffle the lines of a file. Suppose you have a file named names.txt containing a list of names, one name per line:
cat names.txt
Alice
Bob
Charlie
David
Eve
To shuffle these names randomly, use the following command:
shuf names.txt
This will output the names in a randomized order. Each run draws a new random permutation, so the order will usually differ from one run to the next.
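Because shuf only reorders lines, sorting its output should give back exactly the sorted input. The sketch below checks this for the names.txt example. It recreates the file with printf so that it runs on its own, and the variable name shuffled is illustrative.

```shell
# Recreate the example file: one name per line.
printf '%s\n' Alice Bob Charlie David Eve > names.txt

shuffled=$(shuf names.txt)
echo "$shuffled"

# Sorting both sides shows the shuffle only reorders lines; nothing is added or lost.
if [ "$(printf '%s\n' "$shuffled" | sort)" = "$(sort names.txt)" ]; then
  echo "same set of lines"
fi
```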
Shuffling Standard Input
shuf can also read input from standard input, allowing it to be used in pipelines. For example, you can generate a sequence of numbers using seq and then shuffle them:
seq 1 10 | shuf
This command generates the numbers 1 through 10 and then shuffles them randomly. This is very useful for generating random test data.
Specifying a Range of Numbers
Instead of using seq separately, shuf provides its own option to generate a range of numbers directly with -i:
shuf -i 1-10
This is equivalent to the previous example but more concise. The -i LO-HI option (long form --input-range) treats every integer from LO to HI as one input line, so no input file or pipe is needed.
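To see that the two forms really are interchangeable, you can sort each result numerically and compare it with seq's output. The following sketch assumes GNU shuf and seq. The variable names are illustrative.

```shell
# Both commands emit a random permutation of 1..10; they differ only in
# where the numbers come from (a seq pipe versus shuf's own -i range).
from_seq=$(seq 1 10 | shuf)
from_range=$(shuf -i 1-10)

# A numeric sort restores 1..10 order, so each result should equal seq's output.
expected=$(seq 1 10)
[ "$(printf '%s\n' "$from_seq" | sort -n)" = "$expected" ] && echo "seq | shuf: permutation of 1..10"
[ "$(printf '%s\n' "$from_range" | sort -n)" = "$expected" ] && echo "shuf -i:    permutation of 1..10"
```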
Limiting the Output
Often, you don’t need to shuffle the entire input; you only need a random sample. The -n option allows you to specify the number of lines to output:
shuf -n 3 names.txt
This command will output only 3 randomly selected names from the names.txt file. If you provide a number larger than the number of lines in the input, shuf will output all the lines in random order.
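The following sketch shows both cases against the same five-name file: a sample smaller than the input, and a request for more lines than exist. It recreates names.txt so it runs on its own. The variable name count is illustrative.

```shell
printf '%s\n' Alice Bob Charlie David Eve > names.txt

# Ask for 3 lines: 3 distinct names come back.
shuf -n 3 names.txt

# Ask for more lines than exist: without -r, shuf stops at the 5 available lines.
count=$(shuf -n 100 names.txt | wc -l)
echo "requested 100, got $count"
```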
Generating Unique Random Numbers
Sometimes you need a set of unique random numbers within a specified range, for example for simulations or unique identifiers. Because shuf never repeats an input line unless you pass -r, shuffling a range and keeping the first few results gives distinct values. Here, head does the trimming:
shuf -i 1-100 | head -n 10
This shuffles the numbers 1 to 100 and keeps the first 10. Each number appears exactly once in the shuffled sequence, so the 10 values are guaranteed to be distinct: effectively 10 random picks from the range without repeats.
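shuf can also do the sampling itself. Combining -i with -n draws distinct values in a single command, with no head process needed. The following is a minimal sketch, assuming GNU shuf. The variable name picks is illustrative.

```shell
# Draw 10 distinct integers from 1..100 in one step: -n stops after 10 lines,
# and without -r no value can repeat.
picks=$(shuf -i 1-100 -n 10)
echo "$picks"

# Sanity check: counting unique values should report 10.
printf '%s\n' "$picks" | sort -n | uniq | wc -l
```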
Repeating with Replacement
By default, shuf does not repeat elements. However, you can use the -r option for sampling with replacement, meaning elements can be chosen more than once:
shuf -r -n 5 names.txt
This command will output 5 names from names.txt, and each name can be selected multiple times. This is useful for simulations where you want to mimic drawing items from a population where each item is replaced after being drawn.
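A common case of sampling with replacement is rolling dice. The following sketch combines -r with an -i range so that each of the six faces stays available for every roll. The variable name rolls is illustrative.

```shell
# Simulate 20 rolls of a six-sided die: -i 1-6 supplies the faces,
# -r puts each face back after it is drawn, and -n 20 sets the number of rolls.
rolls=$(shuf -r -i 1-6 -n 20)
echo $rolls   # unquoted on purpose so the rolls print on one line

# Tally how often each face came up.
printf '%s\n' "$rolls" | sort -n | uniq -c
```

With -r, -n is effectively required: without it, shuf keeps drawing indefinitely.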
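Making Shuffles Reproducible
For reproducible results, give shuf a fixed source of random bytes with the --random-source=FILE option. With the same input, the same source file, and the same coreutils version, shuf produces the same permutation every time. This is crucial for testing and debugging, where you need consistent behavior. Note that GNU shuf has no --seed option. If you want to start from a seed string, the GNU Core Utilities manual shows how to turn one into a repeatable byte stream with openssl enc -aes-256-ctr.
Using --random-source:
shuf --random-source=my_random_data.txt names.txt
The file must contain enough bytes for the input you are shuffling; if shuf runs out, it stops with an end-of-file error instead of producing output. Reusing the same file produces the same output each time.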
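One simple way to get a reusable source is to save a block of random bytes once and point every run at that file. The following is a minimal sketch. The file name seed.bin and the 1 MiB size are assumptions, and 1 MiB is far more than shuf needs for small inputs.

```shell
# Save a fixed block of random bytes once (hypothetical file name: seed.bin).
head -c 1048576 /dev/urandom > seed.bin

printf '%s\n' Alice Bob Charlie David Eve > names.txt

# Reusing the same byte source yields the same permutation on every run.
run1=$(shuf --random-source=seed.bin names.txt)
run2=$(shuf --random-source=seed.bin names.txt)
[ "$run1" = "$run2" ] && echo "identical output"
```

Committing a file like seed.bin alongside a test suite keeps shuffled fixtures stable across machines running the same coreutils version.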
Shuffling Lines Containing Special Characters
shuf treats every line as plain text, so spaces, punctuation, and quotes need no escaping. Suppose your data file, data.txt, contains lines like these:
cat data.txt
This is line one.
This is line two with, commas.
Line three has some; semicolons.
And finally, a line with "quotes".
You can shuffle this without any special treatment:
shuf data.txt
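To confirm that shuf moves whole lines without altering them, you can compare the sorted output with the sorted file. The following sketch recreates data.txt with printf so that it runs on its own.

```shell
# Recreate the sample file; printf '%s\n' writes each argument as one line,
# so commas, semicolons, and quotes are stored literally.
printf '%s\n' \
  'This is line one.' \
  'This is line two with, commas.' \
  'Line three has some; semicolons.' \
  'And finally, a line with "quotes".' > data.txt

shuf data.txt

# Sorting the shuffled output and the original gives identical text,
# showing shuf reorders whole lines without changing their contents.
if [ "$(shuf data.txt | sort)" = "$(sort data.txt)" ]; then
  echo "every line preserved verbatim"
fi
```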
Tips & Best Practices

- Use -n for efficiency: If you only need a small random sample, use the -n option rather than shuffling the entire input and then taking the first few lines. With -n, GNU shuf can sample without keeping every input line in memory.
- Use --random-source for reproducibility: When testing or debugging, pass the same --random-source file on every run to get consistent results.
- Be mindful of large inputs: Without -n, shuf reads the whole input into memory before writing any output, so shuffling very large files can take time and needs memory roughly proportional to the file size.
- Combine with other tools: shuf is most powerful when combined with other command-line tools in pipelines. Use it to generate random test data, sample from large datasets, or add randomness to your scripts. For example, you can pair it with xargs to run commands on random subsets of files.
- Check line endings: shuf splits input on newline characters (or on NUL bytes with -z), so files with Windows CRLF line endings keep a carriage return at the end of each line. Convert such files first, for example with tr -d '\r'.
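Pairing shuf with xargs to act on a random subset of files can be sketched as follows. This is a minimal example, assuming GNU find, shuf, and xargs. The temporary directory and file names are hypothetical. It uses shuf's -z (--zero-terminated) option so that file names containing spaces survive the pipeline.

```shell
# Build a throwaway directory with a few files (hypothetical names),
# including one whose name contains a space.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c d.txt" "$dir/e.txt"

# find -print0 and shuf -z both use NUL as the separator, so unusual file
# names pass through intact; xargs -0 reads the same NUL-separated list.
find "$dir" -type f -print0 | shuf -z -n 2 | xargs -0 ls -l
```

Replace ls -l with whatever command should run on the randomly chosen files.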
Troubleshooting & Common Issues
- shuf: memory exhausted: This error occurs when shuf tries to load an extremely large input into memory. If you only need a sample, use -n. Otherwise, use a tool designed for large-scale data processing; note that shuffling separate chunks of a file does not give a uniform shuffle of the whole file.
- Inconsistent results: Without --random-source, shuf draws fresh randomness on every run, so different output each time is expected. If you need identical results, supply the same --random-source file and the same input on every run.
- Missing shuf command: If you get a “command not found” error, ensure that coreutils is installed correctly on your system and that shuf is in your system’s PATH. On macOS with Homebrew, try gshuf.
- Encoding issues: shuf does not interpret character encodings; it splits input on newline bytes. UTF-8 and other ASCII-compatible encodings therefore shuffle correctly, but UTF-16 files (common in some Windows exports) are split in the wrong places. Convert such files to UTF-8 first, for example with iconv.
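Converting a UTF-16 file before shuffling might look like this. The sketch assumes an iconv that accepts the encoding name UTF-16 (as glibc's does) and reads the byte-order mark it writes. The file name words-utf16.txt is hypothetical.

```shell
# Make a small UTF-16 file to stand in for, e.g., a Windows export.
printf '%s\n' alpha beta gamma | iconv -f UTF-8 -t UTF-16 > words-utf16.txt

# Convert to UTF-8 first so every line ends in a single newline byte,
# then shuffle the decoded text.
iconv -f UTF-16 -t UTF-8 words-utf16.txt | shuf
```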
FAQ
- Q: Can shuf shuffle directories recursively?
- A: No, shuf shuffles lines of text. To shuffle the files in a directory tree, use find to list the files, then pipe the output to shuf.
- Q: How can I shuffle a CSV file while keeping the header row intact?
- A: Use head -n 1 to extract the header row, then tail -n +2 to get the data rows, shuffle the data rows, and output the header followed by the shuffled rows.
- Q: Is shuf cryptographically secure for generating random numbers?
- A: No, shuf is not designed for cryptographic purposes. For secure random values, read from /dev/urandom or use a library specifically designed for cryptography.
- Q: How do I shuffle only part of a file?
- A: Combine head and tail with shuf. For instance, to shuffle lines 10 to 20, use head -n 20 filename | tail -n 11 | shuf. This outputs only those 11 lines.
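The CSV answer above can be written as one grouped command. The following is a minimal sketch with a small hypothetical file, people.csv. The braces send the header and the shuffled rows to the same output file.

```shell
# Sample CSV with a header row (hypothetical data).
printf '%s\n' 'id,name' '1,Alice' '2,Bob' '3,Charlie' > people.csv

# Group the commands so their combined output goes to one place:
# the header first, then the data rows (line 2 onward) in random order.
{ head -n 1 people.csv; tail -n +2 people.csv | shuf; } > people-shuffled.csv

cat people-shuffled.csv
```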
Conclusion
The shuf command is an invaluable tool for anyone working with data and scripting on the command line. Its simplicity, versatility, and efficiency make it an indispensable utility for adding randomness to your workflows. Whether you’re generating test data, selecting random samples, or simply shuffling a playlist, shuf has you covered. Give it a try, explore its options, and discover the many ways it can enhance your productivity. Visit the GNU Core Utilities documentation for more in-depth information and advanced usage scenarios.