Need Random Data? Unleash the Power of Shuf!
In the world of data manipulation, randomization is a crucial technique. Whether you’re building datasets for machine learning, running simulations, or simply need to sample data randomly, the shuf command-line tool is your secret weapon. This unassuming utility, part of the GNU Core Utilities, provides a simple yet powerful way to generate random permutations and selections from input data, making it an indispensable tool for any data professional, system administrator, or developer.
Overview

shuf, short for “shuffle,” is a command-line utility designed to generate random permutations of its input. It reads lines from standard input or a file, shuffles them randomly, and writes the shuffled output to standard output. The brilliance of shuf lies in its simplicity and its ability to integrate seamlessly with other command-line tools through pipes. It eliminates the need for complex scripting or programming when you just need a quick and easy way to randomize your data. It ships with the GNU Core Utilities, so it is available on virtually every Linux distribution and can be installed on most other Unix-like systems.
Imagine you have a list of customer IDs and you want to randomly select a subset for A/B testing. Or perhaps you need to randomize the order of questions in a quiz. With shuf, these tasks become trivial. You can also generate random numbers within a specified range, making it a valuable tool for generating sample data for testing purposes. It’s a versatile utility for anyone needing to inject randomness into their workflow. Beyond its core function of shuffling lines, shuf can select a specific number of random samples, with or without replacement, further extending its utility. The command is efficient, reliable, and generally available on most Linux distributions. This makes sharing and reusing shell scripts containing shuf easy.
Installation

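Since shuf is part of the GNU Core Utilities, it is pre-installed on virtually every Linux distribution. macOS typically does not include it, because it ships BSD rather than GNU userland tools, so it has to be installed separately there. If shuf is missing on your system, you can install it using your package manager.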
- Debian/Ubuntu:
sudo apt update
sudo apt install coreutils
- CentOS/RHEL/Fedora:
sudo yum install coreutils
- macOS (Homebrew):
brew install coreutils
Homebrew installs the GNU tools with a g prefix by default, so on macOS you will likely need to run gshuf instead of shuf.
After installation, verify that shuf is available by running the following command:
shuf --version
This should display the version information for the shuf utility.
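Scripts that have to run on both Linux and macOS can pick whichever command name is present. The following is a minimal sketch; the variable name SHUF is only a convention chosen for this example.

```shell
# Pick whichever shuffle command exists: shuf (Linux) or gshuf (Homebrew coreutils).
# SHUF is an illustrative variable name, not something shuf itself defines.
if command -v shuf >/dev/null 2>&1; then
    SHUF=shuf
elif command -v gshuf >/dev/null 2>&1; then
    SHUF=gshuf
else
    echo "shuf not found; install GNU coreutils" >&2
    exit 1
fi

# Use the detected command exactly like shuf.
"$SHUF" -i 1-5
```

The rest of the script can then call "$SHUF" wherever it would otherwise call shuf.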
Usage

The shuf command offers a variety of options to control its behavior. Here are some common use cases with examples:
1. Shuffling Lines from a File
The most basic usage is to shuffle the lines of a file. Let’s say you have a file named data.txt:
cat data.txt
# Output:
apple
banana
cherry
date
fig
To shuffle the lines in this file, use the following command:
shuf data.txt
# Example Output (will vary due to randomness):
date
cherry
banana
apple
fig
The output will be the same lines, but in a random order.
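To keep the shuffled result and confirm that no line was lost or duplicated, something like the following sketch can be used. The file names shuffled.txt, expected.sorted, and actual.sorted are placeholders, and the sample file is recreated so the snippet runs on its own.

```shell
# Recreate the sample file so the snippet is self-contained.
printf '%s\n' apple banana cherry date fig > data.txt

# -o writes the shuffled lines to a file instead of standard output.
shuf -o shuffled.txt data.txt

# Sorting both files removes the ordering, so any difference would mean
# a line was lost or duplicated. cmp prints nothing when the files match.
sort data.txt > expected.sorted
sort shuffled.txt > actual.sorted
cmp expected.sorted actual.sorted && echo "same set of lines"
```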
2. Shuffling Standard Input
shuf can also read from standard input, allowing you to pipe data from other commands. For example, to shuffle a list of numbers generated by seq:
seq 1 10 | shuf
# Example Output (will vary):
3
7
1
9
4
6
2
5
8
10
This command generates the sequence of numbers from 1 to 10 and pipes it to shuf, which then shuffles the numbers and prints them to the terminal.
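The same pattern works with any command that prints lines. The sketch below filters with grep before shuffling, and also shows the -e option, which treats each argument as an input line when the items are short enough to type directly. The item names are placeholders.

```shell
# Any command that prints lines can feed shuf through a pipe.
# Here grep drops the comment line before the shuffle.
printf '%s\n' '# fruit list' apple banana cherry | grep -v '^#' | shuf

# For a handful of literal items, -e treats each argument as an input line.
pick=$(shuf -n 1 -e alice bob carol)
echo "Picked: $pick"
```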
3. Selecting a Sample of Lines
You can use the -n option to select only a specified number of lines from the input. For instance, to select 3 random lines from data.txt:
shuf -n 3 data.txt
# Example Output (will vary):
banana
fig
apple
This command will output 3 randomly selected lines from the data.txt file.
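The -n option yields a single random subset. To split the input into two disjoint random groups, as in the A/B testing scenario from the overview, one approach is to shuffle once and then cut the result in two. This is a sketch with placeholder file names and an arbitrary split point.

```shell
# Recreate the sample file so the snippet is self-contained.
printf '%s\n' apple banana cherry date fig > data.txt

# Shuffle once and save the result, so both groups come from the same permutation.
shuf data.txt > shuffled.txt

# The first 3 lines form one group, the remaining lines the other.
head -n 3 shuffled.txt > sample_a.txt
tail -n +4 shuffled.txt > sample_b.txt

wc -l sample_a.txt sample_b.txt
```

Running shuf -n twice instead would not give disjoint groups, because each run draws independently from the full input.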
4. Sampling with Replacement
By default, shuf samples without replacement, meaning that each line is selected at most once. To sample with replacement, use the -r option. This allows the same line to be selected multiple times. For example, to generate 5 random numbers between 1 and 10 (inclusive), allowing duplicates:
seq 1 10 | shuf -n 5 -r
# Example Output (will vary):
7
3
3
1
9
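Sampling with replacement is what makes simple simulations possible, because the number of draws can exceed the number of input lines. A small sketch that simulates dice rolls and tallies the results (rolls.txt is a placeholder name):

```shell
# Roll a six-sided die 60 times: -r allows repeats, -n sets the number of draws.
shuf -i 1-6 -n 60 -r > rolls.txt

# Tally how often each face came up.
sort -n rolls.txt | uniq -c
```

Without -r, the same command would print at most 6 lines, since -n only sets an upper limit on the output.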
5. Generating a Range of Numbers
The -i option allows you to specify a range of integers to shuffle. The syntax is -i LO-HI, where both ends are inclusive. For example, to generate a random permutation of the numbers from 1 to 10:
shuf -i 1-10
# Example Output (will vary):
2
8
6
1
9
3
7
10
5
4
This is equivalent to using seq 1 10 | shuf, but is more concise.
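Combined with -n, the -i option becomes a quick way to draw individual random integers. A sketch, where the port range and the variable name port are only illustrative choices:

```shell
# Draw one integer from an inclusive range.
port=$(shuf -i 49152-65535 -n 1)
echo "Trying port $port"

# Draw three distinct integers from 1 to 50 and print them in ascending order.
shuf -i 1-50 -n 3 | sort -n
```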
6. Using a Specific Seed
For reproducibility, use the --random-source=FILE option, which makes shuf read its random bytes from FILE instead of its internal generator. Given the same FILE and the same input, shuf produces the same order on every run. Widely deployed GNU shuf releases have no --seed option (check shuf --help on your system), so a saved file of random bytes plays the role of the seed. The file must contain enough bytes for the shuffle; a few kilobytes is plenty for small inputs.
head -c 4096 /dev/urandom > seed.bin
# Create the random-byte file once and keep it; seed.bin is an arbitrary name.
shuf --random-source=seed.bin data.txt
# Output depends on the file contents but is identical across runs that use the same seed.bin.
shuf --random-source=/dev/urandom data.txt
# Output changes from run to run, as the bytes come from the system's random number generator.
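A reproducible byte stream can also be derived from a seed value, following the approach described in the "Random sources" section of the coreutils manual. The sketch below assumes bash (for process substitution) and an installed openssl; get_seeded_random is a helper defined here, not a shuf feature.

```shell
# Emit an endless, deterministic byte stream derived from a seed string.
# get_seeded_random is a helper defined in this snippet; it requires openssl.
get_seeded_random() {
    seed="$1"
    openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null
}

# Process substitution <(...) requires bash or a similar shell.
shuf -i 1-10 --random-source=<(get_seeded_random 42)

# Running the same command again with seed 42 prints the same order.
shuf -i 1-10 --random-source=<(get_seeded_random 42)
```

The order is only guaranteed to repeat on the same machine with the same shuf and openssl versions.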
Tips & Best Practices

- Use pipes for flexibility: shuf shines when combined with other command-line tools such as awk, sed, and grep. Use pipes to filter, transform, or process data before or after shuffling. For example, filter a file with grep and then shuffle the matching lines with shuf.
- Consider sampling with replacement carefully: Sampling with replacement can be useful for simulations or generating synthetic data, but the same line can appear several times in the output, which skews any statistic that assumes distinct samples.
- Fix the random source for reproducibility: If you need to repeat a random process exactly, pass the same file to --random-source on every run. This is especially important for testing and debugging.
- Handle large files efficiently: shuf loads the entire input into memory before shuffling. For extremely large files, consider using alternative tools or techniques that process data in chunks to avoid memory issues.
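One single-pass technique for drawing a fixed-size sample from input that is too large to hold in memory is reservoir sampling. The following awk sketch illustrates the idea; it is an alternative technique, not a description of how shuf works internally, and the file names and the sample size k=5 are placeholders.

```shell
# Generate a large-ish input; in practice this would be an existing file.
seq 1 100000 > big.txt

# Take the seed from /dev/urandom, because awk's default srand() seed is the
# time of day and only changes once per second.
seed=$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')

# Reservoir sampling (Algorithm R): keep the first k lines, then let line
# number NR replace a random kept line with probability k/NR.
# Only k lines are held in memory at any time.
awk -v k=5 -v seed="$seed" '
    BEGIN { srand(seed) }
    NR <= k { keep[NR] = $0; next }
    {
        i = int(rand() * NR) + 1      # uniform integer in 1..NR
        if (i <= k) keep[i] = $0
    }
    END { for (i = 1; i <= k && i <= NR; i++) print keep[i] }
' big.txt > sample.txt

cat sample.txt
```

The sample comes out in reservoir order rather than a fully random order; since it is small, it can be piped through shuf afterwards if the order matters.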
Troubleshooting & Common Issues
- “shuf: command not found”: This usually indicates that shuf is not installed or not in your system’s PATH. Verify the installation and PATH settings. On macOS with Homebrew, try gshuf.
- Memory errors with large files: If you’re shuffling a very large file, shuf might run out of memory. Consider sort -R, which can spill to temporary files on disk but groups identical lines together instead of producing a true shuffle, or process the file in smaller chunks.
- Unexpected output when sampling with replacement: Double-check that you’re using the -r option correctly if you intend to sample with replacement. Without -n, shuf -r keeps printing lines indefinitely, so combine the two options.
- Non-uniform randomness: By default shuf uses an internal pseudo-random generator initialized with a small amount of entropy, which is fine for most tasks. For highly sensitive applications, supply a stronger source with --random-source=/dev/urandom or use a dedicated random number generator.
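The practical difference between sort -R and shuf shows up as soon as the input contains duplicate lines. A small demonstration (dupes.txt is a placeholder name):

```shell
# Input with duplicate lines.
printf '%s\n' a b a b a b > dupes.txt

# sort -R orders lines by a random hash of their content, so equal lines
# always end up next to each other: the result is a a a b b b or b b b a a a.
sort -R dupes.txt | tr '\n' ' '; echo

# shuf permutes line positions, so equal lines can land anywhere.
shuf dupes.txt | tr '\n' ' '; echo
```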
FAQ
- Q: What is the difference between shuf and sort -R?
- A: Both can randomize data, but they work differently. shuf produces a random permutation of the input lines. sort -R sorts by a random hash of each line, so identical lines always end up next to each other, and it is often slower than shuf. Its advantage is that sort can use temporary files for inputs that do not fit in memory.
- Q: Can I use shuf to generate random passwords?
- A: While you can use shuf to generate random passwords by shuffling a character set, it’s generally recommended to use dedicated password generation tools that are designed for security and compliance.
- Q: Is shuf available on Windows?
- A: shuf is not natively available on Windows. However, you can use it via the Windows Subsystem for Linux (WSL) or by installing GNU Core Utilities through a package manager like Chocolatey.
- Q: How can I generate a random floating-point number using shuf?
- A: shuf primarily works with integers and lines of text. To generate random floating-point numbers, you’ll need to combine shuf with other tools like awk or bc to perform the necessary calculations.
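As a rough sketch of the floating-point approach, a random integer from shuf can be scaled with awk. The resolution of one million steps and the variable names x and y are arbitrary choices for this example.

```shell
# Draw an integer in 0..999999 and scale it to a fraction in [0, 1)
# with six decimal places.
x=$(shuf -i 0-999999 -n 1 | awk '{ printf "%.6f\n", $1 / 1000000 }')
echo "$x"

# Scale to another range, here [10, 20), by multiplying and shifting.
y=$(shuf -i 0-999999 -n 1 | awk '{ printf "%.5f\n", 10 + 10 * $1 / 1000000 }')
echo "$y"
```

The result has only as many distinct values as the integer range, so this suits test data better than numerical work that needs full floating-point precision.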
Conclusion
shuf is a powerful and versatile command-line tool that simplifies the process of generating random permutations and selections from data. Its ease of use and seamless integration with other utilities make it an invaluable asset for data scientists, system administrators, and anyone who needs to introduce randomness into their workflows. Experiment with the different options and discover how shuf can streamline your tasks. Now, go ahead and give shuf a try! Visit the GNU Core Utilities page for more details.