Need Random Data? Learn How to Use Shuf!
Have you ever needed to shuffle data for testing, generate random samples from a dataset, or create unpredictable sequences in your scripts? The shuf command, a part of the GNU coreutils, is your answer! This powerful tool lets you create random permutations of input, making it indispensable for various tasks, from data science to system administration. This article will guide you through everything you need to know to master shuf, from installation to advanced usage scenarios.
Overview

The shuf command is a small utility with one core purpose: generating random permutations of its input. Unlike more complex data manipulation tools, shuf focuses solely on randomization, so it slots easily into larger scripts and pipelines. Imagine needing to pick a random winner from a list of participants, or wanting to split a dataset into random training and testing sets. shuf handles these tasks with minimal effort, making it a valuable asset for developers, data scientists, and system administrators alike.
Installation

The shuf command is part of the GNU coreutils, which means it’s typically pre-installed on most Linux distributions. However, if you find that it’s missing or you’re using a different operating system, here’s how to get it:
Linux (Debian/Ubuntu)
sudo apt update
sudo apt install coreutils
Linux (Fedora/CentOS/RHEL)
sudo dnf install coreutils
macOS
On macOS, you can install coreutils using Homebrew:
brew install coreutils
After installing, the shuf command will be available as gshuf to avoid conflicts with any existing system utilities. You may want to create an alias:
alias shuf='gshuf'
Verification
To verify that shuf is installed correctly, run the following command:
shuf --version
This should output the version number of the shuf utility.
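In scripts that must run on both Linux and macOS, it can help to detect which name is available before calling it. The following is a minimal sketch; the function name find_shuf and the variable SHUF are hypothetical helpers introduced here, not part of coreutils.

```shell
# Print the first available command name: shuf (typical on Linux)
# or gshuf (Homebrew's coreutils on macOS).
find_shuf() {
    for candidate in shuf gshuf; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "shuf not found; install GNU coreutils" >&2
    return 1
}

# Store the detected name and print the first line of its version banner.
SHUF=$(find_shuf) && "$SHUF" --version | head -n 1
```

Later commands in the same script can then call "$SHUF" instead of hard-coding either name.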
Usage
The shuf command provides several options to control its behavior. Let’s explore some common use cases with practical examples:
Shuffling Input from a File
To shuffle the lines of a file, use the following command:
shuf input.txt
This will output the lines of input.txt in a random order to the standard output.
Shuffling Input from Standard Input
You can also pipe input to shuf:
cat input.txt | shuf
This achieves the same result as the previous example, but it’s useful when you’re working with data generated by other commands.
Generating a Random Sample
To select a random sample of lines from a file, use the -n option:
shuf -n 5 input.txt
This will output 5 random lines from input.txt.
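The -n option returns one random subset. When both the sample and the remainder are needed, as in the training/testing split mentioned in the overview, one approach is to shuffle once and cut the result in two. A minimal sketch, assuming a stand-in dataset generated with seq and an 80/20 ratio (both are illustrative choices, not requirements):

```shell
# Build a small stand-in dataset in a scratch directory (hypothetical data).
workdir=$(mktemp -d)
seq 1 100 > "$workdir/data.txt"

# Shuffle once so that both parts come from the same random ordering.
shuf "$workdir/data.txt" > "$workdir/shuffled.txt"

# Work out where the 80% mark falls.
total=$(wc -l < "$workdir/shuffled.txt")
train_count=$(( total * 80 / 100 ))

# The first 80% of shuffled lines become the training set, the rest the test set.
head -n "$train_count" "$workdir/shuffled.txt" > "$workdir/train.txt"
tail -n +"$(( train_count + 1 ))" "$workdir/shuffled.txt" > "$workdir/test.txt"

wc -l "$workdir/train.txt" "$workdir/test.txt"
```

Because both files are cut from a single shuffled copy, no line can land in both sets.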
Generating a Random Number Sequence
To generate a random ordering of a range of integers, use the -i option with a LO-HI range:
shuf -i 1-10
This will output a random permutation of the numbers from 1 to 10.
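The -i option combines with -n and -r to cover two common needs: picking a single number from a range, and drawing several numbers where repeats are allowed. A short sketch (the variable names pick and rolls are just for illustration):

```shell
# One random integer between 1 and 100, inclusive.
pick=$(shuf -i 1-100 -n 1)
echo "Picked: $pick"

# Five dice rolls: -r samples with replacement, so values may repeat.
rolls=$(shuf -i 1-6 -n 5 -r)
echo "Rolls:" $rolls
```

Without -r, asking for more values than the range contains simply returns the whole range in random order.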
Specifying an Output File
To save the shuffled output to a file, use the -o option:
shuf input.txt -o output.txt
This will shuffle the lines of input.txt and save the result to output.txt.
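The -o option also allows shuffling a file in place. The GNU coreutils manual notes that shuf reads all of its input before opening the output file, so naming the same path for both is safe. A minimal sketch using a scratch file created with mktemp (an assumption made here so the example does not touch real data):

```shell
# Create a scratch file holding the numbers 1 to 10.
tmpfile=$(mktemp)
seq 1 10 > "$tmpfile"

# Input and output are the same path; shuf finishes reading before it writes.
shuf "$tmpfile" -o "$tmpfile"

cat "$tmpfile"
```

By contrast, shell redirection such as shuf file > file truncates the file before shuf reads it, which is why -o is the safer choice here.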
Repeating the Shuffle
By default, shuf treats its input as lines or numbers and outputs a single shuffled order. If you want the shuffle operation to be repeated, you can pipe the output back into shuf:
seq 10 | shuf | shuf | shuf
This command generates the numbers from 1 to 10, shuffles them, and then re-shuffles the result twice. The output is still a permutation of the input: every number appears exactly once, with none repeated and none missing. A single pass already yields a random order, so chaining does not make the result any more random. To get output in which values can repeat, use the -r option instead.
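One way to check what chained shuffles actually do to the data is to sort the result and compare it with the input. A minimal sketch (the variable names are illustrative):

```shell
original=$(seq 10)
reshuffled=$(seq 10 | shuf | shuf | shuf)

# Sorting discards the ordering, leaving only the set of values to compare.
if [ "$(printf '%s\n' "$reshuffled" | sort -n)" = "$original" ]; then
    echo "same set of values"
else
    echo "values changed"
fi
```

Because shuf without -r only reorders its input lines, the sorted output matches the original sequence.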
Creating a Deck of Cards
Let’s create a simple script to simulate shuffling a deck of cards:
#!/bin/bash
suits=("Hearts" "Diamonds" "Clubs" "Spades")
ranks=("2" "3" "4" "5" "6" "7" "8" "9" "10" "Jack" "Queen" "King" "Ace")
declare -a deck
for suit in "${suits[@]}"; do
for rank in "${ranks[@]}"; do
deck+=("$rank of $suit")
done
done
shuf -e "${deck[@]}"
This script defines arrays for suits and ranks, creates a deck of cards, and then shuffles it using shuf. The -e option treats each argument as a separate input line.
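The same deck can be used to deal a hand by adding -n. The sketch below is a variation, not the script above: it builds the deck as newline-separated text rather than a bash array so that it also runs in a plain POSIX shell, and the hand size of five is an arbitrary choice.

```shell
# Build the 52-card deck, one card per line.
deck=$(
    for suit in Hearts Diamonds Clubs Spades; do
        for rank in 2 3 4 5 6 7 8 9 10 Jack Queen King Ace; do
            echo "$rank of $suit"
        done
    done
)

# Deal five cards; without -r, shuf never outputs the same input line twice.
hand=$(printf '%s\n' "$deck" | shuf -n 5)
printf '%s\n' "$hand"
```

Since every card occupies its own line and -r is not used, the five cards dealt are always distinct.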
Sampling for A/B Testing
Imagine you have a list of user IDs and want to randomly assign them to either group A or group B for A/B testing:
#!/bin/bash
user_ids=$(seq 1 100) # Generate user IDs from 1 to 100
group_a=$(echo "$user_ids" | shuf -n 50) # Select 50 random users for group A
group_b=$(echo "$user_ids" | grep -v -F -x -e "$group_a" ) # Select the rest for group B
echo "Group A: $group_a"
echo "Group B: $group_b"
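This script generates a list of user IDs, picks 50 of them at random for group A, and assigns the remaining IDs to group B. The `grep` call does the filtering: -F treats the patterns as fixed strings, -x requires a whole-line match, and -v inverts the match, so only IDs absent from group A pass through. Each line of $group_a acts as a separate pattern.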
Tips & Best Practices
- Understand the Randomness Source: shuf relies on a pseudo-random number generator. While suitable for most purposes, it should not be treated as cryptographically secure. For applications requiring stronger randomness, consider tools that draw from system entropy sources; GNU shuf can also read its random bytes from a file you name with the --random-source option, such as /dev/urandom.
- Handle Large Files Efficiently: When shuffling large files, be mindful of memory usage. shuf loads the entire input into memory, so very large files might lead to performance issues. Consider alternatives like splitting the file into smaller chunks, shuffling each chunk, and then merging the shuffled chunks in a randomized way. Simply concatenating them leaves every line inside its original chunk, which is not a uniform shuffle.
- Combine with Other Utilities: shuf shines when combined with other command-line tools like awk, sed, and grep to perform more complex data manipulations.
- Use the -r Option with Caution: The -r or --repeat option allows elements to be repeated in the output, that is, it samples with replacement. This is useful in certain scenarios (like simulating coin flips), but make sure it's what you intend. Without -n, shuf -r keeps producing output until it is interrupted.
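The randomness source can be swapped using GNU shuf's --random-source=FILE option, which makes shuf read its random bytes from the named file. The sketch below shows two uses under stated assumptions: a saved file of random bytes acting as a reusable seed (the name seed_file and the 4096-byte size are arbitrary choices), and /dev/urandom as a direct source.

```shell
# Save some random bytes to act as a reusable "seed".
seed_file=$(mktemp)
head -c 4096 /dev/urandom > "$seed_file"

# Feeding the same bytes twice yields the same permutation on the same shuf build.
first=$(shuf -i 1-20 --random-source="$seed_file")
second=$(shuf -i 1-20 --random-source="$seed_file")
[ "$first" = "$second" ] && echo "reproducible"

# Draw random bytes straight from the kernel's random device instead.
shuf -i 1-20 -n 3 --random-source=/dev/urandom
```

A saved seed file is handy for tests that need a repeatable shuffle; the file must contain enough bytes for the size of the input, or shuf stops with an end-of-file error.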
Troubleshooting & Common Issues
- shuf command not found: This usually indicates that coreutils is not installed or not in your system's PATH. Follow the installation instructions above.
- Output not truly random: If you suspect that shuf's output is not random enough, ensure that your system's random number generator is properly seeded. On Linux, this is typically handled automatically.
- Memory errors with large files: If you're working with very large files and encounter memory errors, consider splitting the file into smaller chunks or using alternative tools designed for handling large datasets.
- Inconsistent results across different systems: shuf is meant to produce a different order on every run, so results are not expected to match between machines. Even when you supply the same file with --random-source, the exact sequence may differ across versions of coreutils because of variations in the underlying random number generation.
FAQ
- Q: What is the primary purpose of the shuf command?
- A: The shuf command generates random permutations of input, such as lines in a file or numbers in a sequence.
- Q: How can I select a random sample of 10 lines from a file using shuf?
- A: Use the command shuf -n 10 filename.txt.
- Q: Is shuf suitable for generating cryptographically secure random numbers?
- A: No, shuf relies on a pseudo-random number generator and is not suitable for cryptographic purposes.
- Q: How do I save the shuffled output to a new file?
- A: Use the -o option, like this: shuf input.txt -o output.txt.
- Q: What if `shuf` is not found after installing coreutils on macOS?
- A: The command is often installed as `gshuf`. You can either type `gshuf`, or add `alias shuf='gshuf'` to your `.bashrc` or `.zshrc` file to create an alias.
Conclusion
The shuf command is a simple yet powerful tool for generating random permutations of input. Its versatility makes it an invaluable asset for various tasks, from data science to system administration. By mastering the techniques outlined in this article, you can effectively integrate shuf into your scripts and workflows to add an element of unpredictability and randomness. So, go ahead and experiment with shuf to discover its full potential and streamline your data manipulation tasks. Visit the GNU coreutils page to learn more and explore other helpful command-line utilities!