Setting probabilities of random text in MATLAB

To generate random text with specific probabilities, you can use a Markov chain model. The idea is to define a transition matrix that gives the probability of moving from each character (or word) to each possible next one. The entries of the matrix can be set by hand or estimated from a training corpus, and then used to generate new text.

Here's an example of how to do this in MATLAB, using character-level Markov chains:

main.m
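% Define the transition matrix over 26 letters plus a space (end of word)
alphabet = 'abcdefghijklmnopqrstuvwxyz ';
N = length(alphabet);
P = zeros(N);
for i = 1:N-2
    P(i,i+1) = 1/3;  % probability of moving to the next letter
    P(i,N) = 2/3;    % probability of ending the word
end
P(N-1,N) = 1;        % 'z' has no next letter, so it always ends the word
P(N,N) = 1;          % the space is absorbing: once reached, the word stays ended

% Check that every row is a valid probability distribution
assert(all(abs(sum(P,2) - 1) < 1e-12));

% Generate one random word
x = 1;                       % start at 'a', the beginning of a word
fprintf('%s', alphabet(x));  % print the starting letter too
while x ~= N
    p = P(x,:);                      % probabilities of each next character
    x = find(rand < cumsum(p), 1);   % sample the next character from row x
    fprintf('%s', alphabet(x));
end
fprintf('\n');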

This code defines a transition matrix P in which each row corresponds to the current character (the 26 letters plus a space, which serves as the "end of word" character) and each column corresponds to the next possible character. Entry P(i,j) is the probability that character j follows character i, so every row must sum to 1 to be a valid probability distribution over the next character.
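If you would rather specify relative weights than exact probabilities, you can normalize each row so that it sums to 1. The sketch below uses a hypothetical three-character alphabet and made-up weights `W`; neither comes from the original example.

```matlab
% Hypothetical alphabet: two letters plus a space as the end-of-word marker
alphabet = 'ab ';

% Hypothetical nonnegative weights: row = current character, column = next character
W = [1 3 1;     % after 'a': weights for 'a', 'b', ' '
     2 0 2;     % after 'b'
     0 0 1];    % after ' ': absorbing end-of-word state

% Divide each row by its sum so every row becomes a probability distribution
% (implicit expansion, MATLAB R2016b or later)
P = W ./ sum(W, 2);

disp(P)
disp(sum(P, 2))   % each entry should be 1, up to rounding
```

A row of `W` that is all zeros would divide by zero and produce `NaN`, so every state needs at least one nonzero weight; giving the space a weight on itself is one way to satisfy that for the end-of-word state.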

To generate text, we start at the beginning of a word (character 1, 'a') and repeat the following steps until we reach the "end of word" character (character N, the space):

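  1. Look up row x of P, which holds the probabilities of moving from the current character x to each possible next character.
  2. Choose the next character randomly according to these probabilities, by drawing a uniform random number with rand and taking the first index at which cumsum(p) exceeds it.
  3. Print the chosen character.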
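Step 2 uses the inverse-CDF rule `find(rand < cumsum(p), 1)`: because `rand` is uniform on (0,1), the chance that it lands in the slice of the cumulative sum belonging to index j is exactly p(j). The following sketch checks this empirically; the probability vector `p`, the seed and the number of draws are arbitrary, hypothetical choices.

```matlab
% Hypothetical probability vector for a 3-character alphabet
p = [0.5 0.3 0.2];
numDraws = 1e5;

rng(0);                          % fix the seed so the run is reproducible
draws = zeros(1, numDraws);
for k = 1:numDraws
    % Inverse-CDF rule: first index whose cumulative probability exceeds rand
    draws(k) = find(rand < cumsum(p), 1);
end

% Empirical frequency of each index (histcounts needs MATLAB R2014b or later)
freq = histcounts(draws, 0.5:1:numel(p)+0.5) / numDraws;
disp(freq)   % should be close to p
```

If a row does not sum to exactly 1, its last cumulative value can fall below `rand`, and `find` then returns an empty result; this is why it is worth checking the row sums of P before sampling.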

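Because the probabilities in this example are set by hand, the output is just a short run of consecutive letters ending in a space. When P is instead estimated from a training corpus, the same process generates random text that follows the corpus's local patterns (which characters tend to follow which) without reproducing it exactly. The quality of the generated text depends on the quality and size of the training corpus, as well as on the order of the model: this first-order chain looks only at the current character, whereas higher-order or word-level chains condition on more context.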
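The example above sets P by hand. To estimate it from a training corpus instead, you can count how often each character follows each other character and normalize the rows. A minimal sketch, assuming a hypothetical toy corpus string (in practice you would read a large text file, for example with `fileread`) and a fixed output length of 40 characters chosen for illustration:

```matlab
% Hypothetical toy training corpus; replace with real text
corpus = lower('the cat sat on the mat and the rat ate the hat ');
alphabet = 'abcdefghijklmnopqrstuvwxyz ';
N = length(alphabet);

% Keep only characters in the alphabet and map each one to its index
corpus = corpus(ismember(corpus, alphabet));
[~, idx] = ismember(corpus, alphabet);

% Count how often character j follows character i
C = zeros(N);
for k = 1:numel(idx)-1
    C(idx(k), idx(k+1)) = C(idx(k), idx(k+1)) + 1;
end

% Characters with no observed successor are sent to the space (end of word)
noNext = sum(C, 2) == 0;
C(noNext, N) = 1;
P = C ./ sum(C, 2);   % normalize each row to sum to 1

% Generate a fixed-length string starting from the corpus's first character
rng(1);
numChars = 40;
out = blanks(numChars + 1);
x = idx(1);
out(1) = alphabet(x);
for k = 2:numChars + 1
    c = cumsum(P(x,:));
    x = find(rand * c(end) < c, 1);  % scaling by c(end) guards against rounding
    out(k) = alphabet(x);
end
disp(out)
```

Sending unseen rows to the space keeps every row of P a valid distribution, and scaling `rand` by the last cumulative value guarantees that `find` always returns an index even if a row sum is slightly off from 1.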

gistlib by LogSnag