My file sonnets.txt looks like this (except that it has 154 sonnets instead of two):
Code:
I.
FROM fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light'st flame with self-substantial fuel,
Making a famine where abundance lies,
Thyself thy foe, to thy sweet self too cruel.
Thou that art now the world's fresh ornament
And only herald to the gaudy spring,
Within thine own bud buriest thy content
And, tender churl, makest waste in laggarding.
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
II.
When forty winters shall beseige thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery, so gazed on now,
Will be a tatter'd weed, of small worth held:
Then being ask'd where all thy beauty lies,
Where all the treasure of thy lusty days,
To say, within thine own deep-sunken eyes,
Were an all-eating shame and thriftless praise.
How much more praise deserved thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.
To remove the punctuation I did this, and it seems to work:
Code:
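import string
# Read the whole file into a single string; "with" closes the file afterwards
with open("sonnets.txt") as f:
    s = f.read()
# Every ASCII punctuation character, as a set for fast membership tests
exclude = set(string.punctuation)
# Keep only the characters that are not punctuation
s = ''.join(ch for ch in s if ch not in exclude)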
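For comparison, here is a minimal sketch of the same stripping done with str.translate, assuming Python 3. The function name strip_punctuation and the sample string are hypothetical, used only for illustration. It works on a string rather than on the file, so it can be tried without sonnets.txt.

```python
import string


def strip_punctuation(text):
    """Return text with every ASCII punctuation character removed.

    Hypothetical helper for illustration; not part of the original snippet.
    """
    # With three arguments, str.maketrans maps each character in the third
    # argument to None, so translate() deletes those characters.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


# A short sample in the same shape as the file: numeral line, then verse lines.
sample = "I.\nFROM fairest creatures we desire increase,\nThat thereby beauty's rose might never die,"
print(strip_punctuation(sample))
```

Either approach also removes apostrophes and the period after each Roman numeral, so "beauty's" becomes "beautys" and "I." becomes "I". That may matter if the numeral lines are later used to tell the sonnets apart.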
More details here: https://stackoverflow.com/questions/793 ... s-an-array