Licence CC BY-NC-ND Thierry Parmentelat & Arnaud Legout

random dates¶

step 1: generate a date¶

from randomdate import generate_random_date

help(generate_random_date)

Help on function generate_random_date in module randomdate:

generate_random_date(start='01/01/2024', end='15/06/2024')
    generate a random date with a uniform distribution
    between two given dates, inclusive
    all dates use format dd/mm/yyyy

    Parameters:
        start: str - start date
        end: str - end date
    Returns: str - generated date
    Examples:
        generate_random_date()
            -> "30/04/2024"

# that can be called like this
generate_random_date()

'04/01/2024'

# or like this
generate_random_date("10/02/2020", "31/12/2021")

'01/10/2020'

hints¶

don't even try¶

of course, if you generate the three pieces (day, month, year) independently, you have no chance of getting a uniform distribution
on top of that, this no longer works at all when the bounds fall, as they do here, in the middle of a year and of a month
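
to make the point above concrete, here is a minimal sketch of the naive approach; the function `naive_random_date` is hypothetical (it is not part of the exercise), and the day is capped at 28 only so that every draw is a valid calendar date

```python
import random
from datetime import datetime

def naive_random_date(start_year=2024, end_year=2024):
    # hypothetical naive approach: draw each piece independently
    day = random.randint(1, 28)      # capped at 28 to stay valid in every month
    month = random.randint(1, 12)
    year = random.randint(start_year, end_year)
    return datetime(year, month, day)

random.seed(0)  # fixed seed, only to make the sketch reproducible
samples = [naive_random_date() for _ in range(10_000)]

# the exercise's default upper bound is 15/06/2024
upper = datetime(2024, 6, 15)
out_of_range = sum(d > upper for d in samples)

# days 29, 30 and 31 can never be produced
never_drawn = sum(d.day > 28 for d in samples)

print(out_of_range, never_drawn)
```

a large share of the draws falls after the upper bound, and the last days of each month are never produced, so the result is neither within range nor uniform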

the `random` module¶

# to generate an integer within an interval
# read the docs to find out whether the bounds are included or not

import random
random.randint(1000, 2000)

1873
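
if you would rather check the bounds by experiment than in the docs, a small sketch like this one settles it; it compares `random.randint` with `random.randrange` on a tiny interval (the seed and the number of draws are arbitrary choices)

```python
import random

random.seed(42)  # fixed seed, only to make the sketch reproducible

# collect the distinct values produced by randint(1, 3)
with_randint = {random.randint(1, 3) for _ in range(1000)}

# same experiment with randrange(1, 3), which follows the range() convention
with_randrange = {random.randrange(1, 3) for _ in range(1000)}

print(sorted(with_randint))
print(sorted(with_randrange))
```

the first set contains 1, 2 and 3 (both bounds included), while the second one never contains 3 (upper bound excluded)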

the `datetime` module¶

it is used to represent:

instants in time (dates, times, ...): the datetime class
which, just to follow the PEP 8 naming convention for classes, we import under the name DateTime
durations, i.e. the difference between two instants: the timedelta class
likewise, we will call it `TimeDelta`

here are some of the features we are going to use

from datetime import datetime as DateTime, timedelta as TimeDelta

# to build a DateTime that describes a specific day
# one can do (among other methods)

day = DateTime.strptime("15/02/2017", "%d/%m/%Y")
day

datetime.datetime(2017, 2, 15, 0, 0)

here the second parameter describes the format used to parse the date string; see the complete list of format codes at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

# the interesting thing is that these objects
# know how to do basic arithmetic

# two days after that day will be
day_after = day + TimeDelta(days=2)
day_after

datetime.datetime(2017, 2, 17, 0, 0)

# then to convert a DateTime object back to a string
# one can do (again there are other ways...)

f"{day_after:%d %m %Y}"

'17 02 2017'

# for example, this works too

DateTime.strftime(day_after, "%Y/%m/%d")

'2017/02/17'
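
the arithmetic goes further than adding a duration: subtracting two instants yields a duration, and dividing a duration by another one yields a plain number; the sketch below uses the default bounds of the exercise, and the variable names are only illustrative

```python
from datetime import datetime as DateTime, timedelta as TimeDelta

start = DateTime.strptime("01/01/2024", "%d/%m/%Y")
end = DateTime.strptime("15/06/2024", "%d/%m/%Y")

# the difference between two DateTime objects is a TimeDelta
span = end - start

# floor-dividing a TimeDelta by another TimeDelta gives an int
one_day = TimeDelta(days=1)
nb_days = span // one_day
print(nb_days)  # 166, since 2024 is a leap year

# and multiplying a TimeDelta by an int gives a TimeDelta again
print(start + nb_days * one_day == end)  # True
```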

solution¶

open me

randomdate.py


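import random
from datetime import datetime as DateTime, timedelta as TimeDelta

# all dates use the dd/mm/yyyy format
FORMAT = "%d/%m/%Y"


def generate_random_date(start="01/01/2024", end="15/06/2024"):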
    """
    generate a random date with a uniform distribution
    between two given dates, inclusive
    all dates use format dd/mm/yyyy

    Parameters:
        start: str - start date
        end: str - end date
    Returns: str - generated date
    Examples:
        generate_random_date()
            -> "30/04/2024"
    """

    start_date = DateTime.strptime(start, FORMAT)
    end_date = DateTime.strptime(end, FORMAT)

    one_day = TimeDelta(days=1)
    nb_days = (end_date - start_date) // one_day

    random_days = random.randint(0, nb_days)
    return DateTime.strftime(start_date + random_days * one_day, FORMAT)
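
one way to convince yourself that both bounds are reachable and that the distribution looks uniform is to draw many dates on a very short interval and count them; this is only a sanity-check sketch, the function is repeated so that the snippet runs on its own, and the value of `FORMAT` is assumed from the dd/mm/yyyy docstring

```python
import random
from collections import Counter
from datetime import datetime as DateTime, timedelta as TimeDelta

FORMAT = "%d/%m/%Y"  # assumed value, matching the dd/mm/yyyy docstring

def generate_random_date(start="01/01/2024", end="15/06/2024"):
    # same logic as the solution, repeated so the sketch is self-contained
    start_date = DateTime.strptime(start, FORMAT)
    end_date = DateTime.strptime(end, FORMAT)
    one_day = TimeDelta(days=1)
    nb_days = (end_date - start_date) // one_day
    random_days = random.randint(0, nb_days)
    return DateTime.strftime(start_date + random_days * one_day, FORMAT)

random.seed(0)  # fixed seed, only to make the sketch reproducible

# a 4-day interval that straddles a year boundary
counts = Counter(
    generate_random_date("30/12/2023", "02/01/2024") for _ in range(4000)
)

# display the counts in chronological order
for date in sorted(counts, key=lambda d: DateTime.strptime(d, FORMAT)):
    print(date, counts[date])
```

each of the four dates, bounds included, should show up roughly 1000 times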

step 2: write n samples into a file¶

same approach: rewrite the following function

from randomdate import write_random_data

help(write_random_data)

Help on function write_random_data in module randomdate:

write_random_data(output: <class 'TextIO'>, nb_lines=1000)
    Writes into the *output* object <nb_lines> lines

    Each line will contain, separated by spaces:
    * a line number
    * a random date (same format)
    * a random string : containing lowercase letters,
      with a length itself a random number between 3 and 9.

    Parameters:
        output: TextIO
            the opened file - typically the result of
            open(..., 'w'); it is NOT a filename !
        nb_lines: int
            the number of lines to be generated

with open("randomdate.txt", 'w') as output:
    write_random_data(output, 5)

# this is just to see what is in the generated file
# if on Windows, it will likely not work, use vs-code instead..

%cat randomdate.txt

1 12/01/2024 kzfxkd
2 28/02/2024 huyeghsx
3 28/02/2024 rfhcru
4 13/05/2024 bvhz
5 17/02/2024 jeslhi

hints¶

the TextIO type represents a file that is already open (not a filename)
look at the value of string.ascii_lowercase
look at the functions random.choice() and random.choices()
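
as a quick illustration of the last two hints, this sketch shows what the two functions return; the variable names are only illustrative

```python
import random
import string

# the 26 lowercase ASCII letters, as a single string
print(string.ascii_lowercase)

# choice() picks one element: here a single character
one_letter = random.choice(string.ascii_lowercase)

# choices() picks k elements with replacement and returns a list
five_letters = random.choices(string.ascii_lowercase, k=5)

# joining the list gives a random 5-letter string
token = "".join(five_letters)
print(one_letter, five_letters, token)
```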

solution¶

open me

randomdate.py


from typing import TextIO
import string

def write_random_data(output: TextIO, nb_lines=1000):
    """
    Writes into the *output* object <nb_lines> lines

    Each line will contain, separated by spaces:
    * a line number
    * a random date (same format)
    * a random string : containing lowercase letters,
      with a length itself a random number between 3 and 9.

    Parameters:
        output: TextIO
            the opened file - typically the result of
            open(..., 'w'); it is NOT a filename !
        nb_lines: int
            the number of lines to be generated
    """
    def random_token():
        length = random.randint(3, 9)
        return "".join(random.choices(string.ascii_lowercase, k=length))
    for i in range(nb_lines):
        print(f"{i+1} {generate_random_date()} {random_token()}",
              file=output)
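
a side benefit of taking an open file rather than a filename is that any file-like object will do, for instance an in-memory `io.StringIO`, which is handy for testing; the sketch below uses a hypothetical stand-in, `write_numbered_lines`, that follows the same calling convention as write_random_data

```python
import io
from typing import TextIO

def write_numbered_lines(output: TextIO, words):
    # hypothetical stand-in for write_random_data:
    # it also writes into an already opened file-like object
    for number, word in enumerate(words, start=1):
        print(f"{number} {word}", file=output)

# an in-memory text buffer behaves like an opened text file
buffer = io.StringIO()
write_numbered_lines(buffer, ["kzfxkd", "huyeghsx"])

content = buffer.getvalue()
print(content, end="")
```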

step 3: read the file back and sort it¶

same approach: rewrite the following function

from randomdate import sort_data

help(sort_data)

Help on function sort_data in module randomdate:

sort_data(input_filename, output_filename)
    Reads the input file, sorts its lines by date,
    and stores the result in the output file

    NOTE that as opposed to write_random_data, the
    parameters this time are FILENAMES - i.e. strings

    Parameters:
        input_filename: str
          the name of the input file
        output_filename: str
          the name of the output file

sort_data("randomdate.txt", "randomdate-sorted.txt")

!cat randomdate-sorted.txt

1 12/01/2024 kzfxkd
5 17/02/2024 jeslhi
2 28/02/2024 huyeghsx
3 28/02/2024 rfhcru
4 13/05/2024 bvhz

hints¶

we have the choice between
- the sorted() function, which builds a sorted copy
- the list.sort() method, which sorts a list in place
so if we want to keep memory usage down, which one should we pick?
these sorting functions accept a key= parameter; take a close look at how it works
no need to be clever and read the file line by line: we have no choice but to read the whole input before sorting
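
the difference between the two, and the role of key=, can be seen on a toy list; this sketch sorts words by length, which has nothing to do with dates but shows the mechanics

```python
words = ["pear", "fig", "banana"]

# sorted() leaves its argument alone and returns a new list
by_length = sorted(words, key=len)
print(words)      # ['pear', 'fig', 'banana']
print(by_length)  # ['fig', 'pear', 'banana']

# list.sort() reorders the list itself and returns None
result = words.sort(key=len)
print(words)   # ['fig', 'pear', 'banana']
print(result)  # None
```

in both cases the key function is called once on each element, and the elements are ordered according to the values it returns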

solution¶

open me

randomdate.py


def sort_data(input_filename, output_filename):
    """
    Reads the input file, sorts its lines by date,
    and stores the result in the output file

    NOTE that as opposed to write_random_data, the
    parameters this time are FILENAMES - i.e. strings

    Parameters:
        input_filename: str
          the name of the input file
        output_filename: str
          the name of the output file
    """
    with open(input_filename) as feed, open(output_filename, 'w') as writer:
        # read all lines in memory
        lines = list(feed)
        # define the criteria used for sorting
        def the_date(line):
            date_str = line.split()[1]
            return DateTime.strptime(date_str, FORMAT)
        lines.sort(key=the_date)
        for line in lines:
            writer.write(line)
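
it may be tempting to sort on the date string directly, without parsing it; the sketch below, run on the five sample lines shown earlier, suggests why that does not work with the dd/mm/yyyy format (the helper `date_string` is hypothetical and only there for the comparison)

```python
from datetime import datetime as DateTime

lines = [
    "1 12/01/2024 kzfxkd\n",
    "2 28/02/2024 huyeghsx\n",
    "3 28/02/2024 rfhcru\n",
    "4 13/05/2024 bvhz\n",
    "5 17/02/2024 jeslhi\n",
]

def date_string(line):
    # hypothetical key: the raw text of the date
    return line.split()[1]

def the_date(line):
    # same key as in the solution: the parsed date
    return DateTime.strptime(line.split()[1], "%d/%m/%Y")

# keep only the line numbers to make the two orders easy to compare
by_string = [line.split()[0] for line in sorted(lines, key=date_string)]
by_date = [line.split()[0] for line in sorted(lines, key=the_date)]

print(by_string)  # ['1', '4', '5', '2', '3']
print(by_date)    # ['1', '5', '2', '3', '4']
```

comparing strings puts 13/05 before 17/02 because the day comes first, whereas comparing parsed dates gives the chronological order; also note that lines 2 and 3, which share the same date, keep their relative order because Python's sort is stable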

variants¶

these are independent of one another:

if you are comfortable with classes, you can write a Sample class to represent each entry in the file
write an if __name__ == '__main__' block to make your script executable, and use ArgumentParser to set the number of generated lines
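
for the second variant, a possible skeleton is sketched below; the option name `-n/--nb-lines` and the helpers `build_parser` and `main` are assumptions made for illustration, and the call to write_random_data is left as a comment so that the snippet runs on its own

```python
from argparse import ArgumentParser

def build_parser():
    parser = ArgumentParser(description="generate random dated samples")
    # -n / --nb-lines is a hypothetical option name
    parser.add_argument("-n", "--nb-lines", type=int, default=1000,
                        help="number of lines to generate")
    return parser

def main(argv=None):
    # argv=None means: use the actual command line
    args = build_parser().parse_args(argv)
    # a real script would call write_random_data(output, args.nb_lines) here
    print(f"would generate {args.nb_lines} lines")

# only runs when the file is executed as a script, not when it is imported
if __name__ == "__main__":
    main()
```

with this layout the module can still be imported for its functions, and `main(["-n", "5"])` can be called from a test without touching the real command line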
