pip install FileSampler
get_a_line() method returns a string which represents one row.
get_lines() method returns a list of strings which represent multiple rows.
get_random_lines() method returns a list of strings which represent multiple randomly chosen rows.
from FileSampler import TextSampler
# raw string, so that \f and \t in the Windows path are not read as escape sequences
sampler_text = TextSampler(r'c:\file path\text_file.txt')
# single line
string_line = sampler_text.get_a_line(int_line_number)
print(string_line)
# multiple lines
list_lines = sampler_text.get_lines(list_line_numbers)
for line in list_lines:
    print(line)
# random lines
list_random_lines = sampler_text.get_random_lines(int_number_of_random_lines)
for line in list_random_lines:
    print(line)
m_string_endline_character - character that marks the end of a line (default is the newline character, \n)
m_bool_estimate - if set to True, blank lines in the file will not be read or indexed (default is False)
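To make the idea of indexing lines (and optionally skipping blank ones) concrete, here is a minimal sketch that uses only the standard library. It is not FileSampler's implementation: the helper names `build_line_index` and `read_line`, the `skip_blank` flag, and the zero-based line numbering are all assumptions made for illustration.

```python
import os
import tempfile


def build_line_index(path, skip_blank=False):
    """Return the byte offset at which each (kept) line starts."""
    offsets = []
    with open(path, "rb") as handle:
        while True:
            position = handle.tell()
            raw = handle.readline()
            if not raw:  # end of file
                break
            if skip_blank and not raw.strip():
                continue  # blank line: do not index it
            offsets.append(position)
    return offsets


def read_line(path, offsets, line_number):
    """Fetch one line by seeking directly to its recorded offset."""
    with open(path, "rb") as handle:
        handle.seek(offsets[line_number])
        return handle.readline().decode("utf-8").rstrip("\r\n")


# Small demo file: four lines, the second one blank.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("alpha\n\nbeta\ngamma\n")
    demo_path = tmp.name

index_all = build_line_index(demo_path)
index_no_blank = build_line_index(demo_path, skip_blank=True)
third_line = read_line(demo_path, index_no_blank, 2)  # zero-based in this sketch
os.remove(demo_path)
```

With an index like this, fetching a random line needs only one seek rather than a scan of the whole file, which is presumably why a sampler would build one up front.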
number_of_lines -> type: int; returns the number of lines in the file
estimate_mode -> type: bool; flag indicating whether the class counted all the line lengths in the file or estimated the line length based on a sample
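The difference between counting every line and estimating from a sample can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `sample_size` parameter, and the choice to sample the first lines of the file are hypothetical, and the README does not say how FileSampler itself performs the estimate.

```python
import os
import tempfile


def count_lines_exact(path):
    """Count lines by reading the whole file once."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def estimate_line_count(path, sample_size=100):
    """Estimate the line count from the mean length of the first lines."""
    file_size = os.path.getsize(path)
    sampled_bytes = 0
    sampled_lines = 0
    with open(path, "rb") as handle:
        for raw in handle:
            sampled_bytes += len(raw)
            sampled_lines += 1
            if sampled_lines >= sample_size:
                break
    if sampled_lines == 0:
        return 0  # empty file
    mean_length = sampled_bytes / sampled_lines
    return round(file_size / mean_length)


# Demo file with 1000 fixed-width lines, so the estimate is exact here.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8", newline=""
) as tmp:
    for number in range(1000):
        tmp.write(f"row {number:04d}\n")
    demo_path = tmp.name

exact_count = count_lines_exact(demo_path)
estimated_count = estimate_line_count(demo_path, sample_size=50)
os.remove(demo_path)
```

On files whose line lengths vary a lot, an estimate of this kind can be noticeably off, which is the trade-off for not reading the whole file.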
header: returns the header of the csv file, if there is one, as a tuple of strings
has_header: a boolean flag that is True if a header exists and False otherwise
from FileSampler import CsvSampler
sampler_csv = CsvSampler('~/myfile.csv')
# single line
series_line = sampler_csv.get_a_csv_line(int_line_number)
# returns a pandas Series; the index is the header if it exists
# multiple lines
df_lines = sampler_csv.get_csv_lines(list_line_numbers)
for string_column in df_lines:
    for int_line in range(0, len(df_lines)):
        print(df_lines[string_column].iloc[int_line])
# returns a pandas DataFrame where the columns are the file headers; the above example will
# print each line of each column in the dataframe
# random lines
df_random_lines = sampler_csv.get_csv_random_lines(int_number_of_random_lines)
for int_line in range(0, len(df_random_lines)):
    print(df_random_lines.iloc[int_line])
# returns a pandas DataFrame where the columns are the header if it exists
# the above example prints each full line of the csv file
m_bool_ignore_bad_lines - if set to True, lines that do not fit the csv file format will be ignored (default is False)
string_values_delimiter - character used by the csv to separate values within a line (default is ,)
string_quotechar - character used by the csv to surround values that contain the value delimiting character (default is ")
m_bool_has_header - if set to True, the first line of the csv file will be used as the header / column names for the DataFrame (default is True)
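What the delimiter, quote character, header, and bad-line options mean in practice can be shown with Python's standard `csv` module. This sketch does not call FileSampler; the `parse_csv` helper and its parameter names are hypothetical, and treating a "bad line" as one whose field count differs from the header's is an assumption.

```python
import csv
import io

# Semicolon-delimited text; the quoted value contains the delimiter itself,
# and the third line has one field too many.
raw_text = 'name;note\n"Smith; John";ok\nbroken;row;extra\nDoe;fine\n'


def parse_csv(text, delimiter=",", quotechar='"', has_header=True,
              ignore_bad_lines=False):
    """Parse csv text into (header, rows), optionally dropping bad lines."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=quotechar)
    all_rows = list(reader)
    header = tuple(all_rows[0]) if has_header and all_rows else None
    body = all_rows[1:] if has_header else all_rows
    expected = len(header) if header else (len(body[0]) if body else 0)
    kept = []
    for row in body:
        if len(row) != expected:
            if ignore_bad_lines:
                continue  # silently skip the malformed line
            raise ValueError(f"unexpected number of fields: {row}")
        kept.append(row)
    return header, kept


header, rows = parse_csv(raw_text, delimiter=";", ignore_bad_lines=True)
```

Here the quote character keeps `Smith; John` together as a single value, and the three-field line is dropped because bad lines are ignored. With `ignore_bad_lines=False` the same input raises a `ValueError` instead.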