[deprecate in summer 2017] Back end Programming Guide for Lexos
- Overview
- What is This?
- Helpful Tips
- A General Introduction to Some Important Things
- Back-end Program Structure and Programming Standards
- This is the back-end programming guide for Lexos programmers.
- It includes tips and standards, and it is helpful to read before you start programming the back-end of Lexos.
- This guide assumes you know basic web structure and Python (if you find this hard to read, stop and go here)
1. Read the constants.py and general_functions.py files in the helpers folder before you do anything real, so that you don't reinvent the wheel.
2. Play with the join and split functions before you deal with strings. Small changes in the use of these functions can make a significant difference in runtime efficiency.
For example, use:
str = ''.join(list)
Instead of:
str = ''
for element in list:
    str += element
To create a comma-separated-value (csv) file:
rows = [','.join(row) for row in matrix]
csv = '\n'.join(rows)
3. Play with the filter and map functions, the * operator, and in-line for loops before you deal with lists.
For example, use:
list = map(lambda element: element[:50], list)
Instead of:
for i in range(len(list)):
    list[i] = list[i][:50]
When you initialize the list, use * rather than a for loop:
This is not used that often
For example, use:
empty_list = [0] * Len_list
Instead of:
empty_list = []
for _ in range(Len_list):
    empty_list.append(0)
For example, use:
try:
    dict[i] += 1
except KeyError:
    dict[i] = 1
Instead of:
if i in dict:
    dict[i] += 1
else:
    dict[i] = 1
Use:
try:
    os.makedirs(path)
except OSError:
    pass
Instead of:
if os.path.isdir(path):
    pass
else:
    os.makedirs(path)
5. When using except to do complicated jobs, as a general rule, specify the error type (KeyError, ValueError, etc.) explicitly.
(Note to self: our current code uses Python lists in a number of places where we could use the above data types.)
Use:
for element in npArray.flat:
    print(element)
Instead of:
for row in pythonList:
    for element in row:
        print(element)
Read this tutorial for more info.
Use:
sortedList = sorted(ListofTuples, key=lambda tup: tup[n])
Instead of:
def sortby(somelist, n):
    nlist = [(x[n], x) for x in somelist]
    nlist.sort()
    return [val for (key, val) in nlist]
sortedList = sortby(ListofTuples, n)
8. Read this for more tips.
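A few of the tips above can be tried out together in one short script. This is only a minimal sketch: the sample matrix, the word list, and the function names `matrix_to_csv` and `count_words` are made up for illustration and are not part of Lexos.

```python
def matrix_to_csv(matrix):
    """Join a list of rows (each a list of strings) into one CSV string."""
    rows = [','.join(row) for row in matrix]
    return '\n'.join(rows)


def count_words(words):
    """Count occurrences with try/except instead of an `in` check."""
    counts = {}
    for word in words:
        try:
            counts[word] += 1
        except KeyError:  # name the error type explicitly
            counts[word] = 1
    return counts


# hypothetical sample data, for illustration only
matrix = [['word', 'file1', 'file2'], ['the', '10', '7'], ['of', '4', '5']]
csv_text = matrix_to_csv(matrix)
counts = count_words(['a', 'b', 'a'])
zeros = [0] * 3  # initialize a list with * rather than a loop
print(csv_text)
```

Note that this simple join does not quote cells that themselves contain commas; Python's built-in `csv` module handles that case.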
- The Lexos back-end is built with Python and Flask, a microframework. The Flask library in Python enables us to interact with web requests.
- request: a variable that holds the web request information
  - request.method: returns the method of the request, post or get in this case
  - request.form: returns a Dict that maps the id of each request to its value
  - request.form.getlist: returns a list of all the values sent under one request id (useful when there is more than 1 value)
  - request.files: returns a Dict that maps the id of each request to its value (only if the request value is a file)
  - request.json: returns a JSON object (generally sent from an Ajax request)
- session: a cookie that can be shared with the browser and the back-end code
  - This is used to cache users' options and information; it also sends the default information (which is in constant.py) to the front-end
  - This variable works like a Dict
  - It will not be renewed unless you call session_function.init(); we use it to keep users' options on the Graphical User Interface (GUI)
  - This variable can be accessed both in the front-end and the back-end, so we sometimes use it to send information to the front-end.
In Lexos 2, most Lexos tools required the user to submit the form, which sent the form data to lexos.py and then triggered a page refresh with the back-end response. In Lexos 2.5, some features were transferred to Ajax functions, which send data to lexos.py and receive a response without a page refresh. For Lexos 3, we are continuing to transfer implementations of features to Ajax.
- Any files uploaded and/or created during a session are presently stored in /tmp/Lexos/. In order to simplify the file monitoring process, you might want to clear this folder frequently.
- Inside /tmp/Lexos/ (~\AppData\Local\Temp in Windows), there are workspace files (with the extension .lexos) and the session folder (the folder with a random string as its name, since each session is stored in its own folder).
- A workspace file is generated whenever a user clicks Download Workspace at the top of the GUI.
- Inside the session folder, there are at most 3 files:
  - filemanager.py: the file that contains the pickled FileManager for the files in the current session, including files that have been cut into segments. In this way we can save and load the FileManager (with utility.loadFileManager and utility.saveFileManager).
  - filecontents/: the folder containing all the user's uploaded files.
  - analysis_results/: the folder containing all the results that a user needs to download (for example, a .csv document-term matrix, a Rolling Window graph, etc.).
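The save-and-load cycle described above can be sketched with the standard `pickle` module. This is a rough illustration under stated assumptions, not the actual Lexos code: the file name constant, the two helper functions, and the plain dict standing in for the FileManager object are all hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical name for the pickle file inside the session folder.
FILEMANAGER_FILENAME = 'filemanager.pickle'


def save_file_manager(file_manager, session_folder):
    """Pickle the file manager state into the session folder."""
    path = os.path.join(session_folder, FILEMANAGER_FILENAME)
    with open(path, 'wb') as handle:
        pickle.dump(file_manager, handle)
    return path


def load_file_manager(session_folder):
    """Read the pickled file manager state back from the session folder."""
    path = os.path.join(session_folder, FILEMANAGER_FILENAME)
    with open(path, 'rb') as handle:
        return pickle.load(handle)


# A temporary directory stands in for the randomly named session folder.
session_folder = tempfile.mkdtemp()
# A plain dict (file ID -> label) stands in for the real FileManager object.
manager_state = {0: 'chapter1', 1: 'chapter2'}
saved_path = save_file_manager(manager_state, session_folder)
restored = load_file_manager(session_folder)
print(restored)
```

Because each session has its own folder, one session's pickle never overwrites another's.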
This section introduces how the front-end and back-end interact.
- Send a file to the user
  - Create a file that the user wants to download in a path, and save the path in a variable (e.g. SavePath).
  - Return SavePath to lexos.py.
  - Use return send_file(SavePath, attachment_filename=filename, as_attachment=True) to send a file to the user.
  - See the topword, tokenizer, and/or rollingwindow functions in lexos.py for examples.
- Render template
  - First, produce the requested result in the back-end. For example, assume I have 2 variables I want to send to the front-end: labels and result.
  - Send the variables to the front-end by return render_template('front-end.html', labels=labels, result=result)
  - Then, in the template file (something like front-end.html) there will be Jinja code that can make use of the labels and result variables.
  - The Jinja will complete (fill in) the HTML template and send the page to the user.
- Session
  - As we noted before, session is the variable that can be accessed both on the front-end and back-end.
  - The session variable can be called in the front-end as a Jinja variable.
  - The session variable is ONLY used to cache a user's option(s). Do not use it to cache anything else.
- Note: the Lexos project is not completely following this guide at this time.
- InTheMargins: draft pages for In the Margins
- templates/: the folder containing all the HTML files.
- static/: the folder containing all the JavaScript, images, and CSS that are needed in the GUI.
- TestSuite/: the folder containing a set of (benchmark) tests we use on Lexos.
- 0_InstallGuide/: the folder containing installation directions if you are installing Lexos locally (rather than using the web-based app).
- requirement.txt: the list of additional Python packages required, referenced by the private installers
- gitignore: the file that specifies intentionally untracked files to ignore
- LICENSE: an MIT license
- BackendProgrammingGuide.md: this file. (^_^)
- DevelopersGuide.md: some more front-end development discussion
A description of the files that are used when working with Lexos software, as well as the file structure encountered
- Description: the file that is used to connect the file with the front end
- Calling map:
lexos.py -> managers/utility.py (used to save and load the filemanager and push info to the front-end)
         -> managers/file_manager.py (mainly used to get labels)
         -> managers/session_manager.py (used to load the default and cached options)
         -> helpers/* (these files can be accessed throughout the entire project)
- Programming workflow:
  - load filemanager
  - load variable (usually loading labels. If there are other variables to load, write a function to load them)
  - split request
    - 'GET' request
      - apply the default setting to the session
      - get result (optional, usually we don't need to get the result in a 'GET' request)
      - render_template
    - 'POST' request (sometimes we need to use if else to handle 'POST', because we need to render different templates; for example, see topword())
      - get the calculation result
      - turn result into display form (generally handles something like generating a preview of the result) or save the result in a file (for download) (optional)
      - savefilemanager (only when the file manager is changed)
      - cache session
      - render_template or send_file
- Programming workflow example:
The following uses the Analysis tool topword() as an example: Download the file branch of prop-z test for class branch
# load filemanager
fileManager = managers.utility.loadFileManager()
# load variable (usually loading labels. If other variables need to be loaded, write a function to load them)
labels = fileManager.getActiveLabels()
# split request ('GET')
if request.method == 'GET':
    # apply default setting to the `session`
    if 'topwordoption' not in session:
        session['topwordoption'] = constants.DEFAULT_TOPWORD_OPTIONS
    if 'analyoption' not in session:
        session['analyoption'] = constants.DEFAULT_ANALYZE_OPTIONS
    # get result (optional, usually we don't need to get result in 'GET' request)
    ClassdivisionMap = fileManager.getClassDivisionMap()[1:]
    # error handling
    if ClassdivisionMap != [] and len(ClassdivisionMap[0]) == 1:
        session['topwordoption']['testMethodType'] = 'pz'
        session['topwordoption']['testInput'] = 'useAll'
    # render_template
    return render_template('topword.html', labels=labels, classmap=ClassdivisionMap, topwordsgenerated='class_div')
# split request ('POST')
if request.method == "POST":
    # get result
    result = utility.GenerateZTestTopWord(fileManager)  # get the topword test result
    # turn result into display form (generally handles something like generating a preview of the result) or save the result in a file (for download) (optional)
    path = utility.getTopWordCSV(result, 'pzClass')
    # not saving filemanager
    # cache session
    session_manager.cacheAnalysisOption()
    session_manager.cacheTopwordOptions()
    # render_template or send_file
    return send_file(path, attachment_filename=constants.TOPWORD_CSV_FILE_NAME, as_attachment=True)
- Special comment:
  - in lexos.py we recommend you avoid including complicated statements; a general rule of thumb is that there should be no nested loop or if statements, because this file is used just to send information to the front-end. If you need to use a complicated statement, add a function somewhere else.
- Description: there are 3 types of functions in this file:
  - functions that load a request remotely and turn it into options that the processor can understand
    - for example getTopWordOption()
  - functions that combine all the information together to give a result that can be sent to the front-end
    - for example GenerateZTestTopWord(filemanager)
  - other functions:
    - saveFileManager(), loadFileManager()
- Calling map:
utility.py -> file_manager.py (used to get file information. Be cautious when changing lexos_file information)
           -> session_manager.py (used to get the session_folder only)
           -> processor/* (used to do calculations)
           -> helpers/* (these files can be accessed throughout the entire project)
- Programming workflow:
  - get remote option function
    - none
  - other function
    - none
  - the function that is used to combine all the information together to give a result that can be sent to the front-end
    0. not none! (surprise!)
    - get remote option: either call the corresponding get remote option function or write it inside this function
    - load the local content from file_manager.py
    - convert the data into the data structure that the processor can understand (optional)
    - send the data to the processor and get result(s)
    - combine other information together with the data structure (optional, for example file names, labels and so on)
- Programming workflow example
This code is from GenerateZTestTopWord(filemanager) test for class branch
# get remote option: either call the corresponding get remote option function or write it inside this function (call get remote function)
testbyClass, option, Low, High = getTopWordOption()
# load the local content from `file_manager.py`
ngramSize, useWordTokens, useFreq, useTfidf, normOption, greyWord, showDeleted, onlyCharGramsWithinWords, MFW, culling = filemanager.getMatrixOptions()
countMatrix = filemanager.getMatrix(useWordTokens=useWordTokens, useTfidf=False, normOption=normOption,
                                    onlyCharGramsWithinWords=onlyCharGramsWithinWords, ngramSize=ngramSize,
                                    useFreq=False, greyWord=greyWord, showGreyWord=showDeleted, MFW=MFW,
                                    cull=culling)
# convert the data into the data structure that processor can understand (optional)
WordLists = matrixtodict(countMatrix)
# send the data to the processor and get result
analysisResult = testall(WordLists, option=option, Low=Low, High=High)
# combine other information together with the data structure (optional)
# stick the temp label in front of the data
humanResult = [[countMatrix[i + 1][0], analysisResult[i]] for i in range(len(analysisResult))]
# return
return humanResult
- Special comment:
  - in this file we should only handle data structure transformation, not calculations (calculation is handled in /processors/*)
  - if a function doesn't need to get request and doesn't need to call fileManager, this function does not belong in this file.
  - if a function is doing intense math and calculation, this function does not belong in this file. (calculation is handled in /processors/*)
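To illustrate the split between data-structure transformation (this file) and calculation (the processors), here is a small self-contained sketch. The matrix layout is an assumption made for illustration (a header row of words, then one row per document starting with its label), and `matrix_to_dicts` and `attach_labels` are hypothetical names rather than the real Lexos helpers.

```python
def matrix_to_dicts(count_matrix):
    """Convert a count matrix into one {word: count} dict per document.

    Assumed layout (hypothetical): row 0 is a header whose cells after the
    first are the words; every later row is [label, count, count, ...].
    """
    words = count_matrix[0][1:]
    word_lists = []
    for row in count_matrix[1:]:
        counts = row[1:]
        # keep only the words that actually occur in this document
        word_lists.append({w: c for w, c in zip(words, counts) if c != 0})
    return word_lists


def attach_labels(count_matrix, results):
    """Put each document's label in front of its result."""
    return [[count_matrix[i + 1][0], results[i]] for i in range(len(results))]


count_matrix = [
    ['', 'the', 'of', 'whale'],
    ['file1', 10, 4, 0],
    ['file2', 7, 0, 3],
]
word_lists = matrix_to_dicts(count_matrix)
# A trivial stand-in for a processor: total word count per document.
totals = [sum(d.values()) for d in word_lists]
labelled = attach_labels(count_matrix, totals)
print(labelled)
```

Only the reshaping steps would live in this file; anything heavier than the one-line stand-in calculation would belong in the processors.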
- Description: the file that is used to edit, save, load, and initiate a session.
- Calling map:
session_manager.py -> helpers/* (these files can be accessed throughout the whole project)
- Programming workflow:
  - cache functions:
    - cache functions have 4 types of options that we need to cache:
      - box (check box)
      - input (radio button and input box)
      - list (multiple requests with the same name; for example, in the word cloud select document section all requests have the name 'segmentlist')
      - files (this is complicated; for now, we only cache filenames, see cacheMultiCloudOptions() for more information)
  - other functions
    - these functions are (pretty) stable; do not add or change them unless absolutely necessary
  - load default function:
    - let the session load the default options on a page when you first go into that page
    - Note: THIS DOES NOT EXIST IN THE PROJECT YET
- Programming workflow example
  - for example, you need to cache the options for lalala, because we just decided to name our new feature lalala, and everyone loved this name :)
helpers/constant.py:
# these are the names of the requests that you want to cache:
LALALAINPUT = ('input1', 'input2')
LALALALIST = ('list1',)  # make sure you have the ending ',' when you only have one element
LALALAFILE = ('file1', 'file2')
LALALABOX = ('box1', 'box2', 'box3', 'box4', 'box5', 'box6', 'god!-we-really-have-lot-of-boxes')
# these are the default options that will show on the page; you should add the default even if you are not caching it
# input and file are mapped to a string
# boxes map to a boolean value to indicate whether that is checked
# lists map to a list
DEFAULT_LALALA_OPTION = {'input1': 'the-default-of-input1', 'input2': 'the-default-of-input2',
                         'box1': True, 'box2': True, 'box3': True, 'box4': True, 'box5': False, 'box6': False,
                         'god!-we-really-have-lot-of-boxes': False,
                         'list1': [], 'file1': '', 'file2': '',
                         'this-is-the-option-that-I-do-not-want-to-cache': 'lalalahahaha',
                         'this-is-another-option-that-I-do-not-want-to-cache': False}
managers/session_manager.py:
# caching the input
for input in constants.LALALAINPUT:
    session['lalalaoptions'][input] = (
        request.form[input] if input in request.form else constants.DEFAULT_LALALA_OPTION[input])
# caching the list
for list in constants.LALALALIST:
    session['lalalaoptions'][list] = request.form.getlist(list)
# caching check boxes
for box in constants.LALALABOX:
    session['lalalaoptions'][box] = (box in request.form)
# caching the filename
for file in constants.LALALAFILE:
    filePointer = (request.files[file] if file in request.files else constants.DEFAULT_LALALA_OPTION[file])
    topicstring = str(filePointer)
    topicstring = re.search(r"'(.*?)'", topicstring)
    # no quoted filename is found when no file was uploaded
    filename = topicstring.group(1) if topicstring else ''
    if filename != '':
        session['lalalaoptions'][file] = filename
template/lalala.html:
<!-- inputs radio button -->
<label>input1 option1<input type="radio" name="input1" value="option1" {{ 'checked' if session['lalalaoptions']['input1'] == 'option1' }}/></label>
<!-- inputs input box -->
<input type="number" name="input2" id="max_iter" min="1" step="1" value="{{ session['lalalaoptions']['input2'] }}" />
<!-- check box -->
<label> box1 <input type="checkbox" name="box1" {{ 'checked' if session['lalalaoptions']["box1"] }}/> </label>
<!-- list -->
{% for fileID, label in labels.items() %}
<label>{{label}}
<input type="checkbox" name="list1" class="lalalalist" {{ 'checked' if fileID|unicode in session['lalalaoptions']['list1']}} id="{{fileID}}_selector" value="{{fileID}}">
</label>
{%- endfor %}
<!-- file (name) -->
<input type="file" id="lalalafile1" name="file1"/>
<div class="lalalafileclass" id="lalalafileid" name="">{{ session['lalalaoptions']['file1']}}</div>
- Special comment
  - do not add any strings or numbers in the caching function; put all of them in constant.py (as shown above)
  - for caching functions, you don't usually get all 4 types of options; just write what you need.
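The caching pattern above can be exercised without a running web server. The sketch below is an illustration under assumptions: `FakeForm` is a hypothetical stand-in for Flask's request.form, a plain dict stands in for session, and `cache_lalala_options` is a made-up function name. The file option is left out to keep it short.

```python
# Names of the requests to cache, mirroring the constants in the example above.
LALALAINPUT = ('input1', 'input2')
LALALALIST = ('list1',)
LALALABOX = ('box1', 'box2')
DEFAULT_LALALA_OPTION = {'input1': 'default1', 'input2': 'default2',
                         'list1': [], 'box1': True, 'box2': False}


class FakeForm(dict):
    """Hypothetical stand-in for request.form; each value is a list."""

    def __getitem__(self, name):
        # like a web form, indexing returns the first submitted value
        return dict.__getitem__(self, name)[0]

    def getlist(self, name):
        # all values submitted under one name (empty list if none)
        return list(dict.get(self, name, []))


def cache_lalala_options(form, session):
    """Copy the submitted options into session['lalalaoptions']."""
    options = session.setdefault('lalalaoptions', dict(DEFAULT_LALALA_OPTION))
    for name in LALALAINPUT:  # radio buttons and input boxes
        options[name] = form[name] if name in form else DEFAULT_LALALA_OPTION[name]
    for name in LALALALIST:  # several requests sharing one name
        options[name] = form.getlist(name)
    for name in LALALABOX:  # a check box is only sent when it is checked
        options[name] = name in form
    return options


session = {}
form = FakeForm({'input1': ['option1'], 'list1': ['3', '7'], 'box2': ['on']})
cached = cache_lalala_options(form, session)
print(cached)
```

With a real Flask session, changes made inside a nested dict are not detected automatically, so a function like this would probably also need to set `session.modified = True`.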
- Description
  - file_manager.py deals with local file access and editing
  - lexos_file.py is a class that represents a file inside the Lexos program. It has a class label, an active flag, and other properties
- Calling map
file_manager.py -> lexos_file.py
                -> session_manager.py (for session_folder only)
                -> helpers/*
lexos_file.py -> session_manager.py (for session_folder only)
              -> helpers/*
- Special comment
  - these two files are functioning in a (relatively) stable fashion and these two classes can handle anything we need on the file side.
  - do not edit these two files unless you have to.
  - do not access the methods and properties of LexosFile outside of file_manager.py
  - the processor should not be accessed in lexos_file.py (for now, cut and scrub)
- Special comment
  - all the filenames and directories should be constants
  - all the numbers should be constants
  - all the caching and default options in the session should be constants (see managers/session_manager.py for more info)
- Special comment
  - this includes some of the more intense Python and "math land"
  - comment the code as you write it
  - PLEASE do not write ugly code here; think before you begin, and re-read when you finish.