Overview
After each experiment on the arena, the collected data is processed according to the settings you created (see processing setup). What follows is a step-by-step description of what the processing does: how it splits up the raw data, aligns it and checks its quality, breaks it into datasets, and what assumptions are made along the way. If you see anomalies in your data and need to confirm the data processing is doing what you expect, this is the place to look.
If you’d like to follow along in the code, open the file G4_Display_Tools\G4_Data_Analysis\new_data_processing\process_data.m.
Data Processing Step by Step
Establishing variables
The first 150 lines of code set up variables that will be used throughout the processing. Variables are pulled from the processing settings file and saved for use. Starting around line 47, the code checks for the presence of certain variables in the processing settings, and if they’re not present it sets a default value. This way an older settings file, created before those variables were added, can still work with the processing. Comments in the code describe what each variable does.

Around line 125 we load the TDMS Log file with all the raw data, and after that we get the index of each channel so we know which array index in the log corresponds to which data type. Finally, we load the metadata file so we can access information about whether, and how many, trials were re-run at the end of the experiment. Around line 160 the actual processing of the data begins; it is split into a sequence of functions. Next we will go through each function one by one to see what it does. All functions are separate .m files that can be found in G4_Display_Tools\G4_Data_Analysis\support\data_processing_modules.
Get experiment order - Line 163
get_exp_order.m is a function that loads the file exp_order.mat, determines the number of conditions and number of repetitions in the experiment, and gets the total number of trials expected in the experiment. It returns four variables. exp_order is an array of size number of conditions x number of repetitions. It contains the condition numbers, so we know in what order the conditions were actually displayed. num_conds is the number of conditions in the experiment. num_reps is the number of repetitions in the experiment. total_exp_trials is the total number of trials (including pre-, inter-, and post-trials) in the experiment.
Get Position Functions - Line 169
get_position_functions.m is a function that loads the experiment protocol and saves the data of each position function so it can be easily accessed and used later when checking the alignment and quality of the frame position data. For each condition in the experiment, it gets the mode of that condition and, if applicable, loads the associated position function. If the condition does not have a position function, it leaves a NaN in place of the position function data. This function returns two variables: position_functions is a cell array with an element for each condition in the experiment. Each element contains the position function data for that condition, or a NaN if that condition has no position function. exp is a struct containing the loaded protocol file, saved in case we need any other information from the experiment protocol later on.
Get Start and Stop Times - Line 176
get_start_stop_times.m is a function that finds the indices in the raw data of the start and stop commands, whether those be the Start-Display command or the combined command. The Log structure containing the raw data has a field called Log.Commands.Name. In line 4 of this function we find the index of each command name that matches the command_string variable (which may be ‘Start-Display’ or the combined command) and save them in the variable start_idx. We then plug these indices into the Log.Commands.Time array to get the timestamps at which each start command was received. This gives us the start_times array.
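As a rough sketch of that lookup (assuming Log.Commands.Name is a cell array of command name strings):

    % Indices of every command matching the start command string,
    % and the timestamp at which each one was received.
    start_idx   = find(strcmp(Log.Commands.Name, command_string));
    start_times = Log.Commands.Time(start_idx);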
Since the Stop-Display command is not commonly used to end a trial, the stop_idx and stop_times arrays are found in exactly the same way, except that the first index and timestamp found are manually removed. So the stopping point of one trial is the point at which the start command is received for the next trial. This inevitably leads to some extra noise at the end of each trial, which we deal with later. It also means there is no stop point found for the last trial of the experiment. Generally the last trial is ended with the ‘Stop-Display’ command, so at line 11 we search for that command, find the index, and add the associated timestamp to the end of the stop_times array. If the ‘Stop-Display’ command cannot be found, we use the ‘Stop-Log’ command instead. If neither can be found, we simply assume the last data point in the data is the stop time of the last trial. A message is displayed to the user pointing out that the stop time could not be found, so the last trial’s data may be longer than expected.
At the end of this function we check one more thing: the manual_first_start variable. If this is set to 1, then the first trial was not started by the start command, but manually by the user. In that case, the method above would not have found the start point of the first trial. Therefore we assume the start time of the first trial is the first data point of the data, and add that timestamp to the beginning of the start_times array.
This function returns four variables, all of which have been discussed: start_idx, stop_idx, start_times, and stop_times.
Separating Original Conditions from Re-runs - Line 188
separate_originals_from_reruns.m is a function found on line 188. It determines which data is from the initial protocol and which comes from trials re-run at the end of the experiment because the first attempt was marked as bad. This is only relevant if the experiment was run using the streaming protocol and the Conductor was set to re-run bad conditions.
First it calculates the total number of original trials and the total number of rerun trials. It then compares the sum of these to the length of the start_times array created in the last function. If the number of calculated total trials is greater than the length of start_times, it’s assumed that the experiment was ended early and processing continues, though a warning is provided to the user. It then calculates the number of extra trials that were not run due to the experiment ending early. If the total number of calculated trials is less than the length of start_times, the processing also attempts to continue, but a warning is provided to the user and an error will likely occur later on, since there’s no acceptable reason for there to be more start commands than the total number of trials and reruns put together. By line 25, two more variables have been set that will be used later on: ended_early, which is 1 if the experiment was ended early and 0 if it was not, and num_extra_trials, which is the number of trials that were missed if the experiment was ended early.
Next, starting at line 28, if more trials were run than the number expected from the protocol (meaning there were reruns at the end), we split the start_times array into origin_start_times, which holds the start times of all the original trials, and rerun_start_times, which holds the start times of the rerun trials at the end. Please note that, in the case that no trials were rerun at the end, origin_start_times is simply the same as start_times and rerun_start_times is an empty array.
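Conceptually the split is just a cut at the expected trial count; a minimal sketch, assuming total_exp_trials from get_exp_order.m counts all original trials (including pre-, inter-, and post-trials):

    % Trials beyond the expected count can only be reruns appended at the end.
    origin_start_times = start_times(1:total_exp_trials);
    rerun_start_times  = start_times(total_exp_trials+1:end);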
At the end, starting at line 78, these arrays are all saved into a struct called times for ease of passing them in and out of functions. The function returns three variables, all of which have been discussed: times, ended_early, and num_extra_trials.
Get the modes and pattern IDs of conditions in order - Line 192
The function get_modeID_order.m works similarly to get_exp_order.m, except that here we are getting the pattern and mode IDs. It again gets the index of each time the ‘Set Pattern ID’ command was received from the Log.Commands.Name array, and gets the value that was received with each command by plugging those indices into the Log.Commands.Data array. It does the same with the ‘Set Control Mode’ command. You end up with two arrays that are returned from the function:
modeID_order is an array with the mode of each trial in the order it was received. PatternID_order is an array of the pattern IDs of each trial in the order it was received.
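A sketch of that lookup (assuming Log.Commands.Data is indexable the same way as Log.Commands.Time; if it is a cell array, the results would need a cell2mat):

    % Modes, in the order the 'Set Control Mode' commands arrived.
    mode_idx     = find(strcmp(Log.Commands.Name, 'Set Control Mode'));
    modeID_order = Log.Commands.Data(mode_idx);

    % Pattern IDs, same pattern with the 'Set Pattern ID' command.
    pat_idx         = find(strcmp(Log.Commands.Name, 'Set Pattern ID'));
    PatternID_order = Log.Commands.Data(pat_idx);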
Further differentiate the start and stop times of trials - Line 205
get_trial_startStop.m is a function that determines the start and stop times for different trial types (pre, inter, post, conditions, etc.). In addition, if any conditions were bad and then rerun at the end of the experiment, it replaces the start times of the bad conditions with those of the rerun conditions. This way, when we use the start times later to pull the data for those trials, we will pull the good run and not the bad one.
First this function pulls the start and stop times from the times struct created earlier. It gets the number of trials run (conditions only) and adjusts the trial_options variable if needed. This variable indicates the presence of a pre-trial, inter-trial, and/or post-trial. For example, if the experiment was ended early and was intended to have a post-trial, that needs to be changed, since ending the experiment early means the post-trial was not run. It gets the indices of the first and last conditions run in the origin_start_times array. At line 39 it creates a new variable, trial_start_times, which includes only the start times of conditions, not of inter-trials or other trial types. It does the same for stop times and modes. It also finds the start times in between each condition (in other words, the intertrial start times) and uses this data to get the modes and durations of each intertrial. It does the same thing for reruns, so we end up with arrays for rerun trial start times and rerun intertrial start times.
Starting at line 96, the code goes through each rerun trial, gets the condition and repetition number that was being rerun, and then finds that in the exp_order array to determine which original trial was being repeated. It then goes into trial_start_times (and the stop times) and replaces the start time of that bad trial with the start time of the rerun trial.
All arrays created (start/stop times for reruns, intertrials, and conditions) are then saved to the times struct, which is returned from the function as well as several of the other variables discussed.
Line 209 in processing
Here we have a bit of code not contained in a function. This simply calculates the number of conditions the data is short if the experiment was ended early. Later this will let us use the exp_order variable to determine exactly which conditions were skipped by ending early.
Organize durations and modes by condition - Line 213
The function organize_durations_modes.m is found on line 213. This function calculates the measured duration of each condition and the time gaps between conditions (expected to be the length of the intertrial), and organizes start times by condition and repetition.
At line 27 it adjusts the expected number of trials based on whether the experiment was ended early. Then for each trial, it finds the condition and repetition number, calculates the duration and saves it to cond_dur, gets the mode and saves it to cond_modes, and calculates the gap between it and the next condition, saving it to the variable cond_gaps. In addition, it creates a variable called cond_start_times which reorganizes the start times from a single long array to an array shaped by condition and repetition number. All of these arrays are of size number of conditions x number of reps, so data for any given condition/rep pair is easy to find later. The four arrays just discussed are returned from the function.
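A sketch of the bookkeeping for one trial, assuming condition trials are numbered straight through so each block of num_conds trials is one repetition (the index arithmetic here is illustrative, not the literal code):

    rep  = ceil(t / num_conds);                        % which repetition block trial t is in
    cond = exp_order(t - (rep - 1) * num_conds, rep);  % condition shown on that trial

    % Reorganize from one long list into condition-x-repetition arrays.
    cond_start_times(cond, rep) = trial_start_times(t);
    cond_dur(cond, rep)         = trial_stop_times(t) - trial_start_times(t);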
Create empty timeseries arrays - Line 218
The function create_ts_arrays.m calculates the size and shape of the timeseries data and returns arrays full of NaNs which will later have condition and intertrial timeseries data assigned to them. It first finds the condition with the longest duration (even though most conditions in theory have the same duration, the measured duration will vary slightly). It then sets the data_period, which is calculated from the variable data_rate provided by the user. The data period is the time between one data point and the next, generally 1 ms for behavioral data and 2 ms for frame position data. It then establishes the variable ts_time. This variable later becomes the timestamps variable and is the x-axis against which timeseries data will be plotted. If the user has provided a value for the variable pre_dur in the settings, then the ts_time variable is defined as an array of numbers from -pre_dur-data_period to the longest duration plus post_dur plus the data_period, with steps between defined by the data period. This is done on line 9. ts_data is then defined on line 10 as an array of size [number of timeseries data types x number of conditions x number of repetitions x length of ts_time]. If intertrials are included in the experiment, the same thing is done for intertrial data.
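A sketch of those two lines, assuming data_period is the reciprocal of the user’s data_rate and with longest_dur and num_ts_datatypes as illustrative names:

    data_period = 1 / data_rate;    % e.g. 0.001 s at a 1000 Hz behavioral data rate
    ts_time = -pre_dur - data_period : data_period : longest_dur + post_dur + data_period;

    % All NaNs for now; trial data gets assigned into this later.
    ts_data = nan(num_ts_datatypes, num_conds, num_reps, length(ts_time));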
The four variables returned are ts_time, ts_data, inter_ts_time, and inter_ts_data. Note that these variables still contain only NaNs at this point.
Get unaligned timeseries data organized by datatype, condition, and repetition - Line 224
The function get_unaligned_data.m takes in many of the arrays we’ve generated up to this point and uses them to split the behavioral and frame position data from the Log into an array organized by datatype, condition, and repetition. It is unaligned because the start of each trial is defined by when the ‘Start-Display’ command was received. It has not yet been cross correlated or aligned to when the pattern started moving. Data is saved to the array unaligned_ts_data, which is four dimensional: channel number x condition x repetition x duration of longest condition.
For each trial, we use the exp_order variable to get the condition number and calculate the repetition number. The variable num_ADC_chans tells us how many channels there are, so for each channel we find the start and stop indices of that trial by finding the first element in Log.ADC.Time which is greater than or equal to the timestamp given in trial_start_times and less than or equal to the timestamp given in trial_stop_times. We then, at line 22, use those indices to get the data and time data for that particular trial.
For formatting purposes, we need each element in the timeseries data array to be the same length, but each condition will vary slightly in the length of the data. So lines 24-31 check for the difference between the length of the data we just pulled and the length of the array we defined earlier. If the data we just pulled is shorter, we add NaNs to the end to fill in the space. If it’s longer, we remove data at the end to shorten it. The latter case should not really ever happen, since the array was defined based on the longest condition.
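In sketch form, with target_len standing in for the length of the preallocated fourth dimension:

    target_len = size(unaligned_ts_data, 4);
    len_diff   = target_len - length(trial_data);
    if len_diff > 0
        trial_data(end+1 : end+len_diff) = NaN;   % pad a short trial with NaNs
    elseif len_diff < 0
        trial_data = trial_data(1:target_len);    % trim the (rare) over-long trial
    end
    unaligned_ts_data(chan, cond, rep, :) = trial_data;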
Starting at line 39 we do the same process for the frame position data: find the start and stop indices, and use them to get the data from Log.Frames.Position. At line 51 we start one extra step, which is expanding the frame position data. Since the frame position data is generally collected at a rate of one point every 2 ms, while the behavioral data is collected every 1 ms, the frame position data will only be half the length of the behavioral data. So we create an array called full_fr_data which is twice the length of the frame data we just pulled from the Log. Then we go through it and fill in the gap after each data point with the data point preceding it. For example, frame data of [1 1 2 2 3 3] becomes [1 NaN 1 NaN 2 NaN 2 NaN 3 NaN 3 NaN], and then each NaN is replaced with the number preceding it, giving [1 1 1 1 2 2 2 2 3 3 3 3]. Both represent the frame position over the same 12 milliseconds. This expanded data is then saved to the unaligned_ts_data array.
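The gap-filling described above looks roughly like this (fr_data standing in for the frame position data pulled from the Log):

    % Place each sample at every other index, leaving NaNs in the gaps,
    % then fill each gap with the point that precedes it.
    full_fr_data = nan(1, 2 * length(fr_data));
    full_fr_data(1:2:end) = fr_data;
    for i = 2:2:length(full_fr_data)
        full_fr_data(i) = full_fr_data(i-1);
    end

With a fixed 2:1 rate ratio, this loop is equivalent to repelem(fr_data, 2).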
Next, starting at line 70, we get the unaligned intertrial data, assuming the experiment included intertrials. It is the exact same process as above, using the intertrial start and stop timestamps to pull the correct data from the Log. The array unaligned_inter_data is a slightly different size because intertrials do not have repetition or condition numbers; they are simply numbered 1 through the number of intertrials run in the experiment. So this array has 3 dimensions instead of 4: channel x intertrial number x length of intertrial data. Frame position data for intertrials is expanded using the same method as before.
This function returns two arrays, unaligned_ts_data and unaligned_inter_data.
Check for conditions with the wrong duration - Line 231
check_condition_durations.m is a function that searches for any conditions whose duration was significantly longer or shorter than expected by comparing the cond_dur and intertrial_durs arrays created earlier to the expected durations stored in the protocol.
We get the expected duration of each condition directly from the loaded experiment protocol, and then go through each element in cond_dur and compare the measured duration with the expected duration. If the percent difference between them is greater than the limit set by the user in the variable duration_diff_limit, we add that condition and rep pair to the bad_conds variable. The same is done for intertrials.
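A sketch of that check for one condition/rep pair, assuming duration_diff_limit is expressed as a percentage and with expected_durs as an illustrative name for the protocol’s expected durations:

    expected = expected_durs(cond);
    measured = cond_dur(cond, rep);
    if abs(measured - expected) / expected * 100 > duration_diff_limit
        bad_duration_conds(end+1, :) = [rep cond];   % stored as [repetition condition]
    end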
Two variables are returned from this function, bad_duration_conds and bad_duration_intertrials, which may or may not be empty. bad_duration_conds contains two-element arrays that look like [repetition condition], whereas bad_duration_intertrials is just a one dimensional array of intertrial numbers.
Check for flat conditions if relevant - Line 233
If the experiment does not contain any intentionally static conditions, then the function check_flat_conditions.m looks for any conditions where the frame position data is flat, meaning the screen did not move at all.
It cycles through the unaligned_ts_data array and looks at the frame position data for each condition. It goes through each data point in the frame position data, and if there is never a difference between one point and the next, then the data is completely flat and that repetition/condition pair is added to the bad_slope_conds variable, which is returned.
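The flatness test amounts to checking that the frame position never changes; a sketch, with fr_chan_idx standing in for whichever channel index holds frame position:

    fr = squeeze(unaligned_ts_data(fr_chan_idx, cond, rep, :));
    fr = fr(~isnan(fr));                          % ignore the NaN padding
    if ~isempty(fr) && all(diff(fr) == 0)
        bad_slope_conds(end+1, :) = [rep cond];   % screen never moved
    end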
Find conditions where the fly wasn’t flying if relevant - Line 238
Assuming this is a flying experiment and the variable remove_nonflying_trials has been set to 1, the function find_bad_wbf_trials.m runs and searches the unaligned data for trials where the wing beat frequency falls out of range too often.
The variable F_chan_idx tells us which channel contains the wing beat frequency data, so we use it to pull the right data from unaligned_ts_data. We go through the wing beat frequency data for each condition and repetition, comparing each data point to the wbf range provided by the user. Every data point that falls outside of that range is added to the bad_indices variable. At line 34 we compare the percentage of bad data points to the cutoff determined by the user. If the percentage of data points outside the range is too high, we then check, at line 39, whether these bad data points are clustered at the end of the trial. If a larger percentage of the bad data points are clustered at the end than the wbf_end_percent limit, we keep the trial; if the portion of them clustered at the end falls below that limit, that repetition and condition pair are added to the variable bad_trials.
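A heavily simplified sketch of that logic; wbf_range, wbf_cutoff, and the definition of the end-of-trial window (end_window_len) are assumptions here, so consult the function for the exact thresholds:

    wbf = squeeze(unaligned_ts_data(F_chan_idx, cond, rep, :));
    wbf = wbf(~isnan(wbf));                          % drop the NaN padding
    bad_indices = find(wbf < wbf_range(1) | wbf > wbf_range(2));

    if numel(bad_indices) / numel(wbf) * 100 > wbf_cutoff
        % Too many out-of-range points overall; the trial survives only
        % if enough of them sit in the assumed end-of-trial window.
        at_end = bad_indices > numel(wbf) - end_window_len;
        if sum(at_end) / numel(bad_indices) * 100 < wbf_end_percent
            bad_trials(end+1, :) = [rep cond];
        end
    end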
This function returns the variables bad_WBF_conds and wbf_data, the latter of which contains all the wing beat frequency data for easy use later.
Consolidate bad conditions - Line 248
At this point in the processing, we’ve done all the quality checks that can be done before alignment. There will be more quality checks after alignment, but because cross correlation and alignment take the largest chunk of time when processing data, we want to remove as much bad data as possible before doing the cross correlation. This way we don’t waste time aligning data we already know is bad. Therefore, we next consolidate the bad conditions we’ve collected so far, and remove them, before then moving on to the cross correlation. This does mean that after cross correlation and alignment steps, we may find more bad conditions and will have to repeat these steps.
The function consolidate_bad_conds.m takes in the various arrays of bad conditions produced by the last few sections, combines them into one array of bad conditions, and removes any duplicates. The function returns three variables: bad_conds, bad_reps, and bad_intertrials. bad_conds and bad_reps are each a one dimensional array of condition and repetition numbers of bad trials. They line up such that bad_conds(1) and bad_reps(1) are, together, the condition/repetition pair of the first bad trial. bad_intertrials is a one dimensional list of bad intertrials.
Remove bad trial data - Line 255
The function remove_bad_conditions.m takes in the dataset you want data removed from, as well as the list of bad conditions and repetitions. It sets the data for those conditions and repetitions to NaNs. It only removes condition data, not data for intertrials.
Cross correlation of position data - Line 265
The function position_cross_corr.m cross correlates the collected frame position data with the expected position function data and gets a lag number indicating how the data should be shifted to best line up with the expected position function.
It goes through each condition in unaligned_ts_data and checks the mode first. If the condition is in a mode that does not use a position function, then no cross correlation can be done and that condition is skipped. The MATLAB function xcorr is used to get the lag. A few different numbers are saved. shift_numbers is an array of size number of conditions x number of repetitions that gives the lag for each condition and repetition. We also create avg_shift_numbers, which contains the average lag, and percent_off_zero, which gives the percentage by which the data needs to be shifted.
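The core of the lag calculation looks something like this (the exact normalization behind percent_off_zero is an assumption):

    % Lag at which the measured position best matches the expected one.
    [c, lags] = xcorr(measured_pos, expected_pos);
    [~, best] = max(c);
    shift_numbers(cond, rep) = lags(best);

    % Express the shift as a percentage of the trial length.
    percent_off_zero(cond, rep) = abs(lags(best)) / length(measured_pos) * 100;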
We compare the percentage off zero to the correlation tolerance provided by the user, and if it’s too high then that condition and repetition pair is saved in the array conds_outside_corr_tolerance. If the data to be cross correlated is all NaNs (meaning it has been removed because it was bad), then these arrays get NaN values for that condition and rep pair. These arrays are saved in a struct called alignment_data which is returned by the function.
Compile bad conditions from the cross correlation - Line 271
Though we have collected the condition/repetition pairs that fell outside of the correlation tolerance, they’re formatted to be easily viewed by the user in the processed data, not to be easily removed by the function that removes bad data. So in compile_bad_xcorr_conds.m the bad conditions are reformatted into bad_corr_conds and bad_corr_reps.
Remove bad conditions - Line 276
The same function from line 255 is used here to remove any conditions that fell outside of the cross correlation tolerance.
Shift data by its cross correlation lag - Line 281
The function shift_xcorrelated_data.m actually shifts the data by the lag values found by the cross correlation. It saves the shifted data in an array called shifted_ts_data, which is the same size as unaligned_ts_data. We use MATLAB’s circshift function to do this shift. The code goes through each channel of each condition/repetition pair and gets the unshifted data from the unaligned_ts_data array.
Assuming the data is not all NaNs (which would mean it has already been removed), we then check the lag value for that condition/repetition. If it’s greater than zero, the data needs to be shifted to the right. We use circshift to get shifted_data. Circshift shifts in a circular pattern, meaning if you shift data to the right, the data at the end of the array is moved to fill the space left at the beginning of the array. We don’t want this data there, so after shifting, the data points from index 1 to the lag value are set to NaNs. If the lag is less than zero, the data needs to shift to the left. We use circshift again, and in this case the data from the beginning of the array is moved to the end. We don’t want that data at the end, we want it gone, so we set the data points from the end minus the lag to the end of the array to NaNs.
Filling in the gaps with NaNs after shifting the data ensures that the array itself will remain the same size, since the lag value may be different for each trial.
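The shift-and-blank logic, in sketch form:

    lag = shift_numbers(cond, rep);
    if lag > 0
        shifted_data = circshift(trial_data, lag);
        shifted_data(1:lag) = NaN;              % points wrapped in from the tail
    elseif lag < 0
        shifted_data = circshift(trial_data, lag);
        shifted_data(end+lag+1:end) = NaN;      % points wrapped in from the head
    else
        shifted_data = trial_data;
    end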
The function returns shifted_ts_data, an array of all the timeseries data after it has been shifted according to the cross correlation lag.
Get the pattern movement times - Line 291
get_pattern_move_times.m is a function that goes through the shifted frame position data to determine at what point the pattern on the screens actually started moving. We will later align our data to “start” at this point.
Please note that there are some old functions in the folder with names similar to this one. These were older methods for finding the pattern movement time but are not currently in use, so please make sure you’re looking at get_pattern_move_times.m. These extra functions may be removed in future releases.
For each condition/repetition pair, this function looks at the loaded position function data and finds the first movement of the pattern. For example, a position function may stay static for some amount of time and then start moving. We find the first movement and save it as an array, such as [1 2] if the first change in the frame position is from frame 1 to frame 2. It then looks at the frame position data in the shifted_ts_data array and searches from the beginning of the array for the first movement from frame 1 to 2 (or whatever movement it found in the position function). The index of first motion is the index of the changed value (the 2, rather than the 1). This index is saved in the array pattern_movement_times, which is of size number of conditions x number of repetitions.
In the case that a change in frame position matching the position function is never found, a warning alerts the user that the movement time could not be found for that condition/repetition pair, and it is added to an array called bad_conds_movement. This means the frame position data never follows the intended position function but was not caught in other quality checks.
In the case that there is no position function for that condition (due to it being a different mode, for example), then we search the frame position data for the first time it changes frames without comparing to any position function data. In this case, we skip the first 11 data points in the frame position data because there tends to be noise at the very beginning where the frame position can jump around. The first time after the first 11 datapoints that the frame position changes will be marked as the pattern’s movement time.
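The position-function-guided search described above might be sketched like this (fr_chan_idx again standing in for the frame position channel):

    % First prescribed transition in the position function, e.g. [1 2].
    i = find(diff(pos_func) ~= 0, 1);
    move_pair = pos_func(i : i+1);

    % First index in the measured data where that same transition occurs;
    % the movement time is the index of the changed value.
    fr  = squeeze(shifted_ts_data(fr_chan_idx, cond, rep, :));
    idx = find(fr(1:end-1) == move_pair(1) & fr(2:end) == move_pair(2), 1);
    pattern_movement_times(cond, rep) = idx + 1;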
This function returns several variables: pattern_movement_times, pos_func_movement_times, bad_conds_movement, and bad_reps_movement. pos_func_movement_times holds the indices in the position function where movement happened, as compared to pattern_movement_times, which contains the indices in the collected frame position data where movement happened. We save both in case we want to use them for any kind of quality analysis in the future.
Get intertrial movement times - Line 293
For each intertrial, this function finds the index of the first data point where the frame position changes. It returns one variable, intertrial_move_times, which is a one dimensional array giving one index value per intertrial.
Remove bad movement conditions - Line 297
The remove_bad_conditions.m function is used one more time to remove any bad conditions found when getting the pattern movement times.
Align data to pattern movement time - Line 302
shift_data_to_movement.m is the alignment function that actually shifts the data so each condition’s data starts at the point the pattern started moving.
It uses the same general method as the function that shifted data according to its cross correlation lags. The shift value now is the movement time (which is not a timestamp, but the index at which movement happens). Like before, it uses circshift and then removes the data that was shifted from the front of the array to the back. In this case, we are never really going to be shifting right, but only to the left.
The second half of the function shifts the intertrial data based on its movement time. It first checks if the percentage to be shifted is greater than the limit set by the user. If so, the intertrial is added to a list of bad intertrials and the data is set to NaNs. Otherwise, the data is shifted using the same method as before.
This function returns three variables. ts_data is the final, aligned timeseries array. inter_ts_data is the aligned intertrial timeseries data, and bad_movement_intertrials is the list of bad intertrials, if any.
Re-formatting all bad conditions for the text file report - Line 307
Lines 307-330 are spent reformatting the bad trials. This is done only so that older code generating the text report can be re-used; it was simpler than rewriting the report code, and there is no fundamental reason the data needs to be formatted this way.
Bad conditions are assigned to an array named for the reason they were tossed out. So we end up with duration_conds, slope_conds, xcorr_conds, posfunc_conds, wbf_conds, duration_intertrials, and movement_intertrials. Each of these is a list of repetition/condition pairs, or of intertrial numbers. These are all then saved to a struct called bad_trials_summary for ease of passing the data in and out of functions.
Preparing the report of bad conditions - Line 332
After the reformatting is done, create_bad_conditions_report.m is called. This function creates a cell array called Summary where each element is a line of text that will later be printed in a txt file. For each bad condition, an element is added to Summary with the condition and repetition number and a code telling the user why that trial was tossed. It then does the same thing for the intertrials.
This returns the Summary variable, which is used at the end of processing when everything is saved to produce a text file reporting on all conditions and intertrials that were removed from the data.
Add buffer data back to beginning of timeseries - Line 339
If the variable pre_dur is set to something other than 0, then a certain amount of data needs to be tacked back onto the front of the timeseries data. The data will be plotted so that x = 0 is the point at which the pattern started moving, and this data added back onto the front will align with x = -pre_dur:0.
add_pre_dur.m is the function that does this. The variable ts_time, which holds the timestamps against which timeseries data are plotted, has already been created with the pre_dur value in mind, meaning it starts at -pre_dur and runs through the length of the longest condition in steps of 1 ms. We find the index at which ts_time = 0. This index minus 1 gives us the number of data points (or number of milliseconds) that need to be added to the front of the timeseries data.
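In sketch form (the real code may compare against 0 exactly; a half-period tolerance avoids floating point surprises):

    zero_idx = find(abs(ts_time) < data_period/2, 1);   % index where ts_time is 0
    num_pre_points = zero_idx - 1;                      % data points to restore up front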
Then for each channel, condition, and repetition, we get the previously found time at which movement occurred. We then access the raw unaligned_data variable, find the index at which movement occurred, and copy the data preceding that index going back by whatever amount of time pre_dur indicates. Assuming there is enough data, it is pulled and saved in a variable called data_to_add. If not, say movement happened sooner than the pre_dur amount of time, we take what data is there and tack NaNs onto the front of it so it is still the required length. We then go through the ts_data array and add this data to the front. Again, to keep each element of the array the same length, we use circshift to shift the data to the right and then replace the first pre_dur number of milliseconds with the data_to_add.
This function returns ts_data after adding the data indicated by the pre_dur variable.
Get normalization parameters - Line 354
The function get_max_process_normalization.m is just a few lines of code that get the max values (based on the percentile provided by the user) from the timeseries data, which will then be used to normalize the data.
Normalize timeseries data - Line 357
The function normalize_ts_data.m takes the max values just calculated and normalizes the timeseries data. First it gets the maximum value from the list of max values for the left wing channel and the right wing channel. It then establishes the datatypes to normalize, which are simply the left and right wing channels. Then for each of these, each data point in the timeseries data is divided by the max value.
This function returns the normalized timeseries data array and the max value used.
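A sketch of the normalization, with L_chan_idx, R_chan_idx, and max_vals as illustrative names for the wing channel indices and the per-channel max list:

    % One shared max keeps both wing channels on the same scale.
    wing_max = max([max_vals(L_chan_idx), max_vals(R_chan_idx)]);

    ts_data_norm = ts_data;
    ts_data_norm(L_chan_idx, :, :, :) = ts_data(L_chan_idx, :, :, :) ./ wing_max;
    ts_data_norm(R_chan_idx, :, :, :) = ts_data(R_chan_idx, :, :, :) ./ wing_max;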
Calculating data sets - Lines 365-398
The next set of code is not divided into functions but executed directly in the process_data.m function. First, at lines 365 and 366, we calculate the Left minus Right (LmR) and Left plus Right (LpR) datasets by subtracting or adding the left and right channel data.
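Both reduce to simple elementwise arithmetic on the wing channels (the channel index names here are illustrative):

    LmR_data = ts_data(L_chan_idx, :, :, :) - ts_data(R_chan_idx, :, :, :);
    LpR_data = ts_data(L_chan_idx, :, :, :) + ts_data(R_chan_idx, :, :, :);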
Next, at lines 370 and 371 we do the same, but with the normalized data.
The code commented out from lines 374-389 is old and will likely be removed in future releases.
Starting at line 392 we create some more datasets which don’t contain any new information but are likely to be useful to the user. These are:
ts_avg_reps, which is the timeseries data averaged over the repetitions. LmR_avg_over_reps, which is specifically the LmR data averaged over repetitions. LpR_avg_over_reps, which is specifically the LpR data averaged over repetitions. ts_avg_reps_norm, which is the same as ts_avg_reps but using normalized data. LmR_avg_reps_norm, which is LmR_avg_over_reps using normalized data. LpR_avg_reps_norm, which is LpR_avg_over_reps using normalized data.
Calculating flipped and averaged LmR data - Line 405
On line 405 you’ll find the function get_falmr.m. Assuming the faLmR setting is turned on, this runs and returns both normalized and unnormalized faLmR data.
If the condition_pairs setting, which tells the software which trials to flip and average together, is empty, then by default each condition is paired with the one that follows it (conditions 1 and 2, 3 and 4, 5 and 6, etc.). Lines 4-14 create the default pairs assuming none were provided. Lines 16 and 17 establish variables to hold the faLmR data. Lines 19-30 actually generate the faLmR data, as sketched below.
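The default pairing might be built like this (assuming an even number of conditions); the exact sign convention of the flip-and-average lives in lines 19-30 of the function:

    if isempty(condition_pairs)
        % Pair each odd-numbered condition with the one that follows it.
        condition_pairs = [(1:2:num_conds)', (2:2:num_conds)'];
    end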
This function returns two variables, faLmR_data and faLmR_data_norm.
Lines 406-407 in the main processing function then use these to create faLmR_avg_over_reps and faLmR_avg_reps_norm, which simply take the mean of each output over all the repetitions.
Calculate data for tuning curves - Line 416
Lines 416-417 create the tuning curve datasets, which are simply the ts_data averaged over the 4th dimension (the dimension with the actual data).
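That averaging is essentially a one-liner; a sketch, assuming NaN-removed trials are skipped via 'omitnan':

    % One mean value per channel/condition/repetition.
    tuning_curves = mean(ts_data, 4, 'omitnan');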
Calculating histograms of pattern position data - Line 422
Assuming the user provided datatypes for which to create histograms, the function calculate_histograms.m runs and generates the histogram data.
Line 6 gets the max value of the frame position data. We then manipulate the arrays a bit and get an array of 1:max position value. Line 15 then gives us the indices at which the data is equal to each value in that array. In line 16, we sum them up and save the result to hist_data, which ends up holding a total count for each value from 1 to the max position value.
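The counting step amounts to this (the NaN padding never equals an integer, so it drops out automatically):

    max_pos = max(fr_data(:));
    counts  = zeros(1, max_pos);
    for v = 1:max_pos
        counts(v) = sum(fr_data(:) == v);   % data points spent at frame position v
    end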
This function returns one variable, hist_data, which is a four dimensional array of size [num datatypes, num conditions, num repetitions, max frame position].
Calculate intertrial histograms - Line 433
Next we check whether intertrials were run. If so, we calculate histograms for them using the calculate_intertrial_histograms.m function. Before calling the function, we pull the data we need out of the inter_ts_data array and pass it in. Just like the last function, we get the maximum frame position value among the intertrials, create an array of 1:max value, and then get the indices where the data is equal to each point in that array. Summing those up tells us how many data points take each value. We save this to inter_hist_data and return it.
Calculate position series if relevant - Line 443
Assuming the setting for position series is set to 1, the function get_position_series.m is called at line 443.
If pos_conditions, the variable determining which conditions you want position series for, is empty, then the function uses all conditions. First it checks the entire frame position dataset for non-integer numbers, which would cause an error if present. Then in lines 28-31 we get the indices of the actual data we want to use, allowing for NaNs at the beginning and/or end, removing any data set by data_pad, etc.
At line 34 it finds all indices where there is a “big step” (a change greater than 1) in the position data. If only one big step is found, it determines whether it happens more toward the beginning or the end of the trial. If there is more than one big step, it searches for the big steps where the latter value is more than four times the previous value. It then finds all the step candidates (all indices where the frame position changes at all) but removes the data before and after a “big step”, leaving us with just one cycle of frame position data, assuming the frame position makes a large jump at some point. We get the median step size and then, for each step, check whether the change is within 50% of the median. If not, we skip it; otherwise we get the mean step value. In the end we get pos_series, where the first dimension is condition, the second is repetition, the third is the frame position steps, and the fourth is the LmR data. This allows us to plot LmR data against the change in frame position instead of the change in time (hence position series instead of time series).
The function returns pos_series, discussed above, and mean_pos_series, which is the position series averaged over repetitions.
Saving the processed data - Lines 455-484
Lines 455-468 simply save some of the variables produced throughout the processing as new variables which have names that are more easily understood by the user. These variables will all be saved in a .mat file and the goal is for a user to be able to understand what they are by their variable name alone, rather than having to reference the documentation every time.
After the re-naming, we save a long list of variables in the experiment folder, under the processed filename given by the user.
Note the else statement at 486, tied to the if statement started at line 347. Most of the dataset creation listed above is for flying experiments, and this if statement separates flying experiments from other types (walking). After this else comes the dataset generation done for non-flying experiments, and a new save command saving different variables. The dataset generation is not nearly as extensive for non-flying experiments: we still average the timeseries data over repetitions, get the tuning curve data, and create histogram data for the intertrials (all done the same way), but the rest is not included. As such, many fewer variables are saved for a non-flying experiment.
Bad conditions reporting - Line 525
The last thing done is creating a text file in which the bad conditions are summarized. This uses the variable created earlier in the processing, bad_conds_summary, which is a cell array of text lines that are printed into a text file saved under the file path and name provided by the user.