### Usage
At its core, Pwnagotchi is a very simple creature: its main algorithm can be summarized as:
```python
# main loop
while True:
    # ask bettercap for all visible access points and their clients
    aps = get_all_visible_access_points()
    # loop over each AP
    for ap in aps:
        # send an association frame in order to grab the PMKID
        send_assoc(ap)
        # loop over each client station of the AP
        for client in ap.clients:
            # deauthenticate the client to get its half or full handshake
            deauthenticate(client)

    wait_for_loot()
```
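The functions in this loop are pseudo-code standing in for bettercap-backed operations. One relevant detail is that interactions with the same target are capped per session by the `max_interactions` parameter documented in the configuration; a minimal sketch of that bookkeeping (hypothetical names, not the actual implementation) could look like:

```python
from collections import defaultdict

# per-session counter of interactions per BSSID
# (hypothetical sketch, not the actual pwnagotchi code)
interactions = defaultdict(int)

def can_interact(bssid, max_interactions=3):
    # skip APs/stations we have already attacked max_interactions times
    return interactions[bssid] < max_interactions

def record_interaction(bssid):
    interactions[bssid] += 1
```

Once a BSSID has been deauthed/associated `max_interactions` times, the loop would simply skip it for the rest of the session.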
Despite its simplicity, this logic is controlled by several parameters that regulate wait times, timeouts, which channels to hop on, and so on.

From `config.yml`:

```yaml
personality:
  # advertise our presence
  advertise: true
  # perform a deauthentication attack against client stations in order to get full or half handshakes
  deauth: true
  # send association frames to APs in order to get the PMKID
  associate: true
  # list of channels to recon on, or empty for all channels
  channels: []
  # minimum WiFi signal strength in dBm
  min_rssi: -200
  # number of seconds for wifi.ap.ttl
  ap_ttl: 120
  # number of seconds for wifi.sta.ttl
  sta_ttl: 300
  # time in seconds to wait during channel recon
  recon_time: 30
  # number of inactive epochs after which recon_time gets multiplied by recon_inactive_multiplier
  max_inactive_scale: 2
  # if more than max_inactive_scale epochs are inactive, recon_time *= recon_inactive_multiplier
  recon_inactive_multiplier: 2
  # time in seconds to wait during channel hopping if activity has been performed
  hop_recon_time: 10
  # time in seconds to wait during channel hopping if no activity has been performed
  min_recon_time: 5
  # maximum number of deauths/associations per BSSID per session
  max_interactions: 3
  # maximum number of misses before considering the data stale and triggering a new recon
  max_misses_for_recon: 5
  # number of active epochs that triggers the excited state
  excited_num_epochs: 10
  # number of inactive epochs that triggers the bored state
  bored_num_epochs: 15
  # number of inactive epochs that triggers the sad state
  sad_num_epochs: 25
```
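As an illustration of how two of these knobs interact: per the comments above, when the unit has been inactive for more than `max_inactive_scale` epochs, `recon_time` gets multiplied by `recon_inactive_multiplier`. A minimal sketch of that rule (illustrative only, not the project's actual code):

```python
def effective_recon_time(recon_time, inactive_epochs,
                         max_inactive_scale=2, recon_inactive_multiplier=2):
    # slow down channel recon when nothing has happened for a while
    if inactive_epochs > max_inactive_scale:
        return recon_time * recon_inactive_multiplier
    return recon_time
```

With the defaults above, three or more inactive epochs double the recon wait from 30 to 60 seconds.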
There is no optimal set of parameters for every situation: when the unit is moving (during a walk, for instance), smaller timeouts and RSSI thresholds might be preferable in order to quickly discard routers that are no longer in range, while when stationary in a high-density area (like an office), other parameters might work better. The role of the AI is to observe what's going on at the WiFi level and to adjust those parameters in order to maximize the cumulative reward of each iteration of the loop (an epoch).

#### Reward Function
After each iteration of the main loop (an `epoch`), a reward, i.e. a score representing how well the parameters performed, is computed as follows (an excerpt from `pwnagotchi/ai/reward.py`):

```python
# state contains the information of the last epoch
# epoch_n is the number of the last epoch
tot_epochs = epoch_n + 1e-20  # 1e-20 is added to avoid a division by zero
tot_interactions = max(state['num_deauths'] + state['num_associations'], state['num_handshakes']) + 1e-20
tot_channels = wifi.NumChannels

# ideally, each interaction would yield a handshake
h = state['num_handshakes'] / tot_interactions
# small positive reward, the more active epochs we have
a = .2 * (state['active_for_epochs'] / tot_epochs)
# make sure we keep hopping on the widest channel spectrum
c = .1 * (state['num_hops'] / tot_channels)
# small negative reward if we don't see APs for a while
b = -.3 * (state['blind_for_epochs'] / tot_epochs)
# small negative reward if we interact with things that are no longer in range
m = -.3 * (state['missed_interactions'] / tot_interactions)
# small negative reward for inactive epochs
i = -.2 * (state['inactive_for_epochs'] / tot_epochs)

reward = h + a + c + b + i + m
```
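To make the scale of these terms concrete, here is the same formula evaluated on a made-up epoch state (the numbers, and the channel count standing in for `wifi.NumChannels`, are arbitrary assumptions, purely for illustration):

```python
state = {
    'num_deauths': 3, 'num_associations': 2, 'num_handshakes': 4,
    'active_for_epochs': 8, 'inactive_for_epochs': 2, 'blind_for_epochs': 1,
    'num_hops': 5, 'missed_interactions': 1,
}
epoch_n = 10
tot_channels = 10  # assumed channel count, for illustration

tot_epochs = epoch_n + 1e-20
tot_interactions = max(state['num_deauths'] + state['num_associations'],
                       state['num_handshakes']) + 1e-20

h = state['num_handshakes'] / tot_interactions               # 4/5  =  0.80
a = .2 * (state['active_for_epochs'] / tot_epochs)           #         0.16
c = .1 * (state['num_hops'] / tot_channels)                  #         0.05
b = -.3 * (state['blind_for_epochs'] / tot_epochs)           #        -0.03
m = -.3 * (state['missed_interactions'] / tot_interactions)  #        -0.06
i = -.2 * (state['inactive_for_epochs'] / tot_epochs)        #        -0.04

reward = h + a + c + b + i + m  # ≈ 0.88
```

Note how the handshake ratio `h` dominates: the other terms only nudge the score toward staying active, hopping widely, and not wasting interactions.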
By maximizing this reward value, the AI learns over time which set of parameters performs best under the current environmental conditions.
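As a toy illustration of the idea of "adjust a parameter to maximize a reward signal" (this is deliberately simplified and is not how pwnagotchi's AI works internally), a hill-climbing loop over a single parameter might look like:

```python
import random

def tune(reward_fn, value, step=1, iterations=200, seed=0):
    # try small random perturbations of the parameter and keep
    # only the ones that improve the observed reward
    rng = random.Random(seed)
    best, best_reward = value, reward_fn(value)
    for _ in range(iterations):
        candidate = best + rng.choice((-step, step))
        r = reward_fn(candidate)
        if r > best_reward:
            best, best_reward = candidate, r
    return best
```

For example, if the (unknown) ideal recon time for the current environment were 30 seconds, `tune(lambda v: -(v - 30) ** 2, value=10)` would climb toward it; the real AI does something far more sophisticated over many parameters at once.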
### User Interface
The UI is available either on the display, if one is installed, or at http://pwnagotchi.local:8080/ if you connect to the unit via `usb0` and set a static address on the network interface (replace `pwnagotchi` with the hostname of your unit).
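For the `usb0` route, the static address is assigned on the host side. The exact addresses depend on your setup; the ones below are an assumption (a common convention is host `10.0.0.1`, unit `10.0.0.2`), so adjust them to your own configuration:

```shell
# on the host (Linux), after plugging the unit in over USB:
# assign a static address to the USB ethernet gadget interface
sudo ip addr add 10.0.0.1/24 dev usb0
sudo ip link set usb0 up

# the unit should now be reachable (address is an assumption, see above)
ping -c 1 10.0.0.2
```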