Prepared by Soňa Dulíková, Lukáš Lafférs and Miroslav Štefánik ()
Institute of Economic Research, Slovak Academy of Sciences
This work was supported by the Slovak Research and Development Agency under the contract no. APVV-17-0329.


The Commuting allowance

This is an automated report providing evidence on Commuting allowance, one of the active labour market policy (ALMP) programmes, implemented in Slovakia between during 2017.
The Commuting allowance is provided based on the § 53 of the Act on Employment Services Col. 2004/5.

if(qualitative_data_condition) {
  knitr::knit_exit()
}

1. The description of the programme

Based on the Labour Market Policy Database (LMP) administrated by DG Employment of the European Commission, the Commuting allowance is classified as an “Employment incentives” type of ALMP programme, with the programme specific code 41_SK20.

Commuting allowance aims: Employment maintenance incentives

Beneficiaries are registered job seekers.

The Eligible are citizens who were registered jobseekers for at least 3 months before starting to work or performance

Implementation: The contribution for commuting to work shall be provided a monthly for travel to work to cover a part of an employee s travel costs for commuting from his/her place of permanent residence or place of temporary residence to the place of employment specified in his/her employment contract and back if the employee was a jobseeker registered in the register of jobseekers for at least three months who was removed from the register of jobseekers for the reason in section 36(1)(a) and if he/she applies for the contribution in writing no later than one month after starting employment. The contribution can be provided again after two years from the end of provision. The contribution shall be provided to an employee for at most six months from the start of employment .The contribution shall be provided to cover travel costs using public transport up to a maximum of EUR 135 per month depending on the distance of the place of employment from the place of permanent or temporary residence. The contribution shall not be provided to an employee who commutes from his/her place of permanent or temporary residence to the place of employment specified in his/her employment contract within one municipality.


1.1 Participants and expenditures

#Preparing of participants and expenditures tables 
partSK <- subset(part, geo == 'SK' & year == format(as.Date(params$ep_start),"%Y") & age == 'TOTAL' & sex == 'T' & stk_flow == 'ENT' )
names(programesSK)[1] <- 'lmp_type'
partSK <- merge(partSK, programesSK, by = 'lmp_type', all = TRUE)
partSK <- subset(partSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
partSK_datapie <- subset(partSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
partSK_datapie$value <- ifelse(is.na(partSK_datapie$value), sum(partSK[substr(partSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), partSK_datapie$value)


expSK <- subset(exp, year == format(as.Date(params$ep_start),"%Y"))
expSK <- merge(expSK, programesSK, by = 'lmp_type', all = TRUE)
expSK <- subset(expSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
expSK_datapie <- subset(expSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
expSK_datapie$value <- ifelse(is.na(expSK_datapie$value), sum(expSK[substr(expSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), expSK_datapie$value)

#Share of expenditures and participants at programm

partSK_per <- partSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

expSK_per <- expSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

program_part <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            participants = c(sum(partSK_per$value), subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1] / sum(partSK_per$value) * 100,2),'%'))
                            )

program_exp <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            expenditures = c(sum(expSK_per$value), subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1] / sum(expSK_per$value) * 100,2),'%'))
                            )


type_share_par <- partSK_per$per[partSK_per$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$Classification[1])] 
prog_share_par <- program_part$per[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_par <- program_part$participants[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

type_share_exp <- expSK_per$per[expSK_per$lmp_type == subset(qualitative, qualitative$almp == params$measure)$Classification[1]]
prog_share_exp <- program_exp$per[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_exp <- program_exp$expenditures[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

The left pie chart Participants displays the share of participants in ALMP programmes grouped by ALMP types of the LMP classification. Shares are based on LMP Database stock figures reported for the calendar year 2017.

The right pie chart Expenditure display the share of expenditure using the same LMP typology as the Participants pie chart. Comparing the share on the total pie of participants to the share on the total pie of expenditures provides an assessment of unit costs of the programme relative to the average. If the percentage of participants is higher than on total spending, then the programme is less expensive than the average ALMP programme (and vice versa).

Graph 1: Resources flowing to Commuting allowance during 2017

#Preparing data for Pie Chart
#PARTICIPANTS
piechart_P <- select(partSK_datapie, lmp_type, value)

piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == '2' | piechart_P$lmp_type == '4' | piechart_P$lmp_type == '5' | piechart_P$lmp_type == '6' | piechart_P$lmp_type == '7',       
                              qualitative$class[match(piechart_P$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_P <- filter(piechart_P, !duplicated(piechart_P$lmp_type))
piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_P$lmp_type)

piechart_P <- piechart_P[order(piechart_P$lmp_type),]
piechart_P$focus <- ifelse(
  piechart_P$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)

#EXPENDITURES 
piechart_E <- select(expSK_datapie, lmp_type, value)

piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == '2' | piechart_E$lmp_type == '4' | piechart_E$lmp_type == '5' | piechart_E$lmp_type == '6' | piechart_E$lmp_type == '7',       
                              qualitative$class[match(piechart_E$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_E <- filter(piechart_E, !duplicated(piechart_E$lmp_type))
piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_E$lmp_type)

piechart_E <- piechart_E[order(piechart_E$lmp_type),]
piechart_E$focus <- ifelse(
  piechart_E$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)


## PIE CHART
#PARTICIPANTS
piechart_P$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_P$lmp_type)
ev_measure <- gsub("(.{25,}?)\\s", "\\1\n",
                   paste(subset(qualitative, qualitative$almp == params$measure)$class[1],
                         subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
                         sep =': '))

ev_type <- gsub("(.{25,}?)\\s", "\\1\n",
                subset(qualitative, qualitative$almp == params$measure)$class[1])

piechart_P$value <- ifelse(piechart_P$lmp_type == ev_type, piechart_P$value - piechart_P[piechart_P$lmp_type == ev_measure,"value"], piechart_P$value)

piechart_P <- piechart_P %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_P <- ggplot(piechart_P) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_P, stat = 'pie',color='white')+
  ggtitle('Participants') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_P$value != 0 &  piechart_P$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_P$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))

#EXPENDITURES
piechart_E$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_E$lmp_type)

piechart_E$value <- ifelse(piechart_E$lmp_type == ev_type, piechart_E$value - piechart_E[piechart_E$lmp_type == ev_measure,"value"], piechart_E$value)

piechart_E <- piechart_E %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_E <- ggplot(piechart_E) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_E, stat = 'pie',color='white')+
  ggtitle('Expenditure') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_E$value != 0 &  piechart_E$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_E$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))

Based on the LMP database, 5 441 individuals participated in the Commuting allowance during 2017. This accounted for the 2.82 % percent of the total number of participants in all Slovak ALMP programmes (LMP types 2-7), while the share of the programme on the total expenditure was 0.57 %. The total roofing LMP type Employment incentives presents 60.68 % percent of the total Slovak ALMP (type 2-7) expenditure and 61.68 % percent of the total ALMP participants.


1.2 The Commuting allowance in the context of ALMPs in Slovakia

Using administrative data, we first provide a picture of the importance of the Commuting allowance in the context of Slovak ALMP. The following flowchart displays flow of job seekers registered into the database of unemployment during . Flows are based on the movement of these individuals during the two years following their registration; this period is divided into six months sub-periods (0/6/12/18/24). For each of these sub-periods, we observe the flows of registered jobseekers to employment, or their de-registration for another reason.
Job seekers can also move to one of the ALMP programmes. The lines highlighted in red represent the flow of jobseekers participating in measure P053, Commuting allowance.

Graph 2: The Commuting allowance in the structure of the flows of jobseekers registered during 2017

The next table shows full names of programmes shown in the graph above.

Table 1: Explanatory table for Graph 2

Sankey_description <- select(qualitative, almp, Labour.market.services, Labour.market.services_SK)
Sankey_description[32,1] <- 'P054'
Sankey_description<- rbind(Sankey_description, Sankey_description[32,])
Sankey_description[nrow(Sankey_description),1] <- 'P54O'
Sankey_description[28,1] <- 'P54D'

Sankey_tabel1 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
Sankey_tabel1$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel1$Measures, Sankey_description$almp)]

Sankey_tabel1  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
  kable_classic('hover', full_width = FALSE)%>%
  column_spec(1,  border_right = TRUE) 
Measures Name of programme
P54R [Component] Projects and programmes - Projects - Education and training Repas
P054 Projects and programmes
P051 Support for graduate work experience
P54O [Component] Projects and programmes - Self-employment
P060 Reimbursement of operating costs of sheltered workshops and workplaces
P54K [Component] Projects and programmes - Projects - Education and training
P052 Work in minor services for municipalities or self-governing regions
P52A Voluntary work
P049 Self-employment grant
P50J Support for the development of local and regional employment
P053 Commuting allowance
## Preparing of df
# we observe only  young people under XX (age) which inflow in XXXX (entry year) 
df_r <- subset(df, age <= age_group_max)
df_r <- subset(df_r, format(as.Date(df_r$entry),"%Y")== format(as.Date(params$ep_start),"%Y"))
#df_r <- df_r %>% mutate_all(na_if,"") #if cells are empty -> change it to NAs
df_r <- subset(df_r, healthy < 3)

# DOVOD VYRADENIA 
dovod_vyradenia = c('V01','V02','V03','V1','V12','V15') #zamestnali sa 

# Opatrenie P032 vyhodiť (ak sa budú meniť aj iné opatrenia ako napr. 54R a 54Rp tak tu sa to opraví (%in% c()))
delete <- setdiff(c('P032','P54P','P54D'),params$measure)
Salmps <- subset(almps, !nastroj  %in% delete)

## Spojenie DF a ALMPS 
df_almps <- merge(Salmps, df_r, by = 'klient_id')

## Vyfiltruj iba tie klient_id. pri ktorých nástroj je params$measure
partic_measure <- df_almps[df_almps$nastroj == params$measure, 'klient_id']
df_almps <- filter(df_almps, klient_id %in% partic_measure) #tu su tí, ktorí boli na aj na params$measure ale aj na iných opatreniach 

# Podmienka prieniku času pri databáze nezamestnaných a v zúčastnili sa evaluated measure
df_almps_measure <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj == params$measure) # tu su tí, ktorí išli v skúmanom čase do opatrenia params$measure 
##### toto je moja základňa môj prvý stĺpec 

##### Podmienka prieniku času pri databáze nezamestnaných a iných programoch ako evaluated measue ale zároveň sú to tí, ktorí už niekedy na evaluated measure už boli 
df_almps_other <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj != params$measure)

# upravíme si dáta ktoré budeme používať pri grafe
df_almps_measure <- select(df_almps_measure, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)
df_almps_other <- select(df_almps_other, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)


#smojím iba tých ktorí boli aj v evaluated measure aj v iných opatreniach, all.x = TRUE lebo chceme aj tých ktorý boli iba na evaluated measure (nemuseli sa zúčastniť aj iných programov)
flow <- merge(df_almps_measure, df_almps_other, by = 'klient_id', all.x = TRUE) 

#dni od začatia opatrenia entrya.x po začatie iného opatrenia entrya.y alebo po zamestnanie/odchod z iného dôvodu exit.x
flow$days <- ifelse(is.na(flow$nastroj.y),
                    as.numeric(difftime(flow$exit.x, flow$entrya.x, units = 'days')),
                    as.numeric(difftime(flow$entrya.y, flow$entrya.x, units = 'days')))

#ak je dovod vyradenia NA ale zúčastnili sa na opatrení 
flow$dovod_vyradenia_kod.y <- ifelse(is.na(flow$dovod_vyradenia_kod.y) & !is.na(flow$nastroj.y), 'V01', flow$dovod_vyradenia_kod.y)

#vyfiltruj tých ktorý boli aj na evaluated measure aj na inom opatrení alebo sa zamestnali a days nie je záporné 
#podmienka -> entrya do ďalšieho projektu musí byť väčšie ako entrya do evaluated measure
#flow$days nemôže byť záporné 
flow <- filter(flow, days >= 0)

#nastroj -> ak dovod vyradenia sa rovna nejakému prvku z vektoru dovodov vyradenia tak -> employed inak another reason
flow  <- flow %>% 
  mutate(nastroj.y = case_when(is.na(flow$nastroj.y) & flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'employed',
                               is.na(flow$nastroj.y) & !flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'another reason',
                               !is.na(flow$nastroj.y) ~ flow$nastroj.y)
  )


#dataframe, ktorý budem používať pri tvorbe grafu
Sankey_measure <- flow %>% select(nastroj.x, nastroj.y, days) %>% 
           mutate(month = ceiling(days/30.417))

Sankey_measure <- Sankey_measure %>%mutate(
  time = case_when(
    Sankey_measure$month %in% seq(0,6,1)  ~ 6,
    Sankey_measure$month %in% seq(7,12,1)  ~ 12, 
    Sankey_measure$month %in% seq(13,18,1)  ~ 18,
    Sankey_measure$month %in% seq(19,100,1)  ~ 24,
  ) 
)

Sankey_measure$sources <- ifelse(Sankey_measure$time == 6 | Sankey_measure$time == 12 |
                                   Sankey_measure$time == 18 | Sankey_measure$time == 24, 
                                 Sankey_measure$time - 6, 
                                 Sankey_measure$time)


#zosumarizuj, koľký mladí išli do ktorého opatrenia, zamestnali sa alebo odišli z registra z iných dôvodov
San_measure <- Sankey_measure %>% select(nastroj.y, time, sources) %>%
  group_by(nastroj.y, time, sources) %>% summarise(num = n(), .groups = 'drop') %>%
  rename(nastroj = nastroj.y)


# rozdeľ opatrenia, na tie ostatné almps - OTHER ALMPS 
nastroj_kod <- c('another reason','employed')
NastrojKod <- San_measure[!San_measure$nastroj %in% nastroj_kod,]  %>%  group_by(nastroj) %>% summarise(num = sum(num), .groups = 'drop') %>%
  mutate(perc = num*100 / sum(num),
         cut = case_when(perc >= 5 ~ 1,
                         perc < 5 ~ 0))
nastroj_kod <- c(nastroj_kod, NastrojKod$nastroj[NastrojKod$cut == '1'])

San_aplmps <- subset(San_measure, nastroj %in% nastroj_kod | nastroj == params$measure)
San_other_aplmps <- subset(San_measure, !nastroj %in% nastroj_kod & !nastroj == params$measure)

San_other_aplmps <- San_other_aplmps %>%  group_by(sources , time) %>% summarise(num = sum(num), .groups = 'drop') 
San_other_aplmps$nastroj <- 'OTHER ALMPS'
San_other_aplmps <- relocate(San_other_aplmps, c(nastroj, time), .before = sources,)

San_measure <- rbind(San_aplmps, San_other_aplmps)
remove(San_aplmps, San_other_aplmps)


# uzly grafu (jedinečné), musia tu byť všetky opatrenia
node_m <- data.frame(
  name=c(as.character(San_measure$nastroj), as.character(San_measure$sources))%>% unique()
)

# definovanie koľko registrovaných bude medzi tými rokmi  
velky_df <- data.frame()
for (i in seq(6,24,6)){
  pocet <- San_measure %>%  group_by('sources' = sources >= i) %>% summarise(num = sum(num), .groups = 'drop') 
  pocet <- subset(pocet, sources == TRUE)
  pocet$sources <- i
  velky_df <- rbind(velky_df, pocet)
}

# musím si velky_df prisposobiť tak, aby roky boli ako nodes aby som to mohla spojiť s dataframe San s ktorým potom budem ďalej robiť graf
# preto sources budu ako nastroj -> aby som spravila nodes, years su sources ale sources su years -1 v skutočnosti (v san grafe)
colnames(velky_df) <- c('nastroj', 'num')
velky_df$time <- velky_df$nastroj
velky_df$sources <- San_measure$sources[match(velky_df$time, San_measure$time)] 
velky_df <- relocate(velky_df, num, .after = sources)

San_measure <- rbind(San_measure, velky_df)

#urobím IDsources a ID target podľa uzlov aby garf vedel ten flow medzi jednotlivími uzlami 
San_measure$IDsource <- match(San_measure$sources, node_m$name)-1 
San_measure$IDtarget <- match(San_measure$nastroj, node_m$name)-1

#Color 
time <- seq(0,24,6)
NOALMP <- c('another reason', 'employed')
node_m <- node_m %>% mutate(group = case_when(node_m$name %in% NOALMP ~ 'A',
                                              !node_m$name %in% NOALMP & !node_m$name %in% time ~ 'B',
                                              node_m$name %in% time ~ 'C'
)
)

San_measure$group <- 'type_a'

my_color <- 'd3.scaleOrdinal() .domain(["type_a", "A","B", "C"]) .range(["lightgray", "darkseagreen", "thistle", "rosybrown", "red"])'

San_measure <- as.data.frame(San_measure)

Sankey_ev.measure <- sankeyNetwork(Links = San_measure, Nodes = node_m,
                                   Source = "IDsource", Target = "IDtarget",
                                   Value = "num", NodeID = "name", 
                                   sinksRight=F, fontSize = 14,
                                   fontFamily = "sans-serif",
                                   width = 900,
                                   colourScale=my_color, LinkGroup="group", NodeGroup="group",
                                   nodePadding=10)

condition_for_Sankeyplot <- length(unique(San_measure$IDsource))>=1 && length(unique(San_measure$IDtarget))>=1 && !any(is.na(San_measure$IDsource)) && !any(is.na(San_measure$IDtarget))

The second flow chart shows the further state of the participants of the evaluated programme (further branching of the red line in graph 2) and what happened with them after participation in the evaluated programme. After participation in the programme, the participants can be employed, de-registered for another reason, or they can participate in another ALMP programme. We observed this behaviour for periods of two years after their participation. This period is divided into six months sub-periods (0/6/12/18).

Graph 2.2: Flowchart of participants in evaluation progname participated during evaluation period.

if(condition_for_Sankeyplot){
  Sankey_ev.measure
}
cond_for_table <- FALSE
if(condition_for_Sankeyplot){
  Sankey_tabel2 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
  Sankey_tabel2$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel2$Measures, Sankey_description$almp)]
  
  cond_for_table <- nrow(Sankey_tabel2) >=1
}

The next table shows full names of programmes shown in graph above.

Table 1.2: Explanatory table for Graph 2.2

if (cond_for_table){
  Sankey_tabel2  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
    kable_classic('hover', full_width = FALSE)%>%
    column_spec(1,  border_right = TRUE)
}
Measures Name of programme
P050 Support for hiring disadvantaged job seekers
P054 Projects and programmes
P51A2 Incentives to hire young unemployed (<25) into their first job



2. Data and the evaluation sample

This evaluation report is based on administrative data on registered unemployed jobseekers in Slovakia, inter-linkable with the database of participants in ALMP programmes. The export was provided by the Central Office of Labour, Social Affairs and Family of the Slovak Republic (COLSAF) at the beginning of 2021. It covered a period from January 2014 until December 2020. The raw data were processed using a data-preparation script, available on request from: .
The data frame “df” covers all unemployment spells of unemployed jobseekers, with attributes collected at the moment of their registration as unemployed jobseekers (Application Form).

### DEFINE THE EVALUATION PERIOD #
ep_start <- as.Date(params$ep_start)
ep_end <- as.Date(params$ep_end)
un_spell <- spell
measure <- params$measure

    ########################################x
    ## SELECTING THE EVALUATION SAMPLE #
    ########################################x,
    
treated<-filter(almps, nastroj==toString(params$measure))

#Sub-groups to be dropped: 
# - those with ALMP participation 2 years before the EP
IDalmps_before<-unique(almps$klient_id[almps$entrya<ep_start & almps$entrya>=ep_start-730]) 
# - those with ALMP participation in other ALMP during the EP
IDalmps_during_ep<-unique(almps$klient_id[(almps$entrya<=ep_end & almps$entrya>=ep_start) & almps$nastroj!=toString(params$measure)])

###DEFINE THE ELIGIBILITY CRITERIA 
#the EC are measure specific, in the case of looping over multiple measures EC need to be elaborated t a form of table or a list and added to the parameters
#SUBSETTING THE BASE EVALUATION DATASET OF ELIGIBLE 
cond0<-as.logical(df$entry<=ep_end & df$exit>=ep_start) # Being on the register of unemployed during the evaluation period
cond1<-as.logical(df$age < age_group_max) 
cond2<-as.logical((df$exit-df$entry)>=un_spell) #LENGTH OF PREVIOUS UNEMPLOYMENT SPELL
cond3<-as.logical(df$entry>=ep_start-730) # Dropping old unemployment spells (cases inflowing more than 730 days before the start of the evaluation period)

dfe<-df[cond0 & cond1 & cond2 & cond3,]
n1 <- dim(dfe)[1]
sampleIDs<-unique(df$klient_id[cond0 & cond1 & cond2 & cond3])
n2 <- length(sampleIDs)

###
#### ONLY KEEP THE SPELLS OF PARTICIPANTS DURING WHICH THEY PARTICIPATED 
#### Creating dataframe of participants in the evaluated programme during the evaluation period.
dfa<-filter(treated, entrya<=ep_end & entrya>=ep_start)
npart0<-length(unique(dfa$klient_id))
npart1<-dim(dfa)[1]

#Drop other ALMP participations from the group of participants as well as the eligible non-participants
n3 <- nrow(filter(dfa, klient_id %in% IDalmps_before))
n4 <- nrow(filter(dfe, klient_id %in% IDalmps_before))
dfa<-filter(dfa, !klient_id %in% IDalmps_before)
dfe<-filter(dfe, !klient_id %in% IDalmps_before)

n5 <- nrow(filter(dfa, klient_id %in% IDalmps_during_ep))
n6 <- nrow(filter(dfe, klient_id %in% IDalmps_during_ep))
dfa<-filter(dfa, !klient_id %in% IDalmps_during_ep)
dfe<-filter(dfe, !klient_id %in% IDalmps_during_ep)



#### Only participants with one participation during the evaluation period are sampled. 
#### JS with multiple participations are droped from the sample
dfa<-dfa %>%
  group_by(klient_id) %>% 
  mutate(rep=n()) # rep is the number of participations of one JS repeating during 2014

n7 <- nrow(filter(dfa, rep!=1))
dfa<-filter(dfa, rep==1)
npart2<-dim(dfa)[1] # Number of participants after cleaning with multiple ALMP participations

###Participants who also participated in other ALMP measures (§54) are dropped
progOUT<- setdiff(c("P050", "P50A", "P50C","P50J", "P50K" ,"P51A", "P054", "P54D", "P54E", "P54O", "P54P", "P54U"),params$measure)
outIDs<-unique(almps$klient_id[(almps$entrya>=as.Date(params$ep_start) & almps$entrya<=as.Date(params$ep_end)+730) & as.logical(almps$nastroj %in% progOUT)])

n8 <- nrow(filter(dfa, klient_id %in% outIDs))
n9 <- nrow(filter(dfe, klient_id %in% outIDs))

dfa<-(filter(dfa, !klient_id %in% outIDs))
dfe<-(filter(dfe, !klient_id %in% outIDs))

npart3<-length(unique(dfa$klient_id)) # The number of participants after we drop participations in supported employment during the outcome observation period 

#MERGING PARTICIPATIONS AND UNEMPLOYMENT SPELLS
#First we add the date of the entry and exit from the registration into the table of participations in measure (evaluated measure params$measure). We only import entry dates for the individuals in the evaluation sample. 
dfa<-merge(dfa, select(dfe, klient_id, entry, exit), by="klient_id", all.x = TRUE)
nrowdfa <- nrow(dfa)

#Second we filter only the registrations of members of the evaluation sample during which the programme participation took place. 
dfa<-filter(dfa, dfa$exit+30>=dfa$entrya & dfa$entrya<=ep_end & entrya >= entry) # Keeping only the participations happening during an unemployment spell
n10 <- nrowdfa-dim(dfa)[1]
npart4<-dim(dfa)[1] # Number of participants after cleaning participations outside an unemployment spell (data quality issue)

## Participants #
particIDs<-unique(dfa$klient_id)
## Eligible #
# nonpartIDs<-sampleIDs[!(sampleIDs %in% particIDs)]
nonpart<-filter(dfe, !(klient_id %in% particIDs))
###Out of the participants only one-time participations happening during an unemployment spell are used 
partic<-merge(dfe, dfa, by = c("klient_id", "entry"), all.x = FALSE)

#LL: sanity check
#nonpart$klient_id
#partic$klient_id
#intersect(nonpart$klient_id,partic$klient_id)
#this should be empty. OK

### Cleaning and renaming #
partic$exit.y<-NULL
partic<-partic %>% rename(exit=exit.x)

nonpart$entrya<-NA
nonpart$exita<-NA
nonpart$nastroj<-NA
nonpart$naklady<-NA
nonpart$projekt<-NA
partic$rep<-NULL

#Filter extreme values (1%) of the waiting time until participation in the evaluated measure 
wte<-quantile(as.numeric(partic$entrya)-as.numeric(partic$entry),na.rm=TRUE, probs=0.99)
n11 <- nrow(filter(filter(partic, as.numeric(entrya)-as.numeric(entry)>wte)))
partic<-filter(partic, as.numeric(entrya)-as.numeric(entry)<=wte)

esample<-rbind(nonpart, partic)
esample$treated<-!is.na(esample$entrya)
#Filter extreme values of the waiting time until participation in the evaluated measure 

#LL:
#summary(esample$treated)

###Unemployment spells ending with LM placements
#esample<-filter(esample, dovod_vyradenia_kod == 'V01' | dovod_vyradenia_kod == 'V02' | dovod_vyradenia_kod == 'V03' | dovod_vyradenia_kod == 'V1' | dovod_vyradenia_kod == 'V12' | dovod_vyradenia_kod == 'V15')

    ########################################x
    ## GENERATING EXPLANATORY VARIABLES #
    ########################################x,
    esample$ent <- as.numeric(as.Date(ep_start))-as.numeric(as.Date(esample$entry))
    
    #LL: quantile(esample$ent)
    #niektore unemployment spells boli 
    # negativne (JS zacal byt nezamestnany po 1-1-2017), 
    # niektore pozitivne (JS zacal byt nezamestnany pred 1-1-2017)

    ########AGEG
    esample$ageg <- cut_interval(esample$age, 5, labels=FALSE)
    esample$ageg <- as.factor(esample$ageg)
    
    ####Extra columns for dummy variables go into the esample_est for further testing
    esample <- dummy_cols(esample, select_columns = c("ageg"), remove_first_dummy = TRUE)
    ageg_dummies<-colnames(esample)[grepl("ageg_", colnames(esample))]

  #######Regional Unemployment rate during the implementation period
    esample[, "UR_region"] <- esample[, paste0("UR_region_",year(ep_start), "")]
    esample[,grepl("UR_region_", colnames(esample))]<-NULL
    #esample$UR_region <- as.numeric(gsub(",", ".", gsub("\\.", "", esample$UR_region)))
    
    ##########Difference between entry into unemployment register and started of participation in measure
    esample$diff_entry <- ceiling(as.integer(as.Date(esample$entrya) - as.Date(esample$entry))/30.417)

#Cleaning
#nonpart<-NULL
#partic<-NULL




# Share of repeated unemployment after participation in ALMP
# Merge esample, history table
h_esample <- merge(select(esample, klient_id, entry, exit, dovod_vyradenia_kod, entrya, exita, nastroj, treated), dfh, by = 'klient_id', all.x = TRUE, all.y = FALSE)

# as.Date
entry <- paste('entry', seq(1:11), sep = '')
exit <- paste('exit', seq(1:11), sep = '')

for (e in entry){
  h_esample[ ,e] <- as.Date(h_esample[ ,e], origin = '1970-01-01')
}
for (x in exit){
  h_esample[ ,x] <- as.Date(h_esample[ ,x], origin = '1970-01-01')
}
h_esample$exita <- as.Date(h_esample$exita, origin = '1970-01-01')

#Dropping cases with over than 15 unemployment spells 
#h_esample<-filter(h_esample, is.na(entry16)) #subset (klient_id) jobseekers who became unemployment only 15 times 
#esample <- subset(esample, klient_id %in% h_esample$klient_id) 
#remove entry16+ and exit16+ columns
#h_esample<-select(h_esample, -entry16, -entry17, -entry18, -entry19, -entry20, -exit16, -exit17, -exit18, -exit19, -exit20)

esample_condition <- (mean(esample$treated) < 0.00001)
if(esample_condition) {
  knitr::knit_exit()
}

2.1 Description of participants and eligible

Our evaluation sample consists of 300 626 eligible individuals, registered as jobseekers during the evaluation period starting from 2017-01-01 until 2017-12-31, These jobseekers have generated in total 316 875 unemployment registrations during the period 2014-2020. Out of them, in 5 014 jobseekers participated in the evaluated measure Commuting allowance during the evaluation period. 3 174 participants and 127 929 eligible JS were dropped from the sample, because of multiple participations in the evaluated program (or other relevant ALMPs) during and after the evaluation period. After cleaning we have sampled 1 840 JS with one-time participation during the evaluation period. At the same time, there was 188 946 eligible non-participants present in the register of unemployed jobseekers during 2017.

The group of participants and eligible show differences in a number of observed characteristics. Table 2 displays an overview of these differences on mean values of selected characteristics.


Table 2: Descriptive statistics of participants and eligible (selected characteristics)

####
## number of participants and eligible 
####

#separate table dfe with participants in ALMP  and eligible 
elig <- distinct_at(nonpart,vars(klient_id),.keep_all = TRUE)
part <- distinct_at(partic,vars(klient_id),.keep_all = TRUE)

#BASIC DESCRIPTIVE TABLE

# The number of participants and eligible in the sample'
#tab1<-cbind(sum(!is.na(dfe$entrya)),sum(is.na(dfe$entrya)))
tab1 <- cbind(format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
colnames(tab1) <- c('Participants', 'Eligible')
tab1 <- data.frame(cbind(Description = 'Number of observations', tab1))

####
## Age distribution
####

age_par <- part %>% select(age) %>% group_by(age) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

age_elig <- elig %>% select(age) %>% group_by(age) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

age <- merge(age_par, age_elig, by='age', all = TRUE)
age$Participants_percent <- ifelse(is.na(age$Participants_percent), paste(0,'%'), age$Participants_percent)
age$Participants_total <- ifelse(is.na(age$Participants_total), 0, age$Participants_total)

mean_age_par <- round(mean(part$age),1)
mean_age_elig <- round(mean(elig$age),1)

##### MEAN 
mean_age<-data.frame(mean_age_par,mean_age_elig)
mean_age <- cbind(Description = 'Age (years)', mean_age)
mean_age <- rename(mean_age, Participants = mean_age_par, Eligible=mean_age_elig)
####

age_elig$desc <- 'Eligible'
age_par$desc <- 'Participants'
age_elig <- age_elig %>% rename(total = Eligible_total, percent = Eligible_percent)
age_par <- age_par %>% rename(total = Participants_total, percent = Participants_percent)
age_r <- rbind(age_par,age_elig)

age_plot <- ggplot(age_r, aes(x = age, y = total, group= desc)) +
  geom_point(aes(color = desc), size = 1.5)+
  geom_line(aes(color = desc), size = 1) + 
  ylim(0,23000) +
  theme_light() +
  geom_text(aes(label = paste(percent, '\n\n' )), col ='black', size = 3, fontface ='italic')+
  labs(
    title = "Compare of age distribution (%)",
    x = "Age",
    y = "Total Count"
  )

####
## Gender distribution
####

gender_par <- part %>% select(male) %>% group_by(male) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

gender_elig <- elig %>% select(male) %>% group_by(male) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

gender <- merge(gender_par, gender_elig, by='male', all = TRUE)

####
male <- data.frame(gender_par[2,3], gender_elig[2,3])
male <- cbind(Description = 'Male', male)
male <- rename(male, Participants = Participants_percent, Eligible=Eligible_percent)


####
##  Education distribution
####

education_par <- part %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Participants_total = n(), .groups = 'drop')  %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2))
education_par <- education_par[!(is.na(education_par$noedu)),]
education_par <- reshape2::melt(education_par, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_par <- education_par[education_par$value == 1,] 
education_par <- select(education_par, -value)

education_elig <- elig %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Eligible_total = n(), .groups = 'drop')  %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2))
education_elig <- education_elig[!(is.na(education_elig$noedu)),]
education_elig <- reshape2::melt(education_elig, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_elig <- education_elig[education_elig$value == 1,] 
education_elig <- select(education_elig, -value)

education <- merge(education_par, education_elig, by='variable', all = TRUE)
education <- select(education, variable, Participants, Eligible)
education <- rename(education, Description = variable)

education <- education%>%mutate(
  Description = case_when(
    education$Description ==  'noedu' ~ 'No education',    
    education$Description ==  'primary' ~ 'Primary', 
    education$Description ==  'lsec' ~ 'Lower secondary',
    education$Description ==  'usec' ~ 'Upper secondary',
    education$Description ==   'tertiary' ~ 'Tertiary', 
    TRUE~as.character(education$Description)
  ) 
)

education <- education %>% group_by(Description) %>%
  dplyr::summarise(Participants = paste0(sum(Participants),  "%"), Eligible = paste0(sum(Eligible),  "%")) 

x <- c('No education','Primary','Lower secondary', 'Upper secondary', 'Tertiary')
education <- education[match(x, education$Description),]

####
##  skills
####

l_skills_par <- part %>%  select(flang) %>%  
  mutate(flang = case_when(part$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

l_skills_elig <- elig %>%  select(flang) %>%  
  mutate(flang = case_when(elig$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

PC_skills_par <- part %>%  select(pc) %>%  
  mutate(pc = case_when(part$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

PC_skills_elig <- elig %>%  select(pc) %>%  
  mutate(pc = case_when(elig$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

d_skills_par <- part %>%  select(drive) %>%  
  mutate(drive = case_when(part$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)

d_skills_elig <- elig %>%  select(drive) %>%  
  mutate(drive = case_when(elig$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)


l_skills <- merge(l_skills_par, l_skills_elig, by='Description', all = TRUE)
PC_skills <- merge(PC_skills_par, PC_skills_elig, by='Description', all = TRUE)
d_skills <- merge(d_skills_par, d_skills_elig, by='Description', all = TRUE)

skills <- rbind(l_skills, PC_skills, d_skills)
skills <- select(skills, Description, Participants_percent, Eligible_percent)
skills <- rename(skills, Participants = Participants_percent, Eligible=Eligible_percent)

####
##  region
####

part <- part %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

elig <- elig %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

okres_par <- part %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

okres_elig <- elig %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

okres <- merge(okres_par, okres_elig, by='okres', all = TRUE)
okres <- select(okres, okres, Participants, Eligible)
okres <- rename(okres, Description = okres)
okres <- okres[okres$Description != 'N/A',]

####
##  Previous employment
####

prev_emp_part <- part  %>% select(empl) %>% group_by(empl) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = empl, Participants = Participants_percent)  

prev_emp_elig <- elig  %>% select(empl) %>% group_by(empl) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = empl, Eligible = Eligible_percent)

prev_emp <- merge(prev_emp_part, prev_emp_elig, by='Description', all = TRUE)
prev_emp <- select(prev_emp, Description, Participants, Eligible)
prev_emp <- prev_emp[prev_emp$Description == 1,] 
prev_emp$Description[prev_emp$Description == 1 ] <- 'Previous employment'

####
##  Nationality
####

nat_part <- part  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Participants_total = n(), .groups = 'drop') %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%')) 
nat_part$othern <- ifelse(nat_part$othern == 'TRUE', 1,0)
nat_part <- reshape2::melt(nat_part, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_part <- nat_part[nat_part$value == 1,] 
nat_part <- select(nat_part, -value)

nat_elig <- elig  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Eligible_total = n(), .groups = 'drop') %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2),'%')) 
nat_elig$othern <- ifelse(nat_elig$othern == 'TRUE', 1,0)
nat_elig <- reshape2::melt(nat_elig, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_elig <- nat_elig[nat_elig$value == 1,] 
nat_elig <- select(nat_elig, -value)

nat <-  merge(nat_part, nat_elig, by='variable', all = TRUE)
nat <- select(nat, variable, Participants, Eligible)
nat <- rename(nat, Description = variable)

nat <- nat %>%mutate(
  Description = case_when(
    nat$Description == 'slovak' ~ 'Slovak', 
    nat$Description == 'hungarian' ~ 'Hungarian', 
    nat$Description == 'czech'~ 'Czech', 
    nat$Description == 'roma' ~ 'Roma', 
    nat$Description == 'othern'~ 'Other', 
  ) 
)


x <- c('Slovak','Hungarian', 'Czech','Roma','Other')
nat <- nat[match(x, nat$Description),]

####
##  Length of the unemployment spell
####

part$un_spell <- as.integer(part$exit - part$entry)
elig$un_spell <- as.integer(elig$exit - elig$entry)

un_spell <- cbind(round(mean(part$un_spell),2), round(mean(elig$un_spell),2))
un_spell <- data.frame(cbind(Description = 'Length of the unemployment spell', un_spell))
un_spell <- rename(un_spell, Participants = V2, Eligible=V3)


####
##  Length of spell between unemployment and participation
####

#Rozdiel medzi evidenciou nezamestnanosti a nastúpenia do AOTP 
part$spell_b <- as.integer(part$entrya - part$entry)

spell_bup <- part %>% select(spell_b) %>% 
  mutate(month = ceiling(spell_b/30.417)) #roundup -> ceiling

spell_bup_p <- ggplot(data=spell_bup, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none")+
  labs(
    title = "Inflow into the programme in months\nsince the start of the unemployment",
    x = "Months",
    y = "Total Count"
  )  + 
  scale_x_continuous()

####
##  Length of AOTP
####

part$spell_aotp <- as.integer(part$exita - part$entrya)

spell_aotp <- part %>% select(spell_aotp) %>% 
  mutate(month = ceiling(spell_aotp/30.417)) #roundup -> ceiling

spell_aotp_p <- ggplot(data=spell_aotp, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none") +
  labs(
    title = "Length of participation\n (in months)",
    x = "Months",
    y = "Total Count"
  ) + 
  scale_x_continuous(breaks = scales::breaks_extended(length(unique(spell_aotp$month))))


####
##  Compare of length spell
####

spell_p <- ggarrange(spell_bup_p, spell_aotp_p)


####
##  In flow
####

a <- format(seq(as.Date(ep_start),length=3,by="1 month"),"%Y-%m")
b <- format(seq((ymd(as.Date(ep_start)) %m+% months(3)),length=3,by="1 month"),"%Y-%m")
c <- format(seq((ymd(as.Date(ep_start)) %m+% months(6)),length=3,by="1 month"),"%Y-%m")
d <- format(seq((ymd(as.Date(ep_start)) %m+% months(9)),length=3,by="1 month"),"%Y-%m")

#vstúpili do programu
in_part <- part %>% select(entrya) %>% group_by(format(as.Date(entrya),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(in_part)[1] <- 'Description'
in_part <- filter(in_part, str_detect(in_part$Description, (format(as.Date(ep_start),"%Y"))))

in_part <- in_part%>%mutate(
  Description = case_when(
    in_part$Description %in% a ~  paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% b ~  paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% c ~  paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% d ~  paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_part <- in_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#sa stali nezamestnaný 
in_elig <- elig %>% select(entry) %>% group_by(format(as.Date(entry),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(in_elig)[1] <- 'Description'
in_elig <- filter(in_elig, str_detect(in_elig$Description, (format(as.Date(ep_start),"%Y"))))

in_elig <- in_elig%>%mutate(
  Description = case_when(
    in_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_elig <- in_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

inflow <- merge(in_part, in_elig, by='Description', all = FALSE)
inflow <- select(inflow, Description, Participants, Eligible)


####
##  Outflow
####

#vystúpili z programu

out_part <- part %>% select(exita) %>% group_by(format(as.Date(exita),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(out_part)[1] <- 'Description'
out_part <- filter(out_part, str_detect(out_part$Description, (format(as.Date(ep_start),"%Y"))))

out_part <- out_part%>%mutate(
  Description = case_when(
    out_part$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_part <- out_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#vystúpili z evidencie -> zamestnali sa 
out_elig <- elig %>% select(exit) %>% group_by(format(as.Date(exit),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(out_elig)[1] <- 'Description'
out_elig <- filter(out_elig, str_detect(out_elig$Description, (format(as.Date(ep_start),"%Y"))))

out_elig <- out_elig%>%mutate(
  Description = case_when(
    out_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_elig <- out_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

outflow <- merge(out_part, out_elig, by='Description', all = FALSE)
outflow <- select(outflow, Description, Participants, Eligible)


####
##  Children in the household
####

child_part <- part  %>% select(kids) %>% group_by(kids) %>% summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = kids)  

child_elig <- elig  %>% select(kids) %>% group_by(kids) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = kids)

child <- merge(child_part, child_elig, by='Description', all = TRUE)
child <- select(child, Description, Participants, Eligible)
child <- child[child$Description == 1,] 
child$Description[child$Description == 1] <- 'Children in the household'


####
##  Fields of study
####

study_part <- part  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2)) 

study_elig <- elig  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2)) 

study <- merge(study_part, study_elig, by='odbor', all = TRUE)
study <- select(study, odbor, Participants, Eligible)
study$Participants <- ifelse(is.na(study$Participants), 0, study$Participants)

a <- as.character(c(seq(11,19,1)))
b <- as.character(c(seq(21,39,1)))
c <- as.character(c(seq(41,49,1)))
d <- as.character(c(seq(51,59,1)))
e <- as.character(c(seq(61,79,1)))
f <- as.character(c(seq(81,89,1)))
g <- as.character(c(seq(91,98,1)))

study <- study%>%mutate(
  odbor = case_when(
    study$odbor %in% a ~ 'Natural Science', 
    study$odbor %in% b ~ 'Technical sciences', 
    study$odbor %in% c ~ 'Agricultural, forestry and veterinary sciences', 
    study$odbor %in% d ~ 'Medical and pharmaceutical sciences', 
    study$odbor %in% e ~ 'Social sciences and services', 
    study$odbor %in% f ~ 'Sciences of culture and art', 
    study$odbor %in% g ~ 'Military and security sciences',
    study$odbor == 99 || study$odbor == 0 || study$odbor == 10 ~ 'General sciences and services ',
    TRUE~as.character(study$odbor)
  ) 
)

study <- study %>% select(odbor, Participants, Eligible) %>%
  group_by(odbor)  %>% 
  summarise(Participants = sum(Participants), Eligible = sum(Eligible))  %>%
  mutate(Participants = paste0(Participants, "%")) %>%
  mutate(Eligible = paste0(Eligible, "%")) 
study <- rename(study, Description = odbor)


####
##  SUMMARIZE ####
####

colnames(tab1)<-c("Description", "Participants", "Eligible")
colnames(mean_age)<-c("Description", "Participants", "Eligible")
colnames(male)<-c("Description", "Participants", "Eligible")
colnames(prev_emp)<-c("Description", "Participants", "Eligible")
colnames(un_spell)<-c("Description", "Participants", "Eligible")
colnames(child)<-c("Description", "Participants", "Eligible")

tables <- c('tab1', 'mean_age', 'male', 'prev_emp', 'un_spell', 'child')
basics <- data.frame()

for (name in tables){
  table <- get(name)
  table <- mutate(table, across(everything(), as.factor))
  basics <- bind_rows(basics, table)
}

tables <- c('basics', 'education', 'study', 'skills', 'okres', 'nat', 'inflow', 'outflow')

for (name in tables){
  table <- get(name)
  table <- add_column(table, Variable = name, .after = "Eligible")
  colnames(table) <- c("Description", "Participants", "Eligible", "Variable")
  assign(name, table)
}


sum_table <- rbind(basics, education, study, skills, okres, nat, inflow, outflow)
sum_table <- sum_table %>% relocate(Variable, .before = Description) %>%mutate(
  Variable = case_when(
    sum_table$Variable == 'basics' ~ 'Basics',
    sum_table$Variable == 'education' ~ 'Education level',
    sum_table$Variable == 'study' ~ 'Field of study',
    sum_table$Variable == 'skills' ~ 'Skills',
    sum_table$Variable == 'okres' ~ 'Region',
    sum_table$Variable == 'nat' ~ 'Nationality',
    sum_table$Variable == 'inflow' ~ 'Inflow',
    sum_table$Variable == 'outflow' ~ 'Outflow',
    TRUE ~ as.character(sum_table$Variable)
  )
) 

sum_table$Participants <- ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                 ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                        ifelse(str_detect(as.character(sum_table$Eligible), regex("%")), '0%',0), 
                                        as.character(sum_table$Participants)),
                                 as.character(sum_table$Participants))

sum_table[,2:4]  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  pack_rows(index = table(fct_inorder(sum_table$Variable)))
Description Participants Eligible
Basics
Number of observations 1 840 188 946
Age (years) 37.6 38.1
Male 31.96% 50.66%
Previous employment 8.75% 9.02%
Length of the unemployment spell 251.37 337.02
Children in the household 14.35% 12.57%
Education level
No education 0.33% 0.81%
Primary 6.96% 14.8%
Lower secondary 26.03% 29.28%
Upper secondary 39.73% 34.08%
Tertiary 26.96% 21.02%
Field of study
Agricultural, forestry and veterinary sciences 3.59% 4.54%
General sciences and services 25.92% 34.8%
Medical and pharmaceutical sciences 3.09% 1.53%
Military and security sciences 0.33% 0.36%
Natural Science 1.4% 0.69%
Sciences of culture and art 0.6% 1.09%
Social sciences and services 36.19% 25.41%
Technical sciences 28.87% 31.59%
Skills
Foreign language 65.49% 58.71%
PC skill 64.35% 51.95%
Driving license 55.11% 52.47%
Region
Banskobystrický 9.13% 12.36%
Bratislavský 2.72% 11.66%
Košický 20.43% 15.28%
Nitriansky 11.58% 12.37%
Prešovský 21.09% 17.7%
Trenčiansky 8.53% 9.69%
Trnavský 8.97% 8.76%
Žilinský 17.55% 12.18%
Nationality
Slovak 93.7% 90.17%
Hungarian 5.38% 8.52%
Czech 0.38% 0.47%
Roma 0.11% 0.19%
Other 0.43% 0.65%
Inflow
1Q.2017 24.24% 27.76%
2Q.2017 26.36% 24.85%
3Q.2017 22.12% 24.18%
4Q.2017 27.28% 23.21%
Outflow
1Q.2017 1.18% 28.65%
2Q.2017 3.04% 31.04%
3Q.2017 47.55% 21.66%
4Q.2017 48.24% 18.65%

Graph 3: Timing and length of participation in the evaluated program

require(gridExtra)
if (nrow(spell_bup) > 5){
  mx <- max(spell_bup$month)
  mn <- min(spell_bup$month)
  plot1 <- ggplot(data=spell_bup, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Inflow into the program in months\nsince the start of the unemployment") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}

if (nrow(spell_aotp) > 5){
  
  mx <- max(spell_aotp$month)
  mn <- min(spell_aotp$month)
  plot2 <- ggplot(data=spell_aotp, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Length of participation\n(in months)") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}
grid.arrange(plot1, plot2, ncol=2)


3. The impact of participation in the Commuting allowance during 2017

The ultimate objective of ALMP programmes is improving employment chances of its beneficiaries. Constrained by the data provided, we construct four outcome indicators. All four of them are based on the presence in (absence from) the register of JSs administrated by COLSAF. They are, therefore, proxies of the employment status.

#Estimation parameters
#LL: there will be four different samples according to how long have JSs been unemployed prior to receiving the training
Ssamples <- seq(1,4)

# Month of participation since the start of the evaluation period
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) 

#LL: participation quarter, if JS was treated on 3 Feb, pcpQ == 1
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

#LL: create an object that will store data
Mdata<-c()

#LL: these are the periods that we will look at
# the negative values correspond to "placebo" effects. We should not see any effect there, or only a small one.
OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")
#LL: O_vars stores outcome variables
# empl.2 means employment 2 quarters before the start of the evaluation period.
# empl0 will correspond to the last quarter of the year before the start of the eval. period
# empl1 will correspond to the first quarter of the start year of the eval. period


#LL: This is a list of baseline covariates that will be used throughout the analysis
# they correspond to a reasonable minimum of information that should be controlled for
# in order to have a meaningful comparison
list_vars <- c('ent', 'male', 'married','kids',
             'slovak', 'noedu','primary', 'lsec', 'usec',
             'flang', 'drive', 'pc',
             'unpast', 'min_urad', 'min_BA',
             'UR_region', 'roma_share', 'population', 'age')

#LL: In order to have a credible comparison groups. We need to look how similar the groups are, how balanced they are.
# we compare the mean differences BEFORE adjustment and AFTER adjustment
Balance_vars <- list_vars

#LL: We allocate objects that will store the results

#LL: number of treated units
N<-nrow(esample[esample$treated==TRUE,])

#LL: number of treated units in a particular esample
N_sp <- matrix(NA, nrow=length(Ssamples))  

#LL: results array ATT. We are interested in the average treatment effect on the TREATED subpopulation.
# in this particular case, most LM programs are intended for a specific subpopulation.
# it is therefore of less interest to look at the whole population of JSs (ATE)
resultsArray_ATT  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_ATT)[[1]] <- c(O_vars)

#LL: we also store standard errors that quantify STATISTICAL uncertainty of our estimates
resultsArray_se  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_se)[[1]] <- c(O_vars)
results <- array(NA, dim=c(length(O_vars),2))

### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
#LL: these will make 4 (roughly equally sized) different groups according to how long JSs have been unemployed
# this is an important determinant of the effect and it is important to control for it
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#LL: Nonparticipants have only criterium of minimal length of unemployment
# (thus one non-participant can be used multiple times)
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

#LL: We loop over 4 different lenghts of the prior unemployments
for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    # we count how many participants are there in a particular group.
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      
      #LL: we pick a particular subsample of variables.
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      
      #LL: maybe simplify it to
      #month(as.Date(esampleS$entrya))
      #?
      # we can use this with current specification of esample ( participants in the evaluated programme during the evaluation period (ep -> one year) + eligible unemployed at the same period)
      # if we change  specification of esample (i.e. What happened with those ones who became unemployed in a specific year (2017) and they can enter into the program after 3 months since they become unemployed (but they can enter into program 24 months since they become unemployed too))
      
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      #LL: instead, we could have just 
      #esampleS$pcpQ<-quarter(as.Date(esampleS$entrya))
      # without the need to define participation_month
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      #LL: notice that this is a relative number, thus it can be negative(!)
      
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      #LL: infQ = 0 means that this JS entered the register in Q4 of 2016 (if ep_start=="2017-01-01")
      
      # Outflow quarter
      esampleS$Uend_month<-((year(as.Date(esampleS$exit))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$exit)) 
      esampleS$outQ<-ceiling(esampleS$Uend_month/3)
      
      ##Difference between entry into unemployment register and started of participation in measure
      esampleS$diff_entry <- ceiling(as.integer(as.Date(esampleS$entrya) - as.Date(esampleS$entry))/30.417)
      #LL: NOTEL why 30.417?
      # table(ceiling(difftime(as.Date(esampleS$entrya), as.Date(esampleS$entry), units = "days")/30.417)) 
      # parameters of difftime units  = c("auto", "secs", "mins", "hours", "days", "weeks")
      # haven't "months" unit 
      
      #Adding unemployment history
      #LL: previous unemployment history is an important predictor of both the treatment and outcomes
      # it is important to control for it.
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      #Imputing the start of the unemployment spell 
      #LL: wait, isn't it 
      # #Imputing the start of the programme ? YES 
      
      #LL: we wish to make a meaningful comparison.
      # But we're facing a problem because for the non-participants, we don't have date of entry to the course
      # well simply because they did not participate(!)
      # what we do is the following.
      # for every suitable non-participant we chose one particular quarter at random from the evaluation year that is "feasible"
      # "feasible" means that the non-participant _could_have_ potentially participated in that quarter 
      # given that we have a large donor pool, this randomness will not impact our estimates much.
      

            
      #LL: In Pmatrix, we will store for every non-participant, the feasible quarters.
      # e.g. if it is [0 1 1 1], it means that with prob. 1/3 we pick quarter 2,3 or 4
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          #LL: we loop through the participation quarters
          for (p in 1:max_pcpQ){  
            
            #LL: NOTE!!!! 
            # we restrict ourselves to only non-participants who are at the (beginning of the) particular pcpQ unemployed
            ind <- (esampleS$outQ[esampleS$treated==FALSE]>=p)
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ[esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            #LL: NOTE: what is this last commented bit?

            Pmatrix[ind,p]<-as.numeric((esampleS$infQ[esampleS$treated==FALSE])[ind]  %in% PinfQ)
            Pmatrix[!ind,p] <- 0
            
            #LL: does, for this particular quarter of inflow, exists anyone from the list of participants?
            # in other words
            # can we, in a given sample for a given quarter match it to at least one participant?
            
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        sumIsZero    <- as.logical(apply(Pmatrix, 1, FUN=sum)==0)
        sumIsNonZero <- as.logical(apply(Pmatrix, 1, FUN=sum)!=0)

        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsZero] <- 0
        #LL: for every non-participant, for whose inflow we cannot match to ANY participant
        # for any of the four quarters, we assign zero
        
        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsNonZero] <- apply(Pmatrix[sumIsNonZero,],1,hh)
        #LL: for every non-participant we pick one of the feasible quarters RANDOMLY for given esampleS (we have four of these)
        # (remember: feasible means that there exists at least one participant for inflow)
        #LL: NOTE: did we fix the seed?? if not, we should.
  
        
                #LL: loop across different time periods
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                  #LL: we add variables whether someone is in the register.
                  # we consider at most 11 unemployment spells.
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
           #LL: NOTE: can't this be in the same loop (?)
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90)) 
            #LL: NOTE: here persons with multiple unempl spell WITHIN the studied year could create problems
            
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
        for (col in colnames(result$D)){
          D <-  D[!is.na(D[,col]),]
        }    
            
        # ESTIMATION:
        m.1 <- matchit(as.formula(paste(result$spec[2], '~', result$spec[3] , sep = ' ')), 
                       data = D,
                       method = "nearest", 
                       exact = c("infQ"),
                       distance = "glm", link = "probit")
        #plot(summary(m.1))
        m.data1 <- match.data(m.1)
        match.matrix <- data.frame("untreated" = m.1$match.matrix[,1] ,
                                   "treated" = rownames(m.1$match.matrix))
        m.data1[as.character(match.matrix[, 'untreated']), 'entrya'] <- m.data1[as.character(match.matrix[, 'treated']),'entrya']
        m.data1$diff_entry <- ceiling(as.integer(as.Date(m.data1$entrya) - as.Date(m.data1$entry))/30.417)
        
        assign(paste("Mdata",s, sep=""), m.data1)
        Mdata <- bind_rows(Mdata, m.data1)
        

        assign(paste0('balancegraph',s, by=''),summary(m.1, subclass = TRUE))
        
        
        #distance in m.data1 is Propensity score
        #trim = 0.005
        #m.data1 <- m.data1[!(m.data1$distance <= trim | m.data1$distance >= (1-trim)),]
  
        
            for (iQ in O_vars){
              fit <- lm(as.formula(paste(iQ , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                        data = m.data1, 
                        weights = weights)
              res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
              att <- res[2,1]
              se <- res[2,2]
              
              resultsArray_ATT[iQ,s] <- att
              resultsArray_se[iQ,s] <- se
              
        }
    }
  } 


for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum((resultsArray_ATT[iQ,]*(N_sp/N)))
      results[iQ,2] <- sum((resultsArray_se[iQ,]*(N_sp/N)))
      #results[iQ,2] <- sum((resultsArray_se[iQ,]^2*(N_sp/N)))
}

# LL: sanity check
#plot(resultsArray_ATT[1:17,1])
#lines(resultsArray_ATT[1:17,2])
#lines(resultsArray_ATT[1:17,3])
#lines(resultsArray_ATT[1:17,4])

results<-cbind(O_vars, results)   

#Mdata<-rbind(Mdata1, Mdata2, Mdata3, Mdata4)

resultsPSM <- matrix(NA, nrow=length(O_vars), ncol = 2)

for (iQ in 1:length(O_vars)){
            fit <- lm(as.formula(paste(O_vars[iQ] , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                      data = Mdata, 
                      weights = weights)
            res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
            #LL: source of clustering?
            att <- res[2,1]
            se <- res[2,2]
            
            resultsPSM[iQ,1] <- att
            resultsPSM[iQ,2] <- se
            
        }

resultsPSM<-cbind(O_vars, resultsPSM)

#LL: why do results and results PSM give different values?
# I see, it is the standard errors. They are effectively cut in half
# that is in line with square-root convergence, because the sample size is quadrupled.

Graph 4: Balance of the groups of participants and eligible before and after weighting

match.vars <- c('distance',list_vars,paste0("empl.", seq(1,4,1), sep=""),'infQ')


balance_g  <- array(NA, dim=c(length(match.vars),2,length(Ssamples))) 
dimnames(balance_g)[[1]] <-match.vars
dimnames(balance_g)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data')

for(i in 1:length(Ssamples)){
  
  if(exists(paste0('balancegraph', i, by = ''))){
    x <- get(paste0('balancegraph', i, by = ''))
    balance_g[,1,i] <- abs(x$sum.all[,3][match(rownames(balance_g), names(x$sum.all[,3]))])
    balance_g[,2,i] <- abs(x$sum.matched[,3][match(rownames(balance_g), names(x$sum.matched[,3]))])
  }                  
}

balance_f <- array(NA, dim=c(length(match.vars),2)) 
dimnames(balance_f)[[1]] <-match.vars
dimnames(balance_f)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data') 

for (iQ in 1:length(match.vars)){
      balance_f[iQ,1] <- sum(sum((as.numeric(balance_g[iQ,1,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      balance_f[iQ,2] <-sum(sum((as.numeric(balance_g[iQ,2,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

balance_fg <- data.frame(balance = c(balance_f[,1], balance_f[,2]),
                 Balance = c(rep("before matching",length(match.vars)),rep("after matching",length(match.vars))),
                 names = c(match.vars,match.vars) ) 

p <- balance_fg %>% 
    mutate(names = fct_reorder(names, balance)) %>%
    ggplot(aes(x=balance, y=names,col=Balance)) + 
    geom_vline(xintercept = 0.05, linetype="dotted", color = 'darkgrey') + 
    geom_vline(xintercept = 0.0, color = 'darkgrey')+
    geom_vline(xintercept = 0.1, color = 'darkgrey')+
    geom_point()+
    theme_minimal()+
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")
  
p

After inspecting the achieved balance of the “control group” and participants, we can proceed with reporting the estimated effects of participation in the Commuting allowance during 2017.

3.1 Labour market outcomes of participants and eligible

First, we construct a proxy of the employment rate. It is a share of persons out of the register of unemployed JSs observed on the first day of quarters before and after participation (Quarter 0 was the quarter when participation started).

Graph 5: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(Mdata[Mdata$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(Mdata[Mdata$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs( 
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Second, we look at the number of quarters until the first exit from the register of unemployed JSs. This indicates how participation in the ALMP programme contributed to the shortening of the participants’ unemployment spell. The graph displays the whole distribution of participants and the control group based on the values of this outcome indicator.

Graph 6: Quarters until the first exit

#### THE NUMBER OF MONTHS UNTIL THE FIRST EXIT
########## Months until the first exit
# how many months after the entrya JS got a job

firstempl_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

firstempl_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

firstempl <- rbind(firstempl_P, firstempl_E)

ggplot(firstempl, aes(fill=Group, y=percent, x=quarter)) + 
          geom_bar(position="dodge", stat="identity") +
          facet_wrap(~Group) +
          theme_minimal() + 
          scale_fill_manual(values=c('grey', 'steelblue3')) +
          scale_x_continuous(breaks=seq(0,16,1)) +
          theme(legend.position="none") +
          xlab("Quarters after the start of the participation (0)") + 
          ylab("") +
          labs(caption="Source: COLSAF")+
          scale_y_continuous(labels = percent)

The following graph displays the cumulative number of months spent out of the register during the observation period following the evaluation program’s end.

Graph 7: Number of quarters out of the unemployment register after the end of the participation

## Plotting the number of months in cumulative employment
###### Participants

empl36m_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

empl36m_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

empl36m <- rbind(empl36m_P, empl36m_E)

ggplot(empl36m, aes(fill=Group, y=percent, x=quarter)) + 
  geom_bar(position="dodge", stat="identity") +
  facet_wrap(~Group) +
  theme_minimal() + 
  scale_fill_manual(values=c('grey', 'steelblue3')) +
  scale_x_continuous(breaks=seq(0,16,1)) +
  theme(legend.position="none") +
  xlab("Quarters after the start of the participation (0)") + 
  ylab("") +
  labs(caption="Source: COLSAF")+
  scale_y_continuous(labels = percent)

3.2 Estimation of the average treatment effects on the treated (ATTs)

Now, we explore the average treatment effects on the treated (ATTs).

Graph 8: The average treatment effects on the treated (ATTs) estimated for the participation in the Commuting allowance during 2017

graphATT

3.3 Statistical significance and heterogeneity of ATT effects

Further, we report the ATTs on the quarterly proxy for employment status, complemented by the indicators of:

  • Number of months until the first exit from the unemployment register JSs (firtempl)
  • Cumulative number of months outside the unemployment register of unemployed JSs (cumempl)

Table 3: The average treatment effect on the treated (ATT)

resultsDF <- select(resultsDF, -pval)
colnames(resultsDF) <- c("", "effect", "se", "sig.")

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))
effect se sig.
empl.4 -0.002 0.007
empl.3 0.015 0.009 .
empl.2 -0.019 0.009
empl.1 0.050 0.006 ***
empl0 0.808 0.009 ***
empl1 0.609 0.012 ***
empl2 0.402 0.012 ***
empl3 0.273 0.012 ***
empl4 0.187 0.011 ***
empl5 0.144 0.011 ***
empl6 0.107 0.010 ***
empl7 0.094 0.010 ***
empl8 0.061 0.010 ***
empl9 0.059 0.010 ***
empl10 0.053 0.011 ***
empl11 0.056 0.011 ***
empl12 0.057 0.011 ***
firstempl -3.163 0.067 ***
cumempl 2.910 0.081 ***
1 Significance (sig.): 0 " *** " 0.001 " ** " 0.01 " * " 0.05 " . " 0.1 " " 1

Additionally, we explore the heterogeneity in the effects programme participation has on various subgroups of participants.

We explore the ATTs for subgroups based on:

  • Gender - differences in effects for women and men
  • Education level - differences in effects for JSs with no education, primary education, lower and secondary education.
  • Share of Roma in the city of residence - differences in effects for JSs who lives in the city with the share of Roma below 10% and above 10%
  • The size of the city of residence - differences in effects for JSs who lives in the city with a population less than 4 000 and more than 4 000

Table 4: The average treatment effect on the treated: subgroup based difference between entry into unemployment register and started of participation in measure

diff_entry
Timing of participation
Sample
0-6 months
7-12 months
13+ months
effect se sig. effect se sig. effect se sig. effect se sig.
empl.4 -0.002 0.007 -0.018 0.014 0.012 0.015 -0.021 0.007 **
empl.3 0.015 0.009 . 0.063 0.013 *** -0.037 0.020 . 0.000 0.001
empl.2 -0.019 0.009
-0.007 0.020 -0.040 0.008 *** 0.000 0.000
empl.1 0.050 0.006 *** 0.041 0.011 *** 0.061 0.010 *** 0.056 0.013 ***
empl0 0.808 0.009 *** 0.772 0.015 *** 0.909 0.012 *** 0.701 0.026 ***
empl1 0.609 0.012 *** 0.553 0.018 *** 0.744 0.019 *** 0.493 0.029 ***
empl2 0.402 0.012 *** 0.365 0.019 *** 0.481 0.021 *** 0.325 0.031 ***
empl3 0.273 0.012 *** 0.246 0.018 *** 0.326 0.021 *** 0.230 0.030 ***
empl4 0.187 0.011 *** 0.159 0.016 *** 0.236 0.020 *** 0.160 0.028 ***
empl5 0.144 0.011 *** 0.124 0.015 *** 0.161 0.019 *** 0.157 0.027 ***
empl6 0.107 0.010 *** 0.076 0.014 *** 0.139 0.017 *** 0.111 0.027 ***
empl7 0.094 0.010 *** 0.069 0.014 *** 0.118 0.017 *** 0.100 0.026 ***
empl8 0.061 0.010 *** 0.044 0.014 ** 0.074 0.017 *** 0.074 0.025 **
empl9 0.059 0.010 *** 0.069 0.015 *** 0.061 0.017 *** 0.037 0.026
empl10 0.053 0.011 *** 0.056 0.015 *** 0.066 0.018 *** 0.015 0.027
empl11 0.056 0.011 *** 0.058 0.017 *** 0.072 0.019 *** 0.021 0.028
empl12 0.057 0.011 *** 0.073 0.017 *** 0.056 0.018 ** 0.026 0.028
firstempl -3.163 0.067 *** -2.767 0.092 *** -3.749 0.120 *** -2.945 0.193 ***
cumempl 2.910 0.081 *** 2.664 0.112 *** 3.444 0.139 *** 2.451 0.227 ***
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 3 668 observations 0-6 months: 1 582 observations 7-12 months: 1 380 observations 13+ months: 706 observations

Table 5: The average treatment effect on the treated: subgroup based on gender

gender
Gender
Sample
Women
Men
effect se sig. effect se sig. effect se sig.
empl.4 -0.002 0.007 -0.009 0.010 0.004 0.019
empl.3 0.015 0.009 . 0.008 0.012 0.023 0.020
empl.2 -0.019 0.009
-0.006 0.012 -0.047 0.018 **
empl.1 0.050 0.006 *** 0.044 0.008 *** 0.058 0.012 ***
empl0 0.808 0.009 *** 0.808 0.011 *** 0.802 0.017 ***
empl1 0.609 0.012 *** 0.622 0.015 *** 0.576 0.023 ***
empl2 0.402 0.012 *** 0.421 0.015 *** 0.359 0.023 ***
empl3 0.273 0.012 *** 0.293 0.015 *** 0.228 0.023 ***
empl4 0.187 0.011 *** 0.216 0.014 *** 0.126 0.020 ***
empl5 0.144 0.011 *** 0.157 0.013 *** 0.117 0.020 ***
empl6 0.107 0.010 *** 0.121 0.013 *** 0.077 0.017 ***
empl7 0.094 0.010 *** 0.102 0.012 *** 0.075 0.018 ***
empl8 0.061 0.010 *** 0.074 0.012 *** 0.035 0.018 .
empl9 0.059 0.010 *** 0.081 0.012 *** 0.015 0.017
empl10 0.053 0.011 *** 0.068 0.013 *** 0.024 0.019
empl11 0.056 0.011 *** 0.054 0.013 *** 0.060 0.020 **
empl12 0.057 0.011 *** 0.046 0.013 *** 0.083 0.019 ***
firstempl -3.163 0.067 *** -3.347 0.091 *** -2.781 0.108 ***
cumempl 2.910 0.081 *** 3.064 0.102 *** 2.577 0.144 ***
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 3 668 observations Women: 2 443 observations Men: 1 225 observations

Table 6: The average treatment effect on the treated: subgroup based on education level

education
Education
Sample
No education
Primary
Lower secondary
Upper secondary
effect se sig. effect se sig. effect se sig. effect se sig. effect se sig.
empl.4 -0.002 0.007 14.905 NaN -0.099 0.050
0.007 0.023 0.001 0.016
empl.3 0.015 0.009 . 6.169 NaN 0.023 0.048 0.007 0.023 0.013 0.018
empl.2 -0.019 0.009
-4.914 NaN 0.034 0.043 -0.021 0.022 -0.014 0.018
empl.1 0.050 0.006 *** -0.431 NaN 0.000 0.012 0.054 0.011 *** 0.035 0.011 ***
empl0 0.808 0.009 *** 1.000 NaN 0.825 0.034 *** 0.838 0.018 *** 0.793 0.016 ***
empl1 0.609 0.012 *** -4.833 NaN 0.640 0.047 *** 0.642 0.024 *** 0.577 0.021 ***
empl2 0.402 0.012 *** -2.082 NaN 0.418 0.053 *** 0.440 0.026 *** 0.352 0.021 ***
empl3 0.273 0.012 *** 9.817 NaN 0.375 0.052 *** 0.297 0.024 *** 0.234 0.019 ***
empl4 0.187 0.011 *** 4.823 NaN 0.315 0.047 *** 0.202 0.023 *** 0.150 0.017 ***
empl5 0.144 0.011 *** 3.904 NaN 0.197 0.051 *** 0.169 0.022 *** 0.110 0.017 ***
empl6 0.107 0.010 *** 3.904 NaN 0.160 0.052 ** 0.118 0.022 *** 0.071 0.015 ***
empl7 0.094 0.010 *** 25.381 NaN 0.126 0.051
0.120 0.021 *** 0.041 0.013 **
empl8 0.061 0.010 *** 27.214 NaN 0.103 0.050
0.060 0.021 ** 0.013 0.014
empl9 0.059 0.010 *** 1.732 NaN 0.095 0.049 . 0.039 0.023 . 0.036 0.016
empl10 0.053 0.011 *** -1.429 NaN 0.080 0.051 0.001 0.024 0.043 0.015 **
empl11 0.056 0.011 *** -1.429 NaN 0.034 0.054 0.026 0.024 0.041 0.017
empl12 0.057 0.011 *** -3.262 NaN 0.057 0.052 0.028 0.023 0.038 0.016
firstempl -3.163 0.067 *** -80.511 NaN -4.243 0.348 *** -3.401 0.160 *** -2.673 0.089 ***
cumempl 2.910 0.081 *** 64.738 NaN 3.426 0.386 *** 2.979 0.178 *** 2.498 0.117 ***
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 3 668 observations No education: 16 observations Primary: 279 observations Lower secondary: 932 observations Upper secondary: 1 438 observations

Table 7: The average treatment effect on the treated: subgroup based on share of Roma in the city of residence

romas
Roma share
Sample
0-10%
10-100%
effect se sig. effect se sig. effect se sig.
empl.4 -0.002 0.007 0.019 0.009
-0.074 0.024 **
empl.3 0.015 0.009 . 0.013 0.011 0.024 0.025
empl.2 -0.019 0.009
-0.021 0.012 . -0.002 0.022
empl.1 0.050 0.006 *** 0.057 0.008 *** 0.023 0.011
empl0 0.808 0.009 *** 0.801 0.010 *** 0.834 0.018 ***
empl1 0.609 0.012 *** 0.604 0.013 *** 0.618 0.027 ***
empl2 0.402 0.012 *** 0.385 0.014 *** 0.448 0.028 ***
empl3 0.273 0.012 *** 0.259 0.013 *** 0.319 0.027 ***
empl4 0.187 0.011 *** 0.172 0.013 *** 0.227 0.026 ***
empl5 0.144 0.011 *** 0.139 0.012 *** 0.158 0.024 ***
empl6 0.107 0.010 *** 0.101 0.011 *** 0.121 0.024 ***
empl7 0.094 0.010 *** 0.089 0.011 *** 0.104 0.024 ***
empl8 0.061 0.010 *** 0.057 0.011 *** 0.063 0.023 **
empl9 0.059 0.010 *** 0.060 0.011 *** 0.052 0.024
empl10 0.053 0.011 *** 0.052 0.011 *** 0.045 0.025 .
empl11 0.056 0.011 *** 0.052 0.012 *** 0.065 0.024 **
empl12 0.057 0.011 *** 0.052 0.012 *** 0.070 0.026 **
firstempl -3.163 0.067 *** -3.014 0.077 *** -3.570 0.168 ***
cumempl 2.910 0.081 *** 2.823 0.090 *** 3.125 0.197 ***
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 3 668 observations 0-10%: 2 786 observations 10-100%: 882 observations

Table 8: The average treatment effect on the treated: subgroup based on the size of the city of residence

City
Type of city
Sample
Village
City
effect se sig. effect se sig. effect se sig.
empl.4 -0.002 0.007 -0.026 0.012
0.032 0.016
empl.3 0.015 0.009 . 0.017 0.013 0.015 0.018
empl.2 -0.019 0.009
-0.018 0.013 -0.020 0.017
empl.1 0.050 0.006 *** 0.041 0.008 *** 0.064 0.012 ***
empl0 0.808 0.009 *** 0.805 0.012 *** 0.813 0.014 ***
empl1 0.609 0.012 *** 0.598 0.015 *** 0.626 0.019 ***
empl2 0.402 0.012 *** 0.403 0.016 *** 0.398 0.020 ***
empl3 0.273 0.012 *** 0.280 0.016 *** 0.263 0.019 ***
empl4 0.187 0.011 *** 0.193 0.015 *** 0.179 0.017 ***
empl5 0.144 0.011 *** 0.144 0.014 *** 0.140 0.017 ***
empl6 0.107 0.010 *** 0.109 0.013 *** 0.098 0.016 ***
empl7 0.094 0.010 *** 0.092 0.013 *** 0.089 0.016 ***
empl8 0.061 0.010 *** 0.067 0.012 *** 0.048 0.016 **
empl9 0.059 0.010 *** 0.062 0.013 *** 0.054 0.017 ***
empl10 0.053 0.011 *** 0.047 0.014 *** 0.058 0.017 ***
empl11 0.056 0.011 *** 0.046 0.014 *** 0.067 0.018 ***
empl12 0.057 0.011 *** 0.054 0.014 *** 0.061 0.018 ***
firstempl -3.163 0.067 *** -3.196 0.088 *** -3.069 0.102 ***
cumempl 2.910 0.081 *** 2.901 0.108 *** 2.893 0.128 ***
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 3 668 observations Village: 2 214 observations City: 1 454 observations

4. Technical appendix

The technical appendix provides additional details on the data processing procedures, complemented by results obtained by an alternative estimation strategy based on inverse probability weighting.

4.1 Sample Selection

This section displays additional detail about the selection and cleaning of the evaluation sample.

info_table <- data.frame(matrix(ncol=2,nrow=0, dimnames=list(NULL, c("Desciption", "Value"))))
info_table[1,] <- c('Start of the evaluation period', paste(ep_start))
info_table[2,] <- c('End of the evaluation period', paste(ep_end))

info_table <-
  info_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)


sample_selection <- data.frame(matrix(ncol=3,nrow=0, dimnames=list(NULL, c("Desciption", "Dropped","Total"))))

sample_selection[1,] <- c('Total registrations ', 0,format(n1, big.mark=" ", scientific=FALSE))
sample_selection[2,] <- c('Total eligible jobseekers', 0,format(n2, big.mark=" ", scientific=FALSE))
sample_selection[3,] <- c('Total participations in the evaluated measure', 0,format(npart1, big.mark=" ", scientific=FALSE))
sample_selection[4,] <- c('Total participants in the evaluated measure', 0 ,format(npart0, big.mark=" ", scientific=FALSE))

x<-n1-n4
y<-npart1-n3
sample_selection[5,] <- c('Dropped eligible JSs', format(n4, big.mark=" ", scientific=FALSE), format(x, big.mark=" ", scientific=FALSE))
sample_selection[6,] <- c('Dropped participants', format(n3, big.mark=" ", scientific=FALSE), format(y, big.mark=" ", scientific=FALSE))

x<-x-n6
y<-y-n5
sample_selection[7,] <- c('Dropped eligible JSs', format(n6, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[8,] <- c('Dropped participants', format(n5, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n7
sample_selection[9,] <- c('Dropped participants', format(n7, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

x<-x-n9
y<-y-n8
sample_selection[10,] <- c('Dropped eligible JSs', format(n9, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[11,] <- c('Dropped participants', format(n8, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-nrowdfa-n10
sample_selection[12,] <- c('Inflating participants by merging dataframes', 0,format(nrowdfa, big.mark=" ", scientific=FALSE))
sample_selection[13,] <- c('Dropped participants', format(n10, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n11
sample_selection[14,] <- c('Dropped participants', format(n11, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

sample_selection[15,] <- c('Eligibles', format(x-(length(unique(esample$klient_id[esample$treated==FALSE]))), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[16,] <- c('Participants', format(y-length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection[17,] <- c('Eligibles', 0, format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[18,] <- c('Participants', 0, format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection$Variable <- c('All registrations (before cleaning)', 'All registrations (before cleaning)', 'All registrations (before cleaning)','All registrations (before cleaning)',
                             
                             'Dropped JSs with ALMP participation 2 years before the EP', 'Dropped JSs with ALMP participation 2 years before the EP',
                             
                             'Dropped JSs with ALMP participation in other ALMP during the EP', 'Dropped JSs with ALMP participation in other ALMP during the EP',
                             
                             'Dropped participants with multiple ALMP participations in evaluated programme',
                             
                             'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',  'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',
                             
                             'Dropped participations which not happening during an unemployment spell','Dropped participations which not happening during an unemployment spell', 
                             
                             'Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure ',
                             
                             'Dropped JSs with multiple registrations', 'Dropped JSs with multiple registrations',
                             
                             'Total registrations after cleaning', 'Total registrations after cleaning'
)

sample_selection <-sample_selection[,-4] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)%>%
  pack_rows(index = table(fct_inorder(sample_selection$Variable)))

Table 9: Information table about the beginning and end of the evaluation period

info_table
Desciption Value
Start of the evaluation period 2017-01-01
End of the evaluation period 2017-12-31

Table 10: Data cleaning documentation

sample_selection
Desciption Dropped Total
All registrations (before cleaning)
Total registrations 0 316 875
Total eligible jobseekers 0 300 626
Total participations in the evaluated measure 0 5 014
Total participants in the evaluated measure 0 4 966
Dropped JSs with ALMP participation 2 years before the EP
Dropped eligible JSs 47 278 269 597
Dropped participants 1 166 3 848
Dropped JSs with ALMP participation in other ALMP during the EP
Dropped eligible JSs 43 808 225 789
Dropped participants 1 404 2 444
Dropped participants with multiple ALMP participations in evaluated programme
Dropped participants 63 2 381
Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP
Dropped eligible JSs 25 970 199 819
Dropped participants 142 2 239
Dropped participations which not happening during an unemployment spell
Inflating participants by merging dataframes 0 2 349
Dropped participants 490 1 859
Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure
Dropped participants 19 1 840
Dropped JSs with multiple registrations
Eligibles 10 873 188 946
Participants 0 1 840
Total registrations after cleaning
Eligibles 0 188 946
Participants 0 1 840

4.2 Description of explanatory variables (used in balancing)

Description of variables, used in the estimation of ATTs. We also show their balance before and after matching in graph 4 and balance before and after weighting in graph 10.

Table 11: List and description of variables used in estimation

#TREBA DOPLNIŤ DO POSTUP_EXPORT
Q[["df"]][["flang"]] <- "knowledge of a foreign language (1: Yes, 0: No)"
Q[["df"]][["drive"]] <- "driving license holder (1: Yes, 0: No)"
Q[["df"]][["pc"]] <- "computer skills (1: Yes, 0: No)"
Q[["df"]][["unpast"]] <- "registered citizen in the past (1: Yes, 0: No), unemployed in past"
Q[["df"]][["min_urad"]] <- "traveling time to the nearest Labour office (in minutes)"
Q[["df"]][["min_BA"]] <- "traveling time to the Capital city - Bratislava (in minutes)"
Q[["df"]][["population"]] <- "number of inhabitants of the place of residence"



list_vars_table <- data.frame(Variables = list_vars)

desc <- c()
for (i in list_vars_table$Variables){
  Q[["df"]][[i]][1]
  desc <- append(desc, ifelse(is.null(Q[["df"]][[i]][1]), NA, Q[["df"]][[i]][1]))
}

list_vars_table$Description <- desc
list_vars_table[list_vars_table$Variables == "ent","Description"] <- "difference between entry into unemployment register and started of evalueted period"
list_vars_table[list_vars_table$Variables == "UR_region","Description"] <- "unemployment rate in region during valuated period"
list_vars_table[list_vars_table$Variables == "roma_share","Description"] <- "share of Roma in the place of residence"

list_vars_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) 
Variables Description
ent difference between entry into unemployment register and started of evalueted period
male combination of klient_id and the date of entry is an Unique ID. (we created and used dup column to identify duplicates and remove them)
married marital status: single
kids nationality: other
slovak marital status: other
noedu kids under 10 years
primary level of education: no education
lsec level of education: primary
usec level of education: ower secondary
flang knowledge of a foreign language (1: Yes, 0: No)
drive driving license holder (1: Yes, 0: No)
pc computer skills (1: Yes, 0: No)
unpast registered citizen in the past (1: Yes, 0: No), unemployed in past
min_urad traveling time to the nearest Labour office (in minutes)
min_BA traveling time to the Capital city - Bratislava (in minutes)
UR_region unemployment rate in region during valuated period
roma_share share of Roma in the place of residence
population number of inhabitants of the place of residence
age gender (1: man, 0: woman)

4.3 Alternative estimation strategy (Inverse probability weighting)

In this section, we explore the sensitivity of our results to an alternative model specifications. The main results, reported above, were obtained by [propensity score matching] (https://cran.r-project.org/web/packages/MatchIt/MatchIt.pdf). An alternative estimation strategy was applied using an [inverse probability weighting estimator] (https://www.rdocumentation.org/packages/causalweight/versions/1.0.2/topics/treatweight).

#Estimation parameters
Ssamples <- seq(1,4)
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) # Month of participation since the start of the evaluation period
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")

list_vars <- c('ent', 'male', 'married','kids',
           'slovak', 'noedu','primary', 'lsec', 'usec',
           'flang', 'drive', 'pc',
           'unpast', 'min_urad', 'min_BA',
           'UR_region', 'roma_share', 'population', 'age')

#     # All potentially useful explanatory variables (Xs)
#     list_vars <- c('ent', 'male', 'single', 'married','kids',
#                    'slovak', 'hungarian', 'roma', 
#                    'noedu','primary', 'lsec', 'usec', 'tertiary'
#                    , 'zaujem_vzdel',
#                    'flang', 'drive', 'pc',
#                    'healthy', 'barrier', 'graduate', 'ziad_undn_sp', 'cvyhl_poisteu', 
#                    'empl', 'unpast', 'employee', 'selfempl', 'zaujem_szco',
#                    'look_ptime', 'commute', 'relocate', 'zaujem_zam_zahr',
#                    'min_kraj', 'min_urad', 'min_BA', 
#                    'UR_region', 'roma_share', 'population', 
#                    ageg_dummies,
# #                    paste0("urad_",seq(from=1, to=46), sep=""), 
#                    paste0("isco1_",seq(from=1, to=3), sep=""),
#                    paste0("odbor1_",seq(from=1, to=5), sep=""))
# #                   paste0(colnames(df)[grepl("nace1_",colnames(df))], sep=""))

Balance_vars <- list_vars
# Result Matrixes
  N<-nrow(esample[esample$treated==TRUE,])
  N_sp <- matrix(NA, nrow=length(Ssamples)) 
  
  resultsArray_ATT  <- array(NA, dim=c(length(O_vars),7,length(Ssamples))) 
  dimnames(resultsArray_ATT)[[1]] <- c(O_vars)
  dimnames(resultsArray_ATT)[[2]] <- c('ATT', 'se', 'pval', 'Y1', 'Y0', 'SampleSize', 'Sign.')
  results <- array(NA, dim=c(length(O_vars),4))

  balance_matrix_w <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_w)[[1]] <- c(list_vars)
  
  results_bv_W <- array(NA, dim=c(1,length(list_vars)))
  
  balance_matrix_un <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_un)[[1]] <- c(list_vars)
  
  results_bv_un <- array(NA, dim=c(1,length(list_vars)))
  
### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#NonPart majú iba spodné kritérium minimálnej dĺžky nezamestnanosti 
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

DataSample <- c()

for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      
      #Adding unemployment history
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      
      #Imputing the start of the unemployment spell 
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          for (p in 1:max_pcpQ){  
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ
                          [esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            Pmatrix[,p]<-as.numeric(esampleS$infQ[esampleS$treated==FALSE]  %in% PinfQ)
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)==0) & esampleS$treated == FALSE] <- 0
        Pmatrix<-Pmatrix[as.logical(apply(Pmatrix, 1, FUN=sum)>0),]
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)>0) & esampleS$treated == FALSE] <- apply(Pmatrix,1,hh)
        
  
        
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90))    
        
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
            for (col in colnames(result$D)){
               D <-  D[!is.na(D[,col]),]
             }  
        # ESTIMATION:
            
            d = D$treated*1
            x = as.matrix(D[,c(list_vars, paste0("empl.", seq(1,4,1), sep=""))])
            y_mat <- D[,c(paste0("empl.", seq(1,4,1), sep=""), 
                                    paste0("empl", seq(0,12,1), sep=""), 
                                    'firstempl', 'cumempl')]
            
            att <- treatweight_pmp(y = y_mat, d, x, s = NULL, z = NULL, selpop = FALSE, trim = 0.05, ATET = TRUE, logit = TRUE, boot = 10)
            #att <- treatselDML(y = y_mat, d, x, s = d, z=x, selected=1)
            
            resultsArray_ATT[,1,s] <- round(att$effect,3)
            resultsArray_ATT[,2,s] <- round(att$se,3)
            resultsArray_ATT[,3,s] <- round(att$pval,3)
            resultsArray_ATT[,4,s] <- round(att$y1,3)
            resultsArray_ATT[,5,s] <- round(att$y0,3)
            resultsArray_ATT[,6,s] <- format(length(d)-att$ntrimmed, big.mark=" ", scientific=FALSE)
            resultsArray_ATT[,7,s] <- stars.pval(att$pval)
            
            
            DataSample <- bind_rows(DataSample, D)
            
            #Balance
            #Generating the propensity score variable
            PSmodel<-glm(result$spec, family=binomial(link = "logit"), data=D)
            #print(summary(PSmodel))
            D$PSvar<-as.numeric(PSmodel$fitted.values)
            
            #LL: old weights INCORRECT
            #w_ATE <- D$treated/D$PSvar + (1-D$treated)/(1-D$PSvar)
            
            #LL: new weights CORRECT
            w_ATE <- D$treated + (1-D$treated)*D$PSvar/(1-D$PSvar)
            
                          #Balance_vars <- colnames(result$D)[colnames(result$D) %in% list_vars]
                          for (bv in Balance_vars){
                          
                              if (apply(D[,bv,drop = FALSE] ,2,function(x) { all(x %in% c(0:1)) }) ) {
                                #unweighted discrete
                                p_treat <- apply(D[D$treated==1,bv,drop = FALSE],2,mean)
                                p_contr <- apply(D[D$treated==0,bv,drop = FALSE],2,mean)
                                balance_matrix_un[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )

                                #weighted discrete
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                
                                balance_matrix_w[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )
                            
                              } else {
                                #unweighted continuous
                                balance_matrix_un[bv,s] <- abs( 100*(apply(D[D$treated==1,bv,drop = FALSE],2,mean) - apply(D[D$treated==0,bv,drop = FALSE],2,mean))/
                                                  sqrt( (apply(D[D$treated==1,bv,drop = FALSE],2,sd)^2 + apply(D[D$treated==0,bv,drop = FALSE],2,sd)^2)/2 ) )

                                #weighted continuous
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                p_treat_var <- ( sum(w_ATE[D$treated==1]) / (sum(w_ATE[D$treated==1])^2 - sum(w_ATE[D$treated==1]^2)) )* 
                                  t(w_ATE[D$treated==1]) %*% ((D[D$treated==1,bv] - c(p_treat))^2)
                                p_contr_var <- ( sum(w_ATE[D$treated==0]) / (sum(w_ATE[D$treated==0])^2 - sum(w_ATE[D$treated==0]^2)) ) * 
                                  t(w_ATE[D$treated==0]) %*% ((D[D$treated==0,bv] - c(p_contr))^2)
                                
                                balance_matrix_w[bv,s] <- abs(100* (p_treat - p_contr ) / 
                                            sqrt( (p_treat_var + p_contr_var )/2 ) )
                              }

                          }
    }
} 
  

for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum(sum((as.numeric(resultsArray_ATT[iQ,'ATT',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,2] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'se',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,3] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y1',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,4] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y0',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

results<-cbind(O_vars, results) 
colnames(results) <- c('O_vars', 'ATT', 'se','Y1', 'Y0')

results_bv <- array(NA, dim=c(length(list_vars),2)) 
dimnames(results_bv)[[1]] <- c(list_vars)
dimnames(results_bv)[[2]] <- c("unweighted", "weighted")


for (bVar in 1:length(list_vars)){
      results_bv[bVar,1] <- abs(sum(sum((balance_matrix_w[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
      results_bv[bVar,2] <- abs(sum(sum((balance_matrix_un[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
}

We can explore the sensitivity of our results by displaying the average treatment effects on the treated (ATTs) computing by different approach by inverse probability weighting.

Graph 10: Balance of the groups of participants and eligible before and after weighting x

BVDF <- data.frame(balance_vars = c(rownames(results_bv),rownames(results_bv)),
                   balance = c(results_bv[,1], results_bv[,2]),
                   Balance = c(rep("before weighting",nrow(results_bv)),rep("after weighting",nrow(results_bv))))
                          
BVDF %>% subset(!is.na(balance)) %>% 
    mutate(balance_vars = fct_reorder(balance_vars, balance)) %>%
    ggplot(aes(x=balance, y=balance_vars,col=Balance)) + 
    geom_point() +
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme_minimal()+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")+
    geom_vline(xintercept = 0.0, color = 'darkgrey')

We show a proxy of the employment rate to see a share of persons out of the register of unemployed JSs observed at the first day of quarters before and after the start of participation (Quarter 0).

Graph 11: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(DataSample[DataSample$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(DataSample[DataSample$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs(
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Graph 12: The average treatment effects on the treated (ATTs) estimated for the participation in the Commuting allowance during 2017

graphATT

Finaly, we report the ATTs in table computing computing by different approach by inverse probability weighting.

Table 12: The average treatment effect on the treated

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance codes (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))
efekt se sig.
empl.4 -0.001 0.001
empl.3 0.000 0.000
empl.2 -0.001 0.000 ***
empl.1 0.000 0.000
empl0 0.760 0.012 ***
empl1 0.598 0.014 ***
empl2 0.446 0.012 ***
empl3 0.349 0.013 ***
empl4 0.264 0.014 ***
empl5 0.201 0.013 ***
empl6 0.159 0.013 ***
empl7 0.128 0.013 ***
empl8 0.094 0.013 ***
empl9 0.077 0.014 ***
empl10 0.062 0.015 ***
empl11 0.065 0.012 ***
empl12 0.070 0.012 ***
firstempl -3.690 0.082 ***
cumempl 3.272 0.109 ***
1 Significance codes (sig.): 0 " *** " 0.001 " ** " 0.01 " * " 0.05 " . " 0.1 " " 1

References

Daniel E. Ho, Kosuke Imai, Gary King, Elizabeth A. Stuart (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software, Vol. 42, No. 8, pp. 1-28. https://doi.org/10.18637/jss.v042.i08

Frölich, M., Huber, M. (2014): “Treatment Evaluation With Multiple Outcome Periods Under En- dogeneity and Attrition”, Journal of the American Statistical Association, 109, 1697-1711.

https://bookdown.org/yihui/rmarkdown-cookbook/parameterized-reports.html

LMP Qualitative data in MS Excel, downloaded from (accessed at 14th of April 2021) : https://ec.europa.eu/social/main.jsp?catId=1143&intPageId=3227&langId=en