Impact evaluation of Slovak active labour market policy programmes

Prepared by Soňa Dulíková, Lukáš Lafférs and Miroslav Štefánik (miroslav.stefanik@savba.sk)
Institute of Economic Research, Slovak Academy of Sciences
This work was supported by the Slovak Research and Development Agency under the contract no. APVV-17-0329.

The Mobility allowance

This is an automated report providing evidence on Mobility allowance, one of the active labour market policy (ALMP) programmes, implemented in Slovakia between during 2017.
The Mobility allowance is provided based on the § 53 of the Act on Employment Services Col. 2004/5.

if(qualitative_data_condition) {
  knitr::knit_exit()
}

1. The description of the programme

Based on the Labour Market Policy Database (LMP) administrated by DG Employment of the European Commission, the Mobility allowance is classified as an “Employment incentives” type of ALMP programme, with the programme specific code 41_SK25.

Mobility allowance aims: Employment maintenance incentives

Beneficiaries are registered job seekers who changes the place of permanent residence to the new employment location.

The Eligible are job seekers registered at least 3 months before starting to work or before performance of self-employment gainful activity on the basis of application in writing.

Implementation: Contribution for movement toward work is defined as a compensation of documented expenses occurring in link with moving a job seeker from the place of permanent residence to the new employment location, providing that the new location is at least 50 km distant from the place of previous permanent residence. The Labour Office may grant the contribution once in two years in the form of a lump sum not exceeding 1327,76 EUR. A legislative change in relation to this intervention instrument has occurred in 2016. Current wording of 53a - Contribution to promote labour mobility. The Labour Office provides a monthly contribution to support labour mobility to cover part of the cost of living related to a change of residence in connection with the acquisition of employment for at least six months to an employee who has been a registered jobseeker for at least three months and who was excluded from the register of jobseekers for reasons pursuant to 36 par. (1) a) if he/she requests the contribution in writing no later than three months after the date of delisting from the register of job seekers. Attachment to the application is a copy of the document confirming the change of permanent residence or the report of the temporary residence and a copy of the employment contract. An allowance shall not be granted if an employee is already recieving a contribution under Article 53. Change of residence for the purpose of providing a contribution is a change of residence or a temporary residence report if the place of a new permanent residence in the territory of the Slovak Republic is at least 70 km away from the place of original permanent residence or if the place of temporary stay in the Slovak Republic is remote from the permanent residence at least 70 km. The contribution is granted for a maximum of six months. An employee who has been recruited to disadvantaged jobseekers shall also be provided for a further maximum of six months. The monthly amount of the contribution shall be 80% of the amount of the expenditure referred to in paragraph 3, not more than EUR 250 during the period referred to in the first sentence and not more than EUR 125 during the period referred to in the second sentence.

1.1 Participants and expenditures

#Preparing of participants and expenditures tables 
partSK <- subset(part, geo == 'SK' & year == format(as.Date(params$ep_start),"%Y") & age == 'TOTAL' & sex == 'T' & stk_flow == 'ENT' )
names(programesSK)[1] <- 'lmp_type'
partSK <- merge(partSK, programesSK, by = 'lmp_type', all = TRUE)
partSK <- subset(partSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
partSK_datapie <- subset(partSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
partSK_datapie$value <- ifelse(is.na(partSK_datapie$value), sum(partSK[substr(partSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), partSK_datapie$value)


expSK <- subset(exp, year == format(as.Date(params$ep_start),"%Y"))
expSK <- merge(expSK, programesSK, by = 'lmp_type', all = TRUE)
expSK <- subset(expSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
expSK_datapie <- subset(expSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
expSK_datapie$value <- ifelse(is.na(expSK_datapie$value), sum(expSK[substr(expSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), expSK_datapie$value)

#Share of expenditures and participants at programm

partSK_per <- partSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

expSK_per <- expSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

program_part <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            participants = c(sum(partSK_per$value), subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1] / sum(partSK_per$value) * 100,2),'%'))
                            )

program_exp <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            expenditures = c(sum(expSK_per$value), subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1] / sum(expSK_per$value) * 100,2),'%'))
                            )


type_share_par <- partSK_per$per[partSK_per$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$Classification[1])] 
prog_share_par <- program_part$per[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_par <- program_part$participants[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

type_share_exp <- expSK_per$per[expSK_per$lmp_type == subset(qualitative, qualitative$almp == params$measure)$Classification[1]]
prog_share_exp <- program_exp$per[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_exp <- program_exp$expenditures[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

The left pie chart Participants displays the share of participants in ALMP programmes grouped by ALMP types of the LMP classification. Shares are based on LMP Database stock figures reported for the calendar year 2017.

The right pie chart Expenditure display the share of expenditure using the same LMP typology as the Participants pie chart. Comparing the share on the total pie of participants to the share on the total pie of expenditures provides an assessment of unit costs of the programme relative to the average. If the percentage of participants is higher than on total spending, then the programme is less expensive than the average ALMP programme (and vice versa).

Graph 1: Resources flowing to Mobility allowance during 2017

#Preparing data for Pie Chart
#PARTICIPANTS
piechart_P <- select(partSK_datapie, lmp_type, value)

piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == '2' | piechart_P$lmp_type == '4' | piechart_P$lmp_type == '5' | piechart_P$lmp_type == '6' | piechart_P$lmp_type == '7',       
                              qualitative$class[match(piechart_P$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_P <- filter(piechart_P, !duplicated(piechart_P$lmp_type))
piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_P$lmp_type)

piechart_P <- piechart_P[order(piechart_P$lmp_type),]
piechart_P$focus <- ifelse(
  piechart_P$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)

#EXPENDITURES 
piechart_E <- select(expSK_datapie, lmp_type, value)

piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == '2' | piechart_E$lmp_type == '4' | piechart_E$lmp_type == '5' | piechart_E$lmp_type == '6' | piechart_E$lmp_type == '7',       
                              qualitative$class[match(piechart_E$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_E <- filter(piechart_E, !duplicated(piechart_E$lmp_type))
piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_E$lmp_type)

piechart_E <- piechart_E[order(piechart_E$lmp_type),]
piechart_E$focus <- ifelse(
  piechart_E$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)


## PIE CHART
#PARTICIPANTS
piechart_P$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_P$lmp_type)
ev_measure <- gsub("(.{25,}?)\\s", "\\1\n",
                   paste(subset(qualitative, qualitative$almp == params$measure)$class[1],
                         subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
                         sep =': '))

ev_type <- gsub("(.{25,}?)\\s", "\\1\n",
                subset(qualitative, qualitative$almp == params$measure)$class[1])

piechart_P$value <- ifelse(piechart_P$lmp_type == ev_type, piechart_P$value - piechart_P[piechart_P$lmp_type == ev_measure,"value"], piechart_P$value)

piechart_P <- piechart_P %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_P <- ggplot(piechart_P) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_P, stat = 'pie',color='white')+
  ggtitle('Participants') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_P$value != 0 &  piechart_P$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_P$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))

#EXPENDITURES
piechart_E$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_E$lmp_type)

piechart_E$value <- ifelse(piechart_E$lmp_type == ev_type, piechart_E$value - piechart_E[piechart_E$lmp_type == ev_measure,"value"], piechart_E$value)

piechart_E <- piechart_E %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_E <- ggplot(piechart_E) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_E, stat = 'pie',color='white')+
  ggtitle('Expenditure') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_E$value != 0 &  piechart_E$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_E$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))

Based on the LMP database, 865 individuals participated in the Mobility allowance during 2017. This accounted for the 0.45 % percent of the total number of participants in all Slovak ALMP programmes (LMP types 2-7), while the share of the programme on the total expenditure was 0.77 %. The total roofing LMP type Employment incentives presents 60.68 % percent of the total Slovak ALMP (type 2-7) expenditure and 61.68 % percent of the total ALMP participants.

1.2 The Mobility allowance in the context of ALMPs in Slovakia

Using administrative data, we first provide a picture of the importance of the Mobility allowance in the context of Slovak ALMP. The following flowchart displays flow of job seekers registered into the database of unemployment during . Flows are based on the movement of these individuals during the two years following their registration; this period is divided into six months sub-periods (0/6/12/18/24). For each of these sub-periods, we observe the flows of registered jobseekers to employment, or their de-registration for another reason.
Job seekers can also move to one of the ALMP programmes. The lines highlighted in red represent the flow of jobseekers participating in measure P53A, Mobility allowance.

Graph 2: The Mobility allowance in the structure of the flows of jobseekers registered during 2017

The next table shows full names of programmes shown in the graph above.

Table 1: Explanatory table for Graph 2

Sankey_description <- select(qualitative, almp, Labour.market.services, Labour.market.services_SK)
Sankey_description[32,1] <- 'P054'
Sankey_description<- rbind(Sankey_description, Sankey_description[32,])
Sankey_description[nrow(Sankey_description),1] <- 'P54O'
Sankey_description[28,1] <- 'P54D'

Sankey_tabel1 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
Sankey_tabel1$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel1$Measures, Sankey_description$almp)]

Sankey_tabel1  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
  kable_classic('hover', full_width = FALSE)%>%
  column_spec(1,  border_right = TRUE)

Measures	Name of programme
P54R	[Component] Projects and programmes - Projects - Education and training Repas
P053	Commuting allowance
P054	Projects and programmes
P051	Support for graduate work experience
P54O	[Component] Projects and programmes - Self-employment
P060	Reimbursement of operating costs of sheltered workshops and workplaces
P54K	[Component] Projects and programmes - Projects - Education and training
P052	Work in minor services for municipalities or self-governing regions
P52A	Voluntary work
P049	Self-employment grant
P53A	Mobility allowance

## Preparing of df
# we observe only  young people under XX (age) which inflow in XXXX (entry year) 
df_r <- subset(df, age <= age_group_max)
df_r <- subset(df_r, format(as.Date(df_r$entry),"%Y")== format(as.Date(params$ep_start),"%Y"))
#df_r <- df_r %>% mutate_all(na_if,"") #if cells are empty -> change it to NAs
df_r <- subset(df_r, healthy < 3)

# DOVOD VYRADENIA 
dovod_vyradenia = c('V01','V02','V03','V1','V12','V15') #zamestnali sa 

# Opatrenie P032 vyhodiť (ak sa budú meniť aj iné opatrenia ako napr. 54R a 54Rp tak tu sa to opraví (%in% c()))
delete <- setdiff(c('P032','P54P','P54D'),params$measure)
Salmps <- subset(almps, !nastroj  %in% delete)

## Spojenie DF a ALMPS 
df_almps <- merge(Salmps, df_r, by = 'klient_id')

## Vyfiltruj iba tie klient_id. pri ktorých nástroj je params$measure
partic_measure <- df_almps[df_almps$nastroj == params$measure, 'klient_id']
df_almps <- filter(df_almps, klient_id %in% partic_measure) #tu su tí, ktorí boli na aj na params$measure ale aj na iných opatreniach 

# Podmienka prieniku času pri databáze nezamestnaných a v zúčastnili sa evaluated measure
df_almps_measure <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj == params$measure) # tu su tí, ktorí išli v skúmanom čase do opatrenia params$measure 
##### toto je moja základňa môj prvý stĺpec 

##### Podmienka prieniku času pri databáze nezamestnaných a iných programoch ako evaluated measue ale zároveň sú to tí, ktorí už niekedy na evaluated measure už boli 
df_almps_other <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj != params$measure)

# upravíme si dáta ktoré budeme používať pri grafe
df_almps_measure <- select(df_almps_measure, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)
df_almps_other <- select(df_almps_other, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)


#smojím iba tých ktorí boli aj v evaluated measure aj v iných opatreniach, all.x = TRUE lebo chceme aj tých ktorý boli iba na evaluated measure (nemuseli sa zúčastniť aj iných programov)
flow <- merge(df_almps_measure, df_almps_other, by = 'klient_id', all.x = TRUE) 

#dni od začatia opatrenia entrya.x po začatie iného opatrenia entrya.y alebo po zamestnanie/odchod z iného dôvodu exit.x
flow$days <- ifelse(is.na(flow$nastroj.y),
                    as.numeric(difftime(flow$exit.x, flow$entrya.x, units = 'days')),
                    as.numeric(difftime(flow$entrya.y, flow$entrya.x, units = 'days')))

#ak je dovod vyradenia NA ale zúčastnili sa na opatrení 
flow$dovod_vyradenia_kod.y <- ifelse(is.na(flow$dovod_vyradenia_kod.y) & !is.na(flow$nastroj.y), 'V01', flow$dovod_vyradenia_kod.y)

#vyfiltruj tých ktorý boli aj na evaluated measure aj na inom opatrení alebo sa zamestnali a days nie je záporné 
#podmienka -> entrya do ďalšieho projektu musí byť väčšie ako entrya do evaluated measure
#flow$days nemôže byť záporné 
flow <- filter(flow, days >= 0)

#nastroj -> ak dovod vyradenia sa rovna nejakému prvku z vektoru dovodov vyradenia tak -> employed inak another reason
flow  <- flow %>% 
  mutate(nastroj.y = case_when(is.na(flow$nastroj.y) & flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'employed',
                               is.na(flow$nastroj.y) & !flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'another reason',
                               !is.na(flow$nastroj.y) ~ flow$nastroj.y)
  )


#dataframe, ktorý budem používať pri tvorbe grafu
Sankey_measure <- flow %>% select(nastroj.x, nastroj.y, days) %>% 
           mutate(month = ceiling(days/30.417))

Sankey_measure <- Sankey_measure %>%mutate(
  time = case_when(
    Sankey_measure$month %in% seq(0,6,1)  ~ 6,
    Sankey_measure$month %in% seq(7,12,1)  ~ 12, 
    Sankey_measure$month %in% seq(13,18,1)  ~ 18,
    Sankey_measure$month %in% seq(19,100,1)  ~ 24,
  ) 
)

Sankey_measure$sources <- ifelse(Sankey_measure$time == 6 | Sankey_measure$time == 12 |
                                   Sankey_measure$time == 18 | Sankey_measure$time == 24, 
                                 Sankey_measure$time - 6, 
                                 Sankey_measure$time)


#zosumarizuj, koľký mladí išli do ktorého opatrenia, zamestnali sa alebo odišli z registra z iných dôvodov
San_measure <- Sankey_measure %>% select(nastroj.y, time, sources) %>%
  group_by(nastroj.y, time, sources) %>% summarise(num = n(), .groups = 'drop') %>%
  rename(nastroj = nastroj.y)


# rozdeľ opatrenia, na tie ostatné almps - OTHER ALMPS 
nastroj_kod <- c('another reason','employed')
NastrojKod <- San_measure[!San_measure$nastroj %in% nastroj_kod,]  %>%  group_by(nastroj) %>% summarise(num = sum(num), .groups = 'drop') %>%
  mutate(perc = num*100 / sum(num),
         cut = case_when(perc >= 5 ~ 1,
                         perc < 5 ~ 0))
nastroj_kod <- c(nastroj_kod, NastrojKod$nastroj[NastrojKod$cut == '1'])

San_aplmps <- subset(San_measure, nastroj %in% nastroj_kod | nastroj == params$measure)
San_other_aplmps <- subset(San_measure, !nastroj %in% nastroj_kod & !nastroj == params$measure)

San_other_aplmps <- San_other_aplmps %>%  group_by(sources , time) %>% summarise(num = sum(num), .groups = 'drop') 
San_other_aplmps$nastroj <- 'OTHER ALMPS'
San_other_aplmps <- relocate(San_other_aplmps, c(nastroj, time), .before = sources,)

San_measure <- rbind(San_aplmps, San_other_aplmps)
remove(San_aplmps, San_other_aplmps)


# uzly grafu (jedinečné), musia tu byť všetky opatrenia
node_m <- data.frame(
  name=c(as.character(San_measure$nastroj), as.character(San_measure$sources))%>% unique()
)

# definovanie koľko registrovaných bude medzi tými rokmi  
velky_df <- data.frame()
for (i in seq(6,24,6)){
  pocet <- San_measure %>%  group_by('sources' = sources >= i) %>% summarise(num = sum(num), .groups = 'drop') 
  pocet <- subset(pocet, sources == TRUE)
  pocet$sources <- i
  velky_df <- rbind(velky_df, pocet)
}

# musím si velky_df prisposobiť tak, aby roky boli ako nodes aby som to mohla spojiť s dataframe San s ktorým potom budem ďalej robiť graf
# preto sources budu ako nastroj -> aby som spravila nodes, years su sources ale sources su years -1 v skutočnosti (v san grafe)
colnames(velky_df) <- c('nastroj', 'num')
velky_df$time <- velky_df$nastroj
velky_df$sources <- San_measure$sources[match(velky_df$time, San_measure$time)] 
velky_df <- relocate(velky_df, num, .after = sources)

San_measure <- rbind(San_measure, velky_df)

#urobím IDsources a ID target podľa uzlov aby garf vedel ten flow medzi jednotlivími uzlami 
San_measure$IDsource <- match(San_measure$sources, node_m$name)-1 
San_measure$IDtarget <- match(San_measure$nastroj, node_m$name)-1

#Color 
time <- seq(0,24,6)
NOALMP <- c('another reason', 'employed')
node_m <- node_m %>% mutate(group = case_when(node_m$name %in% NOALMP ~ 'A',
                                              !node_m$name %in% NOALMP & !node_m$name %in% time ~ 'B',
                                              node_m$name %in% time ~ 'C'
)
)

San_measure$group <- 'type_a'

my_color <- 'd3.scaleOrdinal() .domain(["type_a", "A","B", "C"]) .range(["lightgray", "darkseagreen", "thistle", "rosybrown", "red"])'

San_measure <- as.data.frame(San_measure)

Sankey_ev.measure <- sankeyNetwork(Links = San_measure, Nodes = node_m,
                                   Source = "IDsource", Target = "IDtarget",
                                   Value = "num", NodeID = "name", 
                                   sinksRight=F, fontSize = 14,
                                   fontFamily = "sans-serif",
                                   width = 900,
                                   colourScale=my_color, LinkGroup="group", NodeGroup="group",
                                   nodePadding=10)

condition_for_Sankeyplot <- length(unique(San_measure$IDsource))>=1 && length(unique(San_measure$IDtarget))>=1 && !any(is.na(San_measure$IDsource)) && !any(is.na(San_measure$IDtarget))

The second flow chart shows the further state of the participants of the evaluated programme (further branching of the red line in graph 2) and what happened with them after participation in the evaluated programme. After participation in the programme, the participants can be employed, de-registered for another reason, or they can participate in another ALMP programme. We observed this behaviour for periods of two years after their participation. This period is divided into six months sub-periods (0/6/12/18).

Graph 2.2: Flowchart of participants in evaluation progname participated during evaluation period.

if(condition_for_Sankeyplot){
  Sankey_ev.measure
}

cond_for_table <- FALSE
if(condition_for_Sankeyplot){
  Sankey_tabel2 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
  Sankey_tabel2$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel2$Measures, Sankey_description$almp)]
  
  cond_for_table <- nrow(Sankey_tabel2) >=1
}

The next table shows full names of programmes shown in graph above.

Table 1.2: Explanatory table for Graph 2.2

if (cond_for_table){
  Sankey_tabel2  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
    kable_classic('hover', full_width = FALSE)%>%
    column_spec(1,  border_right = TRUE)
}

Measures	Name of programme
P054	Projects and programmes
P51A2	Incentives to hire young unemployed (<25) into their first job

2. Data and the evaluation sample

This evaluation report is based on administrative data on registered unemployed jobseekers in Slovakia, inter-linkable with the database of participants in ALMP programmes. The export was provided by the Central Office of Labour, Social Affairs and Family of the Slovak Republic (COLSAF) at the beginning of 2021. It covered a period from January 2014 until December 2020. The raw data were processed using a data-preparation script, available on request from: miroslav.stefanik@savba.sk.
The data frame “df” covers all unemployment spells of unemployed jobseekers, with attributes collected at the moment of their registration as unemployed jobseekers (Application Form).

### DEFINE THE EVALUATION PERIOD #
ep_start <- as.Date(params$ep_start)
ep_end <- as.Date(params$ep_end)
un_spell <- spell
measure <- params$measure

    ########################################x
    ## SELECTING THE EVALUATION SAMPLE #
    ########################################x,
    
treated<-filter(almps, nastroj==toString(params$measure))

#Sub-groups to be dropped: 
# - those with ALMP participation 2 years before the EP
IDalmps_before<-unique(almps$klient_id[almps$entrya<ep_start & almps$entrya>=ep_start-730]) 
# - those with ALMP participation in other ALMP during the EP
IDalmps_during_ep<-unique(almps$klient_id[(almps$entrya<=ep_end & almps$entrya>=ep_start) & almps$nastroj!=toString(params$measure)])

###DEFINE THE ELIGIBILITY CRITERIA 
#the EC are measure specific, in the case of looping over multiple measures EC need to be elaborated t a form of table or a list and added to the parameters
#SUBSETTING THE BASE EVALUATION DATASET OF ELIGIBLE 
cond0<-as.logical(df$entry<=ep_end & df$exit>=ep_start) # Being on the register of unemployed during the evaluation period
cond1<-as.logical(df$age < age_group_max) 
cond2<-as.logical((df$exit-df$entry)>=un_spell) #LENGTH OF PREVIOUS UNEMPLOYMENT SPELL
cond3<-as.logical(df$entry>=ep_start-730) # Dropping old unemployment spells (cases inflowing more than 730 days before the start of the evaluation period)

dfe<-df[cond0 & cond1 & cond2 & cond3,]
n1 <- dim(dfe)[1]
sampleIDs<-unique(df$klient_id[cond0 & cond1 & cond2 & cond3])
n2 <- length(sampleIDs)

###
#### ONLY KEEP THE SPELLS OF PARTICIPANTS DURING WHICH THEY PARTICIPATED 
#### Creating dataframe of participants in the evaluated programme during the evaluation period.
dfa<-filter(treated, entrya<=ep_end & entrya>=ep_start)
npart0<-length(unique(dfa$klient_id))
npart1<-dim(dfa)[1]

#Drop other ALMP participations from the group of participants as well as the eligible non-participants
n3 <- nrow(filter(dfa, klient_id %in% IDalmps_before))
n4 <- nrow(filter(dfe, klient_id %in% IDalmps_before))
dfa<-filter(dfa, !klient_id %in% IDalmps_before)
dfe<-filter(dfe, !klient_id %in% IDalmps_before)

n5 <- nrow(filter(dfa, klient_id %in% IDalmps_during_ep))
n6 <- nrow(filter(dfe, klient_id %in% IDalmps_during_ep))
dfa<-filter(dfa, !klient_id %in% IDalmps_during_ep)
dfe<-filter(dfe, !klient_id %in% IDalmps_during_ep)



#### Only participants with one participation during the evaluation period are sampled. 
#### JS with multiple participations are droped from the sample
dfa<-dfa %>%
  group_by(klient_id) %>% 
  mutate(rep=n()) # rep is the number of participations of one JS repeating during 2014

n7 <- nrow(filter(dfa, rep!=1))
dfa<-filter(dfa, rep==1)
npart2<-dim(dfa)[1] # Number of participants after cleaning with multiple ALMP participations

###Participants who also participated in other ALMP measures (§54) are dropped
progOUT<- setdiff(c("P050", "P50A", "P50C","P50J", "P50K" ,"P51A", "P054", "P54D", "P54E", "P54O", "P54P", "P54U"),params$measure)
outIDs<-unique(almps$klient_id[(almps$entrya>=as.Date(params$ep_start) & almps$entrya<=as.Date(params$ep_end)+730) & as.logical(almps$nastroj %in% progOUT)])

n8 <- nrow(filter(dfa, klient_id %in% outIDs))
n9 <- nrow(filter(dfe, klient_id %in% outIDs))

dfa<-(filter(dfa, !klient_id %in% outIDs))
dfe<-(filter(dfe, !klient_id %in% outIDs))

npart3<-length(unique(dfa$klient_id)) # The number of participants after we drop participations in supported employment during the outcome observation period 

#MERGING PARTICIPATIONS AND UNEMPLOYMENT SPELLS
#First we add the date of the entry and exit from the registration into the table of participations in measure (evaluated measure params$measure). We only import entry dates for the individuals in the evaluation sample. 
dfa<-merge(dfa, select(dfe, klient_id, entry, exit), by="klient_id", all.x = TRUE)
nrowdfa <- nrow(dfa)

#Second we filter only the registrations of members of the evaluation sample during which the programme participation took place. 
dfa<-filter(dfa, dfa$exit+30>=dfa$entrya & dfa$entrya<=ep_end & entrya >= entry) # Keeping only the participations happening during an unemployment spell
n10 <- nrowdfa-dim(dfa)[1]
npart4<-dim(dfa)[1] # Number of participants after cleaning participations outside an unemployment spell (data quality issue)

## Participants #
particIDs<-unique(dfa$klient_id)
## Eligible #
# nonpartIDs<-sampleIDs[!(sampleIDs %in% particIDs)]
nonpart<-filter(dfe, !(klient_id %in% particIDs))
###Out of the participants only one-time participations happening during an unemployment spell are used 
partic<-merge(dfe, dfa, by = c("klient_id", "entry"), all.x = FALSE)

#LL: sanity check
#nonpart$klient_id
#partic$klient_id
#intersect(nonpart$klient_id,partic$klient_id)
#this should be empty. OK

### Cleaning and renaming #
partic$exit.y<-NULL
partic<-partic %>% rename(exit=exit.x)

nonpart$entrya<-NA
nonpart$exita<-NA
nonpart$nastroj<-NA
nonpart$naklady<-NA
nonpart$projekt<-NA
partic$rep<-NULL

#Filter extreme values (1%) of the waiting time until participation in the evaluated measure 
wte<-quantile(as.numeric(partic$entrya)-as.numeric(partic$entry),na.rm=TRUE, probs=0.99)
n11 <- nrow(filter(filter(partic, as.numeric(entrya)-as.numeric(entry)>wte)))
partic<-filter(partic, as.numeric(entrya)-as.numeric(entry)<=wte)

esample<-rbind(nonpart, partic)
esample$treated<-!is.na(esample$entrya)
#Filter extreme values of the waiting time until participation in the evaluated measure 

#LL:
#summary(esample$treated)

###Unemployment spells ending with LM placements
#esample<-filter(esample, dovod_vyradenia_kod == 'V01' | dovod_vyradenia_kod == 'V02' | dovod_vyradenia_kod == 'V03' | dovod_vyradenia_kod == 'V1' | dovod_vyradenia_kod == 'V12' | dovod_vyradenia_kod == 'V15')

    ########################################x
    ## GENERATING EXPLANATORY VARIABLES #
    ########################################x,
    esample$ent <- as.numeric(as.Date(ep_start))-as.numeric(as.Date(esample$entry))
    
    #LL: quantile(esample$ent)
    #niektore unemployment spells boli 
    # negativne (JS zacal byt nezamestnany po 1-1-2017), 
    # niektore pozitivne (JS zacal byt nezamestnany pred 1-1-2017)

    ########AGEG
    esample$ageg <- cut_interval(esample$age, 5, labels=FALSE)
    esample$ageg <- as.factor(esample$ageg)
    
    ####Extra columns for dummy variables go into the esample_est for further testing
    esample <- dummy_cols(esample, select_columns = c("ageg"), remove_first_dummy = TRUE)
    ageg_dummies<-colnames(esample)[grepl("ageg_", colnames(esample))]

  #######Regional Unemployment rate during the implementation period
    esample[, "UR_region"] <- esample[, paste0("UR_region_",year(ep_start), "")]
    esample[,grepl("UR_region_", colnames(esample))]<-NULL
    #esample$UR_region <- as.numeric(gsub(",", ".", gsub("\\.", "", esample$UR_region)))
    
    ##########Difference between entry into unemployment register and started of participation in measure
    esample$diff_entry <- ceiling(as.integer(as.Date(esample$entrya) - as.Date(esample$entry))/30.417)

#Cleaning
#nonpart<-NULL
#partic<-NULL




# Share of repeated unemployment after participation in ALMP
# Merge esample, history table
h_esample <- merge(select(esample, klient_id, entry, exit, dovod_vyradenia_kod, entrya, exita, nastroj, treated), dfh, by = 'klient_id', all.x = TRUE, all.y = FALSE)

# as.Date
entry <- paste('entry', seq(1:11), sep = '')
exit <- paste('exit', seq(1:11), sep = '')

for (e in entry){
  h_esample[ ,e] <- as.Date(h_esample[ ,e], origin = '1970-01-01')
}
for (x in exit){
  h_esample[ ,x] <- as.Date(h_esample[ ,x], origin = '1970-01-01')
}
h_esample$exita <- as.Date(h_esample$exita, origin = '1970-01-01')

#Dropping cases with over than 15 unemployment spells 
#h_esample<-filter(h_esample, is.na(entry16)) #subset (klient_id) jobseekers who became unemployment only 15 times 
#esample <- subset(esample, klient_id %in% h_esample$klient_id) 
#remove entry16+ and exit16+ columns
#h_esample<-select(h_esample, -entry16, -entry17, -entry18, -entry19, -entry20, -exit16, -exit17, -exit18, -exit19, -exit20)

esample_condition <- (mean(esample$treated) < 0.00001)

if(esample_condition) {
  knitr::knit_exit()
}

2.1 Description of participants and eligible

Our evaluation sample consists of 395 328 eligible individuals, registered as jobseekers during the evaluation period starting from 2017-01-01 until 2017-12-31, These jobseekers have generated in total 443 969 unemployment registrations during the period 2014-2020. Out of them, in 816 jobseekers participated in the evaluated measure Mobility allowance during the evaluation period. 712 participants and 173 873 eligible JS were dropped from the sample, because of multiple participations in the evaluated program (or other relevant ALMPs) during and after the evaluation period. After cleaning we have sampled 104 JS with one-time participation during the evaluation period. At the same time, there was 270 096 eligible non-participants present in the register of unemployed jobseekers during 2017.

The group of participants and eligible show differences in a number of observed characteristics. Table 2 displays an overview of these differences on mean values of selected characteristics.

Table 2: Descriptive statistics of participants and eligible (selected characteristics)

####
## number of participants and eligible 
####

#separate table dfe with participants in ALMP  and eligible 
elig <- distinct_at(nonpart,vars(klient_id),.keep_all = TRUE)
part <- distinct_at(partic,vars(klient_id),.keep_all = TRUE)

#BASIC DESCRIPTIVE TABLE

# The number of participants and eligible in the sample'
#tab1<-cbind(sum(!is.na(dfe$entrya)),sum(is.na(dfe$entrya)))
tab1 <- cbind(format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
colnames(tab1) <- c('Participants', 'Eligible')
tab1 <- data.frame(cbind(Description = 'Number of observations', tab1))

####
## Age distribution
####

age_par <- part %>% select(age) %>% group_by(age) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

age_elig <- elig %>% select(age) %>% group_by(age) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

age <- merge(age_par, age_elig, by='age', all = TRUE)
age$Participants_percent <- ifelse(is.na(age$Participants_percent), paste(0,'%'), age$Participants_percent)
age$Participants_total <- ifelse(is.na(age$Participants_total), 0, age$Participants_total)

mean_age_par <- round(mean(part$age),1)
mean_age_elig <- round(mean(elig$age),1)

##### MEAN 
mean_age<-data.frame(mean_age_par,mean_age_elig)
mean_age <- cbind(Description = 'Age (years)', mean_age)
mean_age <- rename(mean_age, Participants = mean_age_par, Eligible=mean_age_elig)
####

age_elig$desc <- 'Eligible'
age_par$desc <- 'Participants'
age_elig <- age_elig %>% rename(total = Eligible_total, percent = Eligible_percent)
age_par <- age_par %>% rename(total = Participants_total, percent = Participants_percent)
age_r <- rbind(age_par,age_elig)

age_plot <- ggplot(age_r, aes(x = age, y = total, group= desc)) +
  geom_point(aes(color = desc), size = 1.5)+
  geom_line(aes(color = desc), size = 1) + 
  ylim(0,23000) +
  theme_light() +
  geom_text(aes(label = paste(percent, '\n\n' )), col ='black', size = 3, fontface ='italic')+
  labs(
    title = "Compare of age distribution (%)",
    x = "Age",
    y = "Total Count"
  )

####
## Gender distribution
####

gender_par <- part %>% select(male) %>% group_by(male) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

gender_elig <- elig %>% select(male) %>% group_by(male) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

gender <- merge(gender_par, gender_elig, by='male', all = TRUE)

####
male <- data.frame(gender_par[2,3], gender_elig[2,3])
male <- cbind(Description = 'Male', male)
male <- rename(male, Participants = Participants_percent, Eligible=Eligible_percent)


####
##  Education distribution
####

education_par <- part %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Participants_total = n(), .groups = 'drop')  %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2))
education_par <- education_par[!(is.na(education_par$noedu)),]
education_par <- reshape2::melt(education_par, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_par <- education_par[education_par$value == 1,] 
education_par <- select(education_par, -value)

education_elig <- elig %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Eligible_total = n(), .groups = 'drop')  %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2))
education_elig <- education_elig[!(is.na(education_elig$noedu)),]
education_elig <- reshape2::melt(education_elig, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_elig <- education_elig[education_elig$value == 1,] 
education_elig <- select(education_elig, -value)

education <- merge(education_par, education_elig, by='variable', all = TRUE)
education <- select(education, variable, Participants, Eligible)
education <- rename(education, Description = variable)

education <- education%>%mutate(
  Description = case_when(
    education$Description ==  'noedu' ~ 'No education',    
    education$Description ==  'primary' ~ 'Primary', 
    education$Description ==  'lsec' ~ 'Lower secondary',
    education$Description ==  'usec' ~ 'Upper secondary',
    education$Description ==   'tertiary' ~ 'Tertiary', 
    TRUE~as.character(education$Description)
  ) 
)

education <- education %>% group_by(Description) %>%
  dplyr::summarise(Participants = paste0(sum(Participants),  "%"), Eligible = paste0(sum(Eligible),  "%")) 

x <- c('No education','Primary','Lower secondary', 'Upper secondary', 'Tertiary')
education <- education[match(x, education$Description),]

####
##  skills
####

l_skills_par <- part %>%  select(flang) %>%  
  mutate(flang = case_when(part$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

l_skills_elig <- elig %>%  select(flang) %>%  
  mutate(flang = case_when(elig$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

PC_skills_par <- part %>%  select(pc) %>%  
  mutate(pc = case_when(part$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

PC_skills_elig <- elig %>%  select(pc) %>%  
  mutate(pc = case_when(elig$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

d_skills_par <- part %>%  select(drive) %>%  
  mutate(drive = case_when(part$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)

d_skills_elig <- elig %>%  select(drive) %>%  
  mutate(drive = case_when(elig$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)


l_skills <- merge(l_skills_par, l_skills_elig, by='Description', all = TRUE)
PC_skills <- merge(PC_skills_par, PC_skills_elig, by='Description', all = TRUE)
d_skills <- merge(d_skills_par, d_skills_elig, by='Description', all = TRUE)

skills <- rbind(l_skills, PC_skills, d_skills)
skills <- select(skills, Description, Participants_percent, Eligible_percent)
skills <- rename(skills, Participants = Participants_percent, Eligible=Eligible_percent)

####
##  region
####

part <- part %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

elig <- elig %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

okres_par <- part %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

okres_elig <- elig %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

okres <- merge(okres_par, okres_elig, by='okres', all = TRUE)
okres <- select(okres, okres, Participants, Eligible)
okres <- rename(okres, Description = okres)
okres <- okres[okres$Description != 'N/A',]

####
##  Previous employment
####

prev_emp_part <- part  %>% select(empl) %>% group_by(empl) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = empl, Participants = Participants_percent)  

prev_emp_elig <- elig  %>% select(empl) %>% group_by(empl) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = empl, Eligible = Eligible_percent)

prev_emp <- merge(prev_emp_part, prev_emp_elig, by='Description', all = TRUE)
prev_emp <- select(prev_emp, Description, Participants, Eligible)
prev_emp <- prev_emp[prev_emp$Description == 1,] 
prev_emp$Description[prev_emp$Description == 1 ] <- 'Previous employment'

####
##  Nationality
####

nat_part <- part  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Participants_total = n(), .groups = 'drop') %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%')) 
nat_part$othern <- ifelse(nat_part$othern == 'TRUE', 1,0)
nat_part <- reshape2::melt(nat_part, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_part <- nat_part[nat_part$value == 1,] 
nat_part <- select(nat_part, -value)

nat_elig <- elig  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Eligible_total = n(), .groups = 'drop') %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2),'%')) 
nat_elig$othern <- ifelse(nat_elig$othern == 'TRUE', 1,0)
nat_elig <- reshape2::melt(nat_elig, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_elig <- nat_elig[nat_elig$value == 1,] 
nat_elig <- select(nat_elig, -value)

nat <-  merge(nat_part, nat_elig, by='variable', all = TRUE)
nat <- select(nat, variable, Participants, Eligible)
nat <- rename(nat, Description = variable)

nat <- nat %>%mutate(
  Description = case_when(
    nat$Description == 'slovak' ~ 'Slovak', 
    nat$Description == 'hungarian' ~ 'Hungarian', 
    nat$Description == 'czech'~ 'Czech', 
    nat$Description == 'roma' ~ 'Roma', 
    nat$Description == 'othern'~ 'Other', 
  ) 
)


x <- c('Slovak','Hungarian', 'Czech','Roma','Other')
nat <- nat[match(x, nat$Description),]

####
##  Length of the unemployment spell
####

part$un_spell <- as.integer(part$exit - part$entry)
elig$un_spell <- as.integer(elig$exit - elig$entry)

un_spell <- cbind(round(mean(part$un_spell),2), round(mean(elig$un_spell),2))
un_spell <- data.frame(cbind(Description = 'Length of the unemployment spell', un_spell))
un_spell <- rename(un_spell, Participants = V2, Eligible=V3)


####
##  Length of spell between unemployment and participation
####

#Rozdiel medzi evidenciou nezamestnanosti a nastúpenia do AOTP 
part$spell_b <- as.integer(part$entrya - part$entry)

spell_bup <- part %>% select(spell_b) %>% 
  mutate(month = ceiling(spell_b/30.417)) #roundup -> ceiling

spell_bup_p <- ggplot(data=spell_bup, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none")+
  labs(
    title = "Inflow into the programme in months\nsince the start of the unemployment",
    x = "Months",
    y = "Total Count"
  )  + 
  scale_x_continuous()

####
##  Length of AOTP
####

part$spell_aotp <- as.integer(part$exita - part$entrya)

spell_aotp <- part %>% select(spell_aotp) %>% 
  mutate(month = ceiling(spell_aotp/30.417)) #roundup -> ceiling

spell_aotp_p <- ggplot(data=spell_aotp, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none") +
  labs(
    title = "Length of participation\n (in months)",
    x = "Months",
    y = "Total Count"
  ) + 
  scale_x_continuous(breaks = scales::breaks_extended(length(unique(spell_aotp$month))))


####
##  Compare of length spell
####

spell_p <- ggarrange(spell_bup_p, spell_aotp_p)


####
##  In flow
####

a <- format(seq(as.Date(ep_start),length=3,by="1 month"),"%Y-%m")
b <- format(seq((ymd(as.Date(ep_start)) %m+% months(3)),length=3,by="1 month"),"%Y-%m")
c <- format(seq((ymd(as.Date(ep_start)) %m+% months(6)),length=3,by="1 month"),"%Y-%m")
d <- format(seq((ymd(as.Date(ep_start)) %m+% months(9)),length=3,by="1 month"),"%Y-%m")

#vstúpili do programu
in_part <- part %>% select(entrya) %>% group_by(format(as.Date(entrya),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(in_part)[1] <- 'Description'
in_part <- filter(in_part, str_detect(in_part$Description, (format(as.Date(ep_start),"%Y"))))

in_part <- in_part%>%mutate(
  Description = case_when(
    in_part$Description %in% a ~  paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% b ~  paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% c ~  paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% d ~  paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_part <- in_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#sa stali nezamestnaný 
in_elig <- elig %>% select(entry) %>% group_by(format(as.Date(entry),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(in_elig)[1] <- 'Description'
in_elig <- filter(in_elig, str_detect(in_elig$Description, (format(as.Date(ep_start),"%Y"))))

in_elig <- in_elig%>%mutate(
  Description = case_when(
    in_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_elig <- in_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

inflow <- merge(in_part, in_elig, by='Description', all = FALSE)
inflow <- select(inflow, Description, Participants, Eligible)


####
##  Outflow
####

#vystúpili z programu

out_part <- part %>% select(exita) %>% group_by(format(as.Date(exita),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(out_part)[1] <- 'Description'
out_part <- filter(out_part, str_detect(out_part$Description, (format(as.Date(ep_start),"%Y"))))

out_part <- out_part%>%mutate(
  Description = case_when(
    out_part$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_part <- out_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#vystúpili z evidencie -> zamestnali sa 
out_elig <- elig %>% select(exit) %>% group_by(format(as.Date(exit),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(out_elig)[1] <- 'Description'
out_elig <- filter(out_elig, str_detect(out_elig$Description, (format(as.Date(ep_start),"%Y"))))

out_elig <- out_elig%>%mutate(
  Description = case_when(
    out_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_elig <- out_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

outflow <- merge(out_part, out_elig, by='Description', all = FALSE)
outflow <- select(outflow, Description, Participants, Eligible)


####
##  Children in the household
####

child_part <- part  %>% select(kids) %>% group_by(kids) %>% summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = kids)  

child_elig <- elig  %>% select(kids) %>% group_by(kids) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = kids)

child <- merge(child_part, child_elig, by='Description', all = TRUE)
child <- select(child, Description, Participants, Eligible)
child <- child[child$Description == 1,] 
child$Description[child$Description == 1] <- 'Children in the household'


####
##  Fields of study
####

study_part <- part  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2)) 

study_elig <- elig  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2)) 

study <- merge(study_part, study_elig, by='odbor', all = TRUE)
study <- select(study, odbor, Participants, Eligible)
study$Participants <- ifelse(is.na(study$Participants), 0, study$Participants)

a <- as.character(c(seq(11,19,1)))
b <- as.character(c(seq(21,39,1)))
c <- as.character(c(seq(41,49,1)))
d <- as.character(c(seq(51,59,1)))
e <- as.character(c(seq(61,79,1)))
f <- as.character(c(seq(81,89,1)))
g <- as.character(c(seq(91,98,1)))

study <- study%>%mutate(
  odbor = case_when(
    study$odbor %in% a ~ 'Natural Science', 
    study$odbor %in% b ~ 'Technical sciences', 
    study$odbor %in% c ~ 'Agricultural, forestry and veterinary sciences', 
    study$odbor %in% d ~ 'Medical and pharmaceutical sciences', 
    study$odbor %in% e ~ 'Social sciences and services', 
    study$odbor %in% f ~ 'Sciences of culture and art', 
    study$odbor %in% g ~ 'Military and security sciences',
    study$odbor == 99 || study$odbor == 0 || study$odbor == 10 ~ 'General sciences and services ',
    TRUE~as.character(study$odbor)
  ) 
)

study <- study %>% select(odbor, Participants, Eligible) %>%
  group_by(odbor)  %>% 
  summarise(Participants = sum(Participants), Eligible = sum(Eligible))  %>%
  mutate(Participants = paste0(Participants, "%")) %>%
  mutate(Eligible = paste0(Eligible, "%")) 
study <- rename(study, Description = odbor)


####
##  SUMMARIZE ####
####

colnames(tab1)<-c("Description", "Participants", "Eligible")
colnames(mean_age)<-c("Description", "Participants", "Eligible")
colnames(male)<-c("Description", "Participants", "Eligible")
colnames(prev_emp)<-c("Description", "Participants", "Eligible")
colnames(un_spell)<-c("Description", "Participants", "Eligible")
colnames(child)<-c("Description", "Participants", "Eligible")

tables <- c('tab1', 'mean_age', 'male', 'prev_emp', 'un_spell', 'child')
basics <- data.frame()

for (name in tables){
  table <- get(name)
  table <- mutate(table, across(everything(), as.factor))
  basics <- bind_rows(basics, table)
}

tables <- c('basics', 'education', 'study', 'skills', 'okres', 'nat', 'inflow', 'outflow')

for (name in tables){
  table <- get(name)
  table <- add_column(table, Variable = name, .after = "Eligible")
  colnames(table) <- c("Description", "Participants", "Eligible", "Variable")
  assign(name, table)
}


sum_table <- rbind(basics, education, study, skills, okres, nat, inflow, outflow)
sum_table <- sum_table %>% relocate(Variable, .before = Description) %>%mutate(
  Variable = case_when(
    sum_table$Variable == 'basics' ~ 'Basics',
    sum_table$Variable == 'education' ~ 'Education level',
    sum_table$Variable == 'study' ~ 'Field of study',
    sum_table$Variable == 'skills' ~ 'Skills',
    sum_table$Variable == 'okres' ~ 'Region',
    sum_table$Variable == 'nat' ~ 'Nationality',
    sum_table$Variable == 'inflow' ~ 'Inflow',
    sum_table$Variable == 'outflow' ~ 'Outflow',
    TRUE ~ as.character(sum_table$Variable)
  )
) 

sum_table$Participants <- ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                 ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                        ifelse(str_detect(as.character(sum_table$Eligible), regex("%")), '0%',0), 
                                        as.character(sum_table$Participants)),
                                 as.character(sum_table$Participants))

sum_table[,2:4]  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  pack_rows(index = table(fct_inorder(sum_table$Variable)))

Description	Participants	Eligible
Basics
Number of observations	104	270 096
Age (years)	31.9	36.5
Male	37.5%	51.98%
Previous employment	5.77%	8.75%
Length of the unemployment spell	206.49	245.21
Children in the household	7.69%	11.41%
Education level
No education	0%	0.69%
Primary	1.92%	13.86%
Lower secondary	6.73%	28.3%
Upper secondary	29.81%	35.73%
Tertiary	61.54%	21.42%
Field of study
Agricultural, forestry and veterinary sciences	0.96%	4.25%
General sciences and services	24.04%	36.61%
Medical and pharmaceutical sciences	1.92%	1.49%
Military and security sciences	0%	0.34%
Natural Science	0.96%	0.7%
Sciences of culture and art	1.92%	1.06%
Social sciences and services	50.96%	25.17%
Technical sciences	19.21%	30.38%
Skills
Foreign language	96.15%	62.45%
PC skill	87.5%	56.12%
Driving license	67.31%	54.49%
Region
Banskobystrický	14.42%	12%
Bratislavský	0.96%	11.62%
Košický	18.27%	14.26%
Nitriansky	7.69%	12.86%
Prešovský	27.88%	16.69%
Trenčiansky	12.5%	10.36%
Trnavský	3.85%	9.69%
Žilinský	14.42%	12.52%
Nationality
Slovak	92.31%	90.57%
Hungarian	3.85%	8.2%
Czech	0.96%	0.44%
Roma	0%	0.16%
Other	2.88%	0.65%
Inflow
1Q.2017	13.46%	27.17%
2Q.2017	23.08%	25.53%
3Q.2017	25%	25.96%
4Q.2017	38.46%	21.34%
Outflow
2Q.2017	9.09%	28.66%
3Q.2017	27.27%	23.66%
4Q.2017	63.64%	20.04%

Graph 3: Timing and length of participation in the evaluated program

require(gridExtra)
if (nrow(spell_bup) > 5){
  mx <- max(spell_bup$month)
  mn <- min(spell_bup$month)
  plot1 <- ggplot(data=spell_bup, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Inflow into the program in months\nsince the start of the unemployment") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}

if (nrow(spell_aotp) > 5){
  
  mx <- max(spell_aotp$month)
  mn <- min(spell_aotp$month)
  plot2 <- ggplot(data=spell_aotp, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Length of participation\n(in months)") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}
grid.arrange(plot1, plot2, ncol=2)

3. The impact of participation in the Mobility allowance during 2017

The ultimate objective of ALMP programmes is improving employment chances of its beneficiaries. Constrained by the data provided, we construct four outcome indicators. All four of them are based on the presence in (absence from) the register of JSs administrated by COLSAF. They are, therefore, proxies of the employment status.

#Estimation parameters
#LL: there will be four different samples according to how long have JSs been unemployed prior to receiving the training
Ssamples <- seq(1,4)

# Month of participation since the start of the evaluation period
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) 

#LL: participation quarter, if JS was treated on 3 Feb, pcpQ == 1
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

#LL: create an object that will store data
Mdata<-c()

#LL: these are the periods that we will look at
# the negative values correspond to "placebo" effects. We should not see any effect there, or only a small one.
OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")
#LL: O_vars stores outcome variables
# empl.2 means employment 2 quarters before the start of the evaluation period.
# empl0 will correspond to the last quarter of the year before the start of the eval. period
# empl1 will correspond to the first quarter of the start year of the eval. period


#LL: This is a list of baseline covariates that will be used throughout the analysis
# they correspond to a reasonable minimum of information that should be controlled for
# in order to have a meaningful comparison
list_vars <- c('ent', 'male', 'married','kids',
             'slovak', 'noedu','primary', 'lsec', 'usec',
             'flang', 'drive', 'pc',
             'unpast', 'min_urad', 'min_BA',
             'UR_region', 'roma_share', 'population', 'age')

#LL: In order to have a credible comparison groups. We need to look how similar the groups are, how balanced they are.
# we compare the mean differences BEFORE adjustment and AFTER adjustment
Balance_vars <- list_vars

#LL: We allocate objects that will store the results

#LL: number of treated units
N<-nrow(esample[esample$treated==TRUE,])

#LL: number of treated units in a particular esample
N_sp <- matrix(NA, nrow=length(Ssamples))  

#LL: results array ATT. We are interested in the average treatment effect on the TREATED subpopulation.
# in this particular case, most LM programs are intended for a specific subpopulation.
# it is therefore of less interest to look at the whole population of JSs (ATE)
resultsArray_ATT  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_ATT)[[1]] <- c(O_vars)

#LL: we also store standard errors that quantify STATISTICAL uncertainty of our estimates
resultsArray_se  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_se)[[1]] <- c(O_vars)
results <- array(NA, dim=c(length(O_vars),2))

### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
#LL: these will make 4 (roughly equally sized) different groups according to how long JSs have been unemployed
# this is an important determinant of the effect and it is important to control for it
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#LL: Nonparticipants have only criterium of minimal length of unemployment
# (thus one non-participant can be used multiple times)
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

#LL: We loop over 4 different lenghts of the prior unemployments
for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    # we count how many participants are there in a particular group.
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      
      #LL: we pick a particular subsample of variables.
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      
      #LL: maybe simplify it to
      #month(as.Date(esampleS$entrya))
      #?
      # we can use this with current specification of esample ( participants in the evaluated programme during the evaluation period (ep -> one year) + eligible unemployed at the same period)
      # if we change  specification of esample (i.e. What happened with those ones who became unemployed in a specific year (2017) and they can enter into the program after 3 months since they become unemployed (but they can enter into program 24 months since they become unemployed too))
      
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      #LL: instead, we could have just 
      #esampleS$pcpQ<-quarter(as.Date(esampleS$entrya))
      # without the need to define participation_month
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      #LL: notice that this is a relative number, thus it can be negative(!)
      
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      #LL: infQ = 0 means that this JS entered the register in Q4 of 2016 (if ep_start=="2017-01-01")
      
      # Outflow quarter
      esampleS$Uend_month<-((year(as.Date(esampleS$exit))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$exit)) 
      esampleS$outQ<-ceiling(esampleS$Uend_month/3)
      
      ##Difference between entry into unemployment register and started of participation in measure
      esampleS$diff_entry <- ceiling(as.integer(as.Date(esampleS$entrya) - as.Date(esampleS$entry))/30.417)
      #LL: NOTEL why 30.417?
      # table(ceiling(difftime(as.Date(esampleS$entrya), as.Date(esampleS$entry), units = "days")/30.417)) 
      # parameters of difftime units  = c("auto", "secs", "mins", "hours", "days", "weeks")
      # haven't "months" unit 
      
      #Adding unemployment history
      #LL: previous unemployment history is an important predictor of both the treatment and outcomes
      # it is important to control for it.
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      #Imputing the start of the unemployment spell 
      #LL: wait, isn't it 
      # #Imputing the start of the programme ? YES 
      
      #LL: we wish to make a meaningful comparison.
      # But we're facing a problem because for the non-participants, we don't have date of entry to the course
      # well simply because they did not participate(!)
      # what we do is the following.
      # for every suitable non-participant we chose one particular quarter at random from the evaluation year that is "feasible"
      # "feasible" means that the non-participant _could_have_ potentially participated in that quarter 
      # given that we have a large donor pool, this randomness will not impact our estimates much.
      

            
      #LL: In Pmatrix, we will store for every non-participant, the feasible quarters.
      # e.g. if it is [0 1 1 1], it means that with prob. 1/3 we pick quarter 2,3 or 4
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          #LL: we loop through the participation quarters
          for (p in 1:max_pcpQ){  
            
            #LL: NOTE!!!! 
            # we restrict ourselves to only non-participants who are at the (beginning of the) particular pcpQ unemployed
            ind <- (esampleS$outQ[esampleS$treated==FALSE]>=p)
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ[esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            #LL: NOTE: what is this last commented bit?

            Pmatrix[ind,p]<-as.numeric((esampleS$infQ[esampleS$treated==FALSE])[ind]  %in% PinfQ)
            Pmatrix[!ind,p] <- 0
            
            #LL: does, for this particular quarter of inflow, exists anyone from the list of participants?
            # in other words
            # can we, in a given sample for a given quarter match it to at least one participant?
            
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        sumIsZero    <- as.logical(apply(Pmatrix, 1, FUN=sum)==0)
        sumIsNonZero <- as.logical(apply(Pmatrix, 1, FUN=sum)!=0)

        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsZero] <- 0
        #LL: for every non-participant, for whose inflow we cannot match to ANY participant
        # for any of the four quarters, we assign zero
        
        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsNonZero] <- apply(Pmatrix[sumIsNonZero,],1,hh)
        #LL: for every non-participant we pick one of the feasible quarters RANDOMLY for given esampleS (we have four of these)
        # (remember: feasible means that there exists at least one participant for inflow)
        #LL: NOTE: did we fix the seed?? if not, we should.
  
        
                #LL: loop across different time periods
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                  #LL: we add variables whether someone is in the register.
                  # we consider at most 11 unemployment spells.
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
           #LL: NOTE: can't this be in the same loop (?)
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90)) 
            #LL: NOTE: here persons with multiple unempl spell WITHIN the studied year could create problems
            
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
        for (col in colnames(result$D)){
          D <-  D[!is.na(D[,col]),]
        }    
            
        # ESTIMATION:
        m.1 <- matchit(as.formula(paste(result$spec[2], '~', result$spec[3] , sep = ' ')), 
                       data = D,
                       method = "nearest", 
                       exact = c("infQ"),
                       distance = "glm", link = "probit")
        #plot(summary(m.1))
        m.data1 <- match.data(m.1)
        match.matrix <- data.frame("untreated" = m.1$match.matrix[,1] ,
                                   "treated" = rownames(m.1$match.matrix))
        m.data1[as.character(match.matrix[, 'untreated']), 'entrya'] <- m.data1[as.character(match.matrix[, 'treated']),'entrya']
        m.data1$diff_entry <- ceiling(as.integer(as.Date(m.data1$entrya) - as.Date(m.data1$entry))/30.417)
        
        assign(paste("Mdata",s, sep=""), m.data1)
        Mdata <- bind_rows(Mdata, m.data1)
        

        assign(paste0('balancegraph',s, by=''),summary(m.1, subclass = TRUE))
        
        
        #distance in m.data1 is Propensity score
        #trim = 0.005
        #m.data1 <- m.data1[!(m.data1$distance <= trim | m.data1$distance >= (1-trim)),]
  
        
            for (iQ in O_vars){
              fit <- lm(as.formula(paste(iQ , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                        data = m.data1, 
                        weights = weights)
              res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
              att <- res[2,1]
              se <- res[2,2]
              
              resultsArray_ATT[iQ,s] <- att
              resultsArray_se[iQ,s] <- se
              
        }
    }
  } 


for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum((resultsArray_ATT[iQ,]*(N_sp/N)))
      results[iQ,2] <- sum((resultsArray_se[iQ,]*(N_sp/N)))
      #results[iQ,2] <- sum((resultsArray_se[iQ,]^2*(N_sp/N)))
}

# LL: sanity check
#plot(resultsArray_ATT[1:17,1])
#lines(resultsArray_ATT[1:17,2])
#lines(resultsArray_ATT[1:17,3])
#lines(resultsArray_ATT[1:17,4])

results<-cbind(O_vars, results)   

#Mdata<-rbind(Mdata1, Mdata2, Mdata3, Mdata4)

resultsPSM <- matrix(NA, nrow=length(O_vars), ncol = 2)

for (iQ in 1:length(O_vars)){
            fit <- lm(as.formula(paste(O_vars[iQ] , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                      data = Mdata, 
                      weights = weights)
            res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
            #LL: source of clustering?
            att <- res[2,1]
            se <- res[2,2]
            
            resultsPSM[iQ,1] <- att
            resultsPSM[iQ,2] <- se
            
        }

resultsPSM<-cbind(O_vars, resultsPSM)

#LL: why do results and results PSM give different values?
# I see, it is the standard errors. They are effectively cut in half
# that is in line with square-root convergence, because the sample size is quadrupled.

Graph 4: Balance of the groups of participants and eligible before and after weighting

match.vars <- c('distance',list_vars,paste0("empl.", seq(1,4,1), sep=""),'infQ')


balance_g  <- array(NA, dim=c(length(match.vars),2,length(Ssamples))) 
dimnames(balance_g)[[1]] <-match.vars
dimnames(balance_g)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data')

for(i in 1:length(Ssamples)){
  
  if(exists(paste0('balancegraph', i, by = ''))){
    x <- get(paste0('balancegraph', i, by = ''))
    balance_g[,1,i] <- abs(x$sum.all[,3][match(rownames(balance_g), names(x$sum.all[,3]))])
    balance_g[,2,i] <- abs(x$sum.matched[,3][match(rownames(balance_g), names(x$sum.matched[,3]))])
  }                  
}

balance_f <- array(NA, dim=c(length(match.vars),2)) 
dimnames(balance_f)[[1]] <-match.vars
dimnames(balance_f)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data') 

for (iQ in 1:length(match.vars)){
      balance_f[iQ,1] <- sum(sum((as.numeric(balance_g[iQ,1,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      balance_f[iQ,2] <-sum(sum((as.numeric(balance_g[iQ,2,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

balance_fg <- data.frame(balance = c(balance_f[,1], balance_f[,2]),
                 Balance = c(rep("before matching",length(match.vars)),rep("after matching",length(match.vars))),
                 names = c(match.vars,match.vars) ) 

p <- balance_fg %>% 
    mutate(names = fct_reorder(names, balance)) %>%
    ggplot(aes(x=balance, y=names,col=Balance)) + 
    geom_vline(xintercept = 0.05, linetype="dotted", color = 'darkgrey') + 
    geom_vline(xintercept = 0.0, color = 'darkgrey')+
    geom_vline(xintercept = 0.1, color = 'darkgrey')+
    geom_point()+
    theme_minimal()+
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")
  
p

After inspecting the achieved balance of the “control group” and participants, we can proceed with reporting the estimated effects of participation in the Mobility allowance during 2017.

3.1 Labour market outcomes of participants and eligible

First, we construct a proxy of the employment rate. It is a share of persons out of the register of unemployed JSs observed on the first day of quarters before and after participation (Quarter 0 was the quarter when participation started).

Graph 5: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(Mdata[Mdata$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(Mdata[Mdata$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs( 
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Second, we look at the number of quarters until the first exit from the register of unemployed JSs. This indicates how participation in the ALMP programme contributed to the shortening of the participants’ unemployment spell. The graph displays the whole distribution of participants and the control group based on the values of this outcome indicator.

Graph 6: Quarters until the first exit

#### THE NUMBER OF MONTHS UNTIL THE FIRST EXIT
########## Months until the first exit
# how many months after the entrya JS got a job

firstempl_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

firstempl_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

firstempl <- rbind(firstempl_P, firstempl_E)

ggplot(firstempl, aes(fill=Group, y=percent, x=quarter)) + 
          geom_bar(position="dodge", stat="identity") +
          facet_wrap(~Group) +
          theme_minimal() + 
          scale_fill_manual(values=c('grey', 'steelblue3')) +
          scale_x_continuous(breaks=seq(0,16,1)) +
          theme(legend.position="none") +
          xlab("Quarters after the start of the participation (0)") + 
          ylab("") +
          labs(caption="Source: COLSAF")+
          scale_y_continuous(labels = percent)

The following graph displays the cumulative number of months spent out of the register during the observation period following the evaluation program’s end.

Graph 7: Number of quarters out of the unemployment register after the end of the participation

## Plotting the number of months in cumulative employment
###### Participants

empl36m_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

empl36m_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

empl36m <- rbind(empl36m_P, empl36m_E)

ggplot(empl36m, aes(fill=Group, y=percent, x=quarter)) + 
  geom_bar(position="dodge", stat="identity") +
  facet_wrap(~Group) +
  theme_minimal() + 
  scale_fill_manual(values=c('grey', 'steelblue3')) +
  scale_x_continuous(breaks=seq(0,16,1)) +
  theme(legend.position="none") +
  xlab("Quarters after the start of the participation (0)") + 
  ylab("") +
  labs(caption="Source: COLSAF")+
  scale_y_continuous(labels = percent)

3.2 Estimation of the average treatment effects on the treated (ATTs)

Now, we explore the average treatment effects on the treated (ATTs).

Graph 8: The average treatment effects on the treated (ATTs) estimated for the participation in the Mobility allowance during 2017

graphATT

3.3 Statistical significance and heterogeneity of ATT effects

Further, we report the ATTs on the quarterly proxy for employment status, complemented by the indicators of:

Number of months until the first exit from the unemployment register JSs (firtempl)
Cumulative number of months outside the unemployment register of unemployed JSs (cumempl)

Table 3: The average treatment effect on the treated (ATT)

resultsDF <- select(resultsDF, -pval)
colnames(resultsDF) <- c("", "effect", "se", "sig.")

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))

	effect	se	sig.
empl.4	-0.005	0.056
empl.3	0.027	0.046
empl.2	0.058	0.053
empl.1	0.151	0.046	***
empl0	0.758	0.043	***
empl1	0.657	0.037	***
empl2	0.564	0.055	***
empl3	0.424	0.056	***
empl4	0.307	0.045	***
empl5	0.251	0.045	***
empl6	0.212	0.030	***
empl7	0.186	0.038	***
empl8	0.167	0.051	***
empl9	0.159	0.048	***
empl10	0.146	0.039	***
empl11	0.090	0.036
empl12	0.122	0.044	**
firstempl	-3.977	0.294	***
cumempl	4.044	0.285	***
¹ Significance (sig.): 0 " * " 0.001 " " 0.01 " * " 0.05 " . " 0.1 " " 1

Additionally, we explore the heterogeneity in the effects programme participation has on various subgroups of participants.

We explore the ATTs for subgroups based on:

Gender - differences in effects for women and men
Education level - differences in effects for JSs with no education, primary education, lower and secondary education.
Share of Roma in the city of residence - differences in effects for JSs who lives in the city with the share of Roma below 10% and above 10%
The size of the city of residence - differences in effects for JSs who lives in the city with a population less than 4 000 and more than 4 000

Table 4: The average treatment effect on the treated: subgroup based difference between entry into unemployment register and started of participation in measure

diff_entry

				Timing of participation
	Sample			0-6 months			7-12 months			13+ months
	effect	se	sig.	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.005	0.056		-0.064	0.036	.	0.178	0.132		-0.601	0.296
empl.3	0.027	0.046		0.041	0.045		0.167	0.130		0.000	0.000
empl.2	0.058	0.053		0.157	0.079		-0.079	0.046	.	0.000	0.000
empl.1	0.151	0.046	***	0.140	0.056		0.225	0.073	**	0.431	0.458
empl0	0.758	0.043	***	0.641	0.091	***	0.868	0.073	***	1.000	0.000	***
empl1	0.657	0.037	***	0.582	0.075	***	0.685	0.112	***	0.355	0.296
empl2	0.564	0.055	***	0.485	0.086	***	0.637	0.098	***	0.135	0.324
empl3	0.424	0.056	***	0.352	0.082	***	0.477	0.115	***	-0.010	0.444
empl4	0.307	0.045	***	0.251	0.063	***	0.456	0.075	***	-0.247	0.260
empl5	0.251	0.045	***	0.251	0.051	***	0.360	0.097	***	-0.302	0.160	.
empl6	0.212	0.030	***	0.178	0.061	**	0.306	0.082	***	0.075	0.150
empl7	0.186	0.038	***	0.147	0.063		0.370	0.084	***	0.038	0.138
empl8	0.167	0.051	***	0.171	0.051	***	0.196	0.099		-0.115	0.196
empl9	0.159	0.048	***	0.160	0.059	**	0.139	0.076	.	0.038	0.138
empl10	0.146	0.039	***	0.165	0.058	**	-0.005	0.060		0.038	0.138
empl11	0.090	0.036		0.086	0.056		-0.087	0.083		0.112	0.345
empl12	0.122	0.044	**	0.121	0.069	.	0.040	0.080		0.301	0.392
firstempl	-3.977	0.294	***	-3.919	0.478	***	-4.408	0.564	***	-1.157	0.988
cumempl	4.044	0.285	***	3.588	0.471	***	4.442	0.631	***	1.417	1.191
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 208 observations 0-6 months: 114 observations 7-12 months: 65 observations 13+ months: 29 observations

Table 5: The average treatment effect on the treated: subgroup based on gender

gender

				Gender
	Sample			Women			Men
	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.005	0.056		-0.063	0.066		0.114	0.071
empl.3	0.027	0.046		0.015	0.063		-0.033	0.048
empl.2	0.058	0.053		0.071	0.088		-0.009	0.098
empl.1	0.151	0.046	***	0.195	0.060	***	0.054	0.076
empl0	0.758	0.043	***	0.755	0.050	***	0.722	0.067	***
empl1	0.657	0.037	***	0.666	0.062	***	0.511	0.066	***
empl2	0.564	0.055	***	0.575	0.077	***	0.427	0.085	***
empl3	0.424	0.056	***	0.408	0.090	***	0.372	0.081	***
empl4	0.307	0.045	***	0.265	0.072	***	0.339	0.084	***
empl5	0.251	0.045	***	0.211	0.057	***	0.335	0.075	***
empl6	0.212	0.030	***	0.204	0.036	***	0.278	0.055	***
empl7	0.186	0.038	***	0.198	0.043	***	0.212	0.070	**
empl8	0.167	0.051	***	0.186	0.075		0.108	0.065	.
empl9	0.159	0.048	***	0.149	0.057	**	0.187	0.054	***
empl10	0.146	0.039	***	0.149	0.056	**	0.122	0.066	.
empl11	0.090	0.036		0.107	0.053		0.040	0.085
empl12	0.122	0.044	**	0.156	0.055	**	0.113	0.086
firstempl	-3.977	0.294	***	-4.146	0.411	***	-3.378	0.495	***
cumempl	4.044	0.285	***	4.029	0.417	***	3.766	0.421	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 208 observations Women: 133 observations Men: 75 observations

Table 6: The average treatment effect on the treated: subgroup based on education level

education

				Education
	Sample			Primary			Lower secondary			Upper secondary
	effect	se	sig.	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.005	0.056		0.000	NaN		-0.226	NaN		-0.050	0.127
empl.3	0.027	0.046		0.000	NaN		-1.220	NaN		-0.070	0.101
empl.2	0.058	0.053		0.000	NaN		0.000	NaN		0.107	0.076
empl.1	0.151	0.046	***	0.000	NaN		0.000	NaN		0.171	0.080
empl0	0.758	0.043	***	1.000	NaN		1.774	NaN		0.639	0.106	***
empl1	0.657	0.037	***	1.000	NaN		1.000	NaN		0.498	0.091	***
empl2	0.564	0.055	***	-14.944	NaN		1.000	NaN		0.411	0.082	***
empl3	0.424	0.056	***	1.000	NaN		1.000	NaN		0.395	0.079	***
empl4	0.307	0.045	***	1.000	NaN		1.000	NaN		0.287	0.067	***
empl5	0.251	0.045	***	1.000	NaN		1.000	NaN		0.300	0.087	***
empl6	0.212	0.030	***	0.000	NaN		1.774	NaN		0.235	0.076	**
empl7	0.186	0.038	***	0.000	NaN		1.774	NaN		0.244	0.072	***
empl8	0.167	0.051	***	1.000	NaN		1.774	NaN		0.108	0.064	.
empl9	0.159	0.048	***	1.000	NaN		1.774	NaN		0.074	0.068
empl10	0.146	0.039	***	1.000	NaN		0.774	NaN		0.106	0.041	**
empl11	0.090	0.036		1.000	NaN		0.774	NaN		0.047	0.090
empl12	0.122	0.044	**	15.944	NaN		1.000	NaN		0.117	0.086
firstempl	-3.977	0.294	***	-7.000	NaN		-9.353	NaN		-2.829	0.533	***
cumempl	4.044	0.285	***	10.000	NaN		16.419	NaN		3.460	0.584	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 208 observations Primary: 6 observations Lower secondary: 10 observations Upper secondary: 69 observations

Table 7: The average treatment effect on the treated: subgroup based on share of Roma in the city of residence

romas

				Roma share
	Sample			0-10%			10-100%
	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.005	0.056		0.016	0.048		0.158	0.329
empl.3	0.027	0.046		0.016	0.048		-0.084	0.234
empl.2	0.058	0.053		0.072	0.063		-0.301	0.253
empl.1	0.151	0.046	***	0.149	0.051	**	0.315	0.222
empl0	0.758	0.043	***	0.719	0.050	***	1.000	0.000	***
empl1	0.657	0.037	***	0.623	0.044	***	0.345	0.215
empl2	0.564	0.055	***	0.520	0.062	***	0.461	0.276	.
empl3	0.424	0.056	***	0.376	0.068	***	0.450	0.299
empl4	0.307	0.045	***	0.274	0.050	***	0.264	0.298
empl5	0.251	0.045	***	0.202	0.049	***	0.490	0.221
empl6	0.212	0.030	***	0.155	0.034	***	0.443	0.258	.
empl7	0.186	0.038	***	0.140	0.036	***	0.252	0.179
empl8	0.167	0.051	***	0.110	0.058	.	0.181	0.195
empl9	0.159	0.048	***	0.107	0.054		0.466	0.285
empl10	0.146	0.039	***	0.111	0.043	**	0.466	0.285
empl11	0.090	0.036		0.035	0.042		0.466	0.285
empl12	0.122	0.044	**	0.088	0.045	.	0.386	0.309
firstempl	-3.977	0.294	***	-3.554	0.323	***	-4.370	1.898
cumempl	4.044	0.285	***	3.459	0.296	***	5.668	1.801	**
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 208 observations 0-10%: 178 observations 10-100%: 30 observations

Table 8: The average treatment effect on the treated: subgroup based on the size of the city of residence

City

				Type of city
	Sample			Village			City
	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.005	0.056		0.002	0.081		-0.026	0.063
empl.3	0.027	0.046		0.033	0.085		0.004	0.058
empl.2	0.058	0.053		0.016	0.072		0.069	0.087
empl.1	0.151	0.046	***	0.122	0.104		0.159	0.062	**
empl0	0.758	0.043	***	0.871	0.060	***	0.688	0.058	***
empl1	0.657	0.037	***	0.754	0.091	***	0.539	0.062	***
empl2	0.564	0.055	***	0.635	0.111	***	0.457	0.067	***
empl3	0.424	0.056	***	0.441	0.090	***	0.355	0.073	***
empl4	0.307	0.045	***	0.308	0.070	***	0.265	0.047	***
empl5	0.251	0.045	***	0.208	0.084		0.262	0.067	***
empl6	0.212	0.030	***	0.182	0.062	**	0.238	0.046	***
empl7	0.186	0.038	***	0.135	0.079	.	0.218	0.048	***
empl8	0.167	0.051	***	0.151	0.065		0.153	0.081	.
empl9	0.159	0.048	***	0.180	0.062	**	0.168	0.065	**
empl10	0.146	0.039	***	0.079	0.064		0.171	0.052	***
empl11	0.090	0.036		0.079	0.064		0.071	0.059
empl12	0.122	0.044	**	0.067	0.103		0.146	0.062
firstempl	-3.977	0.294	***	-4.432	0.622	***	-3.569	0.473	***
cumempl	4.044	0.285	***	4.089	0.524	***	3.732	0.384	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 208 observations Village: 76 observations City: 132 observations

4. Technical appendix

The technical appendix provides additional details on the data processing procedures, complemented by results obtained by an alternative estimation strategy based on inverse probability weighting.

4.1 Sample Selection

This section displays additional detail about the selection and cleaning of the evaluation sample.

info_table <- data.frame(matrix(ncol=2,nrow=0, dimnames=list(NULL, c("Desciption", "Value"))))
info_table[1,] <- c('Start of the evaluation period', paste(ep_start))
info_table[2,] <- c('End of the evaluation period', paste(ep_end))

info_table <-
  info_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)


sample_selection <- data.frame(matrix(ncol=3,nrow=0, dimnames=list(NULL, c("Desciption", "Dropped","Total"))))

sample_selection[1,] <- c('Total registrations ', 0,format(n1, big.mark=" ", scientific=FALSE))
sample_selection[2,] <- c('Total eligible jobseekers', 0,format(n2, big.mark=" ", scientific=FALSE))
sample_selection[3,] <- c('Total participations in the evaluated measure', 0,format(npart1, big.mark=" ", scientific=FALSE))
sample_selection[4,] <- c('Total participants in the evaluated measure', 0 ,format(npart0, big.mark=" ", scientific=FALSE))

x<-n1-n4
y<-npart1-n3
sample_selection[5,] <- c('Dropped eligible JSs', format(n4, big.mark=" ", scientific=FALSE), format(x, big.mark=" ", scientific=FALSE))
sample_selection[6,] <- c('Dropped participants', format(n3, big.mark=" ", scientific=FALSE), format(y, big.mark=" ", scientific=FALSE))

x<-x-n6
y<-y-n5
sample_selection[7,] <- c('Dropped eligible JSs', format(n6, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[8,] <- c('Dropped participants', format(n5, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n7
sample_selection[9,] <- c('Dropped participants', format(n7, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

x<-x-n9
y<-y-n8
sample_selection[10,] <- c('Dropped eligible JSs', format(n9, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[11,] <- c('Dropped participants', format(n8, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-nrowdfa-n10
sample_selection[12,] <- c('Inflating participants by merging dataframes', 0,format(nrowdfa, big.mark=" ", scientific=FALSE))
sample_selection[13,] <- c('Dropped participants', format(n10, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n11
sample_selection[14,] <- c('Dropped participants', format(n11, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

sample_selection[15,] <- c('Eligibles', format(x-(length(unique(esample$klient_id[esample$treated==FALSE]))), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[16,] <- c('Participants', format(y-length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection[17,] <- c('Eligibles', 0, format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[18,] <- c('Participants', 0, format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection$Variable <- c('All registrations (before cleaning)', 'All registrations (before cleaning)', 'All registrations (before cleaning)','All registrations (before cleaning)',
                             
                             'Dropped JSs with ALMP participation 2 years before the EP', 'Dropped JSs with ALMP participation 2 years before the EP',
                             
                             'Dropped JSs with ALMP participation in other ALMP during the EP', 'Dropped JSs with ALMP participation in other ALMP during the EP',
                             
                             'Dropped participants with multiple ALMP participations in evaluated programme',
                             
                             'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',  'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',
                             
                             'Dropped participations which not happening during an unemployment spell','Dropped participations which not happening during an unemployment spell', 
                             
                             'Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure ',
                             
                             'Dropped JSs with multiple registrations', 'Dropped JSs with multiple registrations',
                             
                             'Total registrations after cleaning', 'Total registrations after cleaning'
)

sample_selection <-sample_selection[,-4] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)%>%
  pack_rows(index = table(fct_inorder(sample_selection$Variable)))

Table 9: Information table about the beginning and end of the evaluation period

info_table

Desciption	Value
Start of the evaluation period	2017-01-01
End of the evaluation period	2017-12-31

Table 10: Data cleaning documentation

sample_selection

Desciption	Dropped	Total
All registrations (before cleaning)
Total registrations	0	443 969
Total eligible jobseekers	0	395 328
Total participations in the evaluated measure	0	816
Total participants in the evaluated measure	0	808
Dropped JSs with ALMP participation 2 years before the EP
Dropped eligible JSs	56 701	387 268
Dropped participants	192	624
Dropped JSs with ALMP participation in other ALMP during the EP
Dropped eligible JSs	53 500	333 768
Dropped participants	199	425
Dropped participants with multiple ALMP participations in evaluated programme
Dropped participants	2	423
Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP
Dropped eligible JSs	31 177	302 591
Dropped participants	18	405
Dropped participations which not happening during an unemployment spell
Inflating participants by merging dataframes	0	422
Dropped participants	316	106
Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure
Dropped participants	2	104
Dropped JSs with multiple registrations
Eligibles	32 495	270 096
Participants	0	104
Total registrations after cleaning
Eligibles	0	270 096
Participants	0	104

4.2 Description of explanatory variables (used in balancing)

Description of variables, used in the estimation of ATTs. We also show their balance before and after matching in graph 4 and balance before and after weighting in graph 10.

Table 11: List and description of variables used in estimation

#TREBA DOPLNIŤ DO POSTUP_EXPORT
Q[["df"]][["flang"]] <- "knowledge of a foreign language (1: Yes, 0: No)"
Q[["df"]][["drive"]] <- "driving license holder (1: Yes, 0: No)"
Q[["df"]][["pc"]] <- "computer skills (1: Yes, 0: No)"
Q[["df"]][["unpast"]] <- "registered citizen in the past (1: Yes, 0: No), unemployed in past"
Q[["df"]][["min_urad"]] <- "traveling time to the nearest Labour office (in minutes)"
Q[["df"]][["min_BA"]] <- "traveling time to the Capital city - Bratislava (in minutes)"
Q[["df"]][["population"]] <- "number of inhabitants of the place of residence"



list_vars_table <- data.frame(Variables = list_vars)

desc <- c()
for (i in list_vars_table$Variables){
  Q[["df"]][[i]][1]
  desc <- append(desc, ifelse(is.null(Q[["df"]][[i]][1]), NA, Q[["df"]][[i]][1]))
}

list_vars_table$Description <- desc
list_vars_table[list_vars_table$Variables == "ent","Description"] <- "difference between entry into unemployment register and started of evalueted period"
list_vars_table[list_vars_table$Variables == "UR_region","Description"] <- "unemployment rate in region during valuated period"
list_vars_table[list_vars_table$Variables == "roma_share","Description"] <- "share of Roma in the place of residence"

list_vars_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)

Variables	Description
ent	difference between entry into unemployment register and started of evalueted period
male	combination of klient_id and the date of entry is an Unique ID. (we created and used dup column to identify duplicates and remove them)
married	marital status: single
kids	nationality: other
slovak	marital status: other
noedu	kids under 10 years
primary	level of education: no education
lsec	level of education: primary
usec	level of education: ower secondary
flang	knowledge of a foreign language (1: Yes, 0: No)
drive	driving license holder (1: Yes, 0: No)
pc	computer skills (1: Yes, 0: No)
unpast	registered citizen in the past (1: Yes, 0: No), unemployed in past
min_urad	traveling time to the nearest Labour office (in minutes)
min_BA	traveling time to the Capital city - Bratislava (in minutes)
UR_region	unemployment rate in region during valuated period
roma_share	share of Roma in the place of residence
population	number of inhabitants of the place of residence
age	gender (1: man, 0: woman)

4.3 Alternative estimation strategy (Inverse probability weighting)

In this section, we explore the sensitivity of our results to an alternative model specifications. The main results, reported above, were obtained by [propensity score matching] (https://cran.r-project.org/web/packages/MatchIt/MatchIt.pdf). An alternative estimation strategy was applied using an [inverse probability weighting estimator] (https://www.rdocumentation.org/packages/causalweight/versions/1.0.2/topics/treatweight).

#Estimation parameters
Ssamples <- seq(1,4)
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) # Month of participation since the start of the evaluation period
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")

list_vars <- c('ent', 'male', 'married','kids',
           'slovak', 'noedu','primary', 'lsec', 'usec',
           'flang', 'drive', 'pc',
           'unpast', 'min_urad', 'min_BA',
           'UR_region', 'roma_share', 'population', 'age')

#     # All potentially useful explanatory variables (Xs)
#     list_vars <- c('ent', 'male', 'single', 'married','kids',
#                    'slovak', 'hungarian', 'roma', 
#                    'noedu','primary', 'lsec', 'usec', 'tertiary'
#                    , 'zaujem_vzdel',
#                    'flang', 'drive', 'pc',
#                    'healthy', 'barrier', 'graduate', 'ziad_undn_sp', 'cvyhl_poisteu', 
#                    'empl', 'unpast', 'employee', 'selfempl', 'zaujem_szco',
#                    'look_ptime', 'commute', 'relocate', 'zaujem_zam_zahr',
#                    'min_kraj', 'min_urad', 'min_BA', 
#                    'UR_region', 'roma_share', 'population', 
#                    ageg_dummies,
# #                    paste0("urad_",seq(from=1, to=46), sep=""), 
#                    paste0("isco1_",seq(from=1, to=3), sep=""),
#                    paste0("odbor1_",seq(from=1, to=5), sep=""))
# #                   paste0(colnames(df)[grepl("nace1_",colnames(df))], sep=""))

Balance_vars <- list_vars
# Result Matrixes
  N<-nrow(esample[esample$treated==TRUE,])
  N_sp <- matrix(NA, nrow=length(Ssamples)) 
  
  resultsArray_ATT  <- array(NA, dim=c(length(O_vars),7,length(Ssamples))) 
  dimnames(resultsArray_ATT)[[1]] <- c(O_vars)
  dimnames(resultsArray_ATT)[[2]] <- c('ATT', 'se', 'pval', 'Y1', 'Y0', 'SampleSize', 'Sign.')
  results <- array(NA, dim=c(length(O_vars),4))

  balance_matrix_w <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_w)[[1]] <- c(list_vars)
  
  results_bv_W <- array(NA, dim=c(1,length(list_vars)))
  
  balance_matrix_un <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_un)[[1]] <- c(list_vars)
  
  results_bv_un <- array(NA, dim=c(1,length(list_vars)))
  
### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#NonPart majú iba spodné kritérium minimálnej dĺžky nezamestnanosti 
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

DataSample <- c()

for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      
      #Adding unemployment history
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      
      #Imputing the start of the unemployment spell 
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          for (p in 1:max_pcpQ){  
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ
                          [esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            Pmatrix[,p]<-as.numeric(esampleS$infQ[esampleS$treated==FALSE]  %in% PinfQ)
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)==0) & esampleS$treated == FALSE] <- 0
        Pmatrix<-Pmatrix[as.logical(apply(Pmatrix, 1, FUN=sum)>0),]
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)>0) & esampleS$treated == FALSE] <- apply(Pmatrix,1,hh)
        
  
        
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90))    
        
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
            for (col in colnames(result$D)){
               D <-  D[!is.na(D[,col]),]
             }  
        # ESTIMATION:
            
            d = D$treated*1
            x = as.matrix(D[,c(list_vars, paste0("empl.", seq(1,4,1), sep=""))])
            y_mat <- D[,c(paste0("empl.", seq(1,4,1), sep=""), 
                                    paste0("empl", seq(0,12,1), sep=""), 
                                    'firstempl', 'cumempl')]
            
            att <- treatweight_pmp(y = y_mat, d, x, s = NULL, z = NULL, selpop = FALSE, trim = 0.05, ATET = TRUE, logit = TRUE, boot = 10)
            #att <- treatselDML(y = y_mat, d, x, s = d, z=x, selected=1)
            
            resultsArray_ATT[,1,s] <- round(att$effect,3)
            resultsArray_ATT[,2,s] <- round(att$se,3)
            resultsArray_ATT[,3,s] <- round(att$pval,3)
            resultsArray_ATT[,4,s] <- round(att$y1,3)
            resultsArray_ATT[,5,s] <- round(att$y0,3)
            resultsArray_ATT[,6,s] <- format(length(d)-att$ntrimmed, big.mark=" ", scientific=FALSE)
            resultsArray_ATT[,7,s] <- stars.pval(att$pval)
            
            
            DataSample <- bind_rows(DataSample, D)
            
            #Balance
            #Generating the propensity score variable
            PSmodel<-glm(result$spec, family=binomial(link = "logit"), data=D)
            #print(summary(PSmodel))
            D$PSvar<-as.numeric(PSmodel$fitted.values)
            
            #LL: old weights INCORRECT
            #w_ATE <- D$treated/D$PSvar + (1-D$treated)/(1-D$PSvar)
            
            #LL: new weights CORRECT
            w_ATE <- D$treated + (1-D$treated)*D$PSvar/(1-D$PSvar)
            
                          #Balance_vars <- colnames(result$D)[colnames(result$D) %in% list_vars]
                          for (bv in Balance_vars){
                          
                              if (apply(D[,bv,drop = FALSE] ,2,function(x) { all(x %in% c(0:1)) }) ) {
                                #unweighted discrete
                                p_treat <- apply(D[D$treated==1,bv,drop = FALSE],2,mean)
                                p_contr <- apply(D[D$treated==0,bv,drop = FALSE],2,mean)
                                balance_matrix_un[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )

                                #weighted discrete
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                
                                balance_matrix_w[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )
                            
                              } else {
                                #unweighted continuous
                                balance_matrix_un[bv,s] <- abs( 100*(apply(D[D$treated==1,bv,drop = FALSE],2,mean) - apply(D[D$treated==0,bv,drop = FALSE],2,mean))/
                                                  sqrt( (apply(D[D$treated==1,bv,drop = FALSE],2,sd)^2 + apply(D[D$treated==0,bv,drop = FALSE],2,sd)^2)/2 ) )

                                #weighted continuous
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                p_treat_var <- ( sum(w_ATE[D$treated==1]) / (sum(w_ATE[D$treated==1])^2 - sum(w_ATE[D$treated==1]^2)) )* 
                                  t(w_ATE[D$treated==1]) %*% ((D[D$treated==1,bv] - c(p_treat))^2)
                                p_contr_var <- ( sum(w_ATE[D$treated==0]) / (sum(w_ATE[D$treated==0])^2 - sum(w_ATE[D$treated==0]^2)) ) * 
                                  t(w_ATE[D$treated==0]) %*% ((D[D$treated==0,bv] - c(p_contr))^2)
                                
                                balance_matrix_w[bv,s] <- abs(100* (p_treat - p_contr ) / 
                                            sqrt( (p_treat_var + p_contr_var )/2 ) )
                              }

                          }
    }
} 
  

for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum(sum((as.numeric(resultsArray_ATT[iQ,'ATT',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,2] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'se',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,3] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y1',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,4] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y0',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

results<-cbind(O_vars, results) 
colnames(results) <- c('O_vars', 'ATT', 'se','Y1', 'Y0')

results_bv <- array(NA, dim=c(length(list_vars),2)) 
dimnames(results_bv)[[1]] <- c(list_vars)
dimnames(results_bv)[[2]] <- c("unweighted", "weighted")


for (bVar in 1:length(list_vars)){
      results_bv[bVar,1] <- abs(sum(sum((balance_matrix_w[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
      results_bv[bVar,2] <- abs(sum(sum((balance_matrix_un[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
}

We can explore the sensitivity of our results by displaying the average treatment effects on the treated (ATTs) computing by different approach by inverse probability weighting.

Graph 10: Balance of the groups of participants and eligible before and after weighting x

BVDF <- data.frame(balance_vars = c(rownames(results_bv),rownames(results_bv)),
                   balance = c(results_bv[,1], results_bv[,2]),
                   Balance = c(rep("before weighting",nrow(results_bv)),rep("after weighting",nrow(results_bv))))
                          
BVDF %>% subset(!is.na(balance)) %>% 
    mutate(balance_vars = fct_reorder(balance_vars, balance)) %>%
    ggplot(aes(x=balance, y=balance_vars,col=Balance)) + 
    geom_point() +
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme_minimal()+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")+
    geom_vline(xintercept = 0.0, color = 'darkgrey')

We show a proxy of the employment rate to see a share of persons out of the register of unemployed JSs observed at the first day of quarters before and after the start of participation (Quarter 0).

Graph 11: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(DataSample[DataSample$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(DataSample[DataSample$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs(
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Graph 12: The average treatment effects on the treated (ATTs) estimated for the participation in the Mobility allowance during 2017

graphATT

Finaly, we report the ATTs in table computing computing by different approach by inverse probability weighting.

Table 12: The average treatment effect on the treated

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance codes (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))

	efekt	se	sig.
empl.4	0.004	0.009
empl.3	0.001	0.010
empl.2	0.004	0.006
empl.1	0.001	0.008
empl0	0.695	0.053	***
empl1	0.640	0.049	***
empl2	0.597	0.054	***
empl3	0.485	0.055	***
empl4	0.390	0.043	***
empl5	0.275	0.056	***
empl6	0.253	0.045	***
empl7	0.196	0.038	***
empl8	0.162	0.049	***
empl9	0.132	0.053
empl10	0.144	0.030	***
empl11	0.122	0.042	**
empl12	0.117	0.039	**
firstempl	-4.556	0.291	***
cumempl	4.207	0.348	***
¹ Significance codes (sig.): 0 " * " 0.001 " " 0.01 " * " 0.05 " . " 0.1 " " 1

References

Daniel E. Ho, Kosuke Imai, Gary King, Elizabeth A. Stuart (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software, Vol. 42, No. 8, pp. 1-28. https://doi.org/10.18637/jss.v042.i08

Frölich, M., Huber, M. (2014): “Treatment Evaluation With Multiple Outcome Periods Under En- dogeneity and Attrition”, Journal of the American Statistical Association, 109, 1697-1711.

https://bookdown.org/yihui/rmarkdown-cookbook/parameterized-reports.html

LMP Qualitative data in MS Excel, downloaded from (accessed at 14th of April 2021) : https://ec.europa.eu/social/main.jsp?catId=1143&intPageId=3227&langId=en