Recoll documentation
/etc/profile.d/recollindex01.sh (external index paths appended to RECOLL_EXTRA_DBS, one per continuation line):
:/media/hdvm05/index/007/002/002/002/cena-internaciona/bdx\
:/media/hdvm05/index/007/002/002/003/contexto-internacional/bdx\
:/media/hdvm05/index/007/002/002/004/meridiano-47/bdx\
:/media/hdvm05/index/007/002/002/005/rbce/bdx\
:/media/hdvm05/index/007/002/002/006/rbcs/bdx\
:/media/hdvm05/index/007/002/002/007/rbpi/bdx\
:/media/hdvm05/index/007/002/002/008/revista-cepal/bdx\
:/media/hdvm05/index/007/002/002/009/revista-dep/bdx\
:/media/hdvm05/index/007/002/002/011/via-mundi/bdx\
:/media/hdvm05/index/007/002/002/010/revista-politica_externa/bdx\
:/media/hdvm05/index/007/002/002/014/revista-moncoes/bdx\
:/media/hdvm05/index/007/002/002/015/revista-mural-inter/bdx\
:/media/hdvm05/index/007/002/002/018/revista-conj-global/bdx\
:/media/hdvm05/index/007/002/002/020/revista-conj-austral/bdx\
:/media/hdvm05/index/007/002/002/019/revista-conj-internacional/bdx\
External indexes
- https://app.diagrams.net/#G17oJWYW0KyWZR-N7aEMg9qVEiZKbLFFiG
- https://drive.google.com/file/d/17oJWYW0KyWZR-N7aEMg9qVEiZKbLFFiG/view?usp=sharing
To review
Recoll
Schedule indexing in crontab - ok
Index the data in Recoll - ok
Make the indexes available as external indexes (accessible to all users) - ok
Manual verification of the indexes
- In Recoll: Preferences >> "External indexes" >> "select index" >> "run a generic search"
- Check whether the number of results returned by the Recoll search equals the number of files in the data folder (record the search term used in the spreadsheet, update the status in the recoll column, and note the number of indexed files).
- If the number of files returned by Recoll differs from the number of files in the folder, set the recoll column to "Falha" (failed); if the numbers match, set it to "indexado" (indexed).
- When the number of files in the folder differs from the Recoll search results, check that index specifically. In the terminal, type:
recoll -c diretorio_indexação
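The count comparison above can be partly scripted. This is a minimal sketch, assuming Recoll indexes every regular file under the data folder (files Recoll skips or cannot parse would legitimately make the counts differ). `check_index_status` is a hypothetical helper name, and the Recoll result count is still read off the search by hand and passed in as an argument.

```shell
#!/bin/bash
# Hypothetical helper: compares the number of files in a data folder with
# the number of results Recoll returned for it (read manually from the GUI).
# Prints the spreadsheet status: "indexado" if the counts match, "Falha" if not.
check_index_status() {
  local data_dir="$1"     # folder that was indexed
  local recoll_hits="$2"  # result count shown by the Recoll search
  local file_count
  # Count regular files recursively; GNU find prints one '.' per file,
  # so file names containing newlines do not skew the count
  file_count=$(find "$data_dir" -type f -printf '.' | wc -c)
  echo "files in folder: $file_count, Recoll results: $recoll_hits" >&2
  if [ "$file_count" -eq "$recoll_hits" ]; then
    echo "indexado"
  else
    echo "Falha"
  fi
}
```

Usage: `check_index_status /path/to/data 1234` prints the two counts on stderr and the status to copy into the spreadsheet on stdout.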
Script for creating the external indexes
/media/hdvm05/scripts/shell-script/recoll-multiplo02.sh
30 11 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/002/998/001/001/recortes-noticias-mercosul recollindex
30 12 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/002/998/001/002/notícias-inforel recollindex
30 13 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/002/998/001/003/noticias-mre recollindex
30 14 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/002/998/001/004/noticias-latn recollindex
30 15 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/002/998/001/005/noticias-rebrip recollindex
30 16 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/001/brasil-govfederal-presidencia recollindex
30 17 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/001/mre-notas-imprensa recollindex
30 18 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/002/mre-rpeb recollindex
30 19 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/003/mre-gov-lula recollindex
30 20 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/004/mre-boletim-adb recollindex
30 21 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/005/mre-anuarios recollindex
30 22 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/006/mre-balanco-peb-2003-2010 recollindex
30 23 24 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/007/mre-boletim-diplomatico recollindex
30 00 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/002/008/mre-cartas-genebra recollindex
30 01 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/001/003/001/ipea recollindex
30 02 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/001/002/001/camara-federal-noticias recollindex
30 03 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/003/001/002/001/001/cefir recollindex
30 04 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/004/002/omc recollindex
30 05 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/005/001/001/001-a/mercosul-atas recollindex
30 06 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/005/001/001/001-b/mercosul-atas recollindex
30 07 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/005/001/002/001/Unasul-site recollindex
30 08 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/005/001/003/OLADE recollindex
30 09 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/005/001/004/OTCA recollindex
30 10 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/006/001/001/ictsd recollindex
30 11 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/006/001/002/wwf recollindex
30 12 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/006/001/003/ibc recollindex
30 12 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/006/001/004/fas recollindex
30 13 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/006/001/005/idesam recollindex
30 14 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/006/001/006/raisg recollindex
30 15 25 09 * RCLCRON_RCLINDEX= RECOLL_CONFDIR=/media/hdvm05/index/006/001/007/latn recollindex
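Writing these entries by hand is where slips such as a wrong month or a missing `/` in a path creep in. A minimal sketch that prints entries in the same format, one hour apart, from a list of configuration directories; `gen_recoll_cron` is a hypothetical name, and it assumes the runs never roll over into the next month.

```shell
#!/bin/bash
# Sketch: prints one crontab line per Recoll configuration directory,
# one hour apart, in the same format as the entries above.
# Assumption: the runs never cross into the next month (no month rollover).
gen_recoll_cron() {
  # 10# forces base 10, so values such as "09" are not read as octal
  local day=$((10#$1)) month=$((10#$2)) start_hour=$((10#$3))
  shift 3
  local i=0 hour dom confdir
  for confdir in "$@"; do
    hour=$(( (start_hour + i) % 24 ))
    dom=$(( day + (start_hour + i) / 24 ))  # moves to the next day after midnight
    printf '30 %02d %02d %02d * RCLCRON_RCLINDEX= RECOLL_CONFDIR=%s recollindex\n' \
      "$hour" "$dom" "$month" "$confdir"
    i=$((i + 1))
  done
}
```

For example, `gen_recoll_cron 24 09 11 /media/hdvm05/index/002/998/001/001/recortes-noticias-mercosul ...` starts at 11:30 on 24/09 like the first entry above. Review the output before appending it with `(crontab -l; gen_recoll_cron ...) | crontab -`.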
Compress post-processed files
STEP 01: remove the spaces from the names in the root folder (replace them with underscores)
/media/hdvm05/scripts/shell-script/scripts/scripts/colocar-underline.sh
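The contents of colocar-underline.sh are not shown here. A minimal sketch of what STEP 01 describes, assuming the goal is to replace every space with an underscore in the names of the files and folders below the root folder; `spaces_to_underscores` is a hypothetical name, not the script itself.

```shell
#!/bin/bash
# Hypothetical sketch: replaces spaces with underscores in the names of all
# files and folders below a root folder (the root itself is not renamed).
spaces_to_underscores() {
  local root="$1" path dir base
  # -depth visits a folder's contents before the folder itself, so renaming
  # a folder never invalidates paths that find has not processed yet
  find "$root" -mindepth 1 -depth -name '* *' -print0 |
    while IFS= read -r -d '' path; do
      dir=$(dirname -- "$path")
      base=$(basename -- "$path")
      # -n: do not overwrite if the underscored name already exists
      mv -n -- "$path" "$dir/${base// /_}"
    done
}
```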
STEP 02: compress the files for OCR
/media/hdvm05/scripts/shell-script/pos-processamento-rpe.sh
Review the documentation below
Enable the updated Recoll version
# sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
# sudo apt-get update
# sudo apt-get install recoll
Enable the updated Xapian version
# sudo add-apt-repository ppa:xapian/backports
# sudo apt-get update
# sudo apt-get install apt-xapian-index
# sudo update-apt-xapian-index -vf
Install Network File System (NFS) - Server
# sudo apt-get install portmap rpcbind nfs-kernel-server
# sudo systemctl enable portmap
# sudo systemctl enable rpcbind
# sudo systemctl enable nfs-kernel-server
# sudo systemctl start portmap
# sudo systemctl start rpcbind
# sudo systemctl start nfs-kernel-server
# service nfs-kernel-server status
Configure NFS - Server
# sudo chmod 777 /directory/…
# sudo nano /etc/exports
# /media/hdvm03/bdlantri_02/bibliografia-academica 200.145.122.125/27(rw,sync,no_subtree_check,no_root_squash)
# sudo exportfs -ra;
# sudo systemctl restart nfs-kernel-server
# service nfs-kernel-server status
# sudo nano /etc/hosts.allow /etc/hosts.deny (unverified: check whether this step is needed)
ro >> exports the file system read-only;
rw >> exports the file system read-write;
secure >> requires requests to come from a privileged port (below 1024); this is the default, and the insecure option disables it;
soft >> (client mount option) returns an error to the application if the server does not respond after the configured retries;
hard >> (client mount option) retries the request indefinitely until the server responds;
no_subtree_check >> disables subtree checking, which can improve performance (transfer rate);
sync >> the server replies to an NFS request only after the current disk operation has completed; this can be disabled with the async option. Asynchronous writes improve performance somewhat but reduce reliability, since data can be lost if the server crashes between acknowledging a write and actually writing it to disk;
root_squash >> no NFS client gets root access to the file system: all requests that appear to come from the root user are treated by the server as coming from the user nobody;
no_root_squash >> disables this behavior; it is risky and should only be used in controlled environments.
Install and configure NFS - Client
# sudo apt-get install nfs-common
# sudo mkdir -p /mnt/nfs_client_dir/bibliografia-academica
# sudo chmod 755 /mnt/nfs_client_dir/bibliografia-academica
# mount -o vers=3 -v 200.145.122.125:/media/hdvm03/bdlantri_02/bibliografia-academica /mnt/nfs_client_dir/bibliografia-academica;
# sudo mount -o vers=3 -v 200.145.122.125:/media/hdvm05 /media/lantrivm02-nfs/hdvm05
Example verbose (-v) output:
mount.nfs: timeout set for Tue Mar 31 18:22:26 2020
mount.nfs: trying text-based options 'vers=3,addr=200.145.122.125'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 200.145.122.125 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 200.145.122.125 prog 100005 vers 3 prot UDP port 42106
NFS references:
- https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nfs-mount-on-ubuntu-18-04
- https://linux.die.net/man/5/nfs
- https://devblog.drall.com.br/nfs-opcoes-disponiveis-para-o-cliente-etcfstab-e-para-o-servidor-etcexports
- https://www.thegeekdiary.com/common-nfs-mount-options-in-linux/
- https://docs.oracle.com/cd/E19120-01/open.solaris/819-1634/rfsrefer-16/index.html
- https://docs.aws.amazon.com/pt_br/efs/latest/ug/mounting-fs-nfs-mount-settings.html
- https://linuxize.com/post/how-to-mount-an-nfs-share-in-linux/
Multiple indexes
recollindex-multiplo01.sh
#!/bin/bash
echo "Index name:"
read -r nomeindex
mkdir -p "/media/lantrivm02-nfs/hdvm05/index-recoll/$nomeindex"
recoll -c "/media/lantrivm02-nfs/hdvm05/index-recoll/$nomeindex"
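A non-interactive variant can take the index name and the data folder as arguments and write a minimal recoll.conf, so that `recollindex -c` can run without the GUI (for example from cron). A sketch, assuming each index directory holds its own recoll.conf and that `topdirs` is the only setting needed; `INDEX_BASE` and `create_recoll_config` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical non-interactive variant of recollindex-multiplo01.sh.
# INDEX_BASE can be overridden from the environment.
INDEX_BASE="${INDEX_BASE:-/media/lantrivm02-nfs/hdvm05/index-recoll}"

create_recoll_config() {
  local nomeindex="$1" datadir="$2"
  if [ -z "$nomeindex" ] || [ -z "$datadir" ]; then
    echo "Usage: create_recoll_config <index name> <data folder>" >&2
    return 1
  fi
  local confdir="$INDEX_BASE/$nomeindex"
  mkdir -p "$confdir" || return 1
  # topdirs: folders indexed with this configuration
  # (assumes the path contains no spaces, which would need quoting)
  printf 'topdirs = %s\n' "$datadir" > "$confdir/recoll.conf"
  echo "$confdir"
}
```

Usage: `confdir=$(create_recoll_config nome-do-index /path/to/data) && recollindex -c "$confdir"`.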
Non-fatal indexing message:
aspell : aspell dictionary creation command failed: /usr/bin/aspell --lang=pt --encoding=utf-8 create master /media/hdvm05/recollindex001/mre-rpeb/aspdict.pt.rws One possible reason might be missing language data files for lang = pt. Maybe try to execute the command by hand for a better diag.
See:
https://archive.is/XTkkC
https://www.lesbonscomptes.com/recoll/bitbucket-issues-recoll/issue-356.html
External applications/commands needed for your file types and not found, as stored by the last indexing pass in /media/hdvm05/recollindex001/teste03/missing:
No helpers found missing
Enable external indexes
/etc/profile.d/recollindex01.sh
sudo chmod 776 /etc/profile.d/recollindex01.sh
export RECOLL_EXTRA_DBS=\
chmod a+x recollindex-multiplo.sh
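<code class="bash">
#!/bin/bash
###
### Add an external index for all users
###
### recollindex-multiplo.sh
if [ $# -lt 1 ]; then
  echo "Please provide the index path"
  echo "Example:" "$0" "/media/pasta01/pasta02"
  exit 1
fi
# Index path passed as the first argument
index=$1
if [ ! -d "$index" ]; then
  echo "Directory not found: $index"
  exit 1
fi
# Append ":<path>\" as a new continuation line of RECOLL_EXTRA_DBS
# (only files ending in .sh in /etc/profile.d are read at login)
echo ":$index\\" >> /etc/profile.d/recollindex01.sh
</code>
/etc/fstab entry to mount the NFS share at boot:
200.145.122.125:/media/hdvm05/bdhdvm05 /media/hdvm05 nfs defaults,bg 0 0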
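Every login shell reads RECOLL_EXTRA_DBS, so a wrong path (a typo, or an NFS share that is not mounted) is easy to miss. A minimal sketch that splits the variable on `:` and reports which entries exist; `check_extra_dbs` is a hypothetical name, and it only checks that each directory exists, not that it holds a valid index.

```shell
#!/bin/bash
# Sketch: reports which paths listed in RECOLL_EXTRA_DBS (or in $1) exist.
# The body runs in a subshell, so IFS and set -f do not leak into the caller.
check_extra_dbs() (
  dbs="${1:-$RECOLL_EXTRA_DBS}"
  status=0
  set -f        # do not glob-expand the paths
  IFS=':'
  for db in $dbs; do
    [ -n "$db" ] || continue   # skip the empty entry created by a leading ':'
    if [ -d "$db" ]; then
      echo "OK      $db"
    else
      echo "MISSING $db"
      status=1
    fi
  done
  exit "$status"   # non-zero if any path is missing
)
```

Run `check_extra_dbs` in a new login shell (or after `source /etc/profile.d/recollindex01.sh`) to check the current value.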
Symbolic links
sudo ln -s /home/zank conecta
- https://archive.is/4rjiD
- https://archive.is/6NVM9
- https://askubuntu.com/questions/503216/how-can-i-set-a-single-bashrc-file-for-several-users/503222
remove Recoll and reinstall it from the local .deb packages in /workrecoll
sudo apt-get purge --auto-remove recoll
sudo dpkg -i /workrecoll/*.deb
https://superuser.com/questions/658075/how-do-i-move-files-out-of-nested-subdirectories-into-another-folder-in-ubuntu
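The linked answer covers moving files out of nested subdirectories. A minimal sketch with GNU find and mv, assuming only files inside subfolders should move and that existing files in the destination must not be overwritten; `flatten_into` is a hypothetical name, and the emptied subfolders are left in place.

```shell
#!/bin/bash
# Sketch: moves every file found in subfolders of SRC into DEST (flat).
flatten_into() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # -mindepth 2: only files inside subfolders of $src, not at its top level
  # mv -n: never overwrite a file with the same name already in $dest
  # mv -t: GNU option naming the target directory first, so {} + can batch files
  find "$src" -mindepth 2 -type f -exec mv -n -t "$dest" -- {} +
}
```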
add leading zeros to numbers in file names (Perl rename)
rename -e 's/\d+/sprintf("%02d",$&)/e' -- *.jpg
https://unix.stackexchange.com/questions/346917/rename-files-to-add-leading-zeros-to-numbers
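On systems where `rename` is the util-linux version rather than the Perl one, the Perl expression above is not accepted. A pure-bash sketch of the same idea, padding the first number in each name; `pad_numbers` is a hypothetical name, and it assumes the file names are given without directory parts.

```shell
#!/bin/bash
# Sketch: pads the first number in each file name with leading zeros
# (e.g. img1.jpg -> img01.jpg with width 2), like the rename command above.
pad_numbers() {
  local width="$1" f new
  shift
  for f in "$@"; do
    # prefix without digits, first run of digits, rest of the name
    if [[ $f =~ ^([^0-9]*)([0-9]+)(.*)$ ]]; then
      # 10# avoids octal interpretation of numbers such as 08
      printf -v new '%s%0*d%s' "${BASH_REMATCH[1]}" "$width" \
        "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
      if [ "$new" != "$f" ]; then
        mv -n -- "$f" "$new"   # -n: do not overwrite an existing file
      fi
    fi
  done
}
```

Usage, in the folder with the images: `pad_numbers 2 *.jpg`.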
add the .html extension to every file
rename 's/$/\.html/' *
merge the HTML files into one
cat *.html > texto01.html
extract the link targets (href values)
grep -Po '(?<=href=")[^"]*' texto01.html >> teste02.txt
remove lines containing certain words
sed '/.pdf/d;/mailto:/d;/linkedin/d;/facebook/d;/whatsapp/d;/twitter/d;/javascript:/d;/statics.estadao/d;//d;/acesso.estadao/d;/assine.estadao/d' -i extrair04.txt
remove duplicate lines
sort teste01.txt | uniq >> text02.txt
remove lines with at most 45 characters (replace FILE with the file name)
sed -r -i '/^.{,45}$/d' FILE
split the files into folders of 25 (esp_001, esp_002, ...)
i=0; for f in *; do d=esp_$(printf %03d $((i/25+1))); mkdir -p $d; mv "$f" $d; let i++; done
remove the trailing comma
sed 's/,$//'
delete all .shtml files in the current folder and its subfolders
find . -name \*.shtml -type f -delete
https://superuser.com/questions/112078/delete-matching-files-in-all-subdirectories
stop a cron job (kill its process)
1. ps -ef | grep "php name file"
2. kill -9 'process ID'
echo $DISPLAY