About TrenDTF Search

TrenDTF Search is a prototype and a direct outcome of TrenDTF, a BMBF-funded joint project of TIB with Fraunhofer ISI and Fraunhofer IML that ran from 01.10.2019 to 30.09.2022.

The prototype covers a large, though incomplete, collection of German research reports (69,961 reports).

TrenDTF Search uses advanced algorithms (encompassing Named Entity Recognition, topic modeling, subject classification, full report text, etc.) and query/data weighting techniques to deliver the most relevant results for a query. The search can be refined using filters such as language, publication year, publisher, and topic.

The following are some example queries:

1. Simple keyword-based queries, such as Software engineering or Informatik

2. Queries using filters along with keywords:

   a. lang
      It supports eng and de as parameters, e.g., lang:eng

   b. date
      It supports a single year (date:2000) or a range of years (date:2000-2015)

   c. publisher
      It supports filtering on a particular publisher, e.g., publisher:"TUM"

   d. topic
      It is not actually a filter; instead, it influences how the query is weighted when matching against entity annotations, topic modeling, and subject classification. Example query: informatik topic:Computergraphik

   e. PPN
      Any PPN can be searched directly, e.g., 687982758
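
The query syntax listed above can be assembled programmatically. The following is a minimal sketch in Python that only builds the query string; it does not call the service. The helper name build_query and its parameters are hypothetical, and combining several filters in one query, as well as the order of the filters, is an assumption that the examples above do not confirm.

```python
from typing import Optional, Tuple, Union


def build_query(
    keywords: str = "",
    lang: Optional[str] = None,
    date: Union[int, Tuple[int, int], None] = None,
    publisher: Optional[str] = None,
    topic: Optional[str] = None,
) -> str:
    """Compose a TrenDTF Search query string (hypothetical helper).

    Only the filter syntax shown on this page is used:
    lang:eng, date:2000, date:2000-2015, publisher:"TUM", topic:Computergraphik.
    """
    parts = []

    # Free-text keywords come first, as in: informatik topic:Computergraphik
    if keywords.strip():
        parts.append(keywords.strip())

    # Only eng and de are documented as language parameters.
    if lang is not None:
        if lang not in ("eng", "de"):
            raise ValueError("lang must be 'eng' or 'de'")
        parts.append(f"lang:{lang}")

    # A single year or a (start, end) range of years.
    if date is not None:
        if isinstance(date, tuple):
            start, end = date
            parts.append(f"date:{start}-{end}")
        else:
            parts.append(f"date:{date}")

    # The publisher value is wrapped in double quotes, as in publisher:"TUM".
    if publisher is not None:
        parts.append(f'publisher:"{publisher}"')

    # topic influences weighting rather than filtering.
    if topic is not None:
        parts.append(f"topic:{topic}")

    # Assumption: several filters may be combined, separated by spaces.
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query("informatik", topic="Computergraphik"))
    print(build_query("Software engineering", lang="eng", date=(2000, 2015)))
```

The resulting string can be pasted into the search field. A PPN such as 687982758 needs no helper, since it is entered on its own.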

For each found research report, the result entry provides, as far as available:

• Report title
• Publisher (i.e. the research institution)
• Author(s)
• DOI link
• Language
• Extracted full text
• Generic preprocessed text

Additionally, the following are provided as separate metadata sets:

• FTX (TIBKAT's original metadata set)
• JSON (developer friendly)
• DC (Dublin Core JSON, optimized for DSpace 7.*)
• XML (XML metadata)

The full-text JSON metadata also contains DBpedia terms, which were derived through NER. Extracted concepts and DBpedia terms for the full text are only available for a small subset of the reports, mostly from computer science. However, concepts and DBpedia terms from the title are available for all reports.

Classification based on topic modeling is available for 66,777 research reports.

Some additional technical background:

• TrenDTF Search is based on Elasticsearch, Flask, and FastAPI. Machine access is possible and documented at https://service.tib.eu/trendtfAPI.

• All metadata delivered by this service is provided under Public Domain conditions (CC0). This does NOT apply to the extracted text of the research reports.

• Many thanks to Asim Qazi, TrenDTF's lead developer!

For further questions, please contact the TrenDTF project coordinator, Lambert Heller.